PyPI - overlapindex - Versions diffs - 0.1.0__tar.gz - Mend

overlapindex 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (6)

overlapindex-0.1.0/LICENSE +7 -0
overlapindex-0.1.0/PKG-INFO +158 -0
overlapindex-0.1.0/README.md +132 -0
overlapindex-0.1.0/overlapindex/OverlapIndex.py +185 -0
overlapindex-0.1.0/overlapindex/__init__.py +3 -0
overlapindex-0.1.0/pyproject.toml +22 -0

overlapindex-0.1.0/LICENSE ADDED

@@ -0,0 +1,7 @@
+Copyright (c) 2026 Niklas M. Melton
+Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

overlapindex-0.1.0/PKG-INFO ADDED

@@ -0,0 +1,158 @@
+Metadata-Version: 2.4
+Name: overlapindex
+Version: 0.1.0
+Summary: OverlapIndex (OI), an Incremental Cluster Validity index for identifying the degree of overlap of data classes.
+License: MIT
+License-File: LICENSE
+Keywords: incremental cluster validity,cluster validity,ART,machine learning,transfer learning,clustering
+Author: Niklas M. Melton
+Author-email: niklasmelton@gmail.com
+Requires-Python: >=3.9
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.9
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Programming Language :: Python :: 3.14
+Requires-Dist: artlib (>=0.1.7,<0.2.0)
+Requires-Dist: numpy (>=2.4.1,<3.0.0)
+Project-URL: Documentation, https://github.com/NiklasMelton/OverlapIndex
+Project-URL: Homepage, https://github.com/NiklasMelton/OverlapIndex
+Project-URL: Repository, https://github.com/NiklasMelton/OverlapIndex
+Description-Content-Type: text/markdown
+# OverlapIndex (OI)
+This package provides an implementation of the **Overlap Index (OI)**, an *incremental cluster validity index (iCVI)* designed to quantify the degree of overlap between data classes or clusters. The OI is updated online, sample by sample or in batches, and is particularly suited for streaming, continual learning, and representation analysis.
+The implementation is built on **ARTMAP-based clustering** (Fuzzy ART or Hypersphere
+ART), leveraging the dynamic clustering properties of Adaptive Resonance Theory to
+track class overlap as new data (and classes) arrive.
+---
+## Overview
+The Overlap Index is bounded in the interval **[0, 1]** and has the following interpretation:
+- **OI = 1.0**
+  Indicates perfect class separation (no overlap).
+- **OI = 0.5**
+  Indicates complete overlap between classes.
+- **OI < 0.5**
+  Indicates a degenerate or pathological case in the data distribution.
+The index is computed incrementally by tracking shared cluster activations between pairs of classes and aggregating class-wise overlap into a global measure.
+---
+## Key Properties
+- **Incremental / Online**
+  Supports streaming updates via `add_sample` and mini-batch updates via `add_batch`.
+  New classes can be introduced at any time, enabling analysis of incremental
+  learning scenarios.
+- **Label-Aware**
+  Can be applied both to labeled raw data and to intermediate representations (e.g., neural network activations).
+- **Geometry-Agnostic**
+  Designed for data of arbitrary geometric structure; no geometric constraints are
+  assumed.
+---
+## Typical Use Cases
+The Overlap Index can be used in several settings:
+- **Unsupervised clustering evaluation**
+  As an iCVI, OI provides insight into the quality of a clustering partition as it evolves over time.
+- **Class separability analysis**
+  Measures the degree of overlap in labeled datasets without requiring a classifier.
+- **Representation monitoring in deep learning**
+  Tracks how class separation changes across layers or training epochs.
+- **Backbone evaluation for transfer learning**
+  Compares feature extractors, where higher OI values indicate better class separation in the learned representation.
+---
+## Implementation Notes
+- ART-based clustering is performed using `artlib`’s `FuzzyARTMAP` or `HypersphereARTMAP`.
+- Inputs are **complement coded**, following standard ART practice.
+- Overlap is estimated by monitoring shared best-matching units (BMUs) between class pairs.
+- The global OI is computed as the mean of per-class minimum pairwise overlap scores.
+---
+## Basic Usage
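+    from overlapindex import OverlapIndex
+    oi = OverlapIndex(
+        rho=0.9,
+        ART="Hypersphere",
+        match_tracking="MT+"
+    )
+    # Inputs are complement coded internally, so features should lie in [0, 1].
+    # Labels index an internal array, so use non-negative integer class ids.
+    # `stream` yields (x, y) pairs; X has shape (n_samples, n_features).
+    # Incremental update
+    for x, y in stream:
+        score = oi.add_sample(x, y)
+    # Or batch update
+    score = oi.add_batch(X, Y)
+Both methods return the current global Overlap Index (the `index` attribute) after the update.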
+---
+## Parameters
+- `rho` *(float, default `0.9`)*
+  Vigilance parameter controlling cluster granularity; higher values produce more, smaller clusters.
+- `r_hat` *(float, default `np.inf`; Hypersphere ART only)*
+  Maximum cluster radius.
+- `ART` *("Fuzzy" | "Hypersphere", default "Fuzzy")*
+  Choice of ART module.
+- `match_tracking` *(str, default "MT+")*
+  Match-tracking strategy passed to ARTMAP's `partial_fit` during learning.
+The defaults are likely to suit most use cases. For very large datasets, smaller
+`rho` values (0.5-0.7) may be needed to keep run time manageable: lower vigilance
+yields fewer clusters, and every update evaluates the activation of every cluster.
+---
+## Output
+- **`index`**
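+  Global Overlap Index across all observed classes (the mean of `singleton_index`).
+- **`singleton_index[y]`**
+  Minimum pairwise overlap score of class `y` against every other observed class.
+- **`pairwise_index[(y, b)]`**
+  Pairwise overlap score of class `y` against class `b`, normalized by the number of
+  class-`y` samples, so `pairwise_index[(y, b)]` and `pairwise_index[(b, y)]` can differ.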
+---
+## Intended Audience
+This package is intended for researchers and practitioners working on:
+- incremental and continual learning,
+- clustering validation,
+- representation learning, and
+- transfer learning.

overlapindex-0.1.0/README.md ADDED

@@ -0,0 +1,132 @@
+# OverlapIndex (OI)
+This package provides an implementation of the **Overlap Index (OI)**, an *incremental cluster validity index (iCVI)* designed to quantify the degree of overlap between data classes or clusters. The OI is updated online, sample by sample or in batches, and is particularly suited for streaming, continual learning, and representation analysis.
+The implementation is built on **ARTMAP-based clustering** (Fuzzy ART or Hypersphere
+ART), leveraging the dynamic clustering properties of Adaptive Resonance Theory to
+track class overlap as new data (and classes) arrive.
+---
+## Overview
+The Overlap Index is bounded in the interval **[0, 1]** and has the following interpretation:
+- **OI = 1.0**
+  Indicates perfect class separation (no overlap).
+- **OI = 0.5**
+  Indicates complete overlap between classes.
+- **OI < 0.5**
+  Indicates a degenerate or pathological case in the data distribution.
+The index is computed incrementally by tracking shared cluster activations between pairs of classes and aggregating class-wise overlap into a global measure.
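Reading the update steps in `OverlapIndex.py` (commented `eq 3` to `eq 6` in `add_batch`), the index can be summarized as below. The notation is ours and may not match the author's paper: $n_a$ is the number of class-$a$ samples seen so far, $\mathcal{C}_a$ is the set of clusters that have been a BMU for class $a$ (`rev_map[a]`), $T_j(x)$ is the ART activation of cluster $j$ for the complement-coded sample $x$, $j_1(x)$ is the sample's own BMU, and $\mathcal{Y}$ is the set of observed classes. For each other class $b$, the competing BMU is the most activated cluster in $\mathcal{C}_a \cup \mathcal{C}_b$ other than $j_1(x)$ (the code falls back to $j_1(x)$ when that union holds a single cluster), and $n_{a,b}$ counts how often it belongs to $\mathcal{C}_b$:

$$
j_2(x, b) = \operatorname*{arg\,max}_{j \in (\mathcal{C}_a \cup \mathcal{C}_b) \setminus \{j_1(x)\}} T_j(x),
\qquad
n_{a,b} \leftarrow n_{a,b} + \mathbb{1}\!\left[\, j_2(x, b) \in \mathcal{C}_b \,\right]
$$

$$
OI_{a,b} = 1 - \frac{n_{a,b}}{n_a},
\qquad
OI_a = \min_{b \in \mathcal{Y},\, b \neq a} OI_{a,b},
\qquad
OI = \frac{1}{|\mathcal{Y}|} \sum_{a \in \mathcal{Y}} OI_a
$$

Because each class-$a$ sample increments $n_{a,b}$ at most once, $0 \le n_{a,b} \le n_a$ and every score stays in [0, 1]. While only one class has been observed, the code leaves OI at its initial value of 1.0.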
+---
+## Key Properties
+- **Incremental / Online**
+  Supports streaming updates via `add_sample` and mini-batch updates via `add_batch`.
+  New classes can be introduced at any time, enabling analysis of incremental
+  learning scenarios.
+- **Label-Aware**
+  Can be applied both to labeled raw data and to intermediate representations (e.g., neural network activations).
+- **Geometry-Agnostic**
+  Designed for data of arbitrary geometric structure; no geometric constraints are
+  assumed.
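To make the online bookkeeping concrete, here is a minimal, ART-free sketch of the counters the index maintains. `OverlapBookkeeping` is a hypothetical class, not part of the package: it assumes the caller already knows, for each new sample of class `a`, which other classes its competing best-matching unit overlaps with (the part `OverlapIndex` obtains from ARTMAP). A class enters the index the first time it is updated, and scores are recomputed on demand rather than cached.

```python
from collections import defaultdict


class OverlapBookkeeping:
    """Hypothetical, ART-free sketch of the counters behind the Overlap Index.

    The caller reports, for each new sample of class `a`, which other classes
    its competing best-matching unit overlaps with.
    """

    def __init__(self):
        self.n = defaultdict(int)       # n_a: samples seen per class
        self.shared = defaultdict(int)  # n_{a,b}: overlapping samples of a w.r.t. b
        self.classes = set()

    def update(self, a, overlapping_classes):
        self.classes.add(a)             # new classes can appear at any time
        self.n[a] += 1
        for b in overlapping_classes:
            if b != a:
                self.shared[(a, b)] += 1

    def pairwise(self, a, b):
        return 1.0 - self.shared[(a, b)] / self.n[a]

    def singleton(self, a):
        others = [b for b in self.classes if b != a]
        return min(self.pairwise(a, b) for b in others) if others else 1.0

    def index(self):
        if len(self.classes) < 2:
            return 1.0                  # matches the package's initial value
        return sum(self.singleton(a) for a in self.classes) / len(self.classes)


bk = OverlapBookkeeping()
bk.update(0, [])    # class-0 sample, no overlap
bk.update(0, [1])   # class-0 sample whose competing BMU belongs to class 1
bk.update(1, [])
bk.update(1, [])
print(bk.pairwise(0, 1), bk.index())  # 0.5 0.75
```

Keeping only integer counters per class and per class pair is what lets the index be updated one sample at a time without revisiting earlier data.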
+---
+## Typical Use Cases
+The Overlap Index can be used in several settings:
+- **Unsupervised clustering evaluation**
+  As an iCVI, OI provides insight into the quality of a clustering partition as it evolves over time.
+- **Class separability analysis**
+  Measures the degree of overlap in labeled datasets without requiring a classifier.
+- **Representation monitoring in deep learning**
+  Tracks how class separation changes across layers or training epochs.
+- **Backbone evaluation for transfer learning**
+  Compares feature extractors, where higher OI values indicate better class separation in the learned representation.
+---
+## Implementation Notes
+- ART-based clustering is performed using `artlib`’s `FuzzyARTMAP` or `HypersphereARTMAP`.
+- Inputs are **complement coded**, following standard ART practice.
+- Overlap is estimated by monitoring shared best-matching units (BMUs) between class pairs.
+- The global OI is computed as the mean of per-class minimum pairwise overlap scores.
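Complement coding maps a feature vector `x` with entries in [0, 1] to `[x, 1 - x]`, so every coded vector sums to `d`, the number of original features. The package imports `complement_code` from `artlib`; the function below is an independent sketch of the same transform (named `complement_code_sketch` to avoid implying it is `artlib`'s source), with an added range check that the package itself does not perform.

```python
import numpy as np


def complement_code_sketch(X):
    """Complement code each row of X as [x, 1 - x] (sketch, not artlib's source).

    Assumes every feature is already scaled to [0, 1]; raises otherwise.
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))
    if X.size and (X.min() < 0.0 or X.max() > 1.0):
        raise ValueError("complement coding expects features in [0, 1]")
    return np.hstack([X, 1.0 - X])


X = np.array([[0.2, 0.9],
              [0.5, 0.0]])
I = complement_code_sketch(X)
print(I.shape)         # (2, 4)
print(I.sum(axis=1))   # each row sums to d = 2 (up to floating-point rounding)
```

The constant L1 norm is the usual reason ART models use complement coding: it normalizes inputs without discarding magnitude information.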
+---
+## Basic Usage
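+    from overlapindex import OverlapIndex
+    oi = OverlapIndex(
+        rho=0.9,
+        ART="Hypersphere",
+        match_tracking="MT+"
+    )
+    # Inputs are complement coded internally, so features should lie in [0, 1].
+    # Labels index an internal array, so use non-negative integer class ids.
+    # `stream` yields (x, y) pairs; X has shape (n_samples, n_features).
+    # Incremental update
+    for x, y in stream:
+        score = oi.add_sample(x, y)
+    # Or batch update
+    score = oi.add_batch(X, Y)
+Both methods return the current global Overlap Index (the `index` attribute) after the update.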
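Because inputs are complement coded and class labels index an internal growing array (`cluster_cardinality[y]`), it can help to normalize data before calling `add_sample` or `add_batch`. The sketch below assumes the whole dataset is available up front; `prepare_inputs` is a hypothetical helper, not part of `overlapindex`. It min-max scales each feature to [0, 1] (constant features map to 0) and maps arbitrary labels to the integers `0..K-1`.

```python
import numpy as np


def prepare_inputs(X, labels):
    """Hypothetical helper: scale features to [0, 1] and encode labels as 0..K-1.

    Assumes the full dataset is available, so per-feature min/max are known.
    """
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # constant features: avoid dividing by zero
    X_scaled = (X - lo) / span
    classes, Y = np.unique(np.asarray(labels), return_inverse=True)
    return X_scaled, Y.reshape(-1).astype(int), classes


X_raw = np.array([[10.0, 5.0],
                  [20.0, 5.0],
                  [30.0, 5.0]])
X, Y, classes = prepare_inputs(X_raw, ["cat", "dog", "cat"])
print(Y)        # [0 1 0]
print(classes)  # ['cat' 'dog']
```

The returned `X` and `Y` could then be passed to `oi.add_batch(X, Y)`, with `classes[Y]` recovering the original names. For a true stream, the feature bounds would have to be fixed in advance (for example from domain knowledge), since min-max statistics recomputed per batch would shift between updates.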
+---
+## Parameters
+- `rho` *(float, default `0.9`)*
+  Vigilance parameter controlling cluster granularity; higher values produce more, smaller clusters.
+- `r_hat` *(float, default `np.inf`; Hypersphere ART only)*
+  Maximum cluster radius.
+- `ART` *("Fuzzy" | "Hypersphere", default "Fuzzy")*
+  Choice of ART module.
+- `match_tracking` *(str, default "MT+")*
+  Match-tracking strategy passed to ARTMAP's `partial_fit` during learning.
+The defaults are likely to suit most use cases. For very large datasets, smaller
+`rho` values (0.5-0.7) may be needed to keep run time manageable: lower vigilance
+yields fewer clusters, and every update evaluates the activation of every cluster.
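To illustrate how `rho` controls granularity, the sketch below implements the textbook Fuzzy ART match and choice functions on a complement-coded input. It is an independent NumPy sketch, not `artlib`'s code, and `artlib`'s `FuzzyART` may differ in details; `alpha=1e-10` mirrors the value `OverlapIndex` passes to `FuzzyARTMAP`. A category resonates with an input only if its match value reaches `rho`, so raising `rho` rejects more candidates and forces more, smaller categories.

```python
import numpy as np


def fuzzy_art_match(I, w):
    """Fuzzy ART match value |I ^ w|_1 / |I|_1, with ^ the element-wise minimum."""
    return np.minimum(I, w).sum() / I.sum()


def fuzzy_art_choice(I, w, alpha=1e-10):
    """Fuzzy ART choice (activation) value |I ^ w|_1 / (alpha + |w|_1)."""
    return np.minimum(I, w).sum() / (alpha + w.sum())


# Complement-coded input for the point (0.2, 0.9).
I = np.array([0.2, 0.9, 0.8, 0.1])
# Hypothetical category whose box has collapsed onto the single point (0.5, 0.5).
w = np.array([0.5, 0.5, 0.5, 0.5])

m = fuzzy_art_match(I, w)  # about 0.65 here
for rho in (0.5, 0.7, 0.9):
    # The category resonates with I only if its match value reaches rho.
    print(rho, bool(m >= rho))
```

With this input only `rho = 0.5` accepts the existing category; at 0.7 or 0.9 the input would be sent to another (possibly new) category, which is why lower vigilance tends to produce fewer clusters.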
+---
+## Output
+- **`index`**
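+  Global Overlap Index across all observed classes (the mean of `singleton_index`).
+- **`singleton_index[y]`**
+  Minimum pairwise overlap score of class `y` against every other observed class.
+- **`pairwise_index[(y, b)]`**
+  Pairwise overlap score of class `y` against class `b`, normalized by the number of
+  class-`y` samples, so `pairwise_index[(y, b)]` and `pairwise_index[(b, y)]` can differ.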
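Since `pairwise_index` is keyed by ordered class pairs, one quick way to inspect results is to rank the pairs by score. `most_overlapping_pairs` is a hypothetical helper that works on any mapping shaped like `oi.pairwise_index`; lower scores mean more overlap of the first class onto the second.

```python
def most_overlapping_pairs(pairwise_index, k=3):
    """Hypothetical helper: the k (y, b) pairs with the lowest pairwise score.

    Accepts any mapping from (y, b) tuples to scores in [0, 1], e.g. a plain
    dict copy of `oi.pairwise_index`. Lower scores mean more overlap of class y
    onto class b. Ties keep the mapping's iteration order.
    """
    return sorted(pairwise_index.items(), key=lambda item: item[1])[:k]


scores = {(0, 1): 0.95, (1, 0): 0.60, (0, 2): 0.80, (2, 0): 1.00}
print(most_overlapping_pairs(scores, k=2))  # [((1, 0), 0.6), ((0, 2), 0.8)]
```

Sorting only reads existing items, so it does not insert default entries into the package's `defaultdict`, unlike indexing it with an unseen pair.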
+---
+## Intended Audience
+This package is intended for researchers and practitioners working on:
+- incremental and continual learning,
+- clustering validation,
+- representation learning, and
+- transfer learning.

overlapindex-0.1.0/overlapindex/OverlapIndex.py ADDED

@@ -0,0 +1,185 @@
+import numpy as np
+from artlib import HypersphereARTMAP, FuzzyARTMAP, complement_code
+from typing import Literal
+from collections import defaultdict
+class GrowingArray1D:
+    """1-D NumPy array that zero-extends itself when indexed past its end."""
+    def __init__(self, dtype=int):
+        self.array = np.zeros(0, dtype=dtype)
+    def _ensure_size(self, i):
+        if i >= self.array.size:
+            new_size = i + 1
+            new_array = np.zeros(new_size, dtype=self.array.dtype)
+            new_array[:self.array.size] = self.array
+            self.array = new_array
+    def __getitem__(self, i):
+        self._ensure_size(i)
+        return self.array[i]
+    def __setitem__(self, i, value):
+        self._ensure_size(i)
+        self.array[i] = value
+    def __iadd__(self, idx_value):
+        i, value = idx_value
+        self._ensure_size(i)
+        self.array[i] += value
+        return self
+    def __len__(self):
+        return len(self.array)
+    def __repr__(self):
+        return repr(self.array)
+    def asarray(self):
+        return self.array.copy()
+    def __iter__(self):
+        # iterate over the *current* contents only
+        for v in self.array:
+            yield v
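+def top_two_indices_against_others(T, classes, class_to_clusters, a):
+    """For each class b != a, return the (up to) two clusters in
+    clusters(a) | clusters(b) with the highest activation T, highest first."""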
+    T = np.asarray(T)
+    result = {}
+    clusters_a = class_to_clusters.get(a, set())
+    for b in classes:
+        if b == a:
+            continue
+        clusters_b = class_to_clusters.get(b, set())
+        cluster_indices = list(clusters_a | clusters_b)
+        if len(cluster_indices) == 0:
+            top2 = ()
+        elif len(cluster_indices) == 1:
+            top2 = (cluster_indices[0],)
+        else:
+            values = T[cluster_indices]
+            top2_rel = np.argpartition(values, -2)[-2:]
+            top2_sorted = top2_rel[np.argsort(values[top2_rel])[::-1]]
+            top2 = tuple(cluster_indices[i] for i in top2_sorted)
+        result[b] = top2
+    return result
+class OverlapIndex:
+    def __init__(
+            self,
+            rho: float = 0.9,
+            r_hat: float = np.inf,
+            ART: Literal["Fuzzy", "Hypersphere"] = "Fuzzy",
+            match_tracking="MT+",
+    ):
+        assert ART in ["Fuzzy", "Hypersphere"]
+        if ART == "Fuzzy":
+            self.ARTMAP = FuzzyARTMAP(rho=rho, alpha=1e-10, beta=1.0)
+        else:
+            self.ARTMAP = HypersphereARTMAP(rho=rho, alpha=1e-10, beta=1.0, r_hat=r_hat)
+        self.ART = ART
+        self.sparse_adj = defaultdict(lambda: 0)
+        self.cluster_cardinality = GrowingArray1D()  # samples seen per class label y (labels must be non-negative ints)
+        self.rev_map = defaultdict(set)
+        self.pairwise_index = defaultdict(lambda: 1.0)
+        self.singleton_index = defaultdict(lambda: 1.0)
+        self.index = 1.0
+        self.match_tracking = match_tracking
+    @property
+    def module_a(self):
+        return self.ARTMAP.module_a
+    @property
+    def map(self):
+        return self.ARTMAP.map
+    def predict_subset_pairs(self, x, y):
+        assert len(self.module_a.W) > 0, "ART module is not fit."
+        T, _ = zip(*[
+            self.module_a.category_choice(x, w, params=self.module_a.params)
+            for w in self.module_a.W
+        ])
+        classes = list(self.rev_map.keys())
+        top2bmu = top_two_indices_against_others(T, classes, self.rev_map, y)
+        return top2bmu
+    def add_sample(self, x, y):
+        x_prep = complement_code(np.atleast_2d(np.asarray(x, dtype=float)))  # shape (1, 2d)
+        self.ARTMAP = self.ARTMAP.partial_fit(x_prep, [y],
+                                              match_tracking=self.match_tracking)
+        bmu1 = self.ARTMAP.module_a.labels_[-1]
+        self.rev_map[y].add(bmu1)
+        self.cluster_cardinality[y] += 1
+        top2bmu = self.predict_subset_pairs(x_prep[0], y)  # 1-D coded sample, as in add_batch
+        if y not in self.singleton_index:
+            self.singleton_index[y] = 1.0
+        for b in self.rev_map.keys():
+            bmu2 = int(bmu1)
+            if b != y:
+                if len(top2bmu[b]) > 1:
+                    bmu2_, bmu3_ = top2bmu[b]
+                    if bmu2_ == bmu1:
+                        bmu2 = bmu3_
+                    else:
+                        bmu2 = bmu2_
+                if bmu2 in self.rev_map[b]:
+                    self.sparse_adj[(y, b)] += 1
+                self.pairwise_index[(y, b)] = 1. - (
+                        float(self.sparse_adj[(y, b)]) /
+                        float(self.cluster_cardinality[y])
+                )
+        if len(self.rev_map) > 1:
+            self.singleton_index[y] = min(
+                [self.pairwise_index[(y, b)] for b in self.rev_map.keys() if b != y]
+            )
+            self.index = np.mean(list(self.singleton_index.values()))
+        return self.index
+    def add_batch(self, X, Y):
+        X_prep = complement_code(X)
+        self.ARTMAP = self.ARTMAP.partial_fit(X_prep, Y,
+                                              match_tracking=self.match_tracking)
+        BMU1 = self.ARTMAP.module_a.labels_[-len(Y):]
+        for x, y, bmu1 in zip(X_prep, Y, BMU1):
+            self.rev_map[y].add(bmu1)
+            if y not in self.singleton_index:
+                self.singleton_index[y] = 1.0
+            self.cluster_cardinality[y] += 1
+            top2bmu = self.predict_subset_pairs(x, y)  # eq 1 & 2
+            for b in self.rev_map.keys():
+                bmu2 = int(bmu1)
+                if b != y:
+                    if len(top2bmu[b]) > 1:
+                        bmu2_, bmu3_ = top2bmu[b]
+                        if bmu2_ == bmu1:
+                            bmu2 = bmu3_
+                        else:
+                            bmu2 = bmu2_
+                    if bmu2 in self.rev_map[b]:
+                        self.sparse_adj[(y, b)] += 1  # eq 3
+                    self.pairwise_index[(y, b)] = 1. - (
+                            float(self.sparse_adj[(y, b)]) /
+                            float(self.cluster_cardinality[y])
+                    )  # eq 4
+        unique_y = np.unique(Y)
+        if len(self.rev_map) > 1:
+            for y in unique_y:
+                self.singleton_index[y] = min(
+                    [self.pairwise_index[(y, b)] for b in self.rev_map.keys() if b != y]
+                )  # eq 5
+            self.index = np.mean(list(self.singleton_index.values()))  # eq 6
+        return self.index
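The per-class overlap test inside `add_sample` and `add_batch` can be isolated into a small, ART-free function for inspection. The sketch below assumes `T` holds the activation of every cluster for the current complement-coded sample and that `clusters_a`/`clusters_b` are the cluster sets recorded in `rev_map`; `competing_bmu` and `overlaps` are hypothetical names. It recasts the top-two selection as "the most activated cluster other than `bmu1`", which picks the same cluster whenever activations are distinct; ties may be broken differently than `np.argpartition` breaks them in `top_two_indices_against_others`.

```python
import numpy as np


def competing_bmu(T, bmu1, clusters_a, clusters_b):
    """Most activated cluster in clusters_a | clusters_b other than bmu1.

    Falls back to bmu1 when the union holds fewer than two clusters, mirroring
    how `add_sample` and `add_batch` choose `bmu2`.
    """
    candidates = sorted(clusters_a | clusters_b)
    if len(candidates) < 2:
        return bmu1
    T = np.asarray(T, dtype=float)
    others = [j for j in candidates if j != bmu1]
    return max(others, key=lambda j: T[j])


def overlaps(T, bmu1, clusters_a, clusters_b):
    """True if the competing BMU belongs to class b (the `eq 3` increment)."""
    return competing_bmu(T, bmu1, clusters_a, clusters_b) in clusters_b


# Activations of four clusters; class a owns {0, 2}, class b owns {1, 3}.
print(overlaps([0.9, 0.8, 0.1, 0.7], 0, {0, 2}, {1, 3}))  # True: cluster 1 wins
print(overlaps([0.9, 0.1, 0.8, 0.2], 0, {0, 2}, {1, 3}))  # False: cluster 2 wins
```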

overlapindex-0.1.0/overlapindex/__init__.py ADDED

@@ -0,0 +1,3 @@
+from .OverlapIndex import OverlapIndex
+__all__ = ["OverlapIndex"]

overlapindex-0.1.0/pyproject.toml ADDED

@@ -0,0 +1,22 @@
+[tool.poetry]
+name = "overlapindex"
+version = "0.1.0"
+description = "OverlapIndex (OI), an Incremental Cluster Validity index for identifying the degree of overlap of data classes."
+authors = ["Niklas M. Melton <niklasmelton@gmail.com>"]
+license = "MIT"
+readme = "README.md"
+homepage = "https://github.com/NiklasMelton/OverlapIndex"
+repository = "https://github.com/NiklasMelton/OverlapIndex"
+documentation = "https://github.com/NiklasMelton/OverlapIndex"
+keywords = ["incremental cluster validity", "cluster validity", "ART", "machine learning", "transfer learning", "clustering"]
+packages = [{ include = "overlapindex" }]
+[tool.poetry.dependencies]
+python = ">=3.9"
+artlib = ">=0.1.7,<0.2.0"
+numpy = ">=2.4.1,<3.0.0"
+[build-system]
+requires = ["poetry-core>=2.0.0,<3.0.0"]
+build-backend = "poetry.core.masonry.api"