PyPI - coreLearn - Versions diffs - 0.1.0__tar.gz - Mend

coreLearn 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (20)

corelearn-0.1.0/PKG-INFO +482 -0
corelearn-0.1.0/README.md +470 -0
corelearn-0.1.0/coreLearn/__init__.py +17 -0
corelearn-0.1.0/coreLearn/base.py +25 -0
corelearn-0.1.0/coreLearn/distances.py +125 -0
corelearn-0.1.0/coreLearn/evaluator.py +143 -0
corelearn-0.1.0/coreLearn/knn.py +210 -0
corelearn-0.1.0/coreLearn/linear_regression.py +205 -0
corelearn-0.1.0/coreLearn/tests/__init__.py +0 -0
corelearn-0.1.0/coreLearn/tests/test_distances.py +187 -0
corelearn-0.1.0/coreLearn/tests/test_evaluator.py +104 -0
corelearn-0.1.0/coreLearn/tests/test_knn.py +101 -0
corelearn-0.1.0/coreLearn/tests/test_linear_regression.py +154 -0
corelearn-0.1.0/coreLearn.egg-info/PKG-INFO +482 -0
corelearn-0.1.0/coreLearn.egg-info/SOURCES.txt +18 -0
corelearn-0.1.0/coreLearn.egg-info/dependency_links.txt +1 -0
corelearn-0.1.0/coreLearn.egg-info/requires.txt +6 -0
corelearn-0.1.0/coreLearn.egg-info/top_level.txt +1 -0
corelearn-0.1.0/pyproject.toml +18 -0
corelearn-0.1.0/setup.cfg +4 -0

corelearn-0.1.0/PKG-INFO ADDED

@@ -0,0 +1,482 @@
+Metadata-Version: 2.4
+Name: coreLearn
+Version: 0.1.0
+Summary: Basic ML algorithms library built from scratch (KNN + Linear Regression)
+Requires-Python: >=3.9
+Description-Content-Type: text/markdown
+Requires-Dist: numpy>=1.21
+Provides-Extra: dev
+Requires-Dist: pytest>=7.0; extra == "dev"
+Requires-Dist: scikit-learn; extra == "dev"
+Requires-Dist: jupyter; extra == "dev"
+# CoreLearn
+A lightweight Python machine learning library built from scratch using only **NumPy**.
+Implements KNN classification and Linear Regression with a focus on **software design**, not just accuracy.
+---
+## Installation
+```bash
+# Clone or download the project, then from the coreLearn/ directory:
+pip install -e .
+# Install all dependencies (including dev tools):
+pip install -r requirements.txt
+```
+After installation, import from anywhere:
+```python
+from coreLearn import KNNClassifier, LinearRegression, Evaluator
+```
+---
+## Quick Start
+```python
+from coreLearn import KNNClassifier, LinearRegression, Evaluator, accuracy, mae
+# --- KNN Classification ---
+knn = KNNClassifier(k=5, distance="euclidean", n_jobs=2)
+knn.fit(X_train, y_train)
+knn_preds = knn.predict(X_test)
+print(accuracy(y_test, knn_preds))
+# --- Linear Regression (fit on a regression dataset) ---
+lr = LinearRegression(strategy="normal")
+lr.fit(X_train, y_train)
+lr_preds = lr.predict(X_test)
+print(mae(y_test, lr_preds))
+# --- Evaluator ---
+print(Evaluator.evaluate_regression(y_test, lr_preds))
+# {'mae': ..., 'mse': ..., 'rmse': ...}
+print(Evaluator.evaluate_classification(y_test, knn_preds))
+# {'accuracy': ..., 'precision': ..., 'recall': ..., 'f1': ...}
+```
+---
+## Package Structure
+```
+coreLearn/
+├── __init__.py          ← Public API
+├── base.py              ← Abstract base class — Template Method Pattern
+├── distances.py         ← Distance metrics — Factory Pattern
+├── knn.py               ← KNN Classifier — Recursion + Concurrency + OOP
+├── linear_regression.py ← Linear Regression — Strategy Pattern + OOP
+├── evaluator.py         ← Metric engine — Functional Programming
+├── examples/
+│   ├── demo_notebook.ipynb
+│   ├── housing.csv
+│   └── penguin.csv
+└── tests/
+    ├── test_knn.py
+    ├── test_linear_regression.py
+    ├── test_distances.py
+    └── test_evaluator.py
+```
+---
+## Running Tests
+```bash
+cd coreLearn/
+pytest coreLearn/tests/ -v
+```
+---
+## Learning Outcomes
+### 1 — Object-Oriented Programming (OOP)
+**Files:** `base.py`, `knn.py`, `linear_regression.py`, `distances.py`
+#### Abstract Base Class & Inheritance
+`BaseModel` is an abstract class that defines the contract every model must follow.
+`KNNClassifier` and `LinearRegression` both inherit from it:
+```python
+# base.py
+class BaseModel(ABC):
+    @abstractmethod
+    def fit(self, X, y) -> "BaseModel": ...
+    @abstractmethod
+    def predict(self, X) -> list: ...
+# knn.py
+class KNNClassifier(BaseModel):   # ← inheritance
+    def fit(self, X, y): ...
+    def predict(self, X): ...
+# linear_regression.py
+class LinearRegression(BaseModel):  # ← inheritance
+    def fit(self, X, y): ...
+    def predict(self, X): ...
+```
+#### Polymorphism
+Both models share the same interface — they can be used interchangeably:
+```python
+for model in [KNNClassifier(k=3), LinearRegression()]:
+    model.fit(X_train, y_train)   # same call, different behaviour
+    model.predict(X_test)         # same call, different behaviour
+```
+#### Encapsulation
+Internal state is marked private by convention with a leading `_`. Users interact only through the public API:
+```python
+# knn.py
+self._metric = DistanceMetricFactory.create(distance)  # private
+self._tree   = None                                     # private
+# linear_regression.py — controlled read access via properties
+@property
+def coef_(self) -> np.ndarray:
+    return self._weights[1:]
+@property
+def intercept_(self) -> float:
+    return float(self._weights[0])
+```
+`OptimizationStrategy`, `NormalEquationStrategy`, and `GradientDescentStrategy` inside
+`linear_regression.py` form an additional hierarchy demonstrating inheritance within the library.
+---
+### 2 — Functional Programming
+**File:** `evaluator.py`
+#### Functions as First-Class Objects
+Metric functions are stored in dictionaries as values and called dynamically:
+```python
+# evaluator.py
+_regression_metrics: dict[str, callable] = {
+    "mae":  mae,
+    "mse":  mse,
+    "rmse": rmse,
+}
+@classmethod
+def evaluate_regression(cls, y_true, y_pred) -> dict:
+    # applies every registered function — no if/elif chain
+    return {name: fn(y_true, y_pred) for name, fn in cls._regression_metrics.items()}
+```
+#### Higher-Order Function — `register()`
+`Evaluator.register()` accepts any callable and plugs it in at runtime.
+This is the classic higher-order function pattern: a function (or method) that takes another function as an argument.
+```python
+# Add a custom metric without modifying the Evaluator class
+Evaluator.register(
+    "max_error",
+    lambda y_true, y_pred: max(abs(a - b) for a, b in zip(y_true, y_pred)),
+    kind="regression",
+)
+result = Evaluator.evaluate_regression(y_test, y_pred)
+print(result["max_error"])   # available immediately
+```
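The registry idea can be sketched in isolation. The following is a minimal, self-contained illustration rather than the package's `Evaluator` source: `MetricRegistry`, its `evaluate` method, and the two-argument `register` signature are assumptions made for the example.

```python
from typing import Callable, Dict, Sequence

Metric = Callable[[Sequence[float], Sequence[float]], float]


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


class MetricRegistry:
    """Hypothetical stand-in for Evaluator: maps metric names to callables."""

    _metrics: Dict[str, Metric] = {"mse": mse}

    @classmethod
    def register(cls, name: str, fn: Metric) -> None:
        # Higher-order: the callable itself is the argument being stored.
        cls._metrics[name] = fn

    @classmethod
    def evaluate(cls, y_true, y_pred) -> Dict[str, float]:
        # Apply every registered callable; no if/elif chain is needed.
        return {name: fn(y_true, y_pred) for name, fn in cls._metrics.items()}


# Plug in a new metric at runtime without touching the class body.
MetricRegistry.register(
    "max_error",
    lambda y_true, y_pred: max(abs(t - p) for t, p in zip(y_true, y_pred)),
)
scores = MetricRegistry.evaluate([1.0, 2.0, 4.0], [1.5, 2.0, 3.0])
print(scores)
```

Because the registry is a class attribute, a metric registered once is visible to every later `evaluate` call in the same process.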
+#### Pure Functions
+`mae`, `mse`, `rmse`, `accuracy`, `precision`, `recall`, `f1_score` are all pure functions:
+- No side effects
+- No mutation of inputs
+- Same inputs always produce the same output
+```python
+from coreLearn import mae, accuracy
+mae([1.0, 2.0, 3.0], [1.5, 2.5, 3.5])   # → 0.5  (always)
+accuracy([0, 1, 1], [0, 1, 0])           # → 0.667 (2/3, rounded; always)
+```
+---
+### 3 — Concurrency
+**File:** `knn.py` — `KNNClassifier.predict()`
+`KNNClassifier` uses `ProcessPoolExecutor` to classify test samples in parallel across
+multiple CPU processes. Unlike threads, each worker runs in its own process with its
+own GIL — enabling true CPU-bound parallelism.
+```python
+# knn.py
+def predict(self, X) -> list:
+    ...
+    if self.n_jobs == 1:
+        # sequential — no overhead for small datasets
+        return [self._predict_one(x) for x in samples]
+    # parallel — distribute samples across n_jobs worker processes
+    args = [(self._tree, x, self.k, self._metric) for x in samples]
+    with ProcessPoolExecutor(max_workers=self.n_jobs) as executor:
+        return list(executor.map(_predict_worker, args))
+```
+**Why no race conditions?**
+Each worker receives its own pickled copy of the KD-Tree and metric via `ProcessPoolExecutor`.
+No shared memory is used, so no synchronization primitives are needed.
+```python
+# n_jobs=1  → sequential (default, safe for notebooks)
+knn = KNNClassifier(k=5, n_jobs=1)
+# n_jobs=4  → 4 parallel worker processes
+knn = KNNClassifier(k=5, n_jobs=4)
+knn.fit(X_train, y_train)
+preds = knn.predict(X_test)
+```
+> **Note:** `ProcessPoolExecutor` requires the `if __name__ == "__main__":` guard on
+> Windows/macOS when used in scripts. The `n_jobs=1` default is safe everywhere.
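A minimal sketch of the same sequential-versus-parallel switch, written independently of the package: `nearest_label` and `predict` are hypothetical helpers, and a brute-force 1-NN search stands in for the KD-Tree. The worker is defined at module level so it can be pickled, and the pool is only started under the `__main__` guard.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import List, Tuple

Point = Tuple[float, ...]
Sample = Tuple[Point, str]  # (point, label)


def nearest_label(args: Tuple[List[Sample], Point]) -> str:
    """Brute-force 1-NN for one query; module-level so it can be pickled."""
    train, query = args

    def sq_dist(p: Point) -> float:
        return sum((a - b) ** 2 for a, b in zip(p, query))

    _, label = min(train, key=lambda item: sq_dist(item[0]))
    return label


def predict(train: List[Sample], queries: List[Point], n_jobs: int = 1) -> List[str]:
    # Each task carries its own copy of the training data: nothing is shared.
    args = [(train, q) for q in queries]
    if n_jobs == 1:
        return [nearest_label(a) for a in args]          # sequential path
    with ProcessPoolExecutor(max_workers=n_jobs) as executor:
        return list(executor.map(nearest_label, args))   # order is preserved


TRAIN: List[Sample] = [((0.0, 0.0), "a"), ((0.0, 1.0), "a"),
                       ((5.0, 5.0), "b"), ((6.0, 5.0), "b")]
QUERIES: List[Point] = [(0.2, 0.1), (5.5, 5.2), (1.0, 1.0)]

if __name__ == "__main__":
    print(predict(TRAIN, QUERIES, n_jobs=1))
    print(predict(TRAIN, QUERIES, n_jobs=2))
```

Copying the training data into every task keeps workers independent but costs pickling time per sample, so the parallel path tends to pay off only when each prediction is expensive.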
+---
+### 4 — Recursion
+**File:** `knn.py` — `KDTree`
+The KD-Tree data structure is built and searched using **recursion**.
+Both `_build` and `_search` call themselves (direct recursion) with a strictly smaller subproblem each time.
+#### `_build` — Recursive Tree Construction
+**Base case:** empty data → return `None`.
+**Recursive case:** split on the median, call `_build` on each half with `depth + 1`.
+```python
+# knn.py
+def _build(self, data: list, depth: int):
+    if not data:          # ← base case
+        return None
+    axis = depth % len(data[0][0])
+    data.sort(key=lambda item: item[0][axis])
+    mid = len(data) // 2
+    return KDNode(
+        point = data[mid][0],
+        label = data[mid][1],
+        left  = self._build(data[:mid],     depth + 1),  # ← recursion
+        right = self._build(data[mid + 1:], depth + 1),  # ← recursion
+    )
+```
+#### `_search` — Recursive Nearest-Neighbour Search
+**Base case:** node is `None` → return.
+**Recursive case:** visit the near branch, then prune and optionally visit the far branch.
+```python
+# knn.py
+def _search(self, node, target, k, metric, depth, best):
+    if node is None:      # ← base case
+        return
+    dist = metric.compute(target, node.point)
+    # update best list ...; pick the `near`/`far` children and the axis offset `diff` (elided)
+    self._search(near, target, k, metric, depth + 1, best)  # ← recursion
+    if len(best) < k or abs(diff) < best[-1][0]:
+        self._search(far, target, k, metric, depth + 1, best)  # ← recursion (pruned)
+```
+**Pruning:** the `abs(diff) < best[-1][0]` condition skips the far branch when it cannot
+contain a closer neighbour. On low-dimensional data this gives roughly O(log n) average-case search; the worst case remains O(n).
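The build and search steps can be put together in a runnable sketch. This is not the package's `KDTree`: the free functions `build` and `search`, the `KDNode` tuple layout, the sorted `best` list, and the use of `math.dist` (Euclidean distance) are assumptions for illustration, and `build` sorts a copy instead of sorting in place.

```python
import math
from typing import List, NamedTuple, Optional, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (point, label)


class KDNode(NamedTuple):
    point: Sequence[float]
    label: str
    left: Optional["KDNode"]
    right: Optional["KDNode"]


def build(data: List[Sample], depth: int = 0) -> Optional[KDNode]:
    if not data:                                   # base case: nothing to split
        return None
    axis = depth % len(data[0][0])                 # cycle through dimensions
    ordered = sorted(data, key=lambda item: item[0][axis])  # copy, input untouched
    mid = len(ordered) // 2
    return KDNode(
        point=ordered[mid][0],
        label=ordered[mid][1],
        left=build(ordered[:mid], depth + 1),
        right=build(ordered[mid + 1:], depth + 1),
    )


def search(node: Optional[KDNode], target: Sequence[float], k: int,
           depth: int, best: List[Tuple[float, str]]) -> None:
    """Collect the k nearest (distance, label) pairs into `best`, kept sorted."""
    if node is None:                               # base case
        return
    best.append((math.dist(target, node.point), node.label))
    best.sort(key=lambda pair: pair[0])
    del best[k:]                                   # keep only the k closest so far
    axis = depth % len(target)
    diff = target[axis] - node.point[axis]         # signed offset to the split plane
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    search(near, target, k, depth + 1, best)
    # The far side can only matter if the split plane is closer than the
    # current k-th best distance, or if fewer than k neighbours are known.
    if len(best) < k or abs(diff) < best[-1][0]:
        search(far, target, k, depth + 1, best)


POINTS: List[Sample] = [((2, 3), "a"), ((5, 4), "b"), ((9, 6), "b"),
                        ((4, 7), "a"), ((8, 1), "b"), ((7, 2), "b")]
tree = build(POINTS)
neighbours: List[Tuple[float, str]] = []
search(tree, (9, 2), k=2, depth=0, best=neighbours)
print(neighbours)
```

The pruning test compares a per-axis offset with a full distance, which is sound for metrics such as Euclidean and Manhattan where the distance is never smaller than the offset along a single axis.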
+---
+### 5 — SOLID Principles
+**Files:** all modules
+#### S — Single Responsibility
+Every class has exactly one reason to change:
+| Class | Sole Responsibility |
+|-------|-------------------|
+| `BaseModel` | Define the common model contract |
+| `KDTree` | Spatial nearest-neighbour search |
+| `KNNClassifier` | KNN classification logic |
+| `LinearRegression` | Linear regression logic |
+| `NormalEquationStrategy` | Closed-form weight computation |
+| `GradientDescentStrategy` | Iterative gradient-based weight computation |
+| `DistanceMetricFactory` | Instantiate distance metric objects by name |
+| `Evaluator` | Compute and manage evaluation metrics |
+#### O — Open/Closed
+Classes are open for extension, closed for modification.
+New metrics and distance functions can be added **without editing any existing class**:
+```python
+# Add a new metric — Evaluator source code untouched
+Evaluator.register("r2", lambda t, p: ..., kind="regression")
+# Add a new distance — KNNClassifier source code untouched
+DistanceMetricFactory.register("chebyshev", ChebyshevDistance)
+knn = KNNClassifier(k=3, distance="chebyshev")
+```
+#### L — Liskov Substitution
+Any `BaseModel` subclass can replace `BaseModel` without breaking callers:
+```python
+def train_and_score(model: BaseModel, metric, X_train, y_train, X_test, y_test):
+    preds = model.fit_predict(X_train, y_train, X_test)
+    return metric(y_test, preds)
+train_and_score(KNNClassifier(k=3), accuracy, ...)   # works
+train_and_score(LinearRegression(), mae, ...)        # works
+```
+#### I — Interface Segregation
+`DistanceMetric` exposes only what is needed — a single `compute()` method.
+Implementors are not forced to implement anything they do not use:
+```python
+# distances.py
+class DistanceMetric(ABC):
+    @abstractmethod
+    def compute(self, a: list, b: list) -> float: ...
+    # nothing else required
+```
+#### D — Dependency Inversion
+`LinearRegression` depends on the **abstraction** `OptimizationStrategy`,
+not on any concrete strategy class:
+```python
+# linear_regression.py
+self._weights = self._strategy.fit(X_b, y)
+#               ↑ OptimizationStrategy interface — concrete class unknown here
+```
+---
+### 6 — Architectural & Design Patterns
+**Architecture:** Layered
+- **Core layer** (`base.py`, `distances.py`): abstractions and shared contracts
+- **Algorithm layer** (`knn.py`, `linear_regression.py`): concrete ML algorithms
+- **Evaluation layer** (`evaluator.py`): metric computation
+- **Public API** (`__init__.py`): single entry point, re-exports everything
+#### Pattern 1 — Template Method (`base.py`)
+`fit_predict` defines the fixed skeleton (fit → predict).
+Subclasses fill in each step without altering the sequence:
+```python
+# base.py
+def fit_predict(self, X_train, y_train, X_test) -> list:
+    self.fit(X_train, y_train)   # ← step 1: implemented by subclass
+    return self.predict(X_test)  # ← step 2: implemented by subclass
+```
+Every model gets `fit_predict` for free through inheritance.
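A self-contained sketch of the template method in action. `BaseModel` mirrors the shape shown in the README, while `MeanRegressor` and `MajorityClassifier` are hypothetical toy models invented for the example and are not part of the package.

```python
from abc import ABC, abstractmethod
from collections import Counter
from typing import Sequence


class BaseModel(ABC):
    @abstractmethod
    def fit(self, X: Sequence, y: Sequence) -> "BaseModel": ...

    @abstractmethod
    def predict(self, X: Sequence) -> list: ...

    def fit_predict(self, X_train, y_train, X_test) -> list:
        # Fixed skeleton: subclasses supply the steps, not their order.
        self.fit(X_train, y_train)
        return self.predict(X_test)


class MeanRegressor(BaseModel):
    """Hypothetical toy model: always predicts the training mean."""

    def fit(self, X, y):
        self._mean = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self._mean for _ in X]


class MajorityClassifier(BaseModel):
    """Hypothetical toy model: always predicts the most common label."""

    def fit(self, X, y):
        self._label = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self._label for _ in X]


X_train, X_test = [[0], [1], [2]], [[3], [4]]
runs = [(MeanRegressor(), [1.0, 2.0, 3.0]), (MajorityClassifier(), ["a", "b", "a"])]
for model, y_train in runs:
    # Same inherited call for both models; only fit/predict differ.
    print(type(model).__name__, model.fit_predict(X_train, y_train, X_test))
```

Neither subclass defines `fit_predict`, and `BaseModel` itself cannot be instantiated because its abstract methods are unimplemented.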
+#### Pattern 2 — Strategy (`linear_regression.py`)
+The optimisation algorithm is swapped at construction time.
+`LinearRegression.fit()` never knows which concrete strategy it is using:
+```python
+lr_ne = LinearRegression(strategy="normal")           # uses NormalEquationStrategy
+lr_gd = LinearRegression(strategy="gradient_descent") # uses GradientDescentStrategy
+# Both models have the same interface — caller code is identical
+lr_ne.fit(X_train, y_train)
+lr_gd.fit(X_train, y_train)
+```
+To add a third optimiser (e.g. Adam), only a new `OptimizationStrategy` subclass is needed.
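A minimal sketch of the pattern for the single-feature case, in plain Python. It is not the package's implementation: `ClosedFormStrategy`, `SimpleLinearRegression`, the `(intercept, slope)` return value, and passing a strategy object instead of a string name are all assumptions made to keep the example short.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence, Tuple


class OptimizationStrategy(ABC):
    @abstractmethod
    def fit(self, x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
        """Return (intercept, slope)."""


class ClosedFormStrategy(OptimizationStrategy):
    """Least-squares solution for one feature, computed directly."""

    def fit(self, x, y):
        n = len(x)
        mean_x, mean_y = sum(x) / n, sum(y) / n
        sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        sxx = sum((a - mean_x) ** 2 for a in x)
        slope = sxy / sxx
        return mean_y - slope * mean_x, slope


class GradientDescentStrategy(OptimizationStrategy):
    """Batch gradient descent on the mean squared error."""

    def __init__(self, learning_rate: float = 0.05, epochs: int = 2000):
        self.learning_rate = learning_rate
        self.epochs = epochs

    def fit(self, x, y):
        intercept, slope, n = 0.0, 0.0, len(x)
        for _ in range(self.epochs):
            errors = [intercept + slope * a - t for a, t in zip(x, y)]
            grad_b = 2 * sum(errors) / n
            grad_w = 2 * sum(e * a for e, a in zip(errors, x)) / n
            intercept -= self.learning_rate * grad_b
            slope -= self.learning_rate * grad_w
        return intercept, slope


class SimpleLinearRegression:
    """Depends only on the OptimizationStrategy abstraction."""

    def __init__(self, strategy: OptimizationStrategy):
        self._strategy = strategy

    def fit(self, x, y) -> "SimpleLinearRegression":
        self._intercept, self._slope = self._strategy.fit(x, y)
        return self

    def predict(self, x) -> List[float]:
        return [self._intercept + self._slope * a for a in x]


x, y = [0.0, 1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 5.0, 7.0, 9.0]   # y = 1 + 2x
for strategy in (ClosedFormStrategy(), GradientDescentStrategy()):
    model = SimpleLinearRegression(strategy).fit(x, y)
    print(type(strategy).__name__, model.predict([5.0]))
```

Both strategies should land on nearly the same line for this data; the gradient descent result depends on the learning rate and epoch count chosen here.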
+#### Pattern 3 — Factory (`distances.py`)
+`DistanceMetricFactory` centralises object creation.
+`KNNClassifier` never imports `EuclideanDistance` or `ManhattanDistance` directly:
+```python
+# distances.py
+class DistanceMetricFactory:
+    _registry = {"euclidean": EuclideanDistance, "manhattan": ManhattanDistance}
+    @classmethod
+    def create(cls, name: str) -> DistanceMetric:
+        return cls._registry[name]()   # create and return
+    @classmethod
+    def register(cls, name: str, metric_class: type) -> None:
+        cls._registry[name] = metric_class  # extend without modifying
+# knn.py — only depends on the factory, not the concrete classes
+self._metric = DistanceMetricFactory.create(distance)
+```
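A runnable sketch of the factory together with a newly registered metric. The class names follow the README, but the bodies are written for illustration: the `compute` implementations, the `ChebyshevDistance` class, and the `ValueError` raised for unknown names are assumptions, not the package's code.

```python
from abc import ABC, abstractmethod
from typing import Dict, Sequence, Type


class DistanceMetric(ABC):
    @abstractmethod
    def compute(self, a: Sequence[float], b: Sequence[float]) -> float: ...


class EuclideanDistance(DistanceMetric):
    def compute(self, a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


class ManhattanDistance(DistanceMetric):
    def compute(self, a, b):
        return sum(abs(x - y) for x, y in zip(a, b))


class DistanceMetricFactory:
    _registry: Dict[str, Type[DistanceMetric]] = {
        "euclidean": EuclideanDistance,
        "manhattan": ManhattanDistance,
    }

    @classmethod
    def create(cls, name: str) -> DistanceMetric:
        try:
            return cls._registry[name]()
        except KeyError:
            # Assumption: report unknown names with a readable error.
            known = sorted(cls._registry)
            raise ValueError(f"Unknown distance {name!r}; known: {known}") from None

    @classmethod
    def register(cls, name: str, metric_class: Type[DistanceMetric]) -> None:
        cls._registry[name] = metric_class


class ChebyshevDistance(DistanceMetric):
    """Largest absolute difference along any single axis."""

    def compute(self, a, b):
        return max(abs(x - y) for x, y in zip(a, b))


# Extend the factory without editing it or its callers.
DistanceMetricFactory.register("chebyshev", ChebyshevDistance)
for name in ("euclidean", "manhattan", "chebyshev"):
    print(name, DistanceMetricFactory.create(name).compute([0, 0], [3, 4]))
```

Callers only ever hold a name and the `DistanceMetric` interface, so a new metric becomes selectable as soon as it is registered.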
+---
+## API Reference
+### `KNNClassifier`
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `k` | `int` | `5` | Number of neighbours |
+| `distance` | `str` | `"euclidean"` | `"euclidean"` or `"manhattan"` (or any registered name) |
+| `n_jobs` | `int` | `1` | Worker processes for prediction (`1` = sequential) |
+### `LinearRegression`
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `strategy` | `str` | `"normal"` | `"normal"` (closed-form) or `"gradient_descent"` |
+| `learning_rate` | `float` | `0.01` | Learning rate — gradient descent only |
+| `epochs` | `int` | `1000` | Iterations — gradient descent only |
+### `Evaluator`
+| Method | Description |
+|--------|-------------|
+| `evaluate_regression(y_true, y_pred)` | Returns a dict with keys `mae`, `mse`, `rmse` |
+| `evaluate_classification(y_true, y_pred)` | Returns a dict with keys `accuracy`, `precision`, `recall`, `f1` |
+| `register(name, fn, kind)` | Add a custom metric at runtime |
+### Standalone metric functions
+```python
+from coreLearn import accuracy, mae, mse, rmse, precision, recall, f1_score
+```
+---
+## Dependencies
+| Package | Purpose |
+|---------|---------|
+| `numpy` | Matrix operations, vectorised arithmetic |
+| `pytest` | Unit testing |
+| `scikit-learn` | Datasets and preprocessing in examples only |
+| `pandas` | Data loading in examples only |
+| `matplotlib` | Visualisation in examples only |