bplusplus 1.2.3.tar.gz → 1.2.4.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

This version of bplusplus has been flagged by the registry as a potentially problematic release.

@@ -0,0 +1,207 @@
+Metadata-Version: 2.3
+Name: bplusplus
+Version: 1.2.4
+Summary: A simple method to create AI models for biodiversity, with collect and prepare pipeline
+License: MIT
+Author: Titus Venverloo
+Author-email: tvenver@mit.edu
+Requires-Python: >=3.10,<4.0
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Requires-Dist: numpy (==1.26.4)
+Requires-Dist: pandas (==2.1.4)
+Requires-Dist: pillow (==11.3.0)
+Requires-Dist: prettytable (==3.7.0)
+Requires-Dist: pygbif (==0.6.5)
+Requires-Dist: pyyaml (==6.0.1)
+Requires-Dist: requests (==2.25.1)
+Requires-Dist: scikit-learn (==1.7.1)
+Requires-Dist: tabulate (==0.9.0)
+Requires-Dist: tqdm (==4.66.4)
+Requires-Dist: ultralytics (==8.3.173)
+Requires-Dist: validators (==0.33.0)
+Description-Content-Type: text/markdown
+
+# B++ repository
+
+[![DOI](https://zenodo.org/badge/765250194.svg)](https://zenodo.org/badge/latestdoi/765250194)
+[![PyPi version](https://img.shields.io/pypi/v/bplusplus.svg)](https://pypi.org/project/bplusplus/)
+[![Python versions](https://img.shields.io/pypi/pyversions/bplusplus.svg)](https://pypi.org/project/bplusplus/)
+[![License](https://img.shields.io/pypi/l/bplusplus.svg)](https://pypi.org/project/bplusplus/)
+[![Downloads](https://static.pepy.tech/badge/bplusplus)](https://pepy.tech/project/bplusplus)
+[![Downloads](https://static.pepy.tech/badge/bplusplus/month)](https://pepy.tech/project/bplusplus)
+[![Downloads](https://static.pepy.tech/badge/bplusplus/week)](https://pepy.tech/project/bplusplus)
+
+This project provides a complete, end-to-end pipeline for building a custom insect classification system. The framework is designed to be **domain-agnostic**, allowing you to train a powerful detection and classification model for **any insect species** by simply providing a list of names.
+
+Using the `Bplusplus` library, this pipeline automates the entire machine learning workflow, from data collection to video inference.
+
+## Key Features
+
+- **Automated Data Collection**: Downloads hundreds of images for any species from the GBIF database.
+- **Intelligent Data Preparation**: Uses a pre-trained model to automatically find, crop, and resize insects from raw images, ensuring high-quality training data.
+- **Hierarchical Classification**: Trains a model to identify insects at three taxonomic levels: **family, genus, and species**.
+- **Video Inference & Tracking**: Processes video files to detect, classify, and track individual insects over time, providing aggregated predictions.
+
+## Pipeline Overview
+
+The process is broken down into six main steps, all detailed in the `full_pipeline.ipynb` notebook:
+
+1. **Collect Data**: Select your target species and fetch raw insect images from the web.
+2. **Prepare Data**: Filter, clean, and prepare images for training.
+3. **Train Model**: Train the hierarchical classification model.
+4. **Download Weights**: Fetch pre-trained weights for the detection model.
+5. **Test Model**: Evaluate the performance of the trained model.
+6. **Run Inference**: Run the full pipeline on a video file for real-world application.
+
+## How to Use
+
+### Prerequisites
+
+- Python 3.10+
+
+### Setup
+
+1. **Create and activate a virtual environment:**
+   ```bash
+   python3 -m venv venv
+   source venv/bin/activate
+   ```
+
+2. **Install the required packages:**
+   ```bash
+   pip install bplusplus
+   ```
+
+### Running the Pipeline
+
+The pipeline can be run step-by-step using the functions from the `bplusplus` library. While the `full_pipeline.ipynb` notebook provides a complete, executable workflow, the core functions are described below.
+
+#### Step 1: Collect Data
+Download images for your target species from the GBIF database. You'll need to provide a list of scientific names.
+
+```python
+import bplusplus
+from pathlib import Path
+
+# Define species and directories
+names = ["Vespa crabro", "Vespula vulgaris", "Dolichovespula media"]
+GBIF_DATA_DIR = Path("./GBIF_data")
+
+# Define search parameters
+search = {"scientificName": names}
+
+# Run collection
+bplusplus.collect(
+    group_by_key=bplusplus.Group.scientificName,
+    search_parameters=search,
+    images_per_group=200,  # Recommended to download more than needed
+    output_directory=GBIF_DATA_DIR,
+    num_threads=5
+)
+```
+
+#### Step 2: Prepare Data
+Process the raw images to extract, crop, and resize insects. This step uses a pre-trained model to ensure only high-quality images are used for training.
+
+```python
+PREPARED_DATA_DIR = Path("./prepared_data")
+
+bplusplus.prepare(
+    input_directory=GBIF_DATA_DIR,
+    output_directory=PREPARED_DATA_DIR,
+    img_size=640  # Target image size for training
+)
+```
+
+#### Step 3: Train Model
+Train the hierarchical classification model on your prepared data. The model learns to identify family, genus, and species.
+
+```python
+TRAINED_MODEL_DIR = Path("./trained_model")
+
+bplusplus.train(
+    batch_size=4,
+    epochs=30,
+    patience=3,
+    img_size=640,
+    data_dir=PREPARED_DATA_DIR,
+    output_dir=TRAINED_MODEL_DIR,
+    species_list=names
+    # num_workers=0  # Optional: force single-process loading (most stable)
+)
+```
+
+**Note:** The `num_workers` parameter controls DataLoader multiprocessing and defaults to 4. Set it to 0 to disable multiprocessing entirely (the most stable option); higher values can speed up data loading.
+
+#### Step 4: Download Detection Weights
+The inference pipeline uses a separate, pre-trained YOLO model for initial insect detection. You need to download its weights manually.
+
+You can download the weights file from [this link](https://github.com/Tvenver/Bplusplus/releases/download/v1.2.3/v11small-generic.pt).
+
+Place it in the `trained_model` directory and ensure it is named `yolo_weights.pt`.
+
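The manual download above can also be scripted. A minimal standard-library sketch (the helper name `download_weights` is ours, not part of the package; the URL is the release link above):

```python
from pathlib import Path
from urllib.request import urlretrieve

WEIGHTS_URL = "https://github.com/Tvenver/Bplusplus/releases/download/v1.2.3/v11small-generic.pt"

def download_weights(url: str, dest: Path) -> Path:
    """Fetch the detection weights to dest, skipping the download if the file already exists."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    if not dest.exists():
        urlretrieve(url, dest)
    return dest

# download_weights(WEIGHTS_URL, Path("./trained_model/yolo_weights.pt"))
```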
+#### Step 5: Run Inference on Video
+Process a video file to detect, classify, and track insects. The final output is an annotated video and a CSV file with aggregated results for each tracked insect.
+
+```python
+VIDEO_INPUT_PATH = Path("my_video.mp4")
+VIDEO_OUTPUT_PATH = Path("my_video_annotated.mp4")
+HIERARCHICAL_MODEL_PATH = TRAINED_MODEL_DIR / "best_multitask.pt"
+YOLO_WEIGHTS_PATH = TRAINED_MODEL_DIR / "yolo_weights.pt"
+
+bplusplus.inference(
+    species_list=names,
+    yolo_model_path=YOLO_WEIGHTS_PATH,
+    hierarchical_model_path=HIERARCHICAL_MODEL_PATH,
+    confidence_threshold=0.35,
+    video_path=VIDEO_INPUT_PATH,
+    output_path=VIDEO_OUTPUT_PATH,
+    tracker_max_frames=60,
+    fps=15  # Optional: set processing FPS
+)
+```
+
+### Customization
+
+To train the model on your own set of insect species, you only need to change the `names` list in **Step 1**. The pipeline will automatically handle the rest.
+
+```python
+# To use your own species, change the names in this list
+names = [
+    "Vespa crabro",
+    "Vespula vulgaris",
+    "Dolichovespula media",
+    # Add your species here
+]
+```
181
+
182
+ #### Handling an "Unknown" Class
183
+ To train a model that can recognize an "unknown" class for insects that don't belong to your target species, add `"unknown"` to your `species_list`. You must also provide a corresponding `unknown` folder containing images of various other insects in your data directories (e.g., `prepared_data/train/unknown`).
184
+
185
+ ```python
186
+ # Example with an unknown class
187
+ names_with_unknown = [
188
+ "Vespa crabro",
189
+ "Vespula vulgaris",
190
+ "unknown"
191
+ ]
192
+ ```
+
+## Directory Structure
+
+The pipeline will create the following directories to store artifacts:
+
+- `GBIF_data/`: Stores the raw images downloaded from GBIF.
+- `prepared_data/`: Contains the cleaned, cropped, and resized images ready for training.
+- `trained_model/`: Saves the trained model weights (`best_multitask.pt`) and pre-trained detection weights.
+
+## Citation
+
+All information in this repository is available under the MIT license, provided credit is given to the authors.
+
+**Venverloo, T., Duarte, F., B++: Towards Real-Time Monitoring of Insect Species. MIT Senseable City Laboratory, AMS Institute.**
+
@@ -1,27 +1,25 @@
 [tool.poetry]
 name = "bplusplus"
-version = "1.2.3"
+version = "1.2.4"
 description = "A simple method to create AI models for biodiversity, with collect and prepare pipeline"
 authors = ["Titus Venverloo <tvenver@mit.edu>", "Deniz Aydemir <deniz@aydemir.us>", "Orlando Closs <orlandocloss@pm.me>", "Ase Hatveit <aase@mit.edu>"]
 license = "MIT"
 readme = "README.md"

 [tool.poetry.dependencies]
-python = "^3.9.0"
+python = "^3.10"
 requests = "2.25.1"
 pandas = "2.1.4"
-ultralytics = ">=8.3.0"
+ultralytics = "8.3.173"
 pyyaml = "6.0.1"
 tqdm = "4.66.4"
 prettytable = "3.7.0"
-torch = "^2.5.0"
-torchvision = "*"
-pillow = "*"
-numpy = "*"
-scikit-learn = "*"
-pygbif = "^0.6.4"
-validators = "^0.33.0"
-tabulate = "^0.9.0"
+pillow = "11.3.0"
+numpy = "1.26.4"
+scikit-learn = "1.7.1"
+pygbif = "0.6.5"
+validators = "0.33.0"
+tabulate = "0.9.0"

 [tool.poetry.group.dev.dependencies]
 jupyter = "^1.0.0"
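Note that `torch` and `torchvision` disappear from the dependency list above, and most remaining pins become exact. A small standard-library sketch for checking an environment against those pins (the helper `installed_version` is illustrative, not part of the package):

```python
from importlib import metadata

def installed_version(dist):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None

# Compare a few of the exact pins from the pyproject diff above.
for dist, pinned in {"pandas": "2.1.4", "numpy": "1.26.4"}.items():
    print(dist, "installed:", installed_version(dist), "pinned:", pinned)
```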
@@ -0,0 +1,15 @@
+try:
+    import torch
+    import torchvision
+except ImportError:
+    raise ImportError(
+        "PyTorch and Torchvision are not installed. "
+        "Please install them before using bplusplus by following the instructions "
+        "on the official PyTorch website: https://pytorch.org/get-started/locally/"
+    )
+
+from .collect import Group, collect
+from .prepare import prepare
+from .train import train
+from .test import test
+from .inference import inference
@@ -9,6 +9,8 @@ from datetime import datetime
 from pathlib import Path
 from .tracker import InsectTracker
 import torch
+import torchvision.transforms as T
+from torchvision.models.detection import fasterrcnn_resnet50_fpn
 from ultralytics import YOLO
 from torchvision import transforms
 from PIL import Image
@@ -19,6 +21,16 @@ import logging
 from collections import defaultdict
 import uuid

+# Add this check for backwards compatibility
+if hasattr(torch.serialization, 'add_safe_globals'):
+    torch.serialization.add_safe_globals([
+        'torch.LongTensor',
+        'torch.cuda.LongTensor',
+        'torch.FloatStorage',
+        'torch.cuda.FloatStorage',
+    ])
+
 # Set up logging
 logging.basicConfig(level=logging.INFO)
 logger = logging.getLogger(__name__)
@@ -36,12 +48,15 @@ def get_taxonomy(species_list):
     species_to_genus = {}
     genus_to_family = {}

-    logger.info(f"Building taxonomy from GBIF for {len(species_list)} species")
+    species_list_for_gbif = [s for s in species_list if s.lower() != 'unknown']
+    has_unknown = len(species_list_for_gbif) != len(species_list)
+
+    logger.info(f"Building taxonomy from GBIF for {len(species_list_for_gbif)} species")

     print(f"\n{'Species':<30} {'Family':<20} {'Genus':<20} {'Status'}")
     print("-" * 80)

-    for species_name in species_list:
+    for species_name in species_list_for_gbif:
         url = f"https://api.gbif.org/v1/species/match?name={species_name}&verbose=true"
         try:
             response = requests.get(url)
@@ -72,6 +87,21 @@ def get_taxonomy(species_list):
         except Exception as e:
             print(f"{species_name:<30} {'Error':<20} {'Error':<20} FAILED")
             logger.error(f"Error retrieving data for '{species_name}': {str(e)}")

+    if has_unknown:
+        unknown_family = "Unknown"
+        unknown_genus = "Unknown"
+        unknown_species = "unknown"
+
+        if unknown_family not in taxonomy[1]:
+            taxonomy[1].append(unknown_family)
+
+        taxonomy[2][unknown_genus] = unknown_family
+        taxonomy[3][unknown_species] = unknown_genus
+        species_to_genus[unknown_species] = unknown_genus
+        genus_to_family[unknown_genus] = unknown_family
+
+        print(f"{unknown_species:<30} {unknown_family:<20} {unknown_genus:<20} {'OK'}")

     taxonomy[1] = sorted(list(set(taxonomy[1])))
     print("-" * 80)
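The effect of the `has_unknown` branch can be shown standalone. A toy example using the same three-level taxonomy shape (the sample family and genus are illustrative, not taken from the package):

```python
# Toy taxonomy in the package's shape: level 1 is a list of families,
# levels 2 and 3 map child taxon -> parent taxon.
taxonomy = {
    1: ["Vespidae"],
    2: {"Vespula": "Vespidae"},
    3: {"Vespula vulgaris": "Vespula"},
}

# Mirrors the branch above: graft an "unknown" pseudo-taxon into every
# level so it receives its own class index at each taxonomic level.
if "Unknown" not in taxonomy[1]:
    taxonomy[1].append("Unknown")
taxonomy[2]["Unknown"] = "Unknown"
taxonomy[3]["unknown"] = "Unknown"

print(taxonomy[1])  # → ['Vespidae', 'Unknown']
```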
@@ -85,18 +115,26 @@
     logger.info(f"Taxonomy built: {len(taxonomy[1])} families, {len(taxonomy[2])} genera, {len(taxonomy[3])} species")
     return taxonomy, species_to_genus, genus_to_family

-def create_mappings(taxonomy):
+def create_mappings(taxonomy, species_list=None):
     """Create index mappings from taxonomy"""
     level_to_idx = {}
     idx_to_level = {}

     for level, labels in taxonomy.items():
         if isinstance(labels, list):
+            # Level 1: Family (already sorted)
             level_to_idx[level] = {label: idx for idx, label in enumerate(labels)}
             idx_to_level[level] = {idx: label for idx, label in enumerate(labels)}
-        else:  # Dictionary
-            level_to_idx[level] = {label: idx for idx, label in enumerate(labels.keys())}
-            idx_to_level[level] = {idx: label for idx, label in enumerate(labels.keys())}
+        else:  # Dictionary for levels 2 and 3
+            if level == 3 and species_list is not None:
+                # For species, the order is determined by species_list
+                sorted_keys = species_list
+            else:
+                # For genus, sort alphabetically
+                sorted_keys = sorted(labels.keys())
+
+            level_to_idx[level] = {label: idx for idx, label in enumerate(sorted_keys)}
+            idx_to_level[level] = {idx: label for idx, label in enumerate(sorted_keys)}

     return level_to_idx, idx_to_level

@@ -321,9 +359,9 @@ class VideoInferenceProcessor:

         # Build taxonomy from species list
         self.taxonomy, self.species_to_genus, self.genus_to_family = get_taxonomy(species_list)
-        self.level_to_idx, self.idx_to_level = create_mappings(self.taxonomy)
-        self.family_list = self.taxonomy[1]
-        self.genus_list = list(self.taxonomy[2].keys())
+        self.level_to_idx, self.idx_to_level = create_mappings(self.taxonomy, species_list)
+        self.family_list = sorted(self.taxonomy[1])
+        self.genus_list = sorted(list(self.taxonomy[2].keys()))

         # Load models
         print(f"Loading YOLO model from {yolo_model_path}")
@@ -863,7 +901,7 @@ def main():
     species_list = [
         "Coccinella septempunctata", "Apis mellifera", "Bombus lapidarius", "Bombus terrestris",
         "Eupeodes corollae", "Episyrphus balteatus", "Aglais urticae", "Vespula vulgaris",
-        "Eristalis tenax"
+        "Eristalis tenax", "unknown"
     ]

     # Paths (replace with your actual paths)
@@ -174,17 +174,18 @@ def _prepare_model_and_clean_images(temp_dir_path: Path):
         print(" ✓ Model weights already exist")

     # Add all required classes to safe globals
-    serialization.add_safe_globals([
-        DetectionModel, Sequential, Conv, Conv2d, BatchNorm2d,
-        SiLU, ReLU, LeakyReLU, MaxPool2d, Linear, Dropout, Upsample,
-        Module, ModuleList, ModuleDict,
-        Bottleneck, C2f, SPPF, Detect, Concat, DFL,
-        # Add torch internal classes
-        torch.nn.parameter.Parameter,
-        torch.Tensor,
-        torch._utils._rebuild_tensor_v2,
-        torch._utils._rebuild_parameter
-    ])
+    if hasattr(serialization, 'add_safe_globals'):
+        serialization.add_safe_globals([
+            DetectionModel, Sequential, Conv, Conv2d, BatchNorm2d,
+            SiLU, ReLU, LeakyReLU, MaxPool2d, Linear, Dropout, Upsample,
+            Module, ModuleList, ModuleDict,
+            Bottleneck, C2f, SPPF, Detect, Concat, DFL,
+            # Add torch internal classes
+            torch.nn.parameter.Parameter,
+            torch.Tensor,
+            torch._utils._rebuild_tensor_v2,
+            torch._utils._rebuild_parameter
+        ])

     return weights_path

@@ -74,6 +74,16 @@ def setup_gpu():
         logger.warning("Falling back to CPU")
         return torch.device("cpu")

+# Add this check for backwards compatibility
+if hasattr(torch.serialization, 'add_safe_globals'):
+    torch.serialization.add_safe_globals([
+        'torch.LongTensor',
+        'torch.cuda.LongTensor',
+        'torch.FloatStorage',
+        'torch.cuda.FloatStorage',
+    ])
+
 class HierarchicalInsectClassifier(nn.Module):
     def __init__(self, num_classes_per_level):
         """
@@ -14,18 +14,28 @@ import logging
 from tqdm import tqdm
 import sys

-def train(batch_size=4, epochs=30, patience=3, img_size=640, data_dir='input', output_dir='./output', species_list=None):
+def train(batch_size=4, epochs=30, patience=3, img_size=640, data_dir='input', output_dir='./output', species_list=None, num_workers=4):
     """
     Main function to run the entire training pipeline.
     Sets up datasets, model, training process and handles errors.
+
+    Args:
+        batch_size (int): Number of samples per batch. Default: 4
+        epochs (int): Maximum number of training epochs. Default: 30
+        patience (int): Early stopping patience (epochs without improvement). Default: 3
+        img_size (int): Target image size for training. Default: 640
+        data_dir (str): Directory containing train/valid subdirectories. Default: 'input'
+        output_dir (str): Directory to save trained model and logs. Default: './output'
+        species_list (list): List of species names for training. Required.
+        num_workers (int): Number of DataLoader worker processes.
+            Set to 0 to disable multiprocessing (most stable). Default: 4
     """
     global logger, device

     logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
     logger = logging.getLogger(__name__)

-    logger.info(f"Hyperparameters - Batch size: {batch_size}, Epochs: {epochs}, Patience: {patience}, Image size: {img_size}, Data directory: {data_dir}, Output directory: {output_dir}")
-
+    logger.info(f"Hyperparameters - Batch size: {batch_size}, Epochs: {epochs}, Patience: {patience}, Image size: {img_size}, Data directory: {data_dir}, Output directory: {output_dir}, Num workers: {num_workers}")

     device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

@@ -52,7 +62,7 @@ def train(batch_size=4, epochs=30, patience=3, img_size=640, data_dir='input', o

     taxonomy = get_taxonomy(species_list)

-    level_to_idx, parent_child_relationship = create_mappings(taxonomy)
+    level_to_idx, parent_child_relationship = create_mappings(taxonomy, species_list)

     num_classes_per_level = [len(taxonomy[level]) if isinstance(taxonomy[level], list)
                              else len(taxonomy[level].keys()) for level in sorted(taxonomy.keys())]
@@ -75,14 +85,14 @@ def train(batch_size=4, epochs=30, patience=3, img_size=640, data_dir='input', o
75
85
  train_dataset,
76
86
  batch_size=batch_size,
77
87
  shuffle=True,
78
- num_workers=4
88
+ num_workers=num_workers
79
89
  )
80
90
 
81
91
  val_loader = DataLoader(
82
92
  val_dataset,
83
93
  batch_size=batch_size,
84
94
  shuffle=False,
85
- num_workers=4
95
+ num_workers=num_workers
86
96
  )
87
97
 
88
98
  try:
@@ -150,14 +160,17 @@ def get_taxonomy(species_list):
     species_to_genus = {}
     genus_to_family = {}

-    logger.info(f"Building taxonomy from GBIF for {len(species_list)} species")
+    species_list_for_gbif = [s for s in species_list if s.lower() != 'unknown']
+    has_unknown = len(species_list_for_gbif) != len(species_list)
+
+    logger.info(f"Building taxonomy from GBIF for {len(species_list_for_gbif)} species")

     print("\nTaxonomy Results:")
     print("-" * 80)
     print(f"{'Species':<30} {'Family':<20} {'Genus':<20} {'Status'}")
     print("-" * 80)

-    for species_name in species_list:
+    for species_name in species_list_for_gbif:
         url = f"https://api.gbif.org/v1/species/match?name={species_name}&verbose=true"
         try:
             response = requests.get(url)
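For reference, the GBIF match endpoint queried in this loop returns a flat JSON record. A standalone sketch of pulling out the fields the taxonomy builder relies on (the sample values are illustrative, not a real API response; the helper `extract_taxonomy` is ours):

```python
# Shape of a response from
# https://api.gbif.org/v1/species/match?name=<species>&verbose=true
sample_response = {
    "scientificName": "Vespa crabro Linnaeus, 1758",
    "matchType": "EXACT",
    "family": "Vespidae",
    "genus": "Vespa",
}

def extract_taxonomy(record):
    """Pull the (family, genus) pair used to populate the taxonomy levels."""
    return record.get("family"), record.get("genus")

print(extract_taxonomy(sample_response))  # → ('Vespidae', 'Vespa')
```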
@@ -199,6 +212,19 @@ def get_taxonomy(species_list):
             print(f"{species_name:<30} {'Error':<20} {'Error':<20} FAILED")
             print(f"Error: {error_msg}")
             sys.exit(1)  # Stop the script

+    if has_unknown:
+        unknown_family = "Unknown"
+        unknown_genus = "Unknown"
+        unknown_species = "unknown"
+
+        if unknown_family not in taxonomy[1]:
+            taxonomy[1].append(unknown_family)
+
+        taxonomy[2][unknown_genus] = unknown_family
+        taxonomy[3][unknown_species] = unknown_genus
+
+        print(f"{unknown_species:<30} {unknown_family:<20} {unknown_genus:<20} {'OK'}")

     taxonomy[1] = sorted(list(set(taxonomy[1])))
     print("-" * 80)
@@ -212,7 +238,7 @@ def get_taxonomy(species_list):
         print(f" {i}: {family}")

     print("\nGenus indices:")
-    for i, genus in enumerate(taxonomy[2].keys()):
+    for i, genus in enumerate(sorted(taxonomy[2].keys())):
         print(f" {i}: {genus}")

     print("\nSpecies indices:")
@@ -244,7 +270,7 @@ def get_species_from_directory(train_dir):
     logger.info(f"Found {len(species_list)} species in {train_dir}")
     return species_list

-def create_mappings(taxonomy):
+def create_mappings(taxonomy, species_list=None):
     """
     Creates mapping dictionaries from taxonomy data.
     Returns level-to-index mapping and parent-child relationships between taxonomic levels.
@@ -254,9 +280,17 @@ def create_mappings(taxonomy):

     for level, labels in taxonomy.items():
         if isinstance(labels, list):
+            # Level 1: Family (already sorted)
             level_to_idx[level] = {label: idx for idx, label in enumerate(labels)}
-        else:
-            level_to_idx[level] = {label: idx for idx, label in enumerate(labels.keys())}
+        else:  # dict for levels 2 and 3
+            if level == 3 and species_list is not None:
+                # For species, the order is determined by species_list
+                level_to_idx[level] = {label: idx for idx, label in enumerate(species_list)}
+            else:
+                # For genus (and as a fallback for species), sort alphabetically
+                sorted_keys = sorted(labels.keys())
+                level_to_idx[level] = {label: idx for idx, label in enumerate(sorted_keys)}
+
             for child, parent in labels.items():
                 if (level, parent) not in parent_child_relationship:
                     parent_child_relationship[(level, parent)] = []
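Why this reordering matters: the classifier's output indices must line up between training and inference, so the label-to-index maps have to be built deterministically. A self-contained sketch of the ordering rules introduced above, using a toy taxonomy (the helper name `make_level_to_idx` is ours, a simplification of `create_mappings`):

```python
def make_level_to_idx(taxonomy, species_list=None):
    """Simplified mirror of create_mappings' ordering rules."""
    level_to_idx = {}
    for level, labels in taxonomy.items():
        if isinstance(labels, list):
            keys = labels                 # families: list order (pre-sorted)
        elif level == 3 and species_list is not None:
            keys = species_list           # species: order fixed by the caller
        else:
            keys = sorted(labels.keys())  # genera: alphabetical
        level_to_idx[level] = {label: i for i, label in enumerate(keys)}
    return level_to_idx

taxonomy = {
    1: ["Apidae", "Vespidae"],
    2: {"Vespula": "Vespidae", "Apis": "Apidae"},
    3: {"Vespula vulgaris": "Vespula", "Apis mellifera": "Apis"},
}
mapping = make_level_to_idx(taxonomy, ["Vespula vulgaris", "Apis mellifera"])
print(mapping[3])  # → {'Vespula vulgaris': 0, 'Apis mellifera': 1}
```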
@@ -670,7 +704,7 @@ if __name__ == '__main__':
670
704
  species_list = [
671
705
  "Coccinella septempunctata", "Apis mellifera", "Bombus lapidarius", "Bombus terrestris",
672
706
  "Eupeodes corollae", "Episyrphus balteatus", "Aglais urticae", "Vespula vulgaris",
673
- "Eristalis tenax"
707
+ "Eristalis tenax", "unknown"
674
708
  ]
675
- train_multitask(species_list=species_list, epochs=2)
709
+ train(species_list=species_list, epochs=2)
676
710
 
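The final hunk appends an `"unknown"` catch-all to `species_list` and calls the renamed `train` entry point. Since species indices follow `species_list` order (per the `create_mappings` change above), the catch-all lands at the last class index. A quick pure-Python check, with no bplusplus dependency:

```python
species_list = [
    "Coccinella septempunctata", "Apis mellifera", "Bombus lapidarius",
    "Bombus terrestris", "Eupeodes corollae", "Episyrphus balteatus",
    "Aglais urticae", "Vespula vulgaris", "Eristalis tenax",
    "unknown",  # catch-all class for insects outside the target list
]
species_to_idx = {name: i for i, name in enumerate(species_list)}
assert species_to_idx["unknown"] == len(species_list) - 1 == 9
```

Whether `train` treats `"unknown"` specially, or simply as a tenth class, is not shown in this diff.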
bplusplus-1.2.3/PKG-INFO DELETED
@@ -1,101 +0,0 @@
1
- Metadata-Version: 2.3
2
- Name: bplusplus
3
- Version: 1.2.3
4
- Summary: A simple method to create AI models for biodiversity, with collect and prepare pipeline
5
- License: MIT
6
- Author: Titus Venverloo
7
- Author-email: tvenver@mit.edu
8
- Requires-Python: >=3.9.0,<4.0.0
9
- Classifier: License :: OSI Approved :: MIT License
10
- Classifier: Programming Language :: Python :: 3
11
- Classifier: Programming Language :: Python :: 3.9
12
- Classifier: Programming Language :: Python :: 3.10
13
- Classifier: Programming Language :: Python :: 3.11
14
- Classifier: Programming Language :: Python :: 3.12
15
- Classifier: Programming Language :: Python :: 3.13
16
- Requires-Dist: numpy
17
- Requires-Dist: pandas (==2.1.4)
18
- Requires-Dist: pillow
19
- Requires-Dist: prettytable (==3.7.0)
20
- Requires-Dist: pygbif (>=0.6.4,<0.7.0)
21
- Requires-Dist: pyyaml (==6.0.1)
22
- Requires-Dist: requests (==2.25.1)
23
- Requires-Dist: scikit-learn
24
- Requires-Dist: tabulate (>=0.9.0,<0.10.0)
25
- Requires-Dist: torch (>=2.5.0,<3.0.0)
26
- Requires-Dist: torchvision
27
- Requires-Dist: tqdm (==4.66.4)
28
- Requires-Dist: ultralytics (>=8.3.0)
29
- Requires-Dist: validators (>=0.33.0,<0.34.0)
30
- Description-Content-Type: text/markdown
31
-
32
- # Domain-Agnostic Insect Classification Pipeline
33
-
34
- This project provides a complete, end-to-end pipeline for building a custom insect classification system. The framework is designed to be **domain-agnostic**, allowing you to train a powerful detection and classification model for **any insect species** by simply providing a list of names.
35
-
36
- Using the `Bplusplus` library, this pipeline automates the entire machine learning workflow, from data collection to video inference.
37
-
38
- ## Key Features
39
-
40
- - **Automated Data Collection**: Downloads hundreds of images for any species from the GBIF database.
41
- - **Intelligent Data Preparation**: Uses a pre-trained model to automatically find, crop, and resize insects from raw images, ensuring high-quality training data.
42
- - **Hierarchical Classification**: Trains a model to identify insects at three taxonomic levels: **family, genus, and species**.
43
- - **Video Inference & Tracking**: Processes video files to detect, classify, and track individual insects over time, providing aggregated predictions.
44
- ## Pipeline Overview
45
-
46
- The process is broken down into six main steps, all detailed in the `full_pipeline.ipynb` notebook:
47
-
48
- 1. **Collect Data**: Select your target species and fetch raw insect images from the web.
49
- 2. **Prepare Data**: Filter, clean, and prepare images for training.
50
- 3. **Train Model**: Train the hierarchical classification model.
51
- 4. **Download Weights**: Fetch pre-trained weights for the detection model.
52
- 5. **Test Model**: Evaluate the performance of the trained model.
53
- 6. **Run Inference**: Run the full pipeline on a video file for real-world application.
54
-
55
- ## How to Use
56
-
57
- ### Prerequisites
58
-
59
- - Python 3.8+
60
- - `venv` for creating a virtual environment (recommended)
61
-
62
- ### Setup
63
-
64
- 1. **Create and activate a virtual environment:**
65
- ```bash
66
- python3 -m venv venv
67
- source venv/bin/activate
68
- ```
69
-
70
- 2. **Install the required packages:**
71
- ```bash
72
- pip install bplusplus
73
- ```
74
-
75
- ### Running the Pipeline
76
-
77
- The entire workflow is contained within **`full_pipeline.ipynb`**. Open it with a Jupyter Notebook or JupyterLab environment and run the cells sequentially to execute the full pipeline.
78
-
79
- ### Customization
80
-
81
- To train the model on different insect species, simply modify the `names` list in **Step 1** of the notebook:
82
-
83
- ```python
84
- # a/full_pipeline.ipynb
85
-
86
- # To use your own species, change the names in this list
87
- names = [
88
- "Vespa crabro", "Vespula vulgaris", "Dolichovespula media"
89
- ]
90
- ```
91
-
92
- The pipeline will automatically handle the rest, from data collection to training, for your new set of species.
93
-
94
- ## Directory Structure
95
-
96
- The pipeline will create the following directories to store artifacts:
97
-
98
- - `GBIF_data/`: Stores the raw images downloaded from GBIF.
99
- - `prepared_data/`: Contains the cleaned, cropped, and resized images ready for training.
100
- - `trained_model/`: Saves the trained model weights (`best_multitask.pt`) and pre-trained detection weights.
101
-
bplusplus-1.2.3/README.md DELETED
@@ -1,69 +0,0 @@
1
- # Domain-Agnostic Insect Classification Pipeline
2
-
3
- This project provides a complete, end-to-end pipeline for building a custom insect classification system. The framework is designed to be **domain-agnostic**, allowing you to train a powerful detection and classification model for **any insect species** by simply providing a list of names.
4
-
5
- Using the `Bplusplus` library, this pipeline automates the entire machine learning workflow, from data collection to video inference.
6
-
7
- ## Key Features
8
-
9
- - **Automated Data Collection**: Downloads hundreds of images for any species from the GBIF database.
10
- - **Intelligent Data Preparation**: Uses a pre-trained model to automatically find, crop, and resize insects from raw images, ensuring high-quality training data.
11
- - **Hierarchical Classification**: Trains a model to identify insects at three taxonomic levels: **family, genus, and species**.
12
- - **Video Inference & Tracking**: Processes video files to detect, classify, and track individual insects over time, providing aggregated predictions.
13
- ## Pipeline Overview
14
-
15
- The process is broken down into six main steps, all detailed in the `full_pipeline.ipynb` notebook:
16
-
17
- 1. **Collect Data**: Select your target species and fetch raw insect images from the web.
18
- 2. **Prepare Data**: Filter, clean, and prepare images for training.
19
- 3. **Train Model**: Train the hierarchical classification model.
20
- 4. **Download Weights**: Fetch pre-trained weights for the detection model.
21
- 5. **Test Model**: Evaluate the performance of the trained model.
22
- 6. **Run Inference**: Run the full pipeline on a video file for real-world application.
23
-
24
- ## How to Use
25
-
26
- ### Prerequisites
27
-
28
- - Python 3.8+
29
- - `venv` for creating a virtual environment (recommended)
30
-
31
- ### Setup
32
-
33
- 1. **Create and activate a virtual environment:**
34
- ```bash
35
- python3 -m venv venv
36
- source venv/bin/activate
37
- ```
38
-
39
- 2. **Install the required packages:**
40
- ```bash
41
- pip install bplusplus
42
- ```
43
-
44
- ### Running the Pipeline
45
-
46
- The entire workflow is contained within **`full_pipeline.ipynb`**. Open it with a Jupyter Notebook or JupyterLab environment and run the cells sequentially to execute the full pipeline.
47
-
48
- ### Customization
49
-
50
- To train the model on different insect species, simply modify the `names` list in **Step 1** of the notebook:
51
-
52
- ```python
53
- # a/full_pipeline.ipynb
54
-
55
- # To use your own species, change the names in this list
56
- names = [
57
- "Vespa crabro", "Vespula vulgaris", "Dolichovespula media"
58
- ]
59
- ```
60
-
61
- The pipeline will automatically handle the rest, from data collection to training, for your new set of species.
62
-
63
- ## Directory Structure
64
-
65
- The pipeline will create the following directories to store artifacts:
66
-
67
- - `GBIF_data/`: Stores the raw images downloaded from GBIF.
68
- - `prepared_data/`: Contains the cleaned, cropped, and resized images ready for training.
69
- - `trained_model/`: Saves the trained model weights (`best_multitask.pt`) and pre-trained detection weights.
@@ -1,5 +0,0 @@
1
- from .collect import Group, collect
2
- from .prepare import prepare
3
- from .train import train
4
- from .test import test
5
- from .inference import inference