bplusplus 2.0.4__py3-none-any.whl

@@ -0,0 +1,259 @@
1
+ Metadata-Version: 2.3
2
+ Name: bplusplus
3
+ Version: 2.0.4
4
+ Summary: A simple method to create AI models for biodiversity, with collect and prepare pipeline
5
+ License: MIT
6
+ Author: Titus Venverloo
7
+ Author-email: tvenver@mit.edu
8
+ Requires-Python: >=3.10,<4.0
9
+ Classifier: License :: OSI Approved :: MIT License
10
+ Classifier: Programming Language :: Python :: 3
11
+ Classifier: Programming Language :: Python :: 3.10
12
+ Classifier: Programming Language :: Python :: 3.11
13
+ Classifier: Programming Language :: Python :: 3.12
14
+ Classifier: Programming Language :: Python :: 3.13
15
+ Requires-Dist: numpy (>=1.26.0,<1.26.5) ; sys_platform == "win32"
16
+ Requires-Dist: numpy (>=1.26.0,<1.27.0) ; sys_platform == "darwin" and platform_machine == "arm64"
17
+ Requires-Dist: numpy (>=1.26.0,<1.27.0) ; sys_platform == "darwin" and platform_machine == "x86_64"
18
+ Requires-Dist: numpy (>=1.26.0,<1.27.0) ; sys_platform == "linux" and platform_machine == "aarch64"
19
+ Requires-Dist: numpy (>=1.26.0,<1.27.0) ; sys_platform == "linux" and platform_machine == "x86_64"
20
+ Requires-Dist: pandas (==2.1.4)
21
+ Requires-Dist: pillow (>=10.0.0,<12.0.0) ; sys_platform == "darwin"
22
+ Requires-Dist: pillow (>=10.0.0,<12.0.0) ; sys_platform == "linux"
23
+ Requires-Dist: pillow (>=10.0.0,<12.0.0) ; sys_platform == "win32"
24
+ Requires-Dist: prettytable (==3.7.0)
25
+ Requires-Dist: pygbif (==0.6.5)
26
+ Requires-Dist: pyyaml (==6.0.1)
27
+ Requires-Dist: requests (==2.25.1)
28
+ Requires-Dist: scikit-learn (>=1.3.0,<1.7.0) ; sys_platform == "linux" and platform_machine == "aarch64"
29
+ Requires-Dist: scikit-learn (>=1.3.0,<1.7.0) ; sys_platform == "win32"
30
+ Requires-Dist: scikit-learn (>=1.4.0,<1.8.0) ; sys_platform == "darwin" and platform_machine == "arm64"
31
+ Requires-Dist: scikit-learn (>=1.4.0,<1.8.0) ; sys_platform == "darwin" and platform_machine == "x86_64"
32
+ Requires-Dist: scikit-learn (>=1.4.0,<1.8.0) ; sys_platform == "linux" and platform_machine == "x86_64"
33
+ Requires-Dist: tabulate (==0.9.0)
34
+ Requires-Dist: torch (>=2.0.0,<2.8.0) ; sys_platform == "darwin" and platform_machine == "arm64"
35
+ Requires-Dist: torch (>=2.0.0,<2.8.0) ; sys_platform == "linux"
36
+ Requires-Dist: torch (>=2.0.0,<2.8.0) ; sys_platform == "win32"
37
+ Requires-Dist: torch (>=2.2.0,<2.3.0) ; sys_platform == "darwin" and platform_machine == "x86_64"
38
+ Requires-Dist: tqdm (==4.66.4)
39
+ Requires-Dist: ultralytics (==8.3.173)
40
+ Requires-Dist: validators (==0.33.0)
41
+ Description-Content-Type: text/markdown
42
+
43
+ # B++ repository
44
+
45
+ [![DOI](https://zenodo.org/badge/765250194.svg)](https://zenodo.org/badge/latestdoi/765250194)
46
+ [![PyPi version](https://img.shields.io/pypi/v/bplusplus.svg)](https://pypi.org/project/bplusplus/)
47
+ [![Python versions](https://img.shields.io/pypi/pyversions/bplusplus.svg)](https://pypi.org/project/bplusplus/)
48
+ [![License](https://img.shields.io/pypi/l/bplusplus.svg)](https://pypi.org/project/bplusplus/)
49
+ [![Downloads](https://static.pepy.tech/badge/bplusplus)](https://pepy.tech/project/bplusplus)
50
+ [![Downloads](https://static.pepy.tech/badge/bplusplus/month)](https://pepy.tech/project/bplusplus)
51
+ [![Downloads](https://static.pepy.tech/badge/bplusplus/week)](https://pepy.tech/project/bplusplus)
52
+
53
+ This project provides a complete, end-to-end pipeline for building a custom insect classification system. The framework is designed to be **domain-agnostic**: you can train a detection and classification model for **any insect species** by providing a list of scientific names.
54
+
55
+ Using the `bplusplus` library, this pipeline automates the entire machine learning workflow, from data collection to video inference.
56
+
57
+ ## Key Features
58
+
59
+ - **Automated Data Collection**: Downloads hundreds of images for any species from the GBIF database.
60
+ - **Intelligent Data Preparation**: Uses a pre-trained model to automatically find, crop, and resize insects from raw images, ensuring high-quality training data.
61
+ - **Hierarchical Classification**: Trains a model to identify insects at three taxonomic levels: **family, genus, and species**.
62
+ - **Video Inference & Tracking**: Processes video files to detect, classify, and track individual insects over time, providing aggregated predictions.
63
+ ## Pipeline Overview
64
+
65
+ The process is broken down into five main steps, all detailed in the `full_pipeline.ipynb` notebook:
66
+
67
+ 1. **Collect Data**: Select your target species and fetch raw insect images from the web.
68
+ 2. **Prepare Data**: Filter, clean, and prepare images for training.
69
+ 3. **Train Model**: Train the hierarchical classification model.
70
+ 4. **Validate Model**: Evaluate the performance of the trained model.
71
+ 5. **Run Inference**: Run the full pipeline on a video file for real-world application.
72
+
73
+ ## How to Use
74
+
75
+ ### Prerequisites
76
+
77
+ - Python 3.10+
78
+
79
+ ### Setup
80
+
81
+ 1. **Create and activate a virtual environment:**
82
+ ```bash
83
+ python3 -m venv venv
84
+ source venv/bin/activate
85
+ ```
86
+
87
+ 2. **Install the required packages:**
88
+ ```bash
89
+ pip install bplusplus
90
+ ```
91
+
92
+ ### Running the Pipeline
93
+
94
+ The pipeline can be run step-by-step using the functions from the `bplusplus` library. While the `full_pipeline.ipynb` notebook provides a complete, executable workflow, the core functions are described below.
95
+
96
+ #### Step 1: Collect Data
97
+ Download images for your target species from the GBIF database. You'll need to provide a list of scientific names.
98
+
99
+ ```python
100
+ import bplusplus
101
+ from pathlib import Path
102
+
103
+ # Define species and directories
104
+ names = ["Vespa crabro", "Vespula vulgaris", "Dolichovespula media"]
105
+ GBIF_DATA_DIR = Path("./GBIF_data")
106
+
107
+ # Define search parameters
108
+ search = {"scientificName": names}
109
+
110
+ # Run collection
111
+ bplusplus.collect(
112
+     group_by_key=bplusplus.Group.scientificName,
113
+     search_parameters=search,
114
+     images_per_group=200,  # Recommended to download more than needed
115
+     output_directory=GBIF_DATA_DIR,
116
+     num_threads=5
117
+ )
118
+ ```
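After collection it can be useful to check how many images were actually downloaded for each species. The sketch below is not part of `bplusplus`; `count_images_per_group` is a hypothetical helper, and it assumes that `collect` writes one subfolder per group under `output_directory`, which this README does not confirm.

```python
from pathlib import Path

# File types listed as supported by the prepare step in this README.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_group(root):
    """Return {subfolder name: image count} for each directory directly under root.

    Assumption: each group (e.g. each scientific name) has its own subfolder.
    """
    root = Path(root)
    counts = {}
    for group_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[group_dir.name] = sum(
            1
            for f in group_dir.rglob("*")
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


if __name__ == "__main__":
    data_dir = Path("./GBIF_data")
    if data_dir.is_dir():
        for name, n in count_images_per_group(data_dir).items():
            print(f"{name}: {n} images")
```

A group with far fewer images than `images_per_group` may need a broader search or a different name spelling before you move on to preparation.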
119
+
120
+ #### Step 2: Prepare Data
121
+ Process the raw images to extract, crop, and resize insects. This step uses a pre-trained model to ensure only high-quality images are used for training.
122
+
123
+ ```python
124
+ PREPARED_DATA_DIR = Path("./prepared_data")
125
+
126
+ bplusplus.prepare(
127
+     input_directory=GBIF_DATA_DIR,
128
+     output_directory=PREPARED_DATA_DIR,
129
+     img_size=640,  # Target image size for training
130
+     conf=0.6,      # Detection confidence threshold (0-1)
131
+     valid=0.1,     # Validation split ratio (0-1), set to 0 for no validation
132
+     blur=None,     # Gaussian blur as fraction of image size (0-1), None = disabled
133
+ )
134
+ ```
135
+
136
+ **Note:** The `blur` parameter applies Gaussian blur before resizing, which can help reduce noise. Values are relative to image size (e.g., `blur=0.01` means 1% of the smallest dimension). Supported image formats: JPG, JPEG, and PNG.
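To make the relative `blur` value concrete, here is a minimal sketch of how a fraction of the smallest image dimension could map to a pixel kernel size. This is an illustration only: `blur_kernel_size` is a hypothetical helper, and the README does not say whether `bplusplus` converts the fraction to a kernel size, a radius, or a sigma. The rounding-up to an odd size is an assumption, chosen because some Gaussian filter implementations (OpenCV's `GaussianBlur`, for example) expect odd kernel sizes.

```python
def blur_kernel_size(width, height, blur):
    """Map a relative blur fraction to an odd kernel size in pixels (illustrative)."""
    if blur is None or blur <= 0:
        return 0  # blur disabled
    # Fraction of the smallest image dimension, at least one pixel.
    size = max(1, round(blur * min(width, height)))
    # Assumption: bump even sizes to the next odd value.
    return size if size % 2 == 1 else size + 1


# 1% of the smallest dimension of a 1920x1080 image
print(blur_kernel_size(1920, 1080, 0.01))
```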
137
+
138
+ #### Step 3: Train Model
139
+ Train the hierarchical classification model on your prepared data. The model learns to identify family, genus, and species.
140
+
141
+ ```python
142
+ TRAINED_MODEL_DIR = Path("./trained_model")
143
+
144
+ bplusplus.train(
145
+     batch_size=4,
146
+     epochs=30,
147
+     patience=3,
148
+     img_size=640,
149
+     data_dir=PREPARED_DATA_DIR,
150
+     output_dir=TRAINED_MODEL_DIR,
151
+     species_list=names,
152
+     backbone="resnet50",  # Choose: "resnet18", "resnet50", or "resnet101"
153
+     # num_workers=0,  # Optional: force single-process loading (most stable)
154
+     # train_transforms=custom_transforms,  # Optional: custom torchvision transforms
155
+ )
156
+ ```
157
+
158
+ **Note:** The `num_workers` parameter controls DataLoader multiprocessing (defaults to 0 for stability). The `backbone` parameter selects the ResNet architecture: use `resnet18` for faster training or `resnet101` for potentially better accuracy.
159
+
160
+ #### Step 4: Validate Model
161
+ Evaluate the trained model on a held-out validation set. This calculates precision, recall, and F1-score at all taxonomic levels.
162
+
163
+ ```python
164
+ HIERARCHICAL_MODEL_PATH = TRAINED_MODEL_DIR / "best_multitask.pt"
165
+
166
+ results = bplusplus.validate(
167
+     species_list=names,
168
+     validation_dir=PREPARED_DATA_DIR / "valid",
169
+     hierarchical_weights=HIERARCHICAL_MODEL_PATH,
170
+     img_size=640,         # Must match training
171
+     batch_size=32,
172
+     backbone="resnet50",  # Must match training
173
+ )
174
+ ```
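For readers less familiar with the reported metrics, the sketch below computes precision, recall, and F1 for a single label from lists of true and predicted labels. It uses the standard definitions and is not taken from `bplusplus`; the structure of the `results` object returned by `validate` is not documented here, so the example works on plain Python lists with made-up labels.

```python
def precision_recall_f1(y_true, y_pred, label):
    """Return (precision, recall, f1) for one label, using the standard definitions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # true positives
    fp = sum(1 for t, p in pairs if t != label and p == label)  # false positives
    fn = sum(1 for t, p in pairs if t == label and p != label)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Illustrative labels only, not real validation output.
y_true = ["Vespa crabro", "Vespa crabro", "Vespula vulgaris", "Vespula vulgaris"]
y_pred = ["Vespa crabro", "Vespula vulgaris", "Vespula vulgaris", "Vespula vulgaris"]
print(precision_recall_f1(y_true, y_pred, "Vespa crabro"))
```

The same calculation applies at each taxonomic level: replace species labels with the corresponding genus or family labels.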
175
+
176
+ #### Step 5: Run Inference on Video
177
+ Process a video file to detect, classify, and track insects using motion-based detection. The pipeline uses background subtraction (GMM) to detect moving insects, tracks them across frames, and classifies confirmed tracks.
178
+
179
+ **Output files generated in `output_dir`:**
180
+ - `{video}_annotated.mp4` - Video showing confirmed tracks with classifications
181
+ - `{video}_debug.mp4` - Debug video with motion mask and all detections
182
+ - `{video}_results.csv` - Aggregated results per confirmed track
183
+ - `{video}_detections.csv` - Frame-by-frame detection data
184
+
185
+ ```python
186
+ VIDEO_INPUT_PATH = Path("my_video.mp4")
187
+ OUTPUT_DIR = Path("./output")
188
+ HIERARCHICAL_MODEL_PATH = TRAINED_MODEL_DIR / "best_multitask.pt"
189
+
190
+ results = bplusplus.inference(
191
+     species_list=names,
192
+     hierarchical_model_path=HIERARCHICAL_MODEL_PATH,
193
+     video_path=VIDEO_INPUT_PATH,
194
+     output_dir=OUTPUT_DIR,
195
+     fps=None,             # None = process all frames
196
+     backbone="resnet50",  # Must match training
197
+     save_video=True,      # Set to False to skip video rendering (only CSV output)
198
+     img_size=640,         # Must match training
199
+ )
200
+
201
+ print(f"Detected {results['tracks']} tracks ({results['confirmed_tracks']} confirmed)")
202
+ ```
203
+
204
+ **Note:** Set `save_video=False` to skip generating the annotated and debug videos, which speeds up processing when you only need the CSV detection data.
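The results CSV holds one aggregated result per confirmed track, while the detections CSV holds frame-by-frame data. The README does not describe how `bplusplus` aggregates per-frame predictions, so the sketch below shows one common choice, a majority vote per track, purely as an illustration. The function name and the `(track_id, label)` input shape are assumptions and do not reflect the actual CSV columns.

```python
from collections import Counter, defaultdict


def aggregate_tracks(detections):
    """Majority-vote label per track.

    detections: iterable of (track_id, predicted_label) pairs,
    one per frame-level detection (hypothetical input shape).
    """
    votes = defaultdict(Counter)
    for track_id, label in detections:
        votes[track_id][label] += 1
    # most_common(1) returns the label with the highest count for that track.
    return {track_id: counts.most_common(1)[0][0] for track_id, counts in votes.items()}


# Illustrative frame-level predictions for two tracks.
frame_predictions = [
    (1, "Vespa crabro"),
    (1, "Vespa crabro"),
    (1, "Vespula vulgaris"),
    (2, "Dolichovespula media"),
]
print(aggregate_tracks(frame_predictions))
```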
205
+
206
+ **Custom Detection Configuration:**
207
+
208
+ For advanced control over detection parameters, provide a YAML config file:
209
+
210
+ ```python
211
+ results = bplusplus.inference(
212
+     ...,
213
+     config="detection_config.yaml"
214
+ )
215
+ ```
216
+
217
+ Download a template config from the [releases page](https://github.com/Tvenver/Bplusplus/releases). Parameters control cohesiveness filtering, shape filtering, tracking behavior, and path topology analysis for confirming insect-like movement.
218
+
219
+ ### Customization
220
+
221
+ To train the model on your own set of insect species, you only need to change the `names` list in **Step 1**. The pipeline will automatically handle the rest.
222
+
223
+ ```python
224
+ # To use your own species, change the names in this list
225
+ names = [
226
+     "Vespa crabro",
226
+     "Vespula vulgaris",
227
+     "Dolichovespula media",
228
+     # Add your species here
230
+ ]
231
+ ```
232
+
233
+ #### Handling an "Unknown" Class
234
+ To train a model that can recognize an "unknown" class for insects that don't belong to your target species, add `"unknown"` to your `species_list`. You must also provide a corresponding `unknown` folder containing images of various other insects in your data directories (e.g., `prepared_data/train/unknown`).
235
+
236
+ ```python
237
+ # Example with an unknown class
238
+ names_with_unknown = [
239
+     "Vespa crabro",
239
+     "Vespula vulgaris",
240
+     "unknown"
242
+ ]
243
+ ```
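Before training with an extra class, a quick check can confirm that every entry in the list has a matching folder. The sketch below is a hypothetical helper, not a `bplusplus` function. It assumes class folders are named exactly like the list entries; the README only shows this for `prepared_data/train/unknown`, so adjust the name mapping if your folders differ.

```python
from pathlib import Path


def missing_class_dirs(species_list, train_dir):
    """Return the entries of species_list that have no folder under train_dir.

    Assumption: each class folder is named exactly like its list entry.
    """
    train_dir = Path(train_dir)
    return [name for name in species_list if not (train_dir / name).is_dir()]


names_with_unknown = ["Vespa crabro", "Vespula vulgaris", "unknown"]
missing = missing_class_dirs(names_with_unknown, "./prepared_data/train")
if missing:
    print("No folder found for:", ", ".join(missing))
```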
244
+
245
+ ## Directory Structure
246
+
247
+ The pipeline will create the following directories to store artifacts:
248
+
249
+ - `GBIF_data/`: Stores the raw images downloaded from GBIF.
250
+ - `prepared_data/`: Contains the cleaned, cropped, and resized images ready for training (`train/` and optionally `valid/` subdirectories).
251
+ - `trained_model/`: Saves the trained model weights (`best_multitask.pt`).
252
+ - `output/`: Inference results including annotated videos and CSV files.
253
+
254
+ # Citation
255
+
256
+ All information in this GitHub repository is available under the MIT license, provided that credit is given to the authors.
257
+
258
+ **Venverloo, T., Duarte, F., B++: Towards Real-Time Monitoring of Insect Species. MIT Senseable City Laboratory, AMS Institute.**
259
+
@@ -0,0 +1,12 @@
1
+ bplusplus/__init__.py,sha256=JFkrasmmlnhq4Y6EbFI2IVKE1PIBvQeHpdSIaBwM1oQ,472
2
+ bplusplus/collect.py,sha256=OcGq5HmfiLu83uTk1Swa_3-Yf1PJnwRuK_aVErFH9Wk,17355
3
+ bplusplus/detector.py,sha256=Jef2aYhMdJfTAHpKI0hmjwa50nQhJJNxHJJ0mNIceMU,11725
4
+ bplusplus/inference.py,sha256=mECL3536c6mgqh8sB9jCzXay30i3R9sWfoLU6ePajrg,54762
5
+ bplusplus/prepare.py,sha256=b5K383iYzgctLFUYeaTUbyEr2VcmZGjCWnvk4VBKxO8,30450
6
+ bplusplus/tracker.py,sha256=JixV1ICGywGhVMTvkq3hrk4MLUUWDh3XJW4VLm4JdO0,11250
7
+ bplusplus/train.py,sha256=BsqjZ5OkDrkFzzjeJXLgc1fYXHtFHXFhf3FdBmBTP7M,38386
8
+ bplusplus/validation.py,sha256=lEWgcAtuQOunOk92P8FKcaS5nVhxUO5HZ9OR3Zeys2U,24845
9
+ bplusplus-2.0.4.dist-info/LICENSE,sha256=rRkeHptDnlmviR0_WWgNT9t696eys_cjfVUU8FEO4k4,1071
10
+ bplusplus-2.0.4.dist-info/METADATA,sha256=-HVEyRL8g9xqw1OZd3W2kE-AdmItjKHae7GMsE7OGOk,11402
11
+ bplusplus-2.0.4.dist-info/WHEEL,sha256=b4K_helf-jlQoXBBETfwnf4B04YC67LOev0jo4fX5m8,88
12
+ bplusplus-2.0.4.dist-info/RECORD,,
@@ -0,0 +1,4 @@
1
+ Wheel-Version: 1.0
2
+ Generator: poetry-core 2.1.3
3
+ Root-Is-Purelib: true
4
+ Tag: py3-none-any