bplusplus 2.0.4__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- bplusplus/__init__.py +15 -0
- bplusplus/collect.py +523 -0
- bplusplus/detector.py +376 -0
- bplusplus/inference.py +1337 -0
- bplusplus/prepare.py +706 -0
- bplusplus/tracker.py +261 -0
- bplusplus/train.py +913 -0
- bplusplus/validation.py +580 -0
- bplusplus-2.0.4.dist-info/LICENSE +21 -0
- bplusplus-2.0.4.dist-info/METADATA +259 -0
- bplusplus-2.0.4.dist-info/RECORD +12 -0
- bplusplus-2.0.4.dist-info/WHEEL +4 -0
bplusplus-2.0.4.dist-info/METADATA
@@ -0,0 +1,259 @@
Metadata-Version: 2.3
Name: bplusplus
Version: 2.0.4
Summary: A simple method to create AI models for biodiversity, with collect and prepare pipeline
License: MIT
Author: Titus Venverloo
Author-email: tvenver@mit.edu
Requires-Python: >=3.10,<4.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Dist: numpy (>=1.26.0,<1.26.5) ; sys_platform == "win32"
Requires-Dist: numpy (>=1.26.0,<1.27.0) ; sys_platform == "darwin" and platform_machine == "arm64"
Requires-Dist: numpy (>=1.26.0,<1.27.0) ; sys_platform == "darwin" and platform_machine == "x86_64"
Requires-Dist: numpy (>=1.26.0,<1.27.0) ; sys_platform == "linux" and platform_machine == "aarch64"
Requires-Dist: numpy (>=1.26.0,<1.27.0) ; sys_platform == "linux" and platform_machine == "x86_64"
Requires-Dist: pandas (==2.1.4)
Requires-Dist: pillow (>=10.0.0,<12.0.0) ; sys_platform == "darwin"
Requires-Dist: pillow (>=10.0.0,<12.0.0) ; sys_platform == "linux"
Requires-Dist: pillow (>=10.0.0,<12.0.0) ; sys_platform == "win32"
Requires-Dist: prettytable (==3.7.0)
Requires-Dist: pygbif (==0.6.5)
Requires-Dist: pyyaml (==6.0.1)
Requires-Dist: requests (==2.25.1)
Requires-Dist: scikit-learn (>=1.3.0,<1.7.0) ; sys_platform == "linux" and platform_machine == "aarch64"
Requires-Dist: scikit-learn (>=1.3.0,<1.7.0) ; sys_platform == "win32"
Requires-Dist: scikit-learn (>=1.4.0,<1.8.0) ; sys_platform == "darwin" and platform_machine == "arm64"
Requires-Dist: scikit-learn (>=1.4.0,<1.8.0) ; sys_platform == "darwin" and platform_machine == "x86_64"
Requires-Dist: scikit-learn (>=1.4.0,<1.8.0) ; sys_platform == "linux" and platform_machine == "x86_64"
Requires-Dist: tabulate (==0.9.0)
Requires-Dist: torch (>=2.0.0,<2.8.0) ; sys_platform == "darwin" and platform_machine == "arm64"
Requires-Dist: torch (>=2.0.0,<2.8.0) ; sys_platform == "linux"
Requires-Dist: torch (>=2.0.0,<2.8.0) ; sys_platform == "win32"
Requires-Dist: torch (>=2.2.0,<2.3.0) ; sys_platform == "darwin" and platform_machine == "x86_64"
Requires-Dist: tqdm (==4.66.4)
Requires-Dist: ultralytics (==8.3.173)
Requires-Dist: validators (==0.33.0)
Description-Content-Type: text/markdown

# B++ repository

[DOI](https://zenodo.org/badge/latestdoi/765250194) · [PyPI](https://pypi.org/project/bplusplus/) · [Downloads](https://pepy.tech/project/bplusplus)

This project provides a complete, end-to-end pipeline for building a custom insect classification system. The framework is designed to be **domain-agnostic**, allowing you to train a powerful detection and classification model for **any insect species** by simply providing a list of names.

Using the `Bplusplus` library, this pipeline automates the entire machine learning workflow, from data collection to video inference.

## Key Features

- **Automated Data Collection**: Downloads hundreds of images for any species from the GBIF database.
- **Intelligent Data Preparation**: Uses a pre-trained model to automatically find, crop, and resize insects from raw images, ensuring high-quality training data.
- **Hierarchical Classification**: Trains a model to identify insects at three taxonomic levels: **family, genus, and species**.
- **Video Inference & Tracking**: Processes video files to detect, classify, and track individual insects over time, providing aggregated predictions.

## Pipeline Overview

The process is broken down into five main steps, all detailed in the `full_pipeline.ipynb` notebook:

1. **Collect Data**: Select your target species and fetch raw insect images from the web.
2. **Prepare Data**: Filter, clean, and prepare images for training.
3. **Train Model**: Train the hierarchical classification model.
4. **Validate Model**: Evaluate the performance of the trained model.
5. **Run Inference**: Run the full pipeline on a video file for real-world application.

## How to Use

### Prerequisites

- Python 3.10+

### Setup

1. **Create and activate a virtual environment:**

    ```bash
    python3 -m venv venv
    source venv/bin/activate
    ```

2. **Install the required packages:**

    ```bash
    pip install bplusplus
    ```

### Running the Pipeline

The pipeline can be run step-by-step using the functions from the `bplusplus` library. While the `full_pipeline.ipynb` notebook provides a complete, executable workflow, the core functions are described below.

#### Step 1: Collect Data
Download images for your target species from the GBIF database. You'll need to provide a list of scientific names.

```python
import bplusplus
from pathlib import Path

# Define species and directories
names = ["Vespa crabro", "Vespula vulgaris", "Dolichovespula media"]
GBIF_DATA_DIR = Path("./GBIF_data")

# Define search parameters
search = {"scientificName": names}

# Run collection
bplusplus.collect(
    group_by_key=bplusplus.Group.scientificName,
    search_parameters=search,
    images_per_group=200,  # Recommended to download more than needed
    output_directory=GBIF_DATA_DIR,
    num_threads=5
)
```

#### Step 2: Prepare Data
Process the raw images to extract, crop, and resize insects. This step uses a pre-trained model to ensure only high-quality images are used for training.

```python
PREPARED_DATA_DIR = Path("./prepared_data")

bplusplus.prepare(
    input_directory=GBIF_DATA_DIR,
    output_directory=PREPARED_DATA_DIR,
    img_size=640,  # Target image size for training
    conf=0.6,      # Detection confidence threshold (0-1)
    valid=0.1,     # Validation split ratio (0-1), set to 0 for no validation
    blur=None,     # Gaussian blur as fraction of image size (0-1), None = disabled
)
```

**Note:** The `blur` parameter applies Gaussian blur before resizing, which can help reduce noise. Values are relative to image size (e.g., `blur=0.01` means 1% of the smallest dimension). Supported image formats: JPG, JPEG, and PNG.

#### Step 3: Train Model
Train the hierarchical classification model on your prepared data. The model learns to identify family, genus, and species.

```python
TRAINED_MODEL_DIR = Path("./trained_model")

bplusplus.train(
    batch_size=4,
    epochs=30,
    patience=3,
    img_size=640,
    data_dir=PREPARED_DATA_DIR,
    output_dir=TRAINED_MODEL_DIR,
    species_list=names,
    backbone="resnet50",  # Choose: "resnet18", "resnet50", or "resnet101"
    # num_workers=0,  # Optional: force single-process loading (most stable)
    # train_transforms=custom_transforms,  # Optional: custom torchvision transforms
)
```

**Note:** The `num_workers` parameter controls DataLoader multiprocessing (defaults to 0 for stability). The `backbone` parameter allows you to choose between different ResNet architectures: use `resnet18` for faster training or `resnet101` for potentially better accuracy.
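
As the commented parameter above suggests, `train_transforms` takes custom torchvision transforms. Below is a minimal sketch of what such a pipeline could look like, assuming `train()` accepts a standard `transforms.Compose` object and does not apply its own tensor conversion on top (neither is confirmed by this README):

```python
from torchvision import transforms

# Illustrative only: a light augmentation pipeline to pass as train_transforms.
# Assumption: train() accepts a torchvision Compose object unchanged.
custom_transforms = transforms.Compose([
    transforms.Resize((640, 640)),                         # keep in sync with img_size
    transforms.RandomHorizontalFlip(p=0.5),                # simple augmentation
    transforms.ColorJitter(brightness=0.2, contrast=0.2),  # mild photometric jitter
    transforms.ToTensor(),                                 # convert PIL image to tensor
])
```
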
#### Step 4: Validate Model
Evaluate the trained model on a held-out validation set. This calculates precision, recall, and F1-score at all taxonomic levels.

```python
HIERARCHICAL_MODEL_PATH = TRAINED_MODEL_DIR / "best_multitask.pt"

results = bplusplus.validate(
    species_list=names,
    validation_dir=PREPARED_DATA_DIR / "valid",
    hierarchical_weights=HIERARCHICAL_MODEL_PATH,
    img_size=640,  # Must match training
    batch_size=32,
    backbone="resnet50",  # Must match training
)
```
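
If you want to keep the metrics from each run, one option is to write the returned `results` object to disk. A minimal sketch, assuming `results` is a plain, serializable structure (its exact layout is not documented here); `default=str` stringifies anything JSON cannot handle:

```python
import json

# Persist the validation output so different training runs can be compared later.
with open(TRAINED_MODEL_DIR / "validation_results.json", "w") as f:
    json.dump(results, f, indent=2, default=str)
```
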
#### Step 5: Run Inference on Video
Process a video file to detect, classify, and track insects using motion-based detection. The pipeline uses background subtraction (GMM) to detect moving insects, tracks them across frames, and classifies confirmed tracks.

**Output files generated in `output_dir`:**
- `{video}_annotated.mp4` - Video showing confirmed tracks with classifications
- `{video}_debug.mp4` - Debug video with motion mask and all detections
- `{video}_results.csv` - Aggregated results per confirmed track
- `{video}_detections.csv` - Frame-by-frame detection data

```python
VIDEO_INPUT_PATH = Path("my_video.mp4")
OUTPUT_DIR = Path("./output")
HIERARCHICAL_MODEL_PATH = TRAINED_MODEL_DIR / "best_multitask.pt"

results = bplusplus.inference(
    species_list=names,
    hierarchical_model_path=HIERARCHICAL_MODEL_PATH,
    video_path=VIDEO_INPUT_PATH,
    output_dir=OUTPUT_DIR,
    fps=None,  # None = process all frames
    backbone="resnet50",  # Must match training
    save_video=True,  # Set to False to skip video rendering (only CSV output)
    img_size=640,  # Must match training
)

print(f"Detected {results['tracks']} tracks ({results['confirmed_tracks']} confirmed)")
```

**Note:** Set `save_video=False` to skip generating the annotated and debug videos, which speeds up processing when you only need the CSV detection data.
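
For downstream analysis, the aggregated per-track CSV can be loaded with pandas (already a dependency). A minimal sketch, assuming `{video}` in the file names above refers to the input file's stem; no particular column layout is assumed, and the columns are only printed for inspection:

```python
import pandas as pd

# Load the aggregated results written by inference() and inspect their structure.
results_csv = OUTPUT_DIR / f"{VIDEO_INPUT_PATH.stem}_results.csv"
tracks_df = pd.read_csv(results_csv)
print(tracks_df.columns.tolist())  # see which fields are reported per track
print(tracks_df.head())
```
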
**Custom Detection Configuration:**

For advanced control over detection parameters, provide a YAML config file:

```python
results = bplusplus.inference(
    ...,
    config="detection_config.yaml"
)
```

Download a template config from the [releases page](https://github.com/Tvenver/Bplusplus/releases). Parameters control cohesiveness filtering, shape filtering, tracking behavior, and path topology analysis for confirming insect-like movement.
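
Once downloaded, you can peek at the template's parameter groups with PyYAML (already a dependency) before passing the file path to `inference()`. This sketch assumes only that the template's top level is a mapping; it makes no assumptions about the specific keys:

```python
import yaml

# List the parameter groups exposed by the downloaded template.
with open("detection_config.yaml") as f:
    detection_config = yaml.safe_load(f)

print(list(detection_config.keys()))
```
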
### Customization

To train the model on your own set of insect species, you only need to change the `names` list in **Step 1**. The pipeline will automatically handle the rest.

```python
# To use your own species, change the names in this list
names = [
    "Vespa crabro",
    "Vespula vulgaris",
    "Dolichovespula media",
    # Add your species here
]
```

#### Handling an "Unknown" Class
To train a model that can recognize an "unknown" class for insects that don't belong to your target species, add `"unknown"` to your `species_list`. You must also provide a corresponding `unknown` folder containing images of various other insects in your data directories (e.g., `prepared_data/train/unknown`).

```python
# Example with an unknown class
names_with_unknown = [
    "Vespa crabro",
    "Vespula vulgaris",
    "unknown"
]
```
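
Before training with an `unknown` class, it can help to verify that the extra image folders are actually in place. A minimal sketch, assuming the `train/` and optional `valid/` split layout that `prepare()` produces (see Directory Structure below):

```python
# Check that the "unknown" image folders exist in the prepared data splits.
# valid/ only exists if a validation split was requested in prepare().
for split in ("train", "valid"):
    folder = PREPARED_DATA_DIR / split / "unknown"
    print(folder, "exists" if folder.is_dir() else "MISSING")
```
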
## Directory Structure

The pipeline will create the following directories to store artifacts:

- `GBIF_data/`: Stores the raw images downloaded from GBIF.
- `prepared_data/`: Contains the cleaned, cropped, and resized images ready for training (`train/` and optionally `valid/` subdirectories).
- `trained_model/`: Saves the trained model weights (`best_multitask.pt`).
- `output/`: Inference results including annotated videos and CSV files.

# Citation

All information in this GitHub repository is available under the MIT license, as long as credit is given to the authors.

**Venverloo, T., Duarte, F., B++: Towards Real-Time Monitoring of Insect Species. MIT Senseable City Laboratory, AMS Institute.**

bplusplus-2.0.4.dist-info/RECORD
@@ -0,0 +1,12 @@
bplusplus/__init__.py,sha256=JFkrasmmlnhq4Y6EbFI2IVKE1PIBvQeHpdSIaBwM1oQ,472
bplusplus/collect.py,sha256=OcGq5HmfiLu83uTk1Swa_3-Yf1PJnwRuK_aVErFH9Wk,17355
bplusplus/detector.py,sha256=Jef2aYhMdJfTAHpKI0hmjwa50nQhJJNxHJJ0mNIceMU,11725
bplusplus/inference.py,sha256=mECL3536c6mgqh8sB9jCzXay30i3R9sWfoLU6ePajrg,54762
bplusplus/prepare.py,sha256=b5K383iYzgctLFUYeaTUbyEr2VcmZGjCWnvk4VBKxO8,30450
bplusplus/tracker.py,sha256=JixV1ICGywGhVMTvkq3hrk4MLUUWDh3XJW4VLm4JdO0,11250
bplusplus/train.py,sha256=BsqjZ5OkDrkFzzjeJXLgc1fYXHtFHXFhf3FdBmBTP7M,38386
bplusplus/validation.py,sha256=lEWgcAtuQOunOk92P8FKcaS5nVhxUO5HZ9OR3Zeys2U,24845
bplusplus-2.0.4.dist-info/LICENSE,sha256=rRkeHptDnlmviR0_WWgNT9t696eys_cjfVUU8FEO4k4,1071
bplusplus-2.0.4.dist-info/METADATA,sha256=-HVEyRL8g9xqw1OZd3W2kE-AdmItjKHae7GMsE7OGOk,11402
bplusplus-2.0.4.dist-info/WHEEL,sha256=b4K_helf-jlQoXBBETfwnf4B04YC67LOev0jo4fX5m8,88
bplusplus-2.0.4.dist-info/RECORD,,