bplusplus 1.2.2__py3-none-any.whl → 1.2.4__py3-none-any.whl
Potentially problematic release: this version of bplusplus might be problematic.
- bplusplus/__init__.py +13 -5
- bplusplus/inference.py +929 -0
- bplusplus/prepare.py +416 -648
- bplusplus/{hierarchical/test.py → test.py} +32 -9
- bplusplus/tracker.py +261 -0
- bplusplus/{hierarchical/train.py → train.py} +48 -14
- bplusplus-1.2.4.dist-info/METADATA +207 -0
- bplusplus-1.2.4.dist-info/RECORD +11 -0
- {bplusplus-1.2.2.dist-info → bplusplus-1.2.4.dist-info}/WHEEL +1 -1
- bplusplus/resnet/test.py +0 -473
- bplusplus/resnet/train.py +0 -329
- bplusplus/train_validate.py +0 -11
- bplusplus-1.2.2.dist-info/METADATA +0 -260
- bplusplus-1.2.2.dist-info/RECORD +0 -12
- {bplusplus-1.2.2.dist-info → bplusplus-1.2.4.dist-info}/LICENSE +0 -0
@@ -0,0 +1,207 @@
+Metadata-Version: 2.3
+Name: bplusplus
+Version: 1.2.4
+Summary: A simple method to create AI models for biodiversity, with collect and prepare pipeline
+License: MIT
+Author: Titus Venverloo
+Author-email: tvenver@mit.edu
+Requires-Python: >=3.10,<4.0
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Requires-Dist: numpy (==1.26.4)
+Requires-Dist: pandas (==2.1.4)
+Requires-Dist: pillow (==11.3.0)
+Requires-Dist: prettytable (==3.7.0)
+Requires-Dist: pygbif (==0.6.5)
+Requires-Dist: pyyaml (==6.0.1)
+Requires-Dist: requests (==2.25.1)
+Requires-Dist: scikit-learn (==1.7.1)
+Requires-Dist: tabulate (==0.9.0)
+Requires-Dist: tqdm (==4.66.4)
+Requires-Dist: ultralytics (==8.3.173)
+Requires-Dist: validators (==0.33.0)
+Description-Content-Type: text/markdown
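The header block above uses RFC 822-style fields, so the pinned dependencies can be read with Python's standard `email` parser. The following is a minimal sketch, assuming the metadata text is already available as a string. `SAMPLE` is a shortened excerpt of the headers above, and `pinned_requirements` is an illustrative helper name, not part of bplusplus.

```python
from email import message_from_string

# Shortened excerpt of the METADATA headers shown in this diff.
SAMPLE = """\
Metadata-Version: 2.3
Name: bplusplus
Version: 1.2.4
Requires-Python: >=3.10,<4.0
Requires-Dist: numpy (==1.26.4)
Requires-Dist: ultralytics (==8.3.173)
"""


def pinned_requirements(metadata_text: str) -> dict[str, str]:
    """Map each exactly pinned Requires-Dist name to its version."""
    msg = message_from_string(metadata_text)
    pins = {}
    for entry in msg.get_all("Requires-Dist") or []:
        # Entries look like "numpy (==1.26.4)": name, then a specifier.
        name, _, spec = entry.partition(" ")
        spec = spec.strip("() ")
        if spec.startswith("=="):
            pins[name] = spec[2:]
    return pins


pins = pinned_requirements(SAMPLE)
print(pins)
```

Every `Requires-Dist` entry in this release is an exact `==` pin, so a listing like this is a quick way to spot versions that may conflict with packages already present in an environment.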
+
+# B++ repository
+
+[](https://zenodo.org/badge/latestdoi/765250194)
+[](https://pypi.org/project/bplusplus/)
+[](https://pypi.org/project/bplusplus/)
+[](https://pypi.org/project/bplusplus/)
+[](https://pepy.tech/project/bplusplus)
+[](https://pepy.tech/project/bplusplus)
+[](https://pepy.tech/project/bplusplus)
+
+This project provides a complete, end-to-end pipeline for building a custom insect classification system. The framework is designed to be **domain-agnostic**, allowing you to train a powerful detection and classification model for **any insect species** by simply providing a list of names.
+
+Using the `Bplusplus` library, this pipeline automates the entire machine learning workflow, from data collection to video inference.
+
+## Key Features
+
+- **Automated Data Collection**: Downloads hundreds of images for any species from the GBIF database.
+- **Intelligent Data Preparation**: Uses a pre-trained model to automatically find, crop, and resize insects from raw images, ensuring high-quality training data.
+- **Hierarchical Classification**: Trains a model to identify insects at three taxonomic levels: **family, genus, and species**.
+- **Video Inference & Tracking**: Processes video files to detect, classify, and track individual insects over time, providing aggregated predictions.
+## Pipeline Overview
+
+The process is broken down into six main steps, all detailed in the `full_pipeline.ipynb` notebook:
+
+1. **Collect Data**: Select your target species and fetch raw insect images from the web.
+2. **Prepare Data**: Filter, clean, and prepare images for training.
+3. **Train Model**: Train the hierarchical classification model.
+4. **Download Weights**: Fetch pre-trained weights for the detection model.
+5. **Test Model**: Evaluate the performance of the trained model.
+6. **Run Inference**: Run the full pipeline on a video file for real-world application.
+
+## How to Use
+
+### Prerequisites
+
+- Python 3.10+
+
+### Setup
+
+1. **Create and activate a virtual environment:**
+```bash
+python3 -m venv venv
+source venv/bin/activate
+```
+
+2. **Install the required packages:**
+```bash
+pip install bplusplus
+```
+
+### Running the Pipeline
+
+The pipeline can be run step-by-step using the functions from the `bplusplus` library. While the `full_pipeline.ipynb` notebook provides a complete, executable workflow, the core functions are described below.
+
+#### Step 1: Collect Data
+Download images for your target species from the GBIF database. You'll need to provide a list of scientific names.
+
+```python
+import bplusplus
+from pathlib import Path
+
+# Define species and directories
+names = ["Vespa crabro", "Vespula vulgaris", "Dolichovespula media"]
+GBIF_DATA_DIR = Path("./GBIF_data")
+
+# Define search parameters
+search = {"scientificName": names}
+
+# Run collection
+bplusplus.collect(
+    group_by_key=bplusplus.Group.scientificName,
+    search_parameters=search,
+    images_per_group=200, # Recommended to download more than needed
+    output_directory=GBIF_DATA_DIR,
+    num_threads=5
+)
+```
+
+#### Step 2: Prepare Data
+Process the raw images to extract, crop, and resize insects. This step uses a pre-trained model to ensure only high-quality images are used for training.
+
+```python
+PREPARED_DATA_DIR = Path("./prepared_data")
+
+bplusplus.prepare(
+    input_directory=GBIF_DATA_DIR,
+    output_directory=PREPARED_DATA_DIR,
+    img_size=640 # Target image size for training
+)
+```
+
+#### Step 3: Train Model
+Train the hierarchical classification model on your prepared data. The model learns to identify family, genus, and species.
+
+```python
+TRAINED_MODEL_DIR = Path("./trained_model")
+
+bplusplus.train(
+    batch_size=4,
+    epochs=30,
+    patience=3,
+    img_size=640,
+    data_dir=PREPARED_DATA_DIR,
+    output_dir=TRAINED_MODEL_DIR,
+    species_list=names
+    # num_workers=0 # Optional: force single-process loading (most stable)
+)
+```
+
+**Note:** The `num_workers` parameter controls DataLoader multiprocessing (defaults to 0 for stability). You can increase it for potentially faster data loading.
+
+#### Step 4: Download Detection Weights
+The inference pipeline uses a separate, pre-trained YOLO model for initial insect detection. You need to download its weights manually.
+
+You can download the weights file from [this link](https://github.com/Tvenver/Bplusplus/releases/download/v1.2.3/v11small-generic.pt).
+
+Place it in the `trained_model` directory and ensure it is named `yolo_weights.pt`.
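The manual download described in Step 4 could also be scripted. The following is a minimal sketch using only the standard library. The URL and the `yolo_weights.pt` file name come from the README text above; `weights_destination` and `download_weights` are hypothetical helper names, not functions provided by bplusplus.

```python
from pathlib import Path
from urllib.request import urlretrieve

# URL taken from the README text quoted in this diff.
WEIGHTS_URL = (
    "https://github.com/Tvenver/Bplusplus/releases/download/"
    "v1.2.3/v11small-generic.pt"
)


def weights_destination(model_dir) -> Path:
    """Return the path the README expects for the detection weights."""
    return Path(model_dir) / "yolo_weights.pt"


def download_weights(model_dir, url: str = WEIGHTS_URL) -> Path:
    """Download the weights file unless it is already present."""
    dest = weights_destination(model_dir)
    dest.parent.mkdir(parents=True, exist_ok=True)
    if not dest.exists():
        urlretrieve(url, dest)  # network access happens only here
    return dest


# Example call, left commented out because it needs network access:
# download_weights(Path("./trained_model"))
```

Skipping the download when the file already exists keeps repeated notebook runs from fetching the weights again.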
+
+#### Step 5: Run Inference on Video
+Process a video file to detect, classify, and track insects. The final output is an annotated video and a CSV file with aggregated results for each tracked insect.
+
+```python
+VIDEO_INPUT_PATH = Path("my_video.mp4")
+VIDEO_OUTPUT_PATH = Path("my_video_annotated.mp4")
+HIERARCHICAL_MODEL_PATH = TRAINED_MODEL_DIR / "best_multitask.pt"
+YOLO_WEIGHTS_PATH = TRAINED_MODEL_DIR / "yolo_weights.pt"
+
+bplusplus.inference(
+    species_list=names,
+    yolo_model_path=YOLO_WEIGHTS_PATH,
+    hierarchical_model_path=HIERARCHICAL_MODEL_PATH,
+    confidence_threshold=0.35,
+    video_path=VIDEO_INPUT_PATH,
+    output_path=VIDEO_OUTPUT_PATH,
+    tracker_max_frames=60,
+    fps=15 # Optional: set processing FPS
+)
+```
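Step 5 mentions aggregated results for each tracked insect. How bplusplus aggregates internally is not shown in this part of the diff, so the following is only an illustration of one common approach, a majority vote over per-frame predictions. The input row layout `(track_id, label, confidence)` and the function name `aggregate_tracks` are assumptions made for this sketch.

```python
from collections import Counter, defaultdict


def aggregate_tracks(frame_predictions):
    """Collapse per-frame (track_id, label, confidence) rows to one row per track."""
    by_track = defaultdict(list)
    for track_id, label, confidence in frame_predictions:
        by_track[track_id].append((label, confidence))

    summary = {}
    for track_id, rows in by_track.items():
        # Majority vote: the label seen in the most frames wins.
        votes = Counter(label for label, _ in rows)
        top_label, count = votes.most_common(1)[0]
        # Average confidence over the frames that voted for the winner.
        mean_conf = sum(c for l, c in rows if l == top_label) / count
        summary[track_id] = (top_label, count, round(mean_conf, 3))
    return summary


rows = [
    (1, "Vespa crabro", 0.80),
    (1, "Vespa crabro", 0.60),
    (1, "Vespula vulgaris", 0.40),
    (2, "Dolichovespula media", 0.90),
]
result = aggregate_tracks(rows)
print(result)
```

Voting across frames tends to smooth out single-frame misclassifications, which is one reason per-track aggregation is often preferred over reporting every frame.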
+
+### Customization
+
+To train the model on your own set of insect species, you only need to change the `names` list in **Step 1**. The pipeline will automatically handle the rest.
+
+```python
+# To use your own species, change the names in this list
+names = [
+    "Vespa crabro",
+    "Vespula vulgaris",
+    "Dolichovespula media",
+    # Add your species here
+]
+```
+
+#### Handling an "Unknown" Class
+To train a model that can recognize an "unknown" class for insects that don't belong to your target species, add `"unknown"` to your `species_list`. You must also provide a corresponding `unknown` folder containing images of various other insects in your data directories (e.g., `prepared_data/train/unknown`).
+
+```python
+# Example with an unknown class
+names_with_unknown = [
+    "Vespa crabro",
+    "Vespula vulgaris",
+    "unknown"
+]
+```
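Before training with an `unknown` class, it can help to confirm that every entry in the species list has a matching folder. The following is a minimal sketch, assuming class folders are named exactly like the list entries and sit under a `train` subdirectory. The README only shows `prepared_data/train/unknown` as an example, so the layout for the other classes is an assumption, and `missing_class_dirs` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def missing_class_dirs(data_dir, species_list, split="train"):
    """Return the class names that have no folder under data_dir/split."""
    split_dir = Path(data_dir) / split
    return [name for name in species_list if not (split_dir / name).is_dir()]


# Demonstration on a throwaway directory that only contains train/unknown.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "train" / "unknown").mkdir(parents=True)
    missing = missing_class_dirs(tmp, ["Vespa crabro", "unknown"])
    print(missing)  # prints ['Vespa crabro']
```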
+
+## Directory Structure
+
+The pipeline will create the following directories to store artifacts:
+
+- `GBIF_data/`: Stores the raw images downloaded from GBIF.
+- `prepared_data/`: Contains the cleaned, cropped, and resized images ready for training.
+- `trained_model/`: Saves the trained model weights (`best_multitask.pt`) and pre-trained detection weights.
+
+# Citation
+
+All information in this GitHub is available under MIT license, as long as credit is given to the authors.
+
+**Venverloo, T., Duarte, F., B++: Towards Real-Time Monitoring of Insect Species. MIT Senseable City Laboratory, AMS Institute.**
+
@@ -0,0 +1,11 @@
+bplusplus/__init__.py,sha256=Sr1vk4ocDfV19fC0P4Yds2lXUIHF3XtUHbOKWU4B3Ic,462
+bplusplus/collect.py,sha256=lEJHXPpOo4DALBw6zemdmFuqAXZ12-BKwgesvq5ACYs,7135
+bplusplus/inference.py,sha256=K1_IZqIzpv9yKOfPKCIEe0usDkDgUEE7n-n0d8MgMHk,39259
+bplusplus/prepare.py,sha256=dq7kA5tKyhwRGpE8XFXZse1D5oGhA-O-5GqJnZSw0dA,25924
+bplusplus/test.py,sha256=Ptt_yvDpxBoY_U4AB_aJLmeT9pHkccBDaEZjULIgeYk,30850
+bplusplus/tracker.py,sha256=JixV1ICGywGhVMTvkq3hrk4MLUUWDh3XJW4VLm4JdO0,11250
+bplusplus/train.py,sha256=MCf_hYBfcqMm_mA34pbno4uasjUBFrBN0QUxvQeXSlg,29946
+bplusplus-1.2.4.dist-info/LICENSE,sha256=rRkeHptDnlmviR0_WWgNT9t696eys_cjfVUU8FEO4k4,1071
+bplusplus-1.2.4.dist-info/METADATA,sha256=fIX3m5RXNgwD2VO6wKRM_OtZFUl3YUES-fsXmID-MSk,8037
+bplusplus-1.2.4.dist-info/WHEEL,sha256=b4K_helf-jlQoXBBETfwnf4B04YC67LOev0jo4fX5m8,88
+bplusplus-1.2.4.dist-info/RECORD,,
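Each RECORD row above has the form `path,algorithm=digest,size`, where the digest is the URL-safe base64 encoding of the raw hash with trailing `=` padding removed. The following is a minimal sketch for checking a file's bytes against such a row. The helper names `record_digest`, `parse_record_line`, and `matches` are illustrative, and the file bytes are assumed to be read by the caller, for example from an unpacked wheel.

```python
import base64
import csv
import hashlib


def record_digest(data: bytes) -> str:
    """SHA-256 digest in the unpadded URL-safe base64 form used by RECORD."""
    digest = hashlib.sha256(data).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def parse_record_line(line: str):
    """Split a RECORD row into (path, algorithm, digest, size)."""
    path, hash_field, size = next(csv.reader([line]))
    algorithm, _, digest = hash_field.partition("=")
    # The RECORD file's own row has empty hash and size fields.
    return path, algorithm, digest, int(size) if size else None


def matches(line: str, data: bytes) -> bool:
    """Check file bytes against the hash and size recorded in a RECORD row."""
    _, algorithm, digest, size = parse_record_line(line)
    return (
        algorithm == "sha256"
        and digest == record_digest(data)
        and size == len(data)
    )


row = "bplusplus/tracker.py,sha256=JixV1ICGywGhVMTvkq3hrk4MLUUWDh3XJW4VLm4JdO0,11250"
print(parse_record_line(row))
```

Calling `matches(row, Path("bplusplus/tracker.py").read_bytes())` on an unpacked copy of the wheel would then confirm whether the installed file corresponds to the listed hash and size.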