bplusplus 1.2.3.tar.gz → 1.2.4.tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Potentially problematic release: this version of bplusplus was flagged as possibly problematic.
- bplusplus-1.2.4/PKG-INFO +207 -0
- bplusplus-1.2.4/README.md +178 -0
- {bplusplus-1.2.3 → bplusplus-1.2.4}/pyproject.toml +9 -11
- bplusplus-1.2.4/src/bplusplus/__init__.py +15 -0
- {bplusplus-1.2.3 → bplusplus-1.2.4}/src/bplusplus/inference.py +48 -10
- {bplusplus-1.2.3 → bplusplus-1.2.4}/src/bplusplus/prepare.py +12 -11
- {bplusplus-1.2.3 → bplusplus-1.2.4}/src/bplusplus/test.py +10 -0
- {bplusplus-1.2.3 → bplusplus-1.2.4}/src/bplusplus/train.py +48 -14
- bplusplus-1.2.3/PKG-INFO +0 -101
- bplusplus-1.2.3/README.md +0 -69
- bplusplus-1.2.3/src/bplusplus/__init__.py +0 -5
- {bplusplus-1.2.3 → bplusplus-1.2.4}/LICENSE +0 -0
- {bplusplus-1.2.3 → bplusplus-1.2.4}/src/bplusplus/collect.py +0 -0
- {bplusplus-1.2.3 → bplusplus-1.2.4}/src/bplusplus/tracker.py +0 -0
bplusplus-1.2.4/PKG-INFO ADDED
@@ -0,0 +1,207 @@
+Metadata-Version: 2.3
+Name: bplusplus
+Version: 1.2.4
+Summary: A simple method to create AI models for biodiversity, with collect and prepare pipeline
+License: MIT
+Author: Titus Venverloo
+Author-email: tvenver@mit.edu
+Requires-Python: >=3.10,<4.0
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Requires-Dist: numpy (==1.26.4)
+Requires-Dist: pandas (==2.1.4)
+Requires-Dist: pillow (==11.3.0)
+Requires-Dist: prettytable (==3.7.0)
+Requires-Dist: pygbif (==0.6.5)
+Requires-Dist: pyyaml (==6.0.1)
+Requires-Dist: requests (==2.25.1)
+Requires-Dist: scikit-learn (==1.7.1)
+Requires-Dist: tabulate (==0.9.0)
+Requires-Dist: tqdm (==4.66.4)
+Requires-Dist: ultralytics (==8.3.173)
+Requires-Dist: validators (==0.33.0)
+Description-Content-Type: text/markdown
+
+# B++ repository
+
+[](https://zenodo.org/badge/latestdoi/765250194)
+[](https://pypi.org/project/bplusplus/)
+[](https://pypi.org/project/bplusplus/)
+[](https://pypi.org/project/bplusplus/)
+[](https://pepy.tech/project/bplusplus)
+[](https://pepy.tech/project/bplusplus)
+[](https://pepy.tech/project/bplusplus)
+
+This project provides a complete, end-to-end pipeline for building a custom insect classification system. The framework is designed to be **domain-agnostic**, allowing you to train a powerful detection and classification model for **any insect species** by simply providing a list of names.
+
+Using the `Bplusplus` library, this pipeline automates the entire machine learning workflow, from data collection to video inference.
+
+## Key Features
+
+- **Automated Data Collection**: Downloads hundreds of images for any species from the GBIF database.
+- **Intelligent Data Preparation**: Uses a pre-trained model to automatically find, crop, and resize insects from raw images, ensuring high-quality training data.
+- **Hierarchical Classification**: Trains a model to identify insects at three taxonomic levels: **family, genus, and species**.
+- **Video Inference & Tracking**: Processes video files to detect, classify, and track individual insects over time, providing aggregated predictions.
+## Pipeline Overview
+
+The process is broken down into six main steps, all detailed in the `full_pipeline.ipynb` notebook:
+
+1. **Collect Data**: Select your target species and fetch raw insect images from the web.
+2. **Prepare Data**: Filter, clean, and prepare images for training.
+3. **Train Model**: Train the hierarchical classification model.
+4. **Download Weights**: Fetch pre-trained weights for the detection model.
+5. **Test Model**: Evaluate the performance of the trained model.
+6. **Run Inference**: Run the full pipeline on a video file for real-world application.
+
+## How to Use
+
+### Prerequisites
+
+- Python 3.10+
+
+### Setup
+
+1. **Create and activate a virtual environment:**
+```bash
+python3 -m venv venv
+source venv/bin/activate
+```
+
+2. **Install the required packages:**
+```bash
+pip install bplusplus
+```
+
+### Running the Pipeline
+
+The pipeline can be run step-by-step using the functions from the `bplusplus` library. While the `full_pipeline.ipynb` notebook provides a complete, executable workflow, the core functions are described below.
+
+#### Step 1: Collect Data
+Download images for your target species from the GBIF database. You'll need to provide a list of scientific names.
+
+```python
+import bplusplus
+from pathlib import Path
+
+# Define species and directories
+names = ["Vespa crabro", "Vespula vulgaris", "Dolichovespula media"]
+GBIF_DATA_DIR = Path("./GBIF_data")
+
+# Define search parameters
+search = {"scientificName": names}
+
+# Run collection
+bplusplus.collect(
+    group_by_key=bplusplus.Group.scientificName,
+    search_parameters=search,
+    images_per_group=200,  # Recommended to download more than needed
+    output_directory=GBIF_DATA_DIR,
+    num_threads=5
+)
+```
+
+#### Step 2: Prepare Data
+Process the raw images to extract, crop, and resize insects. This step uses a pre-trained model to ensure only high-quality images are used for training.
+
+```python
+PREPARED_DATA_DIR = Path("./prepared_data")
+
+bplusplus.prepare(
+    input_directory=GBIF_DATA_DIR,
+    output_directory=PREPARED_DATA_DIR,
+    img_size=640  # Target image size for training
+)
+```
+
+#### Step 3: Train Model
+Train the hierarchical classification model on your prepared data. The model learns to identify family, genus, and species.
+
+```python
+TRAINED_MODEL_DIR = Path("./trained_model")
+
+bplusplus.train(
+    batch_size=4,
+    epochs=30,
+    patience=3,
+    img_size=640,
+    data_dir=PREPARED_DATA_DIR,
+    output_dir=TRAINED_MODEL_DIR,
+    species_list=names
+    # num_workers=0  # Optional: force single-process loading (most stable)
+)
+```
+
+**Note:** The `num_workers` parameter controls DataLoader multiprocessing (defaults to 0 for stability). You can increase it for potentially faster data loading.
+
+#### Step 4: Download Detection Weights
+The inference pipeline uses a separate, pre-trained YOLO model for initial insect detection. You need to download its weights manually.
+
+You can download the weights file from [this link](https://github.com/Tvenver/Bplusplus/releases/download/v1.2.3/v11small-generic.pt).
+
+Place it in the `trained_model` directory and ensure it is named `yolo_weights.pt`.
+
+#### Step 5: Run Inference on Video
+Process a video file to detect, classify, and track insects. The final output is an annotated video and a CSV file with aggregated results for each tracked insect.
+
+```python
+VIDEO_INPUT_PATH = Path("my_video.mp4")
+VIDEO_OUTPUT_PATH = Path("my_video_annotated.mp4")
+HIERARCHICAL_MODEL_PATH = TRAINED_MODEL_DIR / "best_multitask.pt"
+YOLO_WEIGHTS_PATH = TRAINED_MODEL_DIR / "yolo_weights.pt"
+
+bplusplus.inference(
+    species_list=names,
+    yolo_model_path=YOLO_WEIGHTS_PATH,
+    hierarchical_model_path=HIERARCHICAL_MODEL_PATH,
+    confidence_threshold=0.35,
+    video_path=VIDEO_INPUT_PATH,
+    output_path=VIDEO_OUTPUT_PATH,
+    tracker_max_frames=60,
+    fps=15  # Optional: set processing FPS
+)
+```
+
+### Customization
+
+To train the model on your own set of insect species, you only need to change the `names` list in **Step 1**. The pipeline will automatically handle the rest.
+
+```python
+# To use your own species, change the names in this list
+names = [
+    "Vespa crabro",
+    "Vespula vulgaris",
+    "Dolichovespula media",
+    # Add your species here
+]
+```
+
+#### Handling an "Unknown" Class
+To train a model that can recognize an "unknown" class for insects that don't belong to your target species, add `"unknown"` to your `species_list`. You must also provide a corresponding `unknown` folder containing images of various other insects in your data directories (e.g., `prepared_data/train/unknown`).
+
+```python
+# Example with an unknown class
+names_with_unknown = [
+    "Vespa crabro",
+    "Vespula vulgaris",
+    "unknown"
+]
+```
+
+## Directory Structure
+
+The pipeline will create the following directories to store artifacts:
+
+- `GBIF_data/`: Stores the raw images downloaded from GBIF.
+- `prepared_data/`: Contains the cleaned, cropped, and resized images ready for training.
+- `trained_model/`: Saves the trained model weights (`best_multitask.pt`) and pre-trained detection weights.
+
+# Citation
+
+All information in this GitHub is available under MIT license, as long as credit is given to the authors.
+
+**Venverloo, T., Duarte, F., B++: Towards Real-Time Monitoring of Insect Species. MIT Senseable City Laboratory, AMS Institute.**
+
bplusplus-1.2.4/README.md ADDED
@@ -0,0 +1,178 @@
{bplusplus-1.2.3 → bplusplus-1.2.4}/pyproject.toml
@@ -1,27 +1,25 @@
 [tool.poetry]
 name = "bplusplus"
-version = "1.2.3"
+version = "1.2.4"
 description = "A simple method to create AI models for biodiversity, with collect and prepare pipeline"
 authors = ["Titus Venverloo <tvenver@mit.edu>", "Deniz Aydemir <deniz@aydemir.us>", "Orlando Closs <orlandocloss@pm.me>", "Ase Hatveit <aase@mit.edu>"]
 license = "MIT"
 readme = "README.md"

 [tool.poetry.dependencies]
-python = "^3.
+python = "^3.10"
 requests = "2.25.1"
 pandas = "2.1.4"
-ultralytics = "
+ultralytics = "8.3.173"
 pyyaml = "6.0.1"
 tqdm = "4.66.4"
 prettytable = "3.7.0"
-
-
-
-
-
-
-validators = "^0.33.0"
-tabulate = "^0.9.0"
+pillow = "11.3.0"
+numpy = "1.26.4"
+scikit-learn = "1.7.1"
+pygbif = "0.6.5"
+validators = "0.33.0"
+tabulate = "0.9.0"

 [tool.poetry.group.dev.dependencies]
 jupyter = "^1.0.0"
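The dependency table above moves every runtime requirement from a caret range (e.g. `validators = "^0.33.0"`) to an exact pin (`"0.33.0"`). As a rough sketch of the difference (not Poetry's actual resolver), a caret constraint accepts everything up to the next bump of the leftmost non-zero version component, while a pin accepts exactly one version:

```python
def _parse(v: str) -> tuple:
    return tuple(int(p) for p in v.split("."))

def caret_allows(constraint: str, version: str) -> bool:
    """Sketch of caret semantics: ^X.Y.Z allows versions from X.Y.Z up to
    (but not including) the next bump of the leftmost non-zero component."""
    base = _parse(constraint.lstrip("^"))
    for i, part in enumerate(base):
        if part != 0:
            upper = base[:i] + (part + 1,) + (0,) * (len(base) - i - 1)
            break
    else:  # all-zero version: only the last component can bump
        upper = base[:-1] + (base[-1] + 1,)
    return base <= _parse(version) < upper

def pin_allows(constraint: str, version: str) -> bool:
    """An exact pin accepts exactly one version."""
    return _parse(version) == _parse(constraint)

print(caret_allows("^0.33.0", "0.33.9"))  # patch bumps stay inside ^0.33.0
print(caret_allows("^0.33.0", "0.34.0"))  # a minor bump escapes it
```

Pinning trades automatic bug-fix upgrades for reproducible installs, which fits a release that also pins `ultralytics`, `numpy`, and `scikit-learn` exactly.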
bplusplus-1.2.4/src/bplusplus/__init__.py ADDED
@@ -0,0 +1,15 @@
+try:
+    import torch
+    import torchvision
+except ImportError:
+    raise ImportError(
+        "PyTorch and Torchvision are not installed. "
+        "Please install them before using bplusplus by following the instructions "
+        "on the official PyTorch website: https://pytorch.org/get-started/locally/"
+    )
+
+from .collect import Group, collect
+from .prepare import prepare
+from .train import train
+from .test import test
+from .inference import inference
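The new `__init__.py` fails fast with installation guidance when torch/torchvision are missing, rather than letting an opaque `ImportError` surface from deep inside a submodule. A standalone sketch of the same pattern (the `require` helper and its hint text are illustrative, not part of the package):

```python
import importlib

def require(module_name: str, hint: str):
    """Import a module or fail fast with installation guidance, mirroring
    the guard the new __init__.py places around torch/torchvision."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        raise ImportError(f"{module_name} is not installed. {hint}") from None

# A stdlib module imports fine; a missing one produces the guided error.
json_mod = require("json", "It ships with Python.")
try:
    require("definitely_not_installed_xyz",
            "See https://pytorch.org/get-started/locally/")
except ImportError as exc:
    print(exc)  # message carries both the module name and the hint
```

Raising `from None` suppresses the original traceback chain so the user sees only the actionable message.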
{bplusplus-1.2.3 → bplusplus-1.2.4}/src/bplusplus/inference.py
@@ -9,6 +9,8 @@ from datetime import datetime
 from pathlib import Path
 from .tracker import InsectTracker
 import torch
+import torchvision.transforms as T
+from torchvision.models.detection import fasterrcnn_resnet50_fpn
 from ultralytics import YOLO
 from torchvision import transforms
 from PIL import Image
@@ -19,6 +21,16 @@ import logging
 from collections import defaultdict
 import uuid

+# Add this check for backwards compatibility
+if hasattr(torch.serialization, 'add_safe_globals'):
+    torch.serialization.add_safe_globals([
+        'torch.LongTensor',
+        'torch.cuda.LongTensor',
+        'torch.FloatStorage',
+        'torch.FloatStorage',
+        'torch.cuda.FloatStorage',
+    ])
+
 # Set up logging
 logging.basicConfig(level=logging.INFO)
 logger = logging.getLogger(__name__)
@@ -36,12 +48,15 @@ def get_taxonomy(species_list):
     species_to_genus = {}
     genus_to_family = {}

-
+    species_list_for_gbif = [s for s in species_list if s.lower() != 'unknown']
+    has_unknown = len(species_list_for_gbif) != len(species_list)
+
+    logger.info(f"Building taxonomy from GBIF for {len(species_list_for_gbif)} species")

     print(f"\n{'Species':<30} {'Family':<20} {'Genus':<20} {'Status'}")
     print("-" * 80)

-    for species_name in
+    for species_name in species_list_for_gbif:
         url = f"https://api.gbif.org/v1/species/match?name={species_name}&verbose=true"
         try:
             response = requests.get(url)
@@ -72,6 +87,21 @@ def get_taxonomy(species_list):
         except Exception as e:
             print(f"{species_name:<30} {'Error':<20} {'Error':<20} FAILED")
             logger.error(f"Error retrieving data for '{species_name}': {str(e)}")
+
+    if has_unknown:
+        unknown_family = "Unknown"
+        unknown_genus = "Unknown"
+        unknown_species = "unknown"
+
+        if unknown_family not in taxonomy[1]:
+            taxonomy[1].append(unknown_family)
+
+        taxonomy[2][unknown_genus] = unknown_family
+        taxonomy[3][unknown_species] = unknown_genus
+        species_to_genus[unknown_species] = unknown_genus
+        genus_to_family[unknown_genus] = unknown_family
+
+        print(f"{unknown_species:<30} {unknown_family:<20} {unknown_genus:<20} {'OK'}")

     taxonomy[1] = sorted(list(set(taxonomy[1])))
     print("-" * 80)
@@ -85,18 +115,26 @@ def get_taxonomy(species_list):
     logger.info(f"Taxonomy built: {len(taxonomy[1])} families, {len(taxonomy[2])} genera, {len(taxonomy[3])} species")
     return taxonomy, species_to_genus, genus_to_family

-def create_mappings(taxonomy):
+def create_mappings(taxonomy, species_list=None):
     """Create index mappings from taxonomy"""
     level_to_idx = {}
     idx_to_level = {}

     for level, labels in taxonomy.items():
         if isinstance(labels, list):
+            # Level 1: Family (already sorted)
             level_to_idx[level] = {label: idx for idx, label in enumerate(labels)}
             idx_to_level[level] = {idx: label for idx, label in enumerate(labels)}
-        else: # Dictionary
-
-
+        else: # Dictionary for levels 2 and 3
+            if level == 3 and species_list is not None:
+                # For species, the order is determined by species_list
+                sorted_keys = species_list
+            else:
+                # For genus, sort alphabetically
+                sorted_keys = sorted(labels.keys())
+
+            level_to_idx[level] = {label: idx for idx, label in enumerate(sorted_keys)}
+            idx_to_level[level] = {idx: label for idx, label in enumerate(sorted_keys)}

     return level_to_idx, idx_to_level

@@ -321,9 +359,9 @@ class VideoInferenceProcessor:

         # Build taxonomy from species list
         self.taxonomy, self.species_to_genus, self.genus_to_family = get_taxonomy(species_list)
-        self.level_to_idx, self.idx_to_level = create_mappings(self.taxonomy)
-        self.family_list = self.taxonomy[1]
-        self.genus_list = list(self.taxonomy[2].keys())
+        self.level_to_idx, self.idx_to_level = create_mappings(self.taxonomy, species_list)
+        self.family_list = sorted(self.taxonomy[1])
+        self.genus_list = sorted(list(self.taxonomy[2].keys()))

         # Load models
         print(f"Loading YOLO model from {yolo_model_path}")
@@ -863,7 +901,7 @@ def main():
     species_list = [
         "Coccinella septempunctata", "Apis mellifera", "Bombus lapidarius", "Bombus terrestris",
         "Eupeodes corollae", "Episyrphus balteatus", "Aglais urticae", "Vespula vulgaris",
-        "Eristalis tenax"
+        "Eristalis tenax", "unknown"
     ]

     # Paths (replace with your actual paths)
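The `create_mappings` change above makes index order explicit: species indices now follow the caller's `species_list` order, while genera are sorted alphabetically. A self-contained sketch of that logic (the toy `taxonomy` dict below is illustrative; in the package it comes from `get_taxonomy`):

```python
def create_mappings(taxonomy, species_list=None):
    """Sketch of the diff's index-mapping logic: level 1 is an already-sorted
    list; level 3 follows species_list order when one is given; any other
    dict level (genus) is sorted alphabetically."""
    level_to_idx, idx_to_level = {}, {}
    for level, labels in taxonomy.items():
        if isinstance(labels, list):
            keys = labels
        elif level == 3 and species_list is not None:
            keys = species_list
        else:
            keys = sorted(labels.keys())
        level_to_idx[level] = {label: i for i, label in enumerate(keys)}
        idx_to_level[level] = {i: label for i, label in enumerate(keys)}
    return level_to_idx, idx_to_level

# Illustrative taxonomy: 1 = families (sorted), 2 = genus->family, 3 = species->genus
taxonomy = {
    1: ["Apidae", "Vespidae"],
    2: {"Vespula": "Vespidae", "Apis": "Apidae"},
    3: {"Vespula vulgaris": "Vespula", "Apis mellifera": "Apis"},
}
l2i, i2l = create_mappings(taxonomy, ["Vespula vulgaris", "Apis mellifera"])
print(l2i[3])  # species indices follow the species_list order, not sorted order
```

Fixing the ordering this way keeps class indices stable between training (`train.py`) and inference (`VideoInferenceProcessor`), which both now pass `species_list` through.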
{bplusplus-1.2.3 → bplusplus-1.2.4}/src/bplusplus/prepare.py
@@ -174,17 +174,18 @@ def _prepare_model_and_clean_images(temp_dir_path: Path):
         print("  ✓ Model weights already exist")

     # Add all required classes to safe globals
-    serialization
-
-
-
-
-
-
-
-
-
-
+    if hasattr(serialization, 'add_safe_globals'):
+        serialization.add_safe_globals([
+            DetectionModel, Sequential, Conv, Conv2d, BatchNorm2d,
+            SiLU, ReLU, LeakyReLU, MaxPool2d, Linear, Dropout, Upsample,
+            Module, ModuleList, ModuleDict,
+            Bottleneck, C2f, SPPF, Detect, Concat, DFL,
+            # Add torch internal classes
+            torch.nn.parameter.Parameter,
+            torch.Tensor,
+            torch._utils._rebuild_tensor_v2,
+            torch._utils._rebuild_parameter
+        ])

     return weights_path
{bplusplus-1.2.3 → bplusplus-1.2.4}/src/bplusplus/test.py
@@ -74,6 +74,16 @@ def setup_gpu():
         logger.warning("Falling back to CPU")
         return torch.device("cpu")

+# Add this check for backwards compatibility
+if hasattr(torch.serialization, 'add_safe_globals'):
+    torch.serialization.add_safe_globals([
+        'torch.LongTensor',
+        'torch.cuda.LongTensor',
+        'torch.FloatStorage',
+        'torch.FloatStorage',
+        'torch.cuda.FloatStorage',
+    ])
+
 class HierarchicalInsectClassifier(nn.Module):
     def __init__(self, num_classes_per_level):
         """
@@ -14,18 +14,28 @@ import logging
|
|
|
14
14
|
from tqdm import tqdm
|
|
15
15
|
import sys
|
|
16
16
|
|
|
17
|
-
def train(batch_size=4, epochs=30, patience=3, img_size=640, data_dir='input', output_dir='./output', species_list=None):
|
|
17
|
+
def train(batch_size=4, epochs=30, patience=3, img_size=640, data_dir='input', output_dir='./output', species_list=None, num_workers=4):
|
|
18
18
|
"""
|
|
19
19
|
Main function to run the entire training pipeline.
|
|
20
20
|
Sets up datasets, model, training process and handles errors.
|
|
21
|
+
|
|
22
|
+
Args:
|
|
23
|
+
batch_size (int): Number of samples per batch. Default: 4
|
|
24
|
+
epochs (int): Maximum number of training epochs. Default: 30
|
|
25
|
+
patience (int): Early stopping patience (epochs without improvement). Default: 3
|
|
26
|
+
img_size (int): Target image size for training. Default: 640
|
|
27
|
+
data_dir (str): Directory containing train/valid subdirectories. Default: 'input'
|
|
28
|
+
output_dir (str): Directory to save trained model and logs. Default: './output'
|
|
29
|
+
species_list (list): List of species names for training. Required.
|
|
30
|
+
num_workers (int): Number of DataLoader worker processes.
|
|
31
|
+
Set to 0 to disable multiprocessing (most stable). Default: 4
|
|
21
32
|
"""
|
|
22
33
|
global logger, device
|
|
23
34
|
|
|
24
35
|
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
|
|
25
36
|
logger = logging.getLogger(__name__)
|
|
26
37
|
|
|
27
|
-
logger.info(f"Hyperparameters - Batch size: {batch_size}, Epochs: {epochs}, Patience: {patience}, Image size: {img_size}, Data directory: {data_dir}, Output directory: {output_dir}")
|
|
28
|
-
|
|
38
|
+
logger.info(f"Hyperparameters - Batch size: {batch_size}, Epochs: {epochs}, Patience: {patience}, Image size: {img_size}, Data directory: {data_dir}, Output directory: {output_dir}, Num workers: {num_workers}")
|
|
29
39
|
|
|
30
40
|
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
|
|
31
41
|
|
|
@@ -52,7 +62,7 @@ def train(batch_size=4, epochs=30, patience=3, img_size=640, data_dir='input', o
|
|
|
52
62
|
|
|
53
63
|
taxonomy = get_taxonomy(species_list)
|
|
54
64
|
|
|
55
|
-
level_to_idx, parent_child_relationship = create_mappings(taxonomy)
|
|
65
|
+
level_to_idx, parent_child_relationship = create_mappings(taxonomy, species_list)
|
|
56
66
|
|
|
57
67
|
num_classes_per_level = [len(taxonomy[level]) if isinstance(taxonomy[level], list)
|
|
58
68
|
else len(taxonomy[level].keys()) for level in sorted(taxonomy.keys())]
|
|
@@ -75,14 +85,14 @@ def train(batch_size=4, epochs=30, patience=3, img_size=640, data_dir='input', o
|
|
|
75
85
|
train_dataset,
|
|
76
86
|
batch_size=batch_size,
|
|
77
87
|
shuffle=True,
|
|
78
|
-
num_workers=
|
|
88
|
+
num_workers=num_workers
|
|
79
89
|
)
|
|
80
90
|
|
|
81
91
|
val_loader = DataLoader(
|
|
82
92
|
val_dataset,
|
|
83
93
|
batch_size=batch_size,
|
|
84
94
|
shuffle=False,
|
|
85
|
-
num_workers=
|
|
95
|
+
num_workers=num_workers
|
|
86
96
|
)
|
|
87
97
|
|
|
88
98
|
try:
|
|
@@ -150,14 +160,17 @@ def get_taxonomy(species_list):
     species_to_genus = {}
     genus_to_family = {}
 
-
+    species_list_for_gbif = [s for s in species_list if s.lower() != 'unknown']
+    has_unknown = len(species_list_for_gbif) != len(species_list)
+
+    logger.info(f"Building taxonomy from GBIF for {len(species_list_for_gbif)} species")
 
     print("\nTaxonomy Results:")
     print("-" * 80)
     print(f"{'Species':<30} {'Family':<20} {'Genus':<20} {'Status'}")
     print("-" * 80)
 
-    for species_name in
+    for species_name in species_list_for_gbif:
         url = f"https://api.gbif.org/v1/species/match?name={species_name}&verbose=true"
         try:
             response = requests.get(url)
@@ -199,6 +212,19 @@ def get_taxonomy(species_list):
             print(f"{species_name:<30} {'Error':<20} {'Error':<20} FAILED")
             print(f"Error: {error_msg}")
             sys.exit(1)  # Stop the script
+
+    if has_unknown:
+        unknown_family = "Unknown"
+        unknown_genus = "Unknown"
+        unknown_species = "unknown"
+
+        if unknown_family not in taxonomy[1]:
+            taxonomy[1].append(unknown_family)
+
+        taxonomy[2][unknown_genus] = unknown_family
+        taxonomy[3][unknown_species] = unknown_genus
+
+        print(f"{unknown_species:<30} {unknown_family:<20} {unknown_genus:<20} {'OK'}")
 
     taxonomy[1] = sorted(list(set(taxonomy[1])))
     print("-" * 80)
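The added `has_unknown` block can be exercised on its own. A minimal sketch, with a hand-built taxonomy (hypothetical data) standing in for the GBIF results:

```python
# Hand-built taxonomy in the same shape get_taxonomy builds
# (hypothetical data): level 1 families, level 2 genus -> family,
# level 3 species -> genus.
taxonomy = {
    1: ["Apidae"],
    2: {"Apis": "Apidae"},
    3: {"Apis mellifera": "Apis"},
}
has_unknown = True  # set when species_list contained an 'unknown' entry

if has_unknown:
    unknown_family = "Unknown"
    unknown_genus = "Unknown"
    unknown_species = "unknown"

    # Register the catch-all class at every taxonomic level.
    if unknown_family not in taxonomy[1]:
        taxonomy[1].append(unknown_family)

    taxonomy[2][unknown_genus] = unknown_family
    taxonomy[3][unknown_species] = unknown_genus

taxonomy[1] = sorted(set(taxonomy[1]))
print(taxonomy[1])  # ['Apidae', 'Unknown']
```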
@@ -212,7 +238,7 @@ def get_taxonomy(species_list):
         print(f" {i}: {family}")
 
     print("\nGenus indices:")
-    for i, genus in enumerate(taxonomy[2].keys()):
+    for i, genus in enumerate(sorted(taxonomy[2].keys())):
         print(f" {i}: {genus}")
 
     print("\nSpecies indices:")
@@ -244,7 +270,7 @@ def get_species_from_directory(train_dir):
     logger.info(f"Found {len(species_list)} species in {train_dir}")
     return species_list
 
-def create_mappings(taxonomy):
+def create_mappings(taxonomy, species_list=None):
    """
    Creates mapping dictionaries from taxonomy data.
    Returns level-to-index mapping and parent-child relationships between taxonomic levels.
@@ -254,9 +280,17 @@ def create_mappings(taxonomy):
 
     for level, labels in taxonomy.items():
         if isinstance(labels, list):
+            # Level 1: Family (already sorted)
             level_to_idx[level] = {label: idx for idx, label in enumerate(labels)}
-        else:
-
+        else:  # dict for levels 2 and 3
+            if level == 3 and species_list is not None:
+                # For species, the order is determined by species_list
+                level_to_idx[level] = {label: idx for idx, label in enumerate(species_list)}
+            else:
+                # For genus (and as a fallback for species), sort alphabetically
+                sorted_keys = sorted(labels.keys())
+                level_to_idx[level] = {label: idx for idx, label in enumerate(sorted_keys)}
 
         for child, parent in labels.items():
             if (level, parent) not in parent_child_relationship:
                 parent_child_relationship[(level, parent)] = []
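The new ordering rules above (families keep their list order, genera are indexed alphabetically, species follow `species_list` when it is given) can be sketched in isolation. This is a simplified re-statement of just the index assignment, with hypothetical toy data; the parent-child bookkeeping is omitted:

```python
def create_mappings_sketch(taxonomy, species_list=None):
    """Index assignment only: family order from the level-1 list,
    genus order alphabetical, species order pinned to species_list."""
    level_to_idx = {}
    for level, labels in taxonomy.items():
        if isinstance(labels, list):
            # Level 1: family list, already sorted upstream
            level_to_idx[level] = {label: i for i, label in enumerate(labels)}
        elif level == 3 and species_list is not None:
            # Species indices follow the caller-supplied species_list
            level_to_idx[level] = {label: i for i, label in enumerate(species_list)}
        else:
            # Genus (and species fallback): alphabetical
            level_to_idx[level] = {label: i for i, label in enumerate(sorted(labels))}
    return level_to_idx

taxonomy = {
    1: ["Apidae", "Syrphidae"],
    2: {"Eristalis": "Syrphidae", "Apis": "Apidae"},
    3: {"Eristalis tenax": "Eristalis", "Apis mellifera": "Apis"},
}
idx = create_mappings_sketch(taxonomy, ["Eristalis tenax", "Apis mellifera"])
print(idx[2])  # {'Apis': 0, 'Eristalis': 1}
print(idx[3])  # {'Eristalis tenax': 0, 'Apis mellifera': 1}
```

Pinning species indices to `species_list` is what keeps class indices stable between training and inference regardless of dict iteration order.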
@@ -670,7 +704,7 @@ if __name__ == '__main__':
     species_list = [
         "Coccinella septempunctata", "Apis mellifera", "Bombus lapidarius", "Bombus terrestris",
         "Eupeodes corollae", "Episyrphus balteatus", "Aglais urticae", "Vespula vulgaris",
-        "Eristalis tenax"
+        "Eristalis tenax", "unknown"
     ]
-
+    train(species_list=species_list, epochs=2)
 
bplusplus-1.2.3/PKG-INFO DELETED
@@ -1,101 +0,0 @@
-Metadata-Version: 2.3
-Name: bplusplus
-Version: 1.2.3
-Summary: A simple method to create AI models for biodiversity, with collect and prepare pipeline
-License: MIT
-Author: Titus Venverloo
-Author-email: tvenver@mit.edu
-Requires-Python: >=3.9.0,<4.0.0
-Classifier: License :: OSI Approved :: MIT License
-Classifier: Programming Language :: Python :: 3
-Classifier: Programming Language :: Python :: 3.9
-Classifier: Programming Language :: Python :: 3.10
-Classifier: Programming Language :: Python :: 3.11
-Classifier: Programming Language :: Python :: 3.12
-Classifier: Programming Language :: Python :: 3.13
-Requires-Dist: numpy
-Requires-Dist: pandas (==2.1.4)
-Requires-Dist: pillow
-Requires-Dist: prettytable (==3.7.0)
-Requires-Dist: pygbif (>=0.6.4,<0.7.0)
-Requires-Dist: pyyaml (==6.0.1)
-Requires-Dist: requests (==2.25.1)
-Requires-Dist: scikit-learn
-Requires-Dist: tabulate (>=0.9.0,<0.10.0)
-Requires-Dist: torch (>=2.5.0,<3.0.0)
-Requires-Dist: torchvision
-Requires-Dist: tqdm (==4.66.4)
-Requires-Dist: ultralytics (>=8.3.0)
-Requires-Dist: validators (>=0.33.0,<0.34.0)
-Description-Content-Type: text/markdown
-
-# Domain-Agnostic Insect Classification Pipeline
-
-This project provides a complete, end-to-end pipeline for building a custom insect classification system. The framework is designed to be **domain-agnostic**, allowing you to train a powerful detection and classification model for **any insect species** by simply providing a list of names.
-
-Using the `Bplusplus` library, this pipeline automates the entire machine learning workflow, from data collection to video inference.
-
-## Key Features
-
-- **Automated Data Collection**: Downloads hundreds of images for any species from the GBIF database.
-- **Intelligent Data Preparation**: Uses a pre-trained model to automatically find, crop, and resize insects from raw images, ensuring high-quality training data.
-- **Hierarchical Classification**: Trains a model to identify insects at three taxonomic levels: **family, genus, and species**.
-- **Video Inference & Tracking**: Processes video files to detect, classify, and track individual insects over time, providing aggregated predictions.
-## Pipeline Overview
-
-The process is broken down into six main steps, all detailed in the `full_pipeline.ipynb` notebook:
-
-1. **Collect Data**: Select your target species and fetch raw insect images from the web.
-2. **Prepare Data**: Filter, clean, and prepare images for training.
-3. **Train Model**: Train the hierarchical classification model.
-4. **Download Weights**: Fetch pre-trained weights for the detection model.
-5. **Test Model**: Evaluate the performance of the trained model.
-6. **Run Inference**: Run the full pipeline on a video file for real-world application.
-
-## How to Use
-
-### Prerequisites
-
-- Python 3.8+
-- `venv` for creating a virtual environment (recommended)
-
-### Setup
-
-1. **Create and activate a virtual environment:**
-   ```bash
-   python3 -m venv venv
-   source venv/bin/activate
-   ```
-
-2. **Install the required packages:**
-   ```bash
-   pip install bplusplus
-   ```
-
-### Running the Pipeline
-
-The entire workflow is contained within **`full_pipeline.ipynb`**. Open it with a Jupyter Notebook or JupyterLab environment and run the cells sequentially to execute the full pipeline.
-
-### Customization
-
-To train the model on different insect species, simply modify the `names` list in **Step 1** of the notebook:
-
-```python
-# a/full_pipeline.ipynb
-
-# To use your own species, change the names in this list
-names = [
-    "Vespa crabro", "Vespula vulgaris", "Dolichovespula media"
-]
-```
-
-The pipeline will automatically handle the rest, from data collection to training, for your new set of species.
-
-## Directory Structure
-
-The pipeline will create the following directories to store artifacts:
-
-- `GBIF_data/`: Stores the raw images downloaded from GBIF.
-- `prepared_data/`: Contains the cleaned, cropped, and resized images ready for training.
-- `trained_model/`: Saves the trained model weights (`best_multitask.pt`) and pre-trained detection weights.
-
bplusplus-1.2.3/README.md DELETED
@@ -1,69 +0,0 @@
-# Domain-Agnostic Insect Classification Pipeline
-
-This project provides a complete, end-to-end pipeline for building a custom insect classification system. The framework is designed to be **domain-agnostic**, allowing you to train a powerful detection and classification model for **any insect species** by simply providing a list of names.
-
-Using the `Bplusplus` library, this pipeline automates the entire machine learning workflow, from data collection to video inference.
-
-## Key Features
-
-- **Automated Data Collection**: Downloads hundreds of images for any species from the GBIF database.
-- **Intelligent Data Preparation**: Uses a pre-trained model to automatically find, crop, and resize insects from raw images, ensuring high-quality training data.
-- **Hierarchical Classification**: Trains a model to identify insects at three taxonomic levels: **family, genus, and species**.
-- **Video Inference & Tracking**: Processes video files to detect, classify, and track individual insects over time, providing aggregated predictions.
-## Pipeline Overview
-
-The process is broken down into six main steps, all detailed in the `full_pipeline.ipynb` notebook:
-
-1. **Collect Data**: Select your target species and fetch raw insect images from the web.
-2. **Prepare Data**: Filter, clean, and prepare images for training.
-3. **Train Model**: Train the hierarchical classification model.
-4. **Download Weights**: Fetch pre-trained weights for the detection model.
-5. **Test Model**: Evaluate the performance of the trained model.
-6. **Run Inference**: Run the full pipeline on a video file for real-world application.
-
-## How to Use
-
-### Prerequisites
-
-- Python 3.8+
-- `venv` for creating a virtual environment (recommended)
-
-### Setup
-
-1. **Create and activate a virtual environment:**
-   ```bash
-   python3 -m venv venv
-   source venv/bin/activate
-   ```
-
-2. **Install the required packages:**
-   ```bash
-   pip install bplusplus
-   ```
-
-### Running the Pipeline
-
-The entire workflow is contained within **`full_pipeline.ipynb`**. Open it with a Jupyter Notebook or JupyterLab environment and run the cells sequentially to execute the full pipeline.
-
-### Customization
-
-To train the model on different insect species, simply modify the `names` list in **Step 1** of the notebook:
-
-```python
-# a/full_pipeline.ipynb
-
-# To use your own species, change the names in this list
-names = [
-    "Vespa crabro", "Vespula vulgaris", "Dolichovespula media"
-]
-```
-
-The pipeline will automatically handle the rest, from data collection to training, for your new set of species.
-
-## Directory Structure
-
-The pipeline will create the following directories to store artifacts:
-
-- `GBIF_data/`: Stores the raw images downloaded from GBIF.
-- `prepared_data/`: Contains the cleaned, cropped, and resized images ready for training.
-- `trained_model/`: Saves the trained model weights (`best_multitask.pt`) and pre-trained detection weights.
{bplusplus-1.2.3 → bplusplus-1.2.4}/LICENSE: File without changes
{bplusplus-1.2.3 → bplusplus-1.2.4}/src/bplusplus/collect.py: File without changes
{bplusplus-1.2.3 → bplusplus-1.2.4}/src/bplusplus/tracker.py: File without changes