pyseter-0.1.0.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
pyseter-0.1.0/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2025 Philip Patton
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
pyseter-0.1.0/PKG-INFO ADDED
@@ -0,0 +1,144 @@
+ Metadata-Version: 2.4
+ Name: pyseter
+ Version: 0.1.0
+ Summary: Sort images by an automatically generated ID before photo-ID
+ Author-email: "Philip T. Patton" <philtpatton@gmail.com>
+ License: MIT
+ Project-URL: Homepage, https://github.com/philpatton/pyseter
+ Project-URL: Repository, https://github.com/philpatton/pyseter
+ Project-URL: Bug Tracker, https://github.com/philpatton/pyseter/issues
+ Keywords: individual identification,pytorch,marine mammal
+ Classifier: Development Status :: 3 - Alpha
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.8
+ Classifier: Programming Language :: Python :: 3.9
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Requires-Python: >=3.8
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: numpy>=1.21.0
+ Requires-Dist: pandas>=1.3.0
+ Requires-Dist: scikit-learn>=1.1.0
+ Requires-Dist: networkx>=2.6.0
+ Requires-Dist: matplotlib>=3.5.0
+ Requires-Dist: tqdm>=4.60.0
+ Requires-Dist: timm>=0.6.0
+ Dynamic: license-file
+
+ # Pyseter
+
+ A Python package that sorts images by an automatically generated ID before photo-identification.
+
+ ## Installation
+
+ ### New to Python?
+
+ While most biologists use R, we chose to release Pyseter as a Python package because it relies heavily on PyTorch, a deep learning library. If you're new to Python, please follow these steps to get started with Python and conda.
+
+ #### Step 1: Install conda
+
+ Conda is an important tool for managing packages in Python. Unlike Python, R (for the most part) handles packages for you behind the scenes. Python requires a more hands-on approach.
+
+ - Download and install [Miniforge](https://conda-forge.org/download/) (a form of conda)
+
+ After installing, you can verify your installation by opening the **command line interface** (CLI), which depends on your operating system. On Windows, open the "Miniforge Prompt" from your Start menu; on a Mac, open the Terminal application. Then, type the following command into the CLI and hit return.
+
+ ```bash
+ conda --version
+ ```
+
+ You should see something like `conda 25.5.1`. Of course, Anaconda, miniconda, mamba, or any other form of conda will work too.
+
+ #### Step 2: Create a new environment
+
+ Next, you'll create an environment where the package will live. Environments are walled-off areas where we can install packages. This allows you to have multiple versions of the same package installed on your machine, which can help prevent conflicts.
+
+ Enter the following two commands into the CLI:
+
+ ```bash
+ conda create -n pyseter_env
+ conda activate pyseter_env
+ ```
+
+ Here, I name (hence the `-n`) the environment `pyseter_env`, but you can call it anything you like!
+
+ Now your environment is ready to go! Try installing your first package, pip. Pip is another way of installing Python packages, and will be helpful for installing PyTorch and pyseter (see below). To do so, enter the following command into the CLI.
+
+ ```bash
+ conda install pip -y
+ ```
+
+ #### Step 3: Install PyTorch
+
+ Installing PyTorch will allow you to extract features from images, i.e., identify individuals in images. This will be fast for users with an NVIDIA GPU or a Mac with Apple Silicon and at least 16 GB of memory. **For all other users, extracting features from images will be extremely slow.**
+
+ PyTorch installation can be a little finicky. I recommend following [these instructions](https://pytorch.org/get-started/locally/). Below is an example for Windows users. If you haven't already, activate your environment before installing.
+
+ ```bash
+ conda activate pyseter_env
+ pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
+ ```
+
+ PyTorch is pretty big (over a gigabyte), so this may take a few minutes.
+
+ #### Step 4: Install pyseter
+
+ Now, install pyseter. If you haven't already, activate your environment before installing.
+
+ ```bash
+ conda activate pyseter_env
+ pip3 install pyseter
+ ```
+
+ Now you're ready to go! You can verify your pyseter installation by opening Python in the CLI (assuming your environment is still activated).
+
+ ```bash
+ python
+ ```
+
+ Then, run the following Python commands.
+
+ ```python
+ import pyseter
+ pyseter.verify_pytorch()
+ quit()
+ ```
+
+ If successful, you should see a message like this.
+
+ ```
+ ✓ PyTorch 2.7.0 detected
+ ✓ Apple Silicon (MPS) GPU available
+ ```
+
+ #### Step 5: AnyDorsal weights
+
+ Pyseter relies on the [AnyDorsal algorithm](https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.14167) to extract features from images. Please download the weights and place them anywhere you like. You'll reference the file location later when using the `FeatureExtractor`.
+
+ ## Jupyter
+
+ There are several different ways to interact with Python. The most common way for data analysts is through a *Jupyter Notebook*, which is similar to an RMarkdown document or a Quarto document.
+
+ Just to make things confusing, there are several ways to open Jupyter Notebooks. Personally, I think the easiest way is through [VS Code](https://code.visualstudio.com/download). VS Code is an IDE (like RStudio) for editing code of all languages, and has great support for [Jupyter notebooks](https://code.visualstudio.com/docs/datascience/jupyter-notebooks). Alternatively, [Positron](https://positron.posit.co) is a VS-Code-based editor developed by the RStudio team.
+
+ You can also try [Jupyter Lab](https://docs.jupyter.org/en/latest/). To do so, [install Jupyter](https://jupyter.org/install) via the command line (see below). I also recommend installing the [ipykernel](https://ipython.readthedocs.io/en/stable/install/kernel_install.html#kernels-for-different-environments), which helps you select the right conda environment in Jupyter Lab.
+
+ ```bash
+ conda activate pyseter_env
+ conda install jupyter ipykernel -y
+ python -m ipykernel install --user --name pyseter --display-name "Python (pyseter)"
+ ```
+
+ Note that you only need to activate `pyseter_env` when you open a new command line (i.e., terminal or Miniforge Prompt). Then you can open Jupyter Lab with the following command:
+
+ ```bash
+ jupyter lab
+ ```
+
+ ## Getting Started
+
+ To get started with pyseter, please check out the "General Overview" [notebook](https://github.com/philpatton/pyseter/blob/main/examples/general-overview.ipynb) in the examples folder of this repository!
pyseter-0.1.0/README.md ADDED
@@ -0,0 +1,115 @@
+ # Pyseter
+
+ A Python package that sorts images by an automatically generated ID before photo-identification.
+
+ ## Installation
+
+ ### New to Python?
+
+ While most biologists use R, we chose to release Pyseter as a Python package because it relies heavily on PyTorch, a deep learning library. If you're new to Python, please follow these steps to get started with Python and conda.
+
+ #### Step 1: Install conda
+
+ Conda is an important tool for managing packages in Python. Unlike Python, R (for the most part) handles packages for you behind the scenes. Python requires a more hands-on approach.
+
+ - Download and install [Miniforge](https://conda-forge.org/download/) (a form of conda)
+
+ After installing, you can verify your installation by opening the **command line interface** (CLI), which depends on your operating system. On Windows, open the "Miniforge Prompt" from your Start menu; on a Mac, open the Terminal application. Then, type the following command into the CLI and hit return.
+
+ ```bash
+ conda --version
+ ```
+
+ You should see something like `conda 25.5.1`. Of course, Anaconda, miniconda, mamba, or any other form of conda will work too.
+
+ #### Step 2: Create a new environment
+
+ Next, you'll create an environment where the package will live. Environments are walled-off areas where we can install packages. This allows you to have multiple versions of the same package installed on your machine, which can help prevent conflicts.
+
+ Enter the following two commands into the CLI:
+
+ ```bash
+ conda create -n pyseter_env
+ conda activate pyseter_env
+ ```
+
+ Here, I name (hence the `-n`) the environment `pyseter_env`, but you can call it anything you like!
+
+ Now your environment is ready to go! Try installing your first package, pip. Pip is another way of installing Python packages, and will be helpful for installing PyTorch and pyseter (see below). To do so, enter the following command into the CLI.
+
+ ```bash
+ conda install pip -y
+ ```
+
+ #### Step 3: Install PyTorch
+
+ Installing PyTorch will allow you to extract features from images, i.e., identify individuals in images. This will be fast for users with an NVIDIA GPU or a Mac with Apple Silicon and at least 16 GB of memory. **For all other users, extracting features from images will be extremely slow.**
+
+ PyTorch installation can be a little finicky. I recommend following [these instructions](https://pytorch.org/get-started/locally/). Below is an example for Windows users. If you haven't already, activate your environment before installing.
+
+ ```bash
+ conda activate pyseter_env
+ pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
+ ```
+
+ PyTorch is pretty big (over a gigabyte), so this may take a few minutes.
+
+ #### Step 4: Install pyseter
+
+ Now, install pyseter. If you haven't already, activate your environment before installing.
+
+ ```bash
+ conda activate pyseter_env
+ pip3 install pyseter
+ ```
+
+ Now you're ready to go! You can verify your pyseter installation by opening Python in the CLI (assuming your environment is still activated).
+
+ ```bash
+ python
+ ```
+
+ Then, run the following Python commands.
+
+ ```python
+ import pyseter
+ pyseter.verify_pytorch()
+ quit()
+ ```
+
+ If successful, you should see a message like this.
+
+ ```
+ ✓ PyTorch 2.7.0 detected
+ ✓ Apple Silicon (MPS) GPU available
+ ```
+
+ #### Step 5: AnyDorsal weights
+
+ Pyseter relies on the [AnyDorsal algorithm](https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.14167) to extract features from images. Please download the weights and place them anywhere you like. You'll reference the file location later when using the `FeatureExtractor`.
+
+ ## Jupyter
+
+ There are several different ways to interact with Python. The most common way for data analysts is through a *Jupyter Notebook*, which is similar to an RMarkdown document or a Quarto document.
+
+ Just to make things confusing, there are several ways to open Jupyter Notebooks. Personally, I think the easiest way is through [VS Code](https://code.visualstudio.com/download). VS Code is an IDE (like RStudio) for editing code of all languages, and has great support for [Jupyter notebooks](https://code.visualstudio.com/docs/datascience/jupyter-notebooks). Alternatively, [Positron](https://positron.posit.co) is a VS-Code-based editor developed by the RStudio team.
+
+ You can also try [Jupyter Lab](https://docs.jupyter.org/en/latest/). To do so, [install Jupyter](https://jupyter.org/install) via the command line (see below). I also recommend installing the [ipykernel](https://ipython.readthedocs.io/en/stable/install/kernel_install.html#kernels-for-different-environments), which helps you select the right conda environment in Jupyter Lab.
+
+ ```bash
+ conda activate pyseter_env
+ conda install jupyter ipykernel -y
+ python -m ipykernel install --user --name pyseter --display-name "Python (pyseter)"
+ ```
+
+ Note that you only need to activate `pyseter_env` when you open a new command line (i.e., terminal or Miniforge Prompt). Then you can open Jupyter Lab with the following command:
+
+ ```bash
+ jupyter lab
+ ```
+
+ ## Getting Started
+
+ To get started with pyseter, please check out the "General Overview" [notebook](https://github.com/philpatton/pyseter/blob/main/examples/general-overview.ipynb) in the examples folder of this repository!
pyseter-0.1.0/pyproject.toml ADDED
@@ -0,0 +1,41 @@
+ [build-system]
+ requires = ["setuptools>=61.0", "wheel"]
+ build-backend = "setuptools.build_meta"
+
+ [project]
+ name = "pyseter"
+ version = "0.1.0"
+ description = "Sort images by an automatically generated ID before photo-ID"
+ readme = "README.md"
+ authors = [{name = "Philip T. Patton", email = "philtpatton@gmail.com"}]
+ license = {text = "MIT"}
+ classifiers = [
+     "Development Status :: 3 - Alpha",
+     "License :: OSI Approved :: MIT License",
+     "Programming Language :: Python :: 3",
+     "Programming Language :: Python :: 3.8",
+     "Programming Language :: Python :: 3.9",
+     "Programming Language :: Python :: 3.10",
+     "Programming Language :: Python :: 3.11",
+ ]
+ keywords = ["individual identification", "pytorch", "marine mammal"]
+ requires-python = ">=3.8"
+ dependencies = [
+     "numpy>=1.21.0",
+     "pandas>=1.3.0",
+     "scikit-learn>=1.1.0",
+     "networkx>=2.6.0",
+     "matplotlib>=3.5.0",
+     "tqdm>=4.60.0",
+     "timm>=0.6.0",
+ ]
+
+ [project.urls]
+ Homepage = "https://github.com/philpatton/pyseter"
+ # Documentation = "https://pyseter.readthedocs.io"
+ Repository = "https://github.com/philpatton/pyseter"
+ "Bug Tracker" = "https://github.com/philpatton/pyseter/issues"
+
+ [tool.setuptools.packages.find]
+ where = ["."]
+ include = ["pyseter*"]
pyseter-0.1.0/pyseter/__init__.py ADDED
@@ -0,0 +1,12 @@
+ """
+ Pyseter
+ Sort images by an automatically generated ID before photo-identification.
+ """
+
+ __version__ = "0.1.0"
+
+ # Import main functions/classes for easy access
+ from pyseter.extract import verify_pytorch, get_best_device
+
+ # Define what gets imported with "from pyseter import *"
+ __all__ = ["verify_pytorch", "get_best_device"]
pyseter-0.1.0/pyseter/extract.py ADDED
@@ -0,0 +1,419 @@
+ """Extract features from images with AnyDorsal.
+
+ This module verifies the PyTorch installation, selects the best available
+ torch device, and extracts AnyDorsal feature vectors from a directory of
+ images via the FeatureExtractor class.
+
+ Typical usage example:
+
+     extractor = FeatureExtractor(batch_size=8, model_path='anydorsal.pth')
+     features = extractor.extract('images/')
+ """
+ from typing import Optional, Dict
+ import os
+
+ from sklearn.preprocessing import normalize
+ from torch import nn
+ from torch.amp import autocast  # pyright: ignore[reportPrivateImportUsage]
+ from torch.utils.data import Dataset, DataLoader
+ from torchvision.transforms import v2
+ from torchvision.io import decode_image
+ from tqdm import tqdm
+
+ import numpy as np
+ import pandas as pd
+ import timm
+ import torch
+ import torch.nn.functional as F
+ import torch.utils.checkpoint as cp
+
+ # def main():
+
+ #     # Print the name of the device
+ #     if torch.cuda.is_available():
+ #         print('... with cuda device:', torch.cuda.get_device_name())
+ #         mem = torch.cuda.get_device_properties(0).total_memory
+ #         print('... which has', int(mem / 1024**3), 'GB of memory')
+
+ #     # sometimes I want to up the batch size depending on which GPU I get
+ #     user_input = input('Continue? (y/n): ')
+ #     if user_input.lower() != 'y':
+ #         print('Exiting...')
+ #         return
+
+ #     # Load the configuration file
+ #     with open('config.yaml', 'r') as f:
+ #         config = yaml.load(f, Loader=yaml.SafeLoader)
+
+ #     # create an output directory
+ #     root = config['image_root']
+ #     out_dir = os.path.join(root, 'features')
+ #     os.makedirs(out_dir, exist_ok=True)
+
+ #     # extract the features
+ #     sample_size = config['sample_size']
+ #     for sample in range(sample_size):
+ #         print('Sample:', sample)
+ #         features = extract(config)
+
+ #         # define the output path
+ #         if config['stochastic']:
+ #             out_path = os.path.join(out_dir, f'features_{sample:>03}.npy')
+ #         else:
+ #             out_path = os.path.join(out_dir, 'features.npy')
+
+ #         # save the features
+ #         np.save(out_path, features)
+
+ #     # might as well save the features in a single array
+ #     if config['stochastic']:
+ #         stochastic_features = [f for f in os.listdir(out_dir) if 'sample' in f]
+ #         files = [k for k in features.keys()]
+ #         features_array = load_all_features(stochastic_features, files)
+ #         out_path = os.path.join(out_dir, 'stochastic_feature_array.npy')
+ #         np.save(out_path, features_array)
+
+ def verify_pytorch() -> None:
+     """Verify the PyTorch installation and show device options."""
+     try:
+         import torch
+         print(f"✓ PyTorch {torch.__version__} detected")
+
+         # Check all device options
+         if torch.cuda.is_available():
+             print(f"✓ CUDA GPU available: {torch.cuda.get_device_name(0)}")
+
+         if torch.backends.mps.is_available():
+             print("✓ Apple Silicon (MPS) GPU available")
+
+         if not torch.cuda.is_available() and not torch.backends.mps.is_available():
+             print("! No GPU acceleration available. Expect slow feature extraction.")
+
+     except ImportError:
+         print("✗ PyTorch not found!")
+         print("See homepage for PyTorch installation instructions")
+
+ def get_best_device() -> str:
+     """Select the torch device based on expected performance."""
+     if torch.cuda.is_available():
+         device = "cuda"
+         device_name = torch.cuda.get_device_name(0)
+     elif torch.backends.mps.is_available():
+         device = "mps"
+         device_name = "Apple Silicon GPU"
+     else:
+         device = "cpu"
+         device_name = "CPU"
+
+     print(f"Using device: {device} ({device_name})")
+     return device
+
+ class FeatureExtractor:
+     """Extract AnyDorsal features from a directory of images."""
+
+     def __init__(self, batch_size: int, model_path: str,
+                  device: Optional[str] = None,
+                  stochastic: bool = False,
+                  bbox_csv: Optional[str] = None):
+         self.batch_size = batch_size
+         self.model_path = model_path
+         self.stochastic = stochastic
+         self.bbox_csv = bbox_csv
+         if device is None:
+             self.device = get_best_device()
+         else:
+             self.device = device
+
+     def extract(self, image_dir: str) -> Dict:
+         """Extract a {filename: feature vector} dictionary from a directory."""
+         print('Loading model...')
+         model = self.get_model()
+
+         if not self.stochastic:
+             model.eval()
+
+         test_dataloader = get_test_data(image_dir, self.batch_size, self.bbox_csv)
+
+         print('Extracting features...')
+         features = self.extract_features(test_dataloader, model)
+
+         return features
+
+     def get_model(self):
+         """Build the model from the checkpoint."""
+         # Create the backbone
+         backbone = EfficientNetCustomBackbone(
+             model_name='tf_efficientnet_l2_ns',
+             drop_path_rate=0.2,
+             with_cp=False
+         )
+
+         # Create the complete model
+         model = ImageClassifier(
+             backbone=backbone,
+             in_channels=5504,   # Feature dimension for efficientnet_l2_ns
+             num_classes=15587   # Number of classes
+         )
+
+         # Load the checkpoint
+         print('Loading model from:', self.model_path)
+         checkpoint = torch.load(self.model_path, map_location=self.device)
+
+         # Process state dict
+         state_dict = checkpoint["state_dict"]
+         if 'head.compute_loss.margins' in state_dict:
+             _ = state_dict.pop('head.compute_loss.margins')
+
+         # Adapt the state dict keys if needed: every key saved under the
+         # original 'backbone.timm_model.' prefix maps to 'backbone.base_model.'
+         new_state_dict = {}
+         for k, v in state_dict.items():
+             if k.startswith('backbone.timm_model.'):
+                 new_k = k.replace('backbone.timm_model.', 'backbone.base_model.')
+             else:
+                 new_k = k
+             new_state_dict[new_k] = v
+
+         # Load state dict with some flexibility for missing keys
+         missing_keys, unexpected_keys = model.load_state_dict(new_state_dict, strict=False)
+
+         if missing_keys:
+             print(f"Warning: Missing keys when loading pretrained weights: {missing_keys}")
+         if unexpected_keys:
+             print(f"Warning: Unexpected keys when loading pretrained weights: {unexpected_keys}")
+
+         # # Convert DropPath to inference mode if stochastic is enabled
+         # if stochastic:
+         #     model = convert_droppath_to_inference(model)
+
+         # Move model to device
+         model.to(self.device)
+
+         return model
+
+     def extract_features(self, dataloader, model) -> dict:
+         """Extract features from images using the model."""
+         file_list = []
+         feature_list = []
+
+         with torch.no_grad():
+             for file, image in tqdm(dataloader):
+                 image = image.to(self.device)
+                 with autocast(self.device):  # Big speed up
+                     feature_vector = model(image, return_loss=False)
+
+                 file_list.append(file)
+                 feature_list.append(feature_vector.cpu().numpy())
+
+         # Handle case with only one batch
+         if isinstance(file_list[0], (list, tuple)):
+             files = np.concatenate(file_list)
+         else:
+             files = np.array(file_list)
+
+         feats = np.vstack(feature_list)
+
+         # Create dictionary mapping filenames to features
+         feature_dict = dict(zip(files, feats))
+
+         return feature_dict
+
+ class EfficientNetCustomBackbone(nn.Module):
+     """Custom EfficientNet backbone that mimics the MMCLS implementation."""
+
+     def __init__(self, model_name='tf_efficientnet_l2_ns', drop_path_rate=0.2, with_cp=False):
+         super().__init__()
+         # Create the base model
+         self.base_model = timm.create_model(
+             model_name,
+             pretrained=False,
+             drop_path_rate=drop_path_rate,
+         )
+         self.with_cp = with_cp
+
+         # Remove the classifier and pooling
+         self.base_model.classifier = nn.Identity()
+         self.base_model.global_pool = nn.Identity()
+
+     def forward(self, x):
+         # Replicate the forward pass of the original MMCLS TimmEfficientNet
+         # class: extract features from just before the pooling layer
+
+         # Apply stem
+         x = self.base_model.conv_stem(x)
+         x = self.base_model.bn1(x)
+         # x = self.base_model.act1(x)
+
+         # Process through all blocks
+         for blocks in self.base_model.blocks:
+             for block in blocks:
+                 if self.with_cp and x.requires_grad:  # pyright: ignore[reportOptionalMemberAccess]
+                     x = cp.checkpoint(block, x)
+                 else:
+                     x = block(x)
+
+         # Final convolution and activation
+         x = self.base_model.conv_head(x)
+         x = self.base_model.bn2(x)
+         # x = self.base_model.act2(x)
+
+         return x
+
+ class NormLinear(nn.Linear):
+     """Linear layer with optional feature and weight normalization."""
+
+     def __init__(self, in_features, out_features, bias=False, feature_norm=True,
+                  weight_norm=True):
+         super().__init__(in_features, out_features, bias=bias)
+         self.weight_norm = weight_norm
+         self.feature_norm = feature_norm
+
+     def forward(self, data):
+         if self.feature_norm:
+             data = F.normalize(data)
+         if self.weight_norm:
+             weight = F.normalize(self.weight)
+         else:
+             weight = self.weight
+         return F.linear(data, weight, self.bias)
+
+ class ImageClassifier(nn.Module):
+     """Complete model that mimics the MMCLS classifier structure."""
+
+     def __init__(self, backbone, in_channels=5504, num_classes=15587):
+         super().__init__()
+         self.backbone = backbone
+
+         # Global pooling layer
+         self.neck = nn.AdaptiveAvgPool2d((1, 1))
+
+         # Classification head
+         self.head = NormLinear(
+             in_features=in_channels,
+             out_features=num_classes,
+             bias=False,
+             feature_norm=True,
+             weight_norm=True
+         )
+
+     def forward(self, x, return_loss=True):
+         # Extract features from backbone
+         x = self.backbone(x)
+
+         # Apply global pooling and flatten
+         x = self.neck(x)
+         x = torch.flatten(x, 1)
+
+         # Return features if not computing loss
+         if not return_loss:
+             return F.normalize(x)
+
+         # Compute logits for training (not used in feature extraction)
+         logits = self.head(x)
+         return logits
+
+ def load_bounding_boxes(csv_path):
+     """Load bounding boxes from a CSV file."""
+     df = pd.read_csv(csv_path)
+     bboxes = {}
+     for _, row in df.iterrows():
+         # filename = row['image_name']
+         # bbox_columns = ['bbox_x', 'bbox_y', 'bbox_width', 'bbox_height']
+         # xmin, ymin, w, h = row[bbox_columns].values
+         # xmax, ymax = xmin + w, ymin + h
+         filename = row['filename']
+         xmin, ymin, xmax, ymax = row['xmin'], row['ymin'], row['xmax'], row['ymax']
+         bboxes[filename] = (xmin, ymin, xmax, ymax)
+     return bboxes
+
+ class DorsalImageDataset(Dataset):
+     """Dataset for dorsal fin images with optional bounding boxes."""
+
+     def __init__(self, image_dir, transform=None, bbox_csv=None):
+         self.image_dir = image_dir
+         self.transform = transform
+         self.images = list_images(image_dir)
+         self.bboxes = load_bounding_boxes(bbox_csv) if bbox_csv else None
+
+     def __len__(self):
+         return len(self.images)
+
+     def __getitem__(self, idx):
+         filename = self.images[idx]
+         full_path = os.path.join(self.image_dir, filename)
+         image = decode_image(full_path)
+         if self.bboxes and filename in self.bboxes:
+             bbox = self.bboxes[filename]
+             # Crop image to bounding box
+             image = image[:, bbox[1]:bbox[3], bbox[0]:bbox[2]]
+         if self.transform:
+             image = self.transform(image)
+         return filename, image
+
+ def get_test_data(directory, batch_size, bbox_csv=None):
+     """Get the dataloader from a directory of images."""
+     # Normalization values (reordered below to match the model's input)
+     bgr_mean = np.array([123.675, 116.28, 103.53]) / 255
+     bgr_std = np.array([58.395, 57.12, 57.375]) / 255
+
+     rgb_mean = bgr_mean[[2, 1, 0]]
+     rgb_std = bgr_std[[2, 1, 0]]
+
+     image_size = (768, 768)
+
+     data_transforms = v2.Compose([
+         v2.Resize(image_size),
+         v2.ToDtype(torch.float32, scale=True),
+         v2.Normalize(rgb_mean.tolist(), rgb_std.tolist()),
+     ])
+
+     test_data = DorsalImageDataset(directory, transform=data_transforms,
+                                    bbox_csv=bbox_csv)
+
+     dataloader = DataLoader(test_data, batch_size=batch_size, shuffle=False,
+                             num_workers=2, pin_memory=True)
+
+     return dataloader
+
+ def list_images(image_dir):
+     """List all images in a directory."""
+     images = []
+     formats = ('png', 'jpg', 'jpeg')
+
+     for file in os.listdir(image_dir):
+         if file.lower().endswith(formats):
+             images.append(file)
+
+     return images
+
+ def load_and_process_features(path, image_list, l2=True):
+     """Load features for a single file and process them into array format."""
+     features = np.load(path, allow_pickle=True).item()
+     feature_array = np.array([np.array(features[image]) for image in image_list])
+
+     if l2:
+         feature_array = normalize(feature_array, axis=0)
+
+     return feature_array
+
+ def load_all_features(feature_paths, image_list):
+     """Load and process all features."""
+     feature_list = []
+     print('Loading features...')
+     for path in tqdm(feature_paths):
+         feature = load_and_process_features(path, image_list)
+         feature_list.append(feature)
+     return np.stack(feature_list, axis=1)
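
The checkpoint adaptation in `get_model` reduces to a single prefix substitution on the state-dict keys. A minimal standalone sketch of that remapping, using hypothetical keys and plain dicts so no torch checkpoint is needed:

```python
def remap_checkpoint_keys(state_dict,
                          old_prefix="backbone.timm_model.",
                          new_prefix="backbone.base_model."):
    """Rename keys saved under the original MMCLS module path so they
    match the rebuilt ImageClassifier's module names."""
    return {
        (new_prefix + k[len(old_prefix):]) if k.startswith(old_prefix) else k: v
        for k, v in state_dict.items()
    }

# hypothetical checkpoint entries for illustration
ckpt = {"backbone.timm_model.conv_stem.weight": "w", "head.weight": "h"}
print(remap_checkpoint_keys(ckpt))
# {'backbone.base_model.conv_stem.weight': 'w', 'head.weight': 'h'}
```

Keys outside the backbone (here `head.weight`) pass through unchanged, which is why `load_state_dict(..., strict=False)` only has to tolerate genuinely missing or extra entries.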
@@ -0,0 +1,34 @@
+ '''Grade images by distinctiveness.'''
+
+ from warnings import warn
+
+ from scipy.spatial.distance import cosine
+ from sklearn.preprocessing import normalize
+ import numpy as np
+
+ from pyseter.sort import NetworkCluster
+
+ def rate_distinctiveness(features: np.ndarray, match_threshold: float = 0.6) -> np.ndarray:
+     '''Grade images by their distinctiveness.'''
+     # watch out!
+     warn('Distinctiveness grades are experimental and should be verified.')
+
+     # we use single linkage clustering to find the unrecognizable identity (UI)
+     nc = NetworkCluster(match_threshold=match_threshold)
+     results = nc.cluster_images(features, message=False)
+
+     # we assume that the largest cluster is the UI
+     cluster_ids = np.array(results.cluster_idx)
+     labs, count = np.unique(cluster_ids, return_counts=True)
+     ui_index = labs[np.argmax(count)]
+     print(f'Unrecognizable identity cluster consists of {np.max(count)} images.')
+
+     # average the features for all images in the largest cluster to get the UI embedding
+     ui_feature_array = features[cluster_ids == ui_index]
+     ui_norm = normalize(ui_feature_array)
+     ui_center = ui_norm.mean(axis=0)
+
+     # compute the embedding recognizability score
+     ers = [cosine(f, ui_center) for f in features]
+     return np.array(ers)
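
The embedding recognizability score above is just the cosine distance from each image's feature vector to the centroid of the largest ("unrecognizable") cluster. A NumPy-only sketch of that computation, with toy features standing in for real AnyDorsal embeddings and cluster labels standing in for `NetworkCluster` output:

```python
import numpy as np

def recognizability_scores(features, cluster_ids):
    """Cosine distance from each feature vector to the centroid of the
    largest cluster (assumed to hold the unrecognizable images)."""
    labs, counts = np.unique(cluster_ids, return_counts=True)
    ui_label = labs[np.argmax(counts)]

    # L2-normalize the members of the largest cluster, then average them
    members = features[cluster_ids == ui_label].astype(float)
    members /= np.linalg.norm(members, axis=1, keepdims=True)
    center = members.mean(axis=0)

    # cosine distance = 1 - cosine similarity
    sims = features @ center / (
        np.linalg.norm(features, axis=1) * np.linalg.norm(center)
    )
    return 1.0 - sims

feats = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]])
print(recognizability_scores(feats, np.array([0, 0, 1])))
```

Images near the unrecognizable centroid score close to 0, while distinctive images score close to 1, matching how `rate_distinctiveness` orders its grades.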
@@ -0,0 +1,279 @@
+ """Sort images by an automatically generated ID before photo-identification"""
+
+ from collections import Counter
+ from pathlib import Path
+ from typing import Tuple, List, Optional
+ import os
+ import shutil
+
+ from sklearn.cluster import AgglomerativeClustering
+ from sklearn.metrics.pairwise import cosine_distances
+ import matplotlib.pyplot as plt
+ import networkx as nx
+ import numpy as np
+ import pandas as pd
+
+ def prep_images(image_dir: str, all_image_dir: str) -> None:
+     """Copy all images to a temporary directory and save encounter information"""
+     images, encounters = process_images(image_dir, all_image_dir)
+     working_dir = Path(image_dir).parent.absolute().as_posix()
+     save_encounter_info(working_dir, encounters, images)
+
+ def process_images(image_root: str, all_image_dir: str) -> Tuple[List[str], List[str]]:
+     """Copy all images to a temporary directory and return encounter information"""
+     image_list = []
+     encounter_list = []
+
+     # the temporary directory lies in the image root
+     os.makedirs(all_image_dir, exist_ok=True)
+
+     # loop over all the files in the image root
+     i = 0
+     for path, dirs, files in os.walk(image_root, topdown=True):
+
+         # skip the temporary directory and any images that have already been sorted
+         dirs[:] = [d for d in dirs if d != os.path.basename(all_image_dir)]
+         dirs[:] = [d for d in dirs if 'cluster' not in d]
+         for file in files:
+             if not file.lower().endswith('.jpg'):
+                 continue
+
+             image_list.append(file)
+
+             # get the string identifier for the encounter
+             full_path = os.path.join(path, file)
+             p = Path(full_path)
+             encounter = p.parts[-2]
+             encounter_list.append(encounter)
+
+             # finally, copy the image to the temporary directory
+             shutil.copy(full_path, all_image_dir)
+             i += 1
+
+     print(f'Copied {i} images to:', all_image_dir)
+
+     return image_list, encounter_list
+
+ def save_encounter_info(output_dir: str, encounters: List[str], images: List[str]) -> None:
+     encounter_df = pd.DataFrame(dict(encounter=encounters, image=images))
+     encounter_path = os.path.join(output_dir, 'encounter_info.csv')
+     encounter_df.to_csv(encounter_path, index=False)
+     print('Saved encounter information to:', encounter_path)
+
+ def load_features(image_root: str) -> Tuple[np.ndarray, np.ndarray]:
+     """Load features from disk."""
+
+     # the features are stored in a dictionary with image names as keys
+     feature_path = os.path.join(image_root, 'features', 'features.npy')
+     feature_dict = np.load(feature_path, allow_pickle=True).item()
+
+     # unpack the dictionary into arrays
+     image_names = np.array(list(feature_dict.keys()))
+     feature_array = np.array(list(feature_dict.values()))
+
+     return image_names, feature_array
+
+ class HierarchicalCluster:
+
+     def __init__(self, match_threshold: float = 0.5) -> None:
+
+         if (match_threshold > 1.0) or (match_threshold < 0.0):
+             raise ValueError('Match threshold must lie between 0 and 1')
+         self.match_threshold = match_threshold
+
+     def cluster_images(self, features: np.ndarray) -> np.ndarray:
+
+         # convert similarity threshold to distance
+         distance_threshold = 1 - self.match_threshold
+
+         # cluster using complete linkage
+         hac_results = AgglomerativeClustering(
+             n_clusters=None,
+             distance_threshold=distance_threshold,
+             linkage='complete',
+             metric='cosine'
+         ).fit(features)
+
+         # report results
+         cluster_labels = hac_results.labels_
+         return cluster_labels
+
+ class ClusterResults:
+     def __init__(self, cluster_labels):
+         self.cluster_labels = cluster_labels
+         self.cluster_idx = [None]
+         self.filenames = None
+         self.cluster_count = len(set(cluster_labels))
+         self.cluster_sizes = Counter(cluster_labels).values()
+         self.false_positive_df = None  # type: Optional[pd.DataFrame]
+         self.graph = nx.Graph()  # initialize with an empty graph instead of None
+         self.bad_clusters = []
+         self.bad_cluster_idx = []
+
+     def plot_suspicious(self):
+         graph = self.graph
+         # get connected components from the graph
+         if graph is None or graph.number_of_nodes() == 0:
+             print("No graph data available to plot suspicious connections.")
+             return
+         connected_components = [graph.subgraph(c) for c
+                                 in nx.connected_components(self.graph)]
+
+         subplot_count = len(self.bad_clusters)
+         n_col = 5
+         n_row = int(np.ceil(subplot_count / n_col))
+         width = 1.5
+         height = 1.5
+
+         fig, axes = plt.subplots(n_row, n_col, tight_layout=True,
+                                  figsize=(n_col * width, n_row * height))
+         flat = axes.flatten()
+
+         for i, idx in enumerate(self.bad_cluster_idx):
+
+             ax = flat[i]
+
+             # remove self loops
+             G = connected_components[idx].copy()
+             G.remove_edges_from(nx.selfloop_edges(G))
+
+             layout = nx.spring_layout(G)
+             nx.draw_networkx_edges(G, pos=layout, ax=ax, edge_color='C7',
+                                    alpha=0.3)
+             # color each node based on the louvain_communities
+             community = nx.community.louvain_communities(G)  # pyright: ignore[reportAttributeAccessIssue]
+             color_map = {}
+             for comm_idx, comm in enumerate(community):
+                 for node in comm:
+                     color_map[node] = comm_idx
+             node_colors = [color_map[node] for node in G.nodes]
+             nx.draw_networkx_nodes(G, layout, node_size=20, edgecolors='k',
+                                    node_color=node_colors, cmap='tab10', ax=ax)  # pyright: ignore[reportArgumentType]
+
+             label = self.bad_clusters[i]
+             ax.set_title(label, fontsize=10, loc='center')
+
+         # delete unused axes
+         for idx in range(subplot_count, len(flat)):
+             fig.delaxes(flat[idx])
+
+         s = 'Matches between images\nSingle links between clusters are suspicious'
+         fig.suptitle(s, fontsize=12)
+
+         plt.tight_layout()
+         plt.show()
+
+ def format_ids(ids: np.ndarray) -> List:
+     return [f'ID-{i:04d}' for i in ids]
+
+ class NetworkCluster:
+
+     def __init__(self, match_threshold: float = 0.5) -> None:
+
+         if (match_threshold > 1.0) or (match_threshold < 0.0):
+             raise ValueError('Match threshold must lie between 0 and 1')
+         self.match_threshold = match_threshold
+
+     def cluster_images(self, similarity: np.ndarray, message: bool = True) -> ClusterResults:
+
+         MODULARITY_THRESHOLD = 0.3
+
+         matches = (similarity > self.match_threshold)
+         # matches = np.where(distance < distance_threshold, distance, 0)
+
+         # get connected components from the graph
+         G = nx.from_numpy_array(matches)
+         connected_components = (G.subgraph(c) for c in nx.connected_components(G))
+
+         # create a mapping from node index to cluster index
+         file_count, _ = similarity.shape
+         cluster_labels = np.empty(file_count, dtype=object)
+         cluster_indices = np.empty(file_count, dtype=int)
+
+         # assign clusters to the cluster_labels array
+         df_list = []
+         bad_clusters = []
+         bad_cluster_idx = []
+         for cluster_idx, subgraph in enumerate(connected_components):
+
+             cluster_label = f'ID_{cluster_idx:04d}'
+             for node in subgraph:
+                 cluster_labels[node] = cluster_label
+                 cluster_indices[node] = cluster_idx
+
+             # high modularity is the warning sign for a bad cluster
+             community = nx.community.louvain_communities(subgraph)  # type: ignore
+             modularity = nx.community.quality.modularity(subgraph, community)  # pyright: ignore[reportAttributeAccessIssue]
+
+             if modularity > MODULARITY_THRESHOLD:
+                 bad_clusters.append(cluster_label)
+                 bad_cluster_idx.append(cluster_idx)
+
+             for community_idx, comm in enumerate(community):
+                 for node in comm:
+                     row = pd.DataFrame({
+                         'cluster_id': [cluster_label],
+                         'modularity': modularity,
+                         # 'filename': fnames[node],
+                         'community': community_idx
+                     })
+                     df_list.append(row)
+
+         if bad_clusters and message:
+             w = f'Following clusters may contain false positives:\n{bad_clusters}'
+             print(w)
+
+         df = pd.concat(df_list, ignore_index=True)
+
+         results = ClusterResults(cluster_labels)
+         # results.filenames = fnames
+         results.graph = G
+         results.false_positive_df = df
+         results.bad_clusters = bad_clusters
+         results.bad_cluster_idx = bad_cluster_idx
+         results.cluster_idx = format_ids(cluster_indices)
+
+         return results
+
+ def report_cluster_results(cluster_labs: np.ndarray) -> None:
+
+     # quick summary of the clustering results
+     label, count = np.unique(cluster_labs, return_counts=True)
+     print(f'Found {len(label)} clusters.')
+     print(f'Largest cluster has {np.max(count)} images.')
+
+ def sort_images(id_df, all_image_dir: str, output_dir: str) -> None:
+     """Sort images into folders based on cluster and encounter."""
+
+     # check that the input directory is a valid directory
+     if not os.path.isdir(all_image_dir):
+         raise ValueError(f'input_dir {all_image_dir} is not a valid directory')
+
+     # check that the column names are valid
+     required_column_names = ['image', 'proposed_id', 'encounter']
+     names_correct = all([i in id_df.columns for i in required_column_names])
+     if not names_correct:
+         raise ValueError("id_df must contain the column names 'image', 'proposed_id', 'encounter'")
+
+     if os.path.exists(output_dir):
+         shutil.rmtree(output_dir)
+     os.makedirs(output_dir)
+
+     grouped = id_df.groupby(['proposed_id', 'encounter'])
+     i = 0
+     j = 0
+     for (clust_id, enc_id), mini_df in grouped:
+
+         i += 1
+         cluster_dir = os.path.join(output_dir, clust_id)
+         os.makedirs(cluster_dir, exist_ok=True)
+
+         encounter_dir = os.path.join(cluster_dir, enc_id)
+         os.makedirs(encounter_dir, exist_ok=True)
+
+         for img in mini_df['image']:
+             j += 1
+             old_path = os.path.join(all_image_dir, img)
+             shutil.copy(old_path, encounter_dir)
+
+     print(f'Sorted {j} images into {i} folders.')
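The core move in `NetworkCluster.cluster_images` above is single-linkage clustering in disguise: threshold the pairwise similarity matrix, build a graph from it, and read each connected component off as one proposed identity. A self-contained sketch with a hypothetical 4×4 similarity matrix:

``` python
import numpy as np
import networkx as nx

# hypothetical cosine similarities: images 0-1 match, images 2-3 match
similarity = np.array([
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.9],
    [0.2, 0.1, 0.9, 1.0],
])

# edges wherever similarity exceeds the match threshold
matches = (similarity > 0.5).astype(int)
G = nx.from_numpy_array(matches)

# each connected component becomes one proposed identity
clusters = [sorted(c) for c in nx.connected_components(G)]
print(clusters)  # [[0, 1], [2, 3]]
```

Because a single spurious high-similarity pair is enough to merge two components, low-modularity checks like the one in `cluster_images` are what flag such accidental bridges.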
@@ -0,0 +1,144 @@
+ Metadata-Version: 2.4
+ Name: pyseter
+ Version: 0.1.0
+ Summary: Sort images by an automatically generated ID before photo-ID
+ Author-email: "Philip T. Patton" <philtpatton@gmail.com>
+ License: MIT
+ Project-URL: Homepage, https://github.com/philpatton/pyseter
+ Project-URL: Repository, https://github.com/philpatton/pyseter
+ Project-URL: Bug Tracker, https://github.com/philpatton/pyseter/issues
+ Keywords: individual identification,pytorch,marine mammal
+ Classifier: Development Status :: 3 - Alpha
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.8
+ Classifier: Programming Language :: Python :: 3.9
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Requires-Python: >=3.8
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: numpy>=1.21.0
+ Requires-Dist: pandas>=1.3.0
+ Requires-Dist: scikit-learn>=1.1.0
+ Requires-Dist: networkx>=2.6.0
+ Requires-Dist: matplotlib>=3.5.0
+ Requires-Dist: tqdm>=4.60.0
+ Requires-Dist: timm>=0.6.0
+ Dynamic: license-file
+
+ # Pyseter
+
+ A Python package that sorts images by an automatically generated ID before photo-identification.
+
+ ## Installation
+
+ ### New to Python?
+
+ While most biologists use R, we chose to release Pyseter as a Python package because it relies heavily on PyTorch, a deep learning library. If you're new to Python, follow these steps to get started with Python and conda.
+
+ #### Step 1: Install conda
+
+ Conda is an important tool for managing packages in Python. R (for the most part) handles packages for you behind the scenes; Python requires a more hands-on approach.
+
+ - Download and install [Miniforge](https://conda-forge.org/download/) (a form of conda)
+
+ After installing, you can verify your installation by opening the **command line interface** (CLI), which depends on your operating system. On Windows, open the "Miniforge Prompt" from your start menu. On Mac, open the Terminal application. Then, type the following command into the CLI and hit return.
+
+ ```bash
+ conda --version
+ ```
+
+ You should see something like `conda 25.5.1`. Of course, Anaconda, Miniconda, Mamba, or any other form of conda will work too.
+
+ #### Step 2: Create a new environment
+
+ Next, you'll create an environment for the package to live in. Environments are walled-off areas where you can install packages. They allow you to have multiple versions of the same package installed on your machine, which helps prevent conflicts.
+
+ Enter the following two commands into the CLI:
+
+ ``` bash
+ conda create -n pyseter_env
+ conda activate pyseter_env
+ ```
+
+ Here, I name (hence the `-n`) the environment `pyseter_env`, but you can call it anything you like!
+
+ Now your environment is ready to go! Try installing your first package, pip. Pip is another way of installing Python packages, and will be helpful for installing PyTorch and pyseter (see below). To do so, enter the following command into the CLI.
+
+ ``` bash
+ conda install pip -y
+ ```
+
+ #### Step 3: Install PyTorch
+
+ Installing PyTorch lets you extract features from images, i.e., identify individuals in images. This will be fast for users with an NVIDIA GPU or a 16 GB Mac with Apple Silicon. **For all other users, extracting features from images will be extremely slow.**
+
+ PyTorch installation can be a little finicky. I recommend following [these instructions](https://pytorch.org/get-started/locally/). Below is an example for Windows users. If you haven't already, activate your environment before installing.
+
+ ``` bash
+ conda activate pyseter_env
+ pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
+ ```
+ PyTorch is pretty big (over a gigabyte), so this may take a few minutes.
+
+ #### Step 4: Install pyseter
+
+ Now, install pyseter. If you haven't already, activate your environment before installing.
+
+ ``` bash
+ conda activate pyseter_env
+ pip3 install pyseter
+ ```
+
+ Now you're ready to go! You can verify your pyseter installation by opening Python in the CLI (assuming your environment is still activated).
+
+
+ ``` bash
+ python
+ ```
+
+ Then, run the following Python commands.
+
+ ``` python
+ import pyseter
+ pyseter.verify_pytorch()
+ quit()
+ ```
+
+ If successful, you should see a message like this.
+
+ ```
+ ✓ PyTorch 2.7.0 detected
+ ✓ Apple Silicon (MPS) GPU available
+ ```
+
+
+ #### Step 5: AnyDorsal weights
+
+ Pyseter relies on the [AnyDorsal algorithm](https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.14167) to extract features from images. Please download the weights and place them anywhere you like. You'll reference the file location later when using the `FeatureExtractor`.
+
+
+ ## Jupyter
+
+ There are several ways to interact with Python. The most common for data analysts is a *Jupyter Notebook*, which is similar to an RMarkdown or Quarto document.
+
+ Just to make things confusing, there are several ways to open Jupyter Notebooks. Personally, I think the easiest is through [VS Code](https://code.visualstudio.com/download). VS Code is an IDE (like RStudio) for editing code in many languages, and it has great support for [Jupyter notebooks](https://code.visualstudio.com/docs/datascience/jupyter-notebooks). Alternatively, [Positron](https://positron.posit.co) is a VS-Code-based editor developed by Posit (formerly RStudio).
+
+ You can also try [Jupyter Lab](https://docs.jupyter.org/en/latest/). To do so, [install Jupyter](https://jupyter.org/install) via the command line (see below). I also recommend installing the [ipykernel](https://ipython.readthedocs.io/en/stable/install/kernel_install.html#kernels-for-different-environments), which helps you select the right conda environment in Jupyter Lab.
+
+ ``` bash
+ conda activate pyseter_env
+ conda install jupyter ipykernel -y
+ python -m ipykernel install --user --name pyseter --display-name "Python (pyseter)"
+ ```
+
+ Note that you only need to activate `pyseter_env` when you open a new command line (i.e., terminal or Miniforge Prompt). Then you can open Jupyter Lab with the following command:
+
+ ``` bash
+ jupyter lab
+ ```
+
+ ## Getting Started
+
+ To get started with pyseter, please check out the "General Overview" [notebook](https://github.com/philpatton/pyseter/blob/main/examples/general-overview.ipynb) in the examples folder of this repository!
@@ -0,0 +1,12 @@
+ LICENSE
+ README.md
+ pyproject.toml
+ pyseter/__init__.py
+ pyseter/extract.py
+ pyseter/grade.py
+ pyseter/sort.py
+ pyseter.egg-info/PKG-INFO
+ pyseter.egg-info/SOURCES.txt
+ pyseter.egg-info/dependency_links.txt
+ pyseter.egg-info/requires.txt
+ pyseter.egg-info/top_level.txt
@@ -0,0 +1,7 @@
+ numpy>=1.21.0
+ pandas>=1.3.0
+ scikit-learn>=1.1.0
+ networkx>=2.6.0
+ matplotlib>=3.5.0
+ tqdm>=4.60.0
+ timm>=0.6.0
@@ -0,0 +1 @@
+ pyseter
@@ -0,0 +1,4 @@
+ [egg_info]
+ tag_build =
+ tag_date = 0
+