imageatlas-0.1.0.tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- imageatlas-0.1.0/CHANGELOG.md +16 -0
- imageatlas-0.1.0/CONTRIBUTING.md +55 -0
- imageatlas-0.1.0/LICENSE +21 -0
- imageatlas-0.1.0/MANIFEST.in +62 -0
- imageatlas-0.1.0/PKG-INFO +203 -0
- imageatlas-0.1.0/README.md +139 -0
- imageatlas-0.1.0/examples/example_apis.ipynb +376 -0
- imageatlas-0.1.0/examples/example_complete_workflow.py +162 -0
- imageatlas-0.1.0/imageatlas/__init__.py +42 -0
- imageatlas-0.1.0/imageatlas/clustering/__init__.py +14 -0
- imageatlas-0.1.0/imageatlas/clustering/base.py +129 -0
- imageatlas-0.1.0/imageatlas/clustering/factory.py +43 -0
- imageatlas-0.1.0/imageatlas/clustering/gmm.py +165 -0
- imageatlas-0.1.0/imageatlas/clustering/hdbscan_clustering.py +175 -0
- imageatlas-0.1.0/imageatlas/clustering/kmeans.py +148 -0
- imageatlas-0.1.0/imageatlas/core/__init__.py +15 -0
- imageatlas-0.1.0/imageatlas/core/clusterer.py +377 -0
- imageatlas-0.1.0/imageatlas/core/results.py +362 -0
- imageatlas-0.1.0/imageatlas/features/__init__.py +18 -0
- imageatlas-0.1.0/imageatlas/features/adapter.py +0 -0
- imageatlas-0.1.0/imageatlas/features/batch.py +142 -0
- imageatlas-0.1.0/imageatlas/features/cache.py +257 -0
- imageatlas-0.1.0/imageatlas/features/extractors/__init__.py +20 -0
- imageatlas-0.1.0/imageatlas/features/extractors/base.py +73 -0
- imageatlas-0.1.0/imageatlas/features/extractors/clip.py +26 -0
- imageatlas-0.1.0/imageatlas/features/extractors/convnext.py +58 -0
- imageatlas-0.1.0/imageatlas/features/extractors/dinov2.py +42 -0
- imageatlas-0.1.0/imageatlas/features/extractors/efficientnet.py +54 -0
- imageatlas-0.1.0/imageatlas/features/extractors/factory.py +47 -0
- imageatlas-0.1.0/imageatlas/features/extractors/mobilenet.py +58 -0
- imageatlas-0.1.0/imageatlas/features/extractors/resnet.py +63 -0
- imageatlas-0.1.0/imageatlas/features/extractors/swin.py +60 -0
- imageatlas-0.1.0/imageatlas/features/extractors/vgg.py +46 -0
- imageatlas-0.1.0/imageatlas/features/extractors/vit.py +67 -0
- imageatlas-0.1.0/imageatlas/features/loaders.py +187 -0
- imageatlas-0.1.0/imageatlas/features/metadata.py +81 -0
- imageatlas-0.1.0/imageatlas/features/pipeline.py +347 -0
- imageatlas-0.1.0/imageatlas/reduction/__init__.py +20 -0
- imageatlas-0.1.0/imageatlas/reduction/base.py +131 -0
- imageatlas-0.1.0/imageatlas/reduction/factory.py +51 -0
- imageatlas-0.1.0/imageatlas/reduction/pca.py +148 -0
- imageatlas-0.1.0/imageatlas/reduction/tsne.py +173 -0
- imageatlas-0.1.0/imageatlas/reduction/umap_reducer.py +110 -0
- imageatlas-0.1.0/imageatlas/visualization/__init__.py +10 -0
- imageatlas-0.1.0/imageatlas/visualization/grids.py +197 -0
- imageatlas-0.1.0/imageatlas.egg-info/PKG-INFO +203 -0
- imageatlas-0.1.0/imageatlas.egg-info/SOURCES.txt +56 -0
- imageatlas-0.1.0/imageatlas.egg-info/dependency_links.txt +1 -0
- imageatlas-0.1.0/imageatlas.egg-info/requires.txt +14 -0
- imageatlas-0.1.0/imageatlas.egg-info/top_level.txt +1 -0
- imageatlas-0.1.0/pyproject.toml +75 -0
- imageatlas-0.1.0/requirements.txt +9 -0
- imageatlas-0.1.0/setup.cfg +4 -0
- imageatlas-0.1.0/tests/test_batch_processing.py +130 -0
- imageatlas-0.1.0/tests/test_core_api.py +357 -0
- imageatlas-0.1.0/tests/test_features_pipeline.py +139 -0
- imageatlas-0.1.0/tests/test_reduction_module.py +262 -0
- imageatlas-0.1.0/tests/test_visualization.py +379 -0
imageatlas-0.1.0/CHANGELOG.md
ADDED
@@ -0,0 +1,16 @@
+# Changelog
+
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [0.1.0] - 2024-01-22
+### Added
+- Core `ImageClusterer` API for high-level clustering workflows.
+- Feature extraction support for DINOv2, ResNet, ViT, CLIP, Swin, and more.
+- Clustering algorithms: K-Means, GMM, and HDBSCAN.
+- Dimensionality reduction wrappers for PCA, UMAP, and t-SNE.
+- Visualization tools (`GridVisualizer`) for creating image grids from clusters.
+- HDF5 caching system for efficient feature storage.
+- Export functionality to CSV, JSON, and Excel.
|
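The release heading in the changelog uses the Keep a Changelog shape `## [version] - date`. As a minimal sketch, assuming exactly that heading shape, the version and date could be pulled out of such a file as follows. The function name `parse_releases` is hypothetical and is not part of imageatlas; pre-release suffixes allowed by Semantic Versioning are deliberately ignored here.

```python
import re
from datetime import date

# Matches headings such as "## [0.1.0] - 2024-01-22" (Keep a Changelog style).
RELEASE_HEADING = re.compile(
    r"^## \[(\d+)\.(\d+)\.(\d+)\] - (\d{4})-(\d{2})-(\d{2})\s*$"
)


def parse_releases(changelog_text):
    """Return a list of ((major, minor, patch), release_date) tuples."""
    releases = []
    for line in changelog_text.splitlines():
        match = RELEASE_HEADING.match(line)
        if match:
            major, minor, patch, year, month, day = map(int, match.groups())
            releases.append(((major, minor, patch), date(year, month, day)))
    return releases


# Hypothetical excerpt modeled on the changelog in this diff.
sample = """# Changelog

## [0.1.0] - 2024-01-22
### Added
- Core `ImageClusterer` API for high-level clustering workflows.
"""

releases = parse_releases(sample)
print(releases)
```

Storing the version as a tuple of integers keeps comparisons numeric, so `(0, 10, 0)` sorts after `(0, 9, 0)`, which a plain string comparison would get wrong.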
imageatlas-0.1.0/CONTRIBUTING.md
ADDED
@@ -0,0 +1,55 @@
+# Contributing to Image Clustering and Visualization Project
+
+
+Thank you for considering contributing to this project! Your contributions are vital in making this tool more effective and accessible. This document outlines the guidelines for contributing to the project.
+
+
+## Getting Started
+
+Before you begin contributing, make sure to follow the steps below to set up the project locally and familiarize yourself with the codebase.
+
+
+### Prerequisites
+
+- Python 3.8+
+- pip or Conda for package management
+- GPU (optional, but recommended for faster feature extraction)
+
+
+### Installation
+
+1. Clone the repository:
+
+```bash
+git clone https://github.com/ahmadjaved97/ImageClusterViz.git
+cd ImageClusterViz
+2. Set up a virtual environment:
+
+```bash
+python -m venv venv
+source venv/bin/activate
+3. Install the required dependencies:
+
+```bash
+pip install -r requirements.txt
+## How to Contribute
+
+We encourage you to contribute to the project by reporting bugs, suggesting new features, improving documentation, or submitting code contributions.
+
+### Reporting Issues
+
+If you encounter any bugs or have suggestions for improvements, please create an issue on the [GitHub issue tracker](https://github.com/ahmadjaved97/ImageClusterViz/issues). Please provide a clear and concise description of the issue, including steps to reproduce it if it's a bug.
+
+### Suggesting Features
+
+Feature requests are also welcome! Please open an issue for feature suggestions and provide details on how the feature would enhance the project. When possible, describe the use case to help us understand your needs.
+
+
+### Community Guidelines
+- Be respectful and constructive in your communication.
+- Provide clear and concise commit messages and issue descriptions.
+- Ensure that any feedback provided to other contributors is thoughtful and actionable.
+
+## License
+
+By contributing to this project, you agree that your contributions will be licensed under the MIT License.
|
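The setup steps in CONTRIBUTING.md ask for Python 3.8+ and a virtual environment. A small sketch of how a contributor might confirm both before installing dependencies is shown here; the helper names are hypothetical and are not part of the project. It relies on the standard-library fact that `sys.prefix` differs from `sys.base_prefix` inside a venv.

```python
import sys

MIN_PYTHON = (3, 8)  # minimum version taken from the Prerequisites list


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """True when the interpreter meets the minimum (major, minor) version."""
    return tuple(version_info[:2]) >= minimum


def in_virtualenv():
    """True inside a venv, where sys.prefix differs from sys.base_prefix."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("Python version supported:", python_is_supported())
print("Virtual environment active:", in_virtualenv())
```

Comparing `(major, minor)` tuples avoids the string-comparison trap where `"3.10"` sorts before `"3.8"`.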
imageatlas-0.1.0/LICENSE
ADDED
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2024 Ahmad Javed
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
imageatlas-0.1.0/MANIFEST.in
ADDED
@@ -0,0 +1,62 @@
+# MANIFEST.in - Specifies additional files to include in the distribution
+
+# Include important documentation files
+include README.md
+include LICENSE
+include CONTRIBUTING.md
+include CHANGELOG.md
+include requirements.txt
+include pyproject.toml
+
+# Include all Python files in the package
+recursive-include imageatlas *.py
+
+# Include type hints marker
+include imageatlas/py.typed
+
+# Include test files
+recursive-include tests *.py
+
+# Include example files
+recursive-include examples *.py
+recursive-include examples *.ipynb
+
+# Exclude compiled Python files
+global-exclude __pycache__
+global-exclude *.py[cod]
+global-exclude *.so
+global-exclude .DS_Store
+
+# Exclude cache and build artifacts
+global-exclude *.egg-info
+global-exclude .pytest_cache
+global-exclude .tox
+global-exclude .coverage
+global-exclude htmlcov
+
+# Exclude version control files
+global-exclude .git*
+global-exclude .gitignore
+
+# Exclude IDE and editor files
+global-exclude .vscode
+global-exclude .idea
+global-exclude *.swp
+global-exclude *.swo
+global-exclude *~
+
+# Exclude output directories
+global-exclude output
+global-exclude output_grids
+global-exclude output_clusters
+
+# Exclude virtual environment directories
+global-exclude venv
+global-exclude .venv
+global-exclude env
+
+# Exclude HDF5 cache files and pickle files
+global-exclude *.h5
+global-exclude *.hdf5
+global-exclude *.pkl
+global-exclude *.pickle
|
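The `global-exclude` lines in MANIFEST.in use shell-style glob patterns. As a rough illustration, the standard-library `fnmatch` module can show which bare file names a few of those patterns would catch. This is only an approximation: setuptools applies its own pattern translation, and the helper `is_excluded` is hypothetical.

```python
from fnmatch import fnmatchcase

# A few exclude patterns copied from the MANIFEST.in in this diff.
EXCLUDE_PATTERNS = ["*.py[cod]", "*.so", ".git*", "*.h5", "*.pkl", "*~"]


def is_excluded(filename, patterns=EXCLUDE_PATTERNS):
    """Return True if the bare file name matches any exclude pattern."""
    return any(fnmatchcase(filename, pattern) for pattern in patterns)


for name in ["kmeans.py", "kmeans.pyc", "features.h5", ".gitignore", "notes.txt~"]:
    print(f"{name:15} excluded={is_excluded(name)}")
```

Two things fall out of this. The pattern `.git*` already matches `.gitignore`, so the separate `.gitignore` line appears redundant. Also, as far as the setuptools documentation describes it, `global-exclude` matches files rather than directories, so entries such as `venv` or `__pycache__` may be better expressed with `prune`.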
imageatlas-0.1.0/PKG-INFO
ADDED
@@ -0,0 +1,203 @@
+Metadata-Version: 2.4
+Name: imageatlas
+Version: 0.1.0
+Summary: ImageAtlas: A toolkit for organizing, cleaning and analysing your image datasets.
+Author-email: Ahmad Javed <ahmadjaved97@gmail.com>
+Maintainer-email: Ahmad Javed <ahmadjaved97@gmail.com>
+License: MIT License
+
+        Copyright (c) 2024 Ahmad Javed
+
+        Permission is hereby granted, free of charge, to any person obtaining a copy
+        of this software and associated documentation files (the "Software"), to deal
+        in the Software without restriction, including without limitation the rights
+        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+        copies of the Software, and to permit persons to whom the Software is
+        furnished to do so, subject to the following conditions:
+
+        The above copyright notice and this permission notice shall be included in all
+        copies or substantial portions of the Software.
+
+        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+        SOFTWARE.
+
+Project-URL: Homepage, https://github.com/ahmadjaved97/imageatlas
+Project-URL: Documentation, https://github.com/ahmadjaved97/imageatlas
+Project-URL: Repository, https://github.com/ahmadjaved97/imageatlas
+Project-URL: Issues, https://github.com/ahmadjaved97/imageatlas/issues
+Project-URL: Changelog, https://github.com/ahmadjaved97/imageatlas/blob/main/CHANGELOG.md
+Keywords: machine-learning,computer-vision,clustering,embeddings,feature-extraction,dataset-visualization,deep-learning,image-processing,data-science,pytorch
+Classifier: Development Status :: 4 - Beta
+Classifier: Intended Audience :: Developers
+Classifier: Intended Audience :: Science/Research
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+Classifier: Topic :: Scientific/Engineering :: Image Recognition
+Classifier: Topic :: Scientific/Engineering :: Visualization
+Requires-Python: >=3.8
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: torch>=2.0.1
+Requires-Dist: torchvision>=0.15.2
+Requires-Dist: numpy>=1.24.2
+Requires-Dist: Pillow>=9.5.0
+Requires-Dist: opencv-python>=4.8.0.74
+Requires-Dist: scikit-learn>=1.3.0
+Requires-Dist: tqdm>=4.67.1
+Requires-Dist: h5py>=3.15.1
+Requires-Dist: pandas>=2.3.3
+Provides-Extra: full
+Requires-Dist: umap-learn; extra == "full"
+Requires-Dist: hdbscan; extra == "full"
+Requires-Dist: openpyxl; extra == "full"
+Dynamic: license-file
+
+# ImageAtlas
+
+## Overview
+
+ImageAtlas is a comprehensive toolkit designed to organize, clean, and analyze image datasets.
+
+⚠️ Note: ImageAtlas is currently in active development. The current version focuses on clustering and visualization functionality, with additional features coming soon.
+
+Perfect for dataset curation, duplicate detection, quality control, and exploratory data analysis.
+
+## 📦 Installation
+
+**Basic Installation**
+
+```
+pip install imageatlas
+```
+
+**Full Installation**
+
+```
+pip install "imageatlas[full]"
+```
+
+**From Source**
+```
+git clone https://github.com/ahmadjaved97/ImageAtlas.git
+cd ImageAtlas
+pip install -e .
+```
+
+## 🚀 Quick Start
+
+### Minimal Working Example
+
+```python
+import os
+from imageatlas import ImageClusterer
+
+# Initialize clusterer
+clusterer = ImageClusterer(
+    model='dinov2',  # State-of-the-art features
+    clustering_method='kmeans',
+    n_clusters=10,
+    device='cuda'  # or 'cpu'
+)
+
+# Run clustering on your images
+results = clusterer.fit("./path/to/images")
+
+# Save results to JSON
+results.to_json("./output/clustering_results.json")
+
+# Create visual grids for each cluster
+results.create_grids(
+    image_dir="./path/to/images",
+    output_dir="./output/grids"
+)
+
+# Organize images into cluster folders
+results.create_cluster_folders(
+    image_dir="./path/to/images",
+    output_dir="./output/clusters"
+)
+```
+
+That's it! Your images are now clustered, visualized, and organized.
+
+## Available Models & Algorithms
+
+### Feature Extraction Models
+
+| Model | Variants |
+| ---------------- | --------------------------------------------------- |
+| **DINOv2** | `vits14`, `vitb14`, `vitl14`, `vitg14` |
+| **ViT** | `b_16`, `b_32`, `l_16`, `l_32`, `h_14` |
+| **ResNet** | `18`, `34`, `50`, `101`, `152` |
+| **EfficientNet** | `s`, `m`, `l` |
+| **CLIP** | `RN50`, `RN101`, `ViT-B/32`, `ViT-B/16`, `ViT-L/14` |
+| **ConvNeXt** | `tiny`, `small`, `base`, `large` |
+| **Swin** | `t`, `s`, `b`, `v2_t`, `v2_s`, `v2_b` |
+| **MobileNetV3** | `small`, `large` |
+| **VGG16** | \- |
+
+### Clustering Algorithms
+
+| Algorithm | Parameters |
+| ----------- | --------------------------------- |
+| **K-Means** | `n_clusters` |
+| **HDBSCAN** | `min_cluster_size`, `min_samples` |
+| **GMM** | `n_components`, `covariance_type` |
+
+### Dimensionality Reduction
+
+| Method | Parameters |
+| -------------------------- | ----------------------------------------- |
+| **PCA** | `n_components`, `whiten` |
+| **UMAP** | `n_components`, `n_neighbors`, `min_dist` |
+| **t-SNE (in development)** | `n_components`, `perplexity` |
+
+
+## 📝 Citation
+
+If you use ImageAtlas in your research, please cite:
+
+```bibtex
+@software{imageatlas2024,
+  author = {Javed, Ahmad},
+  title = {ImageAtlas: A Toolkit for Organizing and Analyzing Image Datasets},
+  year = {2024},
+  url = {https://github.com/ahmadjaved97/ImageAtlas}
+}
+```
+## Acknowledgments
+
+- [DINOv2](https://github.com/facebookresearch/dinov2): Facebook Research
+- [CLIP](https://github.com/openai/CLIP): OpenAI
+- [Vision Transformers](https://github.com/google-research/vision_transformer): Google Research
+- Built with [PyTorch](https://github.com/pytorch/pytorch), [scikit-learn](https://github.com/scikit-learn/scikit-learn), and [OpenCV](https://github.com/opencv/opencv)
+
+
+### Sample Output
+- Dataset Used: [Fruit and Vegetable Classification](https://www.kaggle.com/code/abdelrahman16/fruit-and-vegetable-classification/input)
+- Number of Clusters: 8
+- Model Used: ViT
+- Clustering Method: K-Means
+- Output:
+<p align="center">
+  <img src="./output_grids/cluster_0.jpg" alt="Image 1" width="250" height="250">
+  <img src="./output_grids/cluster_1.jpg" alt="Image 2" width="250" height="250">
+  <img src="./output_grids/cluster_2.jpg" alt="Image 3" width="250" height="250">
+  <img src="./output_grids/cluster_3.jpg" alt="Image 4" width="250" height="250">
+  <img src="./output_grids/cluster_4.jpg" alt="Image 5" width="250" height="250">
+  <img src="./output_grids/cluster_5.jpg" alt="Image 6" width="250" height="250">
+  <img src="./output_grids/cluster_6.jpg" alt="Image 7" width="250" height="250">
+  <img src="./output_grids/cluster_7.jpg" alt="Image 8" width="250" height="250">
+</p>
+
+
|
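PKG-INFO uses an email-header style layout: `Key: value` fields, repeated keys such as `Requires-Dist`, and the long description after the first blank line. A minimal sketch of reading such a file with the standard-library `email.parser` follows. The excerpt is shortened from the metadata in this diff, and the split into core and extra requirements is a simplification based on the `extra ==` marker text rather than a full marker parser.

```python
from email.parser import HeaderParser

# A shortened excerpt of the PKG-INFO in this diff.
PKG_INFO_EXCERPT = """\
Metadata-Version: 2.4
Name: imageatlas
Version: 0.1.0
Requires-Python: >=3.8
Requires-Dist: torch>=2.0.1
Requires-Dist: numpy>=1.24.2
Provides-Extra: full
Requires-Dist: umap-learn; extra == "full"
Requires-Dist: hdbscan; extra == "full"

# ImageAtlas
"""

metadata = HeaderParser().parsestr(PKG_INFO_EXCERPT)

# Repeated fields such as Requires-Dist are read with get_all().
requirements = metadata.get_all("Requires-Dist")
core = [r for r in requirements if "extra ==" not in r]
extras = [r.split(";")[0].strip() for r in requirements if "extra ==" in r]

print(metadata["Name"], metadata["Version"])
print("core:", core)
print("full extra:", extras)
```

The same approach makes it easy to compare `Requires-Python` (here `>=3.8`) against the `Programming Language :: Python` classifiers, which in this release list only 3.10 to 3.12.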
imageatlas-0.1.0/README.md
ADDED
@@ -0,0 +1,139 @@
+# ImageAtlas
+
+## Overview
+
+ImageAtlas is a comprehensive toolkit designed to organize, clean, and analyze image datasets.
+
+⚠️ Note: ImageAtlas is currently in active development. The current version focuses on clustering and visualization functionality, with additional features coming soon.
+
+Perfect for dataset curation, duplicate detection, quality control, and exploratory data analysis.
+
+## 📦 Installation
+
+**Basic Installation**
+
+```
+pip install imageatlas
+```
+
+**Full Installation**
+
+```
+pip install "imageatlas[full]"
+```
+
+**From Source**
+```
+git clone https://github.com/ahmadjaved97/ImageAtlas.git
+cd ImageAtlas
+pip install -e .
+```
+
+## 🚀 Quick Start
+
+### Minimal Working Example
+
+```python
+import os
+from imageatlas import ImageClusterer
+
+# Initialize clusterer
+clusterer = ImageClusterer(
+    model='dinov2',  # State-of-the-art features
+    clustering_method='kmeans',
+    n_clusters=10,
+    device='cuda'  # or 'cpu'
+)
+
+# Run clustering on your images
+results = clusterer.fit("./path/to/images")
+
+# Save results to JSON
+results.to_json("./output/clustering_results.json")
+
+# Create visual grids for each cluster
+results.create_grids(
+    image_dir="./path/to/images",
+    output_dir="./output/grids"
+)
+
+# Organize images into cluster folders
+results.create_cluster_folders(
+    image_dir="./path/to/images",
+    output_dir="./output/clusters"
+)
+```
+
+That's it! Your images are now clustered, visualized, and organized.
+
+## Available Models & Algorithms
+
+### Feature Extraction Models
+
+| Model | Variants |
+| ---------------- | --------------------------------------------------- |
+| **DINOv2** | `vits14`, `vitb14`, `vitl14`, `vitg14` |
+| **ViT** | `b_16`, `b_32`, `l_16`, `l_32`, `h_14` |
+| **ResNet** | `18`, `34`, `50`, `101`, `152` |
+| **EfficientNet** | `s`, `m`, `l` |
+| **CLIP** | `RN50`, `RN101`, `ViT-B/32`, `ViT-B/16`, `ViT-L/14` |
+| **ConvNeXt** | `tiny`, `small`, `base`, `large` |
+| **Swin** | `t`, `s`, `b`, `v2_t`, `v2_s`, `v2_b` |
+| **MobileNetV3** | `small`, `large` |
+| **VGG16** | \- |
+
+### Clustering Algorithms
+
+| Algorithm | Parameters |
+| ----------- | --------------------------------- |
+| **K-Means** | `n_clusters` |
+| **HDBSCAN** | `min_cluster_size`, `min_samples` |
+| **GMM** | `n_components`, `covariance_type` |
+
+### Dimensionality Reduction
+
+| Method | Parameters |
+| -------------------------- | ----------------------------------------- |
+| **PCA** | `n_components`, `whiten` |
+| **UMAP** | `n_components`, `n_neighbors`, `min_dist` |
+| **t-SNE (in development)** | `n_components`, `perplexity` |
+
+
+## 📝 Citation
+
+If you use ImageAtlas in your research, please cite:
+
+```bibtex
+@software{imageatlas2024,
+  author = {Javed, Ahmad},
+  title = {ImageAtlas: A Toolkit for Organizing and Analyzing Image Datasets},
+  year = {2024},
+  url = {https://github.com/ahmadjaved97/ImageAtlas}
+}
+```
+## Acknowledgments
+
+- [DINOv2](https://github.com/facebookresearch/dinov2): Facebook Research
+- [CLIP](https://github.com/openai/CLIP): OpenAI
+- [Vision Transformers](https://github.com/google-research/vision_transformer): Google Research
+- Built with [PyTorch](https://github.com/pytorch/pytorch), [scikit-learn](https://github.com/scikit-learn/scikit-learn), and [OpenCV](https://github.com/opencv/opencv)
+
+
+### Sample Output
+- Dataset Used: [Fruit and Vegetable Classification](https://www.kaggle.com/code/abdelrahman16/fruit-and-vegetable-classification/input)
+- Number of Clusters: 8
+- Model Used: ViT
+- Clustering Method: K-Means
+- Output:
+<p align="center">
+  <img src="./output_grids/cluster_0.jpg" alt="Image 1" width="250" height="250">
+  <img src="./output_grids/cluster_1.jpg" alt="Image 2" width="250" height="250">
+  <img src="./output_grids/cluster_2.jpg" alt="Image 3" width="250" height="250">
+  <img src="./output_grids/cluster_3.jpg" alt="Image 4" width="250" height="250">
+  <img src="./output_grids/cluster_4.jpg" alt="Image 5" width="250" height="250">
+  <img src="./output_grids/cluster_5.jpg" alt="Image 6" width="250" height="250">
+  <img src="./output_grids/cluster_6.jpg" alt="Image 7" width="250" height="250">
+  <img src="./output_grids/cluster_7.jpg" alt="Image 8" width="250" height="250">
+</p>
+
|
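The README lists K-Means with a single `n_clusters` parameter. To make concrete what that parameter controls, here is a plain-Python sketch of Lloyd's algorithm on toy 2-D vectors. It is an illustration only and not imageatlas's implementation, which this diff does not show; the function `kmeans` and the toy `features` list are hypothetical, and the first-points initialisation is chosen purely so the run is deterministic.

```python
import math


def kmeans(points, n_clusters, n_iters=20):
    """Plain Lloyd's algorithm on a list of equal-length tuples.

    Centroids start from the first n_clusters points so the result is
    deterministic; real libraries use smarter initialisation.
    """
    centroids = [tuple(p) for p in points[:n_clusters]]
    labels = [0] * len(points)
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(n_clusters), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k in range(n_clusters):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[k])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Toy 2-D "feature vectors": two well-separated groups.
features = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]
labels, centroids = kmeans(features, n_clusters=2)
print(labels)
```

In a real pipeline the points would be high-dimensional embeddings from one of the listed feature extractors rather than hand-written pairs, and `n_clusters` would be chosen to match how many groups you expect in the dataset.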