lightly-studio 0.3.4__py3-none-any.whl → 0.4.0__py3-none-any.whl

Potentially problematic release.


This version of lightly-studio might be problematic.

Files changed (98)
  1. lightly_studio/api/features.py +3 -5
  2. lightly_studio/api/routes/api/dataset_tag.py +10 -0
  3. lightly_studio/api/routes/api/embeddings2d.py +8 -37
  4. lightly_studio/core/dataset.py +89 -2
  5. lightly_studio/core/dataset_query/__init__.py +14 -0
  6. lightly_studio/core/sample.py +33 -1
  7. lightly_studio/db_manager.py +4 -2
  8. lightly_studio/dist_lightly_studio_view_app/_app/env.js +1 -1
  9. lightly_studio/dist_lightly_studio_view_app/_app/immutable/assets/0.CN4hnTks.css +1 -0
  10. lightly_studio/dist_lightly_studio_view_app/_app/immutable/assets/{Samples.CIbricz7.css → Samples.C0_eo9eP.css} +1 -1
  11. lightly_studio/dist_lightly_studio_view_app/_app/immutable/assets/_layout.kFFGI0zL.css +1 -0
  12. lightly_studio/dist_lightly_studio_view_app/_app/immutable/assets/{transform.2jKMtOWG.css → transform.sLzR40om.css} +1 -1
  13. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/{qIv1kPyv.js → BOmrKuMn.js} +1 -1
  14. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/{B7302SU7.js → BPpOWbDa.js} +1 -1
  15. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/BaFFwDFr.js +1 -0
  16. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/{CZGpyrcA.js → BiGQqqJP.js} +1 -1
  17. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/{D_JuJOO3.js → BrNKoXwc.js} +1 -1
  18. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/BsaJCCG_.js +96 -0
  19. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/{sLqs1uaK.js → BtXGzlpP.js} +1 -1
  20. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/{Frwd2CjB.js → C3xJX0nD.js} +1 -1
  21. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/{H4l0JFh9.js → CANX9QXL.js} +1 -1
  22. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/CAPx0Bfm.js +1 -0
  23. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/CP9M7pei.js +39 -0
  24. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/CWuDkrMZ.js +436 -0
  25. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/ChlxSwqI.js +1 -0
  26. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/{DzBTnFhV.js → Cj4nZbtb.js} +1 -1
  27. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/{Bqz7dyEC.js → ClzkJBWk.js} +1 -1
  28. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/CpbA3HU7.js +2 -0
  29. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/{DQ8aZ1o-.js → D8ZGoCPm.js} +1 -1
  30. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/DMJzr1NB.js +1 -0
  31. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/{CSCQddQS.js → DNJnBfHs.js} +1 -1
  32. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/{DSxvnAMh.js → DUtlYNuP.js} +1 -1
  33. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/DVxjPOJB.js +1 -0
  34. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/DmGM9V9Q.js +1 -0
  35. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/DoEId1MK.js +1 -0
  36. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/DthpwYR_.js +2 -0
  37. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/DyIcJj6J.js +1 -0
  38. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/SiegjVo0.js +1 -0
  39. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/{Dj4O-5se.js → WEyXQRi6.js} +1 -1
  40. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/gBp1tBnA.js +1 -0
  41. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/xQhUoIl9.js +1 -0
  42. lightly_studio/dist_lightly_studio_view_app/_app/immutable/entry/app.Y-sSoz5q.js +2 -0
  43. lightly_studio/dist_lightly_studio_view_app/_app/immutable/entry/start.CvxVp0Cu.js +1 -0
  44. lightly_studio/dist_lightly_studio_view_app/_app/immutable/nodes/0.0Fm6E-5B.js +4 -0
  45. lightly_studio/dist_lightly_studio_view_app/_app/immutable/nodes/{1.Cdy-7S5q.js → 1.DB-0vkHb.js} +1 -1
  46. lightly_studio/dist_lightly_studio_view_app/_app/immutable/nodes/10.vaUePh5k.js +1 -0
  47. lightly_studio/dist_lightly_studio_view_app/_app/immutable/nodes/11.7i7ljNVT.js +1 -0
  48. lightly_studio/dist_lightly_studio_view_app/_app/immutable/nodes/{12.DcO8wIAc.js → 13.9qy3WtZv.js} +1 -1
  49. lightly_studio/dist_lightly_studio_view_app/_app/immutable/nodes/{2.BIldfkxL.js → 2.Drwwdm7A.js} +95 -94
  50. lightly_studio/dist_lightly_studio_view_app/_app/immutable/nodes/{3.BC9z_TWM.js → 3.D3X_-Wan.js} +1 -1
  51. lightly_studio/dist_lightly_studio_view_app/_app/immutable/nodes/{4.D8X_Ch5n.js → 4.C9TqY3tA.js} +1 -1
  52. lightly_studio/dist_lightly_studio_view_app/_app/immutable/nodes/5.iRw6HCWX.js +39 -0
  53. lightly_studio/dist_lightly_studio_view_app/_app/immutable/nodes/{6.DRA5Ru_2.js → 6.fqfYR7dB.js} +1 -1
  54. lightly_studio/dist_lightly_studio_view_app/_app/immutable/nodes/{7.WVBsruHQ.js → 7.C7gMM-gk.js} +1 -1
  55. lightly_studio/dist_lightly_studio_view_app/_app/immutable/nodes/8.C4v1w-oS.js +20 -0
  56. lightly_studio/dist_lightly_studio_view_app/_app/immutable/nodes/9.DbHcSiMn.js +1 -0
  57. lightly_studio/dist_lightly_studio_view_app/_app/version.json +1 -1
  58. lightly_studio/dist_lightly_studio_view_app/index.html +14 -14
  59. lightly_studio/models/caption.py +1 -0
  60. lightly_studio/models/sample.py +7 -0
  61. lightly_studio/models/settings.py +5 -0
  62. lightly_studio/resolvers/dataset_resolver.py +2 -4
  63. lightly_studio/resolvers/sample_resolver.py +1 -0
  64. lightly_studio/resolvers/settings_resolver.py +3 -0
  65. lightly_studio/resolvers/twodim_embedding_resolver.py +29 -0
  66. lightly_studio/selection/__init__.py +1 -0
  67. lightly_studio/selection/mundig.py +41 -0
  68. lightly_studio-0.4.0.dist-info/METADATA +78 -0
  69. {lightly_studio-0.3.4.dist-info → lightly_studio-0.4.0.dist-info}/RECORD +71 -69
  70. lightly_studio/dist_lightly_studio_view_app/_app/immutable/assets/0.B3oFNb6O.css +0 -1
  71. lightly_studio/dist_lightly_studio_view_app/_app/immutable/assets/_layout.7Ma7YdVg.css +0 -1
  72. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/-DXuGN29.js +0 -1
  73. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/BeWf8-vJ.js +0 -1
  74. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/CfQ4mGwl.js +0 -1
  75. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/CiaNZCBa.js +0 -1
  76. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/Cqo0Vpvt.js +0 -417
  77. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/Cy4fgWTG.js +0 -1
  78. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/D5w4xp5l.js +0 -1
  79. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/DD63uD-T.js +0 -1
  80. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/D_ynJAfY.js +0 -2
  81. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/Dafy4oEQ.js +0 -1
  82. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/DmjAI-UV.js +0 -1
  83. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/Dug7Bq1S.js +0 -1
  84. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/Dv5BSBQG.js +0 -1
  85. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/DzX_yyqb.js +0 -1
  86. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/H60ATh8g.js +0 -2
  87. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/u-it74zV.js +0 -96
  88. lightly_studio/dist_lightly_studio_view_app/_app/immutable/entry/app.BPc0HQPq.js +0 -2
  89. lightly_studio/dist_lightly_studio_view_app/_app/immutable/entry/start.SNvc2nrm.js +0 -1
  90. lightly_studio/dist_lightly_studio_view_app/_app/immutable/nodes/0.5jT7P06o.js +0 -1
  91. lightly_studio/dist_lightly_studio_view_app/_app/immutable/nodes/10.C_uoESTX.js +0 -1
  92. lightly_studio/dist_lightly_studio_view_app/_app/immutable/nodes/5.CAXhxJu6.js +0 -39
  93. lightly_studio/dist_lightly_studio_view_app/_app/immutable/nodes/8.BuKUrCEN.js +0 -20
  94. lightly_studio/dist_lightly_studio_view_app/_app/immutable/nodes/9.CUIn1yCR.js +0 -1
  95. lightly_studio/selection/README.md +0 -6
  96. lightly_studio-0.3.4.dist-info/METADATA +0 -879
  97. lightly_studio/dist_lightly_studio_view_app/_app/immutable/nodes/{11.CWG1ehzT.js → 12.CWG1ehzT.js} +0 -0
  98. {lightly_studio-0.3.4.dist-info → lightly_studio-0.4.0.dist-info}/WHEEL +0 -0
@@ -1,879 +0,0 @@
1
- Metadata-Version: 2.4
2
- Name: lightly-studio
3
- Version: 0.3.4
4
- Summary: LightlyStudio is a lightweight, fast, and easy-to-use data exploration tool for data scientists and engineers.
5
- Classifier: Operating System :: MacOS :: MacOS X
6
- Classifier: Operating System :: Microsoft :: Windows
7
- Classifier: Operating System :: POSIX :: Linux
8
- Classifier: Programming Language :: Python :: 3
9
- Classifier: Programming Language :: Python :: 3.8
10
- Classifier: Programming Language :: Python :: 3.9
11
- Classifier: Programming Language :: Python :: 3.10
12
- Classifier: Programming Language :: Python :: 3.11
13
- Classifier: Programming Language :: Python :: 3.12
14
- Requires-Python: <3.13,>=3.8
15
- Requires-Dist: annotated-types==0.7.0
16
- Requires-Dist: duckdb-engine<0.17,>=0.15.0
17
- Requires-Dist: duckdb<1.3,>=1.2.2
18
- Requires-Dist: environs<12.0.0
19
- Requires-Dist: eval-type-backport>=0.2.2
20
- Requires-Dist: fastapi>=0.115.5
21
- Requires-Dist: faster-coco-eval>=1.6.5
22
- Requires-Dist: fsspec>=2023.1.0
23
- Requires-Dist: labelformat>=0.1.7
24
- Requires-Dist: lightly-mundig==0.1.4
25
- Requires-Dist: open-clip-torch>=2.20.0
26
- Requires-Dist: pyarrow>=17.0.0
27
- Requires-Dist: python-multipart>=0.0.20
28
- Requires-Dist: scikit-learn==1.3.2
29
- Requires-Dist: sqlmodel>=0.0.22
30
- Requires-Dist: tqdm>=4.65.0
31
- Requires-Dist: typing-extensions>=4.12.2
32
- Requires-Dist: uvicorn>=0.32.1
33
- Requires-Dist: xxhash>=3.5.0
34
- Provides-Extra: cloud-storage
35
- Requires-Dist: adlfs>=2023.1.0; extra == 'cloud-storage'
36
- Requires-Dist: gcsfs>=2023.1.0; extra == 'cloud-storage'
37
- Requires-Dist: s3fs>=2023.1.0; extra == 'cloud-storage'
38
- Provides-Extra: lightly-edge
39
- Requires-Dist: lightly-edge-sdk>=1.0.1b2; extra == 'lightly-edge'
40
- Requires-Dist: opencv-python; extra == 'lightly-edge'
41
- Description-Content-Type: text/markdown
42
-
43
- <div align="center">
44
- <p align="center">
45
-
46
- <!-- prettier-ignore -->
47
- <img src="https://cdn.prod.website-files.com/62cd5ce03261cba217188442/66dac501a8e9a90495970876_Logo%20dark-short-p-800.png" height="50px">
48
-
49
- **The open-source tool curating datasets**
50
-
51
- ---
52
-
53
- [![PyPI python](https://img.shields.io/pypi/pyversions/lightly-studio)](https://pypi.org/project/lightly-studio)
54
- [![PyPI version](https://badge.fury.io/py/lightly-studio.svg)](https://pypi.org/project/lightly-studio)
55
- [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
56
-
57
- </p>
58
- </div>
59
-
60
- # 🚀 Welcome to LightlyStudio!
61
-
62
- We at **[Lightly](https://lightly.ai)** created **LightlyStudio**, an open-source tool
63
- designed to supercharge your data curation workflows for computer vision datasets. Explore
64
- your data, visualize captions, annotations and crops, tag samples, and export curated lists to improve
65
- your machine learning pipelines. And much more!
66
-
67
- LightlyStudio runs entirely locally on your machine, keeping your data private. It consists
68
- of a Python library for indexing your data and a web-based UI for visualization and curation.
69
-
70
- ## ✨ Core Workflow
71
-
72
- Using LightlyStudio typically involves these steps:
73
-
74
- 1. **Index Your Dataset:** Run a Python script using the `lightly_studio` library to process your local dataset (images and annotations) and save metadata into a local `lightly_studio.db` file.
75
- 2. **Launch the UI:** The script then starts a local web server.
76
- 3. **Explore & Curate:** Use the UI to visualize images, annotations, captions, and object crops. Filter and search your data (experimental text search available). Apply tags to interesting samples (e.g., "mislabeled", "review").
77
- 4. **Export Curated Data:** Export information (like filenames) for your tagged samples from the UI to use downstream.
78
- 5. **Stop the Server:** Close the terminal running the script (Ctrl+C) when done.
79
-
80
- <p align="center">
81
- <img alt="LightlyStudio Sample Grid View" src="https://storage.googleapis.com/lightly-public/studio/screenshot_grid_view.jpg" width="70%">
82
- <br/>
83
- <em>Visualize your dataset samples with annotations in the grid view.</em>
84
- </p>
85
- <p align="center">
86
- <img alt="LightlyStudio Annotation Crop View" src="https://storage.googleapis.com/lightly-public/studio/screenshot_annotation_view.jpg" width="70%">
87
- <br/>
88
- <em>Switch to the annotation view to inspect individual object crops easily.</em>
89
- </p>
90
- <p align="center">
91
- <img alt="LightlyStudio Sample Detail View" src="https://storage.googleapis.com/lightly-public/studio/screenshot_detail_view.jpg" width="70%">
92
- <br/>
93
- <em>Inspect individual samples in detail, viewing all annotations and metadata.</em>
94
- </p>
95
-
96
- ## 🎯 Features
97
-
98
- - **Local Web GUI:** Explore and curate your dataset in your browser. Works completely
99
- offline, your data never leaves your machine.
100
- - **Flexible Input Formats:** Load your image dataset from a folder, or with annotations from
101
- a number of popular formats such as COCO or YOLO.
102
- - **Metadata:** Attach your custom metadata to every sample.
103
- - **Tags:** Mark subsets of your dataset for later use.
104
- - **Embeddings:** Run similarity search queries on your data.
105
- - **Selection:** Run advanced selection algorithms to tag a subset of your data.
106
-
107
- ## 💻 Installation
108
-
109
- Ensure you have **Python 3.8 or higher** (the package metadata requires `>=3.8,<3.13`). We strongly recommend using a virtual environment.
110
-
111
- The library is OS-independent and works on Windows, Linux, and macOS.
112
-
113
- ```shell
114
- # 1. Create and activate a virtual environment (Recommended)
115
- # On Linux/MacOS:
116
- python3 -m venv venv
117
- source venv/bin/activate
118
-
119
- # On Windows:
120
- python -m venv venv
121
- .\venv\Scripts\activate
122
-
123
- # 2. Install LightlyStudio
124
- pip install lightly-studio
125
- ```
126
-
127
- ## **Quickstart**
128
-
129
- Download example datasets by cloning the example repository:
130
-
131
- ```shell
132
- git clone https://github.com/lightly-ai/dataset_examples dataset_examples
133
- ```
134
-
135
- ### YOLO Object Detection
136
-
137
- To run an example using a YOLO dataset, create a file named `example_yolo.py` with the
138
- following contents in the same directory that contains the `dataset_examples/` folder:
139
-
140
- ```python
141
- # example_yolo.py
142
-
143
- import lightly_studio as ls
144
-
145
- # Create a dataset and add the samples from the yolo format
146
- dataset = ls.Dataset.create()
147
- dataset.add_samples_from_yolo(
148
- data_yaml="dataset_examples/road_signs_yolo/data.yaml",
149
- )
150
-
151
- # Start the UI application on the port 8001.
152
- ls.start_gui()
153
- ```
154
-
155
- Run the script:
156
-
157
- ```
158
- python example_yolo.py
159
- ```
160
-
161
- When you are done, stop the app by pressing Ctrl+C in the terminal.
162
-
163
- <details>
164
- <summary>The YOLO format details:</summary>
165
-
166
- ```
167
- road_signs_yolo/
168
- ├── train/
169
- │ ├── images/
170
- │ │ ├── image1.jpg
171
- │ │ ├── image2.jpg
172
- │ │ └── ...
173
- │ └── labels/
174
- │ ├── image1.txt
175
- │ ├── image2.txt
176
- │ └── ...
177
- ├── valid/ (optional)
178
- │ ├── images/
179
- │ │ └── ...
180
- │ └── labels/
181
- │ └── ...
182
- └── data.yaml
183
- ```
184
-
185
- Each label file should contain YOLO format annotations (one per line):
186
-
187
- ```
188
- <class> <x_center> <y_center> <width> <height>
189
- ```
190
-
191
- Where coordinates are normalized between 0 and 1.
192
-
193
- </details>
194
-
195
- ### COCO Instance Segmentation
196
-
197
- To run an instance segmentation example using a COCO dataset, create a file named
198
- `example_coco.py` with the following contents in the same directory that contains
199
- the `dataset_examples/` folder:
200
-
201
- ```python
202
- # example_coco.py
203
-
204
- import lightly_studio as ls
205
-
206
- # Create a dataset and add the samples from the coco format
207
- dataset = ls.Dataset.create()
208
- dataset.add_samples_from_coco(
209
- annotations_json="dataset_examples/coco_subset_128_images/instances_train2017.json",
210
- images_path="dataset_examples/coco_subset_128_images/images",
211
- annotation_type=ls.AnnotationType.INSTANCE_SEGMENTATION,
212
- )
213
-
214
- # Start the UI application on the port 8001.
215
- ls.start_gui()
216
- ```
217
-
218
- Run the script:
219
-
220
- ```
221
- python example_coco.py
222
- ```
223
-
224
- When you are done, stop the app by pressing Ctrl+C in the terminal.
225
-
226
- <details>
227
- <summary>The COCO format details:</summary>
228
-
229
- ```
230
- coco_subset_128_images/
231
- ├── images/
232
- │ ├── image1.jpg
233
- │ ├── image2.jpg
234
- │ └── ...
235
- └── instances_train2017.json # Single JSON file containing all annotations
236
- ```
237
-
238
- COCO uses a single JSON file containing all annotations. The format consists of three main components:
239
-
240
- - Images: Defines metadata for each image in the dataset.
241
- - Categories: Defines the object classes.
242
- - Annotations: Defines object instances.
243
-
244
- </details>
245
-
246
- ### COCO Captions
247
-
248
- To run a caption example using a COCO dataset, create a file named
249
- `example_coco_captions.py` with the following contents in the same directory that contains
250
- the `dataset_examples/` folder:
251
-
252
- ```python
253
- # example_coco_captions.py
254
-
255
- import lightly_studio as ls
256
-
257
- # Create a dataset and add the samples from the coco format
258
- dataset = ls.Dataset.create()
259
- dataset.add_samples_from_coco_caption(
260
- annotations_json="dataset_examples/coco_subset_128_images/captions_train2017.json",
261
- images_path="dataset_examples/coco_subset_128_images/images",
262
- )
263
-
264
- # Start the UI application on the port 8001.
265
- ls.start_gui()
266
- ```
267
-
268
- Run the script:
269
-
270
- ```
271
- python example_coco_captions.py
272
- ```
273
-
274
- Now you can inspect samples with their assigned captions in the app. When you are done,
275
- stop the app by pressing Ctrl+C in the terminal.
276
-
277
- <details>
278
- <summary>The COCO format details:</summary>
279
-
280
- ```
281
- coco_subset_128_images/
282
- ├── images/
283
- │ ├── image1.jpg
284
- │ ├── image2.jpg
285
- │ └── ...
286
- └── captions_train2017.json # Single JSON file containing all captions
287
- ```
288
-
289
- COCO uses a single JSON file containing all captions. The format consists of two main components:
290
-
291
- - Images: Defines metadata for each image in the dataset.
292
- - Annotations: Defines the captions.
293
-
294
- </details>
295
-
296
- ## 🔍 How It Works
297
-
298
- 1. Your **Python script** uses the `lightly_studio` **Dataset**.
299
- 2. The `dataset.add_samples_from_<source>` reads your images and annotations, calculates embeddings, and saves metadata to a local **`lightly_studio.db`** file (using DuckDB).
300
- 3. `lightly_studio.start_gui()` starts a **local Backend API** server.
301
- 4. This server reads from `lightly_studio.db` and serves data to the **UI Application** running in your browser (`http://localhost:8001`).
302
- 5. Images are streamed directly from your disk for display in the UI.
303
-
304
- ## 🎯 Python Interface
305
-
306
- ### Dataset
307
-
308
- #### Load Images From A Folder
309
-
310
- ```py
311
- import lightly_studio as ls
312
-
313
- dataset = ls.Dataset.create()
314
- dataset.add_samples_from_path(path="/path/to/image_dataset")
315
-
316
- ls.start_gui()
317
- ```
318
-
319
- #### ☁️ Cloud Storage Support
320
-
321
- #### Installation with Cloud Storage Support
322
-
323
- ```shell
324
- pip install lightly-studio[cloud-storage]
325
- ```
326
-
327
- #### Example: Loading Dataset from Cloud Storage
328
-
329
- ```python
330
- import lightly_studio as ls
331
-
332
- dataset = ls.Dataset.create()
333
-
334
- # Load dataset from S3
335
- dataset.add_samples_from_path(path="s3://my-bucket/path/to/images/")
336
-
337
- # You can use glob pattern in the file path
338
- dataset.add_samples_from_path(path="s3://my-bucket/path/to/images/**/*.jpg") # matches all .jpg files recursively
339
-
340
- # Load dataset from gcs
341
- dataset.add_samples_from_path(path="gs://path/to/images/")
342
-
343
- ls.start_gui()
344
- ```
345
-
346
- **Note**: Currently, cloud storage support is limited to loading images only. Annotation files (YOLO labels, COCO JSON files, etc.) cannot be loaded directly from cloud storage paths.
347
-
348
- #### Authentication
349
-
350
- **Important**: Cloud storage authentication must be configured before running LightlyStudio. The application relies on your existing cloud storage credentials and will not prompt for authentication.
351
-
352
- #### AWS S3
353
-
354
- You can use either of the following two options:
355
-
356
- - **Set environment variables manually**: Set `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` (LightlyStudio uses `s3fs` under the hood to connect to S3)
357
- - **Authenticate using AWS CLI**: Run `aws configure` (this stores your credentials in the AWS CLI configuration files, where LightlyStudio can access them)
358
-
359
- #### Google Cloud Storage
360
-
361
- You can use either of the following two options:
362
-
363
- - **Set environment variable manually**: Set `GOOGLE_APPLICATION_CREDENTIALS` pointing to your service account key file (LightlyStudio uses `gcsfs` under the hood to connect to GCS)
364
- - **Authenticate using gcloud CLI**: Run `gcloud auth application-default login` (this stores Application Default Credentials in a local file, where LightlyStudio can access them)
365
-
366
- #### Load Images With Annotations
367
-
368
- The `Dataset` currently supports:
369
-
370
- - **YOLOv8 Object Detection:** Reads `.yaml` file. Supports bounding boxes.
371
- - **COCO Object Detection:** Reads `.json` annotations. Supports bounding boxes.
372
- - **COCO Instance Segmentation:** Reads `.json` annotations. Supports instance masks in RLE (Run-Length Encoding) format.
373
-
374
- ```py
375
- # Load a dataset in YOLO format
376
- import lightly_studio as ls
377
-
378
- dataset = ls.Dataset.create()
379
- dataset.add_samples_from_yolo(
380
- data_yaml="my_yolo_dataset/data.yaml",
381
- )
382
-
383
- ls.start_gui()
384
- ```
385
-
386
- ```py
387
- # Load an object detection/instance segmentation dataset in COCO format
388
- import lightly_studio as ls
389
-
390
- dataset = ls.Dataset.create()
391
- dataset.add_samples_from_coco(
392
- annotations_json="my_coco_dataset/detections_train.json",
393
- images_path="my_coco_dataset/images",
394
- # If using instance segmentation, uncomment the next line.
395
- # annotation_type=ls.AnnotationType.INSTANCE_SEGMENTATION,
396
- )
397
-
398
- ls.start_gui()
399
- ```
400
-
401
- #### Load an Existing Dataset
402
-
403
- It is also possible to load an existing dataset by
404
-
405
- ```py
406
- import lightly_studio as ls
407
-
408
- dataset = ls.Dataset.load_or_create()
409
- ```
410
-
411
- This loads the dataset if it already exists in the `.db` file; otherwise it creates a new dataset.
412
-
413
- ### Samples
414
-
415
- The dataset consists of samples. Every sample corresponds to an image.
416
- Dataset samples can be fetched and accessed as follows,
417
- for a full list of attributes see [sample](src/lightly_studio/core/sample.py).
418
-
419
- ```py
420
- # Get all dataset samples
421
- samples = list(dataset)
422
-
423
- # Access sample attributes
424
- s = samples[0]
425
- s.sample_id # Sample ID
426
- s.file_name # Image file name
427
- s.file_path_abs # Full image file path
428
- s.tags # The list of sample tags
429
- s.metadata["key"] # dict-like access for metadata
430
-
431
- # Set sample attributes
432
- s.tags = {"tag1", "tag2"}
433
- s.metadata["key"] = 123
434
-
435
- # Adding/removing tags
436
- s.add_tag("some_tag")
437
- s.remove_tag("some_tag")
438
-
439
- ...
440
- ```
441
-
442
- ### Dataset Query
443
-
444
- You can efficiently fetch filtered dataset samples with a `DatasetQuery()` object. To get a query for an existing `dataset`:
445
-
446
- ```py
447
- query = dataset.query()
448
- ```
449
-
450
- Defining the `match`, `order_by`, and `slice` of a query sets the intended filtering. Any of them that is not required can be skipped.
451
-
452
- When the query is used to fetch samples, the order of execution is:
453
-
454
- 1. `match`
455
- 2. `order_by`
456
- 3. `slice`
457
-
458
- #### Example Query Usage
459
-
460
- ```py
461
- from lightly_studio.core.dataset_query.boolean_expression import OR
462
- from lightly_studio.core.dataset_query.order_by import OrderByField
463
- from lightly_studio.core.dataset_query.sample_field import SampleField
464
-
465
- query = dataset.match(
466
- OR(
467
- SampleField.file_name == "a",
468
- SampleField.file_name == "b",
469
- )
470
- ).order_by(
471
- OrderByField(SampleField.width).desc()
472
- ).slice(offset=10, limit=10)
473
-
474
- query.add_tag("query_result")
475
-
476
- ```
477
-
478
- <details>
479
- <summary>Advanced Example:</summary>
480
-
481
- ```py
482
- from lightly_studio.core.dataset_query.boolean_expression import AND, OR, NOT
483
- from lightly_studio.core.dataset_query.order_by import OrderByField
484
- from lightly_studio.core.dataset_query.sample_field import SampleField
485
-
486
- query = dataset.match(
487
- OR(
488
- SampleField.file_name == "a",
489
- SampleField.file_name == "b",
490
- AND(
491
- SampleField.width > 10,
492
- SampleField.width < 20,
493
- NOT(SampleField.tags.contains("dog")),
494
- ),
495
- )
496
- ).order_by(
497
- OrderByField(SampleField.width).desc()
498
- ).slice(offset=10, limit=10)
499
-
500
- query.add_tag("query_result")
501
-
502
- for sample in query:
503
- print(sample.tags)
504
-
505
- ```
506
-
507
- </details>
508
-
509
- #### Define the Query: `match`
510
-
511
- The filtering for a query can be set by:
512
-
513
- ```py
514
- query = query.match(expression)
515
- ```
516
-
517
- To create an expression for filtering on certain sample fields, the `SampleField.<field_name> <operator> <value>` syntax can be used. Available field names can be seen in [`SampleField`](src/lightly_studio/core/dataset_query/sample_field.py).
518
-
519
- <details>
520
- <summary>SampleField Examples:</summary>
521
-
522
- ```py
523
- from lightly_studio.core.dataset_query.sample_field import SampleField
524
-
525
- # Ordinal fields: <, <=, >, >=, ==, !=
526
-
527
- expr = SampleField.height >= 10 # All samples with images that are taller than 9 pixels
528
- expr = SampleField.width == 10 # All samples with images that are exactly 10 pixels wide
529
- expr = SampleField.created_at > datetime # All samples created after datetime (actual datetime object)
530
-
531
- # String fields: ==, !=
532
- expr = SampleField.file_name == "some" # All samples with "some" as file name
533
- expr = SampleField.file_path_abs != "other" # All samples that are not having "other" as file_path
534
-
535
- # Tags: contains()
536
- expr = SampleField.tags.contains("dog") # All samples that contain the tag "dog"
537
-
538
- # Assign any of the previous expressions to a query:
539
- query = query.match(expr)
540
- ```
541
-
542
- </details>
543
-
544
- Filters on individual fields can be combined flexibly to create more complex match expressions. For this, the boolean operators [`AND`](src/lightly_studio/core/dataset_query/boolean_expression.py), [`OR`](src/lightly_studio/core/dataset_query/boolean_expression.py), and [`NOT`](src/lightly_studio/core/dataset_query/boolean_expression.py) are available. Boolean operators can be nested arbitrarily.
545
-
546
- <details>
547
- <summary>Boolean Examples:</summary>
548
-
549
- ```py
550
- from lightly_studio.core.dataset_query.boolean_expression import AND, OR, NOT
551
- from lightly_studio.core.dataset_query.sample_field import SampleField
552
-
553
- # All samples with images that are between 10 and 20 pixels wide
554
- expr = AND(
555
- SampleField.width > 10,
556
- SampleField.width < 20
557
- )
558
-
559
- # All samples with file names that are either "a" or "b"
560
- expr = OR(
561
- SampleField.file_name == "a",
562
- SampleField.file_name == "b"
563
- )
564
-
565
- # All samples which do not contain a tag "dog"
566
- expr = NOT(SampleField.tags.contains("dog"))
567
-
568
- # All samples for a nested expression
569
- expr = OR(
570
- SampleField.file_name == "a",
571
- SampleField.file_name == "b",
572
- AND(
573
- SampleField.width > 10,
574
- SampleField.width < 20,
575
- NOT(
576
- SampleField.tags.contains("dog")
577
- ),
578
- ),
579
- )
580
-
581
- # Assign any of the previous expressions to a query:
582
- query = query.match(expr)
583
- ```
584
-
585
- </details>
586
-
587
- #### Define the Query: `order_by`
588
-
589
- The sort order of a query can be set by:
590
-
591
- ```py
592
- query = query.order_by(expression)
593
- ```
594
-
595
- The order expression can be defined by `OrderByField(SampleField.<field_name>).<order_direction>()`.
596
-
597
- <details>
598
- <summary>OrderByField Examples:</summary>
599
-
600
- ```py
601
- from lightly_studio.core.dataset_query.order_by import OrderByField
602
- from lightly_studio.core.dataset_query.sample_field import SampleField
603
-
604
- # Sort the query by the width of the image in ascending order
605
- expr = OrderByField(SampleField.width)
606
- expr = OrderByField(SampleField.width).asc()
607
-
608
- # Sort the query by the file name in descending order
609
- expr = OrderByField(SampleField.file_name).desc()
610
-
611
- # Assign any of the previous expressions to a query:
612
- query = query.order_by(expr)
613
- ```
614
-
615
- </details>
616
-
617
- #### Define the Query: `slice`
618
-
619
- The slicing of a query can be set by:
620
-
621
- ```py
622
- query = query.slice(offset, limit)
623
- # OR
624
- query = query[offset:stop]
625
- ```
626
-
627
- Both are different syntax for the same operation.
628
-
629
- <details>
630
- <summary>Slice Examples:</summary>
631
-
632
- ```py
633
- # Slice 2:5
634
- query = query.slice(offset=2, limit=3)
635
- query = query[2:5]
636
-
637
- # Slice :5
638
- query = query.slice(limit=5)
639
- query = query[:5]
640
-
641
- # Slice 5:
642
- query = query.slice(offset=5)
643
- query = query[5:]
644
- ```
645
-
646
- </details>
647
-
648
- #### Access the Samples
649
-
650
- To access the filtered samples two possibilities are available: iterating over the query object or calling the `to_list()` method.
651
-
652
- **Iterating over the query:**
653
-
654
- ```py
655
- query = dataset.query().match(match_expression).order_by(order_by_expression).slice(offset,limit)
656
-
657
- samples = []
658
- for sample in query:
659
- samples.append(sample)
660
- ```
661
-
662
- **Get all samples as list:**
663
-
664
- ```py
665
- query = dataset.query().match(match_expression).order_by(order_by_expression).slice(offset,limit)
666
-
667
- samples = query.to_list()
668
- ```
669
-
670
- In some use cases, one might want to assign a tag to the samples that are the result of a query:
671
-
672
- ```py
673
- query.add_tag("tag_name")
674
- ```
675
-
676
- #### Export Samples
677
-
678
- Currently, exporting to the COCO object detection format is supported and only annotations
679
- of type object detection are exported. The following example exports the samples in the query
680
- to a COCO JSON file named `coco_export.json`:
681
-
682
- ```py
683
- query.export().to_coco_object_detections()
684
- ```
685
-
686
- ### Examples
687
-
688
- #### Add Custom Metadata
689
-
690
- Attach values to custom fields for every sample.
691
-
692
- ```py
693
- import lightly_studio as ls
694
-
695
- # Load your dataset
696
- dataset = ls.Dataset.create()
697
- dataset.add_samples_from_path(path="/path/to/image_dataset")
698
-
699
- # Attach metadata
700
- for sample in dataset:
701
-     sample.metadata["my_metadata"] = f"Example metadata field for {sample.file_name}"
702
-     sample.metadata["my_dict"] = {"my_int_key": 10, "my_bool_key": True}
703
-
704
- # View metadata in GUI
705
- ls.start_gui()
706
- ```
707
-
708
- #### Tags
709
-
710
- You can easily mark subsets of your data with tags.
711
-
712
- ```py
713
- import lightly_studio as ls
714
-
715
- # Load your dataset
716
- dataset = ls.Dataset.create()
717
- dataset.add_samples_from_path(path="/path/to/image_dataset")
718
-
719
- # Tag the first 10 samples:
720
- query = dataset.query()[:10]
721
- query.add_tag("some_tag")
722
- ```
723
-
724
- Find the samples that carry a given tag as follows:
725
-
726
- ```py
727
- import lightly_studio as ls
- from lightly_studio.core.dataset_query.sample_field import SampleField
728
-
729
- # Load your dataset
730
- dataset = ls.Dataset.create()
731
- dataset.add_samples_from_path(path="/path/to/image_dataset")
732
-
733
- # Get all samples that contain the tag "dog"
734
- query = dataset.query().match(SampleField.tags.contains("dog"))
735
- samples = query.to_list()
736
- ```
737
-
738
- ### Selection
739
-
740
- As a premium feature, LightlyStudio offers advanced methods for subselecting dataset
741
- samples.
742
-
743
- **Prerequisites:** The selection functionality requires a valid LightlyStudio license key.
744
- Set the `LIGHTLY_STUDIO_LICENSE_KEY` environment variable before using selection features:
745
-
746
- ```bash
747
- # On Linux/MacOS
748
- export LIGHTLY_STUDIO_LICENSE_KEY="license_key_here"
749
-
750
- # On Windows (PowerShell)
751
- $env:LIGHTLY_STUDIO_LICENSE_KEY="license_key_here"
752
- ```
753
-
754
- Alternatively, set it inside your Python script:
755
-
756
- ```py
757
- import os
758
- os.environ["LIGHTLY_STUDIO_LICENSE_KEY"] = "license_key_here"
759
- ```
760
-
761
- Or in a `.env` file:
762
-
763
- ```
764
- LIGHTLY_STUDIO_LICENSE_KEY="license_key_here"
765
- ```
766
-
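Because the key is read from the environment, a script can fail early with a clear message when the variable is missing. The check below is a minimal sketch using only the standard library. `require_license_key` is a hypothetical helper, not a LightlyStudio function, and how the library itself reports a missing key is not covered here.

```python
import os


def require_license_key(env=os.environ) -> str:
    """Return the license key from the environment or raise a clear error.

    Hypothetical helper for illustration; it is not part of lightly_studio.
    """
    key = env.get("LIGHTLY_STUDIO_LICENSE_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "LIGHTLY_STUDIO_LICENSE_KEY is not set; "
            "selection features need a license key."
        )
    return key
```

Calling `require_license_key()` at the top of a selection script surfaces a configuration problem before any dataset is loaded.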
767
- #### Diversity Selection
768
-
769
- Diversity selection can be configured directly from a `DatasetQuery`. The example below showcases a simple case of selecting diverse samples.
770
-
771
- ```py
772
- import lightly_studio as ls
773
-
774
- # Load your dataset
775
- dataset = ls.Dataset.load_or_create()
776
- dataset.add_samples_from_path(path="/path/to/image_dataset")
777
-
778
- # Select a diverse subset of 10 samples.
779
- dataset.query().selection().diverse(
780
-     n_samples_to_select=10,
781
-     selection_result_tag_name="diverse_selection",
782
- )
783
-
784
- ls.start_gui()
785
- ```
786
-
787
- #### Metadata Weighting Selection
788
-
789
- You can select samples based on the values of a metadata field. The example below showcases a simple case of selecting samples with the highest metadata value.
790
-
791
- ```py
792
- import lightly_studio as ls
793
-
794
- # Load your dataset
795
- dataset = ls.Dataset.load_or_create()
796
- dataset.add_samples_from_path(path="/path/to/image_dataset")
797
- # Compute and store 'typicality' metadata.
798
- dataset.compute_typicality_metadata(metadata_name="typicality")
799
-
800
- # Select the 5 samples with the highest 'typicality' scores.
801
- dataset.query().selection().metadata_weighting(
802
-     n_samples_to_select=5,
802
-     selection_result_tag_name="metadata_weighting_selection",
803
-     metadata_key="typicality",
805
- )
806
- ```
807
-
808
- #### Selection Based on Multiple Strategies
809
-
810
- You can configure multiple strategies. The selection takes all of them into account at the same time, each weighted by its `strength` parameter.
811
-
812
- ```py
813
- import lightly_studio as ls
814
- from lightly_studio.selection.selection_config import (
815
-     MetadataWeightingStrategy,
815
-     EmbeddingDiversityStrategy,
817
- )
818
-
819
- # Load your dataset
820
- dataset = ls.Dataset.load_or_create()
821
- dataset.add_samples_from_path(path="/path/to/image_dataset")
822
- # Compute typicality and store it as `typicality` metadata
823
- dataset.compute_typicality_metadata(metadata_name="typicality")
824
-
825
- # Select 10 samples by combining typicality and diversity, diversity having double the strength.
826
- dataset.query().selection().multi_strategies(
827
-     n_samples_to_select=10,
827
-     selection_result_tag_name="multi_strategy_selection",
828
-     selection_strategies=[
829
-         MetadataWeightingStrategy(metadata_key="typicality", strength=1.0),
830
-         EmbeddingDiversityStrategy(embedding_model_name="my_model_name", strength=2.0),
831
-     ],
833
- )
834
- ```
835
-
836
- #### Exporting Selected Samples
837
-
838
- The selected sample paths can be exported via the GUI, or by a script:
839
-
840
- ```py
841
- import lightly_studio as ls
842
- from lightly_studio.core.dataset_query.sample_field import SampleField
843
-
844
- dataset = ls.Dataset.load("my-dataset")
845
- selected_samples = (
846
-     dataset.match(SampleField.tags.contains("diverse_selection")).to_list()
847
- )
848
-
849
- with open("export.txt", "w") as f:
850
-     for sample in selected_samples:
850
-         f.write(f"{sample.file_path_abs}\n")
852
- ```
853
-
854
- ## 📚 **FAQ**
855
-
856
- ### Does LightlyStudio persist the datasets?
857
-
858
- Yes, the information about datasets is persisted in a database file. You can inspect
859
- it after the dataset is processed. Use `Dataset.load()` to load a dataset from a pre-existing
860
- database.
861
-
862
- ### Can I change the database path?
863
-
864
- Yes, the database can be selected as follows:
865
- ```py
866
- import lightly_studio as ls
867
-
868
- ls.db_manager.connect(db_file="custom.db")
869
- ```
870
-
871
- ### Can I use LightlyStudio from two scripts in parallel?
872
-
873
- Only one script can run at a time, because the app uses a database lock to protect data integrity.
874
-
875
- ### Can I change the API backend host and port?
876
-
877
- Yes, by setting environment variables. Set `LIGHTLY_STUDIO_HOST` to change the host and
877
- `LIGHTLY_STUDIO_PORT` to change the port. Note that if the port is unavailable
878
- at runtime, the app uses a random port number.
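
The port fallback described above can be pictured with a short standard-library sketch. It is not LightlyStudio's implementation: `port_is_free`, `resolve_host_and_port`, the default host `localhost`, and the default port `8001` are assumptions made for illustration. Only the variable names `LIGHTLY_STUDIO_HOST` and `LIGHTLY_STUDIO_PORT` come from the text.

```python
import os
import socket


def port_is_free(host: str, port: int) -> bool:
    """Return True if a TCP socket can bind to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def resolve_host_and_port(env=os.environ):
    """Read host and port from the environment, falling back to a free port.

    Hypothetical helper; the defaults below are assumed, not documented values.
    """
    host = env.get("LIGHTLY_STUDIO_HOST", "localhost")  # assumed default
    port = int(env.get("LIGHTLY_STUDIO_PORT", "8001"))  # assumed default
    if not port_is_free(host, port):
        # Binding to port 0 lets the operating system pick a free port.
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.bind((host, 0))
            port = sock.getsockname()[1]
    return host, port
```

A script that needs the final address should therefore read it from the running app rather than assume the requested port was granted.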