lightly-studio 0.3.1__py3-none-any.whl → 0.3.2__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.


This version of lightly-studio might be problematic.

Files changed (133)
  1. lightly_studio/__init__.py +4 -4
  2. lightly_studio/api/app.py +1 -1
  3. lightly_studio/api/routes/api/annotation.py +6 -16
  4. lightly_studio/api/routes/api/annotation_label.py +2 -5
  5. lightly_studio/api/routes/api/annotation_task.py +4 -5
  6. lightly_studio/api/routes/api/classifier.py +2 -5
  7. lightly_studio/api/routes/api/dataset.py +2 -3
  8. lightly_studio/api/routes/api/dataset_tag.py +2 -3
  9. lightly_studio/api/routes/api/metadata.py +2 -4
  10. lightly_studio/api/routes/api/metrics.py +2 -6
  11. lightly_studio/api/routes/api/sample.py +5 -13
  12. lightly_studio/api/routes/api/settings.py +2 -6
  13. lightly_studio/api/routes/images.py +6 -6
  14. lightly_studio/core/add_samples.py +383 -0
  15. lightly_studio/core/dataset.py +250 -362
  16. lightly_studio/core/dataset_query/__init__.py +0 -0
  17. lightly_studio/core/dataset_query/boolean_expression.py +67 -0
  18. lightly_studio/core/dataset_query/dataset_query.py +211 -0
  19. lightly_studio/core/dataset_query/field.py +113 -0
  20. lightly_studio/core/dataset_query/field_expression.py +79 -0
  21. lightly_studio/core/dataset_query/match_expression.py +23 -0
  22. lightly_studio/core/dataset_query/order_by.py +79 -0
  23. lightly_studio/core/dataset_query/sample_field.py +28 -0
  24. lightly_studio/core/dataset_query/tags_expression.py +46 -0
  25. lightly_studio/core/sample.py +159 -32
  26. lightly_studio/core/start_gui.py +35 -0
  27. lightly_studio/dataset/edge_embedding_generator.py +13 -8
  28. lightly_studio/dataset/embedding_generator.py +2 -3
  29. lightly_studio/dataset/embedding_manager.py +74 -6
  30. lightly_studio/dataset/fsspec_lister.py +275 -0
  31. lightly_studio/dataset/loader.py +49 -30
  32. lightly_studio/dataset/mobileclip_embedding_generator.py +6 -4
  33. lightly_studio/db_manager.py +145 -0
  34. lightly_studio/dist_lightly_studio_view_app/_app/immutable/assets/SelectableSvgGroup.BBm0IWdq.css +1 -0
  35. lightly_studio/dist_lightly_studio_view_app/_app/immutable/assets/SelectableSvgGroup.BNTuXSAe.css +1 -0
  36. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/2O287xak.js +3 -0
  37. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/{O-EABkf9.js → 7YNGEs1C.js} +1 -1
  38. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/BBoGk9hq.js +1 -0
  39. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/BRnH9v23.js +92 -0
  40. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/Bg1Y5eUZ.js +1 -0
  41. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/{DOlTMNyt.js → BqBqV92V.js} +1 -1
  42. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/C0JiMuYn.js +1 -0
  43. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/{DjfY96ND.js → C98Hk3r5.js} +1 -1
  44. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/{r64xT6ao.js → CG0dMCJi.js} +1 -1
  45. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/{C8I8rFJQ.js → Ccq4ZD0B.js} +1 -1
  46. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/Cpy-nab_.js +1 -0
  47. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/{Bu7uvVrG.js → Crk-jcvV.js} +1 -1
  48. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/Cs31G8Qn.js +1 -0
  49. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/CsKrY2zA.js +1 -0
  50. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/{x9G_hzyY.js → Cur71c3O.js} +1 -1
  51. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/CzgC3GFB.js +1 -0
  52. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/D8GZDMNN.js +1 -0
  53. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/DFRh-Spp.js +1 -0
  54. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/{BylOuP6i.js → DRZO-E-T.js} +1 -1
  55. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/{l7KrR96u.js → DcGCxgpH.js} +1 -1
  56. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/{Bsi3UGy5.js → Df3aMO5B.js} +1 -1
  57. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/{hQVEETDE.js → DkR_EZ_B.js} +1 -1
  58. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/DqUGznj_.js +1 -0
  59. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/KpAtIldw.js +1 -0
  60. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/M1Q1F7bw.js +4 -0
  61. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/{CDnpyLsT.js → OH7-C_mc.js} +1 -1
  62. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/{D6su9Aln.js → gLNdjSzu.js} +1 -1
  63. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/i0ZZ4z06.js +1 -0
  64. lightly_studio/dist_lightly_studio_view_app/_app/immutable/entry/app.BI-EA5gL.js +2 -0
  65. lightly_studio/dist_lightly_studio_view_app/_app/immutable/entry/start.CcsRl3cZ.js +1 -0
  66. lightly_studio/dist_lightly_studio_view_app/_app/immutable/nodes/0.BbO4Zc3r.js +1 -0
  67. lightly_studio/dist_lightly_studio_view_app/_app/immutable/nodes/{1.B4rNYwVp.js → 1._I9GR805.js} +1 -1
  68. lightly_studio/dist_lightly_studio_view_app/_app/immutable/nodes/10.J2RBFrSr.js +1 -0
  69. lightly_studio/dist_lightly_studio_view_app/_app/immutable/nodes/12.Cmqj25a-.js +1 -0
  70. lightly_studio/dist_lightly_studio_view_app/_app/immutable/nodes/2.C45iKJHA.js +6 -0
  71. lightly_studio/dist_lightly_studio_view_app/_app/immutable/nodes/{3.CWHpKonm.js → 3.w9g4AcAx.js} +1 -1
  72. lightly_studio/dist_lightly_studio_view_app/_app/immutable/nodes/{4.OUWOLQeV.js → 4.BBI8KwnD.js} +1 -1
  73. lightly_studio/dist_lightly_studio_view_app/_app/immutable/nodes/5.huHuxdiF.js +1 -0
  74. lightly_studio/dist_lightly_studio_view_app/_app/immutable/nodes/6.CrbkRPam.js +1 -0
  75. lightly_studio/dist_lightly_studio_view_app/_app/immutable/nodes/7.FomEdhD6.js +1 -0
  76. lightly_studio/dist_lightly_studio_view_app/_app/immutable/nodes/8.Cb_ADSLk.js +1 -0
  77. lightly_studio/dist_lightly_studio_view_app/_app/immutable/nodes/{9.CPu3CiBc.js → 9.CajIG5ce.js} +1 -1
  78. lightly_studio/dist_lightly_studio_view_app/_app/version.json +1 -1
  79. lightly_studio/dist_lightly_studio_view_app/index.html +14 -14
  80. lightly_studio/examples/example.py +13 -12
  81. lightly_studio/examples/example_coco.py +13 -0
  82. lightly_studio/examples/example_metadata.py +83 -98
  83. lightly_studio/examples/example_selection.py +7 -19
  84. lightly_studio/examples/example_split_work.py +12 -36
  85. lightly_studio/examples/{example_v2.py → example_yolo.py} +3 -4
  86. lightly_studio/models/annotation/annotation_base.py +7 -8
  87. lightly_studio/models/annotation/instance_segmentation.py +8 -8
  88. lightly_studio/models/annotation/object_detection.py +4 -4
  89. lightly_studio/models/dataset.py +6 -2
  90. lightly_studio/models/sample.py +10 -3
  91. lightly_studio/resolvers/dataset_resolver.py +10 -0
  92. lightly_studio/resolvers/embedding_model_resolver.py +22 -0
  93. lightly_studio/resolvers/sample_resolver.py +53 -9
  94. lightly_studio/resolvers/tag_resolver.py +23 -0
  95. lightly_studio/selection/select.py +55 -46
  96. lightly_studio/selection/select_via_db.py +23 -19
  97. lightly_studio/selection/selection_config.py +6 -3
  98. lightly_studio/services/annotations_service/__init__.py +4 -0
  99. lightly_studio/services/annotations_service/update_annotation.py +21 -32
  100. lightly_studio/services/annotations_service/update_annotation_bounding_box.py +36 -0
  101. lightly_studio-0.3.2.dist-info/METADATA +689 -0
  102. {lightly_studio-0.3.1.dist-info → lightly_studio-0.3.2.dist-info}/RECORD +104 -91
  103. lightly_studio/api/db.py +0 -133
  104. lightly_studio/dist_lightly_studio_view_app/_app/immutable/assets/SelectableSvgGroup.OwPEPQZu.css +0 -1
  105. lightly_studio/dist_lightly_studio_view_app/_app/immutable/assets/SelectableSvgGroup.b653GmVf.css +0 -1
  106. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/B2FVR0s0.js +0 -1
  107. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/B9zumHo5.js +0 -1
  108. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/BJXwVxaE.js +0 -1
  109. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/Bx1xMsFy.js +0 -1
  110. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/CcaPhhk3.js +0 -1
  111. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/CvOmgdoc.js +0 -93
  112. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/CxtLVaYz.js +0 -3
  113. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/D5-A_Ffd.js +0 -4
  114. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/D6RI2Zrd.js +0 -1
  115. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/D98V7j6A.js +0 -1
  116. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/DIRAtgl0.js +0 -1
  117. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/DjUWrjOv.js +0 -1
  118. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/XO7A28GO.js +0 -1
  119. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/nAHhluT7.js +0 -1
  120. lightly_studio/dist_lightly_studio_view_app/_app/immutable/chunks/vC4nQVEB.js +0 -1
  121. lightly_studio/dist_lightly_studio_view_app/_app/immutable/entry/app.CjnvpsmS.js +0 -2
  122. lightly_studio/dist_lightly_studio_view_app/_app/immutable/entry/start.0o1H7wM9.js +0 -1
  123. lightly_studio/dist_lightly_studio_view_app/_app/immutable/nodes/0.XRq_TUwu.js +0 -1
  124. lightly_studio/dist_lightly_studio_view_app/_app/immutable/nodes/10.DfBwOEhN.js +0 -1
  125. lightly_studio/dist_lightly_studio_view_app/_app/immutable/nodes/12.CwF2_8mP.js +0 -1
  126. lightly_studio/dist_lightly_studio_view_app/_app/immutable/nodes/2.CS4muRY-.js +0 -6
  127. lightly_studio/dist_lightly_studio_view_app/_app/immutable/nodes/5.Dm6t9F5W.js +0 -1
  128. lightly_studio/dist_lightly_studio_view_app/_app/immutable/nodes/6.Bw5ck4gK.js +0 -1
  129. lightly_studio/dist_lightly_studio_view_app/_app/immutable/nodes/7.CF0EDTR6.js +0 -1
  130. lightly_studio/dist_lightly_studio_view_app/_app/immutable/nodes/8.Cw30LEcV.js +0 -1
  131. lightly_studio-0.3.1.dist-info/METADATA +0 -520
  132. /lightly_studio/dist_lightly_studio_view_app/_app/immutable/assets/{OpenSans- → OpenSans-Medium.DVUZMR_6.ttf} +0 -0
  133. {lightly_studio-0.3.1.dist-info → lightly_studio-0.3.2.dist-info}/WHEEL +0 -0
@@ -0,0 +1,689 @@
1
+ Metadata-Version: 2.4
2
+ Name: lightly-studio
3
+ Version: 0.3.2
4
+ Summary: LightlyStudio is a lightweight, fast, and easy-to-use data exploration tool for data scientists and engineers.
5
+ Classifier: Operating System :: MacOS :: MacOS X
6
+ Classifier: Operating System :: Microsoft :: Windows
7
+ Classifier: Operating System :: POSIX :: Linux
8
+ Classifier: Programming Language :: Python :: 3
9
+ Classifier: Programming Language :: Python :: 3.8
10
+ Classifier: Programming Language :: Python :: 3.9
11
+ Classifier: Programming Language :: Python :: 3.10
12
+ Classifier: Programming Language :: Python :: 3.11
13
+ Requires-Python: >=3.8
14
+ Requires-Dist: annotated-types==0.7.0
15
+ Requires-Dist: duckdb-engine<0.17,>=0.15.0
16
+ Requires-Dist: duckdb<1.3,>=1.2.2
17
+ Requires-Dist: environs<12.0.0
18
+ Requires-Dist: eval-type-backport>=0.2.2
19
+ Requires-Dist: fastapi>=0.115.5
20
+ Requires-Dist: faster-coco-eval>=1.6.5
21
+ Requires-Dist: fsspec>=2023.1.0
22
+ Requires-Dist: labelformat>=0.1.7
23
+ Requires-Dist: lightly-mundig==0.1.3
24
+ Requires-Dist: open-clip-torch>=2.20.0
25
+ Requires-Dist: python-multipart>=0.0.20
26
+ Requires-Dist: scikit-learn==1.3.2
27
+ Requires-Dist: sqlmodel>=0.0.22
28
+ Requires-Dist: torchmetrics>=1.5.2
29
+ Requires-Dist: tqdm>=4.65.0
30
+ Requires-Dist: typing-extensions>=4.12.2
31
+ Requires-Dist: uvicorn>=0.32.1
32
+ Requires-Dist: xxhash>=3.5.0
33
+ Description-Content-Type: text/markdown
34
+
35
+ <div align="center">
36
+ <p align="center">
37
+
38
+ <!-- prettier-ignore -->
39
+ <img src="https://cdn.prod.website-files.com/62cd5ce03261cba217188442/66dac501a8e9a90495970876_Logo%20dark-short-p-800.png" height="50px">
40
+
41
+ **The open-source tool for curating datasets**
42
+
43
+ ---
44
+
45
+ [![PyPI python](https://img.shields.io/pypi/pyversions/lightly-studio)](https://pypi.org/project/lightly-studio)
46
+ [![PyPI version](https://badge.fury.io/py/lightly-studio.svg)](https://pypi.org/project/lightly-studio)
47
+ [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
48
+
49
+ </p>
50
+ </div>
51
+
52
+ # 🚀 Welcome to LightlyStudio!
53
+
54
+ We at **[Lightly](https://lightly.ai)** created **LightlyStudio**, an open-source tool
55
+ designed to supercharge your data curation workflows for computer vision datasets. Explore
56
+ your data, visualize annotations and crops, tag samples, and export curated lists to improve
57
+ your machine learning pipelines. And much more!
58
+
59
+ LightlyStudio runs entirely locally on your machine, keeping your data private. It consists
60
+ of a Python library for indexing your data and a web-based UI for visualization and curation.
61
+
62
+ ## ✨ Core Workflow
63
+
64
+ Using LightlyStudio typically involves these steps:
65
+
66
+ 1. **Index Your Dataset:** Run a Python script using the `lightly_studio` library to process your local dataset (images and annotations) and save metadata into a local `lightly_studio.db` file.
67
+ 2. **Launch the UI:** The script then starts a local web server.
68
+ 3. **Explore & Curate:** Use the UI to visualize images, annotations, and object crops. Filter and search your data (experimental text search available). Apply tags to interesting samples (e.g., "mislabeled", "review").
69
+ 4. **Export Curated Data:** Export information (like filenames) for your tagged samples from the UI to use downstream.
70
+ 5. **Stop the Server:** Close the terminal running the script (Ctrl+C) when done.
71
+
72
+ <p align="center">
73
+ <img alt="LightlyStudio Sample Grid View" src="https://storage.googleapis.com/lightly-public/studio/screenshot_grid_view.jpg" width="70%">
74
+ <br/>
75
+ <em>Visualize your dataset samples with annotations in the grid view.</em>
76
+ </p>
77
+ <p align="center">
78
+ <img alt="LightlyStudio Annotation Crop View" src="https://storage.googleapis.com/lightly-public/studio/screenshot_annotation_view.jpg" width="70%">
79
+ <br/>
80
+ <em>Switch to the annotation view to inspect individual object crops easily.</em>
81
+ </p>
82
+ <p align="center">
83
+ <img alt="LightlyStudio Sample Detail View" src="https://storage.googleapis.com/lightly-public/studio/screenshot_detail_view.jpg" width="70%">
84
+ <br/>
85
+ <em>Inspect individual samples in detail, viewing all annotations and metadata.</em>
86
+ </p>
87
+
88
+ ## 🎯 Features
89
+
90
+ - **Local Web GUI:** Explore and curate your dataset in your browser. Works completely
91
+ offline; your data never leaves your machine.
92
+ - **Flexible Input Formats:** Load your image dataset from a folder, or with annotations from
93
+ a number of popular formats such as COCO or YOLO.
94
+ - **Metadata:** Attach your custom metadata to every sample.
95
+ - **Tags:** Mark subsets of your dataset for later use.
96
+ - **Embeddings:** Run similarity search queries on your data.
97
+ - **Selection:** Run advanced selection algorithms to tag a subset of your data.
98
+
99
+ ## 💻 Installation
100
+
101
+ Ensure you have **Python 3.8 or higher**. We strongly recommend using a virtual environment.
102
+
103
+ The library is OS-independent and works on Windows, Linux, and macOS.
104
+
105
+ ```shell
106
+ # 1. Create and activate a virtual environment (Recommended)
107
+ # On Linux/macOS:
108
+ python3 -m venv venv
109
+ source venv/bin/activate
110
+
111
+ # On Windows:
112
+ python -m venv venv
113
+ .\venv\Scripts\activate
114
+
115
+ # 2. Install LightlyStudio
116
+ pip install lightly_studio
117
+ ```
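Before installing, it can help to confirm that the active interpreter meets the stated minimum of Python 3.8. The following is a minimal sketch; the helper name `python_is_supported` is made up for illustration and is not part of `lightly_studio`.

```python
import sys

# Minimum version taken from the installation note above.
MINIMUM_PYTHON = (3, 8)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the given (major, minor, ...) version is at least 3.8."""
    return tuple(version_info[:2]) >= MINIMUM_PYTHON


if python_is_supported():
    print("Python version is supported.")
else:
    print("Please upgrade to Python 3.8 or higher.")
```

Run it inside the activated virtual environment so that the check applies to the interpreter that `pip` installs into.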
118
+
119
+ ## **Quickstart**
120
+
121
+ Download the dataset and run a quickstart script to load your dataset and launch the app.
122
+
123
+ ### YOLO Object Detection
124
+
125
+ To run an example using a YOLO dataset, clone the example repository and run the example script below:
126
+
127
+ ```shell
128
+ git clone https://github.com/lightly-ai/dataset_examples dataset_examples
129
+ ```
130
+
131
+ **`example_yolo.py` script to explore the dataset:**
132
+
133
+ ```python
134
+ from pathlib import Path
135
+
136
+ import lightly_studio as ls
137
+
138
+ data_yaml_path = Path(__file__).resolve().parent / "data.yaml"
139
+
140
+ # Create a dataset and add the samples from the yolo format
141
+ dataset = ls.Dataset.create()
142
+ dataset.add_samples_from_yolo(
143
+ data_yaml=data_yaml_path,
144
+ input_split="test",
145
+ )
146
+
147
+ # Start the UI application on port 8001.
148
+ ls.start_gui()
149
+
150
+ ```
151
+
152
+ <details>
153
+ <summary>The YOLO format details:</summary>
154
+
155
+ ```
156
+ road_signs_yolo/
157
+ ├── train/
158
+ │ ├── images/
159
+ │ │ ├── image1.jpg
160
+ │ │ ├── image2.jpg
161
+ │ │ └── ...
162
+ │ └── labels/
163
+ │ ├── image1.txt
164
+ │ ├── image2.txt
165
+ │ └── ...
166
+ ├── valid/ (optional)
167
+ │ ├── images/
168
+ │ │ └── ...
169
+ │ └── labels/
170
+ │ └── ...
171
+ └── data.yaml
172
+ ```
173
+
174
+ Each label file should contain YOLO format annotations (one per line):
175
+
176
+ ```
177
+ <class> <x_center> <y_center> <width> <height>
178
+ ```
179
+
180
+ Where coordinates are normalized between 0 and 1.
181
+
182
+ </details>
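To make the normalized label format concrete, here is a small sketch that converts one YOLO label line into a pixel-space box. It uses only the standard library; `PixelBox` and `parse_yolo_line` are hypothetical names for illustration and are not part of `lightly_studio`.

```python
from typing import NamedTuple


class PixelBox(NamedTuple):
    class_id: int
    x_min: float
    y_min: float
    width: float
    height: float


def parse_yolo_line(line: str, image_width: int, image_height: int) -> PixelBox:
    """Convert one normalized YOLO label line into a pixel-space box."""
    class_id, x_center, y_center, width, height = line.split()
    # Scale the normalized size to pixels.
    w = float(width) * image_width
    h = float(height) * image_height
    # YOLO stores the box center; shift by half the size to get the top-left corner.
    x_min = float(x_center) * image_width - w / 2
    y_min = float(y_center) * image_height - h / 2
    return PixelBox(int(class_id), x_min, y_min, w, h)


box = parse_yolo_line("0 0.5 0.5 0.25 0.5", image_width=640, image_height=480)
print(box)
```

The example line describes a box centered in a 640x480 image that covers a quarter of its width and half of its height.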
183
+
184
+ ### COCO Instance Segmentation
185
+
186
+ To run an instance segmentation example using a COCO dataset, clone the example repository and run the example script below:
187
+
188
+ ```shell
189
+ git clone https://github.com/lightly-ai/dataset_examples dataset_examples
190
+ ```
191
+
192
+ **`example_coco.py` script to explore the dataset:**
193
+
194
+ ```python
195
+ from pathlib import Path
196
+
197
+ import lightly_studio as ls
198
+
199
+ current_dir = Path(__file__).resolve().parent
200
+
201
+ # Create a dataset and add the samples from the coco format
202
+ dataset = ls.Dataset.create()
203
+ dataset.add_samples_from_coco(
204
+ annotations_json=current_dir / "instances_train2017.json",
205
+ images_path=current_dir / "images",
206
+ annotation_type=ls.AnnotationType.INSTANCE_SEGMENTATION,
207
+ )
208
+
209
+ # Start the UI application on port 8001.
210
+ ls.start_gui()
211
+ ```
212
+
213
+ <details>
214
+ <summary>The COCO format details:</summary>
215
+
216
+ ```
217
+ coco_subset_128_images/
218
+ ├── images/
219
+ │ ├── image1.jpg
220
+ │ ├── image2.jpg
221
+ │ └── ...
222
+ └── instances_train2017.json # Single JSON file containing all annotations
223
+ ```
224
+
225
+ COCO uses a single JSON file containing all annotations. The format consists of three main components:
226
+
227
+ - Images: Defines metadata for each image in the dataset.
228
+ - Categories: Defines the object classes.
229
+ - Annotations: Defines object instances.
230
+
231
+ </details>
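The three components of a COCO file are linked by IDs: each annotation refers to an image through `image_id` and to a class through `category_id`. The sketch below builds a tiny, made-up COCO document and groups the class names by image file. It uses only the standard library and does not go through `lightly_studio`; the file names and boxes are illustrative.

```python
import json
from collections import defaultdict

# A tiny, made-up COCO document. Boxes use the COCO [x, y, width, height] layout.
coco_json = """
{
  "images": [
    {"id": 1, "file_name": "image1.jpg", "width": 640, "height": 480},
    {"id": 2, "file_name": "image2.jpg", "width": 640, "height": 480}
  ],
  "categories": [{"id": 1, "name": "dog"}, {"id": 2, "name": "cat"}],
  "annotations": [
    {"id": 10, "image_id": 1, "category_id": 1, "bbox": [10, 20, 100, 50]},
    {"id": 11, "image_id": 1, "category_id": 2, "bbox": [200, 40, 80, 80]},
    {"id": 12, "image_id": 2, "category_id": 1, "bbox": [5, 5, 30, 60]}
  ]
}
"""

coco = json.loads(coco_json)

# Lookup tables from ID to human-readable value.
category_names = {c["id"]: c["name"] for c in coco["categories"]}
file_names = {img["id"]: img["file_name"] for img in coco["images"]}

# Resolve both references for every annotation and group by image file.
labels_per_image = defaultdict(list)
for ann in coco["annotations"]:
    image_file = file_names[ann["image_id"]]
    labels_per_image[image_file].append(category_names[ann["category_id"]])

for image_file, labels in labels_per_image.items():
    print(image_file, labels)
```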
232
+
233
+ ## 🔍 How It Works
234
+
235
+ 1. Your **Python script** uses the `lightly_studio` **Dataset**.
236
+ 2. The `dataset.add_samples_from_<source>` reads your images and annotations, calculates embeddings, and saves metadata to a local **`lightly_studio.db`** file (using DuckDB).
237
+ 3. `lightly_studio.start_gui()` starts a **local Backend API** server.
238
+ 4. This server reads from `lightly_studio.db` and serves data to the **UI Application** running in your browser (`http://localhost:8001`).
239
+ 5. Images are streamed directly from your disk for display in the UI.
240
+
241
+ ## 🎯 Python Interface
242
+
243
+ ### Dataset
244
+
245
+ #### Load Images From A Folder
246
+
247
+ ```py
248
+ import lightly_studio as ls
249
+
250
+ dataset = ls.Dataset.create()
251
+ dataset.add_samples_from_path(path="/path/to/image_dataset")
252
+
253
+ ls.start_gui()
254
+ ```
255
+
256
+ #### Load Images With Annotations
257
+
258
+ The `Dataset` currently supports:
259
+
260
+ - **YOLOv8 Object Detection:** Reads `.yaml` file. Supports bounding boxes.
261
+ - **COCO Object Detection:** Reads `.json` annotations. Supports bounding boxes.
262
+ - **COCO Instance Segmentation:** Reads `.json` annotations. Supports instance masks in RLE (Run-Length Encoding) format.
263
+
264
+ ```py
265
+ # Load a dataset in YOLO format
266
+ import lightly_studio as ls
267
+
268
+ dataset = ls.Dataset.create()
269
+ dataset.add_samples_from_yolo(
270
+ data_yaml="my_yolo_dataset/data.yaml",
271
+ input_split="val",
272
+ )
273
+
274
+ ls.start_gui()
275
+ ```
276
+
277
+ ```py
278
+ # Load an object detection/instance segmentation dataset in COCO format
279
+ import lightly_studio as ls
280
+
281
+ dataset = ls.Dataset.create()
282
+ dataset.add_samples_from_coco(
283
+ annotations_json="my_coco_dataset/detections_train.json",
284
+ images_path="my_coco_dataset/images",
285
+ # If using instance segmentation, uncomment the next line.
286
+ # annotation_type=ls.AnnotationType.INSTANCE_SEGMENTATION,
287
+ )
288
+
289
+ ls.start_gui()
290
+ ```
291
+
292
+ #### Load an Existing Dataset
293
+
294
+ It is also possible to load an existing dataset:
295
+
296
+ ```py
297
+ import lightly_studio as ls
298
+
299
+ dataset = ls.Dataset.load_or_create()
300
+ ```
301
+
302
+ This loads the dataset if it exists in the `.db` file; otherwise it creates a new dataset.
303
+
304
+ ### Samples
305
+
306
+ The dataset consists of samples. Every sample corresponds to an image.
307
+ Dataset samples can be fetched and accessed as follows.
308
+ For a full list of attributes, see [sample](src/lightly_studio/core/sample.py).
309
+
310
+ ```py
311
+ # Get all dataset samples
312
+ samples = list(dataset)
313
+
314
+ # Access sample attributes
315
+ s = samples[0]
316
+ s.sample_id # Sample ID
317
+ s.file_name # Image file name
318
+ s.file_path_abs # Full image file path
319
+ s.tags # The list of sample tags
320
+ s.metadata["key"] # dict-like access for metadata
321
+
322
+ # Set sample attributes
323
+ s.tags = {"tag1", "tag2"}
324
+ s.metadata["key"] = 123
325
+
326
+ # Adding/removing tags
327
+ s.add_tag("some_tag")
328
+ s.remove_tag("some_tag")
329
+
330
+ ...
331
+ ```
332
+
333
+ ### Dataset Query
334
+
335
+ You can efficiently fetch filtered dataset samples with a `DatasetQuery()` object. To get a query for an existing `dataset`:
336
+
337
+ ```py
338
+ query = dataset.query()
339
+ ```
340
+
341
+ The `match`, `order_by`, and `slice` of a query together define the intended filtering. Any of them can be omitted if it is not needed.
342
+
343
+ When the query is used to fetch samples, the order of execution is:
344
+
345
+ 1. `match`
346
+ 2. `order_by`
347
+ 3. `slice`
348
+
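The order of execution matters because slicing happens on the already filtered and sorted result. The following plain-Python sketch mimics the three stages on a list of dictionaries. It is only an analogy for the semantics and does not use the `lightly_studio` query API; the sample data is made up.

```python
samples = [
    {"file_name": "a.jpg", "width": 30},
    {"file_name": "b.jpg", "width": 10},
    {"file_name": "c.jpg", "width": 50},
    {"file_name": "d.jpg", "width": 40},
]

# 1. match: keep only samples wider than 20 pixels.
matched = [s for s in samples if s["width"] > 20]

# 2. order_by: sort the matched samples by width, descending.
ordered = sorted(matched, key=lambda s: s["width"], reverse=True)

# 3. slice: skip the first result (offset=1) and take two (limit=2).
offset, limit = 1, 2
result = ordered[offset : offset + limit]

print([s["file_name"] for s in result])
```

Had the slice been applied first, `b.jpg` would have been part of the window and the final result would differ.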
349
+ #### Example Query Usage
350
+
351
+ ```py
352
+ from lightly_studio.core.dataset_query.boolean_expression import OR
353
+ from lightly_studio.core.dataset_query.order_by import OrderByField
354
+ from lightly_studio.core.dataset_query.sample_field import SampleField
355
+
356
+ query = dataset.match(
357
+ OR(
358
+ SampleField.file_name == "a",
359
+ SampleField.file_name == "b",
360
+ )
361
+ ).order_by(
362
+ OrderByField(SampleField.width).desc()
363
+ ).slice(offset=10, limit=10)
364
+
365
+ query.add_tag("query_result")
366
+
367
+ ```
368
+
369
+ <details>
370
+ <summary>Advanced Example:</summary>
371
+
372
+ ```py
373
+ from lightly_studio.core.dataset_query.boolean_expression import AND, OR, NOT
374
+ from lightly_studio.core.dataset_query.order_by import OrderByField
375
+ from lightly_studio.core.dataset_query.sample_field import SampleField
376
+
377
+ query = dataset.match(
378
+ OR(
379
+ SampleField.file_name == "a",
380
+ SampleField.file_name == "b",
381
+ AND(
382
+ SampleField.width > 10,
383
+ SampleField.width < 20,
384
+ NOT(SampleField.tags.contains("dog")),
385
+ ),
386
+ )
387
+ ).order_by(
388
+ OrderByField(SampleField.width).desc()
389
+ ).slice(offset=10, limit=10)
390
+
391
+ query.add_tag("query_result")
392
+
393
+ for sample in query:
394
+ print(sample.tags)
395
+
396
+ ```
397
+
398
+ </details>
399
+
400
+ #### Define the Query: `match`
401
+
402
+ The filtering for a query can be set by:
403
+
404
+ ```py
405
+ query = query.match(expression)
406
+ ```
407
+
408
+ To create an expression for filtering on certain sample fields, the `SampleField.<field_name> <operator> <value>` syntax can be used. Available field names can be seen in [`SampleField`](src/lightly_studio/core/dataset_query/sample_field.py).
409
+
410
+ <details>
411
+ <summary>SampleField Examples:</summary>
412
+
413
+ ```py
414
+ from lightly_studio.core.dataset_query.sample_field import SampleField
415
+
416
+ # Ordinal fields: <, <=, >, >=, ==, !=
417
+
418
+ expr = SampleField.height >= 10 # All samples with images that are at least 10 pixels tall
419
+ expr = SampleField.width == 10 # All samples with images that are exactly 10 pixels wide
420
+ expr = SampleField.created_at > datetime # All samples created after datetime (actual datetime object)
421
+
422
+ # String fields: ==, !=
423
+ expr = SampleField.file_name == "some" # All samples with "some" as file name
424
+ expr = SampleField.file_path_abs != "other" # All samples whose file_path_abs is not "other"
425
+
426
+ # Tags: contains()
427
+ expr = SampleField.tags.contains("dog") # All samples that contain the tag "dog"
428
+
429
+ # Assign any of the previous expressions to a query:
430
+ query = query.match(expr)
431
+ ```
432
+
433
+ </details>
434
+
435
+ Filters on individual fields can be combined flexibly to create more complex match expressions. For this, the boolean operators [`AND`](src/lightly_studio/core/dataset_query/boolean_expression.py), [`OR`](src/lightly_studio/core/dataset_query/boolean_expression.py), and [`NOT`](src/lightly_studio/core/dataset_query/boolean_expression.py) are available. Boolean operators can be nested arbitrarily.
436
+
437
+ <details>
438
+ <summary>Boolean Examples:</summary>
439
+
440
+ ```py
441
+ from lightly_studio.core.dataset_query.boolean_expression import AND, OR, NOT
442
+ from lightly_studio.core.dataset_query.sample_field import SampleField
443
+
444
+ # All samples with images that are between 10 and 20 pixels wide
445
+ expr = AND(
446
+ SampleField.width > 10,
447
+ SampleField.width < 20
448
+ )
449
+
450
+ # All samples with file names that are either "a" or "b"
451
+ expr = OR(
452
+ SampleField.file_name == "a",
453
+ SampleField.file_name == "b"
454
+ )
455
+
456
+ # All samples which do not contain a tag "dog"
457
+ expr = NOT(SampleField.tags.contains("dog"))
458
+
459
+ # All samples for a nested expression
460
+ expr = OR(
461
+ SampleField.file_name == "a",
462
+ SampleField.file_name == "b",
463
+ AND(
464
+ SampleField.width > 10,
465
+ SampleField.width < 20,
466
+ NOT(
467
+ SampleField.tags.contains("dog")
468
+ ),
469
+ ),
470
+ )
471
+
472
+ # Assign any of the previous expressions to a query:
473
+ query = query.match(expr)
474
+ ```
475
+
476
+ </details>
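The `SampleField.<field_name> <operator> <value>` syntax relies on comparison operators returning expression objects rather than booleans. The sketch below shows one way such a pattern can be built in plain Python, with comparisons producing predicate functions. It is an illustration under assumptions, not the `lightly_studio` implementation (which may, for example, translate expressions into database queries); the `Field` class and the in-memory samples are hypothetical.

```python
from typing import Any, Callable, Dict

Sample = Dict[str, Any]
Predicate = Callable[[Sample], bool]


class Field:
    """Hypothetical stand-in for a sample field; comparisons build predicates."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __eq__(self, value: Any) -> Predicate:  # type: ignore[override]
        return lambda sample: sample[self.name] == value

    def __gt__(self, value: Any) -> Predicate:
        return lambda sample: sample[self.name] > value

    def __lt__(self, value: Any) -> Predicate:
        return lambda sample: sample[self.name] < value

    def contains(self, item: Any) -> Predicate:
        return lambda sample: item in sample[self.name]


def AND(*terms: Predicate) -> Predicate:
    return lambda sample: all(term(sample) for term in terms)


def OR(*terms: Predicate) -> Predicate:
    return lambda sample: any(term(sample) for term in terms)


def NOT(term: Predicate) -> Predicate:
    return lambda sample: not term(sample)


file_name, width, tags = Field("file_name"), Field("width"), Field("tags")

# Nested expression: file name "a", or a 10-20 px wide image without the "dog" tag.
expr = OR(
    file_name == "a",
    AND(width > 10, width < 20, NOT(tags.contains("dog"))),
)

samples = [
    {"file_name": "a", "width": 100, "tags": {"dog"}},
    {"file_name": "b", "width": 15, "tags": {"cat"}},
    {"file_name": "c", "width": 15, "tags": {"dog"}},
]
matching = [s["file_name"] for s in samples if expr(s)]
print(matching)
```

Because every operator returns another predicate, expressions compose to any depth without special cases.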
477
+
478
+ #### Define the Query: `order_by`
479
+
480
+ The sorting of a query can be set by:
481
+
482
+ ```py
483
+ query = query.order_by(expression)
484
+ ```
485
+
486
+ The order expression can be defined by `OrderByField(SampleField.<field_name>).<order_direction>()`.
487
+
488
+ <details>
489
+ <summary>OrderByField Examples:</summary>
490
+
491
+ ```py
492
+ from lightly_studio.core.dataset_query.order_by import OrderByField
493
+ from lightly_studio.core.dataset_query.sample_field import SampleField
494
+
495
+ # Sort the query by the width of the image in ascending order
496
+ expr = OrderByField(SampleField.width)
497
+ expr = OrderByField(SampleField.width).asc()
498
+
499
+ # Sort the query by the file name in descending order
500
+ expr = OrderByField(SampleField.file_name).desc()
501
+
502
+ # Assign any of the previous expressions to a query:
503
+ query = query.order_by(expr)
504
+ ```
505
+
506
+ </details>
507
+
508
+ #### Define the Query: `slice`
509
+
510
+ The slicing of a query can be set by:
511
+
512
+ ```py
513
+ query = query.slice(offset, limit)
514
+ # OR
515
+ query = query[offset:stop]
516
+ ```
517
+
518
+ Both are different syntax for the same operation, where `stop = offset + limit`.
519
+
520
+ <details>
521
+ <summary>Slice Examples:</summary>
522
+
523
+ ```py
524
+ # Slice 2:5
525
+ query = query.slice(offset=2, limit=3)
526
+ query = query[2:5]
527
+
528
+ # Slice :5
529
+ query = query.slice(limit=5)
530
+ query = query[:5]
531
+
532
+ # Slice 5:
533
+ query = query.slice(offset=5)
534
+ query = query[5:]
535
+ ```
536
+
537
+ </details>
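The relationship between the two notations is `stop = offset + limit`. As a quick check, this standard-library sketch converts an offset and limit into a Python `slice` and compares both forms on a plain list. The helper `to_python_slice` is a made-up name and not part of `lightly_studio`.

```python
from typing import Optional


def to_python_slice(offset: int = 0, limit: Optional[int] = None) -> slice:
    """Translate (offset, limit) into the equivalent [offset:stop] slice."""
    stop = None if limit is None else offset + limit
    return slice(offset, stop)


items = list(range(10))

# Slice 2:5 corresponds to offset=2, limit=3.
print(items[to_python_slice(offset=2, limit=3)] == items[2:5])
# Slice :5 corresponds to limit=5.
print(items[to_python_slice(limit=5)] == items[:5])
# Slice 5: corresponds to offset=5.
print(items[to_python_slice(offset=5)] == items[5:])
```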
538
+
539
+ #### Access the Samples
540
+
541
+ There are two ways to access the filtered samples: iterating over the query object or calling the `to_list()` method.
542
+
543
+ **Iterating over the query:**
544
+
545
+ ```py
546
+ query = dataset.query().match(match_expression).order_by(order_by_expression).slice(offset,limit)
547
+
548
+ samples = []
549
+ for sample in query:
550
+ samples.append(sample)
551
+ ```
552
+
553
+ **Get all samples as list:**
554
+
555
+ ```py
556
+ query = dataset.query().match(match_expression).order_by(order_by_expression).slice(offset,limit)
557
+
558
+ samples = query.to_list()
559
+ ```
560
+
561
+ In some use cases, one might want to assign a tag to the samples that are the result of a query:
562
+
563
+ ```py
564
+ query.add_tag("tag_name")
565
+ ```
566
+
567
+ ### Examples
568
+
569
+ #### Add Custom Metadata
570
+
571
+ Attach values to custom fields for every sample.
572
+
573
+ ```py
574
+ import lightly_studio as ls
575
+
576
+ # Load your dataset
577
+ dataset = ls.Dataset.create()
578
+ dataset.add_samples_from_path(path="/path/to/image_dataset")
579
+
580
+ # Attach metadata
581
+ for sample in dataset:
582
+ sample.metadata["my_metadata"] = f"Example metadata field for {sample.file_name}"
583
+ sample.metadata["my_dict"] = {"my_int_key": 10, "my_bool_key": True}
584
+
585
+ # View metadata in GUI
586
+ ls.start_gui()
587
+ ```
588
+
589
+ #### Tags
590
+
591
+ You can easily mark subsets of your data with tags.
592
+
593
+ ```py
594
+ import lightly_studio as ls
595
+
596
+ # Load your dataset
597
+ dataset = ls.Dataset.create()
598
+ dataset.add_samples_from_path(path="/path/to/image_dataset")
599
+
600
+ # Tag the first 10 samples:
601
+ query = dataset.query()[:10]
602
+ query.add_tag("some_tag")
603
+ ```
604
+
605
+ Find existing tags and tagged samples as follows.
606
+
607
+ ```py
608
+ import lightly_studio as ls
+ from lightly_studio.core.dataset_query.sample_field import SampleField
609
+
610
+ # Load your dataset
611
+ dataset = ls.Dataset.create()
612
+ dataset.add_samples_from_path(path="/path/to/image_dataset")
613
+
614
+ # Get all samples that contain the tag "dog"
615
+ query = dataset.query().match(SampleField.tags.contains("dog"))
616
+ samples = query.to_list()
617
+ ```
618
+
619
+ ### Selection
620
+
621
+ As a premium feature, LightlyStudio offers advanced methods for selecting subsets of dataset
622
+ samples.
623
+
624
+ **Prerequisites:** The selection functionality requires a valid LightlyStudio license key.
625
+ Set the `LIGHTLY_STUDIO_LICENSE_KEY` environment variable before using selection features:
626
+
627
+ ```bash
628
+ export LIGHTLY_STUDIO_LICENSE_KEY="license_key_here"
629
+ ```
630
+
631
+ Alternatively, set it inside your Python script:
632
+
633
+ ```py
634
+ import os
635
+ os.environ["LIGHTLY_STUDIO_LICENSE_KEY"] = "license_key_here"
636
+ ```
637
+
638
+ The selection can be configured directly from a `DatasetQuery`. The example below showcases a simple case of selecting diverse samples.
639
+
640
+ ```py
641
+ import lightly_studio as ls
642
+
643
+ # Load your dataset
644
+ dataset = ls.Dataset.load_or_create()
645
+ dataset.add_samples_from_path(path="/path/to/image_dataset")
646
+
647
+ # Select a diverse subset of 10 samples.
648
+ dataset.query().selection().diverse(
649
+ n_samples_to_select=10,
650
+ selection_result_tag_name="diverse_selection",
651
+ )
652
+
653
+ ls.start_gui()
654
+ ```
655
+
656
+ The selected sample paths can be exported via the GUI, or by a script:
657
+
658
+ ```py
659
+ import lightly_studio as ls
660
+ from lightly_studio.core.dataset_query.sample_field import SampleField
661
+
662
+ dataset = ls.Dataset.load("my-dataset")
663
+ selected_samples = (
664
+ dataset.match(SampleField.tags.contains("diverse_selection")).to_list()
665
+ )
666
+
667
+ with open("export.txt", "w") as f:
668
+ for sample in selected_samples:
669
+ f.write(f"{sample.file_path_abs}\n")
670
+ ```
671
+
672
+ ## 📚 **FAQ**
673
+
674
+ ### Are the datasets persistent?
675
+
676
+ Yes, the information about datasets is persistent and stored in the db file. You can see it after the dataset is processed.
677
+ If you rerun the loader, it creates a new dataset for the same data and leaves the previous dataset information untouched.
678
+
679
+ ### Can I change the database path?
680
+
681
+ Not yet. The database is stored in the working directory by default.
682
+
683
+ ### Can I launch in another Python script or do I have to do it in the same script?
684
+
685
+ Only one script can use the database at a time, because the db file is locked for the duration of the script.
686
+
687
+ ### Can I change the API backend port?
688
+
689
+ Yes. To change the port, set the `LIGHTLY_STUDIO_PORT` environment variable to your preferred value. If the port is unavailable at runtime, the application will try to use a random port instead.
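The fallback behaviour described here can be pictured with a short standard-library sketch: read the preferred port from `LIGHTLY_STUDIO_PORT`, try to bind it, and let the operating system pick a free port if that fails. This is an illustration of the idea only, not the code LightlyStudio runs; the function name `choose_port` and the default of 8001 (taken from the examples above) are assumptions.

```python
import os
import socket


def choose_port(default: int = 8001) -> int:
    """Return the preferred port if it is free, otherwise a random free one."""
    preferred = int(os.environ.get("LIGHTLY_STUDIO_PORT", default))
    # Port 0 asks the operating system to pick any free port.
    for candidate in (preferred, 0):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind(("127.0.0.1", candidate))
            except OSError:
                continue  # Port is in use or not permitted; try the fallback.
            return sock.getsockname()[1]
    raise RuntimeError("no free port available")


print(choose_port())
```

Note that the probe socket is closed before the port is returned, so another process could still claim the port in the meantime; a real server would keep the bound socket open.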