PyPI - datago - Versions diffs - 2025.8.1__tar.gz → 2025.10.2__tar.gz - Mend

datago 2025.8.1tar.gz → 2025.10.2tar.gz


Files changed (37)

{datago-2025.8.1 → datago-2025.10.2}/Cargo.lock RENAMED

@@ -613,7 +613,7 @@ dependencies = [
 [[package]]
 name = "datago"
-version = "2025.8.1"
+version = "2025.10.2"
 dependencies = [
  "async-compression",
  "async-tar",

{datago-2025.8.1 → datago-2025.10.2}/Cargo.toml RENAMED

@@ -1,7 +1,7 @@
 [package]
 name = "datago"
 edition = "2021"
-version = "2025.8.1"
+version = "2025.10.2"
 readme = "README.md"
 [lib]

{datago-2025.8.1 → datago-2025.10.2}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: datago
-Version: 2025.8.1
+Version: 2025.10.2
 Classifier: Programming Language :: Rust
 Classifier: Programming Language :: Python :: Implementation :: CPython
 Classifier: Programming Language :: Python :: Implementation :: PyPy
@@ -97,7 +97,7 @@ config = {
     "source_config": {
         "root_path": "myPath",
         "random_sampling": False, # True if used directly for training
-        "rank": 0,
+        "rank": 0, # Optional, distributed workloads are possible
         "world_size": 1,
     },
     "limit": 200,
@@ -137,15 +137,6 @@ client_config = {
         "rank": 0,
         "world_size": 1,
     },
-    # Optional pre-processing of the images, placing them in an aspect ratio bucket to preseve as much as possible of the original content
-    "image_config": {
-        "crop_and_resize": True, # False to turn it off, or just omit this part of the config
-        "default_image_size": 1024,
-        "downsampling_ratio": 32,
-        "min_aspect_ratio": 0.5,
-        "max_aspect_ratio": 2.0,
-        "pre_encode_images": False,
-    },
     "prefetch_buffer_size": 128,
     "samples_buffer_size": 64,
     "limit": 1_000_000, # Dummy example, max number of samples you would like to serve
@@ -159,6 +150,38 @@ for _ in range(10):
 </details>
+## Process images on the fly
+Datago can also process images on the fly, for instance to align different image payloads. This is done by adding an `image_config` to the configuration. The following example shows how to align different image payloads.
+Processing can be very CPU heavy, but it will be distributed over all CPU cores without requiring multiple python processes. I.e., you can keep a single python process using `get_sample()` on the client and still saturate all CPU cores.
+There are three main processing topics that you can choose from:
+- crop the images to within an aspect ratio bucket (which is very handy for all Transformer / patch based architectures)
+- resize the images (setting here will be related to the square aspect ratio bucket, other buckets will differ of course)
+- pre-encode the images to a specific format (jpg, png, ...)
+```python
+config = {
+    "source_type": "file",
+    "source_config": {
+        "root_path": "myPath",
+        "random_sampling": False, # True if used directly for training
+    },
+    # Optional pre-processing of the images, placing them in an aspect ratio bucket to preserve as much as possible of the original content
+    "image_config": {
+        "crop_and_resize": True, # False to turn it off, or just omit this part of the config
+        "default_image_size": 1024,
+        "downsampling_ratio": 32,
+        "min_aspect_ratio": 0.5,
+        "max_aspect_ratio": 2.0,
+        "pre_encode_images": False,
+    },
+    "limit": 200,
+    "samples_buffer_size": 32,
+}
+```
 ## Match the raw exported buffers with typical python types
@@ -171,6 +194,14 @@ You can set the log level using the RUST_LOG environment variable. E.g. `RUST_LO
 When using the library from Python, `env_logger` will be initialized automatically when creating a `DatagoClient`. There is also an `initialize_logging` function in the `datago` module which, if called before using a client, lets you customize the log level. This only works if RUST_LOG is not set.
+## Env variables
+There are a couple of env variables which will change the behavior of the library, for settings which felt too low level to be exposed in the config.
+- `DATAGO_MAX_TASKS`: refers to the number of threads which will be used to load the samples. Defaults to a multiple of the CPU cores.
+- `RUST_LOG`: see above, will change the level of logging for the whole library, could be useful for debugging or to report an issue here.
+- `DATAGO_MAX_RETRIES`: number of retries for a failed sample load, defaults to 3.
 </details><details> <summary><strong>Build it</strong></summary>
 ## Preamble
@@ -233,6 +264,25 @@ Create a new tag and a new release in this repo, a new package will be pushed au
 </details>
+<details> <summary><strong>Benchmarks</strong></summary>
+As usual, benchmarks are a tricky game, and you shouldn't read too much into the following plots but do your own tests. Some python benchmark examples are provided in the [python](./python/) folder.
+In general, Datago will be impactful if you want to load a lot of images very fast, but if you consume them as you go at a more leisurely pace then it's not really needed. The more CPU work there is with the images and the higher quality they are, the more Datago will shine. The following benchmarks are using ImageNet 1k, which is very low resolution and thus kind of a worst case scenario. Data is served from cache (i.e. the OS cache) and the images are not pre-processed. In this case the receiving python process is typically the bottleneck, and caps at around 2000 images per second.
+### AMD Zen3 laptop - IN1k - disk
+![AMD Zen3 laptop & M2 SSD](assets/zen3_ssd.png)
+### AMD EPYC 9454 - IN1k - disk
+![AMD EPYC 9454](assets/epyc_vast.png)
+This benchmark is using the PD12M dataset, which is a 12M images dataset, with a lot of high resolution images. It's accessed through the webdataset front end, and datago is compared with the popular python webdataset library. Note that datago will start streaming the images faster here (almost instantly!), so given enough time the two results would look closer.
+### AMD EPYC 9454 - pd12m - webdataset
+![AMD EPYC 9454](assets/epyc_wds.png)
+</details>
 ## License
 MIT License
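
The `image_config` keys added to the README in the hunk above (`default_image_size`, `downsampling_ratio`, `min_aspect_ratio`, `max_aspect_ratio`) suggest how aspect ratio buckets might be derived. The sketch below is only an illustration of that idea and is not taken from the datago sources: the function names `aspect_ratio_buckets` and `closest_bucket` are hypothetical, and the assumptions (constant pixel budget equal to the square bucket, sides rounded down to a multiple of `downsampling_ratio`, closest match in log aspect ratio) may differ from what the library actually does.

```python
import math


def aspect_ratio_buckets(default_image_size=1024, downsampling_ratio=32,
                         min_aspect_ratio=0.5, max_aspect_ratio=2.0):
    """Hypothetical bucket enumeration, NOT datago's actual algorithm.

    Returns (width, height) pairs whose sides are multiples of
    `downsampling_ratio`, whose area stays at or just below
    `default_image_size ** 2`, and whose width / height ratio lies
    within [min_aspect_ratio, max_aspect_ratio].
    """
    target_area = default_image_size**2
    max_width = int(default_image_size * math.sqrt(max_aspect_ratio))
    buckets = []
    for width in range(downsampling_ratio, max_width + 1, downsampling_ratio):
        # Largest height (multiple of downsampling_ratio) that keeps the area in budget
        height = (target_area // width) // downsampling_ratio * downsampling_ratio
        if height == 0:
            continue
        if min_aspect_ratio <= width / height <= max_aspect_ratio:
            buckets.append((width, height))
    return buckets


def closest_bucket(width, height, buckets):
    """Pick the bucket whose aspect ratio is closest (in log space) to the image's."""
    log_ratio = math.log(width / height)
    return min(buckets, key=lambda b: abs(math.log(b[0] / b[1]) - log_ratio))


buckets = aspect_ratio_buckets()
print(len(buckets), closest_bucket(1920, 1080, buckets))
```

Under these assumptions the square bucket is exactly `default_image_size` on each side, which matches the README remark that the size setting relates to the square bucket while other buckets differ.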

{datago-2025.8.1 → datago-2025.10.2}/README.md RENAMED

@@ -80,7 +80,7 @@ config = {
     "source_config": {
         "root_path": "myPath",
         "random_sampling": False, # True if used directly for training
-        "rank": 0,
+        "rank": 0, # Optional, distributed workloads are possible
         "world_size": 1,
     },
     "limit": 200,
@@ -120,15 +120,6 @@ client_config = {
         "rank": 0,
         "world_size": 1,
     },
-    # Optional pre-processing of the images, placing them in an aspect ratio bucket to preseve as much as possible of the original content
-    "image_config": {
-        "crop_and_resize": True, # False to turn it off, or just omit this part of the config
-        "default_image_size": 1024,
-        "downsampling_ratio": 32,
-        "min_aspect_ratio": 0.5,
-        "max_aspect_ratio": 2.0,
-        "pre_encode_images": False,
-    },
     "prefetch_buffer_size": 128,
     "samples_buffer_size": 64,
     "limit": 1_000_000, # Dummy example, max number of samples you would like to serve
@@ -142,6 +133,38 @@ for _ in range(10):
 </details>
+## Process images on the fly
+Datago can also process images on the fly, for instance to align different image payloads. This is done by adding an `image_config` to the configuration. The following example shows how to align different image payloads.
+Processing can be very CPU heavy, but it will be distributed over all CPU cores without requiring multiple python processes. I.e., you can keep a single python process using `get_sample()` on the client and still saturate all CPU cores.
+There are three main processing topics that you can choose from:
+- crop the images to within an aspect ratio bucket (which is very handy for all Transformer / patch based architectures)
+- resize the images (setting here will be related to the square aspect ratio bucket, other buckets will differ of course)
+- pre-encode the images to a specific format (jpg, png, ...)
+```python
+config = {
+    "source_type": "file",
+    "source_config": {
+        "root_path": "myPath",
+        "random_sampling": False, # True if used directly for training
+    },
+    # Optional pre-processing of the images, placing them in an aspect ratio bucket to preserve as much as possible of the original content
+    "image_config": {
+        "crop_and_resize": True, # False to turn it off, or just omit this part of the config
+        "default_image_size": 1024,
+        "downsampling_ratio": 32,
+        "min_aspect_ratio": 0.5,
+        "max_aspect_ratio": 2.0,
+        "pre_encode_images": False,
+    },
+    "limit": 200,
+    "samples_buffer_size": 32,
+}
+```
 ## Match the raw exported buffers with typical python types
@@ -154,6 +177,14 @@ You can set the log level using the RUST_LOG environment variable. E.g. `RUST_LO
 When using the library from Python, `env_logger` will be initialized automatically when creating a `DatagoClient`. There is also an `initialize_logging` function in the `datago` module which, if called before using a client, lets you customize the log level. This only works if RUST_LOG is not set.
+## Env variables
+There are a couple of env variables which will change the behavior of the library, for settings which felt too low level to be exposed in the config.
+- `DATAGO_MAX_TASKS`: refers to the number of threads which will be used to load the samples. Defaults to a multiple of the CPU cores.
+- `RUST_LOG`: see above, will change the level of logging for the whole library, could be useful for debugging or to report an issue here.
+- `DATAGO_MAX_RETRIES`: number of retries for a failed sample load, defaults to 3.
 </details><details> <summary><strong>Build it</strong></summary>
 ## Preamble
@@ -216,6 +247,25 @@ Create a new tag and a new release in this repo, a new package will be pushed au
 </details>
+<details> <summary><strong>Benchmarks</strong></summary>
+As usual, benchmarks are a tricky game, and you shouldn't read too much into the following plots but do your own tests. Some python benchmark examples are provided in the [python](./python/) folder.
+In general, Datago will be impactful if you want to load a lot of images very fast, but if you consume them as you go at a more leisurely pace then it's not really needed. The more CPU work there is with the images and the higher quality they are, the more Datago will shine. The following benchmarks are using ImageNet 1k, which is very low resolution and thus kind of a worst case scenario. Data is served from cache (i.e. the OS cache) and the images are not pre-processed. In this case the receiving python process is typically the bottleneck, and caps at around 2000 images per second.
+### AMD Zen3 laptop - IN1k - disk
+![AMD Zen3 laptop & M2 SSD](assets/zen3_ssd.png)
+### AMD EPYC 9454 - IN1k - disk
+![AMD EPYC 9454](assets/epyc_vast.png)
+This benchmark is using the PD12M dataset, which is a 12M images dataset, with a lot of high resolution images. It's accessed through the webdataset front end, and datago is compared with the popular python webdataset library. Note that datago will start streaming the images faster here (almost instantly!), so given enough time the two results would look closer.
+### AMD EPYC 9454 - pd12m - webdataset
+![AMD EPYC 9454](assets/epyc_wds.png)
+</details>
 ## License
 MIT License
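
The new "Env variables" README section lists `DATAGO_MAX_TASKS`, `DATAGO_MAX_RETRIES` and `RUST_LOG`. A minimal sketch of setting them from Python follows; the helper name `set_datago_env` is hypothetical, and it assumes the variables are read when the client is created, which is how `python/benchmark_filesystem.py` in this release uses `DATAGO_MAX_TASKS` (it is set before the dataset is built).

```python
import os


def set_datago_env(max_tasks=None, max_retries=None, rust_log=None):
    """Export the settings documented in the README.

    Hypothetical helper: call it before creating a client.
    A value of None leaves the corresponding variable untouched.
    """
    settings = {
        "DATAGO_MAX_TASKS": max_tasks,
        "DATAGO_MAX_RETRIES": max_retries,
        "RUST_LOG": rust_log,
    }
    for name, value in settings.items():
        if value is not None:
            # Environment values must be strings
            os.environ[name] = str(value)


set_datago_env(max_tasks=os.cpu_count() or 1, max_retries=3, rust_log="warn")
```

Note that, per the README, `initialize_logging` only takes effect when `RUST_LOG` is not set, so exporting `RUST_LOG` this way takes precedence over it.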

datago-2025.10.2/assets/epyc_vast.png ADDED

Binary file

datago-2025.10.2/assets/epyc_wds.png ADDED

Binary file

datago-2025.10.2/assets/zen3_ssd.png ADDED

Binary file

{datago-2025.8.1 → datago-2025.10.2}/python/benchmark_db.py RENAMED

@@ -1,11 +1,13 @@
-from datago import DatagoClient  # type: ignore
+import json
 import time
-from tqdm import tqdm
 import numpy as np
-from raw_types import raw_array_to_pil_image, raw_array_to_numpy
 import typer
-import json
+from benchmark_defaults import IMAGE_CONFIG
+from datago import DatagoClient  # type: ignore
 from PIL import Image
+from raw_types import raw_array_to_numpy, raw_array_to_pil_image
+from tqdm import tqdm
 def benchmark(
@@ -31,19 +33,20 @@ def benchmark(
             "rank": 0,
             "world_size": 1,
         },
-        "image_config": {
-            "crop_and_resize": crop_and_resize,
-            "default_image_size": 1024,
-            "downsampling_ratio": 32,
-            "min_aspect_ratio": 0.5,
-            "max_aspect_ratio": 2.0,
-            "pre_encode_images": encode_images,
-        },
         "prefetch_buffer_size": 128,
         "samples_buffer_size": 64,
         "limit": limit,
     }
+    if crop_and_resize or encode_images:
+        client_config["image_config"] = IMAGE_CONFIG
+    if encode_images:
+        client_config["image_config"]["crop_and_resize"] = (  # type: ignore
+            crop_and_resize  # You may want to encode images without resizing them
+        )
+        client_config["image_config"]["pre_encode_images"] = True  # type: ignore
     client = DatagoClient(json.dumps(client_config))
     client.start()  # Optional, but good practice to start the client to reduce latency to first sample (while you're instantiating models for instance)
     start = time.time()
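
One thing to be aware of in the hunk above: `client_config["image_config"] = IMAGE_CONFIG` stores a reference to the shared module-level dict, so the later assignments to `crop_and_resize` and `pre_encode_images` also change `IMAGE_CONFIG` for any subsequent call in the same process. The following sketch shows one possible way to avoid that by copying the defaults per run. The helper name `build_image_config` is hypothetical and not part of the package; the dict values are copied from `python/benchmark_defaults.py` in this release.

```python
import copy

# Same values as python/benchmark_defaults.py in this release
IMAGE_CONFIG = {
    "crop_and_resize": True,
    "default_image_size": 1024,
    "downsampling_ratio": 32,
    "min_aspect_ratio": 0.5,
    "max_aspect_ratio": 2.0,
    "pre_encode_images": False,
}


def build_image_config(crop_and_resize, encode_images):
    """Return a per-run image config, or None when no processing is requested."""
    if not (crop_and_resize or encode_images):
        return None
    # Deep copy so the shared defaults are never modified
    image_config = copy.deepcopy(IMAGE_CONFIG)
    if encode_images:
        # Encoding without resizing is allowed, mirroring the benchmark logic
        image_config["crop_and_resize"] = crop_and_resize
        image_config["pre_encode_images"] = True
    return image_config


per_run = build_image_config(crop_and_resize=False, encode_images=True)
```

For a single benchmark invocation the original code behaves the same way; the copy only matters if the function is called several times with different flags.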

datago-2025.10.2/python/benchmark_defaults.py ADDED

@@ -0,0 +1,8 @@
+IMAGE_CONFIG = {
+    "crop_and_resize": True,
+    "default_image_size": 1024,
+    "downsampling_ratio": 32,
+    "min_aspect_ratio": 0.5,
+    "max_aspect_ratio": 2.0,
+    "pre_encode_images": False,
+}

{datago-2025.8.1 → datago-2025.10.2}/python/benchmark_filesystem.py RENAMED

@@ -1,24 +1,37 @@
-import time
-from tqdm import tqdm
+import json
 import os
+import time
 import typer
+from benchmark_defaults import IMAGE_CONFIG
 from dataset import DatagoIterDataset
+from tqdm import tqdm
 def benchmark(
-    root_path: str = typer.Option(
-        os.getenv("DATAGO_TEST_FILESYSTEM", ""), help="The source to test out"
-    ),
+    root_path: str = typer.Option(os.getenv("DATAGO_TEST_FILESYSTEM", ""), help="The source to test out"),
     limit: int = typer.Option(2000, help="The number of samples to test on"),
-    crop_and_resize: bool = typer.Option(
-        False, help="Crop and resize the images on the fly"
-    ),
+    crop_and_resize: bool = typer.Option(False, help="Crop and resize the images on the fly"),
     compare_torch: bool = typer.Option(True, help="Compare against torch dataloader"),
+    num_workers: int = typer.Option(os.cpu_count(), help="Number of workers to use"),
+    sweep: bool = typer.Option(False, help="Sweep over the number of workers"),
 ):
-    print(f"Running benchmark for {root_path} - {limit} samples")
-    print(
-        "Please run the benchmark twice if you want to compare against torch dataloader, so that file caching affects both paths"
-    )
+    if sweep:
+        results = {}
+        for num_workers in range(2, (os.cpu_count() or 2), 16):
+            results[num_workers] = benchmark(root_path, limit, crop_and_resize, compare_torch, num_workers, False)
+        # Save results to a json file
+        with open("benchmark_results_filesystem.json", "w") as f:
+            json.dump(results, f, indent=2)
+        return results
+    print(f"Running benchmark for {root_path} - {limit} samples - {num_workers} workers")
+    # This setting is not exposed in the config, but an env variable can be used instead
+    os.environ["DATAGO_MAX_TASKS"] = str(num_workers)
     client_config = {
         "source_type": "file",
@@ -27,19 +40,14 @@ def benchmark(
             "rank": 0,
             "world_size": 1,
         },
-        "image_config": {
-            "crop_and_resize": crop_and_resize,
-            "default_image_size": 1024,
-            "downsampling_ratio": 32,
-            "min_aspect_ratio": 0.5,
-            "max_aspect_ratio": 2.0,
-            "pre_encode_images": False,
-        },
         "prefetch_buffer_size": 256,
         "samples_buffer_size": 256,
         "limit": limit,
     }
+    if crop_and_resize:
+        client_config["image_config"] = IMAGE_CONFIG
     # Make sure in the following that we compare apples to apples, meaning in that case
     # that we materialize the payloads in the python scope in the expected format
     # (PIL.Image for images and masks for instance, numpy arrays for latents)
@@ -48,14 +56,15 @@ def benchmark(
     img = None
     count = 0
-    for sample in tqdm(datago_dataset, dynamic_ncols=True):
+    for sample in tqdm(datago_dataset, desc="Datago", dynamic_ncols=True):
         assert sample["id"] != ""
         img = sample["image"]
         count += 1
     assert count == limit, f"Expected {limit} samples, got {count}"
     fps = limit / (time.time() - start)
-    print(f"Datago FPS {fps:.2f}")
+    results = {"datago": {"fps": fps, "count": count}}
+    print(f"Datago - FPS {fps:.2f} - workers {num_workers}")
     del datago_dataset
     # Save the last image as a test
@@ -64,17 +73,14 @@ def benchmark(
     # Let's compare against a classic pytorch dataloader
     if compare_torch:
-        from torchvision import datasets, transforms  # type: ignore
         from torch.utils.data import DataLoader
+        from torchvision import datasets, transforms  # type: ignore
-        print("Benchmarking torch dataloader")
         # Define the transformations to apply to each image
         transform = (
             transforms.Compose(
                 [
-                    transforms.Resize(
-                        (1024, 1024), interpolation=transforms.InterpolationMode.LANCZOS
-                    ),
+                    transforms.Resize((1024, 1024), interpolation=transforms.InterpolationMode.LANCZOS),
                 ]
             )
             if crop_and_resize
@@ -82,13 +88,10 @@ def benchmark(
         )
         # Create the ImageFolder dataset
-        dataset = datasets.ImageFolder(
-            root=root_path, transform=transform, allow_empty=True
-        )
+        dataset = datasets.ImageFolder(root=root_path, transform=transform, allow_empty=True)
         # Create a DataLoader to allow for multiple workers
         # Use available CPU count for num_workers
-        num_workers = os.cpu_count() or 8  # Default to 8 if cpu_count returns None
         dataloader = DataLoader(
             dataset,
             batch_size=1,
@@ -100,12 +103,15 @@ def benchmark(
         # Iterate over the DataLoader
         start = time.time()
         n_images = 0
-        for batch in tqdm(dataloader, dynamic_ncols=True):
+        for batch in tqdm(dataloader, desc="Torch", dynamic_ncols=True):
             n_images += len(batch)
             if n_images > limit:
                 break
         fps = n_images / (time.time() - start)
-        print(f"Torch FPS {fps:.2f}")
+        results["torch"] = {"fps": fps, "count": n_images}
+        print(f"Torch - FPS {fps:.2f} - workers {num_workers}")
+    return results
 if __name__ == "__main__":
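
The new `sweep` option in this file iterates over `range(2, (os.cpu_count() or 2), 16)`. To make it easier to see which worker counts that visits on a given machine, here is a small sketch that reproduces the same `range` expression; `sweep_worker_counts` is a hypothetical helper written for this illustration, not a function from the package.

```python
import os


def sweep_worker_counts(cpu_count, start=2, step=16):
    """Worker counts visited by range(start, cpu_count or start, step)."""
    return list(range(start, cpu_count or start, step))


print(sweep_worker_counts(64))  # [2, 18, 34, 50]
print(sweep_worker_counts(16))  # [2]
print(sweep_worker_counts(2))   # []
print(sweep_worker_counts(os.cpu_count()))
```

With the default step of 16, machines with 18 cores or fewer produce a single data point, and a 2-core machine produces none, so the sweep is mostly informative on large hosts.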

{datago-2025.8.1 → datago-2025.10.2}/python/benchmark_webdataset.py RENAMED

@@ -1,8 +1,11 @@
+import json
+import os
 import time
-from tqdm import tqdm
 import typer
+from benchmark_defaults import IMAGE_CONFIG
 from dataset import DatagoIterDataset
-import os
+from tqdm import tqdm
 def benchmark(
@@ -11,11 +14,23 @@ def benchmark(
         True, help="Crop and resize the images on the fly"
     ),
     compare_wds: bool = typer.Option(True, help="Compare against torch dataloader"),
-    n_processes_wds: int = typer.Option(
+    num_workers: int = typer.Option(
         16,
-        help="Number of processes to use for the torch dataloader - used only if compare_wds is True",
+        help="Number of processes to use",
     ),
+    sweep: bool = typer.Option(False, help="Sweep over the number of processes"),
 ):
+    if sweep:
+        results = {}
+        for num_workers in range(2, max(64, (os.cpu_count() or 1)), 8):
+            results[num_workers] = benchmark(limit, crop_and_resize, compare_wds, num_workers, False)
+        # Save results to a json file
+        with open("benchmark_results_wds.json", "w") as f:
+            json.dump(results, f, indent=2)
+        return results
     # URL of the test bucket
     # bucket = "https://storage.googleapis.com/webdataset/fake-imagenet"
     # dataset = "/imagenet-train-{000000..001281}.tar"
@@ -32,22 +47,18 @@ def benchmark(
         "source_config": {
             "url": url,
             "shuffle": True,
-            "max_concurrency": 8,  # Number of concurrent TarballSample downloads and dispatch
+            "max_concurrency": num_workers,  # Number of concurrent TarballSample downloads and dispatch
             "auth_token": os.environ.get("HF_TOKEN", default=""),
         },
-        "image_config": {
-            "crop_and_resize": crop_and_resize,
-            "default_image_size": 1024,
-            "downsampling_ratio": 32,
-            "min_aspect_ratio": 0.5,
-            "max_aspect_ratio": 2.0,
-            "pre_encode_images": False,
-        },
         "prefetch_buffer_size": 256,
         "samples_buffer_size": 256,
         "limit": limit,
     }
+    if crop_and_resize:
+        # Optionally add a custom image config to crop and resize the images on the fly
+        client_config["image_config"] = IMAGE_CONFIG
     # # Make sure in the following that we compare apples to apples, meaning in that case
     # # that we materialize the payloads in the python scope in the expected format
     # # (PIL.Image for images and masks for instance, numpy arrays for latents)
@@ -55,14 +66,15 @@ def benchmark(
     start = time.time()  # Note that the datago dataset will start preparing samples (up to the requested buffer size) at construction time
     img, count = None, 0
-    for sample in tqdm(datago_dataset, dynamic_ncols=True):
+    for sample in tqdm(datago_dataset, desc="Datago", dynamic_ncols=True):
         assert sample["id"] != ""
         img = sample["image"]
         count += 1
     assert count == limit, f"Expected {limit} samples, got {count}"
     fps = limit / (time.time() - start)
-    print(f"-- Datago WDS FPS {fps:.2f}")
+    print(f"-- Datago WDS FPS {fps:.2f} - workers {num_workers}")
+    results = {"datago": {"fps": fps, "count": count}}
     del datago_dataset
     # Save the last image as a test
@@ -71,9 +83,9 @@ def benchmark(
     # Let's compare against a classic webdataset dataloader
     if compare_wds:
-        from torchvision import transforms
-        from torch.utils.data import DataLoader
         import webdataset as wds
+        from torch.utils.data import DataLoader
+        from torchvision import transforms
         print("\nBenchmarking webdataset library dataloader")
         # Define the transformations to apply to each image
@@ -108,19 +120,21 @@ def benchmark(
         dataloader = DataLoader(
             dataset,
             batch_size=1,
-            num_workers=n_processes_wds,
+            num_workers=num_workers,
             prefetch_factor=2,
             collate_fn=lambda x: x,
         )
         # Iterate over the DataLoader
         start = time.time()
-        for n_images, _ in enumerate(tqdm(dataloader, dynamic_ncols=True)):
+        for n_images, _ in enumerate(tqdm(dataloader, desc="WDS", dynamic_ncols=True)):
             if n_images > limit:
                 break
         fps = n_images / (time.time() - start)
-        print(f"-- Webdataset lib FPS ({n_processes_wds} processes) {fps:.2f}")
+        print(f"-- Webdataset lib FPS ({num_workers} processes) {fps:.2f}")
+        results["webdataset"] = {"fps": fps, "count": n_images}
+        return results
 if __name__ == "__main__":
     typer.run(benchmark)
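
The sweep added above writes a dict keyed by integer worker counts to `benchmark_results_wds.json`. JSON object keys are always strings, so the keys come back as `"2"`, `"10"`, and so on when the file is read. The sketch below shows the round trip, assuming the result layout visible in the diff (`{"datago": {"fps", "count"}, "webdataset": {...}}`); the numbers are placeholders for illustration only, not measured results, and a temporary directory is used instead of the working directory.

```python
import json
import os
import tempfile

# Placeholder values for illustration only, NOT measured results
results = {
    2: {"datago": {"fps": 1.0, "count": 10}, "webdataset": {"fps": 1.0, "count": 10}},
    10: {"datago": {"fps": 2.0, "count": 10}, "webdataset": {"fps": 2.0, "count": 10}},
}

path = os.path.join(tempfile.mkdtemp(), "benchmark_results_wds.json")
with open(path, "w") as f:
    json.dump(results, f, indent=2)  # int keys are serialized as JSON strings

with open(path) as f:
    raw = json.load(f)

# Convert the keys back to integers before sorting or plotting
loaded = {int(num_workers): entry for num_workers, entry in raw.items()}

for num_workers in sorted(loaded):
    print(num_workers, loaded[num_workers]["datago"]["fps"])
```

Converting the keys back matters for ordering: sorted as strings, `"10"` would come before `"2"`.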

{datago-2025.8.1 → datago-2025.10.2}/python/dataset.py RENAMED

@@ -85,6 +85,9 @@ if __name__ == "__main__":
             "min_aspect_ratio": 0.5,
             "max_aspect_ratio": 2.0,
             "pre_encode_images": False,
+            # Optional: Use JPEG encoding instead of PNG (defaults to PNG if not specified)
+            # "encode_format": "jpeg",  # or "png"
+            # "jpeg_quality": 92,  # 0-100, only used when encode_format is "jpeg"
         },
         "prefetch_buffer_size": 64,
         "samples_buffer_size": 128,

{datago-2025.8.1 → datago-2025.10.2}/python/raw_types.py RENAMED

@@ -1,9 +1,9 @@
 from PIL import Image
-from typing import Optional
+from typing import Optional, Union
 import numpy as np
-def uint8_array_to_numpy(raw_array):
+def uint8_array_to_numpy(raw_array: 'ImagePayload') -> Optional[np.ndarray]:
     if len(raw_array.data) == 0:
         return None
@@ -29,7 +29,7 @@ def uint8_array_to_numpy(raw_array):
     return np.frombuffer(raw_array.data, dtype=np.uint8).reshape(shape)
-def raw_array_to_numpy(raw_array) -> Optional[np.ndarray]:
+def raw_array_to_numpy(raw_array: 'ImagePayload') -> Optional[np.ndarray]:
     if len(raw_array.data) == 0:
         return None
@@ -42,7 +42,7 @@ def raw_array_to_numpy(raw_array) -> Optional[np.ndarray]:
         return None
-def raw_array_to_pil_image(raw_array) -> Optional[Image.Image]:
+def raw_array_to_pil_image(raw_array: 'ImagePayload') -> Union[Optional[Image.Image], 'ImagePayload']:
     if len(raw_array.data) == 0:
         return None
@@ -63,3 +63,25 @@ def raw_array_to_pil_image(raw_array) -> Optional[Image.Image]:
     assert c == 3, f"Expected 3 channels, got {c}"
     return Image.fromarray(np_array)
+def decode_image_payload(payload: 'ImagePayload') -> Image.Image:
+    """
+    Decode an ImagePayload (encoded image) into a PIL Image.
+    This is the proper way to decode encoded images for API users.
+    """
+    import io
+    return Image.open(io.BytesIO(payload.data))
+def get_image_mode(image_or_payload) -> str:
+    """
+    Helper function to get the mode of an image, whether it's a PIL Image or ImagePayload.
+    For ImagePayload objects (encoded images), we need to decode them first.
+    """
+    if hasattr(image_or_payload, 'mode'):
+        # It's a PIL Image
+        return image_or_payload.mode
+    else:
+        # It's an ImagePayload (encoded image), decode it first
+        return decode_image_payload(image_or_payload).mode
