PyPI - datachain - Versions diffs - 0.7.10__tar.gz → 0.7.11__tar.gz - Mend

datachain 0.7.10tar.gz → 0.7.11tar.gz


Potentially problematic release.

This version of datachain might be problematic.

Files changed (286)

{datachain-0.7.10 → datachain-0.7.11}/.github/workflows/tests.yml RENAMED

@@ -136,7 +136,7 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        os: [ubuntu-latest, macos-latest, windows-latest]
+        os: [ubuntu-latest, windows-latest]
         pyv: ['3.9', '3.12']
         group: ['get_started', 'llm_and_nlp or computer_vision', 'multimodal']
         exclude:
@@ -166,7 +166,20 @@ jobs:
       - name: Install nox
         run: uv pip install nox --system
+      # HF runs against actual API - thus run it only once
+      - name: Set hf token
+        if: matrix.os == 'ubuntu-latest' && matrix.pyv == '3.12'
+        run: echo 'HF_TOKEN=${{ secrets.HF_TOKEN }}' >> "$GITHUB_ENV"
       - name: Run examples
-        env:
-          HF_TOKEN: ${{ secrets.HF_TOKEN }}
         run: nox -s examples -p ${{ matrix.pyv }} -- -m "${{ matrix.group }}"
+  check:
+    if: always()
+    needs: [lint, datachain, examples]
+    runs-on: ubuntu-latest
+    steps:
+      - uses: re-actors/alls-green@release/v1
+        with:
+          allowed-failures: examples
+          jobs: ${{ toJSON(needs) }}

{datachain-0.7.10 → datachain-0.7.11}/.pre-commit-config.yaml RENAMED

@@ -24,7 +24,7 @@ repos:
       - id: trailing-whitespace
         exclude: '^LICENSES/'
   - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: 'v0.8.1'
+    rev: 'v0.8.2'
     hooks:
       - id: ruff
         args: [--fix, --exit-non-zero-on-fix]

{datachain-0.7.10/src/datachain.egg-info → datachain-0.7.11}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: datachain
-Version: 0.7.10
+Version: 0.7.11
 Summary: Wrangle unstructured AI data at scale
 Author-email: Dmitry Petrov <support@dvc.org>
 License: Apache-2.0
@@ -91,14 +91,14 @@ Requires-Dist: types-requests; extra == "dev"
 Requires-Dist: types-tabulate; extra == "dev"
 Provides-Extra: examples
 Requires-Dist: datachain[tests]; extra == "examples"
-Requires-Dist: numpy<2,>=1; extra == "examples"
 Requires-Dist: defusedxml; extra == "examples"
 Requires-Dist: accelerate; extra == "examples"
-Requires-Dist: unstructured[embed-huggingface,pdf]<0.16.0; extra == "examples"
+Requires-Dist: unstructured_ingest[embed-huggingface]; extra == "examples"
+Requires-Dist: unstructured[pdf]; extra == "examples"
 Requires-Dist: pdfplumber==0.11.4; extra == "examples"
 Requires-Dist: huggingface_hub[hf_transfer]; extra == "examples"
 Requires-Dist: onnx==1.16.1; extra == "examples"
-Requires-Dist: ultralytics==8.3.37; extra == "examples"
+Requires-Dist: ultralytics==8.3.48; extra == "examples"
 ================
 |logo| DataChain
@@ -138,6 +138,11 @@ Use Cases
 3. **Versioning.** DataChain doesn't store, require moving or copying data (unlike DVC).
    Perfect use case is a bucket with thousands or millions of images, videos, audio, PDFs.
+Getting Started
+===============
+Visit `Quick Start <https://docs.datachain.ai/quick-start>`_ and `Docs <https://docs.datachain.ai/>`_
+to get started with `DataChain` and learn more.
 Key Features
 ============
@@ -161,12 +166,6 @@ Key Features
    - Pass datasets to Pytorch and Tensorflow, or export them back into storage.
-Getting Started
-===============
-Visit `Quick Start <https://docs.datachain.ai/quick-start>`_ to get started with `DataChain` and learn more.
 Contributing
 ============

{datachain-0.7.10 → datachain-0.7.11}/README.rst RENAMED

@@ -36,6 +36,11 @@ Use Cases
 3. **Versioning.** DataChain doesn't store, require moving or copying data (unlike DVC).
    Perfect use case is a bucket with thousands or millions of images, videos, audio, PDFs.
+Getting Started
+===============
+Visit `Quick Start <https://docs.datachain.ai/quick-start>`_ and `Docs <https://docs.datachain.ai/>`_
+to get started with `DataChain` and learn more.
 Key Features
 ============
@@ -59,12 +64,6 @@ Key Features
    - Pass datasets to Pytorch and Tensorflow, or export them back into storage.
-Getting Started
-===============
-Visit `Quick Start <https://docs.datachain.ai/quick-start>`_ to get started with `DataChain` and learn more.
 Contributing
 ============

{datachain-0.7.10 → datachain-0.7.11}/docs/contributing.md RENAMED

@@ -1,3 +1,7 @@
+---
+title: Contributing
+---
 # Contributor Guide
 Thank you for your interest in improving this project. This project is

datachain-0.7.11/docs/css/github-permalink-style.css ADDED

@@ -0,0 +1,39 @@
+.headerlink {
+	--permalink-size: 16px; /* for font-relative sizes, 0.6em is a good choice */
+	--permalink-spacing: 4px;
+	width: calc(var(--permalink-size) + var(--permalink-spacing));
+	height: var(--permalink-size);
+	vertical-align: middle;
+	background-color: var(--md-default-fg-color--lighter);
+	background-size: var(--permalink-size);
+	mask-size: var(--permalink-size);
+	-webkit-mask-size: var(--permalink-size);
+	mask-repeat: no-repeat;
+	-webkit-mask-repeat: no-repeat;
+	visibility: visible;
+	mask-image: url('data:image/svg+xml;utf8,<svg xmlns="http://www.w3.org/2000/svg" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg>');
+	-webkit-mask-image: url('data:image/svg+xml;utf8,<svg xmlns="http://www.w3.org/2000/svg" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg>');
+}
+[id]:target .headerlink {
+	background-color: var(--md-typeset-a-color);
+}
+.headerlink:hover {
+	background-color: var(--md-accent-fg-color) !important;
+}
+@media screen and (min-width: 76.25em) {
+	h1, h2, h3, h4, h5, h6 {
+		display: flex;
+		align-items: center;
+		flex-direction: row;
+		column-gap: 0.2em; /* fixes spaces in titles */
+	}
+	.headerlink {
+		order: -1;
+		margin-left: calc(var(--permalink-size) * -1 - var(--permalink-spacing)) !important;
+	}
+}

{datachain-0.7.10 → datachain-0.7.11}/docs/examples.md RENAMED

@@ -1,3 +1,6 @@
+---
+title: Examples
+---
 # Examples
@@ -225,7 +228,7 @@ Here is an example from MS COCO “captions” JSON which employs separate secti
 }
 ```
-Note how complicated the setup is. Every image is references by the name, and the metadata for this file is keyed by the “id” field. This same field is references later in the “annotations’ array, which is present in JSON files describing captions and the detected instances. The categories for the instances are stored in the “categories” array.
+Note how complicated the setup is. Every image is references by the name, and the metadata for this file is keyed by the “id” field. This same field is references later in the “annotations” array, which is present in JSON files describing captions and the detected instances. The categories for the instances are stored in the “categories” array.
 However, Datachain can easily parse the entire COCO structure via several reading and merging operators:

{datachain-0.7.10 → datachain-0.7.11}/docs/index.md RENAMED

@@ -1,3 +1,6 @@
+---
+title: Welcome to DataChain
+---
 # <a class="main-header-link" href="/" ><img style="display: inline-block;" src="/assets/datachain.svg" alt="DataChain"> <span style="display: inline-block;"> DataChain</span></a>
 <style>
@@ -83,7 +86,7 @@ The following pages provide detailed documentation on DataChain's features, arch
 - [🏃🏼‍♂️ Quick Start](quick-start.md): Get up and running with DataChain in no time.
 - [🎯 Examples](examples.md): Explore practical examples and use cases.
 - [📚 Tutorials](tutorials.md): Learn how to use DataChain for specific tasks.
-- [📚 API Reference](references/index.md): Dive into the technical details and API reference.
+- [🐍 API Reference](references/index.md): Dive into the technical details and API reference.
 - [🤝 Contributing](contributing.md): Learn how to contribute to DataChain.

{datachain-0.7.10 → datachain-0.7.11}/docs/quick-start.md RENAMED

@@ -1,3 +1,7 @@
+---
+title: Quick Start
+---
 # Quick Start
 ## Installation

{datachain-0.7.10 → datachain-0.7.11}/docs/references/index.md RENAMED

@@ -1,3 +1,7 @@
+---
+title: API Reference
+---
 # API Reference
 DataChain's API is organized into several modules:

{datachain-0.7.10 → datachain-0.7.11}/docs/tutorials.md RENAMED

@@ -1,3 +1,7 @@
+---
+title: Tutorials
+---
 # Tutorials
 * Multimodal: [GitHub](https://github.com/iterative/datachain-examples/blob/main/multimodal/clip_fine_tuning.ipynb) or [Google Colab](https://colab.research.google.com/github/iterative/datachain-examples/blob/main/multimodal/clip_fine_tuning.ipynb)

{datachain-0.7.10 → datachain-0.7.11}/examples/get_started/torch-loader.py RENAMED

@@ -5,6 +5,7 @@ To install the required dependencies:
 """
+import multiprocessing
 import os
 from posixpath import basename
@@ -12,17 +13,18 @@ import torch
 from torch import nn, optim
 from torch.utils.data import DataLoader
 from torchvision.transforms import v2
+from tqdm import tqdm
 from datachain import C, DataChain
 from datachain.torch import label_to_int
 STORAGE = "gs://datachain-demo/dogs-and-cats/"
-NUM_EPOCHS = os.getenv("NUM_EPOCHS", "3")
+NUM_EPOCHS = int(os.getenv("NUM_EPOCHS", "3"))
 # Define transformation for data preprocessing
 transform = v2.Compose(
     [
-        v2.ToTensor(),
+        v2.Compose([v2.ToImage(), v2.ToDtype(torch.float32, scale=True)]),
         v2.Resize((64, 64)),
         v2.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
     ]
@@ -54,6 +56,7 @@ class CNN(nn.Module):
 if __name__ == "__main__":
     ds = (
         DataChain.from_storage(STORAGE, type="image")
+        .settings(cache=True, prefetch=25)
         .filter(C("file.path").glob("*.jpg"))
         .map(
             label=lambda path: label_to_int(basename(path)[:3], CLASSES),
@@ -64,8 +67,10 @@ if __name__ == "__main__":
     train_loader = DataLoader(
         ds.to_pytorch(transform=transform),
-        batch_size=16,
-        num_workers=2,
+        batch_size=25,
+        num_workers=max(4, os.cpu_count() or 2),
+        persistent_workers=True,
+        multiprocessing_context=multiprocessing.get_context("spawn"),
     )
     model = CNN()
@@ -73,19 +78,19 @@ if __name__ == "__main__":
     optimizer = optim.Adam(model.parameters(), lr=0.001)
     # Train the model
-    for epoch in range(int(NUM_EPOCHS)):
-        for i, data in enumerate(train_loader):
-            inputs, labels = data
-            optimizer.zero_grad()
-            # Forward pass
-            outputs = model(inputs)
-            loss = criterion(outputs, labels)
-            # Backward pass and optimize
-            loss.backward()
-            optimizer.step()
-            print(f"[{epoch + 1}, {i + 1:5d}] loss: {loss.item():.3f}")
-    print("Finished Training")
+    for epoch in range(NUM_EPOCHS):
+        with tqdm(
+            train_loader, desc=f"epoch {epoch + 1}/{NUM_EPOCHS}", unit="batch"
+        ) as loader:
+            for data in loader:
+                inputs, labels = data
+                optimizer.zero_grad()
+                # Forward pass
+                outputs = model(inputs)
+                loss = criterion(outputs, labels)
+                # Backward pass and optimize
+                loss.backward()
+                optimizer.step()
+                loader.set_postfix(loss=loss.item())
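
Note on the `NUM_EPOCHS` change above: the example now converts the environment value to `int` once, where the constant is defined, instead of at the point of use. A minimal sketch of that idea follows; the helper name `read_int_env` is hypothetical and does not appear in the package.

```python
import os


def read_int_env(name: str, default: int) -> int:
    """Return an environment variable parsed as int, or `default` if unset.

    Hypothetical helper for illustration; the example script itself uses
    `int(os.getenv("NUM_EPOCHS", "3"))` directly.
    """
    raw = os.getenv(name)
    if raw is None:
        return default
    # int() raises ValueError for non-numeric input, so a bad value fails
    # at startup rather than in the middle of the training loop.
    return int(raw)


num_epochs = read_int_env("NUM_EPOCHS", 3)
for epoch in range(num_epochs):
    # The same integer serves both the loop bound and the progress label.
    label = f"epoch {epoch + 1}/{num_epochs}"
```

Parsing once at definition keeps every later use (the `range` call and the progress-bar description) free of repeated conversions.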

{datachain-0.7.10 → datachain-0.7.11}/examples/llm_and_nlp/unstructured-embeddings-gen.py RENAMED

@@ -12,11 +12,11 @@ from unstructured.cleaners.core import (
     group_broken_paragraphs,
     replace_unicode_quotes,
 )
-from unstructured.embed.huggingface import (
+from unstructured.partition.pdf import partition_pdf
+from unstructured_ingest.embed.huggingface import (
     HuggingFaceEmbeddingConfig,
     HuggingFaceEmbeddingEncoder,
 )
-from unstructured.partition.pdf import partition_pdf
 from datachain import C, DataChain, DataModel, File
@@ -43,6 +43,7 @@ def process_pdf(file: File) -> Iterator[Chunk]:
         chunks = partition_pdf(file=f, chunking_strategy="by_title", strategy="fast")
     # Clean the chunks and add new columns
+    text_chunks = []
     for chunk in chunks:
         chunk.apply(
             lambda text: clean(
@@ -51,16 +52,17 @@ def process_pdf(file: File) -> Iterator[Chunk]:
         )
         chunk.apply(replace_unicode_quotes)
         chunk.apply(group_broken_paragraphs)
+        text_chunks.append({"text": str(chunk)})
     # create embeddings
-    chunks_embedded = embedding_encoder.embed_documents(chunks)
+    chunks_embedded = embedding_encoder.embed_documents(text_chunks)
     # Add new rows to DataChain
     for chunk in chunks_embedded:
         yield Chunk(
             key=file.path,
-            text=chunk.text,
-            embeddings=chunk.embeddings,
+            text=chunk.get("text"),
+            embeddings=chunk.get("embeddings"),
         )
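
Note on the hunk above: the example now hands the encoder a list of plain dicts with a `"text"` key and reads `"text"` and `"embeddings"` back with `dict.get`, instead of passing element objects and reading attributes. The sketch below mirrors only that data shape. `StandInEncoder` is a hypothetical placeholder, not the `unstructured_ingest` encoder, and its one-number "embedding" is made up for illustration.

```python
from typing import Iterator


class StandInEncoder:
    """Hypothetical stand-in that mimics the dict-in, dict-out shape."""

    def embed_documents(self, elements: list[dict]) -> list[dict]:
        embedded = []
        for element in elements:
            # Fake embedding: a single float derived from the text length.
            vector = [float(len(element["text"]))]
            embedded.append({**element, "embeddings": vector})
        return embedded


def embed_texts(encoder, texts: list[str]) -> Iterator[tuple]:
    # Wrap each cleaned text in a dict, as the updated example does.
    text_chunks = [{"text": text} for text in texts]
    for chunk in encoder.embed_documents(text_chunks):
        # Results are dicts, so fields are read by key, not by attribute.
        yield chunk.get("text"), chunk.get("embeddings")


pairs = list(embed_texts(StandInEncoder(), ["ab", "cde"]))
```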

{datachain-0.7.10 → datachain-0.7.11}/mkdocs.yml RENAMED

@@ -27,7 +27,6 @@ theme:
     - navigation.tabs
     - navigation.path
     - navigation.top
-    - navigation.prune
     - navigation.footer
     - toc.follow
     - content.action.edit
@@ -37,7 +36,6 @@ theme:
     - content.tooltips
     - search.highlight
     - search.suggest
-    - navigation.sections
   palette:
     # Palette toggle for automatic mode
@@ -56,8 +54,8 @@ theme:
     # Palette toggle for dark mode
     - media: "(prefers-color-scheme: dark)"
       scheme: slate
-      primary: black
-      accent: lime
+      primary: teal
+      accent: teal
       toggle:
         icon: material/weather-night
         name: Switch to system preference
@@ -68,18 +66,18 @@ nav:
       - 🏃🏼‍♂️ Quick Start: quick-start.md
       - 🎯 Examples: examples.md
       - 📚 Tutorials: tutorials.md
-      - 🐍 API Reference: references/index.md
+      - 🐍 API Reference:
+          - Overview: references/index.md
+          - DataChain: references/datachain.md
+          - DataType: references/datatype.md
+          - File: references/file.md
+          - UDF: references/udf.md
+          - Torch: references/torch.md
+          - SQL: references/sql.md
       - 🤝 Contributing: contributing.md
-  - API Reference:
-      - references/index.md
-      - references/datachain.md
-      - references/datatype.md
-      - references/file.md
-      - references/udf.md
-      - references/torch.md
-      - references/sql.md
-  - DataChain Website: https://datachain.ai" target="_blank"
-  - Studio: https://studio.datachain.ai" target="_blank"
+  - DataChain Website ↗: https://datachain.ai" target="_blank"
+  - Studio ↗: https://studio.datachain.ai" target="_blank"
 markdown_extensions:
   - abbr
@@ -105,7 +103,11 @@ markdown_extensions:
   - pymdownx.tilde
   - tables
   - toc:
-      permalink: true
+      permalink: ''
+# Custom permalink style: https://github.com/squidfunk/mkdocs-material/discussions/3535
+extra_css:
+  - css/github-permalink-style.css
 extra:
   social:

{datachain-0.7.10 → datachain-0.7.11}/noxfile.py RENAMED

@@ -81,6 +81,8 @@ def examples(session: nox.Session) -> None:
     session.install(".[examples]")
     session.run(
         "pytest",
+        "--durations=0",
+        "tests/examples",
         "-m",
         "examples",
         *session.posargs,

{datachain-0.7.10 → datachain-0.7.11}/pyproject.toml RENAMED

@@ -104,14 +104,14 @@ dev = [
 ]
 examples = [
   "datachain[tests]",
-  "numpy>=1,<2",
   "defusedxml",
   "accelerate",
-  "unstructured[pdf,embed-huggingface]<0.16.0",
+  "unstructured_ingest[embed-huggingface]",
+  "unstructured[pdf]",
   "pdfplumber==0.11.4",
   "huggingface_hub[hf_transfer]",
   "onnx==1.16.1",
-  "ultralytics==8.3.37"
+  "ultralytics==8.3.48"
 ]
 [project.urls]

datachain-0.7.11/src/datachain/client/__init__.py ADDED

@@ -0,0 +1,3 @@
+from .fsspec import Client
+__all__ = ["Client"]

{datachain-0.7.10 → datachain-0.7.11}/src/datachain/lib/dc.py RENAMED

@@ -19,7 +19,6 @@ from typing import (
 )
 import orjson
-import pandas as pd
 import sqlalchemy
 from pydantic import BaseModel
 from sqlalchemy.sql.functions import GenericFunction
@@ -57,6 +56,7 @@ from datachain.telemetry import telemetry
 from datachain.utils import batched_it, inside_notebook, row_to_nested_dict
 if TYPE_CHECKING:
+    import pandas as pd
     from pyarrow import DataType as ArrowDataType
     from typing_extensions import Concatenate, ParamSpec, Self
@@ -1701,6 +1701,8 @@ class DataChain:
         Parameters:
             flatten : Whether to use a multiindex or flatten column names.
         """
+        import pandas as pd
         headers, max_length = self._effective_signals_schema.get_headers_with_length()
         if flatten or max_length < 2:
             columns = [".".join(filter(None, header)) for header in headers]
@@ -1724,6 +1726,8 @@ class DataChain:
             transpose : Whether to transpose rows and columns.
             truncate : Whether or not to truncate the contents of columns.
         """
+        import pandas as pd
         dc = self.limit(limit) if limit > 0 else self  # type: ignore[misc]
         df = dc.to_pandas(flatten)
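
Note on the hunks above: `pandas` moves from a module-level import to an import under `TYPE_CHECKING` plus local imports inside the methods that need it. The same pattern recurs in `file.py`, `meta_formats.py` and `query/dataset.py` below. A minimal sketch of the pattern, using the standard-library `fractions` module as a stand-in for a heavy dependency (the function name `to_fraction` is hypothetical):

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by static type checkers, never at runtime,
    # so importing this module does not load the dependency.
    from fractions import Fraction


def to_fraction(text: str) -> "Fraction":
    # Deferred import: the dependency is loaded on first call.
    # The quoted return annotation avoids needing the name at import time.
    from fractions import Fraction

    return Fraction(text)


value = to_fraction("3/4")
```

The trade-off is a small lookup cost on each call in exchange for a cheaper module import, and callers that never use the function never pay for the dependency.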

{datachain-0.7.10 → datachain-0.7.11}/src/datachain/lib/file.py RENAMED

@@ -17,7 +17,6 @@ from urllib.request import url2pathname
 from fsspec.callbacks import DEFAULT_CALLBACK, Callback
 from PIL import Image
-from pyarrow.dataset import dataset
 from pydantic import Field, field_validator
 from datachain.client.fileslice import FileSlice
@@ -452,6 +451,8 @@ class ArrowRow(DataModel):
     @contextmanager
     def open(self):
         """Stream row contents from indexed file."""
+        from pyarrow.dataset import dataset
         if self.file._caching_enabled:
             self.file.ensure_cached()
             path = self.file.get_local_path()

{datachain-0.7.10 → datachain-0.7.11}/src/datachain/lib/meta_formats.py RENAMED

@@ -6,7 +6,6 @@ from collections.abc import Iterator
 from pathlib import Path
 from typing import Callable
-import datamodel_code_generator
 import jmespath as jsp
 from pydantic import BaseModel, ConfigDict, Field, ValidationError  # noqa: F401
@@ -67,6 +66,8 @@ def read_schema(source_file, data_type="csv", expr=None, model_name=None):
             data_type = "json"  # treat json line as plain JSON in auto-schema
         data_string = json.dumps(json_object)
+    import datamodel_code_generator
     input_file_types = {i.value: i for i in datamodel_code_generator.InputFileType}
     input_file_type = input_file_types[data_type]
     with tempfile.TemporaryDirectory() as tmpdir:

{datachain-0.7.10 → datachain-0.7.11}/src/datachain/lib/pytorch.py RENAMED

@@ -7,7 +7,6 @@ from torch import float32
 from torch.distributed import get_rank, get_world_size
 from torch.utils.data import IterableDataset, get_worker_info
 from torchvision.transforms import v2
-from tqdm import tqdm
 from datachain import Session
 from datachain.asyn import AsyncMapper
@@ -112,10 +111,7 @@ class PytorchDataset(IterableDataset):
             from datachain.lib.udf import _prefetch_input
             rows = AsyncMapper(_prefetch_input, rows, workers=self.prefetch).iterate()
-        desc = f"Parsed PyTorch dataset for rank={total_rank} worker"
-        with tqdm(rows, desc=desc, unit=" rows", position=total_rank) as rows_it:
-            yield from map(self._process_row, rows_it)
+        yield from map(self._process_row, rows)
     def _process_row(self, row_features):
         row = []

{datachain-0.7.10 → datachain-0.7.11}/src/datachain/lib/signal_schema.py RENAMED

@@ -402,9 +402,20 @@ class SignalSchema:
             if ModelStore.is_pydantic(finfo.annotation):
                 SignalSchema._set_file_stream(getattr(obj, field), catalog, cache)
-    def get_column_type(self, col_name: str) -> DataType:
+    def get_column_type(self, col_name: str, with_subtree: bool = False) -> DataType:
+        """
+        Returns column type by column name.
+        If `with_subtree` is True, then it will return the type of the column
+        even if it has a subtree (e.g. model with nested fields), otherwise it will
+        return the type of the column (standard type field, not the model).
+        If column is not found, raises `SignalResolvingError`.
+        """
         for path, _type, has_subtree, _ in self.get_flat_tree():
-            if not has_subtree and DEFAULT_DELIMITER.join(path) == col_name:
+            if (with_subtree or not has_subtree) and DEFAULT_DELIMITER.join(
+                path
+            ) == col_name:
                 return _type
         raise SignalResolvingError([col_name], "is not found")
@@ -492,14 +503,25 @@ class SignalSchema:
                 # renaming existing signal
                 del new_values[value.name]
                 new_values[name] = self.values[value.name]
-            elif isinstance(value, Func):
+                continue
+            if isinstance(value, Column):
+                # adding new signal from existing signal field
+                try:
+                    new_values[name] = self.get_column_type(
+                        value.name, with_subtree=True
+                    )
+                    continue
+                except SignalResolvingError:
+                    pass
+            if isinstance(value, Func):
                 # adding new signal with function
                 new_values[name] = value.get_result_type(self)
-            elif isinstance(value, ColumnElement):
+                continue
+            if isinstance(value, ColumnElement):
                 # adding new signal
                 new_values[name] = sql_to_python(value)
-            else:
-                new_values[name] = value
+                continue
+            new_values[name] = value
         return SignalSchema(new_values)
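
Note on the `with_subtree` flag added to `get_column_type` above: by default the lookup skips entries that have nested fields, and the new flag lets such entries match as well, which the new `Column` branch relies on. The sketch below reproduces only the matching rule on a hand-written flat tree. The delimiter value, the tuple layout, the sample types and the use of `LookupError` are assumptions made for this illustration, not the package's actual data structures.

```python
DELIMITER = "__"  # assumed delimiter for this sketch

# Hypothetical flat tree: (path, type, has_subtree)
FLAT_TREE = [
    (["file"], dict, True),          # a model with nested fields
    (["file", "path"], str, False),  # leaf field
    (["file", "size"], int, False),  # leaf field
]


def get_column_type(flat_tree, col_name: str, with_subtree: bool = False):
    """Return the type recorded for `col_name` in the flat tree."""
    for path, type_, has_subtree in flat_tree:
        # Leaves always qualify; nodes with children only when asked for.
        if (with_subtree or not has_subtree) and DELIMITER.join(path) == col_name:
            return type_
    raise LookupError(f"{col_name} is not found")


leaf_type = get_column_type(FLAT_TREE, "file__path")
model_type = get_column_type(FLAT_TREE, "file", with_subtree=True)
```

Calling `get_column_type(FLAT_TREE, "file")` without the flag raises `LookupError` in this sketch, which corresponds to the pre-0.7.11 behaviour where only leaf columns resolved.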

{datachain-0.7.10 → datachain-0.7.11}/src/datachain/query/dataset.py RENAMED

@@ -35,7 +35,6 @@ from sqlalchemy.sql.schema import TableClause
 from sqlalchemy.sql.selectable import Select
 from datachain.asyn import ASYNC_WORKERS, AsyncMapper, OrderedMapper
-from datachain.catalog import QUERY_SCRIPT_CANCELED_EXIT_CODE, get_catalog
 from datachain.data_storage.schema import (
     PARTITION_COLUMN_ID,
     partition_col_names,
@@ -394,6 +393,8 @@ class UDFStep(Step, ABC):
         """
     def populate_udf_table(self, udf_table: "Table", query: Select) -> None:
+        from datachain.catalog import QUERY_SCRIPT_CANCELED_EXIT_CODE
         use_partitioning = self.partition_by is not None
         batching = self.udf.get_batching(use_partitioning)
         workers = self.workers
@@ -1087,6 +1088,8 @@ class DatasetQuery:
     def delete(
         name: str, version: Optional[int] = None, catalog: Optional["Catalog"] = None
     ) -> None:
+        from datachain.catalog import get_catalog
         catalog = catalog or get_catalog()
         version = version or catalog.get_dataset(name).latest_version
         catalog.remove_dataset(name, version)
