PyPI - datachain - Versions diffs - 0.8.1__py3-none-any.whl → 0.8.2__py3-none-any.whl - Mend

datachain 0.8.1__py3-none-any.whl → 0.8.2__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

This version of datachain might be problematic.

Files changed (7)

datachain/client/gcs.py CHANGED

@@ -33,13 +33,14 @@ class GCSClient(Client):
         return cast(GCSFileSystem, super().create_fs(**kwargs))
     def url(self, path: str, expires: int = 3600, **kwargs) -> str:
-        try:
-            return self.fs.sign(self.get_full_path(path), expiration=expires, **kwargs)
-        except AttributeError as exc:
-            is_anon = self.fs.storage_options.get("token") == "anon"
-            if is_anon and "you need a private key to sign credentials" in str(exc):
-                return f"https://storage.googleapis.com/{self.name}/{path}"
-            raise
+        """
+        Generate a signed URL for the given path.
+        If the client is anonymous, a public URL is returned instead
+        (see https://cloud.google.com/storage/docs/access-public-data#api-link).
+        """
+        if self.fs.storage_options.get("token") == "anon":
+            return f"https://storage.googleapis.com/{self.name}/{path}"
+        return self.fs.sign(self.get_full_path(path), expiration=expires, **kwargs)
     @staticmethod
     def parse_timestamp(timestamp: str) -> datetime:
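
The hunk above replaces an exception-driven fallback with an up-front check: in 0.8.1 the client tried to sign first and fell back to a public URL only when the `AttributeError` message matched, while in 0.8.2 it inspects the `token` storage option before signing at all. The following is a minimal sketch of that control flow, not the library's implementation. `FakeGCSFileSystem` and `url_0_8_2` are hypothetical names introduced here for illustration, and the signed URL format is a placeholder rather than a real signature.

```python
class FakeGCSFileSystem:
    """Hypothetical stand-in for the real filesystem object (assumption)."""

    def __init__(self, **storage_options):
        self.storage_options = storage_options

    def sign(self, path, expiration=3600, **kwargs):
        if self.storage_options.get("token") == "anon":
            # Mimics the failure that the 0.8.1 code detected by message text.
            raise AttributeError("you need a private key to sign credentials")
        # Placeholder only: a real signed URL carries a cryptographic signature.
        return f"https://storage.googleapis.com/{path}?X-Goog-Expires={expiration}"


def url_0_8_2(fs, bucket, path, expires=3600):
    """Mirror of the 0.8.2 flow: check for anonymous access before signing."""
    if fs.storage_options.get("token") == "anon":
        return f"https://storage.googleapis.com/{bucket}/{path}"
    return fs.sign(f"{bucket}/{path}", expiration=expires)


# Anonymous clients never reach sign(), so no exception is raised or caught.
anon_url = url_0_8_2(
    FakeGCSFileSystem(token="anon"), "datachain-demo", "dogs-and-cats/cat.1.jpg"
)
# Authenticated clients still go through sign() as before.
signed_url = url_0_8_2(
    FakeGCSFileSystem(token="service-account"), "datachain-demo", "a.txt", expires=60
)
```

One consequence of the new ordering is that the result no longer depends on the exact wording of an error message raised by the underlying filesystem.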

{datachain-0.8.1.dist-info → datachain-0.8.2.dist-info}/METADATA RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: datachain
-Version: 0.8.1
+Version: 0.8.2
 Summary: Wrangle unstructured AI data at scale
 Author-email: Dmitry Petrov <support@dvc.org>
 License: Apache-2.0
@@ -145,6 +145,88 @@ Getting Started
 Visit `Quick Start <https://docs.datachain.ai/quick-start>`_ and `Docs <https://docs.datachain.ai/>`_
 to get started with `DataChain` and learn more.
+.. code:: bash
+        pip install datachain
+Example: download subset of files based on metadata
+---------------------------------------------------
+Sometimes users only need to download a specific subset of files from cloud storage,
+rather than the entire dataset.
+For example, you could use a JSON file's metadata to download just cat images with
+high confidence scores.
+.. code:: py
+    from datachain import Column, DataChain
+    meta = DataChain.from_json("gs://datachain-demo/dogs-and-cats/*json", object_name="meta", anon=True)
+    images = DataChain.from_storage("gs://datachain-demo/dogs-and-cats/*jpg", anon=True)
+    images_id = images.map(id=lambda file: file.path.split('.')[-2])
+    annotated = images_id.merge(meta, on="id", right_on="meta.id")
+    likely_cats = annotated.filter((Column("meta.inference.confidence") > 0.93) \
+                                   & (Column("meta.inference.class_") == "cat"))
+    likely_cats.export_files("high-confidence-cats/", signal="file")
+Example: LLM based text-file evaluation
+---------------------------------------
+In this example, we evaluate chatbot conversations stored in text files
+using LLM based evaluation.
+.. code:: shell
+    $ pip install mistralai # Requires version >=1.0.0
+    $ export MISTRAL_API_KEY=_your_key_
+Python code:
+.. code:: py
+    from mistralai import Mistral
+    from datachain import File, DataChain, Column
+    PROMPT = "Was this dialog successful? Answer in a single word: Success or Failure."
+    def eval_dialogue(file: File) -> bool:
+         client = Mistral()
+         response = client.chat.complete(
+             model="open-mixtral-8x22b",
+             messages=[{"role": "system", "content": PROMPT},
+                       {"role": "user", "content": file.read()}])
+         result = response.choices[0].message.content
+         return result.lower().startswith("success")
+    chain = (
+       DataChain.from_storage("gs://datachain-demo/chatbot-KiT/", object_name="file", anon=True)
+       .settings(parallel=4, cache=True)
+       .map(is_success=eval_dialogue)
+       .save("mistral_files")
+    )
+    successful_chain = chain.filter(Column("is_success") == True)
+    successful_chain.export_files("./output_mistral")
+    print(f"{successful_chain.count()} files were exported")
+With the instruction above, the Mistral model considers 31/50 files to hold the successful dialogues:
+.. code:: shell
+    $ ls output_mistral/datachain-demo/chatbot-KiT/
+    1.txt  15.txt 18.txt 2.txt  22.txt 25.txt 28.txt 33.txt 37.txt 4.txt  41.txt ...
+    $ ls output_mistral/datachain-demo/chatbot-KiT/ | wc -l
+    31
 Key Features
 ============
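
The first README example added in this hunk derives an `id` from each file path with `file.path.split('.')[-2]`, joins it against the JSON metadata, and keeps rows whose class is `cat` with confidence above 0.93. The plain-Python sketch below reproduces that logic without DataChain so the steps are easy to follow. The file names and metadata records are hypothetical (the `name.N.jpg` naming pattern is an assumption, not taken from the dataset), and `image_id` is a helper name introduced here.

```python
def image_id(path: str) -> str:
    """Same expression as the README lambda: the second-to-last dot-separated part."""
    return path.split(".")[-2]


# Hypothetical inputs; the real bucket contents may be named differently.
image_paths = [
    "dogs-and-cats/cat.1.jpg",
    "dogs-and-cats/dog.2.jpg",
    "dogs-and-cats/cat.3.jpg",
]
meta_records = [
    {"id": "1", "inference": {"class_": "cat", "confidence": 0.97}},
    {"id": "2", "inference": {"class_": "dog", "confidence": 0.99}},
    {"id": "3", "inference": {"class_": "cat", "confidence": 0.61}},
]

# Equivalent of merge(meta, on="id", right_on="meta.id"): index metadata by id.
meta_by_id = {record["id"]: record for record in meta_records}

# Equivalent of the filter: confidence > 0.93 and class_ == "cat".
likely_cats = [
    path
    for path in image_paths
    if image_id(path) in meta_by_id
    and meta_by_id[image_id(path)]["inference"]["confidence"] > 0.93
    and meta_by_id[image_id(path)]["inference"]["class_"] == "cat"
]
```

Note that the `split('.')[-2]` expression only yields a bare id when the file name contains at least two dots; for a name such as `dir/7.jpg` it would return `dir/7` instead.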

{datachain-0.8.1.dist-info → datachain-0.8.2.dist-info}/RECORD RENAMED

@@ -25,7 +25,7 @@ datachain/client/__init__.py,sha256=1kDpCPoibMXi1gExR4lTLc5pi-k6M5TANiwtXkPoLhU,
 datachain/client/azure.py,sha256=ffxs26zm6KLAL1aUWJm-vtzuZP3LSNha7UDGXynMBKo,2234
 datachain/client/fileslice.py,sha256=bT7TYco1Qe3bqoc8aUkUZcPdPofJDHlryL5BsTn9xsY,3021
 datachain/client/fsspec.py,sha256=kf1blSGNcEXJ0tra3y5i35jc1aAy-67wMHXkqjlRMXg,12736
-datachain/client/gcs.py,sha256=41Q9hxE9CYt-31B9m-ogqrEv4NbF2SWF-j8N_AC3VQ0,4580
+datachain/client/gcs.py,sha256=tAm5CCO86UNuSwTCHVPOiPbj1fBhnEYDoEVLKvv9H5I,4632
 datachain/client/hf.py,sha256=XeVJVbiNViZCpn3sfb90Fr8SYO3BdLmfE3hOWMoqInE,951
 datachain/client/local.py,sha256=f2HBqWH8SQM5CyiJ0ljfePVROg2FszWaAn6E2c8RiLE,4596
 datachain/client/s3.py,sha256=CVHBUZ1Ic2Q3370nl-Bbe69phuWjFlrVv9dTJKBpRT0,6019
@@ -121,9 +121,9 @@ datachain/sql/sqlite/vector.py,sha256=ncW4eu2FlJhrP_CIpsvtkUabZlQdl2D5Lgwy_cbfqR
 datachain/toolkit/__init__.py,sha256=eQ58Q5Yf_Fgv1ZG0IO5dpB4jmP90rk8YxUWmPc1M2Bo,68
 datachain/toolkit/split.py,sha256=z3zRJNzjWrpPuRw-zgFbCOBKInyYxJew8ygrYQRQLNc,2930
 datachain/torch/__init__.py,sha256=gIS74PoEPy4TB3X6vx9nLO0Y3sLJzsA8ckn8pRWihJM,579
-datachain-0.8.1.dist-info/LICENSE,sha256=8DnqK5yoPI_E50bEg_zsHKZHY2HqPy4rYN338BHQaRA,11344
-datachain-0.8.1.dist-info/METADATA,sha256=iZV5_BcFNOY82HkPBDJ6h2Y895wYy5UQfo_VJaDZN6Q,8437
-datachain-0.8.1.dist-info/WHEEL,sha256=PZUExdf71Ui_so67QXpySuHtCi3-J3wvF4ORK6k_S8U,91
-datachain-0.8.1.dist-info/entry_points.txt,sha256=0GMJS6B_KWq0m3VT98vQI2YZodAMkn4uReZ_okga9R4,49
-datachain-0.8.1.dist-info/top_level.txt,sha256=lZPpdU_2jJABLNIg2kvEOBi8PtsYikbN1OdMLHk8bTg,10
-datachain-0.8.1.dist-info/RECORD,,
+datachain-0.8.2.dist-info/LICENSE,sha256=8DnqK5yoPI_E50bEg_zsHKZHY2HqPy4rYN338BHQaRA,11344
+datachain-0.8.2.dist-info/METADATA,sha256=MFVRJVJBLh_Cq3aV_1V4dHKkf15HTZHLUWWQNbRId3I,11066
+datachain-0.8.2.dist-info/WHEEL,sha256=PZUExdf71Ui_so67QXpySuHtCi3-J3wvF4ORK6k_S8U,91
+datachain-0.8.2.dist-info/entry_points.txt,sha256=0GMJS6B_KWq0m3VT98vQI2YZodAMkn4uReZ_okga9R4,49
+datachain-0.8.2.dist-info/top_level.txt,sha256=lZPpdU_2jJABLNIg2kvEOBi8PtsYikbN1OdMLHk8bTg,10
+datachain-0.8.2.dist-info/RECORD,,
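
Each RECORD line above has the form `path,sha256=<digest>,<size>`, where the digest is the SHA-256 of the file encoded as URL-safe base64 with the trailing `=` padding removed, and the size is in bytes. This explains why only `gcs.py` and `METADATA` show new hashes and sizes while the other `dist-info` entries change in path only. The sketch below shows how such a line can be recomputed and checked; `record_entry` and `matches_record` are hypothetical helper names, and the sample file content is made up.

```python
import base64
import hashlib


def record_entry(path: str, data: bytes) -> str:
    """Build a RECORD-style line for a file's path and raw bytes."""
    digest = hashlib.sha256(data).digest()
    # URL-safe base64 with '=' padding stripped, as used in wheel RECORD files.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{path},sha256={encoded},{len(data)}"


def matches_record(line: str, data: bytes) -> bool:
    """Check whether file bytes agree with the hash and size on a RECORD line."""
    path, _hash, _size = line.rsplit(",", 2)
    return record_entry(path, data) == line


# Made-up content for illustration; not a file from the package.
sample = b"print('hi')\n"
entry = record_entry("pkg/mod.py", sample)
```

Running `matches_record` over the files of an unpacked wheel is one way to confirm that a diff like this one reflects the actual archive contents.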

{datachain-0.8.1.dist-info → datachain-0.8.2.dist-info}/LICENSE RENAMED

File without changes

{datachain-0.8.1.dist-info → datachain-0.8.2.dist-info}/WHEEL RENAMED

File without changes

{datachain-0.8.1.dist-info → datachain-0.8.2.dist-info}/entry_points.txt RENAMED

File without changes

{datachain-0.8.1.dist-info → datachain-0.8.2.dist-info}/top_level.txt RENAMED

File without changes
