hirundo 0.1.21__tar.gz → 0.2.3.post1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (34)
  1. {hirundo-0.1.21 → hirundo-0.2.3.post1}/PKG-INFO +42 -10
  2. {hirundo-0.1.21 → hirundo-0.2.3.post1}/README.md +27 -3
  3. {hirundo-0.1.21 → hirundo-0.2.3.post1}/hirundo/__init__.py +19 -3
  4. {hirundo-0.1.21 → hirundo-0.2.3.post1}/hirundo/_constraints.py +2 -3
  5. {hirundo-0.1.21 → hirundo-0.2.3.post1}/hirundo/_iter_sse_retrying.py +7 -4
  6. hirundo-0.2.3.post1/hirundo/_llm_pipeline.py +153 -0
  7. hirundo-0.2.3.post1/hirundo/_run_checking.py +283 -0
  8. {hirundo-0.1.21 → hirundo-0.2.3.post1}/hirundo/_urls.py +1 -0
  9. {hirundo-0.1.21 → hirundo-0.2.3.post1}/hirundo/cli.py +1 -4
  10. {hirundo-0.1.21 → hirundo-0.2.3.post1}/hirundo/dataset_enum.py +2 -0
  11. {hirundo-0.1.21 → hirundo-0.2.3.post1}/hirundo/dataset_qa.py +106 -190
  12. {hirundo-0.1.21 → hirundo-0.2.3.post1}/hirundo/dataset_qa_results.py +3 -3
  13. {hirundo-0.1.21 → hirundo-0.2.3.post1}/hirundo/git.py +7 -8
  14. {hirundo-0.1.21 → hirundo-0.2.3.post1}/hirundo/labeling.py +22 -19
  15. {hirundo-0.1.21 → hirundo-0.2.3.post1}/hirundo/storage.py +25 -24
  16. hirundo-0.2.3.post1/hirundo/unlearning_llm.py +599 -0
  17. {hirundo-0.1.21 → hirundo-0.2.3.post1}/hirundo/unzip.py +3 -3
  18. {hirundo-0.1.21 → hirundo-0.2.3.post1}/hirundo.egg-info/PKG-INFO +42 -10
  19. {hirundo-0.1.21 → hirundo-0.2.3.post1}/hirundo.egg-info/SOURCES.txt +5 -1
  20. {hirundo-0.1.21 → hirundo-0.2.3.post1}/hirundo.egg-info/requires.txt +14 -5
  21. {hirundo-0.1.21 → hirundo-0.2.3.post1}/pyproject.toml +26 -13
  22. hirundo-0.2.3.post1/tests/testing_utils.py +7 -0
  23. {hirundo-0.1.21 → hirundo-0.2.3.post1}/LICENSE +0 -0
  24. {hirundo-0.1.21 → hirundo-0.2.3.post1}/hirundo/__main__.py +0 -0
  25. {hirundo-0.1.21 → hirundo-0.2.3.post1}/hirundo/_dataframe.py +0 -0
  26. {hirundo-0.1.21 → hirundo-0.2.3.post1}/hirundo/_env.py +0 -0
  27. {hirundo-0.1.21 → hirundo-0.2.3.post1}/hirundo/_headers.py +0 -0
  28. {hirundo-0.1.21 → hirundo-0.2.3.post1}/hirundo/_http.py +0 -0
  29. {hirundo-0.1.21 → hirundo-0.2.3.post1}/hirundo/_timeouts.py +0 -0
  30. {hirundo-0.1.21 → hirundo-0.2.3.post1}/hirundo/logger.py +0 -0
  31. {hirundo-0.1.21 → hirundo-0.2.3.post1}/hirundo.egg-info/dependency_links.txt +0 -0
  32. {hirundo-0.1.21 → hirundo-0.2.3.post1}/hirundo.egg-info/entry_points.txt +0 -0
  33. {hirundo-0.1.21 → hirundo-0.2.3.post1}/hirundo.egg-info/top_level.txt +0 -0
  34. {hirundo-0.1.21 → hirundo-0.2.3.post1}/setup.cfg +0 -0
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: hirundo
-Version: 0.1.21
+Version: 0.2.3.post1
 Summary: This package is used to interface with Hirundo's platform. It provides a simple API to optimize your ML datasets.
 Author-email: Hirundo <dev@hirundo.io>
 License: MIT License
@@ -18,7 +18,7 @@ Keywords: dataset,machine learning,data science,data engineering
 Classifier: License :: OSI Approved :: MIT License
 Classifier: Programming Language :: Python
 Classifier: Programming Language :: Python :: 3
-Requires-Python: >=3.9
+Requires-Python: >=3.10
 Description-Content-Type: text/markdown
 License-File: LICENSE
 Requires-Dist: pyyaml>=6.0.1
@@ -34,8 +34,9 @@ Requires-Dist: httpx-sse>=0.4.0
 Requires-Dist: tqdm>=4.66.5
 Requires-Dist: h11>=0.16.0
 Requires-Dist: requests>=2.32.4
-Requires-Dist: urllib3>=2.5.0
+Requires-Dist: urllib3>=2.6.3
 Requires-Dist: setuptools>=78.1.1
+Requires-Dist: docutils<0.22.0
 Provides-Extra: dev
 Requires-Dist: pyyaml>=6.0.1; extra == "dev"
 Requires-Dist: types-PyYAML>=6.0.12; extra == "dev"
@@ -50,15 +51,18 @@ Requires-Dist: stamina>=24.2.0; extra == "dev"
 Requires-Dist: httpx-sse>=0.4.0; extra == "dev"
 Requires-Dist: pytest>=8.2.0; extra == "dev"
 Requires-Dist: pytest-asyncio>=0.23.6; extra == "dev"
-Requires-Dist: uv>=0.8.6; extra == "dev"
+Requires-Dist: uv>=0.9.6; extra == "dev"
 Requires-Dist: pre-commit>=3.7.1; extra == "dev"
+Requires-Dist: basedpyright==1.37.1; extra == "dev"
 Requires-Dist: virtualenv>=20.6.6; extra == "dev"
+Requires-Dist: authlib>=1.6.6; extra == "dev"
 Requires-Dist: ruff>=0.12.0; extra == "dev"
-Requires-Dist: bumpver; extra == "dev"
+Requires-Dist: bumpver>=2025.1131; extra == "dev"
 Requires-Dist: platformdirs>=4.3.6; extra == "dev"
-Requires-Dist: safety>=3.2.13; extra == "dev"
 Requires-Dist: cryptography>=44.0.1; extra == "dev"
 Requires-Dist: jinja2>=3.1.6; extra == "dev"
+Requires-Dist: filelock>=3.20.1; extra == "dev"
+Requires-Dist: marshmallow>=3.26.2; extra == "dev"
 Provides-Extra: docs
 Requires-Dist: sphinx>=7.4.7; extra == "docs"
 Requires-Dist: sphinx-autobuild>=2024.9.3; extra == "docs"
@@ -67,13 +71,17 @@ Requires-Dist: autodoc_pydantic>=2.2.0; extra == "docs"
 Requires-Dist: furo; extra == "docs"
 Requires-Dist: sphinx-multiversion; extra == "docs"
 Requires-Dist: esbonio; extra == "docs"
-Requires-Dist: starlette>=0.47.2; extra == "docs"
+Requires-Dist: starlette>=0.49.1; extra == "docs"
 Requires-Dist: markupsafe>=3.0.2; extra == "docs"
 Requires-Dist: jinja2>=3.1.6; extra == "docs"
 Provides-Extra: pandas
 Requires-Dist: pandas>=2.2.3; extra == "pandas"
 Provides-Extra: polars
 Requires-Dist: polars>=1.0.0; extra == "polars"
+Provides-Extra: transformers
+Requires-Dist: transformers>=4.57.3; extra == "transformers"
+Requires-Dist: peft>=0.18.1; extra == "transformers"
+Requires-Dist: accelerate>=1.12.0; extra == "transformers"
 Dynamic: license-file
 
 # Hirundo
@@ -145,7 +153,31 @@ You can install the codebase with a simple `pip install hirundo` to install the
 
 ## Usage
 
-Classification example:
+### Unlearning LLM behavior
+
+Make sure to install the `transformers` extra, i.e. `pip install hirundo[transformers]` or `uv pip install hirundo[transformers]` if you have `uv` installed which is much faster than `pip`.
+
+```python
+llm = LlmModel(
+    model_name="Nemotron-Flash-1B",
+    model_source=HuggingFaceTransformersModel(
+        model_name="nvidia/Nemotron-Flash-1B",
+    ),
+)
+llm_id = llm.create()
+run_info = BiasRunInfo(
+    bias_type=BiasType.ALL,
+)
+run_id = LlmUnlearningRun.launch(
+    llm_id,
+    run_info,
+)
+new_adapter = llm.get_hf_pipeline_for_run(run_id)
+```
+
+### Dataset QA
+
+#### Classification example:
 
 ```python
 from hirundo import (
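The unlearning snippet added above omits its imports. For reference, a minimal self-contained sketch, with the imports assembled from the new top-level exports in `hirundo/__init__.py` (see the hunks below); the model name, bias settings, and the final generation call are illustrative assumptions, not part of the published README:

```python
from hirundo import (
    BiasRunInfo,
    BiasType,
    HuggingFaceTransformersModel,
    LlmModel,
    LlmUnlearningRun,
)

# Register the base model with the Hirundo platform.
llm = LlmModel(
    model_name="Nemotron-Flash-1B",
    model_source=HuggingFaceTransformersModel(
        model_name="nvidia/Nemotron-Flash-1B",
    ),
)
llm_id = llm.create()

# Launch a bias-unlearning run and fetch the result as a transformers pipeline.
run_id = LlmUnlearningRun.launch(llm_id, BiasRunInfo(bias_type=BiasType.ALL))
pipe = llm.get_hf_pipeline_for_run(run_id)
print(pipe("Hello!", max_new_tokens=32))
```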
@@ -182,7 +214,7 @@ results = test_dataset.check_run()
 print(results)
 ```
 
-Object detection example:
+#### Object detection example:
 
 ```python
 from hirundo import (
@@ -223,7 +255,7 @@ results = test_dataset.check_run()
 print(results)
 ```
 
-Note: Currently we only support the main CPython release 3.9, 3.10, 3.11, 3.12 & 3.13. PyPy support may be introduced in the future.
+Note: Currently we only support the main CPython release 3.10, 3.11, 3.12 & 3.13. PyPy support may be introduced in the future.
 
 ## Further documentation
 
@@ -67,7 +67,31 @@ You can install the codebase with a simple `pip install hirundo` to install the
 
 ## Usage
 
-Classification example:
+### Unlearning LLM behavior
+
+Make sure to install the `transformers` extra, i.e. `pip install hirundo[transformers]` or `uv pip install hirundo[transformers]` if you have `uv` installed which is much faster than `pip`.
+
+```python
+llm = LlmModel(
+    model_name="Nemotron-Flash-1B",
+    model_source=HuggingFaceTransformersModel(
+        model_name="nvidia/Nemotron-Flash-1B",
+    ),
+)
+llm_id = llm.create()
+run_info = BiasRunInfo(
+    bias_type=BiasType.ALL,
+)
+run_id = LlmUnlearningRun.launch(
+    llm_id,
+    run_info,
+)
+new_adapter = llm.get_hf_pipeline_for_run(run_id)
+```
+
+### Dataset QA
+
+#### Classification example:
 
 ```python
 from hirundo import (
@@ -104,7 +128,7 @@ results = test_dataset.check_run()
 print(results)
 ```
 
-Object detection example:
+#### Object detection example:
 
 ```python
 from hirundo import (
@@ -145,7 +169,7 @@ results = test_dataset.check_run()
 print(results)
 ```
 
-Note: Currently we only support the main CPython release 3.9, 3.10, 3.11, 3.12 & 3.13. PyPy support may be introduced in the future.
+Note: Currently we only support the main CPython release 3.10, 3.11, 3.12 & 3.13. PyPy support may be introduced in the future.
 
 ## Further documentation
 
@@ -5,8 +5,8 @@ from .dataset_enum import (
 )
 from .dataset_qa import (
     ClassificationRunArgs,
-    Domain,
     HirundoError,
+    ModalityType,
     ObjectDetectionRunArgs,
     QADataset,
     RunArgs,
@@ -30,6 +30,15 @@ from .storage import (
     StorageGit,
     StorageS3,
 )
+from .unlearning_llm import (
+    BiasRunInfo,
+    BiasType,
+    HuggingFaceTransformersModel,
+    LlmModel,
+    LlmSources,
+    LlmUnlearningRun,
+    LocalTransformersModel,
+)
 from .unzip import load_df, load_from_zip
 
 __all__ = [
@@ -43,7 +52,7 @@ __all__ = [
     "KeylabsObjSegImages",
     "KeylabsObjSegVideo",
     "QADataset",
-    "Domain",
+    "ModalityType",
     "RunArgs",
     "ClassificationRunArgs",
     "ObjectDetectionRunArgs",
@@ -59,8 +68,15 @@ __all__ = [
     "StorageGit",
     "StorageConfig",
     "DatasetQAResults",
+    "BiasRunInfo",
+    "BiasType",
+    "HuggingFaceTransformersModel",
+    "LlmModel",
+    "LlmSources",
+    "LlmUnlearningRun",
+    "LocalTransformersModel",
     "load_df",
     "load_from_zip",
 ]
 
-__version__ = "0.1.21"
+__version__ = "0.2.3.post1"
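One consequence of these `__init__.py` hunks for existing users: `Domain` is no longer exported from the top-level package, and `ModalityType` is exported in its place. A migration sketch, assuming `ModalityType` is the direct replacement for the old `Domain` enum as the paired removal and addition suggest:

```python
# hirundo 0.1.x (no longer importable in 0.2.x):
# from hirundo import Domain

# hirundo 0.2.x:
from hirundo import ModalityType
```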
@@ -1,5 +1,4 @@
 import re
-import typing
 from typing import TYPE_CHECKING
 
 from hirundo._urls import (
@@ -135,8 +134,8 @@ def validate_labeling_type(
 
 def validate_labeling_info(
     labeling_type: "LabelingType",
-    labeling_info: "typing.Union[LabelingInfo, list[LabelingInfo]]",
-    storage_config: "typing.Union[StorageConfig, ResponseStorageConfig]",
+    labeling_info: "LabelingInfo | list[LabelingInfo]",
+    storage_config: "StorageConfig | ResponseStorageConfig",
 ) -> None:
     """
     Validate the labeling info for a dataset
@@ -1,6 +1,5 @@
 import asyncio
 import time
-import typing
 import uuid
 from collections.abc import AsyncGenerator, Generator
 
@@ -15,13 +14,15 @@ from hirundo.logger import get_logger
 
 logger = get_logger(__name__)
 
+MAX_RETRIES = 50
+
 
 # Credit: https://github.com/florimondmanca/httpx-sse/blob/master/README.md#handling-reconnections
 def iter_sse_retrying(
     client: httpx.Client,
     method: str,
     url: str,
-    headers: typing.Optional[dict[str, str]] = None,
+    headers: dict[str, str] | None = None,
 ) -> Generator[ServerSentEvent, None, None]:
     if headers is None:
         headers = {}
@@ -41,7 +42,8 @@ def iter_sse_retrying(
             httpx.ReadError,
             httpx.RemoteProtocolError,
             urllib3.exceptions.ReadTimeoutError,
-        )
+        ),
+        attempts=MAX_RETRIES,
     )
     def _iter_sse():
         nonlocal last_event_id, reconnection_delay
@@ -105,7 +107,8 @@ async def aiter_sse_retrying(
             httpx.ReadError,
             httpx.RemoteProtocolError,
             urllib3.exceptions.ReadTimeoutError,
-        )
+        ),
+        attempts=MAX_RETRIES,
     )
     async def _iter_sse() -> AsyncGenerator[ServerSentEvent, None]:
         nonlocal last_event_id, reconnection_delay
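Both retry hunks make the reconnect budget explicit: the `stamina` retry decorator around the SSE loop now passes `attempts=MAX_RETRIES` (50) rather than relying on the library default. A standalone sketch of the same pattern, using a placeholder function rather than the package's `_iter_sse`:

```python
import httpx
import stamina

MAX_RETRIES = 50

# Retry transient transport errors, giving up after MAX_RETRIES attempts.
@stamina.retry(
    on=(httpx.ReadError, httpx.RemoteProtocolError),
    attempts=MAX_RETRIES,
)
def fetch(client: httpx.Client, url: str) -> str:
    response = client.get(url)
    response.raise_for_status()
    return response.text
```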
@@ -0,0 +1,153 @@
+import importlib.util
+import tempfile
+import zipfile
+from pathlib import Path
+from typing import TYPE_CHECKING, cast
+
+from hirundo import HirundoError
+from hirundo._http import requests
+from hirundo._timeouts import DOWNLOAD_READ_TIMEOUT
+from hirundo.logger import get_logger
+
+if TYPE_CHECKING:
+    from torch import device as torch_device
+    from transformers.configuration_utils import PretrainedConfig
+    from transformers.modeling_utils import PreTrainedModel
+    from transformers.pipelines.base import Pipeline
+
+    from hirundo.unlearning_llm import LlmModel, LlmModelOut
+
+logger = get_logger(__name__)
+
+
+ZIP_FILE_CHUNK_SIZE = 50 * 1024 * 1024  # 50 MB
+REQUIRED_PACKAGES_FOR_PIPELINE = ["peft", "transformers", "accelerate"]
+
+
+def get_hf_pipeline_for_run_given_model(
+    llm: "LlmModel | LlmModelOut",
+    run_id: str,
+    config: "PretrainedConfig | None" = None,
+    device: "str | int | torch_device | None" = None,
+    device_map: str | dict[str, int | str] | None = None,
+    trust_remote_code: bool = False,
+    token: str | None = None,
+) -> "Pipeline":
+    for package in REQUIRED_PACKAGES_FOR_PIPELINE:
+        if importlib.util.find_spec(package) is None:
+            raise HirundoError(
+                f'{package} is not installed. Please install transformers extra with pip install "hirundo[transformers]"'
+            )
+    from peft import PeftModel
+    from transformers.models.auto.configuration_auto import AutoConfig
+    from transformers.models.auto.modeling_auto import (
+        MODEL_FOR_IMAGE_TEXT_TO_TEXT_MAPPING_NAMES,
+        AutoModelForCausalLM,
+        AutoModelForImageTextToText,
+    )
+    from transformers.models.auto.tokenization_auto import AutoTokenizer
+    from transformers.pipelines import pipeline
+
+    from hirundo.unlearning_llm import (
+        HuggingFaceTransformersModel,
+        HuggingFaceTransformersModelOutput,
+        LlmUnlearningRun,
+    )
+
+    run_results = LlmUnlearningRun.check_run_by_id(run_id)
+    if run_results is None:
+        raise HirundoError("No run results found")
+    result_payload = (
+        run_results.get("result", run_results)
+        if isinstance(run_results, dict)
+        else run_results
+    )
+    if isinstance(result_payload, dict):
+        result_url = result_payload.get("result")
+    else:
+        result_url = result_payload
+    if not isinstance(result_url, str):
+        raise HirundoError("Run results did not include a download URL")
+    # Stream the zip file download
+
+    zip_file_path = tempfile.NamedTemporaryFile(delete=False).name
+    with requests.get(
+        result_url,
+        timeout=DOWNLOAD_READ_TIMEOUT,
+        stream=True,
+    ) as r:
+        r.raise_for_status()
+        with open(zip_file_path, "wb") as zip_file:
+            for chunk in r.iter_content(chunk_size=ZIP_FILE_CHUNK_SIZE):
+                zip_file.write(chunk)
+    logger.info(
+        "Successfully downloaded the result zip file for run ID %s to %s",
+        run_id,
+        zip_file_path,
+    )
+
+    with tempfile.TemporaryDirectory() as temp_dir:
+        temp_dir_path = Path(temp_dir)
+        with zipfile.ZipFile(zip_file_path, "r") as zip_file:
+            zip_file.extractall(temp_dir_path)
+        # Attempt to load the tokenizer normally
+        base_model_name = (
+            llm.model_source.model_name
+            if isinstance(
+                llm.model_source,
+                HuggingFaceTransformersModel | HuggingFaceTransformersModelOutput,
+            )
+            else llm.model_source.local_path
+        )
+        token = (
+            llm.model_source.token
+            if isinstance(
+                llm.model_source,
+                HuggingFaceTransformersModel,
+            )
+            else token
+        )
+        tokenizer = AutoTokenizer.from_pretrained(
+            base_model_name,
+            token=token,
+            trust_remote_code=trust_remote_code,
+        )
+        if tokenizer.pad_token is None:
+            tokenizer.pad_token = tokenizer.eos_token
+        config = AutoConfig.from_pretrained(
+            base_model_name,
+            token=token,
+            trust_remote_code=trust_remote_code,
+        )
+        config_dict = config.to_dict() if hasattr(config, "to_dict") else config
+        is_multimodal = (
+            config_dict.get("model_type")
+            in MODEL_FOR_IMAGE_TEXT_TO_TEXT_MAPPING_NAMES.keys()
+        )
+        if is_multimodal:
+            base_model = AutoModelForImageTextToText.from_pretrained(
+                base_model_name,
+                token=token,
+                trust_remote_code=trust_remote_code,
+            )
+        else:
+            base_model = AutoModelForCausalLM.from_pretrained(
+                base_model_name,
+                token=token,
+                trust_remote_code=trust_remote_code,
+            )
+        model = cast(
+            "PreTrainedModel",
+            PeftModel.from_pretrained(
+                base_model, str(temp_dir_path / "unlearned_model_folder")
+            ),
+        )
+
+    return pipeline(
+        task="text-generation",
+        model=model,
+        tokenizer=tokenizer,
+        config=config,
+        device=device,
+        device_map=device_map,
+    )
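The new `_llm_pipeline.py` helper above streams the run's result archive, applies the PEFT adapter from `unlearned_model_folder` onto the base model, and returns a standard `transformers` text-generation pipeline. Consuming it is ordinary pipeline usage; a hedged sketch follows (the prompt and generation arguments are placeholders, not hirundo-specific API):

```python
# `pipe` is the Pipeline returned by llm.get_hf_pipeline_for_run(run_id),
# as in the README example; the prompt and kwargs below are illustrative.
outputs = pipe("Explain the topic this model was asked to unlearn.", max_new_tokens=64)
print(outputs[0]["generated_text"])
```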