PyPI - strands-diffusers - Versions diffs - 0.1.0__tar.gz → 0.3.0__tar.gz - Mend

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (115)

{strands_diffusers-0.1.0 → strands_diffusers-0.3.0}/.github/workflows/auto-release.yml RENAMED

@@ -28,7 +28,7 @@ jobs:
     - name: Install build tooling
       run: |
         python -m pip install --upgrade pip
-        pip install build twine
+        pip install build twine packaging
     - name: Extract version from tag
       id: get_version
@@ -40,14 +40,25 @@ jobs:
     - name: Build package (version derived from git tag via setuptools-scm)
       run: python -m build
-    - name: Verify built version matches tag
+    - name: Verify built version matches tag (PEP440-normalized)
       run: |
         ls -l dist/
-        if ! ls dist/ | grep -q "${{ steps.get_version.outputs.version }}"; then
-          echo "::error::Built artifact does not match tag version ${{ steps.get_version.outputs.version }}"
-          ls dist/
-          exit 1
-        fi
+        python - "${{ steps.get_version.outputs.version }}" <<'EOF'
+        import sys, glob, os
+        from packaging.utils import parse_wheel_filename, parse_sdist_filename
+        from packaging.version import Version
+        tag = Version(sys.argv[1])                       # normalize the tag (v1.01 -> 1.1)
+        built = set()
+        for w in glob.glob("dist/*.whl"):
+            built.add(parse_wheel_filename(os.path.basename(w))[1])
+        for s in glob.glob("dist/*.tar.gz"):
+            built.add(parse_sdist_filename(os.path.basename(s))[1])
+        print("tag:", tag, "built:", sorted(map(str, built)))
+        if tag not in built:
+            print(f"::error::Built version(s) {sorted(map(str,built))} != tag {tag}")
+            sys.exit(1)
+        print(f"✅ built version matches tag {tag}")
+        EOF
     - name: Publish to PyPI
       env:
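
The hunk above replaces a `grep` on the `dist/` listing with a parsed, normalized version comparison. A substring match can both over-accept and under-accept, which the stdlib-only sketch below illustrates. `release_tuple` and `sdist_version` are hypothetical helpers standing in for `packaging.version.Version` and `parse_sdist_filename`; they handle only plain dotted-integer releases (no pre-, post-, or dev-release segments), so treat this as an illustration rather than a replacement for the workflow's check.

```python
def substring_check(tag: str, filenames: list[str]) -> bool:
    """Old-style check: pass if any filename contains the raw tag text."""
    return any(tag in name for name in filenames)


def release_tuple(version: str) -> tuple[int, ...]:
    """Simplified stand-in for packaging.version.Version (release segment only)."""
    parts = [int(p) for p in version.lower().lstrip("v").split(".")]
    # Trailing zeros are not significant when comparing releases (1.0 == 1.0.0).
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def sdist_version(filename: str) -> str:
    """Extract the version from a 'name-version.tar.gz' sdist filename."""
    stem = filename[: -len(".tar.gz")]
    return stem.rsplit("-", 1)[1]


def normalized_check(tag: str, filenames: list[str]) -> bool:
    """New-style check: compare parsed versions instead of raw text."""
    built = {release_tuple(sdist_version(name)) for name in filenames}
    return release_tuple(tag) in built


# False positive of the substring check: tag 1.0 "matches" a 1.0.1 artifact.
print(substring_check("1.0", ["pkg-1.0.1.tar.gz"]))   # True
print(normalized_check("1.0", ["pkg-1.0.1.tar.gz"]))  # False

# False negative of the substring check: tag 1.01 is built as 1.1.
print(substring_check("1.01", ["pkg-1.1.tar.gz"]))    # False
print(normalized_check("1.01", ["pkg-1.1.tar.gz"]))   # True
```

The real workflow step also inspects wheel filenames and relies on `packaging` for full PEP 440 handling, which this sketch does not attempt.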

strands_diffusers-0.3.0/.github/workflows/docs.yml ADDED

@@ -0,0 +1,52 @@
+name: Docs
+on:
+  push:
+    branches: [main]
+    paths:
+      - 'docs/**'
+      - 'mkdocs.yml'
+      - '.github/workflows/docs.yml'
+  workflow_dispatch:
+permissions:
+  contents: read
+  pages: write
+  id-token: write
+# Allow one concurrent deployment; don't cancel an in-progress run.
+concurrency:
+  group: pages
+  cancel-in-progress: false
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+      - name: Install docs deps
+        run: |
+          python -m pip install --upgrade pip
+          pip install "mkdocs==1.6.1" "mkdocs-material==9.7.6"
+      - name: Build (strict)
+        run: mkdocs build --strict
+      - name: Upload Pages artifact
+        uses: actions/upload-pages-artifact@v3
+        with:
+          path: site
+  deploy:
+    needs: build
+    runs-on: ubuntu-latest
+    environment:
+      name: github-pages
+      url: ${{ steps.deployment.outputs.page_url }}
+    steps:
+      - name: Deploy to GitHub Pages
+        id: deployment
+        uses: actions/deploy-pages@v4

strands_diffusers-0.3.0/.gitignore ADDED

@@ -0,0 +1,26 @@
+__pycache__/
+*.pyc
+*.egg-info/
+build/
+dist/
+.venv/
+.pytest_cache/
+.ruff_cache/
+.coverage
+*.bak
+*.json.tmp
+strands_diffusers/_version.py
+system_prompt.prompt
+site/
+# Generated media is ignored everywhere EXCEPT the committed docs gallery.
+*.mp4
+*.png
+*.jpg
+*.wav
+*.gif
+/assets/
+# Docs gallery assets are real, committed outputs — always track them.
+!docs/assets/
+!docs/assets/**

strands_diffusers-0.3.0/PKG-INFO ADDED

@@ -0,0 +1,238 @@
+Metadata-Version: 2.4
+Name: strands-diffusers
+Version: 0.3.0
+Summary: The universal entrypoint to HuggingFace diffusers for Strands agents — 100% pipeline & modality coverage, zero hardcoding. Special focus on Physical-AI world-foundation models (Cosmos) with robot action outputs.
+Author-email: Cagatay Cali <cagataycali@icloud.com>
+License: MIT
+Project-URL: Homepage, https://github.com/cagataycali/strands-diffusers
+Project-URL: Repository, https://github.com/cagataycali/strands-diffusers
+Project-URL: Issues, https://github.com/cagataycali/strands-diffusers/issues
+Keywords: strands,diffusers,huggingface,ai,agents,diffusion,video,image,vla,wfm,world-foundation-model,cosmos,robotics,physical-ai
+Classifier: Development Status :: 4 - Beta
+Classifier: Intended Audience :: Developers
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+Requires-Python: >=3.10
+Description-Content-Type: text/markdown
+Requires-Dist: strands-agents
+Requires-Dist: diffusers>=0.30
+Requires-Dist: transformers>=4.40
+Requires-Dist: torch
+Requires-Dist: pillow
+Requires-Dist: numpy
+Requires-Dist: accelerate
+Requires-Dist: matplotlib
+Provides-Extra: video
+Requires-Dist: imageio[ffmpeg]; extra == "video"
+Requires-Dist: opencv-python; extra == "video"
+Requires-Dist: av; extra == "video"
+Provides-Extra: audio
+Requires-Dist: soundfile; extra == "audio"
+Requires-Dist: librosa; extra == "audio"
+Provides-Extra: cosmos
+Requires-Dist: cosmos_guardrail; extra == "cosmos"
+Provides-Extra: dev
+Requires-Dist: pytest>=7.0; extra == "dev"
+Requires-Dist: black; extra == "dev"
+Requires-Dist: ruff; extra == "dev"
+Provides-Extra: all
+Requires-Dist: strands-diffusers[audio,dev,video]; extra == "all"
+# strands-diffusers
+<p align="center">
+  <img src="docs/assets/anim/banner.svg" alt="strands-diffusers — one tool, 300+ diffusion pipelines, every modality" width="100%"/>
+</p>
+**The universal entrypoint to HuggingFace `diffusers` for Strands agents.**
+One tool — `use_diffusers` — wraps the whole library with zero hardcoding:
+discover and run any of its 300+ pipelines across every modality. It's a *visual*
+library, so here's what it actually produces — every asset below is **real
+model output**, not a placeholder:
+<table>
+  <tr>
+    <td align="center" width="25%">
+      <b>text → image</b><br/>
+      <img src="docs/assets/text_to_image.png" width="200"/><br/>
+      <sub>any of 108 image pipelines</sub>
+    </td>
+    <td align="center" width="25%">
+      <b>text → video</b><br/>
+      <img src="docs/assets/text_to_video.gif" width="200"/><br/>
+      <sub>LTX · Wan · CogVideoX · Hunyuan</sub>
+    </td>
+    <td align="center" width="25%">
+      <b>robot actions</b> 🤖<br/>
+      <img src="docs/assets/cosmos_world.gif" width="200"/><br/>
+      <sub>Cosmos WFM: world video + actions</sub>
+    </td>
+    <td align="center" width="25%">
+      <b>text → audio</b><br/>
+      <img src="docs/assets/text_to_audio.png" width="200"/><br/>
+      <sub>StableAudio · AudioLDM2</sub>
+    </td>
+  </tr>
+</table>
+```
+text / image / video / robot-state  IN
+image / video / audio / actions / 3d  OUT
+```
+The registry is built at runtime from `diffusers._import_structure`, so new
+pipelines are supported automatically with no code change. Same philosophy as
+`use_aws`, `use_lerobot`, and `use_transformers`: **discover, don't hardcode.**
+<table>
+  <tr>
+    <td align="center" width="50%">
+      <b>3D mesh</b><br/>
+      <img src="docs/assets/mesh_render.png" width="200"/><br/>
+      <sub>ShapE - verts/faces to .ply</sub>
+    </td>
+    <td align="center" width="50%">
+      <b>audio</b> (<a href="docs/assets/text_to_audio.wav">hear the .wav</a>)<br/>
+      <img src="docs/assets/text_to_audio.png" width="300"/><br/>
+      <sub>StableAudio - waveform to .wav</sub>
+    </td>
+  </tr>
+</table>
+## 100% coverage, zero hardcoding
+<p align="center">
+  <img src="docs/assets/modality_coverage.png" width="640"/>
+</p>
+Every pipeline, model, and scheduler diffusers ships is reachable through one
+tool. When diffusers adds a new pipeline, `use_diffusers` exposes it immediately.
+## Physical-AI: world-foundation models with action outputs
+<p align="center">
+  <img src="docs/assets/cosmos_world.gif" width="360" alt="Cosmos world rollout"/>
+</p>
+<table>
+  <tr>
+    <td align="center"><img src="docs/assets/rollout_policy_1.gif" width="220"/><br/><sub>"Put the pot to the left of the purple item."</sub></td>
+    <td align="center"><img src="docs/assets/rollout_policy_2.gif" width="220"/><br/><sub>"Pick up the cloth and place it in the bowl."</sub></td>
+    <td align="center"><img src="docs/assets/rollout_policy_4.gif" width="220"/><br/><sub>"Open the drawer and place the spoon inside."</sub></td>
+  </tr>
+</table>
+Same robot, same first observation — **different task prompt → different imagined
+world and different predicted actions.** Five real rollouts + all three Cosmos
+action modes in the [WFM gallery](https://cagataycali.github.io/strands-diffusers/wfm/).
+This is the headline. A Cosmos action-policy rollout predicts both a future world
+**video** and the **robot action chunk** that produces it. One
+`use_diffusers(action="run", ...)` returns a `.mp4` world video, a `.json` action
+chunk (normalized `[-1, 1]`, shape `[num_chunks, T, action_dim]`), and optional
+`.wav` sound — and you can *see* the motion:
+<table>
+  <tr>
+    <td align="center"><b>time-series</b> (every dim, gripper highlighted)<br/><img src="docs/assets/cosmos_action_timeseries.png" width="380"/></td>
+    <td align="center"><b>end-effector path</b> (dims 0–2)<br/><img src="docs/assets/cosmos_action_trajectory.png" width="300"/></td>
+  </tr>
+</table>
+Verified end-to-end on NVIDIA Thor (`nvidia/Cosmos3-Nano`, bf16/cuda): one call
+produced a world video `(17, 480, 640, 3)` and an action chunk `(1, 16, 10)`. See
+[`examples/cosmos_action_policy.py`](examples/cosmos_action_policy.py).
+## Install
+```bash
+pip install -e .
+pip install -e ".[video,audio]"   # mp4 export, wav I/O
+```
+## Quick start
+```python
+from strands import Agent
+from strands_diffusers import use_diffusers
+agent = Agent(tools=[use_diffusers])
+agent("Generate an image of a robot arm in a kitchen")
+agent("Run a Cosmos action-policy rollout on robot.mp4 and give me the actions")
+```
+Direct:
+```python
+use_diffusers(action="run", pipeline="StableDiffusionPipeline",
+              model="stabilityai/stable-diffusion-2-1",
+              parameters={"prompt": "a robot arm in a kitchen"})
+# -> {"artifacts": ["/tmp/strands_diffusers/image_*.png"]}
+```
+## Two layers
+`run` loads a pipeline via `from_pretrained` and calls it; inputs are coerced
+(path / URL / base64 to PIL / video), outputs auto-saved and returned by path.
+`call` resolves and calls any diffusers class, function, or method (schedulers,
+VAEs, `CosmosActionCondition`, utils). `cached:key` references resolve to live
+objects; `"**"` unpacks a cached mapping into kwargs.
+```python
+use_diffusers(action="call", target="CosmosActionCondition",
+              parameters={"mode": "policy", "video": "robot.mp4"}, cache_key="cond")
+use_diffusers(action="run", pipeline="Cosmos3OmniPipeline", model="nvidia/Cosmos3-Nano",
+              parameters={"prompt": "...", "action": "cached:cond"},
+              dtype="bfloat16", device="cuda")
+```
+## Discovery
+| action | returns |
+|---|---|
+| `pipelines` / `models` / `schedulers` | classes + derived modality |
+| `tasks` / `modalities` / `wfm` | task maps / modality groups / world-foundation models |
+| `pipeline_info` / `inspect` | signature + docs |
+| `visualize` | action chunk to plots + animation |
+| `cache` / `clear_cache` | manage loaded pipelines |
+## Architecture
+```
+core/registry.py  zero-hardcode taxonomy from diffusers._import_structure
+core/engine.py    load/cache pipelines, auto device+dtype
+core/io.py        coerce inputs; serialize video/image/audio/action/mesh
+core/viz.py       render robot action chunks to plots + animation
+tools/use_diffusers.py  the single @tool: run + call + discovery
+```
+## Testing
+```bash
+pip install -e ".[video,audio,dev]"
+pytest tests/ -q          # unit tests, no GPU, no downloads
+python examples/smoke.py  # E2E gate on tiny fixtures
+```
+Every visual in this README and the [docs](https://cagataycali.github.io/strands-diffusers/)
+is produced by real `use_diffusers` calls — regenerate them with:
+```bash
+python examples/generate_docs_assets.py
+```
+## Docs
+📖 **[cagataycali.github.io/strands-diffusers](https://cagataycali.github.io/strands-diffusers/)**
+— quickstart, full gallery (images / video / audio / actions / 3D), the
+world-foundation-model story, discovery, and the two-layer design.
+MIT

strands_diffusers-0.3.0/README.md ADDED

@@ -0,0 +1,193 @@
+# strands-diffusers
+<p align="center">
+  <img src="docs/assets/anim/banner.svg" alt="strands-diffusers — one tool, 300+ diffusion pipelines, every modality" width="100%"/>
+</p>
+**The universal entrypoint to HuggingFace `diffusers` for Strands agents.**
+One tool — `use_diffusers` — wraps the whole library with zero hardcoding:
+discover and run any of its 300+ pipelines across every modality. It's a *visual*
+library, so here's what it actually produces — every asset below is **real
+model output**, not a placeholder:
+<table>
+  <tr>
+    <td align="center" width="25%">
+      <b>text → image</b><br/>
+      <img src="docs/assets/text_to_image.png" width="200"/><br/>
+      <sub>any of 108 image pipelines</sub>
+    </td>
+    <td align="center" width="25%">
+      <b>text → video</b><br/>
+      <img src="docs/assets/text_to_video.gif" width="200"/><br/>
+      <sub>LTX · Wan · CogVideoX · Hunyuan</sub>
+    </td>
+    <td align="center" width="25%">
+      <b>robot actions</b> 🤖<br/>
+      <img src="docs/assets/cosmos_world.gif" width="200"/><br/>
+      <sub>Cosmos WFM: world video + actions</sub>
+    </td>
+    <td align="center" width="25%">
+      <b>text → audio</b><br/>
+      <img src="docs/assets/text_to_audio.png" width="200"/><br/>
+      <sub>StableAudio · AudioLDM2</sub>
+    </td>
+  </tr>
+</table>
+```
+text / image / video / robot-state  IN
+image / video / audio / actions / 3d  OUT
+```
+The registry is built at runtime from `diffusers._import_structure`, so new
+pipelines are supported automatically with no code change. Same philosophy as
+`use_aws`, `use_lerobot`, and `use_transformers`: **discover, don't hardcode.**
+<table>
+  <tr>
+    <td align="center" width="50%">
+      <b>3D mesh</b><br/>
+      <img src="docs/assets/mesh_render.png" width="200"/><br/>
+      <sub>ShapE - verts/faces to .ply</sub>
+    </td>
+    <td align="center" width="50%">
+      <b>audio</b> (<a href="docs/assets/text_to_audio.wav">hear the .wav</a>)<br/>
+      <img src="docs/assets/text_to_audio.png" width="300"/><br/>
+      <sub>StableAudio - waveform to .wav</sub>
+    </td>
+  </tr>
+</table>
+## 100% coverage, zero hardcoding
+<p align="center">
+  <img src="docs/assets/modality_coverage.png" width="640"/>
+</p>
+Every pipeline, model, and scheduler diffusers ships is reachable through one
+tool. When diffusers adds a new pipeline, `use_diffusers` exposes it immediately.
+## Physical-AI: world-foundation models with action outputs
+<p align="center">
+  <img src="docs/assets/cosmos_world.gif" width="360" alt="Cosmos world rollout"/>
+</p>
+<table>
+  <tr>
+    <td align="center"><img src="docs/assets/rollout_policy_1.gif" width="220"/><br/><sub>"Put the pot to the left of the purple item."</sub></td>
+    <td align="center"><img src="docs/assets/rollout_policy_2.gif" width="220"/><br/><sub>"Pick up the cloth and place it in the bowl."</sub></td>
+    <td align="center"><img src="docs/assets/rollout_policy_4.gif" width="220"/><br/><sub>"Open the drawer and place the spoon inside."</sub></td>
+  </tr>
+</table>
+Same robot, same first observation — **different task prompt → different imagined
+world and different predicted actions.** Five real rollouts + all three Cosmos
+action modes in the [WFM gallery](https://cagataycali.github.io/strands-diffusers/wfm/).
+This is the headline. A Cosmos action-policy rollout predicts both a future world
+**video** and the **robot action chunk** that produces it. One
+`use_diffusers(action="run", ...)` returns a `.mp4` world video, a `.json` action
+chunk (normalized `[-1, 1]`, shape `[num_chunks, T, action_dim]`), and optional
+`.wav` sound — and you can *see* the motion:
+<table>
+  <tr>
+    <td align="center"><b>time-series</b> (every dim, gripper highlighted)<br/><img src="docs/assets/cosmos_action_timeseries.png" width="380"/></td>
+    <td align="center"><b>end-effector path</b> (dims 0–2)<br/><img src="docs/assets/cosmos_action_trajectory.png" width="300"/></td>
+  </tr>
+</table>
+Verified end-to-end on NVIDIA Thor (`nvidia/Cosmos3-Nano`, bf16/cuda): one call
+produced a world video `(17, 480, 640, 3)` and an action chunk `(1, 16, 10)`. See
+[`examples/cosmos_action_policy.py`](examples/cosmos_action_policy.py).
+## Install
+```bash
+pip install -e .
+pip install -e ".[video,audio]"   # mp4 export, wav I/O
+```
+## Quick start
+```python
+from strands import Agent
+from strands_diffusers import use_diffusers
+agent = Agent(tools=[use_diffusers])
+agent("Generate an image of a robot arm in a kitchen")
+agent("Run a Cosmos action-policy rollout on robot.mp4 and give me the actions")
+```
+Direct:
+```python
+use_diffusers(action="run", pipeline="StableDiffusionPipeline",
+              model="stabilityai/stable-diffusion-2-1",
+              parameters={"prompt": "a robot arm in a kitchen"})
+# -> {"artifacts": ["/tmp/strands_diffusers/image_*.png"]}
+```
+## Two layers
+`run` loads a pipeline via `from_pretrained` and calls it; inputs are coerced
+(path / URL / base64 to PIL / video), outputs auto-saved and returned by path.
+`call` resolves and calls any diffusers class, function, or method (schedulers,
+VAEs, `CosmosActionCondition`, utils). `cached:key` references resolve to live
+objects; `"**"` unpacks a cached mapping into kwargs.
+```python
+use_diffusers(action="call", target="CosmosActionCondition",
+              parameters={"mode": "policy", "video": "robot.mp4"}, cache_key="cond")
+use_diffusers(action="run", pipeline="Cosmos3OmniPipeline", model="nvidia/Cosmos3-Nano",
+              parameters={"prompt": "...", "action": "cached:cond"},
+              dtype="bfloat16", device="cuda")
+```
+## Discovery
+| action | returns |
+|---|---|
+| `pipelines` / `models` / `schedulers` | classes + derived modality |
+| `tasks` / `modalities` / `wfm` | task maps / modality groups / world-foundation models |
+| `pipeline_info` / `inspect` | signature + docs |
+| `visualize` | action chunk to plots + animation |
+| `cache` / `clear_cache` | manage loaded pipelines |
+## Architecture
+```
+core/registry.py  zero-hardcode taxonomy from diffusers._import_structure
+core/engine.py    load/cache pipelines, auto device+dtype
+core/io.py        coerce inputs; serialize video/image/audio/action/mesh
+core/viz.py       render robot action chunks to plots + animation
+tools/use_diffusers.py  the single @tool: run + call + discovery
+```
+## Testing
+```bash
+pip install -e ".[video,audio,dev]"
+pytest tests/ -q          # unit tests, no GPU, no downloads
+python examples/smoke.py  # E2E gate on tiny fixtures
+```
+Every visual in this README and the [docs](https://cagataycali.github.io/strands-diffusers/)
+is produced by real `use_diffusers` calls — regenerate them with:
+```bash
+python examples/generate_docs_assets.py
+```
+## Docs
+📖 **[cagataycali.github.io/strands-diffusers](https://cagataycali.github.io/strands-diffusers/)**
+— quickstart, full gallery (images / video / audio / actions / 3D), the
+world-foundation-model story, discovery, and the two-layer design.
+MIT
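
The README added above describes the Cosmos action chunk as normalized to `[-1, 1]` with shape `[num_chunks, T, action_dim]`. A consumer might sanity-check a chunk before acting on it, roughly as sketched below. This assumes the chunk is already available as a plain nested list (for example after `json.load`); the actual layout of the `.json` artifact is not shown in this diff, and `chunk_shape` and `validate_chunk` are hypothetical names, not part of `strands-diffusers`.

```python
def chunk_shape(chunk):
    """Return (num_chunks, T, action_dim); raise ValueError if empty or ragged."""
    if not chunk or not chunk[0] or not chunk[0][0]:
        raise ValueError("empty action chunk")
    num_chunks, steps, dim = len(chunk), len(chunk[0]), len(chunk[0][0])
    for c in chunk:
        if len(c) != steps or any(len(step) != dim for step in c):
            raise ValueError("ragged action chunk")
    return num_chunks, steps, dim


def validate_chunk(chunk, lo=-1.0, hi=1.0):
    """Check the nesting is rectangular and every value lies in [lo, hi]."""
    shape = chunk_shape(chunk)
    for c in chunk:
        for step in c:
            for value in step:
                if not lo <= value <= hi:
                    raise ValueError(f"action value {value} outside [{lo}, {hi}]")
    return shape


# Synthetic placeholder data with the shape the README reports, (1, 16, 10).
demo = [[[0.0] * 10 for _ in range(16)]]
print(validate_chunk(demo))  # (1, 16, 10)
```

The demo values are zeros chosen only to exercise the shape check; they are not model output.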
