strands-diffusers 0.1.0__tar.gz → 0.3.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (115)
  1. {strands_diffusers-0.1.0 → strands_diffusers-0.3.0}/.github/workflows/auto-release.yml +18 -7
  2. strands_diffusers-0.3.0/.github/workflows/docs.yml +52 -0
  3. strands_diffusers-0.3.0/.gitignore +26 -0
  4. strands_diffusers-0.3.0/PKG-INFO +238 -0
  5. strands_diffusers-0.3.0/README.md +193 -0
  6. strands_diffusers-0.3.0/docs/assets/anim/banner.svg +140 -0
  7. strands_diffusers-0.3.0/docs/assets/anim/denoise.svg +111 -0
  8. strands_diffusers-0.3.0/docs/assets/anim/discover.svg +56 -0
  9. strands_diffusers-0.3.0/docs/assets/anim/hub.svg +38 -0
  10. strands_diffusers-0.3.0/docs/assets/anim/m_audio.svg +34 -0
  11. strands_diffusers-0.3.0/docs/assets/anim/m_image.svg +132 -0
  12. strands_diffusers-0.3.0/docs/assets/anim/m_mesh.svg +34 -0
  13. strands_diffusers-0.3.0/docs/assets/anim/m_video.svg +68 -0
  14. strands_diffusers-0.3.0/docs/assets/anim/robot_modes.svg +78 -0
  15. strands_diffusers-0.3.0/docs/assets/anim/wfm.svg +82 -0
  16. strands_diffusers-0.3.0/docs/assets/audio_birds.png +0 -0
  17. strands_diffusers-0.3.0/docs/assets/audio_birds.wav +0 -0
  18. strands_diffusers-0.3.0/docs/assets/audio_dog.png +0 -0
  19. strands_diffusers-0.3.0/docs/assets/audio_dog.wav +0 -0
  20. strands_diffusers-0.3.0/docs/assets/audio_rain.png +0 -0
  21. strands_diffusers-0.3.0/docs/assets/audio_rain.wav +0 -0
  22. strands_diffusers-0.3.0/docs/assets/audio_typing.png +0 -0
  23. strands_diffusers-0.3.0/docs/assets/audio_typing.wav +0 -0
  24. strands_diffusers-0.3.0/docs/assets/cosmos_action_animation.gif +0 -0
  25. strands_diffusers-0.3.0/docs/assets/cosmos_action_animation.mp4 +0 -0
  26. strands_diffusers-0.3.0/docs/assets/cosmos_action_chunk.json +1 -0
  27. strands_diffusers-0.3.0/docs/assets/cosmos_action_timeseries.png +0 -0
  28. strands_diffusers-0.3.0/docs/assets/cosmos_action_trajectory.png +0 -0
  29. strands_diffusers-0.3.0/docs/assets/cosmos_world.gif +0 -0
  30. strands_diffusers-0.3.0/docs/assets/cosmos_world.mp4 +0 -0
  31. strands_diffusers-0.3.0/docs/assets/logo.svg +23 -0
  32. strands_diffusers-0.3.0/docs/assets/mesh_render.png +0 -0
  33. strands_diffusers-0.3.0/docs/assets/modality_coverage.png +0 -0
  34. strands_diffusers-0.3.0/docs/assets/rollout_id_av0.gif +0 -0
  35. strands_diffusers-0.3.0/docs/assets/rollout_id_av0.mp4 +0 -0
  36. strands_diffusers-0.3.0/docs/assets/rollout_id_av0_action.json +1 -0
  37. strands_diffusers-0.3.0/docs/assets/rollout_id_av0_input.gif +0 -0
  38. strands_diffusers-0.3.0/docs/assets/rollout_id_av1.gif +0 -0
  39. strands_diffusers-0.3.0/docs/assets/rollout_id_av1.mp4 +0 -0
  40. strands_diffusers-0.3.0/docs/assets/rollout_id_av1_action.json +1 -0
  41. strands_diffusers-0.3.0/docs/assets/rollout_id_av1_input.gif +0 -0
  42. strands_diffusers-0.3.0/docs/assets/rollout_policy_1.gif +0 -0
  43. strands_diffusers-0.3.0/docs/assets/rollout_policy_1.mp4 +0 -0
  44. strands_diffusers-0.3.0/docs/assets/rollout_policy_1_action.json +1 -0
  45. strands_diffusers-0.3.0/docs/assets/rollout_policy_2.gif +0 -0
  46. strands_diffusers-0.3.0/docs/assets/rollout_policy_2.mp4 +0 -0
  47. strands_diffusers-0.3.0/docs/assets/rollout_policy_2_action.json +1 -0
  48. strands_diffusers-0.3.0/docs/assets/rollout_policy_3.gif +0 -0
  49. strands_diffusers-0.3.0/docs/assets/rollout_policy_3.mp4 +0 -0
  50. strands_diffusers-0.3.0/docs/assets/rollout_policy_3_action.json +1 -0
  51. strands_diffusers-0.3.0/docs/assets/rollout_policy_4.gif +0 -0
  52. strands_diffusers-0.3.0/docs/assets/rollout_policy_4.mp4 +0 -0
  53. strands_diffusers-0.3.0/docs/assets/rollout_policy_4_action.json +1 -0
  54. strands_diffusers-0.3.0/docs/assets/rollout_policy_5.gif +0 -0
  55. strands_diffusers-0.3.0/docs/assets/rollout_policy_5.mp4 +0 -0
  56. strands_diffusers-0.3.0/docs/assets/rollout_policy_5_action.json +1 -0
  57. strands_diffusers-0.3.0/docs/assets/rollout_t2v.gif +0 -0
  58. strands_diffusers-0.3.0/docs/assets/rollout_t2v.mp4 +0 -0
  59. strands_diffusers-0.3.0/docs/assets/text_to_audio.png +0 -0
  60. strands_diffusers-0.3.0/docs/assets/text_to_audio.wav +0 -0
  61. strands_diffusers-0.3.0/docs/assets/text_to_image.png +0 -0
  62. strands_diffusers-0.3.0/docs/assets/text_to_video.gif +0 -0
  63. strands_diffusers-0.3.0/docs/assets/text_to_video.mp4 +0 -0
  64. strands_diffusers-0.3.0/docs/discovery.md +79 -0
  65. strands_diffusers-0.3.0/docs/gallery/actions.md +58 -0
  66. strands_diffusers-0.3.0/docs/gallery/audio.md +100 -0
  67. strands_diffusers-0.3.0/docs/gallery/images.md +63 -0
  68. strands_diffusers-0.3.0/docs/gallery/mesh.md +40 -0
  69. strands_diffusers-0.3.0/docs/gallery/video.md +47 -0
  70. strands_diffusers-0.3.0/docs/index.md +148 -0
  71. strands_diffusers-0.3.0/docs/js/glass.js +62 -0
  72. strands_diffusers-0.3.0/docs/layers.md +102 -0
  73. strands_diffusers-0.3.0/docs/quickstart.md +65 -0
  74. strands_diffusers-0.3.0/docs/stylesheets/extra.css +668 -0
  75. strands_diffusers-0.3.0/docs/wfm.md +237 -0
  76. {strands_diffusers-0.1.0 → strands_diffusers-0.3.0}/examples/README.md +5 -4
  77. strands_diffusers-0.3.0/examples/gen_animations.py +541 -0
  78. strands_diffusers-0.3.0/examples/generate_docs_assets.py +328 -0
  79. strands_diffusers-0.3.0/examples/generate_real_assets.py +162 -0
  80. strands_diffusers-0.3.0/examples/generate_wfm_rollouts.py +220 -0
  81. {strands_diffusers-0.1.0 → strands_diffusers-0.3.0}/examples/smoke.py +24 -0
  82. strands_diffusers-0.3.0/examples/text_to_audio.py +68 -0
  83. strands_diffusers-0.3.0/mkdocs.yml +86 -0
  84. {strands_diffusers-0.1.0 → strands_diffusers-0.3.0}/pyproject.toml +1 -0
  85. {strands_diffusers-0.1.0 → strands_diffusers-0.3.0}/requirements.txt +1 -0
  86. {strands_diffusers-0.1.0 → strands_diffusers-0.3.0}/strands_diffusers/_version.py +3 -3
  87. {strands_diffusers-0.1.0 → strands_diffusers-0.3.0}/strands_diffusers/core/engine.py +23 -1
  88. {strands_diffusers-0.1.0 → strands_diffusers-0.3.0}/strands_diffusers/core/io.py +44 -21
  89. {strands_diffusers-0.1.0 → strands_diffusers-0.3.0}/strands_diffusers/core/registry.py +22 -6
  90. {strands_diffusers-0.1.0 → strands_diffusers-0.3.0}/strands_diffusers/core/viz.py +111 -29
  91. {strands_diffusers-0.1.0 → strands_diffusers-0.3.0}/strands_diffusers/tools/use_diffusers.py +21 -3
  92. strands_diffusers-0.3.0/strands_diffusers.egg-info/PKG-INFO +238 -0
  93. strands_diffusers-0.3.0/strands_diffusers.egg-info/SOURCES.txt +108 -0
  94. {strands_diffusers-0.1.0 → strands_diffusers-0.3.0}/strands_diffusers.egg-info/requires.txt +1 -0
  95. {strands_diffusers-0.1.0 → strands_diffusers-0.3.0}/tests/test_action_io.py +35 -0
  96. strands_diffusers-0.1.0/.gitignore +0 -21
  97. strands_diffusers-0.1.0/PKG-INFO +0 -199
  98. strands_diffusers-0.1.0/README.md +0 -155
  99. strands_diffusers-0.1.0/strands_diffusers.egg-info/PKG-INFO +0 -199
  100. strands_diffusers-0.1.0/strands_diffusers.egg-info/SOURCES.txt +0 -31
  101. {strands_diffusers-0.1.0 → strands_diffusers-0.3.0}/.github/workflows/ci.yml +0 -0
  102. {strands_diffusers-0.1.0 → strands_diffusers-0.3.0}/examples/SETUP_COSMOS.md +0 -0
  103. {strands_diffusers-0.1.0 → strands_diffusers-0.3.0}/examples/cosmos_action_policy.py +0 -0
  104. {strands_diffusers-0.1.0 → strands_diffusers-0.3.0}/examples/gallery_20.py +0 -0
  105. {strands_diffusers-0.1.0 → strands_diffusers-0.3.0}/examples/text_to_image.py +0 -0
  106. {strands_diffusers-0.1.0 → strands_diffusers-0.3.0}/examples/text_to_video.py +0 -0
  107. {strands_diffusers-0.1.0 → strands_diffusers-0.3.0}/examples/visualize_actions.py +0 -0
  108. {strands_diffusers-0.1.0 → strands_diffusers-0.3.0}/setup.cfg +0 -0
  109. {strands_diffusers-0.1.0 → strands_diffusers-0.3.0}/setup.py +0 -0
  110. {strands_diffusers-0.1.0 → strands_diffusers-0.3.0}/strands_diffusers/__init__.py +0 -0
  111. {strands_diffusers-0.1.0 → strands_diffusers-0.3.0}/strands_diffusers/core/__init__.py +0 -0
  112. {strands_diffusers-0.1.0 → strands_diffusers-0.3.0}/strands_diffusers/tools/__init__.py +0 -0
  113. {strands_diffusers-0.1.0 → strands_diffusers-0.3.0}/strands_diffusers.egg-info/dependency_links.txt +0 -0
  114. {strands_diffusers-0.1.0 → strands_diffusers-0.3.0}/strands_diffusers.egg-info/top_level.txt +0 -0
  115. {strands_diffusers-0.1.0 → strands_diffusers-0.3.0}/tests/test_registry.py +0 -0
@@ -28,7 +28,7 @@ jobs:
       - name: Install build tooling
         run: |
           python -m pip install --upgrade pip
-          pip install build twine
+          pip install build twine packaging
 
       - name: Extract version from tag
         id: get_version
@@ -40,14 +40,25 @@ jobs:
       - name: Build package (version derived from git tag via setuptools-scm)
         run: python -m build
 
-      - name: Verify built version matches tag
+      - name: Verify built version matches tag (PEP440-normalized)
         run: |
           ls -l dist/
-          if ! ls dist/ | grep -q "${{ steps.get_version.outputs.version }}"; then
-            echo "::error::Built artifact does not match tag version ${{ steps.get_version.outputs.version }}"
-            ls dist/
-            exit 1
-          fi
+          python - "${{ steps.get_version.outputs.version }}" <<'EOF'
+          import sys, glob, os
+          from packaging.utils import parse_wheel_filename, parse_sdist_filename
+          from packaging.version import Version
+          tag = Version(sys.argv[1])  # normalize the tag (v1.01 -> 1.1)
+          built = set()
+          for w in glob.glob("dist/*.whl"):
+              built.add(parse_wheel_filename(os.path.basename(w))[1])
+          for s in glob.glob("dist/*.tar.gz"):
+              built.add(parse_sdist_filename(os.path.basename(s))[1])
+          print("tag:", tag, "built:", sorted(map(str, built)))
+          if tag not in built:
+              print(f"::error::Built version(s) {sorted(map(str,built))} != tag {tag}")
+              sys.exit(1)
+          print(f"✅ built version matches tag {tag}")
+          EOF
 
       - name: Publish to PyPI
         env:
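The hunk above replaces a `grep` substring match with a comparison of parsed versions. The following is a minimal sketch of why that matters, not the workflow's code: the workflow relies on the `packaging` library, while `normalize_release` below is a hypothetical, stdlib-only stand-in that handles only an optional leading `v` and a dotted numeric release segment. The artifact name `example_pkg-1.1.tar.gz` is also made up for illustration.

```python
import re


def normalize_release(version: str) -> tuple[int, ...]:
    """Simplified stand-in for PEP 440 normalization (hypothetical helper).

    Handles only an optional leading "v" and a dotted numeric release
    segment; pre/post/dev/local parts are out of scope for this sketch.
    """
    match = re.fullmatch(r"[vV]?(\d+(?:\.\d+)*)", version.strip())
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    # int() drops leading zeros, so "01" and "1" compare equal.
    parts = [int(p) for p in match.group(1).split(".")]
    # Trailing zero components do not change the version (1.1.0 == 1.1).
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def substring_check(tag: str, filename: str) -> bool:
    """The old approach: look for the raw tag text inside the file name."""
    return tag in filename


tag = "v1.01"                        # tag as typed by a human
artifact = "example_pkg-1.1.tar.gz"  # build tools emit the normalized form
built_version = "1.1"

# Assumption: the "v" prefix has already been stripped before the old check ran.
old_result = substring_check(tag.lstrip("v"), artifact)
new_result = normalize_release(tag) == normalize_release(built_version)
print("substring check passes:", old_result)
print("normalized check passes:", new_result)
```

With an unnormalized tag, the substring check rejects a correctly built artifact, whereas comparing normalized versions accepts it. A full implementation would also need pre-release, post-release, and local segments, which is presumably why the workflow delegates to `packaging`.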
@@ -0,0 +1,52 @@
+name: Docs
+
+on:
+  push:
+    branches: [main]
+    paths:
+      - 'docs/**'
+      - 'mkdocs.yml'
+      - '.github/workflows/docs.yml'
+  workflow_dispatch:
+
+permissions:
+  contents: read
+  pages: write
+  id-token: write
+
+# Allow one concurrent deployment; don't cancel an in-progress run.
+concurrency:
+  group: pages
+  cancel-in-progress: false
+
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+      - name: Install docs deps
+        run: |
+          python -m pip install --upgrade pip
+          pip install "mkdocs==1.6.1" "mkdocs-material==9.7.6"
+      - name: Build (strict)
+        run: mkdocs build --strict
+      - name: Upload Pages artifact
+        uses: actions/upload-pages-artifact@v3
+        with:
+          path: site
+
+  deploy:
+    needs: build
+    runs-on: ubuntu-latest
+    environment:
+      name: github-pages
+      url: ${{ steps.deployment.outputs.page_url }}
+    steps:
+      - name: Deploy to GitHub Pages
+        id: deployment
+        uses: actions/deploy-pages@v4
@@ -0,0 +1,26 @@
+__pycache__/
+*.pyc
+*.egg-info/
+build/
+dist/
+.venv/
+.pytest_cache/
+.ruff_cache/
+.coverage
+*.bak
+*.json.tmp
+strands_diffusers/_version.py
+system_prompt.prompt
+site/
+
+# Generated media is ignored everywhere EXCEPT the committed docs gallery.
+*.mp4
+*.png
+*.jpg
+*.wav
+*.gif
+/assets/
+
+# Docs gallery assets are real, committed outputs — always track them.
+!docs/assets/
+!docs/assets/**
@@ -0,0 +1,238 @@
+Metadata-Version: 2.4
+Name: strands-diffusers
+Version: 0.3.0
+Summary: The universal entrypoint to HuggingFace diffusers for Strands agents — 100% pipeline & modality coverage, zero hardcoding. Special focus on Physical-AI world-foundation models (Cosmos) with robot action outputs.
+Author-email: Cagatay Cali <cagataycali@icloud.com>
+License: MIT
+Project-URL: Homepage, https://github.com/cagataycali/strands-diffusers
+Project-URL: Repository, https://github.com/cagataycali/strands-diffusers
+Project-URL: Issues, https://github.com/cagataycali/strands-diffusers/issues
+Keywords: strands,diffusers,huggingface,ai,agents,diffusion,video,image,vla,wfm,world-foundation-model,cosmos,robotics,physical-ai
+Classifier: Development Status :: 4 - Beta
+Classifier: Intended Audience :: Developers
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+Requires-Python: >=3.10
+Description-Content-Type: text/markdown
+Requires-Dist: strands-agents
+Requires-Dist: diffusers>=0.30
+Requires-Dist: transformers>=4.40
+Requires-Dist: torch
+Requires-Dist: pillow
+Requires-Dist: numpy
+Requires-Dist: accelerate
+Requires-Dist: matplotlib
+Provides-Extra: video
+Requires-Dist: imageio[ffmpeg]; extra == "video"
+Requires-Dist: opencv-python; extra == "video"
+Requires-Dist: av; extra == "video"
+Provides-Extra: audio
+Requires-Dist: soundfile; extra == "audio"
+Requires-Dist: librosa; extra == "audio"
+Provides-Extra: cosmos
+Requires-Dist: cosmos_guardrail; extra == "cosmos"
+Provides-Extra: dev
+Requires-Dist: pytest>=7.0; extra == "dev"
+Requires-Dist: black; extra == "dev"
+Requires-Dist: ruff; extra == "dev"
+Provides-Extra: all
+Requires-Dist: strands-diffusers[audio,dev,video]; extra == "all"
+
+# strands-diffusers
+
+<p align="center">
+<img src="docs/assets/anim/banner.svg" alt="strands-diffusers — one tool, 300+ diffusion pipelines, every modality" width="100%"/>
+</p>
+
+
+**The universal entrypoint to HuggingFace `diffusers` for Strands agents.**
+One tool — `use_diffusers` — wraps the whole library with zero hardcoding:
+discover and run any of its 300+ pipelines across every modality. It's a *visual*
+library, so here's what it actually produces — every asset below is **real
+model output**, not a placeholder:
+
+<table>
+<tr>
+<td align="center" width="25%">
+<b>text → image</b><br/>
+<img src="docs/assets/text_to_image.png" width="200"/><br/>
+<sub>any of 108 image pipelines</sub>
+</td>
+<td align="center" width="25%">
+<b>text → video</b><br/>
+<img src="docs/assets/text_to_video.gif" width="200"/><br/>
+<sub>LTX · Wan · CogVideoX · Hunyuan</sub>
+</td>
+<td align="center" width="25%">
+<b>robot actions</b> 🤖<br/>
+<img src="docs/assets/cosmos_world.gif" width="200"/><br/>
+<sub>Cosmos WFM: world video + actions</sub>
+</td>
+<td align="center" width="25%">
+<b>text → audio</b><br/>
+<img src="docs/assets/text_to_audio.png" width="200"/><br/>
+<sub>StableAudio · AudioLDM2</sub>
+</td>
+</tr>
+</table>
+
+```
+text / image / video / robot-state IN
+image / video / audio / actions / 3d OUT
+```
+
+The registry is built at runtime from `diffusers._import_structure`, so new
+pipelines are supported automatically with no code change. Same philosophy as
+`use_aws`, `use_lerobot`, and `use_transformers`: **discover, don't hardcode.**
+
+<table>
+<tr>
+<td align="center" width="50%">
+<b>3D mesh</b><br/>
+<img src="docs/assets/mesh_render.png" width="200"/><br/>
+<sub>ShapE - verts/faces to .ply</sub>
+</td>
+<td align="center" width="50%">
+<b>audio</b> (<a href="docs/assets/text_to_audio.wav">hear the .wav</a>)<br/>
+<img src="docs/assets/text_to_audio.png" width="300"/><br/>
+<sub>StableAudio - waveform to .wav</sub>
+</td>
+</tr>
+</table>
+
+## 100% coverage, zero hardcoding
+
+<p align="center">
+<img src="docs/assets/modality_coverage.png" width="640"/>
+</p>
+
+Every pipeline, model, and scheduler diffusers ships is reachable through one
+tool. When diffusers adds a new pipeline, `use_diffusers` exposes it immediately.
+
+## Physical-AI: world-foundation models with action outputs
+
+<p align="center">
+<img src="docs/assets/cosmos_world.gif" width="360" alt="Cosmos world rollout"/>
+</p>
+
+<table>
+<tr>
+<td align="center"><img src="docs/assets/rollout_policy_1.gif" width="220"/><br/><sub>"Put the pot to the left of the purple item."</sub></td>
+<td align="center"><img src="docs/assets/rollout_policy_2.gif" width="220"/><br/><sub>"Pick up the cloth and place it in the bowl."</sub></td>
+<td align="center"><img src="docs/assets/rollout_policy_4.gif" width="220"/><br/><sub>"Open the drawer and place the spoon inside."</sub></td>
+</tr>
+</table>
+
+Same robot, same first observation — **different task prompt → different imagined
+world and different predicted actions.** Five real rollouts + all three Cosmos
+action modes in the [WFM gallery](https://cagataycali.github.io/strands-diffusers/wfm/).
+
+
+This is the headline. A Cosmos action-policy rollout predicts both a future world
+**video** and the **robot action chunk** that produces it. One
+`use_diffusers(action="run", ...)` returns a `.mp4` world video, a `.json` action
+chunk (normalized `[-1, 1]`, shape `[num_chunks, T, action_dim]`), and optional
+`.wav` sound — and you can *see* the motion:
+
+<table>
+<tr>
+<td align="center"><b>time-series</b> (every dim, gripper highlighted)<br/><img src="docs/assets/cosmos_action_timeseries.png" width="380"/></td>
+<td align="center"><b>end-effector path</b> (dims 0–2)<br/><img src="docs/assets/cosmos_action_trajectory.png" width="300"/></td>
+</tr>
+</table>
+
+Verified end-to-end on NVIDIA Thor (`nvidia/Cosmos3-Nano`, bf16/cuda): one call
+produced a world video `(17, 480, 640, 3)` and an action chunk `(1, 16, 10)`. See
+[`examples/cosmos_action_policy.py`](examples/cosmos_action_policy.py).
+
+## Install
+
+```bash
+pip install -e .
+pip install -e ".[video,audio]"  # mp4 export, wav I/O
+```
+
+## Quick start
+
+```python
+from strands import Agent
+from strands_diffusers import use_diffusers
+
+agent = Agent(tools=[use_diffusers])
+agent("Generate an image of a robot arm in a kitchen")
+agent("Run a Cosmos action-policy rollout on robot.mp4 and give me the actions")
+```
+
+Direct:
+
+```python
+use_diffusers(action="run", pipeline="StableDiffusionPipeline",
+              model="stabilityai/stable-diffusion-2-1",
+              parameters={"prompt": "a robot arm in a kitchen"})
+# -> {"artifacts": ["/tmp/strands_diffusers/image_*.png"]}
+```
+
+## Two layers
+
+`run` loads a pipeline via `from_pretrained` and calls it; inputs are coerced
+(path / URL / base64 to PIL / video), outputs auto-saved and returned by path.
+
+`call` resolves and calls any diffusers class, function, or method (schedulers,
+VAEs, `CosmosActionCondition`, utils). `cached:key` references resolve to live
+objects; `"**"` unpacks a cached mapping into kwargs.
+
+```python
+use_diffusers(action="call", target="CosmosActionCondition",
+              parameters={"mode": "policy", "video": "robot.mp4"}, cache_key="cond")
+use_diffusers(action="run", pipeline="Cosmos3OmniPipeline", model="nvidia/Cosmos3-Nano",
+              parameters={"prompt": "...", "action": "cached:cond"},
+              dtype="bfloat16", device="cuda")
+```
+
+## Discovery
+
+| action | returns |
+|---|---|
+| `pipelines` / `models` / `schedulers` | classes + derived modality |
+| `tasks` / `modalities` / `wfm` | task maps / modality groups / world-foundation models |
+| `pipeline_info` / `inspect` | signature + docs |
+| `visualize` | action chunk to plots + animation |
+| `cache` / `clear_cache` | manage loaded pipelines |
+
+## Architecture
+
+```
+core/registry.py        zero-hardcode taxonomy from diffusers._import_structure
+core/engine.py          load/cache pipelines, auto device+dtype
+core/io.py              coerce inputs; serialize video/image/audio/action/mesh
+core/viz.py             render robot action chunks to plots + animation
+tools/use_diffusers.py  the single @tool: run + call + discovery
+```
+
+## Testing
+
+```bash
+pip install -e ".[video,audio,dev]"
+pytest tests/ -q           # unit tests, no GPU, no downloads
+python examples/smoke.py   # E2E gate on tiny fixtures
+```
+
+Every visual in this README and the [docs](https://cagataycali.github.io/strands-diffusers/)
+is produced by real `use_diffusers` calls — regenerate them with:
+
+```bash
+python examples/generate_docs_assets.py
+```
+
+## Docs
+
+📖 **[cagataycali.github.io/strands-diffusers](https://cagataycali.github.io/strands-diffusers/)**
+— quickstart, full gallery (images / video / audio / actions / 3D), the
+world-foundation-model story, discovery, and the two-layer design.
+
+MIT
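The package description above states that an action chunk is normalized to `[-1, 1]`, has shape `[num_chunks, T, action_dim]`, and that dims 0 to 2 form the end-effector path. The sketch below shows how a consumer might sanity-check such a chunk. It assumes the `.json` file holds a bare nested list, which the diff does not confirm; the helper names (`chunk_shape`, `in_normalized_range`, `end_effector_path`) are hypothetical and the data is synthetic.

```python
import json


def chunk_shape(chunk):
    """Return (num_chunks, T, action_dim) for a rectangular nested list."""
    num_chunks = len(chunk)
    steps = len(chunk[0])
    dims = len(chunk[0][0])
    for block in chunk:
        if len(block) != steps or any(len(step) != dims for step in block):
            raise ValueError("action chunk is not rectangular")
    return (num_chunks, steps, dims)


def in_normalized_range(chunk, lo=-1.0, hi=1.0):
    """True when every action value lies inside [lo, hi]."""
    return all(lo <= v <= hi for block in chunk for step in block for v in step)


def end_effector_path(chunk):
    """Take dims 0-2 of every step in the first chunk."""
    return [tuple(step[:3]) for step in chunk[0]]


# Synthetic stand-in for a saved action file: 1 chunk, 16 steps, 10 dims.
# ASSUMPTION: the real file layout may differ (e.g. a dict with metadata).
payload = json.dumps(
    [[[((t + d) % 5 - 2) / 2 for d in range(10)] for t in range(16)]]
)
chunk = json.loads(payload)

shape = chunk_shape(chunk)
normalized = in_normalized_range(chunk)
path = end_effector_path(chunk)
print(shape, normalized, len(path))
```

The synthetic shape matches the `(1, 16, 10)` chunk reported in the description; the values themselves carry no meaning.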
@@ -0,0 +1,193 @@
+# strands-diffusers
+
+<p align="center">
+<img src="docs/assets/anim/banner.svg" alt="strands-diffusers — one tool, 300+ diffusion pipelines, every modality" width="100%"/>
+</p>
+
+
+**The universal entrypoint to HuggingFace `diffusers` for Strands agents.**
+One tool — `use_diffusers` — wraps the whole library with zero hardcoding:
+discover and run any of its 300+ pipelines across every modality. It's a *visual*
+library, so here's what it actually produces — every asset below is **real
+model output**, not a placeholder:
+
+<table>
+<tr>
+<td align="center" width="25%">
+<b>text → image</b><br/>
+<img src="docs/assets/text_to_image.png" width="200"/><br/>
+<sub>any of 108 image pipelines</sub>
+</td>
+<td align="center" width="25%">
+<b>text → video</b><br/>
+<img src="docs/assets/text_to_video.gif" width="200"/><br/>
+<sub>LTX · Wan · CogVideoX · Hunyuan</sub>
+</td>
+<td align="center" width="25%">
+<b>robot actions</b> 🤖<br/>
+<img src="docs/assets/cosmos_world.gif" width="200"/><br/>
+<sub>Cosmos WFM: world video + actions</sub>
+</td>
+<td align="center" width="25%">
+<b>text → audio</b><br/>
+<img src="docs/assets/text_to_audio.png" width="200"/><br/>
+<sub>StableAudio · AudioLDM2</sub>
+</td>
+</tr>
+</table>
+
+```
+text / image / video / robot-state IN
+image / video / audio / actions / 3d OUT
+```
+
+The registry is built at runtime from `diffusers._import_structure`, so new
+pipelines are supported automatically with no code change. Same philosophy as
+`use_aws`, `use_lerobot`, and `use_transformers`: **discover, don't hardcode.**
+
+<table>
+<tr>
+<td align="center" width="50%">
+<b>3D mesh</b><br/>
+<img src="docs/assets/mesh_render.png" width="200"/><br/>
+<sub>ShapE - verts/faces to .ply</sub>
+</td>
+<td align="center" width="50%">
+<b>audio</b> (<a href="docs/assets/text_to_audio.wav">hear the .wav</a>)<br/>
+<img src="docs/assets/text_to_audio.png" width="300"/><br/>
+<sub>StableAudio - waveform to .wav</sub>
+</td>
+</tr>
+</table>
+
+## 100% coverage, zero hardcoding
+
+<p align="center">
+<img src="docs/assets/modality_coverage.png" width="640"/>
+</p>
+
+Every pipeline, model, and scheduler diffusers ships is reachable through one
+tool. When diffusers adds a new pipeline, `use_diffusers` exposes it immediately.
+
+## Physical-AI: world-foundation models with action outputs
+
+<p align="center">
+<img src="docs/assets/cosmos_world.gif" width="360" alt="Cosmos world rollout"/>
+</p>
+
+<table>
+<tr>
+<td align="center"><img src="docs/assets/rollout_policy_1.gif" width="220"/><br/><sub>"Put the pot to the left of the purple item."</sub></td>
+<td align="center"><img src="docs/assets/rollout_policy_2.gif" width="220"/><br/><sub>"Pick up the cloth and place it in the bowl."</sub></td>
+<td align="center"><img src="docs/assets/rollout_policy_4.gif" width="220"/><br/><sub>"Open the drawer and place the spoon inside."</sub></td>
+</tr>
+</table>
+
+Same robot, same first observation — **different task prompt → different imagined
+world and different predicted actions.** Five real rollouts + all three Cosmos
+action modes in the [WFM gallery](https://cagataycali.github.io/strands-diffusers/wfm/).
+
+
+This is the headline. A Cosmos action-policy rollout predicts both a future world
+**video** and the **robot action chunk** that produces it. One
+`use_diffusers(action="run", ...)` returns a `.mp4` world video, a `.json` action
+chunk (normalized `[-1, 1]`, shape `[num_chunks, T, action_dim]`), and optional
+`.wav` sound — and you can *see* the motion:
+
+<table>
+<tr>
+<td align="center"><b>time-series</b> (every dim, gripper highlighted)<br/><img src="docs/assets/cosmos_action_timeseries.png" width="380"/></td>
+<td align="center"><b>end-effector path</b> (dims 0–2)<br/><img src="docs/assets/cosmos_action_trajectory.png" width="300"/></td>
+</tr>
+</table>
+
+Verified end-to-end on NVIDIA Thor (`nvidia/Cosmos3-Nano`, bf16/cuda): one call
+produced a world video `(17, 480, 640, 3)` and an action chunk `(1, 16, 10)`. See
+[`examples/cosmos_action_policy.py`](examples/cosmos_action_policy.py).
+
+## Install
+
+```bash
+pip install -e .
+pip install -e ".[video,audio]"  # mp4 export, wav I/O
+```
+
+## Quick start
+
+```python
+from strands import Agent
+from strands_diffusers import use_diffusers
+
+agent = Agent(tools=[use_diffusers])
+agent("Generate an image of a robot arm in a kitchen")
+agent("Run a Cosmos action-policy rollout on robot.mp4 and give me the actions")
+```
+
+Direct:
+
+```python
+use_diffusers(action="run", pipeline="StableDiffusionPipeline",
+              model="stabilityai/stable-diffusion-2-1",
+              parameters={"prompt": "a robot arm in a kitchen"})
+# -> {"artifacts": ["/tmp/strands_diffusers/image_*.png"]}
+```
+
+## Two layers
+
+`run` loads a pipeline via `from_pretrained` and calls it; inputs are coerced
+(path / URL / base64 to PIL / video), outputs auto-saved and returned by path.
+
+`call` resolves and calls any diffusers class, function, or method (schedulers,
+VAEs, `CosmosActionCondition`, utils). `cached:key` references resolve to live
+objects; `"**"` unpacks a cached mapping into kwargs.
+
+```python
+use_diffusers(action="call", target="CosmosActionCondition",
+              parameters={"mode": "policy", "video": "robot.mp4"}, cache_key="cond")
+use_diffusers(action="run", pipeline="Cosmos3OmniPipeline", model="nvidia/Cosmos3-Nano",
+              parameters={"prompt": "...", "action": "cached:cond"},
+              dtype="bfloat16", device="cuda")
+```
+
+## Discovery
+
+| action | returns |
+|---|---|
+| `pipelines` / `models` / `schedulers` | classes + derived modality |
+| `tasks` / `modalities` / `wfm` | task maps / modality groups / world-foundation models |
+| `pipeline_info` / `inspect` | signature + docs |
+| `visualize` | action chunk to plots + animation |
+| `cache` / `clear_cache` | manage loaded pipelines |
+
+## Architecture
+
+```
+core/registry.py        zero-hardcode taxonomy from diffusers._import_structure
+core/engine.py          load/cache pipelines, auto device+dtype
+core/io.py              coerce inputs; serialize video/image/audio/action/mesh
+core/viz.py             render robot action chunks to plots + animation
+tools/use_diffusers.py  the single @tool: run + call + discovery
+```
+
+## Testing
+
+```bash
+pip install -e ".[video,audio,dev]"
+pytest tests/ -q           # unit tests, no GPU, no downloads
+python examples/smoke.py   # E2E gate on tiny fixtures
+```
+
+Every visual in this README and the [docs](https://cagataycali.github.io/strands-diffusers/)
+is produced by real `use_diffusers` calls — regenerate them with:
+
+```bash
+python examples/generate_docs_assets.py
+```
+
+## Docs
+
+📖 **[cagataycali.github.io/strands-diffusers](https://cagataycali.github.io/strands-diffusers/)**
+— quickstart, full gallery (images / video / audio / actions / 3D), the
+world-foundation-model story, discovery, and the two-layer design.
+
+MIT
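The README's "Two layers" section says that `cached:key` references resolve to live objects and that `"**"` unpacks a cached mapping into kwargs. The sketch below shows one way such resolution could work; it is not the package's implementation. `resolve_parameters` is a hypothetical helper, and it assumes that `"**"` appears as a parameter key whose value is itself a `cached:` reference to a mapping, which the diff does not spell out.

```python
def resolve_parameters(parameters, cache):
    """Replace "cached:<key>" strings with cached objects (hypothetical helper).

    ASSUMPTION: a "**" key carries a mapping (or a cached reference to one)
    whose items are merged into the resulting keyword arguments.
    """
    prefix = "cached:"
    resolved = {}
    for name, value in parameters.items():
        if isinstance(value, str) and value.startswith(prefix):
            key = value[len(prefix):]
            if key not in cache:
                raise KeyError(f"no cached object under {key!r}")
            value = cache[key]
        if name == "**":
            # Merge the mapping's items instead of passing it as one argument.
            resolved.update(value)
        else:
            resolved[name] = value
    return resolved


# Stand-ins for objects a previous call might have stored under cache_key=...
condition = object()
cache = {"cond": condition, "defaults": {"num_frames": 17, "height": 480}}

kwargs = resolve_parameters(
    {"prompt": "...", "action": "cached:cond", "**": "cached:defaults"},
    cache,
)
print(sorted(kwargs))
```

Plain strings that do not start with `cached:` pass through unchanged, so ordinary prompts are unaffected. When a key appears both explicitly and in the unpacked mapping, this sketch lets whichever comes later in the parameters win, which is a design choice rather than documented behavior.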