strands-transformers 0.2.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (66)
  1. strands_transformers-0.2.0/.github/workflows/docs.yml +59 -0
  2. strands_transformers-0.2.0/.github/workflows/release.yml +80 -0
  3. strands_transformers-0.2.0/.gitignore +13 -0
  4. strands_transformers-0.2.0/ARCHITECTURE.md +93 -0
  5. strands_transformers-0.2.0/PKG-INFO +252 -0
  6. strands_transformers-0.2.0/README.md +203 -0
  7. strands_transformers-0.2.0/agent.py +68 -0
  8. strands_transformers-0.2.0/docs/assets/audio/omni_speak.wav +0 -0
  9. strands_transformers-0.2.0/docs/assets/audio/tts_hello.wav +0 -0
  10. strands_transformers-0.2.0/docs/assets/extra.css +27 -0
  11. strands_transformers-0.2.0/docs/assets/img/blue.png +0 -0
  12. strands_transformers-0.2.0/docs/assets/img/green.png +0 -0
  13. strands_transformers-0.2.0/docs/assets/logo.svg +22 -0
  14. strands_transformers-0.2.0/docs/guide/agent-brain.md +94 -0
  15. strands_transformers-0.2.0/docs/guide/agentic-robot.md +79 -0
  16. strands_transformers-0.2.0/docs/guide/audio.md +103 -0
  17. strands_transformers-0.2.0/docs/guide/compat.md +23 -0
  18. strands_transformers-0.2.0/docs/guide/content-blocks.md +91 -0
  19. strands_transformers-0.2.0/docs/guide/contributing.md +34 -0
  20. strands_transformers-0.2.0/docs/guide/installation.md +70 -0
  21. strands_transformers-0.2.0/docs/guide/quickstart.md +50 -0
  22. strands_transformers-0.2.0/docs/guide/robotics.md +106 -0
  23. strands_transformers-0.2.0/docs/guide/the-tool.md +94 -0
  24. strands_transformers-0.2.0/docs/index.md +94 -0
  25. strands_transformers-0.2.0/docs/reference/architecture.md +51 -0
  26. strands_transformers-0.2.0/docs/reference/examples.md +56 -0
  27. strands_transformers-0.2.0/docs/reference/transformer-model.md +20 -0
  28. strands_transformers-0.2.0/docs/reference/use-transformers.md +8 -0
  29. strands_transformers-0.2.0/examples/README.md +137 -0
  30. strands_transformers-0.2.0/examples/audio_content_block.py +78 -0
  31. strands_transformers-0.2.0/examples/cosmos_reason_embodied.py +77 -0
  32. strands_transformers-0.2.0/examples/document_and_audio.py +90 -0
  33. strands_transformers-0.2.0/examples/local_model_agent.py +41 -0
  34. strands_transformers-0.2.0/examples/molmoact_vla.py +83 -0
  35. strands_transformers-0.2.0/examples/multimodal_advanced.py +102 -0
  36. strands_transformers-0.2.0/examples/multimodal_agent.py +64 -0
  37. strands_transformers-0.2.0/examples/multimodal_pipelines.py +153 -0
  38. strands_transformers-0.2.0/examples/omni_audio.py +104 -0
  39. strands_transformers-0.2.0/examples/openvla_vla.py +98 -0
  40. strands_transformers-0.2.0/examples/robot_reason_act_agent.py +169 -0
  41. strands_transformers-0.2.0/examples/smoke.py +122 -0
  42. strands_transformers-0.2.0/examples/smolvlm_image_text.py +48 -0
  43. strands_transformers-0.2.0/examples/vision_tasks.py +85 -0
  44. strands_transformers-0.2.0/mkdocs.yml +96 -0
  45. strands_transformers-0.2.0/pyproject.toml +68 -0
  46. strands_transformers-0.2.0/requirements.txt +5 -0
  47. strands_transformers-0.2.0/setup.cfg +4 -0
  48. strands_transformers-0.2.0/setup.py +5 -0
  49. strands_transformers-0.2.0/strands_transformers/__init__.py +52 -0
  50. strands_transformers-0.2.0/strands_transformers/_version.py +24 -0
  51. strands_transformers-0.2.0/strands_transformers/core/__init__.py +5 -0
  52. strands_transformers-0.2.0/strands_transformers/core/compat.py +251 -0
  53. strands_transformers-0.2.0/strands_transformers/core/engine.py +160 -0
  54. strands_transformers-0.2.0/strands_transformers/core/io.py +273 -0
  55. strands_transformers-0.2.0/strands_transformers/core/registry.py +195 -0
  56. strands_transformers-0.2.0/strands_transformers/models/__init__.py +5 -0
  57. strands_transformers-0.2.0/strands_transformers/models/transformers.py +1421 -0
  58. strands_transformers-0.2.0/strands_transformers/tools/__init__.py +5 -0
  59. strands_transformers-0.2.0/strands_transformers/tools/use_transformers.py +409 -0
  60. strands_transformers-0.2.0/strands_transformers/types/__init__.py +24 -0
  61. strands_transformers-0.2.0/strands_transformers/types/audio.py +91 -0
  62. strands_transformers-0.2.0/strands_transformers.egg-info/PKG-INFO +252 -0
  63. strands_transformers-0.2.0/strands_transformers.egg-info/SOURCES.txt +64 -0
  64. strands_transformers-0.2.0/strands_transformers.egg-info/dependency_links.txt +1 -0
  65. strands_transformers-0.2.0/strands_transformers.egg-info/requires.txt +33 -0
  66. strands_transformers-0.2.0/strands_transformers.egg-info/top_level.txt +1 -0
@@ -0,0 +1,59 @@
+ name: Docs
+
+ on:
+   push:
+     branches: [main]
+     paths:
+       - "docs/**"
+       - "mkdocs.yml"
+       - "pyproject.toml"
+       - "strands_transformers/**"
+       - ".github/workflows/docs.yml"
+   workflow_dispatch:
+
+ # Allow one concurrent deployment; let in-progress runs finish.
+ concurrency:
+   group: pages
+   cancel-in-progress: false
+
+ permissions:
+   contents: read
+   pages: write
+   id-token: write
+
+ jobs:
+   build:
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v4
+
+       - name: Install uv
+         uses: astral-sh/setup-uv@v5
+
+       - name: Build docs
+         run: |
+           uv venv --python 3.12
+           # Docs build uses griffe's STATIC analysis for the API reference, so
+           # we only need the doc toolchain + the package importable as a path —
+           # NOT torch / transformers / strands (kept out via --no-deps).
+           uv pip install mkdocs-material "mkdocstrings[python]" pymdown-extensions
+           uv pip install --no-deps -e .
+           # Call the venv binary directly (avoid `uv run`, which would re-sync
+           # the full project and pull the heavy ML deps).
+           .venv/bin/mkdocs build --strict
+
+       - name: Upload Pages artifact
+         uses: actions/upload-pages-artifact@v3
+         with:
+           path: site
+
+   deploy:
+     needs: build
+     runs-on: ubuntu-latest
+     environment:
+       name: github-pages
+       url: ${{ steps.deployment.outputs.page_url }}
+     steps:
+       - name: Deploy to GitHub Pages
+         id: deployment
+         uses: actions/deploy-pages@v4
@@ -0,0 +1,80 @@
+ name: Release
+
+ on:
+   push:
+     tags:
+       - "v*.*.*"
+
+ permissions:
+   contents: write # create GitHub Release
+
+ jobs:
+   build:
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v4
+         with:
+           fetch-depth: 0 # full history so setuptools-scm sees the tag
+
+       - name: Install uv
+         uses: astral-sh/setup-uv@v5
+
+       - name: Build sdist + wheel
+         run: |
+           uv venv --python 3.12
+           uv pip install build
+           # setuptools-scm derives the version from the git tag (vX.Y.Z → X.Y.Z).
+           # `python -m build` is PEP 517-isolated (only needs setuptools-scm to
+           # build), so call the venv binary directly — no heavy ML deps pulled.
+           .venv/bin/python -m build
+
+       - name: Show built artifacts
+         run: ls -l dist/
+
+       - name: Upload artifacts
+         uses: actions/upload-artifact@v4
+         with:
+           name: dist
+           path: dist/
+
+   pypi:
+     needs: build
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/download-artifact@v4
+         with:
+           name: dist
+           path: dist/
+       # Publish with the PYPI_API_TOKEN repo secret.
+       - name: Publish to PyPI
+         uses: pypa/gh-action-pypi-publish@release/v1
+         with:
+           password: ${{ secrets.PYPI_API_TOKEN }}
+
+   github-release:
+     needs: pypi
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/download-artifact@v4
+         with:
+           name: dist
+           path: dist/
+       - name: Extract version
+         id: v
+         run: echo "version=${GITHUB_REF#refs/tags/v}" >> "$GITHUB_OUTPUT"
+       - name: Create GitHub Release
+         uses: softprops/action-gh-release@v2
+         with:
+           name: strands-transformers v${{ steps.v.outputs.version }}
+           generate_release_notes: true
+           files: dist/*
+           body: |
+             ## 🤗 strands-transformers v${{ steps.v.outputs.version }}
+
+             ```bash
+             uv pip install strands-transformers==${{ steps.v.outputs.version }}
+             # or
+             pip install strands-transformers==${{ steps.v.outputs.version }}
+             ```
+
+             📖 Docs: https://cagataycali.github.io/strands-transformers/
@@ -0,0 +1,13 @@
+ .venv
+ .DS_Store
+ *.bak
+ __pycache__
+ *_merged
+ qwen3_*
+ sessions
+ *.egg-info
+ system_prompt.prompt
+ site/
+ strands_transformers/_version.py
+ dist/
+ build/
@@ -0,0 +1,93 @@
+ # Architecture
+
+ `strands-transformers` is the universal entrypoint to HuggingFace transformers for
+ Strands agents — 100% task & modality coverage with zero hardcoding. It reads
+ transformers' own task taxonomy at runtime, so new tasks/models work without code
+ changes (the same philosophy as `use_aws` wrapping boto3 and `use_lerobot`
+ wrapping lerobot).
+
+ ## Layout
+
+ ```
+ strands_transformers/
+ ├── core/
+ │   ├── registry.py  # task taxonomy + dynamic class/attr resolution
+ │   ├── engine.py    # load/cache pipelines & models, device/dtype selection
+ │   ├── io.py        # multimodal input coercion + JSON-safe output serialization
+ │   └── compat.py    # backward-compat shims for legacy trust_remote_code models
+ ├── models/
+ │   └── transformers.py  # TransformerModel — a Strands model provider (local brain)
+ └── tools/
+     └── use_transformers.py  # the single @tool agents call
+ examples/  # runnable, GPU-verified examples (see examples/README.md)
+ ```
+
+ ## Data flow
+
+ ```
+ agent → use_transformers(action=...) ─┬─ discovery → registry
+                                       ├─ run(task) → engine.get_pipeline → pipeline(inputs) → io.serialize_output
+                                       └─ call(target)→ registry.resolve_attr / cached: → obj(**params) → io.serialize_output
+ ```
+
+ ### `core/registry.py` — the source of truth
+ - `supported_tasks()` reads transformers' `SUPPORTED_TASKS` → `{task: {type, auto_models, default_model, pipeline_class}}`.
+ - `tasks_by_modality()`, `task_info()`, `resolve_task()` (tolerant of underscores/hyphens).
+ - `auto_model_classes()` lists every `Auto*` entrypoint.
+ - `resolve_attr(dotted)` resolves any dotted path into transformers (class, fn,
+ method), with a root-getattr fast path so transformers' lazy `__getattr__`
+ (which raises `AttributeError` on submodule-import attempts) never aborts
+ resolution.
+ - `describe(obj)` introspects signatures/docstrings for the `inspect` action.
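
The underscore/hyphen tolerance described for `resolve_task()` can be pictured with a minimal sketch. This is not the package's implementation: `resolve_task_sketch`, `_normalize`, and the sample task list are hypothetical names used for illustration, and the real function takes its task names from transformers' `SUPPORTED_TASKS`.

```python
def _normalize(name: str) -> str:
    """Lowercase and treat underscores as hyphens so both spellings compare equal."""
    return name.strip().lower().replace("_", "-")


def resolve_task_sketch(name: str, known_tasks: list[str]) -> str:
    """Return the canonical task whose normalized form matches `name`."""
    wanted = _normalize(name)
    for task in known_tasks:
        if _normalize(task) == wanted:
            return task
    raise KeyError(f"unknown task: {name!r}")


# Hypothetical task list, for illustration only.
TASKS = ["text-generation", "image-text-to-text", "automatic-speech-recognition"]
resolved = resolve_task_sketch("image_text_to_text", TASKS)
```

Normalizing both sides keeps the lookup independent of which spelling the caller, or a future upstream task name, happens to use.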
+
+ ### `core/engine.py` — load, cache, run
+ - `select_device()` / `select_dtype()` auto-pick cuda/mps/cpu and bf16/fp16.
+ - `get_pipeline(task, model, ...)` builds & caches a `transformers.pipeline`.
+ Image-output tasks (depth-estimation, segmentation, image-to-image,
+ mask-generation) are kept in **float32** — half precision breaks PIL/numpy
+ post-processing.
+ - `load_object(auto_class, model_path, ...)` loads any `AutoModel*` / `AutoProcessor`
+ / `AutoTokenizer` via `from_pretrained` for the low-level `call` layer.
+ - `_CACHE` holds pipelines/models/processors keyed by `cache_key` for the session.
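
A session cache keyed by `cache_key` can be sketched as a plain dictionary that calls a loader only on a miss. The names below (`_CACHE_SKETCH`, `get_or_load`, the key format) are assumptions for illustration and do not reflect the actual layout of `_CACHE`.

```python
from typing import Any, Callable

_CACHE_SKETCH: dict[str, Any] = {}


def get_or_load(cache_key: str, loader: Callable[[], Any]) -> Any:
    """Return the cached object for `cache_key`, calling `loader` only on a miss."""
    if cache_key not in _CACHE_SKETCH:
        _CACHE_SKETCH[cache_key] = loader()
    return _CACHE_SKETCH[cache_key]


load_calls: list[str] = []


def fake_loader() -> dict:
    """Stand-in for an expensive pipeline or model load."""
    load_calls.append("loaded")
    return {"task": "text-generation"}


# Hypothetical key format; the second call is served from the cache.
first = get_or_load("text-generation::demo-model", fake_loader)
second = get_or_load("text-generation::demo-model", fake_loader)
```

Because the second lookup returns the same object, repeated tool calls in one session would not reload weights.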
+
+ ### `core/io.py` — multimodal I/O
+ - **In:** `coerce_input` decodes base64 data-URIs to PIL/bytes; paths/URLs/arrays
+ pass through natively. `decode_wav` / `maybe_decode_audio_path` pre-decode WAV
+ files for audio tasks with the stdlib `wave` module (no ffmpeg needed).
+ - **Out:** `serialize_output` converts any result to JSON-safe form — audio dicts
+ → `.wav` artifacts, PIL images → `.png` artifacts, torch/numpy tensors → lists
+ (bf16/fp16 upcast to float32 first), with `_ensure_json_safe` as a final guard.
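
Decoding WAV with only the standard library, as the input bullet describes, might look like the sketch below. It is an assumption-laden illustration rather than the package's `decode_wav`: `decode_wav_sketch` is a hypothetical name, it handles only 16-bit PCM, and it averages channels down to mono.

```python
import array
import io
import struct
import sys
import wave


def decode_wav_sketch(data: bytes) -> tuple[list[float], int]:
    """Decode 16-bit PCM WAV bytes to (mono samples in [-1, 1), sample rate)."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.readframes(wav.getnframes())
    pcm = array.array("h")  # signed 16-bit integers
    pcm.frombytes(frames)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    # Average interleaved channels down to mono and scale to floats.
    mono = [
        sum(pcm[i:i + channels]) / channels / 32768.0
        for i in range(0, len(pcm), channels)
    ]
    return mono, rate


# Build a tiny mono WAV in memory to exercise the decoder.
buf = io.BytesIO()
with wave.open(buf, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(16000)
    out.writeframes(struct.pack("<3h", 0, 16384, -16384))
samples, rate = decode_wav_sketch(buf.getvalue())
```

The result is a plain list of floats plus a sample rate, which is already JSON-safe and needs no ffmpeg.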
+
+ ### `core/compat.py` — legacy model support
+ Patches transformers 4.x→5.x gaps so old `trust_remote_code` models (e.g. OpenVLA)
+ run unchanged. Idempotent + re-entrant (`force=True`) because remote code can
+ re-import transformers mid-load:
+ - moved tokenizer symbols (`PaddingStrategy`, …) re-exposed on a real
+ file-backed `tokenization_utils` module;
+ - `AutoModelForVision2Seq` recreated as an `AutoModelForImageTextToText` alias,
+ asserted everywhere `auto_map` dispatch and `register_for_auto_class()` look;
+ - `tie_weights()` signature drift made kwarg-tolerant via an `init_weights` wrap;
+ - broken-torchcodec detection disabled so audio pipelines take the array path;
+ - `spoof_timm_version()` for models with hard timm pins.
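
The idea of an idempotent patch that can be re-applied with `force=True` can be shown on a stand-in module. This sketch does not touch transformers and is not the code in `core/compat.py`; `ensure_alias` and the `fake_transformers` module are hypothetical.

```python
import types


def ensure_alias(module: types.ModuleType, alias: str, target: str, force: bool = False) -> bool:
    """Expose `module.<target>` under `alias`; return True if the attribute was (re)written."""
    if hasattr(module, alias) and not force:
        return False  # already present: a repeat call is a no-op
    setattr(module, alias, getattr(module, target))
    return True


# Stand-in module: the class names mirror the alias described above, nothing is imported.
fake = types.ModuleType("fake_transformers")
fake.AutoModelForImageTextToText = type("AutoModelForImageTextToText", (), {})

first_call = ensure_alias(fake, "AutoModelForVision2Seq", "AutoModelForImageTextToText")
second_call = ensure_alias(fake, "AutoModelForVision2Seq", "AutoModelForImageTextToText")
forced_call = ensure_alias(
    fake, "AutoModelForVision2Seq", "AutoModelForImageTextToText", force=True
)
```

Returning early on a repeat call makes the patch safe to run many times, while `force=True` lets it be re-asserted after something has re-imported or replaced the module contents.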
+
+ ### `tools/use_transformers.py` — the one tool
+ Two layers + discovery:
+ - **run** — high-level pipelines (native multimodal). Folds separate images into
+ chat content for `image-text-to-text`; pre-decodes WAV for audio tasks.
+ - **call** — dynamic dispatch to any class/fn/method. `cached:key[.attr]` refs
+ resolve to live cached objects (including inside `parameters`); a `"**"` param
+ key unpacks a cached mapping into kwargs (e.g. `model.predict_action(**batch)`).
+ - **discovery** — `tasks`, `modalities`, `task_info`, `classes`, `inspect`,
+ `cache`, `clear_cache`, `compat`.
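
How `cached:key[.attr]` references and the `"**"` parameter key could turn into keyword arguments is sketched below. The helper names (`resolve_ref`, `build_kwargs`) and the sample cache contents are hypothetical, and the sketch assumes cache keys contain no dots; the tool's real parsing may differ.

```python
from types import SimpleNamespace
from typing import Any

PREFIX = "cached:"


def resolve_ref(value: Any, cache: dict[str, Any]) -> Any:
    """Swap a 'cached:key[.attr...]' string for the live object; pass other values through."""
    if not (isinstance(value, str) and value.startswith(PREFIX)):
        return value
    key, *attrs = value[len(PREFIX):].split(".")
    obj = cache[key]
    for attr in attrs:
        obj = getattr(obj, attr)
    return obj


def build_kwargs(parameters: dict[str, Any], cache: dict[str, Any]) -> dict[str, Any]:
    """Resolve every parameter; a '**' key unpacks a mapping into the kwargs."""
    kwargs: dict[str, Any] = {}
    for name, value in parameters.items():
        resolved = resolve_ref(value, cache)
        if name == "**":
            kwargs.update(resolved)
        else:
            kwargs[name] = resolved
    return kwargs


# Hypothetical cache contents for illustration.
cache = {
    "batch": {"pixels": [1, 2], "state": [0.0]},
    "proc": SimpleNamespace(sampling_rate=16000),
}
kwargs = build_kwargs(
    {"**": "cached:batch", "rate": "cached:proc.sampling_rate", "mode": "demo"},
    cache,
)
```

With kwargs built this way, a call such as `model.predict_action(**kwargs)` receives live objects even though the agent only ever passed JSON-safe strings.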
+
+ ### `models/transformers.py` — local brain
+ `TransformerModel` is a Strands model provider running any local HF causal-LM as
+ the agent's reasoning engine (streaming, chat templates, Qwen3 `<think>`, XML
+ tool-calling). Pair it with `use_transformers` for a fully local multimodal agent.
+
+ ## Testing philosophy
+
+ Every change is verified **end-to-end against the real implementation** — actual
+ model inference / pipelines, not mocks. `examples/smoke.py` is a fast (no large
+ downloads) 12-check gate across discovery + text/image/audio that exits non-zero
+ on failure.
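
A gate that runs named checks, prints an `N/N checks passed` summary, and yields a non-zero status on any failure can be sketched as follows. This is a generic illustration, not the contents of `examples/smoke.py`; `run_checks` and the sample checks are hypothetical.

```python
from typing import Callable


def run_checks(checks: list[tuple[str, Callable[[], None]]]) -> int:
    """Run each check; return 0 if all pass, 1 otherwise (suitable for sys.exit)."""
    passed = 0
    for name, check in checks:
        try:
            check()
        except Exception as exc:  # a failing check must not stop the remaining ones
            print(f"FAIL {name}: {exc}")
        else:
            passed += 1
            print(f"ok   {name}")
    print(f"{passed}/{len(checks)} checks passed")
    return 0 if passed == len(checks) else 1


def failing_check() -> None:
    """Hypothetical check that always fails."""
    raise RuntimeError("boom")


all_good = run_checks([("noop", lambda: None)])
one_bad = run_checks([("noop", lambda: None), ("broken", failing_check)])
# In a script, the status would be handed to the shell with: sys.exit(run_checks(checks))
```

Returning the status instead of exiting inside the loop lets every check report before the process ends.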
@@ -0,0 +1,252 @@
+ Metadata-Version: 2.4
+ Name: strands-transformers
+ Version: 0.2.0
+ Summary: The universal entrypoint to HuggingFace transformers for Strands agents — 100% task & modality coverage, zero hardcoding.
+ Author-email: Cagatay Cali <cagataycali@icloud.com>
+ License: MIT
+ Project-URL: Homepage, https://github.com/cagataycali/strands-transformers
+ Project-URL: Repository, https://github.com/cagataycali/strands-transformers
+ Project-URL: Issues, https://github.com/cagataycali/strands-transformers/issues
+ Keywords: strands,transformers,huggingface,ai,agents,multimodal,vision,audio,video,vla,robotics,llm
+ Classifier: Development Status :: 4 - Beta
+ Classifier: Intended Audience :: Developers
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Programming Language :: Python :: 3.13
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+ Requires-Python: >=3.10
+ Description-Content-Type: text/markdown
+ Requires-Dist: strands-agents
+ Requires-Dist: transformers>=4.40
+ Requires-Dist: torch
+ Requires-Dist: pillow
+ Requires-Dist: numpy
+ Provides-Extra: audio
+ Requires-Dist: soundfile; extra == "audio"
+ Requires-Dist: librosa; extra == "audio"
+ Provides-Extra: vision
+ Requires-Dist: pillow; extra == "vision"
+ Requires-Dist: opencv-python; extra == "vision"
+ Requires-Dist: av; extra == "vision"
+ Provides-Extra: training
+ Requires-Dist: trl; extra == "training"
+ Requires-Dist: peft; extra == "training"
+ Requires-Dist: accelerate; extra == "training"
+ Requires-Dist: datasets; extra == "training"
+ Provides-Extra: dev
+ Requires-Dist: pytest>=7.0; extra == "dev"
+ Requires-Dist: black; extra == "dev"
+ Requires-Dist: ruff; extra == "dev"
+ Provides-Extra: docs
+ Requires-Dist: mkdocs-material; extra == "docs"
+ Requires-Dist: mkdocstrings[python]; extra == "docs"
+ Requires-Dist: pymdown-extensions; extra == "docs"
+ Provides-Extra: all
+ Requires-Dist: strands-transformers[audio,dev,docs,training,vision]; extra == "all"
+
+ <div align="center">
+ <h1>🤗 Strands Transformers</h1>
+ <h3>One tool wraps <i>all</i> of HuggingFace transformers. One provider makes any local model a multimodal agent brain.</h3>
+ <p><b>Agents that see, hear, and speak — 100% task coverage, zero hardcoding, fully local.</b></p>
+
+ <div>
+ <a href="https://pypi.org/project/strands-transformers/"><img alt="pypi" src="https://img.shields.io/pypi/v/strands-transformers"/></a>
+ <a href="https://github.com/cagataycali/strands-transformers/actions/workflows/docs.yml"><img alt="docs" src="https://github.com/cagataycali/strands-transformers/actions/workflows/docs.yml/badge.svg"/></a>
+ <a href="https://github.com/cagataycali/strands-transformers/issues"><img alt="issues" src="https://img.shields.io/github/issues/cagataycali/strands-transformers"/></a>
+ <img alt="python" src="https://img.shields.io/badge/python-3.10+-blue"/>
+ <img alt="transformers" src="https://img.shields.io/badge/🤗_transformers-24_tasks-yellow"/>
+ <img alt="modalities" src="https://img.shields.io/badge/modalities-text·image·video·audio-orange"/>
+ <img alt="license" src="https://img.shields.io/badge/license-MIT-green"/>
+ </div>
+ </div>
+
+ ---
+
+ `use_aws` wraps all of boto3. `use_lerobot` wraps all of lerobot.
+ **`use_transformers` wraps all of HuggingFace transformers** — every task, every
+ modality, in one tool that reads transformers' own taxonomy at runtime (new task
+ upstream ⇒ supported here with **no code change**). And **`TransformerModel`** makes
+ any **local** HF model a drop-in Strands brain that speaks the full content-block
+ protocol — image, video, audio, document. With Qwen2.5-Omni it even **speaks back**.
+
+ ```mermaid
+ flowchart LR
+ IN["📥 text · image · video<br/>audio · document · robot-state"]
+ TOOL["🛠️ use_transformers<br/><i>tool</i>"]
+ BRAIN["🧠 TransformerModel<br/><i>local agent brain</i>"]
+ OUT["📤 text · speech · image<br/>labels · actions"]
+ IN --> TOOL --> OUT
+ IN --> BRAIN --> OUT
+ classDef i fill:#7C4DFF,stroke:#5b34d6,color:#fff;
+ classDef c fill:#FFD21E,stroke:#E68A00,color:#3a2d00;
+ classDef o fill:#00E5FF,stroke:#00b3cc,color:#003844;
+ class IN i;
+ class TOOL,BRAIN c;
+ class OUT o;
+ ```
+
+ 📖 **[Full documentation →](https://cagataycali.github.io/strands-transformers/)** &nbsp;·&nbsp; built with MkDocs (`docs/`)
+
+ ## Install
+
+ ```bash
+ uv pip install strands-transformers # from PyPI
+ # or from source:
+ uv pip install -e . # or: pip install -e .
+ PYTHONPATH=. python examples/smoke.py # verify → "12/12 checks passed"
+ ```
+
+ <details>
+ <summary>Optional extras (audio · vision · training · docs)</summary>
+
+ ```bash
+ uv pip install -e ".[audio]" # soundfile, librosa (mp3/flac/ogg decode)
+ uv pip install -e ".[vision]" # opencv, av (video)
+ uv pip install -e ".[training]" # trl, peft, accelerate
+ uv pip install -e ".[docs]" # mkdocs-material, mkdocstrings
+ uv pip install -e ".[all]" # everything
+ ```
+ WAV audio works without extras. `device="auto"` picks cuda → mps → cpu (bf16 on GPU).
+ </details>
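
The cuda → mps → cpu preference behind `device="auto"` can be sketched with standard torch availability checks. `select_device_sketch` is a hypothetical helper, not the package's `select_device()`, and the fallback to `"cpu"` when torch is not installed is an assumption made so the sketch runs anywhere.

```python
def select_device_sketch() -> str:
    """Pick the first available backend in the order cuda, mps, cpu."""
    try:
        import torch
    except ImportError:
        return "cpu"  # assumption: without torch there is nothing to accelerate
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # absent on older torch builds
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = select_device_sketch()
print(f"selected device: {device}")
```

Dtype selection is left out here because the source only states that bf16 is used on GPU; the exact rule for choosing between bf16 and fp16 is not given.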
+
+ ## 60-second hello — a local vision agent
+
+ ```python
+ import io
+ from PIL import Image
+ from strands import Agent
+ from strands_transformers import TransformerModel
+
+ buf = io.BytesIO(); Image.new("RGB", (64, 64), (20, 200, 40)).save(buf, "PNG") # green square
+
+ model = TransformerModel(model_path="HuggingFaceTB/SmolVLM-256M-Instruct")
+ agent = Agent(model=model, system_prompt="You are concise.")
+
+ print(agent([
+     {"image": {"format": "png", "source": {"bytes": buf.getvalue()}}},
+     {"text": "Color? One word."},
+ ]))
+ # → Green.
+ ```
+
+ A 256M-param model in the standard Strands loop, *seeing* pixels through a content
+ block — no API key, no server. Swap `model_path` for any HF VLM.
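
For images that already exist as bytes, a small helper can build the same `image` content-block shape used in the example above. `image_block` and the extension-to-format table are hypothetical; in particular, normalizing `jpg` to `jpeg` and the set of accepted formats are assumptions, not something this README states.

```python
from pathlib import PurePath

# Assumption: these extensions map onto the block's "format" field as shown.
_FORMATS = {"png": "png", "jpg": "jpeg", "jpeg": "jpeg", "gif": "gif", "webp": "webp"}


def image_block(data: bytes, filename: str) -> dict:
    """Wrap raw image bytes in an image content block, inferring format from the name."""
    suffix = PurePath(filename).suffix.lower().lstrip(".")
    if suffix not in _FORMATS:
        raise ValueError(f"unsupported image extension: {suffix!r}")
    return {"image": {"format": _FORMATS[suffix], "source": {"bytes": data}}}


block = image_block(b"\x89PNG", "green.png")
```

The returned dictionary can sit next to a `{"text": ...}` block in the list passed to the agent, in the same way as the inline dictionary in the example.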
+
+ ## See it work
+
+ Every output below is a **real** model result (CUDA · transformers 5.12 · torch 2.10):
+
+ | You give it | Script | It returns |
+ |-------------|--------|-----------|
+ | 🖼️ a green image + "Color?" | `examples/multimodal_agent.py` | `"Green."` |
+ | 🎬 brightening frames | `examples/multimodal_advanced.py` | `"BRIGHTER."` |
+ | 🧰 a tool screenshot (blue) | `examples/multimodal_advanced.py` | `"Blue."` |
+ | 📄 a text document | `examples/document_and_audio.py` | recovers `BANANA-42` |
+ | 🔊 a 440 Hz tone (Omni) | `examples/omni_audio.py` | `"It's a pure tone."` |
+ | 💬 "say: …can speak" (Omni) | `examples/omni_audio.py` | 🔊 real 24 kHz speech |
+
+ ▶️ **[Hear Omni speak + see all diagrams in the docs →](https://cagataycali.github.io/strands-transformers/)**
+
+ ## Two ways to use it
+
+ <details open>
+ <summary><b>As a tool</b> — <code>use_transformers</code> (discover · run · call)</summary>
+
+ ```python
+ from strands import Agent
+ from strands_transformers import use_transformers
+
+ agent = Agent(tools=[use_transformers])
+ agent("Transcribe recording.wav") # automatic-speech-recognition
+ agent("What's in scene.jpg?") # image-text-to-text
+ agent("Say 'hello from strands' as audio") # text-to-audio
+ agent("Detect objects in https://.../street.jpg") # object-detection
+ ```
+
+ Discover everything at runtime (`action="tasks" | "modalities" | "inspect" | …`),
+ run high-level pipelines, or `call` any class/fn/method for custom models.
+ → **[The tool guide](https://cagataycali.github.io/strands-transformers/guide/the-tool/)**
+ </details>
+
+ <details>
+ <summary><b>As the agent's brain</b> — <code>TransformerModel</code> (multimodal content blocks)</summary>
+
+ Pass `image` / `video` / `audio` / `document` content blocks (and media inside a
+ `toolResult`) — the provider auto-detects the model's processor and routes them.
+ All outputs below are **real** results (CUDA, transformers 5.12 / torch 2.10):
+
+ | Content block | Example | Verified output |
+ |---|---|---|
+ | `image` | `multimodal_agent.py` | `"Green."` |
+ | `video` (with `fps`) | `multimodal_advanced.py` | `"BRIGHTER."` |
+ | `image` in `toolResult` | `multimodal_advanced.py` | `"Blue."` |
+ | `document` | `document_and_audio.py` | recovers `BANANA-42` |
+ | `audio` *(our schema extension)* | `audio_content_block.py` | audio → text |
+ | `audio` in **and** speech out | `omni_audio.py` | hears + **speaks** (Qwen2.5-Omni) |
+
+ → **[Agent brain](https://cagataycali.github.io/strands-transformers/guide/agent-brain/)** ·
+ **[Content blocks](https://cagataycali.github.io/strands-transformers/guide/content-blocks/)** ·
+ **[Audio](https://cagataycali.github.io/strands-transformers/guide/audio/)**
+ </details>
+
+ <details>
+ <summary><b>Robotics / VLA</b> — camera + instruction → robot actions</summary>
+
+ Two layers, both transformers-native and GPU-verified:
+ - 🧠 **reason** — [Cosmos-Reason2-2B](https://huggingface.co/nvidia/Cosmos-Reason2-2B)
+ (a physical-AI VLM) plans over a scene via the `run` path: *"the red cube is in
+ the bottom left corner, so the arm should move there first."*
+ - ⚙️ **act** — VLA models expose `predict_action` via the `call` path:
+ [MolmoAct2](https://huggingface.co/allenai/MolmoAct2-SO100_101) → `[1,30,6]`;
+ [OpenVLA-7b](https://huggingface.co/openvla/openvla-7b) → 7-DoF (auto 4.x→5.x shims).
+
+ 🔗 **Full agentic loop** ([`examples/robot_reason_act_agent.py`](examples/robot_reason_act_agent.py)):
+ Cosmos-Reason *plans* over real RealSense frames → MolmoAct *acts* (`[1,30,6]`) —
+ perception→plan→action through one tool.
+
+ Lerobot-ecosystem policies (SmolVLA, π0, ACT, GR00T) use their own runtimes —
+ pair with `use_lerobot`.
+ → **[Robotics guide](https://cagataycali.github.io/strands-transformers/guide/robotics/)**
+ </details>
+
+ ## How it works
+
+ Nothing is hardcoded per task — `core/registry.py` reads transformers' own
+ `SUPPORTED_TASKS` at runtime, so coverage tracks upstream automatically.
+
+ <details>
+ <summary>Project layout</summary>
+
+ ```
+ strands_transformers/
+ ├── tools/use_transformers.py # the one @tool: discover · run · call
+ ├── models/transformers.py # TransformerModel — local multimodal agent brain
+ ├── types/audio.py # audio content-block extension
+ └── core/{registry,engine,io,compat}.py # taxonomy · load/cache · I/O · legacy shims
+ ```
+ → **[Architecture](https://cagataycali.github.io/strands-transformers/reference/architecture/)** ·
+ **[API reference](https://cagataycali.github.io/strands-transformers/reference/transformer-model/)**
+ </details>
+
+ ## Examples
+
+ 12 runnable, GPU-verified examples in [`examples/`](examples/) — image, video,
+ audio, document, Omni speech, VLA, and pipelines. Run any:
+
+ ```bash
+ PYTHONPATH=. python examples/<name>.py
+ ```
+
+ → **[Examples & FAQ](https://cagataycali.github.io/strands-transformers/reference/examples/)**
+
+ ## License
+
+ MIT — built with [Strands Agents SDK](https://github.com/strands-agents/sdk-python)
+ and [HuggingFace Transformers](https://github.com/huggingface/transformers).
+
+ <div align="center">
+ <sub>If this saved you a pile of per-model glue code, consider giving it a ⭐</sub>
+ </div>