PyPI - gst-python-ml - Versions diffs - 1.0.3__tar.gz → 1.0.4__tar.gz - Mend

gst-python-ml 1.0.3.tar.gz → 1.0.4.tar.gz

Files changed (84)

{gst_python_ml-1.0.3 → gst_python_ml-1.0.4}/MANIFEST.in RENAMED

@@ -3,5 +3,3 @@ include COPYING
 recursive-include plugins/python *
 global-exclude __pycache__/*
 global-exclude *.pyc
-prune plugins/python/birdseye
-exclude plugins/python/birds_eye.py

{gst_python_ml-1.0.3/plugins/python/gst_python_ml.egg-info → gst_python_ml-1.0.4}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: gst-python-ml
-Version: 1.0.3
+Version: 1.0.4
 Summary: An ML package for GStreamer
 Author-email: Aaron Boxer <aaron.boxer@collabora.com>
 Project-URL: Homepage, https://github.com/collabora/gst-python-ml
@@ -51,9 +51,13 @@ Supported functionality includes:
 1. object detection
 1. tracking
+1. pose estimation (COCO 17-keypoint skeleton)
+1. monocular depth estimation
+1. zero-shot classification (CLIP / SigLIP)
 1. video captioning
 1. translation
 1. transcription
+1. voice activity detection
 1. speech to text
 1. text to speech
 1. text to image
@@ -241,12 +245,6 @@ or
 Now, in the container shell, set up `uv` `venv` as detailed above.
-## IMPORTANT NOTES
-### Birdseye
-To use `pyml_birdseye`, additional pip requirements must be installed from the `plugins/python/birdseye` folder.
 ## Post Install
@@ -343,6 +341,147 @@ GST_DEBUG=4 gst-launch-1.0 filesrc location=data/soccer_tracking.mp4 ! decodebin
 ```
+### Pose Estimation
+`pyml_yolo_pose` supports all YOLO pose models. Recommended model names:
+```
+yolo11n-pose  (fastest)
+yolo11s-pose
+yolo11m-pose  (best accuracy)
+```
+#### YOLO pose with skeleton visualization (rendered on frame)
+```
+gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
+  d. ! queue \
+    ! videoconvert ! videoscale ! "video/x-raw,width=640,height=480" \
+    ! pyml_yolo_pose model-name=yolo11n-pose device=cuda \
+    ! videoconvert ! autovideosink sync=false
+```
+#### YOLO pose with bounding box overlay (metadata only, no in-element rendering)
+```
+gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
+  d. ! queue \
+    ! videoconvert ! videoscale ! "video/x-raw,width=640,height=480" \
+    ! pyml_yolo_pose model-name=yolo11n-pose device=cuda visualize=false \
+    ! videoconvert ! pyml_overlay ! videoconvert ! autovideosink sync=false
+```
+### Depth Estimation
+`pyml_depth` supports DepthAnything V2 models from HuggingFace. Available model sizes:
+```
+depth-anything/Depth-Anything-V2-Small-hf  (fastest, ~100 MB)
+depth-anything/Depth-Anything-V2-Base-hf
+depth-anything/Depth-Anything-V2-Large-hf  (most accurate)
+```
+Available colormaps: `inferno` (default), `jet`, `viridis`, `plasma`, `magma`
+#### DepthAnything V2 with inferno colormap
+```
+gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
+  d. ! queue \
+    ! videoconvert ! videoscale ! "video/x-raw,width=640,height=480" \
+    ! pyml_depth model-name=depth-anything/Depth-Anything-V2-Small-hf device=cuda \
+    ! videoconvert ! autovideosink sync=false
+```
+#### DepthAnything V2 with jet colormap
+```
+gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
+  d. ! queue \
+    ! videoconvert ! videoscale ! "video/x-raw,width=640,height=480" \
+    ! pyml_depth model-name=depth-anything/Depth-Anything-V2-Small-hf device=cuda colormap=jet \
+    ! videoconvert ! autovideosink sync=false
+```
+#### Depth with reduced compute via frame-stride
+```
+gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
+  d. ! queue \
+    ! videoconvert ! videoscale ! "video/x-raw,width=640,height=480" \
+    ! pyml_depth model-name=depth-anything/Depth-Anything-V2-Small-hf device=cuda frame-stride=2 \
+    ! videoconvert ! autovideosink sync=false
+```
+#### Depth with original video side-by-side (tee)
+```
+gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
+  d. ! queue \
+    ! videoconvert ! videoscale ! "video/x-raw,width=640,height=480" \
+    ! tee name=t \
+    t. ! queue ! pyml_depth model-name=depth-anything/Depth-Anything-V2-Small-hf device=cuda ! videoconvert ! autovideosink sync=false \
+    t. ! queue ! videoconvert ! autovideosink sync=false
+```
+### Zero-Shot Classification (CLIP / SigLIP)
+`pyml_clip` classifies each frame against a user-defined set of text labels
+with no fixed label set — labels are set at pipeline launch time.
+Supported models:
+```
+openai/clip-vit-base-patch32       (default, ~600 MB)
+openai/clip-vit-large-patch14      (more accurate, ~1.7 GB)
+google/siglip-base-patch16-224     (SigLIP, better zero-shot accuracy)
+google/siglip-large-patch16-384    (SigLIP large)
+```
+#### CLIP with custom labels
+```
+gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
+  d. ! queue \
+    ! videoconvert ! videoscale ! "video/x-raw,width=640,height=480" \
+    ! pyml_clip model-name=openai/clip-vit-base-patch32 device=cuda \
+              labels="person, bicycle, car, dog, cat" top-k=3 \
+    ! videoconvert ! pyml_overlay ! videoconvert ! autovideosink sync=false
+```
+#### SigLIP (better zero-shot accuracy than CLIP)
+```
+gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
+  d. ! queue \
+    ! videoconvert ! videoscale ! "video/x-raw,width=640,height=480" \
+    ! pyml_clip model-name=google/siglip-base-patch16-224 device=cuda \
+              labels="people walking, empty street, crowd, indoor scene" top-k=1 \
+    ! videoconvert ! pyml_overlay ! videoconvert ! autovideosink sync=false
+```
+#### CLIP with threshold (only report labels above 20% confidence)
+```
+gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
+  d. ! queue \
+    ! videoconvert ! videoscale ! "video/x-raw,width=640,height=480" \
+    ! pyml_clip model-name=openai/clip-vit-base-patch32 device=cuda \
+              labels="person, bicycle, car, dog, cat" threshold=0.2 \
+    ! videoconvert ! pyml_overlay ! videoconvert ! autovideosink sync=false
+```
+### Voice Activity Detection
+#### Standalone VAD with metadata (pass-through, speech probability attached to buffers)
+```
+GST_DEBUG=4 gst-launch-1.0 pulsesrc ! audio/x-raw,format=S16LE,rate=16000,channels=1 ! pyml_vad threshold=0.7 ! fakesink
+```
+#### VAD gating before transcription (mute silent audio, reduce Whisper latency)
+```
+GST_DEBUG=4 gst-launch-1.0 filesrc location=data/air_traffic_korean_with_english.wav ! decodebin ! audioconvert ! audioresample ! audio/x-raw,format=S16LE,rate=16000,channels=1 ! pyml_vad threshold=0.6 gate=true ! pyml_whispertranscribe device=cuda language=ko ! fakesink
+```
 ### Transcription
 #### transcription with initial prompt set
@@ -417,14 +556,6 @@ https://huggingface.co/models?sort=trending&search=Helsinki
 GST_DEBUG=3 gst-launch-1.0 filesrc location=data/soccer_single_camera.mp4 ! decodebin ! videoconvertscale ! video/x-raw,width=640,height=480 ! tee name=t t. ! queue ! textoverlay name=overlay wait-text=false ! videoconvert ! autovideosink t. ! queue leaky=2 max-size-buffers=1 ! videoconvertscale ! video/x-raw,width=240,height=180 ! pyml_caption_qwen device=cuda:0 prompt="In one sentence, describe what you see?" model-name="Qwen/Qwen2.5-VL-3B-Instruct-AWQ" name=cap cap.src ! fakesink async=0 sync=0 cap.text_src ! queue ! coalescehistory history-length=10 ! pyml_llm model-name="Qwen/Qwen3-0.6B" device=cuda system-prompt="You receive the history of what happened in recent times, summarize it nicely with excitement but NEVER mention the specific times. Focus on the most recent events." ! queue ! overlay.text_sink
 ```
-### Bird's Eye View
-`GST_DEBUG=4 gst-launch-1.0 filesrc location=data/soccer_single_camera.mp4 ! decodebin ! videoconvert ! pyml_birdseye ! videoconvert ! autovideosink`
-`GST_DEBUG=4 gst-launch-1.0 filesrc location=data/soccer_single_camera.mp4 ! decodebin ! videorate ! video/x-raw,framerate=30/1 ! videoconvert ! pyml_birdseye ! videoconvert ! openh264enc ! h264parse ! matroskamux ! filesink location=output.mkv`
 ### kafkasink
 #### Setting up kafka network
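
The `gate=true` example added to PKG-INFO above describes muting silent audio before it reaches Whisper. The element's source is not part of this hunk, so the following is only a minimal sketch of that idea in plain Python. The function name `gate_frames`, the per-frame probability list, and the `>=` comparison against `threshold` are all assumptions made for illustration, not the plugin's actual code.

```python
from typing import List, Sequence


def gate_frames(
    frames: Sequence[Sequence[int]],
    speech_probs: Sequence[float],
    threshold: float = 0.6,
) -> List[List[int]]:
    """Hypothetical VAD gate: keep frames judged to be speech, zero the rest.

    frames       -- audio frames as lists of S16LE sample values
    speech_probs -- one speech probability in [0, 1] per frame
    threshold    -- frames with probability below this are replaced by silence
    """
    if len(frames) != len(speech_probs):
        raise ValueError("need exactly one speech probability per frame")

    gated: List[List[int]] = []
    for frame, prob in zip(frames, speech_probs):
        if prob >= threshold:
            # Assumed behaviour: speech passes through unchanged.
            gated.append(list(frame))
        else:
            # Assumed behaviour: non-speech keeps its length but is muted,
            # so downstream timestamps stay aligned.
            gated.append([0] * len(frame))
    return gated
```

Muting rather than dropping frames keeps the buffer count and durations unchanged in this sketch; whether `pyml_vad` mutes or drops is not stated in the diff beyond the word "mute" in the heading.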

{gst_python_ml-1.0.3 → gst_python_ml-1.0.4}/README.md RENAMED

@@ -6,9 +6,13 @@ Supported functionality includes:
 1. object detection
 1. tracking
+1. pose estimation (COCO 17-keypoint skeleton)
+1. monocular depth estimation
+1. zero-shot classification (CLIP / SigLIP)
 1. video captioning
 1. translation
 1. transcription
+1. voice activity detection
 1. speech to text
 1. text to speech
 1. text to image
@@ -196,12 +200,6 @@ or
 Now, in the container shell, set up `uv` `venv` as detailed above.
-## IMPORTANT NOTES
-### Birdseye
-To use `pyml_birdseye`, additional pip requirements must be installed from the `plugins/python/birdseye` folder.
 ## Post Install
@@ -298,6 +296,147 @@ GST_DEBUG=4 gst-launch-1.0 filesrc location=data/soccer_tracking.mp4 ! decodebin
 ```
+### Pose Estimation
+`pyml_yolo_pose` supports all YOLO pose models. Recommended model names:
+```
+yolo11n-pose  (fastest)
+yolo11s-pose
+yolo11m-pose  (best accuracy)
+```
+#### YOLO pose with skeleton visualization (rendered on frame)
+```
+gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
+  d. ! queue \
+    ! videoconvert ! videoscale ! "video/x-raw,width=640,height=480" \
+    ! pyml_yolo_pose model-name=yolo11n-pose device=cuda \
+    ! videoconvert ! autovideosink sync=false
+```
+#### YOLO pose with bounding box overlay (metadata only, no in-element rendering)
+```
+gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
+  d. ! queue \
+    ! videoconvert ! videoscale ! "video/x-raw,width=640,height=480" \
+    ! pyml_yolo_pose model-name=yolo11n-pose device=cuda visualize=false \
+    ! videoconvert ! pyml_overlay ! videoconvert ! autovideosink sync=false
+```
+### Depth Estimation
+`pyml_depth` supports DepthAnything V2 models from HuggingFace. Available model sizes:
+```
+depth-anything/Depth-Anything-V2-Small-hf  (fastest, ~100 MB)
+depth-anything/Depth-Anything-V2-Base-hf
+depth-anything/Depth-Anything-V2-Large-hf  (most accurate)
+```
+Available colormaps: `inferno` (default), `jet`, `viridis`, `plasma`, `magma`
+#### DepthAnything V2 with inferno colormap
+```
+gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
+  d. ! queue \
+    ! videoconvert ! videoscale ! "video/x-raw,width=640,height=480" \
+    ! pyml_depth model-name=depth-anything/Depth-Anything-V2-Small-hf device=cuda \
+    ! videoconvert ! autovideosink sync=false
+```
+#### DepthAnything V2 with jet colormap
+```
+gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
+  d. ! queue \
+    ! videoconvert ! videoscale ! "video/x-raw,width=640,height=480" \
+    ! pyml_depth model-name=depth-anything/Depth-Anything-V2-Small-hf device=cuda colormap=jet \
+    ! videoconvert ! autovideosink sync=false
+```
+#### Depth with reduced compute via frame-stride
+```
+gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
+  d. ! queue \
+    ! videoconvert ! videoscale ! "video/x-raw,width=640,height=480" \
+    ! pyml_depth model-name=depth-anything/Depth-Anything-V2-Small-hf device=cuda frame-stride=2 \
+    ! videoconvert ! autovideosink sync=false
+```
+#### Depth with original video side-by-side (tee)
+```
+gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
+  d. ! queue \
+    ! videoconvert ! videoscale ! "video/x-raw,width=640,height=480" \
+    ! tee name=t \
+    t. ! queue ! pyml_depth model-name=depth-anything/Depth-Anything-V2-Small-hf device=cuda ! videoconvert ! autovideosink sync=false \
+    t. ! queue ! videoconvert ! autovideosink sync=false
+```
+### Zero-Shot Classification (CLIP / SigLIP)
+`pyml_clip` classifies each frame against a user-defined set of text labels
+with no fixed label set — labels are set at pipeline launch time.
+Supported models:
+```
+openai/clip-vit-base-patch32       (default, ~600 MB)
+openai/clip-vit-large-patch14      (more accurate, ~1.7 GB)
+google/siglip-base-patch16-224     (SigLIP, better zero-shot accuracy)
+google/siglip-large-patch16-384    (SigLIP large)
+```
+#### CLIP with custom labels
+```
+gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
+  d. ! queue \
+    ! videoconvert ! videoscale ! "video/x-raw,width=640,height=480" \
+    ! pyml_clip model-name=openai/clip-vit-base-patch32 device=cuda \
+              labels="person, bicycle, car, dog, cat" top-k=3 \
+    ! videoconvert ! pyml_overlay ! videoconvert ! autovideosink sync=false
+```
+#### SigLIP (better zero-shot accuracy than CLIP)
+```
+gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
+  d. ! queue \
+    ! videoconvert ! videoscale ! "video/x-raw,width=640,height=480" \
+    ! pyml_clip model-name=google/siglip-base-patch16-224 device=cuda \
+              labels="people walking, empty street, crowd, indoor scene" top-k=1 \
+    ! videoconvert ! pyml_overlay ! videoconvert ! autovideosink sync=false
+```
+#### CLIP with threshold (only report labels above 20% confidence)
+```
+gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
+  d. ! queue \
+    ! videoconvert ! videoscale ! "video/x-raw,width=640,height=480" \
+    ! pyml_clip model-name=openai/clip-vit-base-patch32 device=cuda \
+              labels="person, bicycle, car, dog, cat" threshold=0.2 \
+    ! videoconvert ! pyml_overlay ! videoconvert ! autovideosink sync=false
+```
+### Voice Activity Detection
+#### Standalone VAD with metadata (pass-through, speech probability attached to buffers)
+```
+GST_DEBUG=4 gst-launch-1.0 pulsesrc ! audio/x-raw,format=S16LE,rate=16000,channels=1 ! pyml_vad threshold=0.7 ! fakesink
+```
+#### VAD gating before transcription (mute silent audio, reduce Whisper latency)
+```
+GST_DEBUG=4 gst-launch-1.0 filesrc location=data/air_traffic_korean_with_english.wav ! decodebin ! audioconvert ! audioresample ! audio/x-raw,format=S16LE,rate=16000,channels=1 ! pyml_vad threshold=0.6 gate=true ! pyml_whispertranscribe device=cuda language=ko ! fakesink
+```
 ### Transcription
 #### transcription with initial prompt set
@@ -372,14 +511,6 @@ https://huggingface.co/models?sort=trending&search=Helsinki
 GST_DEBUG=3 gst-launch-1.0 filesrc location=data/soccer_single_camera.mp4 ! decodebin ! videoconvertscale ! video/x-raw,width=640,height=480 ! tee name=t t. ! queue ! textoverlay name=overlay wait-text=false ! videoconvert ! autovideosink t. ! queue leaky=2 max-size-buffers=1 ! videoconvertscale ! video/x-raw,width=240,height=180 ! pyml_caption_qwen device=cuda:0 prompt="In one sentence, describe what you see?" model-name="Qwen/Qwen2.5-VL-3B-Instruct-AWQ" name=cap cap.src ! fakesink async=0 sync=0 cap.text_src ! queue ! coalescehistory history-length=10 ! pyml_llm model-name="Qwen/Qwen3-0.6B" device=cuda system-prompt="You receive the history of what happened in recent times, summarize it nicely with excitement but NEVER mention the specific times. Focus on the most recent events." ! queue ! overlay.text_sink
 ```
-### Bird's Eye View
-`GST_DEBUG=4 gst-launch-1.0 filesrc location=data/soccer_single_camera.mp4 ! decodebin ! videoconvert ! pyml_birdseye ! videoconvert ! autovideosink`
-`GST_DEBUG=4 gst-launch-1.0 filesrc location=data/soccer_single_camera.mp4 ! decodebin ! videorate ! video/x-raw,framerate=30/1 ! videoconvert ! pyml_birdseye ! videoconvert ! openh264enc ! h264parse ! matroskamux ! filesink location=output.mkv`
 ### kafkasink
 #### Setting up kafka network
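
The README hunk above introduces `labels`, `top-k` and `threshold` for `pyml_clip` without showing how they interact. One plausible reading, sketched here in plain Python, is: split the comma-separated label string, turn per-label similarity scores into probabilities, drop anything under the threshold, then keep the best `top-k`. The helper names `parse_labels` and `rank_labels`, the softmax step, and the order of filtering are assumptions for illustration; the plugin's implementation is not visible in this diff. SigLIP models are trained with a sigmoid loss, so an element supporting them may score each label independently instead.

```python
import math
from typing import List, Optional, Sequence, Tuple


def parse_labels(raw: str) -> List[str]:
    """Split a 'person, bicycle, car' style property value into clean labels."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def rank_labels(
    labels: Sequence[str],
    logits: Sequence[float],
    top_k: Optional[int] = None,
    threshold: float = 0.0,
) -> List[Tuple[str, float]]:
    """Hypothetical post-processing: softmax, threshold filter, then top-k."""
    if len(labels) != len(logits):
        raise ValueError("need exactly one score per label")
    if not labels:
        return []

    # Numerically stable softmax over the image-text similarity scores.
    peak = max(logits)
    exps = [math.exp(score - peak) for score in logits]
    total = sum(exps)
    probs = [value / total for value in exps]

    # Highest probability first.
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)

    # Drop labels under the confidence threshold, then truncate to top_k.
    kept = [(label, prob) for label, prob in ranked if prob >= threshold]
    return kept if top_k is None else kept[:top_k]
```

With this reading, `threshold=0.2` can return fewer than `top-k` labels, or none at all, when no label is a confident match.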

{gst_python_ml-1.0.3 → gst_python_ml-1.0.4}/plugins/python/base_objectdetector.py RENAMED

@@ -124,7 +124,7 @@ class BaseObjectDetector(VideoTransform):
                 count = GstAnalytics.relation_get_length(attached_meta)
                 self.logger.info(f"Total metadata relations attached: {count}")
             else:
-                self.logger.error("No metadata attached to buffer")
+                self.logger.debug("No detections on this buffer")
             return Gst.FlowReturn.OK
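
The one-line change above demotes the "nothing attached" message from `error` to `debug` and rewords it, which treats a frame without detections as a normal outcome rather than a failure. The practical effect can be sketched with Python's standard `logging` module. This assumes `self.logger` behaves like a stdlib logger with a level of INFO; the plugin may route messages through GStreamer's own debug system instead, and the logger name `detector_demo` is made up for the example.

```python
import io
import logging


def run_demo() -> str:
    """Emit both variants of the message and return what actually gets logged."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))

    logger = logging.getLogger("detector_demo")  # hypothetical logger name
    logger.handlers.clear()
    logger.addHandler(handler)
    logger.propagate = False
    logger.setLevel(logging.INFO)  # assumed runtime level

    # 1.0.3 wording: reported on every frame that has no detections.
    logger.error("No metadata attached to buffer")
    # 1.0.4 wording: below the INFO level, so it is filtered out here.
    logger.debug("No detections on this buffer")

    return stream.getvalue()
```

Under these assumptions only the 1.0.3 message reaches the output, so empty frames stop filling the log in 1.0.4 unless debug logging is enabled.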
