
@mediapipe/tasks-vision 0.1.0-alpha-4 → 0.1.0-alpha-5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (8)

package/README.md +56 -23
package/package.json +1 -1
package/vision.d.ts +339 -8
package/vision_bundle.js +1 -1
package/wasm/vision_wasm_internal.js +441 -307
package/wasm/vision_wasm_internal.wasm +0 -0
package/wasm/vision_wasm_nosimd_internal.js +431 -307
package/wasm/vision_wasm_nosimd_internal.wasm +0 -0

package/README.md CHANGED

@@ -2,23 +2,57 @@
 This package contains the vision tasks for MediaPipe.
-## Object Detection
+## Face Stylizer
-The MediaPipe Object Detector task lets you detect the presence and location of
-multiple classes of objects within images or videos.
+The MediaPipe Face Stylizer lets you perform face stylization on images.
 ```
 const vision = await FilesetResolver.forVisionTasks(
     "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm"
 );
-const objectDetector = await ObjectDetector.createFromModelPath(vision,
-    "https://storage.googleapis.com/mediapipe-tasks/object_detector/efficientdet_lite0_uint8.tflite"
+const faceStylizer = await FaceStylizer.createFromModelPath(vision,
+    "model.tflite"
 );
 const image = document.getElementById("image") as HTMLImageElement;
-const detections = objectDetector.detect(image);
+const stylizedImage = faceStylizer.stylize(image);
 ```
-For more information, refer to the [Object Detector](https://developers.google.com/mediapipe/solutions/vision/object_detector/web_js) documentation.
+## Gesture Recognition
+The MediaPipe Gesture Recognizer task lets you recognize hand gestures in real
+time, and provides the recognized hand gesture results along with the landmarks
+of the detected hands. You can use this task to recognize specific hand gestures
+from a user, and invoke application features that correspond to those gestures.
+```
+const vision = await FilesetResolver.forVisionTasks(
+    "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm"
+);
+const gestureRecognizer = await GestureRecognizer.createFromModelPath(vision,
+    "https://storage.googleapis.com/mediapipe-tasks/gesture_recognizer/gesture_recognizer.task"
+);
+const image = document.getElementById("image") as HTMLImageElement;
+const recognitions = gestureRecognizer.recognize(image);
+```
+## Hand Landmark Detection
+The MediaPipe Hand Landmarker task lets you detect the landmarks of the hands in
+an image. You can use this Task to localize key points of the hands and render
+visual effects over the hands.
+```
+const vision = await FilesetResolver.forVisionTasks(
+    "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm"
+);
+const handLandmarker = await HandLandmarker.createFromModelPath(vision,
+    "https://storage.googleapis.com/mediapipe-tasks/hand_landmarker/hand_landmarker.task"
+);
+const image = document.getElementById("image") as HTMLImageElement;
+const landmarks = handLandmarker.detect(image);
+```
+For more information, refer to the [Handlandmark Detection](https://developers.google.com/mediapipe/solutions/vision/hand_landmarker/web_js) documentation.
 ## Image Classification
@@ -56,40 +90,39 @@ imageSegmenter.segment(image, (masks, width, height) => {
 });
 ```
-## Gesture Recognition
+## Interactive Segmentation
-The MediaPipe Gesture Recognizer task lets you recognize hand gestures in real
-time, and provides the recognized hand gesture results along with the landmarks
-of the detected hands. You can use this task to recognize specific hand gestures
-from a user, and invoke application features that correspond to those gestures.
+The MediaPipe Interactive Segmenter lets you select a region of interest to
+segment an image by.
 ```
 const vision = await FilesetResolver.forVisionTasks(
     "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm"
 );
-const gestureRecognizer = await GestureRecognizer.createFromModelPath(vision,
-    "https://storage.googleapis.com/mediapipe-tasks/gesture_recognizer/gesture_recognizer.task"
+const interactiveSegmenter = await InteractiveSegmenter.createFromModelPath(
+    vision, "model.tflite"
 );
 const image = document.getElementById("image") as HTMLImageElement;
-const recognitions = gestureRecognizer.recognize(image);
+interactiveSegmenter.segment(image, { keypoint: { x: 0.1, y: 0.2 } },
+    (masks, width, height) => { ... }
+);
 ```
-## Handlandmark Detection
+## Object Detection
-The MediaPipe Hand Landmarker task lets you detect the landmarks of the hands in
-an image. You can use this Task to localize key points of the hands and render
-visual effects over the hands.
+The MediaPipe Object Detector task lets you detect the presence and location of
+multiple classes of objects within images or videos.
 ```
 const vision = await FilesetResolver.forVisionTasks(
     "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm"
 );
-const handLandmarker = await HandLandmarker.createFromModelPath(vision,
-    "https://storage.googleapis.com/mediapipe-tasks/hand_landmarker/hand_landmarker.task"
+const objectDetector = await ObjectDetector.createFromModelPath(vision,
+    "https://storage.googleapis.com/mediapipe-tasks/object_detector/efficientdet_lite0_uint8.tflite"
 );
 const image = document.getElementById("image") as HTMLImageElement;
-const landmarks = handLandmarker.detect(image);
+const detections = objectDetector.detect(image);
 ```
-For more information, refer to the [Handlandmark Detection](https://developers.google.com/mediapipe/solutions/vision/hand_landmarker/web_js) documentation.
+For more information, refer to the [Object Detector](https://developers.google.com/mediapipe/solutions/vision/object_detector/web_js) documentation.
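
The Interactive Segmentation example in the README passes its region of interest as a keypoint with `x` and `y` values such as `0.1` and `0.2`. The type declarations in this package describe those values as coordinates normalized by the image dimensions. As a minimal sketch of how such a keypoint might be derived from a pixel position (for example, a click): the two interfaces below are local stand-ins that mirror the declared shapes so the snippet runs on its own, and `toRegionOfInterest` together with its clamping behaviour is a hypothetical helper, not part of the package.

```typescript
// Local stand-ins mirroring the shapes declared in vision.d.ts.
interface NormalizedKeypoint {
  x: number;
  y: number;
  label?: string;
  score?: number;
}
interface RegionOfInterest {
  keypoint: NormalizedKeypoint;
}

/**
 * Hypothetical helper: converts a pixel position into a keypoint normalized
 * by the image dimensions. Out-of-range positions are clamped to [0, 1],
 * which is an assumption made for this sketch.
 */
function toRegionOfInterest(
  pixelX: number,
  pixelY: number,
  imageWidth: number,
  imageHeight: number
): RegionOfInterest {
  if (imageWidth <= 0 || imageHeight <= 0) {
    throw new RangeError("Image dimensions must be positive");
  }
  const clamp01 = (value: number): number => Math.min(1, Math.max(0, value));
  return {
    keypoint: {
      x: clamp01(pixelX / imageWidth),
      y: clamp01(pixelY / imageHeight),
    },
  };
}

// A click at pixel (64, 96) on a 640x480 image.
const roi = toRegionOfInterest(64, 96, 640, 480);
console.log(roi.keypoint.x, roi.keypoint.y); // 0.1 0.2
```

The resulting object has the same shape as the second argument of `interactiveSegmenter.segment(...)` in the README example.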

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@mediapipe/tasks-vision",
-  "version": "0.1.0-alpha-4",
+  "version": "0.1.0-alpha-5",
   "description": "MediaPipe Vision Tasks",
   "main": "vision_bundle.js",
   "author": "mediapipe@google.com",

package/vision.d.ts CHANGED

@@ -224,6 +224,141 @@ export declare interface Embedding {
     headName: string;
 }
+/** Performs face stylization on images. */
+export declare class FaceStylizer extends VisionTaskRunner {
+    /**
+     * Initializes the Wasm runtime and creates a new Face Stylizer from the
+     * provided options.
+     * @param wasmFileset A configuration object that provides the location of
+     *     the Wasm binary and its loader.
+     * @param faceStylizerOptions The options for the Face Stylizer. Note
+     *     that either a path to the model asset or a model buffer needs to be
+     *     provided (via `baseOptions`).
+     */
+    static createFromOptions(wasmFileset: WasmFileset, faceStylizerOptions: FaceStylizerOptions): Promise<FaceStylizer>;
+    /**
+     * Initializes the Wasm runtime and creates a new Face Stylizer based on
+     * the provided model asset buffer.
+     * @param wasmFileset A configuration object that provides the location of
+     *     the Wasm binary and its loader.
+     * @param modelAssetBuffer A binary representation of the model.
+     */
+    static createFromModelBuffer(wasmFileset: WasmFileset, modelAssetBuffer: Uint8Array): Promise<FaceStylizer>;
+    /**
+     * Initializes the Wasm runtime and creates a new Face Stylizer based on
+     * the path to the model asset.
+     * @param wasmFileset A configuration object that provides the location of
+     *     the Wasm binary and its loader.
+     * @param modelAssetPath The path to the model asset.
+     */
+    static createFromModelPath(wasmFileset: WasmFileset, modelAssetPath: string): Promise<FaceStylizer>;
+    private constructor();
+    /**
+     * Sets new options for the Face Stylizer.
+     *
+     * Calling `setOptions()` with a subset of options only affects those
+     * options. You can reset an option back to its default value by
+     * explicitly setting it to `undefined`.
+     *
+     * @param options The options for the Face Stylizer.
+     */
+    setOptions(options: FaceStylizerOptions): Promise<void>;
+    /**
+     * Performs face stylization on the provided single image. The method returns
+     * synchronously once the callback returns. Only use this method when the
+     * FaceStylizer is created with the image running mode.
+     *
+     * The input image can be of any size. To ensure that the output image has
+     * reasonable quality, the stylized output image size is determined by the
+     * model output size.
+     *
+     * @param image An image to process.
+     * @param callback The callback that is invoked with the stylized image. The
+     *    lifetime of the returned data is only guaranteed for the duration of the
+     *    callback.
+     */
+    stylize(image: ImageSource, callback: ImageCallback): void;
+    /**
+     * Performs face stylization on the provided single image. The method returns
+     * synchronously once the callback returns. Only use this method when the
+     * FaceStylizer is created with the image running mode.
+     *
+     * The 'imageProcessingOptions' parameter can be used to specify one or all
+     * of:
+     *  - the rotation to apply to the image before performing stylization, by
+     *    setting its 'rotationDegrees' property.
+     *  - the region-of-interest on which to perform stylization, by setting its
+     *   'regionOfInterest' property. If not specified, the full image is used.
+     *  If both are specified, the crop around the region-of-interest is extracted
+     *  first, then the specified rotation is applied to the crop.
+     *
+     * The input image can be of any size. To ensure that the output image has
+     * reasonable quality, the stylized output image size is the smaller of the
+     * model output size and the size of the 'regionOfInterest' specified in
+     * 'imageProcessingOptions'.
+     *
+     * @param image An image to process.
+     * @param imageProcessingOptions the `ImageProcessingOptions` specifying how
+     *    to process the input image before running inference.
+     * @param callback The callback that is invoked with the stylized image. The
+     *    lifetime of the returned data is only guaranteed for the duration of the
+     *    callback.
+     */
+    stylize(image: ImageSource, imageProcessingOptions: ImageProcessingOptions, callback: ImageCallback): void;
+    /**
+     * Performs face stylization on the provided video frame. Only use this method
+     * when the FaceStylizer is created with the video running mode.
+     *
+     * The input frame can be of any size. It's required to provide the video
+     * frame's timestamp (in milliseconds). The input timestamps must be
+     * monotonically increasing.
+     *
+     * To ensure that the output image has reasonable quality, the stylized
+     * output image size is determined by the model output size.
+     *
+     * @param videoFrame A video frame to process.
+     * @param timestamp The timestamp of the current frame, in ms.
+     * @param callback The callback that is invoked with the stylized image. The
+     *    lifetime of the returned data is only guaranteed for the duration of
+     * the callback.
+     */
+    stylizeForVideo(videoFrame: ImageSource, timestamp: number, callback: ImageCallback): void;
+    /**
+     * Performs face stylization on the provided video frame. Only use this
+     * method when the FaceStylizer is created with the video running mode.
+     *
+     * The 'imageProcessingOptions' parameter can be used to specify one or all
+     * of:
+     *  - the rotation to apply to the image before performing stylization, by
+     *    setting its 'rotationDegrees' property.
+     *  - the region-of-interest on which to perform stylization, by setting its
+     *   'regionOfInterest' property. If not specified, the full image is used.
+     *  If both are specified, the crop around the region-of-interest is
+     * extracted first, then the specified rotation is applied to the crop.
+     *
+     * The input frame can be of any size. It's required to provide the video
+     * frame's timestamp (in milliseconds). The input timestamps must be
+     * monotonically increasing.
+     *
+     * To ensure that the output image has reasonable quality, the stylized
+     * output image size is the smaller of the model output size and the size of
+     * the 'regionOfInterest' specified in 'imageProcessingOptions'.
+     *
+     * @param videoFrame A video frame to process.
+     * @param imageProcessingOptions the `ImageProcessingOptions` specifying how
+     *    to process the input image before running inference.
+     * @param timestamp The timestamp of the current frame, in ms.
+     * @param callback The callback that is invoked with the stylized image. The
+     *    lifetime of the returned data is only guaranteed for the duration of
+     * the callback.
+     */
+    stylizeForVideo(videoFrame: ImageSource, imageProcessingOptions: ImageProcessingOptions, timestamp: number, callback: ImageCallback): void;
+}
+/** Options to configure the MediaPipe Face Stylizer Task */
+export declare interface FaceStylizerOptions extends VisionTaskOptions {
+}
 /**
  * Resolves the files required for the MediaPipe Task APIs.
  *
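
The `stylizeForVideo` comments above require a timestamp in milliseconds for every frame and state that timestamps must be monotonically increasing. A minimal sketch of one way a caller could enforce that before handing frames to the task follows. `TimestampGuard` is a hypothetical helper, and treating "monotonically increasing" as strictly increasing is an assumption, since the declaration does not spell it out.

```typescript
/**
 * Hypothetical guard that only accepts strictly increasing millisecond
 * timestamps. Not part of @mediapipe/tasks-vision.
 */
class TimestampGuard {
  private last = Number.NEGATIVE_INFINITY;

  /** Records and accepts the timestamp only if it is newer than the last accepted one. */
  accept(timestampMs: number): boolean {
    if (!Number.isFinite(timestampMs) || timestampMs <= this.last) {
      return false;
    }
    this.last = timestampMs;
    return true;
  }
}

// Simulated frame timestamps containing a duplicate (33) and a regression (50).
const guard = new TimestampGuard();
const frameTimestamps = [0, 33, 33, 66, 50, 100];
const accepted = frameTimestamps.filter((t) => guard.accept(t));
console.log(accepted); // [0, 33, 66, 100]
```

In a render loop, a frame would be passed to `stylizeForVideo` only when `accept` returns `true`, and skipped otherwise.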
@@ -363,7 +498,7 @@ export declare interface GestureRecognizerOptions extends VisionTaskOptions {
      */
     minTrackingConfidence?: number | undefined;
     /**
-     * Sets the optional `ClassifierOptions` controling the canned gestures
+     * Sets the optional `ClassifierOptions` controlling the canned gestures
      * classifier, such as score threshold, allow list and deny list of gestures.
      * The categories for canned gesture
      * classifiers are: ["None", "Closed_Fist", "Open_Palm", "Pointing_Up",
@@ -495,6 +630,16 @@ export declare interface HandLandmarkerResult {
     handednesses: Category[][];
 }
+/**
+ * A callback that receives an `ImageData` object from a Vision task. The
+ * lifetime of the underlying data is limited to the duration of the callback.
+ * If asynchronous processing is needed, all data needs to be copied before the
+ * callback returns.
+ *
+ * The `WebGLTexture` output type is reserved for future usage.
+ */
+export declare type ImageCallback = (image: ImageData | WebGLTexture, width: number, height: number) => void;
 /** Performs classification on images. */
 export declare class ImageClassifier extends VisionTaskRunner {
     /**
@@ -764,6 +909,18 @@ export declare class ImageSegmenter extends VisionTaskRunner {
      *    callback.
      */
     segment(image: ImageSource, imageProcessingOptions: ImageProcessingOptions, callback: SegmentationMaskCallback): void;
+    /**
+     * Gets the list of category labels that the ImageSegmenter can recognize.
+     * For the `CATEGORY_MASK` type, each value in the category mask is the index
+     * of its category in the label list. For the `CONFIDENCE_MASK` type, the
+     * position of each mask in the output list is the index of its category in
+     * the label list.
+     *
+     * If the model file provides no label map, an empty label array is
+     * returned.
+     *
+     * @return The labels used by the current model.
+     */
+    getLabels(): string[];
     /**
      * Performs image segmentation on the provided video frame and invokes the
      * callback with the response. The method returns synchronously once the
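
The `getLabels()` comment added above says that, for `CATEGORY_MASK` output, the values in the category mask index into the label list. A minimal sketch of that lookup follows. `labelAtPixel` is a hypothetical helper, the sample mask and labels are made up, and the row-major pixel layout is an assumption that the declaration does not state.

```typescript
/**
 * Hypothetical helper: returns the label of the category stored at pixel
 * (x, y) of a category mask, assuming a row-major layout. Returns undefined
 * when the pixel or the label is unavailable (for example, when the model
 * ships no label map and the label list is empty).
 */
function labelAtPixel(
  mask: Uint8ClampedArray,
  width: number,
  labels: string[],
  x: number,
  y: number
): string | undefined {
  const categoryIndex = mask[y * width + x];
  return categoryIndex === undefined ? undefined : labels[categoryIndex];
}

// Made-up 2x2 category mask and label list for illustration.
const demoMask = new Uint8ClampedArray([0, 1, 1, 2]);
const demoLabels = ["background", "person", "cat"];
console.log(labelAtPixel(demoMask, 2, demoLabels, 1, 1)); // "cat"
```

In real use, `labels` would come from `imageSegmenter.getLabels()` and `mask` from the first element passed to the segmentation callback.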
@@ -822,6 +979,136 @@ export declare interface ImageSegmenterOptions extends VisionTaskOptions {
  */
 export declare type ImageSource = HTMLCanvasElement | HTMLVideoElement | HTMLImageElement | ImageData | ImageBitmap;
+/**
+ * Performs interactive segmentation on images.
+ *
+ * Users can represent user interaction through `RegionOfInterest`, which gives
+ * a hint to InteractiveSegmenter to perform segmentation focusing on the given
+ * region of interest.
+ *
+ * The API expects a TFLite model with mandatory TFLite Model Metadata.
+ *
+ * Input tensor:
+ *   (kTfLiteUInt8/kTfLiteFloat32)
+ *   - image input of size `[batch x height x width x channels]`.
+ *   - batch inference is not supported (`batch` is required to be 1).
+ *   - RGB inputs are supported (`channels` is required to be 3).
+ *   - if type is kTfLiteFloat32, NormalizationOptions are required to be
+ *     attached to the metadata for input normalization.
+ * Output tensors:
+ *  (kTfLiteUInt8/kTfLiteFloat32)
+ *   - list of segmented masks.
+ *   - if `output_type` is CATEGORY_MASK, uint8 Image, Image vector of size 1.
+ *   - if `output_type` is CONFIDENCE_MASK, float32 Image list of size
+ *     `channels`.
+ *   - batch is always 1
+ */
+export declare class InteractiveSegmenter extends VisionTaskRunner {
+    /**
+     * Initializes the Wasm runtime and creates a new interactive segmenter from
+     * the provided options.
+     * @param wasmFileset A configuration object that provides the location of
+     *     the Wasm binary and its loader.
+     * @param interactiveSegmenterOptions The options for the Interactive
+     *     Segmenter. Note that either a path to the model asset or a model buffer
+     *     needs to be provided (via `baseOptions`).
+     * @return A new `InteractiveSegmenter`.
+     */
+    static createFromOptions(wasmFileset: WasmFileset, interactiveSegmenterOptions: InteractiveSegmenterOptions): Promise<InteractiveSegmenter>;
+    /**
+     * Initializes the Wasm runtime and creates a new interactive segmenter based
+     * on the provided model asset buffer.
+     * @param wasmFileset A configuration object that provides the location of
+     *     the Wasm binary and its loader.
+     * @param modelAssetBuffer A binary representation of the model.
+     * @return A new `InteractiveSegmenter`.
+     */
+    static createFromModelBuffer(wasmFileset: WasmFileset, modelAssetBuffer: Uint8Array): Promise<InteractiveSegmenter>;
+    /**
+     * Initializes the Wasm runtime and creates a new interactive segmenter based
+     * on the path to the model asset.
+     * @param wasmFileset A configuration object that provides the location of
+     *     the Wasm binary and its loader.
+     * @param modelAssetPath The path to the model asset.
+     * @return A new `InteractiveSegmenter`.
+     */
+    static createFromModelPath(wasmFileset: WasmFileset, modelAssetPath: string): Promise<InteractiveSegmenter>;
+    private constructor();
+    /**
+     * Sets new options for the interactive segmenter.
+     *
+     * Calling `setOptions()` with a subset of options only affects those
+     * options. You can reset an option back to its default value by
+     * explicitly setting it to `undefined`.
+     *
+     * @param options The options for the interactive segmenter.
+     * @return A Promise that resolves when the settings have been applied.
+     */
+    setOptions(options: InteractiveSegmenterOptions): Promise<void>;
+    /**
+     * Performs interactive segmentation on the provided single image and invokes
+     * the callback with the response.  The `roi` parameter is used to represent a
+     * user's region of interest for segmentation.
+     *
+     * If the output_type is `CATEGORY_MASK`, the callback is invoked with vector
+     * of images that represent per-category segmented image mask. If the
+     * output_type is `CONFIDENCE_MASK`, the callback is invoked with a vector of
+     * images that contains only one confidence image mask. The method returns
+     * synchronously once the callback returns.
+     *
+     * @param image An image to process.
+     * @param roi The region of interest for segmentation.
+     * @param callback The callback that is invoked with the segmented masks. The
+     *    lifetime of the returned data is only guaranteed for the duration of the
+     *    callback.
+     */
+    segment(image: ImageSource, roi: RegionOfInterest, callback: SegmentationMaskCallback): void;
+    /**
+     * Performs interactive segmentation on the provided single image and invokes
+     * the callback with the response. The `roi` parameter is used to represent a
+     * user's region of interest for segmentation.
+     *
+     * The 'image_processing_options' parameter can be used to specify the
+     * rotation to apply to the image before performing segmentation, by setting
+     * its 'rotationDegrees' field. Note that specifying a region-of-interest
+     * using the 'regionOfInterest' field is NOT supported and will result in an
+     * error.
+     *
+     * If the output_type is `CATEGORY_MASK`, the callback is invoked with vector
+     * of images that represent per-category segmented image mask. If the
+     * output_type is `CONFIDENCE_MASK`, the callback is invoked with a vector of
+     * images that contains only one confidence image mask. The method returns
+     * synchronously once the callback returns.
+     *
+     * @param image An image to process.
+     * @param roi The region of interest for segmentation.
+     * @param imageProcessingOptions the `ImageProcessingOptions` specifying how
+     *    to process the input image before running inference.
+     * @param callback The callback that is invoked with the segmented masks. The
+     *    lifetime of the returned data is only guaranteed for the duration of the
+     *    callback.
+     */
+    segment(image: ImageSource, roi: RegionOfInterest, imageProcessingOptions: ImageProcessingOptions, callback: SegmentationMaskCallback): void;
+}
+/** Options to configure the MediaPipe Interactive Segmenter Task */
+export declare interface InteractiveSegmenterOptions extends TaskRunnerOptions {
+    /**
+     * The output type of segmentation results.
+     *
+     * The two supported modes are:
+     * - Category Mask:   Gives a single output mask where each pixel represents
+     *                    the class which the pixel in the original image was
+     *                    predicted to belong to.
+     * - Confidence Mask: Gives a list of output masks (one for each class). For
+     *                    each mask, the pixel represents the prediction
+     *                    confidence, usually in the [0.0, 1.0] range.
+     *
+     * Defaults to `CATEGORY_MASK`.
+     */
+    outputType?: "CATEGORY_MASK" | "CONFIDENCE_MASK" | undefined;
+}
 /**
  * Landmark represents a point in 3D space with x, y, z coordinates. The
  * landmark coordinates are in meters. z represents the landmark depth,
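
The `outputType` option above describes confidence masks as per-pixel prediction confidences, one mask per class. A minimal sketch of turning one such mask into a binary mask follows, assuming confidences between 0 and 1. `binarize` and its default threshold of `0.5` are hypothetical choices for illustration, not part of the package.

```typescript
/**
 * Hypothetical helper: maps each confidence to 255 when it reaches the
 * threshold and to 0 otherwise, producing a mask suitable for use as an
 * 8-bit alpha channel.
 */
function binarize(confidenceMask: Float32Array, threshold = 0.5): Uint8ClampedArray {
  return Uint8ClampedArray.from(confidenceMask, (confidence) =>
    confidence >= threshold ? 255 : 0
  );
}

// Made-up confidence values for four pixels.
const binaryMask = binarize(new Float32Array([0.1, 0.5, 0.9, 0.49]));
console.log(Array.from(binaryMask)); // [0, 255, 255, 0]
```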
@@ -836,6 +1123,36 @@ export declare interface Landmark {
     z: number;
 }
+/**
+ * Copyright 2023 The MediaPipe Authors. All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+/**
+ * A keypoint, defined by the coordinates (x, y), normalized by the image
+ * dimensions.
+ */
+declare interface NormalizedKeypoint {
+    /** X in normalized image coordinates. */
+    x: number;
+    /** Y in normalized image coordinates. */
+    y: number;
+    /** Optional label of the keypoint. */
+    label?: string;
+    /** Optional score of the keypoint. */
+    score?: number;
+}
 /**
  * Copyright 2022 The MediaPipe Authors. All Rights Reserved.
  *
@@ -950,6 +1267,12 @@ declare interface RectF {
     bottom: number;
 }
+/** A Region-Of-Interest (ROI) to represent a region within an image. */
+export declare interface RegionOfInterest {
+    /** The ROI in keypoint format. */
+    keypoint: NormalizedKeypoint;
+}
 /**
  * The two running modes of a vision task.
  * 1) The image mode for processing single image inputs.
@@ -958,17 +1281,18 @@ declare interface RectF {
 declare type RunningMode = "IMAGE" | "VIDEO";
 /**
- * The ImageSegmenter returns the segmentation result as a Uint8Array (when
- * the default mode of `CATEGORY_MASK` is used) or as a Float32Array (for
- * output type `CONFIDENCE_MASK`). The `WebGLTexture` output type is reserved
- * for future usage.
+ * The segmentation tasks return the segmentation either as a WebGLTexture (when
+ * the output is on GPU) or as typed JavaScript arrays for CPU-based
+ * category or confidence masks. `Uint8ClampedArray`s are used to represent
+ * CPU-based category masks and `Float32Array`s are used for CPU-based
+ * confidence masks.
  */
-export declare type SegmentationMask = Uint8Array | Float32Array | WebGLTexture;
+export declare type SegmentationMask = Uint8ClampedArray | Float32Array | WebGLTexture;
 /**
- * A callback that receives the computed masks from the image segmenter. The
+ * A callback that receives the computed masks from the segmentation tasks. The
  * callback either receives a single element array with a category mask (as a
- * `[Uint8Array]`) or multiple confidence masks (as a `Float32Array[]`).
+ * `[Uint8ClampedArray]`) or multiple confidence masks (as a `Float32Array[]`).
  * The returned data is only valid for the duration of the callback. If
  * asynchronous processing is needed, all data needs to be copied before the
  * callback returns.
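
The callback comment above notes that mask data is only valid while the callback runs and must be copied if it is needed later. A minimal sketch of such a copy for CPU-based masks follows. `CpuMask` and `copyCpuMasks` are hypothetical names, and GPU (`WebGLTexture`) masks are deliberately left out of this sketch.

```typescript
// CPU-side mask types from the updated SegmentationMask declaration.
type CpuMask = Uint8ClampedArray | Float32Array;

/**
 * Hypothetical helper: returns independent copies of the given masks.
 * slice() allocates a new typed array with its own backing buffer.
 */
function copyCpuMasks(masks: readonly CpuMask[]): CpuMask[] {
  return masks.map((mask) => mask.slice());
}

// Simulates a task that reuses its buffer once the callback has returned.
const taskOwnedBuffer = new Uint8ClampedArray([1, 2, 3]);
let keptMasks: CpuMask[] = [];
const onMasks = (masks: CpuMask[]): void => {
  keptMasks = copyCpuMasks(masks);
};
onMasks([taskOwnedBuffer]);
taskOwnedBuffer.fill(0); // the original data is gone after the callback

console.log(keptMasks.map((mask) => Array.from(mask))); // [[1, 2, 3]]
```

Keeping the reference passed to the callback, rather than a copy, would leave `keptMasks` pointing at the zeroed buffer in this simulation.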
@@ -990,6 +1314,13 @@ declare interface TaskRunnerOptions {
 /** The options for configuring a MediaPipe vision task. */
 declare interface VisionTaskOptions extends TaskRunnerOptions {
+    /**
+     * The canvas element to bind textures to. This has to be set for GPU
+     * processing. The task will initialize a WebGL context and throw an error if
+     * this fails (e.g. if you have already initialized a different type of
+     * context).
+     */
+    canvas?: HTMLCanvasElement | OffscreenCanvas;
     /**
      * The running mode of the task. Default to the image mode.
      * Vision tasks have two running modes: