PyPI - mediapipe-nightly - Versions diffs - 0.10.21.post20241223__cp310-cp310-manylinux_2_28_x86_64.whl - Mend

Files changed (593)

mediapipe/util/sequence/media_sequence.py ADDED

@@ -0,0 +1,716 @@
+"""Copyright 2019 The MediaPipe Authors.
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+     http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+This script defines a large number of getters and setters for storing
+multimedia, such as video or audio, and related machine learning data in
+tf.train.SequenceExamples. These getters and setters simplify sharing
+data by enforcing common patterns for storing data in SequenceExample
+key-value pairs.
+The constants, macros, and functions are organized into 7 groups: clip
+metadata, clip label related, segment related, bounding-box related, image
+related, feature list related, and keyframe related. The following examples
+will walk through common task structures, but the relevant data to store can
+vary by task.
+The clip metadata group is generally data about the media and stored in the
+SequenceExample.context. Specifying the metadata enables media pipelines,
+such as MediaPipe, to retrieve that data. Typically, set_clip_data_path,
+set_clip_start_timestamp, and set_clip_end_timestamp define which data to use
+without storing the data itself. Example:
+  sequence = tf.train.SequenceExample()
+  set_clip_data_path("/relative/path/to/data.mp4", sequence)
+  set_clip_start_timestamp(0, sequence)
+  set_clip_end_timestamp(10000000, sequence)  # 10 seconds in microseconds.
+The clip label group adds labels that apply to the entire media clip. To
+annotate that a video clip has a particular label, set the clip metadata
+above and also call set_clip_label_index and set_clip_label_string. Most
+training pipelines will only use the label index or string, but we recommend
+storing both to improve readability while maintaining ease of use.
+Example:
+  set_clip_label_string(("run", "jump"), sequence)
+  set_clip_label_index((35, 47), sequence)
+The segment group is generally data about time spans within the media clip
+and stored in the SequenceExample.context. In this code, continuous lengths
+of media are called clips, and each clip may have subregions of interest that
+are called segments. To annotate that a video clip has time spans with labels,
+set the clip metadata above and use the functions set_segment_start_timestamp,
+set_segment_end_timestamp, set_segment_label_index, and
+set_segment_label_string. Most training pipelines will only use the label index
+or string, but we recommend storing both to improve readability while
+maintaining ease of use. By listing segments as times, the frame rate or other
+properties can change without affecting the labels.
+Example:
+  set_segment_start_timestamp((500000, 1000000), sequence)  # in microseconds
+  set_segment_end_timestamp((2000000, 6000000), sequence)
+  set_segment_label_index((35, 47), sequence)
+  set_segment_label_string(("run", "jump"), sequence)
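Because segments are listed as times, frame indices can be recomputed whenever the frame rate changes. The following is a minimal sketch of that idea, not code from this module: the helper name segment_frame_indices is hypothetical, and it assumes sorted frame timestamps in microseconds. It follows the definitions given with the segment keys in this file (the first frame index >= the start timestamp, the last frame index <= the end timestamp).

```python
from bisect import bisect_left, bisect_right


def segment_frame_indices(frame_timestamps, segment_starts, segment_ends):
    # frame_timestamps must be sorted ascending (microseconds).
    # Start index: first frame whose timestamp is >= the segment start.
    start_indices = [bisect_left(frame_timestamps, s) for s in segment_starts]
    # End index: last frame whose timestamp is <= the segment end.
    end_indices = [bisect_right(frame_timestamps, e) - 1 for e in segment_ends]
    return start_indices, end_indices


# The same segments applied to two different frame rates.
starts = (500000, 1000000)
ends = (2000000, 6000000)
frames_2fps = [i * 500000 for i in range(14)]   # 0 s to 6.5 s
frames_1fps = [i * 1000000 for i in range(7)]   # 0 s to 6 s

indices_2fps = segment_frame_indices(frames_2fps, starts, ends)
indices_1fps = segment_frame_indices(frames_1fps, starts, ends)
```

The segment timestamps stay fixed in both calls; only the derived indices differ.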
+The bounding box group is useful for identifying spatio-temporal annotations
+for detection, tracking, or action recognition. The exact keys that are
+needed can vary by task, but to annotate a video clip for detection, set the
+clip metadata above and repeatedly call add_bbox, add_bbox_timestamp,
+add_bbox_label_index, and add_bbox_label_string. Most training pipelines will
+only use the label index or string, but we recommend storing both to improve
+readability while maintaining ease of use. Because bounding boxes are
+assigned to timepoints in a video, changing the image frame rate can
+change the alignment. The ReconcileMetadata function in media_sequence.h can
+align bounding boxes to the nearest image.
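To illustrate what aligning an annotation to the nearest image means, here is a small sketch. It is an assumption-laden illustration and not the ReconcileMetadata implementation: the helper name nearest_frame_timestamp is hypothetical, timestamps are assumed sorted and in microseconds, and ties are resolved toward the earlier frame by choice.

```python
from bisect import bisect_left


def nearest_frame_timestamp(frame_timestamps, t):
    # frame_timestamps must be a non-empty list sorted ascending.
    i = bisect_left(frame_timestamps, t)
    if i == 0:
        return frame_timestamps[0]
    if i == len(frame_timestamps):
        return frame_timestamps[-1]
    before, after = frame_timestamps[i - 1], frame_timestamps[i]
    # Ties go to the earlier frame (an arbitrary choice for this sketch).
    return before if t - before <= after - t else after


image_timestamps = [0, 1000000, 2000000]
bbox_timestamps = [400000, 1600000, 2500000]
aligned = [nearest_frame_timestamp(image_timestamps, t)
           for t in bbox_timestamps]
```

Keeping the original timestamps alongside the aligned ones, as the region/unmodified_timestamp key allows, makes the alignment reversible.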
+The image group is useful for storing data as sequential 2D arrays, typically
+encoded as bytes. Images can be RGB images stored as JPEG, discrete masks
+stored as PNG, or some other format. Parameters that are static over time are
+set in the context using set_image_width, set_image_height, set_image_format,
+etc. The series of frames and timestamps are then added with add_image_encoded
+and add_image_timestamp. For discrete masks, the class or instance indices can be
+mapped to labels or classes using
+set_class_segmentation_class_label_{index,string} and
+set_instance_segmentation_object_class_index.
+The feature list group is useful for storing audio and extracted features,
+such as per-frame embeddings. SequenceExamples only store lists of floats per
+timestep, so the dimensions are stored in the context to enable reshaping.
+For example, set_feature_dimensions and repeatedly calling add_feature_floats
+and add_feature_timestamp adds per-frame embeddings. The feature methods also
+support audio features.
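The reshaping step mentioned above can be sketched in plain Python. This is only an illustration under stated assumptions: reshape_flat is a hypothetical helper, the dimensions are assumed to describe a row-major layout, and a real pipeline would more likely reshape with NumPy or TensorFlow.

```python
import math


def reshape_flat(values, dims):
    # Rebuild nested lists of shape `dims` from a flat, row-major list.
    if len(values) != math.prod(dims):
        raise ValueError("values do not match the stored dimensions")
    if len(dims) <= 1:
        return list(values)
    step = math.prod(dims[1:])  # number of values per outermost entry
    return [reshape_flat(values[i * step:(i + 1) * step], dims[1:])
            for i in range(dims[0])]


# One timestep of floats, with dimensions (2, 3) stored in the context.
flat_floats = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
feature = reshape_flat(flat_floats, (2, 3))
```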
+Macros for common patterns are created in media_sequence_util.py and are used
+here extensively. Because these macros are formulaic, I will only include a
+usage example here in the code rather than repeating documentation for every
+instance. This module defines additional functions to simplify working with
+MediaPipe types.
+Each msu.create_{TYPE}_context_feature takes a NAME and a KEY. It provides
+setters and getters for SequenceExamples and stores a single value under KEY
+in the context field. The provided functions are has_${NAME}, get_${NAME},
+set_${NAME}, and clear_${NAME}.
+E.g.
+  example = tf.train.SequenceExample()
+  set_data_path("data_path", example)
+  if has_data_path(example):
+    data_path = get_data_path(example)
+    clear_data_path(example)
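The has_/get_/set_/clear_ pattern described above can be pictured with a small factory that writes generated functions into a module dictionary. This is a hedged sketch, not the media_sequence_util implementation: create_context_feature_sketch is a hypothetical name, and a plain dict under example["context"] stands in for SequenceExample.context.

```python
def create_context_feature_sketch(name, key, module_dict):
    # Hypothetical stand-in for msu.create_{TYPE}_context_feature.
    def _has(example):
        return key in example["context"]

    def _get(example):
        return example["context"][key]

    def _set(value, example):
        example["context"][key] = value

    def _clear(example):
        example["context"].pop(key, None)

    # Register the accessors under names derived from NAME.
    module_dict["has_" + name] = _has
    module_dict["get_" + name] = _get
    module_dict["set_" + name] = _set
    module_dict["clear_" + name] = _clear


namespace = {}  # plays the role of module_dict=globals()
create_context_feature_sketch("clip_data_path", "clip/data_path", namespace)

example = {"context": {}}
namespace["set_clip_data_path"]("/relative/path/to/data.mp4", example)
data_path = namespace["get_clip_data_path"](example)
namespace["clear_clip_data_path"](example)
```

Passing a separate dict instead of globals() keeps the sketch from adding names to the surrounding module.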
+Each msu.create_{TYPE}_list_context_feature takes a NAME and a KEY. It provides
+setters and getters for SequenceExamples and stores a sequence of values
+under KEY in the context field. The provided functions are has_${NAME},
+get_${NAME}, set_${NAME}, clear_${NAME}, get_${NAME}_at, and add_${NAME}.
+E.g.
+  example = tf.train.SequenceExample()
+  set_clip_label_string(("run", "jump"), example)
+  if has_clip_label_string(example):
+    values = get_clip_label_string(example)
+    clear_clip_label_string(example)
+Each msu.create_{TYPE}_feature_list takes a NAME and a KEY. It provides setters
+and getters for SequenceExamples and stores a single value in each feature field
+under KEY of the feature_lists field. The provided functions are has_${NAME},
+get_${NAME}, clear_${NAME}, get_${NAME}_size, get_${NAME}_at, and add_${NAME}.
+  example = tf.train.SequenceExample()
+  add_image_timestamp(1000000, example)
+  add_image_timestamp(2000000, example)
+  if has_image_timestamp(example):
+    for i in range(get_image_timestamp_size(example)):
+      timestamp = get_image_timestamp_at(i, example)
+    clear_image_timestamp(example)
+Each msu.create_{TYPE}_list_feature_list takes a NAME and a KEY. It provides
+setters and getters for SequenceExamples and stores a sequence of values in
+each feature field under KEY of the feature_lists field. The provided
+functions are has_${NAME}, get_${NAME}, clear_${NAME}, get_${NAME}_size,
+get_${NAME}_at, and add_${NAME}.
+  example = tf.train.SequenceExample()
+  add_bbox_label_string(("run", "jump"), example)
+  add_bbox_label_string(("run", "fall"), example)
+  if has_bbox_label_string(example):
+    for i in range(get_bbox_label_string_size(example)):
+      labels = get_bbox_label_string_at(i, example)
+    clear_bbox_label_string(example)
+As described in media_sequence_util.h, each of these functions can take an
+additional string prefix argument as their first argument. The prefix can
+be fixed with a new NAME by using functools.partial. Prefixes are used to
+identify common storage patterns (e.g. storing an image along with the height
+and width) under different names (e.g. storing a left and right image in a
+stereo pair). An example creating a function such as
+add_left_image_encoded, which adds a string under the key "LEFT/image/encoded":
+  add_left_image_encoded = msu.function_with_default(add_image_encoded, "LEFT")
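A rough sketch of how a fixed prefix changes the storage key, using functools.partial as the text suggests. Everything here is illustrative: merge_prefix and add_image_encoded_sketch are hypothetical helpers, a plain dict stands in for the SequenceExample feature lists, and the rule that an empty prefix leaves the key unchanged is an assumption.

```python
import functools


def merge_prefix(prefix, key):
    # Assumed rule: a non-empty prefix is joined to the key with "/".
    return prefix + "/" + key if prefix else key


def add_image_encoded_sketch(value, example, prefix=""):
    # Append one encoded frame under the (possibly prefixed) key.
    example.setdefault(merge_prefix(prefix, "image/encoded"), []).append(value)


# Fix the prefix once to obtain a function with a new name.
add_left_image_encoded = functools.partial(
    add_image_encoded_sketch, prefix="LEFT")

stereo_example = {}
add_left_image_encoded(b"left-frame", stereo_example)
add_image_encoded_sketch(b"center-frame", stereo_example)
```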
+"""
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+import numpy as np
+from mediapipe.util.sequence import media_sequence_util
+msu = media_sequence_util
+_HAS_DYNAMIC_ATTRIBUTES = True
+##################################  METADATA  #################################
+# A unique identifier for each example.
+EXAMPLE_ID_KEY = "example/id"
+# The name of the data set, including the version.
+EXAMPLE_DATASET_NAME_KEY = "example/dataset_name"
+# String flags or attributes for this example within a data set.
+EXAMPLE_DATASET_FLAG_STRING_KEY = "example/dataset/flag/string"
+# The relative path to the data on disk from some root directory.
+CLIP_DATA_PATH_KEY = "clip/data_path"
+# Any identifier for the media beyond the data path.
+CLIP_MEDIA_ID_KEY = "clip/media_id"
+# Yet another alternative identifier.
+ALTERNATIVE_CLIP_MEDIA_ID_KEY = "clip/alternative_media_id"
+# The encoded bytes for storing media directly in the SequenceExample.
+CLIP_ENCODED_MEDIA_BYTES_KEY = "clip/encoded_media_bytes"
+# The start time for the encoded media if not preserved during encoding.
+CLIP_ENCODED_MEDIA_START_TIMESTAMP_KEY = "clip/encoded_media_start_timestamp"
+# The start time, in microseconds, for the start of the clip in the media.
+CLIP_START_TIMESTAMP_KEY = "clip/start/timestamp"
+# The end time, in microseconds, for the end of the clip in the media.
+CLIP_END_TIMESTAMP_KEY = "clip/end/timestamp"
+# A list of label indices for this clip.
+CLIP_LABEL_INDEX_KEY = "clip/label/index"
+# A list of label strings for this clip.
+CLIP_LABEL_STRING_KEY = "clip/label/string"
+# A list of label confidences for this clip.
+CLIP_LABEL_CONFIDENCE_KEY = "clip/label/confidence"
+# A list of label start timestamps for this clip.
+CLIP_LABEL_START_TIMESTAMP_KEY = "clip/label/start/timestamp"
+# A list of label end timestamps for this clip.
+CLIP_LABEL_END_TIMESTAMP_KEY = "clip/label/end/timestamp"
+msu.create_bytes_context_feature(
+    "example_id", EXAMPLE_ID_KEY, module_dict=globals())
+msu.create_bytes_context_feature(
+    "example_dataset_name", EXAMPLE_DATASET_NAME_KEY, module_dict=globals())
+msu.create_bytes_list_context_feature(
+    "example_dataset_flag_string", EXAMPLE_DATASET_FLAG_STRING_KEY,
+    module_dict=globals())
+msu.create_bytes_context_feature(
+    "clip_media_id", CLIP_MEDIA_ID_KEY, module_dict=globals())
+msu.create_bytes_context_feature(
+    "clip_alternative_media_id", ALTERNATIVE_CLIP_MEDIA_ID_KEY,
+    module_dict=globals())
+msu.create_bytes_context_feature(
+    "clip_encoded_media_bytes", CLIP_ENCODED_MEDIA_BYTES_KEY,
+    module_dict=globals())
+msu.create_bytes_context_feature(
+    "clip_data_path", CLIP_DATA_PATH_KEY, module_dict=globals())
+msu.create_int_context_feature(
+    "clip_encoded_media_start_timestamp",
+    CLIP_ENCODED_MEDIA_START_TIMESTAMP_KEY, module_dict=globals())
+msu.create_int_context_feature(
+    "clip_start_timestamp", CLIP_START_TIMESTAMP_KEY, module_dict=globals())
+msu.create_int_context_feature(
+    "clip_end_timestamp", CLIP_END_TIMESTAMP_KEY, module_dict=globals())
+msu.create_bytes_list_context_feature(
+    "clip_label_string", CLIP_LABEL_STRING_KEY, module_dict=globals())
+msu.create_int_list_context_feature(
+    "clip_label_index", CLIP_LABEL_INDEX_KEY, module_dict=globals())
+msu.create_float_list_context_feature(
+    "clip_label_confidence", CLIP_LABEL_CONFIDENCE_KEY, module_dict=globals())
+msu.create_int_list_context_feature(
+    "clip_label_start_timestamp",
+    CLIP_LABEL_START_TIMESTAMP_KEY,
+    module_dict=globals())
+msu.create_int_list_context_feature(
+    "clip_label_end_timestamp",
+    CLIP_LABEL_END_TIMESTAMP_KEY,
+    module_dict=globals())
+##################################  SEGMENTS  #################################
+# A list of segment start times in microseconds.
+SEGMENT_START_TIMESTAMP_KEY = "segment/start/timestamp"
+# A list of indices marking the first frame index >= the start timestamp.
+SEGMENT_START_INDEX_KEY = "segment/start/index"
+# A list of segment end times in microseconds.
+SEGMENT_END_TIMESTAMP_KEY = "segment/end/timestamp"
+# A list of indices marking the last frame index <= the end timestamp.
+SEGMENT_END_INDEX_KEY = "segment/end/index"
+# A list with the label index for each segment.
+# Multiple labels for the same segment are encoded as repeated segments.
+SEGMENT_LABEL_INDEX_KEY = "segment/label/index"
+# A list with the label string for each segment.
+# Multiple labels for the same segment are encoded as repeated segments.
+SEGMENT_LABEL_STRING_KEY = "segment/label/string"
+# A list with the label confidence for each segment.
+# Multiple labels for the same segment are encoded as repeated segments.
+SEGMENT_LABEL_CONFIDENCE_KEY = "segment/label/confidence"
+msu.create_bytes_list_context_feature(
+    "segment_label_string", SEGMENT_LABEL_STRING_KEY, module_dict=globals())
+msu.create_int_list_context_feature(
+    "segment_start_timestamp",
+    SEGMENT_START_TIMESTAMP_KEY, module_dict=globals())
+msu.create_int_list_context_feature(
+    "segment_start_index", SEGMENT_START_INDEX_KEY, module_dict=globals())
+msu.create_int_list_context_feature(
+    "segment_end_timestamp", SEGMENT_END_TIMESTAMP_KEY, module_dict=globals())
+msu.create_int_list_context_feature(
+    "segment_end_index", SEGMENT_END_INDEX_KEY, module_dict=globals())
+msu.create_int_list_context_feature(
+    "segment_label_index", SEGMENT_LABEL_INDEX_KEY, module_dict=globals())
+msu.create_float_list_context_feature(
+    "segment_label_confidence",
+    SEGMENT_LABEL_CONFIDENCE_KEY, module_dict=globals())
+##########################  REGIONS / BOUNDING BOXES  #########################
+# Normalized coordinates of bounding boxes are provided in four lists to avoid
+# order ambiguity. We provide additional accessors for complete bounding boxes
+# below.
+REGION_BBOX_YMIN_KEY = "region/bbox/ymin"
+REGION_BBOX_XMIN_KEY = "region/bbox/xmin"
+REGION_BBOX_YMAX_KEY = "region/bbox/ymax"
+REGION_BBOX_XMAX_KEY = "region/bbox/xmax"
+# The point and radius can denote keypoints.
+REGION_POINT_X_KEY = "region/point/x"
+REGION_POINT_Y_KEY = "region/point/y"
+REGION_RADIUS_KEY = "region/radius"
+# The 3D point can denote keypoints.
+REGION_3D_POINT_X_KEY = "region/3d_point/x"
+REGION_3D_POINT_Y_KEY = "region/3d_point/y"
+REGION_3D_POINT_Z_KEY = "region/3d_point/z"
+# The number of regions at that timestep.
+REGION_NUM_REGIONS_KEY = "region/num_regions"
+# Whether that timestep is annotated for regions.
+# (Disambiguates between multiple meanings of num_regions = 0.)
+REGION_IS_ANNOTATED_KEY = "region/is_annotated"
+# A list indicating if each region is generated (1) or manually annotated (0)
+REGION_IS_GENERATED_KEY = "region/is_generated"
+# A list indicating if each region is occluded (1) or visible (0)
+REGION_IS_OCCLUDED_KEY = "region/is_occluded"
+# Lists with a label for each region.
+# Multiple labels for the same region require duplicating the region.
+REGION_LABEL_INDEX_KEY = "region/label/index"
+REGION_LABEL_STRING_KEY = "region/label/string"
+REGION_LABEL_CONFIDENCE_KEY = "region/label/confidence"
+# Lists with a track identifier for each region.
+# Multiple track identifiers for the same region require duplicating the region.
+REGION_TRACK_INDEX_KEY = "region/track/index"
+REGION_TRACK_STRING_KEY = "region/track/string"
+REGION_TRACK_CONFIDENCE_KEY = "region/track/confidence"
+# Lists with a class for each region. In general, prefer to use the label
+# fields. These class fields exist to distinguish tracks when different classes
+# have overlapping track ids.
+REGION_CLASS_INDEX_KEY = "region/class/index"
+REGION_CLASS_STRING_KEY = "region/class/string"
+REGION_CLASS_CONFIDENCE_KEY = "region/class/confidence"
+# The timestamp of the region annotation in microseconds.
+REGION_TIMESTAMP_KEY = "region/timestamp"
+# The original timestamp in microseconds for region annotations.
+# If regions are aligned to image frames, this field preserves the original
+# timestamps.
+REGION_UNMODIFIED_TIMESTAMP_KEY = "region/unmodified_timestamp"
+# The list of region parts expected in this example.
+REGION_PARTS_KEY = "region/parts"
+# The dimensions of each embedding per region / bounding box.
+REGION_EMBEDDING_DIMENSIONS_PER_REGION_KEY = (
+    "region/embedding/dimensions_per_region")
+# The format encoding embeddings as strings.
+REGION_EMBEDDING_FORMAT_KEY = "region/embedding/format"
+# An embedding for each region. The length of each list must be the product of
+# the number of regions and the product of the embedding dimensions.
+REGION_EMBEDDING_FLOAT_KEY = "region/embedding/float"
+# A string-encoded embedding for each region.
+REGION_EMBEDDING_ENCODED_KEY = "region/embedding/encoded"
+# The confidence of the embedding.
+REGION_EMBEDDING_CONFIDENCE_KEY = "region/embedding/confidence"
+def _create_region_with_prefix(name, prefix):
+  """Create multiple accessors for region based data."""
+  msu.create_int_feature_list(name + "_num_regions", REGION_NUM_REGIONS_KEY,
+                              prefix=prefix, module_dict=globals())
+  msu.create_int_feature_list(name + "_is_annotated", REGION_IS_ANNOTATED_KEY,
+                              prefix=prefix, module_dict=globals())
+  msu.create_int_list_feature_list(
+      name + "_is_occluded", REGION_IS_OCCLUDED_KEY,
+      prefix=prefix, module_dict=globals())
+  msu.create_int_list_feature_list(
+      name + "_is_generated", REGION_IS_GENERATED_KEY,
+      prefix=prefix, module_dict=globals())
+  msu.create_int_feature_list(name + "_timestamp", REGION_TIMESTAMP_KEY,
+                              prefix=prefix, module_dict=globals())
+  msu.create_int_feature_list(
+      name + "_unmodified_timestamp", REGION_UNMODIFIED_TIMESTAMP_KEY,
+      prefix=prefix, module_dict=globals())
+  msu.create_bytes_list_feature_list(
+      name + "_label_string", REGION_LABEL_STRING_KEY,
+      prefix=prefix, module_dict=globals())
+  msu.create_int_list_feature_list(
+      name + "_label_index", REGION_LABEL_INDEX_KEY,
+      prefix=prefix, module_dict=globals())
+  msu.create_float_list_feature_list(
+      name + "_label_confidence", REGION_LABEL_CONFIDENCE_KEY,
+      prefix=prefix, module_dict=globals())
+  msu.create_bytes_list_feature_list(
+      name + "_class_string", REGION_CLASS_STRING_KEY,
+      prefix=prefix, module_dict=globals())
+  msu.create_int_list_feature_list(
+      name + "_class_index", REGION_CLASS_INDEX_KEY,
+      prefix=prefix, module_dict=globals())
+  msu.create_float_list_feature_list(
+      name + "_class_confidence", REGION_CLASS_CONFIDENCE_KEY,
+      prefix=prefix, module_dict=globals())
+  msu.create_bytes_list_feature_list(
+      name + "_track_string", REGION_TRACK_STRING_KEY,
+      prefix=prefix, module_dict=globals())
+  msu.create_int_list_feature_list(
+      name + "_track_index", REGION_TRACK_INDEX_KEY,
+      prefix=prefix, module_dict=globals())
+  msu.create_float_list_feature_list(
+      name + "_track_confidence", REGION_TRACK_CONFIDENCE_KEY,
+      prefix=prefix, module_dict=globals())
+  msu.create_float_list_feature_list(name + "_ymin", REGION_BBOX_YMIN_KEY,
+                                     prefix=prefix, module_dict=globals())
+  msu.create_float_list_feature_list(name + "_xmin", REGION_BBOX_XMIN_KEY,
+                                     prefix=prefix, module_dict=globals())
+  msu.create_float_list_feature_list(name + "_ymax", REGION_BBOX_YMAX_KEY,
+                                     prefix=prefix, module_dict=globals())
+  msu.create_float_list_feature_list(name + "_xmax", REGION_BBOX_XMAX_KEY,
+                                     prefix=prefix, module_dict=globals())
+  msu.create_float_list_feature_list(name + "_point_x", REGION_POINT_X_KEY,
+                                     prefix=prefix, module_dict=globals())
+  msu.create_float_list_feature_list(name + "_point_y", REGION_POINT_Y_KEY,
+                                     prefix=prefix, module_dict=globals())
+  msu.create_float_list_feature_list(
+      name + "_3d_point_x", REGION_3D_POINT_X_KEY,
+      prefix=prefix, module_dict=globals())
+  msu.create_float_list_feature_list(
+      name + "_3d_point_y", REGION_3D_POINT_Y_KEY,
+      prefix=prefix, module_dict=globals())
+  msu.create_float_list_feature_list(
+      name + "_3d_point_z", REGION_3D_POINT_Z_KEY,
+      prefix=prefix, module_dict=globals())
+  msu.create_bytes_list_context_feature(name + "_parts",
+                                        REGION_PARTS_KEY,
+                                        prefix=prefix, module_dict=globals())
+  msu.create_float_list_context_feature(
+      name + "_embedding_dimensions_per_region",
+      REGION_EMBEDDING_DIMENSIONS_PER_REGION_KEY,
+      prefix=prefix, module_dict=globals())
+  msu.create_bytes_context_feature(name + "_embedding_format",
+                                   REGION_EMBEDDING_FORMAT_KEY,
+                                   prefix=prefix, module_dict=globals())
+  msu.create_float_list_feature_list(name + "_embedding_floats",
+                                     REGION_EMBEDDING_FLOAT_KEY,
+                                     prefix=prefix, module_dict=globals())
+  msu.create_bytes_list_feature_list(name + "_embedding_encoded",
+                                     REGION_EMBEDDING_ENCODED_KEY,
+                                     prefix=prefix, module_dict=globals())
+  msu.create_float_list_feature_list(name + "_embedding_confidence",
+                                     REGION_EMBEDDING_CONFIDENCE_KEY,
+                                     prefix=prefix, module_dict=globals())
+  # pylint: disable=undefined-variable
+  def get_prefixed_bbox_at(index, sequence_example, prefix):
+    return np.stack((
+        get_bbox_ymin_at(index, sequence_example, prefix=prefix),
+        get_bbox_xmin_at(index, sequence_example, prefix=prefix),
+        get_bbox_ymax_at(index, sequence_example, prefix=prefix),
+        get_bbox_xmax_at(index, sequence_example, prefix=prefix)),
+                    1)
+  def add_prefixed_bbox(values, sequence_example, prefix):
+    values = np.array(values)
+    if values.size == 0:
+      add_bbox_ymin([], sequence_example, prefix=prefix)
+      add_bbox_xmin([], sequence_example, prefix=prefix)
+      add_bbox_ymax([], sequence_example, prefix=prefix)
+      add_bbox_xmax([], sequence_example, prefix=prefix)
+    else:
+      add_bbox_ymin(values[:, 0], sequence_example, prefix=prefix)
+      add_bbox_xmin(values[:, 1], sequence_example, prefix=prefix)
+      add_bbox_ymax(values[:, 2], sequence_example, prefix=prefix)
+      add_bbox_xmax(values[:, 3], sequence_example, prefix=prefix)
+  def get_prefixed_bbox_size(sequence_example, prefix):
+    return get_bbox_ymin_size(sequence_example, prefix=prefix)
+  def has_prefixed_bbox(sequence_example, prefix):
+    return has_bbox_ymin(sequence_example, prefix=prefix)
+  def clear_prefixed_bbox(sequence_example, prefix):
+    clear_bbox_ymin(sequence_example, prefix=prefix)
+    clear_bbox_xmin(sequence_example, prefix=prefix)
+    clear_bbox_ymax(sequence_example, prefix=prefix)
+    clear_bbox_xmax(sequence_example, prefix=prefix)
+  def get_prefixed_point_at(index, sequence_example, prefix):
+    return np.stack((
+        get_bbox_point_y_at(index, sequence_example, prefix=prefix),
+        get_bbox_point_x_at(index, sequence_example, prefix=prefix)),
+                    1)
+  def add_prefixed_point(values, sequence_example, prefix):
+    add_bbox_point_y(values[:, 0], sequence_example, prefix=prefix)
+    add_bbox_point_x(values[:, 1], sequence_example, prefix=prefix)
+  def get_prefixed_point_size(sequence_example, prefix):
+    return get_bbox_point_y_size(sequence_example, prefix=prefix)
+  def has_prefixed_point(sequence_example, prefix):
+    return has_bbox_point_y(sequence_example, prefix=prefix)
+  def clear_prefixed_point(sequence_example, prefix):
+    clear_bbox_point_y(sequence_example, prefix=prefix)
+    clear_bbox_point_x(sequence_example, prefix=prefix)
+  def get_prefixed_3d_point_at(index, sequence_example, prefix):
+    return np.stack((
+        get_bbox_3d_point_x_at(index, sequence_example, prefix=prefix),
+        get_bbox_3d_point_y_at(index, sequence_example, prefix=prefix),
+        get_bbox_3d_point_z_at(index, sequence_example, prefix=prefix)),
+                    1)
+  def add_prefixed_3d_point(values, sequence_example, prefix):
+    add_bbox_3d_point_x(values[:, 0], sequence_example, prefix=prefix)
+    add_bbox_3d_point_y(values[:, 1], sequence_example, prefix=prefix)
+    add_bbox_3d_point_z(values[:, 2], sequence_example, prefix=prefix)
+  def get_prefixed_3d_point_size(sequence_example, prefix):
+    return get_bbox_3d_point_x_size(sequence_example, prefix=prefix)
+  def has_prefixed_3d_point(sequence_example, prefix):
+    return has_bbox_3d_point_x(sequence_example, prefix=prefix)
+  def clear_prefixed_3d_point(sequence_example, prefix):
+    clear_bbox_3d_point_x(sequence_example, prefix=prefix)
+    clear_bbox_3d_point_y(sequence_example, prefix=prefix)
+    clear_bbox_3d_point_z(sequence_example, prefix=prefix)
+  # pylint: enable=undefined-variable
+  msu.add_functions_to_module({
+      "get_" + name + "_at":
+          msu.function_with_default(get_prefixed_bbox_at, prefix),
+      "add_" + name:
+          msu.function_with_default(add_prefixed_bbox, prefix),
+      "get_" + name + "_size":
+          msu.function_with_default(get_prefixed_bbox_size, prefix),
+      "has_" + name:
+          msu.function_with_default(has_prefixed_bbox, prefix),
+      "clear_" + name:
+          msu.function_with_default(clear_prefixed_bbox, prefix),
+  }, module_dict=globals())
+  msu.add_functions_to_module({
+      "get_" + name + "_point_at":
+          msu.function_with_default(get_prefixed_point_at, prefix),
+      "add_" + name + "_point":
+          msu.function_with_default(add_prefixed_point, prefix),
+      "get_" + name + "_point_size":
+          msu.function_with_default(get_prefixed_point_size, prefix),
+      "has_" + name + "_point":
+          msu.function_with_default(has_prefixed_point, prefix),
+      "clear_" + name + "_point":
+          msu.function_with_default(clear_prefixed_point, prefix),
+  }, module_dict=globals())
+  msu.add_functions_to_module({
+      "get_" + name + "_3d_point_at":
+          msu.function_with_default(get_prefixed_3d_point_at, prefix),
+      "add_" + name + "_3d_point":
+          msu.function_with_default(add_prefixed_3d_point, prefix),
+      "get_" + name + "_3d_point_size":
+          msu.function_with_default(get_prefixed_3d_point_size, prefix),
+      "has_" + name + "_3d_point":
+          msu.function_with_default(has_prefixed_3d_point, prefix),
+      "clear_" + name + "_3d_point":
+          msu.function_with_default(clear_prefixed_3d_point, prefix),
+  }, module_dict=globals())
+PREDICTED_PREFIX = "PREDICTED"
+_create_region_with_prefix("bbox", "")
+_create_region_with_prefix("predicted_bbox", PREDICTED_PREFIX)
+###################################  IMAGES  #################################
+# The format the images are encoded as (e.g. "JPEG", "PNG")
+IMAGE_FORMAT_KEY = "image/format"
+# The number of channels in the image.
+IMAGE_CHANNELS_KEY = "image/channels"
+# The colorspace of the image.
+IMAGE_COLORSPACE_KEY = "image/colorspace"
+# The height of the image in pixels.
+IMAGE_HEIGHT_KEY = "image/height"
+# The width of the image in pixels.
+IMAGE_WIDTH_KEY = "image/width"
+# The frame rate of the media in images/second.
+IMAGE_FRAME_RATE_KEY = "image/frame_rate"
+# The maximum values if the images were saturated and normalized for encoding.
+IMAGE_SATURATION_KEY = "image/saturation"
+# The listing from discrete image values (as indices) to class indices.
+IMAGE_CLASS_LABEL_INDEX_KEY = "image/class/label/index"
+# The listing from discrete image values (as indices) to class strings.
+IMAGE_CLASS_LABEL_STRING_KEY = "image/class/label/string"
+# The listing from discrete instance indices to class indices they embody.
+IMAGE_OBJECT_CLASS_INDEX_KEY = "image/object/class/index"
+# The encoded image frame.
+IMAGE_ENCODED_KEY = "image/encoded"
+# Multiple images from the same timestep (e.g. multiview video).
+IMAGE_MULTI_ENCODED_KEY = "image/multi_encoded"
+# The timestamp of the frame in microseconds.
+IMAGE_TIMESTAMP_KEY = "image/timestamp"
+# A per image label if specific frames have labels.
+# If time spans have labels, segments are preferred to allow changing rates.
+IMAGE_LABEL_INDEX_KEY = "image/label/index"
+IMAGE_LABEL_STRING_KEY = "image/label/string"
+IMAGE_LABEL_CONFIDENCE_KEY = "image/label/confidence"
+# The path of the image file if it did not come from a media clip.
+IMAGE_DATA_PATH_KEY = "image/data_path"
+def _create_image_with_prefix(name, prefix):
+  """Create multiple accessors for image based data."""
+  msu.create_bytes_context_feature(name + "_format", IMAGE_FORMAT_KEY,
+                                   prefix=prefix, module_dict=globals())
+  msu.create_bytes_context_feature(name + "_colorspace", IMAGE_COLORSPACE_KEY,
+                                   prefix=prefix, module_dict=globals())
+  msu.create_int_context_feature(name + "_channels", IMAGE_CHANNELS_KEY,
+                                 prefix=prefix, module_dict=globals())
+  msu.create_int_context_feature(name + "_height", IMAGE_HEIGHT_KEY,
+                                 prefix=prefix, module_dict=globals())
+  msu.create_int_context_feature(name + "_width", IMAGE_WIDTH_KEY,
+                                 prefix=prefix, module_dict=globals())
+  msu.create_bytes_feature_list(name + "_encoded", IMAGE_ENCODED_KEY,
+                                prefix=prefix, module_dict=globals())
+  msu.create_float_context_feature(name + "_frame_rate", IMAGE_FRAME_RATE_KEY,
+                                   prefix=prefix, module_dict=globals())
+  msu.create_bytes_list_context_feature(
+      name + "_class_label_string", IMAGE_CLASS_LABEL_STRING_KEY,
+      prefix=prefix, module_dict=globals())
+  msu.create_int_list_context_feature(
+      name + "_class_label_index", IMAGE_CLASS_LABEL_INDEX_KEY,
+      prefix=prefix, module_dict=globals())
+  msu.create_int_list_context_feature(
+      name + "_object_class_index", IMAGE_OBJECT_CLASS_INDEX_KEY,
+      prefix=prefix, module_dict=globals())
+  msu.create_bytes_context_feature(name + "_data_path", IMAGE_DATA_PATH_KEY,
+                                   prefix=prefix, module_dict=globals())
+  msu.create_int_feature_list(name + "_timestamp", IMAGE_TIMESTAMP_KEY,
+                              prefix=prefix, module_dict=globals())
+  msu.create_bytes_list_feature_list(name + "_multi_encoded",
+                                     IMAGE_MULTI_ENCODED_KEY, prefix=prefix,
+                                     module_dict=globals())
+FORWARD_FLOW_PREFIX = "FORWARD_FLOW"
+CLASS_SEGMENTATION_PREFIX = "CLASS_SEGMENTATION"
+INSTANCE_SEGMENTATION_PREFIX = "INSTANCE_SEGMENTATION"
+_create_image_with_prefix("image", "")
+_create_image_with_prefix("forward_flow", FORWARD_FLOW_PREFIX)
+_create_image_with_prefix("class_segmentation", CLASS_SEGMENTATION_PREFIX)
+_create_image_with_prefix("instance_segmentation", INSTANCE_SEGMENTATION_PREFIX)
+##################################  TEXT  #################################
+# The language the text tokens are likely to be in.
+TEXT_LANGUAGE_KEY = "text/language"
+# A large block of text that applies to the media.
+TEXT_CONTEXT_CONTENT_KEY = "text/context/content"
+# A large block of text that applies to the media as token ids.
+TEXT_CONTEXT_TOKEN_ID_KEY = "text/context/token_id"
+# A large block of text that applies to the media as embeddings.
+TEXT_CONTEXT_EMBEDDING_KEY = "text/context/embedding"
+# The text contents for a given time.
+TEXT_CONTENT_KEY = "text/content"
+# The start time for the text becoming relevant.
+TEXT_TIMESTAMP_KEY = "text/timestamp"
+# The duration where the text is relevant.
+TEXT_DURATION_KEY = "text/duration"
+# The confidence that this is the correct text.
+TEXT_CONFIDENCE_KEY = "text/confidence"
+# A floating point embedding corresponding to the text.
+TEXT_EMBEDDING_KEY = "text/embedding"
+# An integer id corresponding to the text.
+TEXT_TOKEN_ID_KEY = "text/token/id"
+msu.create_bytes_context_feature(
+    "text_language", TEXT_LANGUAGE_KEY, module_dict=globals())
+msu.create_bytes_context_feature(
+    "text_context_content", TEXT_CONTEXT_CONTENT_KEY, module_dict=globals())
+msu.create_int_list_context_feature(
+    "text_context_token_id", TEXT_CONTEXT_TOKEN_ID_KEY, module_dict=globals())
+msu.create_float_list_context_feature(
+    "text_context_embedding", TEXT_CONTEXT_EMBEDDING_KEY, module_dict=globals())
+msu.create_bytes_feature_list(
+    "text_content", TEXT_CONTENT_KEY, module_dict=globals())
+msu.create_int_feature_list(
+    "text_timestamp", TEXT_TIMESTAMP_KEY, module_dict=globals())
+msu.create_int_feature_list(
+    "text_duration", TEXT_DURATION_KEY, module_dict=globals())
+msu.create_float_feature_list(
+    "text_confidence", TEXT_CONFIDENCE_KEY, module_dict=globals())
+msu.create_float_list_feature_list(
+    "text_embedding", TEXT_EMBEDDING_KEY, module_dict=globals())
+msu.create_int_feature_list(
+    "text_token_id", TEXT_TOKEN_ID_KEY, module_dict=globals())
+##################################  FEATURES  #################################
+# The dimensions of the feature.
+FEATURE_DIMENSIONS_KEY = "feature/dimensions"
+# The rate at which features are extracted, per second of media.
+FEATURE_RATE_KEY = "feature/rate"
+# The encoding format, if any, for the feature.
+FEATURE_BYTES_FORMAT_KEY = "feature/bytes/format"
+# For audio, the rate at which samples are extracted, per second of media.
+FEATURE_SAMPLE_RATE_KEY = "feature/sample_rate"
+# For audio, the number of channels per extracted feature.
+FEATURE_NUM_CHANNELS_KEY = "feature/num_channels"
+# For audio, the number of samples per extracted feature.
+FEATURE_NUM_SAMPLES_KEY = "feature/num_samples"
+# For audio, the rate at which features are extracted, per second of media.
+FEATURE_PACKET_RATE_KEY = "feature/packet_rate"
+# For audio, the original audio sampling rate the feature is derived from.
+FEATURE_AUDIO_SAMPLE_RATE_KEY = "feature/audio_sample_rate"
+# The feature as a list of floats.
+FEATURE_FLOATS_KEY = "feature/floats"
+# The feature as a list of bytes. May be encoded.
+FEATURE_BYTES_KEY = "feature/bytes"
+# The feature as a list of ints.
+FEATURE_INTS_KEY = "feature/ints"
+# The timestamp, in microseconds, of the feature.
+FEATURE_TIMESTAMP_KEY = "feature/timestamp"
+# It is occasionally useful to indicate that a feature applies to a given range.
+# Use this for features only; annotations should be provided as segments.
+FEATURE_DURATION_KEY = "feature/duration"
+# Encodes an optional confidence score for the generated features.
+FEATURE_CONFIDENCE_KEY = "feature/confidence"
+# The feature as a list of floats in the context.
+CONTEXT_FEATURE_FLOATS_KEY = "context_feature/floats"
+# The feature as a list of bytes in the context. May be encoded.
+CONTEXT_FEATURE_BYTES_KEY = "context_feature/bytes"
+# The feature as a list of ints in the context.
+CONTEXT_FEATURE_INTS_KEY = "context_feature/ints"
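The keys above split into two kinds: values registered with a `*_context_feature` creator are stored once per clip, while values registered with a `*_feature_list` creator are appended once per step. A rough sketch of that layout is below. It uses nested dicts as a hypothetical stand-in for `tf.train.SequenceExample`, and the helper names, rate, timestamps, and float values are invented for illustration; only the key strings come from the constants above.

```python
# Hypothetical stand-in for a SequenceExample: one dict of per-clip context
# values and one dict of per-step lists. Not the real TensorFlow proto API.
example = {"context": {}, "feature_lists": {}}


def set_context(example, key, value):
    """Stores a single per-clip value, overwriting any previous one."""
    example["context"][key] = value


def add_step(example, key, value):
    """Appends one per-step value to the list stored under key."""
    example["feature_lists"].setdefault(key, []).append(value)


# Per-clip metadata: two features per second, each with three dimensions.
set_context(example, "feature/rate", 2.0)
set_context(example, "feature/dimensions", [3])

# Per-step data: timestamps are in microseconds, so a rate of 2.0 features
# per second puts consecutive features 500000 microseconds apart.
for timestamp_us, floats in [(0, [0.1, 0.2, 0.3]), (500000, [0.4, 0.5, 0.6])]:
    add_step(example, "feature/timestamp", timestamp_us)
    add_step(example, "feature/floats", floats)

num_steps = len(example["feature_lists"]["feature/timestamp"])
print(num_steps)
```

Appending the timestamp and the floats together in the same loop iteration keeps the two lists the same length, so index `i` in one list refers to the same step as index `i` in the other.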
+msu.create_int_list_context_feature(
+    "feature_dimensions", FEATURE_DIMENSIONS_KEY, module_dict=globals())
+msu.create_float_context_feature(
+    "feature_rate", FEATURE_RATE_KEY, module_dict=globals())
+msu.create_bytes_context_feature(
+    "feature_bytes_format", FEATURE_BYTES_FORMAT_KEY, module_dict=globals())
+msu.create_float_context_feature(
+    "feature_sample_rate", FEATURE_SAMPLE_RATE_KEY, module_dict=globals())
+msu.create_int_context_feature(
+    "feature_num_channels", FEATURE_NUM_CHANNELS_KEY, module_dict=globals())
+msu.create_int_context_feature(
+    "feature_num_samples", FEATURE_NUM_SAMPLES_KEY, module_dict=globals())
+msu.create_float_context_feature(
+    "feature_packet_rate", FEATURE_PACKET_RATE_KEY, module_dict=globals())
+msu.create_float_context_feature(
+    "feature_audio_sample_rate", FEATURE_AUDIO_SAMPLE_RATE_KEY,
+    module_dict=globals())
+msu.create_float_list_feature_list(
+    "feature_floats", FEATURE_FLOATS_KEY, module_dict=globals())
+msu.create_bytes_list_feature_list(
+    "feature_bytes", FEATURE_BYTES_KEY, module_dict=globals())
+msu.create_int_list_feature_list(
+    "feature_ints", FEATURE_INTS_KEY, module_dict=globals())
+msu.create_int_feature_list(
+    "feature_timestamp", FEATURE_TIMESTAMP_KEY, module_dict=globals())
+msu.create_int_list_feature_list(
+    "feature_duration", FEATURE_DURATION_KEY, module_dict=globals())
+msu.create_float_list_feature_list(
+    "feature_confidence", FEATURE_CONFIDENCE_KEY, module_dict=globals())
+msu.create_float_list_context_feature(
+    "context_feature_floats", CONTEXT_FEATURE_FLOATS_KEY, module_dict=globals())
+msu.create_bytes_list_context_feature(
+    "context_feature_bytes", CONTEXT_FEATURE_BYTES_KEY, module_dict=globals())
+msu.create_int_list_context_feature(
+    "context_feature_ints", CONTEXT_FEATURE_INTS_KEY, module_dict=globals())