ob-metaflow-stubs 6.0.5.1__py2.py3-none-any.whl → 6.0.5.2__py2.py3-none-any.whl

This diff compares the contents of two package versions as published to one of the supported registries. It is provided for informational purposes only and reflects the packages as they appear in their respective public registries.
Files changed (261)
  1. metaflow-stubs/__init__.pyi +995 -995
  2. metaflow-stubs/cards.pyi +1 -1
  3. metaflow-stubs/cli.pyi +1 -1
  4. metaflow-stubs/cli_components/__init__.pyi +1 -1
  5. metaflow-stubs/cli_components/utils.pyi +1 -1
  6. metaflow-stubs/client/__init__.pyi +1 -1
  7. metaflow-stubs/client/core.pyi +5 -5
  8. metaflow-stubs/client/filecache.pyi +1 -1
  9. metaflow-stubs/events.pyi +2 -2
  10. metaflow-stubs/exception.pyi +1 -1
  11. metaflow-stubs/flowspec.pyi +4 -4
  12. metaflow-stubs/generated_for.txt +1 -1
  13. metaflow-stubs/includefile.pyi +2 -2
  14. metaflow-stubs/meta_files.pyi +1 -1
  15. metaflow-stubs/metadata_provider/__init__.pyi +1 -1
  16. metaflow-stubs/metadata_provider/heartbeat.pyi +1 -1
  17. metaflow-stubs/metadata_provider/metadata.pyi +1 -1
  18. metaflow-stubs/metadata_provider/util.pyi +1 -1
  19. metaflow-stubs/metaflow_config.pyi +1 -1
  20. metaflow-stubs/metaflow_current.pyi +53 -53
  21. metaflow-stubs/metaflow_git.pyi +1 -1
  22. metaflow-stubs/mf_extensions/__init__.pyi +1 -1
  23. metaflow-stubs/mf_extensions/obcheckpoint/__init__.pyi +1 -1
  24. metaflow-stubs/mf_extensions/obcheckpoint/plugins/__init__.pyi +1 -1
  25. metaflow-stubs/mf_extensions/obcheckpoint/plugins/machine_learning_utilities/__init__.pyi +1 -1
  26. metaflow-stubs/mf_extensions/obcheckpoint/plugins/machine_learning_utilities/card_utils/__init__.pyi +1 -1
  27. metaflow-stubs/mf_extensions/obcheckpoint/plugins/machine_learning_utilities/card_utils/async_cards.pyi +1 -1
  28. metaflow-stubs/mf_extensions/obcheckpoint/plugins/machine_learning_utilities/card_utils/deco_injection_mixin.pyi +1 -1
  29. metaflow-stubs/mf_extensions/obcheckpoint/plugins/machine_learning_utilities/card_utils/extra_components.pyi +1 -1
  30. metaflow-stubs/mf_extensions/obcheckpoint/plugins/machine_learning_utilities/checkpoints/__init__.pyi +1 -1
  31. metaflow-stubs/mf_extensions/obcheckpoint/plugins/machine_learning_utilities/checkpoints/cards/__init__.pyi +1 -1
  32. metaflow-stubs/mf_extensions/obcheckpoint/plugins/machine_learning_utilities/checkpoints/cards/checkpoint_lister.pyi +3 -3
  33. metaflow-stubs/mf_extensions/obcheckpoint/plugins/machine_learning_utilities/checkpoints/cards/lineage_card.pyi +1 -1
  34. metaflow-stubs/mf_extensions/obcheckpoint/plugins/machine_learning_utilities/checkpoints/checkpoint_storage.pyi +2 -2
  35. metaflow-stubs/mf_extensions/obcheckpoint/plugins/machine_learning_utilities/checkpoints/constructors.pyi +1 -1
  36. metaflow-stubs/mf_extensions/obcheckpoint/plugins/machine_learning_utilities/checkpoints/core.pyi +3 -3
  37. metaflow-stubs/mf_extensions/obcheckpoint/plugins/machine_learning_utilities/checkpoints/decorator.pyi +3 -3
  38. metaflow-stubs/mf_extensions/obcheckpoint/plugins/machine_learning_utilities/checkpoints/exceptions.pyi +1 -1
  39. metaflow-stubs/mf_extensions/obcheckpoint/plugins/machine_learning_utilities/checkpoints/final_api.pyi +1 -1
  40. metaflow-stubs/mf_extensions/obcheckpoint/plugins/machine_learning_utilities/checkpoints/lineage.pyi +1 -1
  41. metaflow-stubs/mf_extensions/obcheckpoint/plugins/machine_learning_utilities/datastore/__init__.pyi +1 -1
  42. metaflow-stubs/mf_extensions/obcheckpoint/plugins/machine_learning_utilities/datastore/context.pyi +2 -2
  43. metaflow-stubs/mf_extensions/obcheckpoint/plugins/machine_learning_utilities/datastore/core.pyi +1 -1
  44. metaflow-stubs/mf_extensions/obcheckpoint/plugins/machine_learning_utilities/datastore/decorator.pyi +1 -1
  45. metaflow-stubs/mf_extensions/obcheckpoint/plugins/machine_learning_utilities/datastore/exceptions.pyi +1 -1
  46. metaflow-stubs/mf_extensions/obcheckpoint/plugins/machine_learning_utilities/datastore/task_utils.pyi +1 -1
  47. metaflow-stubs/mf_extensions/obcheckpoint/plugins/machine_learning_utilities/datastore/utils.pyi +1 -1
  48. metaflow-stubs/mf_extensions/obcheckpoint/plugins/machine_learning_utilities/datastructures.pyi +1 -1
  49. metaflow-stubs/mf_extensions/obcheckpoint/plugins/machine_learning_utilities/exceptions.pyi +1 -1
  50. metaflow-stubs/mf_extensions/obcheckpoint/plugins/machine_learning_utilities/hf_hub/__init__.pyi +1 -1
  51. metaflow-stubs/mf_extensions/obcheckpoint/plugins/machine_learning_utilities/hf_hub/decorator.pyi +2 -2
  52. metaflow-stubs/mf_extensions/obcheckpoint/plugins/machine_learning_utilities/modeling_utils/__init__.pyi +1 -1
  53. metaflow-stubs/mf_extensions/obcheckpoint/plugins/machine_learning_utilities/modeling_utils/core.pyi +3 -3
  54. metaflow-stubs/mf_extensions/obcheckpoint/plugins/machine_learning_utilities/modeling_utils/exceptions.pyi +1 -1
  55. metaflow-stubs/mf_extensions/obcheckpoint/plugins/machine_learning_utilities/modeling_utils/model_storage.pyi +2 -2
  56. metaflow-stubs/mf_extensions/obcheckpoint/plugins/machine_learning_utilities/utils/__init__.pyi +1 -1
  57. metaflow-stubs/mf_extensions/obcheckpoint/plugins/machine_learning_utilities/utils/flowspec_utils.pyi +1 -1
  58. metaflow-stubs/mf_extensions/obcheckpoint/plugins/machine_learning_utilities/utils/general.pyi +1 -1
  59. metaflow-stubs/mf_extensions/obcheckpoint/plugins/machine_learning_utilities/utils/identity_utils.pyi +1 -1
  60. metaflow-stubs/mf_extensions/obcheckpoint/plugins/machine_learning_utilities/utils/serialization_handler/__init__.pyi +1 -1
  61. metaflow-stubs/mf_extensions/obcheckpoint/plugins/machine_learning_utilities/utils/serialization_handler/base.pyi +1 -1
  62. metaflow-stubs/mf_extensions/obcheckpoint/plugins/machine_learning_utilities/utils/serialization_handler/tar.pyi +1 -1
  63. metaflow-stubs/mf_extensions/obcheckpoint/plugins/machine_learning_utilities/utils/tar_utils.pyi +1 -1
  64. metaflow-stubs/mf_extensions/outerbounds/__init__.pyi +1 -1
  65. metaflow-stubs/mf_extensions/outerbounds/plugins/__init__.pyi +1 -1
  66. metaflow-stubs/mf_extensions/outerbounds/plugins/apps/__init__.pyi +1 -1
  67. metaflow-stubs/mf_extensions/outerbounds/plugins/apps/core/__init__.pyi +1 -1
  68. metaflow-stubs/mf_extensions/outerbounds/plugins/apps/core/_state_machine.pyi +2 -1
  69. metaflow-stubs/mf_extensions/outerbounds/plugins/apps/core/_vendor/__init__.pyi +1 -1
  70. metaflow-stubs/mf_extensions/outerbounds/plugins/apps/core/_vendor/spinner/__init__.pyi +1 -1
  71. metaflow-stubs/mf_extensions/outerbounds/plugins/apps/core/_vendor/spinner/spinners.pyi +1 -1
  72. metaflow-stubs/mf_extensions/outerbounds/plugins/apps/core/app_cli.pyi +1 -1
  73. metaflow-stubs/mf_extensions/outerbounds/plugins/apps/core/app_config.pyi +2 -2
  74. metaflow-stubs/mf_extensions/outerbounds/plugins/apps/core/capsule.pyi +2 -2
  75. metaflow-stubs/mf_extensions/outerbounds/plugins/apps/core/click_importer.pyi +1 -1
  76. metaflow-stubs/mf_extensions/outerbounds/plugins/apps/core/code_package/__init__.pyi +1 -1
  77. metaflow-stubs/mf_extensions/outerbounds/plugins/apps/core/code_package/code_packager.pyi +1 -1
  78. metaflow-stubs/mf_extensions/outerbounds/plugins/apps/core/config/__init__.pyi +1 -1
  79. metaflow-stubs/mf_extensions/outerbounds/plugins/apps/core/config/cli_generator.pyi +1 -1
  80. metaflow-stubs/mf_extensions/outerbounds/plugins/apps/core/config/config_utils.pyi +1 -1
  81. metaflow-stubs/mf_extensions/outerbounds/plugins/apps/core/config/schema_export.pyi +1 -1
  82. metaflow-stubs/mf_extensions/outerbounds/plugins/apps/core/config/typed_configs.pyi +6 -2
  83. metaflow-stubs/mf_extensions/outerbounds/plugins/apps/core/config/unified_config.pyi +10 -1
  84. metaflow-stubs/mf_extensions/outerbounds/plugins/apps/core/dependencies.pyi +2 -2
  85. metaflow-stubs/mf_extensions/outerbounds/plugins/apps/core/deployer.pyi +6 -7
  86. metaflow-stubs/mf_extensions/outerbounds/plugins/apps/core/experimental/__init__.pyi +1 -1
  87. metaflow-stubs/mf_extensions/outerbounds/plugins/apps/core/perimeters.pyi +1 -1
  88. metaflow-stubs/mf_extensions/outerbounds/plugins/apps/core/utils.pyi +1 -1
  89. metaflow-stubs/mf_extensions/outerbounds/plugins/aws/__init__.pyi +1 -1
  90. metaflow-stubs/mf_extensions/outerbounds/plugins/aws/assume_role_decorator.pyi +1 -1
  91. metaflow-stubs/mf_extensions/outerbounds/plugins/card_utilities/__init__.pyi +1 -1
  92. metaflow-stubs/mf_extensions/outerbounds/plugins/card_utilities/async_cards.pyi +1 -1
  93. metaflow-stubs/mf_extensions/outerbounds/plugins/card_utilities/injector.pyi +1 -1
  94. metaflow-stubs/mf_extensions/outerbounds/plugins/checkpoint_datastores/__init__.pyi +1 -1
  95. metaflow-stubs/mf_extensions/outerbounds/plugins/checkpoint_datastores/coreweave.pyi +1 -1
  96. metaflow-stubs/mf_extensions/outerbounds/plugins/checkpoint_datastores/nebius.pyi +1 -1
  97. metaflow-stubs/mf_extensions/outerbounds/plugins/fast_bakery/__init__.pyi +1 -1
  98. metaflow-stubs/mf_extensions/outerbounds/plugins/fast_bakery/baker.pyi +2 -2
  99. metaflow-stubs/mf_extensions/outerbounds/plugins/fast_bakery/docker_environment.pyi +1 -1
  100. metaflow-stubs/mf_extensions/outerbounds/plugins/fast_bakery/fast_bakery.pyi +1 -1
  101. metaflow-stubs/mf_extensions/outerbounds/plugins/kubernetes/__init__.pyi +1 -1
  102. metaflow-stubs/mf_extensions/outerbounds/plugins/kubernetes/pod_killer.pyi +1 -1
  103. metaflow-stubs/mf_extensions/outerbounds/plugins/ollama/__init__.pyi +1 -1
  104. metaflow-stubs/mf_extensions/outerbounds/plugins/ollama/constants.pyi +1 -1
  105. metaflow-stubs/mf_extensions/outerbounds/plugins/ollama/exceptions.pyi +1 -1
  106. metaflow-stubs/mf_extensions/outerbounds/plugins/ollama/ollama.pyi +1 -1
  107. metaflow-stubs/mf_extensions/outerbounds/plugins/ollama/status_card.pyi +1 -1
  108. metaflow-stubs/mf_extensions/outerbounds/plugins/snowflake/__init__.pyi +1 -1
  109. metaflow-stubs/mf_extensions/outerbounds/plugins/snowflake/snowflake.pyi +1 -1
  110. metaflow-stubs/mf_extensions/outerbounds/profilers/__init__.pyi +1 -1
  111. metaflow-stubs/mf_extensions/outerbounds/profilers/gpu.pyi +1 -1
  112. metaflow-stubs/mf_extensions/outerbounds/remote_config.pyi +1 -1
  113. metaflow-stubs/mf_extensions/outerbounds/toplevel/__init__.pyi +1 -1
  114. metaflow-stubs/mf_extensions/outerbounds/toplevel/global_aliases_for_metaflow_package.pyi +1 -1
  115. metaflow-stubs/mf_extensions/outerbounds/toplevel/s3_proxy.pyi +1 -1
  116. metaflow-stubs/multicore_utils.pyi +1 -1
  117. metaflow-stubs/ob_internal.pyi +1 -1
  118. metaflow-stubs/packaging_sys/__init__.pyi +3 -3
  119. metaflow-stubs/packaging_sys/backend.pyi +1 -1
  120. metaflow-stubs/packaging_sys/distribution_support.pyi +3 -3
  121. metaflow-stubs/packaging_sys/tar_backend.pyi +3 -3
  122. metaflow-stubs/packaging_sys/utils.pyi +1 -1
  123. metaflow-stubs/packaging_sys/v1.pyi +1 -1
  124. metaflow-stubs/parameters.pyi +2 -2
  125. metaflow-stubs/plugins/__init__.pyi +7 -7
  126. metaflow-stubs/plugins/airflow/__init__.pyi +1 -1
  127. metaflow-stubs/plugins/airflow/airflow_utils.pyi +1 -1
  128. metaflow-stubs/plugins/airflow/exception.pyi +1 -1
  129. metaflow-stubs/plugins/airflow/sensors/__init__.pyi +1 -1
  130. metaflow-stubs/plugins/airflow/sensors/base_sensor.pyi +1 -1
  131. metaflow-stubs/plugins/airflow/sensors/external_task_sensor.pyi +1 -1
  132. metaflow-stubs/plugins/airflow/sensors/s3_sensor.pyi +1 -1
  133. metaflow-stubs/plugins/argo/__init__.pyi +1 -1
  134. metaflow-stubs/plugins/argo/argo_client.pyi +1 -1
  135. metaflow-stubs/plugins/argo/argo_events.pyi +1 -1
  136. metaflow-stubs/plugins/argo/argo_workflows.pyi +1 -1
  137. metaflow-stubs/plugins/argo/argo_workflows_decorator.pyi +2 -2
  138. metaflow-stubs/plugins/argo/argo_workflows_deployer.pyi +2 -2
  139. metaflow-stubs/plugins/argo/argo_workflows_deployer_objects.pyi +1 -1
  140. metaflow-stubs/plugins/argo/exit_hooks.pyi +1 -1
  141. metaflow-stubs/plugins/aws/__init__.pyi +1 -1
  142. metaflow-stubs/plugins/aws/aws_client.pyi +1 -1
  143. metaflow-stubs/plugins/aws/aws_utils.pyi +1 -1
  144. metaflow-stubs/plugins/aws/batch/__init__.pyi +1 -1
  145. metaflow-stubs/plugins/aws/batch/batch.pyi +1 -1
  146. metaflow-stubs/plugins/aws/batch/batch_client.pyi +1 -1
  147. metaflow-stubs/plugins/aws/batch/batch_decorator.pyi +1 -1
  148. metaflow-stubs/plugins/aws/secrets_manager/__init__.pyi +1 -1
  149. metaflow-stubs/plugins/aws/secrets_manager/aws_secrets_manager_secrets_provider.pyi +2 -2
  150. metaflow-stubs/plugins/aws/step_functions/__init__.pyi +1 -1
  151. metaflow-stubs/plugins/aws/step_functions/event_bridge_client.pyi +1 -1
  152. metaflow-stubs/plugins/aws/step_functions/schedule_decorator.pyi +1 -1
  153. metaflow-stubs/plugins/aws/step_functions/step_functions.pyi +1 -1
  154. metaflow-stubs/plugins/aws/step_functions/step_functions_client.pyi +1 -1
  155. metaflow-stubs/plugins/aws/step_functions/step_functions_deployer.pyi +1 -1
  156. metaflow-stubs/plugins/aws/step_functions/step_functions_deployer_objects.pyi +2 -2
  157. metaflow-stubs/plugins/azure/__init__.pyi +1 -1
  158. metaflow-stubs/plugins/azure/azure_credential.pyi +1 -1
  159. metaflow-stubs/plugins/azure/azure_exceptions.pyi +1 -1
  160. metaflow-stubs/plugins/azure/azure_secret_manager_secrets_provider.pyi +2 -2
  161. metaflow-stubs/plugins/azure/azure_utils.pyi +1 -1
  162. metaflow-stubs/plugins/azure/blob_service_client_factory.pyi +1 -1
  163. metaflow-stubs/plugins/azure/includefile_support.pyi +1 -1
  164. metaflow-stubs/plugins/cards/__init__.pyi +1 -1
  165. metaflow-stubs/plugins/cards/card_client.pyi +2 -2
  166. metaflow-stubs/plugins/cards/card_creator.pyi +1 -1
  167. metaflow-stubs/plugins/cards/card_datastore.pyi +1 -1
  168. metaflow-stubs/plugins/cards/card_decorator.pyi +1 -1
  169. metaflow-stubs/plugins/cards/card_modules/__init__.pyi +1 -1
  170. metaflow-stubs/plugins/cards/card_modules/basic.pyi +1 -1
  171. metaflow-stubs/plugins/cards/card_modules/card.pyi +1 -1
  172. metaflow-stubs/plugins/cards/card_modules/components.pyi +2 -2
  173. metaflow-stubs/plugins/cards/card_modules/convert_to_native_type.pyi +1 -1
  174. metaflow-stubs/plugins/cards/card_modules/renderer_tools.pyi +1 -1
  175. metaflow-stubs/plugins/cards/card_modules/test_cards.pyi +1 -1
  176. metaflow-stubs/plugins/cards/card_resolver.pyi +1 -1
  177. metaflow-stubs/plugins/cards/component_serializer.pyi +1 -1
  178. metaflow-stubs/plugins/cards/exception.pyi +1 -1
  179. metaflow-stubs/plugins/catch_decorator.pyi +1 -1
  180. metaflow-stubs/plugins/datatools/__init__.pyi +1 -1
  181. metaflow-stubs/plugins/datatools/local.pyi +1 -1
  182. metaflow-stubs/plugins/datatools/s3/__init__.pyi +1 -1
  183. metaflow-stubs/plugins/datatools/s3/s3.pyi +2 -2
  184. metaflow-stubs/plugins/datatools/s3/s3tail.pyi +1 -1
  185. metaflow-stubs/plugins/datatools/s3/s3util.pyi +1 -1
  186. metaflow-stubs/plugins/debug_logger.pyi +1 -1
  187. metaflow-stubs/plugins/debug_monitor.pyi +1 -1
  188. metaflow-stubs/plugins/environment_decorator.pyi +1 -1
  189. metaflow-stubs/plugins/events_decorator.pyi +1 -1
  190. metaflow-stubs/plugins/exit_hook/__init__.pyi +1 -1
  191. metaflow-stubs/plugins/exit_hook/exit_hook_decorator.pyi +1 -1
  192. metaflow-stubs/plugins/frameworks/__init__.pyi +1 -1
  193. metaflow-stubs/plugins/frameworks/pytorch.pyi +1 -1
  194. metaflow-stubs/plugins/gcp/__init__.pyi +1 -1
  195. metaflow-stubs/plugins/gcp/gcp_secret_manager_secrets_provider.pyi +2 -2
  196. metaflow-stubs/plugins/gcp/gs_exceptions.pyi +1 -1
  197. metaflow-stubs/plugins/gcp/gs_storage_client_factory.pyi +1 -1
  198. metaflow-stubs/plugins/gcp/gs_utils.pyi +1 -1
  199. metaflow-stubs/plugins/gcp/includefile_support.pyi +1 -1
  200. metaflow-stubs/plugins/kubernetes/__init__.pyi +1 -1
  201. metaflow-stubs/plugins/kubernetes/kube_utils.pyi +1 -1
  202. metaflow-stubs/plugins/kubernetes/kubernetes.pyi +1 -1
  203. metaflow-stubs/plugins/kubernetes/kubernetes_client.pyi +1 -1
  204. metaflow-stubs/plugins/kubernetes/kubernetes_decorator.pyi +1 -1
  205. metaflow-stubs/plugins/kubernetes/kubernetes_jobsets.pyi +1 -1
  206. metaflow-stubs/plugins/kubernetes/spot_monitor_sidecar.pyi +1 -1
  207. metaflow-stubs/plugins/ollama/__init__.pyi +2 -2
  208. metaflow-stubs/plugins/parallel_decorator.pyi +1 -1
  209. metaflow-stubs/plugins/perimeters.pyi +1 -1
  210. metaflow-stubs/plugins/project_decorator.pyi +1 -1
  211. metaflow-stubs/plugins/pypi/__init__.pyi +1 -1
  212. metaflow-stubs/plugins/pypi/conda_decorator.pyi +1 -1
  213. metaflow-stubs/plugins/pypi/conda_environment.pyi +3 -3
  214. metaflow-stubs/plugins/pypi/parsers.pyi +1 -1
  215. metaflow-stubs/plugins/pypi/pypi_decorator.pyi +1 -1
  216. metaflow-stubs/plugins/pypi/pypi_environment.pyi +1 -1
  217. metaflow-stubs/plugins/pypi/utils.pyi +1 -1
  218. metaflow-stubs/plugins/resources_decorator.pyi +1 -1
  219. metaflow-stubs/plugins/retry_decorator.pyi +1 -1
  220. metaflow-stubs/plugins/secrets/__init__.pyi +1 -1
  221. metaflow-stubs/plugins/secrets/inline_secrets_provider.pyi +2 -2
  222. metaflow-stubs/plugins/secrets/secrets_decorator.pyi +1 -1
  223. metaflow-stubs/plugins/secrets/secrets_func.pyi +1 -1
  224. metaflow-stubs/plugins/secrets/secrets_spec.pyi +1 -1
  225. metaflow-stubs/plugins/secrets/utils.pyi +1 -1
  226. metaflow-stubs/plugins/snowflake/__init__.pyi +1 -1
  227. metaflow-stubs/plugins/storage_executor.pyi +1 -1
  228. metaflow-stubs/plugins/test_unbounded_foreach_decorator.pyi +1 -1
  229. metaflow-stubs/plugins/timeout_decorator.pyi +1 -1
  230. metaflow-stubs/plugins/torchtune/__init__.pyi +1 -1
  231. metaflow-stubs/plugins/uv/__init__.pyi +1 -1
  232. metaflow-stubs/plugins/uv/uv_environment.pyi +1 -1
  233. metaflow-stubs/profilers/__init__.pyi +1 -1
  234. metaflow-stubs/pylint_wrapper.pyi +1 -1
  235. metaflow-stubs/runner/__init__.pyi +1 -1
  236. metaflow-stubs/runner/deployer.pyi +5 -5
  237. metaflow-stubs/runner/deployer_impl.pyi +1 -1
  238. metaflow-stubs/runner/metaflow_runner.pyi +2 -2
  239. metaflow-stubs/runner/nbdeploy.pyi +1 -1
  240. metaflow-stubs/runner/nbrun.pyi +1 -1
  241. metaflow-stubs/runner/subprocess_manager.pyi +1 -1
  242. metaflow-stubs/runner/utils.pyi +1 -1
  243. metaflow-stubs/system/__init__.pyi +1 -1
  244. metaflow-stubs/system/system_logger.pyi +2 -2
  245. metaflow-stubs/system/system_monitor.pyi +1 -1
  246. metaflow-stubs/tagging_util.pyi +1 -1
  247. metaflow-stubs/tuple_util.pyi +1 -1
  248. metaflow-stubs/user_configs/__init__.pyi +1 -1
  249. metaflow-stubs/user_configs/config_options.pyi +1 -1
  250. metaflow-stubs/user_configs/config_parameters.pyi +4 -4
  251. metaflow-stubs/user_decorators/__init__.pyi +1 -1
  252. metaflow-stubs/user_decorators/common.pyi +1 -1
  253. metaflow-stubs/user_decorators/mutable_flow.pyi +4 -4
  254. metaflow-stubs/user_decorators/mutable_step.pyi +4 -4
  255. metaflow-stubs/user_decorators/user_flow_decorator.pyi +3 -3
  256. metaflow-stubs/user_decorators/user_step_decorator.pyi +4 -4
  257. {ob_metaflow_stubs-6.0.5.1.dist-info → ob_metaflow_stubs-6.0.5.2.dist-info}/METADATA +1 -1
  258. ob_metaflow_stubs-6.0.5.2.dist-info/RECORD +261 -0
  259. ob_metaflow_stubs-6.0.5.1.dist-info/RECORD +0 -261
  260. {ob_metaflow_stubs-6.0.5.1.dist-info → ob_metaflow_stubs-6.0.5.2.dist-info}/WHEEL +0 -0
  261. {ob_metaflow_stubs-6.0.5.1.dist-info → ob_metaflow_stubs-6.0.5.2.dist-info}/top_level.txt +0 -0
@@ -1,7 +1,7 @@
 ######################################################################################################
 # Auto-generated Metaflow stub file #
 # MF version: 2.16.8.1+obcheckpoint(0.2.4);ob(v1) #
-# Generated on 2025-08-01T20:12:28.874985 #
+# Generated on 2025-08-04T19:06:54.653206 #
 ######################################################################################################
 
 from __future__ import annotations
@@ -40,16 +40,16 @@ from .user_decorators.user_step_decorator import StepMutator as StepMutator
 from .user_decorators.user_step_decorator import user_step_decorator as user_step_decorator
 from .user_decorators.user_flow_decorator import FlowMutator as FlowMutator
 from . import cards as cards
-from . import metaflow_git as metaflow_git
 from . import tuple_util as tuple_util
+from . import metaflow_git as metaflow_git
 from . import events as events
 from . import runner as runner
 from . import plugins as plugins
 from .mf_extensions.outerbounds.toplevel.global_aliases_for_metaflow_package import S3 as S3
 from . import includefile as includefile
 from .includefile import IncludeFile as IncludeFile
-from .plugins.pypi.parsers import conda_environment_yml_parser as conda_environment_yml_parser
 from .plugins.pypi.parsers import pyproject_toml_parser as pyproject_toml_parser
+from .plugins.pypi.parsers import conda_environment_yml_parser as conda_environment_yml_parser
 from .plugins.pypi.parsers import requirements_txt_parser as requirements_txt_parser
 from . import client as client
 from .client.core import namespace as namespace
@@ -167,6 +167,251 @@ def step(f: typing.Union[typing.Callable[[FlowSpecDerived], None], typing.Callab
     """
     ...
 
+def ollama(*, models: list, backend: str, force_pull: bool, cache_update_policy: str, force_cache_update: bool, debug: bool, circuit_breaker_config: dict, timeout_config: dict) -> typing.Callable[[typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]], typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]]:
+    """
+    This decorator is used to run Ollama APIs as Metaflow task sidecars.
+
+    User code call
+    --------------
+    @ollama(
+        models=[...],
+        ...
+    )
+
+    Valid backend options
+    ---------------------
+    - 'local': Run as a separate process on the local task machine.
+    - (TODO) 'managed': Outerbounds hosts and selects compute provider.
+    - (TODO) 'remote': Spin up separate instance to serve Ollama models.
+
+    Valid model options
+    -------------------
+    Any model here https://ollama.com/search, e.g. 'llama3.2', 'llama3.3'
+
+
+    Parameters
+    ----------
+    models: list[str]
+        List of Ollama containers running models in sidecars.
+    backend: str
+        Determines where and how to run the Ollama process.
+    force_pull: bool
+        Whether to run `ollama pull` no matter what, or first check the remote cache in Metaflow datastore for this model key.
+    cache_update_policy: str
+        Cache update policy: "auto", "force", or "never".
+    force_cache_update: bool
+        Simple override for "force" cache update policy.
+    debug: bool
+        Whether to turn on verbose debugging logs.
+    circuit_breaker_config: dict
+        Configuration for circuit breaker protection. Keys: failure_threshold, recovery_timeout, reset_timeout.
+    timeout_config: dict
+        Configuration for various operation timeouts. Keys: pull, stop, health_check, install, server_startup.
+    """
+    ...
+
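The pull/cache knobs above interact: `force_pull` bypasses the datastore cache check entirely, while `cache_update_policy` (with `force_cache_update` as a blunt override) decides whether the cache is refreshed afterwards. A stdlib-only sketch of one plausible reading of that decision table — the helper names and arguments are illustrative, not part of the stubs:

```python
def should_pull(model_in_cache: bool, force_pull: bool) -> bool:
    """Run `ollama pull` unless the datastore cache already holds the model."""
    return force_pull or not model_in_cache

def should_update_cache(policy: str, force_cache_update: bool, model_changed: bool) -> bool:
    """Map the documented cache_update_policy values onto a yes/no decision."""
    if force_cache_update or policy == "force":
        return True
    if policy == "never":
        return False
    if policy == "auto":
        # Refresh only when the pulled model differs from the cached copy.
        return model_changed
    raise ValueError(f"unknown cache_update_policy: {policy!r}")

# With the conservative settings, a cached model is reused and left alone.
print(should_pull(model_in_cache=True, force_pull=False))                          # False
print(should_update_cache("auto", force_cache_update=False, model_changed=False))  # False
```

Under this reading, the default behavior is cheap (reuse the cached model), and `force_pull=True` plus `cache_update_policy="force"` gives a guaranteed-fresh model at the cost of a pull on every task.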
213
+ def s3_proxy(*, integration_name: typing.Optional[str] = None, write_mode: typing.Optional[str] = None, debug: typing.Optional[bool] = None) -> typing.Callable[[typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]], typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]]:
214
+ """
215
+ S3 Proxy decorator for routing S3 requests through a local proxy service.
216
+
217
+
218
+ Parameters
219
+ ----------
220
+ integration_name : str, optional
221
+ Name of the S3 proxy integration. If not specified, will use the only
222
+ available S3 proxy integration in the namespace (fails if multiple exist).
223
+ write_mode : str, optional
224
+ The desired behavior during write operations to target (origin) S3 bucket.
225
+ allowed options are:
226
+ "origin-and-cache" -> write to both the target S3 bucket and local object
227
+ storage
228
+ "origin" -> only write to the target S3 bucket
229
+ "cache" -> only write to the object storage service used for caching
230
+ debug : bool, optional
231
+ Enable debug logging for proxy operations.
232
+ """
233
+ ...
234
+
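The three documented `write_mode` values reduce to a fan-out decision over two destinations, the origin bucket and the caching object store. A self-contained sketch of that routing table (the proxy's real internals are not public; the helper here is purely illustrative):

```python
def write_targets(write_mode: str) -> set:
    """Return which stores receive a write under each documented mode."""
    routing = {
        "origin-and-cache": {"origin", "cache"},  # write through to both
        "origin": {"origin"},                     # bypass the cache on writes
        "cache": {"cache"},                       # never touch the origin bucket
    }
    try:
        return routing[write_mode]
    except KeyError:
        raise ValueError(f"unsupported write_mode: {write_mode!r}") from None

print(sorted(write_targets("origin-and-cache")))  # ['cache', 'origin']
```

"origin-and-cache" is the only mode in which a write is immediately visible both to readers going through the proxy and to readers hitting the origin bucket directly.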
+@typing.overload
+def checkpoint(*, load_policy: str = 'fresh', temp_dir_root: str = None) -> typing.Callable[[typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]], typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]]:
+    """
+    Enables checkpointing for a step.
+
+    > Examples
+
+    - Saving Checkpoints
+
+    ```python
+    @checkpoint
+    @step
+    def train(self):
+        model = create_model(self.parameters, checkpoint_path = None)
+        for i in range(self.epochs):
+            # some training logic
+            loss = model.train(self.dataset)
+            if i % 10 == 0:
+                model.save(
+                    current.checkpoint.directory,
+                )
+                # saves the contents of the `current.checkpoint.directory` as a checkpoint
+                # and returns a reference dictionary to the checkpoint saved in the datastore
+                self.latest_checkpoint = current.checkpoint.save(
+                    name="epoch_checkpoint",
+                    metadata={
+                        "epoch": i,
+                        "loss": loss,
+                    }
+                )
+    ```
+
+    - Using Loaded Checkpoints
+
+    ```python
+    @retry(times=3)
+    @checkpoint
+    @step
+    def train(self):
+        # Assume that the task has restarted and the previous attempt of the task
+        # saved a checkpoint
+        checkpoint_path = None
+        if current.checkpoint.is_loaded:  # Check if a checkpoint is loaded
+            print("Loaded checkpoint from the previous attempt")
+            checkpoint_path = current.checkpoint.directory
+
+        model = create_model(self.parameters, checkpoint_path = checkpoint_path)
+        for i in range(self.epochs):
+            ...
+    ```
+
+
+    Parameters
+    ----------
+    load_policy : str, default: "fresh"
+        The policy for loading the checkpoint. The following policies are supported:
+        - "eager": Loads the latest available checkpoint within the namespace.
+          With this mode, the latest checkpoint written by any previous task (which can even be from a different run) of the step
+          will be loaded at the start of the task.
+        - "none": Do not load any checkpoint.
+        - "fresh": Loads the latest checkpoint created within the running Task.
+          This mode helps load checkpoints across various retry attempts of the same task.
+          With this mode, no checkpoint will be loaded at the start of a task, but any checkpoints
+          created within the task will be loaded when the task retries execution on failure.
+
+    temp_dir_root : str, default: None
+        The root directory under which `current.checkpoint.directory` will be created.
+    """
+    ...
+
+@typing.overload
+def checkpoint(f: typing.Callable[[FlowSpecDerived, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, StepFlag], None]:
+    ...
+
+@typing.overload
+def checkpoint(f: typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]:
+    ...
+
+def checkpoint(f: typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None], None] = None, *, load_policy: str = 'fresh', temp_dir_root: str = None):
+    """
+    Enables checkpointing for a step.
+
+    > Examples
+
+    - Saving Checkpoints
+
+    ```python
+    @checkpoint
+    @step
+    def train(self):
+        model = create_model(self.parameters, checkpoint_path = None)
+        for i in range(self.epochs):
+            # some training logic
+            loss = model.train(self.dataset)
+            if i % 10 == 0:
+                model.save(
+                    current.checkpoint.directory,
+                )
+                # saves the contents of the `current.checkpoint.directory` as a checkpoint
+                # and returns a reference dictionary to the checkpoint saved in the datastore
+                self.latest_checkpoint = current.checkpoint.save(
+                    name="epoch_checkpoint",
+                    metadata={
+                        "epoch": i,
+                        "loss": loss,
+                    }
+                )
+    ```
+
+    - Using Loaded Checkpoints
+
+    ```python
+    @retry(times=3)
+    @checkpoint
+    @step
+    def train(self):
+        # Assume that the task has restarted and the previous attempt of the task
+        # saved a checkpoint
+        checkpoint_path = None
+        if current.checkpoint.is_loaded:  # Check if a checkpoint is loaded
+            print("Loaded checkpoint from the previous attempt")
+            checkpoint_path = current.checkpoint.directory
+
+        model = create_model(self.parameters, checkpoint_path = checkpoint_path)
+        for i in range(self.epochs):
+            ...
+    ```
+
+
+    Parameters
+    ----------
+    load_policy : str, default: "fresh"
+        The policy for loading the checkpoint. The following policies are supported:
+        - "eager": Loads the latest available checkpoint within the namespace.
+          With this mode, the latest checkpoint written by any previous task (which can even be from a different run) of the step
+          will be loaded at the start of the task.
+        - "none": Do not load any checkpoint.
+        - "fresh": Loads the latest checkpoint created within the running Task.
+          This mode helps load checkpoints across various retry attempts of the same task.
+          With this mode, no checkpoint will be loaded at the start of a task, but any checkpoints
+          created within the task will be loaded when the task retries execution on failure.
+
+    temp_dir_root : str, default: None
+        The root directory under which `current.checkpoint.directory` will be created.
+    """
+    ...
+
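The three `load_policy` values differ only in which prior checkpoint, if any, is visible when a task starts. That rule can be restated as a small pure function — the helper and its arguments are hypothetical, a compact paraphrase of the docstring rather than anything in the stubs:

```python
def checkpoint_to_load(load_policy, latest_in_namespace, latest_in_current_task):
    """Which prior checkpoint (if any) a task starts from, per documented policy."""
    if load_policy == "eager":
        # Newest checkpoint written by any previous task of this step,
        # possibly from a different run in the same namespace.
        return latest_in_namespace
    if load_policy == "fresh":
        # Only checkpoints written by earlier attempts of this same task,
        # i.e. something to load appears only after a retry.
        return latest_in_current_task
    if load_policy == "none":
        return None
    raise ValueError(f"unknown load_policy: {load_policy!r}")

# First attempt under the default "fresh" policy: nothing is loaded yet.
print(checkpoint_to_load("fresh", "ckpt-from-run-41", None))  # None
```

This is why the docstring's second example pairs `@checkpoint` with `@retry`: under "fresh", `current.checkpoint.is_loaded` can only be true on a retry attempt.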
+@typing.overload
+def environment(*, vars: typing.Dict[str, str] = {}) -> typing.Callable[[typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]], typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]]:
+    """
+    Specifies environment variables to be set prior to the execution of a step.
+
+
+    Parameters
+    ----------
+    vars : Dict[str, str], default {}
+        Dictionary of environment variables to set.
+    """
+    ...
+
+@typing.overload
+def environment(f: typing.Callable[[FlowSpecDerived, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, StepFlag], None]:
+    ...
+
+@typing.overload
+def environment(f: typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]:
+    ...
+
+def environment(f: typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None], None] = None, *, vars: typing.Dict[str, str] = {}):
+    """
+    Specifies environment variables to be set prior to the execution of a step.
+
+
+    Parameters
+    ----------
+    vars : Dict[str, str], default {}
+        Dictionary of environment variables to set.
+    """
+    ...
+
 @typing.overload
 def catch(*, var: typing.Optional[str] = None, print_exception: bool = True) -> typing.Callable[[typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]], typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]]:
     """
@@ -218,42 +463,68 @@ def catch(f: typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], ty
218
463
  """
219
464
  ...
220
465
 
221
- @typing.overload
222
- def secrets(*, sources: typing.List[typing.Union[str, typing.Dict[str, typing.Any]]] = [], role: typing.Optional[str] = None) -> typing.Callable[[typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]], typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]]:
466
+ def nvidia(*, gpu: int, gpu_type: str, queue_timeout: int) -> typing.Callable[[typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]], typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]]:
223
467
  """
224
- Specifies secrets to be retrieved and injected as environment variables prior to
225
- the execution of a step.
468
+ Specifies that this step should execute on DGX cloud.
226
469
 
227
470
 
228
471
  Parameters
229
472
  ----------
230
- sources : List[Union[str, Dict[str, Any]]], default: []
231
- List of secret specs, defining how the secrets are to be retrieved
232
- role : str, optional, default: None
233
- Role to use for fetching secrets
473
+ gpu : int
474
+ Number of GPUs to use.
475
+ gpu_type : str
476
+ Type of Nvidia GPU to use.
477
+ queue_timeout : int
478
+ Time to keep the job in NVCF's queue.
234
479
  """
235
480
  ...
236
481
 
237
482
  @typing.overload
238
- def secrets(f: typing.Callable[[FlowSpecDerived, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, StepFlag], None]:
483
+ def card(*, type: str = 'default', id: typing.Optional[str] = None, options: typing.Dict[str, typing.Any] = {}, timeout: int = 45) -> typing.Callable[[typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]], typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]]:
484
+ """
485
+ Creates a human-readable report, a Metaflow Card, after this step completes.
486
+
487
+ Note that you may add multiple `@card` decorators in a step with different parameters.
488
+
489
+
490
+ Parameters
491
+ ----------
492
+ type : str, default 'default'
493
+ Card type.
494
+ id : str, optional, default None
495
+ If multiple cards are present, use this id to identify this card.
496
+ options : Dict[str, Any], default {}
497
+ Options passed to the card. The contents depend on the card type.
498
+ timeout : int, default 45
499
+ Interrupt reporting if it takes more than this many seconds.
500
+ """
239
501
  ...
240
502
 
241
503
  @typing.overload
242
- def secrets(f: typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]:
504
+ def card(f: typing.Callable[[FlowSpecDerived, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, StepFlag], None]:
243
505
  ...
244
506
 
245
- def secrets(f: typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None], None] = None, *, sources: typing.List[typing.Union[str, typing.Dict[str, typing.Any]]] = [], role: typing.Optional[str] = None):
507
+ @typing.overload
508
+ def card(f: typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]:
509
+ ...
510
+
511
+ def card(f: typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None], None] = None, *, type: str = 'default', id: typing.Optional[str] = None, options: typing.Dict[str, typing.Any] = {}, timeout: int = 45):
246
512
  """
247
- Specifies secrets to be retrieved and injected as environment variables prior to
248
- the execution of a step.
513
+ Creates a human-readable report, a Metaflow Card, after this step completes.
514
+
515
+ Note that you may add multiple `@card` decorators in a step with different parameters.
249
516
 
250
517
 
251
518
  Parameters
252
519
  ----------
253
- sources : List[Union[str, Dict[str, Any]]], default: []
254
- List of secret specs, defining how the secrets are to be retrieved
255
- role : str, optional, default: None
256
- Role to use for fetching secrets
520
+ type : str, default 'default'
521
+ Card type.
522
+ id : str, optional, default None
523
+ If multiple cards are present, use this id to identify this card.
524
+ options : Dict[str, Any], default {}
525
+ Options passed to the card. The contents depend on the card type.
526
+ timeout : int, default 45
527
+ Interrupt reporting if it takes more than this many seconds.
257
528
  """
258
529
  ...
259
530
 
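The `@card` stub above says a human-readable report is produced after the step completes, keyed by an optional `id`. The sketch below imitates that flow in plain Python by rendering a dict of step state to an HTML file once the step body returns; the file name, `state` argument, and layout are assumptions for illustration only.

```python
import functools
import html
import os
import tempfile

# Hedged stand-in for @card: after the step body finishes, write a
# simple HTML table of the step's state, named after the card `id`.
def card(*, type="default", id=None, timeout=45):
    def decorator(step_func):
        @functools.wraps(step_func)
        def wrapper(state):
            result = step_func(state)
            rows = "".join(
                f"<tr><td>{html.escape(k)}</td><td>{html.escape(str(v))}</td></tr>"
                for k, v in state.items()
            )
            name = f"{id or step_func.__name__}_card.html"
            path = os.path.join(tempfile.gettempdir(), name)
            with open(path, "w") as f:
                f.write(f"<table>{rows}</table>")
            return result
        return wrapper
    return decorator

@card(type="default", id="train_report")
def train(state):
    state["loss"] = 0.05

state = {}
train(state)
```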
@@ -274,22 +545,17 @@ def fast_bakery_internal(f: typing.Union[typing.Callable[[FlowSpecDerived, StepF
274
545
  """
275
546
  ...
276
547
 
277
- @typing.overload
278
- def test_append_card(f: typing.Callable[[FlowSpecDerived, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, StepFlag], None]:
279
- """
280
- A simple decorator that demonstrates using CardDecoratorInjector
281
- to inject a card and render simple markdown content.
282
- """
283
- ...
284
-
285
- @typing.overload
286
- def test_append_card(f: typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]:
287
- ...
288
-
289
- def test_append_card(f: typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None], None] = None):
548
+ def nvct(*, gpu: int, gpu_type: str) -> typing.Callable[[typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]], typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]]:
290
549
  """
291
- A simple decorator that demonstrates using CardDecoratorInjector
292
- to inject a card and render simple markdown content.
550
+ Specifies that this step should execute on DGX cloud.
551
+
552
+
553
+ Parameters
554
+ ----------
555
+ gpu : int
556
+ Number of GPUs to use.
557
+ gpu_type : str
558
+ Type of Nvidia GPU to use.
293
559
  """
294
560
  ...
295
561
 
@@ -344,182 +610,226 @@ def pypi(f: typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typ
344
610
  """
345
611
  ...
346
612
 
347
- def vllm(*, model: str, backend: str, openai_api_server: bool, debug: bool, card_refresh_interval: int, max_retries: int, retry_alert_frequency: int, engine_args: dict) -> typing.Callable[[typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]], typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]]:
613
+ @typing.overload
614
+ def secrets(*, sources: typing.List[typing.Union[str, typing.Dict[str, typing.Any]]] = [], role: typing.Optional[str] = None) -> typing.Callable[[typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]], typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]]:
348
615
  """
349
- This decorator is used to run vllm APIs as Metaflow task sidecars.
350
-
351
- User code call
352
- --------------
353
- @vllm(
354
- model="...",
355
- ...
356
- )
357
-
358
- Valid backend options
359
- ---------------------
360
- - 'local': Run as a separate process on the local task machine.
361
-
362
- Valid model options
363
- -------------------
364
- Any HuggingFace model identifier, e.g. 'meta-llama/Llama-3.2-1B'
365
-
366
- NOTE: vLLM's OpenAI-compatible server serves ONE model per server instance.
367
- If you need multiple models, you must create multiple @vllm decorators.
616
+ Specifies secrets to be retrieved and injected as environment variables prior to
617
+ the execution of a step.
368
618
 
369
619
 
370
620
  Parameters
371
621
  ----------
372
- model: str
373
- HuggingFace model identifier to be served by vLLM.
374
- backend: str
375
- Determines where and how to run the vLLM process.
376
- openai_api_server: bool
377
- Whether to use OpenAI-compatible API server mode (subprocess) instead of native engine.
378
- Default is False (uses native engine).
379
- Set to True for backward compatibility with existing code.
380
- debug: bool
381
- Whether to turn on verbose debugging logs.
382
- card_refresh_interval: int
383
- Interval in seconds for refreshing the vLLM status card.
384
- Only used when openai_api_server=True.
385
- max_retries: int
386
- Maximum number of retries checking for vLLM server startup.
387
- Only used when openai_api_server=True.
388
- retry_alert_frequency: int
389
- Frequency of alert logs for vLLM server startup retries.
390
- Only used when openai_api_server=True.
391
- engine_args : dict
392
- Additional keyword arguments to pass to the vLLM engine.
393
- For example, `tensor_parallel_size=2`.
622
+ sources : List[Union[str, Dict[str, Any]]], default: []
623
+ List of secret specs, defining how the secrets are to be retrieved
624
+ role : str, optional, default: None
625
+ Role to use for fetching secrets
394
626
  """
395
627
  ...
396
628
 
397
629
  @typing.overload
398
- def model(*, load: typing.Union[typing.List[str], str, typing.List[typing.Tuple[str, typing.Optional[str]]]] = None, temp_dir_root: str = None) -> typing.Callable[[typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]], typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]]:
630
+ def secrets(f: typing.Callable[[FlowSpecDerived, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, StepFlag], None]:
631
+ ...
632
+
633
+ @typing.overload
634
+ def secrets(f: typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]:
635
+ ...
636
+
637
+ def secrets(f: typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None], None] = None, *, sources: typing.List[typing.Union[str, typing.Dict[str, typing.Any]]] = [], role: typing.Optional[str] = None):
399
638
  """
400
- Enables loading / saving of models within a step.
401
-
402
- > Examples
403
- - Saving Models
404
- ```python
405
- @model
406
- @step
407
- def train(self):
408
- # current.model.save returns a dictionary reference to the model saved
409
- self.my_model = current.model.save(
410
- path_to_my_model,
411
- label="my_model",
412
- metadata={
413
- "epochs": 10,
414
- "batch-size": 32,
415
- "learning-rate": 0.001,
416
- }
417
- )
418
- self.next(self.test)
419
-
420
- @model(load="my_model")
421
- @step
422
- def test(self):
423
- # `current.model.loaded` returns a dictionary of the loaded models
424
- # where the key is the name of the artifact and the value is the path to the model
425
- print(os.listdir(current.model.loaded["my_model"]))
426
- self.next(self.end)
427
- ```
428
-
429
- - Loading models
430
- ```python
431
- @step
432
- def train(self):
433
- # current.model.load returns the path to the model loaded
434
- checkpoint_path = current.model.load(
435
- self.checkpoint_key,
436
- )
437
- model_path = current.model.load(
438
- self.model,
439
- )
440
- self.next(self.test)
441
- ```
639
+ Specifies secrets to be retrieved and injected as environment variables prior to
640
+ the execution of a step.
442
641
 
443
642
 
444
643
  Parameters
445
644
  ----------
446
- load : Union[List[str],str,List[Tuple[str,Union[str,None]]]], default: None
447
- Artifact name/s referencing the models/checkpoints to load. Artifact names refer to the names of the instance variables set to `self`.
448
- These artifact names give to `load` be reference objects or reference `key` string's from objects created by `current.checkpoint` / `current.model` / `current.huggingface_hub`.
449
- If a list of tuples is provided, the first element is the artifact name and the second element is the path the artifact needs be unpacked on
450
- the local filesystem. If the second element is None, the artifact will be unpacked in the current working directory.
451
- If a string is provided, then the artifact corresponding to that name will be loaded in the current working directory.
452
-
453
- temp_dir_root : str, default: None
454
- The root directory under which `current.model.loaded` will store loaded models
645
+ sources : List[Union[str, Dict[str, Any]]], default: []
646
+ List of secret specs, defining how the secrets are to be retrieved
647
+ role : str, optional, default: None
648
+ Role to use for fetching secrets
455
649
  """
456
650
  ...
457
651
 
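The `@secrets` stub above documents that each entry in `sources` is a secret spec (a string or a dict) whose resolved values are injected as environment variables before the step runs. A hedged sketch of that contract, with an in-memory dict standing in for a real secrets manager:

```python
import functools
import os

# Illustrative stand-in for a secrets backend; not a real Metaflow API.
FAKE_SECRETS_STORE = {"db-creds": {"DB_USER": "svc", "DB_PASS": "hunter2"}}

# Hedged stand-in for @secrets: resolve each spec in `sources` and
# inject the resulting key-value pairs into the environment before
# the step body executes.
def secrets(*, sources=None, role=None):
    sources = sources or []
    def decorator(step_func):
        @functools.wraps(step_func)
        def wrapper(*args, **kwargs):
            for spec in sources:
                name = spec if isinstance(spec, str) else spec["id"]
                os.environ.update(FAKE_SECRETS_STORE[name])
            return step_func(*args, **kwargs)
        return wrapper
    return decorator

@secrets(sources=["db-creds"])
def query():
    return os.environ["DB_USER"], os.environ["DB_PASS"]

print(query())
```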
458
652
  @typing.overload
459
- def model(f: typing.Callable[[FlowSpecDerived, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, StepFlag], None]:
653
+ def test_append_card(f: typing.Callable[[FlowSpecDerived, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, StepFlag], None]:
654
+ """
655
+ A simple decorator that demonstrates using CardDecoratorInjector
656
+ to inject a card and render simple markdown content.
657
+ """
460
658
  ...
461
659
 
462
660
  @typing.overload
463
- def model(f: typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]:
661
+ def test_append_card(f: typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]:
464
662
  ...
465
663
 
466
- def model(f: typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None], None] = None, *, load: typing.Union[typing.List[str], str, typing.List[typing.Tuple[str, typing.Optional[str]]]] = None, temp_dir_root: str = None):
664
+ def test_append_card(f: typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None], None] = None):
467
665
  """
468
- Enables loading / saving of models within a step.
469
-
470
- > Examples
471
- - Saving Models
472
- ```python
473
- @model
474
- @step
475
- def train(self):
476
- # current.model.save returns a dictionary reference to the model saved
477
- self.my_model = current.model.save(
478
- path_to_my_model,
479
- label="my_model",
480
- metadata={
481
- "epochs": 10,
482
- "batch-size": 32,
483
- "learning-rate": 0.001,
484
- }
485
- )
486
- self.next(self.test)
487
-
488
- @model(load="my_model")
489
- @step
490
- def test(self):
491
- # `current.model.loaded` returns a dictionary of the loaded models
492
- # where the key is the name of the artifact and the value is the path to the model
493
- print(os.listdir(current.model.loaded["my_model"]))
494
- self.next(self.end)
495
- ```
496
-
497
- - Loading models
498
- ```python
499
- @step
500
- def train(self):
501
- # current.model.load returns the path to the model loaded
502
- checkpoint_path = current.model.load(
503
- self.checkpoint_key,
504
- )
505
- model_path = current.model.load(
506
- self.model,
507
- )
508
- self.next(self.test)
509
- ```
666
+ A simple decorator that demonstrates using CardDecoratorInjector
667
+ to inject a card and render simple markdown content.
668
+ """
669
+ ...
670
+
671
+ def kubernetes(*, cpu: int = 1, memory: int = 4096, disk: int = 10240, image: typing.Optional[str] = None, image_pull_policy: str = 'KUBERNETES_IMAGE_PULL_POLICY', image_pull_secrets: typing.List[str] = [], service_account: str = 'METAFLOW_KUBERNETES_SERVICE_ACCOUNT', secrets: typing.Optional[typing.List[str]] = None, node_selector: typing.Union[typing.Dict[str, str], str, None] = None, namespace: str = 'METAFLOW_KUBERNETES_NAMESPACE', gpu: typing.Optional[int] = None, gpu_vendor: str = 'KUBERNETES_GPU_VENDOR', tolerations: typing.List[typing.Dict[str, str]] = [], labels: typing.Dict[str, str] = 'METAFLOW_KUBERNETES_LABELS', annotations: typing.Dict[str, str] = 'METAFLOW_KUBERNETES_ANNOTATIONS', use_tmpfs: bool = False, tmpfs_tempdir: bool = True, tmpfs_size: typing.Optional[int] = None, tmpfs_path: typing.Optional[str] = '/metaflow_temp', persistent_volume_claims: typing.Optional[typing.Dict[str, str]] = None, shared_memory: typing.Optional[int] = None, port: typing.Optional[int] = None, compute_pool: typing.Optional[str] = None, hostname_resolution_timeout: int = 600, qos: str = 'Burstable', security_context: typing.Optional[typing.Dict[str, typing.Any]] = None) -> typing.Callable[[typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]], typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]]:
672
+ """
673
+ Specifies that this step should execute on Kubernetes.
510
674
 
511
675
 
512
676
  Parameters
513
677
  ----------
514
- load : Union[List[str],str,List[Tuple[str,Union[str,None]]]], default: None
515
- Artifact name/s referencing the models/checkpoints to load. Artifact names refer to the names of the instance variables set to `self`.
516
- These artifact names give to `load` be reference objects or reference `key` string's from objects created by `current.checkpoint` / `current.model` / `current.huggingface_hub`.
517
- If a list of tuples is provided, the first element is the artifact name and the second element is the path the artifact needs be unpacked on
518
- the local filesystem. If the second element is None, the artifact will be unpacked in the current working directory.
519
- If a string is provided, then the artifact corresponding to that name will be loaded in the current working directory.
678
+ cpu : int, default 1
679
+ Number of CPUs required for this step. If `@resources` is
680
+ also present, the maximum value from all decorators is used.
681
+ memory : int, default 4096
682
+ Memory size (in MB) required for this step. If
683
+ `@resources` is also present, the maximum value from all decorators is
684
+ used.
685
+ disk : int, default 10240
686
+ Disk size (in MB) required for this step. If
687
+ `@resources` is also present, the maximum value from all decorators is
688
+ used.
689
+ image : str, optional, default None
690
+ Docker image to use when launching on Kubernetes. If not specified, and
691
+ METAFLOW_KUBERNETES_CONTAINER_IMAGE is specified, that image is used. If
692
+ not, a default Docker image mapping to the current version of Python is used.
693
+ image_pull_policy: str, default KUBERNETES_IMAGE_PULL_POLICY
694
+ If given, the imagePullPolicy to be applied to the Docker image of the step.
695
+ image_pull_secrets: List[str], default []
696
+ The default is extracted from METAFLOW_KUBERNETES_IMAGE_PULL_SECRETS.
697
+ Kubernetes image pull secrets to use when pulling container images
698
+ in Kubernetes.
699
+ service_account : str, default METAFLOW_KUBERNETES_SERVICE_ACCOUNT
700
+ Kubernetes service account to use when launching pod in Kubernetes.
701
+ secrets : List[str], optional, default None
702
+ Kubernetes secrets to use when launching pod in Kubernetes. These
703
+ secrets are in addition to the ones defined in `METAFLOW_KUBERNETES_SECRETS`
704
+ in Metaflow configuration.
705
+ node_selector: Union[Dict[str,str], str], optional, default None
706
+ Kubernetes node selector(s) to apply to the pod running the task.
707
+ Can be passed in as a comma separated string of values e.g.
708
+ 'kubernetes.io/os=linux,kubernetes.io/arch=amd64' or as a dictionary
709
+ {'kubernetes.io/os': 'linux', 'kubernetes.io/arch': 'amd64'}
710
+ namespace : str, default METAFLOW_KUBERNETES_NAMESPACE
711
+ Kubernetes namespace to use when launching pod in Kubernetes.
712
+ gpu : int, optional, default None
713
+ Number of GPUs required for this step. A value of zero implies that
714
+ the scheduled node should not have GPUs.
715
+ gpu_vendor : str, default KUBERNETES_GPU_VENDOR
716
+ The vendor of the GPUs to be used for this step.
717
+ tolerations : List[Dict[str,str]], default []
718
+ The default is extracted from METAFLOW_KUBERNETES_TOLERATIONS.
719
+ Kubernetes tolerations to use when launching pod in Kubernetes.
720
+ labels: Dict[str, str], default: METAFLOW_KUBERNETES_LABELS
721
+ Kubernetes labels to use when launching pod in Kubernetes.
722
+ annotations: Dict[str, str], default: METAFLOW_KUBERNETES_ANNOTATIONS
723
+ Kubernetes annotations to use when launching pod in Kubernetes.
724
+ use_tmpfs : bool, default False
725
+ This enables an explicit tmpfs mount for this step.
726
+ tmpfs_tempdir : bool, default True
727
+ Sets METAFLOW_TEMPDIR to tmpfs_path if set for this step.
728
+ tmpfs_size : int, optional, default: None
729
+ The value for the size (in MiB) of the tmpfs mount for this step.
730
+ This parameter maps to the `--tmpfs` option in Docker. Defaults to 50% of the
731
+ memory allocated for this step.
732
+ tmpfs_path : str, optional, default /metaflow_temp
733
+ Path to tmpfs mount for this step.
734
+ persistent_volume_claims : Dict[str, str], optional, default None
735
+ A map (dictionary) of persistent volumes to be mounted to the pod for this step. The map is from persistent
736
+ volumes to the path to which the volume is to be mounted, e.g., `{'pvc-name': '/path/to/mount/on'}`.
737
+ shared_memory: int, optional
738
+ Shared memory size (in MiB) required for this step
739
+ port: int, optional
740
+ Port number to specify in the Kubernetes job object
741
+ compute_pool : str, optional, default None
742
+ Compute pool to be used for this step.
743
+ If not specified, any accessible compute pool within the perimeter is used.
744
+ hostname_resolution_timeout: int, default 10 * 60
745
+ Timeout in seconds for the worker tasks in the gang scheduled cluster to resolve the hostname of the control task.
746
+ Only applicable when @parallel is used.
747
+ qos: str, default: Burstable
748
+ Quality of Service class to assign to the pod. Supported values are: Guaranteed, Burstable, BestEffort
520
749
 
521
- temp_dir_root : str, default: None
522
- The root directory under which `current.model.loaded` will store loaded models
750
+ security_context: Dict[str, Any], optional, default None
751
+ Container security context. Applies to the task container. Allows the following keys:
752
+ - privileged: bool, optional, default None
753
+ - allow_privilege_escalation: bool, optional, default None
754
+ - run_as_user: int, optional, default None
755
+ - run_as_group: int, optional, default None
756
+ - run_as_non_root: bool, optional, default None
757
+ """
758
+ ...
759
+
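The `@kubernetes` docstring above states that when `@resources` is also present, the maximum value from all decorators is used for cpu, memory, and disk. That merge rule can be sketched directly (the function name and dict shape are illustrative, not a Metaflow API):

```python
# Hedged sketch of the documented merge rule: for each of cpu/memory/disk,
# take the maximum value requested by any decorator, ignoring unset (None) values.
def effective_resources(kubernetes_attrs, resources_attrs):
    merged = {}
    for key in ("cpu", "memory", "disk"):
        values = [d.get(key) for d in (kubernetes_attrs, resources_attrs)]
        merged[key] = max(v for v in values if v is not None)
    return merged

print(effective_resources(
    {"cpu": 1, "memory": 4096, "disk": 10240},   # @kubernetes defaults
    {"cpu": 4, "memory": 16384, "disk": None},   # @resources request
))
```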
760
+ @typing.overload
761
+ def nebius_s3_proxy(f: typing.Callable[[FlowSpecDerived, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, StepFlag], None]:
762
+ """
763
+ Nebius-specific S3 Proxy decorator for routing S3 requests through a local proxy service.
764
+ It exists to make it easier for users to know that this decorator should only be used with
765
+ a Neo Cloud like Nebius.
766
+ """
767
+ ...
768
+
769
+ @typing.overload
770
+ def nebius_s3_proxy(f: typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]:
771
+ ...
772
+
773
+ def nebius_s3_proxy(f: typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None], None] = None):
774
+ """
775
+ Nebius-specific S3 Proxy decorator for routing S3 requests through a local proxy service.
776
+ It exists to make it easier for users to know that this decorator should only be used with
777
+ a Neo Cloud like Nebius.
778
+ """
779
+ ...
780
+
781
+ @typing.overload
782
+ def retry(*, times: int = 3, minutes_between_retries: int = 2) -> typing.Callable[[typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]], typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]]:
783
+ """
784
+ Specifies the number of times the task corresponding
785
+ to a step needs to be retried.
786
+
787
+ This decorator is useful for handling transient errors, such as networking issues.
788
+ If your task contains operations that can't be retried safely, e.g. database updates,
789
+ it is advisable to annotate it with `@retry(times=0)`.
790
+
791
+ This can be used in conjunction with the `@catch` decorator. The `@catch`
792
+ decorator will execute a no-op task after all retries have been exhausted,
793
+ ensuring that the flow execution can continue.
794
+
795
+
796
+ Parameters
797
+ ----------
798
+ times : int, default 3
799
+ Number of times to retry this task.
800
+ minutes_between_retries : int, default 2
801
+ Number of minutes between retries.
802
+ """
803
+ ...
804
+
805
+ @typing.overload
806
+ def retry(f: typing.Callable[[FlowSpecDerived, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, StepFlag], None]:
807
+ ...
808
+
809
+ @typing.overload
810
+ def retry(f: typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]:
811
+ ...
812
+
813
+ def retry(f: typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None], None] = None, *, times: int = 3, minutes_between_retries: int = 2):
814
+ """
815
+ Specifies the number of times the task corresponding
816
+ to a step needs to be retried.
817
+
818
+ This decorator is useful for handling transient errors, such as networking issues.
819
+ If your task contains operations that can't be retried safely, e.g. database updates,
820
+ it is advisable to annotate it with `@retry(times=0)`.
821
+
822
+ This can be used in conjunction with the `@catch` decorator. The `@catch`
823
+ decorator will execute a no-op task after all retries have been exhausted,
824
+ ensuring that the flow execution can continue.
825
+
826
+
827
+ Parameters
828
+ ----------
829
+ times : int, default 3
830
+ Number of times to retry this task.
831
+ minutes_between_retries : int, default 2
832
+ Number of minutes between retries.
523
833
  """
524
834
  ...
525
835
 
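The `@retry` stub above documents re-running a task after transient failures, with `@catch` taking over once retries are exhausted. A minimal plain-Python sketch of the retry loop (the sleep between attempts is omitted so the example runs instantly; this is not Metaflow's scheduler):

```python
import functools

# Hedged stand-in for @retry: attempt the step body up to `times` extra
# times, re-raising only after the final attempt fails.
def retry(*, times=3, minutes_between_retries=2):
    def decorator(step_func):
        @functools.wraps(step_func)
        def wrapper(*args, **kwargs):
            for attempt in range(times + 1):
                try:
                    return step_func(*args, **kwargs)
                except Exception:
                    if attempt == times:
                        raise  # retries exhausted; @catch (if present) would handle this
                    # a real scheduler would wait `minutes_between_retries` here
        return wrapper
    return decorator

calls = []

@retry(times=3)
def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("transient error")
    return "ok"

print(flaky())
```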
@@ -603,488 +913,45 @@ def huggingface_hub(*, temp_dir_root: typing.Optional[str] = None, load: typing.
603
913
  """
604
914
  ...
605
915
 
606
- def nvidia(*, gpu: int, gpu_type: str, queue_timeout: int) -> typing.Callable[[typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]], typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]]:
916
+ @typing.overload
917
+ def coreweave_s3_proxy(f: typing.Callable[[FlowSpecDerived, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, StepFlag], None]:
607
918
  """
608
- Specifies that this step should execute on DGX cloud.
609
-
610
-
611
- Parameters
612
- ----------
613
- gpu : int
614
- Number of GPUs to use.
615
- gpu_type : str
616
- Type of Nvidia GPU to use.
617
- queue_timeout : int
618
- Time to keep the job in NVCF's queue.
919
+ CoreWeave-specific S3 Proxy decorator for routing S3 requests through a local proxy service.
920
+ It exists to make it easier for users to know that this decorator should only be used with
921
+ a Neo Cloud like CoreWeave.
619
922
  """
620
923
  ...
621
924
 
622
925
  @typing.overload
623
- def checkpoint(*, load_policy: str = 'fresh', temp_dir_root: str = None) -> typing.Callable[[typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]], typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]]:
926
+ def coreweave_s3_proxy(f: typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]:
927
+ ...
928
+
929
+ def coreweave_s3_proxy(f: typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None], None] = None):
624
930
  """
625
- Enables checkpointing for a step.
626
-
627
- > Examples
931
+ CoreWeave-specific S3 Proxy decorator for routing S3 requests through a local proxy service.
932
+ It exists to make it easier for users to know that this decorator should only be used with
933
+ a Neo Cloud like CoreWeave.
934
+ """
935
+ ...
936
+
937
+ @typing.overload
938
+ def resources(*, cpu: int = 1, gpu: typing.Optional[int] = None, disk: typing.Optional[int] = None, memory: int = 4096, shared_memory: typing.Optional[int] = None) -> typing.Callable[[typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]], typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]]:
939
+ """
940
+ Specifies the resources needed when executing this step.
628
941
 
629
- - Saving Checkpoints
942
+ Use `@resources` to specify the resource requirements
943
+ independently of the specific compute layer (`@batch`, `@kubernetes`).
630
944
 
631
- ```python
632
- @checkpoint
633
- @step
634
- def train(self):
635
- model = create_model(self.parameters, checkpoint_path = None)
636
- for i in range(self.epochs):
637
- # some training logic
638
- loss = model.train(self.dataset)
639
- if i % 10 == 0:
640
- model.save(
641
- current.checkpoint.directory,
642
- )
643
- # saves the contents of the `current.checkpoint.directory` as a checkpoint
644
- # and returns a reference dictionary to the checkpoint saved in the datastore
645
- self.latest_checkpoint = current.checkpoint.save(
646
- name="epoch_checkpoint",
647
- metadata={
648
- "epoch": i,
649
- "loss": loss,
650
- }
651
- )
945
+ You can choose the compute layer on the command line by executing e.g.
652
946
  ```
653
-
654
- - Using Loaded Checkpoints
655
-
656
- ```python
657
- @retry(times=3)
658
- @checkpoint
659
- @step
660
- def train(self):
661
- # Assume that the task has restarted and the previous attempt of the task
662
- # saved a checkpoint
663
- checkpoint_path = None
664
- if current.checkpoint.is_loaded: # Check if a checkpoint is loaded
665
- print("Loaded checkpoint from the previous attempt")
666
- checkpoint_path = current.checkpoint.directory
667
-
668
- model = create_model(self.parameters, checkpoint_path = checkpoint_path)
669
- for i in range(self.epochs):
670
- ...
947
+ python myflow.py run --with batch
948
+ ```
949
+ or
950
+ ```
951
+ python myflow.py run --with kubernetes
671
952
  ```
672
-
-
- Parameters
- ----------
- load_policy : str, default: "fresh"
- The policy for loading the checkpoint. The following policies are supported:
- - "eager": Loads the latest available checkpoint within the namespace.
- With this mode, the latest checkpoint written by any previous task of the step
- (even from a different run) will be loaded at the start of the task.
- - "none": Do not load any checkpoint.
- - "fresh": Loads the latest checkpoint created within the running Task.
- This mode helps load checkpoints across various retry attempts of the same task.
- With this mode, no checkpoint will be loaded at the start of a task, but any checkpoints
- created within the task will be loaded when the task retries execution on failure.
-
- temp_dir_root : str, default: None
- The root directory under which `current.checkpoint.directory` will be created.
- """
- ...
-
- @typing.overload
- def checkpoint(f: typing.Callable[[FlowSpecDerived, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, StepFlag], None]:
- ...
-
- @typing.overload
- def checkpoint(f: typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]:
- ...
-
- def checkpoint(f: typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None], None] = None, *, load_policy: str = 'fresh', temp_dir_root: str = None):
- """
- Enables checkpointing for a step.
-
- > Examples
-
- - Saving Checkpoints
-
- ```python
- @checkpoint
- @step
- def train(self):
- model = create_model(self.parameters, checkpoint_path = None)
- for i in range(self.epochs):
- # some training logic
- loss = model.train(self.dataset)
- if i % 10 == 0:
- model.save(
- current.checkpoint.directory,
- )
- # saves the contents of the `current.checkpoint.directory` as a checkpoint
- # and returns a reference dictionary to the checkpoint saved in the datastore
- self.latest_checkpoint = current.checkpoint.save(
- name="epoch_checkpoint",
- metadata={
- "epoch": i,
- "loss": loss,
- }
- )
- ```
-
- - Using Loaded Checkpoints
-
- ```python
- @retry(times=3)
- @checkpoint
- @step
- def train(self):
- # Assume that the task has restarted and the previous attempt of the task
- # saved a checkpoint
- checkpoint_path = None
- if current.checkpoint.is_loaded: # Check if a checkpoint is loaded
- print("Loaded checkpoint from the previous attempt")
- checkpoint_path = current.checkpoint.directory
-
- model = create_model(self.parameters, checkpoint_path = checkpoint_path)
- for i in range(self.epochs):
- ...
- ```
-
-
- Parameters
- ----------
- load_policy : str, default: "fresh"
- The policy for loading the checkpoint. The following policies are supported:
- - "eager": Loads the latest available checkpoint within the namespace.
- With this mode, the latest checkpoint written by any previous task of the step
- (even from a different run) will be loaded at the start of the task.
- - "none": Do not load any checkpoint.
- - "fresh": Loads the latest checkpoint created within the running Task.
- This mode helps load checkpoints across various retry attempts of the same task.
- With this mode, no checkpoint will be loaded at the start of a task, but any checkpoints
- created within the task will be loaded when the task retries execution on failure.
-
- temp_dir_root : str, default: None
- The root directory under which `current.checkpoint.directory` will be created.
- """
- ...
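The three `load_policy` values described above can be modeled in pure Python. This is an illustrative sketch of the selection logic only — function and field names are hypothetical, not part of the Metaflow API:

```python
# Hypothetical sketch of @checkpoint's load_policy selection.
# A "checkpoint" here is just a dict recording which task wrote it
# and a monotonically increasing version.

def select_checkpoint(policy, namespace_checkpoints, current_task_id):
    """Return the checkpoint to load at task start, or None."""
    if policy == "none":
        return None
    if policy == "fresh":
        # Only checkpoints written by earlier attempts of this very task qualify.
        candidates = [c for c in namespace_checkpoints if c["task"] == current_task_id]
    elif policy == "eager":
        # Any previous task of the step (possibly from a different run) qualifies.
        candidates = list(namespace_checkpoints)
    else:
        raise ValueError(f"unknown load_policy: {policy}")
    return max(candidates, key=lambda c: c["version"], default=None)

checkpoints = [
    {"task": "run1/train/1", "version": 1},
    {"task": "run2/train/7", "version": 2},
]
```

Under "fresh", a brand-new task sees no checkpoint until one of its own retries has saved one; under "eager", the newest checkpoint in the namespace is picked up immediately.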
-
- @typing.overload
- def parallel(f: typing.Callable[[FlowSpecDerived, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, StepFlag], None]:
- """
- Decorator prototype for all step decorators. This function gets specialized
- and imported for all decorator types by _import_plugin_decorators().
- """
- ...
-
- @typing.overload
- def parallel(f: typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]:
- ...
-
- def parallel(f: typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None], None] = None):
- """
- Decorator prototype for all step decorators. This function gets specialized
- and imported for all decorator types by _import_plugin_decorators().
- """
- ...
-
- @typing.overload
- def environment(*, vars: typing.Dict[str, str] = {}) -> typing.Callable[[typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]], typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]]:
- """
- Specifies environment variables to be set prior to the execution of a step.
-
-
- Parameters
- ----------
- vars : Dict[str, str], default {}
- Dictionary of environment variables to set.
- """
- ...
-
- @typing.overload
- def environment(f: typing.Callable[[FlowSpecDerived, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, StepFlag], None]:
- ...
-
- @typing.overload
- def environment(f: typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]:
- ...
-
- def environment(f: typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None], None] = None, *, vars: typing.Dict[str, str] = {}):
- """
- Specifies environment variables to be set prior to the execution of a step.
-
-
- Parameters
- ----------
- vars : Dict[str, str], default {}
- Dictionary of environment variables to set.
- """
- ...
-
- def kubernetes(*, cpu: int = 1, memory: int = 4096, disk: int = 10240, image: typing.Optional[str] = None, image_pull_policy: str = 'KUBERNETES_IMAGE_PULL_POLICY', image_pull_secrets: typing.List[str] = [], service_account: str = 'METAFLOW_KUBERNETES_SERVICE_ACCOUNT', secrets: typing.Optional[typing.List[str]] = None, node_selector: typing.Union[typing.Dict[str, str], str, None] = None, namespace: str = 'METAFLOW_KUBERNETES_NAMESPACE', gpu: typing.Optional[int] = None, gpu_vendor: str = 'KUBERNETES_GPU_VENDOR', tolerations: typing.List[typing.Dict[str, str]] = [], labels: typing.Dict[str, str] = 'METAFLOW_KUBERNETES_LABELS', annotations: typing.Dict[str, str] = 'METAFLOW_KUBERNETES_ANNOTATIONS', use_tmpfs: bool = False, tmpfs_tempdir: bool = True, tmpfs_size: typing.Optional[int] = None, tmpfs_path: typing.Optional[str] = '/metaflow_temp', persistent_volume_claims: typing.Optional[typing.Dict[str, str]] = None, shared_memory: typing.Optional[int] = None, port: typing.Optional[int] = None, compute_pool: typing.Optional[str] = None, hostname_resolution_timeout: int = 600, qos: str = 'Burstable', security_context: typing.Optional[typing.Dict[str, typing.Any]] = None) -> typing.Callable[[typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]], typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]]:
- """
- Specifies that this step should execute on Kubernetes.
-
-
- Parameters
- ----------
- cpu : int, default 1
- Number of CPUs required for this step. If `@resources` is
- also present, the maximum value from all decorators is used.
- memory : int, default 4096
- Memory size (in MB) required for this step. If
- `@resources` is also present, the maximum value from all decorators is
- used.
- disk : int, default 10240
- Disk size (in MB) required for this step. If
- `@resources` is also present, the maximum value from all decorators is
- used.
- image : str, optional, default None
- Docker image to use when launching on Kubernetes. If not specified, and
- METAFLOW_KUBERNETES_CONTAINER_IMAGE is specified, that image is used. If
- not, a default Docker image mapping to the current version of Python is used.
- image_pull_policy: str, default KUBERNETES_IMAGE_PULL_POLICY
- If given, the imagePullPolicy to be applied to the Docker image of the step.
- image_pull_secrets: List[str], default []
- Kubernetes image pull secrets to use when pulling container images
- in Kubernetes. The default is extracted from METAFLOW_KUBERNETES_IMAGE_PULL_SECRETS.
- service_account : str, default METAFLOW_KUBERNETES_SERVICE_ACCOUNT
- Kubernetes service account to use when launching pod in Kubernetes.
- secrets : List[str], optional, default None
- Kubernetes secrets to use when launching pod in Kubernetes. These
- secrets are in addition to the ones defined in `METAFLOW_KUBERNETES_SECRETS`
- in Metaflow configuration.
- node_selector: Union[Dict[str,str], str], optional, default None
- Kubernetes node selector(s) to apply to the pod running the task.
- Can be passed in as a comma-separated string of values, e.g.
- 'kubernetes.io/os=linux,kubernetes.io/arch=amd64', or as a dictionary, e.g.
- {'kubernetes.io/os': 'linux', 'kubernetes.io/arch': 'amd64'}.
- namespace : str, default METAFLOW_KUBERNETES_NAMESPACE
- Kubernetes namespace to use when launching pod in Kubernetes.
- gpu : int, optional, default None
- Number of GPUs required for this step. A value of zero implies that
- the scheduled node should not have GPUs.
- gpu_vendor : str, default KUBERNETES_GPU_VENDOR
- The vendor of the GPUs to be used for this step.
- tolerations : List[Dict[str,str]], default []
- Kubernetes tolerations to use when launching pod in Kubernetes.
- The default is extracted from METAFLOW_KUBERNETES_TOLERATIONS.
- labels: Dict[str, str], default: METAFLOW_KUBERNETES_LABELS
- Kubernetes labels to use when launching pod in Kubernetes.
- annotations: Dict[str, str], default: METAFLOW_KUBERNETES_ANNOTATIONS
- Kubernetes annotations to use when launching pod in Kubernetes.
- use_tmpfs : bool, default False
- This enables an explicit tmpfs mount for this step.
- tmpfs_tempdir : bool, default True
- Sets METAFLOW_TEMPDIR to tmpfs_path if set for this step.
- tmpfs_size : int, optional, default: None
- The value for the size (in MiB) of the tmpfs mount for this step.
- This parameter maps to the `--tmpfs` option in Docker. Defaults to 50% of the
- memory allocated for this step.
- tmpfs_path : str, optional, default /metaflow_temp
- Path to tmpfs mount for this step.
- persistent_volume_claims : Dict[str, str], optional, default None
- A map (dictionary) of persistent volumes to be mounted to the pod for this step. The map is from persistent
- volumes to the path to which the volume is to be mounted, e.g., `{'pvc-name': '/path/to/mount/on'}`.
- shared_memory: int, optional
- Shared memory size (in MiB) required for this step.
- port: int, optional
- Port number to specify in the Kubernetes job object.
- compute_pool : str, optional, default None
- Compute pool to be used for this step.
- If not specified, any accessible compute pool within the perimeter is used.
- hostname_resolution_timeout: int, default 10 * 60
- Timeout in seconds for the worker tasks in the gang-scheduled cluster to resolve the hostname of the control task.
- Only applicable when @parallel is used.
- qos: str, default: Burstable
- Quality of Service class to assign to the pod. Supported values are: Guaranteed, Burstable, BestEffort
-
- security_context: Dict[str, Any], optional, default None
- Container security context. Applies to the task container. Allows the following keys:
- - privileged: bool, optional, default None
- - allow_privilege_escalation: bool, optional, default None
- - run_as_user: int, optional, default None
- - run_as_group: int, optional, default None
- - run_as_non_root: bool, optional, default None
- """
- ...
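As the `node_selector` description notes, the value may be either a dictionary or a comma-separated `key=value` string. That normalization can be sketched as follows (hypothetical helper, assuming well-formed input; not the actual Metaflow implementation):

```python
def normalize_node_selector(node_selector):
    """Accept either a dict or a 'k=v,k2=v2' string and return a dict,
    mirroring the two accepted forms of @kubernetes' node_selector."""
    if node_selector is None:
        return {}
    if isinstance(node_selector, dict):
        return dict(node_selector)
    # Split each comma-separated item on the first '=' only, so values
    # containing '=' survive intact.
    pairs = (item.split("=", 1) for item in node_selector.split(",") if item)
    return {k.strip(): v.strip() for k, v in pairs}
```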
-
- @typing.overload
- def coreweave_s3_proxy(f: typing.Callable[[FlowSpecDerived, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, StepFlag], None]:
- """
- CoreWeave-specific S3 Proxy decorator for routing S3 requests through a local proxy service.
- It exists to make it easier for users to know that this decorator should only be used with
- a Neo Cloud like CoreWeave.
- """
- ...
-
- @typing.overload
- def coreweave_s3_proxy(f: typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]:
- ...
-
- def coreweave_s3_proxy(f: typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None], None] = None):
- """
- CoreWeave-specific S3 Proxy decorator for routing S3 requests through a local proxy service.
- It exists to make it easier for users to know that this decorator should only be used with
- a Neo Cloud like CoreWeave.
- """
- ...
-
- @typing.overload
- def retry(*, times: int = 3, minutes_between_retries: int = 2) -> typing.Callable[[typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]], typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]]:
- """
- Specifies the number of times the task corresponding
- to a step needs to be retried.
-
- This decorator is useful for handling transient errors, such as networking issues.
- If your task contains operations that can't be retried safely, e.g. database updates,
- it is advisable to annotate it with `@retry(times=0)`.
-
- This can be used in conjunction with the `@catch` decorator. The `@catch`
- decorator will execute a no-op task after all retries have been exhausted,
- ensuring that the flow execution can continue.
-
-
- Parameters
- ----------
- times : int, default 3
- Number of times to retry this task.
- minutes_between_retries : int, default 2
- Number of minutes between retries.
- """
- ...
-
- @typing.overload
- def retry(f: typing.Callable[[FlowSpecDerived, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, StepFlag], None]:
- ...
-
- @typing.overload
- def retry(f: typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]:
- ...
-
- def retry(f: typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None], None] = None, *, times: int = 3, minutes_between_retries: int = 2):
- """
- Specifies the number of times the task corresponding
- to a step needs to be retried.
-
- This decorator is useful for handling transient errors, such as networking issues.
- If your task contains operations that can't be retried safely, e.g. database updates,
- it is advisable to annotate it with `@retry(times=0)`.
-
- This can be used in conjunction with the `@catch` decorator. The `@catch`
- decorator will execute a no-op task after all retries have been exhausted,
- ensuring that the flow execution can continue.
-
-
- Parameters
- ----------
- times : int, default 3
- Number of times to retry this task.
- minutes_between_retries : int, default 2
- Number of minutes between retries.
- """
- ...
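The retry semantics described in the removed docstring — one initial attempt, `times` retries with a wait in between, and `@catch` (if present) taking over once retries are exhausted — can be sketched in plain Python. This is an illustrative model, not Metaflow's implementation; the `sleep` hook stands in for the real wait:

```python
def run_with_retries(task, times=3, minutes_between_retries=2, sleep=lambda minutes: None):
    """Sketch of @retry semantics: 1 initial attempt plus `times` retries."""
    attempt = 0
    while True:
        attempt += 1
        try:
            return task(attempt)
        except Exception:
            if attempt > times:
                # Retries exhausted: the exception propagates; a @catch
                # decorator, if present, would handle it from here.
                raise
            sleep(minutes_between_retries)
```

With `times=0` there is exactly one attempt and any failure propagates immediately, which is why the docstring recommends `@retry(times=0)` for non-idempotent operations such as database updates.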
-
- def nvct(*, gpu: int, gpu_type: str) -> typing.Callable[[typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]], typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]]:
- """
- Specifies that this step should execute on DGX cloud.
-
-
- Parameters
- ----------
- gpu : int
- Number of GPUs to use.
- gpu_type : str
- Type of Nvidia GPU to use.
- """
- ...
-
- @typing.overload
- def card(*, type: str = 'default', id: typing.Optional[str] = None, options: typing.Dict[str, typing.Any] = {}, timeout: int = 45) -> typing.Callable[[typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]], typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]]:
- """
- Creates a human-readable report, a Metaflow Card, after this step completes.
-
- Note that you may add multiple `@card` decorators in a step with different parameters.
-
-
- Parameters
- ----------
- type : str, default 'default'
- Card type.
- id : str, optional, default None
- If multiple cards are present, use this id to identify this card.
- options : Dict[str, Any], default {}
- Options passed to the card. The contents depend on the card type.
- timeout : int, default 45
- Interrupt reporting if it takes more than this many seconds.
- """
- ...
-
- @typing.overload
- def card(f: typing.Callable[[FlowSpecDerived, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, StepFlag], None]:
- ...
-
- @typing.overload
- def card(f: typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]:
- ...
-
- def card(f: typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None], None] = None, *, type: str = 'default', id: typing.Optional[str] = None, options: typing.Dict[str, typing.Any] = {}, timeout: int = 45):
- """
- Creates a human-readable report, a Metaflow Card, after this step completes.
-
- Note that you may add multiple `@card` decorators in a step with different parameters.
-
-
- Parameters
- ----------
- type : str, default 'default'
- Card type.
- id : str, optional, default None
- If multiple cards are present, use this id to identify this card.
- options : Dict[str, Any], default {}
- Options passed to the card. The contents depend on the card type.
- timeout : int, default 45
- Interrupt reporting if it takes more than this many seconds.
- """
- ...
-
- @typing.overload
- def nebius_s3_proxy(f: typing.Callable[[FlowSpecDerived, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, StepFlag], None]:
- """
- Nebius-specific S3 Proxy decorator for routing S3 requests through a local proxy service.
- It exists to make it easier for users to know that this decorator should only be used with
- a Neo Cloud like Nebius.
- """
- ...
-
- @typing.overload
- def nebius_s3_proxy(f: typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]:
- ...
-
- def nebius_s3_proxy(f: typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None], None] = None):
- """
- Nebius-specific S3 Proxy decorator for routing S3 requests through a local proxy service.
- It exists to make it easier for users to know that this decorator should only be used with
- a Neo Cloud like Nebius.
- """
- ...
-
- @typing.overload
- def resources(*, cpu: int = 1, gpu: typing.Optional[int] = None, disk: typing.Optional[int] = None, memory: int = 4096, shared_memory: typing.Optional[int] = None) -> typing.Callable[[typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]], typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]]:
- """
- Specifies the resources needed when executing this step.
-
- Use `@resources` to specify the resource requirements
- independently of the specific compute layer (`@batch`, `@kubernetes`).
-
- You can choose the compute layer on the command line by executing e.g.
- ```
- python myflow.py run --with batch
- ```
- or
- ```
- python myflow.py run --with kubernetes
- ```
- which executes the flow on the desired system using the
- requirements specified in `@resources`.
+ which executes the flow on the desired system using the
+ requirements specified in `@resources`.
 
 
  Parameters
@@ -1147,102 +1014,200 @@ def resources(f: typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None]
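Per the `@resources` and `@kubernetes` docstrings, when both decorators specify a value for the same resource, the maximum across decorators is used. A rough sketch of that merge rule (hypothetical helper, not Metaflow's internal code):

```python
def effective_resources(resources_deco, compute_deco):
    """Merge @resources values into a compute decorator's values by taking,
    per key, the maximum of the two, as the docstrings describe."""
    merged = dict(compute_deco)
    for key, value in resources_deco.items():
        if value is None:
            # Unset @resources fields (e.g. gpu=None) don't override anything.
            continue
        merged[key] = max(merged.get(key, 0), value)
    return merged
```

For example, `@resources(cpu=2, memory=8192)` combined with `@kubernetes` defaults of `cpu=1, memory=4096, disk=10240` would yield 2 CPUs, 8192 MB memory, and 10240 MB disk.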
  ...
 
  @typing.overload
- def timeout(*, seconds: int = 0, minutes: int = 0, hours: int = 0) -> typing.Callable[[typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]], typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]]:
+ def parallel(f: typing.Callable[[FlowSpecDerived, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, StepFlag], None]:
  """
- Specifies a timeout for your step.
+ Decorator prototype for all step decorators. This function gets specialized
+ and imported for all decorator types by _import_plugin_decorators().
+ """
+ ...
+
+ @typing.overload
+ def parallel(f: typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]:
+ ...
+
+ def parallel(f: typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None], None] = None):
+ """
+ Decorator prototype for all step decorators. This function gets specialized
+ and imported for all decorator types by _import_plugin_decorators().
+ """
+ ...
+
+ @typing.overload
+ def model(*, load: typing.Union[typing.List[str], str, typing.List[typing.Tuple[str, typing.Optional[str]]]] = None, temp_dir_root: str = None) -> typing.Callable[[typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]], typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]]:
+ """
+ Enables loading / saving of models within a step.
 
- This decorator is useful if this step may hang indefinitely.
+ > Examples
+ - Saving Models
+ ```python
+ @model
+ @step
+ def train(self):
+ # current.model.save returns a dictionary reference to the model saved
+ self.my_model = current.model.save(
+ path_to_my_model,
+ label="my_model",
+ metadata={
+ "epochs": 10,
+ "batch-size": 32,
+ "learning-rate": 0.001,
+ }
+ )
+ self.next(self.test)
 
- This can be used in conjunction with the `@retry` decorator as well as the `@catch` decorator.
- A timeout is considered to be an exception thrown by the step. It will cause the step to be
- retried if needed and the exception will be caught by the `@catch` decorator, if present.
+ @model(load="my_model")
+ @step
+ def test(self):
+ # `current.model.loaded` returns a dictionary of the loaded models
+ # where the key is the name of the artifact and the value is the path to the model
+ print(os.listdir(current.model.loaded["my_model"]))
+ self.next(self.end)
+ ```
 
- Note that all the values specified in parameters are added together so if you specify
- 60 seconds and 1 hour, the decorator will have an effective timeout of 1 hour and 1 minute.
+ - Loading models
+ ```python
+ @step
+ def train(self):
+ # current.model.load returns the path to the model loaded
+ checkpoint_path = current.model.load(
+ self.checkpoint_key,
+ )
+ model_path = current.model.load(
+ self.model,
+ )
+ self.next(self.test)
+ ```
 
 
  Parameters
  ----------
- seconds : int, default 0
- Number of seconds to wait prior to timing out.
- minutes : int, default 0
- Number of minutes to wait prior to timing out.
- hours : int, default 0
- Number of hours to wait prior to timing out.
+ load : Union[List[str],str,List[Tuple[str,Union[str,None]]]], default: None
+ Artifact name/s referencing the models/checkpoints to load. Artifact names refer to the names of the instance variables set to `self`.
+ The artifact names given to `load` can be reference objects or reference `key` strings from objects created by `current.checkpoint` / `current.model` / `current.huggingface_hub`.
+ If a list of tuples is provided, the first element is the artifact name and the second element is the path at which the artifact will be unpacked on
+ the local filesystem. If the second element is None, the artifact will be unpacked in the current working directory.
+ If a string is provided, then the artifact corresponding to that name will be loaded in the current working directory.
+
+ temp_dir_root : str, default: None
+ The root directory under which `current.model.loaded` will store loaded models.
  """
  ...
 
  @typing.overload
- def timeout(f: typing.Callable[[FlowSpecDerived, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, StepFlag], None]:
+ def model(f: typing.Callable[[FlowSpecDerived, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, StepFlag], None]:
  ...
 
  @typing.overload
- def timeout(f: typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]:
+ def model(f: typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]:
  ...
 
- def timeout(f: typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None], None] = None, *, seconds: int = 0, minutes: int = 0, hours: int = 0):
+ def model(f: typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None], None] = None, *, load: typing.Union[typing.List[str], str, typing.List[typing.Tuple[str, typing.Optional[str]]]] = None, temp_dir_root: str = None):
  """
- Specifies a timeout for your step.
+ Enables loading / saving of models within a step.
 
- This decorator is useful if this step may hang indefinitely.
+ > Examples
+ - Saving Models
+ ```python
+ @model
+ @step
+ def train(self):
+ # current.model.save returns a dictionary reference to the model saved
+ self.my_model = current.model.save(
+ path_to_my_model,
+ label="my_model",
+ metadata={
+ "epochs": 10,
+ "batch-size": 32,
+ "learning-rate": 0.001,
+ }
+ )
+ self.next(self.test)
 
- This can be used in conjunction with the `@retry` decorator as well as the `@catch` decorator.
- A timeout is considered to be an exception thrown by the step. It will cause the step to be
- retried if needed and the exception will be caught by the `@catch` decorator, if present.
+ @model(load="my_model")
+ @step
+ def test(self):
+ # `current.model.loaded` returns a dictionary of the loaded models
+ # where the key is the name of the artifact and the value is the path to the model
+ print(os.listdir(current.model.loaded["my_model"]))
+ self.next(self.end)
+ ```
 
- Note that all the values specified in parameters are added together so if you specify
- 60 seconds and 1 hour, the decorator will have an effective timeout of 1 hour and 1 minute.
+ - Loading models
+ ```python
+ @step
+ def train(self):
+ # current.model.load returns the path to the model loaded
+ checkpoint_path = current.model.load(
+ self.checkpoint_key,
+ )
+ model_path = current.model.load(
+ self.model,
+ )
+ self.next(self.test)
+ ```
 
 
  Parameters
  ----------
- seconds : int, default 0
- Number of seconds to wait prior to timing out.
- minutes : int, default 0
- Number of minutes to wait prior to timing out.
- hours : int, default 0
- Number of hours to wait prior to timing out.
- """
- ...
-
- @typing.overload
- def app_deploy(f: typing.Callable[[FlowSpecDerived, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, StepFlag], None]:
- """
- Decorator prototype for all step decorators. This function gets specialized
- and imported for all decorator types by _import_plugin_decorators().
- """
- ...
-
- @typing.overload
- def app_deploy(f: typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]:
- ...
-
- def app_deploy(f: typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None], None] = None):
- """
- Decorator prototype for all step decorators. This function gets specialized
- and imported for all decorator types by _import_plugin_decorators().
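The removed `@timeout` docstring above states that the `seconds`, `minutes`, and `hours` values are added together, so 60 seconds plus 1 hour yields an effective timeout of 1 hour and 1 minute. That arithmetic, as a quick check (hypothetical helper):

```python
def timeout_seconds(seconds=0, minutes=0, hours=0):
    """Combine @timeout's parameters into a single duration in seconds,
    per the documented rule that the values are summed."""
    return seconds + minutes * 60 + hours * 3600
```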
+ load : Union[List[str],str,List[Tuple[str,Union[str,None]]]], default: None
+ Artifact name/s referencing the models/checkpoints to load. Artifact names refer to the names of the instance variables set to `self`.
+ The artifact names given to `load` can be reference objects or reference `key` strings from objects created by `current.checkpoint` / `current.model` / `current.huggingface_hub`.
+ If a list of tuples is provided, the first element is the artifact name and the second element is the path at which the artifact will be unpacked on
+ the local filesystem. If the second element is None, the artifact will be unpacked in the current working directory.
+ If a string is provided, then the artifact corresponding to that name will be loaded in the current working directory.
+
+ temp_dir_root : str, default: None
+ The root directory under which `current.model.loaded` will store loaded models.
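The `load` parameter described above accepts a bare string, a list of artifact names, or a list of `(name, path)` tuples where a `None` path means the current working directory. A sketch of how such a value could be normalized into uniform pairs (hypothetical helper, not the actual implementation):

```python
def normalize_load(load):
    """Normalize @model's `load` argument to (artifact_name, target_path)
    pairs; a None path stands for the current working directory."""
    if load is None:
        return []
    if isinstance(load, str):
        # A single artifact name, unpacked in the current working directory.
        return [(load, None)]
    pairs = []
    for item in load:
        if isinstance(item, tuple):
            name, path = item
            pairs.append((name, path))
        else:
            pairs.append((item, None))
    return pairs
```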
1161
  """
1225
1162
  ...
1226
1163
 
1227
- def s3_proxy(*, integration_name: typing.Optional[str] = None, write_mode: typing.Optional[str] = None, debug: typing.Optional[bool] = None) -> typing.Callable[[typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]], typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]]:
+ def vllm(*, model: str, backend: str, openai_api_server: bool, debug: bool, card_refresh_interval: int, max_retries: int, retry_alert_frequency: int, engine_args: dict) -> typing.Callable[[typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]], typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]]:
  """
- S3 Proxy decorator for routing S3 requests through a local proxy service.
+ This decorator is used to run vllm APIs as Metaflow task sidecars.
+
+ User code call
+ --------------
+ @vllm(
+ model="...",
+ ...
+ )
+
+ Valid backend options
+ ---------------------
+ - 'local': Run as a separate process on the local task machine.
+
+ Valid model options
+ -------------------
+ Any HuggingFace model identifier, e.g. 'meta-llama/Llama-3.2-1B'
+
+ NOTE: vLLM's OpenAI-compatible server serves ONE model per server instance.
+ If you need multiple models, you must create multiple @vllm decorators.
 
 
  Parameters
  ----------
- integration_name : str, optional
- Name of the S3 proxy integration. If not specified, will use the only
- available S3 proxy integration in the namespace (fails if multiple exist).
- write_mode : str, optional
- The desired behavior during write operations to target (origin) S3 bucket.
- allowed options are:
- "origin-and-cache" -> write to both the target S3 bucket and local object
- storage
- "origin" -> only write to the target S3 bucket
- "cache" -> only write to the object storage service used for caching
- debug : bool, optional
- Enable debug logging for proxy operations.
+ model: str
+ HuggingFace model identifier to be served by vLLM.
+ backend: str
+ Determines where and how to run the vLLM process.
+ openai_api_server: bool
+ Whether to use OpenAI-compatible API server mode (subprocess) instead of native engine.
+ Default is False (uses native engine).
+ Set to True for backward compatibility with existing code.
+ debug: bool
+ Whether to turn on verbose debugging logs.
+ card_refresh_interval: int
+ Interval in seconds for refreshing the vLLM status card.
+ Only used when openai_api_server=True.
+ max_retries: int
+ Maximum number of retries checking for vLLM server startup.
+ Only used when openai_api_server=True.
+ retry_alert_frequency: int
+ Frequency of alert logs for vLLM server startup retries.
+ Only used when openai_api_server=True.
+ engine_args : dict
+ Additional keyword arguments to pass to the vLLM engine.
+ For example, `tensor_parallel_size=2`.
  """
  ...
 
@@ -1305,140 +1270,81 @@ def conda(f: typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], ty
  """
  ...
 
- def ollama(*, models: list, backend: str, force_pull: bool, cache_update_policy: str, force_cache_update: bool, debug: bool, circuit_breaker_config: dict, timeout_config: dict) -> typing.Callable[[typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]], typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]]:
+ @typing.overload
+ def app_deploy(f: typing.Callable[[FlowSpecDerived, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, StepFlag], None]:
  """
- This decorator is used to run Ollama APIs as Metaflow task sidecars.
-
- User code call
- --------------
- @ollama(
- models=[...],
- ...
- )
-
- Valid backend options
- ---------------------
- - 'local': Run as a separate process on the local task machine.
- - (TODO) 'managed': Outerbounds hosts and selects compute provider.
- - (TODO) 'remote': Spin up separate instance to serve Ollama models.
-
- Valid model options
- -------------------
- Any model here https://ollama.com/search, e.g. 'llama3.2', 'llama3.3'
-
-
- Parameters
- ----------
- models: list[str]
- List of Ollama containers running models in sidecars.
- backend: str
- Determines where and how to run the Ollama process.
- force_pull: bool
- Whether to run `ollama pull` no matter what, or first check the remote cache in Metaflow datastore for this model key.
- cache_update_policy: str
- Cache update policy: "auto", "force", or "never".
- force_cache_update: bool
- Simple override for "force" cache update policy.
- debug: bool
- Whether to turn on verbose debugging logs.
- circuit_breaker_config: dict
- Configuration for circuit breaker protection. Keys: failure_threshold, recovery_timeout, reset_timeout.
- timeout_config: dict
- Configuration for various operation timeouts. Keys: pull, stop, health_check, install, server_startup.
+ Decorator prototype for all step decorators. This function gets specialized
+ and imported for all decorators types by _import_plugin_decorators().
  """
  ...
 
  @typing.overload
- def conda_base(*, packages: typing.Dict[str, str] = {}, libraries: typing.Dict[str, str] = {}, python: typing.Optional[str] = None, disabled: bool = False) -> typing.Callable[[typing.Type[FlowSpecDerived]], typing.Type[FlowSpecDerived]]:
+ def app_deploy(f: typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]:
+ ...
+
+ def app_deploy(f: typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None], None] = None):
  """
- Specifies the Conda environment for all steps of the flow.
+ Decorator prototype for all step decorators. This function gets specialized
+ and imported for all decorators types by _import_plugin_decorators().
+ """
+ ...
+
+ @typing.overload
+ def timeout(*, seconds: int = 0, minutes: int = 0, hours: int = 0) -> typing.Callable[[typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]], typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]]]:
+ """
+ Specifies a timeout for your step.
 
- Use `@conda_base` to set common libraries required by all
- steps and use `@conda` to specify step-specific additions.
+ This decorator is useful if this step may hang indefinitely.
+
+ This can be used in conjunction with the `@retry` decorator as well as the `@catch` decorator.
+ A timeout is considered to be an exception thrown by the step. It will cause the step to be
+ retried if needed and the exception will be caught by the `@catch` decorator, if present.
 
+ Note that all the values specified in parameters are added together so if you specify
+ 60 seconds and 1 hour, the decorator will have an effective timeout of 1 hour and 1 minute.
 
- Parameters
- ----------
- packages : Dict[str, str], default {}
- Packages to use for this flow. The key is the name of the package
- and the value is the version to use.
- libraries : Dict[str, str], default {}
- Supported for backward compatibility. When used with packages, packages will take precedence.
- python : str, optional, default None
- Version of Python to use, e.g. '3.7.4'. A default value of None implies
- that the version used will correspond to the version of the Python interpreter used to start the run.
- disabled : bool, default False
- If set to True, disables Conda.
+
+ Parameters
+ ----------
+ seconds : int, default 0
+ Number of seconds to wait prior to timing out.
+ minutes : int, default 0
+ Number of minutes to wait prior to timing out.
+ hours : int, default 0
+ Number of hours to wait prior to timing out.
  """
  ...
 
  @typing.overload
- def conda_base(f: typing.Type[FlowSpecDerived]) -> typing.Type[FlowSpecDerived]:
+ def timeout(f: typing.Callable[[FlowSpecDerived, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, StepFlag], None]:
  ...
 
- def conda_base(f: typing.Optional[typing.Type[FlowSpecDerived]] = None, *, packages: typing.Dict[str, str] = {}, libraries: typing.Dict[str, str] = {}, python: typing.Optional[str] = None, disabled: bool = False):
+ @typing.overload
+ def timeout(f: typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]) -> typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None]:
+ ...
+
+ def timeout(f: typing.Union[typing.Callable[[FlowSpecDerived, StepFlag], None], typing.Callable[[FlowSpecDerived, typing.Any, StepFlag], None], None] = None, *, seconds: int = 0, minutes: int = 0, hours: int = 0):
  """
- Specifies the Conda environment for all steps of the flow.
+ Specifies a timeout for your step.
 
- Use `@conda_base` to set common libraries required by all
- steps and use `@conda` to specify step-specific additions.
+ This decorator is useful if this step may hang indefinitely.
 
+ This can be used in conjunction with the `@retry` decorator as well as the `@catch` decorator.
+ A timeout is considered to be an exception thrown by the step. It will cause the step to be
+ retried if needed and the exception will be caught by the `@catch` decorator, if present.
 
- Parameters
- ----------
- packages : Dict[str, str], default {}
- Packages to use for this flow. The key is the name of the package
- and the value is the version to use.
- libraries : Dict[str, str], default {}
- Supported for backward compatibility. When used with packages, packages will take precedence.
- python : str, optional, default None
- Version of Python to use, e.g. '3.7.4'. A default value of None implies
- that the version used will correspond to the version of the Python interpreter used to start the run.
- disabled : bool, default False
- If set to True, disables Conda.
- """
- ...
-
- def airflow_external_task_sensor(*, timeout: int, poke_interval: int, mode: str, exponential_backoff: bool, pool: str, soft_fail: bool, name: str, description: str, external_dag_id: str, external_task_ids: typing.List[str], allowed_states: typing.List[str], failed_states: typing.List[str], execution_delta: "datetime.timedelta", check_existence: bool) -> typing.Callable[[typing.Type[FlowSpecDerived]], typing.Type[FlowSpecDerived]]:
- """
- The `@airflow_external_task_sensor` decorator attaches a Airflow [ExternalTaskSensor](https://airflow.apache.org/docs/apache-airflow/stable/_api/airflow/sensors/external_task/index.html#airflow.sensors.external_task.ExternalTaskSensor) before the start step of the flow.
- This decorator only works when a flow is scheduled on Airflow and is compiled using `airflow create`. More than one `@airflow_external_task_sensor` can be added as a flow decorators. Adding more than one decorator will ensure that `start` step starts only after all sensors finish.
+ Note that all the values specified in parameters are added together so if you specify
+ 60 seconds and 1 hour, the decorator will have an effective timeout of 1 hour and 1 minute.
 
 
  Parameters
  ----------
- timeout : int
- Time, in seconds before the task times out and fails. (Default: 3600)
- poke_interval : int
- Time in seconds that the job should wait in between each try. (Default: 60)
- mode : str
- How the sensor operates. Options are: { poke | reschedule }. (Default: "poke")
- exponential_backoff : bool
- allow progressive longer waits between pokes by using exponential backoff algorithm. (Default: True)
- pool : str
- the slot pool this task should run in,
- slot pools are a way to limit concurrency for certain tasks. (Default:None)
- soft_fail : bool
- Set to true to mark the task as SKIPPED on failure. (Default: False)
- name : str
- Name of the sensor on Airflow
- description : str
- Description of sensor in the Airflow UI
- external_dag_id : str
- The dag_id that contains the task you want to wait for.
- external_task_ids : List[str]
- The list of task_ids that you want to wait for.
- If None (default value) the sensor waits for the DAG. (Default: None)
- allowed_states : List[str]
- Iterable of allowed states, (Default: ['success'])
- failed_states : List[str]
- Iterable of failed or dis-allowed states. (Default: None)
- execution_delta : datetime.timedelta
- time difference with the previous execution to look at,
- the default is the same logical date as the current task or DAG. (Default: None)
- check_existence: bool
- Set to True to check if the external task exists or check if
- the DAG to wait for exists. (Default: True)
+ seconds : int, default 0
+ Number of seconds to wait prior to timing out.
+ minutes : int, default 0
+ Number of minutes to wait prior to timing out.
+ hours : int, default 0
+ Number of hours to wait prior to timing out.
  """
  ...
 
@@ -1493,127 +1399,10 @@ def schedule(f: typing.Optional[typing.Type[FlowSpecDerived]] = None, *, hourly:
  """
  ...
 
- def with_artifact_store(f: typing.Optional[typing.Type[FlowSpecDerived]] = None):
- """
- Allows setting external datastores to save data for the
- `@checkpoint`/`@model`/`@huggingface_hub` decorators.
-
- This decorator is useful when users wish to save data to a different datastore
- than what is configured in Metaflow. This can be for variety of reasons:
-
- 1. Data security: The objects needs to be stored in a bucket (object storage) that is not accessible by other flows.
- 2. Data Locality: The location where the task is executing is not located in the same region as the datastore.
- - Example: Metaflow datastore lives in US East, but the task is executing in Finland datacenters.
- 3. Data Lifecycle Policies: The objects need to be archived / managed separately from the Metaflow managed objects.
- - Example: Flow is training very large models that need to be stored separately and will be deleted more aggressively than the Metaflow managed objects.
-
- Usage:
- ----------
-
- - Using a custom IAM role to access the datastore.
-
- ```python
- @with_artifact_store(
- type="s3",
- config=lambda: {
- "root": "s3://my-bucket-foo/path/to/root",
- "role_arn": ROLE,
- },
- )
- class MyFlow(FlowSpec):
-
- @checkpoint
- @step
- def start(self):
- with open("my_file.txt", "w") as f:
- f.write("Hello, World!")
- self.external_bucket_checkpoint = current.checkpoint.save("my_file.txt")
- self.next(self.end)
-
- ```
-
- - Using credentials to access the s3-compatible datastore.
-
- ```python
- @with_artifact_store(
- type="s3",
- config=lambda: {
- "root": "s3://my-bucket-foo/path/to/root",
- "client_params": {
- "aws_access_key_id": os.environ.get("MY_CUSTOM_ACCESS_KEY"),
- "aws_secret_access_key": os.environ.get("MY_CUSTOM_SECRET_KEY"),
- },
- },
- )
- class MyFlow(FlowSpec):
-
- @checkpoint
- @step
- def start(self):
- with open("my_file.txt", "w") as f:
- f.write("Hello, World!")
- self.external_bucket_checkpoint = current.checkpoint.save("my_file.txt")
- self.next(self.end)
-
- ```
-
- - Accessing objects stored in external datastores after task execution.
-
- ```python
- run = Run("CheckpointsTestsFlow/8992")
- with artifact_store_from(run=run, config={
- "client_params": {
- "aws_access_key_id": os.environ.get("MY_CUSTOM_ACCESS_KEY"),
- "aws_secret_access_key": os.environ.get("MY_CUSTOM_SECRET_KEY"),
- },
- }):
- with Checkpoint() as cp:
- latest = cp.list(
- task=run["start"].task
- )[0]
- print(latest)
- cp.load(
- latest,
- "test-checkpoints"
- )
-
- task = Task("TorchTuneFlow/8484/train/53673")
- with artifact_store_from(run=run, config={
- "client_params": {
- "aws_access_key_id": os.environ.get("MY_CUSTOM_ACCESS_KEY"),
- "aws_secret_access_key": os.environ.get("MY_CUSTOM_SECRET_KEY"),
- },
- }):
- load_model(
- task.data.model_ref,
- "test-models"
- )
- ```
- Parameters:
- ----------
-
- type: str
- The type of the datastore. Can be one of 's3', 'gcs', 'azure' or any other supported metaflow Datastore.
-
- config: dict or Callable
- Dictionary of configuration options for the datastore. The following keys are required:
- - root: The root path in the datastore where the data will be saved. (needs to be in the format expected by the datastore)
- - example: 's3://bucket-name/path/to/root'
- - example: 'gs://bucket-name/path/to/root'
- - example: 'https://myblockacc.blob.core.windows.net/metaflow/'
- - role_arn (optional): AWS IAM role to access s3 bucket (only when `type` is 's3')
- - session_vars (optional): AWS session variables to access s3 bucket (only when `type` is 's3')
- - client_params (optional): AWS client parameters to access s3 bucket (only when `type` is 's3')
- """
- ...
-
- def airflow_s3_key_sensor(*, timeout: int, poke_interval: int, mode: str, exponential_backoff: bool, pool: str, soft_fail: bool, name: str, description: str, bucket_key: typing.Union[str, typing.List[str]], bucket_name: str, wildcard_match: bool, aws_conn_id: str, verify: bool) -> typing.Callable[[typing.Type[FlowSpecDerived]], typing.Type[FlowSpecDerived]]:
+ def airflow_external_task_sensor(*, timeout: int, poke_interval: int, mode: str, exponential_backoff: bool, pool: str, soft_fail: bool, name: str, description: str, external_dag_id: str, external_task_ids: typing.List[str], allowed_states: typing.List[str], failed_states: typing.List[str], execution_delta: "datetime.timedelta", check_existence: bool) -> typing.Callable[[typing.Type[FlowSpecDerived]], typing.Type[FlowSpecDerived]]:
  """
- The `@airflow_s3_key_sensor` decorator attaches a Airflow [S3KeySensor](https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/_api/airflow/providers/amazon/aws/sensors/s3/index.html#airflow.providers.amazon.aws.sensors.s3.S3KeySensor)
- before the start step of the flow. This decorator only works when a flow is scheduled on Airflow
- and is compiled using `airflow create`. More than one `@airflow_s3_key_sensor` can be
- added as a flow decorators. Adding more than one decorator will ensure that `start` step
- starts only after all sensors finish.
+ The `@airflow_external_task_sensor` decorator attaches an Airflow [ExternalTaskSensor](https://airflow.apache.org/docs/apache-airflow/stable/_api/airflow/sensors/external_task/index.html#airflow.sensors.external_task.ExternalTaskSensor) before the start step of the flow.
+ This decorator only works when a flow is scheduled on Airflow and is compiled using `airflow create`. More than one `@airflow_external_task_sensor` can be added as a flow decorator. Adding more than one decorator ensures that the `start` step starts only after all sensors finish.
 
 
  Parameters
@@ -1635,18 +1424,56 @@ def airflow_s3_key_sensor(*, timeout: int, poke_interval: int, mode: str, expone
  Name of the sensor on Airflow
  description : str
  Description of sensor in the Airflow UI
- bucket_key : Union[str, List[str]]
- The key(s) being waited on. Supports full s3:// style url or relative path from root level.
- When it's specified as a full s3:// url, please leave `bucket_name` as None
- bucket_name : str
- Name of the S3 bucket. Only needed when bucket_key is not provided as a full s3:// url.
- When specified, all the keys passed to bucket_key refers to this bucket. (Default:None)
- wildcard_match : bool
- whether the bucket_key should be interpreted as a Unix wildcard pattern. (Default: False)
- aws_conn_id : str
- a reference to the s3 connection on Airflow. (Default: None)
- verify : bool
- Whether or not to verify SSL certificates for S3 connection. (Default: None)
+ external_dag_id : str
+ The dag_id that contains the task you want to wait for.
+ external_task_ids : List[str]
+ The list of task_ids that you want to wait for.
+ If None (default value) the sensor waits for the DAG. (Default: None)
+ allowed_states : List[str]
+ Iterable of allowed states, (Default: ['success'])
+ failed_states : List[str]
+ Iterable of failed or dis-allowed states. (Default: None)
+ execution_delta : datetime.timedelta
+ time difference with the previous execution to look at,
+ the default is the same logical date as the current task or DAG. (Default: None)
+ check_existence: bool
+ Set to True to check if the external task exists or check if
+ the DAG to wait for exists. (Default: True)
+ """
+ ...
+
+ def project(*, name: str, branch: typing.Optional[str] = None, production: bool = False) -> typing.Callable[[typing.Type[FlowSpecDerived]], typing.Type[FlowSpecDerived]]:
+ """
+ Specifies what flows belong to the same project.
+
+ A project-specific namespace is created for all flows that
+ use the same `@project(name)`.
+
+
+ Parameters
+ ----------
+ name : str
+ Project name. Make sure that the name is unique amongst all
+ projects that use the same production scheduler. The name may
+ contain only lowercase alphanumeric characters and underscores.
+
+ branch : Optional[str], default None
+ The branch to use. If not specified, the branch is set to
+ `user.<username>` unless `production` is set to `True`. This can
+ also be set on the command line using `--branch` as a top-level option.
+ It is an error to specify `branch` in the decorator and on the command line.
+
+ production : bool, default False
+ Whether or not the branch is the production branch. This can also be set on the
+ command line using `--production` as a top-level option. It is an error to specify
+ `production` in the decorator and on the command line.
+ The project branch name will be:
+ - if `branch` is specified:
+ - if `production` is True: `prod.<branch>`
+ - if `production` is False: `test.<branch>`
+ - if `branch` is not specified:
+ - if `production` is True: `prod`
+ - if `production` is False: `user.<username>`
  """
  ...
 
@@ -1751,6 +1578,120 @@ def trigger_on_finish(f: typing.Optional[typing.Type[FlowSpecDerived]] = None, *
  """
  ...
 
+ def with_artifact_store(f: typing.Optional[typing.Type[FlowSpecDerived]] = None):
+ """
+ Allows setting external datastores to save data for the
+ `@checkpoint`/`@model`/`@huggingface_hub` decorators.
+
+ This decorator is useful when users wish to save data to a different datastore
+ than what is configured in Metaflow. This can be for a variety of reasons:
+
+ 1. Data security: The objects need to be stored in a bucket (object storage) that is not accessible by other flows.
+ 2. Data Locality: The location where the task is executing is not located in the same region as the datastore.
+ - Example: Metaflow datastore lives in US East, but the task is executing in Finland datacenters.
+ 3. Data Lifecycle Policies: The objects need to be archived / managed separately from the Metaflow managed objects.
+ - Example: Flow is training very large models that need to be stored separately and will be deleted more aggressively than the Metaflow managed objects.
+
+ Usage:
+ ----------
+
+ - Using a custom IAM role to access the datastore.
+
+ ```python
+ @with_artifact_store(
+ type="s3",
+ config=lambda: {
+ "root": "s3://my-bucket-foo/path/to/root",
+ "role_arn": ROLE,
+ },
+ )
+ class MyFlow(FlowSpec):
+
+ @checkpoint
+ @step
+ def start(self):
+ with open("my_file.txt", "w") as f:
+ f.write("Hello, World!")
+ self.external_bucket_checkpoint = current.checkpoint.save("my_file.txt")
+ self.next(self.end)
+
+ ```
+
+ - Using credentials to access the s3-compatible datastore.
+
+ ```python
+ @with_artifact_store(
+ type="s3",
+ config=lambda: {
+ "root": "s3://my-bucket-foo/path/to/root",
+ "client_params": {
+ "aws_access_key_id": os.environ.get("MY_CUSTOM_ACCESS_KEY"),
+ "aws_secret_access_key": os.environ.get("MY_CUSTOM_SECRET_KEY"),
+ },
+ },
+ )
+ class MyFlow(FlowSpec):
+
+ @checkpoint
+ @step
+ def start(self):
+ with open("my_file.txt", "w") as f:
+ f.write("Hello, World!")
+ self.external_bucket_checkpoint = current.checkpoint.save("my_file.txt")
+ self.next(self.end)
+
+ ```
+
+ - Accessing objects stored in external datastores after task execution.
+
+ ```python
+ run = Run("CheckpointsTestsFlow/8992")
+ with artifact_store_from(run=run, config={
+ "client_params": {
+ "aws_access_key_id": os.environ.get("MY_CUSTOM_ACCESS_KEY"),
+ "aws_secret_access_key": os.environ.get("MY_CUSTOM_SECRET_KEY"),
+ },
+ }):
+ with Checkpoint() as cp:
+ latest = cp.list(
+ task=run["start"].task
+ )[0]
+ print(latest)
+ cp.load(
+ latest,
+ "test-checkpoints"
+ )
+
+ task = Task("TorchTuneFlow/8484/train/53673")
+ with artifact_store_from(run=run, config={
+ "client_params": {
+ "aws_access_key_id": os.environ.get("MY_CUSTOM_ACCESS_KEY"),
+ "aws_secret_access_key": os.environ.get("MY_CUSTOM_SECRET_KEY"),
+ },
+ }):
+ load_model(
+ task.data.model_ref,
+ "test-models"
+ )
+ ```
+ Parameters:
+ ----------
+
+ type: str
+ The type of the datastore. Can be one of 's3', 'gcs', 'azure' or any other supported metaflow Datastore.
+
+ config: dict or Callable
+ Dictionary of configuration options for the datastore. The following keys are required:
+ - root: The root path in the datastore where the data will be saved. (needs to be in the format expected by the datastore)
+ - example: 's3://bucket-name/path/to/root'
+ - example: 'gs://bucket-name/path/to/root'
+ - example: 'https://myblockacc.blob.core.windows.net/metaflow/'
+ - role_arn (optional): AWS IAM role to access s3 bucket (only when `type` is 's3')
+ - session_vars (optional): AWS session variables to access s3 bucket (only when `type` is 's3')
+ - client_params (optional): AWS client parameters to access s3 bucket (only when `type` is 's3')
+ """
+ ...
+
  @typing.overload
  def trigger(*, event: typing.Union[str, typing.Dict[str, typing.Any], None] = None, events: typing.List[typing.Union[str, typing.Dict[str, typing.Any]]] = [], options: typing.Dict[str, typing.Any] = {}) -> typing.Callable[[typing.Type[FlowSpecDerived]], typing.Type[FlowSpecDerived]]:
  """
@@ -1844,30 +1785,102 @@ def trigger(f: typing.Optional[typing.Type[FlowSpecDerived]] = None, *, event: t
1844
1785
  """
1845
1786
  ...
1846
1787
 
1788
+ def airflow_s3_key_sensor(*, timeout: int, poke_interval: int, mode: str, exponential_backoff: bool, pool: str, soft_fail: bool, name: str, description: str, bucket_key: typing.Union[str, typing.List[str]], bucket_name: str, wildcard_match: bool, aws_conn_id: str, verify: bool) -> typing.Callable[[typing.Type[FlowSpecDerived]], typing.Type[FlowSpecDerived]]:
1789
+ """
1790
+ The `@airflow_s3_key_sensor` decorator attaches a Airflow [S3KeySensor](https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/_api/airflow/providers/amazon/aws/sensors/s3/index.html#airflow.providers.amazon.aws.sensors.s3.S3KeySensor)
1791
+ before the start step of the flow. This decorator only works when a flow is scheduled on Airflow
1792
+ and is compiled using `airflow create`. More than one `@airflow_s3_key_sensor` can be
1793
+ added as a flow decorators. Adding more than one decorator will ensure that `start` step
1794
+ starts only after all sensors finish.
1795
+
1796
+
1797
+ Parameters
1798
+ ----------
1799
+ timeout : int
1800
+ Time, in seconds before the task times out and fails. (Default: 3600)
1801
+ poke_interval : int
1802
+ Time in seconds that the job should wait in between each try. (Default: 60)
1803
+ mode : str
1804
+ How the sensor operates. Options are: { poke | reschedule }. (Default: "poke")
1805
+ exponential_backoff : bool
1806
+ allow progressive longer waits between pokes by using exponential backoff algorithm. (Default: True)
1807
+ pool : str
1808
+ the slot pool this task should run in,
1809
+ slot pools are a way to limit concurrency for certain tasks. (Default:None)
1810
+ soft_fail : bool
1811
+ Set to true to mark the task as SKIPPED on failure. (Default: False)
1812
+ name : str
1813
+ Name of the sensor on Airflow
1814
+ description : str
1815
+ Description of sensor in the Airflow UI
1816
+ bucket_key : Union[str, List[str]]
1817
+ The key(s) being waited on. Supports full s3:// style url or relative path from root level.
1818
+ When it's specified as a full s3:// url, please leave `bucket_name` as None
1819
+ bucket_name : str
1820
+ Name of the S3 bucket. Only needed when bucket_key is not provided as a full s3:// url.
1821
+ When specified, all the keys passed to bucket_key refers to this bucket. (Default:None)
1822
+ wildcard_match : bool
1823
+ whether the bucket_key should be interpreted as a Unix wildcard pattern. (Default: False)
1824
+ aws_conn_id : str
1825
+ a reference to the s3 connection on Airflow. (Default: None)
1826
+ verify : bool
1827
+ Whether or not to verify SSL certificates for S3 connection. (Default: None)
1828
+ """
1829
+ ...
1830
+
1847
1831
 @typing.overload
-def pypi_base(*, packages: typing.Dict[str, str] = {}, python: typing.Optional[str] = None) -> typing.Callable[[typing.Type[FlowSpecDerived]], typing.Type[FlowSpecDerived]]:
+def conda_base(*, packages: typing.Dict[str, str] = {}, libraries: typing.Dict[str, str] = {}, python: typing.Optional[str] = None, disabled: bool = False) -> typing.Callable[[typing.Type[FlowSpecDerived]], typing.Type[FlowSpecDerived]]:
     """
-    Specifies the PyPI packages for all steps of the flow.
+    Specifies the Conda environment for all steps of the flow.
+
+    Use `@conda_base` to set common libraries required by all
+    steps and use `@conda` to specify step-specific additions.
 
-    Use `@pypi_base` to set common packages required by all
-    steps and use `@pypi` to specify step-specific overrides.
 
     Parameters
     ----------
-    packages : Dict[str, str], default: {}
+    packages : Dict[str, str], default {}
         Packages to use for this flow. The key is the name of the package
         and the value is the version to use.
-    python : str, optional, default: None
+    libraries : Dict[str, str], default {}
+        Supported for backward compatibility. When used with packages, packages will take precedence.
+    python : str, optional, default None
         Version of Python to use, e.g. '3.7.4'. A default value of None implies
         that the version used will correspond to the version of the Python interpreter used to start the run.
+    disabled : bool, default False
+        If set to True, disables Conda.
     """
     ...
 
 @typing.overload
-def pypi_base(f: typing.Type[FlowSpecDerived]) -> typing.Type[FlowSpecDerived]:
+def conda_base(f: typing.Type[FlowSpecDerived]) -> typing.Type[FlowSpecDerived]:
     ...
 
-def pypi_base(f: typing.Optional[typing.Type[FlowSpecDerived]] = None, *, packages: typing.Dict[str, str] = {}, python: typing.Optional[str] = None):
+def conda_base(f: typing.Optional[typing.Type[FlowSpecDerived]] = None, *, packages: typing.Dict[str, str] = {}, libraries: typing.Dict[str, str] = {}, python: typing.Optional[str] = None, disabled: bool = False):
+    """
+    Specifies the Conda environment for all steps of the flow.
+
+    Use `@conda_base` to set common libraries required by all
+    steps and use `@conda` to specify step-specific additions.
+
+
+    Parameters
+    ----------
+    packages : Dict[str, str], default {}
+        Packages to use for this flow. The key is the name of the package
+        and the value is the version to use.
+    libraries : Dict[str, str], default {}
+        Supported for backward compatibility. When used with packages, packages will take precedence.
+    python : str, optional, default None
+        Version of Python to use, e.g. '3.7.4'. A default value of None implies
+        that the version used will correspond to the version of the Python interpreter used to start the run.
+    disabled : bool, default False
+        If set to True, disables Conda.
+    """
+    ...
+
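The `libraries` parameter above is kept for backward compatibility, with `packages` taking precedence on a name clash. A hypothetical sketch of that precedence rule (the function name is invented for illustration; this is not Metaflow's actual merge code):

```python
def merge_conda_deps(packages: dict, libraries: dict) -> dict:
    """Combine the legacy `libraries` mapping with `packages`; packages win on conflict."""
    merged = dict(libraries)  # start from the legacy mapping
    merged.update(packages)   # packages override duplicate names
    return merged


deps = merge_conda_deps(
    packages={"numpy": "1.23.0"},
    libraries={"numpy": "1.21.0", "pandas": "1.5.3"},
)
print(deps)  # {'numpy': '1.23.0', 'pandas': '1.5.3'}
```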
+@typing.overload
+def pypi_base(*, packages: typing.Dict[str, str] = {}, python: typing.Optional[str] = None) -> typing.Callable[[typing.Type[FlowSpecDerived]], typing.Type[FlowSpecDerived]]:
     """
     Specifies the PyPI packages for all steps of the flow.
 
@@ -1885,38 +1898,25 @@ def pypi_base(f: typing.Optional[typing.Type[FlowSpecDerived]] = None, *, packag
     """
     ...
 
-def project(*, name: str, branch: typing.Optional[str] = None, production: bool = False) -> typing.Callable[[typing.Type[FlowSpecDerived]], typing.Type[FlowSpecDerived]]:
+@typing.overload
+def pypi_base(f: typing.Type[FlowSpecDerived]) -> typing.Type[FlowSpecDerived]:
+    ...
+
+def pypi_base(f: typing.Optional[typing.Type[FlowSpecDerived]] = None, *, packages: typing.Dict[str, str] = {}, python: typing.Optional[str] = None):
     """
-    Specifies what flows belong to the same project.
-
-    A project-specific namespace is created for all flows that
-    use the same `@project(name)`.
+    Specifies the PyPI packages for all steps of the flow.
 
+    Use `@pypi_base` to set common packages required by all
+    steps and use `@pypi` to specify step-specific overrides.
 
     Parameters
     ----------
-    name : str
-        Project name. Make sure that the name is unique amongst all
-        projects that use the same production scheduler. The name may
-        contain only lowercase alphanumeric characters and underscores.
-
-    branch : Optional[str], default None
-        The branch to use. If not specified, the branch is set to
-        `user.<username>` unless `production` is set to `True`. This can
-        also be set on the command line using `--branch` as a top-level option.
-        It is an error to specify `branch` in the decorator and on the command line.
-
-    production : bool, default False
-        Whether or not the branch is the production branch. This can also be set on the
-        command line using `--production` as a top-level option. It is an error to specify
-        `production` in the decorator and on the command line.
-        The project branch name will be:
-            - if `branch` is specified:
-                - if `production` is True: `prod.<branch>`
-                - if `production` is False: `test.<branch>`
-            - if `branch` is not specified:
-                - if `production` is True: `prod`
-                - if `production` is False: `user.<username>`
+    packages : Dict[str, str], default: {}
+        Packages to use for this flow. The key is the name of the package
+        and the value is the version to use.
+    python : str, optional, default: None
+        Version of Python to use, e.g. '3.7.4'. A default value of None implies
+        that the version used will correspond to the version of the Python interpreter used to start the run.
     """
     ...
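The `@project` docstring removed at this location spells out the branch-naming rules (`prod.<branch>`, `test.<branch>`, `prod`, `user.<username>`). As an illustrative sketch of those rules only (the helper name and username are invented; this is not Metaflow's code):

```python
def resolve_project_branch(branch=None, production=False, username="jane"):
    """Apply the branch-naming rules described in the @project docstring."""
    if branch is not None:
        # Explicit branch: prefixed by prod. or test. depending on `production`.
        return ("prod." if production else "test.") + branch
    # No branch: bare "prod" in production, else a per-user namespace.
    return "prod" if production else f"user.{username}"


print(resolve_project_branch(branch="demo", production=True))  # prod.demo
print(resolve_project_branch(branch="demo"))                   # test.demo
print(resolve_project_branch(production=True))                 # prod
print(resolve_project_branch())                                # user.jane
```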