skypilot-nightly 1.0.0.dev20250922__py3-none-any.whl → 1.0.0.dev20250925__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release: this version of skypilot-nightly might be problematic.

Files changed (111)
  1. sky/__init__.py +2 -2
  2. sky/backends/backend.py +10 -0
  3. sky/backends/backend_utils.py +194 -69
  4. sky/backends/cloud_vm_ray_backend.py +37 -13
  5. sky/backends/local_docker_backend.py +9 -0
  6. sky/client/cli/command.py +104 -53
  7. sky/client/sdk.py +13 -5
  8. sky/client/sdk_async.py +4 -2
  9. sky/clouds/kubernetes.py +2 -1
  10. sky/clouds/runpod.py +20 -7
  11. sky/core.py +7 -53
  12. sky/dashboard/out/404.html +1 -1
  13. sky/dashboard/out/_next/static/{KP6HCNMqb_bnJB17oplgW → bn-NHt5qTzeTN2PefXuDA}/_buildManifest.js +1 -1
  14. sky/dashboard/out/_next/static/chunks/1121-b911fc0a0b4742f0.js +1 -0
  15. sky/dashboard/out/_next/static/chunks/6856-2b3600ff2854d066.js +1 -0
  16. sky/dashboard/out/_next/static/chunks/8969-d8bc3a2b9cf839a9.js +1 -0
  17. sky/dashboard/out/_next/static/chunks/pages/clusters/[cluster]/[job]-2cb9b15e09cda628.js +16 -0
  18. sky/dashboard/out/_next/static/chunks/pages/clusters/{[cluster]-9525660179df3605.js → [cluster]-e052384df65ef200.js} +1 -1
  19. sky/dashboard/out/_next/static/chunks/{webpack-26167a9e6d91fa51.js → webpack-16ba1d7187d2e3b1.js} +1 -1
  20. sky/dashboard/out/clusters/[cluster]/[job].html +1 -1
  21. sky/dashboard/out/clusters/[cluster].html +1 -1
  22. sky/dashboard/out/clusters.html +1 -1
  23. sky/dashboard/out/config.html +1 -1
  24. sky/dashboard/out/index.html +1 -1
  25. sky/dashboard/out/infra/[context].html +1 -1
  26. sky/dashboard/out/infra.html +1 -1
  27. sky/dashboard/out/jobs/[job].html +1 -1
  28. sky/dashboard/out/jobs/pools/[pool].html +1 -1
  29. sky/dashboard/out/jobs.html +1 -1
  30. sky/dashboard/out/users.html +1 -1
  31. sky/dashboard/out/volumes.html +1 -1
  32. sky/dashboard/out/workspace/new.html +1 -1
  33. sky/dashboard/out/workspaces/[name].html +1 -1
  34. sky/dashboard/out/workspaces.html +1 -1
  35. sky/data/mounting_utils.py +19 -10
  36. sky/execution.py +4 -2
  37. sky/global_user_state.py +217 -36
  38. sky/jobs/client/sdk.py +10 -1
  39. sky/jobs/controller.py +7 -7
  40. sky/jobs/server/core.py +3 -3
  41. sky/jobs/server/server.py +15 -11
  42. sky/jobs/utils.py +1 -1
  43. sky/logs/agent.py +30 -3
  44. sky/logs/aws.py +9 -19
  45. sky/provision/__init__.py +2 -1
  46. sky/provision/aws/instance.py +2 -1
  47. sky/provision/azure/instance.py +2 -1
  48. sky/provision/cudo/instance.py +2 -2
  49. sky/provision/do/instance.py +2 -2
  50. sky/provision/docker_utils.py +41 -19
  51. sky/provision/fluidstack/instance.py +2 -2
  52. sky/provision/gcp/instance.py +2 -1
  53. sky/provision/hyperbolic/instance.py +2 -1
  54. sky/provision/instance_setup.py +1 -1
  55. sky/provision/kubernetes/instance.py +134 -8
  56. sky/provision/lambda_cloud/instance.py +2 -1
  57. sky/provision/nebius/instance.py +2 -1
  58. sky/provision/oci/instance.py +2 -1
  59. sky/provision/paperspace/instance.py +2 -2
  60. sky/provision/primeintellect/instance.py +2 -2
  61. sky/provision/provisioner.py +1 -0
  62. sky/provision/runpod/instance.py +2 -2
  63. sky/provision/scp/instance.py +2 -2
  64. sky/provision/seeweb/instance.py +2 -1
  65. sky/provision/vast/instance.py +2 -1
  66. sky/provision/vsphere/instance.py +6 -5
  67. sky/schemas/api/responses.py +2 -1
  68. sky/serve/autoscalers.py +2 -0
  69. sky/serve/client/impl.py +45 -19
  70. sky/serve/replica_managers.py +12 -5
  71. sky/serve/serve_utils.py +5 -7
  72. sky/serve/server/core.py +9 -6
  73. sky/serve/server/impl.py +78 -25
  74. sky/serve/server/server.py +4 -5
  75. sky/serve/service_spec.py +33 -0
  76. sky/server/constants.py +1 -1
  77. sky/server/daemons.py +2 -3
  78. sky/server/requests/executor.py +56 -6
  79. sky/server/requests/payloads.py +31 -8
  80. sky/server/requests/preconditions.py +2 -3
  81. sky/server/rest.py +2 -0
  82. sky/server/server.py +28 -19
  83. sky/server/stream_utils.py +34 -12
  84. sky/setup_files/dependencies.py +4 -1
  85. sky/setup_files/setup.py +44 -44
  86. sky/templates/kubernetes-ray.yml.j2 +16 -15
  87. sky/usage/usage_lib.py +3 -0
  88. sky/utils/cli_utils/status_utils.py +4 -5
  89. sky/utils/context.py +104 -29
  90. sky/utils/controller_utils.py +7 -6
  91. sky/utils/kubernetes/create_cluster.sh +13 -28
  92. sky/utils/kubernetes/delete_cluster.sh +10 -7
  93. sky/utils/kubernetes/generate_kind_config.py +6 -66
  94. sky/utils/kubernetes/kubernetes_deploy_utils.py +170 -37
  95. sky/utils/kubernetes_enums.py +5 -0
  96. sky/utils/ux_utils.py +35 -1
  97. sky/utils/yaml_utils.py +9 -0
  98. sky/volumes/client/sdk.py +44 -8
  99. sky/volumes/server/server.py +33 -7
  100. sky/volumes/volume.py +22 -14
  101. {skypilot_nightly-1.0.0.dev20250922.dist-info → skypilot_nightly-1.0.0.dev20250925.dist-info}/METADATA +40 -35
  102. {skypilot_nightly-1.0.0.dev20250922.dist-info → skypilot_nightly-1.0.0.dev20250925.dist-info}/RECORD +107 -107
  103. sky/dashboard/out/_next/static/chunks/1121-4ff1ec0dbc5792ab.js +0 -1
  104. sky/dashboard/out/_next/static/chunks/6856-9a2538f38c004652.js +0 -1
  105. sky/dashboard/out/_next/static/chunks/8969-a39efbadcd9fde80.js +0 -1
  106. sky/dashboard/out/_next/static/chunks/pages/clusters/[cluster]/[job]-1e9248ddbddcd122.js +0 -16
  107. sky/dashboard/out/_next/static/{KP6HCNMqb_bnJB17oplgW → bn-NHt5qTzeTN2PefXuDA}/_ssgManifest.js +0 -0
  108. {skypilot_nightly-1.0.0.dev20250922.dist-info → skypilot_nightly-1.0.0.dev20250925.dist-info}/WHEEL +0 -0
  109. {skypilot_nightly-1.0.0.dev20250922.dist-info → skypilot_nightly-1.0.0.dev20250925.dist-info}/entry_points.txt +0 -0
  110. {skypilot_nightly-1.0.0.dev20250922.dist-info → skypilot_nightly-1.0.0.dev20250925.dist-info}/licenses/LICENSE +0 -0
  111. {skypilot_nightly-1.0.0.dev20250922.dist-info → skypilot_nightly-1.0.0.dev20250925.dist-info}/top_level.txt +0 -0
sky/volumes/server/server.py CHANGED
@@ -3,12 +3,13 @@
 import fastapi
 
 from sky import clouds
+from sky import exceptions
 from sky import sky_logging
 from sky.server.requests import executor
 from sky.server.requests import payloads
 from sky.server.requests import requests as requests_lib
 from sky.utils import registry
-from sky.utils import volume
+from sky.utils import volume as volume_utils
 from sky.volumes.server import core
 
 logger = sky_logging.init_logger(__name__)
@@ -46,6 +47,31 @@ async def volume_delete(request: fastapi.Request,
     )
 
 
+@router.post('/validate')
+async def volume_validate(
+        _: fastapi.Request,
+        volume_validate_body: payloads.VolumeValidateBody) -> None:
+    """Validates a volume."""
+    # pylint: disable=import-outside-toplevel
+    from sky.volumes import volume as volume_lib
+
+    try:
+        volume_config = {
+            'name': volume_validate_body.name,
+            'type': volume_validate_body.volume_type,
+            'infra': volume_validate_body.infra,
+            'size': volume_validate_body.size,
+            'labels': volume_validate_body.labels,
+            'config': volume_validate_body.config,
+            'resource_name': volume_validate_body.resource_name,
+        }
+        volume = volume_lib.Volume.from_yaml_config(volume_config)
+        volume.validate()
+    except Exception as e:
+        raise fastapi.HTTPException(status_code=400,
+                                    detail=exceptions.serialize_exception(e))
+
+
 @router.post('/apply')
 async def volume_apply(request: fastapi.Request,
                        volume_apply_body: payloads.VolumeApplyBody) -> None:
@@ -55,7 +81,7 @@ async def volume_apply(request: fastapi.Request,
     volume_config = volume_apply_body.config
 
     supported_volume_types = [
-        volume_type.value for volume_type in volume.VolumeType
+        volume_type.value for volume_type in volume_utils.VolumeType
     ]
     if volume_type not in supported_volume_types:
         raise fastapi.HTTPException(
@@ -64,24 +90,24 @@ async def volume_apply(request: fastapi.Request,
     if cloud is None:
         raise fastapi.HTTPException(status_code=400,
                                     detail=f'Invalid cloud: {volume_cloud}')
-    if volume_type == volume.VolumeType.PVC.value:
+    if volume_type == volume_utils.VolumeType.PVC.value:
         if not cloud.is_same_cloud(clouds.Kubernetes()):
             raise fastapi.HTTPException(
                 status_code=400,
                 detail='PVC storage is only supported on Kubernetes')
         supported_access_modes = [
-            access_mode.value for access_mode in volume.VolumeAccessMode
+            access_mode.value for access_mode in volume_utils.VolumeAccessMode
         ]
         if volume_config is None:
             volume_config = {}
         access_mode = volume_config.get('access_mode')
         if access_mode is None:
-            volume_config[
-                'access_mode'] = volume.VolumeAccessMode.READ_WRITE_ONCE.value
+            volume_config['access_mode'] = (
+                volume_utils.VolumeAccessMode.READ_WRITE_ONCE.value)
         elif access_mode not in supported_access_modes:
             raise fastapi.HTTPException(
                 status_code=400, detail=f'Invalid access mode: {access_mode}')
-    elif volume_type == volume.VolumeType.RUNPOD_NETWORK_VOLUME.value:
+    elif volume_type == volume_utils.VolumeType.RUNPOD_NETWORK_VOLUME.value:
         if not cloud.is_same_cloud(clouds.RunPod()):
             raise fastapi.HTTPException(
                 status_code=400,
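The field mapping in the new `/validate` handler can be illustrated with a small, self-contained sketch. `ValidateBody` and `to_volume_config` are hypothetical stand-ins, not SkyPilot names: the real `payloads.VolumeValidateBody` is not shown in this diff, so the field types and defaults below are assumptions, and the sample values are purely illustrative.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ValidateBody:
    """Hypothetical stand-in for payloads.VolumeValidateBody (types assumed)."""
    name: Optional[str] = None
    volume_type: Optional[str] = None
    infra: Optional[str] = None
    size: Optional[str] = None
    labels: Optional[Dict[str, str]] = None
    config: Optional[Dict[str, Any]] = None
    resource_name: Optional[str] = None


def to_volume_config(body: ValidateBody) -> Dict[str, Any]:
    """Build the dict that the handler passes to Volume.from_yaml_config.

    Mirrors the key mapping in the diff: every field keeps its name
    except volume_type, which is stored under the 'type' key.
    """
    return {
        'name': body.name,
        'type': body.volume_type,
        'infra': body.infra,
        'size': body.size,
        'labels': body.labels,
        'config': body.config,
        'resource_name': body.resource_name,
    }


# Illustrative values only.
body = ValidateBody(name='data', volume_type='k8s-pvc',
                    infra='kubernetes', size='10Gi')
config = to_volume_config(body)
```

In the handler itself, any exception raised while building or validating the volume is converted into an HTTP 400 whose detail is the serialized exception, so a client can distinguish a rejected volume spec from a server fault.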
sky/volumes/volume.py CHANGED
@@ -115,9 +115,6 @@ class Volume:
         self.region = infra_info.region
         self.zone = infra_info.zone
 
-        # Validate the volume config
-        self._validate_config()
-
     def _adjust_config(self) -> None:
         """Adjust the volume config (e.g., parse size)."""
         if self.size is None:
@@ -132,8 +129,28 @@ class Volume:
         except ValueError as e:
             raise ValueError(f'Invalid size {self.size}: {e}') from e
 
-    def _validate_config(self) -> None:
-        """Validate the volume config."""
+    def validate(self, skip_cloud_compatibility: bool = False) -> None:
+        """Validates the volume."""
+        self.validate_name()
+        self.validate_size()
+        if not skip_cloud_compatibility:
+            self.validate_cloud_compatibility()
+        # Extra, type-specific validations
+        self._validate_config_extra()
+
+    def validate_name(self) -> None:
+        """Validates if the volume name is set."""
+        assert self.name is not None, 'Volume name must be set'
+
+    def validate_size(self) -> None:
+        """Validates that size is specified for new volumes."""
+        if not self.resource_name and not self.size:
+            raise ValueError('Size is required for new volumes. '
+                             'Please specify the size in the YAML file or '
+                             'use the --size flag.')
+
+    def validate_cloud_compatibility(self) -> None:
+        """Validates that the specified cloud is compatible with volume type."""
         cloud_obj_from_type = VOLUME_TYPE_TO_CLOUD.get(
             volume_lib.VolumeType(self.type))
         if self.cloud:
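The hunk above moves validation out of the constructor path and splits it into separately callable steps. The following is a minimal sketch of that control flow only. `SketchVolume` is a hypothetical stand-in, its cloud check is a stub, and the `steps` list exists purely to show which checks run; none of this is SkyPilot's actual class.

```python
from typing import List, Optional


class SketchVolume:
    """Hypothetical stand-in that mimics only the control flow of validate()."""

    def __init__(self,
                 name: Optional[str],
                 size: Optional[str] = None,
                 resource_name: Optional[str] = None) -> None:
        self.name = name
        self.size = size
        self.resource_name = resource_name
        self.steps: List[str] = []  # records which checks ran (illustration)

    def validate(self, skip_cloud_compatibility: bool = False) -> None:
        self.validate_name()
        self.validate_size()
        if not skip_cloud_compatibility:
            self.validate_cloud_compatibility()
        self._validate_config_extra()

    def validate_name(self) -> None:
        self.steps.append('name')
        assert self.name is not None, 'Volume name must be set'

    def validate_size(self) -> None:
        self.steps.append('size')
        # Size is only mandatory when no existing resource is referenced.
        if not self.resource_name and not self.size:
            raise ValueError('Size is required for new volumes.')

    def validate_cloud_compatibility(self) -> None:
        # The real method consults cloud objects; this stub records the call.
        self.steps.append('cloud')

    def _validate_config_extra(self) -> None:
        self.steps.append('extra')


# Skipping the cloud check still runs the name, size, and extra checks.
offline = SketchVolume(name='data', size='10Gi')
offline.validate(skip_cloud_compatibility=True)

# The default path runs every step.
full = SketchVolume(name='data', size='10Gi')
full.validate()

# A volume that references an existing resource does not need a size.
existing = SketchVolume(name='data', resource_name='pvc-123')
existing.validate()

# A new volume without a size is rejected.
try:
    SketchVolume(name='data').validate()
    size_error = None
except ValueError as e:
    size_error = str(e)
```

One consequence of the split, as far as this diff shows, is that constructing a `Volume` no longer validates it implicitly; callers such as the new `/validate` endpoint call `validate()` explicitly.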
@@ -150,25 +167,16 @@ class Volume:
         self.region, self.zone = cloud_obj.validate_region_zone(
             self.region, self.zone)
 
-        # Name must be set by factory before validation.
-        assert self.name is not None
         valid, err_msg = cloud_obj.is_volume_name_valid(self.name)
         if not valid:
             raise ValueError(f'Invalid volume name: {err_msg}')
 
-        if not self.resource_name and not self.size:
-            raise ValueError('Size is required for new volumes. '
-                             'Please specify the size in the YAML file or '
-                             'use the --size flag.')
         if self.labels:
             for key, value in self.labels.items():
                 valid, err_msg = cloud_obj.is_label_valid(key, value)
                 if not valid:
                     raise ValueError(f'{err_msg}')
 
-        # Extra, type-specific validations
-        self._validate_config_extra()
-
     # Hook methods for subclasses
     def _validate_config_extra(self) -> None:
         """Additional type-specific validation.
{skypilot_nightly-1.0.0.dev20250922.dist-info → skypilot_nightly-1.0.0.dev20250925.dist-info}/METADATA CHANGED
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: skypilot-nightly
-Version: 1.0.0.dev20250922
+Version: 1.0.0.dev20250925
 Summary: SkyPilot: Run AI on Any Infra — Unified, Faster, Cheaper.
 Author: SkyPilot Team
 License: Apache 2.0
@@ -117,6 +117,7 @@ Requires-Dist: grpcio>=1.63.0; extra == "remote"
 Requires-Dist: protobuf<7.0.0,>=5.26.1; extra == "remote"
 Provides-Extra: runpod
 Requires-Dist: runpod>=1.6.1; extra == "runpod"
+Requires-Dist: tomli; python_version < "3.11" and extra == "runpod"
 Provides-Extra: fluidstack
 Provides-Extra: cudo
 Requires-Dist: cudo-compute>=0.1.10; extra == "cudo"
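The `python_version < "3.11"` marker on the new `tomli` requirement follows a common packaging pattern: `tomllib` joined the standard library in Python 3.11, and `tomli` is the backport with the same API for older interpreters. The sketch below shows how such a marker typically pairs with a conditional import. The helper names `needs_tomli` and `load_toml_module` are hypothetical, and this diff does not show how the RunPod code actually imports or uses the parser, so treat the pairing as an assumption.

```python
import importlib
import sys
from types import ModuleType
from typing import Sequence


def needs_tomli(version_info: Sequence[int] = sys.version_info) -> bool:
    """Mirror the marker python_version < "3.11" for a given interpreter."""
    return tuple(version_info[:2]) < (3, 11)


def load_toml_module() -> ModuleType:
    """Return a module exposing load()/loads() for TOML.

    tomllib is in the standard library from Python 3.11 onward; older
    interpreters need the third-party tomli backport to be installed.
    """
    name = 'tomli' if needs_tomli() else 'tomllib'
    return importlib.import_module(name)
```

With this in place, `load_toml_module().loads('key = "value"')` behaves the same on either interpreter, provided `tomli` is installed where the marker requires it.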
@@ -151,50 +152,53 @@ Requires-Dist: anyio; extra == "server"
 Requires-Dist: grpcio>=1.63.0; extra == "server"
 Requires-Dist: protobuf<7.0.0,>=5.26.1; extra == "server"
 Requires-Dist: aiosqlite; extra == "server"
+Requires-Dist: greenlet; extra == "server"
 Provides-Extra: all
-Requires-Dist: azure-core>=1.31.0; extra == "all"
-Requires-Dist: cudo-compute>=0.1.10; extra == "all"
 Requires-Dist: azure-storage-blob>=12.23.1; extra == "all"
-Requires-Dist: casbin; extra == "all"
-Requires-Dist: sqlalchemy_adapter; extra == "all"
-Requires-Dist: ibm-cos-sdk; extra == "all"
-Requires-Dist: pydo>=0.3.0; extra == "all"
-Requires-Dist: ray[default]>=2.6.1; extra == "all"
-Requires-Dist: azure-core>=1.24.0; extra == "all"
-Requires-Dist: websockets; extra == "all"
-Requires-Dist: azure-mgmt-network>=27.0.0; extra == "all"
-Requires-Dist: anyio; extra == "all"
-Requires-Dist: ibm-vpc; extra == "all"
-Requires-Dist: ecsapi>=0.2.0; extra == "all"
+Requires-Dist: msgraph-sdk; extra == "all"
+Requires-Dist: tomli; python_version < "3.11" and extra == "all"
+Requires-Dist: azure-mgmt-compute>=33.0.0; extra == "all"
+Requires-Dist: docker; extra == "all"
+Requires-Dist: cudo-compute>=0.1.10; extra == "all"
 Requires-Dist: passlib; extra == "all"
-Requires-Dist: google-cloud-storage; extra == "all"
+Requires-Dist: botocore>=1.29.10; extra == "all"
+Requires-Dist: websockets; extra == "all"
+Requires-Dist: protobuf<7.0.0,>=5.26.1; extra == "all"
+Requires-Dist: pyopenssl<24.3.0,>=23.2.0; extra == "all"
+Requires-Dist: ibm-platform-services>=0.48.0; extra == "all"
+Requires-Dist: runpod>=1.6.1; extra == "all"
+Requires-Dist: nebius>=0.2.47; extra == "all"
+Requires-Dist: pyjwt; extra == "all"
+Requires-Dist: azure-cli>=2.65.0; extra == "all"
 Requires-Dist: oci; extra == "all"
-Requires-Dist: azure-mgmt-compute>=33.0.0; extra == "all"
 Requires-Dist: python-dateutil; extra == "all"
-Requires-Dist: docker; extra == "all"
-Requires-Dist: colorama<0.4.5; extra == "all"
+Requires-Dist: casbin; extra == "all"
+Requires-Dist: azure-core>=1.24.0; extra == "all"
+Requires-Dist: aiosqlite; extra == "all"
+Requires-Dist: greenlet; extra == "all"
+Requires-Dist: msrestazure; extra == "all"
+Requires-Dist: sqlalchemy_adapter; extra == "all"
+Requires-Dist: azure-identity>=1.19.0; extra == "all"
 Requires-Dist: vastai-sdk>=0.1.12; extra == "all"
+Requires-Dist: pydo>=0.3.0; extra == "all"
+Requires-Dist: boto3>=1.26.1; extra == "all"
+Requires-Dist: awscli>=1.27.10; extra == "all"
 Requires-Dist: grpcio>=1.63.0; extra == "all"
-Requires-Dist: pyjwt; extra == "all"
-Requires-Dist: aiosqlite; extra == "all"
+Requires-Dist: google-cloud-storage; extra == "all"
+Requires-Dist: pyvmomi==8.0.1.0.2; extra == "all"
 Requires-Dist: ibm-cloud-sdk-core; extra == "all"
-Requires-Dist: aiohttp; extra == "all"
-Requires-Dist: azure-common; extra == "all"
+Requires-Dist: ray[default]>=2.6.1; extra == "all"
 Requires-Dist: kubernetes!=32.0.0,>=20.0.0; extra == "all"
-Requires-Dist: awscli>=1.27.10; extra == "all"
-Requires-Dist: boto3>=1.26.1; extra == "all"
-Requires-Dist: msrestazure; extra == "all"
-Requires-Dist: pyopenssl<24.3.0,>=23.2.0; extra == "all"
-Requires-Dist: runpod>=1.6.1; extra == "all"
+Requires-Dist: anyio; extra == "all"
+Requires-Dist: colorama<0.4.5; extra == "all"
+Requires-Dist: ecsapi>=0.2.0; extra == "all"
+Requires-Dist: azure-core>=1.31.0; extra == "all"
 Requires-Dist: google-api-python-client>=2.69.0; extra == "all"
-Requires-Dist: azure-cli>=2.65.0; extra == "all"
-Requires-Dist: protobuf<7.0.0,>=5.26.1; extra == "all"
-Requires-Dist: msgraph-sdk; extra == "all"
-Requires-Dist: pyvmomi==8.0.1.0.2; extra == "all"
-Requires-Dist: azure-identity>=1.19.0; extra == "all"
-Requires-Dist: ibm-platform-services>=0.48.0; extra == "all"
-Requires-Dist: nebius>=0.2.47; extra == "all"
-Requires-Dist: botocore>=1.29.10; extra == "all"
+Requires-Dist: ibm-vpc; extra == "all"
+Requires-Dist: azure-common; extra == "all"
+Requires-Dist: azure-mgmt-network>=27.0.0; extra == "all"
+Requires-Dist: aiohttp; extra == "all"
+Requires-Dist: ibm-cos-sdk; extra == "all"
 Dynamic: author
 Dynamic: classifier
 Dynamic: description
@@ -245,6 +249,7 @@ Dynamic: summary
 
 :fire: *News* :fire:
 - [Aug 2025] Serve and finetune **OpenAI GPT-OSS models** (gpt-oss-120b, gpt-oss-20b) with one command on any infra: [**serve**](./llm/gpt-oss/) + [**LoRA and full finetuning**](./llm/gpt-oss-finetuning/)
+- [Jul 2025] Run large-scale **LLM training with TorchTitan** on any cloud: [**example**](./llm/torchtitan/)
 - [Jul 2025] Run distributed **RL training for LLMs** with Verl (PPO, GRPO) on any cloud: [**example**](./llm/verl/)
 - [Jul 2025] 🎉 SkyPilot v0.10.0 released! [**blog post**](https://blog.skypilot.co/announcing-skypilot-0.10.0/), [**release notes**](https://github.com/skypilot-org/skypilot/releases/tag/v0.10.0)
 - [Jul 2025] Finetune **Llama4** on any distributed cluster/cloud: [**example**](./llm/llama-4-finetuning/)