vec-inf 0.3.3.tar.gz → 0.4.0.post1.tar.gz

This diff shows the changes between two publicly released versions of the package, as they appear in their public registry, and is provided for informational purposes only.
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2024 Vector Institute
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
--- a/PKG-INFO
+++ b/PKG-INFO
@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: vec-inf
- Version: 0.3.3
+ Version: 0.4.0.post1
  Summary: Efficient LLM inference on Slurm clusters using vLLM.
  License: MIT
  Author: Marshall Wang
@@ -11,19 +11,21 @@ Classifier: Programming Language :: Python :: 3
  Classifier: Programming Language :: Python :: 3.10
  Classifier: Programming Language :: Python :: 3.11
  Classifier: Programming Language :: Python :: 3.12
+ Classifier: Programming Language :: Python :: 3.13
  Provides-Extra: dev
  Requires-Dist: click (>=8.1.0,<9.0.0)
  Requires-Dist: cupy-cuda12x (==12.1.0) ; extra == "dev"
- Requires-Dist: pandas (>=2.2.2,<3.0.0)
+ Requires-Dist: numpy (>=1.24.0,<2.0.0)
+ Requires-Dist: polars (>=1.15.0,<2.0.0)
  Requires-Dist: ray (>=2.9.3,<3.0.0) ; extra == "dev"
  Requires-Dist: requests (>=2.31.0,<3.0.0)
  Requires-Dist: rich (>=13.7.0,<14.0.0)
- Requires-Dist: vllm (>=0.5.0,<0.6.0) ; extra == "dev"
+ Requires-Dist: vllm (>=0.6.0,<0.7.0) ; extra == "dev"
  Requires-Dist: vllm-nccl-cu12 (>=2.18,<2.19) ; extra == "dev"
  Description-Content-Type: text/markdown

  # Vector Inference: Easy inference on Slurm clusters
- This repository provides an easy-to-use solution to run inference servers on [Slurm](https://slurm.schedmd.com/overview.html)-managed computing clusters using [vLLM](https://docs.vllm.ai/en/latest/). **All scripts in this repository runs natively on the Vector Institute cluster environment**. To adapt to other environments, update [`launch_server.sh`](vec-inf/launch_server.sh), [`vllm.slurm`](vec-inf/vllm.slurm), [`multinode_vllm.slurm`](vec-inf/multinode_vllm.slurm) and [`models.csv`](vec-inf/models/models.csv) accordingly.
+ This repository provides an easy-to-use solution for running inference servers on [Slurm](https://slurm.schedmd.com/overview.html)-managed computing clusters using [vLLM](https://docs.vllm.ai/en/latest/). **All scripts in this repository run natively on the Vector Institute cluster environment.** To adapt them to other environments, update [`launch_server.sh`](vec_inf/launch_server.sh), [`vllm.slurm`](vec_inf/vllm.slurm), [`multinode_vllm.slurm`](vec_inf/multinode_vllm.slurm) and [`models.csv`](vec_inf/models/models.csv) accordingly.

  ## Installation
  If you are using the Vector cluster environment and don't need any customization to the inference server environment, run the following to install the package:
@@ -33,16 +35,23 @@ pip install vec-inf
  Otherwise, we recommend using the provided [`Dockerfile`](Dockerfile) to set up your own environment with the package.

  ## Launch an inference server
+ ### `launch` command
  We will use the Llama 3.1 model as an example. To launch an OpenAI-compatible inference server for Meta-Llama-3.1-8B-Instruct, run:
  ```bash
  vec-inf launch Meta-Llama-3.1-8B-Instruct
  ```
  You should see an output like the following:

- <img width="400" alt="launch_img" src="https://github.com/user-attachments/assets/557eb421-47db-4810-bccd-c49c526b1b43">
+ <img width="700" alt="launch_img" src="https://github.com/user-attachments/assets/ab658552-18b2-47e0-bf70-e539c3b898d5">

- The model would be launched using the [default parameters](vec-inf/models/models.csv), you can override these values by providing additional options, use `--help` to see the full list. You can also launch your own customized model as long as the model architecture is [supported by vLLM](https://docs.vllm.ai/en/stable/models/supported_models.html), you'll need to specify all model launching related options to run a successful run.
+ The model will be launched using the [default parameters](vec_inf/models/models.csv); you can override these values by providing additional parameters (use `--help` to see the full list). You can also launch your own custom model as long as the model architecture is [supported by vLLM](https://docs.vllm.ai/en/stable/models/supported_models.html); make sure to follow the instructions below:
+ * Your model weights directory naming convention should follow `$MODEL_FAMILY-$MODEL_VARIANT`.
+ * Your model weights directory should contain HF-format weights.
+ * The following launch parameters fall back to their default values if not specified: `--max-num-seqs`, `--partition`, `--data-type`, `--venv`, `--log-dir`, `--model-weights-parent-dir`, `--pipeline-parallelism`, `--enforce-eager`. All other launch parameters must be specified for custom models.
+ * Example for setting the model weights parent directory: `--model-weights-parent-dir /h/user_name/my_weights`.
+ * For other model launch parameters, you can refer to the default values for similar models using the [`list` command](#list-command).

+ ### `status` command
  You can check the inference server status by providing the Slurm job ID to the `status` command:
  ```bash
  vec-inf status 13014393
@@ -62,6 +71,17 @@ There are 5 possible states:

  Note that the base URL is only available when the model is in the `READY` state, and if you've changed the Slurm log directory path, you also need to specify it when using the `status` command.

+ ### `metrics` command
+ Once your server is ready, you can check performance metrics by providing the Slurm job ID to the `metrics` command:
+ ```bash
+ vec-inf metrics 13014393
+ ```
+
+ The performance metrics are streamed to your console; note that the metrics are updated at a 10-second interval.
+
+ <img width="400" alt="metrics_img" src="https://github.com/user-attachments/assets/e5ff2cd5-659b-4c88-8ebc-d8f3fdc023a4">
+
+ ### `shutdown` command
  Finally, when you're finished using a model, you can shut it down by providing the Slurm job ID:
  ```bash
  vec-inf shutdown 13014393
@@ -69,17 +89,19 @@ vec-inf shutdown 13014393
  > Shutting down model with Slurm Job ID: 13014393
  ```

+ ### `list` command
  You can view the full list of available models by running the `list` command:
  ```bash
  vec-inf list
  ```
- <img width="1200" alt="list_img" src="https://github.com/user-attachments/assets/a4f0d896-989d-43bf-82a2-6a6e5d0d288f">
+ <img width="940" alt="list_img" src="https://github.com/user-attachments/assets/8cf901c4-404c-4398-a52f-0486f00747a3">
+

  You can also view the default setup for a specific supported model by providing the model name, for example `Meta-Llama-3.1-70B-Instruct`:
  ```bash
  vec-inf list Meta-Llama-3.1-70B-Instruct
  ```
- <img width="400" alt="list_model_img" src="https://github.com/user-attachments/assets/5dec7a33-ba6b-490d-af47-4cf7341d0b42">
+ <img width="400" alt="list_model_img" src="https://github.com/user-attachments/assets/30e42ab7-dde2-4d20-85f0-187adffefc3d">

  The `launch`, `list`, and `status` commands support `--json-mode`, which structures the command output as a JSON string.

--- a/README.md
+++ b/README.md
@@ -1,5 +1,5 @@
  # Vector Inference: Easy inference on Slurm clusters
- This repository provides an easy-to-use solution to run inference servers on [Slurm](https://slurm.schedmd.com/overview.html)-managed computing clusters using [vLLM](https://docs.vllm.ai/en/latest/). **All scripts in this repository runs natively on the Vector Institute cluster environment**. To adapt to other environments, update [`launch_server.sh`](vec-inf/launch_server.sh), [`vllm.slurm`](vec-inf/vllm.slurm), [`multinode_vllm.slurm`](vec-inf/multinode_vllm.slurm) and [`models.csv`](vec-inf/models/models.csv) accordingly.
+ This repository provides an easy-to-use solution for running inference servers on [Slurm](https://slurm.schedmd.com/overview.html)-managed computing clusters using [vLLM](https://docs.vllm.ai/en/latest/). **All scripts in this repository run natively on the Vector Institute cluster environment.** To adapt them to other environments, update [`launch_server.sh`](vec_inf/launch_server.sh), [`vllm.slurm`](vec_inf/vllm.slurm), [`multinode_vllm.slurm`](vec_inf/multinode_vllm.slurm) and [`models.csv`](vec_inf/models/models.csv) accordingly.

  ## Installation
  If you are using the Vector cluster environment and don't need any customization to the inference server environment, run the following to install the package:
@@ -9,16 +9,23 @@ pip install vec-inf
  Otherwise, we recommend using the provided [`Dockerfile`](Dockerfile) to set up your own environment with the package.

  ## Launch an inference server
+ ### `launch` command
  We will use the Llama 3.1 model as an example. To launch an OpenAI-compatible inference server for Meta-Llama-3.1-8B-Instruct, run:
  ```bash
  vec-inf launch Meta-Llama-3.1-8B-Instruct
  ```
  You should see an output like the following:

- <img width="400" alt="launch_img" src="https://github.com/user-attachments/assets/557eb421-47db-4810-bccd-c49c526b1b43">
+ <img width="700" alt="launch_img" src="https://github.com/user-attachments/assets/ab658552-18b2-47e0-bf70-e539c3b898d5">

- The model would be launched using the [default parameters](vec-inf/models/models.csv), you can override these values by providing additional options, use `--help` to see the full list. You can also launch your own customized model as long as the model architecture is [supported by vLLM](https://docs.vllm.ai/en/stable/models/supported_models.html), you'll need to specify all model launching related options to run a successful run.
+ The model will be launched using the [default parameters](vec_inf/models/models.csv); you can override these values by providing additional parameters (use `--help` to see the full list). You can also launch your own custom model as long as the model architecture is [supported by vLLM](https://docs.vllm.ai/en/stable/models/supported_models.html); make sure to follow the instructions below:
+ * Your model weights directory naming convention should follow `$MODEL_FAMILY-$MODEL_VARIANT`.
+ * Your model weights directory should contain HF-format weights.
+ * The following launch parameters fall back to their default values if not specified: `--max-num-seqs`, `--partition`, `--data-type`, `--venv`, `--log-dir`, `--model-weights-parent-dir`, `--pipeline-parallelism`, `--enforce-eager`. All other launch parameters must be specified for custom models.
+ * Example for setting the model weights parent directory: `--model-weights-parent-dir /h/user_name/my_weights`.
+ * For other model launch parameters, you can refer to the default values for similar models using the [`list` command](#list-command).

+ ### `status` command
  You can check the inference server status by providing the Slurm job ID to the `status` command:
  ```bash
  vec-inf status 13014393
@@ -38,6 +45,17 @@ There are 5 possible states:

  Note that the base URL is only available when the model is in the `READY` state, and if you've changed the Slurm log directory path, you also need to specify it when using the `status` command.

+ ### `metrics` command
+ Once your server is ready, you can check performance metrics by providing the Slurm job ID to the `metrics` command:
+ ```bash
+ vec-inf metrics 13014393
+ ```
+
+ The performance metrics are streamed to your console; note that the metrics are updated at a 10-second interval.
+
+ <img width="400" alt="metrics_img" src="https://github.com/user-attachments/assets/e5ff2cd5-659b-4c88-8ebc-d8f3fdc023a4">
+
+ ### `shutdown` command
  Finally, when you're finished using a model, you can shut it down by providing the Slurm job ID:
  ```bash
  vec-inf shutdown 13014393
@@ -45,17 +63,19 @@ vec-inf shutdown 13014393
  > Shutting down model with Slurm Job ID: 13014393
  ```

+ ### `list` command
  You can view the full list of available models by running the `list` command:
  ```bash
  vec-inf list
  ```
- <img width="1200" alt="list_img" src="https://github.com/user-attachments/assets/a4f0d896-989d-43bf-82a2-6a6e5d0d288f">
+ <img width="940" alt="list_img" src="https://github.com/user-attachments/assets/8cf901c4-404c-4398-a52f-0486f00747a3">
+

  You can also view the default setup for a specific supported model by providing the model name, for example `Meta-Llama-3.1-70B-Instruct`:
  ```bash
  vec-inf list Meta-Llama-3.1-70B-Instruct
  ```
- <img width="400" alt="list_model_img" src="https://github.com/user-attachments/assets/5dec7a33-ba6b-490d-af47-4cf7341d0b42">
+ <img width="400" alt="list_model_img" src="https://github.com/user-attachments/assets/30e42ab7-dde2-4d20-85f0-187adffefc3d">

  The `launch`, `list`, and `status` commands support `--json-mode`, which structures the command output as a JSON string.

--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,6 +1,6 @@
  [tool.poetry]
  name = "vec-inf"
- version = "0.3.3"
+ version = "0.4.0.post1"
  description = "Efficient LLM inference on Slurm clusters using vLLM."
  authors = ["Marshall Wang <marshall.wang@vectorinstitute.ai>"]
  license = "MIT license"
@@ -11,8 +11,9 @@ python = "^3.10"
  requests = "^2.31.0"
  click = "^8.1.0"
  rich = "^13.7.0"
- pandas = "^2.2.2"
- vllm = { version = "^0.5.0", optional = true }
+ polars = "^1.15.0"
+ numpy = "^1.24.0"
+ vllm = { version = "^0.6.0", optional = true }
  vllm-nccl-cu12 = { version = ">=2.18,<2.19", optional = true }
  ray = { version = "^2.9.3", optional = true }
  cupy-cuda12x = { version = "12.1.0", optional = true }
--- a/vec_inf/README.md
+++ b/vec_inf/README.md
@@ -1,7 +1,8 @@
  # `vec-inf` Commands

  * `launch`: Specify a model family and other optional parameters to launch an OpenAI-compatible inference server, `--json-mode` supported. Check [`here`](./models/README.md) for the complete list of available options.
- * `list`: List all available model names, `--json-mode` supported.
+ * `list`: List all available model names, or append a supported model name to view its default configuration, `--json-mode` supported.
+ * `metrics`: Stream performance metrics to the console.
  * `status`: Check the model status by providing its Slurm job ID, `--json-mode` supported.
  * `shutdown`: Shut down a model by providing its Slurm job ID.

--- a/vec_inf/cli/_cli.py
+++ b/vec_inf/cli/_cli.py
@@ -1,9 +1,13 @@
  import os
- from typing import Optional
+ import time
+ from typing import Optional, cast

  import click
+
+ import polars as pl
  from rich.columns import Columns
  from rich.console import Console
+ from rich.live import Live
  from rich.panel import Panel

  import vec_inf.cli._utils as utils
@@ -24,9 +28,19 @@ def cli():
  @click.option(
      "--max-model-len",
      type=int,
-     help="Model context length. If unspecified, will be automatically derived from the model config.",
+     help="Model context length. Default value set based on suggested resource allocation.",
+ )
+ @click.option(
+     "--max-num-seqs",
+     type=int,
+     help="Maximum number of sequences to process in a single request",
+ )
+ @click.option(
+     "--partition",
+     type=str,
+     default="a40",
+     help="Type of compute partition, default to a40",
  )
- @click.option("--partition", type=str, help="Type of compute partition, default to a40")
  @click.option(
      "--num-nodes",
      type=int,
@@ -40,24 +54,48 @@ def cli():
  @click.option(
      "--qos",
      type=str,
-     help="Quality of service, default depends on suggested resource allocation required for the model",
+     help="Quality of service",
  )
  @click.option(
      "--time",
      type=str,
-     help="Time limit for job, this should comply with QoS, default to max walltime of the chosen QoS",
+     help="Time limit for job, this should comply with QoS limits",
  )
  @click.option(
      "--vocab-size",
      type=int,
      help="Vocabulary size, this option is intended for custom models",
  )
- @click.option("--data-type", type=str, help="Model data type, default to auto")
- @click.option("--venv", type=str, help="Path to virtual environment")
+ @click.option(
+     "--data-type", type=str, default="auto", help="Model data type, default to auto"
+ )
+ @click.option(
+     "--venv",
+     type=str,
+     default="singularity",
+     help="Path to virtual environment, default to preconfigured singularity container",
+ )
  @click.option(
      "--log-dir",
      type=str,
-     help="Path to slurm log directory, default to .vec-inf-logs in home directory",
+     default="default",
+     help="Path to slurm log directory, default to .vec-inf-logs in user home directory",
+ )
+ @click.option(
+     "--model-weights-parent-dir",
+     type=str,
+     default="/model-weights",
+     help="Path to parent directory containing model weights, default to '/model-weights' for supported models",
+ )
+ @click.option(
+     "--pipeline-parallelism",
+     type=str,
+     help="Enable pipeline parallelism, accepts 'True' or 'False', default to 'True' for supported models",
+ )
+ @click.option(
+     "--enforce-eager",
+     type=str,
+     help="Always use eager-mode PyTorch, accepts 'True' or 'False', default to 'False' for custom models if not set",
  )
  @click.option(
      "--json-mode",
@@ -69,6 +107,7 @@ def launch(
      model_family: Optional[str] = None,
      model_variant: Optional[str] = None,
      max_model_len: Optional[int] = None,
+     max_num_seqs: Optional[int] = None,
      partition: Optional[str] = None,
      num_nodes: Optional[int] = None,
      num_gpus: Optional[int] = None,
@@ -78,11 +117,20 @@
      data_type: Optional[str] = None,
      venv: Optional[str] = None,
      log_dir: Optional[str] = None,
+     model_weights_parent_dir: Optional[str] = None,
+     pipeline_parallelism: Optional[str] = None,
+     enforce_eager: Optional[str] = None,
      json_mode: bool = False,
  ) -> None:
      """
      Launch a model on the cluster
      """
+
+     if isinstance(pipeline_parallelism, str):
+         pipeline_parallelism = (
+             "True" if pipeline_parallelism.lower() == "true" else "False"
+         )
+
      launch_script_path = os.path.join(
          os.path.dirname(os.path.dirname(os.path.realpath(__file__))), "launch_server.sh"
      )
@@ -90,7 +138,7 @@

      models_df = utils.load_models_df()

-     if model_name in models_df["model_name"].values:
+     if model_name in models_df["model_name"].to_list():
          default_args = utils.load_default_args(models_df, model_name)
          for arg in default_args:
              if arg in locals() and locals()[arg] is not None:
@@ -98,10 +146,11 @@
              renamed_arg = arg.replace("_", "-")
              launch_cmd += f" --{renamed_arg} {default_args[arg]}"
      else:
-         model_args = models_df.columns.tolist()
-         excluded_keys = ["model_name", "pipeline_parallelism"]
+         model_args = models_df.columns
+         model_args.remove("model_name")
+         model_args.remove("model_type")
          for arg in model_args:
-             if arg not in excluded_keys and locals()[arg] is not None:
+             if locals()[arg] is not None:
                  renamed_arg = arg.replace("_", "-")
                  launch_cmd += f" --{renamed_arg} {locals()[arg]}"

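The hunk above assembles the launch command by renaming snake_case argument names to kebab-case CLI flags. A toy, self-contained sketch of that idiom (the argument values are made up for illustration):

```python
# Toy stand-in for the default_args dict loaded from models.csv.
default_args = {"num_gpus": 1, "data_type": "auto"}

launch_cmd = "bash launch_server.sh"
for arg, value in default_args.items():
    renamed_arg = arg.replace("_", "-")  # snake_case column -> kebab-case flag
    launch_cmd += f" --{renamed_arg} {value}"

print(launch_cmd)  # bash launch_server.sh --num-gpus 1 --data-type auto
```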
@@ -225,40 +274,111 @@ def shutdown(slurm_job_id: int) -> None:
      is_flag=True,
      help="Output in JSON string",
  )
- def list(model_name: Optional[str] = None, json_mode: bool = False) -> None:
+ def list_models(model_name: Optional[str] = None, json_mode: bool = False) -> None:
      """
      List all available models, or get default setup of a specific model
      """
-     models_df = utils.load_models_df()

-     if model_name:
-         if model_name not in models_df["model_name"].values:
+     def list_model(model_name: str, models_df: pl.DataFrame, json_mode: bool):
+         if model_name not in models_df["model_name"].to_list():
              raise ValueError(f"Model name {model_name} not found in available models")

-         excluded_keys = {"venv", "log_dir", "pipeline_parallelism"}
-         model_row = models_df.loc[models_df["model_name"] == model_name]
+         excluded_keys = {"venv", "log_dir"}
+         model_row = models_df.filter(models_df["model_name"] == model_name)

          if json_mode:
-             # click.echo(model_row.to_json(orient='records'))
-             filtered_model_row = model_row.drop(columns=excluded_keys, errors="ignore")
-             click.echo(filtered_model_row.to_json(orient="records"))
+             filtered_model_row = model_row.drop(excluded_keys, strict=False)
+             click.echo(filtered_model_row.to_dicts()[0])
              return
          table = utils.create_table(key_title="Model Config", value_title="Value")
-         for _, row in model_row.iterrows():
+         for row in model_row.to_dicts():
              for key, value in row.items():
                  if key not in excluded_keys:
                      table.add_row(key, str(value))
          CONSOLE.print(table)
-         return

-     if json_mode:
-         click.echo(models_df["model_name"].to_json(orient="records"))
-         return
-     panels = []
-     for _, row in models_df.iterrows():
-         styled_text = f"[magenta]{row['model_family']}[/magenta]-{row['model_variant']}"
-         panels.append(Panel(styled_text, expand=True))
-     CONSOLE.print(Columns(panels, equal=True))
+     def list_all(models_df: pl.DataFrame, json_mode: bool):
+         if json_mode:
+             click.echo(models_df["model_name"].to_list())
+             return
+         panels = []
+         model_type_colors = {
+             "LLM": "cyan",
+             "VLM": "bright_blue",
+             "Text Embedding": "purple",
+             "Reward Modeling": "bright_magenta",
+         }
+
+         models_df = models_df.with_columns(
+             pl.when(pl.col("model_type") == "LLM")
+             .then(0)
+             .when(pl.col("model_type") == "VLM")
+             .then(1)
+             .when(pl.col("model_type") == "Text Embedding")
+             .then(2)
+             .when(pl.col("model_type") == "Reward Modeling")
+             .then(3)
+             .otherwise(-1)
+             .alias("model_type_order")
+         )
+
+         models_df = models_df.sort("model_type_order")
+         models_df = models_df.drop("model_type_order")
+
+         for row in models_df.to_dicts():
+             panel_color = model_type_colors.get(row["model_type"], "white")
+             styled_text = (
+                 f"[magenta]{row['model_family']}[/magenta]-{row['model_variant']}"
+             )
+             panels.append(Panel(styled_text, expand=True, border_style=panel_color))
+         CONSOLE.print(Columns(panels, equal=True))
+
+     models_df = utils.load_models_df()
+
+     if model_name:
+         list_model(model_name, models_df, json_mode)
+     else:
+         list_all(models_df, json_mode)
+
+
+ @cli.command("metrics")
+ @click.argument("slurm_job_id", type=int, nargs=1)
+ @click.option(
+     "--log-dir",
+     type=str,
+     help="Path to slurm log directory. This is required if --log-dir was set in model launch",
+ )
+ def metrics(slurm_job_id: int, log_dir: Optional[str] = None) -> None:
+     """
+     Stream performance metrics to the console
+     """
+     status_cmd = f"scontrol show job {slurm_job_id} --oneliner"
+     output = utils.run_bash_command(status_cmd)
+     slurm_job_name = output.split(" ")[1].split("=")[1]
+
+     with Live(refresh_per_second=1, console=CONSOLE) as live:
+         while True:
+             out_logs = utils.read_slurm_log(
+                 slurm_job_name, slurm_job_id, "out", log_dir
+             )
+             # if out_logs is a string, then it is an error message
+             if isinstance(out_logs, str):
+                 live.update(out_logs)
+                 break
+             out_logs = cast(list, out_logs)
+             latest_metrics = utils.get_latest_metric(out_logs)
+             # if latest_metrics is a string, then it is an error message
+             if isinstance(latest_metrics, str):
+                 live.update(latest_metrics)
+                 break
+             latest_metrics = cast(dict, latest_metrics)
+             table = utils.create_table(key_title="Metric", value_title="Value")
+             for key, value in latest_metrics.items():
+                 table.add_row(key, value)
+
+             live.update(table)
+
+             time.sleep(2)


  if __name__ == "__main__":
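The new `metrics` command derives the Slurm job name positionally from `scontrol` output. A minimal sketch of that parsing, against a hypothetical, truncated `--oneliner` output (real output carries many more space-separated `KEY=VALUE` fields, but the code assumes the second field is always `JobName=<name>`):

```python
# Hypothetical, truncated `scontrol show job <id> --oneliner` output.
output = "JobId=13014393 JobName=Meta-Llama-3.1-8B-Instruct UserId=user(1001) JobState=RUNNING"

# Same positional parsing as the metrics command above.
slurm_job_name = output.split(" ")[1].split("=")[1]
print(slurm_job_name)  # -> Meta-Llama-3.1-8B-Instruct
```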
--- a/vec_inf/cli/_utils.py
+++ b/vec_inf/cli/_utils.py
@@ -1,12 +1,12 @@
  import os
  import subprocess
- from typing import Optional, Union
+ from typing import Optional, Union, cast

- import pandas as pd
+ import polars as pl
  import requests
  from rich.table import Table

- MODEL_READY_SIGNATURE = "INFO: Uvicorn running on http://0.0.0.0:"
+ MODEL_READY_SIGNATURE = "INFO: Application startup complete."
  SERVER_ADDRESS_SIGNATURE = "Server address: "


@@ -25,7 +25,7 @@ def read_slurm_log(
      slurm_job_name: str, slurm_job_id: int, slurm_log_type: str, log_dir: Optional[str]
  ) -> Union[list[str], str]:
      """
-     Get the directory of a model
+     Read the slurm log file
      """
      if not log_dir:
          models_dir = os.path.join(os.path.expanduser("~"), ".vec-inf-logs")
@@ -35,9 +35,11 @@
              log_dir = os.path.join(models_dir, dir)
              break

+     log_dir = cast(str, log_dir)
+
      try:
          file_path = os.path.join(
-             log_dir,  # type: ignore
+             log_dir,
              f"{slurm_job_name}.{slurm_job_id}.{slurm_log_type}",
          )
          with open(file_path, "r") as file:
@@ -58,12 +60,15 @@ def is_server_running(
      if isinstance(log_content, str):
          return log_content

+     status: Union[str, tuple[str, str]] = "LAUNCHING"
+
      for line in log_content:
          if "error" in line.lower():
-             return ("FAILED", line.strip("\n"))
+             status = ("FAILED", line.strip("\n"))
          if MODEL_READY_SIGNATURE in line:
-             return "RUNNING"
-     return "LAUNCHING"
+             status = "RUNNING"
+
+     return status


  def get_base_url(slurm_job_name: str, slurm_job_id: int, log_dir: Optional[str]) -> str:
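The `is_server_running` rewrite above replaces early returns with a `status` variable that is overwritten as the whole log is scanned, so the last matching line wins. A self-contained sketch of the new behavior, with invented log lines:

```python
from typing import Union

MODEL_READY_SIGNATURE = "INFO: Application startup complete."

def classify(log_content: list[str]) -> Union[str, tuple[str, str]]:
    # Mirrors the new scan-the-whole-log logic: a later match overrides an earlier one.
    status: Union[str, tuple[str, str]] = "LAUNCHING"
    for line in log_content:
        if "error" in line.lower():
            status = ("FAILED", line.strip("\n"))
        if MODEL_READY_SIGNATURE in line:
            status = "RUNNING"
    return status

# A transient error followed by a successful startup now reports RUNNING,
# where the old early-return version reported FAILED on the first error line.
print(classify(["ERROR: retrying connection", MODEL_READY_SIGNATURE]))  # RUNNING
```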
@@ -114,11 +119,11 @@
      return table


- def load_models_df() -> pd.DataFrame:
+ def load_models_df() -> pl.DataFrame:
      """
      Load the models dataframe
      """
-     models_df = pd.read_csv(
+     models_df = pl.read_csv(
          os.path.join(
              os.path.dirname(os.path.dirname(os.path.realpath(__file__))),
              "models/models.csv",
@@ -127,11 +132,32 @@ def load_models_df() -> pd.DataFrame:
      return models_df


- def load_default_args(models_df: pd.DataFrame, model_name: str) -> dict:
+ def load_default_args(models_df: pl.DataFrame, model_name: str) -> dict:
      """
      Load the default arguments for a model
      """
-     row_data = models_df.loc[models_df["model_name"] == model_name]
-     default_args = row_data.iloc[0].to_dict()
-     default_args.pop("model_name")
+     row_data = models_df.filter(models_df["model_name"] == model_name)
+     default_args = row_data.to_dicts()[0]
+     default_args.pop("model_name", None)
+     default_args.pop("model_type", None)
      return default_args
+
+
+ def get_latest_metric(log_lines: list[str]) -> dict | str:
+     """Read the latest metric entry from the log file."""
+     latest_metric = {}
+
+     try:
+         for line in reversed(log_lines):
+             if "Avg prompt throughput" in line:
+                 # Parse the metric values from the line
+                 metrics_str = line.split("] ")[1].strip().strip(".")
+                 metrics_list = metrics_str.split(", ")
+                 for metric in metrics_list:
+                     key, value = metric.split(": ")
+                     latest_metric[key] = value
+                 break
+     except Exception as e:
+         return f"[red]Error reading log file: {e}[/red]"
+
+     return latest_metric
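The parsing in `get_latest_metric` assumes a vLLM-style stats line in the Slurm output log. A self-contained sketch with an illustrative log line (the exact format may differ across vLLM versions):

```python
# Illustrative vLLM-style stats line; the real format may vary by version.
line = (
    "INFO 01-01 12:00:00 metrics.py:341] Avg prompt throughput: 10.2 tokens/s, "
    "Avg generation throughput: 42.0 tokens/s, Running: 1 reqs, Pending: 0 reqs."
)

# Same parsing steps as get_latest_metric above.
metrics_str = line.split("] ")[1].strip().strip(".")
for metric in metrics_str.split(", "):
    key, value = metric.split(": ")
    print(key, "->", value)
# Avg prompt throughput -> 10.2 tokens/s
# Avg generation throughput -> 42.0 tokens/s
# Running -> 1 reqs
# Pending -> 0 reqs
```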