vec-inf 0.3.3.tar.gz → 0.4.0.post1.tar.gz

This diff shows the changes between two publicly released versions of the package, as they appear in their public registry, and is provided for informational purposes only.
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2024 Vector Institute
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
--- a/PKG-INFO
+++ b/PKG-INFO
@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: vec-inf
- Version: 0.3.3
+ Version: 0.4.0.post1
  Summary: Efficient LLM inference on Slurm clusters using vLLM.
  License: MIT
  Author: Marshall Wang
@@ -11,19 +11,21 @@ Classifier: Programming Language :: Python :: 3
  Classifier: Programming Language :: Python :: 3.10
  Classifier: Programming Language :: Python :: 3.11
  Classifier: Programming Language :: Python :: 3.12
+ Classifier: Programming Language :: Python :: 3.13
  Provides-Extra: dev
  Requires-Dist: click (>=8.1.0,<9.0.0)
  Requires-Dist: cupy-cuda12x (==12.1.0) ; extra == "dev"
- Requires-Dist: pandas (>=2.2.2,<3.0.0)
+ Requires-Dist: numpy (>=1.24.0,<2.0.0)
+ Requires-Dist: polars (>=1.15.0,<2.0.0)
  Requires-Dist: ray (>=2.9.3,<3.0.0) ; extra == "dev"
  Requires-Dist: requests (>=2.31.0,<3.0.0)
  Requires-Dist: rich (>=13.7.0,<14.0.0)
- Requires-Dist: vllm (>=0.5.0,<0.6.0) ; extra == "dev"
+ Requires-Dist: vllm (>=0.6.0,<0.7.0) ; extra == "dev"
  Requires-Dist: vllm-nccl-cu12 (>=2.18,<2.19) ; extra == "dev"
  Description-Content-Type: text/markdown

  # Vector Inference: Easy inference on Slurm clusters
- This repository provides an easy-to-use solution to run inference servers on [Slurm](https://slurm.schedmd.com/overview.html)-managed computing clusters using [vLLM](https://docs.vllm.ai/en/latest/). **All scripts in this repository runs natively on the Vector Institute cluster environment**. To adapt to other environments, update [`launch_server.sh`](vec-inf/launch_server.sh), [`vllm.slurm`](vec-inf/vllm.slurm), [`multinode_vllm.slurm`](vec-inf/multinode_vllm.slurm) and [`models.csv`](vec-inf/models/models.csv) accordingly.
+ This repository provides an easy-to-use solution for running inference servers on [Slurm](https://slurm.schedmd.com/overview.html)-managed computing clusters using [vLLM](https://docs.vllm.ai/en/latest/). **All scripts in this repository run natively on the Vector Institute cluster environment.** To adapt them to other environments, update [`launch_server.sh`](vec_inf/launch_server.sh), [`vllm.slurm`](vec_inf/vllm.slurm), [`multinode_vllm.slurm`](vec_inf/multinode_vllm.slurm) and [`models.csv`](vec_inf/models/models.csv) accordingly.

  ## Installation
  If you are using the Vector cluster environment and don't need any customization to the inference server environment, run the following to install the package:
@@ -33,16 +35,23 @@ pip install vec-inf
  Otherwise, we recommend using the provided [`Dockerfile`](Dockerfile) to set up your own environment with the package.

  ## Launch an inference server
+ ### `launch` command
  We will use the Llama 3.1 model as an example. To launch an OpenAI-compatible inference server for Meta-Llama-3.1-8B-Instruct, run:
  ```bash
  vec-inf launch Meta-Llama-3.1-8B-Instruct
  ```
  You should see an output like the following:

- <img width="400" alt="launch_img" src="https://github.com/user-attachments/assets/557eb421-47db-4810-bccd-c49c526b1b43">
+ <img width="700" alt="launch_img" src="https://github.com/user-attachments/assets/ab658552-18b2-47e0-bf70-e539c3b898d5">

- The model would be launched using the [default parameters](vec-inf/models/models.csv), you can override these values by providing additional options, use `--help` to see the full list. You can also launch your own customized model as long as the model architecture is [supported by vLLM](https://docs.vllm.ai/en/stable/models/supported_models.html), you'll need to specify all model launching related options to run a successful run.
+ The model will be launched using the [default parameters](vec_inf/models/models.csv); you can override these values by providing additional parameters (use `--help` to see the full list). You can also launch your own custom model as long as the model architecture is [supported by vLLM](https://docs.vllm.ai/en/stable/models/supported_models.html); make sure to follow the instructions below:
+ * Your model weights directory naming convention should follow `$MODEL_FAMILY-$MODEL_VARIANT`.
+ * Your model weights directory should contain HF-format weights.
+ * The following launch parameters fall back to their default values if not specified: `--max-num-seqs`, `--partition`, `--data-type`, `--venv`, `--log-dir`, `--model-weights-parent-dir`, `--pipeline-parallelism`, `--enforce-eager`. All other launch parameters must be specified for custom models.
+ * Example for setting the model weights parent directory: `--model-weights-parent-dir /h/user_name/my_weights`.
+ * For other model launch parameters, you can refer to the default values for similar models using the [`list` command](#list-command).

+ ### `status` command
  You can check the inference server status by providing the Slurm job ID to the `status` command:
  ```bash
  vec-inf status 13014393
@@ -62,6 +71,17 @@ There are 5 possible states:

  Note that the base URL is only available when the model is in the `READY` state, and if you've changed the Slurm log directory path, you also need to specify it when using the `status` command.

+ ### `metrics` command
+ Once your server is ready, you can check performance metrics by providing the Slurm job ID to the `metrics` command:
+ ```bash
+ vec-inf metrics 13014393
+ ```
+
+ The performance metrics are streamed to your console; note that the metrics are updated at a 10-second interval.
+
+ <img width="400" alt="metrics_img" src="https://github.com/user-attachments/assets/e5ff2cd5-659b-4c88-8ebc-d8f3fdc023a4">
+
+ ### `shutdown` command
  Finally, when you're finished using a model, you can shut it down by providing the Slurm job ID:
  ```bash
  vec-inf shutdown 13014393
@@ -69,17 +89,19 @@ vec-inf shutdown 13014393
  > Shutting down model with Slurm Job ID: 13014393
  ```

+ ### `list` command
  You can view the full list of available models by running the `list` command:
  ```bash
  vec-inf list
  ```
- <img width="1200" alt="list_img" src="https://github.com/user-attachments/assets/a4f0d896-989d-43bf-82a2-6a6e5d0d288f">
+ <img width="940" alt="list_img" src="https://github.com/user-attachments/assets/8cf901c4-404c-4398-a52f-0486f00747a3">
+

  You can also view the default setup for a specific supported model by providing the model name, for example `Meta-Llama-3.1-70B-Instruct`:
  ```bash
  vec-inf list Meta-Llama-3.1-70B-Instruct
  ```
- <img width="400" alt="list_model_img" src="https://github.com/user-attachments/assets/5dec7a33-ba6b-490d-af47-4cf7341d0b42">
+ <img width="400" alt="list_model_img" src="https://github.com/user-attachments/assets/30e42ab7-dde2-4d20-85f0-187adffefc3d">

  The `launch`, `list`, and `status` commands support `--json-mode`, which structures the command output as a JSON string.

--- a/README.md
+++ b/README.md
@@ -1,5 +1,5 @@
  # Vector Inference: Easy inference on Slurm clusters
- This repository provides an easy-to-use solution to run inference servers on [Slurm](https://slurm.schedmd.com/overview.html)-managed computing clusters using [vLLM](https://docs.vllm.ai/en/latest/). **All scripts in this repository runs natively on the Vector Institute cluster environment**. To adapt to other environments, update [`launch_server.sh`](vec-inf/launch_server.sh), [`vllm.slurm`](vec-inf/vllm.slurm), [`multinode_vllm.slurm`](vec-inf/multinode_vllm.slurm) and [`models.csv`](vec-inf/models/models.csv) accordingly.
+ This repository provides an easy-to-use solution for running inference servers on [Slurm](https://slurm.schedmd.com/overview.html)-managed computing clusters using [vLLM](https://docs.vllm.ai/en/latest/). **All scripts in this repository run natively on the Vector Institute cluster environment.** To adapt them to other environments, update [`launch_server.sh`](vec_inf/launch_server.sh), [`vllm.slurm`](vec_inf/vllm.slurm), [`multinode_vllm.slurm`](vec_inf/multinode_vllm.slurm) and [`models.csv`](vec_inf/models/models.csv) accordingly.

  ## Installation
  If you are using the Vector cluster environment and don't need any customization to the inference server environment, run the following to install the package:
@@ -9,16 +9,23 @@ pip install vec-inf
  Otherwise, we recommend using the provided [`Dockerfile`](Dockerfile) to set up your own environment with the package.

  ## Launch an inference server
+ ### `launch` command
  We will use the Llama 3.1 model as an example. To launch an OpenAI-compatible inference server for Meta-Llama-3.1-8B-Instruct, run:
  ```bash
  vec-inf launch Meta-Llama-3.1-8B-Instruct
  ```
  You should see an output like the following:

- <img width="400" alt="launch_img" src="https://github.com/user-attachments/assets/557eb421-47db-4810-bccd-c49c526b1b43">
+ <img width="700" alt="launch_img" src="https://github.com/user-attachments/assets/ab658552-18b2-47e0-bf70-e539c3b898d5">

- The model would be launched using the [default parameters](vec-inf/models/models.csv), you can override these values by providing additional options, use `--help` to see the full list. You can also launch your own customized model as long as the model architecture is [supported by vLLM](https://docs.vllm.ai/en/stable/models/supported_models.html), you'll need to specify all model launching related options to run a successful run.
+ The model will be launched using the [default parameters](vec_inf/models/models.csv); you can override these values by providing additional parameters (use `--help` to see the full list). You can also launch your own custom model as long as the model architecture is [supported by vLLM](https://docs.vllm.ai/en/stable/models/supported_models.html); make sure to follow the instructions below:
+ * Your model weights directory naming convention should follow `$MODEL_FAMILY-$MODEL_VARIANT`.
+ * Your model weights directory should contain HF-format weights.
+ * The following launch parameters fall back to their default values if not specified: `--max-num-seqs`, `--partition`, `--data-type`, `--venv`, `--log-dir`, `--model-weights-parent-dir`, `--pipeline-parallelism`, `--enforce-eager`. All other launch parameters must be specified for custom models.
+ * Example for setting the model weights parent directory: `--model-weights-parent-dir /h/user_name/my_weights`.
+ * For other model launch parameters, you can refer to the default values for similar models using the [`list` command](#list-command).

+ ### `status` command
  You can check the inference server status by providing the Slurm job ID to the `status` command:
  ```bash
  vec-inf status 13014393
@@ -38,6 +45,17 @@ There are 5 possible states:

  Note that the base URL is only available when the model is in the `READY` state, and if you've changed the Slurm log directory path, you also need to specify it when using the `status` command.

+ ### `metrics` command
+ Once your server is ready, you can check performance metrics by providing the Slurm job ID to the `metrics` command:
+ ```bash
+ vec-inf metrics 13014393
+ ```
+
+ The performance metrics are streamed to your console; note that the metrics are updated at a 10-second interval.
+
+ <img width="400" alt="metrics_img" src="https://github.com/user-attachments/assets/e5ff2cd5-659b-4c88-8ebc-d8f3fdc023a4">
+
+ ### `shutdown` command
  Finally, when you're finished using a model, you can shut it down by providing the Slurm job ID:
  ```bash
  vec-inf shutdown 13014393
@@ -45,17 +63,19 @@ vec-inf shutdown 13014393
  > Shutting down model with Slurm Job ID: 13014393
  ```

+ ### `list` command
  You can view the full list of available models by running the `list` command:
  ```bash
  vec-inf list
  ```
- <img width="1200" alt="list_img" src="https://github.com/user-attachments/assets/a4f0d896-989d-43bf-82a2-6a6e5d0d288f">
+ <img width="940" alt="list_img" src="https://github.com/user-attachments/assets/8cf901c4-404c-4398-a52f-0486f00747a3">
+

  You can also view the default setup for a specific supported model by providing the model name, for example `Meta-Llama-3.1-70B-Instruct`:
  ```bash
  vec-inf list Meta-Llama-3.1-70B-Instruct
  ```
- <img width="400" alt="list_model_img" src="https://github.com/user-attachments/assets/5dec7a33-ba6b-490d-af47-4cf7341d0b42">
+ <img width="400" alt="list_model_img" src="https://github.com/user-attachments/assets/30e42ab7-dde2-4d20-85f0-187adffefc3d">

  The `launch`, `list`, and `status` commands support `--json-mode`, which structures the command output as a JSON string.

--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,6 +1,6 @@
  [tool.poetry]
  name = "vec-inf"
- version = "0.3.3"
+ version = "0.4.0.post1"
  description = "Efficient LLM inference on Slurm clusters using vLLM."
  authors = ["Marshall Wang <marshall.wang@vectorinstitute.ai>"]
  license = "MIT license"
@@ -11,8 +11,9 @@ python = "^3.10"
  requests = "^2.31.0"
  click = "^8.1.0"
  rich = "^13.7.0"
- pandas = "^2.2.2"
- vllm = { version = "^0.5.0", optional = true }
+ polars = "^1.15.0"
+ numpy = "^1.24.0"
+ vllm = { version = "^0.6.0", optional = true }
  vllm-nccl-cu12 = { version = ">=2.18,<2.19", optional = true }
  ray = { version = "^2.9.3", optional = true }
  cupy-cuda12x = { version = "12.1.0", optional = true }
--- a/vec_inf/README.md
+++ b/vec_inf/README.md
@@ -1,7 +1,8 @@
  # `vec-inf` Commands

  * `launch`: Specify a model family and other optional parameters to launch an OpenAI-compatible inference server, `--json-mode` supported. Check [`here`](./models/README.md) for the complete list of available options.
- * `list`: List all available model names, `--json-mode` supported.
+ * `list`: List all available model names, or append a supported model name to view its default configuration, `--json-mode` supported.
+ * `metrics`: Stream performance metrics to the console.
  * `status`: Check the model status by providing its Slurm job ID, `--json-mode` supported.
  * `shutdown`: Shut down a model by providing its Slurm job ID.

--- a/vec_inf/cli/_cli.py
+++ b/vec_inf/cli/_cli.py
@@ -1,9 +1,13 @@
  import os
- from typing import Optional
+ import time
+ from typing import Optional, cast

  import click
+
+ import polars as pl
  from rich.columns import Columns
  from rich.console import Console
+ from rich.live import Live
  from rich.panel import Panel

  import vec_inf.cli._utils as utils
@@ -24,9 +28,19 @@ def cli():
  @click.option(
      "--max-model-len",
      type=int,
-     help="Model context length. If unspecified, will be automatically derived from the model config.",
+     help="Model context length. Default value set based on suggested resource allocation.",
+ )
+ @click.option(
+     "--max-num-seqs",
+     type=int,
+     help="Maximum number of sequences to process in a single request",
+ )
+ @click.option(
+     "--partition",
+     type=str,
+     default="a40",
+     help="Type of compute partition, default to a40",
  )
- @click.option("--partition", type=str, help="Type of compute partition, default to a40")
  @click.option(
      "--num-nodes",
      type=int,
@@ -40,24 +54,48 @@ def cli():
  @click.option(
      "--qos",
      type=str,
-     help="Quality of service, default depends on suggested resource allocation required for the model",
+     help="Quality of service",
  )
  @click.option(
      "--time",
      type=str,
-     help="Time limit for job, this should comply with QoS, default to max walltime of the chosen QoS",
+     help="Time limit for job, this should comply with QoS limits",
  )
  @click.option(
      "--vocab-size",
      type=int,
      help="Vocabulary size, this option is intended for custom models",
  )
- @click.option("--data-type", type=str, help="Model data type, default to auto")
- @click.option("--venv", type=str, help="Path to virtual environment")
+ @click.option(
+     "--data-type", type=str, default="auto", help="Model data type, default to auto"
+ )
+ @click.option(
+     "--venv",
+     type=str,
+     default="singularity",
+     help="Path to virtual environment, default to preconfigured singularity container",
+ )
  @click.option(
      "--log-dir",
      type=str,
-     help="Path to slurm log directory, default to .vec-inf-logs in home directory",
+     default="default",
+     help="Path to slurm log directory, default to .vec-inf-logs in user home directory",
+ )
+ @click.option(
+     "--model-weights-parent-dir",
+     type=str,
+     default="/model-weights",
+     help="Path to parent directory containing model weights, default to '/model-weights' for supported models",
+ )
+ @click.option(
+     "--pipeline-parallelism",
+     type=str,
+     help="Enable pipeline parallelism, accepts 'True' or 'False', default to 'True' for supported models",
+ )
+ @click.option(
+     "--enforce-eager",
+     type=str,
+     help="Always use eager-mode PyTorch, accepts 'True' or 'False', default to 'False' for custom models if not set",
  )
  @click.option(
      "--json-mode",
@@ -69,6 +107,7 @@ def launch(
      model_family: Optional[str] = None,
      model_variant: Optional[str] = None,
      max_model_len: Optional[int] = None,
+     max_num_seqs: Optional[int] = None,
      partition: Optional[str] = None,
      num_nodes: Optional[int] = None,
      num_gpus: Optional[int] = None,
@@ -78,11 +117,20 @@
      data_type: Optional[str] = None,
      venv: Optional[str] = None,
      log_dir: Optional[str] = None,
+     model_weights_parent_dir: Optional[str] = None,
+     pipeline_parallelism: Optional[str] = None,
+     enforce_eager: Optional[str] = None,
      json_mode: bool = False,
  ) -> None:
      """
      Launch a model on the cluster
      """
+
+     if isinstance(pipeline_parallelism, str):
+         pipeline_parallelism = (
+             "True" if pipeline_parallelism.lower() == "true" else "False"
+         )
+
      launch_script_path = os.path.join(
          os.path.dirname(os.path.dirname(os.path.realpath(__file__))), "launch_server.sh"
      )
@@ -90,7 +138,7 @@

      models_df = utils.load_models_df()

-     if model_name in models_df["model_name"].values:
+     if model_name in models_df["model_name"].to_list():
          default_args = utils.load_default_args(models_df, model_name)
          for arg in default_args:
              if arg in locals() and locals()[arg] is not None:
@@ -98,10 +146,11 @@
              renamed_arg = arg.replace("_", "-")
              launch_cmd += f" --{renamed_arg} {default_args[arg]}"
      else:
-         model_args = models_df.columns.tolist()
-         excluded_keys = ["model_name", "pipeline_parallelism"]
+         model_args = models_df.columns
+         model_args.remove("model_name")
+         model_args.remove("model_type")
          for arg in model_args:
-             if arg not in excluded_keys and locals()[arg] is not None:
+             if locals()[arg] is not None:
                  renamed_arg = arg.replace("_", "-")
                  launch_cmd += f" --{renamed_arg} {locals()[arg]}"

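The hunk above assembles the launch command by renaming snake_case argument names to kebab-case CLI flags. A toy, self-contained sketch of that idiom (the argument values are made up for illustration):

```python
# Toy stand-in for the default_args dict loaded from models.csv.
default_args = {"num_gpus": 1, "data_type": "auto"}

launch_cmd = "bash launch_server.sh"
for arg, value in default_args.items():
    renamed_arg = arg.replace("_", "-")  # snake_case column -> kebab-case flag
    launch_cmd += f" --{renamed_arg} {value}"

print(launch_cmd)  # bash launch_server.sh --num-gpus 1 --data-type auto
```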
@@ -225,40 +274,111 @@ def shutdown(slurm_job_id: int) -> None:
      is_flag=True,
      help="Output in JSON string",
  )
- def list(model_name: Optional[str] = None, json_mode: bool = False) -> None:
+ def list_models(model_name: Optional[str] = None, json_mode: bool = False) -> None:
      """
      List all available models, or get default setup of a specific model
      """
-     models_df = utils.load_models_df()

-     if model_name:
-         if model_name not in models_df["model_name"].values:
+     def list_model(model_name: str, models_df: pl.DataFrame, json_mode: bool):
+         if model_name not in models_df["model_name"].to_list():
              raise ValueError(f"Model name {model_name} not found in available models")

-         excluded_keys = {"venv", "log_dir", "pipeline_parallelism"}
-         model_row = models_df.loc[models_df["model_name"] == model_name]
+         excluded_keys = {"venv", "log_dir"}
+         model_row = models_df.filter(models_df["model_name"] == model_name)

          if json_mode:
-             # click.echo(model_row.to_json(orient='records'))
-             filtered_model_row = model_row.drop(columns=excluded_keys, errors="ignore")
-             click.echo(filtered_model_row.to_json(orient="records"))
+             filtered_model_row = model_row.drop(excluded_keys, strict=False)
+             click.echo(filtered_model_row.to_dicts()[0])
              return
          table = utils.create_table(key_title="Model Config", value_title="Value")
-         for _, row in model_row.iterrows():
+         for row in model_row.to_dicts():
              for key, value in row.items():
                  if key not in excluded_keys:
                      table.add_row(key, str(value))
          CONSOLE.print(table)
-         return

-     if json_mode:
-         click.echo(models_df["model_name"].to_json(orient="records"))
-         return
-     panels = []
-     for _, row in models_df.iterrows():
-         styled_text = f"[magenta]{row['model_family']}[/magenta]-{row['model_variant']}"
-         panels.append(Panel(styled_text, expand=True))
-     CONSOLE.print(Columns(panels, equal=True))
+     def list_all(models_df: pl.DataFrame, json_mode: bool):
+         if json_mode:
+             click.echo(models_df["model_name"].to_list())
+             return
+         panels = []
+         model_type_colors = {
+             "LLM": "cyan",
+             "VLM": "bright_blue",
+             "Text Embedding": "purple",
+             "Reward Modeling": "bright_magenta",
+         }
+
+         models_df = models_df.with_columns(
+             pl.when(pl.col("model_type") == "LLM")
+             .then(0)
+             .when(pl.col("model_type") == "VLM")
+             .then(1)
+             .when(pl.col("model_type") == "Text Embedding")
+             .then(2)
+             .when(pl.col("model_type") == "Reward Modeling")
+             .then(3)
+             .otherwise(-1)
+             .alias("model_type_order")
+         )
+
+         models_df = models_df.sort("model_type_order")
+         models_df = models_df.drop("model_type_order")
+
+         for row in models_df.to_dicts():
+             panel_color = model_type_colors.get(row["model_type"], "white")
+             styled_text = (
+                 f"[magenta]{row['model_family']}[/magenta]-{row['model_variant']}"
+             )
+             panels.append(Panel(styled_text, expand=True, border_style=panel_color))
+         CONSOLE.print(Columns(panels, equal=True))
+
+     models_df = utils.load_models_df()
+
+     if model_name:
+         list_model(model_name, models_df, json_mode)
+     else:
+         list_all(models_df, json_mode)
+
+
+ @cli.command("metrics")
+ @click.argument("slurm_job_id", type=int, nargs=1)
+ @click.option(
+     "--log-dir",
+     type=str,
+     help="Path to slurm log directory. This is required if --log-dir was set in model launch",
+ )
+ def metrics(slurm_job_id: int, log_dir: Optional[str] = None) -> None:
+     """
+     Stream performance metrics to the console
+     """
+     status_cmd = f"scontrol show job {slurm_job_id} --oneliner"
+     output = utils.run_bash_command(status_cmd)
+     slurm_job_name = output.split(" ")[1].split("=")[1]
+
+     with Live(refresh_per_second=1, console=CONSOLE) as live:
+         while True:
+             out_logs = utils.read_slurm_log(
+                 slurm_job_name, slurm_job_id, "out", log_dir
+             )
+             # if out_logs is a string, then it is an error message
+             if isinstance(out_logs, str):
+                 live.update(out_logs)
+                 break
+             out_logs = cast(list, out_logs)
+             latest_metrics = utils.get_latest_metric(out_logs)
+             # if latest_metrics is a string, then it is an error message
+             if isinstance(latest_metrics, str):
+                 live.update(latest_metrics)
+                 break
+             latest_metrics = cast(dict, latest_metrics)
+             table = utils.create_table(key_title="Metric", value_title="Value")
+             for key, value in latest_metrics.items():
+                 table.add_row(key, value)
+
+             live.update(table)
+
+             time.sleep(2)


  if __name__ == "__main__":
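The new `metrics` command derives the Slurm job name positionally from `scontrol` output. A minimal sketch of that parsing, against a hypothetical, truncated `--oneliner` output (real output carries many more space-separated `KEY=VALUE` fields, but the code assumes the second field is always `JobName=<name>`):

```python
# Hypothetical, truncated `scontrol show job <id> --oneliner` output.
output = "JobId=13014393 JobName=Meta-Llama-3.1-8B-Instruct UserId=user(1001) JobState=RUNNING"

# Same positional parsing as the metrics command above.
slurm_job_name = output.split(" ")[1].split("=")[1]
print(slurm_job_name)  # -> Meta-Llama-3.1-8B-Instruct
```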
--- a/vec_inf/cli/_utils.py
+++ b/vec_inf/cli/_utils.py
@@ -1,12 +1,12 @@
  import os
  import subprocess
- from typing import Optional, Union
+ from typing import Optional, Union, cast

- import pandas as pd
+ import polars as pl
  import requests
  from rich.table import Table

- MODEL_READY_SIGNATURE = "INFO: Uvicorn running on http://0.0.0.0:"
+ MODEL_READY_SIGNATURE = "INFO: Application startup complete."
  SERVER_ADDRESS_SIGNATURE = "Server address: "


@@ -25,7 +25,7 @@ def read_slurm_log(
      slurm_job_name: str, slurm_job_id: int, slurm_log_type: str, log_dir: Optional[str]
  ) -> Union[list[str], str]:
      """
-     Get the directory of a model
+     Read the slurm log file
      """
      if not log_dir:
          models_dir = os.path.join(os.path.expanduser("~"), ".vec-inf-logs")
@@ -35,9 +35,11 @@
              log_dir = os.path.join(models_dir, dir)
              break

+     log_dir = cast(str, log_dir)
+
      try:
          file_path = os.path.join(
-             log_dir,  # type: ignore
+             log_dir,
              f"{slurm_job_name}.{slurm_job_id}.{slurm_log_type}",
          )
          with open(file_path, "r") as file:
@@ -58,12 +60,15 @@ def is_server_running(
      if isinstance(log_content, str):
          return log_content

+     status: Union[str, tuple[str, str]] = "LAUNCHING"
+
      for line in log_content:
          if "error" in line.lower():
-             return ("FAILED", line.strip("\n"))
+             status = ("FAILED", line.strip("\n"))
          if MODEL_READY_SIGNATURE in line:
-             return "RUNNING"
-     return "LAUNCHING"
+             status = "RUNNING"
+
+     return status


  def get_base_url(slurm_job_name: str, slurm_job_id: int, log_dir: Optional[str]) -> str:
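The `is_server_running` rewrite above replaces early returns with a `status` variable that is overwritten as the whole log is scanned, so the last matching line wins. A self-contained sketch of the new behavior, with invented log lines:

```python
from typing import Union

MODEL_READY_SIGNATURE = "INFO: Application startup complete."

def classify(log_content: list[str]) -> Union[str, tuple[str, str]]:
    # Mirrors the new scan-the-whole-log logic: a later match overrides an earlier one.
    status: Union[str, tuple[str, str]] = "LAUNCHING"
    for line in log_content:
        if "error" in line.lower():
            status = ("FAILED", line.strip("\n"))
        if MODEL_READY_SIGNATURE in line:
            status = "RUNNING"
    return status

# A transient error followed by a successful startup now reports RUNNING,
# where the old early-return version reported FAILED on the first error line.
print(classify(["ERROR: retrying connection", MODEL_READY_SIGNATURE]))  # RUNNING
```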
@@ -114,11 +119,11 @@
      return table


- def load_models_df() -> pd.DataFrame:
+ def load_models_df() -> pl.DataFrame:
      """
      Load the models dataframe
      """
-     models_df = pd.read_csv(
+     models_df = pl.read_csv(
          os.path.join(
              os.path.dirname(os.path.dirname(os.path.realpath(__file__))),
              "models/models.csv",
@@ -127,11 +132,32 @@ def load_models_df() -> pd.DataFrame:
      return models_df


- def load_default_args(models_df: pd.DataFrame, model_name: str) -> dict:
+ def load_default_args(models_df: pl.DataFrame, model_name: str) -> dict:
      """
      Load the default arguments for a model
      """
-     row_data = models_df.loc[models_df["model_name"] == model_name]
-     default_args = row_data.iloc[0].to_dict()
-     default_args.pop("model_name")
+     row_data = models_df.filter(models_df["model_name"] == model_name)
+     default_args = row_data.to_dicts()[0]
+     default_args.pop("model_name", None)
+     default_args.pop("model_type", None)
      return default_args
+
+
+ def get_latest_metric(log_lines: list[str]) -> dict | str:
+     """Read the latest metric entry from the log file."""
+     latest_metric = {}
+
+     try:
+         for line in reversed(log_lines):
+             if "Avg prompt throughput" in line:
+                 # Parse the metric values from the line
+                 metrics_str = line.split("] ")[1].strip().strip(".")
+                 metrics_list = metrics_str.split(", ")
+                 for metric in metrics_list:
+                     key, value = metric.split(": ")
+                     latest_metric[key] = value
+                 break
+     except Exception as e:
+         return f"[red]Error reading log file: {e}[/red]"
+
+     return latest_metric
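The parsing in `get_latest_metric` assumes a vLLM-style stats line in the Slurm output log. A self-contained sketch with an illustrative log line (the exact format may differ across vLLM versions):

```python
# Illustrative vLLM-style stats line; the real format may vary by version.
line = (
    "INFO 01-01 12:00:00 metrics.py:341] Avg prompt throughput: 10.2 tokens/s, "
    "Avg generation throughput: 42.0 tokens/s, Running: 1 reqs, Pending: 0 reqs."
)

# Same parsing steps as get_latest_metric above.
metrics_str = line.split("] ")[1].strip().strip(".")
for metric in metrics_str.split(", "):
    key, value = metric.split(": ")
    print(key, "->", value)
# Avg prompt throughput -> 10.2 tokens/s
# Avg generation throughput -> 42.0 tokens/s
# Running -> 1 reqs
# Pending -> 0 reqs
```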