sglang 0.4.0__tar.gz → 0.4.0.post2__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {sglang-0.4.0 → sglang-0.4.0.post2}/PKG-INFO +15 -9
- {sglang-0.4.0 → sglang-0.4.0.post2}/README.md +8 -4
- {sglang-0.4.0 → sglang-0.4.0.post2}/pyproject.toml +7 -6
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/__init__.py +1 -1
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/bench_offline_throughput.py +18 -6
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/bench_one_batch.py +13 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/bench_serving.py +8 -1
- sglang-0.4.0.post2/sglang/check_env.py +305 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/lang/backend/runtime_endpoint.py +1 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/lang/chat_template.py +32 -0
- sglang-0.4.0.post2/sglang/llama3_eval.py +316 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/constrained/outlines_backend.py +5 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/constrained/xgrammar_backend.py +9 -6
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/layers/attention/__init__.py +5 -2
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/layers/attention/double_sparsity_backend.py +22 -8
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/layers/attention/flashinfer_backend.py +22 -5
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/layers/attention/torch_native_backend.py +22 -8
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/layers/attention/triton_backend.py +38 -33
- sglang-0.4.0.post2/sglang/srt/layers/attention/triton_ops/decode_attention.py +669 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/layers/attention/triton_ops/extend_attention.py +3 -0
- sglang-0.4.0.post2/sglang/srt/layers/ep_moe/__init__.py +0 -0
- sglang-0.4.0.post2/sglang/srt/layers/ep_moe/kernels.py +349 -0
- sglang-0.4.0.post2/sglang/srt/layers/ep_moe/layer.py +665 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/layers/fused_moe_triton/fused_moe.py +64 -21
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/layers/fused_moe_triton/layer.py +1 -1
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/layers/logits_processor.py +133 -95
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/layers/quantization/__init__.py +2 -47
- sglang-0.4.0.post2/sglang/srt/layers/quantization/fp8.py +607 -0
- sglang-0.4.0.post2/sglang/srt/layers/quantization/fp8_utils.py +27 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/layers/radix_attention.py +11 -2
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/layers/sampler.py +29 -5
- sglang-0.4.0.post2/sglang/srt/layers/torchao_utils.py +108 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/managers/detokenizer_manager.py +37 -17
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/managers/io_struct.py +39 -10
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/managers/schedule_batch.py +39 -24
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/managers/schedule_policy.py +64 -5
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/managers/scheduler.py +236 -197
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/managers/tokenizer_manager.py +99 -58
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/managers/tp_worker_overlap_thread.py +7 -5
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/mem_cache/base_prefix_cache.py +2 -2
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/mem_cache/chunk_cache.py +2 -2
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/mem_cache/memory_pool.py +5 -1
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/mem_cache/radix_cache.py +12 -2
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/model_executor/cuda_graph_runner.py +39 -11
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/model_executor/model_runner.py +24 -9
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/model_parallel.py +67 -10
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/models/commandr.py +2 -2
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/models/deepseek_v2.py +87 -7
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/models/gemma2.py +34 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/models/gemma2_reward.py +0 -1
- sglang-0.4.0.post2/sglang/srt/models/granite.py +517 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/models/grok.py +72 -13
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/models/llama.py +22 -5
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/models/llama_classification.py +11 -23
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/models/llama_reward.py +0 -2
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/models/llava.py +37 -14
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/models/mixtral.py +12 -9
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/models/phi3_small.py +0 -5
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/models/qwen2.py +20 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/models/qwen2_moe.py +0 -5
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/models/torch_native_llama.py +0 -5
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/openai_api/adapter.py +4 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/openai_api/protocol.py +9 -4
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/sampling/sampling_batch_info.py +9 -8
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/server.py +4 -4
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/server_args.py +62 -13
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/utils.py +57 -10
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/test/test_utils.py +3 -2
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/utils.py +10 -3
- sglang-0.4.0.post2/sglang/version.py +1 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang.egg-info/PKG-INFO +15 -9
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang.egg-info/SOURCES.txt +7 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang.egg-info/requires.txt +6 -4
- sglang-0.4.0/sglang/check_env.py +0 -213
- sglang-0.4.0/sglang/srt/layers/attention/triton_ops/decode_attention.py +0 -714
- sglang-0.4.0/sglang/srt/layers/torchao_utils.py +0 -95
- sglang-0.4.0/sglang/version.py +0 -1
- {sglang-0.4.0 → sglang-0.4.0.post2}/LICENSE +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/setup.cfg +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/api.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/bench_latency.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/bench_one_batch_server.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/global_config.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/lang/__init__.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/lang/backend/__init__.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/lang/backend/anthropic.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/lang/backend/base_backend.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/lang/backend/litellm.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/lang/backend/openai.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/lang/backend/vertexai.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/lang/choices.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/lang/compiler.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/lang/interpreter.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/lang/ir.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/lang/tracer.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/launch_server.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/launch_server_llavavid.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/_custom_ops.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/configs/__init__.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/configs/device_config.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/configs/exaone.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/configs/load_config.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/configs/model_config.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/configs/qwen2vl.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/constrained/__init__.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/constrained/base_grammar_backend.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/constrained/outlines_jump_forward.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/conversation.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/distributed/__init__.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/distributed/communication_op.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/distributed/device_communicators/__init__.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/distributed/device_communicators/cuda_wrapper.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/distributed/device_communicators/custom_all_reduce.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/distributed/device_communicators/custom_all_reduce_utils.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/distributed/device_communicators/hpu_communicator.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/distributed/device_communicators/pynccl.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/distributed/device_communicators/pynccl_wrapper.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/distributed/device_communicators/shm_broadcast.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/distributed/device_communicators/xpu_communicator.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/distributed/parallel_state.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/distributed/utils.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/hf_transformers_utils.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/layers/activation.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/layers/attention/triton_ops/double_sparsity_attention.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/layers/attention/triton_ops/prefill_attention.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/layers/custom_op_util.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/layers/fused_moe_patch.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/layers/fused_moe_triton/__init__.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/layers/layernorm.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/layers/linear.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/layers/pooler.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/layers/quantization/base_config.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/layers/rotary_embedding.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/layers/vocab_parallel_embedding.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/lora/lora.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/lora/lora_config.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/lora/lora_manager.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/managers/data_parallel_controller.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/managers/image_processor.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/managers/session_controller.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/managers/tp_worker.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/mem_cache/flush_cache.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/metrics/collector.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/metrics/func_timer.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/mm_utils.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/model_executor/forward_batch_info.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/model_loader/__init__.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/model_loader/loader.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/model_loader/utils.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/model_loader/weight_utils.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/models/baichuan.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/models/chatglm.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/models/dbrx.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/models/deepseek.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/models/exaone.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/models/gemma.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/models/gpt2.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/models/gpt_bigcode.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/models/internlm2.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/models/internlm2_reward.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/models/llama_embedding.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/models/llavavid.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/models/minicpm.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/models/minicpm3.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/models/mistral.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/models/mixtral_quant.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/models/mllama.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/models/olmo.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/models/olmo2.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/models/olmoe.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/models/qwen.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/models/qwen2_vl.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/models/registry.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/models/stablelm.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/models/xverse.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/models/xverse_moe.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/models/yivl.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/sampling/penaltylib/__init__.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/sampling/penaltylib/orchestrator.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/sampling/penaltylib/penalizers/frequency_penalty.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/sampling/penaltylib/penalizers/min_new_tokens.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/sampling/penaltylib/penalizers/presence_penalty.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/sampling/penaltylib/penalizers/repetition_penalty.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/srt/sampling/sampling_params.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/test/few_shot_gsm8k.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/test/few_shot_gsm8k_engine.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/test/run_eval.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/test/runners.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/test/simple_eval_common.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/test/simple_eval_gpqa.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/test/simple_eval_humaneval.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/test/simple_eval_math.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/test/simple_eval_mgsm.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/test/simple_eval_mmlu.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/test/srt/sampling/penaltylib/utils.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/test/test_activation.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/test/test_layernorm.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang/test/test_programs.py +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang.egg-info/dependency_links.txt +0 -0
- {sglang-0.4.0 → sglang-0.4.0.post2}/sglang.egg-info/top_level.txt +0 -0
--- sglang-0.4.0/PKG-INFO
+++ sglang-0.4.0.post2/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: sglang
-Version: 0.4.0
+Version: 0.4.0.post2
 Summary: SGLang is yet another fast serving framework for large language models and vision language models.
 License: Apache License
                                  Version 2.0, January 2004
@@ -215,6 +215,7 @@ Requires-Dist: requests
 Requires-Dist: tqdm
 Requires-Dist: numpy
 Requires-Dist: IPython
+Requires-Dist: setproctitle
 Provides-Extra: runtime-common
 Requires-Dist: aiohttp; extra == "runtime-common"
 Requires-Dist: decord; extra == "runtime-common"
@@ -232,16 +233,17 @@ Requires-Dist: psutil; extra == "runtime-common"
 Requires-Dist: pydantic; extra == "runtime-common"
 Requires-Dist: python-multipart; extra == "runtime-common"
 Requires-Dist: pyzmq>=25.1.2; extra == "runtime-common"
-Requires-Dist: torchao; extra == "runtime-common"
+Requires-Dist: torchao>=0.7.0; extra == "runtime-common"
+Requires-Dist: gemlite; extra == "runtime-common"
 Requires-Dist: uvicorn; extra == "runtime-common"
 Requires-Dist: uvloop; extra == "runtime-common"
-Requires-Dist: xgrammar>=0.1.
+Requires-Dist: xgrammar>=0.1.6; extra == "runtime-common"
 Provides-Extra: srt
 Requires-Dist: sglang[runtime_common]; extra == "srt"
 Requires-Dist: torch; extra == "srt"
-Requires-Dist: vllm
+Requires-Dist: vllm<=0.6.4.post1,>=0.6.3.post1; extra == "srt"
 Requires-Dist: cuda-python; extra == "srt"
-Requires-Dist: flashinfer
+Requires-Dist: flashinfer==0.1.6; extra == "srt"
 Provides-Extra: srt-hip
 Requires-Dist: sglang[runtime_common]; extra == "srt-hip"
 Requires-Dist: torch; extra == "srt-hip"
@@ -311,10 +313,14 @@ Requires-Dist: sglang[test]; extra == "dev-hpu"
 
 --------------------------------------------------------------------------------
 
-| [**Blog**](https://lmsys.org/blog/2024-07-25-sglang-llama3/)
-|
+| [**Blog**](https://lmsys.org/blog/2024-07-25-sglang-llama3/)
+| [**Documentation**](https://sgl-project.github.io/)
+| [**Join Slack**](https://join.slack.com/t/sgl-fru7574/shared_invite/zt-2tmmp6flg-89dOlJW2TjnBrTRk1I_~GA)
+| [**Join Bi-Weekly Development Meeting**](https://docs.google.com/document/d/1xEow4eIM152xNcRxqZz9VEcOiTQo8-CEuuQ5qTmkt-E/edit?usp=sharing)
+| [**Slides**](https://github.com/sgl-project/sgl-learning-materials?tab=readme-ov-file#slides) |
 
 ## News
+- [2024/12] 🔥 SGLang v0.4: Zero-Overhead Batch Scheduler, Cache-Aware Load Balancer, Faster Structured Outputs ([blog](https://lmsys.org/blog/2024-12-04-sglang-v0-4/)).
 - [2024/10] 🔥 The First SGLang Online Meetup ([slides](https://github.com/sgl-project/sgl-learning-materials?tab=readme-ov-file#the-first-sglang-online-meetup)).
 - [2024/09] SGLang v0.3 Release: 7x Faster DeepSeek MLA, 1.5x Faster torch.compile, Multi-Image/Video LLaVA-OneVision ([blog](https://lmsys.org/blog/2024-09-04-sglang-v0-3/)).
 - [2024/07] Faster Llama3 Serving with SGLang Runtime (vs. TensorRT-LLM, vLLM) ([blog](https://lmsys.org/blog/2024-07-25-sglang-llama3/)).
@@ -346,13 +352,13 @@ The core features include:
 - [Frontend: Structured Generation Language (SGLang)](https://sgl-project.github.io/frontend/frontend.html)
 
 ## Benchmark And Performance
-Learn more in our release blogs: [v0.2 blog](https://lmsys.org/blog/2024-07-25-sglang-llama3/), [v0.3 blog](https://lmsys.org/blog/2024-09-04-sglang-v0-3/)
+Learn more in our release blogs: [v0.2 blog](https://lmsys.org/blog/2024-07-25-sglang-llama3/), [v0.3 blog](https://lmsys.org/blog/2024-09-04-sglang-v0-3/), [v0.4 blog](https://lmsys.org/blog/2024-12-04-sglang-v0-4/)
 
 ## Roadmap
 [Development Roadmap (2024 Q4)](https://github.com/sgl-project/sglang/issues/1487)
 
 ## Adoption and Sponsorship
-The project is supported by (alphabetically): AMD, Baseten, Etched, Hyperbolic, Jam & Tea Studios, LinkedIn, NVIDIA, RunPod, Stanford, UC Berkeley, xAI and 01.AI.
+The project is supported by (alphabetically): AMD, Baseten, Etched, Hyperbolic, Jam & Tea Studios, LinkedIn, Meituan, NVIDIA, RunPod, Stanford, UC Berkeley, xAI and 01.AI.
 
 ## Acknowledgment and Citation
 We learned from the design and reused code from the following projects: [Guidance](https://github.com/guidance-ai/guidance), [vLLM](https://github.com/vllm-project/vllm), [LightLLM](https://github.com/ModelTC/lightllm), [FlashInfer](https://github.com/flashinfer-ai/flashinfer), [Outlines](https://github.com/outlines-dev/outlines), and [LMQL](https://github.com/eth-sri/lmql).
--- sglang-0.4.0/README.md
+++ sglang-0.4.0.post2/README.md
@@ -12,10 +12,14 @@
 
 --------------------------------------------------------------------------------
 
-| [**Blog**](https://lmsys.org/blog/2024-07-25-sglang-llama3/)
-|
+| [**Blog**](https://lmsys.org/blog/2024-07-25-sglang-llama3/)
+| [**Documentation**](https://sgl-project.github.io/)
+| [**Join Slack**](https://join.slack.com/t/sgl-fru7574/shared_invite/zt-2tmmp6flg-89dOlJW2TjnBrTRk1I_~GA)
+| [**Join Bi-Weekly Development Meeting**](https://docs.google.com/document/d/1xEow4eIM152xNcRxqZz9VEcOiTQo8-CEuuQ5qTmkt-E/edit?usp=sharing)
+| [**Slides**](https://github.com/sgl-project/sgl-learning-materials?tab=readme-ov-file#slides) |
 
 ## News
+- [2024/12] 🔥 SGLang v0.4: Zero-Overhead Batch Scheduler, Cache-Aware Load Balancer, Faster Structured Outputs ([blog](https://lmsys.org/blog/2024-12-04-sglang-v0-4/)).
 - [2024/10] 🔥 The First SGLang Online Meetup ([slides](https://github.com/sgl-project/sgl-learning-materials?tab=readme-ov-file#the-first-sglang-online-meetup)).
 - [2024/09] SGLang v0.3 Release: 7x Faster DeepSeek MLA, 1.5x Faster torch.compile, Multi-Image/Video LLaVA-OneVision ([blog](https://lmsys.org/blog/2024-09-04-sglang-v0-3/)).
 - [2024/07] Faster Llama3 Serving with SGLang Runtime (vs. TensorRT-LLM, vLLM) ([blog](https://lmsys.org/blog/2024-07-25-sglang-llama3/)).
@@ -47,13 +51,13 @@ The core features include:
 - [Frontend: Structured Generation Language (SGLang)](https://sgl-project.github.io/frontend/frontend.html)
 
 ## Benchmark And Performance
-Learn more in our release blogs: [v0.2 blog](https://lmsys.org/blog/2024-07-25-sglang-llama3/), [v0.3 blog](https://lmsys.org/blog/2024-09-04-sglang-v0-3/)
+Learn more in our release blogs: [v0.2 blog](https://lmsys.org/blog/2024-07-25-sglang-llama3/), [v0.3 blog](https://lmsys.org/blog/2024-09-04-sglang-v0-3/), [v0.4 blog](https://lmsys.org/blog/2024-12-04-sglang-v0-4/)
 
 ## Roadmap
 [Development Roadmap (2024 Q4)](https://github.com/sgl-project/sglang/issues/1487)
 
 ## Adoption and Sponsorship
-The project is supported by (alphabetically): AMD, Baseten, Etched, Hyperbolic, Jam & Tea Studios, LinkedIn, NVIDIA, RunPod, Stanford, UC Berkeley, xAI and 01.AI.
+The project is supported by (alphabetically): AMD, Baseten, Etched, Hyperbolic, Jam & Tea Studios, LinkedIn, Meituan, NVIDIA, RunPod, Stanford, UC Berkeley, xAI and 01.AI.
 
 ## Acknowledgment and Citation
 We learned from the design and reused code from the following projects: [Guidance](https://github.com/guidance-ai/guidance), [vLLM](https://github.com/vllm-project/vllm), [LightLLM](https://github.com/ModelTC/lightllm), [FlashInfer](https://github.com/flashinfer-ai/flashinfer), [Outlines](https://github.com/outlines-dev/outlines), and [LMQL](https://github.com/eth-sri/lmql).
--- sglang-0.4.0/pyproject.toml
+++ sglang-0.4.0.post2/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "sglang"
-version = "0.4.0"
+version = "0.4.0.post2"
 description = "SGLang is yet another fast serving framework for large language models and vision language models."
 readme = "README.md"
 requires-python = ">=3.8"
@@ -13,7 +13,7 @@ classifiers = [
     "Programming Language :: Python :: 3",
     "License :: OSI Approved :: Apache Software License",
 ]
-dependencies = ["requests", "tqdm", "numpy", "IPython"]
+dependencies = ["requests", "tqdm", "numpy", "IPython", "setproctitle"]
 
 [project.optional-dependencies]
 runtime_common = ["aiohttp", "decord", "fastapi",
@@ -21,9 +21,9 @@ runtime_common = ["aiohttp", "decord", "fastapi",
     "orjson", "outlines>=0.0.44,<0.1.0",
     "packaging", "pillow", "prometheus-client>=0.20.0",
     "psutil", "pydantic", "python-multipart",
-    "pyzmq>=25.1.2", "torchao", "uvicorn", "uvloop",
-    "xgrammar>=0.1.
-srt = ["sglang[runtime_common]", "torch", "vllm>=0.6.3.post1", "cuda-python", "flashinfer
+    "pyzmq>=25.1.2", "torchao>=0.7.0", "gemlite", "uvicorn", "uvloop",
+    "xgrammar>=0.1.6"]
+srt = ["sglang[runtime_common]", "torch", "vllm>=0.6.3.post1,<=0.6.4.post1", "cuda-python", "flashinfer==0.1.6"]
 
 # HIP (Heterogeneous-computing Interface for Portability) for AMD
 # => base docker rocm/vllm-dev:20241022, not from public vllm whl
@@ -33,7 +33,7 @@ srt_hip = ["sglang[runtime_common]", "torch", "vllm==0.6.3.dev13"]
 srt_xpu = ["sglang[runtime_common]"]
 #For Intel Gaudi(device : hpu) follow the installation guide
 #https://docs.vllm.ai/en/latest/getting_started/gaudi-installation.html
-srt_hpu =
+srt_hpu = ["sglang[runtime_common]"]
 
 openai = ["openai>=1.0", "tiktoken"]
 anthropic = ["anthropic>=0.20.0"]
@@ -50,6 +50,7 @@ all = ["sglang[srt]", "sglang[openai]", "sglang[anthropic]", "sglang[litellm]"]
 all_hip = ["sglang[srt_hip]", "sglang[openai]", "sglang[anthropic]", "sglang[litellm]"]
 all_xpu = ["sglang[srt_xpu]", "sglang[openai]", "sglang[anthropic]", "sglang[litellm]"]
 all_hpu = ["sglang[srt_hpu]", "sglang[openai]", "sglang[anthropic]", "sglang[litellm]"]
+
 dev = ["sglang[all]", "sglang[test]"]
 dev_hip = ["sglang[all_hip]", "sglang[test]"]
 dev_xpu = ["sglang[all_xpu]", "sglang[test]"]
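Net effect of the dependency changes above: `setproctitle` becomes a core dependency, `torchao` is floored at 0.7.0, `gemlite` joins `runtime_common`, and the `srt` extra now caps `vllm` at 0.6.4.post1 and pins `flashinfer` to exactly 0.1.6. Installing via the extras, e.g. `pip install "sglang[all]==0.4.0.post2"`, picks up all of these pins at once.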
--- sglang-0.4.0/sglang/bench_offline_throughput.py
+++ sglang-0.4.0.post2/sglang/bench_offline_throughput.py
@@ -201,18 +201,17 @@ def throughput_test_once(
         for r in reqs
     ]
 
-    st = time.perf_counter()
     if profile:
         backend.start_profile()
 
+    st = time.perf_counter()
     gen_out = backend.generate(prompt=prompt, sampling_params=sampling_params)
+    latency = time.perf_counter() - st
 
     if profile:
         backend.stop_profile()
         monitor_trace_file(os.getenv("SGLANG_TORCH_PROFILER_DIR"))
 
-    latency = time.perf_counter() - st
-
     if backend_name == "runtime":
         gen_out = json.loads(gen_out)
 
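The reordering above tightens the timing window: `st` now starts only after `backend.start_profile()` returns, and `latency` is captured as soon as `generate()` finishes, so profiler start-up and trace post-processing no longer inflate the measured latency.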
@@ -285,7 +284,7 @@ def throughput_test(
     else:
         raise ValueError('Please set backend to either "engine" or "runtime"')
 
-    tokenizer_id = server_args.model_path
+    tokenizer_id = server_args.tokenizer_path or server_args.model_path
     tokenizer = get_tokenizer(tokenizer_id)
 
     # Set global environmnets
@@ -304,8 +303,8 @@ def throughput_test(
     warmup_requests = sample_random_requests(
         input_len=256,
         output_len=16,
-        num_prompts=16,
-        range_ratio=0
+        num_prompts=min(bench_args.num_prompts, 16),
+        range_ratio=1.0,
         tokenizer=tokenizer,
         dataset_path=bench_args.dataset_path,
     )
@@ -321,6 +320,19 @@ def throughput_test(
         extra_request_body=extra_request_body,
         profile=False,
     )
+    time.sleep(0.5)
+
+    try:
+        import os
+        import pwd
+
+        from gemlite.core import GemLiteLinearTriton
+
+        GemLiteLinearTriton.cache_config(
+            f"/tmp/{pwd.getpwuid(os.getuid()).pw_gecos}_gemlite.json"
+        )
+    except ImportError:
+        pass
 
     logging.info("\nBenchmark...")
     result = throughput_test_once(
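The same guarded gemlite hook is added after warmup in both benchmark entry points (here and in bench_one_batch.py below). A minimal standalone sketch of the pattern, assuming, as the method name suggests, that `GemLiteLinearTriton.cache_config` persists autotuned kernel configurations to the given JSON file so later runs can skip re-tuning:

```python
import os
import pwd

try:
    # gemlite is an optional runtime_common dependency; do nothing if absent.
    from gemlite.core import GemLiteLinearTriton

    # Derive a per-user cache file name from the account's GECOS field.
    user = pwd.getpwuid(os.getuid()).pw_gecos
    GemLiteLinearTriton.cache_config(f"/tmp/{user}_gemlite.json")
except ImportError:
    pass
```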
--- sglang-0.4.0/sglang/bench_one_batch.py
+++ sglang-0.4.0.post2/sglang/bench_one_batch.py
@@ -385,6 +385,19 @@ def latency_test(
         8,  # shorter decoding to speed up the warmup
         server_args.device,
     )
+
+    try:
+        import os
+        import pwd
+
+        from gemlite.core import GemLiteLinearTriton
+
+        GemLiteLinearTriton.cache_config(
+            f"/tmp/{pwd.getpwuid(os.getuid()).pw_gecos}_gemlite.json"
+        )
+    except ImportError:
+        pass
+
     rank_print("Benchmark ...")
 
     # Run the sweep
--- sglang-0.4.0/sglang/bench_serving.py
+++ sglang-0.4.0.post2/sglang/bench_serving.py
@@ -321,6 +321,8 @@ async def async_request_sglang_generate(
         },
         "stream": not args.disable_stream,
         "lora_path": request_func_input.lora_name,
+        "return_logprob": args.return_logprob,
+        "logprob_start_len": -1,
         **request_func_input.extra_request_body,
     }
     headers = {}
@@ -911,7 +913,7 @@ async def benchmark(
         prompt=test_prompt,
         api_url=api_url,
         prompt_len=test_prompt_len,
-        output_len=test_output_len,
+        output_len=min(test_output_len, 32),
         lora_name=lora_name,
         extra_request_body=extra_request_body,
     )
@@ -1413,6 +1415,11 @@ if __name__ == "__main__":
         action="store_true",
         help="Disable ignoring EOS.",
     )
+    parser.add_argument(
+        "--return-logprob",
+        action="store_true",
+        help="Return logprob.",
+    )
     parser.add_argument(
         "--extra-request-body",
         metavar='{"key1": "value1", "key2": "value2"}',
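Taken together, the three bench_serving hunks add a `--return-logprob` flag that forwards `"return_logprob"` (with `logprob_start_len` fixed at -1) in every request payload, and cap the warmup request's `output_len` at 32 tokens. A typical invocation against an already-running server would be `python3 -m sglang.bench_serving --backend sglang --return-logprob`.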
--- /dev/null
+++ sglang-0.4.0.post2/sglang/check_env.py
@@ -0,0 +1,305 @@
+"""Check environment configurations and dependency versions."""
+
+import importlib
+import os
+import resource
+import subprocess
+import sys
+from collections import OrderedDict, defaultdict
+
+import torch
+
+from sglang.srt.utils import is_hip
+
+
+def is_cuda_v2():
+    return torch.version.cuda is not None
+
+
+# List of packages to check versions
+PACKAGE_LIST = [
+    "sglang",
+    "flashinfer",
+    "triton",
+    "transformers",
+    "torchao",
+    "numpy",
+    "aiohttp",
+    "fastapi",
+    "hf_transfer",
+    "huggingface_hub",
+    "interegular",
+    "modelscope",
+    "orjson",
+    "outlines",
+    "packaging",
+    "psutil",
+    "pydantic",
+    "multipart",
+    "zmq",
+    "torchao",
+    "uvicorn",
+    "uvloop",
+    "vllm",
+    "xgrammar",
+    "openai",
+    "tiktoken",
+    "anthropic",
+    "litellm",
+    "decord",
+]
+
+
+def get_package_versions(packages):
+    """
+    Get versions of specified packages.
+    """
+    versions = {}
+    for package in packages:
+        package_name = package.split("==")[0].split(">=")[0].split("<=")[0]
+        try:
+            module = importlib.import_module(package_name)
+            if hasattr(module, "__version__"):
+                versions[package_name] = module.__version__
+        except ModuleNotFoundError:
+            versions[package_name] = "Module Not Found"
+    return versions
+
+
+def get_cuda_info():
+    """
+    Get CUDA-related information if available.
+    """
+    if is_cuda_v2():
+        cuda_info = {"CUDA available": torch.cuda.is_available()}
+
+        if cuda_info["CUDA available"]:
+            cuda_info.update(_get_gpu_info())
+            cuda_info.update(_get_cuda_version_info())
+
+        return cuda_info
+    elif is_hip():
+        cuda_info = {"ROCM available": torch.cuda.is_available()}
+
+        if cuda_info["ROCM available"]:
+            cuda_info.update(_get_gpu_info())
+            cuda_info.update(_get_cuda_version_info())
+
+        return cuda_info
+
+
+def _get_gpu_info():
+    """
+    Get information about available GPUs.
+    """
+    devices = defaultdict(list)
+    capabilities = defaultdict(list)
+    for k in range(torch.cuda.device_count()):
+        devices[torch.cuda.get_device_name(k)].append(str(k))
+        capability = torch.cuda.get_device_capability(k)
+        capabilities[f"{capability[0]}.{capability[1]}"].append(str(k))
+
+    gpu_info = {}
+    for name, device_ids in devices.items():
+        gpu_info[f"GPU {','.join(device_ids)}"] = name
+
+    if len(capabilities) == 1:
+        # All GPUs have the same compute capability
+        cap, gpu_ids = list(capabilities.items())[0]
+        gpu_info[f"GPU {','.join(gpu_ids)} Compute Capability"] = cap
+    else:
+        # GPUs have different compute capabilities
+        for cap, gpu_ids in capabilities.items():
+            gpu_info[f"GPU {','.join(gpu_ids)} Compute Capability"] = cap
+
+    return gpu_info
+
+
+def _get_cuda_version_info():
+    """
+    Get CUDA version information.
+    """
+    if is_cuda_v2():
+        from torch.utils.cpp_extension import CUDA_HOME
+
+        cuda_info = {"CUDA_HOME": CUDA_HOME}
+
+        if CUDA_HOME and os.path.isdir(CUDA_HOME):
+            cuda_info.update(_get_nvcc_info())
+            cuda_info.update(_get_cuda_driver_version())
+
+        return cuda_info
+    elif is_hip():
+        from torch.utils.cpp_extension import ROCM_HOME as ROCM_HOME
+
+        cuda_info = {"ROCM_HOME": ROCM_HOME}
+
+        if ROCM_HOME and os.path.isdir(ROCM_HOME):
+            cuda_info.update(_get_nvcc_info())
+            cuda_info.update(_get_cuda_driver_version())
+
+        return cuda_info
+    else:
+        cuda_info = {"CUDA_HOME": ""}
+        return cuda_info
+
+
+def _get_nvcc_info():
+    """
+    Get NVCC version information.
+    """
+    if is_cuda_v2():
+        from torch.utils.cpp_extension import CUDA_HOME
+
+        try:
+            nvcc = os.path.join(CUDA_HOME, "bin/nvcc")
+            nvcc_output = (
+                subprocess.check_output(f'"{nvcc}" -V', shell=True)
+                .decode("utf-8")
+                .strip()
+            )
+            return {
+                "NVCC": nvcc_output[
+                    nvcc_output.rfind("Cuda compilation tools") : nvcc_output.rfind(
+                        "Build"
+                    )
+                ].strip()
+            }
+        except subprocess.SubprocessError:
+            return {"NVCC": "Not Available"}
+    elif is_hip():
+        from torch.utils.cpp_extension import ROCM_HOME
+
+        try:
+            hipcc = os.path.join(ROCM_HOME, "bin/hipcc")
+            hipcc_output = (
+                subprocess.check_output(f'"{hipcc}" --version', shell=True)
+                .decode("utf-8")
+                .strip()
+            )
+            return {
+                "HIPCC": hipcc_output[
+                    hipcc_output.rfind("HIP version") : hipcc_output.rfind("AMD clang")
+                ].strip()
+            }
+        except subprocess.SubprocessError:
+            return {"HIPCC": "Not Available"}
+    else:
+        return {"NVCC": "Not Available"}
+
+
+def _get_cuda_driver_version():
+    """
+    Get CUDA driver version.
+    """
+    versions = set()
+    if is_cuda_v2():
+        try:
+            output = subprocess.check_output(
+                [
+                    "nvidia-smi",
+                    "--query-gpu=driver_version",
+                    "--format=csv,noheader,nounits",
+                ]
+            )
+            versions = set(output.decode().strip().split("\n"))
+            if len(versions) == 1:
+                return {"CUDA Driver Version": versions.pop()}
+            else:
+                return {"CUDA Driver Versions": ", ".join(sorted(versions))}
+        except subprocess.SubprocessError:
+            return {"CUDA Driver Version": "Not Available"}
+    elif is_hip():
+        try:
+            output = subprocess.check_output(
+                [
+                    "rocm-smi",
+                    "--showdriverversion",
+                    "--csv",
+                ]
+            )
+            versions = set(output.decode().strip().split("\n"))
+            versions.discard("name, value")
+            ver = versions.pop()
+            ver = ver.replace('"Driver version", ', "").replace('"', "")
+
+            return {"ROCM Driver Version": ver}
+        except subprocess.SubprocessError:
+            return {"ROCM Driver Version": "Not Available"}
+    else:
+        return {"CUDA Driver Version": "Not Available"}
+
+
+def get_gpu_topology():
+    """
+    Get GPU topology information.
+    """
+    if is_cuda_v2():
+        try:
+            result = subprocess.run(
+                ["nvidia-smi", "topo", "-m"],
+                stdout=subprocess.PIPE,
+                stderr=subprocess.PIPE,
+                text=True,
+                check=True,
+            )
+            return "\n" + result.stdout if result.returncode == 0 else None
+        except subprocess.SubprocessError:
+            return None
+    elif is_hip():
+        try:
+            result = subprocess.run(
+                ["rocm-smi", "--showtopotype"],
+                stdout=subprocess.PIPE,
+                stderr=subprocess.PIPE,
+                text=True,
+                check=True,
+            )
+            return "\n" + result.stdout if result.returncode == 0 else None
+        except subprocess.SubprocessError:
+            return None
+    else:
+        return None
+
+
+def get_hypervisor_vendor():
+    try:
+        output = subprocess.check_output(["lscpu"], text=True)
+        for line in output.split("\n"):
+            if "Hypervisor vendor:" in line:
+                return line.split(":")[1].strip()
+        return None
+    except:
+        return None
+
+
+def check_env():
+    """
+    Check and print environment information.
+    """
+    env_info = OrderedDict()
+    env_info["Python"] = sys.version.replace("\n", "")
+    env_info.update(get_cuda_info())
+    env_info["PyTorch"] = torch.__version__
+    env_info.update(get_package_versions(PACKAGE_LIST))
+
+    gpu_topo = get_gpu_topology()
+    if gpu_topo:
+        if is_cuda_v2():
+            env_info["NVIDIA Topology"] = gpu_topo
+        elif is_hip():
+            env_info["AMD Topology"] = gpu_topo
+
+    hypervisor_vendor = get_hypervisor_vendor()
+    if hypervisor_vendor:
+        env_info["Hypervisor vendor"] = hypervisor_vendor
+
+    ulimit_soft, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
+    env_info["ulimit soft"] = ulimit_soft
+
+    for k, v in env_info.items():
+        print(f"{k}: {v}")
+
+
+if __name__ == "__main__":
+    check_env()
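check_env.py grows from 213 to 305 lines, mainly to mirror every CUDA probe with a ROCm/HIP equivalent and to report the hypervisor vendor and the soft open-file ulimit. A minimal usage sketch:

```python
# Equivalent to running `python3 -m sglang.check_env` from a shell.
from sglang.check_env import check_env

# Prints Python, CUDA/ROCm, PyTorch, and package versions, plus
# GPU topology, hypervisor vendor (if any), and the soft ulimit.
check_env()
```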
--- sglang-0.4.0/sglang/lang/chat_template.py
+++ sglang-0.4.0.post2/sglang/lang/chat_template.py
@@ -320,6 +320,28 @@ register_chat_template(
     )
 )
 
+register_chat_template(
+    ChatTemplate(
+        name="granite-3-instruct",
+        default_system_prompt=None,
+        role_prefix_and_suffix={
+            "system": (
+                "<|start_of_role|>system<|end_of_role|>",
+                "<|end_of_text|>",
+            ),
+            "user": (
+                "<|start_of_role|>user<|end_of_role|>",
+                "<|end_of_text|>",
+            ),
+            "assistant": (
+                "<|start_of_role|>assistant<|end_of_role|>",
+                "<|end_of_text|>",
+            ),
+        },
+        stop_str=("<|end_of_text|>",),
+    )
+)
+
 
 @register_chat_template_matching_function
 def match_dbrx(model_path: str):
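For orientation, an illustrative sketch (not part of the diff; `render_turn` is a hypothetical helper, not sglang API) of how the granite-3-instruct prefixes and suffixes compose into a prompt:

```python
ROLE_PREFIX_AND_SUFFIX = {
    "user": ("<|start_of_role|>user<|end_of_role|>", "<|end_of_text|>"),
    "assistant": ("<|start_of_role|>assistant<|end_of_role|>", "<|end_of_text|>"),
}


def render_turn(role: str, content: str) -> str:
    # Wrap a single message in its role's prefix and suffix.
    prefix, suffix = ROLE_PREFIX_AND_SUFFIX[role]
    return f"{prefix}{content}{suffix}"


# One user turn, then the open assistant prefix the model completes:
prompt = render_turn("user", "Hello!") + "<|start_of_role|>assistant<|end_of_role|>"
```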
@@ -402,6 +424,16 @@ def match_c4ai_command_r(model_path: str):
     return get_chat_template("c4ai-command-r")
 
 
+@register_chat_template_matching_function
+def match_granite_instruct(model_path: str):
+    model_path = model_path.lower()
+    # When future versions of Granite are released, this code may
+    # need to be updated. For now, assume that the Granite 3.0
+    # template works across the board.
+    if "granite" in model_path and "instruct" in model_path:
+        return get_chat_template("granite-3-instruct")
+
+
 if __name__ == "__main__":
     messages = [
         {"role": "system", "content": None},  # None means default