sglang 0.1.14__tar.gz → 0.1.16__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (82)
  1. {sglang-0.1.14/sglang.egg-info → sglang-0.1.16}/PKG-INFO +20 -18
  2. {sglang-0.1.14 → sglang-0.1.16}/README.md +13 -11
  3. {sglang-0.1.14 → sglang-0.1.16}/pyproject.toml +4 -4
  4. sglang-0.1.16/sglang/__init__.py +59 -0
  5. {sglang-0.1.14 → sglang-0.1.16}/sglang/api.py +8 -5
  6. {sglang-0.1.14 → sglang-0.1.16}/sglang/backend/anthropic.py +18 -4
  7. {sglang-0.1.14 → sglang-0.1.16}/sglang/backend/openai.py +2 -1
  8. {sglang-0.1.14 → sglang-0.1.16}/sglang/backend/runtime_endpoint.py +18 -5
  9. {sglang-0.1.14 → sglang-0.1.16}/sglang/backend/vertexai.py +1 -0
  10. {sglang-0.1.14 → sglang-0.1.16}/sglang/global_config.py +5 -1
  11. {sglang-0.1.14 → sglang-0.1.16}/sglang/lang/chat_template.py +83 -2
  12. {sglang-0.1.14 → sglang-0.1.16}/sglang/lang/interpreter.py +92 -35
  13. {sglang-0.1.14 → sglang-0.1.16}/sglang/lang/ir.py +12 -9
  14. {sglang-0.1.14 → sglang-0.1.16}/sglang/lang/tracer.py +6 -4
  15. sglang-0.1.16/sglang/launch_server_llavavid.py +31 -0
  16. {sglang-0.1.14 → sglang-0.1.16}/sglang/srt/constrained/fsm_cache.py +1 -0
  17. {sglang-0.1.14 → sglang-0.1.16}/sglang/srt/constrained/jump_forward.py +1 -0
  18. {sglang-0.1.14 → sglang-0.1.16}/sglang/srt/conversation.py +2 -2
  19. sglang-0.1.16/sglang/srt/flush_cache.py +16 -0
  20. {sglang-0.1.14 → sglang-0.1.16}/sglang/srt/hf_transformers_utils.py +10 -2
  21. {sglang-0.1.14 → sglang-0.1.16}/sglang/srt/layers/context_flashattention_nopad.py +1 -0
  22. {sglang-0.1.14 → sglang-0.1.16}/sglang/srt/layers/extend_attention.py +1 -0
  23. sglang-0.1.16/sglang/srt/layers/logits_processor.py +175 -0
  24. {sglang-0.1.14 → sglang-0.1.16}/sglang/srt/layers/radix_attention.py +2 -1
  25. {sglang-0.1.14 → sglang-0.1.16}/sglang/srt/layers/token_attention.py +1 -0
  26. {sglang-0.1.14 → sglang-0.1.16}/sglang/srt/managers/detokenizer_manager.py +5 -1
  27. {sglang-0.1.14 → sglang-0.1.16}/sglang/srt/managers/io_struct.py +27 -3
  28. {sglang-0.1.14 → sglang-0.1.16}/sglang/srt/managers/router/infer_batch.py +97 -48
  29. {sglang-0.1.14 → sglang-0.1.16}/sglang/srt/managers/router/manager.py +11 -8
  30. {sglang-0.1.14 → sglang-0.1.16}/sglang/srt/managers/router/model_rpc.py +169 -90
  31. {sglang-0.1.14 → sglang-0.1.16}/sglang/srt/managers/router/model_runner.py +110 -166
  32. {sglang-0.1.14 → sglang-0.1.16}/sglang/srt/managers/router/radix_cache.py +89 -51
  33. {sglang-0.1.14 → sglang-0.1.16}/sglang/srt/managers/router/scheduler.py +17 -28
  34. {sglang-0.1.14 → sglang-0.1.16}/sglang/srt/managers/tokenizer_manager.py +110 -33
  35. {sglang-0.1.14 → sglang-0.1.16}/sglang/srt/memory_pool.py +5 -14
  36. {sglang-0.1.14 → sglang-0.1.16}/sglang/srt/model_config.py +11 -0
  37. sglang-0.1.16/sglang/srt/models/commandr.py +372 -0
  38. sglang-0.1.16/sglang/srt/models/dbrx.py +412 -0
  39. sglang-0.1.16/sglang/srt/models/dbrx_config.py +281 -0
  40. {sglang-0.1.14 → sglang-0.1.16}/sglang/srt/models/gemma.py +24 -25
  41. {sglang-0.1.14 → sglang-0.1.16}/sglang/srt/models/llama2.py +25 -26
  42. {sglang-0.1.14 → sglang-0.1.16}/sglang/srt/models/llava.py +8 -10
  43. sglang-0.1.16/sglang/srt/models/llavavid.py +307 -0
  44. {sglang-0.1.14 → sglang-0.1.16}/sglang/srt/models/mixtral.py +29 -33
  45. {sglang-0.1.14 → sglang-0.1.16}/sglang/srt/models/qwen.py +34 -25
  46. {sglang-0.1.14 → sglang-0.1.16}/sglang/srt/models/qwen2.py +25 -26
  47. {sglang-0.1.14 → sglang-0.1.16}/sglang/srt/models/stablelm.py +26 -26
  48. {sglang-0.1.14 → sglang-0.1.16}/sglang/srt/models/yivl.py +3 -5
  49. sglang-0.1.16/sglang/srt/openai_api_adapter.py +356 -0
  50. {sglang-0.1.14/sglang/srt/managers → sglang-0.1.16/sglang/srt}/openai_protocol.py +36 -20
  51. {sglang-0.1.14 → sglang-0.1.16}/sglang/srt/sampling_params.py +2 -0
  52. sglang-0.1.16/sglang/srt/server.py +331 -0
  53. {sglang-0.1.14 → sglang-0.1.16}/sglang/srt/server_args.py +79 -49
  54. sglang-0.1.16/sglang/srt/utils.py +426 -0
  55. sglang-0.1.16/sglang/srt/weight_utils.py +417 -0
  56. {sglang-0.1.14 → sglang-0.1.16}/sglang/test/test_programs.py +8 -7
  57. sglang-0.1.16/sglang/test/test_utils.py +350 -0
  58. {sglang-0.1.14 → sglang-0.1.16}/sglang/utils.py +77 -26
  59. {sglang-0.1.14 → sglang-0.1.16/sglang.egg-info}/PKG-INFO +20 -18
  60. {sglang-0.1.14 → sglang-0.1.16}/sglang.egg-info/SOURCES.txt +9 -1
  61. {sglang-0.1.14 → sglang-0.1.16}/sglang.egg-info/requires.txt +7 -7
  62. sglang-0.1.14/sglang/__init__.py +0 -4
  63. sglang-0.1.14/sglang/srt/layers/logits_processor.py +0 -115
  64. sglang-0.1.14/sglang/srt/server.py +0 -696
  65. sglang-0.1.14/sglang/srt/utils.py +0 -261
  66. sglang-0.1.14/sglang/test/test_utils.py +0 -162
  67. {sglang-0.1.14 → sglang-0.1.16}/LICENSE +0 -0
  68. {sglang-0.1.14 → sglang-0.1.16}/setup.cfg +0 -0
  69. {sglang-0.1.14 → sglang-0.1.16}/sglang/backend/__init__.py +0 -0
  70. {sglang-0.1.14 → sglang-0.1.16}/sglang/backend/base_backend.py +0 -0
  71. {sglang-0.1.14 → sglang-0.1.16}/sglang/lang/__init__.py +0 -0
  72. {sglang-0.1.14 → sglang-0.1.16}/sglang/lang/compiler.py +0 -0
  73. {sglang-0.1.14 → sglang-0.1.16}/sglang/launch_server.py +0 -0
  74. {sglang-0.1.14 → sglang-0.1.16}/sglang/srt/backend_config.py +0 -0
  75. {sglang-0.1.14 → sglang-0.1.16}/sglang/srt/constrained/__init__.py +0 -0
  76. {sglang-0.1.14 → sglang-0.1.16}/sglang/srt/constrained/base_cache.py +0 -0
  77. {sglang-0.1.14 → sglang-0.1.16}/sglang/srt/mm_utils.py +0 -0
  78. {sglang-0.1.14 → sglang-0.1.16}/sglang/srt/models/mistral.py +0 -0
  79. {sglang-0.1.14 → sglang-0.1.16}/sglang/test/test_conversation.py +0 -0
  80. {sglang-0.1.14 → sglang-0.1.16}/sglang/test/test_openai_protocol.py +0 -0
  81. {sglang-0.1.14 → sglang-0.1.16}/sglang.egg-info/dependency_links.txt +0 -0
  82. {sglang-0.1.14 → sglang-0.1.16}/sglang.egg-info/top_level.txt +0 -0
{sglang-0.1.14/sglang.egg-info → sglang-0.1.16}/PKG-INFO

@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: sglang
- Version: 0.1.14
+ Version: 0.1.16
  Summary: A structured generation langauge for LLMs.
  License: Apache License
  Version 2.0, January 2004
@@ -212,6 +212,7 @@ Requires-Python: >=3.8
  Description-Content-Type: text/markdown
  License-File: LICENSE
  Requires-Dist: requests
+ Requires-Dist: tqdm
  Provides-Extra: srt
  Requires-Dist: aiohttp; extra == "srt"
  Requires-Dist: fastapi; extra == "srt"
@@ -221,19 +222,18 @@ Requires-Dist: torch; extra == "srt"
  Requires-Dist: uvloop; extra == "srt"
  Requires-Dist: uvicorn; extra == "srt"
  Requires-Dist: zmq; extra == "srt"
- Requires-Dist: vllm>=0.3.3; extra == "srt"
+ Requires-Dist: vllm>=0.4.2; extra == "srt"
  Requires-Dist: interegular; extra == "srt"
- Requires-Dist: lark; extra == "srt"
- Requires-Dist: numba; extra == "srt"
  Requires-Dist: pydantic; extra == "srt"
- Requires-Dist: referencing; extra == "srt"
- Requires-Dist: diskcache; extra == "srt"
- Requires-Dist: cloudpickle; extra == "srt"
  Requires-Dist: pillow; extra == "srt"
- Requires-Dist: outlines>=0.0.27; extra == "srt"
+ Requires-Dist: packaging; extra == "srt"
+ Requires-Dist: huggingface_hub; extra == "srt"
+ Requires-Dist: hf_transfer; extra == "srt"
+ Requires-Dist: outlines>=0.0.34; extra == "srt"
  Provides-Extra: openai
  Requires-Dist: openai>=1.0; extra == "openai"
  Requires-Dist: numpy; extra == "openai"
+ Requires-Dist: tiktoken; extra == "openai"
  Provides-Extra: anthropic
  Requires-Dist: anthropic>=0.20.0; extra == "anthropic"
  Requires-Dist: numpy; extra == "anthropic"
@@ -541,7 +541,6 @@ curl http://localhost:30000/generate \
  Learn more about the argument format [here](docs/sampling_params.md).

  ### OpenAI Compatible API
-
  In addition, the server supports an experimental OpenAI-compatible API.

  ```python
@@ -571,15 +570,17 @@ response = client.chat.completions.create(
  print(response)
  ```

- In above example, the server uses the chat template specified in the model tokenizer.
- You can override the chat template if needed when launching the server:
+
+ By default, the server uses the chat template specified in the model tokenizer from Hugging Face. It should just work for most official models such as Llama-2/Llama-3.
+
+ If needed, you can also override the chat template when launching the server:

  ```
  python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port 30000 --chat-template llama-2
  ```

  If the chat template you are looking for is missing, you are welcome to contribute it.
- Meanwhile, you can also temporary register your chat template as follows:
+ Meanwhile, you can also temporarily register your chat template as follows:

  ```json
  {
@@ -606,7 +607,7 @@ python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port
  ```
  python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port 30000 --mem-fraction-static 0.7
  ```
- - You can turn on [flashinfer](docs/flashinfer.md) to acclerate the inference by using highly optimized CUDA kernels.
+ - You can turn on [flashinfer](docs/flashinfer.md) to accelerate the inference by using highly optimized CUDA kernels.

  ### Supported Models
  - Llama
@@ -622,10 +623,14 @@ python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port
  - `python3 -m sglang.launch_server --model-path liuhaotian/llava-v1.6-34b --tokenizer-path liuhaotian/llava-v1.6-34b-tokenizer --port 3000`
  - Yi-VL
    - see [srt_example_yi_vl.py](examples/quick_start/srt_example_yi_vl.py).
- - AWQ/GPTQ quantization
+ - StableLM
+ - Command-R
+ - DBRX
+ - AWQ/GPTQ/Marlin quantization

- ## Benchmark And Performance
+ Instructions for supporting a new model are [here](https://github.com/sgl-project/sglang/blob/main/docs/model_support.md).

+ ## Benchmark And Performance
  - Llama-7B on NVIDIA A10G, FP16, Tensor Parallelism=1
  ![llama_7b](assets/llama_7b.jpg)

@@ -649,7 +654,4 @@ https://github.com/sgl-project/sglang/issues/157
  }
  ```

- [![Paper page](https://huggingface.co/datasets/huggingface/badges/resolve/main/paper-page-md.svg)](https://huggingface.co/papers/2312.07104)
-
-
  We learned from the design and reused some code of the following projects: [Guidance](https://github.com/guidance-ai/guidance), [vLLM](https://github.com/vllm-project/vllm), [LightLLM](https://github.com/ModelTC/lightllm), [FlashInfer](https://github.com/flashinfer-ai/flashinfer), [Outlines](https://github.com/outlines-dev/outlines), [LMQL](https://github.com/eth-sri/lmql).
{sglang-0.1.14 → sglang-0.1.16}/README.md

@@ -297,7 +297,6 @@ curl http://localhost:30000/generate \
  Learn more about the argument format [here](docs/sampling_params.md).

  ### OpenAI Compatible API
-
  In addition, the server supports an experimental OpenAI-compatible API.

  ```python
@@ -327,15 +326,17 @@ response = client.chat.completions.create(
  print(response)
  ```

- In above example, the server uses the chat template specified in the model tokenizer.
- You can override the chat template if needed when launching the server:
+
+ By default, the server uses the chat template specified in the model tokenizer from Hugging Face. It should just work for most official models such as Llama-2/Llama-3.
+
+ If needed, you can also override the chat template when launching the server:

  ```
  python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port 30000 --chat-template llama-2
  ```

  If the chat template you are looking for is missing, you are welcome to contribute it.
- Meanwhile, you can also temporary register your chat template as follows:
+ Meanwhile, you can also temporarily register your chat template as follows:

  ```json
  {
@@ -362,7 +363,7 @@ python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port
  ```
  python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port 30000 --mem-fraction-static 0.7
  ```
- - You can turn on [flashinfer](docs/flashinfer.md) to acclerate the inference by using highly optimized CUDA kernels.
+ - You can turn on [flashinfer](docs/flashinfer.md) to accelerate the inference by using highly optimized CUDA kernels.

  ### Supported Models
  - Llama
@@ -378,10 +379,14 @@ python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port
  - `python3 -m sglang.launch_server --model-path liuhaotian/llava-v1.6-34b --tokenizer-path liuhaotian/llava-v1.6-34b-tokenizer --port 3000`
  - Yi-VL
    - see [srt_example_yi_vl.py](examples/quick_start/srt_example_yi_vl.py).
- - AWQ/GPTQ quantization
+ - StableLM
+ - Command-R
+ - DBRX
+ - AWQ/GPTQ/Marlin quantization

- ## Benchmark And Performance
+ Instructions for supporting a new model are [here](https://github.com/sgl-project/sglang/blob/main/docs/model_support.md).

+ ## Benchmark And Performance
  - Llama-7B on NVIDIA A10G, FP16, Tensor Parallelism=1
  ![llama_7b](assets/llama_7b.jpg)

@@ -405,7 +410,4 @@ https://github.com/sgl-project/sglang/issues/157
  }
  ```

- [![Paper page](https://huggingface.co/datasets/huggingface/badges/resolve/main/paper-page-md.svg)](https://huggingface.co/papers/2312.07104)
-
-
- We learned from the design and reused some code of the following projects: [Guidance](https://github.com/guidance-ai/guidance), [vLLM](https://github.com/vllm-project/vllm), [LightLLM](https://github.com/ModelTC/lightllm), [FlashInfer](https://github.com/flashinfer-ai/flashinfer), [Outlines](https://github.com/outlines-dev/outlines), [LMQL](https://github.com/eth-sri/lmql).
+ We learned from the design and reused some code of the following projects: [Guidance](https://github.com/guidance-ai/guidance), [vLLM](https://github.com/vllm-project/vllm), [LightLLM](https://github.com/ModelTC/lightllm), [FlashInfer](https://github.com/flashinfer-ai/flashinfer), [Outlines](https://github.com/outlines-dev/outlines), [LMQL](https://github.com/eth-sri/lmql).
{sglang-0.1.14 → sglang-0.1.16}/pyproject.toml

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

  [project]
  name = "sglang"
- version = "0.1.14"
+ version = "0.1.16"
  description = "A structured generation langauge for LLMs."
  readme = "README.md"
  requires-python = ">=3.8"
@@ -15,13 +15,13 @@ classifiers = [
  ]
  dependencies = [
      "requests",
+     "tqdm",
  ]

  [project.optional-dependencies]
  srt = ["aiohttp", "fastapi", "psutil", "rpyc", "torch", "uvloop", "uvicorn",
-        "zmq", "vllm>=0.3.3", "interegular", "lark", "numba",
-        "pydantic", "referencing", "diskcache", "cloudpickle", "pillow", "outlines>=0.0.27"]
- openai = ["openai>=1.0", "numpy"]
+        "zmq", "vllm>=0.4.2", "interegular", "pydantic", "pillow", "packaging", "huggingface_hub", "hf_transfer", "outlines>=0.0.34"]
+ openai = ["openai>=1.0", "numpy", "tiktoken"]
  anthropic = ["anthropic>=0.20.0", "numpy"]
  all = ["sglang[srt]", "sglang[openai]", "sglang[anthropic]"]
sglang-0.1.16/sglang/__init__.py (new file)

@@ -0,0 +1,59 @@
+ __version__ = "0.1.16"
+
+ # SGL API Components
+ from sglang.api import (
+     Runtime,
+     assistant,
+     assistant_begin,
+     assistant_end,
+     flush_cache,
+     function,
+     gen,
+     gen_int,
+     gen_string,
+     get_server_args,
+     image,
+     select,
+     set_default_backend,
+     system,
+     user,
+     user_begin,
+     user_end,
+     video,
+ )
+
+ # SGL Backends
+ from sglang.backend.anthropic import Anthropic
+ from sglang.backend.openai import OpenAI
+ from sglang.backend.runtime_endpoint import RuntimeEndpoint
+ from sglang.backend.vertexai import VertexAI
+
+ # Global Configurations
+ from sglang.global_config import global_config
+
+ # public APIs management
+ __all__ = [
+     "global_config",
+     "Anthropic",
+     "OpenAI",
+     "RuntimeEndpoint",
+     "VertexAI",
+     "function",
+     "Runtime",
+     "set_default_backend",
+     "flush_cache",
+     "get_server_args",
+     "gen",
+     "gen_int",
+     "gen_string",
+     "image",
+     "video",
+     "select",
+     "system",
+     "user",
+     "assistant",
+     "user_begin",
+     "user_end",
+     "assistant_begin",
+     "assistant_end",
+ ]
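For orientation, everything re-exported by the new `sglang/__init__.py` above is meant to be used through the `sglang` top-level namespace. The snippet below is a minimal usage sketch rather than code from the package: it assumes an SRT server is already running at `http://localhost:30000`, and the prompt, variable names, and `max_tokens` value are illustrative only.

```python
# Minimal sketch of the top-level API re-exported by sglang/__init__.py.
# Assumes a local SRT server is already running (the endpoint is a placeholder).
import sglang as sgl


@sgl.function
def tool_use(s, question):
    s += "To answer this question: " + question + ". "
    s += "I need to use a " + sgl.gen("tool", choices=["calculator", "search engine"]) + ". "
    s += "Explanation: " + sgl.gen("explanation", max_tokens=64)


sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))
state = tool_use.run(question="What is 2 ** 16?")
print(state["tool"], "|", state["explanation"])
```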
{sglang-0.1.14 → sglang-0.1.16}/sglang/api.py

@@ -1,13 +1,10 @@
- """Public API"""
+ """Some Public API Definitions"""

+ import os
  import re
  from typing import Callable, List, Optional, Union

- from sglang.backend.anthropic import Anthropic
  from sglang.backend.base_backend import BaseBackend
- from sglang.backend.openai import OpenAI
- from sglang.backend.runtime_endpoint import RuntimeEndpoint
- from sglang.backend.vertexai import VertexAI
  from sglang.global_config import global_config
  from sglang.lang.ir import (
      SglExpr,
@@ -18,6 +15,7 @@ from sglang.lang.ir import (
      SglRoleBegin,
      SglRoleEnd,
      SglSelect,
+     SglVideo,
  )


@@ -35,6 +33,7 @@ def function(

  def Runtime(*args, **kwargs):
      # Avoid importing unnecessary dependency
+     os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
      from sglang.srt.server import Runtime

      return Runtime(*args, **kwargs)
@@ -153,6 +152,10 @@ def image(expr: SglExpr):
      return SglImage(expr)


+ def video(path: str, num_frames: int):
+     return SglVideo(path, num_frames)
+
+
  def select(
      name: Optional[str] = None,
      choices: List[str] = None,
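The new `video()` helper added to `api.py` mirrors `image()`: it wraps a file path and a frame count into an `SglVideo` expression that can be embedded in a prompt. Below is a hedged sketch of how it might be combined with the chat-role helpers; the path, frame count, and prompt are placeholders, and it assumes a video-capable model (such as the new LLaVA-video support in this release) is being served.

```python
# Sketch: embedding a video into an SGL program with the new sgl.video() primitive.
# video_path, num_frames, and the prompt text below are placeholders.
import sglang as sgl


@sgl.function
def describe_clip(s, video_path):
    s += sgl.user(sgl.video(video_path, num_frames=16) + "Describe this clip briefly.")
    s += sgl.assistant(sgl.gen("description", max_tokens=128))
```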
{sglang-0.1.14 → sglang-0.1.16}/sglang/backend/anthropic.py

@@ -1,6 +1,7 @@
  from typing import List, Optional, Union

  import numpy as np
+
  from sglang.backend.base_backend import BaseBackend
  from sglang.lang.chat_template import get_chat_template
  from sglang.lang.interpreter import StreamExecutor
@@ -13,7 +14,7 @@ except ImportError as e:


  class Anthropic(BaseBackend):
-     def __init__(self, model_name):
+     def __init__(self, model_name, *args, **kwargs):
          super().__init__()


@@ -21,6 +22,7 @@ class Anthropic(BaseBackend):

          self.model_name = model_name
          self.chat_template = get_chat_template("claude")
+         self.client = anthropic.Anthropic(*args, **kwargs)

      def get_chat_template(self):
          return self.chat_template
@@ -35,8 +37,14 @@
          else:
              messages = [{"role": "user", "content": s.text_}]

-         ret = anthropic.Anthropic().messages.create(
+         if messages and messages[0]["role"] == "system":
+             system = messages.pop(0)["content"]
+         else:
+             system = ""
+
+         ret = self.client.messages.create(
              model=self.model_name,
+             system=system,
              messages=messages,
              **sampling_params.to_anthropic_kwargs(),
          )
@@ -54,10 +62,16 @@
          else:
              messages = [{"role": "user", "content": s.text_}]

-         with anthropic.Anthropic().messages.stream(
+         if messages and messages[0]["role"] == "system":
+             system = messages.pop(0)["content"]
+         else:
+             system = ""
+
+         with self.client.messages.stream(
              model=self.model_name,
+             system=system,
              messages=messages,
              **sampling_params.to_anthropic_kwargs(),
          ) as stream:
              for text in stream.text_stream:
-                 yield text, {}
+                 yield text, {}
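The backend change above tracks how the Anthropic Messages API treats system prompts: they are supplied through the top-level `system=` argument rather than as a message with role `system`, which is why the leading system message is popped out of `messages` before the call. As a rough, hedged illustration of the underlying SDK pattern (the model id, prompts, and `max_tokens` are placeholders; in the backend those values come from `sampling_params.to_anthropic_kwargs()`):

```python
# Sketch of the Anthropic Messages API call shape the backend now uses.
# The model id, prompts, and max_tokens below are placeholders.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
resp = client.messages.create(
    model="claude-3-haiku-20240307",
    system="You are a terse assistant.",  # system prompt goes here, not in messages
    messages=[{"role": "user", "content": "Say hi in one word."}],
    max_tokens=32,
)
print(resp.content[0].text)
```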
{sglang-0.1.14 → sglang-0.1.16}/sglang/backend/openai.py

@@ -3,6 +3,7 @@ import time
  from typing import Callable, List, Optional, Union

  import numpy as np
+
  from sglang.backend.base_backend import BaseBackend
  from sglang.lang.chat_template import ChatTemplate, get_chat_template_by_model_path
  from sglang.lang.interpreter import StreamExecutor
@@ -227,7 +228,7 @@ class OpenAI(BaseBackend):
              prompt_tokens.append(ret_token)

          decision = choices[np.argmax(scores)]
-         return decision, scores, scores
+         return decision, scores, None, None


  def openai_completion(client, retries=3, is_chat=None, prompt=None, **kwargs):
{sglang-0.1.14 → sglang-0.1.16}/sglang/backend/runtime_endpoint.py

@@ -3,6 +3,7 @@ from typing import Callable, List, Optional, Union

  import numpy as np
  import requests
+
  from sglang.backend.base_backend import BaseBackend
  from sglang.global_config import global_config
  from sglang.lang.chat_template import get_chat_template_by_model_path
@@ -73,9 +74,11 @@ class RuntimeEndpoint(BaseBackend):
          assert res.status_code == 200

      def commit_lazy_operations(self, s: StreamExecutor):
+         data = {"text": s.text_, "sampling_params": {"max_new_tokens": 0}}
+         self._add_images(s, data)
          res = http_request(
              self.base_url + "/generate",
-             json={"text": s.text_, "sampling_params": {"max_new_tokens": 0}},
+             json=data,
              auth_token=self.auth_token,
              api_key=self.api_key,
              verify=self.verify,
@@ -104,6 +107,7 @@ class RuntimeEndpoint(BaseBackend):
              "text": s.text_,
              "sampling_params": {
                  "skip_special_tokens": global_config.skip_special_tokens_in_output,
+                 "spaces_between_special_tokens": global_config.spaces_between_special_tokens_in_out,
                  **sampling_params.to_srt_kwargs(),
              },
          }
@@ -112,6 +116,7 @@ class RuntimeEndpoint(BaseBackend):
              "text": s.text_,
              "sampling_params": {
                  "skip_special_tokens": global_config.skip_special_tokens_in_output,
+                 "spaces_between_special_tokens": global_config.spaces_between_special_tokens_in_out,
                  "dtype": "int",
                  **sampling_params.to_srt_kwargs(),
              },
@@ -142,6 +147,7 @@ class RuntimeEndpoint(BaseBackend):
              "text": s.text_,
              "sampling_params": {
                  "skip_special_tokens": global_config.skip_special_tokens_in_output,
+                 "spaces_between_special_tokens": global_config.spaces_between_special_tokens_in_out,
                  **sampling_params.to_srt_kwargs(),
              },
          }
@@ -150,6 +156,7 @@ class RuntimeEndpoint(BaseBackend):
              "text": s.text_,
              "sampling_params": {
                  "skip_special_tokens": global_config.skip_special_tokens_in_output,
+                 "spaces_between_special_tokens": global_config.spaces_between_special_tokens_in_out,
                  "dtype": "int",
                  **sampling_params.to_srt_kwargs(),
              },
@@ -224,13 +231,19 @@ class RuntimeEndpoint(BaseBackend):
          )
          assert res.status_code == 200
          obj = res.json()
-         normalized_prompt_logprob = [
+         normalized_prompt_logprobs = [
              r["meta_info"]["normalized_prompt_logprob"] for r in obj
          ]
-         prompt_logprob = [r["meta_info"]["prompt_logprob"] for r in obj]
+         decision = choices[np.argmax(normalized_prompt_logprobs)]
+         prefill_token_logprobs = [r["meta_info"]["prefill_token_logprobs"] for r in obj]
+         decode_token_logprobs = [r["meta_info"]["decode_token_logprobs"] for r in obj]

-         decision = choices[np.argmax(normalized_prompt_logprob)]
-         return decision, normalized_prompt_logprob, prompt_logprob
+         return (
+             decision,
+             normalized_prompt_logprobs,
+             prefill_token_logprobs,
+             decode_token_logprobs,
+         )

      def concatenate_and_append(self, src_rids: List[str], dst_rid: str):
          res = http_request(
{sglang-0.1.14 → sglang-0.1.16}/sglang/backend/vertexai.py

@@ -3,6 +3,7 @@ import warnings
  from typing import List, Optional, Union

  import numpy as np
+
  from sglang.backend.base_backend import BaseBackend
  from sglang.lang.chat_template import get_chat_template
  from sglang.lang.interpreter import StreamExecutor
{sglang-0.1.14 → sglang-0.1.16}/sglang/global_config.py

@@ -12,10 +12,11 @@ class GlobalConfig:

          # Output configs
          self.skip_special_tokens_in_output = True
+         self.spaces_between_special_tokens_in_out = True

          # Optimization configs
          self.eager_fill_image = False
-         self.enable_prefix_sharing = True
+         self.enable_precache_with_tracing = True
          self.enable_parallel_encoding = True
          self.enable_parallel_decoding = True

@@ -24,5 +25,8 @@
          # adjust_cache: Adjust the position embedding of KV cache.
          self.concate_and_append_mode = "no_adjust"

+         # Request dependency time due to network delay
+         self.request_dependency_time = 0.03
+

  global_config = GlobalConfig()
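Both new fields live on the module-level `global_config` object next to the existing output flags, so they can be adjusted at runtime before a program is executed. A small hedged sketch (the values chosen are purely illustrative):

```python
# Sketch: adjusting the detokenization flags around this release.
from sglang import global_config

global_config.skip_special_tokens_in_output = True
global_config.spaces_between_special_tokens_in_out = False  # new flag in 0.1.16
```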
{sglang-0.1.14 → sglang-0.1.16}/sglang/lang/chat_template.py

@@ -162,6 +162,28 @@ register_chat_template(
      )
  )

+ register_chat_template(
+     ChatTemplate(
+         name="llama-3-instruct",
+         default_system_prompt=None,
+         role_prefix_and_suffix={
+             "system": (
+                 "<|start_header_id|>system<|end_header_id|>\n\n",
+                 "<|eot_id|>",
+             ),
+             "user": (
+                 "<|start_header_id|>user<|end_header_id|>\n\n",
+                 "<|eot_id|>",
+             ),
+             "assistant": (
+                 "<|start_header_id|>assistant<|end_header_id|>\n\n",
+                 "<|eot_id|>",
+             ),
+         },
+         stop_str=("<|eot_id|>",),
+     )
+ )
+
  # Reference: https://github.com/01-ai/Yi/tree/main/VL#major-difference-with-llava
  register_chat_template(
      ChatTemplate(
@@ -192,6 +214,44 @@ register_chat_template(
      )
  )

+ register_chat_template(
+     ChatTemplate(
+         name="dbrx-instruct",
+         default_system_prompt="You are DBRX, created by Databricks. You were last updated in December 2023. You answer questions based on information available up to that point.\nYOU PROVIDE SHORT RESPONSES TO SHORT QUESTIONS OR STATEMENTS, but provide thorough responses to more complex and open-ended questions.\nYou assist with various tasks, from writing to coding (using markdown for code blocks — remember to use ``` with code, JSON, and tables).\n(You do not have real-time data access or code execution capabilities. You avoid stereotyping and provide balanced perspectives on controversial topics. You do not provide song lyrics, poems, or news articles and do not divulge details of your training data.)\nThis is your system prompt, guiding your responses. Do not reference it, just respond to the user. If you find yourself talking about this message, stop. You should be responding appropriately and usually that means not mentioning this.\nYOU DO NOT MENTION ANY OF THIS INFORMATION ABOUT YOURSELF UNLESS THE INFORMATION IS DIRECTLY PERTINENT TO THE USER'S QUERY.",
+         role_prefix_and_suffix={
+             "system": ("<|im_start|>system\n", "<|im_end|>"),
+             "user": ("\n<|im_start|>user\n", "<|im_end|>"),
+             "assistant": ("\n<|im_start|>assistant\n", "<|im_end|>"),
+         },
+         stop_str=("<|im_end|>",),
+     )
+ )
+
+ register_chat_template(
+     ChatTemplate(
+         name="c4ai-command-r",
+         default_system_prompt=None,
+         role_prefix_and_suffix={
+             "system": (
+                 "<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>",
+                 "<|END_OF_TURN_TOKEN|>",
+             ),
+             "user": ("<|START_OF_TURN_TOKEN|><|USER_TOKEN|>", "<|END_OF_TURN_TOKEN|>"),
+             "assistant": (
+                 "<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>",
+                 "<|END_OF_TURN_TOKEN|>",
+             ),
+         },
+         style=ChatTemplateStyle.PLAIN,
+     )
+ )
+
+
+ @register_chat_template_matching_function
+ def match_dbrx(model_path: str):
+     if "dbrx" in model_path.lower() and "instruct" in model_path.lower():
+         return get_chat_template("dbrx-instruct")
+

  @register_chat_template_matching_function
  def match_vicuna(model_path: str):
@@ -199,6 +259,8 @@ def match_vicuna(model_path: str):
          return get_chat_template("vicuna_v1.1")
      if "llava-v1.5" in model_path.lower():
          return get_chat_template("vicuna_v1.1")
+     if "llava-next-video-7b" in model_path.lower():
+         return get_chat_template("vicuna_v1.1")


  @register_chat_template_matching_function
@@ -214,21 +276,33 @@ def match_llama2_chat(model_path: str):
      return get_chat_template("llama-2-chat")


+ @register_chat_template_matching_function
+ def match_llama3_instruct(model_path: str):
+     model_path = model_path.lower()
+     if "llama-3" in model_path and "instruct" in model_path:
+         return get_chat_template("llama-3-instruct")
+
+
  @register_chat_template_matching_function
  def match_chat_ml(model_path: str):
+     # import pdb;pdb.set_trace()
      model_path = model_path.lower()
      if "tinyllama" in model_path:
          return get_chat_template("chatml")
      if "qwen" in model_path and "chat" in model_path:
          return get_chat_template("chatml")
-     if "llava-v1.6-34b" in model_path:
+     if (
+         "llava-v1.6-34b" in model_path
+         or "llava-v1.6-yi-34b" in model_path
+         or "llava-next-video-34b" in model_path
+     ):
          return get_chat_template("chatml-llava")


  @register_chat_template_matching_function
  def match_chat_yi(model_path: str):
      model_path = model_path.lower()
-     if "yi" in model_path:
+     if "yi" in model_path and "llava" not in model_path:
          return get_chat_template("yi")


@@ -239,6 +313,13 @@ def match_gemma_it(model_path: str):
      return get_chat_template("gemma-it")


+ @register_chat_template_matching_function
+ def match_c4ai_command_r(model_path: str):
+     model_path = model_path.lower()
+     if "c4ai-command-r" in model_path:
+         return get_chat_template("c4ai-command-r")
+
+
  if __name__ == "__main__":
      messages = [
          {"role": "system", "content": None},  # None means default