sglang 0.1.13__tar.gz → 0.1.15__tar.gz

This diff shows the changes between two publicly released versions of the package, as published to a supported registry. It is provided for informational purposes only.
Files changed (79)
  1. {sglang-0.1.13/sglang.egg-info → sglang-0.1.15}/PKG-INFO +13 -15
  2. {sglang-0.1.13 → sglang-0.1.15}/README.md +7 -7
  3. {sglang-0.1.13 → sglang-0.1.15}/pyproject.toml +5 -5
  4. sglang-0.1.15/sglang/__init__.py +57 -0
  5. {sglang-0.1.13 → sglang-0.1.15}/sglang/api.py +3 -5
  6. {sglang-0.1.13 → sglang-0.1.15}/sglang/backend/anthropic.py +33 -13
  7. {sglang-0.1.13 → sglang-0.1.15}/sglang/backend/openai.py +2 -1
  8. {sglang-0.1.13 → sglang-0.1.15}/sglang/backend/runtime_endpoint.py +18 -5
  9. {sglang-0.1.13 → sglang-0.1.15}/sglang/backend/vertexai.py +1 -0
  10. {sglang-0.1.13 → sglang-0.1.15}/sglang/global_config.py +1 -0
  11. {sglang-0.1.13 → sglang-0.1.15}/sglang/lang/chat_template.py +74 -0
  12. {sglang-0.1.13 → sglang-0.1.15}/sglang/lang/interpreter.py +40 -16
  13. {sglang-0.1.13 → sglang-0.1.15}/sglang/lang/ir.py +1 -1
  14. {sglang-0.1.13 → sglang-0.1.15}/sglang/lang/tracer.py +6 -4
  15. {sglang-0.1.13 → sglang-0.1.15}/sglang/launch_server.py +2 -1
  16. sglang-0.1.15/sglang/srt/constrained/fsm_cache.py +25 -0
  17. {sglang-0.1.13 → sglang-0.1.15}/sglang/srt/constrained/jump_forward.py +1 -0
  18. {sglang-0.1.13 → sglang-0.1.15}/sglang/srt/conversation.py +2 -2
  19. {sglang-0.1.13 → sglang-0.1.15}/sglang/srt/hf_transformers_utils.py +2 -1
  20. {sglang-0.1.13 → sglang-0.1.15}/sglang/srt/layers/context_flashattention_nopad.py +1 -0
  21. {sglang-0.1.13 → sglang-0.1.15}/sglang/srt/layers/extend_attention.py +1 -0
  22. sglang-0.1.15/sglang/srt/layers/logits_processor.py +175 -0
  23. {sglang-0.1.13 → sglang-0.1.15}/sglang/srt/layers/radix_attention.py +2 -1
  24. {sglang-0.1.13 → sglang-0.1.15}/sglang/srt/layers/token_attention.py +1 -0
  25. {sglang-0.1.13 → sglang-0.1.15}/sglang/srt/managers/detokenizer_manager.py +5 -1
  26. {sglang-0.1.13 → sglang-0.1.15}/sglang/srt/managers/io_struct.py +12 -0
  27. {sglang-0.1.13 → sglang-0.1.15}/sglang/srt/managers/router/infer_batch.py +70 -33
  28. {sglang-0.1.13 → sglang-0.1.15}/sglang/srt/managers/router/manager.py +7 -2
  29. {sglang-0.1.13 → sglang-0.1.15}/sglang/srt/managers/router/model_rpc.py +116 -73
  30. {sglang-0.1.13 → sglang-0.1.15}/sglang/srt/managers/router/model_runner.py +121 -155
  31. {sglang-0.1.13 → sglang-0.1.15}/sglang/srt/managers/router/radix_cache.py +46 -38
  32. {sglang-0.1.13 → sglang-0.1.15}/sglang/srt/managers/tokenizer_manager.py +56 -11
  33. {sglang-0.1.13 → sglang-0.1.15}/sglang/srt/memory_pool.py +5 -14
  34. {sglang-0.1.13 → sglang-0.1.15}/sglang/srt/model_config.py +7 -0
  35. sglang-0.1.15/sglang/srt/models/commandr.py +376 -0
  36. sglang-0.1.15/sglang/srt/models/dbrx.py +413 -0
  37. sglang-0.1.15/sglang/srt/models/dbrx_config.py +281 -0
  38. {sglang-0.1.13 → sglang-0.1.15}/sglang/srt/models/gemma.py +22 -20
  39. {sglang-0.1.13 → sglang-0.1.15}/sglang/srt/models/llama2.py +23 -21
  40. {sglang-0.1.13 → sglang-0.1.15}/sglang/srt/models/llava.py +12 -10
  41. {sglang-0.1.13 → sglang-0.1.15}/sglang/srt/models/mixtral.py +27 -25
  42. {sglang-0.1.13 → sglang-0.1.15}/sglang/srt/models/qwen.py +23 -21
  43. {sglang-0.1.13 → sglang-0.1.15}/sglang/srt/models/qwen2.py +23 -21
  44. sglang-0.1.15/sglang/srt/models/stablelm.py +292 -0
  45. {sglang-0.1.13 → sglang-0.1.15}/sglang/srt/models/yivl.py +6 -5
  46. sglang-0.1.15/sglang/srt/openai_api_adapter.py +356 -0
  47. {sglang-0.1.13/sglang/srt/managers → sglang-0.1.15/sglang/srt}/openai_protocol.py +36 -20
  48. {sglang-0.1.13 → sglang-0.1.15}/sglang/srt/sampling_params.py +2 -0
  49. sglang-0.1.15/sglang/srt/server.py +317 -0
  50. {sglang-0.1.13 → sglang-0.1.15}/sglang/srt/server_args.py +76 -49
  51. {sglang-0.1.13 → sglang-0.1.15}/sglang/srt/utils.py +88 -32
  52. sglang-0.1.15/sglang/srt/weight_utils.py +402 -0
  53. {sglang-0.1.13 → sglang-0.1.15}/sglang/test/test_programs.py +8 -7
  54. sglang-0.1.15/sglang/test/test_utils.py +350 -0
  55. {sglang-0.1.13 → sglang-0.1.15/sglang.egg-info}/PKG-INFO +13 -15
  56. {sglang-0.1.13 → sglang-0.1.15}/sglang.egg-info/SOURCES.txt +7 -1
  57. {sglang-0.1.13 → sglang-0.1.15}/sglang.egg-info/requires.txt +5 -7
  58. sglang-0.1.13/sglang/__init__.py +0 -4
  59. sglang-0.1.13/sglang/srt/constrained/fsm_cache.py +0 -13
  60. sglang-0.1.13/sglang/srt/layers/logits_processor.py +0 -115
  61. sglang-0.1.13/sglang/srt/server.py +0 -688
  62. sglang-0.1.13/sglang/test/test_utils.py +0 -162
  63. {sglang-0.1.13 → sglang-0.1.15}/LICENSE +0 -0
  64. {sglang-0.1.13 → sglang-0.1.15}/setup.cfg +0 -0
  65. {sglang-0.1.13 → sglang-0.1.15}/sglang/backend/__init__.py +0 -0
  66. {sglang-0.1.13 → sglang-0.1.15}/sglang/backend/base_backend.py +0 -0
  67. {sglang-0.1.13 → sglang-0.1.15}/sglang/lang/__init__.py +0 -0
  68. {sglang-0.1.13 → sglang-0.1.15}/sglang/lang/compiler.py +0 -0
  69. {sglang-0.1.13 → sglang-0.1.15}/sglang/srt/backend_config.py +0 -0
  70. {sglang-0.1.13 → sglang-0.1.15}/sglang/srt/constrained/__init__.py +0 -0
  71. {sglang-0.1.13 → sglang-0.1.15}/sglang/srt/constrained/base_cache.py +0 -0
  72. {sglang-0.1.13 → sglang-0.1.15}/sglang/srt/managers/router/scheduler.py +0 -0
  73. {sglang-0.1.13 → sglang-0.1.15}/sglang/srt/mm_utils.py +0 -0
  74. {sglang-0.1.13 → sglang-0.1.15}/sglang/srt/models/mistral.py +0 -0
  75. {sglang-0.1.13 → sglang-0.1.15}/sglang/test/test_conversation.py +0 -0
  76. {sglang-0.1.13 → sglang-0.1.15}/sglang/test/test_openai_protocol.py +0 -0
  77. {sglang-0.1.13 → sglang-0.1.15}/sglang/utils.py +0 -0
  78. {sglang-0.1.13 → sglang-0.1.15}/sglang.egg-info/dependency_links.txt +0 -0
  79. {sglang-0.1.13 → sglang-0.1.15}/sglang.egg-info/top_level.txt +0 -0
{sglang-0.1.13/sglang.egg-info → sglang-0.1.15}/PKG-INFO
@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: sglang
- Version: 0.1.13
+ Version: 0.1.15
  Summary: A structured generation langauge for LLMs.
  License: Apache License
  Version 2.0, January 2004
@@ -212,6 +212,7 @@ Requires-Python: >=3.8
  Description-Content-Type: text/markdown
  License-File: LICENSE
  Requires-Dist: requests
+ Requires-Dist: tqdm
  Provides-Extra: srt
  Requires-Dist: aiohttp; extra == "srt"
  Requires-Dist: fastapi; extra == "srt"
@@ -221,21 +222,18 @@ Requires-Dist: torch; extra == "srt"
  Requires-Dist: uvloop; extra == "srt"
  Requires-Dist: uvicorn; extra == "srt"
  Requires-Dist: zmq; extra == "srt"
- Requires-Dist: vllm>=0.3.3; extra == "srt"
+ Requires-Dist: vllm>=0.4.2; extra == "srt"
  Requires-Dist: interegular; extra == "srt"
- Requires-Dist: lark; extra == "srt"
- Requires-Dist: numba; extra == "srt"
  Requires-Dist: pydantic; extra == "srt"
- Requires-Dist: referencing; extra == "srt"
- Requires-Dist: diskcache; extra == "srt"
- Requires-Dist: cloudpickle; extra == "srt"
  Requires-Dist: pillow; extra == "srt"
  Requires-Dist: outlines>=0.0.27; extra == "srt"
+ Requires-Dist: packaging; extra == "srt"
  Provides-Extra: openai
  Requires-Dist: openai>=1.0; extra == "openai"
  Requires-Dist: numpy; extra == "openai"
+ Requires-Dist: tiktoken; extra == "openai"
  Provides-Extra: anthropic
- Requires-Dist: anthropic; extra == "anthropic"
+ Requires-Dist: anthropic>=0.20.0; extra == "anthropic"
  Requires-Dist: numpy; extra == "anthropic"
  Provides-Extra: all
  Requires-Dist: sglang[srt]; extra == "all"
@@ -541,7 +539,6 @@ curl http://localhost:30000/generate \
  Learn more about the argument format [here](docs/sampling_params.md).

  ### OpenAI Compatible API
-
  In addition, the server supports an experimental OpenAI-compatible API.

  ```python
@@ -606,7 +603,7 @@ python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port
  ```
  python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port 30000 --mem-fraction-static 0.7
  ```
- - You can turn on [flashinfer](docs/flashinfer.md) to acclerate the inference by using highly optimized CUDA kernels.
+ - You can turn on [flashinfer](docs/flashinfer.md) to accelerate the inference by using highly optimized CUDA kernels.

  ### Supported Models
  - Llama
@@ -622,10 +619,14 @@ python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port
  - `python3 -m sglang.launch_server --model-path liuhaotian/llava-v1.6-34b --tokenizer-path liuhaotian/llava-v1.6-34b-tokenizer --port 3000`
  - Yi-VL
  - see [srt_example_yi_vl.py](examples/quick_start/srt_example_yi_vl.py).
- - AWQ/GPTQ quantization
+ - StableLM
+ - Command-R
+ - DBRX
+ - AWQ/GPTQ/Marlin quantization

- ## Benchmark And Performance
+ Instructions for supporting a new model are [here](https://github.com/sgl-project/sglang/blob/main/docs/model_support.md).

+ ## Benchmark And Performance
  - Llama-7B on NVIDIA A10G, FP16, Tensor Parallelism=1
  ![llama_7b](assets/llama_7b.jpg)

@@ -649,7 +650,4 @@ https://github.com/sgl-project/sglang/issues/157
  }
  ```

- [![Paper page](https://huggingface.co/datasets/huggingface/badges/resolve/main/paper-page-md.svg)](https://huggingface.co/papers/2312.07104)
-
-
  We learned from the design and reused some code of the following projects: [Guidance](https://github.com/guidance-ai/guidance), [vLLM](https://github.com/vllm-project/vllm), [LightLLM](https://github.com/ModelTC/lightllm), [FlashInfer](https://github.com/flashinfer-ai/flashinfer), [Outlines](https://github.com/outlines-dev/outlines), [LMQL](https://github.com/eth-sri/lmql).
{sglang-0.1.13 → sglang-0.1.15}/README.md
@@ -297,7 +297,6 @@ curl http://localhost:30000/generate \
  Learn more about the argument format [here](docs/sampling_params.md).

  ### OpenAI Compatible API
-
  In addition, the server supports an experimental OpenAI-compatible API.

  ```python
@@ -362,7 +361,7 @@ python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port
  ```
  python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port 30000 --mem-fraction-static 0.7
  ```
- - You can turn on [flashinfer](docs/flashinfer.md) to acclerate the inference by using highly optimized CUDA kernels.
+ - You can turn on [flashinfer](docs/flashinfer.md) to accelerate the inference by using highly optimized CUDA kernels.

  ### Supported Models
  - Llama
@@ -378,10 +377,14 @@ python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port
  - `python3 -m sglang.launch_server --model-path liuhaotian/llava-v1.6-34b --tokenizer-path liuhaotian/llava-v1.6-34b-tokenizer --port 3000`
  - Yi-VL
  - see [srt_example_yi_vl.py](examples/quick_start/srt_example_yi_vl.py).
- - AWQ/GPTQ quantization
+ - StableLM
+ - Command-R
+ - DBRX
+ - AWQ/GPTQ/Marlin quantization

- ## Benchmark And Performance
+ Instructions for supporting a new model are [here](https://github.com/sgl-project/sglang/blob/main/docs/model_support.md).

+ ## Benchmark And Performance
  - Llama-7B on NVIDIA A10G, FP16, Tensor Parallelism=1
  ![llama_7b](assets/llama_7b.jpg)

@@ -405,7 +408,4 @@ https://github.com/sgl-project/sglang/issues/157
  }
  ```

- [![Paper page](https://huggingface.co/datasets/huggingface/badges/resolve/main/paper-page-md.svg)](https://huggingface.co/papers/2312.07104)
-
-
  We learned from the design and reused some code of the following projects: [Guidance](https://github.com/guidance-ai/guidance), [vLLM](https://github.com/vllm-project/vllm), [LightLLM](https://github.com/ModelTC/lightllm), [FlashInfer](https://github.com/flashinfer-ai/flashinfer), [Outlines](https://github.com/outlines-dev/outlines), [LMQL](https://github.com/eth-sri/lmql).
{sglang-0.1.13 → sglang-0.1.15}/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

  [project]
  name = "sglang"
- version = "0.1.13"
+ version = "0.1.15"
  description = "A structured generation langauge for LLMs."
  readme = "README.md"
  requires-python = ">=3.8"
@@ -15,14 +15,14 @@ classifiers = [
  ]
  dependencies = [
      "requests",
+     "tqdm",
  ]

  [project.optional-dependencies]
  srt = ["aiohttp", "fastapi", "psutil", "rpyc", "torch", "uvloop", "uvicorn",
-        "zmq", "vllm>=0.3.3", "interegular", "lark", "numba",
-        "pydantic", "referencing", "diskcache", "cloudpickle", "pillow", "outlines>=0.0.27"]
- openai = ["openai>=1.0", "numpy"]
- anthropic = ["anthropic", "numpy"]
+        "zmq", "vllm>=0.4.2", "interegular", "pydantic", "pillow", "outlines>=0.0.27", "packaging"]
+ openai = ["openai>=1.0", "numpy", "tiktoken"]
+ anthropic = ["anthropic>=0.20.0", "numpy"]
  all = ["sglang[srt]", "sglang[openai]", "sglang[anthropic]"]

  [project.urls]
sglang-0.1.15/sglang/__init__.py
@@ -0,0 +1,57 @@
+ __version__ = "0.1.15"
+
+ # SGL API Components
+ from sglang.api import (
+     Runtime,
+     assistant,
+     assistant_begin,
+     assistant_end,
+     flush_cache,
+     function,
+     gen,
+     gen_int,
+     gen_string,
+     get_server_args,
+     image,
+     select,
+     set_default_backend,
+     system,
+     user,
+     user_begin,
+     user_end,
+ )
+
+ # SGL Backends
+ from sglang.backend.anthropic import Anthropic
+ from sglang.backend.openai import OpenAI
+ from sglang.backend.runtime_endpoint import RuntimeEndpoint
+ from sglang.backend.vertexai import VertexAI
+
+ # Global Configurations
+ from sglang.global_config import global_config
+
+ # public APIs management
+ __all__ = [
+     "global_config",
+     "Anthropic",
+     "OpenAI",
+     "RuntimeEndpoint",
+     "VertexAI",
+     "function",
+     "Runtime",
+     "set_default_backend",
+     "flush_cache",
+     "get_server_args",
+     "gen",
+     "gen_int",
+     "gen_string",
+     "image",
+     "select",
+     "system",
+     "user",
+     "assistant",
+     "user_begin",
+     "user_end",
+     "assistant_begin",
+     "assistant_end",
+ ]
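The new top-level `__init__.py` pins `__version__` and re-exports the language primitives and backends under one namespace. A minimal sketch of how that surface is typically used, assuming a locally running SRT server; the endpoint URL and prompt are placeholders:

```python
import sglang as sgl

@sgl.function
def qa(s, question):
    # Build a chat turn with the re-exported primitives.
    s += sgl.user(question)
    s += sgl.assistant(sgl.gen("answer", max_tokens=64))

# Placeholder endpoint; any backend exported above (OpenAI, Anthropic, VertexAI) works too.
sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))
state = qa.run(question="What is the capital of France?")
print(state["answer"])
```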
{sglang-0.1.13 → sglang-0.1.15}/sglang/api.py
@@ -1,13 +1,10 @@
- """Public API"""
+ """Some Public API Definitions"""

+ import os
  import re
  from typing import Callable, List, Optional, Union

- from sglang.backend.anthropic import Anthropic
  from sglang.backend.base_backend import BaseBackend
- from sglang.backend.openai import OpenAI
- from sglang.backend.runtime_endpoint import RuntimeEndpoint
- from sglang.backend.vertexai import VertexAI
  from sglang.global_config import global_config
  from sglang.lang.ir import (
      SglExpr,
@@ -35,6 +32,7 @@ def function(

  def Runtime(*args, **kwargs):
      # Avoid importing unnecessary dependency
+     os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
      from sglang.srt.server import Runtime

      return Runtime(*args, **kwargs)
{sglang-0.1.13 → sglang-0.1.15}/sglang/backend/anthropic.py
@@ -1,6 +1,7 @@
  from typing import List, Optional, Union

  import numpy as np
+
  from sglang.backend.base_backend import BaseBackend
  from sglang.lang.chat_template import get_chat_template
  from sglang.lang.interpreter import StreamExecutor
@@ -13,7 +14,7 @@ except ImportError as e:


  class Anthropic(BaseBackend):
-     def __init__(self, model_name):
+     def __init__(self, model_name, *args, **kwargs):
          super().__init__()

          if isinstance(anthropic, Exception):
@@ -21,6 +22,7 @@ class Anthropic(BaseBackend):

          self.model_name = model_name
          self.chat_template = get_chat_template("claude")
+         self.client = anthropic.Anthropic(*args, **kwargs)

      def get_chat_template(self):
          return self.chat_template
@@ -30,13 +32,23 @@ class Anthropic(BaseBackend):
          s: StreamExecutor,
          sampling_params: SglSamplingParams,
      ):
-         prompt = s.text_
-         ret = anthropic.Anthropic().completions.create(
+         if s.messages_:
+             messages = s.messages_
+         else:
+             messages = [{"role": "user", "content": s.text_}]
+
+         if messages and messages[0]["role"] == "system":
+             system = messages.pop(0)["content"]
+         else:
+             system = ""
+
+         ret = self.client.messages.create(
              model=self.model_name,
-             prompt=prompt,
+             system=system,
+             messages=messages,
              **sampling_params.to_anthropic_kwargs(),
          )
-         comp = ret.completion
+         comp = ret.content[0].text

          return comp, {}

@@ -45,13 +57,21 @@ class Anthropic(BaseBackend):
          s: StreamExecutor,
          sampling_params: SglSamplingParams,
      ):
-         prompt = s.text_
-         generator = anthropic.Anthropic().completions.create(
+         if s.messages_:
+             messages = s.messages_
+         else:
+             messages = [{"role": "user", "content": s.text_}]
+
+         if messages and messages[0]["role"] == "system":
+             system = messages.pop(0)["content"]
+         else:
+             system = ""
+
+         with self.client.messages.stream(
              model=self.model_name,
-             prompt=prompt,
-             stream=True,
+             system=system,
+             messages=messages,
              **sampling_params.to_anthropic_kwargs(),
-         )
-
-         for ret in generator:
-             yield ret.completion, {}
+         ) as stream:
+             for text in stream.text_stream:
+                 yield text, {}
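The Anthropic backend moves from the legacy completions API to the Messages API and keeps a client built from the constructor's extra arguments, with the system turn passed separately from the user/assistant turns. A minimal sketch of the call shape it now issues; the model name is a placeholder and `ANTHROPIC_API_KEY` is assumed to be set:

```python
import anthropic  # anthropic>=0.20.0, matching the bumped requirement above

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
resp = client.messages.create(
    model="claude-3-haiku-20240307",  # placeholder model name
    system="You are a concise assistant.",
    messages=[{"role": "user", "content": "Say hi in one word."}],
    max_tokens=16,
)
print(resp.content[0].text)
```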
{sglang-0.1.13 → sglang-0.1.15}/sglang/backend/openai.py
@@ -3,6 +3,7 @@ import time
  from typing import Callable, List, Optional, Union

  import numpy as np
+
  from sglang.backend.base_backend import BaseBackend
  from sglang.lang.chat_template import ChatTemplate, get_chat_template_by_model_path
  from sglang.lang.interpreter import StreamExecutor
@@ -227,7 +228,7 @@ class OpenAI(BaseBackend):
          prompt_tokens.append(ret_token)

          decision = choices[np.argmax(scores)]
-         return decision, scores, scores
+         return decision, scores, None, None


  def openai_completion(client, retries=3, is_chat=None, prompt=None, **kwargs):
{sglang-0.1.13 → sglang-0.1.15}/sglang/backend/runtime_endpoint.py
@@ -3,6 +3,7 @@ from typing import Callable, List, Optional, Union

  import numpy as np
  import requests
+
  from sglang.backend.base_backend import BaseBackend
  from sglang.global_config import global_config
  from sglang.lang.chat_template import get_chat_template_by_model_path
@@ -73,9 +74,11 @@ class RuntimeEndpoint(BaseBackend):
          assert res.status_code == 200

      def commit_lazy_operations(self, s: StreamExecutor):
+         data = {"text": s.text_, "sampling_params": {"max_new_tokens": 0}}
+         self._add_images(s, data)
          res = http_request(
              self.base_url + "/generate",
-             json={"text": s.text_, "sampling_params": {"max_new_tokens": 0}},
+             json=data,
              auth_token=self.auth_token,
              api_key=self.api_key,
              verify=self.verify,
@@ -104,6 +107,7 @@ class RuntimeEndpoint(BaseBackend):
              "text": s.text_,
              "sampling_params": {
                  "skip_special_tokens": global_config.skip_special_tokens_in_output,
+                 "spaces_between_special_tokens": global_config.spaces_between_special_tokens_in_out,
                  **sampling_params.to_srt_kwargs(),
              },
          }
@@ -112,6 +116,7 @@ class RuntimeEndpoint(BaseBackend):
              "text": s.text_,
              "sampling_params": {
                  "skip_special_tokens": global_config.skip_special_tokens_in_output,
+                 "spaces_between_special_tokens": global_config.spaces_between_special_tokens_in_out,
                  "dtype": "int",
                  **sampling_params.to_srt_kwargs(),
              },
@@ -142,6 +147,7 @@ class RuntimeEndpoint(BaseBackend):
              "text": s.text_,
              "sampling_params": {
                  "skip_special_tokens": global_config.skip_special_tokens_in_output,
+                 "spaces_between_special_tokens": global_config.spaces_between_special_tokens_in_out,
                  **sampling_params.to_srt_kwargs(),
              },
          }
@@ -150,6 +156,7 @@ class RuntimeEndpoint(BaseBackend):
              "text": s.text_,
              "sampling_params": {
                  "skip_special_tokens": global_config.skip_special_tokens_in_output,
+                 "spaces_between_special_tokens": global_config.spaces_between_special_tokens_in_out,
                  "dtype": "int",
                  **sampling_params.to_srt_kwargs(),
              },
@@ -224,13 +231,19 @@ class RuntimeEndpoint(BaseBackend):
          )
          assert res.status_code == 200
          obj = res.json()
-         normalized_prompt_logprob = [
+         normalized_prompt_logprobs = [
              r["meta_info"]["normalized_prompt_logprob"] for r in obj
          ]
-         prompt_logprob = [r["meta_info"]["prompt_logprob"] for r in obj]
+         decision = choices[np.argmax(normalized_prompt_logprobs)]
+         prefill_token_logprobs = [r["meta_info"]["prefill_token_logprobs"] for r in obj]
+         decode_token_logprobs = [r["meta_info"]["decode_token_logprobs"] for r in obj]

-         decision = choices[np.argmax(normalized_prompt_logprob)]
-         return decision, normalized_prompt_logprob, prompt_logprob
+         return (
+             decision,
+             normalized_prompt_logprobs,
+             prefill_token_logprobs,
+             decode_token_logprobs,
+         )

      def concatenate_and_append(self, src_rids: List[str], dst_rid: str):
          res = http_request(
{sglang-0.1.13 → sglang-0.1.15}/sglang/backend/vertexai.py
@@ -3,6 +3,7 @@ import warnings
  from typing import List, Optional, Union

  import numpy as np
+
  from sglang.backend.base_backend import BaseBackend
  from sglang.lang.chat_template import get_chat_template
  from sglang.lang.interpreter import StreamExecutor
{sglang-0.1.13 → sglang-0.1.15}/sglang/global_config.py
@@ -12,6 +12,7 @@ class GlobalConfig:

          # Output configs
          self.skip_special_tokens_in_output = True
+         self.spaces_between_special_tokens_in_out = True

          # Optimization configs
          self.eager_fill_image = False
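The new flag sits next to `skip_special_tokens_in_output` and is forwarded to the runtime as `spaces_between_special_tokens` in the `RuntimeEndpoint` requests above. A one-line sketch of toggling it, assuming the package-level `global_config` export from the new `__init__.py`:

```python
import sglang as sgl

# Disable inserting spaces between special tokens in detokenized output.
sgl.global_config.spaces_between_special_tokens_in_out = False
```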
{sglang-0.1.13 → sglang-0.1.15}/sglang/lang/chat_template.py
@@ -162,6 +162,28 @@ register_chat_template(
      )
  )

+ register_chat_template(
+     ChatTemplate(
+         name="llama-3-instruct",
+         default_system_prompt=None,
+         role_prefix_and_suffix={
+             "system": (
+                 "<|start_header_id|>system<|end_header_id|>\n\n",
+                 "<|eot_id|>",
+             ),
+             "user": (
+                 "<|start_header_id|>user<|end_header_id|>\n\n",
+                 "<|eot_id|>",
+             ),
+             "assistant": (
+                 "<|start_header_id|>assistant<|end_header_id|>\n\n",
+                 "<|eot_id|>",
+             ),
+         },
+         stop_str=("<|eot_id|>",),
+     )
+ )
+
  # Reference: https://github.com/01-ai/Yi/tree/main/VL#major-difference-with-llava
  register_chat_template(
      ChatTemplate(
@@ -192,6 +214,44 @@ register_chat_template(
      )
  )

+ register_chat_template(
+     ChatTemplate(
+         name="dbrx-instruct",
+         default_system_prompt="You are DBRX, created by Databricks. You were last updated in December 2023. You answer questions based on information available up to that point.\nYOU PROVIDE SHORT RESPONSES TO SHORT QUESTIONS OR STATEMENTS, but provide thorough responses to more complex and open-ended questions.\nYou assist with various tasks, from writing to coding (using markdown for code blocks — remember to use ``` with code, JSON, and tables).\n(You do not have real-time data access or code execution capabilities. You avoid stereotyping and provide balanced perspectives on controversial topics. You do not provide song lyrics, poems, or news articles and do not divulge details of your training data.)\nThis is your system prompt, guiding your responses. Do not reference it, just respond to the user. If you find yourself talking about this message, stop. You should be responding appropriately and usually that means not mentioning this.\nYOU DO NOT MENTION ANY OF THIS INFORMATION ABOUT YOURSELF UNLESS THE INFORMATION IS DIRECTLY PERTINENT TO THE USER'S QUERY.",
+         role_prefix_and_suffix={
+             "system": ("<|im_start|>system\n", "<|im_end|>"),
+             "user": ("\n<|im_start|>user\n", "<|im_end|>"),
+             "assistant": ("\n<|im_start|>assistant\n", "<|im_end|>"),
+         },
+         stop_str=("<|im_end|>",),
+     )
+ )
+
+ register_chat_template(
+     ChatTemplate(
+         name="c4ai-command-r",
+         default_system_prompt=None,
+         role_prefix_and_suffix={
+             "system": (
+                 "<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>",
+                 "<|END_OF_TURN_TOKEN|>",
+             ),
+             "user": ("<|START_OF_TURN_TOKEN|><|USER_TOKEN|>", "<|END_OF_TURN_TOKEN|>"),
+             "assistant": (
+                 "<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>",
+                 "<|END_OF_TURN_TOKEN|>",
+             ),
+         },
+         style=ChatTemplateStyle.PLAIN,
+     )
+ )
+
+
+ @register_chat_template_matching_function
+ def match_dbrx(model_path: str):
+     if "dbrx" in model_path.lower() and "instruct" in model_path.lower():
+         return get_chat_template("dbrx-instruct")
+

  @register_chat_template_matching_function
  def match_vicuna(model_path: str):
@@ -214,6 +274,13 @@ def match_llama2_chat(model_path: str):
      return get_chat_template("llama-2-chat")


+ @register_chat_template_matching_function
+ def match_llama3_instruct(model_path: str):
+     model_path = model_path.lower()
+     if "llama-3" in model_path and "instruct" in model_path:
+         return get_chat_template("llama-3-instruct")
+
+
  @register_chat_template_matching_function
  def match_chat_ml(model_path: str):
      model_path = model_path.lower()
@@ -239,6 +306,13 @@ def match_gemma_it(model_path: str):
      return get_chat_template("gemma-it")


+ @register_chat_template_matching_function
+ def match_c4ai_command_r(model_path: str):
+     model_path = model_path.lower()
+     if "c4ai-command-r" in model_path:
+         return get_chat_template("c4ai-command-r")
+
+
  if __name__ == "__main__":
      messages = [
          {"role": "system", "content": None},  # None means default
{sglang-0.1.13 → sglang-0.1.15}/sglang/lang/interpreter.py
@@ -1,6 +1,7 @@
  """The interpreter that executes SGL programs"""

  import asyncio
+ import contextvars
  import multiprocessing
  import queue
  import threading
@@ -10,6 +11,7 @@ from contextlib import contextmanager
  from typing import Any, Callable, Dict, List, Optional, Union

  import tqdm
+
  from sglang.global_config import global_config
  from sglang.lang.ir import (
      SglCommitLazy,
@@ -217,7 +219,13 @@ class StreamExecutor:
          self.use_thread = use_thread
          if self.use_thread:
              self.queue = queue.Queue()
-             self.worker = threading.Thread(target=self._thread_worker_func)
+
+             def _run_worker_in_context():
+                 self._thread_worker_func()
+
+             self.worker = threading.Thread(
+                 target=contextvars.copy_context().run, args=(_run_worker_in_context,)
+             )
              self.worker.start()

          # For streaming
@@ -248,17 +256,24 @@ class StreamExecutor:
      def set_var(self, name, value):
          self.variables[name] = value

-     def get_meta_info(self, name):
+     def get_meta_info(self, name, timeout=None):
          if name in self.variable_event:
-             self.variable_event[name].wait()
+             got = self.variable_event[name].wait(timeout)
+             if not got:
+                 raise TimeoutError(f"Timeout while waiting for event '{name}'")
          ret = self.meta_info.get(name, None)
          return ret

-     def fork(self, number: int, position_ids_offset: Optional[List[int]] = None):
-         self.submit(SglCommitLazy())
-         self.sync()
+     def fork(
+         self,
+         size: int = 1,
+         position_ids_offset: Optional[List[int]] = None,
+     ):
+         if size > 1:
+             self.submit(SglCommitLazy())

-         number = int(number)
+         self.sync()
+         size = int(size)

          exes = [
              StreamExecutor(
@@ -268,14 +283,15 @@ class StreamExecutor:
                  self.chat_template,
                  self.stream,
              )
-             for _ in range(number)
+             for _ in range(size)
          ]
-         for i in range(number):
+         for i in range(size):
              exes[i].variables = dict(self.variables)
              exes[i].text_ = str(self.text_)
              exes[i].messages_ = list(self.messages_)
              exes[i].cur_role = self.cur_role
              exes[i].fork_start_text_pos = len(self.text_)
+             exes[i].images_ = list(self.images_)

          return exes

@@ -454,15 +470,19 @@ class StreamExecutor:
          self.stream_var_event[name].set()

      def _execute_select(self, expr: SglSelect):
-         decision, normalized_prompt_logprob, prompt_logprob = self.backend.select(
-             self, expr.choices, expr.temperature
-         )
+         (
+             decision,
+             normalized_prompt_logprobs,
+             prefill_token_logprobs,
+             decode_token_logprobs,
+         ) = self.backend.select(self, expr.choices, expr.temperature)
          if expr.name is not None:
              name = expr.name
              self.variables[name] = decision
              self.meta_info[name] = {
-                 "normalized_prompt_logprob": normalized_prompt_logprob,
-                 "prompt_logprob": prompt_logprob,
+                 "normalized_prompt_logprobs": normalized_prompt_logprobs,
+                 "prefill_token_logprobs": prefill_token_logprobs,
+                 "decode_token_logprobs": decode_token_logprobs,
              }
              self.variable_event[name].set()
              self.text_ += decision
@@ -634,8 +654,12 @@ class ProgramState:
          yield
          self.stream_executor.submit(SglVarScopeEnd(name))

-     def fork(self, number: int = 1, position_ids_offset: Optional[List[int]] = None):
-         stream_executors = self.stream_executor.fork(number, position_ids_offset)
+     def fork(
+         self,
+         size: int = 1,
+         position_ids_offset: Optional[List[int]] = None,
+     ):
+         stream_executors = self.stream_executor.fork(size, position_ids_offset)
          states = [ProgramState(x) for x in stream_executors]
          state_group = ProgramStateGroup(states, self)
          return state_group
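`fork()` now takes `size` instead of `number`, only commits lazy operations when more than one branch is requested, and copies `images_` into each child executor. A rough sketch of the call site inside an `@sgl.function` body; the branch prompts are illustrative and `forks.join()` follows the usual state-group pattern:

```python
import sglang as sgl

@sgl.function
def branch_demo(s, text):
    s += "Text: " + text + "\n"
    forks = s.fork(size=2)  # renamed parameter
    forks[0] += "Summarize the text in one sentence.\n"
    forks[1] += "List three keywords from the text.\n"
    for f in forks:
        f += sgl.gen("out", max_tokens=32)
    forks.join()
```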
{sglang-0.1.13 → sglang-0.1.15}/sglang/lang/ir.py
@@ -73,7 +73,7 @@ class SglSamplingParams:
                  "Regular expression is not supported in the Anthropic backend."
              )
          return {
-             "max_tokens_to_sample": self.max_new_tokens,
+             "max_tokens": self.max_new_tokens,
              "stop_sequences": (
                  self.stop if isinstance(self.stop, (list, tuple)) else [self.stop]
              ),
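With the Messages API, the kwarg is `max_tokens` rather than the legacy `max_tokens_to_sample`. A quick sketch of the mapping; field names follow the `SglSamplingParams` dataclass in this file:

```python
from sglang.lang.ir import SglSamplingParams

params = SglSamplingParams(max_new_tokens=128, stop="\n")
kwargs = params.to_anthropic_kwargs()
# The renamed key feeds client.messages.create() in the Anthropic backend above.
assert kwargs["max_tokens"] == 128
assert kwargs["stop_sequences"] == ["\n"]
```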