sglang 0.2.6__tar.gz → 0.2.7__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {sglang-0.2.6/sglang.egg-info → sglang-0.2.7}/PKG-INFO +32 -6
- {sglang-0.2.6 → sglang-0.2.7}/README.md +31 -5
- {sglang-0.2.6 → sglang-0.2.7}/pyproject.toml +1 -1
- {sglang-0.2.6 → sglang-0.2.7}/sglang/__init__.py +33 -26
- {sglang-0.2.6 → sglang-0.2.7}/sglang/api.py +9 -1
- {sglang-0.2.6 → sglang-0.2.7}/sglang/bench_latency.py +2 -2
- {sglang-0.2.6 → sglang-0.2.7}/sglang/bench_serving.py +10 -1
- {sglang-0.2.6 → sglang-0.2.7}/sglang/check_env.py +1 -1
- {sglang-0.2.6 → sglang-0.2.7}/sglang/lang/backend/litellm.py +1 -1
- {sglang-0.2.6 → sglang-0.2.7}/sglang/lang/backend/openai.py +1 -1
- {sglang-0.2.6 → sglang-0.2.7}/sglang/lang/interpreter.py +20 -5
- {sglang-0.2.6 → sglang-0.2.7}/sglang/lang/ir.py +1 -1
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/constrained/__init__.py +15 -0
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/constrained/base_cache.py +15 -0
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/constrained/fsm_cache.py +15 -0
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/constrained/jump_forward.py +15 -0
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/conversation.py +26 -0
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/hf_transformers_utils.py +15 -0
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/layers/context_flashattention_nopad.py +15 -0
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/layers/extend_attention.py +15 -0
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/layers/fused_moe.py +15 -0
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/layers/linear.py +15 -0
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/layers/logits_processor.py +41 -13
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/layers/quantization/__init__.py +15 -0
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/layers/quantization/fp8.py +15 -0
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/layers/radix_attention.py +17 -2
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/layers/token_attention.py +16 -1
- sglang-0.2.6/sglang/srt/managers/controller/manager_multi.py → sglang-0.2.7/sglang/srt/managers/controller_multi.py +17 -2
- sglang-0.2.6/sglang/srt/managers/controller/manager_single.py → sglang-0.2.7/sglang/srt/managers/controller_single.py +17 -2
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/managers/detokenizer_manager.py +16 -1
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/managers/io_struct.py +36 -3
- sglang-0.2.6/sglang/srt/managers/controller/schedule_heuristic.py → sglang-0.2.7/sglang/srt/managers/policy_scheduler.py +37 -22
- sglang-0.2.6/sglang/srt/managers/controller/infer_batch.py → sglang-0.2.7/sglang/srt/managers/schedule_batch.py +31 -12
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/managers/tokenizer_manager.py +39 -16
- {sglang-0.2.6/sglang/srt/managers/controller → sglang-0.2.7/sglang/srt/managers}/tp_worker.py +130 -40
- sglang-0.2.7/sglang/srt/mem_cache/flush_cache.py +33 -0
- {sglang-0.2.6/sglang/srt → sglang-0.2.7/sglang/srt/mem_cache}/memory_pool.py +16 -1
- {sglang-0.2.6/sglang/srt/managers/controller → sglang-0.2.7/sglang/srt/mem_cache}/radix_cache.py +15 -0
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/mm_utils.py +15 -0
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/model_config.py +15 -0
- {sglang-0.2.6/sglang/srt/managers/controller → sglang-0.2.7/sglang/srt/model_executor}/cuda_graph_runner.py +16 -1
- {sglang-0.2.6/sglang/srt/managers/controller → sglang-0.2.7/sglang/srt/model_executor}/model_runner.py +32 -12
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/model_loader/model_loader.py +15 -0
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/model_loader/utils.py +16 -1
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/models/chatglm.py +16 -1
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/models/commandr.py +16 -1
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/models/dbrx.py +16 -1
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/models/deepseek.py +16 -1
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/models/deepseek_v2.py +16 -1
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/models/gemma.py +16 -1
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/models/gemma2.py +16 -1
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/models/gpt_bigcode.py +16 -1
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/models/grok.py +16 -1
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/models/internlm2.py +16 -1
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/models/llama2.py +16 -1
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/models/llama_classification.py +16 -1
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/models/llava.py +17 -2
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/models/llavavid.py +17 -2
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/models/minicpm.py +16 -1
- sglang-0.2.7/sglang/srt/models/mistral.py +26 -0
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/models/mixtral.py +16 -1
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/models/mixtral_quant.py +16 -1
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/models/qwen.py +16 -1
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/models/qwen2.py +16 -1
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/models/qwen2_moe.py +16 -1
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/models/stablelm.py +16 -1
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/models/yivl.py +15 -0
- sglang-0.2.7/sglang/srt/openai_api/adapter.py +822 -0
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/openai_api/protocol.py +64 -0
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/sampling_params.py +15 -0
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/server.py +89 -23
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/server_args.py +49 -11
- {sglang-0.2.6 → sglang-0.2.7}/sglang/srt/utils.py +15 -0
- {sglang-0.2.6 → sglang-0.2.7}/sglang/utils.py +22 -0
- sglang-0.2.7/sglang/version.py +1 -0
- {sglang-0.2.6 → sglang-0.2.7/sglang.egg-info}/PKG-INFO +32 -6
- {sglang-0.2.6 → sglang-0.2.7}/sglang.egg-info/SOURCES.txt +10 -10
- sglang-0.2.6/sglang/srt/flush_cache.py +0 -18
- sglang-0.2.6/sglang/srt/models/mistral.py +0 -11
- sglang-0.2.6/sglang/srt/openai_api/adapter.py +0 -437
- sglang-0.2.6/sglang/version.py +0 -1
- {sglang-0.2.6 → sglang-0.2.7}/LICENSE +0 -0
- {sglang-0.2.6 → sglang-0.2.7}/setup.cfg +0 -0
- {sglang-0.2.6 → sglang-0.2.7}/sglang/global_config.py +0 -0
- {sglang-0.2.6 → sglang-0.2.7}/sglang/lang/__init__.py +0 -0
- {sglang-0.2.6 → sglang-0.2.7}/sglang/lang/backend/__init__.py +0 -0
- {sglang-0.2.6 → sglang-0.2.7}/sglang/lang/backend/anthropic.py +0 -0
- {sglang-0.2.6 → sglang-0.2.7}/sglang/lang/backend/base_backend.py +0 -0
- {sglang-0.2.6 → sglang-0.2.7}/sglang/lang/backend/runtime_endpoint.py +0 -0
- {sglang-0.2.6 → sglang-0.2.7}/sglang/lang/backend/vertexai.py +0 -0
- {sglang-0.2.6 → sglang-0.2.7}/sglang/lang/chat_template.py +0 -0
- {sglang-0.2.6 → sglang-0.2.7}/sglang/lang/compiler.py +0 -0
- {sglang-0.2.6 → sglang-0.2.7}/sglang/lang/tracer.py +0 -0
- {sglang-0.2.6 → sglang-0.2.7}/sglang/launch_server.py +0 -0
- {sglang-0.2.6 → sglang-0.2.7}/sglang/launch_server_llavavid.py +0 -0
- {sglang-0.2.6 → sglang-0.2.7}/sglang/test/test_conversation.py +0 -0
- {sglang-0.2.6 → sglang-0.2.7}/sglang/test/test_openai_protocol.py +0 -0
- {sglang-0.2.6 → sglang-0.2.7}/sglang/test/test_programs.py +0 -0
- {sglang-0.2.6 → sglang-0.2.7}/sglang/test/test_utils.py +0 -0
- {sglang-0.2.6 → sglang-0.2.7}/sglang.egg-info/dependency_links.txt +0 -0
- {sglang-0.2.6 → sglang-0.2.7}/sglang.egg-info/requires.txt +0 -0
- {sglang-0.2.6 → sglang-0.2.7}/sglang.egg-info/top_level.txt +0 -0

{sglang-0.2.6/sglang.egg-info → sglang-0.2.7}/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: sglang
-Version: 0.2.6
+Version: 0.2.7
 Summary: SGLang is yet another fast serving framework for large language models and vision language models.
 License: Apache License
 Version 2.0, January 2004
@@ -245,6 +245,13 @@ Requires-Dist: sglang[litellm]; extra == "all"
 
 <div align="center">
 <img src="https://raw.githubusercontent.com/sgl-project/sglang/main/assets/logo.png" alt="logo" width="400"></img>
+
+[](https://pypi.org/project/sglang)
+
+[](https://github.com/sgl-project/sglang/tree/main/LICENSE)
+[](https://github.com/sgl-project/sglang/issues)
+[](https://github.com/sgl-project/sglang/issues)
+
 </div>
 
 --------------------------------------------------------------------------------
@@ -292,7 +299,8 @@ pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.3/
 
 ### Method 2: From source
 ```
-git clone https://github.com/sgl-project/sglang.git
+# Use the stable release branch
+git clone -b release https://github.com/sgl-project/sglang.git
 cd sglang
 
 pip install --upgrade pip
@@ -341,7 +349,7 @@ curl http://localhost:30000/generate \
   }
 }'
 ```
-Learn more about the argument format [here](docs/sampling_params.md).
+Learn more about the argument format [here](docs/en/sampling_params.md).
 
 ### OpenAI Compatible API
 In addition, the server supports OpenAI-compatible APIs.
@@ -388,7 +396,7 @@ python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct
 ```
 python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --port 30000 --mem-fraction-static 0.7
 ```
-- See [hyperparameter_tuning.md](docs/hyperparameter_tuning.md) on tuning hyperparameters for better performance.
+- See [hyperparameter_tuning.md](docs/en/hyperparameter_tuning.md) on tuning hyperparameters for better performance.
 - Add `--nnodes 2` to run tensor parallelism on multiple nodes. If you have two nodes with two GPUs on each node and want to run TP=4, let `sgl-dev-0` be the hostname of the first node and `50000` be an available port.
 ```
 # Node 0
@@ -397,7 +405,7 @@ python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct
 # Node 1
 python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --tp 4 --nccl-init sgl-dev-0:50000 --nnodes 2 --node-rank 1
 ```
-- If the model does not have a template in the Hugging Face tokenizer, you can specify a [custom chat template](docs/custom_chat_template.md).
+- If the model does not have a template in the Hugging Face tokenizer, you can specify a [custom chat template](docs/en/custom_chat_template.md).
 - To enable fp8 quantization, you can add `--quantization fp8` on a fp16 checkpoint or directly load a fp8 checkpoint without specifying any arguments.
 - To enable experimental torch.compile support, you can add `--enable-torch-compile`. It accelerates small models on small batch sizes.
 
@@ -440,7 +448,7 @@ GLOO_SOCKET_IFNAME=eth0 python3 -m sglang.launch_server --model-path meta-llama/
 - InternLM 2
 - Mistral NeMo
 
-Instructions for supporting a new model are [here](https://github.com/sgl-project/sglang/blob/main/docs/model_support.md).
+Instructions for supporting a new model are [here](https://github.com/sgl-project/sglang/blob/main/docs/en/model_support.md).
 
 ### Benchmark Performance
 
@@ -671,6 +679,24 @@ for out in state.text_iter():
     print(out, end="", flush=True)
 ```
 
+#### Roles
+
+Use `sgl.system`, `sgl.user` and `sgl.assistant` to set roles when using Chat models. You can also define more complex role prompts using begin and end tokens.
+
+```python
+@sgl.function
+def chat_example(s):
+    s += sgl.system("You are a helpful assistant.")
+    # Same as: s += s.system("You are a helpful assistant.")
+
+    with s.user():
+        s += "Question: What is the capital of France?"
+
+    s += sgl.assistant_begin()
+    s += "Answer: " + sgl.gen(max_tokens=100, stop="\n")
+    s += sgl.assistant_end()
+```
+
 #### Tips and Implementation Details
 - The `choices` argument in `sgl.gen` is implemented by computing the [token-length normalized log probabilities](https://blog.eleuther.ai/multiple-choice-normalization/) of all choices and selecting the one with the highest probability.
 - The `regex` argument in `sgl.gen` is implemented through autoregressive decoding with logit bias masking, according to the constraints set by the regex. It is compatible with `temperature=0` and `temperature != 0`.
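
The `choices` tip above describes token-length normalized scoring: each candidate's token log probabilities are summed and divided by its token count, so a longer choice is not penalized simply for containing more tokens. A minimal sketch of that scoring rule (the input format here is hypothetical, not sglang's internal representation):

```python
from typing import Dict, List


def select_choice(choice_token_logprobs: Dict[str, List[float]]) -> str:
    """Pick the choice with the highest length-normalized log probability.

    `choice_token_logprobs` maps each candidate string to the log probability
    the model assigned to each of its tokens (hypothetical input format).
    """
    def normalized(logprobs: List[float]) -> float:
        return sum(logprobs) / len(logprobs)  # average log prob per token

    return max(choice_token_logprobs, key=lambda c: normalized(choice_token_logprobs[c]))


# The longer choice can still win despite having more tokens to "pay" for.
print(select_choice({
    "Paris": [-0.9],
    "The capital of France is Paris": [-0.4, -0.3, -0.2, -0.3, -0.2, -0.25],
}))
```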

{sglang-0.2.6 → sglang-0.2.7}/README.md
@@ -1,5 +1,12 @@
 <div align="center">
 <img src="https://raw.githubusercontent.com/sgl-project/sglang/main/assets/logo.png" alt="logo" width="400"></img>
+
+[](https://pypi.org/project/sglang)
+
+[](https://github.com/sgl-project/sglang/tree/main/LICENSE)
+[](https://github.com/sgl-project/sglang/issues)
+[](https://github.com/sgl-project/sglang/issues)
+
 </div>
 
 --------------------------------------------------------------------------------
@@ -47,7 +54,8 @@ pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.3/
 
 ### Method 2: From source
 ```
-git clone https://github.com/sgl-project/sglang.git
+# Use the stable release branch
+git clone -b release https://github.com/sgl-project/sglang.git
 cd sglang
 
 pip install --upgrade pip
@@ -96,7 +104,7 @@ curl http://localhost:30000/generate \
   }
 }'
 ```
-Learn more about the argument format [here](docs/sampling_params.md).
+Learn more about the argument format [here](docs/en/sampling_params.md).
 
 ### OpenAI Compatible API
 In addition, the server supports OpenAI-compatible APIs.
@@ -143,7 +151,7 @@ python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct
 ```
 python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --port 30000 --mem-fraction-static 0.7
 ```
-- See [hyperparameter_tuning.md](docs/hyperparameter_tuning.md) on tuning hyperparameters for better performance.
+- See [hyperparameter_tuning.md](docs/en/hyperparameter_tuning.md) on tuning hyperparameters for better performance.
 - Add `--nnodes 2` to run tensor parallelism on multiple nodes. If you have two nodes with two GPUs on each node and want to run TP=4, let `sgl-dev-0` be the hostname of the first node and `50000` be an available port.
 ```
 # Node 0
@@ -152,7 +160,7 @@ python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct
 # Node 1
 python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --tp 4 --nccl-init sgl-dev-0:50000 --nnodes 2 --node-rank 1
 ```
-- If the model does not have a template in the Hugging Face tokenizer, you can specify a [custom chat template](docs/custom_chat_template.md).
+- If the model does not have a template in the Hugging Face tokenizer, you can specify a [custom chat template](docs/en/custom_chat_template.md).
 - To enable fp8 quantization, you can add `--quantization fp8` on a fp16 checkpoint or directly load a fp8 checkpoint without specifying any arguments.
 - To enable experimental torch.compile support, you can add `--enable-torch-compile`. It accelerates small models on small batch sizes.
 
@@ -195,7 +203,7 @@ GLOO_SOCKET_IFNAME=eth0 python3 -m sglang.launch_server --model-path meta-llama/
 - InternLM 2
 - Mistral NeMo
 
-Instructions for supporting a new model are [here](https://github.com/sgl-project/sglang/blob/main/docs/model_support.md).
+Instructions for supporting a new model are [here](https://github.com/sgl-project/sglang/blob/main/docs/en/model_support.md).
 
 ### Benchmark Performance
 
@@ -426,6 +434,24 @@ for out in state.text_iter():
     print(out, end="", flush=True)
 ```
 
+#### Roles
+
+Use `sgl.system`, `sgl.user` and `sgl.assistant` to set roles when using Chat models. You can also define more complex role prompts using begin and end tokens.
+
+```python
+@sgl.function
+def chat_example(s):
+    s += sgl.system("You are a helpful assistant.")
+    # Same as: s += s.system("You are a helpful assistant.")
+
+    with s.user():
+        s += "Question: What is the capital of France?"
+
+    s += sgl.assistant_begin()
+    s += "Answer: " + sgl.gen(max_tokens=100, stop="\n")
+    s += sgl.assistant_end()
+```
+
 #### Tips and Implementation Details
 - The `choices` argument in `sgl.gen` is implemented by computing the [token-length normalized log probabilities](https://blog.eleuther.ai/multiple-choice-normalization/) of all choices and selecting the one with the highest probability.
 - The `regex` argument in `sgl.gen` is implemented through autoregressive decoding with logit bias masking, according to the constraints set by the regex. It is compatible with `temperature=0` and `temperature != 0`.
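
The `regex` tip above relies on logit bias masking: at each decoding step, only the tokens permitted by the regex-derived state machine keep their logits, and everything else is pushed to negative infinity before argmax or sampling. A small hedged sketch of that masking step, assuming the allowed token ids come from an FSM such as the one `fsm_cache.py` builds (this is not sglang's actual kernel):

```python
import torch


def mask_logits(logits: torch.Tensor, allowed_token_ids: list) -> torch.Tensor:
    """Keep only tokens permitted by the current FSM state; everything else gets -inf."""
    mask = torch.full_like(logits, float("-inf"))
    mask[allowed_token_ids] = 0.0
    return logits + mask


# With temperature=0 this amounts to argmax over the allowed set;
# with temperature>0, sampling is restricted to the same set.
logits = torch.randn(32000)      # vocabulary-sized logits (size is illustrative)
allowed = [42, 1337, 2024]       # token ids the regex FSM allows next
next_token = torch.argmax(mask_logits(logits, allowed)).item()
assert next_token in allowed
```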

{sglang-0.2.6 → sglang-0.2.7}/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "sglang"
-version = "0.2.6"
+version = "0.2.7"
 description = "SGLang is yet another fast serving framework for large language models and vision language models."
 readme = "README.md"
 requires-python = ">=3.8"

{sglang-0.2.6 → sglang-0.2.7}/sglang/__init__.py
@@ -1,4 +1,5 @@
 # SGL API Components
+
 from sglang.api import (
     Runtime,
     assistant,
@@ -14,48 +15,54 @@ from sglang.api import (
     select,
     set_default_backend,
     system,
+    system_begin,
+    system_end,
     user,
     user_begin,
     user_end,
     video,
 )
 
-#
-from sglang.global_config import global_config
-
-# SGL Backends
-from sglang.lang.backend.anthropic import Anthropic
-from sglang.lang.backend.litellm import LiteLLM
-from sglang.lang.backend.openai import OpenAI
-from sglang.lang.backend.runtime_endpoint import RuntimeEndpoint
-from sglang.lang.backend.vertexai import VertexAI
-
-from .version import __version__
-
-# public APIs management
+# SGLang DSL APIs
 __all__ = [
-    "global_config",
-    "Anthropic",
-    "LiteLLM",
-    "OpenAI",
-    "RuntimeEndpoint",
-    "VertexAI",
-    "function",
     "Runtime",
-    "
+    "assistant",
+    "assistant_begin",
+    "assistant_end",
     "flush_cache",
-    "
+    "function",
     "gen",
     "gen_int",
     "gen_string",
+    "get_server_args",
     "image",
-    "video",
     "select",
+    "set_default_backend",
     "system",
+    "system_begin",
+    "system_end",
     "user",
-    "assistant",
     "user_begin",
     "user_end",
-    "
-    "assistant_end",
+    "video",
 ]
+
+# Global Configurations
+from sglang.global_config import global_config
+
+__all__ += ["global_config"]
+
+from sglang.version import __version__
+
+__all__ += ["__version__"]
+
+# SGL Backends
+from sglang.lang.backend.runtime_endpoint import RuntimeEndpoint
+from sglang.utils import LazyImport
+
+Anthropic = LazyImport("sglang.lang.backend.anthropic", "Anthropic")
+LiteLLM = LazyImport("sglang.lang.backend.litellm", "LiteLLM")
+OpenAI = LazyImport("sglang.lang.backend.openai", "OpenAI")
+VertexAI = LazyImport("sglang.lang.backend.vertexai", "VertexAI")
+
+__all__ += ["Anthropic", "LiteLLM", "OpenAI", "VertexAI", "RuntimeEndpoint"]
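
In the restructured `__init__.py` above, the optional backends are bound through `sglang.utils.LazyImport`, so importing `sglang` no longer pulls in the `anthropic`, `litellm`, `openai`, or `vertexai` packages unless one of those backends is actually used. A hedged sketch of how such a proxy can work; the real class in `sglang/utils.py` may differ in detail:

```python
# Minimal sketch of a LazyImport-style proxy (assumption, not the actual sglang code).
import importlib


class LazyImport:
    """Defer `from <module> import <name>` until the object is first used."""

    def __init__(self, module_name: str, class_name: str):
        self.module_name = module_name
        self.class_name = class_name
        self._target = None

    def _load(self):
        if self._target is None:
            module = importlib.import_module(self.module_name)
            self._target = getattr(module, self.class_name)
        return self._target

    def __getattr__(self, attr):
        # Attribute access triggers the real import and forwards to the class.
        return getattr(self._load(), attr)

    def __call__(self, *args, **kwargs):
        # Instantiating the proxy instantiates the real class.
        return self._load()(*args, **kwargs)


# Usage mirroring the new __init__.py: nothing is imported until first call.
OpenAI = LazyImport("sglang.lang.backend.openai", "OpenAI")
```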

{sglang-0.2.6 → sglang-0.2.7}/sglang/api.py
@@ -75,7 +75,7 @@ def gen(
     choices: Optional[List[str]] = None,
     regex: Optional[str] = None,
 ):
-    """Call the model to generate. See the meaning of the arguments in docs/sampling_params.md"""
+    """Call the model to generate. See the meaning of the arguments in docs/en/sampling_params.md"""
 
     if choices:
         return SglSelect(name, choices, 0.0 if temperature is None else temperature)
@@ -210,6 +210,14 @@ def assistant(expr: Optional[SglExpr] = None):
     return _role_common("assistant", expr)
 
 
+def system_begin():
+    return SglRoleBegin("system")
+
+
+def system_end():
+    return SglRoleEnd("system")
+
+
 def user_begin():
     return SglRoleBegin("user")
 
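
The new `system_begin`/`system_end` helpers mirror the existing `user_begin`/`user_end` and `assistant_begin`/`assistant_end` pairs, so a system prompt can be opened and closed explicitly instead of being wrapped in a single `sgl.system(...)` call. A short usage sketch (the prompt text and backend endpoint are made up):

```python
import sglang as sgl


@sgl.function
def pirate_qa(s, question):
    # Equivalent to s += sgl.system("..."), but with explicit begin/end markers.
    s += sgl.system_begin()
    s += "You are a helpful assistant that answers like a pirate."
    s += sgl.system_end()

    s += sgl.user(question)
    s += sgl.assistant(sgl.gen("answer", max_tokens=128))


# Running it requires a backend, e.g.:
# sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))
# state = pirate_qa.run(question="What is the capital of France?")
```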

{sglang-0.2.6 → sglang-0.2.7}/sglang/bench_latency.py
@@ -37,9 +37,9 @@ import torch
 import torch.distributed as dist
 
 from sglang.srt.hf_transformers_utils import get_tokenizer
-from sglang.srt.managers.
-from sglang.srt.managers.controller.model_runner import ModelRunner
+from sglang.srt.managers.schedule_batch import Batch, ForwardMode, Req
 from sglang.srt.model_config import ModelConfig
+from sglang.srt.model_executor.model_runner import ModelRunner
 from sglang.srt.sampling_params import SamplingParams
 from sglang.srt.server_args import ServerArgs
 from sglang.srt.utils import suppress_other_loggers

{sglang-0.2.6 → sglang-0.2.7}/sglang/bench_serving.py
@@ -1,5 +1,6 @@
 # Adapted from https://github.com/vllm-project/vllm/blob/6366efc67b0aedd2c1721c14385370e50b297fb3/benchmarks/backend_request_func.py
 # Adapted from https://github.com/vllm-project/vllm/blob/6366efc67b0aedd2c1721c14385370e50b297fb3/benchmarks/benchmark_serving.py
+
 """
 Benchmark online serving.
 
@@ -84,6 +85,9 @@ async def async_request_trt_llm(
         "min_length": request_func_input.output_len,
         "end_id": 1048576,
     }
+    if args.disable_ignore_eos:
+        del payload["min_length"]
+        del payload["end_id"]
     output = RequestFuncOutput()
     output.prompt_len = request_func_input.prompt_len
 
@@ -149,7 +153,7 @@ async def async_request_openai_completions(
         "best_of": 1,
         "max_tokens": request_func_input.output_len,
         "stream": not args.disable_stream,
-        "ignore_eos":
+        "ignore_eos": not args.disable_ignore_eos,
     }
     headers = {"Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY')}"}
 
@@ -969,6 +973,11 @@ if __name__ == "__main__":
        action="store_true",
        help="Disable streaming mode.",
    )
+    parser.add_argument(
+        "--disable-ignore-eos",
+        action="store_true",
+        help="Disable ignoring EOS.",
+    )
 
     set_ulimit()
 

{sglang-0.2.6 → sglang-0.2.7}/sglang/lang/interpreter.py
@@ -553,6 +553,7 @@ class StreamExecutor:
             "output_token_logprobs": output_token_logprobs,
         }
         self.variable_event[name].set()
+        self.stream_var_event[name].set()
         self.text_ += decision
 
     def _execute_variable(self, expr: SglVariable):
@@ -705,9 +706,9 @@ class ProgramState:
 
     def _role_common(self, name: str, expr: Optional[SglExpr] = None):
         if expr is not None:
-
-
-
+            role_expr = SglExprList([SglRoleBegin(name), expr, SglRoleEnd(name)])
+            self.stream_executor.submit(role_expr)
+            return role_expr
         else:
 
     @contextmanager
@@ -778,7 +779,14 @@ class ProgramState:
                 if self.stream_executor.is_finished:
                     break
             else:
-                event =
+                event = None
+                while not event:
+                    if var_name in self.stream_executor.stream_var_event:
+                        event = self.stream_executor.stream_var_event[var_name]
+                    if self.stream_executor.is_finished:
+                        yield ""
+                        return
+
             while True:
                 event.wait()
                 event.clear()
@@ -813,7 +821,14 @@ class ProgramState:
                 if self.stream_executor.is_finished:
                     break
             else:
-                event =
+                event = None
+                while not event:
+                    if var_name in self.stream_executor.stream_var_event:
+                        event = self.stream_executor.stream_var_event[var_name]
+                    if self.stream_executor.is_finished:
+                        yield ""
+                        return
+
             while True:
                 await loop.run_in_executor(None, event.wait)
                 event.clear()
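
The two `text_iter` hunks above make the stream consumer poll for the per-variable event instead of assuming it is already registered, and exit cleanly (yielding an empty string) if the program finishes before the variable ever starts streaming; the first hunk makes the producer set that event. A self-contained sketch of the same producer/consumer pattern, with simplified stand-in classes rather than the real `StreamExecutor`/`ProgramState`:

```python
# Simplified stand-ins for the streaming handshake shown in the diff (assumption).
import threading
import time


class TinyExecutor:
    """Events appear in stream_var_event only once a variable starts generating."""

    def __init__(self):
        self.stream_var_event = {}   # variable name -> threading.Event
        self.is_finished = False


def wait_for_stream_event(executor, var_name):
    # Poll until the producer registers the event, or give up if the
    # program finished without ever producing this variable.
    event = None
    while not event:
        event = executor.stream_var_event.get(var_name)
        if executor.is_finished and event is None:
            return None              # caller yields "" and stops, as in the diff
        time.sleep(0.001)            # tiny sleep keeps this sketch from hot-spinning
    return event


def producer(executor):
    time.sleep(0.05)                 # the variable starts generating a bit later
    ev = threading.Event()
    executor.stream_var_event["answer"] = ev
    ev.set()
    executor.is_finished = True


ex = TinyExecutor()
threading.Thread(target=producer, args=(ex,)).start()
ev = wait_for_stream_event(ex, "answer")
print("got event" if ev and ev.wait(timeout=1) else "program finished first")
```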

{sglang-0.2.6 → sglang-0.2.7}/sglang/lang/ir.py
@@ -410,7 +410,7 @@ class SglGen(SglExpr):
         dtype: Optional[type] = None,
         regex: Optional[str] = None,
     ):
-        """Call the model to generate. See the meaning of the arguments in docs/sampling_params.md"""
+        """Call the model to generate. See the meaning of the arguments in docs/en/sampling_params.md"""
         super().__init__()
         self.name = name
         self.sampling_params = SglSamplingParams(

{sglang-0.2.6 → sglang-0.2.7}/sglang/srt/constrained/__init__.py
@@ -1,3 +1,18 @@
+"""
+Copyright 2023-2024 SGLang Team
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+"""
+
 import json
 from typing import Dict, Optional, Union
 

{sglang-0.2.6 → sglang-0.2.7}/sglang/srt/constrained/base_cache.py
@@ -1,3 +1,18 @@
+"""
+Copyright 2023-2024 SGLang Team
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+"""
+
 """Base cache class."""
 
 import time

{sglang-0.2.6 → sglang-0.2.7}/sglang/srt/constrained/fsm_cache.py
@@ -1,3 +1,18 @@
+"""
+Copyright 2023-2024 SGLang Team
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+"""
+
 """Cache for the compressed finite state machine."""
 
 from sglang.srt.constrained import RegexGuide, TransformerTokenizer

{sglang-0.2.6 → sglang-0.2.7}/sglang/srt/constrained/jump_forward.py
@@ -1,3 +1,18 @@
+"""
+Copyright 2023-2024 SGLang Team
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+"""
+
 """
 Faster constrained decoding.
 Reference: https://lmsys.org/blog/2024-02-05-compressed-fsm/

{sglang-0.2.6 → sglang-0.2.7}/sglang/srt/conversation.py
@@ -1,3 +1,18 @@
+"""
+Copyright 2023-2024 SGLang Team
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+"""
+
 """Conversation templates."""
 
 # Adapted from
@@ -421,3 +436,14 @@ register_conv_template(
         sep2="</s>",
     )
 )
+
+# Reference: https://github.com/InternLM/lmdeploy/blob/387bf54b4f124e72aab30ae9755f562e435d3d01/lmdeploy/model.py#L425-L442
+register_conv_template(
+    Conversation(
+        name="internlm2-chat",
+        system_template="<|im_start|>system\n{system_message}",
+        roles=("<|im_start|>user", "<|im_start|>assistant"),
+        sep="\n",
+        stop_str=["<|im_end|>", "<|action_end|>"],
+    )
+)
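
The new `internlm2-chat` template uses ChatML-style `<|im_start|>` role markers, a `\n` separator, and stops on `<|im_end|>` or `<|action_end|>`. A hypothetical rendering of a single turn built from just the fields registered above (illustrative only; the actual prompt assembly is done by the `Conversation` class, which this diff does not show):

```python
# Illustrative rendering of the internlm2-chat template fields added above;
# not the Conversation class's actual implementation.
def render_internlm2_prompt(system_message: str, user_message: str) -> str:
    sep = "\n"
    parts = [
        "<|im_start|>system" + sep + system_message,   # from system_template
        "<|im_start|>user" + sep + user_message,       # first entry in roles
        "<|im_start|>assistant" + sep,                 # generation continues from here
    ]
    # Decoding would stop once the model emits <|im_end|> or <|action_end|>.
    return sep.join(parts)


print(render_internlm2_prompt("You are a helpful assistant.", "Hi!"))
```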

{sglang-0.2.6 → sglang-0.2.7}/sglang/srt/hf_transformers_utils.py
@@ -1,3 +1,18 @@
+"""
+Copyright 2023-2024 SGLang Team
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+"""
+
 """Utilities for Huggingface Transformers."""
 
 import functools

{sglang-0.2.6 → sglang-0.2.7}/sglang/srt/layers/context_flashattention_nopad.py
@@ -1,3 +1,18 @@
+"""
+Copyright 2023-2024 SGLang Team
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+"""
+
 # Adapted from
 # https://github.com/ModelTC/lightllm/blob/f2a54f0912293f683bf1d1695fd12c4098a5bf82/lightllm/models/llama/triton_kernel/context_flashattention_nopad.py#L1
 import torch

{sglang-0.2.6 → sglang-0.2.7}/sglang/srt/layers/extend_attention.py
@@ -1,3 +1,18 @@
+"""
+Copyright 2023-2024 SGLang Team
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+"""
+
 import torch
 import triton
 import triton.language as tl

{sglang-0.2.6 → sglang-0.2.7}/sglang/srt/layers/fused_moe.py
@@ -1,3 +1,18 @@
+"""
+Copyright 2023-2024 SGLang Team
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+"""
+
 # Adapted from
 # https://github.com/vllm-project/vllm/blob/c7f2cf2b7f67bce5842fedfdba508440fe257375/vllm/model_executor/layers/fused_moe/fused_moe.py#L1
 """Fused MoE kernel."""

{sglang-0.2.6 → sglang-0.2.7}/sglang/srt/layers/linear.py
@@ -1,3 +1,18 @@
+"""
+Copyright 2023-2024 SGLang Team
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+"""
+
 # temporarily adapted from https://github.com/vllm-project/vllm/blob/e76466dde2bc9525d55165ceaa600d298c7bf773/vllm/model_executor/layers/linear.py
 # FIXME: refactor the linear abstraction
 from abc import abstractmethod