xinference 0.0.1__tar.gz → 0.0.3__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release: this version of xinference might be problematic.

Files changed (45)
  1. {xinference-0.0.1 → xinference-0.0.3}/MANIFEST.in +1 -0
  2. {xinference-0.0.1/xinference.egg-info → xinference-0.0.3}/PKG-INFO +45 -36
  3. {xinference-0.0.1 → xinference-0.0.3}/README.md +43 -35
  4. {xinference-0.0.1 → xinference-0.0.3}/setup.cfg +6 -0
  5. {xinference-0.0.1 → xinference-0.0.3}/xinference/_version.py +3 -3
  6. {xinference-0.0.1 → xinference-0.0.3}/xinference/core/gradio.py +10 -7
  7. {xinference-0.0.1 → xinference-0.0.3}/xinference/core/model.py +2 -4
  8. {xinference-0.0.1 → xinference-0.0.3}/xinference/core/restful_api.py +7 -3
  9. {xinference-0.0.1 → xinference-0.0.3}/xinference/core/service.py +1 -0
  10. {xinference-0.0.1 → xinference-0.0.3}/xinference/deploy/supervisor.py +25 -5
  11. {xinference-0.0.1 → xinference-0.0.3}/xinference/deploy/worker.py +4 -0
  12. {xinference-0.0.1 → xinference-0.0.3}/xinference/locale/utils.py +2 -1
  13. xinference-0.0.3/xinference/locale/zh_CN.json +25 -0
  14. {xinference-0.0.1 → xinference-0.0.3/xinference.egg-info}/PKG-INFO +45 -36
  15. {xinference-0.0.1 → xinference-0.0.3}/xinference.egg-info/SOURCES.txt +1 -0
  16. {xinference-0.0.1 → xinference-0.0.3}/xinference.egg-info/requires.txt +7 -0
  17. {xinference-0.0.1 → xinference-0.0.3}/LICENSE +0 -0
  18. {xinference-0.0.1 → xinference-0.0.3}/pyproject.toml +0 -0
  19. {xinference-0.0.1 → xinference-0.0.3}/setup.py +0 -0
  20. {xinference-0.0.1 → xinference-0.0.3}/versioneer.py +0 -0
  21. {xinference-0.0.1 → xinference-0.0.3}/xinference/__init__.py +0 -0
  22. {xinference-0.0.1 → xinference-0.0.3}/xinference/client.py +0 -0
  23. {xinference-0.0.1 → xinference-0.0.3}/xinference/constants.py +0 -0
  24. {xinference-0.0.1 → xinference-0.0.3}/xinference/core/__init__.py +0 -0
  25. {xinference-0.0.1 → xinference-0.0.3}/xinference/core/api.py +0 -0
  26. {xinference-0.0.1 → xinference-0.0.3}/xinference/core/resource.py +0 -0
  27. {xinference-0.0.1 → xinference-0.0.3}/xinference/deploy/__init__.py +0 -0
  28. {xinference-0.0.1 → xinference-0.0.3}/xinference/deploy/cmdline.py +0 -0
  29. {xinference-0.0.1 → xinference-0.0.3}/xinference/deploy/local.py +0 -0
  30. {xinference-0.0.1 → xinference-0.0.3}/xinference/deploy/test/__init__.py +0 -0
  31. {xinference-0.0.1 → xinference-0.0.3}/xinference/deploy/utils.py +0 -0
  32. {xinference-0.0.1 → xinference-0.0.3}/xinference/isolation.py +0 -0
  33. {xinference-0.0.1 → xinference-0.0.3}/xinference/locale/__init__.py +0 -0
  34. {xinference-0.0.1 → xinference-0.0.3}/xinference/model/__init__.py +0 -0
  35. {xinference-0.0.1 → xinference-0.0.3}/xinference/model/llm/__init__.py +0 -0
  36. {xinference-0.0.1 → xinference-0.0.3}/xinference/model/llm/chatglm.py +0 -0
  37. {xinference-0.0.1 → xinference-0.0.3}/xinference/model/llm/core.py +0 -0
  38. {xinference-0.0.1 → xinference-0.0.3}/xinference/model/llm/orca.py +0 -0
  39. {xinference-0.0.1 → xinference-0.0.3}/xinference/model/llm/vicuna.py +0 -0
  40. {xinference-0.0.1 → xinference-0.0.3}/xinference/model/llm/wizardlm.py +0 -0
  41. {xinference-0.0.1 → xinference-0.0.3}/xinference/types.py +0 -0
  42. {xinference-0.0.1 → xinference-0.0.3}/xinference.egg-info/dependency_links.txt +0 -0
  43. {xinference-0.0.1 → xinference-0.0.3}/xinference.egg-info/entry_points.txt +0 -0
  44. {xinference-0.0.1 → xinference-0.0.3}/xinference.egg-info/not-zip-safe +0 -0
  45. {xinference-0.0.1 → xinference-0.0.3}/xinference.egg-info/top_level.txt +0 -0
MANIFEST.in
@@ -8,3 +8,4 @@ global-exclude .DS_Store
  include versioneer.py
  include xinference/_version.py
  global-exclude conftest.py
+ include xinference/locale/*.json
PKG-INFO
@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: xinference
- Version: 0.0.1
+ Version: 0.0.3
  Summary: Model Serving Made Easy
  Home-page: https://github.com/xorbitsai/inference
  Author: Qin Xuye
@@ -19,15 +19,16 @@ Classifier: Programming Language :: Python :: Implementation :: CPython
  Classifier: Topic :: Software Development :: Libraries
  Description-Content-Type: text/markdown
  Provides-Extra: dev
+ Provides-Extra: all
  License-File: LICENSE

  [![PyPI Latest Release](https://img.shields.io/pypi/v/xinference.svg?style=for-the-badge)](https://pypi.org/project/xinference/)
- [![License](https://img.shields.io/pypi/l/inference.svg?style=for-the-badge)](https://github.com/xorbitsai/inference/blob/main/LICENSE)
+ [![License](https://img.shields.io/pypi/l/xinference.svg?style=for-the-badge)](https://github.com/xorbitsai/inference/blob/main/LICENSE)
  [![Build Status](https://img.shields.io/github/actions/workflow/status/xorbitsai/inference/python.yaml?branch=main&style=for-the-badge&label=GITHUB%20ACTIONS&logo=github)](https://actions-badge.atrox.dev/xorbitsai/inference/goto?ref=main)
  [![Slack](https://img.shields.io/badge/join_Slack-781FF5.svg?logo=slack&style=for-the-badge)](https://join.slack.com/t/xorbitsio/shared_invite/zt-1o3z9ucdh-RbfhbPVpx7prOVdM1CAuxg)
  [![Twitter](https://img.shields.io/twitter/follow/xorbitsio?logo=twitter&style=for-the-badge)](https://twitter.com/xorbitsio)

- # Xorbits inference: Model Serving Made Easy 🤖
+ # Xorbits Inference: Model Serving Made Easy 🤖

  Welcome to the Xorbits Inference GitHub repository!

@@ -42,10 +43,16 @@ which is specifically designed to enable large models and high performance on co
  We are actively working on expanding Xorbits Inference's support to include additional runtimes,
  including PyTorch and JAX, in the near future.

+ ![demo](assets/demo.gif)
+
+ <div align="center">
+ <i><a href="https://join.slack.com/t/xorbitsio/shared_invite/zt-1z3zsm9ep-87yI9YZ_B79HLB2ccTq4WA">👉 Join our Slack community!</a></i>
+ </div>
+
  ## Key Features
  🌟 **Model Serving Made Easy**: Inference simplifies the process of serving large language, speech
- recognition, and multimodal models. With a single command, you can set up and deploy your models
- for experimentation and production.
+ recognition, and multimodal models. You can set up and deploy your models
+ for experimentation and production with a single command.

  ⚡️ **State-of-the-Art Models**: Experiment with cutting-edge built-in models using a single
  command. Inference provides access to state-of-the-art open-source models!
@@ -63,43 +70,44 @@ for seamless management and monitoring.
  allowing the seamless distribution of model inference across multiple devices or machines. It
  leverages distributed computing techniques to parallelize and scale the inference process.

- 🔌 **Built-in Integration with Third-Party Libraries**: Xorbits Inference provides seamless
- integration with popular third-party libraries like LangChain and LlamaIndex. (Coming soon)
+ 🔌 **Built-in Integration with Third-Party Libraries**: Xorbits Inference seamlessly integrates
+ with popular third-party libraries like LangChain and LlamaIndex. (Coming soon)

  ## Getting Started
  Xinference can be installed via pip from PyPI. It is highly recommended to create a new virtual
  environment to avoid conflicts.
  ```bash
- $ pip install xinference
+ $ pip install "xinference[all]"
  ```
+ "xinference[all]" installs all the necessary packages for serving models. If you want to achieve acceleration on
+ different hardware, refer to the installation documentation of the corresponding package.
+ - [llama-cpp-python](https://github.com/abetlen/llama-cpp-python#installation-from-pypi-recommended) is required to run `baichuan`, `wizardlm-v1.0`, `vicuna-v1.3` and `orca`.
+ - [chatglm-cpp-python](https://github.com/li-plus/chatglm.cpp#getting-started) is required to run `chatglm` and `chatglm2`.
+

  ### Deployment
- To start a local instance of Xinference, run the following command:
+ You can deploy Xinference locally with a single command or deploy it in a distributed cluster.

+ #### Local
+ To start a local instance of Xinference, run the following command:
  ```bash
- $ xinference -H,--host "localhost" \
- -p,--port 9997 \
- --log-level INFO
+ $ xinference
  ```

- To deploy Xinference in a cluster, you need to start an Xinference supervisor on one server and
+ #### Distributed
+
+ To deploy Xinference in a cluster, you need to start a Xinference supervisor on one server and
  Xinference workers on the other servers. Follow the steps below:

- #### Starting the Supervisor
- On the server where you want to run the Xinference supervisor, run the following command:
+ **Starting the Supervisor**: On the server where you want to run the Xinference supervisor, run the following command:
  ```bash
- $ xinference-supervisor -H,--host "${supervisor_host}" \
- -p,--port 9997 \
- --log-level INFO
+ $ xinference-supervisor -H "${supervisor_host}"
  ```
  Replace `${supervisor_host}` with the actual host of your supervisor server.

- #### Starting the Workers
- On each of the other servers where you want to run Xinference workers, run the following command:
+ **Starting the Workers**: On each of the other servers where you want to run Xinference workers, run the following command:
  ```bash
- $ xinference-worker -e, --endpoint "http://${supervisor_host}:9997" \
- -H,--host "0.0.0.0" \
- --log-level INFO
+ $ xinference-worker -e "http://${supervisor_host}:9997"
  ```

  Once Xinference is running, an endpoint will be accessible for model management via CLI or
@@ -109,7 +117,7 @@ Xinference client.
  - For cluster deployment, the endpoint will be `http://${supervisor_host}:9997`, where
  `${supervisor_host}` is the hostname or IP address of the server where the supervisor is running.

- You can also view a web UI using the Xinference endpoint where you can chat with all the
+ You can also view a web UI using the Xinference endpoint to chat with all the
  builtin models. You can even **chat with two cutting-edge AI models side-by-side to compare
  their performance**!

@@ -177,26 +185,27 @@ To view the builtin models, run the following command:
  $ xinference list --all
  ```

- | Name | Format | Size (in billions) | Quantization |
- | -------------------- | ------- | ------------------ |--------------------------------------------------------------------------------------------------------------------------------|
- | baichuan | ggmlv3 | [7] | ['q2_K', 'q3_K_L', 'q3_K_M', 'q3_K_S', 'q4_0', 'q4_1', 'q4_K_M', 'q4_K_S', 'q5_0', 'q5_1', 'q5_K_M', 'q5_K_S', 'q6_K', 'q8_0'] |
- | wizardlm-v1.0 | ggmlv3 | [7, 13, 33] | ['q2_K', 'q3_K_L', 'q3_K_M', 'q3_K_S', 'q4_0', 'q4_1', 'q4_K_M', 'q4_K_S', 'q5_0', 'q5_1', 'q5_K_M', 'q5_K_S', 'q6_K', 'q8_0'] |
- | vicuna-v1.3 | ggmlv3 | [7, 13] | ['q2_K', 'q3_K_L', 'q3_K_M', 'q3_K_S', 'q4_0', 'q4_1', 'q4_K_M', 'q4_K_S', 'q5_0', 'q5_1', 'q5_K_M', 'q5_K_S', 'q6_K', 'q8_0'] |
- | orca | ggmlv3 | [3, 7, 13] | ['q4_0', 'q4_1', 'q5_0', 'q5_1', 'q8_0'] |
- | chatglm | ggmlv3 | [6] | ['q4_0', 'q4_1', 'q5_0', 'q5_1', 'q8_0'] |
- | chatglm2 | ggmlv3 | [6] | ['q4_0', 'q4_1', 'q5_0', 'q5_1', 'q8_0'] |
+ | Name | Type | Language | Format | Size (in billions) | Quantization |
+ | -------------------- |------------------|----------|--------|--------------------|----------------------------------------|
+ | baichuan | Foundation Model | en, zh | ggmlv3 | 7 | 'q2_K', 'q3_K_L', ... , 'q6_K', 'q8_0' |
+ | chatglm | SFT Model | en, zh | ggmlv3 | 6 | 'q4_0', 'q4_1', 'q5_0', 'q5_1', 'q8_0' |
+ | chatglm2 | SFT Model | en, zh | ggmlv3 | 6 | 'q4_0', 'q4_1', 'q5_0', 'q5_1', 'q8_0' |
+ | wizardlm-v1.0 | SFT Model | en | ggmlv3 | 7, 13, 33 | 'q2_K', 'q3_K_L', ... , 'q6_K', 'q8_0' |
+ | vicuna-v1.3 | SFT Model | en | ggmlv3 | 7, 13 | 'q2_K', 'q3_K_L', ... , 'q6_K', 'q8_0' |
+ | orca | SFT Model | en | ggmlv3 | 3, 7, 13 | 'q4_0', 'q4_1', 'q5_0', 'q5_1', 'q8_0' |
+

  **NOTE**:
- - [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) is required to run `baichuan`, `wizardlm-v1.0`, `vicuna-v1.3` and `orca`.
- - [chatglm-cpp-python](https://github.com/li-plus/chatglm.cpp) is required to run `chatglm` and `chatglm2`.
- - Xinference will download models automatically for you and by default the models will be saved under `${USER}/.xinference/cache`.
+ - Xinference will download models automatically for you, and by default the models will be saved under `${USER}/.xinference/cache`.
+ - Foundation models only provide interface `generate`.
+ - SFT models provide both `generate` and `chat`.

  ## Roadmap
  Xinference is currently under active development. Here's a roadmap outlining our planned
  developments for the next few weeks:

  ### PyTorch Support
- With PyTorch integration, users will be able to seamlessly utilize PyTorch models form huggingface
+ With PyTorch integration, users will be able to seamlessly utilize PyTorch models from Hugging Face
  within Xinference.

  ### Langchain & LlamaIndex integration
README.md
@@ -1,10 +1,10 @@
  [![PyPI Latest Release](https://img.shields.io/pypi/v/xinference.svg?style=for-the-badge)](https://pypi.org/project/xinference/)
- [![License](https://img.shields.io/pypi/l/inference.svg?style=for-the-badge)](https://github.com/xorbitsai/inference/blob/main/LICENSE)
+ [![License](https://img.shields.io/pypi/l/xinference.svg?style=for-the-badge)](https://github.com/xorbitsai/inference/blob/main/LICENSE)
  [![Build Status](https://img.shields.io/github/actions/workflow/status/xorbitsai/inference/python.yaml?branch=main&style=for-the-badge&label=GITHUB%20ACTIONS&logo=github)](https://actions-badge.atrox.dev/xorbitsai/inference/goto?ref=main)
  [![Slack](https://img.shields.io/badge/join_Slack-781FF5.svg?logo=slack&style=for-the-badge)](https://join.slack.com/t/xorbitsio/shared_invite/zt-1o3z9ucdh-RbfhbPVpx7prOVdM1CAuxg)
  [![Twitter](https://img.shields.io/twitter/follow/xorbitsio?logo=twitter&style=for-the-badge)](https://twitter.com/xorbitsio)

- # Xorbits inference: Model Serving Made Easy 🤖
+ # Xorbits Inference: Model Serving Made Easy 🤖

  Welcome to the Xorbits Inference GitHub repository!

@@ -19,10 +19,16 @@ which is specifically designed to enable large models and high performance on co
  We are actively working on expanding Xorbits Inference's support to include additional runtimes,
  including PyTorch and JAX, in the near future.

+ ![demo](assets/demo.gif)
+
+ <div align="center">
+ <i><a href="https://join.slack.com/t/xorbitsio/shared_invite/zt-1z3zsm9ep-87yI9YZ_B79HLB2ccTq4WA">👉 Join our Slack community!</a></i>
+ </div>
+
  ## Key Features
  🌟 **Model Serving Made Easy**: Inference simplifies the process of serving large language, speech
- recognition, and multimodal models. With a single command, you can set up and deploy your models
- for experimentation and production.
+ recognition, and multimodal models. You can set up and deploy your models
+ for experimentation and production with a single command.

  ⚡️ **State-of-the-Art Models**: Experiment with cutting-edge built-in models using a single
  command. Inference provides access to state-of-the-art open-source models!
@@ -40,43 +46,44 @@ for seamless management and monitoring.
  allowing the seamless distribution of model inference across multiple devices or machines. It
  leverages distributed computing techniques to parallelize and scale the inference process.

- 🔌 **Built-in Integration with Third-Party Libraries**: Xorbits Inference provides seamless
- integration with popular third-party libraries like LangChain and LlamaIndex. (Coming soon)
+ 🔌 **Built-in Integration with Third-Party Libraries**: Xorbits Inference seamlessly integrates
+ with popular third-party libraries like LangChain and LlamaIndex. (Coming soon)

  ## Getting Started
  Xinference can be installed via pip from PyPI. It is highly recommended to create a new virtual
  environment to avoid conflicts.
  ```bash
- $ pip install xinference
+ $ pip install "xinference[all]"
  ```
+ "xinference[all]" installs all the necessary packages for serving models. If you want to achieve acceleration on
+ different hardware, refer to the installation documentation of the corresponding package.
+ - [llama-cpp-python](https://github.com/abetlen/llama-cpp-python#installation-from-pypi-recommended) is required to run `baichuan`, `wizardlm-v1.0`, `vicuna-v1.3` and `orca`.
+ - [chatglm-cpp-python](https://github.com/li-plus/chatglm.cpp#getting-started) is required to run `chatglm` and `chatglm2`.
+

  ### Deployment
- To start a local instance of Xinference, run the following command:
+ You can deploy Xinference locally with a single command or deploy it in a distributed cluster.

+ #### Local
+ To start a local instance of Xinference, run the following command:
  ```bash
- $ xinference -H,--host "localhost" \
- -p,--port 9997 \
- --log-level INFO
+ $ xinference
  ```

- To deploy Xinference in a cluster, you need to start an Xinference supervisor on one server and
+ #### Distributed
+
+ To deploy Xinference in a cluster, you need to start a Xinference supervisor on one server and
  Xinference workers on the other servers. Follow the steps below:

- #### Starting the Supervisor
- On the server where you want to run the Xinference supervisor, run the following command:
+ **Starting the Supervisor**: On the server where you want to run the Xinference supervisor, run the following command:
  ```bash
- $ xinference-supervisor -H,--host "${supervisor_host}" \
- -p,--port 9997 \
- --log-level INFO
+ $ xinference-supervisor -H "${supervisor_host}"
  ```
  Replace `${supervisor_host}` with the actual host of your supervisor server.

- #### Starting the Workers
- On each of the other servers where you want to run Xinference workers, run the following command:
+ **Starting the Workers**: On each of the other servers where you want to run Xinference workers, run the following command:
  ```bash
- $ xinference-worker -e, --endpoint "http://${supervisor_host}:9997" \
- -H,--host "0.0.0.0" \
- --log-level INFO
+ $ xinference-worker -e "http://${supervisor_host}:9997"
  ```

  Once Xinference is running, an endpoint will be accessible for model management via CLI or
@@ -86,7 +93,7 @@ Xinference client.
  - For cluster deployment, the endpoint will be `http://${supervisor_host}:9997`, where
  `${supervisor_host}` is the hostname or IP address of the server where the supervisor is running.

- You can also view a web UI using the Xinference endpoint where you can chat with all the
+ You can also view a web UI using the Xinference endpoint to chat with all the
  builtin models. You can even **chat with two cutting-edge AI models side-by-side to compare
  their performance**!

@@ -154,26 +161,27 @@ To view the builtin models, run the following command:
  $ xinference list --all
  ```

- | Name | Format | Size (in billions) | Quantization |
- | -------------------- | ------- | ------------------ |--------------------------------------------------------------------------------------------------------------------------------|
- | baichuan | ggmlv3 | [7] | ['q2_K', 'q3_K_L', 'q3_K_M', 'q3_K_S', 'q4_0', 'q4_1', 'q4_K_M', 'q4_K_S', 'q5_0', 'q5_1', 'q5_K_M', 'q5_K_S', 'q6_K', 'q8_0'] |
- | wizardlm-v1.0 | ggmlv3 | [7, 13, 33] | ['q2_K', 'q3_K_L', 'q3_K_M', 'q3_K_S', 'q4_0', 'q4_1', 'q4_K_M', 'q4_K_S', 'q5_0', 'q5_1', 'q5_K_M', 'q5_K_S', 'q6_K', 'q8_0'] |
- | vicuna-v1.3 | ggmlv3 | [7, 13] | ['q2_K', 'q3_K_L', 'q3_K_M', 'q3_K_S', 'q4_0', 'q4_1', 'q4_K_M', 'q4_K_S', 'q5_0', 'q5_1', 'q5_K_M', 'q5_K_S', 'q6_K', 'q8_0'] |
- | orca | ggmlv3 | [3, 7, 13] | ['q4_0', 'q4_1', 'q5_0', 'q5_1', 'q8_0'] |
- | chatglm | ggmlv3 | [6] | ['q4_0', 'q4_1', 'q5_0', 'q5_1', 'q8_0'] |
- | chatglm2 | ggmlv3 | [6] | ['q4_0', 'q4_1', 'q5_0', 'q5_1', 'q8_0'] |
+ | Name | Type | Language | Format | Size (in billions) | Quantization |
+ | -------------------- |------------------|----------|--------|--------------------|----------------------------------------|
+ | baichuan | Foundation Model | en, zh | ggmlv3 | 7 | 'q2_K', 'q3_K_L', ... , 'q6_K', 'q8_0' |
+ | chatglm | SFT Model | en, zh | ggmlv3 | 6 | 'q4_0', 'q4_1', 'q5_0', 'q5_1', 'q8_0' |
+ | chatglm2 | SFT Model | en, zh | ggmlv3 | 6 | 'q4_0', 'q4_1', 'q5_0', 'q5_1', 'q8_0' |
+ | wizardlm-v1.0 | SFT Model | en | ggmlv3 | 7, 13, 33 | 'q2_K', 'q3_K_L', ... , 'q6_K', 'q8_0' |
+ | vicuna-v1.3 | SFT Model | en | ggmlv3 | 7, 13 | 'q2_K', 'q3_K_L', ... , 'q6_K', 'q8_0' |
+ | orca | SFT Model | en | ggmlv3 | 3, 7, 13 | 'q4_0', 'q4_1', 'q5_0', 'q5_1', 'q8_0' |
+

  **NOTE**:
- - [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) is required to run `baichuan`, `wizardlm-v1.0`, `vicuna-v1.3` and `orca`.
- - [chatglm-cpp-python](https://github.com/li-plus/chatglm.cpp) is required to run `chatglm` and `chatglm2`.
- - Xinference will download models automatically for you and by default the models will be saved under `${USER}/.xinference/cache`.
+ - Xinference will download models automatically for you, and by default the models will be saved under `${USER}/.xinference/cache`.
+ - Foundation models only provide interface `generate`.
+ - SFT models provide both `generate` and `chat`.

  ## Roadmap
  Xinference is currently under active development. Here's a roadmap outlining our planned
  developments for the next few weeks:

  ### PyTorch Support
- With PyTorch integration, users will be able to seamlessly utilize PyTorch models form huggingface
+ With PyTorch integration, users will be able to seamlessly utilize PyTorch models from Hugging Face
  within Xinference.

  ### Langchain & LlamaIndex integration
setup.cfg
@@ -31,6 +31,9 @@ install_requires =
  tqdm
  tabulate
  requests
+ pydantic
+ fastapi
+ uvicorn

  [options.packages.find]
  exclude =
@@ -48,6 +51,9 @@ dev =
  pytest-asyncio>=0.14.0
  flake8>=3.8.0
  black
+ all =
+ chatglm-cpp
+ llama-cpp-python

  [options.entry_points]
  console_scripts =
xinference/_version.py
@@ -8,11 +8,11 @@ import json

  version_json = '''
  {
- "date": "2023-07-10T18:23:00+0800",
+ "date": "2023-07-11T21:48:28+0800",
  "dirty": false,
  "error": null,
- "full-revisionid": "9427eed6dc1dd9857227a22327368b76d0f990fa",
- "version": "0.0.1"
+ "full-revisionid": "12ed8a3a876dea13fbd61644f49fc49622c2eb26",
+ "version": "0.0.3"
  }
  ''' # END VERSION_JSON

xinference/core/gradio.py
@@ -27,7 +27,9 @@ if TYPE_CHECKING:
  from ..types import ChatCompletionChunk, ChatCompletionMessage

  MODEL_TO_FAMILIES = dict(
- (model_family.model_name, model_family) for model_family in MODEL_FAMILIES
+ (model_family.model_name, model_family)
+ for model_family in MODEL_FAMILIES
+ if model_family.model_name != "baichuan"
  )


@@ -36,7 +38,7 @@ class GradioApp:
  self,
  supervisor_address: str,
  gladiator_num: int = 2,
- max_model_num: int = 2,
+ max_model_num: int = 3,
  use_launched_model: bool = False,
  ):
  self._api = SyncSupervisorAPI(supervisor_address)
@@ -193,7 +195,7 @@ class GradioApp:
  with gr.Column():
  with gr.Row():
  model_name = gr.Dropdown(
- choices=[m.model_name for m in MODEL_FAMILIES],
+ choices=list(MODEL_TO_FAMILIES.keys()),
  label=self._locale("model name"),
  scale=2,
  )
@@ -311,10 +313,11 @@ class GradioApp:
  _model_size_in_billions: str,
  _quantization: str,
  ):
- return _model_name, gr.Chatbot.update(
- label="-".join(
- [_model_name, _model_size_in_billions, _model_format, _quantization]
- ),
+ full_name = "-".join(
+ [_model_name, _model_size_in_billions, _model_format, _quantization]
+ )
+ return str(uuid.uuid4()), gr.Chatbot.update(
+ label=full_name,
  value=[],
  )

xinference/core/model.py
@@ -77,10 +77,8 @@ class ModelActor(xo.Actor):
  return ret

  async def generate(self, prompt: str, *args, **kwargs):
- logger.warning("Generate, self address: %s", self.address)
-
  if not hasattr(self._model, "generate"):
- raise AttributeError("generate")
+ raise AttributeError(f"Model {self._model.model_spec} is not for generate.")

  return self._wrap_generator(
  getattr(self._model, "generate")(prompt, *args, **kwargs)
@@ -88,7 +86,7 @@

  async def chat(self, prompt: str, *args, **kwargs):
  if not hasattr(self._model, "chat"):
- raise AttributeError("chat")
+ raise AttributeError(f"Model {self._model.model_spec} is not for chat.")

  return self._wrap_generator(
  getattr(self._model, "chat")(prompt, *args, **kwargs)
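
The two hunks above replace the bare `AttributeError("generate")` / `AttributeError("chat")` with messages that name the offending model spec. A minimal sketch of that capability check, with a hypothetical stand-in object in place of the actor's `self._model`:

```python
class GenerateOnlyModel:
    """Hypothetical stand-in for a foundation model that only supports `generate`."""

    model_spec = "baichuan-7b-ggmlv3-q4_0"  # illustrative spec string

    def generate(self, prompt: str) -> str:
        return f"completion for: {prompt}"


def chat(model, prompt: str) -> str:
    # Mirror the diff: raise a descriptive error instead of a bare AttributeError("chat").
    if not hasattr(model, "chat"):
        raise AttributeError(f"Model {model.model_spec} is not for chat.")
    return model.chat(prompt)


if __name__ == "__main__":
    try:
        chat(GenerateOnlyModel(), "hello")
    except AttributeError as e:
        print(e)  # -> Model baichuan-7b-ggmlv3-q4_0 is not for chat.
```
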
xinference/core/restful_api.py
@@ -262,9 +262,13 @@ class RESTfulAPIActor(xo.Actor):
  # run uvicorn in another daemon thread.
  config = Config(app=app, log_level="critical")
  server = Server(config)
- server_thread = threading.Thread(
- target=server.run, args=[self._sockets], daemon=True
- )
+
+ def _serve():
+ httpx_logger = logging.getLogger("httpx")
+ httpx_logger.setLevel(logging.CRITICAL)
+ server.run(self._sockets)
+
+ server_thread = threading.Thread(target=_serve, daemon=True)
  server_thread.start()

  async def list_models(self) -> Dict[str, Dict[str, Any]]:
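
The hunk above wraps `server.run` in a small `_serve` helper so the `httpx` logger can be silenced inside the serving thread. A minimal standalone sketch of the same pattern — a uvicorn `Server` running on a pre-bound socket in a daemon thread — assuming a trivial FastAPI app rather than xinference's actual RESTful application:

```python
import logging
import socket
import threading
import time

from fastapi import FastAPI
from uvicorn import Config, Server

app = FastAPI()


@app.get("/health")
def health():
    return {"status": "ok"}


# Pre-bind a socket ourselves, mirroring how the supervisor hands sockets to the actor.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port

config = Config(app=app, log_level="critical")
server = Server(config)


def _serve():
    # Silence noisy client logging before serving, as the diff does, then run uvicorn
    # on the already-bound socket.
    logging.getLogger("httpx").setLevel(logging.CRITICAL)
    server.run(sockets=[sock])


server_thread = threading.Thread(target=_serve, daemon=True)
server_thread.start()
time.sleep(1)  # give the server a moment to start
print("serving on", sock.getsockname())
```
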
xinference/core/service.py
@@ -175,6 +175,7 @@ class SupervisorActor(xo.Actor):

  worker_ref = await xo.actor_ref(address=worker_address, uid=WorkerActor.uid())
  self._worker_address_to_worker[worker_address] = worker_ref
+ logger.info("Worker %s has been added successfully", worker_address)

  async def report_worker_status(
  self, worker_address: str, status: Dict[str, ResourceStatus]
xinference/deploy/supervisor.py
@@ -18,7 +18,9 @@ import socket
  from typing import Dict, Optional

  import xoscar as xo
+ from xoscar.utils import get_next_port

+ from ..constants import XINFERENCE_DEFAULT_ENDPOINT_PORT
  from ..core.gradio import GradioApp
  from ..core.restful_api import RESTfulAPIActor
  from ..core.service import SupervisorActor
@@ -30,10 +32,28 @@ async def start_supervisor_components(address: str, host: str, port: int):
  await xo.create_actor(SupervisorActor, address=address, uid=SupervisorActor.uid())
  gradio_block = GradioApp(address).build()
  # create a socket for RESTful API
- sockets = []
- sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
- sock.bind((host, port))
- sockets.append(sock)
+ try:
+ sockets = []
+ sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+ sock.bind((host, port))
+ sockets.append(sock)
+ except OSError:
+ # compare the reference to differentiate between the cases where the user specify the
+ # default port and the user does not specify the port.
+ if port is XINFERENCE_DEFAULT_ENDPOINT_PORT:
+ while True:
+ try:
+ sockets = []
+ sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+ port = get_next_port()
+ sock.bind((host, port))
+ sockets.append(sock)
+ break
+ except OSError:
+ logger.warning("Failed to create socket with port %d", port)
+ else:
+ raise
+
  restful_actor = await xo.create_actor(
  RESTfulAPIActor,
  address=address,
@@ -43,7 +63,7 @@ async def start_supervisor_components(address: str, host: str, port: int):
  )
  await restful_actor.serve()
  url = f"http://{host}:{port}"
- logger.info(f"Server address: {url}")
+ logger.info(f"Xinference successfully started. Endpoint: {url}")
  return url


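
The hunk above falls back to a fresh port when the default endpoint port is already in use, instead of failing outright. A standalone sketch of that bind-or-retry pattern using only the standard library (the default port value is illustrative, and `sock.bind((host, 0))` stands in for xoscar's `get_next_port`):

```python
import errno
import logging
import socket

logger = logging.getLogger(__name__)

DEFAULT_PORT = 9997  # illustrative stand-in for XINFERENCE_DEFAULT_ENDPOINT_PORT


def bind_rest_socket(host: str, port: int, max_retries: int = 10) -> socket.socket:
    """Bind the requested port; if the default port is busy, retry with an OS-chosen one."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        return sock
    except OSError:
        sock.close()
        if port != DEFAULT_PORT:
            # The caller asked for a specific port, so surface the error.
            raise
    for _ in range(max_retries):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, 0))  # port 0 lets the OS assign a free port
            return sock
        except OSError:
            logger.warning("Failed to create socket, retrying")
            sock.close()
    raise OSError(errno.EADDRINUSE, "could not find a free port to bind")


if __name__ == "__main__":
    s = bind_rest_socket("127.0.0.1", DEFAULT_PORT)
    print("bound to", s.getsockname())
    s.close()
```
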
xinference/deploy/worker.py
@@ -13,12 +13,15 @@
  # limitations under the License.

  import asyncio
+ import logging
  from typing import Dict, Optional

  import xoscar as xo

  from ..core.service import WorkerActor

+ logger = logging.getLogger(__name__)
+

  async def start_worker_components(address: str, supervisor_address: str):
  actor_pool_config = await xo.get_pool_config(address)
@@ -35,6 +38,7 @@ async def start_worker_components(address: str, supervisor_address: str):
  supervisor_address=supervisor_address,
  subpool_addresses=subpool_addresses,  # exclude the main actor pool.
  )
+ logger.info(f"Xinference worker successfully started.")


  async def _start_worker(
xinference/locale/utils.py
@@ -12,6 +12,7 @@
  # See the License for the specific language governing permissions and
  # limitations under the License.

+ import codecs
  import json
  import locale
  import os
@@ -27,7 +28,7 @@ class Locale:
  os.path.dirname(os.path.abspath(__file__)), f"{self._language}.json"
  )
  if os.path.exists(json_path):
- self._mapping = json.load(open(json_path))
+ self._mapping = json.load(codecs.open(json_path, "r", encoding="utf-8"))
  else:
  self._mapping = None

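
The fix above loads the locale JSON with an explicit UTF-8 encoding, so the zh_CN strings added in the next file decode correctly regardless of the platform default. A small sketch of the same lookup pattern with a hypothetical helper class (not the actual `Locale` implementation), using the modern `open(..., encoding="utf-8")` spelling:

```python
import json
import os


class SimpleLocale:
    """Translate UI strings for a language, falling back to the key itself."""

    def __init__(self, language: str, locale_dir: str):
        json_path = os.path.join(locale_dir, f"{language}.json")
        if os.path.exists(json_path):
            # Explicit UTF-8 avoids mojibake on platforms whose default encoding differs.
            with open(json_path, "r", encoding="utf-8") as f:
                self._mapping = json.load(f)
        else:
            self._mapping = None

    def __call__(self, key: str) -> str:
        if self._mapping is None:
            return key
        return self._mapping.get(key, key)


if __name__ == "__main__":
    translate = SimpleLocale("zh_CN", os.path.join("xinference", "locale"))
    print(translate("model name"))  # the translation if the JSON is found, the key otherwise
```
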
xinference/locale/zh_CN.json
@@ -0,0 +1,25 @@
+ {
+ "Please create model first": "请先创建模型",
+ "stop reason": "停止原因",
+ "Show stop reason": "展示停止原因",
+ "Max tokens": "最大 token 数量",
+ "The maximum number of tokens to generate.": "生成 token 数量最大值",
+ "Temperature": "温度参数",
+ "The temperature to use for sampling.": "温度参数用于调整输出的多样性,数值越高多样性越高",
+ "Top P": "Top P",
+ "The top-p value to use for sampling.": "用于控制生成文本的确定性,数值越低确定性越高",
+ "Window size": "窗口大小",
+ "Window size of chat history.": "用于生成回复的聊天历史窗口大小",
+ "show stop reason": "展示停止原因",
+ "Downloading": "下载中",
+ "model name": "模型名",
+ "model format": "模型格式",
+ "model size in billions": "模型大小(B)",
+ "quantization": "模型量化方式",
+ "Parameters": "参数调整",
+ "create": "创建",
+ "select model": "选择模型",
+ "Arena": "角斗场",
+ "Chat": "聊天",
+ "Input": "输入"
+ }
xinference.egg-info/PKG-INFO
@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: xinference
- Version: 0.0.1
+ Version: 0.0.3
  Summary: Model Serving Made Easy
  Home-page: https://github.com/xorbitsai/inference
  Author: Qin Xuye
@@ -19,15 +19,16 @@ Classifier: Programming Language :: Python :: Implementation :: CPython
  Classifier: Topic :: Software Development :: Libraries
  Description-Content-Type: text/markdown
  Provides-Extra: dev
+ Provides-Extra: all
  License-File: LICENSE

  [![PyPI Latest Release](https://img.shields.io/pypi/v/xinference.svg?style=for-the-badge)](https://pypi.org/project/xinference/)
- [![License](https://img.shields.io/pypi/l/inference.svg?style=for-the-badge)](https://github.com/xorbitsai/inference/blob/main/LICENSE)
+ [![License](https://img.shields.io/pypi/l/xinference.svg?style=for-the-badge)](https://github.com/xorbitsai/inference/blob/main/LICENSE)
  [![Build Status](https://img.shields.io/github/actions/workflow/status/xorbitsai/inference/python.yaml?branch=main&style=for-the-badge&label=GITHUB%20ACTIONS&logo=github)](https://actions-badge.atrox.dev/xorbitsai/inference/goto?ref=main)
  [![Slack](https://img.shields.io/badge/join_Slack-781FF5.svg?logo=slack&style=for-the-badge)](https://join.slack.com/t/xorbitsio/shared_invite/zt-1o3z9ucdh-RbfhbPVpx7prOVdM1CAuxg)
  [![Twitter](https://img.shields.io/twitter/follow/xorbitsio?logo=twitter&style=for-the-badge)](https://twitter.com/xorbitsio)

- # Xorbits inference: Model Serving Made Easy 🤖
+ # Xorbits Inference: Model Serving Made Easy 🤖

  Welcome to the Xorbits Inference GitHub repository!

@@ -42,10 +43,16 @@ which is specifically designed to enable large models and high performance on co
  We are actively working on expanding Xorbits Inference's support to include additional runtimes,
  including PyTorch and JAX, in the near future.

+ ![demo](assets/demo.gif)
+
+ <div align="center">
+ <i><a href="https://join.slack.com/t/xorbitsio/shared_invite/zt-1z3zsm9ep-87yI9YZ_B79HLB2ccTq4WA">👉 Join our Slack community!</a></i>
+ </div>
+
  ## Key Features
  🌟 **Model Serving Made Easy**: Inference simplifies the process of serving large language, speech
- recognition, and multimodal models. With a single command, you can set up and deploy your models
- for experimentation and production.
+ recognition, and multimodal models. You can set up and deploy your models
+ for experimentation and production with a single command.

  ⚡️ **State-of-the-Art Models**: Experiment with cutting-edge built-in models using a single
  command. Inference provides access to state-of-the-art open-source models!
@@ -63,43 +70,44 @@ for seamless management and monitoring.
  allowing the seamless distribution of model inference across multiple devices or machines. It
  leverages distributed computing techniques to parallelize and scale the inference process.

- 🔌 **Built-in Integration with Third-Party Libraries**: Xorbits Inference provides seamless
- integration with popular third-party libraries like LangChain and LlamaIndex. (Coming soon)
+ 🔌 **Built-in Integration with Third-Party Libraries**: Xorbits Inference seamlessly integrates
+ with popular third-party libraries like LangChain and LlamaIndex. (Coming soon)

  ## Getting Started
  Xinference can be installed via pip from PyPI. It is highly recommended to create a new virtual
  environment to avoid conflicts.
  ```bash
- $ pip install xinference
+ $ pip install "xinference[all]"
  ```
+ "xinference[all]" installs all the necessary packages for serving models. If you want to achieve acceleration on
+ different hardware, refer to the installation documentation of the corresponding package.
+ - [llama-cpp-python](https://github.com/abetlen/llama-cpp-python#installation-from-pypi-recommended) is required to run `baichuan`, `wizardlm-v1.0`, `vicuna-v1.3` and `orca`.
+ - [chatglm-cpp-python](https://github.com/li-plus/chatglm.cpp#getting-started) is required to run `chatglm` and `chatglm2`.
+

  ### Deployment
- To start a local instance of Xinference, run the following command:
+ You can deploy Xinference locally with a single command or deploy it in a distributed cluster.

+ #### Local
+ To start a local instance of Xinference, run the following command:
  ```bash
- $ xinference -H,--host "localhost" \
- -p,--port 9997 \
- --log-level INFO
+ $ xinference
  ```

- To deploy Xinference in a cluster, you need to start an Xinference supervisor on one server and
+ #### Distributed
+
+ To deploy Xinference in a cluster, you need to start a Xinference supervisor on one server and
  Xinference workers on the other servers. Follow the steps below:

- #### Starting the Supervisor
- On the server where you want to run the Xinference supervisor, run the following command:
+ **Starting the Supervisor**: On the server where you want to run the Xinference supervisor, run the following command:
  ```bash
- $ xinference-supervisor -H,--host "${supervisor_host}" \
- -p,--port 9997 \
- --log-level INFO
+ $ xinference-supervisor -H "${supervisor_host}"
  ```
  Replace `${supervisor_host}` with the actual host of your supervisor server.

- #### Starting the Workers
- On each of the other servers where you want to run Xinference workers, run the following command:
+ **Starting the Workers**: On each of the other servers where you want to run Xinference workers, run the following command:
  ```bash
- $ xinference-worker -e, --endpoint "http://${supervisor_host}:9997" \
- -H,--host "0.0.0.0" \
- --log-level INFO
+ $ xinference-worker -e "http://${supervisor_host}:9997"
  ```

  Once Xinference is running, an endpoint will be accessible for model management via CLI or
@@ -109,7 +117,7 @@ Xinference client.
  - For cluster deployment, the endpoint will be `http://${supervisor_host}:9997`, where
  `${supervisor_host}` is the hostname or IP address of the server where the supervisor is running.

- You can also view a web UI using the Xinference endpoint where you can chat with all the
+ You can also view a web UI using the Xinference endpoint to chat with all the
  builtin models. You can even **chat with two cutting-edge AI models side-by-side to compare
  their performance**!

@@ -177,26 +185,27 @@ To view the builtin models, run the following command:
  $ xinference list --all
  ```

- | Name | Format | Size (in billions) | Quantization |
- | -------------------- | ------- | ------------------ |--------------------------------------------------------------------------------------------------------------------------------|
- | baichuan | ggmlv3 | [7] | ['q2_K', 'q3_K_L', 'q3_K_M', 'q3_K_S', 'q4_0', 'q4_1', 'q4_K_M', 'q4_K_S', 'q5_0', 'q5_1', 'q5_K_M', 'q5_K_S', 'q6_K', 'q8_0'] |
- | wizardlm-v1.0 | ggmlv3 | [7, 13, 33] | ['q2_K', 'q3_K_L', 'q3_K_M', 'q3_K_S', 'q4_0', 'q4_1', 'q4_K_M', 'q4_K_S', 'q5_0', 'q5_1', 'q5_K_M', 'q5_K_S', 'q6_K', 'q8_0'] |
- | vicuna-v1.3 | ggmlv3 | [7, 13] | ['q2_K', 'q3_K_L', 'q3_K_M', 'q3_K_S', 'q4_0', 'q4_1', 'q4_K_M', 'q4_K_S', 'q5_0', 'q5_1', 'q5_K_M', 'q5_K_S', 'q6_K', 'q8_0'] |
- | orca | ggmlv3 | [3, 7, 13] | ['q4_0', 'q4_1', 'q5_0', 'q5_1', 'q8_0'] |
- | chatglm | ggmlv3 | [6] | ['q4_0', 'q4_1', 'q5_0', 'q5_1', 'q8_0'] |
- | chatglm2 | ggmlv3 | [6] | ['q4_0', 'q4_1', 'q5_0', 'q5_1', 'q8_0'] |
+ | Name | Type | Language | Format | Size (in billions) | Quantization |
+ | -------------------- |------------------|----------|--------|--------------------|----------------------------------------|
+ | baichuan | Foundation Model | en, zh | ggmlv3 | 7 | 'q2_K', 'q3_K_L', ... , 'q6_K', 'q8_0' |
+ | chatglm | SFT Model | en, zh | ggmlv3 | 6 | 'q4_0', 'q4_1', 'q5_0', 'q5_1', 'q8_0' |
+ | chatglm2 | SFT Model | en, zh | ggmlv3 | 6 | 'q4_0', 'q4_1', 'q5_0', 'q5_1', 'q8_0' |
+ | wizardlm-v1.0 | SFT Model | en | ggmlv3 | 7, 13, 33 | 'q2_K', 'q3_K_L', ... , 'q6_K', 'q8_0' |
+ | vicuna-v1.3 | SFT Model | en | ggmlv3 | 7, 13 | 'q2_K', 'q3_K_L', ... , 'q6_K', 'q8_0' |
+ | orca | SFT Model | en | ggmlv3 | 3, 7, 13 | 'q4_0', 'q4_1', 'q5_0', 'q5_1', 'q8_0' |
+

  **NOTE**:
- - [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) is required to run `baichuan`, `wizardlm-v1.0`, `vicuna-v1.3` and `orca`.
- - [chatglm-cpp-python](https://github.com/li-plus/chatglm.cpp) is required to run `chatglm` and `chatglm2`.
- - Xinference will download models automatically for you and by default the models will be saved under `${USER}/.xinference/cache`.
+ - Xinference will download models automatically for you, and by default the models will be saved under `${USER}/.xinference/cache`.
+ - Foundation models only provide interface `generate`.
+ - SFT models provide both `generate` and `chat`.

  ## Roadmap
  Xinference is currently under active development. Here's a roadmap outlining our planned
  developments for the next few weeks:

  ### PyTorch Support
- With PyTorch integration, users will be able to seamlessly utilize PyTorch models form huggingface
+ With PyTorch integration, users will be able to seamlessly utilize PyTorch models from Hugging Face
  within Xinference.

  ### Langchain & LlamaIndex integration
xinference.egg-info/SOURCES.txt
@@ -34,6 +34,7 @@ xinference/deploy/worker.py
  xinference/deploy/test/__init__.py
  xinference/locale/__init__.py
  xinference/locale/utils.py
+ xinference/locale/zh_CN.json
  xinference/model/__init__.py
  xinference/model/llm/__init__.py
  xinference/model/llm/chatglm.py
xinference.egg-info/requires.txt
@@ -5,6 +5,13 @@ click
  tqdm
  tabulate
  requests
+ pydantic
+ fastapi
+ uvicorn
+
+ [all]
+ chatglm-cpp
+ llama-cpp-python

  [dev]
  cython>=0.29