xinference 0.0.1__tar.gz → 0.0.3__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release: this version of xinference might be problematic.

Files changed (45)
  1. {xinference-0.0.1 → xinference-0.0.3}/MANIFEST.in +1 -0
  2. {xinference-0.0.1/xinference.egg-info → xinference-0.0.3}/PKG-INFO +45 -36
  3. {xinference-0.0.1 → xinference-0.0.3}/README.md +43 -35
  4. {xinference-0.0.1 → xinference-0.0.3}/setup.cfg +6 -0
  5. {xinference-0.0.1 → xinference-0.0.3}/xinference/_version.py +3 -3
  6. {xinference-0.0.1 → xinference-0.0.3}/xinference/core/gradio.py +10 -7
  7. {xinference-0.0.1 → xinference-0.0.3}/xinference/core/model.py +2 -4
  8. {xinference-0.0.1 → xinference-0.0.3}/xinference/core/restful_api.py +7 -3
  9. {xinference-0.0.1 → xinference-0.0.3}/xinference/core/service.py +1 -0
  10. {xinference-0.0.1 → xinference-0.0.3}/xinference/deploy/supervisor.py +25 -5
  11. {xinference-0.0.1 → xinference-0.0.3}/xinference/deploy/worker.py +4 -0
  12. {xinference-0.0.1 → xinference-0.0.3}/xinference/locale/utils.py +2 -1
  13. xinference-0.0.3/xinference/locale/zh_CN.json +25 -0
  14. {xinference-0.0.1 → xinference-0.0.3/xinference.egg-info}/PKG-INFO +45 -36
  15. {xinference-0.0.1 → xinference-0.0.3}/xinference.egg-info/SOURCES.txt +1 -0
  16. {xinference-0.0.1 → xinference-0.0.3}/xinference.egg-info/requires.txt +7 -0
  17. {xinference-0.0.1 → xinference-0.0.3}/LICENSE +0 -0
  18. {xinference-0.0.1 → xinference-0.0.3}/pyproject.toml +0 -0
  19. {xinference-0.0.1 → xinference-0.0.3}/setup.py +0 -0
  20. {xinference-0.0.1 → xinference-0.0.3}/versioneer.py +0 -0
  21. {xinference-0.0.1 → xinference-0.0.3}/xinference/__init__.py +0 -0
  22. {xinference-0.0.1 → xinference-0.0.3}/xinference/client.py +0 -0
  23. {xinference-0.0.1 → xinference-0.0.3}/xinference/constants.py +0 -0
  24. {xinference-0.0.1 → xinference-0.0.3}/xinference/core/__init__.py +0 -0
  25. {xinference-0.0.1 → xinference-0.0.3}/xinference/core/api.py +0 -0
  26. {xinference-0.0.1 → xinference-0.0.3}/xinference/core/resource.py +0 -0
  27. {xinference-0.0.1 → xinference-0.0.3}/xinference/deploy/__init__.py +0 -0
  28. {xinference-0.0.1 → xinference-0.0.3}/xinference/deploy/cmdline.py +0 -0
  29. {xinference-0.0.1 → xinference-0.0.3}/xinference/deploy/local.py +0 -0
  30. {xinference-0.0.1 → xinference-0.0.3}/xinference/deploy/test/__init__.py +0 -0
  31. {xinference-0.0.1 → xinference-0.0.3}/xinference/deploy/utils.py +0 -0
  32. {xinference-0.0.1 → xinference-0.0.3}/xinference/isolation.py +0 -0
  33. {xinference-0.0.1 → xinference-0.0.3}/xinference/locale/__init__.py +0 -0
  34. {xinference-0.0.1 → xinference-0.0.3}/xinference/model/__init__.py +0 -0
  35. {xinference-0.0.1 → xinference-0.0.3}/xinference/model/llm/__init__.py +0 -0
  36. {xinference-0.0.1 → xinference-0.0.3}/xinference/model/llm/chatglm.py +0 -0
  37. {xinference-0.0.1 → xinference-0.0.3}/xinference/model/llm/core.py +0 -0
  38. {xinference-0.0.1 → xinference-0.0.3}/xinference/model/llm/orca.py +0 -0
  39. {xinference-0.0.1 → xinference-0.0.3}/xinference/model/llm/vicuna.py +0 -0
  40. {xinference-0.0.1 → xinference-0.0.3}/xinference/model/llm/wizardlm.py +0 -0
  41. {xinference-0.0.1 → xinference-0.0.3}/xinference/types.py +0 -0
  42. {xinference-0.0.1 → xinference-0.0.3}/xinference.egg-info/dependency_links.txt +0 -0
  43. {xinference-0.0.1 → xinference-0.0.3}/xinference.egg-info/entry_points.txt +0 -0
  44. {xinference-0.0.1 → xinference-0.0.3}/xinference.egg-info/not-zip-safe +0 -0
  45. {xinference-0.0.1 → xinference-0.0.3}/xinference.egg-info/top_level.txt +0 -0
MANIFEST.in
@@ -8,3 +8,4 @@ global-exclude .DS_Store
  include versioneer.py
  include xinference/_version.py
  global-exclude conftest.py
+ include xinference/locale/*.json
PKG-INFO
@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: xinference
- Version: 0.0.1
+ Version: 0.0.3
  Summary: Model Serving Made Easy
  Home-page: https://github.com/xorbitsai/inference
  Author: Qin Xuye
@@ -19,15 +19,16 @@ Classifier: Programming Language :: Python :: Implementation :: CPython
  Classifier: Topic :: Software Development :: Libraries
  Description-Content-Type: text/markdown
  Provides-Extra: dev
+ Provides-Extra: all
  License-File: LICENSE

  [![PyPI Latest Release](https://img.shields.io/pypi/v/xinference.svg?style=for-the-badge)](https://pypi.org/project/xinference/)
- [![License](https://img.shields.io/pypi/l/inference.svg?style=for-the-badge)](https://github.com/xorbitsai/inference/blob/main/LICENSE)
+ [![License](https://img.shields.io/pypi/l/xinference.svg?style=for-the-badge)](https://github.com/xorbitsai/inference/blob/main/LICENSE)
  [![Build Status](https://img.shields.io/github/actions/workflow/status/xorbitsai/inference/python.yaml?branch=main&style=for-the-badge&label=GITHUB%20ACTIONS&logo=github)](https://actions-badge.atrox.dev/xorbitsai/inference/goto?ref=main)
  [![Slack](https://img.shields.io/badge/join_Slack-781FF5.svg?logo=slack&style=for-the-badge)](https://join.slack.com/t/xorbitsio/shared_invite/zt-1o3z9ucdh-RbfhbPVpx7prOVdM1CAuxg)
  [![Twitter](https://img.shields.io/twitter/follow/xorbitsio?logo=twitter&style=for-the-badge)](https://twitter.com/xorbitsio)

- # Xorbits inference: Model Serving Made Easy 🤖
+ # Xorbits Inference: Model Serving Made Easy 🤖

  Welcome to the Xorbits Inference GitHub repository!

@@ -42,10 +43,16 @@ which is specifically designed to enable large models and high performance on co
  We are actively working on expanding Xorbits Inference's support to include additional runtimes,
  including PyTorch and JAX, in the near future.

+ ![demo](assets/demo.gif)
+
+ <div align="center">
+ <i><a href="https://join.slack.com/t/xorbitsio/shared_invite/zt-1z3zsm9ep-87yI9YZ_B79HLB2ccTq4WA">👉 Join our Slack community!</a></i>
+ </div>
+
  ## Key Features
  🌟 **Model Serving Made Easy**: Inference simplifies the process of serving large language, speech
- recognition, and multimodal models. With a single command, you can set up and deploy your models
- for experimentation and production.
+ recognition, and multimodal models. You can set up and deploy your models
+ for experimentation and production with a single command.

  ⚡️ **State-of-the-Art Models**: Experiment with cutting-edge built-in models using a single
  command. Inference provides access to state-of-the-art open-source models!
@@ -63,43 +70,44 @@ for seamless management and monitoring.
  allowing the seamless distribution of model inference across multiple devices or machines. It
  leverages distributed computing techniques to parallelize and scale the inference process.

- 🔌 **Built-in Integration with Third-Party Libraries**: Xorbits Inference provides seamless
- integration with popular third-party libraries like LangChain and LlamaIndex. (Coming soon)
+ 🔌 **Built-in Integration with Third-Party Libraries**: Xorbits Inference seamlessly integrates
+ with popular third-party libraries like LangChain and LlamaIndex. (Coming soon)

  ## Getting Started
  Xinference can be installed via pip from PyPI. It is highly recommended to create a new virtual
  environment to avoid conflicts.
  ```bash
- $ pip install xinference
+ $ pip install "xinference[all]"
  ```
+ "xinference[all]" installs all the necessary packages for serving models. If you want to achieve acceleration on
+ different hardware, refer to the installation documentation of the corresponding package.
+ - [llama-cpp-python](https://github.com/abetlen/llama-cpp-python#installation-from-pypi-recommended) is required to run `baichuan`, `wizardlm-v1.0`, `vicuna-v1.3` and `orca`.
+ - [chatglm-cpp-python](https://github.com/li-plus/chatglm.cpp#getting-started) is required to run `chatglm` and `chatglm2`.
+

  ### Deployment
- To start a local instance of Xinference, run the following command:
+ You can deploy Xinference locally with a single command or deploy it in a distributed cluster.

+ #### Local
+ To start a local instance of Xinference, run the following command:
  ```bash
- $ xinference -H,--host "localhost" \
- -p,--port 9997 \
- --log-level INFO
+ $ xinference
  ```

- To deploy Xinference in a cluster, you need to start an Xinference supervisor on one server and
+ #### Distributed
+
+ To deploy Xinference in a cluster, you need to start a Xinference supervisor on one server and
  Xinference workers on the other servers. Follow the steps below:

- #### Starting the Supervisor
- On the server where you want to run the Xinference supervisor, run the following command:
+ **Starting the Supervisor**: On the server where you want to run the Xinference supervisor, run the following command:
  ```bash
- $ xinference-supervisor -H,--host "${supervisor_host}" \
- -p,--port 9997 \
- --log-level INFO
+ $ xinference-supervisor -H "${supervisor_host}"
  ```
  Replace `${supervisor_host}` with the actual host of your supervisor server.

- #### Starting the Workers
- On each of the other servers where you want to run Xinference workers, run the following command:
+ **Starting the Workers**: On each of the other servers where you want to run Xinference workers, run the following command:
  ```bash
- $ xinference-worker -e, --endpoint "http://${supervisor_host}:9997" \
- -H,--host "0.0.0.0" \
- --log-level INFO
+ $ xinference-worker -e "http://${supervisor_host}:9997"
  ```

  Once Xinference is running, an endpoint will be accessible for model management via CLI or
@@ -109,7 +117,7 @@ Xinference client.
  - For cluster deployment, the endpoint will be `http://${supervisor_host}:9997`, where
  `${supervisor_host}` is the hostname or IP address of the server where the supervisor is running.

- You can also view a web UI using the Xinference endpoint where you can chat with all the
+ You can also view a web UI using the Xinference endpoint to chat with all the
  builtin models. You can even **chat with two cutting-edge AI models side-by-side to compare
  their performance**!

@@ -177,26 +185,27 @@ To view the builtin models, run the following command:
  $ xinference list --all
  ```

- | Name | Format | Size (in billions) | Quantization |
- | -------------------- | ------- | ------------------ |--------------------------------------------------------------------------------------------------------------------------------|
- | baichuan | ggmlv3 | [7] | ['q2_K', 'q3_K_L', 'q3_K_M', 'q3_K_S', 'q4_0', 'q4_1', 'q4_K_M', 'q4_K_S', 'q5_0', 'q5_1', 'q5_K_M', 'q5_K_S', 'q6_K', 'q8_0'] |
- | wizardlm-v1.0 | ggmlv3 | [7, 13, 33] | ['q2_K', 'q3_K_L', 'q3_K_M', 'q3_K_S', 'q4_0', 'q4_1', 'q4_K_M', 'q4_K_S', 'q5_0', 'q5_1', 'q5_K_M', 'q5_K_S', 'q6_K', 'q8_0'] |
- | vicuna-v1.3 | ggmlv3 | [7, 13] | ['q2_K', 'q3_K_L', 'q3_K_M', 'q3_K_S', 'q4_0', 'q4_1', 'q4_K_M', 'q4_K_S', 'q5_0', 'q5_1', 'q5_K_M', 'q5_K_S', 'q6_K', 'q8_0'] |
- | orca | ggmlv3 | [3, 7, 13] | ['q4_0', 'q4_1', 'q5_0', 'q5_1', 'q8_0'] |
- | chatglm | ggmlv3 | [6] | ['q4_0', 'q4_1', 'q5_0', 'q5_1', 'q8_0'] |
- | chatglm2 | ggmlv3 | [6] | ['q4_0', 'q4_1', 'q5_0', 'q5_1', 'q8_0'] |
+ | Name | Type | Language | Format | Size (in billions) | Quantization |
+ | -------------------- |------------------|----------|--------|--------------------|----------------------------------------|
+ | baichuan | Foundation Model | en, zh | ggmlv3 | 7 | 'q2_K', 'q3_K_L', ... , 'q6_K', 'q8_0' |
+ | chatglm | SFT Model | en, zh | ggmlv3 | 6 | 'q4_0', 'q4_1', 'q5_0', 'q5_1', 'q8_0' |
+ | chatglm2 | SFT Model | en, zh | ggmlv3 | 6 | 'q4_0', 'q4_1', 'q5_0', 'q5_1', 'q8_0' |
+ | wizardlm-v1.0 | SFT Model | en | ggmlv3 | 7, 13, 33 | 'q2_K', 'q3_K_L', ... , 'q6_K', 'q8_0' |
+ | vicuna-v1.3 | SFT Model | en | ggmlv3 | 7, 13 | 'q2_K', 'q3_K_L', ... , 'q6_K', 'q8_0' |
+ | orca | SFT Model | en | ggmlv3 | 3, 7, 13 | 'q4_0', 'q4_1', 'q5_0', 'q5_1', 'q8_0' |
+

  **NOTE**:
- - [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) is required to run `baichuan`, `wizardlm-v1.0`, `vicuna-v1.3` and `orca`.
- - [chatglm-cpp-python](https://github.com/li-plus/chatglm.cpp) is required to run `chatglm` and `chatglm2`.
- - Xinference will download models automatically for you and by default the models will be saved under `${USER}/.xinference/cache`.
+ - Xinference will download models automatically for you, and by default the models will be saved under `${USER}/.xinference/cache`.
+ - Foundation models only provide interface `generate`.
+ - SFT models provide both `generate` and `chat`.

  ## Roadmap
  Xinference is currently under active development. Here's a roadmap outlining our planned
  developments for the next few weeks:

  ### PyTorch Support
- With PyTorch integration, users will be able to seamlessly utilize PyTorch models form huggingface
+ With PyTorch integration, users will be able to seamlessly utilize PyTorch models from Hugging Face
  within Xinference.

  ### Langchain & LlamaIndex integration
README.md
@@ -1,10 +1,10 @@
  [![PyPI Latest Release](https://img.shields.io/pypi/v/xinference.svg?style=for-the-badge)](https://pypi.org/project/xinference/)
- [![License](https://img.shields.io/pypi/l/inference.svg?style=for-the-badge)](https://github.com/xorbitsai/inference/blob/main/LICENSE)
+ [![License](https://img.shields.io/pypi/l/xinference.svg?style=for-the-badge)](https://github.com/xorbitsai/inference/blob/main/LICENSE)
  [![Build Status](https://img.shields.io/github/actions/workflow/status/xorbitsai/inference/python.yaml?branch=main&style=for-the-badge&label=GITHUB%20ACTIONS&logo=github)](https://actions-badge.atrox.dev/xorbitsai/inference/goto?ref=main)
  [![Slack](https://img.shields.io/badge/join_Slack-781FF5.svg?logo=slack&style=for-the-badge)](https://join.slack.com/t/xorbitsio/shared_invite/zt-1o3z9ucdh-RbfhbPVpx7prOVdM1CAuxg)
  [![Twitter](https://img.shields.io/twitter/follow/xorbitsio?logo=twitter&style=for-the-badge)](https://twitter.com/xorbitsio)

- # Xorbits inference: Model Serving Made Easy 🤖
+ # Xorbits Inference: Model Serving Made Easy 🤖

  Welcome to the Xorbits Inference GitHub repository!

@@ -19,10 +19,16 @@ which is specifically designed to enable large models and high performance on co
  We are actively working on expanding Xorbits Inference's support to include additional runtimes,
  including PyTorch and JAX, in the near future.

+ ![demo](assets/demo.gif)
+
+ <div align="center">
+ <i><a href="https://join.slack.com/t/xorbitsio/shared_invite/zt-1z3zsm9ep-87yI9YZ_B79HLB2ccTq4WA">👉 Join our Slack community!</a></i>
+ </div>
+
  ## Key Features
  🌟 **Model Serving Made Easy**: Inference simplifies the process of serving large language, speech
- recognition, and multimodal models. With a single command, you can set up and deploy your models
- for experimentation and production.
+ recognition, and multimodal models. You can set up and deploy your models
+ for experimentation and production with a single command.

  ⚡️ **State-of-the-Art Models**: Experiment with cutting-edge built-in models using a single
  command. Inference provides access to state-of-the-art open-source models!
@@ -40,43 +46,44 @@ for seamless management and monitoring.
  allowing the seamless distribution of model inference across multiple devices or machines. It
  leverages distributed computing techniques to parallelize and scale the inference process.

- 🔌 **Built-in Integration with Third-Party Libraries**: Xorbits Inference provides seamless
- integration with popular third-party libraries like LangChain and LlamaIndex. (Coming soon)
+ 🔌 **Built-in Integration with Third-Party Libraries**: Xorbits Inference seamlessly integrates
+ with popular third-party libraries like LangChain and LlamaIndex. (Coming soon)

  ## Getting Started
  Xinference can be installed via pip from PyPI. It is highly recommended to create a new virtual
  environment to avoid conflicts.
  ```bash
- $ pip install xinference
+ $ pip install "xinference[all]"
  ```
+ "xinference[all]" installs all the necessary packages for serving models. If you want to achieve acceleration on
+ different hardware, refer to the installation documentation of the corresponding package.
+ - [llama-cpp-python](https://github.com/abetlen/llama-cpp-python#installation-from-pypi-recommended) is required to run `baichuan`, `wizardlm-v1.0`, `vicuna-v1.3` and `orca`.
+ - [chatglm-cpp-python](https://github.com/li-plus/chatglm.cpp#getting-started) is required to run `chatglm` and `chatglm2`.
+

  ### Deployment
- To start a local instance of Xinference, run the following command:
+ You can deploy Xinference locally with a single command or deploy it in a distributed cluster.

+ #### Local
+ To start a local instance of Xinference, run the following command:
  ```bash
- $ xinference -H,--host "localhost" \
- -p,--port 9997 \
- --log-level INFO
+ $ xinference
  ```

- To deploy Xinference in a cluster, you need to start an Xinference supervisor on one server and
+ #### Distributed
+
+ To deploy Xinference in a cluster, you need to start a Xinference supervisor on one server and
  Xinference workers on the other servers. Follow the steps below:

- #### Starting the Supervisor
- On the server where you want to run the Xinference supervisor, run the following command:
+ **Starting the Supervisor**: On the server where you want to run the Xinference supervisor, run the following command:
  ```bash
- $ xinference-supervisor -H,--host "${supervisor_host}" \
- -p,--port 9997 \
- --log-level INFO
+ $ xinference-supervisor -H "${supervisor_host}"
  ```
  Replace `${supervisor_host}` with the actual host of your supervisor server.

- #### Starting the Workers
- On each of the other servers where you want to run Xinference workers, run the following command:
+ **Starting the Workers**: On each of the other servers where you want to run Xinference workers, run the following command:
  ```bash
- $ xinference-worker -e, --endpoint "http://${supervisor_host}:9997" \
- -H,--host "0.0.0.0" \
- --log-level INFO
+ $ xinference-worker -e "http://${supervisor_host}:9997"
  ```

  Once Xinference is running, an endpoint will be accessible for model management via CLI or
@@ -86,7 +93,7 @@ Xinference client.
  - For cluster deployment, the endpoint will be `http://${supervisor_host}:9997`, where
  `${supervisor_host}` is the hostname or IP address of the server where the supervisor is running.

- You can also view a web UI using the Xinference endpoint where you can chat with all the
+ You can also view a web UI using the Xinference endpoint to chat with all the
  builtin models. You can even **chat with two cutting-edge AI models side-by-side to compare
  their performance**!

@@ -154,26 +161,27 @@ To view the builtin models, run the following command:
  $ xinference list --all
  ```

- | Name | Format | Size (in billions) | Quantization |
- | -------------------- | ------- | ------------------ |--------------------------------------------------------------------------------------------------------------------------------|
- | baichuan | ggmlv3 | [7] | ['q2_K', 'q3_K_L', 'q3_K_M', 'q3_K_S', 'q4_0', 'q4_1', 'q4_K_M', 'q4_K_S', 'q5_0', 'q5_1', 'q5_K_M', 'q5_K_S', 'q6_K', 'q8_0'] |
- | wizardlm-v1.0 | ggmlv3 | [7, 13, 33] | ['q2_K', 'q3_K_L', 'q3_K_M', 'q3_K_S', 'q4_0', 'q4_1', 'q4_K_M', 'q4_K_S', 'q5_0', 'q5_1', 'q5_K_M', 'q5_K_S', 'q6_K', 'q8_0'] |
- | vicuna-v1.3 | ggmlv3 | [7, 13] | ['q2_K', 'q3_K_L', 'q3_K_M', 'q3_K_S', 'q4_0', 'q4_1', 'q4_K_M', 'q4_K_S', 'q5_0', 'q5_1', 'q5_K_M', 'q5_K_S', 'q6_K', 'q8_0'] |
- | orca | ggmlv3 | [3, 7, 13] | ['q4_0', 'q4_1', 'q5_0', 'q5_1', 'q8_0'] |
- | chatglm | ggmlv3 | [6] | ['q4_0', 'q4_1', 'q5_0', 'q5_1', 'q8_0'] |
- | chatglm2 | ggmlv3 | [6] | ['q4_0', 'q4_1', 'q5_0', 'q5_1', 'q8_0'] |
+ | Name | Type | Language | Format | Size (in billions) | Quantization |
+ | -------------------- |------------------|----------|--------|--------------------|----------------------------------------|
+ | baichuan | Foundation Model | en, zh | ggmlv3 | 7 | 'q2_K', 'q3_K_L', ... , 'q6_K', 'q8_0' |
+ | chatglm | SFT Model | en, zh | ggmlv3 | 6 | 'q4_0', 'q4_1', 'q5_0', 'q5_1', 'q8_0' |
+ | chatglm2 | SFT Model | en, zh | ggmlv3 | 6 | 'q4_0', 'q4_1', 'q5_0', 'q5_1', 'q8_0' |
+ | wizardlm-v1.0 | SFT Model | en | ggmlv3 | 7, 13, 33 | 'q2_K', 'q3_K_L', ... , 'q6_K', 'q8_0' |
+ | vicuna-v1.3 | SFT Model | en | ggmlv3 | 7, 13 | 'q2_K', 'q3_K_L', ... , 'q6_K', 'q8_0' |
+ | orca | SFT Model | en | ggmlv3 | 3, 7, 13 | 'q4_0', 'q4_1', 'q5_0', 'q5_1', 'q8_0' |
+

  **NOTE**:
- - [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) is required to run `baichuan`, `wizardlm-v1.0`, `vicuna-v1.3` and `orca`.
- - [chatglm-cpp-python](https://github.com/li-plus/chatglm.cpp) is required to run `chatglm` and `chatglm2`.
- - Xinference will download models automatically for you and by default the models will be saved under `${USER}/.xinference/cache`.
+ - Xinference will download models automatically for you, and by default the models will be saved under `${USER}/.xinference/cache`.
+ - Foundation models only provide interface `generate`.
+ - SFT models provide both `generate` and `chat`.

  ## Roadmap
  Xinference is currently under active development. Here's a roadmap outlining our planned
  developments for the next few weeks:

  ### PyTorch Support
- With PyTorch integration, users will be able to seamlessly utilize PyTorch models form huggingface
+ With PyTorch integration, users will be able to seamlessly utilize PyTorch models from Hugging Face
  within Xinference.

  ### Langchain & LlamaIndex integration
setup.cfg
@@ -31,6 +31,9 @@ install_requires =
  tqdm
  tabulate
  requests
+ pydantic
+ fastapi
+ uvicorn

  [options.packages.find]
  exclude =
@@ -48,6 +51,9 @@ dev =
  pytest-asyncio>=0.14.0
  flake8>=3.8.0
  black
+ all =
+ chatglm-cpp
+ llama-cpp-python

  [options.entry_points]
  console_scripts =
xinference/_version.py
@@ -8,11 +8,11 @@ import json

  version_json = '''
  {
- "date": "2023-07-10T18:23:00+0800",
+ "date": "2023-07-11T21:48:28+0800",
  "dirty": false,
  "error": null,
- "full-revisionid": "9427eed6dc1dd9857227a22327368b76d0f990fa",
- "version": "0.0.1"
+ "full-revisionid": "12ed8a3a876dea13fbd61644f49fc49622c2eb26",
+ "version": "0.0.3"
  }
  ''' # END VERSION_JSON

xinference/core/gradio.py
@@ -27,7 +27,9 @@ if TYPE_CHECKING:
  from ..types import ChatCompletionChunk, ChatCompletionMessage

  MODEL_TO_FAMILIES = dict(
- (model_family.model_name, model_family) for model_family in MODEL_FAMILIES
+ (model_family.model_name, model_family)
+ for model_family in MODEL_FAMILIES
+ if model_family.model_name != "baichuan"
  )


@@ -36,7 +38,7 @@ class GradioApp:
  self,
  supervisor_address: str,
  gladiator_num: int = 2,
- max_model_num: int = 2,
+ max_model_num: int = 3,
  use_launched_model: bool = False,
  ):
  self._api = SyncSupervisorAPI(supervisor_address)
@@ -193,7 +195,7 @@ class GradioApp:
  with gr.Column():
  with gr.Row():
  model_name = gr.Dropdown(
- choices=[m.model_name for m in MODEL_FAMILIES],
+ choices=list(MODEL_TO_FAMILIES.keys()),
  label=self._locale("model name"),
  scale=2,
  )
@@ -311,10 +313,11 @@ class GradioApp:
  _model_size_in_billions: str,
  _quantization: str,
  ):
- return _model_name, gr.Chatbot.update(
- label="-".join(
- [_model_name, _model_size_in_billions, _model_format, _quantization]
- ),
+ full_name = "-".join(
+ [_model_name, _model_size_in_billions, _model_format, _quantization]
+ )
+ return str(uuid.uuid4()), gr.Chatbot.update(
+ label=full_name,
  value=[],
  )

xinference/core/model.py
@@ -77,10 +77,8 @@ class ModelActor(xo.Actor):
  return ret

  async def generate(self, prompt: str, *args, **kwargs):
- logger.warning("Generate, self address: %s", self.address)
-
  if not hasattr(self._model, "generate"):
- raise AttributeError("generate")
+ raise AttributeError(f"Model {self._model.model_spec} is not for generate.")

  return self._wrap_generator(
  getattr(self._model, "generate")(prompt, *args, **kwargs)
@@ -88,7 +86,7 @@

  async def chat(self, prompt: str, *args, **kwargs):
  if not hasattr(self._model, "chat"):
- raise AttributeError("chat")
+ raise AttributeError(f"Model {self._model.model_spec} is not for chat.")

  return self._wrap_generator(
  getattr(self._model, "chat")(prompt, *args, **kwargs)
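
The two hunks above replace the bare `AttributeError("generate")` / `AttributeError("chat")` with messages that name the offending model spec. A minimal sketch of that capability check, with a hypothetical stand-in object in place of the actor's `self._model`:

```python
class GenerateOnlyModel:
    """Hypothetical stand-in for a foundation model that only supports `generate`."""

    model_spec = "baichuan-7b-ggmlv3-q4_0"  # illustrative spec string

    def generate(self, prompt: str) -> str:
        return f"completion for: {prompt}"


def chat(model, prompt: str) -> str:
    # Mirror the diff: raise a descriptive error instead of a bare AttributeError("chat").
    if not hasattr(model, "chat"):
        raise AttributeError(f"Model {model.model_spec} is not for chat.")
    return model.chat(prompt)


if __name__ == "__main__":
    try:
        chat(GenerateOnlyModel(), "hello")
    except AttributeError as e:
        print(e)  # -> Model baichuan-7b-ggmlv3-q4_0 is not for chat.
```
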
xinference/core/restful_api.py
@@ -262,9 +262,13 @@ class RESTfulAPIActor(xo.Actor):
  # run uvicorn in another daemon thread.
  config = Config(app=app, log_level="critical")
  server = Server(config)
- server_thread = threading.Thread(
- target=server.run, args=[self._sockets], daemon=True
- )
+
+ def _serve():
+ httpx_logger = logging.getLogger("httpx")
+ httpx_logger.setLevel(logging.CRITICAL)
+ server.run(self._sockets)
+
+ server_thread = threading.Thread(target=_serve, daemon=True)
  server_thread.start()

  async def list_models(self) -> Dict[str, Dict[str, Any]]:
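
The hunk above wraps `server.run` in a small `_serve` helper so the `httpx` logger can be silenced inside the serving thread. A minimal standalone sketch of the same pattern — a uvicorn `Server` running on a pre-bound socket in a daemon thread — assuming a trivial FastAPI app rather than xinference's actual RESTful application:

```python
import logging
import socket
import threading
import time

from fastapi import FastAPI
from uvicorn import Config, Server

app = FastAPI()


@app.get("/health")
def health():
    return {"status": "ok"}


# Pre-bind a socket ourselves, mirroring how the supervisor hands sockets to the actor.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port

config = Config(app=app, log_level="critical")
server = Server(config)


def _serve():
    # Silence noisy client logging before serving, as the diff does, then run uvicorn
    # on the already-bound socket.
    logging.getLogger("httpx").setLevel(logging.CRITICAL)
    server.run(sockets=[sock])


server_thread = threading.Thread(target=_serve, daemon=True)
server_thread.start()
time.sleep(1)  # give the server a moment to start
print("serving on", sock.getsockname())
```
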
xinference/core/service.py
@@ -175,6 +175,7 @@ class SupervisorActor(xo.Actor):

  worker_ref = await xo.actor_ref(address=worker_address, uid=WorkerActor.uid())
  self._worker_address_to_worker[worker_address] = worker_ref
+ logger.info("Worker %s has been added successfully", worker_address)

  async def report_worker_status(
  self, worker_address: str, status: Dict[str, ResourceStatus]
xinference/deploy/supervisor.py
@@ -18,7 +18,9 @@ import socket
  from typing import Dict, Optional

  import xoscar as xo
+ from xoscar.utils import get_next_port

+ from ..constants import XINFERENCE_DEFAULT_ENDPOINT_PORT
  from ..core.gradio import GradioApp
  from ..core.restful_api import RESTfulAPIActor
  from ..core.service import SupervisorActor
@@ -30,10 +32,28 @@ async def start_supervisor_components(address: str, host: str, port: int):
  await xo.create_actor(SupervisorActor, address=address, uid=SupervisorActor.uid())
  gradio_block = GradioApp(address).build()
  # create a socket for RESTful API
- sockets = []
- sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
- sock.bind((host, port))
- sockets.append(sock)
+ try:
+ sockets = []
+ sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+ sock.bind((host, port))
+ sockets.append(sock)
+ except OSError:
+ # compare the reference to differentiate between the cases where the user specify the
+ # default port and the user does not specify the port.
+ if port is XINFERENCE_DEFAULT_ENDPOINT_PORT:
+ while True:
+ try:
+ sockets = []
+ sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+ port = get_next_port()
+ sock.bind((host, port))
+ sockets.append(sock)
+ break
+ except OSError:
+ logger.warning("Failed to create socket with port %d", port)
+ else:
+ raise
+
  restful_actor = await xo.create_actor(
  RESTfulAPIActor,
  address=address,
@@ -43,7 +63,7 @@ async def start_supervisor_components(address: str, host: str, port: int):
  )
  await restful_actor.serve()
  url = f"http://{host}:{port}"
- logger.info(f"Server address: {url}")
+ logger.info(f"Xinference successfully started. Endpoint: {url}")
  return url


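
The hunk above falls back to a fresh port when the default endpoint port is already in use, instead of failing outright. A standalone sketch of that bind-or-retry pattern using only the standard library (the default port value is illustrative, and `sock.bind((host, 0))` stands in for xoscar's `get_next_port`):

```python
import errno
import logging
import socket

logger = logging.getLogger(__name__)

DEFAULT_PORT = 9997  # illustrative stand-in for XINFERENCE_DEFAULT_ENDPOINT_PORT


def bind_rest_socket(host: str, port: int, max_retries: int = 10) -> socket.socket:
    """Bind the requested port; if the default port is busy, retry with an OS-chosen one."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        return sock
    except OSError:
        sock.close()
        if port != DEFAULT_PORT:
            # The caller asked for a specific port, so surface the error.
            raise
    for _ in range(max_retries):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, 0))  # port 0 lets the OS assign a free port
            return sock
        except OSError:
            logger.warning("Failed to create socket, retrying")
            sock.close()
    raise OSError(errno.EADDRINUSE, "could not find a free port to bind")


if __name__ == "__main__":
    s = bind_rest_socket("127.0.0.1", DEFAULT_PORT)
    print("bound to", s.getsockname())
    s.close()
```
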
xinference/deploy/worker.py
@@ -13,12 +13,15 @@
  # limitations under the License.

  import asyncio
+ import logging
  from typing import Dict, Optional

  import xoscar as xo

  from ..core.service import WorkerActor

+ logger = logging.getLogger(__name__)
+

  async def start_worker_components(address: str, supervisor_address: str):
  actor_pool_config = await xo.get_pool_config(address)
@@ -35,6 +38,7 @@ async def start_worker_components(address: str, supervisor_address: str):
  supervisor_address=supervisor_address,
  subpool_addresses=subpool_addresses,  # exclude the main actor pool.
  )
+ logger.info(f"Xinference worker successfully started.")


  async def _start_worker(
xinference/locale/utils.py
@@ -12,6 +12,7 @@
  # See the License for the specific language governing permissions and
  # limitations under the License.

+ import codecs
  import json
  import locale
  import os
@@ -27,7 +28,7 @@ class Locale:
  os.path.dirname(os.path.abspath(__file__)), f"{self._language}.json"
  )
  if os.path.exists(json_path):
- self._mapping = json.load(open(json_path))
+ self._mapping = json.load(codecs.open(json_path, "r", encoding="utf-8"))
  else:
  self._mapping = None

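
The fix above loads the locale JSON with an explicit UTF-8 encoding, so the zh_CN strings added in the next file decode correctly regardless of the platform default. A small sketch of the same lookup pattern with a hypothetical helper class (not the actual `Locale` implementation), using the modern `open(..., encoding="utf-8")` spelling:

```python
import json
import os


class SimpleLocale:
    """Translate UI strings for a language, falling back to the key itself."""

    def __init__(self, language: str, locale_dir: str):
        json_path = os.path.join(locale_dir, f"{language}.json")
        if os.path.exists(json_path):
            # Explicit UTF-8 avoids mojibake on platforms whose default encoding differs.
            with open(json_path, "r", encoding="utf-8") as f:
                self._mapping = json.load(f)
        else:
            self._mapping = None

    def __call__(self, key: str) -> str:
        if self._mapping is None:
            return key
        return self._mapping.get(key, key)


if __name__ == "__main__":
    translate = SimpleLocale("zh_CN", os.path.join("xinference", "locale"))
    print(translate("model name"))  # the translation if the JSON is found, the key otherwise
```
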
xinference/locale/zh_CN.json
@@ -0,0 +1,25 @@
+ {
+ "Please create model first": "请先创建模型",
+ "stop reason": "停止原因",
+ "Show stop reason": "展示停止原因",
+ "Max tokens": "最大 token 数量",
+ "The maximum number of tokens to generate.": "生成 token 数量最大值",
+ "Temperature": "温度参数",
+ "The temperature to use for sampling.": "温度参数用于调整输出的多样性,数值越高多样性越高",
+ "Top P": "Top P",
+ "The top-p value to use for sampling.": "用于控制生成文本的确定性,数值越低确定性越高",
+ "Window size": "窗口大小",
+ "Window size of chat history.": "用于生成回复的聊天历史窗口大小",
+ "show stop reason": "展示停止原因",
+ "Downloading": "下载中",
+ "model name": "模型名",
+ "model format": "模型格式",
+ "model size in billions": "模型大小(B)",
+ "quantization": "模型量化方式",
+ "Parameters": "参数调整",
+ "create": "创建",
+ "select model": "选择模型",
+ "Arena": "角斗场",
+ "Chat": "聊天",
+ "Input": "输入"
+ }
xinference.egg-info/PKG-INFO
@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: xinference
- Version: 0.0.1
+ Version: 0.0.3
  Summary: Model Serving Made Easy
  Home-page: https://github.com/xorbitsai/inference
  Author: Qin Xuye
@@ -19,15 +19,16 @@ Classifier: Programming Language :: Python :: Implementation :: CPython
  Classifier: Topic :: Software Development :: Libraries
  Description-Content-Type: text/markdown
  Provides-Extra: dev
+ Provides-Extra: all
  License-File: LICENSE

  [![PyPI Latest Release](https://img.shields.io/pypi/v/xinference.svg?style=for-the-badge)](https://pypi.org/project/xinference/)
- [![License](https://img.shields.io/pypi/l/inference.svg?style=for-the-badge)](https://github.com/xorbitsai/inference/blob/main/LICENSE)
+ [![License](https://img.shields.io/pypi/l/xinference.svg?style=for-the-badge)](https://github.com/xorbitsai/inference/blob/main/LICENSE)
  [![Build Status](https://img.shields.io/github/actions/workflow/status/xorbitsai/inference/python.yaml?branch=main&style=for-the-badge&label=GITHUB%20ACTIONS&logo=github)](https://actions-badge.atrox.dev/xorbitsai/inference/goto?ref=main)
  [![Slack](https://img.shields.io/badge/join_Slack-781FF5.svg?logo=slack&style=for-the-badge)](https://join.slack.com/t/xorbitsio/shared_invite/zt-1o3z9ucdh-RbfhbPVpx7prOVdM1CAuxg)
  [![Twitter](https://img.shields.io/twitter/follow/xorbitsio?logo=twitter&style=for-the-badge)](https://twitter.com/xorbitsio)

- # Xorbits inference: Model Serving Made Easy 🤖
+ # Xorbits Inference: Model Serving Made Easy 🤖

  Welcome to the Xorbits Inference GitHub repository!

@@ -42,10 +43,16 @@ which is specifically designed to enable large models and high performance on co
  We are actively working on expanding Xorbits Inference's support to include additional runtimes,
  including PyTorch and JAX, in the near future.

+ ![demo](assets/demo.gif)
+
+ <div align="center">
+ <i><a href="https://join.slack.com/t/xorbitsio/shared_invite/zt-1z3zsm9ep-87yI9YZ_B79HLB2ccTq4WA">👉 Join our Slack community!</a></i>
+ </div>
+
  ## Key Features
  🌟 **Model Serving Made Easy**: Inference simplifies the process of serving large language, speech
- recognition, and multimodal models. With a single command, you can set up and deploy your models
- for experimentation and production.
+ recognition, and multimodal models. You can set up and deploy your models
+ for experimentation and production with a single command.

  ⚡️ **State-of-the-Art Models**: Experiment with cutting-edge built-in models using a single
  command. Inference provides access to state-of-the-art open-source models!
@@ -63,43 +70,44 @@ for seamless management and monitoring.
  allowing the seamless distribution of model inference across multiple devices or machines. It
  leverages distributed computing techniques to parallelize and scale the inference process.

- 🔌 **Built-in Integration with Third-Party Libraries**: Xorbits Inference provides seamless
- integration with popular third-party libraries like LangChain and LlamaIndex. (Coming soon)
+ 🔌 **Built-in Integration with Third-Party Libraries**: Xorbits Inference seamlessly integrates
+ with popular third-party libraries like LangChain and LlamaIndex. (Coming soon)

  ## Getting Started
  Xinference can be installed via pip from PyPI. It is highly recommended to create a new virtual
  environment to avoid conflicts.
  ```bash
- $ pip install xinference
+ $ pip install "xinference[all]"
  ```
+ "xinference[all]" installs all the necessary packages for serving models. If you want to achieve acceleration on
+ different hardware, refer to the installation documentation of the corresponding package.
+ - [llama-cpp-python](https://github.com/abetlen/llama-cpp-python#installation-from-pypi-recommended) is required to run `baichuan`, `wizardlm-v1.0`, `vicuna-v1.3` and `orca`.
+ - [chatglm-cpp-python](https://github.com/li-plus/chatglm.cpp#getting-started) is required to run `chatglm` and `chatglm2`.
+

  ### Deployment
- To start a local instance of Xinference, run the following command:
+ You can deploy Xinference locally with a single command or deploy it in a distributed cluster.

+ #### Local
+ To start a local instance of Xinference, run the following command:
  ```bash
- $ xinference -H,--host "localhost" \
- -p,--port 9997 \
- --log-level INFO
+ $ xinference
  ```

- To deploy Xinference in a cluster, you need to start an Xinference supervisor on one server and
+ #### Distributed
+
+ To deploy Xinference in a cluster, you need to start a Xinference supervisor on one server and
  Xinference workers on the other servers. Follow the steps below:

- #### Starting the Supervisor
- On the server where you want to run the Xinference supervisor, run the following command:
+ **Starting the Supervisor**: On the server where you want to run the Xinference supervisor, run the following command:
  ```bash
- $ xinference-supervisor -H,--host "${supervisor_host}" \
- -p,--port 9997 \
- --log-level INFO
+ $ xinference-supervisor -H "${supervisor_host}"
  ```
  Replace `${supervisor_host}` with the actual host of your supervisor server.

- #### Starting the Workers
- On each of the other servers where you want to run Xinference workers, run the following command:
+ **Starting the Workers**: On each of the other servers where you want to run Xinference workers, run the following command:
  ```bash
- $ xinference-worker -e, --endpoint "http://${supervisor_host}:9997" \
- -H,--host "0.0.0.0" \
- --log-level INFO
+ $ xinference-worker -e "http://${supervisor_host}:9997"
  ```

  Once Xinference is running, an endpoint will be accessible for model management via CLI or
@@ -109,7 +117,7 @@ Xinference client.
  - For cluster deployment, the endpoint will be `http://${supervisor_host}:9997`, where
  `${supervisor_host}` is the hostname or IP address of the server where the supervisor is running.

- You can also view a web UI using the Xinference endpoint where you can chat with all the
+ You can also view a web UI using the Xinference endpoint to chat with all the
  builtin models. You can even **chat with two cutting-edge AI models side-by-side to compare
  their performance**!

@@ -177,26 +185,27 @@ To view the builtin models, run the following command:
  $ xinference list --all
  ```

- | Name | Format | Size (in billions) | Quantization |
- | -------------------- | ------- | ------------------ |--------------------------------------------------------------------------------------------------------------------------------|
- | baichuan | ggmlv3 | [7] | ['q2_K', 'q3_K_L', 'q3_K_M', 'q3_K_S', 'q4_0', 'q4_1', 'q4_K_M', 'q4_K_S', 'q5_0', 'q5_1', 'q5_K_M', 'q5_K_S', 'q6_K', 'q8_0'] |
- | wizardlm-v1.0 | ggmlv3 | [7, 13, 33] | ['q2_K', 'q3_K_L', 'q3_K_M', 'q3_K_S', 'q4_0', 'q4_1', 'q4_K_M', 'q4_K_S', 'q5_0', 'q5_1', 'q5_K_M', 'q5_K_S', 'q6_K', 'q8_0'] |
- | vicuna-v1.3 | ggmlv3 | [7, 13] | ['q2_K', 'q3_K_L', 'q3_K_M', 'q3_K_S', 'q4_0', 'q4_1', 'q4_K_M', 'q4_K_S', 'q5_0', 'q5_1', 'q5_K_M', 'q5_K_S', 'q6_K', 'q8_0'] |
- | orca | ggmlv3 | [3, 7, 13] | ['q4_0', 'q4_1', 'q5_0', 'q5_1', 'q8_0'] |
- | chatglm | ggmlv3 | [6] | ['q4_0', 'q4_1', 'q5_0', 'q5_1', 'q8_0'] |
- | chatglm2 | ggmlv3 | [6] | ['q4_0', 'q4_1', 'q5_0', 'q5_1', 'q8_0'] |
+ | Name | Type | Language | Format | Size (in billions) | Quantization |
+ | -------------------- |------------------|----------|--------|--------------------|----------------------------------------|
+ | baichuan | Foundation Model | en, zh | ggmlv3 | 7 | 'q2_K', 'q3_K_L', ... , 'q6_K', 'q8_0' |
+ | chatglm | SFT Model | en, zh | ggmlv3 | 6 | 'q4_0', 'q4_1', 'q5_0', 'q5_1', 'q8_0' |
+ | chatglm2 | SFT Model | en, zh | ggmlv3 | 6 | 'q4_0', 'q4_1', 'q5_0', 'q5_1', 'q8_0' |
+ | wizardlm-v1.0 | SFT Model | en | ggmlv3 | 7, 13, 33 | 'q2_K', 'q3_K_L', ... , 'q6_K', 'q8_0' |
+ | vicuna-v1.3 | SFT Model | en | ggmlv3 | 7, 13 | 'q2_K', 'q3_K_L', ... , 'q6_K', 'q8_0' |
+ | orca | SFT Model | en | ggmlv3 | 3, 7, 13 | 'q4_0', 'q4_1', 'q5_0', 'q5_1', 'q8_0' |
+

  **NOTE**:
- - [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) is required to run `baichuan`, `wizardlm-v1.0`, `vicuna-v1.3` and `orca`.
- - [chatglm-cpp-python](https://github.com/li-plus/chatglm.cpp) is required to run `chatglm` and `chatglm2`.
- - Xinference will download models automatically for you and by default the models will be saved under `${USER}/.xinference/cache`.
+ - Xinference will download models automatically for you, and by default the models will be saved under `${USER}/.xinference/cache`.
+ - Foundation models only provide interface `generate`.
+ - SFT models provide both `generate` and `chat`.

  ## Roadmap
  Xinference is currently under active development. Here's a roadmap outlining our planned
  developments for the next few weeks:

  ### PyTorch Support
- With PyTorch integration, users will be able to seamlessly utilize PyTorch models form huggingface
+ With PyTorch integration, users will be able to seamlessly utilize PyTorch models from Hugging Face
  within Xinference.

  ### Langchain & LlamaIndex integration
xinference.egg-info/SOURCES.txt
@@ -34,6 +34,7 @@ xinference/deploy/worker.py
  xinference/deploy/test/__init__.py
  xinference/locale/__init__.py
  xinference/locale/utils.py
+ xinference/locale/zh_CN.json
  xinference/model/__init__.py
  xinference/model/llm/__init__.py
  xinference/model/llm/chatglm.py
xinference.egg-info/requires.txt
@@ -5,6 +5,13 @@ click
  tqdm
  tabulate
  requests
+ pydantic
+ fastapi
+ uvicorn
+
+ [all]
+ chatglm-cpp
+ llama-cpp-python

  [dev]
  cython>=0.29