xinference 1.1.1__py3-none-any.whl → 1.2.0__py3-none-any.whl
This diff compares the contents of two publicly released versions of the package as published to their public registry. It is provided for informational purposes only and reflects the changes between the versions as they appear in that registry.
Potentially problematic release.
This version of xinference might be problematic.
- xinference/_version.py +3 -3
- xinference/api/restful_api.py +49 -65
- xinference/core/model.py +77 -19
- xinference/core/supervisor.py +81 -10
- xinference/core/utils.py +2 -2
- xinference/core/worker.py +32 -0
- xinference/model/image/model_spec.json +18 -0
- xinference/model/image/model_spec_modelscope.json +20 -0
- xinference/model/llm/__init__.py +2 -0
- xinference/model/llm/llm_family.json +96 -0
- xinference/model/llm/llm_family_modelscope.json +99 -0
- xinference/model/llm/mlx/core.py +23 -73
- xinference/model/llm/transformers/cogagent.py +272 -0
- xinference/model/llm/transformers/core.py +1 -0
- xinference/model/llm/transformers/qwen2_vl.py +10 -1
- xinference/model/llm/utils.py +27 -3
- xinference/model/llm/vllm/core.py +37 -7
- xinference/model/llm/vllm/xavier/__init__.py +13 -0
- xinference/model/llm/vllm/xavier/allocator.py +74 -0
- xinference/model/llm/vllm/xavier/block.py +112 -0
- xinference/model/llm/vllm/xavier/block_manager.py +71 -0
- xinference/model/llm/vllm/xavier/block_tracker.py +116 -0
- xinference/model/llm/vllm/xavier/engine.py +247 -0
- xinference/model/llm/vllm/xavier/executor.py +132 -0
- xinference/model/llm/vllm/xavier/scheduler.py +422 -0
- xinference/model/llm/vllm/xavier/test/__init__.py +13 -0
- xinference/model/llm/vllm/xavier/test/test_xavier.py +122 -0
- xinference/model/llm/vllm/xavier/transfer.py +298 -0
- xinference/model/video/diffusers.py +14 -0
- xinference/model/video/model_spec.json +15 -0
- xinference/model/video/model_spec_modelscope.json +16 -0
- xinference/types.py +13 -0
- xinference/web/ui/build/asset-manifest.json +6 -6
- xinference/web/ui/build/index.html +1 -1
- xinference/web/ui/build/static/css/main.51a587ff.css +2 -0
- xinference/web/ui/build/static/css/main.51a587ff.css.map +1 -0
- xinference/web/ui/build/static/js/main.1eb206d1.js +3 -0
- xinference/web/ui/build/static/js/main.1eb206d1.js.map +1 -0
- xinference/web/ui/node_modules/.cache/babel-loader/03c4052f1b91f6ba0c5389bdcf49c43319b4076c08e4b8585dab312538ae290a.json +1 -0
- xinference/web/ui/node_modules/.cache/babel-loader/1786b83003b8e9605a0f5f855a185d4d16e38fc893dfb326a2a9cca206b4240a.json +1 -0
- xinference/web/ui/node_modules/.cache/babel-loader/17cbc181dd674b9150b80c73ed6a82656de0082d857f6e5f66d9716129ac0b38.json +1 -0
- xinference/web/ui/node_modules/.cache/babel-loader/185ceb8872d562e032b47e79df6a45670e06345b8ed70aad1a131e0476783c5c.json +1 -0
- xinference/web/ui/node_modules/.cache/babel-loader/2213d49de260e1f67c888081b18f120f5225462b829ae57c9e05a05cec83689d.json +1 -0
- xinference/web/ui/node_modules/.cache/babel-loader/26b8c9f34b0bed789b3a833767672e39302d1e0c09b4276f4d58d1df7b6bd93b.json +1 -0
- xinference/web/ui/node_modules/.cache/babel-loader/2b484da66c724d0d56a40849c109327408796a668b1381511b6e9e03baa48658.json +1 -0
- xinference/web/ui/node_modules/.cache/babel-loader/2cbbbce9b84df73330d4c42b82436ed881b3847628f2fbc346aa62e2859fd88c.json +1 -0
- xinference/web/ui/node_modules/.cache/babel-loader/2ec9b14431ed33ce6901bf9f27007be4e6e472709c99d6e22b50ce528e4b78ee.json +1 -0
- xinference/web/ui/node_modules/.cache/babel-loader/3b966db018f96be4a055d6ca205f0990d4d0b370e2980c17d8bca2c9a021819c.json +1 -0
- xinference/web/ui/node_modules/.cache/babel-loader/3eefb411b24c2b3ce053570ef50daccf154022f0e168be5ed0fec21394baf9f4.json +1 -0
- xinference/web/ui/node_modules/.cache/babel-loader/522b229e3cac219123f0d69673f5570e191c2d2a505dc65b312d336eae2279c0.json +1 -0
- xinference/web/ui/node_modules/.cache/babel-loader/52e45f17ba300580ea3fcc9f9228ccba194bb092b76f25e9255af311f8b05aab.json +1 -0
- xinference/web/ui/node_modules/.cache/babel-loader/5a0bc4631f936459afc1a3b1d3ec2420118b1f00e11f60ccac3e08088f3f27a8.json +1 -0
- xinference/web/ui/node_modules/.cache/babel-loader/611fa2c6c53b66039991d06dfb0473b5ab37fc63b4564e0f6e1718523768a045.json +1 -0
- xinference/web/ui/node_modules/.cache/babel-loader/6329bc76c406fe5eb305412383fbde5950f847bb5e43261f73f37622c365acb4.json +1 -0
- xinference/web/ui/node_modules/.cache/babel-loader/63c8e07687ea53a4f8a910ee5e42e0eb26cd1acbfbe820f3e3248a786ee51401.json +1 -0
- xinference/web/ui/node_modules/.cache/babel-loader/69b2d5001684174ec9da57e07914eed3eac4960018bceb6cbfa801d861301d7c.json +1 -0
- xinference/web/ui/node_modules/.cache/babel-loader/710c1acda69e561e30a933b98c6a56d50197868b15c21e2aad55ab6d46649eb6.json +1 -0
- xinference/web/ui/node_modules/.cache/babel-loader/720deca1fce5a1dc5056048fa8258fd138a82ea855f350b6613f104a73fb761f.json +1 -0
- xinference/web/ui/node_modules/.cache/babel-loader/76a23b92d26a499c57e61eea2b895fbc9771bd0849a72e66f8e633192017978b.json +1 -0
- xinference/web/ui/node_modules/.cache/babel-loader/858063f23b34dfe600254eb5afd85518b0002ec4b30b7386616c45600826e3b2.json +1 -0
- xinference/web/ui/node_modules/.cache/babel-loader/920b82c1c89124cf217109eeedbfcd3aae3b917be50c9dfb6bbb4ce26bdfd2e7.json +1 -0
- xinference/web/ui/node_modules/.cache/babel-loader/94d8b7aeb0076f2ce07db598cea0e87b13bc8d5614eb530b8d6e696c2daf6f88.json +1 -0
- xinference/web/ui/node_modules/.cache/babel-loader/9e917fe7022d01b2ccbe5cc0ce73d70bb72bee584ff293bad71bdff6695dee28.json +1 -0
- xinference/web/ui/node_modules/.cache/babel-loader/9f28fdb8399f1d0474f0aca86f1658dc94f5bf0c90f6146352de150692de8862.json +1 -0
- xinference/web/ui/node_modules/.cache/babel-loader/a0dfafa06b2bb7cba8cad41c482503f61944f759f4318139362602ef5cc47ccb.json +1 -0
- xinference/web/ui/node_modules/.cache/babel-loader/afb8084f539534cd594755ea2205ecd5bd1f62dddcfdf75a2eace59a28131278.json +1 -0
- xinference/web/ui/node_modules/.cache/babel-loader/b57b1438b77294c1f3f6cfce12ac487d8106c6f016975ba0aec94d98997e2e1e.json +1 -0
- xinference/web/ui/node_modules/.cache/babel-loader/b9917b0bf8e4d55ccbac1c334aa04d6ff3c5b6ed9e5d38b9ea2c687fa7d3f5a9.json +1 -0
- xinference/web/ui/node_modules/.cache/babel-loader/bbcc94b0149963d1d6f267ee1f4f03d3925b758392ce2f516c3fe8af0e0169fc.json +1 -0
- xinference/web/ui/node_modules/.cache/babel-loader/bdee44abeadc4abc17d41c52eb49c6e19a4b1a267b6e16876ce91bdeeebfc52d.json +1 -0
- xinference/web/ui/node_modules/.cache/babel-loader/beb112b70f4a56db95920a9e20efb6c97c37b68450716730217a9ee1a9ae92be.json +1 -0
- xinference/web/ui/node_modules/.cache/babel-loader/c88db97be0cdf440193b3995996e83510a04cb00048135485fc0e26d197e80b5.json +1 -0
- xinference/web/ui/node_modules/.cache/babel-loader/d49e5314d34310a62d01a03067ce1bec5da00abce84c5196aa9c6842fa79a430.json +1 -0
- xinference/web/ui/node_modules/.cache/babel-loader/d7664d18c4ddbad9c3a6a31b91f7c00fb0dde804608674a9860ee50f33e54708.json +1 -0
- xinference/web/ui/node_modules/.cache/babel-loader/d9072c318b819b7c90a0f7e9cc0b6413b4dbeb8e9859898e53d75ea882fcde99.json +1 -0
- xinference/web/ui/node_modules/.cache/babel-loader/db16a983bc08a05f0439cc61ca0840e49e1d8400eef678909f16c032a418a3d6.json +1 -0
- xinference/web/ui/node_modules/.cache/babel-loader/dc249829767b8abcbc3677e0b07b6d3ecbfdfe6d08cfe23a665eb33373a9aa9d.json +1 -0
- xinference/web/ui/node_modules/.cache/babel-loader/e242c583c2dbc2784f0fcf513523975f7d5df447e106c1c17e49e8578a6fc3ed.json +1 -0
- xinference/web/ui/node_modules/.cache/babel-loader/eac5f1296513e69e4b96f750ddccd4d0264e2bae4e4c449144e83274a48698d9.json +1 -0
- xinference/web/ui/node_modules/.cache/babel-loader/ed57202cb79649bb716400436590245547df241988fc7c8e1d85d132299542d2.json +1 -0
- xinference/web/ui/node_modules/.cache/babel-loader/f125bf72e773a14cdaebd0c343e80adb909d12e317ee5c00cd4a57442fbe2c62.json +1 -0
- xinference/web/ui/node_modules/.cache/babel-loader/f91af913d7f91c410719ab13136aaed3aaf0f8dda06652f25c42cb5231587398.json +1 -0
- xinference/web/ui/node_modules/.package-lock.json +67 -3
- xinference/web/ui/node_modules/@babel/runtime/package.json +592 -538
- xinference/web/ui/node_modules/html-parse-stringify/package.json +50 -0
- xinference/web/ui/node_modules/i18next/dist/esm/package.json +1 -0
- xinference/web/ui/node_modules/i18next/package.json +129 -0
- xinference/web/ui/node_modules/react-i18next/.eslintrc.json +74 -0
- xinference/web/ui/node_modules/react-i18next/dist/es/package.json +1 -0
- xinference/web/ui/node_modules/react-i18next/package.json +162 -0
- xinference/web/ui/node_modules/void-elements/package.json +34 -0
- xinference/web/ui/package-lock.json +69 -3
- xinference/web/ui/package.json +2 -0
- xinference/web/ui/src/locales/en.json +186 -0
- xinference/web/ui/src/locales/zh.json +186 -0
- {xinference-1.1.1.dist-info → xinference-1.2.0.dist-info}/METADATA +9 -6
- {xinference-1.1.1.dist-info → xinference-1.2.0.dist-info}/RECORD +102 -56
- xinference/web/ui/build/static/css/main.5061c4c3.css +0 -2
- xinference/web/ui/build/static/css/main.5061c4c3.css.map +0 -1
- xinference/web/ui/build/static/js/main.4eb4ee80.js +0 -3
- xinference/web/ui/build/static/js/main.4eb4ee80.js.map +0 -1
- xinference/web/ui/node_modules/.cache/babel-loader/07ce9e632e6aff24d7aa3ad8e48224433bbfeb0d633fca723453f1fcae0c9f1c.json +0 -1
- xinference/web/ui/node_modules/.cache/babel-loader/1130403f9e46f5738a23b45ac59b57de8f360c908c713e2c0670c2cce9bd367a.json +0 -1
- xinference/web/ui/node_modules/.cache/babel-loader/131091b25d26b17cdca187d7542a21475c211138d900cf667682260e76ef9463.json +0 -1
- xinference/web/ui/node_modules/.cache/babel-loader/1f269fb2a368363c1cb2237825f1dba093b6bdd8c44cc05954fd19ec2c1fff03.json +0 -1
- xinference/web/ui/node_modules/.cache/babel-loader/331312668fa8bd3d7401818f4a25fa98135d7f61371cd6bfff78b18cf4fbdd92.json +0 -1
- xinference/web/ui/node_modules/.cache/babel-loader/40f17338fc75ae095de7d2b4d8eae0d5ca0193a7e2bcece4ee745b22a7a2f4b7.json +0 -1
- xinference/web/ui/node_modules/.cache/babel-loader/4de9a6942c5f1749d6cbfdd54279699975f16016b182848bc253886f52ec2ec3.json +0 -1
- xinference/web/ui/node_modules/.cache/babel-loader/822586ed1077201b64b954f12f25e3f9b45678c1acbabe53d8af3ca82ca71f33.json +0 -1
- xinference/web/ui/node_modules/.cache/babel-loader/8c5eeb02f772d02cbe8b89c05428d0dd41a97866f75f7dc1c2164a67f5a1cf98.json +0 -1
- xinference/web/ui/node_modules/.cache/babel-loader/8d33354bd2100c8602afc3341f131a88cc36aaeecd5a4b365ed038514708e350.json +0 -1
- xinference/web/ui/node_modules/.cache/babel-loader/9375a35b05d56989b2755bf72161fa707c92f28569d33765a75f91a568fda6e9.json +0 -1
- xinference/web/ui/node_modules/.cache/babel-loader/a158a9ffa0c9b169aee53dd4a0c44501a596755b4e4f6ede7746d65a72e2a71f.json +0 -1
- xinference/web/ui/node_modules/.cache/babel-loader/c7bf40bab396765f67d0fed627ed3665890608b2d0edaa3e8cb7cfc96310db45.json +0 -1
- xinference/web/ui/node_modules/.cache/babel-loader/d6c643278a0b28320e6f33a60f5fb64c053997cbdc39a60e53ccc574688ade9e.json +0 -1
- xinference/web/ui/node_modules/.cache/babel-loader/e42b72d4cc1ea412ebecbb8d040dc6c6bfee462c33903c2f1f3facb602ad742e.json +0 -1
- xinference/web/ui/node_modules/.cache/babel-loader/e64b7e8cedcf43d4c95deba60ec1341855c887705805bb62431693118b870c69.json +0 -1
- xinference/web/ui/node_modules/.cache/babel-loader/f5039ddbeb815c51491a1989532006b96fc3ae49c6c60e3c097f875b4ae915ae.json +0 -1
- xinference/web/ui/node_modules/.cache/babel-loader/f72f011744c4649fabddca6f7a9327861ac0a315a89b1a2e62a39774e7863845.json +0 -1
- xinference/web/ui/node_modules/.cache/babel-loader/feabb04b4aa507102da0a64398a40818e878fd1df9b75dda8461b3e1e7ff3f11.json +0 -1
- /xinference/web/ui/build/static/js/{main.4eb4ee80.js.LICENSE.txt → main.1eb206d1.js.LICENSE.txt} +0 -0
- {xinference-1.1.1.dist-info → xinference-1.2.0.dist-info}/LICENSE +0 -0
- {xinference-1.1.1.dist-info → xinference-1.2.0.dist-info}/WHEEL +0 -0
- {xinference-1.1.1.dist-info → xinference-1.2.0.dist-info}/entry_points.txt +0 -0
- {xinference-1.1.1.dist-info → xinference-1.2.0.dist-info}/top_level.txt +0 -0
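A file-level summary like the list above can be reproduced locally by comparing the member name lists of the two wheel archives (wheels are zip files). A minimal sketch; the `wheel_file_changes` helper is ours for illustration, not part of xinference:

```python
import io
import zipfile

def wheel_file_changes(old_wheel, new_wheel):
    """Compare two wheel archives (zip files) and return the sets of
    added, removed, and common member paths."""
    with zipfile.ZipFile(old_wheel) as old, zipfile.ZipFile(new_wheel) as new:
        old_names, new_names = set(old.namelist()), set(new.namelist())
    # Files only in the new wheel were added; only in the old, removed.
    return new_names - old_names, old_names - new_names, old_names & new_names
```

Pass paths (or file-like objects) of the two downloaded `*.whl` artifacts; producing the per-file `+/-` line counts shown above would additionally require reading and diffing each common member.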
xinference/web/ui/src/locales/en.json
@@ -0,0 +1,186 @@
+{
+  "menu": {
+    "launchModel": "Launch Model",
+    "runningModels": "Running Models",
+    "registerModel": "Register Model",
+    "clusterInfo": "Cluster Information",
+    "contactUs": "Contact Us"
+  },
+
+  "model": {
+    "languageModels": "Language Models",
+    "embeddingModels": "Embedding Models",
+    "rerankModels": "Rerank Models",
+    "imageModels": "Image Models",
+    "audioModels": "Audio Models",
+    "videoModels": "Video Models",
+    "customModels": "Custom Models",
+    "flexibleModels": "Flexible Models"
+  },
+
+  "launchModel": {
+    "modelAbility": "Model Ability",
+    "generate": "generate",
+    "chat": "chat",
+    "vision": "vl-chat",
+    "status": "Status",
+    "cached": "Cached",
+    "manageCachedModels": "Manage Cached Models",
+    "favorite": "Favorite",
+    "unfavorite": "Unfavorite",
+    "search": "Search for model name and description",
+    "searchModelType": "Search for {{modelType}} model name",
+    "searchInstruction": "Type {{hotkey}} to search",
+    "clickToLaunchModel": "Click with mouse to launch the model",
+    "dimensions": "dimensions",
+    "maxTokens": "max tokens",
+    "edit": "Edit",
+    "delete": "Delete",
+    "contextLength": "context length",
+    "chatModel": "chat model",
+    "generateModel": "generate model",
+    "otherModel": "other model",
+    "confirmDeleteCustomModel": "Are you sure to delete this custom model? This behavior is irreversible.",
+    "lastConfig": "Last Config",
+    "modelEngine": "Model Engine",
+    "modelFormat": "Model Format",
+    "modelSize": "Model Size",
+    "quantization": "Quantization",
+    "nGPU": "GPU Count",
+    "nGpuLayers": "N GPU Layers",
+    "replica": "Replica",
+    "optionalConfigurations": "Optional Configurations",
+    "modelUID.optional": "(Optional) Model UID, model name by default",
+    "requestLimits.optional": "(Optional) Request Limits, the number of request limits for this model, default is None",
+    "workerIp.optional": "(Optional) Worker Ip, specify the worker ip where the model is located in a distributed scenario",
+    "workerIp": "Worker Ip, specify the worker ip where the model is located in a distributed scenario",
+    "GPUIdx.optional": "(Optional) GPU Idx, Specify the GPU index where the model is located",
+    "GPUIdx": "GPU Idx, Specify the GPU index where the model is located",
+    "downloadHub.optional": "(Optional) Download_hub",
+    "modelPath.optional": "(Optional) Model Path, For PyTorch, provide the model directory. For GGML/GGUF, provide the model file path.",
+    "GGUFQuantization.optional": "(Optional) GGUF quantization format, quantizing the Transformer part.",
+    "GGUFModelPath.optional": "(Optional) GGUF model path, should be a file ending with .gguf.",
+    "CPUOffload": "CPU Offload",
+    "CPUOffload.tip": "Unload the model to the CPU. Recommend to enable this when resources are limited or when using the GGUF option.",
+    "loraConfig": "Lora Config",
+    "loraModelConfig": "Lora Model Config",
+    "additionalParametersForInferenceEngine": "Additional parameters passed to the inference engine",
+    "enterIntegerGreaterThanZero": "Please enter an integer greater than 0.",
+    "enterCommaSeparatedNumbers": "Please enter numeric data separated by commas, for example: 0,1,2",
+    "device": "Device",
+    "loraLoadKwargsForImageModel": "Lora Load Kwargs for Image Model",
+    "loraFuseKwargsForImageModel": "Lora Fuse Kwargs for Image Model",
+    "launch": "Launch",
+    "goBack": "Go Back",
+    "copyJson": "Copy Json",
+    "cancel": "Cancel",
+    "fillCompleteParametersBeforeAdding": "Please fill in the complete parameters before adding!",
+    "model_format": "model_format",
+    "model_size_in_billions": "model_size_in_billions",
+    "quantizations": "quantizations",
+    "real_path": "real_path",
+    "path": "path",
+    "ipAddress": "IP Address",
+    "operation": "operation",
+    "copyRealPath": "Copy real_path",
+    "copyPath": "Copy path",
+    "noCacheForNow": "No cache for now!",
+    "confirmDeleteCacheFiles": "Confirm deletion of cache files? This action is irreversible."
+  },
+
+  "runningModels": {
+    "name": "Name",
+    "address": "Address",
+    "gpuIndexes": "GPU Indexes",
+    "size": "Size",
+    "quantization": "Quantization",
+    "replica": "Replica",
+    "actions": "Actions",
+    "noRunningModels": "No Running Models",
+    "noRunningModelsMatches": "No Running Models Matches"
+  },
+
+  "registerModel": {
+    "modelName": "Model Name",
+    "modelDescription": "Model Description (Optional)",
+    "contextLength": "Context Length",
+    "dimensions": "Dimensions",
+    "maxTokens": "Max Tokens",
+    "modelPath": "Model Path",
+    "modelLanguages": "Model Languages",
+    "languages": "Languages",
+    "multilingual": "Multilingual",
+    "modelAbilities": "Model Abilities",
+    "modelFamily": "Model Family",
+    "chatTemplate": "Chat Template",
+    "test": "test",
+    "testResult": "test result",
+    "noTestResults": "No test results...",
+    "stopTokenIds": "Stop Token Ids",
+    "stop": "Stop",
+    "launcher": "Launcher",
+    "launcherArguments": "Launcher Arguments (Optional)",
+    "edit": "Edit",
+    "cancel": "Cancel",
+    "registerModel": "Register Model",
+    "messagesExample": "Messages Example",
+    "JSONFormat": "JSON Format",
+    "modelSpecs": "Model Specs",
+    "modelSizeBillions": "Model Size in Billions",
+    "quantization": "Quantization",
+    "quantizationOptional": "Quantization (Optional)",
+    "delete": "Delete",
+    "controlnet": "Controlnet",
+    "more": "more",
+    "modelFormat": "Model Format",
+    "enterNumberGreaterThanZero": "Please enter a number greater than 0.",
+    "carefulQuantizationForModelRegistration": "For GPTQ/AWQ/FP8/MLX models, please be careful to fill in the quantization corresponding to the model you want to register.",
+    "quantizationCannotBeEmpty": "Quantization cannot be left empty.",
+    "enterInteger": "Please enter an integer.",
+    "enterIntegerGreaterThanZero": "Please enter an integer greater than 0.",
+    "showCustomJsonConfig": "Show custom json config used by api",
+    "packUp": "Pack up",
+    "unfold": "Unfold",
+    "copyAll": "Copy all",
+    "alphanumericWithHyphensUnderscores": "Alphanumeric characters with properly placed hyphens and underscores. Must not match any built-in model names.",
+    "chooseBuiltInOrCustomModel": "You can choose from the built-in models or input your own.",
+    "chooseOnlyBuiltInModel": "You can only choose from the built-in models.",
+    "provideModelDirectoryPath": "Provide the model directory path.",
+    "provideModelLauncher": "Provide the model launcher.",
+    "jsonArgumentsForLauncher": "A JSON-formatted dictionary representing the arguments passed to the Launcher.",
+    "provideModelDirectoryOrFilePath": "For PyTorch, provide the model directory. For GGUF, provide the model file path.",
+    "ensureChatTemplatePassesTest": "Please make sure this chat_template passes the test by clicking the TEST button on the right. Please note that this test may not cover all cases and will only be used for the most basic case.",
+    "testFailurePreventsChatWorking": "Please note that failure to pass test may prevent chats from working properly.",
+    "stopControlForChatModels": "int type, used to control the stopping of chat models",
+    "stopControlStringForChatModels": "string type, used to control the stopping of chat models",
+    "enterJsonFormattedDictionary": "Please enter the JSON-formatted dictionary."
+  },
+
+  "clusterInfo": {
+    "supervisor": "Supervisor",
+    "workers": "Workers",
+    "workerDetails": "Worker Details",
+    "count": "Count",
+    "cpuInfo": "CPU Info",
+    "usage": "Usage:",
+    "total": "Total",
+    "cpuMemoryInfo": "CPU Memory Info",
+    "version": "Version",
+    "release": "Release:",
+    "commit": "Commit:",
+    "gpuInfo": "GPU Info",
+    "gpuMemoryInfo": "GPU Memory Info",
+    "address": "Address",
+    "item": "Item",
+    "value": "Value",
+    "nodeType": "Node Type",
+    "cpuUsage": "CPU Usage",
+    "cpuTotal": "CPU Total",
+    "memUsage": "Mem Usage",
+    "memTotal": "Mem Total",
+    "gpuCount": "GPU Count",
+    "gpuMemUsage": "GPU Mem Usage",
+    "gpuMemTotal": "GPU Mem Total",
+    "worker": "Worker"
+  }
+}
xinference/web/ui/src/locales/zh.json
@@ -0,0 +1,186 @@
+{
+  "menu": {
+    "launchModel": "启动模型",
+    "runningModels": "运行模型",
+    "registerModel": "注册模型",
+    "clusterInfo": "集群信息",
+    "contactUs": "联系我们"
+  },
+
+  "model": {
+    "languageModels": "语言模型",
+    "embeddingModels": "嵌入模型",
+    "rerankModels": "重排序模型",
+    "imageModels": "图像模型",
+    "audioModels": "音频模型",
+    "videoModels": "视频模型",
+    "customModels": "自定义模型",
+    "flexibleModels": "灵活模型"
+  },
+
+  "launchModel": {
+    "modelAbility": "模型能力",
+    "generate": "生成",
+    "chat": "聊天",
+    "vision": "视觉聊天",
+    "status": "状态",
+    "cached": "已缓存",
+    "manageCachedModels": "管理缓存模型",
+    "favorite": "收藏",
+    "unfavorite": "取消收藏",
+    "search": "搜索模型名称和描述",
+    "searchModelType": "搜索 {{modelType}} 相关的模型名称",
+    "searchInstruction": "输入 {{hotkey}} 进行搜索",
+    "clickToLaunchModel": "点击鼠标以启动模型",
+    "dimensions": "维度",
+    "maxTokens": "最大 token 数",
+    "edit": "编辑",
+    "delete": "删除",
+    "contextLength": "上下文长度",
+    "chatModel": "聊天模型",
+    "generateModel": "生成模型",
+    "otherModel": "其他模型",
+    "confirmDeleteCustomModel": "您确定要删除这个自定义模型吗?此操作无法恢复。",
+    "lastConfig": "最后配置",
+    "modelEngine": "模型引擎",
+    "modelFormat": "模型格式",
+    "modelSize": "模型大小",
+    "quantization": "量化",
+    "nGPU": "GPU 数量",
+    "nGpuLayers": "GPU 层数",
+    "replica": "副本",
+    "optionalConfigurations": "可选配置",
+    "modelUID.optional": "(可选) 模型 UID,默认是模型名称",
+    "requestLimits.optional": "(可选) 请求限制,模型的请求限制数,默认值为无",
+    "workerIp.optional": "(可选) 工作节点 IP,在分布式场景中指定模型所在的工作节点 IP",
+    "workerIp": "工作节点 IP,在分布式场景中指定模型所在的工作节点 IP",
+    "GPUIdx.optional": "(可选) GPU 索引,指定模型所在的 GPU 索引",
+    "GPUIdx": "GPU 索引,指定模型所在的 GPU 索引",
+    "downloadHub.optional": "(可选) 下载中心",
+    "modelPath.optional": "(可选) 模型路径,对于 PyTorch,提供模型目录;对于 GGML/GGUF,提供模型文件路径。",
+    "GGUFQuantization.optional": "(可选) GGUF量化格式,对Transformer部分进行量化。",
+    "GGUFModelPath.optional": "(可选) GGUF模型路径,应为以 .gguf 结尾的文件。",
+    "CPUOffload": "CPU卸载",
+    "CPUOffload.tip": "将模型卸载到CPU。当资源有限或使用GGUF选项时,建议启用此功能。",
+    "loraConfig": "Lora 配置",
+    "loraModelConfig": "Lora 模型配置",
+    "additionalParametersForInferenceEngine": "传递给推理引擎的附加参数",
+    "enterIntegerGreaterThanZero": "请输入大于 0 的整数。",
+    "enterCommaSeparatedNumbers": "请输入以逗号分隔的数字数据,例如:0,1,2",
+    "device": "设备",
+    "loraLoadKwargsForImageModel": "图像模型的 Lora 加载参数",
+    "loraFuseKwargsForImageModel": "图像模型的 Lora 融合参数",
+    "launch": "启动",
+    "goBack": "返回",
+    "copyJson": "复制 JSON",
+    "cancel": "取消",
+    "fillCompleteParametersBeforeAdding": "请在添加之前填写完整的参数!",
+    "model_format": "模型格式",
+    "model_size_in_billions": "模型大小(以十亿为单位)",
+    "quantizations": "量化方式",
+    "real_path": "真实路径",
+    "path": "路径",
+    "ipAddress": "IP 地址",
+    "operation": "操作",
+    "copyRealPath": "复制真实路径",
+    "copyPath": "复制路径",
+    "noCacheForNow": "当前没有缓存!",
+    "confirmDeleteCacheFiles": "确认删除缓存文件吗?此操作无法恢复。"
+  },
+
+  "runningModels": {
+    "name": "名称",
+    "address": "地址",
+    "gpuIndexes": "GPU 索引",
+    "size": "大小",
+    "quantization": "量化",
+    "replica": "副本",
+    "actions": "操作",
+    "noRunningModels": "没有运行中的模型",
+    "noRunningModelsMatches": "没有匹配的运行模型"
+  },
+
+  "registerModel": {
+    "modelName": "模型名称",
+    "modelDescription": "模型描述(可选)",
+    "contextLength": "上下文长度",
+    "dimensions": "维度",
+    "maxTokens": "最大 token 数",
+    "modelPath": "模型路径",
+    "modelLanguages": "模型语言",
+    "languages": "语言",
+    "multilingual": "多语言",
+    "modelAbilities": "模型能力",
+    "modelFamily": "模型系列",
+    "chatTemplate": "聊天模板",
+    "test": "测试",
+    "testResult": "测试结果",
+    "noTestResults": "没有测试结果...",
+    "stopTokenIds": "停止token ID",
+    "stop": "停止",
+    "launcher": "启动器",
+    "launcherArguments": "启动器参数(可选)",
+    "edit": "编辑",
+    "cancel": "取消",
+    "registerModel": "注册模型",
+    "messagesExample": "消息示例",
+    "JSONFormat": "JSON 格式",
+    "modelSpecs": "模型规格",
+    "modelSizeBillions": "模型大小(以十亿为单位)",
+    "quantization": "量化",
+    "quantizationOptional": "量化(可选)",
+    "delete": "删除",
+    "controlnet": "控制网",
+    "more": "更多",
+    "modelFormat": "模型格式",
+    "enterNumberGreaterThanZero": "请输入大于 0 的数字。",
+    "carefulQuantizationForModelRegistration": "对于 GPTQ/AWQ/FP8/MLX 模型,请小心填写与您要注册的模型对应的量化方式。",
+    "quantizationCannotBeEmpty": "量化方式不能为空。",
+    "enterInteger": "请输入一个整数。",
+    "enterIntegerGreaterThanZero": "请输入大于 0 的整数。",
+    "showCustomJsonConfig": "显示由 API 使用的自定义 JSON 配置",
+    "packUp": "收起",
+    "unfold": "展开",
+    "copyAll": "复制全部",
+    "alphanumericWithHyphensUnderscores": "字母数字字符,连字符和下划线应正确放置。不能与任何内置模型名称匹配。",
+    "chooseBuiltInOrCustomModel": "您可以选择内置模型或输入自定义模型。",
+    "chooseOnlyBuiltInModel": "您只能从内置模型中选择。",
+    "provideModelDirectoryPath": "提供模型目录路径。",
+    "provideModelLauncher": "提供模型启动器。",
+    "jsonArgumentsForLauncher": "一个 JSON 格式的字典,表示传递给启动器的参数。",
+    "provideModelDirectoryOrFilePath": "对于 PyTorch,提供模型目录。对于 GGUF,提供模型文件路径。",
+    "ensureChatTemplatePassesTest": "请确保通过点击右侧的测试按钮,使此聊天模板通过测试。请注意,此测试可能无法涵盖所有情况,只会用于最基本的情况。",
+    "testFailurePreventsChatWorking": "请注意,未通过测试可能会导致聊天无法正常工作。",
+    "stopControlForChatModels": "整数类型,用于控制聊天模型的停止。",
+    "stopControlStringForChatModels": "字符串类型,用于控制聊天模型的停止。",
+    "enterJsonFormattedDictionary": "请输入 JSON 格式的字典。"
+  },

+  "clusterInfo": {
+    "supervisor": "主管",
+    "workers": "工作节点",
+    "workerDetails": "工作节点详情",
+    "count": "数量",
+    "cpuInfo": "CPU 信息",
+    "usage": "使用率:",
+    "total": "总计",
+    "cpuMemoryInfo": "CPU 内存信息",
+    "version": "版本",
+    "release": "发布:",
+    "commit": "提交:",
+    "gpuInfo": "GPU 信息",
+    "gpuMemoryInfo": "GPU 内存信息",
+    "address": "地址",
+    "item": "项",
+    "value": "值",
+    "nodeType": "节点类型",
+    "cpuUsage": "CPU 使用率",
+    "cpuTotal": "CPU 总数",
+    "memUsage": "内存使用率",
+    "memTotal": "内存总量",
+    "gpuCount": "GPU 数量",
+    "gpuMemUsage": "GPU 内存使用率",
+    "gpuMemTotal": "GPU 内存总量",
+    "worker": "工作节点"
+  }
+}
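The en.json and zh.json additions share one nested key layout, which the UI consumes through react-i18next: dotted keys such as `launchModel.searchModelType` are resolved through the nesting, and `{{name}}` placeholders are interpolated. A minimal Python sketch of that lookup; the `translate` helper is illustrative, not xinference code:

```python
import json

# A small excerpt of the en.json catalog added in this release.
EN = json.loads("""
{
  "menu": {"launchModel": "Launch Model"},
  "launchModel": {"searchModelType": "Search for {{modelType}} model name"}
}
""")

def translate(catalog, key, **params):
    """Walk a dotted key through the nested catalog, then fill
    i18next-style {{name}} placeholders with the given values."""
    node = catalog
    for part in key.split("."):
        node = node[part]
    for name, value in params.items():
        node = node.replace("{{" + name + "}}", str(value))
    return node
```

For example, `translate(EN, "launchModel.searchModelType", modelType="LLM")` yields "Search for LLM model name"; swapping in the zh.json catalog with the same key yields the Chinese string.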
{xinference-1.1.1.dist-info → xinference-1.2.0.dist-info}/METADATA
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: xinference
-Version: 1.1.1
+Version: 1.2.0
 Summary: Model Serving Made Easy
 Home-page: https://github.com/xorbitsai/inference
 Author: Qin Xuye
@@ -104,8 +104,9 @@ Requires-Dist: jsonschema; extra == "all"
 Requires-Dist: verovio>=4.3.1; extra == "all"
 Requires-Dist: auto-gptq; sys_platform != "darwin" and extra == "all"
 Requires-Dist: autoawq<0.2.6; sys_platform != "darwin" and extra == "all"
+Requires-Dist: mlx<0.22.0; (sys_platform == "darwin" and platform_machine == "arm64") and extra == "all"
 Requires-Dist: mlx-lm; (sys_platform == "darwin" and platform_machine == "arm64") and extra == "all"
-Requires-Dist: mlx-vlm; (sys_platform == "darwin" and platform_machine == "arm64") and extra == "all"
+Requires-Dist: mlx-vlm>=0.1.7; (sys_platform == "darwin" and platform_machine == "arm64") and extra == "all"
 Requires-Dist: mlx-whisper; (sys_platform == "darwin" and platform_machine == "arm64") and extra == "all"
 Requires-Dist: f5-tts-mlx; (sys_platform == "darwin" and platform_machine == "arm64") and extra == "all"
 Requires-Dist: vllm>=0.2.6; sys_platform == "linux" and extra == "all"
@@ -197,8 +198,9 @@ Requires-Dist: intel-extension-for-pytorch==2.1.10+xpu; extra == "intel"
 Provides-Extra: llama_cpp
 Requires-Dist: llama-cpp-python!=0.2.58,>=0.2.25; extra == "llama-cpp"
 Provides-Extra: mlx
+Requires-Dist: mlx<0.22.0; extra == "mlx"
 Requires-Dist: mlx-lm; extra == "mlx"
-Requires-Dist: mlx-vlm; extra == "mlx"
+Requires-Dist: mlx-vlm>=0.1.7; extra == "mlx"
 Requires-Dist: mlx-whisper; extra == "mlx"
 Requires-Dist: f5-tts-mlx; extra == "mlx"
 Requires-Dist: qwen-vl-utils; extra == "mlx"
@@ -277,6 +279,7 @@ potential of cutting-edge AI models.
 
 ## 🔥 Hot Topics
 ### Framework Enhancements
+- VLLM enhancement: Shared KV cache across multiple replicas: [#2732](https://github.com/xorbitsai/inference/pull/2732)
 - Support Continuous batching for Transformers engine: [#1724](https://github.com/xorbitsai/inference/pull/1724)
 - Support MLX backend for Apple Silicon chips: [#1765](https://github.com/xorbitsai/inference/pull/1765)
 - Support specifying worker and GPU indexes for launching models: [#1195](https://github.com/xorbitsai/inference/pull/1195)
@@ -285,14 +288,14 @@ potential of cutting-edge AI models.
 - Support speech recognition model: [#929](https://github.com/xorbitsai/inference/pull/929)
 - Metrics support: [#906](https://github.com/xorbitsai/inference/pull/906)
 ### New Models
+- Built-in support for [Stable Diffusion 3.5](https://huggingface.co/collections/stabilityai/stable-diffusion-35-671785cca799084f71fa2838): [#2706](https://github.com/xorbitsai/inference/pull/2706)
+- Built-in support for [CosyVoice 2](https://huggingface.co/FunAudioLLM/CosyVoice2-0.5B): [#2684](https://github.com/xorbitsai/inference/pull/2684)
+- Built-in support for [Fish Speech V1.5](https://huggingface.co/fishaudio/fish-speech-1.5): [#2672](https://github.com/xorbitsai/inference/pull/2672)
 - Built-in support for [F5-TTS](https://github.com/SWivid/F5-TTS): [#2626](https://github.com/xorbitsai/inference/pull/2626)
 - Built-in support for [GLM Edge](https://github.com/THUDM/GLM-Edge): [#2582](https://github.com/xorbitsai/inference/pull/2582)
 - Built-in support for [QwQ-32B-Preview](https://qwenlm.github.io/blog/qwq-32b-preview/): [#2602](https://github.com/xorbitsai/inference/pull/2602)
 - Built-in support for [Qwen 2.5 Series](https://qwenlm.github.io/blog/qwen2.5/): [#2325](https://github.com/xorbitsai/inference/pull/2325)
-- Built-in support for [Fish Speech V1.4](https://huggingface.co/fishaudio/fish-speech-1.4): [#2295](https://github.com/xorbitsai/inference/pull/2295)
 - Built-in support for [DeepSeek-V2.5](https://huggingface.co/deepseek-ai/DeepSeek-V2.5): [#2292](https://github.com/xorbitsai/inference/pull/2292)
-- Built-in support for [Qwen2-Audio](https://github.com/QwenLM/Qwen2-Audio): [#2271](https://github.com/xorbitsai/inference/pull/2271)
-- Built-in support for [Qwen2-vl-instruct](https://github.com/QwenLM/Qwen2-VL): [#2205](https://github.com/xorbitsai/inference/pull/2205)
 ### Integrations
 - [Dify](https://docs.dify.ai/advanced/model-configuration/xinference): an LLMOps platform that enables developers (and even non-developers) to quickly build useful applications based on large language models, ensuring they are visual, operable, and improvable.
 - [FastGPT](https://github.com/labring/FastGPT): a knowledge-based platform built on the LLM, offers out-of-the-box data processing and model invocation capabilities, allows for workflow orchestration through Flow visualization.