@chenchaolong/plugin-vllm 0.0.4 → 0.0.6

package/README.md CHANGED
@@ -1,68 +1,68 @@
1
- # Xpert Plugin: vLLM
2
-
3
- ## Overview
4
-
5
- `@chenchaolong/plugin-vllm` provides a model adapter for connecting vLLM inference services to the [XpertAI](https://github.com/xpert-ai/xpert) platform. The plugin communicates with vLLM clusters via an OpenAI-compatible API, enabling agents to invoke conversational models, embedding models, vision-enhanced models, and reranking models within a unified XpertAI agentic workflow.
6
-
7
- ## Core Features
8
-
9
- - Provides the `VLLMPlugin` NestJS module, which automatically registers model providers, lifecycle logging, and configuration validation logic.
10
- - Wraps vLLM's conversational/inference capabilities as XpertAI's `LargeLanguageModel` via `VLLMLargeLanguageModel`, supporting function calling, streaming output, and agent token statistics.
11
- - Exposes `VLLMTextEmbeddingModel`, reusing LangChain's `OpenAIEmbeddings` to generate vector representations for knowledge base retrieval.
12
- - Integrates `VLLMRerankModel`, leveraging the OpenAI-compatible rerank API to improve retrieval result ranking.
13
- - Supports declaring capabilities such as vision, function calling, and streaming mode in plugin metadata, allowing flexible configuration of different vLLM deployments in the console.
14
-
15
- ## Installation
16
-
17
- ```bash
18
- npm install @chenchaolong/plugin-vllm
19
- ```
20
-
21
- > **Peer Dependencies**: The host project must also provide libraries such as `@xpert-ai/plugin-sdk`, `@nestjs/common`, `@metad/contracts`, `@langchain/openai`, `lodash-es`, `chalk`, and `zod`. Please refer to `package.json` for version requirements.
22
-
23
- ## Enabling in XpertAI
24
-
25
- 1. Add the plugin package to your system dependencies and ensure it is resolvable by Node.js.
26
- 2. Before starting the service, declare the plugin in your environment variables:
27
- ```bash
28
- PLUGINS=@chenchaolong/plugin-vllm
29
- ```
30
- 3. Add a new model provider in the XpertAI admin interface or configuration file, and select `vllm`.
31
-
32
- ## Credentials & Model Configuration
33
-
34
- The form fields defined in `vllm.yaml` cover common deployment scenarios:
35
-
36
- | Field | Description |
37
- | --- | --- |
38
- | `api_key` | vLLM service access token (leave blank if the service does not require authentication). |
39
- | `endpoint_url` | Required. The base URL of the vLLM OpenAI-compatible API, e.g., `https://vllm.example.com/v1`. |
40
- | `endpoint_model_name` | Specify explicitly if the model name on the server differs from the logical model name in XpertAI. |
41
- | `mode` | Choose between `chat` or `completion` inference modes. |
42
- | `context_size` / `max_tokens_to_sample` | Control the context window and generation length. |
43
- | `agent_though_support`, `function_calling_type`, `stream_function_calling`, `vision_support` | Indicate whether the model supports agent thought exposure, function/tool calling, streaming function calling, and multimodal input, to inform UI capability hints. |
44
- | `stream_mode_delimiter` | Customize the paragraph delimiter for streaming output. |
45
-
46
- After saving the configuration, the plugin will call the `validateCredentials` method in the background, making a minimal request to the vLLM service to ensure the credentials are valid.
47
-
48
- ## Model Capabilities
49
-
50
- - **Conversational Models**: Uses `ChatOAICompatReasoningModel` to proxy the vLLM OpenAI API, supporting message history, function calling, and streaming output.
51
- - **Embedding Models**: Relies on LangChain's `OpenAIEmbeddings` for knowledge base vectorization and retrieval-augmented generation.
52
- - **Reranking Models**: Wraps `OpenAICompatibleReranker` to semantically rerank recall results.
53
- - **Vision Models**: If the vLLM inference service supports multimodal (text+image) input, enable `vision_support` in the configuration to declare multimodal capabilities to the frontend.
54
-
55
- ## Development & Debugging
56
-
57
- From the repository root, enter the `xpertai/` directory and use Nx commands to build and test:
58
-
59
- ```bash
60
- npx nx build @chenchaolong/plugin-vllm
61
- npx nx test @chenchaolong/plugin-vllm
62
- ```
63
-
64
- Build artifacts are output to `dist/` by default. Jest configuration is in `jest.config.ts` for writing and running unit tests.
65
-
66
- ## License
67
-
68
- This project follows the [AGPL-3.0 License](../../../LICENSE) found at the root of the repository.
1
+ # Xpert Plugin: vLLM
2
+
3
+ ## Overview
4
+
5
+ `@chenchaolong/plugin-vllm` provides a model adapter for connecting vLLM inference services to the [XpertAI](https://github.com/xpert-ai/xpert) platform. The plugin communicates with vLLM clusters via an OpenAI-compatible API, enabling agents to invoke conversational models, embedding models, vision-enhanced models, and reranking models within a unified XpertAI agentic workflow.
6
+
7
+ ## Core Features
8
+
9
+ - Provides the `VLLMPlugin` NestJS module, which automatically registers model providers, lifecycle logging, and configuration validation logic.
10
+ - Wraps vLLM's conversational/inference capabilities as XpertAI's `LargeLanguageModel` via `VLLMLargeLanguageModel`, supporting function calling, streaming output, and agent token statistics.
11
+ - Exposes `VLLMTextEmbeddingModel`, reusing LangChain's `OpenAIEmbeddings` to generate vector representations for knowledge base retrieval.
12
+ - Integrates `VLLMRerankModel`, leveraging the OpenAI-compatible rerank API to improve retrieval result ranking.
13
+ - Supports declaring capabilities such as vision, function calling, and streaming mode in plugin metadata, allowing flexible configuration of different vLLM deployments in the console.
14
+
15
+ ## Installation
16
+
17
+ ```bash
18
+ npm install @chenchaolong/plugin-vllm
19
+ ```
20
+
21
+ > **Peer Dependencies**: The host project must also provide libraries such as `@xpert-ai/plugin-sdk`, `@nestjs/common`, `@metad/contracts`, `@langchain/openai`, `lodash-es`, `chalk`, and `zod`. Please refer to `package.json` for version requirements.
22
+
23
+ ## Enabling in XpertAI
24
+
25
+ 1. Add the plugin package to your system dependencies and ensure it is resolvable by Node.js.
26
+ 2. Before starting the service, declare the plugin in your environment variables:
27
+ ```bash
28
+ PLUGINS=@chenchaolong/plugin-vllm
29
+ ```
30
+ 3. Add a new model provider in the XpertAI admin interface or configuration file, and select `vllm`.
31
+
32
+ ## Credentials & Model Configuration
33
+
34
+ The form fields defined in `vllm.yaml` cover common deployment scenarios:
35
+
36
+ | Field | Description |
37
+ | --- | --- |
38
+ | `api_key` | vLLM service access token (leave blank if the service does not require authentication). |
39
+ | `endpoint_url` | Required. The base URL of the vLLM OpenAI-compatible API, e.g., `https://vllm.example.com/v1`. |
40
+ | `endpoint_model_name` | Specify explicitly if the model name on the server differs from the logical model name in XpertAI. |
41
+ | `mode` | Choose between `chat` or `completion` inference modes. |
42
+ | `context_size` / `max_tokens_to_sample` | Control the context window and generation length. |
43
+ | `agent_though_support`, `function_calling_type`, `stream_function_calling`, `vision_support` | Indicate whether the model supports agent thought exposure, function/tool calling, streaming function calling, and multimodal input, to inform UI capability hints. |
44
+ | `stream_mode_delimiter` | Customize the paragraph delimiter for streaming output. |
45
+
46
+ After saving the configuration, the plugin will call the `validateCredentials` method in the background, making a minimal request to the vLLM service to ensure the credentials are valid.
47
+
48
+ ## Model Capabilities
49
+
50
+ - **Conversational Models**: Uses `ChatOAICompatReasoningModel` to proxy the vLLM OpenAI API, supporting message history, function calling, and streaming output.
51
+ - **Embedding Models**: Relies on LangChain's `OpenAIEmbeddings` for knowledge base vectorization and retrieval-augmented generation.
52
+ - **Reranking Models**: Wraps `OpenAICompatibleReranker` to semantically rerank recall results.
53
+ - **Vision Models**: If the vLLM inference service supports multimodal (text+image) input, enable `vision_support` in the configuration to declare multimodal capabilities to the frontend.
54
+
55
+ ## Development & Debugging
56
+
57
+ From the repository root, enter the `xpertai/` directory and use Nx commands to build and test:
58
+
59
+ ```bash
60
+ npx nx build @chenchaolong/plugin-vllm
61
+ npx nx test @chenchaolong/plugin-vllm
62
+ ```
63
+
64
+ Build artifacts are output to `dist/` by default. Jest configuration is in `jest.config.ts` for writing and running unit tests.
65
+
66
+ ## License
67
+
68
+ This project follows the [AGPL-3.0 License](../../../LICENSE) found at the root of the repository.
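The README above documents the credential fields but stops short of a filled-in example. The following sketch is illustrative only and is not part of the package: the URL, model name, and numeric limits are placeholders, while the enumerated values mirror the checks visible in `dist/llm/llm.js` later in this diff.

```ts
// Illustrative credentials for the `vllm` provider — placeholder values, not plugin defaults.
// Field names follow the table above; the enumerated values ('chat', 'tool_call',
// 'support', 'supported') match the checks in dist/llm/llm.js shown later in this diff.
const vllmCredentials: Record<string, string | number> = {
  api_key: '',                                  // leave blank if the server needs no auth
  endpoint_url: 'https://vllm.example.com/v1',  // required OpenAI-compatible base URL
  endpoint_model_name: 'my-served-model',       // only if it differs from the XpertAI model name
  mode: 'chat',                                 // 'chat' or 'completion'
  context_size: 32768,                          // placeholder context window
  max_tokens_to_sample: 4096,                   // placeholder generation limit
  function_calling_type: 'tool_call',           // 'function_call' or 'tool_call' enable tool calling
  vision_support: 'support',                    // 'support' declares multimodal input
  agent_though_support: 'supported',            // 'supported' declares agent thought exposure
  stream_mode_delimiter: '\n\n'                 // assumed delimiter; adjust to your deployment
}
```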
package/dist/llm/llm.d.ts CHANGED
@@ -1,4 +1,4 @@
1
- import { ICopilotModel } from '@metad/contracts';
1
+ import { AIModelEntity, ICopilotModel } from '@metad/contracts';
2
2
  import { ChatOAICompatReasoningModel, LargeLanguageModel, TChatModelOptions } from '@xpert-ai/plugin-sdk';
3
3
  import { VLLMProviderStrategy } from '../provider.strategy.js';
4
4
  import { VLLMModelCredentials } from '../types.js';
@@ -7,5 +7,10 @@ export declare class VLLMLargeLanguageModel extends LargeLanguageModel {
7
7
  constructor(modelProvider: VLLMProviderStrategy);
8
8
  validateCredentials(model: string, credentials: VLLMModelCredentials): Promise<void>;
9
9
  getChatModel(copilotModel: ICopilotModel, options?: TChatModelOptions): ChatOAICompatReasoningModel;
10
+ /**
11
+ * Generate model schema from credentials for customizable models
12
+ * This method dynamically generates parameter rules including thinking mode
13
+ */
14
+ getCustomizableModelSchemaFromCredentials(model: string, credentials: Record<string, any>): AIModelEntity | null;
10
15
  }
11
16
  //# sourceMappingURL=llm.d.ts.map
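The declaration above adds `getCustomizableModelSchemaFromCredentials`. A minimal usage sketch follows, assuming the instance has been resolved from the NestJS container; the variable `llm`, the model name, and the credential values are hypothetical.

```ts
import type { AIModelEntity } from '@metad/contracts'

// Assume `llm` is the VLLMLargeLanguageModel instance registered by VLLMPlugin and
// resolved from the NestJS container (wiring omitted); the signature matches the
// declaration above.
declare const llm: {
  getCustomizableModelSchemaFromCredentials(
    model: string,
    credentials: Record<string, any>
  ): AIModelEntity | null
}

// Model name and credential values are placeholders.
const schema = llm.getCustomizableModelSchemaFromCredentials('my-served-model', {
  mode: 'chat',
  context_size: '32768',
  function_calling_type: 'tool_call',
  vision_support: 'support',
  thinking: true
})
// Per the implementation in dist/llm/llm.js below, `schema` should carry a boolean
// `thinking` parameter rule (defaulting to true here), TOOL_CALL and VISION features,
// mode 'chat', and a context size of 32768.
```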
package/dist/llm/llm.d.ts.map CHANGED
@@ -1 +1 @@
1
- {"version":3,"file":"llm.d.ts","sourceRoot":"","sources":["../../src/llm/llm.ts"],"names":[],"mappings":"AACA,OAAO,EAAmB,aAAa,EAAE,MAAM,kBAAkB,CAAA;AAEjE,OAAO,EACL,2BAA2B,EAG3B,kBAAkB,EAClB,iBAAiB,EAClB,MAAM,sBAAsB,CAAA;AAE7B,OAAO,EAAE,oBAAoB,EAAE,MAAM,yBAAyB,CAAA;AAC9D,OAAO,EAAsB,oBAAoB,EAAE,MAAM,aAAa,CAAA;AAGtE,qBACa,sBAAuB,SAAQ,kBAAkB;;gBAGhD,aAAa,EAAE,oBAAoB;IAIzC,mBAAmB,CAAC,KAAK,EAAE,MAAM,EAAE,WAAW,EAAE,oBAAoB,GAAG,OAAO,CAAC,IAAI,CAAC;IAkBjF,YAAY,CAAC,YAAY,EAAE,aAAa,EAAE,OAAO,CAAC,EAAE,iBAAiB;CA4B/E"}
1
+ {"version":3,"file":"llm.d.ts","sourceRoot":"","sources":["../../src/llm/llm.ts"],"names":[],"mappings":"AACA,OAAO,EACL,aAAa,EAGb,aAAa,EAId,MAAM,kBAAkB,CAAA;AAEzB,OAAO,EACL,2BAA2B,EAG3B,kBAAkB,EAClB,iBAAiB,EAClB,MAAM,sBAAsB,CAAA;AAE7B,OAAO,EAAE,oBAAoB,EAAE,MAAM,yBAAyB,CAAA;AAC9D,OAAO,EAAsB,oBAAoB,EAAE,MAAM,aAAa,CAAA;AAGtE,qBACa,sBAAuB,SAAQ,kBAAkB;;gBAGhD,aAAa,EAAE,oBAAoB;IAIzC,mBAAmB,CAAC,KAAK,EAAE,MAAM,EAAE,WAAW,EAAE,oBAAoB,GAAG,OAAO,CAAC,IAAI,CAAC;IAkBjF,YAAY,CAAC,YAAY,EAAE,aAAa,EAAE,OAAO,CAAC,EAAE,iBAAiB;IA+C9E;;;OAGG;IACM,yCAAyC,CAChD,KAAK,EAAE,MAAM,EACb,WAAW,EAAE,MAAM,CAAC,MAAM,EAAE,GAAG,CAAC,GAC/B,aAAa,GAAG,IAAI;CAwExB"}
package/dist/llm/llm.js CHANGED
@@ -2,7 +2,7 @@ var _a;
2
2
  var VLLMLargeLanguageModel_1;
3
3
  import { __decorate, __metadata } from "tslib";
4
4
  import { ChatOpenAI } from '@langchain/openai';
5
- import { AiModelTypeEnum } from '@metad/contracts';
5
+ import { AiModelTypeEnum, FetchFrom, ModelFeature, ModelPropertyKey, ParameterType } from '@metad/contracts';
6
6
  import { Injectable, Logger } from '@nestjs/common';
7
7
  import { ChatOAICompatReasoningModel, CredentialsValidateFailedError, getErrorMessage, LargeLanguageModel } from '@xpert-ai/plugin-sdk';
8
8
  import { isNil, omitBy } from 'lodash-es';
@@ -39,8 +39,24 @@ let VLLMLargeLanguageModel = VLLMLargeLanguageModel_1 = class VLLMLargeLanguageM
39
39
  throw new Error(translate('Error.ModelCredentialsMissing', { model: copilotModel.model }));
40
40
  }
41
41
  const params = toCredentialKwargs(modelProperties, copilotModel.model);
42
+ // Get thinking parameter from model options (runtime parameter)
43
+ // This takes priority over the default value in credentials
44
+ const modelOptions = copilotModel.options;
45
+ const thinking = modelOptions?.thinking ?? modelProperties?.thinking ?? false;
46
+ // Merge modelKwargs with thinking parameter
47
+ // Ensure chat_template_kwargs structure is correct for vLLM API
48
+ const existingModelKwargs = (params.modelKwargs || {});
49
+ const existingChatTemplateKwargs = existingModelKwargs.chat_template_kwargs || {};
50
+ const modelKwargs = {
51
+ ...existingModelKwargs,
52
+ chat_template_kwargs: {
53
+ ...existingChatTemplateKwargs,
54
+ enable_thinking: !!thinking
55
+ }
56
+ };
42
57
  const fields = omitBy({
43
58
  ...params,
59
+ modelKwargs,
44
60
  streaming: copilotModel.options?.['streaming'] ?? true,
45
61
  // include token usage in the stream. this will include an additional chunk at the end of the stream with the token usage.
46
62
  streamUsage: true
@@ -54,6 +70,75 @@ let VLLMLargeLanguageModel = VLLMLargeLanguageModel_1 = class VLLMLargeLanguageM
54
70
  ]
55
71
  });
56
72
  }
73
+ /**
74
+ * Generate model schema from credentials for customizable models
75
+ * This method dynamically generates parameter rules including thinking mode
76
+ */
77
+ getCustomizableModelSchemaFromCredentials(model, credentials) {
78
+ const rules = [];
79
+ // Add thinking mode parameter
80
+ // This parameter enables thinking mode for models deployed on vLLM and SGLang
81
+ rules.push({
82
+ name: 'thinking',
83
+ type: ParameterType.BOOLEAN,
84
+ label: {
85
+ zh_Hans: '思考模式',
86
+ en_US: 'Thinking Mode'
87
+ },
88
+ help: {
89
+ zh_Hans: '是否启用思考模式',
90
+ en_US: 'Enable thinking mode'
91
+ },
92
+ required: false,
93
+ default: credentials['thinking'] ?? false
94
+ });
95
+ // Determine completion type from credentials
96
+ let completionType = 'chat';
97
+ if (credentials['mode']) {
98
+ if (credentials['mode'] === 'chat') {
99
+ completionType = 'chat';
100
+ }
101
+ else if (credentials['mode'] === 'completion') {
102
+ completionType = 'completion';
103
+ }
104
+ }
105
+ // Build features array based on credentials
106
+ const features = [];
107
+ // Check function calling support
108
+ const functionCallingType = credentials['function_calling_type'];
109
+ if (functionCallingType === 'function_call' || functionCallingType === 'tool_call') {
110
+ features.push(ModelFeature.TOOL_CALL);
111
+ }
112
+ // Check vision support
113
+ const visionSupport = credentials['vision_support'];
114
+ if (visionSupport === 'support') {
115
+ features.push(ModelFeature.VISION);
116
+ }
117
+ // Check agent thought support
118
+ const agentThoughtSupport = credentials['agent_though_support'];
119
+ if (agentThoughtSupport === 'supported') {
120
+ features.push(ModelFeature.AGENT_THOUGHT);
121
+ }
122
+ // Get context size from credentials
123
+ const contextSize = credentials['context_size']
124
+ ? parseInt(String(credentials['context_size']), 10)
125
+ : 4096;
126
+ return {
127
+ model,
128
+ label: {
129
+ zh_Hans: model,
130
+ en_US: model
131
+ },
132
+ fetch_from: FetchFrom.CUSTOMIZABLE_MODEL,
133
+ model_type: AiModelTypeEnum.LLM,
134
+ features: features,
135
+ model_properties: {
136
+ [ModelPropertyKey.MODE]: completionType,
137
+ [ModelPropertyKey.CONTEXT_SIZE]: contextSize
138
+ },
139
+ parameter_rules: rules
140
+ };
141
+ }
57
142
  };
58
143
  VLLMLargeLanguageModel = VLLMLargeLanguageModel_1 = __decorate([
59
144
  Injectable(),
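The new logic above merges a runtime `thinking` option into `modelKwargs.chat_template_kwargs`. As a rough sketch (not code from the package), and assuming LangChain's OpenAI-compatible client spreads `modelKwargs` into the request body as usual, the chat completion request that reaches vLLM gains an extra field:

```ts
// Sketch only — not code from the package. LangChain's OpenAI-compatible clients pass
// `modelKwargs` through to the request body, so the merged kwargs built above mean the
// chat completion request sent to vLLM carries an extra `chat_template_kwargs` field:
const chatCompletionRequest = {
  model: 'my-served-model',                        // placeholder
  messages: [{ role: 'user', content: 'Hello' }],  // placeholder
  stream: true,
  chat_template_kwargs: {
    enable_thinking: true  // from the runtime `thinking` option or its credential default
  }
}
// vLLM's OpenAI-compatible server forwards `chat_template_kwargs` to the chat template,
// which is how reasoning/"thinking" mode is toggled for models that support it.
```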
package/package.json CHANGED
@@ -1,51 +1,51 @@
1
- {
2
- "name": "@chenchaolong/plugin-vllm",
3
- "version": "0.0.4",
4
- "author": {
5
- "name": "XpertAI",
6
- "url": "https://xpertai.cn"
7
- },
8
- "license": "AGPL-3.0",
9
- "repository": {
10
- "type": "git",
11
- "url": "https://github.com/xpert-ai/xpert-plugins.git"
12
- },
13
- "bugs": {
14
- "url": "https://github.com/xpert-ai/xpert-plugins/issues"
15
- },
16
- "type": "module",
17
- "main": "./dist/index.js",
18
- "module": "./dist/index.js",
19
- "types": "./dist/index.d.ts",
20
- "exports": {
21
- "./package.json": "./package.json",
22
- ".": {
23
- "@xpert-plugins-starter/source": "./src/index.ts",
24
- "types": "./dist/index.d.ts",
25
- "import": "./dist/index.js",
26
- "default": "./dist/index.js"
27
- }
28
- },
29
- "files": [
30
- "dist",
31
- "src/i18n",
32
- "!**/*.tsbuildinfo"
33
- ],
34
- "scripts": {
35
- "prepack": "node ./scripts/copy-assets.mjs"
36
- },
37
- "dependencies": {
38
- "tslib": "^2.3.0"
39
- },
40
- "peerDependencies": {
41
- "@langchain/openai": "0.6.9",
42
- "@metad/contracts": "^3.6.1",
43
- "@nestjs/common": "^11.1.6",
44
- "@nestjs/config": "^4.0.2",
45
- "@xpert-ai/plugin-sdk": "^3.6.3",
46
- "i18next": "25.6.0",
47
- "lodash-es": "4.17.21",
48
- "chalk": "4.1.2",
49
- "zod": "3.25.67"
50
- }
51
- }
1
+ {
2
+ "name": "@chenchaolong/plugin-vllm",
3
+ "version": "0.0.6",
4
+ "author": {
5
+ "name": "XpertAI",
6
+ "url": "https://xpertai.cn"
7
+ },
8
+ "license": "AGPL-3.0",
9
+ "repository": {
10
+ "type": "git",
11
+ "url": "https://github.com/xpert-ai/xpert-plugins.git"
12
+ },
13
+ "bugs": {
14
+ "url": "https://github.com/xpert-ai/xpert-plugins/issues"
15
+ },
16
+ "type": "module",
17
+ "main": "./dist/index.js",
18
+ "module": "./dist/index.js",
19
+ "types": "./dist/index.d.ts",
20
+ "exports": {
21
+ "./package.json": "./package.json",
22
+ ".": {
23
+ "@xpert-plugins-starter/source": "./src/index.ts",
24
+ "types": "./dist/index.d.ts",
25
+ "import": "./dist/index.js",
26
+ "default": "./dist/index.js"
27
+ }
28
+ },
29
+ "files": [
30
+ "dist",
31
+ "src/i18n",
32
+ "!**/*.tsbuildinfo"
33
+ ],
34
+ "scripts": {
35
+ "prepack": "node ./scripts/copy-assets.mjs"
36
+ },
37
+ "dependencies": {
38
+ "tslib": "^2.3.0"
39
+ },
40
+ "peerDependencies": {
41
+ "@langchain/openai": "0.6.9",
42
+ "@metad/contracts": "^3.6.1",
43
+ "@nestjs/common": "^11.1.6",
44
+ "@nestjs/config": "^4.0.2",
45
+ "@xpert-ai/plugin-sdk": "^3.6.3",
46
+ "i18next": "25.6.0",
47
+ "lodash-es": "4.17.21",
48
+ "chalk": "4.1.2",
49
+ "zod": "3.25.67"
50
+ }
51
+ }