@chenchaolong/plugin-vllm 0.0.4 → 0.0.6

package/README.md CHANGED
@@ -1,68 +1,68 @@
1
- # Xpert Plugin: vLLM
2
-
3
- ## Overview
4
-
5
- `@chenchaolong/plugin-vllm` provides a model adapter for connecting vLLM inference services to the [XpertAI](https://github.com/xpert-ai/xpert) platform. The plugin communicates with vLLM clusters via an OpenAI-compatible API, enabling agents to invoke conversational models, embedding models, vision-enhanced models, and reranking models within a unified XpertAI agentic workflow.
6
-
7
- ## Core Features
8
-
9
- - Provides the `VLLMPlugin` NestJS module, which automatically registers model providers, lifecycle logging, and configuration validation logic.
10
- - Wraps vLLM's conversational/inference capabilities as XpertAI's `LargeLanguageModel` via `VLLMLargeLanguageModel`, supporting function calling, streaming output, and agent token statistics.
11
- - Exposes `VLLMTextEmbeddingModel`, reusing LangChain's `OpenAIEmbeddings` to generate vector representations for knowledge base retrieval.
12
- - Integrates `VLLMRerankModel`, leveraging the OpenAI-compatible rerank API to improve retrieval result ranking.
13
- - Supports declaring capabilities such as vision, function calling, and streaming mode in plugin metadata, allowing flexible configuration of different vLLM deployments in the console.
14
-
15
- ## Installation
16
-
17
- ```bash
18
- npm install @chenchaolong/plugin-vllm
19
- ```
20
-
21
- > **Peer Dependencies**: The host project must also provide libraries such as `@xpert-ai/plugin-sdk`, `@nestjs/common`, `@metad/contracts`, `@langchain/openai`, `lodash-es`, `chalk`, and `zod`. Please refer to `package.json` for version requirements.
22
-
23
- ## Enabling in XpertAI
24
-
25
- 1. Add the plugin package to your system dependencies and ensure it is resolvable by Node.js.
26
- 2. Before starting the service, declare the plugin in your environment variables:
27
- ```bash
28
- PLUGINS=@chenchaolong/plugin-vllm
29
- ```
30
- 3. Add a new model provider in the XpertAI admin interface or configuration file, and select `vllm`.
31
-
32
- ## Credentials & Model Configuration
33
-
34
- The form fields defined in `vllm.yaml` cover common deployment scenarios:
35
-
36
- | Field | Description |
37
- | --- | --- |
38
- | `api_key` | vLLM service access token (leave blank if the service does not require authentication). |
39
- | `endpoint_url` | Required. The base URL of the vLLM OpenAI-compatible API, e.g., `https://vllm.example.com/v1`. |
40
- | `endpoint_model_name` | Specify explicitly if the model name on the server differs from the logical model name in XpertAI. |
41
- | `mode` | Choose between `chat` or `completion` inference modes. |
42
- | `context_size` / `max_tokens_to_sample` | Control the context window and generation length. |
43
- | `agent_though_support`, `function_calling_type`, `stream_function_calling`, `vision_support` | Indicate whether the model supports agent thought exposure, function/tool calling, streaming function calling, and multimodal input, to inform UI capability hints. |
44
- | `stream_mode_delimiter` | Customize the paragraph delimiter for streaming output. |
45
-
46
- After saving the configuration, the plugin will call the `validateCredentials` method in the background, making a minimal request to the vLLM service to ensure the credentials are valid.
47
-
48
- ## Model Capabilities
49
-
50
- - **Conversational Models**: Uses `ChatOAICompatReasoningModel` to proxy the vLLM OpenAI API, supporting message history, function calling, and streaming output.
51
- - **Embedding Models**: Relies on LangChain's `OpenAIEmbeddings` for knowledge base vectorization and retrieval-augmented generation.
52
- - **Reranking Models**: Wraps `OpenAICompatibleReranker` to semantically rerank recall results.
53
- - **Vision Models**: If the vLLM inference service supports multimodal (text+image) input, enable `vision_support` in the configuration to declare multimodal capabilities to the frontend.
54
-
55
- ## Development & Debugging
56
-
57
- From the repository root, enter the `xpertai/` directory and use Nx commands to build and test:
58
-
59
- ```bash
60
- npx nx build @chenchaolong/plugin-vllm
61
- npx nx test @chenchaolong/plugin-vllm
62
- ```
63
-
64
- Build artifacts are output to `dist/` by default. Jest configuration is in `jest.config.ts` for writing and running unit tests.
65
-
66
- ## License
67
-
68
- This project follows the [AGPL-3.0 License](../../../LICENSE) found at the root of the repository.
1
+ # Xpert Plugin: vLLM
2
+
3
+ ## Overview
4
+
5
+ `@chenchaolong/plugin-vllm` provides a model adapter for connecting vLLM inference services to the [XpertAI](https://github.com/xpert-ai/xpert) platform. The plugin communicates with vLLM clusters via an OpenAI-compatible API, enabling agents to invoke conversational models, embedding models, vision-enhanced models, and reranking models within a unified XpertAI agentic workflow.
6
+
7
+ ## Core Features
8
+
9
+ - Provides the `VLLMPlugin` NestJS module, which automatically registers model providers, lifecycle logging, and configuration validation logic.
10
+ - Wraps vLLM's conversational/inference capabilities as XpertAI's `LargeLanguageModel` via `VLLMLargeLanguageModel`, supporting function calling, streaming output, and agent token statistics.
11
+ - Exposes `VLLMTextEmbeddingModel`, reusing LangChain's `OpenAIEmbeddings` to generate vector representations for knowledge base retrieval.
12
+ - Integrates `VLLMRerankModel`, leveraging the OpenAI-compatible rerank API to improve retrieval result ranking.
13
+ - Supports declaring capabilities such as vision, function calling, and streaming mode in plugin metadata, allowing flexible configuration of different vLLM deployments in the console.
14
+
15
+ ## Installation
16
+
17
+ ```bash
18
+ npm install @chenchaolong/plugin-vllm
19
+ ```
20
+
21
+ > **Peer Dependencies**: The host project must also provide libraries such as `@xpert-ai/plugin-sdk`, `@nestjs/common`, `@metad/contracts`, `@langchain/openai`, `lodash-es`, `chalk`, and `zod`. Please refer to `package.json` for version requirements.
22
+
23
+ ## Enabling in XpertAI
24
+
25
+ 1. Add the plugin package to your system dependencies and ensure it is resolvable by Node.js.
26
+ 2. Before starting the service, declare the plugin in your environment variables:
27
+ ```bash
28
+ PLUGINS=@chenchaolong/plugin-vllm
29
+ ```
30
+ 3. Add a new model provider in the XpertAI admin interface or configuration file, and select `vllm`.
31
+
32
+ ## Credentials & Model Configuration
33
+
34
+ The form fields defined in `vllm.yaml` cover common deployment scenarios:
35
+
36
+ | Field | Description |
37
+ | --- | --- |
38
+ | `api_key` | vLLM service access token (leave blank if the service does not require authentication). |
39
+ | `endpoint_url` | Required. The base URL of the vLLM OpenAI-compatible API, e.g., `https://vllm.example.com/v1`. |
40
+ | `endpoint_model_name` | Specify explicitly if the model name on the server differs from the logical model name in XpertAI. |
41
+ | `mode` | Choose between `chat` or `completion` inference modes. |
42
+ | `context_size` / `max_tokens_to_sample` | Control the context window and generation length. |
43
+ | `agent_though_support`, `function_calling_type`, `stream_function_calling`, `vision_support` | Indicate whether the model supports agent thought exposure, function/tool calling, streaming function calling, and multimodal input, to inform UI capability hints. |
44
+ | `stream_mode_delimiter` | Customize the paragraph delimiter for streaming output. |
45
+
46
+ After saving the configuration, the plugin will call the `validateCredentials` method in the background, making a minimal request to the vLLM service to ensure the credentials are valid.
47
+
48
+ ## Model Capabilities
49
+
50
+ - **Conversational Models**: Uses `ChatOAICompatReasoningModel` to proxy the vLLM OpenAI API, supporting message history, function calling, and streaming output.
51
+ - **Embedding Models**: Relies on LangChain's `OpenAIEmbeddings` for knowledge base vectorization and retrieval-augmented generation.
52
+ - **Reranking Models**: Wraps `OpenAICompatibleReranker` to semantically rerank recall results.
53
+ - **Vision Models**: If the vLLM inference service supports multimodal (text+image) input, enable `vision_support` in the configuration to declare multimodal capabilities to the frontend.
54
+
55
+ ## Development & Debugging
56
+
57
+ From the repository root, enter the `xpertai/` directory and use Nx commands to build and test:
58
+
59
+ ```bash
60
+ npx nx build @chenchaolong/plugin-vllm
61
+ npx nx test @chenchaolong/plugin-vllm
62
+ ```
63
+
64
+ Build artifacts are output to `dist/` by default. Jest configuration is in `jest.config.ts` for writing and running unit tests.
65
+
66
+ ## License
67
+
68
+ This project follows the [AGPL-3.0 License](../../../LICENSE) found at the root of the repository.
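The README above documents the credential fields but stops short of a filled-in example. The following sketch is illustrative only and is not part of the package: the URL, model name, and numeric limits are placeholders, while the enumerated values mirror the checks visible in `dist/llm/llm.js` later in this diff.

```ts
// Illustrative credentials for the `vllm` provider — placeholder values, not plugin defaults.
// Field names follow the table above; the enumerated values ('chat', 'tool_call',
// 'support', 'supported') match the checks in dist/llm/llm.js shown later in this diff.
const vllmCredentials: Record<string, string | number> = {
  api_key: '',                                  // leave blank if the server needs no auth
  endpoint_url: 'https://vllm.example.com/v1',  // required OpenAI-compatible base URL
  endpoint_model_name: 'my-served-model',       // only if it differs from the XpertAI model name
  mode: 'chat',                                 // 'chat' or 'completion'
  context_size: 32768,                          // placeholder context window
  max_tokens_to_sample: 4096,                   // placeholder generation limit
  function_calling_type: 'tool_call',           // 'function_call' or 'tool_call' enable tool calling
  vision_support: 'support',                    // 'support' declares multimodal input
  agent_though_support: 'supported',            // 'supported' declares agent thought exposure
  stream_mode_delimiter: '\n\n'                 // assumed delimiter; adjust to your deployment
}
```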
package/dist/llm/llm.d.ts CHANGED
@@ -1,4 +1,4 @@
1
- import { ICopilotModel } from '@metad/contracts';
1
+ import { AIModelEntity, ICopilotModel } from '@metad/contracts';
2
2
  import { ChatOAICompatReasoningModel, LargeLanguageModel, TChatModelOptions } from '@xpert-ai/plugin-sdk';
3
3
  import { VLLMProviderStrategy } from '../provider.strategy.js';
4
4
  import { VLLMModelCredentials } from '../types.js';
@@ -7,5 +7,10 @@ export declare class VLLMLargeLanguageModel extends LargeLanguageModel {
7
7
  constructor(modelProvider: VLLMProviderStrategy);
8
8
  validateCredentials(model: string, credentials: VLLMModelCredentials): Promise<void>;
9
9
  getChatModel(copilotModel: ICopilotModel, options?: TChatModelOptions): ChatOAICompatReasoningModel;
10
+ /**
11
+ * Generate model schema from credentials for customizable models
12
+ * This method dynamically generates parameter rules including thinking mode
13
+ */
14
+ getCustomizableModelSchemaFromCredentials(model: string, credentials: Record<string, any>): AIModelEntity | null;
10
15
  }
11
16
  //# sourceMappingURL=llm.d.ts.map
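The declaration above adds `getCustomizableModelSchemaFromCredentials`. A minimal usage sketch follows, assuming the instance has been resolved from the NestJS container; the variable `llm`, the model name, and the credential values are hypothetical.

```ts
import type { AIModelEntity } from '@metad/contracts'

// Assume `llm` is the VLLMLargeLanguageModel instance registered by VLLMPlugin and
// resolved from the NestJS container (wiring omitted); the signature matches the
// declaration above.
declare const llm: {
  getCustomizableModelSchemaFromCredentials(
    model: string,
    credentials: Record<string, any>
  ): AIModelEntity | null
}

// Model name and credential values are placeholders.
const schema = llm.getCustomizableModelSchemaFromCredentials('my-served-model', {
  mode: 'chat',
  context_size: '32768',
  function_calling_type: 'tool_call',
  vision_support: 'support',
  thinking: true
})
// Per the implementation in dist/llm/llm.js below, `schema` should carry a boolean
// `thinking` parameter rule (defaulting to true here), TOOL_CALL and VISION features,
// mode 'chat', and a context size of 32768.
```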
package/dist/llm/llm.d.ts.map CHANGED
@@ -1 +1 @@
1
- {"version":3,"file":"llm.d.ts","sourceRoot":"","sources":["../../src/llm/llm.ts"],"names":[],"mappings":"AACA,OAAO,EAAmB,aAAa,EAAE,MAAM,kBAAkB,CAAA;AAEjE,OAAO,EACL,2BAA2B,EAG3B,kBAAkB,EAClB,iBAAiB,EAClB,MAAM,sBAAsB,CAAA;AAE7B,OAAO,EAAE,oBAAoB,EAAE,MAAM,yBAAyB,CAAA;AAC9D,OAAO,EAAsB,oBAAoB,EAAE,MAAM,aAAa,CAAA;AAGtE,qBACa,sBAAuB,SAAQ,kBAAkB;;gBAGhD,aAAa,EAAE,oBAAoB;IAIzC,mBAAmB,CAAC,KAAK,EAAE,MAAM,EAAE,WAAW,EAAE,oBAAoB,GAAG,OAAO,CAAC,IAAI,CAAC;IAkBjF,YAAY,CAAC,YAAY,EAAE,aAAa,EAAE,OAAO,CAAC,EAAE,iBAAiB;CA4B/E"}
1
+ {"version":3,"file":"llm.d.ts","sourceRoot":"","sources":["../../src/llm/llm.ts"],"names":[],"mappings":"AACA,OAAO,EACL,aAAa,EAGb,aAAa,EAId,MAAM,kBAAkB,CAAA;AAEzB,OAAO,EACL,2BAA2B,EAG3B,kBAAkB,EAClB,iBAAiB,EAClB,MAAM,sBAAsB,CAAA;AAE7B,OAAO,EAAE,oBAAoB,EAAE,MAAM,yBAAyB,CAAA;AAC9D,OAAO,EAAsB,oBAAoB,EAAE,MAAM,aAAa,CAAA;AAGtE,qBACa,sBAAuB,SAAQ,kBAAkB;;gBAGhD,aAAa,EAAE,oBAAoB;IAIzC,mBAAmB,CAAC,KAAK,EAAE,MAAM,EAAE,WAAW,EAAE,oBAAoB,GAAG,OAAO,CAAC,IAAI,CAAC;IAkBjF,YAAY,CAAC,YAAY,EAAE,aAAa,EAAE,OAAO,CAAC,EAAE,iBAAiB;IA+C9E;;;OAGG;IACM,yCAAyC,CAChD,KAAK,EAAE,MAAM,EACb,WAAW,EAAE,MAAM,CAAC,MAAM,EAAE,GAAG,CAAC,GAC/B,aAAa,GAAG,IAAI;CAwExB"}
package/dist/llm/llm.js CHANGED
@@ -2,7 +2,7 @@ var _a;
2
2
  var VLLMLargeLanguageModel_1;
3
3
  import { __decorate, __metadata } from "tslib";
4
4
  import { ChatOpenAI } from '@langchain/openai';
5
- import { AiModelTypeEnum } from '@metad/contracts';
5
+ import { AiModelTypeEnum, FetchFrom, ModelFeature, ModelPropertyKey, ParameterType } from '@metad/contracts';
6
6
  import { Injectable, Logger } from '@nestjs/common';
7
7
  import { ChatOAICompatReasoningModel, CredentialsValidateFailedError, getErrorMessage, LargeLanguageModel } from '@xpert-ai/plugin-sdk';
8
8
  import { isNil, omitBy } from 'lodash-es';
@@ -39,8 +39,24 @@ let VLLMLargeLanguageModel = VLLMLargeLanguageModel_1 = class VLLMLargeLanguageM
39
39
  throw new Error(translate('Error.ModelCredentialsMissing', { model: copilotModel.model }));
40
40
  }
41
41
  const params = toCredentialKwargs(modelProperties, copilotModel.model);
42
+ // Get thinking parameter from model options (runtime parameter)
43
+ // This takes priority over the default value in credentials
44
+ const modelOptions = copilotModel.options;
45
+ const thinking = modelOptions?.thinking ?? modelProperties?.thinking ?? false;
46
+ // Merge modelKwargs with thinking parameter
47
+ // Ensure chat_template_kwargs structure is correct for vLLM API
48
+ const existingModelKwargs = (params.modelKwargs || {});
49
+ const existingChatTemplateKwargs = existingModelKwargs.chat_template_kwargs || {};
50
+ const modelKwargs = {
51
+ ...existingModelKwargs,
52
+ chat_template_kwargs: {
53
+ ...existingChatTemplateKwargs,
54
+ enable_thinking: !!thinking
55
+ }
56
+ };
42
57
  const fields = omitBy({
43
58
  ...params,
59
+ modelKwargs,
44
60
  streaming: copilotModel.options?.['streaming'] ?? true,
45
61
  // include token usage in the stream. this will include an additional chunk at the end of the stream with the token usage.
46
62
  streamUsage: true
@@ -54,6 +70,75 @@ let VLLMLargeLanguageModel = VLLMLargeLanguageModel_1 = class VLLMLargeLanguageM
54
70
  ]
55
71
  });
56
72
  }
73
+ /**
74
+ * Generate model schema from credentials for customizable models
75
+ * This method dynamically generates parameter rules including thinking mode
76
+ */
77
+ getCustomizableModelSchemaFromCredentials(model, credentials) {
78
+ const rules = [];
79
+ // Add thinking mode parameter
80
+ // This parameter enables thinking mode for models deployed on vLLM and SGLang
81
+ rules.push({
82
+ name: 'thinking',
83
+ type: ParameterType.BOOLEAN,
84
+ label: {
85
+ zh_Hans: '思考模式',
86
+ en_US: 'Thinking Mode'
87
+ },
88
+ help: {
89
+ zh_Hans: '是否启用思考模式',
90
+ en_US: 'Enable thinking mode'
91
+ },
92
+ required: false,
93
+ default: credentials['thinking'] ?? false
94
+ });
95
+ // Determine completion type from credentials
96
+ let completionType = 'chat';
97
+ if (credentials['mode']) {
98
+ if (credentials['mode'] === 'chat') {
99
+ completionType = 'chat';
100
+ }
101
+ else if (credentials['mode'] === 'completion') {
102
+ completionType = 'completion';
103
+ }
104
+ }
105
+ // Build features array based on credentials
106
+ const features = [];
107
+ // Check function calling support
108
+ const functionCallingType = credentials['function_calling_type'];
109
+ if (functionCallingType === 'function_call' || functionCallingType === 'tool_call') {
110
+ features.push(ModelFeature.TOOL_CALL);
111
+ }
112
+ // Check vision support
113
+ const visionSupport = credentials['vision_support'];
114
+ if (visionSupport === 'support') {
115
+ features.push(ModelFeature.VISION);
116
+ }
117
+ // Check agent thought support
118
+ const agentThoughtSupport = credentials['agent_though_support'];
119
+ if (agentThoughtSupport === 'supported') {
120
+ features.push(ModelFeature.AGENT_THOUGHT);
121
+ }
122
+ // Get context size from credentials
123
+ const contextSize = credentials['context_size']
124
+ ? parseInt(String(credentials['context_size']), 10)
125
+ : 4096;
126
+ return {
127
+ model,
128
+ label: {
129
+ zh_Hans: model,
130
+ en_US: model
131
+ },
132
+ fetch_from: FetchFrom.CUSTOMIZABLE_MODEL,
133
+ model_type: AiModelTypeEnum.LLM,
134
+ features: features,
135
+ model_properties: {
136
+ [ModelPropertyKey.MODE]: completionType,
137
+ [ModelPropertyKey.CONTEXT_SIZE]: contextSize
138
+ },
139
+ parameter_rules: rules
140
+ };
141
+ }
57
142
  };
58
143
  VLLMLargeLanguageModel = VLLMLargeLanguageModel_1 = __decorate([
59
144
  Injectable(),
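The new logic above merges a runtime `thinking` option into `modelKwargs.chat_template_kwargs`. As a rough sketch (not code from the package), and assuming LangChain's OpenAI-compatible client spreads `modelKwargs` into the request body as usual, the chat completion request that reaches vLLM gains an extra field:

```ts
// Sketch only — not code from the package. LangChain's OpenAI-compatible clients pass
// `modelKwargs` through to the request body, so the merged kwargs built above mean the
// chat completion request sent to vLLM carries an extra `chat_template_kwargs` field:
const chatCompletionRequest = {
  model: 'my-served-model',                        // placeholder
  messages: [{ role: 'user', content: 'Hello' }],  // placeholder
  stream: true,
  chat_template_kwargs: {
    enable_thinking: true  // from the runtime `thinking` option or its credential default
  }
}
// vLLM's OpenAI-compatible server forwards `chat_template_kwargs` to the chat template,
// which is how reasoning/"thinking" mode is toggled for models that support it.
```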
package/package.json CHANGED
@@ -1,51 +1,51 @@
1
- {
2
- "name": "@chenchaolong/plugin-vllm",
3
- "version": "0.0.4",
4
- "author": {
5
- "name": "XpertAI",
6
- "url": "https://xpertai.cn"
7
- },
8
- "license": "AGPL-3.0",
9
- "repository": {
10
- "type": "git",
11
- "url": "https://github.com/xpert-ai/xpert-plugins.git"
12
- },
13
- "bugs": {
14
- "url": "https://github.com/xpert-ai/xpert-plugins/issues"
15
- },
16
- "type": "module",
17
- "main": "./dist/index.js",
18
- "module": "./dist/index.js",
19
- "types": "./dist/index.d.ts",
20
- "exports": {
21
- "./package.json": "./package.json",
22
- ".": {
23
- "@xpert-plugins-starter/source": "./src/index.ts",
24
- "types": "./dist/index.d.ts",
25
- "import": "./dist/index.js",
26
- "default": "./dist/index.js"
27
- }
28
- },
29
- "files": [
30
- "dist",
31
- "src/i18n",
32
- "!**/*.tsbuildinfo"
33
- ],
34
- "scripts": {
35
- "prepack": "node ./scripts/copy-assets.mjs"
36
- },
37
- "dependencies": {
38
- "tslib": "^2.3.0"
39
- },
40
- "peerDependencies": {
41
- "@langchain/openai": "0.6.9",
42
- "@metad/contracts": "^3.6.1",
43
- "@nestjs/common": "^11.1.6",
44
- "@nestjs/config": "^4.0.2",
45
- "@xpert-ai/plugin-sdk": "^3.6.3",
46
- "i18next": "25.6.0",
47
- "lodash-es": "4.17.21",
48
- "chalk": "4.1.2",
49
- "zod": "3.25.67"
50
- }
51
- }
1
+ {
2
+ "name": "@chenchaolong/plugin-vllm",
3
+ "version": "0.0.6",
4
+ "author": {
5
+ "name": "XpertAI",
6
+ "url": "https://xpertai.cn"
7
+ },
8
+ "license": "AGPL-3.0",
9
+ "repository": {
10
+ "type": "git",
11
+ "url": "https://github.com/xpert-ai/xpert-plugins.git"
12
+ },
13
+ "bugs": {
14
+ "url": "https://github.com/xpert-ai/xpert-plugins/issues"
15
+ },
16
+ "type": "module",
17
+ "main": "./dist/index.js",
18
+ "module": "./dist/index.js",
19
+ "types": "./dist/index.d.ts",
20
+ "exports": {
21
+ "./package.json": "./package.json",
22
+ ".": {
23
+ "@xpert-plugins-starter/source": "./src/index.ts",
24
+ "types": "./dist/index.d.ts",
25
+ "import": "./dist/index.js",
26
+ "default": "./dist/index.js"
27
+ }
28
+ },
29
+ "files": [
30
+ "dist",
31
+ "src/i18n",
32
+ "!**/*.tsbuildinfo"
33
+ ],
34
+ "scripts": {
35
+ "prepack": "node ./scripts/copy-assets.mjs"
36
+ },
37
+ "dependencies": {
38
+ "tslib": "^2.3.0"
39
+ },
40
+ "peerDependencies": {
41
+ "@langchain/openai": "0.6.9",
42
+ "@metad/contracts": "^3.6.1",
43
+ "@nestjs/common": "^11.1.6",
44
+ "@nestjs/config": "^4.0.2",
45
+ "@xpert-ai/plugin-sdk": "^3.6.3",
46
+ "i18next": "25.6.0",
47
+ "lodash-es": "4.17.21",
48
+ "chalk": "4.1.2",
49
+ "zod": "3.25.67"
50
+ }
51
+ }