pi-makora-provider 1.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +21 -0
- package/README.md +162 -0
- package/custom-models.json +24 -0
- package/index.ts +287 -0
- package/models.json +212 -0
- package/package.json +41 -0
- package/patch.json +135 -0
package/LICENSE
ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
package/README.md
ADDED
@@ -0,0 +1,162 @@
<div align="center">

# 🔁 pi-makora-provider

**Open-weight models through [Makora](https://inference.makora.com)**

_DeepSeek V4, Kimi K2.6, GLM 5.1, Qwen 3.6 — with client-side tool call repair for [pi](https://github.com/earendil-works/pi-coding-agent)._

[](https://github.com/earendil-works/pi-coding-agent)
[](./LICENSE)

</div>

---

## Models

<!-- MODELS_TABLE_START -->
| Model | ID | Reasoning | Notes |
|-------|----|-----------|-------|
| DeepSeek V4 Flash | `deepseek-ai/DeepSeek-V4-Flash` | Yes | `include_reasoning` + `chat_template_kwargs.thinking` via `before_provider_request` payload rewrite; returns `reasoning` field |
| DeepSeek V4 Pro | `deepseek-ai/DeepSeek-V4-Pro` | Yes | `chat_template_kwargs.thinking` via `before_provider_request` payload rewrite; returns `reasoning_content` field |
| GLM 5.1 FP8 | `zai-org/GLM-5.1-FP8` | Yes | `enable_thinking` via `qwen-chat-template`; returns `reasoning_content` field; client-side tool call parsing (vLLM streaming parser bypass) |
| GPT-OSS 120B | `openai/gpt-oss-120b` | Yes | Reasoning always on |
| Kimi K2.6 NVFP4 | `nvidia/Kimi-K2.6-NVFP4` | Yes | Reasoning on by default; client-side tool call parsing (vLLM streaming parser bypass) |
| Kimi K2.7 Code | `moonshotai/Kimi-K2.7-Code` | Yes | Reasoning on by default; client-side tool call parsing (vLLM streaming parser bypass) |
| Llama 3.3 70B FP8 | `amd/Llama-3.3-70B-Instruct-FP8-KV` | No | |
| Llama 3.3 70B Instruct | `meta-llama/Llama-3.3-70B-Instruct` | No | |
| MiniMax M3 MXFP8 | `MiniMaxAI/MiniMax-M3-MXFP8` | Yes | Reasoning via `chat_template_kwargs.enable_thinking`; returns `reasoning_content` field |
| Qwen 3.6 27B NVFP4 | `unsloth/Qwen3.6-27B-NVFP4` | Yes | `enable_thinking` via `qwen-chat-template`; client-side tool call parsing (vLLM streaming parser bypass) |
| Qwen 3.6 35B A3B NVFP4 | `unsloth/Qwen3.6-35B-A3B-NVFP4` | Yes | `enable_thinking` via `qwen-chat-template`; client-side tool call parsing (vLLM streaming parser bypass) |
<!-- MODELS_TABLE_END -->

## Installation

### Option 1: Using `pi install` (Recommended)

Install directly from GitHub:

```bash
pi install https://github.com/monotykamary/pi-makora-provider
```

Then set your API key and run pi:
```bash
# Recommended: add to auth.json
# See the API Key section below

# Or set as environment variable
export MAKORA_OPTIMIZE_TOKEN=your-api-key-here

pi
```

### Option 2: Manual Clone

1. Clone this repository:
   ```bash
   git clone https://github.com/monotykamary/pi-makora-provider.git
   cd pi-makora-provider
   ```

2. Set your Makora API key:
   ```bash
   # Recommended: add to auth.json
   # See the API Key section below

   # Or set as environment variable
   export MAKORA_OPTIMIZE_TOKEN=your-api-key-here
   ```

3. Run pi with the extension:
   ```bash
   pi -e /path/to/pi-makora-provider
   ```

## Setup

### API Key

Add your Makora API key to `~/.pi/agent/auth.json` (recommended):

```json
{
  "makora": { "type": "api_key", "key": "your-api-key" }
}
```

Or set it as an environment variable:

```bash
export MAKORA_OPTIMIZE_TOKEN=your-api-key
```

### Usage

```bash
pi -e /path/to/pi-makora-provider
```

Then use `/model` to select from available Makora models.

## Model Resolution

Models are discovered from the Makora `/v1/models` API and stored in `models.json`. Custom definitions and overrides are layered via `patch.json` and `custom-models.json`.

| File | Purpose |
|---|---|
| `models.json` | Auto-generated from Makora API (model discovery). Regenerated by `node scripts/update-models.js` — do not edit manually |
| `patch.json` | Manual overrides (reasoning, compat, notes, limits, etc.) applied on top of `models.json` |
| `custom-models.json` | Models not available via the API (e.g. per-slug endpoint models) |

Models are loaded in three steps: start from `models.json`, apply the `patch.json` overrides, then merge in `custom-models.json` (a `patch.json` entry also applies to a custom model with the same ID).

## Adding Custom Models

Do **not** edit `models.json` directly — it is auto-generated from the API. To customize:

- **Override an existing model**: Add entries to `patch.json` (reasoning, compat, notes, maxTokens, etc.)
- **Add new models not in the API**: Add entries to `custom-models.json`:

```json
[
  {
    "id": "my-org/my-model",
    "name": "My Custom Model",
    "reasoning": false,
    "input": ["text"],
    "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
    "contextWindow": 131072,
    "maxTokens": 16384,
    "baseUrl": "https://inference.makora.com/my-model-slug/v1"
  }
]
```

## API Notes

- By default, models are served from the unified endpoint `https://inference.makora.com/v1/chat/completions`
- Models with a `baseUrl` override use their per-slug endpoint instead
- The API is OpenAI-compatible (chat completions format)
- All models are hosted on vLLM
- The `developer` role is not supported (prompts are silently dropped); `supportsDeveloperRole` is set to `false` for all models
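A minimal TypeScript sketch of what a raw request against these endpoints could look like. It only builds the request (URL, headers, body) and does not send it. `buildChatRequest` and `UNIFIED_BASE_URL` are hypothetical names; the `Authorization: Bearer` header is an assumption based on the usual OpenAI-compatible convention, and rewriting `developer` messages to `system` is an assumed stand-in for what `supportsDeveloperRole: false` is meant to achieve.

```typescript
// Hypothetical sketch: build (but do not send) an OpenAI-style chat completions request.
type Role = "system" | "developer" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

interface BuiltRequest {
  url: string;
  headers: Record<string, string>;
  body: { model: string; messages: ChatMessage[] };
}

const UNIFIED_BASE_URL = "https://inference.makora.com/v1";

function buildChatRequest(
  model: string,
  messages: ChatMessage[],
  token: string,
  baseUrlOverride?: string,
): BuiltRequest {
  // Per-slug models carry their own baseUrl; everything else uses the unified endpoint.
  const baseUrl = (baseUrlOverride ?? UNIFIED_BASE_URL).replace(/\/+$/, "");
  // The deployed chat templates drop `developer` messages, so send them as `system` instead.
  const mapped = messages.map((m) =>
    m.role === "developer" ? { ...m, role: "system" as const } : m,
  );
  return {
    url: `${baseUrl}/chat/completions`,
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${token}`, // assumed auth scheme
    },
    body: { model, messages: mapped },
  };
}

// Usage: a per-slug model from custom-models.json.
const req = buildChatRequest(
  "amd/Llama-3.3-70B-Instruct-FP8-KV",
  [
    { role: "developer", content: "Be terse." },
    { role: "user", content: "Hi" },
  ],
  "your-api-key",
  "https://inference.makora.com/llama3-3-70b-instruct-fp8/v1",
);
console.log(req.url);
```

Sending it would then be a `fetch(req.url, { method: "POST", headers: req.headers, body: JSON.stringify(req.body) })` call.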
## vLLM Caveats

These issues are common to all vLLM-hosted providers and affect Makora models:

- **GLM 5.1 tool calling**: vLLM's streaming tool call handling is broken for GLM — the model outputs Zhipu's native `<tool_call>` XML format as raw text. The `message_end` hook parses this into `toolCall` blocks so pi can execute the tools. A `context` hook then strips `tool_calls` from assistant messages before follow-up requests, converting them back to `<tool_call>` text to avoid a ZAI/vLLM server crash (500: `'str object' has no attribute 'items'`) that occurs when any assistant message contains a `tool_calls` field. If upstream fixes both the streaming parser and the 500 crash, the `message_end` hook gracefully skips (existing valid `toolCall` blocks are preserved), and the `context` hook's text-stripping is harmless (GLM natively understands `<tool_call>` format in conversation history).

- **Kimi K2.6 + Qwen 3.6 tool calling**: vLLM's streaming tool call handling is broken or missing for these models. The `before_provider_request` hook sets `tool_choice: "none"` and `skip_special_tokens: false` so the model's tool call tokens pass through as plain text. The `message_end` hook then re-parses into `toolCall` blocks:

  - **Kimi K2.6**: Uses `<|tool_call_begin|>...<|tool_call_end|>` tokens. Makora's vLLM is missing both `--enable-auto-tool-choice` and `--tool-call-parser` for this model.
  - **Qwen 3.6**: Uses hermes-style `<function=...>` XML, sometimes with `█` delimiters. Same vLLM flag limitation as Kimi.

- **GLM 5.1 CoT leak**: On some vLLM builds, disabling reasoning may still leak chain-of-thought into `content` terminated by a ``` marker. See [vllm-project/vllm#31319](https://github.com/vllm-project/vllm/issues/31319).

- **DeepSeek V4 reasoning**: The official DeepSeek API uses `thinking: { type: "enabled" }` which Makora's vLLM silently ignores. The `before_provider_request` hook rewrites the payload to use vLLM-native params instead:
  - **DS V4 Pro**: `chat_template_kwargs: { thinking: true }`. Returns `reasoning_content`.
  - **DS V4 Flash**: `include_reasoning: true` + `chat_template_kwargs: { thinking: true }`. `include_reasoning` alone returns `reasoning: null` on this vLLM build — both params are required. Returns `reasoning`.
- **GLM 5.1 reasoning**: Returns `reasoning_content` (not `reasoning`). pi's OpenAI completions handler checks `reasoning_content` first, so this is handled correctly.
- **MiniMax M3 reasoning**: Uses `chat_template_kwargs.enable_thinking` to toggle thinking (not `chat_template_kwargs.thinking` like DeepSeek). The `before_provider_request` hook rewrites the DeepSeek API-style `thinking` param into vLLM-native `chat_template_kwargs: { enable_thinking: true }`. Returns `reasoning_content` field.
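As a rough illustration of the GLM 5.1 repair described in these caveats, the sketch below parses `<tool_call>` text into structured calls (the `message_end` direction) and formats a structured call back into text (the `context` direction). The inner layout, with the function name on the first line followed by `<arg_key>`/`<arg_value>` pairs, is an assumption based on the GLM-4.5 chat template and may differ for GLM 5.1; `parseGlmToolCalls`, `formatGlmToolCall` and `decodeValue` are hypothetical helpers, not part of pi or of this package.

```typescript
// Hypothetical sketch: round-trip GLM-style <tool_call> text.
// Assumed layout: <tool_call>NAME\n<arg_key>K</arg_key>\n<arg_value>V</arg_value>\n</tool_call>

interface GlmToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

const TOOL_CALL_RE = /<tool_call>([\s\S]*?)<\/tool_call>/g;
const ARG_RE = /<arg_key>([\s\S]*?)<\/arg_key>\s*<arg_value>([\s\S]*?)<\/arg_value>/g;

// Decode an argument value: JSON where it parses (numbers, booleans, objects), else the raw string.
function decodeValue(raw: string): unknown {
  try {
    return JSON.parse(raw);
  } catch {
    return raw;
  }
}

// Split assistant text into plain content and parsed tool calls.
function parseGlmToolCalls(text: string): { content: string; toolCalls: GlmToolCall[] } {
  const toolCalls: GlmToolCall[] = [];
  for (const match of text.matchAll(TOOL_CALL_RE)) {
    const body = match[1];
    const firstArg = body.indexOf("<arg_key>");
    // The function name is everything before the first <arg_key>.
    const name = (firstArg === -1 ? body : body.slice(0, firstArg)).trim();
    if (!name) continue;
    const args: Record<string, unknown> = {};
    for (const arg of body.matchAll(ARG_RE)) {
      args[arg[1].trim()] = decodeValue(arg[2].trim());
    }
    toolCalls.push({ name, arguments: args });
  }
  const content = text.replace(TOOL_CALL_RE, "").trim();
  return { content, toolCalls };
}

// Inverse direction: render a structured call back into <tool_call> text for history.
function formatGlmToolCall(call: GlmToolCall): string {
  const lines = [`<tool_call>${call.name}`];
  for (const [key, value] of Object.entries(call.arguments)) {
    lines.push(`<arg_key>${key}</arg_key>`);
    lines.push(`<arg_value>${typeof value === "string" ? value : JSON.stringify(value)}</arg_value>`);
  }
  lines.push("</tool_call>");
  return lines.join("\n");
}
```

Decoding values as JSON first is a convenience choice: it turns `50` into a number, but it also means a string argument that happens to look like JSON (for example `"50"`) comes back as a number.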
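For Kimi K2.6 and Qwen 3.6, the `message_end` step has to recover calls from raw special-token text. A minimal sketch of two such parsers follows. The Kimi layout (`functions.NAME:INDEX` between `<|tool_call_begin|>` and `<|tool_call_argument_begin|>`, JSON arguments before `<|tool_call_end|>`) and the Qwen layout (`<function=NAME>` wrapping `<parameter=KEY>` blocks) are assumptions based on the published Kimi K2 and Qwen3-Coder formats; the `█` delimiter variant is not handled. All names are hypothetical.

```typescript
// Hypothetical sketch: recover tool calls from Kimi K2 and Qwen 3.6 raw text.

interface RawToolCall {
  id: string;
  name: string;
  arguments: Record<string, unknown>;
}

// Assumed Kimi K2 layout:
// <|tool_call_begin|>functions.NAME:INDEX<|tool_call_argument_begin|>{...json...}<|tool_call_end|>
const KIMI_CALL_RE =
  /<\|tool_call_begin\|>\s*([^<]+?)\s*<\|tool_call_argument_begin\|>([\s\S]*?)<\|tool_call_end\|>/g;

function parseKimiToolCalls(text: string): RawToolCall[] {
  const calls: RawToolCall[] = [];
  for (const m of text.matchAll(KIMI_CALL_RE)) {
    const id = m[1]; // e.g. "functions.read_file:0"
    // Drop the assumed "functions." prefix and ":INDEX" suffix to get the tool name.
    const name = id.replace(/^functions\./, "").replace(/:\d+$/, "");
    try {
      calls.push({ id, name, arguments: JSON.parse(m[2]) });
    } catch {
      // Skip calls whose argument payload is not valid JSON.
    }
  }
  return calls;
}

// Assumed Qwen layout: <function=NAME><parameter=KEY>VALUE</parameter>...</function>
const QWEN_FUNC_RE = /<function=([^>\s]+)>([\s\S]*?)<\/function>/g;
const QWEN_PARAM_RE = /<parameter=([^>\s]+)>([\s\S]*?)<\/parameter>/g;

function parseQwenToolCalls(text: string): RawToolCall[] {
  const calls: RawToolCall[] = [];
  let index = 0;
  for (const m of text.matchAll(QWEN_FUNC_RE)) {
    const args: Record<string, unknown> = {};
    for (const p of m[2].matchAll(QWEN_PARAM_RE)) {
      const raw = p[2].trim();
      // Parameter values are plain text; decode JSON when it parses, else keep the string.
      try {
        args[p[1]] = JSON.parse(raw);
      } catch {
        args[p[1]] = raw;
      }
    }
    // This text format carries no call id, so synthesize one.
    calls.push({ id: `call_${index++}`, name: m[1], arguments: args });
  }
  return calls;
}
```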
package/custom-models.json
ADDED
@@ -0,0 +1,24 @@
[
  {
    "id": "amd/Llama-3.3-70B-Instruct-FP8-KV",
    "name": "Llama 3.3 70B FP8",
    "reasoning": false,
    "input": [
      "text"
    ],
    "cost": {
      "input": 0,
      "output": 0,
      "cacheRead": 0,
      "cacheWrite": 0
    },
    "contextWindow": 128000,
    "maxTokens": 16384,
    "baseUrl": "https://inference.makora.com/llama3-3-70b-instruct-fp8/v1",
    "compat": {
      "supportsDeveloperRole": false,
      "supportsStore": false,
      "maxTokensField": "max_completion_tokens"
    }
  }
]
package/index.ts
ADDED
@@ -0,0 +1,287 @@
/**
 * Makora Provider Extension
 *
 * Registers Makora (inference.makora.com) as a custom provider using the
 * OpenAI completions API.
 *
 * Makora is an inference optimization platform serving open-weight models via
 * a unified OpenAI-compatible API at https://inference.makora.com/v1. Each
 * model is hosted on vLLM and speaks the standard OpenAI chat completions
 * protocol. Most models use the shared provider baseUrl; models not yet
 * on the unified endpoint retain a per-model `baseUrl` override.
 *
 * Model resolution strategy: static models.json, patched by patch.json, then merged with custom-models.json
 *
 * Reasoning notes:
 * - DeepSeek V4 Pro: reasoning via chat_template_kwargs.thinking on vLLM.
 *   pi sends thinking: { type } via the "deepseek" thinkingFormat, but vLLM
 *   ignores that — the before_provider_request hook rewrites the payload to
 *   use chat_template_kwargs: { thinking: true } instead.
 *   Returns reasoning_content field.
 * - DeepSeek V4 Flash: reasoning via include_reasoning +
 *   chat_template_kwargs.thinking on vLLM.
 *   The before_provider_request hook rewrites the payload to replace
 *   thinking: { type } with include_reasoning: true +
 *   chat_template_kwargs: { thinking: true }.
 *   include_reasoning alone returns reasoning: null on this vLLM build.
 *   Returns reasoning field.
 * - GLM 5.1 FP8: reasoning via chat_template_kwargs.enable_thinking.
 *   NOTE: vLLM may leak chain-of-thought into content instead of the
 *   reasoning field on some builds. See
 *   https://github.com/vllm-project/vllm/issues/31319
 *   Also: vLLM's streaming parser omits delta.tool_calls when the model
 *   calls tools, finishing with finish_reason: "tool_calls" but an empty
 *   delta. Setting zaiToolStream: true sends tool_stream: true in the
 *   request, which forces vLLM to use the explicit tool streaming path
 *   that correctly emits tool call chunks.
 * - GPT-OSS 120B: reasoning always on; returns `reasoning` field.
 * - Kimi K2.6 NVFP4 / Kimi K2.7 Code: reasoning on by default;
 *   returns `reasoning` field. Can be toggled via enable_thinking.
 * - Qwen 3.6 models: reasoning via chat_template_kwargs.enable_thinking;
 *   returns `reasoning` field.
 * - MiniMax M3 MXFP8: reasoning via chat_template_kwargs.enable_thinking;
 *   returns reasoning_content field.
 * - Llama 3.3 70B: not a reasoning model.
 *
 * Developer role is NOT supported by any of the chat templates on Makora's
 * vLLM deployment (prompts with role: "developer" are silently dropped).
 * supportsDeveloperRole is set to false for all models.
 *
 * Usage:
 *   # Option 1: Store in auth.json (recommended)
 *   # Add to ~/.pi/agent/auth.json:
 *   #   "makora": { "type": "api_key", "key": "your-api-key" }
 *
 *   # Option 2: Set as environment variable
 *   export MAKORA_OPTIMIZE_TOKEN=your-api-key
 *
 *   # Run pi with the extension
 *   pi -e /path/to/pi-makora-provider
 *
 * Then use /model to select from available models.
 */

import type { ExtensionAPI } from "@earendil-works/pi-coding-agent";
import modelsData from "./models.json" with { type: "json" };
import customModelsData from "./custom-models.json" with { type: "json" };
import patchData from "./patch.json" with { type: "json" };

// Types

interface JsonModel {
  id: string;
  name: string;
  reasoning: boolean;
  input: string[];
  cost: {
    input: number;
    output: number;
    cacheRead: number;
    cacheWrite: number;
  };
  contextWindow: number;
  maxTokens: number;
  baseUrl?: string;
  notes?: string;
  thinkingLevelMap?: Record<string, string | null>;
  headers?: Record<string, string>;
  vision?: {
    maxImagesPerRequest?: number;
  };
  compat?: {
    supportsDeveloperRole?: boolean;
    supportsStore?: boolean;
    maxTokensField?: "max_completion_tokens" | "max_tokens";
    thinkingFormat?:
      | "openai"
      | "openrouter"
      | "deepseek"
      | "together"
      | "zai"
      | "qwen"
      | "qwen-chat-template";
    supportsReasoningEffort?: boolean;
    requiresReasoningContentOnAssistantMessages?: boolean;
    requiresToolResultName?: boolean;
    requiresAssistantAfterToolResult?: boolean;
    cacheControlFormat?: "anthropic";
  };
}

interface PatchEntry {
  name?: string;
  reasoning?: boolean;
  input?: string[];
  cost?: {
    input?: number;
    output?: number;
    cacheRead?: number;
    cacheWrite?: number;
  };
  contextWindow?: number;
  maxTokens?: number;
  baseUrl?: string;
  notes?: string;
  thinkingLevelMap?: Record<string, string | null>;
  headers?: Record<string, string>;
  compat?: Record<string, unknown>;
}

type PatchMap = Record<string, PatchEntry>;

// Patch Application

function applyPatch(model: JsonModel, patch: PatchEntry): JsonModel {
  const result = { ...model };

  if (patch.name !== undefined) result.name = patch.name;
  if (patch.reasoning !== undefined) result.reasoning = patch.reasoning;
  if (patch.input !== undefined) result.input = patch.input;
  if (patch.contextWindow !== undefined) result.contextWindow = patch.contextWindow;
  if (patch.maxTokens !== undefined) result.maxTokens = patch.maxTokens;
  if (patch.baseUrl !== undefined) result.baseUrl = patch.baseUrl;
  if (patch.notes !== undefined) result.notes = patch.notes;
  if (patch.thinkingLevelMap !== undefined) result.thinkingLevelMap = { ...patch.thinkingLevelMap };
  if (patch.headers !== undefined) result.headers = { ...patch.headers };

  if (patch.cost) {
    result.cost = {
      input: patch.cost.input ?? result.cost.input,
      output: patch.cost.output ?? result.cost.output,
      cacheRead: patch.cost.cacheRead ?? result.cost.cacheRead,
      cacheWrite: patch.cost.cacheWrite ?? result.cost.cacheWrite,
    };
  }
  if (patch.compat) {
    result.compat = { ...(result.compat || {}), ...patch.compat };
  }

  if (!result.reasoning && result.compat?.thinkingFormat) {
    delete result.compat.thinkingFormat;
  }
  if (result.compat && Object.keys(result.compat).length === 0) {
    delete result.compat;
  }

  return result;
}

/** Merge static models with any user-defined custom models */
function buildModels(
  base: JsonModel[],
  custom: JsonModel[],
  patch: PatchMap
): JsonModel[] {
  const modelMap = new Map<string, JsonModel>();

  for (const model of base) {
    modelMap.set(model.id, model);
  }

  for (const [id, patchEntry] of Object.entries(patch)) {
    const existing = modelMap.get(id);
    if (existing) {
      modelMap.set(id, applyPatch(existing, patchEntry));
    }
  }

  for (const model of custom) {
    const existing = modelMap.get(model.id);
    const patchEntry = patch[model.id];
    if (existing && patchEntry) {
      modelMap.set(model.id, applyPatch(model, patchEntry));
    } else if (existing) {
      modelMap.set(model.id, model);
    } else if (patchEntry) {
      modelMap.set(model.id, applyPatch(model, patchEntry));
    } else {
      modelMap.set(model.id, model);
    }
  }

  return Array.from(modelMap.values());
}

// Extension Entry Point

const PROVIDER_ID = "makora";
const BASE_URL = "https://inference.makora.com/v1";

const DS_PRO_ID = "deepseek-ai/DeepSeek-V4-Pro";
const DS_FLASH_ID = "deepseek-ai/DeepSeek-V4-Flash";
const MINIMAX_M3_ID = "MiniMaxAI/MiniMax-M3-MXFP8";

const DS_VLLM_MODELS = new Set([DS_PRO_ID, DS_FLASH_ID]);
const ENABLE_THINKING_VLLM_MODELS = new Set([MINIMAX_M3_ID]);

/**
 * Intercept the request payload for models that need vLLM-specific thinking
 * param rewrites.
 *
 * pi's "deepseek" thinkingFormat sends `thinking: { type: "enabled" }` which
 * is the official DeepSeek API format — but Makora's vLLM deployment ignores
 * it. vLLM requires different params depending on the model:
 *   - DS V4 Pro: `chat_template_kwargs: { thinking: true }` + `reasoning_effort`
 *   - DS V4 Flash: `include_reasoning: true` + `chat_template_kwargs: { thinking: true }`
 *     + `reasoning_effort`. `include_reasoning` alone returns `reasoning: null`
 *     on this vLLM build — both params are required.
 *   - MiniMax M3: `chat_template_kwargs: { enable_thinking: true }` +
 *     `reasoning_effort`. Returns `reasoning_content` field.
 *
 * This hook rewrites the payload accordingly.
 */
function rewriteVllmPayload(payload: Record<string, unknown>): Record<string, unknown> {
  const model = payload.model as string | undefined;
  if (!model) return payload;

  const p = { ...payload };

  if (DS_VLLM_MODELS.has(model)) {
    // Remove the DeepSeek API-style `thinking` param that vLLM ignores
    delete p.thinking;

    if (model === DS_PRO_ID) {
      // DS Pro: chat_template_kwargs.thinking + reasoning_effort
      const ctq = (p.chat_template_kwargs as Record<string, unknown>) ?? {};
      p.chat_template_kwargs = { ...ctq, thinking: true };
    } else if (model === DS_FLASH_ID) {
      // DS Flash: include_reasoning + chat_template_kwargs.thinking + reasoning_effort
      // vLLM requires *both* include_reasoning and chat_template_kwargs.thinking:
      // include_reasoning alone returns reasoning: null.
      p.include_reasoning = true;
      const ctq = (p.chat_template_kwargs as Record<string, unknown>) ?? {};
      p.chat_template_kwargs = { ...ctq, thinking: true };
    }
  } else if (ENABLE_THINKING_VLLM_MODELS.has(model)) {
    // Models using chat_template_kwargs.enable_thinking (e.g. MiniMax M3)
    delete p.thinking;
    const ctq = (p.chat_template_kwargs as Record<string, unknown>) ?? {};
    p.chat_template_kwargs = { ...ctq, enable_thinking: true };
  }

  return p;
}

export default function (pi: ExtensionAPI) {
  const embeddedModels = modelsData as JsonModel[];
  const customModels = customModelsData as JsonModel[];
  const patches = patchData as PatchMap;

  const models = buildModels(embeddedModels, customModels, patches);

  // apiKey resolution order: auth.json ("makora" key) → MAKORA_OPTIMIZE_TOKEN env var
  pi.registerProvider(PROVIDER_ID, {
    name: "Makora",
    baseUrl: BASE_URL,
    apiKey: "$MAKORA_OPTIMIZE_TOKEN",
    api: "openai-completions",
    models,
  });

  pi.on("before_provider_request", (event) => {
    const payload = event.payload as Record<string, unknown> | undefined;
    if (!payload || typeof payload.model !== "string") return;
    return rewriteVllmPayload(payload);
  });
}
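The README describes a second `before_provider_request` rewrite for Kimi K2.6 and Qwen 3.6 (`tool_choice: "none"` plus `skip_special_tokens: false`) that lets tool-call tokens reach the client as plain text. A minimal sketch of such a rewrite, written in the same style as `rewriteVllmPayload`, follows. `TEXT_TOOL_CALL_MODELS` and `rewriteToolPassthrough` are hypothetical names; the model set is taken from the README table rows marked "client-side tool call parsing", excluding GLM 5.1, which the README handles through a different path; and leaving requests without tools untouched is an assumed design choice.

```typescript
// Hypothetical sketch: let tool-call special tokens reach the client as plain text
// for models whose server-side tool parsing is unavailable on this deployment.

const TEXT_TOOL_CALL_MODELS = new Set<string>([
  "nvidia/Kimi-K2.6-NVFP4",
  "moonshotai/Kimi-K2.7-Code",
  "unsloth/Qwen3.6-27B-NVFP4",
  "unsloth/Qwen3.6-35B-A3B-NVFP4",
]);

function rewriteToolPassthrough(payload: Record<string, unknown>): Record<string, unknown> {
  const model = payload.model;
  if (typeof model !== "string" || !TEXT_TOOL_CALL_MODELS.has(model)) return payload;
  // Assumed design choice: requests that offer no tools are left as they are.
  if (!Array.isArray(payload.tools) || payload.tools.length === 0) return payload;

  return {
    ...payload,
    // Per the README: keep vLLM from handling tool calls itself...
    tool_choice: "none",
    // ...and keep special tokens such as <|tool_call_begin|> in the streamed text.
    skip_special_tokens: false,
  };
}
```

Inside the existing hook it could be chained as `return rewriteToolPassthrough(rewriteVllmPayload(payload));`, since each function only changes its own fields and passes everything else through.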
package/models.json
ADDED
@@ -0,0 +1,212 @@
[
  {
    "id": "deepseek-ai/DeepSeek-V4-Flash",
    "name": "DeepSeek V4 Flash",
    "reasoning": false,
    "input": [
      "text"
    ],
    "cost": {
      "input": 0,
      "output": 0,
      "cacheRead": 0,
      "cacheWrite": 0
    },
    "contextWindow": 1048576,
    "maxTokens": 0,
    "compat": {
      "supportsDeveloperRole": false,
      "supportsStore": false,
      "maxTokensField": "max_completion_tokens"
    }
  },
  {
    "id": "deepseek-ai/DeepSeek-V4-Pro",
    "name": "DeepSeek V4 Pro",
    "reasoning": false,
    "input": [
      "text"
    ],
    "cost": {
      "input": 0,
      "output": 0,
      "cacheRead": 0,
      "cacheWrite": 0
    },
    "contextWindow": 1048576,
    "maxTokens": 0,
    "compat": {
      "supportsDeveloperRole": false,
      "supportsStore": false,
      "maxTokensField": "max_completion_tokens"
    }
  },
  {
    "id": "meta-llama/Llama-3.3-70B-Instruct",
    "name": "Llama 3.3 70B Instruct",
    "reasoning": false,
    "input": [
      "text"
    ],
    "cost": {
      "input": 0,
      "output": 0,
      "cacheRead": 0,
      "cacheWrite": 0
    },
    "contextWindow": 131072,
    "maxTokens": 0,
    "compat": {
      "supportsDeveloperRole": false,
      "supportsStore": false,
      "maxTokensField": "max_completion_tokens"
    }
  },
  {
    "id": "MiniMaxAI/MiniMax-M3-MXFP8",
    "name": "MiniMax M3 MXFP8",
    "reasoning": false,
    "input": [
      "text"
    ],
    "cost": {
      "input": 0,
      "output": 0,
      "cacheRead": 0,
      "cacheWrite": 0
    },
    "contextWindow": 1048576,
    "maxTokens": 0,
    "compat": {
      "supportsDeveloperRole": false,
      "supportsStore": false,
      "maxTokensField": "max_completion_tokens"
    }
  },
  {
    "id": "moonshotai/Kimi-K2.7-Code",
    "name": "Kimi K2.7 Code",
    "reasoning": false,
    "input": [
      "text"
    ],
    "cost": {
      "input": 0,
      "output": 0,
      "cacheRead": 0,
      "cacheWrite": 0
    },
    "contextWindow": 262144,
    "maxTokens": 0,
    "compat": {
      "supportsDeveloperRole": false,
      "supportsStore": false,
      "maxTokensField": "max_completion_tokens"
    }
  },
  {
    "id": "nvidia/Kimi-K2.6-NVFP4",
    "name": "Kimi K2.6 NVFP4",
    "reasoning": false,
    "input": [
      "text"
    ],
    "cost": {
      "input": 0,
      "output": 0,
      "cacheRead": 0,
      "cacheWrite": 0
    },
    "contextWindow": 262144,
    "maxTokens": 0,
    "compat": {
      "supportsDeveloperRole": false,
      "supportsStore": false,
      "maxTokensField": "max_completion_tokens"
    }
  },
  {
    "id": "openai/gpt-oss-120b",
    "name": "GPT-OSS 120B",
    "reasoning": false,
    "input": [
      "text"
    ],
    "cost": {
      "input": 0,
      "output": 0,
      "cacheRead": 0,
      "cacheWrite": 0
    },
    "contextWindow": 131072,
    "maxTokens": 0,
    "compat": {
      "supportsDeveloperRole": false,
      "supportsStore": false,
      "maxTokensField": "max_completion_tokens"
    }
  },
  {
    "id": "unsloth/Qwen3.6-27B-NVFP4",
    "name": "Qwen 3.6 27B NVFP4",
    "reasoning": false,
    "input": [
      "text"
    ],
    "cost": {
      "input": 0,
      "output": 0,
      "cacheRead": 0,
      "cacheWrite": 0
    },
    "contextWindow": 262144,
    "maxTokens": 0,
    "compat": {
      "supportsDeveloperRole": false,
      "supportsStore": false,
      "maxTokensField": "max_completion_tokens"
    }
  },
  {
    "id": "unsloth/Qwen3.6-35B-A3B-NVFP4",
    "name": "Qwen 3.6 35B A3B NVFP4",
    "reasoning": false,
    "input": [
      "text"
    ],
    "cost": {
      "input": 0,
      "output": 0,
      "cacheRead": 0,
      "cacheWrite": 0
    },
    "contextWindow": 262144,
    "maxTokens": 0,
    "compat": {
      "supportsDeveloperRole": false,
      "supportsStore": false,
      "maxTokensField": "max_completion_tokens"
    }
  },
  {
    "id": "zai-org/GLM-5.1-FP8",
    "name": "GLM 5.1 FP8",
    "reasoning": false,
    "input": [
      "text"
    ],
    "cost": {
      "input": 0,
      "output": 0,
      "cacheRead": 0,
      "cacheWrite": 0
    },
    "contextWindow": 202752,
    "maxTokens": 0,
    "compat": {
      "supportsDeveloperRole": false,
      "supportsStore": false,
      "maxTokensField": "max_completion_tokens"
    }
  }
]
package/package.json
ADDED
@@ -0,0 +1,41 @@
{
  "name": "pi-makora-provider",
  "version": "1.0.0",
  "description": "Makora provider extension for pi - Access DeepSeek V4, GLM 5.1, Kimi K2.6, Llama 3.3, Qwen 3.6, and more through the Makora inference API",
  "type": "module",
  "main": "index.ts",
  "scripts": {
    "clean": "echo 'nothing to clean'",
    "build": "echo 'nothing to build'",
    "check": "echo 'nothing to check'",
    "update-models": "node scripts/update-models.js"
  },
  "keywords": [
    "pi",
    "extension",
    "provider",
    "makora",
    "ai",
    "llm",
    "deepseek",
    "glm",
    "kimi",
    "llama",
    "qwen"
  ],
  "author": "monotykamary",
  "license": "MIT",
  "files": [
    "index.ts",
    "models.json",
    "custom-models.json",
    "patch.json",
    "README.md",
    "LICENSE"
  ],
  "pi": {
    "extensions": [
      "./index.ts"
    ]
  }
}
package/patch.json
ADDED
@@ -0,0 +1,135 @@
{
  "deepseek-ai/DeepSeek-V4-Flash": {
    "reasoning": true,
    "notes": "`include_reasoning` + `chat_template_kwargs.thinking` via `before_provider_request` payload rewrite; returns `reasoning` field",
    "thinkingLevelMap": {
      "minimal": null,
      "low": null,
      "medium": null,
      "high": "high",
      "xhigh": "max"
    },
    "compat": {
      "thinkingFormat": "deepseek",
      "supportsReasoningEffort": true
    }
  },
  "deepseek-ai/DeepSeek-V4-Pro": {
    "reasoning": true,
    "notes": "`chat_template_kwargs.thinking` via `before_provider_request` payload rewrite; returns `reasoning_content` field",
    "thinkingLevelMap": {
      "minimal": null,
      "low": null,
      "medium": null,
      "high": "high",
      "xhigh": "max"
    },
    "compat": {
      "thinkingFormat": "deepseek",
      "supportsReasoningEffort": true,
      "requiresReasoningContentOnAssistantMessages": true
    }
  },
  "nvidia/Kimi-K2.6-NVFP4": {
    "reasoning": true,
    "input": [
      "text",
      "image"
    ],
    "notes": "Reasoning on by default; client-side tool call parsing (vLLM streaming parser bypass)",
    "thinkingLevelMap": {
      "minimal": "low",
      "xhigh": "high"
    },
    "compat": {
      "thinkingFormat": "qwen-chat-template",
      "supportsReasoningEffort": true
    }
  },
  "openai/gpt-oss-120b": {
    "reasoning": true,
    "notes": "Reasoning always on",
    "thinkingLevelMap": {
      "minimal": "low",
      "xhigh": "high"
    },
    "compat": {
      "thinkingFormat": "qwen-chat-template",
      "supportsReasoningEffort": true
    }
  },
  "unsloth/Qwen3.6-27B-NVFP4": {
|
|
62
|
+
"reasoning": true,
|
|
63
|
+
"notes": "`enable_thinking` via `qwen-chat-template`; client-side tool call parsing (vLLM streaming parser bypass)",
|
|
64
|
+
"thinkingLevelMap": {
|
|
65
|
+
"minimal": "low",
|
|
66
|
+
"xhigh": "high"
|
|
67
|
+
},
|
|
68
|
+
"compat": {
|
|
69
|
+
"thinkingFormat": "qwen-chat-template",
|
|
70
|
+
"supportsReasoningEffort": true
|
|
71
|
+
}
|
|
72
|
+
},
|
|
73
|
+
"unsloth/Qwen3.6-35B-A3B-NVFP4": {
|
|
74
|
+
"reasoning": true,
|
|
75
|
+
"notes": "`enable_thinking` via `qwen-chat-template`; client-side tool call parsing (vLLM streaming parser bypass)",
|
|
76
|
+
"thinkingLevelMap": {
|
|
77
|
+
"minimal": "low",
|
|
78
|
+
"xhigh": "high"
|
|
79
|
+
},
|
|
80
|
+
"compat": {
|
|
81
|
+
"thinkingFormat": "qwen-chat-template",
|
|
82
|
+
"supportsReasoningEffort": true
|
|
83
|
+
}
|
|
84
|
+
},
|
|
85
|
+
"MiniMaxAI/MiniMax-M3-MXFP8": {
|
|
86
|
+
"reasoning": true,
|
|
87
|
+
"input": [
|
|
88
|
+
"text",
|
|
89
|
+
"image"
|
|
90
|
+
],
|
|
91
|
+
"notes": "Reasoning via `chat_template_kwargs.enable_thinking`; returns `reasoning_content` field",
|
|
92
|
+
"thinkingLevelMap": {
|
|
93
|
+
"minimal": null,
|
|
94
|
+
"low": null,
|
|
95
|
+
"medium": null,
|
|
96
|
+
"high": "high",
|
|
97
|
+
"xhigh": "max"
|
|
98
|
+
},
|
|
99
|
+
"compat": {
|
|
100
|
+
"thinkingFormat": "deepseek",
|
|
101
|
+
"supportsReasoningEffort": true,
|
|
102
|
+
"requiresReasoningContentOnAssistantMessages": true
|
|
103
|
+
}
|
|
104
|
+
},
|
|
105
|
+
"moonshotai/Kimi-K2.7-Code": {
|
|
106
|
+
"reasoning": true,
|
|
107
|
+
"input": [
|
|
108
|
+
"text",
|
|
109
|
+
"image"
|
|
110
|
+
],
|
|
111
|
+
"notes": "Reasoning on by default; client-side tool call parsing (vLLM streaming parser bypass)",
|
|
112
|
+
"thinkingLevelMap": {
|
|
113
|
+
"minimal": "low",
|
|
114
|
+
"xhigh": "high"
|
|
115
|
+
},
|
|
116
|
+
"compat": {
|
|
117
|
+
"thinkingFormat": "qwen-chat-template",
|
|
118
|
+
"supportsReasoningEffort": true
|
|
119
|
+
}
|
|
120
|
+
},
|
|
121
|
+
"zai-org/GLM-5.1-FP8": {
|
|
122
|
+
"contextWindow": 200000,
|
|
123
|
+
"reasoning": true,
|
|
124
|
+
"notes": "`enable_thinking` via `qwen-chat-template`; returns `reasoning_content` field; client-side tool call parsing (vLLM streaming parser bypass)",
|
|
125
|
+
"thinkingLevelMap": {
|
|
126
|
+
"minimal": "low",
|
|
127
|
+
"xhigh": "high"
|
|
128
|
+
},
|
|
129
|
+
"compat": {
|
|
130
|
+
"thinkingFormat": "qwen-chat-template",
|
|
131
|
+
"supportsReasoningEffort": true,
|
|
132
|
+
"zaiToolStream": true
|
|
133
|
+
}
|
|
134
|
+
}
|
|
135
|
+
}
|
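`patch.json` is keyed by model `id` and layers reasoning support, input modalities, thinking-level mappings, and extra `compat` flags over the `models.json` entries. For example, GLM 5.1 goes from `"reasoning": false` with a 202752-token window to `"reasoning": true` with 200000. How `index.ts` applies the patch is not part of this diff, so the following is only a sketch under these assumptions: top-level fields replace the base values, `compat` keys are merged one by one, `notes` is documentation that never reaches the model entry, a thinking level missing from `thinkingLevelMap` passes through unchanged, and `null` means "send no reasoning effort for this level" (the extension might instead hide such levels). `applyPatch` and `resolveEffort` are hypothetical names.

```typescript
type ThinkingLevel = "minimal" | "low" | "medium" | "high" | "xhigh";

// Only the fields that patch.json touches in this diff.
interface ModelEntry {
  id: string;
  reasoning: boolean;
  input: string[];
  contextWindow: number;
  compat?: Record<string, unknown>;
  thinkingLevelMap?: Partial<Record<ThinkingLevel, string | null>>;
}

// A patch.json value: any subset of model fields plus free-form notes.
type PatchEntry = Partial<Omit<ModelEntry, "id">> & { notes?: string };

// Assumption: top-level fields replace the models.json values, compat keys are merged
// one by one, and "notes" is documentation that is not copied into the model entry.
function applyPatch(base: ModelEntry, patch?: PatchEntry): ModelEntry {
  if (!patch) return base;
  const { notes: _notes, compat, ...rest } = patch;
  return { ...base, ...rest, compat: { ...base.compat, ...compat } };
}

// Assumption: a level absent from the map passes through unchanged, and null means
// "send no reasoning effort for this level".
function resolveEffort(model: ModelEntry, level: ThinkingLevel): string | null {
  if (!model.reasoning) return null;
  const map = model.thinkingLevelMap ?? {};
  return level in map ? (map[level] ?? null) : level;
}

// Usage: the GLM 5.1 base entry from models.json with its patch.json override applied.
const glmBase: ModelEntry = {
  id: "zai-org/GLM-5.1-FP8",
  reasoning: false,
  input: ["text"],
  contextWindow: 202752,
  compat: { supportsDeveloperRole: false, supportsStore: false, maxTokensField: "max_completion_tokens" },
};

const glm = applyPatch(glmBase, {
  contextWindow: 200000,
  reasoning: true,
  thinkingLevelMap: { minimal: "low", xhigh: "high" },
  compat: { thinkingFormat: "qwen-chat-template", supportsReasoningEffort: true, zaiToolStream: true },
});

console.log(glm.contextWindow, resolveEffort(glm, "minimal"), resolveEffort(glm, "medium")); // 200000 low medium
```

Merging `compat` key by key keeps the request-shaping flags from `models.json` (such as `maxTokensField`) while the patch adds reasoning-specific ones like `thinkingFormat`; a plain top-level overwrite would silently drop them.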