pi-makora-provider 1.0.0

package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,162 @@
1
+ <div align="center">
2
+
3
+ # 🔁 pi-makora-provider
4
+
5
+ **Open-weight models through [Makora](https://inference.makora.com)**
6
+
7
+ _DeepSeek V4, Kimi K2.6, GLM 5.1, Qwen 3.6 — with client-side tool call repair for [pi](https://github.com/earendil-works/pi-coding-agent)._
8
+
9
+ [![pi extension](https://img.shields.io/badge/pi-extension-blueviolet)](https://github.com/earendil-works/pi-coding-agent)
10
+ [![license](https://img.shields.io/badge/license-MIT-blue)](./LICENSE)
11
+
12
+ </div>
13
+
14
+ ---
15
+
16
+ ## Models
17
+
18
+ <!-- MODELS_TABLE_START -->
19
+ | Model | ID | Reasoning | Notes |
20
+ |-------|----|-----------|-------|
21
+ | DeepSeek V4 Flash | `deepseek-ai/DeepSeek-V4-Flash` | Yes | `include_reasoning` + `chat_template_kwargs.thinking` via `before_provider_request` payload rewrite; returns `reasoning` field |
22
+ | DeepSeek V4 Pro | `deepseek-ai/DeepSeek-V4-Pro` | Yes | `chat_template_kwargs.thinking` via `before_provider_request` payload rewrite; returns `reasoning_content` field |
23
+ | GLM 5.1 FP8 | `zai-org/GLM-5.1-FP8` | Yes | `enable_thinking` via `qwen-chat-template`; returns `reasoning_content` field; client-side tool call parsing (vLLM streaming parser bypass) |
24
+ | GPT-OSS 120B | `openai/gpt-oss-120b` | Yes | Reasoning always on |
25
+ | Kimi K2.6 NVFP4 | `nvidia/Kimi-K2.6-NVFP4` | Yes | Reasoning on by default; client-side tool call parsing (vLLM streaming parser bypass) |
26
+ | Kimi K2.7 Code | `moonshotai/Kimi-K2.7-Code` | Yes | Reasoning on by default; client-side tool call parsing (vLLM streaming parser bypass) |
27
+ | Llama 3.3 70B FP8 | `amd/Llama-3.3-70B-Instruct-FP8-KV` | No | |
28
+ | Llama 3.3 70B Instruct | `meta-llama/Llama-3.3-70B-Instruct` | No | |
29
+ | MiniMax M3 MXFP8 | `MiniMaxAI/MiniMax-M3-MXFP8` | Yes | Reasoning via `chat_template_kwargs.enable_thinking`; returns `reasoning_content` field |
30
+ | Qwen 3.6 27B NVFP4 | `unsloth/Qwen3.6-27B-NVFP4` | Yes | `enable_thinking` via `qwen-chat-template`; client-side tool call parsing (vLLM streaming parser bypass) |
31
+ | Qwen 3.6 35B A3B NVFP4 | `unsloth/Qwen3.6-35B-A3B-NVFP4` | Yes | `enable_thinking` via `qwen-chat-template`; client-side tool call parsing (vLLM streaming parser bypass) |
32
+ <!-- MODELS_TABLE_END -->
33
+
34
+ ## Installation
35
+
36
+ ### Option 1: Using `pi install` (Recommended)
37
+
38
+ Install directly from GitHub:
39
+
40
+ ```bash
41
+ pi install https://github.com/monotykamary/pi-makora-provider
42
+ ```
43
+
44
+ Then set your API key and run pi:
45
+ ```bash
46
+ # Recommended: add to auth.json
47
+ # See Authentication section below
48
+
49
+ # Or set as environment variable
50
+ export MAKORA_OPTIMIZE_TOKEN=your-api-key-here
51
+
52
+ pi
53
+ ```
54
+
55
+ ### Option 2: Manual Clone
56
+
57
+ 1. Clone this repository:
58
+ ```bash
59
+ git clone https://github.com/monotykamary/pi-makora-provider.git
60
+ cd pi-makora-provider
61
+ ```
62
+
63
+ 2. Set your Makora API key:
64
+ ```bash
65
+ # Recommended: add to auth.json
66
+ # See Authentication section below
67
+
68
+ # Or set as environment variable
69
+ export MAKORA_OPTIMIZE_TOKEN=your-api-key-here
70
+ ```
71
+
72
+ 3. Run pi with the extension:
73
+ ```bash
74
+ pi -e /path/to/pi-makora-provider
75
+ ```
76
+
77
+ ## Setup
78
+
79
+ ### API Key
80
+
81
+ Add your Makora API key to `~/.pi/agent/auth.json` (recommended):
82
+
83
+ ```json
84
+ {
85
+ "makora": { "type": "api_key", "key": "your-api-key" }
86
+ }
87
+ ```
88
+
89
+ Or set it as an environment variable:
90
+
91
+ ```bash
92
+ export MAKORA_OPTIMIZE_TOKEN=your-api-key
93
+ ```
94
+
95
+ ### Usage
96
+
97
+ ```bash
98
+ pi -e /path/to/pi-makora-provider
99
+ ```
100
+
101
+ Then use `/model` to select from available Makora models.
102
+
103
+ ## Model Resolution
104
+
105
+ Models are discovered from the Makora `/v1/models` API and stored in `models.json`. Custom definitions and overrides are layered via `patch.json` and `custom-models.json`.
106
+
107
+ | File | Purpose |
108
+ |---|---|
109
+ | `models.json` | Auto-generated from Makora API (model discovery). Regenerated by `node scripts/update-models.js` — do not edit manually |
110
+ | `patch.json` | Manual overrides (reasoning, compat, notes, limits, etc.) applied on top of `models.json` |
111
+ | `custom-models.json` | Models not available via the API (e.g. per-slug endpoint models) |
112
+
113
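+ Models are loaded in three steps: read `models.json`, apply `patch.json` overrides, then merge `custom-models.json`. A custom entry replaces a discovered entry with the same ID, and `patch.json` overrides also apply to custom entries.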
114
+
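The load order described above can be sketched as follows. This is a simplified illustration, not the package's implementation: `Model`, `Patch`, and `resolveModels` are hypothetical names, the model shape is trimmed to a few fields, and overrides are applied as a shallow merge.

```typescript
// Simplified sketch of the three-step model resolution.
// All names here are hypothetical and exist only for illustration.
interface Model {
  id: string;
  name: string;
  reasoning: boolean;
  contextWindow: number;
}

type Patch = Partial<Omit<Model, "id">>;

function resolveModels(
  discovered: Model[],            // contents of models.json
  patches: Record<string, Patch>, // contents of patch.json, keyed by model ID
  custom: Model[],                // contents of custom-models.json
): Model[] {
  const byId = new Map<string, Model>();

  // Step 1: start from the API-discovered models.
  for (const model of discovered) byId.set(model.id, model);

  // Step 2: apply manual overrides to models that were discovered.
  for (const [id, patch] of Object.entries(patches)) {
    const existing = byId.get(id);
    if (existing) byId.set(id, { ...existing, ...patch });
  }

  // Step 3: merge custom models. A custom entry replaces a discovered entry
  // with the same ID, and a matching patch is applied to it as well.
  for (const model of custom) {
    const patch = patches[model.id];
    byId.set(model.id, patch ? { ...model, ...patch } : model);
  }

  return [...byId.values()];
}

const resolved = resolveModels(
  [{ id: "zai-org/GLM-5.1-FP8", name: "GLM 5.1 FP8", reasoning: false, contextWindow: 202752 }],
  { "zai-org/GLM-5.1-FP8": { reasoning: true, contextWindow: 200000 } },
  [{ id: "my-org/my-model", name: "My Custom Model", reasoning: false, contextWindow: 131072 }],
);
console.log(resolved.map((m) => `${m.id} reasoning=${m.reasoning}`));
```

Keying everything by model ID in a `Map` keeps the precedence explicit: whichever step writes an ID last wins.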
115
+ ## Adding Custom Models
116
+
117
+ Do **not** edit `models.json` directly — it is auto-generated from the API. To customize:
118
+
119
+ - **Override an existing model**: Add entries to `patch.json` (reasoning, compat, notes, maxTokens, etc.)
120
+ - **Add new models not in the API**: Add entries to `custom-models.json`:
121
+
122
+ ```json
123
+ [
124
+ {
125
+ "id": "my-org/my-model",
126
+ "name": "My Custom Model",
127
+ "reasoning": false,
128
+ "input": ["text"],
129
+ "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
130
+ "contextWindow": 131072,
131
+ "maxTokens": 16384,
132
+ "baseUrl": "https://inference.makora.com/my-model-slug/v1"
133
+ }
134
+ ]
135
+ ```
136
+
137
+ ## API Notes
138
+
139
+ - Most models are served from the unified endpoint `https://inference.makora.com/v1/chat/completions`
140
+ - Models with a `baseUrl` override use their per-slug endpoint instead
141
+ - The API is OpenAI-compatible (chat completions format)
142
+ - All models are hosted on vLLM
143
+ - The `developer` role is not supported (messages with `role: "developer"` are silently dropped); `supportsDeveloperRole` is set to `false` for all models
144
+
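As a rough illustration of the notes above, the sketch below builds (but does not send) an OpenAI-style chat completions request. `buildChatRequest` and `ChatMessage` are hypothetical names, and the `Authorization: Bearer` header is an assumption based on the API being OpenAI-compatible; the source does not state the auth scheme.

```typescript
// Hypothetical helper: constructs a chat completions request without sending it.
interface ChatMessage {
  // The developer role is deliberately absent because it is not supported.
  role: "system" | "user" | "assistant";
  content: string;
}

interface BuiltRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

const UNIFIED_BASE_URL = "https://inference.makora.com/v1";

function buildChatRequest(
  model: string,
  messages: ChatMessage[],
  apiKey: string,
  baseUrl: string = UNIFIED_BASE_URL, // pass a per-slug base URL to override
): BuiltRequest {
  return {
    url: `${baseUrl}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`, // assumption: OpenAI-style bearer auth
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}

const request = buildChatRequest(
  "meta-llama/Llama-3.3-70B-Instruct",
  [{ role: "user", content: "Hello" }],
  "your-api-key",
);
console.log(request.url);
```

To actually send it, something like `await fetch(request.url, request.init)` should work in any runtime with a global `fetch`. A model with a `baseUrl` override would pass that URL as the fourth argument.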
145
+ ## vLLM Caveats
146
+
147
+ These issues are common to all vLLM-hosted providers and affect Makora models:
148
+
149
+ - **GLM 5.1 tool calling**: vLLM's streaming tool call handling is broken for GLM — the model outputs Zhipu's native `<tool_call>` XML format as raw text. The `message_end` hook parses this into `toolCall` blocks so pi can execute the tools. A `context` hook then strips `tool_calls` from assistant messages before follow-up requests, converting them back to `<tool_call>` text to avoid a ZAI/vLLM server crash (500: `'str object' has no attribute 'items'`) that occurs when any assistant message contains a `tool_calls` field. If upstream fixes both the streaming parser and the 500 crash, the `message_end` hook gracefully skips (existing valid `toolCall` blocks are preserved), and the `context` hook's text-stripping is harmless (GLM natively understands `<tool_call>` format in conversation history).
150
+
151
+ - **Kimi K2.6 + Qwen 3.6 tool calling**: vLLM's streaming tool call handling is broken or missing for these models. The `before_provider_request` hook sets `tool_choice: "none"` and `skip_special_tokens: false` so the model's tool call tokens pass through as plain text. The `message_end` hook then re-parses into `toolCall` blocks:
152
+
153
+ - **Kimi K2.6**: Uses `<|tool_call_begin|>...<|tool_call_end|>` tokens. Makora's vLLM is missing both `--enable-auto-tool-choice` and `--tool-call-parser` for this model.
154
+ - **Qwen 3.6**: Uses hermes-style `<function=...>` XML, sometimes with `█` delimiters. Same vLLM flag limitation as Kimi.
155
+
156
+ - **GLM 5.1 CoT leak**: On some vLLM builds, disabling reasoning may still leak chain-of-thought into `content` terminated by a ``` marker. See [vllm-project/vllm#31319](https://github.com/vllm-project/vllm/issues/31319).
157
+
158
+ - **DeepSeek V4 reasoning**: The official DeepSeek API uses `thinking: { type: "enabled" }` which Makora's vLLM silently ignores. The `before_provider_request` hook rewrites the payload to use vLLM-native params instead:
159
+ - **DS V4 Pro**: `chat_template_kwargs: { thinking: true }`. Returns `reasoning_content`.
160
+ - **DS V4 Flash**: `include_reasoning: true` + `chat_template_kwargs: { thinking: true }`. `include_reasoning` alone returns `reasoning: null` on this vLLM build — both params are required. Returns `reasoning`.
161
+ - **GLM 5.1 reasoning**: Returns `reasoning_content` (not `reasoning`). pi's OpenAI completions handler checks `reasoning_content` first, so this is handled correctly.
162
+ - **MiniMax M3 reasoning**: Uses `chat_template_kwargs.enable_thinking` to toggle thinking (not `chat_template_kwargs.thinking` like DeepSeek). The `before_provider_request` hook rewrites the DeepSeek API-style `thinking` param into vLLM-native `chat_template_kwargs: { enable_thinking: true }`. Returns `reasoning_content` field.
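The Kimi K2.6 caveat above describes re-parsing tool call tokens out of plain text. A minimal sketch of that idea follows. It is not the package's parser: `parseKimiToolCalls` and `ParsedToolCall` are hypothetical names, and the layout between the begin and end tokens (a `functions.<name>:<index>` header, a `<|tool_call_argument_begin|>` separator, then JSON arguments) is an assumption that the text above does not confirm.

```typescript
// Sketch only. ASSUMED layout between the begin/end tokens:
//   functions.<name>:<index><|tool_call_argument_begin|>{json arguments}
interface ParsedToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

const CALL_PATTERN = /<\|tool_call_begin\|>([\s\S]*?)<\|tool_call_end\|>/g;
const ARG_SEPARATOR = "<|tool_call_argument_begin|>"; // assumed separator token

function parseKimiToolCalls(text: string): ParsedToolCall[] {
  const calls: ParsedToolCall[] = [];
  for (const match of text.matchAll(CALL_PATTERN)) {
    const body = match[1];
    const sep = body.indexOf(ARG_SEPARATOR);
    if (sep === -1) continue; // not in the assumed shape, so skip it

    const header = body.slice(0, sep).trim();
    const rawArgs = body.slice(sep + ARG_SEPARATOR.length).trim();

    // Strip the assumed "functions." prefix and ":<index>" suffix.
    const name = header.replace(/^functions\./, "").replace(/:\d+$/, "");

    try {
      calls.push({ name, arguments: JSON.parse(rawArgs) as Record<string, unknown> });
    } catch {
      // Malformed JSON arguments: drop this call rather than guess at a repair.
    }
  }
  return calls;
}

const sample =
  'Let me check.<|tool_call_begin|>functions.read_file:0' +
  '<|tool_call_argument_begin|>{"path":"README.md"}<|tool_call_end|>';
const parsedCalls = parseKimiToolCalls(sample);
console.log(parsedCalls);
```

Skipping anything that does not match the assumed shape means text without tool call tokens yields an empty list, so a caller could leave such messages untouched.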
package/custom-models.json ADDED
@@ -0,0 +1,24 @@
1
+ [
2
+ {
3
+ "id": "amd/Llama-3.3-70B-Instruct-FP8-KV",
4
+ "name": "Llama 3.3 70B FP8",
5
+ "reasoning": false,
6
+ "input": [
7
+ "text"
8
+ ],
9
+ "cost": {
10
+ "input": 0,
11
+ "output": 0,
12
+ "cacheRead": 0,
13
+ "cacheWrite": 0
14
+ },
15
+ "contextWindow": 128000,
16
+ "maxTokens": 16384,
17
+ "baseUrl": "https://inference.makora.com/llama3-3-70b-instruct-fp8/v1",
18
+ "compat": {
19
+ "supportsDeveloperRole": false,
20
+ "supportsStore": false,
21
+ "maxTokensField": "max_completion_tokens"
22
+ }
23
+ }
24
+ ]
package/index.ts ADDED
@@ -0,0 +1,287 @@
1
+ /**
2
+ * Makora Provider Extension
3
+ *
4
+ * Registers Makora (inference.makora.com) as a custom provider using the
5
+ * OpenAI completions API.
6
+ *
7
+ * Makora is an inference optimization platform serving open-weight models via
8
+ * a unified OpenAI-compatible API at https://inference.makora.com/v1. Each
9
+ * model is hosted on vLLM and speaks the standard OpenAI chat completions
10
+ * protocol. Most models use the shared provider baseUrl; models not yet
11
+ * on the unified endpoint retain a per-model `baseUrl` override.
12
+ *
13
+ * Model resolution strategy: static models.json, patched by patch.json, then merged with custom-models.json
14
+ *
15
+ * Reasoning notes:
16
+ * - DeepSeek V4 Pro: reasoning via chat_template_kwargs.thinking on vLLM.
17
+ * pi sends thinking: { type } via the "deepseek" thinkingFormat, but vLLM
18
+ * ignores that — the before_provider_request hook rewrites the payload to
19
+ * use chat_template_kwargs: { thinking: true } instead.
20
+ * Returns reasoning_content field.
21
+ * - DeepSeek V4 Flash: reasoning via include_reasoning +
22
+ * chat_template_kwargs.thinking on vLLM.
23
+ * The before_provider_request hook rewrites the payload to replace
24
+ * thinking: { type } with include_reasoning: true +
25
+ * chat_template_kwargs: { thinking: true }.
26
+ * include_reasoning alone returns reasoning: null on this vLLM build.
27
+ * Returns reasoning field.
28
+ * - GLM 5.1 FP8: reasoning via chat_template_kwargs.enable_thinking.
29
+ * NOTE: vLLM may leak chain-of-thought into content instead of the
30
+ * reasoning field on some builds. See
31
+ * https://github.com/vllm-project/vllm/issues/31319
32
+ * Also: vLLM's streaming parser omits delta.tool_calls when the model
33
+ * calls tools, finishing with finish_reason: "tool_calls" but an empty
34
+ * delta. Setting zaiToolStream: true sends tool_stream: true in the
35
+ * request, which forces vLLM to use the explicit tool streaming path
36
+ * that correctly emits tool call chunks.
37
+ * - GPT-OSS 120B: reasoning always on; returns `reasoning` field.
38
+ * - Kimi K2.6 NVFP4 / Kimi K2.7 Code: reasoning on by default;
39
+ * returns `reasoning` field. Can be toggled via enable_thinking.
40
+ * - Qwen 3.6 models: reasoning via chat_template_kwargs.enable_thinking;
41
+ * returns `reasoning` field.
42
+ * - MiniMax M3 MXFP8: reasoning via chat_template_kwargs.enable_thinking;
43
+ * returns reasoning_content field.
44
+ * - Llama 3.3 70B: not a reasoning model.
45
+ *
46
+ * Developer role is NOT supported by any of the chat templates on Makora's
47
+ * vLLM deployment (prompts with role: "developer" are silently dropped).
48
+ * supportsDeveloperRole is set to false for all models.
49
+ *
50
+ * Usage:
51
+ * # Option 1: Store in auth.json (recommended)
52
+ * # Add to ~/.pi/agent/auth.json:
53
+ * # "makora": { "type": "api_key", "key": "your-api-key" }
54
+ *
55
+ * # Option 2: Set as environment variable
56
+ * export MAKORA_OPTIMIZE_TOKEN=your-api-key
57
+ *
58
+ * # Run pi with the extension
59
+ * pi -e /path/to/pi-makora-provider
60
+ *
61
+ * Then use /model to select from available models.
62
+ */
63
+
64
+ import type { ExtensionAPI } from "@earendil-works/pi-coding-agent";
65
+ import modelsData from "./models.json" with { type: "json" };
66
+ import customModelsData from "./custom-models.json" with { type: "json" };
67
+ import patchData from "./patch.json" with { type: "json" };
68
+
69
+ // Types
70
+
71
+ interface JsonModel {
72
+ id: string;
73
+ name: string;
74
+ reasoning: boolean;
75
+ input: string[];
76
+ cost: {
77
+ input: number;
78
+ output: number;
79
+ cacheRead: number;
80
+ cacheWrite: number;
81
+ };
82
+ contextWindow: number;
83
+ maxTokens: number;
84
+ baseUrl?: string;
85
+ notes?: string;
86
+ thinkingLevelMap?: Record<string, string | null>;
87
+ headers?: Record<string, string>;
88
+ vision?: {
89
+ maxImagesPerRequest?: number;
90
+ };
91
+ compat?: {
92
+ supportsDeveloperRole?: boolean;
93
+ supportsStore?: boolean;
94
+ maxTokensField?: "max_completion_tokens" | "max_tokens";
95
+ thinkingFormat?:
96
+ | "openai"
97
+ | "openrouter"
98
+ | "deepseek"
99
+ | "together"
100
+ | "zai"
101
+ | "qwen"
102
+ | "qwen-chat-template";
103
+ supportsReasoningEffort?: boolean;
104
+ requiresReasoningContentOnAssistantMessages?: boolean;
105
+ requiresToolResultName?: boolean;
106
+ requiresAssistantAfterToolResult?: boolean;
107
+ cacheControlFormat?: "anthropic";
108
+ };
109
+ }
110
+
111
+ interface PatchEntry {
112
+ name?: string;
113
+ reasoning?: boolean;
114
+ input?: string[];
115
+ cost?: {
116
+ input?: number;
117
+ output?: number;
118
+ cacheRead?: number;
119
+ cacheWrite?: number;
120
+ };
121
+ contextWindow?: number;
122
+ maxTokens?: number;
123
+ baseUrl?: string;
124
+ notes?: string;
125
+ thinkingLevelMap?: Record<string, string | null>;
126
+ headers?: Record<string, string>;
127
+ compat?: Record<string, unknown>;
128
+ }
129
+
130
+ type PatchMap = Record<string, PatchEntry>;
131
+
132
+ // Patch Application
133
+
134
+ function applyPatch(model: JsonModel, patch: PatchEntry): JsonModel {
135
+ const result = { ...model };
136
+
137
+ if (patch.name !== undefined) result.name = patch.name;
138
+ if (patch.reasoning !== undefined) result.reasoning = patch.reasoning;
139
+ if (patch.input !== undefined) result.input = patch.input;
140
+ if (patch.contextWindow !== undefined) result.contextWindow = patch.contextWindow;
141
+ if (patch.maxTokens !== undefined) result.maxTokens = patch.maxTokens;
142
+ if (patch.baseUrl !== undefined) result.baseUrl = patch.baseUrl;
143
+ if (patch.notes !== undefined) result.notes = patch.notes;
144
+ if (patch.thinkingLevelMap !== undefined) result.thinkingLevelMap = { ...patch.thinkingLevelMap };
145
+ if (patch.headers !== undefined) result.headers = { ...patch.headers };
146
+
147
+ if (patch.cost) {
148
+ result.cost = {
149
+ input: patch.cost.input ?? result.cost.input,
150
+ output: patch.cost.output ?? result.cost.output,
151
+ cacheRead: patch.cost.cacheRead ?? result.cost.cacheRead,
152
+ cacheWrite: patch.cost.cacheWrite ?? result.cost.cacheWrite,
153
+ };
154
+ }
155
+ if (patch.compat || result.compat) { // always copy compat so the delete below never mutates the input model
156
+ result.compat = { ...(result.compat || {}), ...patch.compat };
157
+ }
158
+
159
+ if (!result.reasoning && result.compat?.thinkingFormat) {
160
+ delete result.compat.thinkingFormat;
161
+ }
162
+ if (result.compat && Object.keys(result.compat).length === 0) {
163
+ delete result.compat;
164
+ }
165
+
166
+ return result;
167
+ }
168
+
169
+ /** Merge static models with patch overrides and any user-defined custom models */
170
+ function buildModels(
171
+ base: JsonModel[],
172
+ custom: JsonModel[],
173
+ patch: PatchMap
174
+ ): JsonModel[] {
175
+ const modelMap = new Map<string, JsonModel>();
176
+
177
+ for (const model of base) {
178
+ modelMap.set(model.id, model);
179
+ }
180
+
181
+ for (const [id, patchEntry] of Object.entries(patch)) {
182
+ const existing = modelMap.get(id);
183
+ if (existing) {
184
+ modelMap.set(id, applyPatch(existing, patchEntry));
185
+ }
186
+ }
187
+
188
+ for (const model of custom) {
189
+ const existing = modelMap.get(model.id);
190
+ const patchEntry = patch[model.id];
191
+ if (existing && patchEntry) {
192
+ modelMap.set(model.id, applyPatch(model, patchEntry));
193
+ } else if (existing) {
194
+ modelMap.set(model.id, model);
195
+ } else if (patchEntry) {
196
+ modelMap.set(model.id, applyPatch(model, patchEntry));
197
+ } else {
198
+ modelMap.set(model.id, model);
199
+ }
200
+ }
201
+
202
+ return Array.from(modelMap.values());
203
+ }
204
+
205
+ // Extension Entry Point
206
+
207
+ const PROVIDER_ID = "makora";
208
+ const BASE_URL = "https://inference.makora.com/v1";
209
+
210
+ const DS_PRO_ID = "deepseek-ai/DeepSeek-V4-Pro";
211
+ const DS_FLASH_ID = "deepseek-ai/DeepSeek-V4-Flash";
212
+ const MINIMAX_M3_ID = "MiniMaxAI/MiniMax-M3-MXFP8";
213
+
214
+ const DS_VLLM_MODELS = new Set([DS_PRO_ID, DS_FLASH_ID]);
215
+ const ENABLE_THINKING_VLLM_MODELS = new Set([MINIMAX_M3_ID]);
216
+
217
+ /**
218
+ * Intercept the request payload for models that need vLLM-specific thinking
219
+ * param rewrites.
220
+ *
221
+ * pi's "deepseek" thinkingFormat sends `thinking: { type: "enabled" }` which
222
+ * is the official DeepSeek API format — but Makora's vLLM deployment ignores
223
+ * it. vLLM requires different params depending on the model:
224
+ * - DS V4 Pro: `chat_template_kwargs: { thinking: true }` + `reasoning_effort`
225
+ * - DS V4 Flash: `include_reasoning: true` + `chat_template_kwargs: { thinking: true }`
226
+ * + `reasoning_effort`. `include_reasoning` alone returns `reasoning: null`
227
+ * on this vLLM build — both params are required.
228
+ * - MiniMax M3: `chat_template_kwargs: { enable_thinking: true }` +
229
+ * `reasoning_effort`. Returns `reasoning_content` field.
230
+ *
231
+ * This hook rewrites the payload accordingly.
232
+ */
233
+ function rewriteVllmPayload(payload: Record<string, unknown>): Record<string, unknown> {
234
+ const model = payload.model as string | undefined;
235
+ if (!model) return payload;
236
+
237
+ const p = { ...payload };
238
+
239
+ if (DS_VLLM_MODELS.has(model)) {
240
+ // Remove the DeepSeek API-style `thinking` param that vLLM ignores
241
+ delete p.thinking;
242
+
243
+ if (model === DS_PRO_ID) {
244
+ // DS Pro: chat_template_kwargs.thinking + reasoning_effort
245
+ const ctq = (p.chat_template_kwargs as Record<string, unknown>) ?? {};
246
+ p.chat_template_kwargs = { ...ctq, thinking: true };
247
+ } else if (model === DS_FLASH_ID) {
248
+ // DS Flash: include_reasoning + chat_template_kwargs.thinking + reasoning_effort
249
+ // vLLM requires *both* include_reasoning and chat_template_kwargs.thinking:
250
+ // include_reasoning alone returns reasoning: null.
251
+ p.include_reasoning = true;
252
+ const ctq = (p.chat_template_kwargs as Record<string, unknown>) ?? {};
253
+ p.chat_template_kwargs = { ...ctq, thinking: true };
254
+ }
255
+ } else if (ENABLE_THINKING_VLLM_MODELS.has(model)) {
256
+ // Models using chat_template_kwargs.enable_thinking (e.g. MiniMax M3)
257
+ delete p.thinking;
258
+ const ctq = (p.chat_template_kwargs as Record<string, unknown>) ?? {};
259
+ p.chat_template_kwargs = { ...ctq, enable_thinking: true };
260
+ }
261
+
262
+ return p;
263
+ }
264
+
265
+ export default function (pi: ExtensionAPI) {
266
+ const embeddedModels = modelsData as JsonModel[];
267
+ const customModels = customModelsData as JsonModel[];
268
+ const patches = patchData as PatchMap;
269
+
270
+ const models = buildModels(embeddedModels, customModels, patches);
271
+
272
+ // apiKey resolution order: auth.json ("makora" key) → MAKORA_OPTIMIZE_TOKEN env var
273
+ pi.registerProvider(PROVIDER_ID, {
274
+ name: "Makora",
275
+ baseUrl: BASE_URL,
276
+ apiKey: "$MAKORA_OPTIMIZE_TOKEN",
277
+ api: "openai-completions",
278
+ models,
279
+ });
280
+
281
+ pi.on("before_provider_request", (event) => {
282
+ const payload = event.payload as Record<string, unknown> | undefined;
283
+ if (!payload || typeof payload.model !== "string") return;
284
+ return rewriteVllmPayload(payload);
285
+ });
286
+ }
287
+
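The payload rewrite in `index.ts` above can also be expressed as a lookup table, which may be easier to extend when another model needs its own params. The sketch below is an alternative formulation, not the package's code: `VLLM_THINKING_PARAMS` and `rewriteThinking` are hypothetical names, and the per-model params are taken from the comments in `index.ts`.

```typescript
type Payload = Record<string, unknown>;

// Per-model vLLM params, as described in the index.ts comments.
const VLLM_THINKING_PARAMS: Record<string, { kwargs: Payload; extra?: Payload }> = {
  "deepseek-ai/DeepSeek-V4-Pro": { kwargs: { thinking: true } },
  "deepseek-ai/DeepSeek-V4-Flash": {
    kwargs: { thinking: true },
    extra: { include_reasoning: true },
  },
  "MiniMaxAI/MiniMax-M3-MXFP8": { kwargs: { enable_thinking: true } },
};

function rewriteThinking(payload: Payload): Payload {
  const rule =
    typeof payload.model === "string" ? VLLM_THINKING_PARAMS[payload.model] : undefined;
  if (!rule) return payload; // other models pass through unchanged

  // Copy first so the caller's payload is never mutated.
  const rest: Payload = { ...payload };
  delete rest.thinking; // drop the DeepSeek API-style param

  // Preserve any chat_template_kwargs that are already set.
  const existing = (rest.chat_template_kwargs as Payload | undefined) ?? {};
  return { ...rest, ...rule.extra, chat_template_kwargs: { ...existing, ...rule.kwargs } };
}

const rewritten = rewriteThinking({
  model: "deepseek-ai/DeepSeek-V4-Flash",
  thinking: { type: "enabled" },
  messages: [],
});
console.log(JSON.stringify(rewritten));
```

Like the original hook, this sketch enables thinking whenever the model matches, regardless of the incoming `thinking` value.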
package/models.json ADDED
@@ -0,0 +1,212 @@
1
+ [
2
+ {
3
+ "id": "deepseek-ai/DeepSeek-V4-Flash",
4
+ "name": "DeepSeek V4 Flash",
5
+ "reasoning": false,
6
+ "input": [
7
+ "text"
8
+ ],
9
+ "cost": {
10
+ "input": 0,
11
+ "output": 0,
12
+ "cacheRead": 0,
13
+ "cacheWrite": 0
14
+ },
15
+ "contextWindow": 1048576,
16
+ "maxTokens": 0,
17
+ "compat": {
18
+ "supportsDeveloperRole": false,
19
+ "supportsStore": false,
20
+ "maxTokensField": "max_completion_tokens"
21
+ }
22
+ },
23
+ {
24
+ "id": "deepseek-ai/DeepSeek-V4-Pro",
25
+ "name": "DeepSeek V4 Pro",
26
+ "reasoning": false,
27
+ "input": [
28
+ "text"
29
+ ],
30
+ "cost": {
31
+ "input": 0,
32
+ "output": 0,
33
+ "cacheRead": 0,
34
+ "cacheWrite": 0
35
+ },
36
+ "contextWindow": 1048576,
37
+ "maxTokens": 0,
38
+ "compat": {
39
+ "supportsDeveloperRole": false,
40
+ "supportsStore": false,
41
+ "maxTokensField": "max_completion_tokens"
42
+ }
43
+ },
44
+ {
45
+ "id": "meta-llama/Llama-3.3-70B-Instruct",
46
+ "name": "Llama 3.3 70B Instruct",
47
+ "reasoning": false,
48
+ "input": [
49
+ "text"
50
+ ],
51
+ "cost": {
52
+ "input": 0,
53
+ "output": 0,
54
+ "cacheRead": 0,
55
+ "cacheWrite": 0
56
+ },
57
+ "contextWindow": 131072,
58
+ "maxTokens": 0,
59
+ "compat": {
60
+ "supportsDeveloperRole": false,
61
+ "supportsStore": false,
62
+ "maxTokensField": "max_completion_tokens"
63
+ }
64
+ },
65
+ {
66
+ "id": "MiniMaxAI/MiniMax-M3-MXFP8",
67
+ "name": "MiniMax M3 MXFP8",
68
+ "reasoning": false,
69
+ "input": [
70
+ "text"
71
+ ],
72
+ "cost": {
73
+ "input": 0,
74
+ "output": 0,
75
+ "cacheRead": 0,
76
+ "cacheWrite": 0
77
+ },
78
+ "contextWindow": 1048576,
79
+ "maxTokens": 0,
80
+ "compat": {
81
+ "supportsDeveloperRole": false,
82
+ "supportsStore": false,
83
+ "maxTokensField": "max_completion_tokens"
84
+ }
85
+ },
86
+ {
87
+ "id": "moonshotai/Kimi-K2.7-Code",
88
+ "name": "Kimi K2.7 Code",
89
+ "reasoning": false,
90
+ "input": [
91
+ "text"
92
+ ],
93
+ "cost": {
94
+ "input": 0,
95
+ "output": 0,
96
+ "cacheRead": 0,
97
+ "cacheWrite": 0
98
+ },
99
+ "contextWindow": 262144,
100
+ "maxTokens": 0,
101
+ "compat": {
102
+ "supportsDeveloperRole": false,
103
+ "supportsStore": false,
104
+ "maxTokensField": "max_completion_tokens"
105
+ }
106
+ },
107
+ {
108
+ "id": "nvidia/Kimi-K2.6-NVFP4",
109
+ "name": "Kimi K2.6 NVFP4",
110
+ "reasoning": false,
111
+ "input": [
112
+ "text"
113
+ ],
114
+ "cost": {
115
+ "input": 0,
116
+ "output": 0,
117
+ "cacheRead": 0,
118
+ "cacheWrite": 0
119
+ },
120
+ "contextWindow": 262144,
121
+ "maxTokens": 0,
122
+ "compat": {
123
+ "supportsDeveloperRole": false,
124
+ "supportsStore": false,
125
+ "maxTokensField": "max_completion_tokens"
126
+ }
127
+ },
128
+ {
129
+ "id": "openai/gpt-oss-120b",
130
+ "name": "GPT-OSS 120B",
131
+ "reasoning": false,
132
+ "input": [
133
+ "text"
134
+ ],
135
+ "cost": {
136
+ "input": 0,
137
+ "output": 0,
138
+ "cacheRead": 0,
139
+ "cacheWrite": 0
140
+ },
141
+ "contextWindow": 131072,
142
+ "maxTokens": 0,
143
+ "compat": {
144
+ "supportsDeveloperRole": false,
145
+ "supportsStore": false,
146
+ "maxTokensField": "max_completion_tokens"
147
+ }
148
+ },
149
+ {
150
+ "id": "unsloth/Qwen3.6-27B-NVFP4",
151
+ "name": "Qwen 3.6 27B NVFP4",
152
+ "reasoning": false,
153
+ "input": [
154
+ "text"
155
+ ],
156
+ "cost": {
157
+ "input": 0,
158
+ "output": 0,
159
+ "cacheRead": 0,
160
+ "cacheWrite": 0
161
+ },
162
+ "contextWindow": 262144,
163
+ "maxTokens": 0,
164
+ "compat": {
165
+ "supportsDeveloperRole": false,
166
+ "supportsStore": false,
167
+ "maxTokensField": "max_completion_tokens"
168
+ }
169
+ },
170
+ {
171
+ "id": "unsloth/Qwen3.6-35B-A3B-NVFP4",
172
+ "name": "Qwen 3.6 35B A3B NVFP4",
173
+ "reasoning": false,
174
+ "input": [
175
+ "text"
176
+ ],
177
+ "cost": {
178
+ "input": 0,
179
+ "output": 0,
180
+ "cacheRead": 0,
181
+ "cacheWrite": 0
182
+ },
183
+ "contextWindow": 262144,
184
+ "maxTokens": 0,
185
+ "compat": {
186
+ "supportsDeveloperRole": false,
187
+ "supportsStore": false,
188
+ "maxTokensField": "max_completion_tokens"
189
+ }
190
+ },
191
+ {
192
+ "id": "zai-org/GLM-5.1-FP8",
193
+ "name": "GLM 5.1 FP8",
194
+ "reasoning": false,
195
+ "input": [
196
+ "text"
197
+ ],
198
+ "cost": {
199
+ "input": 0,
200
+ "output": 0,
201
+ "cacheRead": 0,
202
+ "cacheWrite": 0
203
+ },
204
+ "contextWindow": 202752,
205
+ "maxTokens": 0,
206
+ "compat": {
207
+ "supportsDeveloperRole": false,
208
+ "supportsStore": false,
209
+ "maxTokensField": "max_completion_tokens"
210
+ }
211
+ }
212
+ ]
package/package.json ADDED
@@ -0,0 +1,41 @@
1
+ {
2
+ "name": "pi-makora-provider",
3
+ "version": "1.0.0",
4
+ "description": "Makora provider extension for pi - Access DeepSeek V4, GLM 5.1, Kimi K2.6, Llama 3.3, Qwen 3.6, and more through the Makora inference API",
5
+ "type": "module",
6
+ "main": "index.ts",
7
+ "scripts": {
8
+ "clean": "echo 'nothing to clean'",
9
+ "build": "echo 'nothing to build'",
10
+ "check": "echo 'nothing to check'",
11
+ "update-models": "node scripts/update-models.js"
12
+ },
13
+ "keywords": [
14
+ "pi",
15
+ "extension",
16
+ "provider",
17
+ "makora",
18
+ "ai",
19
+ "llm",
20
+ "deepseek",
21
+ "glm",
22
+ "kimi",
23
+ "llama",
24
+ "qwen"
25
+ ],
26
+ "author": "monotykamary",
27
+ "license": "MIT",
28
+ "files": [
29
+ "index.ts",
30
+ "models.json",
31
+ "custom-models.json",
32
+ "patch.json",
33
+ "README.md",
34
+ "LICENSE"
35
+ ],
36
+ "pi": {
37
+ "extensions": [
38
+ "./index.ts"
39
+ ]
40
+ }
41
+ }
package/patch.json ADDED
@@ -0,0 +1,135 @@
1
+ {
2
+ "deepseek-ai/DeepSeek-V4-Flash": {
3
+ "reasoning": true,
4
+ "notes": "`include_reasoning` + `chat_template_kwargs.thinking` via `before_provider_request` payload rewrite; returns `reasoning` field",
5
+ "thinkingLevelMap": {
6
+ "minimal": null,
7
+ "low": null,
8
+ "medium": null,
9
+ "high": "high",
10
+ "xhigh": "max"
11
+ },
12
+ "compat": {
13
+ "thinkingFormat": "deepseek",
14
+ "supportsReasoningEffort": true
15
+ }
16
+ },
17
+ "deepseek-ai/DeepSeek-V4-Pro": {
18
+ "reasoning": true,
19
+ "notes": "`chat_template_kwargs.thinking` via `before_provider_request` payload rewrite; returns `reasoning_content` field",
20
+ "thinkingLevelMap": {
21
+ "minimal": null,
22
+ "low": null,
23
+ "medium": null,
24
+ "high": "high",
25
+ "xhigh": "max"
26
+ },
27
+ "compat": {
28
+ "thinkingFormat": "deepseek",
29
+ "supportsReasoningEffort": true,
30
+ "requiresReasoningContentOnAssistantMessages": true
31
+ }
32
+ },
33
+ "nvidia/Kimi-K2.6-NVFP4": {
34
+ "reasoning": true,
35
+ "input": [
36
+ "text",
37
+ "image"
38
+ ],
39
+ "notes": "Reasoning on by default; client-side tool call parsing (vLLM streaming parser bypass)",
40
+ "thinkingLevelMap": {
41
+ "minimal": "low",
42
+ "xhigh": "high"
43
+ },
44
+ "compat": {
45
+ "thinkingFormat": "qwen-chat-template",
46
+ "supportsReasoningEffort": true
47
+ }
48
+ },
49
+ "openai/gpt-oss-120b": {
50
+ "reasoning": true,
51
+ "notes": "Reasoning always on",
52
+ "thinkingLevelMap": {
53
+ "minimal": "low",
54
+ "xhigh": "high"
55
+ },
56
+ "compat": {
57
+ "thinkingFormat": "qwen-chat-template",
58
+ "supportsReasoningEffort": true
59
+ }
60
+ },
61
+ "unsloth/Qwen3.6-27B-NVFP4": {
62
+ "reasoning": true,
63
+ "notes": "`enable_thinking` via `qwen-chat-template`; client-side tool call parsing (vLLM streaming parser bypass)",
64
+ "thinkingLevelMap": {
65
+ "minimal": "low",
66
+ "xhigh": "high"
67
+ },
68
+ "compat": {
69
+ "thinkingFormat": "qwen-chat-template",
70
+ "supportsReasoningEffort": true
71
+ }
72
+ },
73
+ "unsloth/Qwen3.6-35B-A3B-NVFP4": {
74
+ "reasoning": true,
75
+ "notes": "`enable_thinking` via `qwen-chat-template`; client-side tool call parsing (vLLM streaming parser bypass)",
76
+ "thinkingLevelMap": {
77
+ "minimal": "low",
78
+ "xhigh": "high"
79
+ },
80
+ "compat": {
81
+ "thinkingFormat": "qwen-chat-template",
82
+ "supportsReasoningEffort": true
83
+ }
84
+ },
85
+ "MiniMaxAI/MiniMax-M3-MXFP8": {
86
+ "reasoning": true,
87
+ "input": [
88
+ "text",
89
+ "image"
90
+ ],
91
+ "notes": "Reasoning via `chat_template_kwargs.enable_thinking`; returns `reasoning_content` field",
92
+ "thinkingLevelMap": {
93
+ "minimal": null,
94
+ "low": null,
95
+ "medium": null,
96
+ "high": "high",
97
+ "xhigh": "max"
98
+ },
99
+ "compat": {
100
+ "thinkingFormat": "deepseek",
101
+ "supportsReasoningEffort": true,
102
+ "requiresReasoningContentOnAssistantMessages": true
103
+ }
104
+ },
105
+ "moonshotai/Kimi-K2.7-Code": {
106
+ "reasoning": true,
107
+ "input": [
108
+ "text",
109
+ "image"
110
+ ],
111
+ "notes": "Reasoning on by default; client-side tool call parsing (vLLM streaming parser bypass)",
112
+ "thinkingLevelMap": {
113
+ "minimal": "low",
114
+ "xhigh": "high"
115
+ },
116
+ "compat": {
117
+ "thinkingFormat": "qwen-chat-template",
118
+ "supportsReasoningEffort": true
119
+ }
120
+ },
121
+ "zai-org/GLM-5.1-FP8": {
122
+ "contextWindow": 200000,
123
+ "reasoning": true,
124
+ "notes": "`enable_thinking` via `qwen-chat-template`; returns `reasoning_content` field; client-side tool call parsing (vLLM streaming parser bypass)",
125
+ "thinkingLevelMap": {
126
+ "minimal": "low",
127
+ "xhigh": "high"
128
+ },
129
+ "compat": {
130
+ "thinkingFormat": "qwen-chat-template",
131
+ "supportsReasoningEffort": true,
132
+ "zaiToolStream": true
133
+ }
134
+ }
135
+ }