pi-lilac-provider 1.1.0

package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,196 @@
1
+ <div align="center">
2
+
3
+ # 💜 pi-lilac-provider
4
+
5
+ **Kimi K2.6, GLM 5.1, Gemma 4 & more on idle GPUs via [Lilac](https://getlilac.com/)**
6
+
7
+ _A [pi](https://github.com/earendil-works/pi-coding-agent) provider extension for cost-efficient GPU inference._
8
+
9
+ [![pi extension](https://img.shields.io/badge/pi-extension-blueviolet)](https://github.com/earendil-works/pi-coding-agent)
10
+ [![license](https://img.shields.io/badge/license-MIT-blue)](./LICENSE)
11
+
12
+ </div>
13
+
14
+ ---
15
+
16
+ Access Kimi K2.6, GLM 5.1, MiniMax M2.7, and Gemma 4 through Lilac's OpenAI-compatible API, served from idle GPUs.
17
+
18
+ ## Features
19
+
20
+ - **Multiple AI Models** — Kimi K2.6, GLM 5.1, MiniMax M2.7, Gemma 4, and more (see Available Models)
21
+ - **OpenAI-Compatible API** — Just change the base URL and API key
22
+ - **Cost Tracking** — Per-model pricing with cache read discounts
23
+ - **Reasoning Models** — Chain-of-thought via `chat_template_kwargs` (all models)
24
+ - **Vision Support** — Image input on Kimi K2.6 and Gemma 4
25
+ - **Context Caching** — Cache read pricing on Kimi K2.6 and GLM 5.1
26
+ - **Idle GPU Scheduling** — Lilac leverages idle GPU capacity for cost-efficient inference
27
+
28
+ ## Installation
29
+
30
+ ### Option 1: Using `pi install` (Recommended)
31
+
32
+ Install directly from GitHub:
33
+
34
+ ```bash
35
+ pi install https://github.com/monotykamary/pi-lilac-provider
36
+ ```
37
+
38
+ Then set your API key and run pi:
39
+ ```bash
40
+ # Recommended: add to auth.json
41
+ # See Authentication section below
42
+
43
+ # Or set as environment variable
44
+ export LILAC_API_KEY=your-api-key-here
45
+
46
+ pi
47
+ ```
48
+
49
+ ### Option 2: Manual Clone
50
+
51
+ 1. Clone this repository:
52
+ ```bash
53
+ git clone https://github.com/monotykamary/pi-lilac-provider.git
54
+ cd pi-lilac-provider
55
+ ```
56
+
57
+ 2. Set your Lilac API key:
58
+ ```bash
59
+ # Recommended: add to auth.json
60
+ # See Authentication section below
61
+
62
+ # Or set as environment variable
63
+ export LILAC_API_KEY=your-api-key-here
64
+ ```
65
+
66
+ 3. Run pi with the extension:
67
+ ```bash
68
+ pi -e /path/to/pi-lilac-provider
69
+ ```
70
+
71
+ ## Available Models
72
+
73
+ | Model | Context | Vision | Reasoning | Input $/M | Cache Read $/M | Output $/M |
74
+ |-------|---------|--------|-----------|-----------|-----------------|------------|
75
+ | Gemma 4 | 262K | ✅ | ✅ | $0.11 | — | $0.35 |
76
+ | GLM 5.1 | 203K | ❌ | ✅ | $0.90 | $0.27 | $3.00 |
77
+ | GLM 5.2 | 524K | ❌ | ✅ | $0.90 | $0.27 | $3.00 |
78
+ | Kimi K2.6 | 262K | ✅ | ✅ | $0.70 | $0.20 | $3.50 |
79
+ | MiniMax M2.7 | 205K | ❌ | ✅ | $0.30 | $0.06 | $1.20 |
80
+ | MiniMax M3 | 1.0M | ❌ | ✅ | $0.28 | $0.05 | $1.10 |
81
+
82
+ *Costs are per million tokens. Prices subject to change — check [getlilac.com](https://getlilac.com/) for current pricing.*
83
+
84
+ **Notes:**
85
+ - **Gemma 4** has reasoning **off by default** — pi enables it when you set a thinking level (Shift+Tab)
86
+ - **Kimi K2.6** and **GLM 5.1** have reasoning **on by default**
87
+ - **Cache read** pricing applies to repeated input tokens served from cache on supported models
88
+ - **Gemma 4** does not support cache read pricing
89
+
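As a rough illustration of how the per-million prices in the table translate into a per-request cost, here is a minimal TypeScript sketch. The `ModelPricing` and `Usage` types and the `estimateCostUsd` function are hypothetical names, not part of the extension. The sketch assumes that the reported input token count includes cached tokens, and that models without a cache read price bill cached tokens at the normal input rate.

```typescript
// Prices in USD per million tokens, copied from the model table.
// All names here are illustrative; they are not the extension's API.
interface ModelPricing {
  input: number;
  output: number;
  cacheRead?: number; // undefined when the model has no cache read pricing
}

const KIMI_K2_6: ModelPricing = { input: 0.7, cacheRead: 0.2, output: 3.5 };
const GEMMA_4: ModelPricing = { input: 0.11, output: 0.35 };

interface Usage {
  inputTokens: number; // assumption: total prompt tokens, cached ones included
  cachedTokens: number; // prompt tokens served from cache
  outputTokens: number;
}

function estimateCostUsd(pricing: ModelPricing, usage: Usage): number {
  // Assumption: without a cache read price, cached tokens cost the input rate.
  const cacheRate = pricing.cacheRead ?? pricing.input;
  const freshTokens = usage.inputTokens - usage.cachedTokens;
  const total =
    freshTokens * pricing.input +
    usage.cachedTokens * cacheRate +
    usage.outputTokens * pricing.output;
  return total / 1_000_000;
}
```

Under these assumptions, a Kimi K2.6 request with 1,000,000 input tokens (half of them cached) and 100,000 output tokens comes to roughly $0.80.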
90
+ ## Usage
91
+
92
+ After loading the extension, use the `/model` command in pi to select your preferred model:
93
+
94
+ ```
95
+ /model lilac moonshotai/kimi-k2.6
96
+ ```
97
+
98
+ Or start pi directly with a Lilac model:
99
+
100
+ ```bash
101
+ pi --provider lilac --model moonshotai/kimi-k2.6
102
+ ```
103
+
104
+ ### Thinking Mode
105
+
106
+ All Lilac models support chain-of-thought reasoning via `chat_template_kwargs`. Pi uses the `qwen-chat-template` thinking format to send both `thinking` and `enable_thinking` keys, which works across all model families:
107
+
108
+ - **Kimi K2.6**: Honors `thinking` key (Moonshot template)
109
+ - **GLM 5.1**: Honors `enable_thinking` key (Z.ai template)
110
+ - **Gemma 4**: Honors `enable_thinking` key (Google template)
111
+
112
+ In pi, reasoning models automatically use the appropriate thinking format. Use Shift+Tab to control thinking level.
113
+
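To make the two-key approach concrete, the following TypeScript sketch builds a request body that sets both `thinking` and `enable_thinking` under `chat_template_kwargs`. The `ChatRequest` type and `withThinking` helper are hypothetical and only model the fields discussed here; pi's real request builder is not shown in this README.

```typescript
// Illustrative subset of a chat completion request body.
interface ChatRequest {
  model: string;
  messages: { role: string; content: string }[];
  chat_template_kwargs?: { thinking: boolean; enable_thinking: boolean };
}

// Hypothetical helper: sets both keys so that each model family's chat
// template finds the one it honors (thinking or enable_thinking).
function withThinking(req: ChatRequest, enabled: boolean): ChatRequest {
  return {
    ...req,
    chat_template_kwargs: { thinking: enabled, enable_thinking: enabled },
  };
}

const request = withThinking(
  { model: "moonshotai/kimi-k2.6", messages: [{ role: "user", content: "Hello" }] },
  true,
);
```

Sending both keys means the same request shape can be reused for every model without branching on the model family.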
114
+ ### Vision
115
+
116
+ Kimi K2.6 and Gemma 4 support image inputs. Pass images in messages and pi will handle the formatting automatically.
117
+
118
+ Gemma 4 also supports video by accepting a sequence of frames as images.
119
+
120
+ ## Authentication
121
+
122
+ The Lilac API key can be configured in multiple ways (resolved in this order):
123
+
124
+ 1. **`auth.json`** (recommended) — Add to `~/.pi/agent/auth.json`:
125
+ ```json
126
+ { "lilac": { "type": "api_key", "key": "your-api-key" } }
127
+ ```
128
+ The `key` field supports literal values, env var names, and shell commands (prefix with `!`). See [pi's auth file docs](https://github.com/badlogic/pi-mono) for details.
129
+ 2. **Runtime override** — Use the `--api-key` CLI flag
130
+ 3. **Environment variable** — Set `LILAC_API_KEY`
131
+
132
+ Get your API key at [getlilac.com](https://getlilac.com/).
133
+
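The resolution order listed above can be pictured as a first-match lookup. This TypeScript sketch is only a model of that order, not pi's implementation: the `KeySources` type and `resolveApiKey` function are hypothetical, and treating an empty string as "not set" is an assumption.

```typescript
// Hypothetical container for the three places a key may come from.
interface KeySources {
  authJsonKey?: string; // value resolved from ~/.pi/agent/auth.json
  cliApiKey?: string; // value passed via --api-key
  envApiKey?: string; // value of LILAC_API_KEY
}

// Returns the first non-empty key, following the order in the list above.
function resolveApiKey(sources: KeySources): string | undefined {
  const ordered = [sources.authJsonKey, sources.cliApiKey, sources.envApiKey];
  for (const candidate of ordered) {
    // Assumption: blank values count as missing.
    if (candidate !== undefined && candidate.trim() !== "") {
      return candidate;
    }
  }
  return undefined;
}
```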
134
+ ## Environment Variables
135
+
136
+ | Variable | Required | Description |
137
+ |----------|----------|-------------|
138
+ | `LILAC_API_KEY` | No | Your Lilac API key (fallback if not in auth.json) |
139
+
140
+ ## Configuration
141
+
142
+ Add to your pi configuration for automatic loading:
143
+
144
+ ```json
145
+ {
146
+ "extensions": [
147
+ "/path/to/pi-lilac-provider"
148
+ ]
149
+ }
150
+ ```
151
+
152
+ ### Compat Settings
153
+
154
+ Lilac's API is OpenAI-compatible with these specifics:
155
+
156
+ - **`thinkingFormat: "qwen-chat-template"`** — All reasoning models. Lilac uses `chat_template_kwargs` (with `thinking` and `enable_thinking` keys) to toggle reasoning. Pi sends both keys for forward compatibility.
157
+ - **`maxTokensField: "max_completion_tokens"`** — All models. Lilac supports `max_completion_tokens` (preferred for reasoning models as it includes reasoning tokens).
158
+ - **`supportsDeveloperRole: true`** — All models. Lilac's vLLM backend maps the developer role to system.
159
+ - **`supportsStore: false`** — All models. Lilac doesn't support the `store` parameter.
160
+
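One way to read these settings is as switches that shape the outgoing request. The TypeScript sketch below is a simplified model under stated assumptions: the `Compat` type, the `"max_tokens"` alternative, and the `buildPayload` function are illustrative and do not reproduce pi's internals. The sketch also assumes that a client which supports `store` would send it as `false`.

```typescript
// Illustrative shape of the compat settings listed above.
interface Compat {
  thinkingFormat: "qwen-chat-template";
  maxTokensField: "max_completion_tokens" | "max_tokens";
  supportsDeveloperRole: boolean;
  supportsStore: boolean;
}

const lilacCompat: Compat = {
  thinkingFormat: "qwen-chat-template",
  maxTokensField: "max_completion_tokens",
  supportsDeveloperRole: true,
  supportsStore: false,
};

// Hypothetical payload builder showing how each flag could be applied.
function buildPayload(
  compat: Compat,
  model: string,
  systemPrompt: string,
  maxTokens: number,
): Record<string, unknown> {
  const payload: Record<string, unknown> = {
    model,
    // Use the developer role only when the backend accepts it.
    messages: [
      { role: compat.supportsDeveloperRole ? "developer" : "system", content: systemPrompt },
    ],
    // The token limit is written under whichever field name compat selects.
    [compat.maxTokensField]: maxTokens,
  };
  // Assumption: the store parameter is only included when it is supported.
  if (compat.supportsStore) {
    payload["store"] = false;
  }
  return payload;
}

const payload = buildPayload(lilacCompat, "moonshotai/kimi-k2.6", "You are helpful.", 1024);
```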
161
+ ### Known Caveats
162
+
163
+ - **GLM 5.1 intermittent tool call loss**: vLLM's streaming parser intermittently emits `finish_reason: "tool_calls"` without any `delta.tool_calls` chunks — even with `tool_stream: true` (set via `zaiToolStream` in compat). Pi maps this to `stopReason: "toolUse"` with zero toolCall blocks, causing an "abrupt stop". The extension's `message_end` handler converts this to a retryable error that triggers pi's built-in auto-retry mechanism, so the agent automatically re-prompts and typically succeeds on the next attempt.
164
+ - **GLM 5.1 chain-of-thought leakage**: On the current vLLM build, disabling reasoning on GLM 5.1 may still leak chain-of-thought into `content` terminated by a `</think>` marker. Post-process the response to discard text up to and including the first `</think>` when reasoning is disabled. See [vllm-project/vllm#31319](https://github.com/vllm-project/vllm/issues/31319).
165
+ - **Gemma 4 reasoning parser**: vLLM's reasoning parser can fail to populate the `reasoning` field when special tokens are stripped before the parser runs. Clients that require a clean split should post-process `<|channel|>thought ... <|channel|>` markers. See [vllm-project/vllm#38855](https://github.com/vllm-project/vllm/issues/38855).
166
+ - **Gemma 4 structured output**: Combining `enable_thinking: false` with `response_format: json_schema` can silently disable xgrammar-backed structured output. If you rely on structured output with Gemma 4, leave thinking enabled or validate output client-side. See [vllm-project/vllm#39130](https://github.com/vllm-project/vllm/issues/39130).
167
+
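Two of the caveats above lend themselves to small client-side checks. The TypeScript sketch below shows one possible shape for them. `stripLeakedReasoning` drops everything up to and including the first `</think>` marker, and `isEmptyToolUse` detects a `toolUse` stop with no tool call blocks. Both function names, the `AssistantMessage` type, and the `"toolCall"` block type string are assumptions made for illustration; the extension's actual `message_end` handler is not shown here.

```typescript
const THINK_END = "</think>";

// Discards leaked chain-of-thought: everything up to and including the first
// </think> marker. Content without the marker is returned unchanged.
function stripLeakedReasoning(content: string): string {
  const idx = content.indexOf(THINK_END);
  if (idx === -1) return content;
  // trimStart is a cosmetic choice to drop the newline after the marker.
  return content.slice(idx + THINK_END.length).trimStart();
}

// Illustrative subset of an assistant message.
interface AssistantMessage {
  stopReason: string;
  content: { type: string }[];
}

// True when the model claims to have stopped for tool use but sent no tool
// calls, which is the "abrupt stop" case that should be retried.
function isEmptyToolUse(msg: AssistantMessage): boolean {
  return msg.stopReason === "toolUse" && !msg.content.some((b) => b.type === "toolCall");
}
```

Apply `stripLeakedReasoning` only when reasoning is disabled, since with reasoning enabled the marker may be part of a legitimate split between reasoning and answer.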
168
+ ### Patch Overrides
169
+
170
+ The `patch.json` file contains overrides that are applied on top of `models.json` data. This is useful for:
171
+ - Correcting API-derived values (e.g., GLM 5.1's `maxTokens`: the API returns the context length, but the actual maximum output is 131K)
172
+ - Marking models as reasoning-capable when the API features list doesn't include it
173
+ - Adding compat settings that the API doesn't provide
174
+ - Overriding pricing when official rates change
175
+
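The override behavior described above can be sketched as a per-model merge. The layout of `patch.json` is not shown in this README, so the TypeScript example below assumes a hypothetical shape: an object keyed by model id whose values are partial model entries. `applyPatch`, `ModelEntry`, and the merge depth (one level, so nested objects such as `compat` are merged rather than replaced) are all illustrative choices.

```typescript
// Hypothetical model entry: an id plus arbitrary fields.
type ModelEntry = { id: string; [field: string]: unknown };

function isPlainObject(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Applies overrides on top of the fetched model data without mutating it.
function applyPatch(
  models: ModelEntry[],
  patch: Record<string, Record<string, unknown>>,
): ModelEntry[] {
  return models.map((model) => {
    const overrides = patch[model.id];
    if (!overrides) return model;
    const merged: ModelEntry = { ...model };
    for (const [key, value] of Object.entries(overrides)) {
      const current = merged[key];
      // Nested objects are merged one level deep; other values are replaced.
      merged[key] =
        isPlainObject(current) && isPlainObject(value) ? { ...current, ...value } : value;
    }
    return merged;
  });
}

// Placeholder ids and values, for illustration only.
const patched = applyPatch(
  [{ id: "vendor/model-a", maxTokens: 200_000, compat: { supportsStore: false } }],
  { "vendor/model-a": { maxTokens: 131_000, reasoning: true, compat: { supportsDeveloperRole: true } } },
);
```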
176
+ ## Updating Models
177
+
178
+ Run the update script to fetch the latest models from Lilac's API:
179
+
180
+ ```bash
181
+ export LILAC_API_KEY=your-api-key
182
+ node scripts/update-models.js
183
+ ```
184
+
185
+ This will:
186
+ 1. Fetch models from `https://api.getlilac.com/v1/models`
187
+ 2. Convert per-token pricing to per-million-tokens
188
+ 3. Preserve existing curated data (pricing, compat) for known models
189
+ 4. Apply overrides from `patch.json`
190
+ 5. Update `models.json` and the README model table
191
+
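Step 2 above is simple arithmetic, but floating-point noise makes a rounding step worthwhile. The TypeScript sketch below is an illustration, not the code in `scripts/update-models.js`. The function name, the acceptance of both string and number inputs, and the choice of four decimal places are assumptions.

```typescript
// Converts a per-token price to a per-million-tokens price.
// Rounds to 4 decimal places to absorb floating-point noise.
function perTokenToPerMillion(perToken: string | number): number {
  const value = Number(perToken);
  if (!Number.isFinite(value)) {
    throw new Error(`invalid price: ${perToken}`);
  }
  // value * 1e6 gives the per-million price; the extra 1e4 scales for rounding.
  return Math.round(value * 1e10) / 1e4;
}
```

For example, a per-token input price of `0.0000007` corresponds to the $0.70 per million tokens listed for Kimi K2.6 in the model table.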
192
+ A GitHub Actions workflow runs this daily and creates a PR if models have changed.
193
+
194
+ ## License
195
+
196
+ MIT
@@ -0,0 +1 @@
1
+ []