pi-lilac-provider 1.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +21 -0
- package/README.md +196 -0
- package/custom-models.json +1 -0
- package/index.ts +672 -0
- package/models.json +139 -0
- package/package.json +40 -0
- package/patch.json +64 -0
- package/scripts/test-discounts.ts +454 -0
- package/scripts/update-models.js +342 -0
package/LICENSE
ADDED

MIT License

Copyright (c) 2026

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

package/README.md
ADDED

<div align="center">

# 💜 pi-lilac-provider

**Kimi K2.6, GLM 5.1, Gemma 4 & more on idle GPUs via [Lilac](https://getlilac.com/)**

_A [pi](https://github.com/earendil-works/pi-coding-agent) provider extension for cost-efficient GPU inference._

</div>

---

Access Kimi K2.6, GLM 5.1, MiniMax M2.7, and Gemma 4 models through Lilac's OpenAI-compatible API on idle GPUs.

## Features

- **Multiple AI Models** — Kimi K2.6, GLM 5.1, MiniMax M2.7, Gemma 4, and more (see Available Models)
- **OpenAI-Compatible API** — Just change the base URL and API key
- **Cost Tracking** — Per-model pricing with cache read discounts
- **Reasoning Models** — Chain-of-thought via `chat_template_kwargs` (all models)
- **Vision Support** — Image input on Kimi K2.6 and Gemma 4
- **Context Caching** — Cache read pricing on every listed model except Gemma 4
- **Idle GPU Scheduling** — Lilac leverages idle GPU capacity for cost-efficient inference

## Installation

### Option 1: Using `pi install` (Recommended)

Install directly from GitHub:

```bash
pi install https://github.com/monotykamary/pi-lilac-provider
```

Then set your API key and run pi:
```bash
# Recommended: add to auth.json
# See Authentication section below

# Or set as environment variable
export LILAC_API_KEY=your-api-key-here

pi
```

### Option 2: Manual Clone

1. Clone this repository:
   ```bash
   git clone https://github.com/monotykamary/pi-lilac-provider.git
   cd pi-lilac-provider
   ```

2. Set your Lilac API key:
   ```bash
   # Recommended: add to auth.json
   # See Authentication section below

   # Or set as environment variable
   export LILAC_API_KEY=your-api-key-here
   ```

3. Run pi with the extension:
   ```bash
   pi -e /path/to/pi-lilac-provider
   ```

## Available Models

| Model | Context | Vision | Reasoning | Input $/M | Cache Read $/M | Output $/M |
|-------|---------|--------|-----------|-----------|-----------------|------------|
| Gemma 4 | 262K | ✅ | ✅ | $0.11 | — | $0.35 |
| GLM 5.1 | 203K | ❌ | ✅ | $0.90 | $0.27 | $3.00 |
| GLM 5.2 | 524K | ❌ | ✅ | $0.90 | $0.27 | $3.00 |
| Kimi K2.6 | 262K | ✅ | ✅ | $0.70 | $0.20 | $3.50 |
| MiniMax M2.7 | 205K | ❌ | ✅ | $0.30 | $0.06 | $1.20 |
| MiniMax M3 | 1.0M | ❌ | ✅ | $0.28 | $0.05 | $1.10 |

*Costs are per million tokens. Prices subject to change — check [getlilac.com](https://getlilac.com/) for current pricing.*

**Notes:**
- **Gemma 4** has reasoning **off by default** — pi enables it when you set a thinking level (Shift+Tab)
- **Kimi K2.6** and **GLM 5.1** have reasoning **on by default**
- **Cache read** pricing applies to repeated input tokens served from cache on supported models
- **Gemma 4** does not support cache read pricing
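
As a worked example of the per-million pricing, the sketch below estimates the cost of one request using the Kimi K2.6 row. It assumes, as with OpenAI-style `usage` objects, that the reported prompt token count already includes the cached tokens. The `Pricing` and `Usage` types and the `estimateCostUSD` helper are hypothetical and not part of this extension.

```typescript
// Hypothetical types; prices are $ per 1M tokens, copied from the Kimi K2.6 row.
interface Pricing {
  input: number; // uncached input tokens
  cacheRead: number; // input tokens served from cache
  output: number; // output tokens (reasoning tokens included)
}

interface Usage {
  promptTokens: number; // assumption: total prompt tokens, cached ones included
  cachedTokens: number; // subset of promptTokens that hit the cache
  completionTokens: number;
}

const KIMI_K2_6: Pricing = { input: 0.7, cacheRead: 0.2, output: 3.5 };

function estimateCostUSD(usage: Usage, price: Pricing): number {
  const uncached = usage.promptTokens - usage.cachedTokens;
  // Sum of tokens x ($ per 1M tokens); dividing by 1M converts to dollars.
  const perMillionTotal =
    uncached * price.input +
    usage.cachedTokens * price.cacheRead +
    usage.completionTokens * price.output;
  return perMillionTotal / 1_000_000;
}

// 100K prompt tokens (80K of them cached) plus 2K output tokens:
// 20K * $0.70 + 80K * $0.20 + 2K * $3.50 = $0.037, versus $0.077 with no cache hits.
const cost = estimateCostUSD(
  { promptTokens: 100_000, cachedTokens: 80_000, completionTokens: 2_000 },
  KIMI_K2_6,
);
console.log(cost.toFixed(4)); // 0.0370
```

On long agent sessions where most of the prompt is a repeated prefix, the cache-read column dominates the input side of the bill, which is why it is tracked separately.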

## Usage

After loading the extension, use the `/model` command in pi to select your preferred model:

```
/model lilac moonshotai/kimi-k2.6
```

Or start pi directly with a Lilac model:

```bash
pi --provider lilac --model moonshotai/kimi-k2.6
```

### Thinking Mode

All Lilac models support chain-of-thought reasoning via `chat_template_kwargs`. Pi uses the `qwen-chat-template` thinking format to send both `thinking` and `enable_thinking` keys, which works across all model families:

- **Kimi K2.6**: Honors `thinking` key (Moonshot template)
- **GLM 5.1**: Honors `enable_thinking` key (Z.ai template)
- **Gemma 4**: Honors `enable_thinking` key (Google template)

In pi, reasoning models automatically use the appropriate thinking format. Use Shift+Tab to control thinking level.
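
For reference, a raw request body carrying both keys might look like the sketch below. It assumes the standard OpenAI chat-completions body shape; `buildChatRequest` is a hypothetical helper, not an export of this extension, and the model ID is the one from the Usage examples.

```typescript
// Hypothetical helper mirroring the "qwen-chat-template" thinking format:
// one boolean is copied into both template keys.
interface ChatMessage {
  role: "system" | "developer" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
  max_completion_tokens: number;
  chat_template_kwargs: { thinking: boolean; enable_thinking: boolean };
}

function buildChatRequest(
  model: string,
  messages: ChatMessage[],
  reasoning: boolean,
  maxCompletionTokens: number,
): ChatRequest {
  return {
    model,
    messages,
    max_completion_tokens: maxCompletionTokens,
    // Kimi K2.6 reads `thinking`; GLM 5.1 and Gemma 4 read `enable_thinking`.
    chat_template_kwargs: { thinking: reasoning, enable_thinking: reasoning },
  };
}

const body = buildChatRequest(
  "moonshotai/kimi-k2.6",
  [{ role: "user", content: "Summarize this diff." }],
  true,
  4096,
);
console.log(JSON.stringify(body.chat_template_kwargs)); // {"thinking":true,"enable_thinking":true}
```

Setting both keys from a single flag keeps one request shape for every model family instead of branching on the model ID.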

### Vision

Kimi K2.6 and Gemma 4 support image inputs. Pass images in messages and pi will handle the formatting automatically.

Gemma 4 also supports video by accepting a sequence of frames as images.
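
At the API level, an image is typically sent as an OpenAI-style `image_url` content part. The sketch below assumes Lilac accepts base64 data URLs in that part, as OpenAI-compatible servers commonly do; `userMessageWithImages` is a hypothetical helper. Passing several frames as separate image parts is one way to send the frame sequence that Gemma 4 treats as video.

```typescript
// OpenAI-style multimodal content parts (assumption: accepted as-is by Lilac).
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface UserMessage {
  role: "user";
  content: ContentPart[];
}

interface ImageInput {
  mimeType: string; // e.g. "image/png"
  base64: string; // raw base64 payload without a "data:" prefix
}

// Hypothetical helper: one text part followed by one image part per input, in order.
function userMessageWithImages(prompt: string, images: ImageInput[]): UserMessage {
  const imageParts = images.map(
    (img): ContentPart => ({
      type: "image_url",
      image_url: { url: `data:${img.mimeType};base64,${img.base64}` },
    }),
  );
  return { role: "user", content: [{ type: "text", text: prompt }, ...imageParts] };
}

// Three video frames sent as an ordered sequence of images (placeholder payloads).
const frames: ImageInput[] = ["AAAA", "BBBB", "CCCC"].map((b64) => ({
  mimeType: "image/jpeg",
  base64: b64,
}));
const msg = userMessageWithImages("What happens across these frames?", frames);
console.log(msg.content.length); // 4
```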

## Authentication

The Lilac API key can be configured in multiple ways (resolved in this order):

1. **`auth.json`** (recommended) — Add to `~/.pi/agent/auth.json`:
   ```json
   { "lilac": { "type": "api_key", "key": "your-api-key" } }
   ```
   The `key` field supports literal values, env var names, and shell commands (prefix with `!`). See [pi's auth file docs](https://github.com/badlogic/pi-mono) for details.
2. **Runtime override** — Use the `--api-key` CLI flag
3. **Environment variable** — Set `LILAC_API_KEY`
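
The lookup can be pictured as the sketch below, which follows the numbered list in the order given. It is a hypothetical illustration, not pi's implementation: the parsed `auth.json` object, the environment map, and the shell runner are passed in as parameters, and treating a `key` that names a set environment variable as that variable's value (otherwise as a literal) is an assumption.

```typescript
// Hypothetical sketch of the documented key lookup; not pi's actual code.
type Env = Record<string, string | undefined>;
type ShellRunner = (command: string) => string;

interface AuthEntry {
  type: "api_key";
  key: string;
}

interface KeySources {
  authJson?: Record<string, AuthEntry>; // parsed ~/.pi/agent/auth.json
  cliApiKey?: string; // value of --api-key
  env: Env;
  runShell: ShellRunner;
}

// Resolves the `key` field: "!cmd" runs a shell command, a set env var name
// yields that variable's value, anything else is used literally (assumption).
function resolveKeyField(key: string, env: Env, runShell: ShellRunner): string {
  if (key.startsWith("!")) return runShell(key.slice(1)).trim();
  return env[key] ?? key;
}

function resolveLilacApiKey(src: KeySources): string | undefined {
  const entry = src.authJson?.["lilac"];
  if (entry?.type === "api_key") return resolveKeyField(entry.key, src.env, src.runShell); // 1. auth.json
  if (src.cliApiKey) return src.cliApiKey; // 2. --api-key
  return src.env["LILAC_API_KEY"]; // 3. environment variable
}
```

In a Node.js host, `process.env` and a small wrapper around `execSync` from `node:child_process` would be natural arguments; keeping them as parameters makes the order easy to test without touching the real environment.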

Get your API key at [getlilac.com](https://getlilac.com/).

## Environment Variables

| Variable | Required | Description |
|----------|----------|-------------|
| `LILAC_API_KEY` | No | Your Lilac API key (fallback if not in auth.json) |

## Configuration

Add to your pi configuration for automatic loading:

```json
{
  "extensions": [
    "/path/to/pi-lilac-provider"
  ]
}
```

### Compat Settings

Lilac's API is OpenAI-compatible with these specifics:

- **`thinkingFormat: "qwen-chat-template"`** — All reasoning models. Lilac uses `chat_template_kwargs` (with `thinking` and `enable_thinking` keys) to toggle reasoning. Pi sends both keys because different model families read different ones (see Thinking Mode).
- **`maxTokensField: "max_completion_tokens"`** — All models. Lilac supports `max_completion_tokens` (preferred for reasoning models as it includes reasoning tokens).
- **`supportsDeveloperRole: true`** — All models. Lilac's vLLM backend maps the developer role to system.
- **`supportsStore: false`** — All models. Lilac doesn't support the `store` parameter.
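
One way to read these settings is as transformations on the outgoing request body. The sketch below is a hypothetical illustration of that idea, not pi's request builder. The `Compat` interface reuses the field names above; downgrading `developer` to `system` when developer-role support is off is an assumption about what a client would do.

```typescript
// Field names from the compat list above; the shaping logic is a hypothetical sketch.
interface Compat {
  thinkingFormat: "qwen-chat-template"; // not used in this sketch
  maxTokensField: "max_tokens" | "max_completion_tokens";
  supportsDeveloperRole: boolean;
  supportsStore: boolean;
}

const LILAC_COMPAT: Compat = {
  thinkingFormat: "qwen-chat-template",
  maxTokensField: "max_completion_tokens",
  supportsDeveloperRole: true,
  supportsStore: false,
};

interface Message {
  role: string;
  content: string;
}

function shapeRequest(
  model: string,
  messages: Message[],
  maxTokens: number,
  store: boolean,
  compat: Compat,
): Record<string, unknown> {
  const body: Record<string, unknown> = {
    model,
    // Without developer-role support, downgrade developer messages to system.
    messages: compat.supportsDeveloperRole
      ? messages
      : messages.map((m) => (m.role === "developer" ? { ...m, role: "system" } : m)),
    // The token limit goes under whichever field name the provider expects.
    [compat.maxTokensField]: maxTokens,
  };
  // `store` is sent only when the provider accepts it; for Lilac it is omitted.
  if (compat.supportsStore) body["store"] = store;
  return body;
}

const req = shapeRequest(
  "moonshotai/kimi-k2.6",
  [{ role: "developer", content: "Be terse." }],
  2048,
  true,
  LILAC_COMPAT,
);
console.log(Object.keys(req)); // [ 'model', 'messages', 'max_completion_tokens' ]
```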

### Known Caveats

- **GLM 5.1 intermittent tool call loss**: vLLM's streaming parser intermittently emits `finish_reason: "tool_calls"` without any `delta.tool_calls` chunks — even with `tool_stream: true` (set via `zaiToolStream` in compat). Pi maps this to `stopReason: "toolUse"` with zero toolCall blocks, causing an "abrupt stop". The extension's `message_end` handler converts this to a retryable error that triggers pi's built-in auto-retry mechanism, so the agent automatically re-prompts and typically succeeds on the next attempt.
- **GLM 5.1 chain-of-thought leakage**: On the current vLLM build, disabling reasoning on GLM 5.1 may still leak chain-of-thought into `content` terminated by a `</think>` marker. Post-process the response to discard text up to and including the first `</think>` when reasoning is disabled. See [vllm-project/vllm#31319](https://github.com/vllm-project/vllm/issues/31319).
- **Gemma 4 reasoning parser**: vLLM's reasoning parser can fail to populate the `reasoning` field when special tokens are stripped before the parser runs. Clients that require a clean split should post-process `<|channel|>thought ... <|channel|>` markers. See [vllm-project/vllm#38855](https://github.com/vllm-project/vllm/issues/38855).
- **Gemma 4 structured output**: Combining `enable_thinking: false` with `response_format: json_schema` can silently disable xgrammar-backed structured output. If you rely on structured output with Gemma 4, leave thinking enabled or validate output client-side. See [vllm-project/vllm#39130](https://github.com/vllm-project/vllm/issues/39130).
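
The client-side workarounds described above could look roughly like the helpers below. They are hypothetical sketches, not code from this extension; the marker strings are copied from the caveats, and the exact tokens a given vLLM build emits may differ.

```typescript
// Hypothetical client-side helpers for the caveats above; not part of this extension.

// GLM 5.1 with reasoning disabled: drop leaked chain-of-thought up to and
// including the first `</think>`. Text without the marker is returned unchanged.
function stripLeakedThinking(content: string): string {
  const marker = "</think>";
  const idx = content.indexOf(marker);
  return idx === -1 ? content : content.slice(idx + marker.length).trimStart();
}

// GLM 5.1 "abrupt stop": finish_reason says tool_calls but no tool call arrived.
// A caller would treat this as retryable rather than as a finished turn.
function isEmptyToolCallStop(finishReason: string | null, toolCallCount: number): boolean {
  return finishReason === "tool_calls" && toolCallCount === 0;
}

// Gemma 4: split `<|channel|>thought ... <|channel|>` (markers as written above)
// into reasoning and answer when the server left them in `content`.
function splitGemmaThought(content: string): { reasoning: string; answer: string } {
  const open = "<|channel|>thought";
  const close = "<|channel|>";
  const start = content.indexOf(open);
  if (start === -1) return { reasoning: "", answer: content };
  const bodyStart = start + open.length;
  const end = content.indexOf(close, bodyStart);
  if (end === -1) {
    // Unterminated thought block: treat everything after the marker as reasoning.
    return { reasoning: content.slice(bodyStart).trim(), answer: content.slice(0, start).trim() };
  }
  return {
    reasoning: content.slice(bodyStart, end).trim(),
    answer: (content.slice(0, start) + content.slice(end + close.length)).trim(),
  };
}

console.log(stripLeakedThinking("Let me think...</think>\nHello!")); // Hello!
console.log(splitGemmaThought("<|channel|>thought Check units.<|channel|>42 km"));
// { reasoning: 'Check units.', answer: '42 km' }
```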

### Patch Overrides

The `patch.json` file contains overrides that are applied on top of `models.json` data. This is useful for:
- Correcting API-derived values (e.g., GLM 5.1's `maxTokens` — API returns context length, actual max output is 131K)
- Marking models as reasoning-capable when the API features list doesn't include it
- Adding compat settings that the API doesn't provide
- Overriding pricing when official rates change
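
A minimal sketch of how such overrides can be layered onto fetched data, assuming both files are keyed by model ID and that nested `cost` and `compat` objects are merged field by field. The `ModelEntry` and `ModelPatch` shapes, the GLM model ID, and the token counts are hypothetical illustrations; the real schemas in `models.json` and `patch.json` may differ.

```typescript
// Hypothetical schema; the real models.json / patch.json fields may differ.
interface ModelEntry {
  id: string;
  reasoning: boolean;
  contextWindow: number;
  maxTokens: number;
  cost: { input: number; output: number; cacheRead?: number }; // $ per 1M tokens
  compat?: Record<string, unknown>;
}

type ModelPatch = Partial<Omit<ModelEntry, "id" | "cost">> & {
  cost?: Partial<ModelEntry["cost"]>;
};

// Top-level fields are replaced, but `cost` and `compat` are merged key by key,
// so a patch can change one price without restating the others.
function applyPatch(models: ModelEntry[], patch: Record<string, ModelPatch>): ModelEntry[] {
  return models.map((model) => {
    const override = patch[model.id];
    if (!override) return model;
    return {
      ...model,
      ...override,
      cost: { ...model.cost, ...override.cost },
      compat: { ...model.compat, ...override.compat },
    };
  });
}

// Illustrative values: here the API reported the context length as maxTokens.
const fetched: ModelEntry[] = [
  { id: "zai-org/glm-5.1", reasoning: false, contextWindow: 202_752, maxTokens: 202_752, cost: { input: 0.9, output: 3 } },
];
const patched = applyPatch(fetched, {
  "zai-org/glm-5.1": { maxTokens: 131_072, reasoning: true, cost: { cacheRead: 0.27 } },
});
console.log(patched[0].maxTokens, patched[0].cost); // 131072 { input: 0.9, output: 3, cacheRead: 0.27 }
```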

## Updating Models

Run the update script to fetch the latest models from Lilac's API:

```bash
export LILAC_API_KEY=your-api-key
node scripts/update-models.js
```

This will:
1. Fetch models from `https://api.getlilac.com/v1/models`
2. Convert per-token pricing to per-million-tokens
3. Preserve existing curated data (pricing, compat) for known models
4. Apply overrides from `patch.json`
5. Update `models.json` and the README model table
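
Step 2 and the table columns involve a little arithmetic, sketched below under the assumption that the API reports prices as dollars per single token (as a number or a numeric string). Rounding to six decimals strips the floating-point noise that multiplying tiny values by one million can introduce. The helpers are hypothetical, and the context formatting is just one rule that reproduces labels such as `262K` and `1.0M`, not necessarily the script's.

```typescript
// Hypothetical helpers; the actual update script may parse and format differently.

// Per-token price (number or numeric string) -> dollars per 1M tokens.
function perTokenToPerMillion(perToken: number | string): number {
  const value = typeof perToken === "string" ? Number(perToken) : perToken;
  if (!Number.isFinite(value)) throw new Error(`invalid price: ${perToken}`);
  // Round to 6 decimals to hide binary floating-point noise such as 0.6999999999999999.
  return Number((value * 1_000_000).toFixed(6));
}

// Dollar cell for the README table, e.g. 0.7 -> "$0.70".
function formatPrice(perMillion: number): string {
  return `$${perMillion.toFixed(2)}`;
}

// Context cell: thousands below one million, otherwise millions with one decimal.
function formatContext(tokens: number): string {
  return tokens >= 1_000_000
    ? `${(tokens / 1_000_000).toFixed(1)}M`
    : `${Math.round(tokens / 1_000)}K`;
}

console.log(formatPrice(perTokenToPerMillion("0.0000007"))); // $0.70
console.log(formatContext(262_144), formatContext(1_048_576)); // 262K 1.0M
```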

A GitHub Actions workflow runs this daily and creates a PR if models have changed.

## License

MIT

package/custom-models.json
ADDED
[]