@simontan/llm-gateway 0.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.en.md ADDED
@@ -0,0 +1,197 @@
# LLM Gateway

A lightweight LLM API gateway that accepts OpenAI `chat/completions` requests and forwards them to downstream `completions` or `responses` protocols based on provider configuration.

## Features

- Rust-based, high-performance async forwarding
- Multi-provider routing (OpenAI, Anthropic, and custom OpenAI-compatible services)
- Protocol adaptation `completions -> responses -> completions`
- Global concurrency protection and global rate limiting
- Configurable CORS, request timeouts, and `/metrics` auth
- Graceful shutdown (SIGINT/SIGTERM)
- Structured logging and metrics

## Protocol Mapping

- Clients always call `/{provider}/v1/chat/completions`
- If `protocol: "responses"` is configured, the gateway converts both the request and the response automatically
- For streaming, `responses` events are converted into `chat.completion.chunk` SSE events

## Quick Start

### 1) Prepare config

```bash
cp config.yaml.example config.yaml
cp .env.template .env
```

`.env` example:

```env
OPENAI_API_KEY=...
ANTHROPIC_API_KEY=...
GATEWAY_API_KEY=... # optional
```

### 2) Run

```bash
cargo run
```

Default address: `http://localhost:8080`

### 3) Health check

```bash
curl http://localhost:8080/health
```

## Configuration

```yaml
server:
  address: "0.0.0.0"
  port: 8080
  request-timeout-seconds: 930
  cors:
    allow-any-origin: false
    allow-origins: [] # backend-only default; set explicit origins for browser use
  limits:
    max-in-flight-requests: 512
    max-requests-per-second: 200
  metrics:
    require-auth: true # recommended in production to protect /metrics
  resilience:
    provider-max-concurrency: 128
    retry-max-attempts: 3
    circuit-breaker-failure-threshold: 8

providers:
  openai:
    models: ["gpt-4o-mini"]
    base-url: "https://api.openai.com"

  anthropic:
    models: ["claude-3-5-sonnet"]
    base-url: "https://api.anthropic.com"
    version: "2023-06-01"

  my-responses-provider:
    models: ["gpt-4.1-mini"]
    base-url: "https://api.example.com"
    protocol: "responses"
```

### `server` Parameter Reference

| Parameter | Type | Description |
|---|---|---|
| `address` | `string` | Bind address; `0.0.0.0` listens on all interfaces |
| `port` | `u16` | Listen port |
| `request-timeout-seconds` | `u64` | Per-request timeout in seconds; returns `504` on timeout |
| `cors.allow-any-origin` | `bool` | Whether to allow any CORS origin; `true` means `*` |
| `cors.allow-origins` | `string[]` | CORS origin allowlist; effective only when `allow-any-origin: false` |
| `limits.max-in-flight-requests` | `usize?` | Global in-flight concurrency limit |
| `limits.max-requests-per-second` | `u64?` | Global requests-per-second limit |
| `metrics.require-auth` | `bool` | Whether `/metrics` requires gateway authentication |
| `resilience.provider-max-concurrency` | `usize` | Per-provider bulkhead concurrency limit |
| `resilience.retry-max-attempts` | `u32` | Max retries for transient transport errors |
| `resilience.circuit-breaker-failure-threshold` | `u32` | Consecutive-failure threshold for the circuit breaker |

Fixed internal defaults (not configurable):

- Initial retry backoff: `100ms`
- Max retry backoff: `1000ms`
- Circuit-breaker open duration: `20s`

Additional notes:

- `server.metrics.require-auth` only takes effect when `GATEWAY_API_KEY` is set
- `/health` is always unauthenticated
- With `allow-any-origin: false` and `allow-origins: []`, CORS is closed by default (recommended for backend-only usage)

## API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check |
| `/metrics` | GET | Metrics (auth configurable) |
| `/{provider}/v1/chat/completions` | POST | Unified chat completions entry point |

## Architecture

```mermaid
flowchart LR
    C[Client SDK / HTTP Client] --> G[LLM Gateway<br/>Axum Server]

    subgraph GW[Gateway Core]
        G --> M1[Middlewares<br/>CORS / Auth / RateLimit / ConcurrencyLimit]
        M1 --> R[Router<br/>/health /metrics /:provider/v1/chat/completions]
        R --> D[Dispatcher]
        R --> L[RequestLogger + MetricsCollector]
    end

    D --> P1[ProviderClient: openai]
    D --> P2[ProviderClient: anthropic]
    D --> P3[ProviderClient: custom providers]

    subgraph MAP[Protocol Mapping Layer]
        Q1[RequestMapper<br/>chat/completions -> target protocol]
        Q2[ResponseMapper<br/>target protocol -> chat/completions]
    end

    R --> Q1
    Q1 --> P1
    Q1 --> P2
    Q1 --> P3

    P1 --> U1[Upstream OpenAI-compatible API]
    P2 --> U2[Upstream Anthropic Messages API]
    P3 --> U3[Upstream Responses/Completions API]

    U1 --> Q2
    U2 --> Q2
    U3 --> Q2
    Q2 --> R
    R --> C
```

### Request Path (Streaming / Non-Streaming)

```mermaid
sequenceDiagram
    participant Client
    participant Router
    participant MapperReq as RequestMapper
    participant Provider as ProviderClient
    participant Upstream
    participant MapperResp as ResponseMapper

    Client->>Router: POST /:provider/v1/chat/completions
    Router->>MapperReq: Convert request by provider protocol
    alt stream=true
        Router->>Provider: forward_request_stream()
        Provider->>Upstream: POST (messages/responses/completions)
        Upstream-->>Provider: SSE chunks
        Provider->>MapperResp: Convert chunk to OpenAI chat.completion.chunk
        MapperResp-->>Router: normalized SSE chunk
        Router-->>Client: data: <chunk>\n\n ... data: [DONE]
    else stream=false
        Router->>Provider: forward_request()
        Provider->>Upstream: POST (messages/responses/completions)
        Upstream-->>Provider: JSON response
        Provider->>MapperResp: Convert to OpenAI chat.completion
        MapperResp-->>Router: normalized JSON
        Router-->>Client: 200 JSON
    end
```

## Custom Providers

Add a provider in `config.yaml`, then set `{PROVIDER}_API_KEY` in `.env`. Provider names are uppercased and `-` is replaced with `_`.

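The naming rule above amounts to a one-liner (restated here for clarity, not taken from the gateway source):

```javascript
// Derive the .env key for a provider name: uppercase, '-' -> '_',
// then append the _API_KEY suffix.
function providerEnvKey(name) {
  return `${name.toUpperCase().replace(/-/g, '_')}_API_KEY`
}

console.log(providerEnvKey('my-responses-provider')) // MY_RESPONSES_PROVIDER_API_KEY
```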
## Production Notes

- Put a reverse proxy in front (TLS, WAF, IP controls)
- For multi-instance deployments, use distributed rate limiting (Redis / Envoy / Nginx)
- Keep API keys in environment variables only; never commit them
package/README.md ADDED
@@ -0,0 +1,11 @@
# LLM Gateway

请选择文档语言 / Choose your language:

- [中文文档(README.zh.md)](README.zh.md)
- [English Documentation (README.en.md)](README.en.md)

快速开始 / Quick Start:

- [中文快速开始(QUICKSTART.zh.md)](QUICKSTART.zh.md)
- [English Quick Start (QUICKSTART.en.md)](QUICKSTART.en.md)
package/README.zh.md ADDED
@@ -0,0 +1,197 @@
# LLM Gateway

一个轻量级的 LLM API 网关,统一接收 OpenAI `chat/completions` 请求,并按 provider 配置转发到下游 `completions` 或 `responses` 协议。

## 特性

- Rust 实现,异步高性能转发
- 多 provider 路由(OpenAI、Anthropic、以及自定义 OpenAI 兼容服务)
- 协议适配 `completions -> responses -> completions`
- 全局并发保护与全局限流
- 可配置 CORS、请求超时、`/metrics` 鉴权
- 优雅停机(SIGINT/SIGTERM)
- 结构化日志与性能指标

## 协议映射

- 客户端统一调用 `/{provider}/v1/chat/completions`
- 若 provider 配置 `protocol: "responses"`,网关会自动做请求和响应双向转换
- 流式场景会将 responses 事件转换为 `chat.completion.chunk` SSE

## 快速开始

### 1) 准备配置

```bash
cp config.yaml.example config.yaml
cp .env.template .env
```

`.env` 示例:

```env
OPENAI_API_KEY=...
ANTHROPIC_API_KEY=...
GATEWAY_API_KEY=... # 可选
```

### 2) 启动服务

```bash
cargo run
```

默认地址:`http://localhost:8080`

### 3) 健康检查

```bash
curl http://localhost:8080/health
```

## 配置说明

```yaml
server:
  address: "0.0.0.0"
  port: 8080
  request-timeout-seconds: 930
  cors:
    allow-any-origin: false
    allow-origins: [] # backend-only 默认不允许跨域;前端场景请填具体 origin
  limits:
    max-in-flight-requests: 512
    max-requests-per-second: 200
  metrics:
    require-auth: true # 建议生产开启,保护 /metrics
  resilience:
    provider-max-concurrency: 128
    retry-max-attempts: 3
    circuit-breaker-failure-threshold: 8

providers:
  openai:
    models: ["gpt-4o-mini"]
    base-url: "https://api.openai.com"

  anthropic:
    models: ["claude-3-5-sonnet"]
    base-url: "https://api.anthropic.com"
    version: "2023-06-01"

  my-responses-provider:
    models: ["gpt-4.1-mini"]
    base-url: "https://api.example.com"
    protocol: "responses"
```

### `server` 参数说明

| 参数 | 类型 | 说明 |
|---|---|---|
| `address` | `string` | 服务监听地址,`0.0.0.0` 表示监听所有网卡 |
| `port` | `u16` | 服务监听端口 |
| `request-timeout-seconds` | `u64` | 单次请求超时(秒),超时返回 `504` |
| `cors.allow-any-origin` | `bool` | 是否允许任意来源跨域;`true` 为 `*` |
| `cors.allow-origins` | `string[]` | CORS 来源白名单,仅在 `allow-any-origin: false` 时生效 |
| `limits.max-in-flight-requests` | `usize?` | 全局并发请求上限 |
| `limits.max-requests-per-second` | `u64?` | 全局每秒请求上限(RPS) |
| `metrics.require-auth` | `bool` | 是否要求 `/metrics` 通过网关鉴权 |
| `resilience.provider-max-concurrency` | `usize` | 单 provider 并发隔离上限(bulkhead) |
| `resilience.retry-max-attempts` | `u32` | 传输层瞬时错误最大重试次数 |
| `resilience.circuit-breaker-failure-threshold` | `u32` | 熔断连续失败阈值 |

固定默认(不可配置):

- 重试初始退避:`100ms`
- 重试最大退避:`1000ms`
- 熔断打开时间:`20s`

补充说明:

- `server.metrics.require-auth` 仅在设置 `GATEWAY_API_KEY` 后生效
- `/health` 永远免认证
- `allow-any-origin: false` 且 `allow-origins: []` 时默认不放开跨域(后端场景推荐)

## API 端点

| 端点 | 方法 | 描述 |
|---|---|---|
| `/health` | GET | 健康检查 |
| `/metrics` | GET | 性能指标(可配置鉴权) |
| `/{provider}/v1/chat/completions` | POST | 统一聊天补全入口 |

## 架构图

```mermaid
flowchart LR
    C[Client SDK / HTTP Client] --> G[LLM Gateway<br/>Axum Server]

    subgraph GW[Gateway Core]
        G --> M1[Middlewares<br/>CORS / Auth / RateLimit / ConcurrencyLimit]
        M1 --> R[Router<br/>/health /metrics /:provider/v1/chat/completions]
        R --> D[Dispatcher]
        R --> L[RequestLogger + MetricsCollector]
    end

    D --> P1[ProviderClient: openai]
    D --> P2[ProviderClient: anthropic]
    D --> P3[ProviderClient: custom providers]

    subgraph MAP[Protocol Mapping Layer]
        Q1[RequestMapper<br/>chat/completions -> target protocol]
        Q2[ResponseMapper<br/>target protocol -> chat/completions]
    end

    R --> Q1
    Q1 --> P1
    Q1 --> P2
    Q1 --> P3

    P1 --> U1[Upstream OpenAI-compatible API]
    P2 --> U2[Upstream Anthropic Messages API]
    P3 --> U3[Upstream Responses/Completions API]

    U1 --> Q2
    U2 --> Q2
    U3 --> Q2
    Q2 --> R
    R --> C
```

### 请求路径(流式 / 非流式)

```mermaid
sequenceDiagram
    participant Client
    participant Router
    participant MapperReq as RequestMapper
    participant Provider as ProviderClient
    participant Upstream
    participant MapperResp as ResponseMapper

    Client->>Router: POST /:provider/v1/chat/completions
    Router->>MapperReq: Convert request by provider protocol
    alt stream=true
        Router->>Provider: forward_request_stream()
        Provider->>Upstream: POST (messages/responses/completions)
        Upstream-->>Provider: SSE chunks
        Provider->>MapperResp: Convert chunk to OpenAI chat.completion.chunk
        MapperResp-->>Router: normalized SSE chunk
        Router-->>Client: data: <chunk>\n\n ... data: [DONE]
    else stream=false
        Router->>Provider: forward_request()
        Provider->>Upstream: POST (messages/responses/completions)
        Upstream-->>Provider: JSON response
        Provider->>MapperResp: Convert to OpenAI chat.completion
        MapperResp-->>Router: normalized JSON
        Router-->>Client: 200 JSON
    end
```

## 自定义 Provider

在 `config.yaml` 增加 provider,并在 `.env` 配置 `{PROVIDER}_API_KEY`。provider 名会自动转大写并将 `-` 替换为 `_`。

## 生产建议

- 建议在网关前加反向代理(TLS、WAF、IP 限制)
- 多实例部署时,建议使用分布式限流(Redis / Envoy / Nginx)
- API keys 仅放在环境变量,不要提交到仓库
@@ -0,0 +1,14 @@
#!/usr/bin/env node

// Thin CLI wrapper: resolve the platform binary and forward all arguments.
const { spawnSync } = require('node:child_process')
const { resolveBinary } = require('../lib/resolve-binary')

const binPath = resolveBinary()
const result = spawnSync(binPath, process.argv.slice(2), { stdio: 'inherit' })

if (result.error) {
  console.error(result.error.message)
  process.exit(1)
}

// If the child was killed by a signal, `status` is null; exit non-zero
// instead of silently reporting success.
if (result.signal) {
  process.exit(1)
}

process.exit(result.status ?? 0)
@@ -0,0 +1,20 @@
const path = require('node:path')
const fs = require('node:fs')

// Resolve the prebuilt gateway binary shipped for the current
// platform/arch pair (e.g. darwin-arm64, darwin-x64).
function resolveBinary() {
  const key = `${process.platform}-${process.arch}`
  if (process.platform !== 'darwin') {
    throw new Error(`Unsupported platform: ${key}. This package only supports macOS.`)
  }

  const binaryPath = path.resolve(__dirname, '..', 'npm-bin', key, 'llm-gateway')
  if (!fs.existsSync(binaryPath)) {
    throw new Error(`Missing binary: ${binaryPath}. Run: npm run prepare:binaries`)
  }

  return binaryPath
}

module.exports = {
  resolveBinary,
}
Binary file
Binary file
package/package.json ADDED
@@ -0,0 +1,23 @@
{
  "name": "@simontan/llm-gateway",
  "version": "0.2.1",
  "description": "Prebuilt llm-gateway binary wrapper for macOS",
  "license": "MIT",
  "bin": {
    "llm-gateway": "bin/llm-gateway.js"
  },
  "files": [
    "bin/llm-gateway.js",
    "lib/resolve-binary.js",
    "npm-bin/darwin-arm64/llm-gateway",
    "npm-bin/darwin-x64/llm-gateway",
    "README.md",
    "README.en.md",
    "README.zh.md"
  ],
  "scripts": {
    "prepare:binaries": "node scripts/prepare-platform-packages.js",
    "pack:all": "npm run prepare:binaries && npm pack",
    "prepublishOnly": "npm run prepare:binaries"
  }
}