vertex-proxy 0.2.0__tar.gz

@@ -0,0 +1,29 @@
1
+ # vertex-proxy environment variables
2
+ # Copy to .env and fill in your values.
3
+
4
+ # --- Required ---
5
+ # Path to GCP service-account JSON (must have aiplatform.user role)
6
+ VERTEX_PROXY_CREDENTIALS_PATH=/path/to/gcp-service-account.json
7
+
8
+ # GCP project ID (can be inferred from the service-account key)
9
+ # VERTEX_PROXY_PROJECT_ID=my-gcp-project
10
+
11
+ # --- Optional ---
12
+ # Region for Claude models on Vertex. us-east5 is the primary serving region.
13
+ # VERTEX_PROXY_ANTHROPIC_REGION=us-east5
14
+
15
+ # Region for Gemini models.
16
+ # VERTEX_PROXY_GEMINI_REGION=us-central1
17
+
18
+ # Region for Vertex MaaS partner models (Kimi, GLM, MiniMax, Qwen, Grok, etc.)
19
+ # VERTEX_PROXY_MAAS_REGION=us-central1
20
+
21
+ # Bind host + port
22
+ # VERTEX_PROXY_HOST=127.0.0.1
23
+ # VERTEX_PROXY_PORT=8787
24
+
25
+ # Token refresh interval in seconds (default: 3000 = 50 min; tokens expire at 60 min)
26
+ # VERTEX_PROXY_TOKEN_REFRESH_SECONDS=3000
27
+
28
+ # uvicorn log level: debug, info, warning, error
29
+ # VERTEX_PROXY_LOG_LEVEL=info
@@ -0,0 +1,35 @@
1
+ name: CI
2
+
3
+ on:
4
+ push:
5
+ branches: [main]
6
+ pull_request:
7
+ branches: [main]
8
+
9
+ jobs:
10
+ test:
11
+ runs-on: ubuntu-latest
12
+ strategy:
13
+ matrix:
14
+ python-version: ['3.11', '3.12']
15
+
16
+ steps:
17
+ - uses: actions/checkout@v4
18
+
19
+ - name: Set up Python ${{ matrix.python-version }}
20
+ uses: actions/setup-python@v5
21
+ with:
22
+ python-version: ${{ matrix.python-version }}
23
+
24
+ - name: Install dependencies
25
+ run: |
26
+ python -m pip install --upgrade pip
27
+ pip install -e '.[dev]'
28
+
29
+ - name: Lint (ruff)
30
+ run: |
31
+ ruff check vertex_proxy tests
32
+ ruff format --check vertex_proxy tests
33
+
34
+ - name: Tests (pytest)
35
+ run: pytest -v
@@ -0,0 +1,55 @@
1
+ name: Release
2
+
3
+ # Publishes to PyPI via Trusted Publishing (OIDC). No API token is stored as a
4
+ # secret. Triggered by pushing a version tag, e.g. `git tag v0.2.0 && git push origin v0.2.0`.
5
+ # One-time setup is documented under "Releasing" in CONTRIBUTING.md.
6
+
7
+ on:
8
+ push:
9
+ tags:
10
+ - "v*"
11
+
12
+ permissions:
13
+ contents: read
14
+
15
+ jobs:
16
+ build:
17
+ name: Build distributions
18
+ runs-on: ubuntu-latest
19
+ steps:
20
+ - uses: actions/checkout@v4
21
+
22
+ - uses: actions/setup-python@v5
23
+ with:
24
+ python-version: "3.12"
25
+
26
+ - name: Build sdist + wheel
27
+ run: |
28
+ python -m pip install --upgrade pip build twine
29
+ python -m build
30
+
31
+ - name: Check metadata
32
+ run: twine check dist/*
33
+
34
+ - uses: actions/upload-artifact@v4
35
+ with:
36
+ name: dist
37
+ path: dist/
38
+
39
+ publish:
40
+ name: Publish to PyPI
41
+ needs: build
42
+ runs-on: ubuntu-latest
43
+ environment:
44
+ name: pypi
45
+ url: https://pypi.org/p/vertex-proxy
46
+ permissions:
47
+ id-token: write # required for Trusted Publishing
48
+ steps:
49
+ - uses: actions/download-artifact@v4
50
+ with:
51
+ name: dist
52
+ path: dist/
53
+
54
+ - name: Publish
55
+ uses: pypa/gh-action-pypi-publish@release/v1
@@ -0,0 +1,39 @@
1
+ # Python
2
+ __pycache__/
3
+ *.py[cod]
4
+ *$py.class
5
+ *.egg-info/
6
+ *.egg
7
+ .eggs/
8
+ build/
9
+ dist/
10
+ .pytest_cache/
11
+ .ruff_cache/
12
+ .mypy_cache/
13
+ .coverage
14
+ htmlcov/
15
+
16
+ # Virtual envs
17
+ .venv/
18
+ venv/
19
+ env/
20
+
21
+ # Editors / OS
22
+ .vscode/
23
+ .idea/
24
+ *.swp
25
+ .DS_Store
26
+
27
+ # Local config
28
+ .env
29
+ .env.local
30
+ *.json.local
31
+ credentials/
32
+ secrets/
33
+
34
+ # Rendered launchd plist (we only commit the template)
35
+ launchd/*.plist
36
+ !launchd/*.plist.template
37
+
38
+ # Logs
39
+ *.log
@@ -0,0 +1,38 @@
1
+ # Changelog
2
+
3
+ All notable changes to this project will be documented in this file.
4
+
5
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
6
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
+
8
+ ## [Unreleased]
9
+
10
+ ## [0.2.0] - 2026-06-02
11
+
12
+ ### Added
13
+ - Accept the OpenAI Chat Completions request shape under the `/gemini` and `/openai` prefixes (`/gemini/chat/completions`, `/gemini/v1/chat/completions`, `/openai/chat/completions`) in addition to the bare root, so OpenAI-chat clients work regardless of which `base_url` prefix they are pointed at.
14
+ - Mirror model-discovery endpoints (`/v1/models`, `/models`) under the `/gemini` and `/openai` prefixes so clients that probe for a model catalog before dispatching don't get 404s.
15
+ - Publish to PyPI via a tag-triggered OIDC Trusted Publishing workflow, enabling `pipx install vertex-proxy` (and `uv tool install` / `uvx`).
16
+
17
+ ### Changed
18
+ - Single-source the package version from `vertex_proxy/__init__.py` (hatchling dynamic version), so a release only needs one version bump.
19
+
20
+ ### Fixed
21
+ - Fix `404 {"detail":"Not Found"}` for OpenAI-chat clients (e.g. Hermes) configured with a `/gemini` base URL. The client appends `/chat/completions` to `base_url`, which previously had no handler under the native Gemini route prefix ([#1](https://github.com/prasadus92/vertex-proxy/issues/1)).
22
+ - Prevent long Vertex streaming responses from failing with incomplete chunked reads by removing the fixed upstream read timeout for stream requests.
23
+ - Return structured SSE error events when upstream streaming fails instead of letting the response terminate abruptly.
24
+
25
+ ## [0.1.0] - 2026-04-21
26
+
27
+ Initial release.
28
+
29
+ ### Added
30
+ - Anthropic Messages API-compatible route (`POST /anthropic/v1/messages`) forwarding to Vertex AI's Claude models via `:rawPredict` / `:streamRawPredict`.
31
+ - Gemini generateContent API-compatible route (`POST /gemini/v1beta/models/{model}:{action}`) forwarding to Vertex AI Gemini.
32
+ - OpenAI Chat Completions API-compatible route (`POST /openai/v1/chat/completions`) for Vertex MaaS partner models (Kimi, GLM, MiniMax, Qwen, Grok).
33
+ - Automatic GCP access-token refresh (50-min cadence).
34
+ - Streaming support on Anthropic and Gemini routes.
35
+ - Model alias mapping (e.g., `claude-sonnet-4-5-20250929` → `claude-sonnet-4-5@20250929`).
36
+ - `/health` endpoint for liveness + auth check.
37
+ - `/v1/models` endpoint listing all routable models.
38
+ - launchd plist template + install script for macOS.
@@ -0,0 +1,89 @@
1
+ # Contributing
2
+
3
+ Thanks for your interest. This is a small project maintained in spare time. Contributions are welcome via pull request.
4
+
5
+ ## Dev setup
6
+
7
+ ```bash
8
+ git clone https://github.com/prasadus92/vertex-proxy.git
9
+ cd vertex-proxy
10
+ python -m venv .venv
11
+ .venv/bin/pip install -e '.[dev]'
12
+ ```
13
+
14
+ ## Running tests
15
+
16
+ ```bash
17
+ .venv/bin/pytest
18
+ ```
19
+
20
+ Tests are pure unit / mocked; they don't hit real GCP. To smoke-test against live Vertex AI you need real credentials, so do it manually (see "Running locally against real Vertex" below); CI stays mock-only.
21
+
22
+ ## Running locally against real Vertex
23
+
24
+ ```bash
25
+ export VERTEX_PROXY_CREDENTIALS_PATH=/path/to/gcp-key.json
26
+ export VERTEX_PROXY_PROJECT_ID=your-project
27
+ .venv/bin/vertex-proxy
28
+ ```
29
+
30
+ Then in another terminal:
31
+
32
+ ```bash
33
+ curl http://127.0.0.1:8787/health
34
+ curl -X POST http://127.0.0.1:8787/gemini/v1beta/models/gemini-2.5-flash:generateContent \
35
+ -H "Content-Type: application/json" \
36
+ -d '{"contents":[{"role":"user","parts":[{"text":"hi"}]}]}'
37
+ ```
38
+
39
+ ## Style
40
+
41
+ - Line length: 100 chars
42
+ - Format: `ruff format`
43
+ - Lint: `ruff check`
44
+
45
+ ## Scope
46
+
47
+ This project intentionally does not:
48
+
49
+ - Authenticate incoming requests by default (it's a local-loopback proxy; optional bearer-token auth is available via `VERTEX_PROXY_API_KEY` for remote deploys)
50
+ - Do request transformation beyond what Vertex requires (e.g., Anthropic `model` field → URL path)
51
+ - Cache responses
52
+ - Log request bodies (privacy + credit safety)
53
+
54
+ If you want any of the above, file an issue first so we can discuss design.
55
+
56
+ ## Adding a new model
57
+
58
+ Most additions only require editing `vertex_proxy/config.py`:
59
+
60
+ - Claude model → add to `anthropic_model_aliases`
61
+ - Gemini model → add to `gemini_model_aliases`
62
+ - MaaS partner model (Kimi, GLM, MiniMax, Qwen, Grok, …) → add to `maas_model_aliases`
63
+
64
+ For a genuinely new model family with a different API shape, open an issue first.
65
+
66
+ ## Releasing (maintainer)
67
+
68
+ Releases publish to PyPI via [Trusted Publishing](https://docs.pypi.org/trusted-publishers/) (OIDC), so there is no API token to store as a secret.
69
+
70
+ One-time setup:
71
+
72
+ 1. On PyPI, add a trusted publisher at https://pypi.org/manage/account/publishing/ with:
73
+ - PyPI project name: `vertex-proxy`
74
+ - Owner: `prasadus92`
75
+ - Repository: `vertex-proxy`
76
+ - Workflow name: `release.yml`
77
+ - Environment name: `pypi`
78
+ 2. (Recommended) Create a GitHub environment named `pypi` under repo Settings to gate the publish job.
79
+
80
+ To cut a release:
81
+
82
+ 1. Bump `__version__` in `vertex_proxy/__init__.py`. That is the single source of truth; `pyproject.toml` and the running app's `version` both read from it.
83
+ 2. Move the `[Unreleased]` entries in `CHANGELOG.md` under a new dated version heading.
84
+ 3. Commit, then tag and push:
85
+ ```
86
+ git tag v0.2.0
87
+ git push origin v0.2.0
88
+ ```
89
+ 4. The `Release` workflow builds the sdist + wheel, runs `twine check`, and publishes to PyPI.
@@ -0,0 +1,31 @@
1
+ # syntax=docker/dockerfile:1.6
2
+
3
+ FROM python:3.12-slim AS builder
4
+
5
+ WORKDIR /app
6
+ COPY pyproject.toml README.md ./
7
+ COPY vertex_proxy/ vertex_proxy/
8
+
9
+ RUN pip install --no-cache-dir --prefix=/install .
10
+
11
+ # ---------------------------------------------------------------------------
12
+
13
+ FROM python:3.12-slim
14
+
15
+ RUN useradd --system --uid 1000 --no-create-home --home-dir /app vertex \
16
+ && mkdir -p /app && chown vertex:vertex /app
17
+
18
+ COPY --from=builder /install /usr/local
19
+ WORKDIR /app
20
+ USER vertex
21
+
22
+ ENV VERTEX_PROXY_HOST=0.0.0.0 \
23
+ VERTEX_PROXY_PORT=8787 \
24
+ PYTHONUNBUFFERED=1
25
+
26
+ EXPOSE 8787
27
+
28
+ HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
29
+ CMD python -c "import urllib.request,sys; sys.exit(0 if urllib.request.urlopen('http://127.0.0.1:8787/health', timeout=3).status==200 else 1)"
30
+
31
+ ENTRYPOINT ["vertex-proxy"]
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Prasad Subrahmanya
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,332 @@
1
+ Metadata-Version: 2.4
2
+ Name: vertex-proxy
3
+ Version: 0.2.0
4
+ Summary: Anthropic + Gemini + OpenAI API-compatible proxy for Google Cloud Vertex AI. Bridges static-URL API consumers to Vertex AI's service-account auth.
5
+ Project-URL: Homepage, https://github.com/prasadus92/vertex-proxy
6
+ Project-URL: Author, https://prasad.tech
7
+ Project-URL: Issues, https://github.com/prasadus92/vertex-proxy/issues
8
+ Project-URL: Changelog, https://github.com/prasadus92/vertex-proxy/blob/main/CHANGELOG.md
9
+ Author-email: Prasad Subrahmanya <prasad@luminik.io>
10
+ License: MIT
11
+ License-File: LICENSE
12
+ Keywords: anthropic,api-proxy,claude,fastapi,gcp,gemini,google-cloud,llm,llm-proxy,openai,openai-compatible,proxy,service-account,vertex-ai
13
+ Classifier: Development Status :: 4 - Beta
14
+ Classifier: Intended Audience :: Developers
15
+ Classifier: License :: OSI Approved :: MIT License
16
+ Classifier: Programming Language :: Python :: 3.11
17
+ Classifier: Programming Language :: Python :: 3.12
18
+ Classifier: Topic :: Software Development :: Libraries
19
+ Requires-Python: >=3.11
20
+ Requires-Dist: fastapi>=0.109
21
+ Requires-Dist: google-auth>=2.28
22
+ Requires-Dist: httpx>=0.26
23
+ Requires-Dist: pydantic-settings>=2.2
24
+ Requires-Dist: pydantic>=2.6
25
+ Requires-Dist: requests>=2.31
26
+ Requires-Dist: uvicorn[standard]>=0.27
27
+ Provides-Extra: dev
28
+ Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
29
+ Requires-Dist: pytest>=8.0; extra == 'dev'
30
+ Requires-Dist: ruff>=0.3; extra == 'dev'
31
+ Description-Content-Type: text/markdown
32
+
33
+ # vertex-proxy
34
+
35
+ [![CI](https://github.com/prasadus92/vertex-proxy/actions/workflows/ci.yml/badge.svg)](https://github.com/prasadus92/vertex-proxy/actions/workflows/ci.yml)
36
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
37
+ [![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)
38
+
39
+ A small, local-only proxy that bridges **any tool speaking the Anthropic Messages API, Gemini API, or OpenAI Chat Completions API** to **Google Cloud Vertex AI**, so you can point existing clients at Vertex without changing their code.
40
+
41
+ ## What this is for
42
+
43
+ You have a tool (Claude Code, Hermes Agent, opencode, Cline, Continue.dev, a custom SDK integration, etc.) that already knows how to talk to:
44
+
45
+ - `api.anthropic.com`
46
+ - `generativelanguage.googleapis.com`
47
+ - any OpenAI-compatible endpoint
48
+
49
+ You want that same tool to hit Vertex AI instead, maybe because you want to burn GCP credits, unify billing, or get higher quotas than the public APIs offer.
50
+
51
+ The problem: Vertex uses **short-lived OAuth access tokens** from a service-account key. Most tools expect a static `Authorization: Bearer xxx` header. Nobody wants to rebuild auth in every client.
52
+
53
+ vertex-proxy runs on `127.0.0.1:8787`, handles the auth refresh loop, and translates between the public API shapes and Vertex's publisher-model endpoints.
54
+
55
+ ```
56
+ ┌──────────────┐ Anthropic/Gemini/OpenAI ┌──────────────┐ GCP auth ┌────────────┐
57
+ │ your tool │ ──────────────────────────► │ vertex-proxy │ ──────────► │ Vertex AI │
58
+ └──────────────┘ localhost:8787 └──────────────┘ SA JWT └────────────┘
59
+ ```
60
+
61
+ No client changes. Small, dependency-light Python. MIT licensed.
62
+
63
+ ## Install
64
+
65
+ Python 3.11+, a GCP project with Vertex AI API enabled, and a service-account JSON key with `roles/aiplatform.user`.
66
+
67
+ ```bash
68
+ pipx install vertex-proxy
69
+ # or: uv tool install vertex-proxy
70
+ # or run it without installing: uvx vertex-proxy
71
+ ```
72
+
73
+ ### From source (for development)
74
+
75
+ ```bash
76
+ git clone https://github.com/prasadus92/vertex-proxy.git
77
+ cd vertex-proxy
78
+ python -m venv .venv
79
+ .venv/bin/pip install -e .
80
+ ```
81
+
82
+ ## Run
83
+
84
+ ```bash
85
+ export VERTEX_PROXY_CREDENTIALS_PATH=/path/to/service-account.json
86
+ export VERTEX_PROXY_PROJECT_ID=your-gcp-project
87
+ vertex-proxy
88
+ # → listening on http://127.0.0.1:8787
89
+ ```
90
+
91
+ Or inline:
92
+
93
+ ```bash
94
+ vertex-proxy \
95
+ --credentials ~/.vertex/key.json \
96
+ --project-id my-project \
97
+ --port 8787
98
+ ```
99
+
100
+ (From a source checkout, the command is `.venv/bin/vertex-proxy`.)
101
+
102
+ Verify:
103
+
104
+ ```bash
105
+ curl http://127.0.0.1:8787/health
106
+ # {"status":"ok","project":"my-project"}
107
+
108
+ curl -X POST http://127.0.0.1:8787/gemini/v1beta/models/gemini-2.5-flash:generateContent \
109
+ -H "Content-Type: application/json" \
110
+ -d '{"contents":[{"role":"user","parts":[{"text":"hello"}]}]}'
111
+ ```
112
+
113
+ ## Endpoints
114
+
115
+ | Path | API compat | Vertex backend |
116
+ |---|---|---|
117
+ | `POST /anthropic/v1/messages` | Anthropic Messages API | `publishers/anthropic/models/{model}:rawPredict` |
118
+ | `POST /gemini/v1beta/models/{m}:{action}` | Gemini generateContent API | `publishers/google/models/{m}:{action}` |
119
+ | `POST /openai/v1/chat/completions` | OpenAI Chat Completions | Gemini (via Vertex OpenAI-compat) + MaaS partner models (Kimi, GLM, MiniMax, Qwen, Grok) |
120
+ | `GET /v1/models` | - | Lists routable models |
121
+ | `GET /health` | - | Liveness + auth check |
122
+
123
+ The OpenAI Chat Completions shape is also accepted under the `/gemini` prefix and the bare root, so clients that build their URL from a `base_url` of `.../openai`, `.../gemini`, or the server root all reach the same handler. Model-discovery probes (`/v1/models`, `/models`) are mirrored under those prefixes too.
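The URL-shape behavior described above can be illustrated with a small sketch. It is not the proxy's routing code: `ACCEPTED_CHAT_PATHS`, `chat_url`, and `is_accepted` are hypothetical names, and the path set is simply copied from the list of accepted shapes in the Troubleshooting section. It assumes a client that appends `/chat/completions` to its `base_url`, as the Hermes recipe describes.

```python
from urllib.parse import urlsplit

# Paths this README lists as accepted for OpenAI-compatible chat traffic.
# (Copied from the docs for illustration; not read from the proxy itself.)
ACCEPTED_CHAT_PATHS = {
    "/chat/completions",
    "/v1/chat/completions",
    "/openai/chat/completions",
    "/openai/v1/chat/completions",
    "/gemini/chat/completions",
    "/gemini/v1/chat/completions",
}


def chat_url(base_url: str) -> str:
    """Mimic a client that appends /chat/completions to its base_url."""
    return base_url.rstrip("/") + "/chat/completions"


def is_accepted(base_url: str) -> bool:
    """Return True if the URL built from base_url lands on a listed path."""
    return urlsplit(chat_url(base_url)).path in ACCEPTED_CHAT_PATHS


for base in (
    "http://127.0.0.1:8787",
    "http://127.0.0.1:8787/openai",
    "http://127.0.0.1:8787/gemini/v1",
    "http://127.0.0.1:8787/anthropic",
):
    print(base, "->", chat_url(base), is_accepted(base))
```

The `/anthropic` base is included as a counter-example: it is meant for Anthropic Messages clients, and the README does not list an OpenAI-chat path under it.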
124
+
125
+ Streaming is supported on all routes (Anthropic, Gemini, and the OpenAI-compat route).
126
+ Streaming requests use a no-read-timeout upstream client so long Vertex generations do not get cut off during idle periods.
127
+
128
+ ## Pre-configured models
129
+
130
+ All aliases live in [`vertex_proxy/config.py`](vertex_proxy/config.py); extend as needed.
131
+
132
+ **Anthropic** (on Vertex, `us-east5` by default)
133
+ - `claude-sonnet-4-5-20250929` → `claude-sonnet-4-5@20250929`
134
+ - `claude-opus-4-5-20250929` → `claude-opus-4-5@20250929`
135
+ - `claude-haiku-4-5-20250929` → `claude-haiku-4-5@20250929`
136
+
137
+ **Gemini** (on Vertex, `us-central1` by default)
138
+ - `gemini-2.5-pro`, `gemini-2.5-flash`, `gemini-2.0-flash`
139
+
140
+ **MaaS partner models** (OpenAI-compatible route)
141
+ - `kimi-k2.5`, `kimi-k2` (Moonshot)
142
+ - `glm-5`, `glm-5.1`, `glm-4.6` (Zhipu)
143
+ - `minimax-m2.5`, `minimax-m1` (MiniMax)
144
+ - `qwen3.5`, `qwen-3` (Alibaba)
145
+ - `grok-4.20`, `grok-4.1-fast` (xAI)
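As a rough sketch of what an alias lookup could look like: the table below repeats the three Anthropic aliases listed above, while `ANTHROPIC_MODEL_ALIASES`, `resolve_model`, and the pass-through for unknown names are assumptions made for illustration. The actual structure in `vertex_proxy/config.py` may differ.

```python
# Illustrative alias table. The entries come from the list above; the
# variable name and dict layout are assumptions, not the project's code.
ANTHROPIC_MODEL_ALIASES = {
    "claude-sonnet-4-5-20250929": "claude-sonnet-4-5@20250929",
    "claude-opus-4-5-20250929": "claude-opus-4-5@20250929",
    "claude-haiku-4-5-20250929": "claude-haiku-4-5@20250929",
}


def resolve_model(name: str, aliases: dict[str, str]) -> str:
    """Return the Vertex model ID for a public model name.

    Unknown names are passed through unchanged (an assumption of this
    sketch, so that exact Vertex IDs can still be used directly).
    """
    return aliases.get(name, name)


print(resolve_model("claude-sonnet-4-5-20250929", ANTHROPIC_MODEL_ALIASES))
print(resolve_model("claude-sonnet-4-5@20250929", ANTHROPIC_MODEL_ALIASES))
```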
146
+
147
+ ## Recipes
148
+
149
+ ### Claude Code CLI
150
+
151
+ Point Claude Code at the proxy via `ANTHROPIC_BASE_URL`:
152
+
153
+ ```bash
154
+ export ANTHROPIC_BASE_URL=http://127.0.0.1:8787/anthropic
155
+ export ANTHROPIC_AUTH_TOKEN=bypass # proxy ignores this; Vertex auth is server-side
156
+ claude
157
+ ```
158
+
159
+ Your local Claude Code session now bills against your GCP project instead of api.anthropic.com.
160
+
161
+ ### Hermes Agent
162
+
163
+ Add to `~/.hermes/config.yaml`:
164
+
165
+ ```yaml
166
+ custom_providers:
167
+ - name: vertex-gemini
168
+ # Hermes's openai_chat transport appends /chat/completions (and probes
169
+ # /v1/models) onto base_url. Gemini is served through Vertex's OpenAI-compat
170
+ # layer, so any of these bases work: .../openai, .../gemini, or the bare root.
171
+ base_url: http://127.0.0.1:8787/openai
172
+ transport: openai_chat
173
+
174
+ - name: vertex-anthropic
175
+ base_url: http://127.0.0.1:8787/anthropic
176
+ transport: anthropic_messages
177
+
178
+ fallback_model:
179
+ provider: vertex-gemini
180
+ model: gemini-2.5-pro
181
+ ```
182
+
183
+ No Hermes source changes are required: the configuration uses the existing `custom_providers` mechanism. The `openai_chat` transport routes through the proxy's OpenAI-compat handler, which dispatches Gemini models to Vertex's OpenAI-compatible endpoint based on the request body's `model`.
184
+
185
+ ### opencode / Cline / any Anthropic-SDK client
186
+
187
+ Set the base URL environment variable the client supports (usually one of `ANTHROPIC_BASE_URL`, `ANTHROPIC_API_URL`, or the equivalent in your client's config):
188
+
189
+ ```bash
190
+ export ANTHROPIC_BASE_URL=http://127.0.0.1:8787/anthropic
191
+ ```
192
+
193
+ ## Run as a service (macOS launchd)
194
+
195
+ ```bash
196
+ cd launchd
197
+ ./install.sh --credentials /path/to/key.json --project my-gcp-project
198
+ ```
199
+
200
+ This renders the plist template, copies it to `~/Library/LaunchAgents/`, loads it, and does a health check. Logs go to `~/Library/Logs/vertex-proxy.{log,err}`.
201
+
202
+ Stop:
203
+ ```bash
204
+ launchctl unload ~/Library/LaunchAgents/ai.hermes.vertex-proxy.plist
205
+ ```
206
+
207
+ For Linux, the same pattern works with systemd; see [`examples/systemd.service`](examples/systemd.service).
208
+
209
+ ## Configuration reference
210
+
211
+ Every setting can be supplied as a `VERTEX_PROXY_`-prefixed environment variable or as a CLI flag.
212
+
213
+ | Env var | Default | Purpose |
214
+ |---|---|---|
215
+ | `VERTEX_PROXY_CREDENTIALS_PATH` | - | Service-account JSON path (falls back to ADC) |
216
+ | `VERTEX_PROXY_PROJECT_ID` | inferred from key | GCP project ID |
217
+ | `VERTEX_PROXY_ANTHROPIC_REGION` | `us-east5` | Region for Claude |
218
+ | `VERTEX_PROXY_GEMINI_REGION` | `us-central1` | Region for Gemini |
219
+ | `VERTEX_PROXY_MAAS_REGION` | `us-central1` | Region for Kimi / GLM / MiniMax / Qwen / Grok |
220
+ | `VERTEX_PROXY_HOST` | `127.0.0.1` | Bind host |
221
+ | `VERTEX_PROXY_PORT` | `8787` | Bind port |
222
+ | `VERTEX_PROXY_TOKEN_REFRESH_SECONDS` | `3000` | Token refresh interval (50 min) |
223
+ | `VERTEX_PROXY_LOG_LEVEL` | `info` | uvicorn log level |
224
+
225
+ ## A word on GCP credits
226
+
227
+ **GCP promotional credits (startup, free trial, partner) typically do NOT cover Google Cloud Marketplace purchases.** On Vertex AI, this matters because:
228
+
229
+ - **First-party Google models** (Gemini 2.5 Pro / Flash, Gemma) are billed as "Vertex AI API" usage → **credits cover ✅**
230
+ - **Partner models** (Claude, Kimi, GLM, MiniMax, Grok) are typically billed via GCP Marketplace → **credits usually don't cover ❌**
231
+
232
+ The "Promotional credits" section of your model's agreement page in Google Cloud Console will tell you explicitly. Quote from a typical Claude-on-Vertex agreement:
233
+
234
+ > *Most Google Cloud promotional credits don't apply to Google Cloud Marketplace purchases.*
235
+
236
+ If credit-burn is your goal, point vertex-proxy at Gemini. If billing unification is your goal, vertex-proxy works for everything.
237
+
238
+ ## Security
239
+
240
+ vertex-proxy binds to `127.0.0.1` by default and **does not authenticate incoming requests unless you configure it to**. It's designed as a local-loopback shim; anyone who can reach an unauthenticated instance can spend your GCP credits via your service account.
241
+
242
+ Do not expose it to a public interface without protection. If you need remote access, enable the optional bearer-token auth (`VERTEX_PROXY_API_KEY`, see CONTRIBUTING.md) or put it behind a reverse proxy with proper auth (nginx + basic auth, Tailscale, Cloud Run with IAP, etc.).
243
+
244
+ ## Status
245
+
246
+ - [x] Anthropic Messages API → Vertex Claude (with streaming)
247
+ - [x] Gemini generateContent API → Vertex Gemini (with streaming)
248
+ - [x] OpenAI Chat Completions → Vertex Gemini via Vertex's OpenAI-compat layer
249
+ - [x] OpenAI Chat Completions → Vertex MaaS partner models (Kimi, GLM, MiniMax, Qwen, Grok)
250
+ - [x] Multiple URL shapes accepted for OpenAI client compatibility: chat completions under the `/openai`, `/gemini`, and bare-root prefixes (e.g. `/openai/v1/chat/completions`, `/gemini/chat/completions`, `/chat/completions`), plus model-discovery (`/v1/models`, `/models`) mirrored under the same prefixes
251
+ - [x] Automatic GCP service-account token refresh
252
+ - [x] launchd (macOS) + systemd (Linux) service recipes
253
+ - [x] Dockerfile + docker-compose for containerized deploy
254
+ - [x] Optional bearer-token auth on the proxy itself (for remote deploys)
255
+ - [x] Prometheus metrics endpoint at `/metrics`
256
+ - [x] `pipx` / `uv` / `uvx` install via PyPI (tag-triggered OIDC Trusted Publishing release workflow)
257
+ - [x] 22 unit tests, GitHub Actions CI on Python 3.11 + 3.12
258
+
259
+ ### Tested with
260
+ - [x] Hermes Agent: verified end-to-end with live Gemini 2.5 Flash dispatch
261
+ - [x] Claude Code CLI: via `ANTHROPIC_BASE_URL` env
262
+ - [x] Direct `curl` against all routes
263
+
264
+ ## Troubleshooting
265
+
266
+ ### Client reports incomplete chunked read during streaming
267
+
268
+ This usually means the upstream Vertex stream was interrupted. Current streaming routes keep the upstream read open without a fixed read timeout and return a structured SSE error if Vertex still fails mid-stream, so clients should receive a clean error event instead of a broken HTTP chunk.
269
+
270
+ ### 404 "model not found" on Claude routes
271
+
272
+ Most Vertex AI Claude model endpoints require one-time enablement in Model Garden. Go to https://console.cloud.google.com/vertex-ai/publishers/anthropic/model-garden and click ENABLE on the specific model (Sonnet, Opus, Haiku). Accept the Marketplace T&Cs. Your service account can then call them.
273
+
274
+ Note: GCP promotional credits typically don't cover Marketplace models. See "A word on GCP credits" above.
275
+
276
+ ### 404 "model not found" on MaaS routes (Kimi, GLM, MiniMax, Qwen, Grok)
277
+
278
+ Same as Claude: Vertex partner models require Model Garden enablement per model. Additionally, the MaaS path in `config.py` is a best-effort guess at Vertex's URL shape for these partners. If you hit 404s after enablement, check the "How to use" tab on the model's page in Model Garden and update the `maas_model_aliases` entry with the exact path fragment Google shows.
279
+
280
+ ### 401 / 403 on all routes
281
+
282
+ The most likely cause is that your service account lacks `roles/aiplatform.user`. Grant it:
283
+ ```bash
284
+ gcloud projects add-iam-policy-binding YOUR_PROJECT \
285
+ --member="serviceAccount:YOUR_SA@YOUR_PROJECT.iam.gserviceaccount.com" \
286
+ --role="roles/aiplatform.user"
287
+ ```
288
+
289
+ ### Gemini 2.5 returns empty content with `reasoning_tokens` populated
290
+
291
+ Gemini 2.5 models use an internal "thinking" budget that counts against `max_tokens`. If `max_tokens` is too low, the model may use all its budget on thinking and return no visible output. Raise `max_tokens` to at least 100 for anything beyond trivial replies.
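For example, an OpenAI-shaped request with a more generous budget might be built as follows. This is a sketch under assumptions: the value `1024` is only an illustrative choice, `build_request` is a hypothetical helper, and the URL assumes the proxy is running locally on the default port.

```python
import json
import urllib.request

# Assumes the proxy is running locally on its default host and port.
PROXY_URL = "http://127.0.0.1:8787/openai/v1/chat/completions"


def build_request(prompt: str, max_tokens: int = 1024) -> urllib.request.Request:
    """Build (but do not send) a chat completion request for Gemini.

    max_tokens defaults to 1024 here purely as an illustrative value that
    leaves room for both the thinking budget and the visible reply.
    """
    payload = {
        "model": "gemini-2.5-flash",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        PROXY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request("Summarize the plot of Hamlet in two sentences.")
print(request.full_url, json.loads(request.data)["max_tokens"])

# With the proxy running, the request could be sent with:
#   with urllib.request.urlopen(request) as response:
#       print(json.load(response))
```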
292
+
293
+ ### Hermes (or any OpenAI-chat client) returns `404 {'detail': 'Not Found'}` for Gemini
294
+
295
+ This happens when an OpenAI-chat client is pointed at the `/gemini` base. That client builds its request URL by appending `/chat/completions` (so it actually calls `/gemini/chat/completions`), but `/gemini` is the *native* `generateContent` route, which has no `chat/completions` handler. `curl` against `/gemini/v1beta/models/...:generateContent` works because that's the native shape; the OpenAI-chat client uses a different shape.
296
+
297
+ The proxy now accepts the OpenAI-chat shape under the `/gemini` and `/openai` prefixes as well as the bare root, so `transport: openai_chat` works against any of these bases. If you're on an older build, point the provider's `base_url` at `http://127.0.0.1:8787/openai` (or the bare `http://127.0.0.1:8787`) instead of `.../gemini`. Gemini still routes correctly because the OpenAI-compat handler dispatches by the request body's `model`.
298
+
299
+ ### Request works with `curl` but fails from my OpenAI client
300
+
301
+ Your client is probably sending requests to a URL shape the shim didn't expect. The shim accepts `/chat/completions`, `/v1/chat/completions`, `/openai/v1/chat/completions`, `/openai/chat/completions`, `/gemini/v1/chat/completions`, and `/gemini/chat/completions` for OpenAI-compatible traffic, and mirrors model discovery (`/v1/models`, `/models`) under the same prefixes. If your client sends something else, file an issue with the exact URL shape and we'll add it.
302
+
303
+ ### Token refresh errors in logs
304
+
305
+ The background refresh task logs errors but doesn't crash the process. If you see repeated refresh failures, check:
306
+ 1. Service account JSON path is correct (`VERTEX_PROXY_CREDENTIALS_PATH`)
307
+ 2. Machine clock is in sync (GCP JWT exchange is clock-sensitive)
308
+ 3. Service account isn't disabled or rotated in GCP IAM
309
+
310
+ ## Comparison with alternatives
311
+
312
+ | Tool | What it does | Fit |
313
+ |---|---|---|
314
+ | **vertex-proxy** (this) | Bridge existing Anthropic/Gemini/OpenAI clients to Vertex AI with auto-auth | You already use a tool with configurable base URL and want to point it at Vertex without rewriting auth |
315
+ | **LiteLLM** | Full-featured multi-provider router with caching, budgets, observability | Managing many providers centrally with policies; heavier dependency |
316
+ | **openai-compat-server** (various) | OpenAI shape over arbitrary backend | Similar to one route of vertex-proxy; doesn't handle GCP SA auth natively |
317
+ | **Vertex AI Python SDK** | Direct first-party Google SDK | You're writing new code and want to talk Vertex directly |
318
+ | **Anthropic Python SDK with Vertex backend** | First-party SDK with Vertex mode flag | You're writing new Anthropic code and control the client |
319
+
320
+ Use vertex-proxy when you have an **existing** tool you can't modify and need to redirect its traffic to Vertex.
321
+
322
+ ## Contributing
323
+
324
+ See [CONTRIBUTING.md](CONTRIBUTING.md). PRs welcome.
325
+
326
+ ## License
327
+
328
+ MIT. See [LICENSE](LICENSE).
329
+
330
+ ## Credits
331
+
332
+ Built by Prasad Subrahmanya ([prasad.tech](https://prasad.tech) · [@prasadus92](https://github.com/prasadus92)) as part of solving the "Hermes fallback model" problem for [Luminik](https://luminik.io), then extracted into a standalone tool because the shim turned out to be useful beyond Hermes.