argo-shim 0.1.0__tar.gz

@@ -0,0 +1,6 @@
1
+ __pycache__/
2
+ *.egg-info/
3
+ dist/
4
+ build/
5
+ CLAUDE.md
6
+ test_stream_500.sh
@@ -0,0 +1,217 @@
1
+ Metadata-Version: 2.4
2
+ Name: argo-shim
3
+ Version: 0.1.0
4
+ Summary: HTTP proxy shim for Argo API via SSH tunnel
5
+ Project-URL: Homepage, https://github.com/n-getty/argo-shim
6
+ Project-URL: Repository, https://github.com/n-getty/argo-shim
7
+ Project-URL: Issues, https://github.com/n-getty/argo-shim/issues
8
+ Keywords: anthropic,argo,claude,proxy,ssh-tunnel
9
+ Classifier: Development Status :: 4 - Beta
10
+ Classifier: Environment :: Console
11
+ Classifier: Intended Audience :: Developers
12
+ Classifier: Intended Audience :: Science/Research
13
+ Classifier: Operating System :: POSIX
14
+ Classifier: Programming Language :: Python :: 3
15
+ Classifier: Topic :: Internet :: Proxy Servers
16
+ Requires-Python: >=3.8
17
+ Description-Content-Type: text/markdown
18
+
19
+ # argo-shim
20
+
21
+ A lightweight HTTP proxy that lets Claude Code talk to the Argo API through an SSH tunnel from an ALCF machine. It handles path rewriting (`/v1/messages` -> `/argoapi/v1/messages`), injects your API key, and bridges plain HTTP (what Claude Code speaks) to HTTPS (what the tunnel carries).
22
+
23
+ ## Installation
24
+
25
+ ```bash
26
+ # Run directly (no install needed):
27
+ uvx argo-shim
28
+
29
+ # Or install globally:
30
+ pip install argo-shim
31
+ # then run:
32
+ argo-shim
33
+ ```
34
+
35
+ ## Prerequisites
36
+
37
+ - SSH access to CELS machines ([setup guide](https://help.cels.anl.gov/docs/linux/ssh/))
38
+ - Python 3.8+
39
+ - Claude Code (`curl -fsSL https://claude.ai/install.sh | bash`)
40
+
41
+ ## Quick Start
42
+
43
+ **1. Run the shim**
44
+
45
+ ```bash
46
+ argo-shim
47
+ ```
48
+
49
+ The shim will:
50
+ - Find or create an SSH tunnel to `apps.inside.anl.gov:443`
51
+ - Start a local HTTP proxy on a port derived from your username (deterministic across restarts)
52
+ - Generate a per-session auth token and update `~/.claude/settings.json` with the correct `ANTHROPIC_BASE_URL` and `apiKeyHelper`
53
+ - Run health checks to verify connectivity
54
+
55
+ To use a specific port instead of the auto-derived one:
56
+
57
+ ```bash
58
+ argo-shim --port 8083
59
+ ```
60
+
61
+ The tunnel will use the port immediately below (e.g., `--port 8083` → tunnel on 8082, shim on 8083).
62
+
63
+ > If your ALCF username differs from your CELS username, set the `CELS_USERNAME` environment variable to your CELS username before starting the shim.
64
+
65
+ **2. Start Claude Code** (in another terminal on the same node)
66
+
67
+ ```bash
68
+ claude
69
+ ```
70
+
71
+ ## Running from Compute Nodes
72
+
73
+ Compute nodes don't have outbound network access, so they can't create SSH tunnels directly. Instead, create the tunnel on a UAN (user access node, i.e. a login node) and point the shim at it.
74
+
75
+ **1. On a UAN:**
76
+
77
+ ```bash
78
+ argo-shim --tunnel
79
+ ```
80
+
81
+ This creates an SSH tunnel bound to all interfaces (`0.0.0.0`), prints the command to run on the compute node, and exits.
82
+
83
+ **2. On the compute node:**
84
+
85
+ ```bash
86
+ argo-shim --tunnel-host <uan-hostname>
87
+ ```
88
+
89
+ Then start Claude Code:
90
+
91
+ ```bash
92
+ claude
93
+ ```
94
+
95
+ The shim sets `HTTP_PROXY`, `HTTPS_PROXY`, `http_proxy`, and `https_proxy` to empty strings in the `env` block of `~/.claude/settings.json` (unless you pass `--no-update-settings`), so no manual unsetting is needed.
96
+
97
+ ### Fallback: Relay through your Mac
98
+
99
+ If your UAN cannot SSH to CELS (e.g., network restrictions on Aurora), you can relay the tunnel through your Mac instead. This requires keeping your Mac connected for the duration of the session.
100
+
101
+ **1. On your Mac:**
102
+
103
+ ```bash
104
+ argo-shim --relay <uan-hostname>
105
+ ```
106
+
107
+ This creates the SSH tunnel locally, reverse-forwards it to the UAN, and starts the local shim (so your Mac can also use Claude Code).
108
+
109
+ **2. On the compute node:**
110
+
111
+ ```bash
112
+ argo-shim --tunnel-host <uan-hostname>
113
+ ```
114
+
115
+ If the UAN has `GatewayPorts` disabled (the default), the shim will automatically create an SSH local forward from the compute node to the UAN's localhost port.
116
+
117
+ > **Note:** The relay approach adds an extra network hop (compute node -> UAN -> Mac -> CELS -> API) and depends on your Mac staying connected. Prefer `--tunnel` on the UAN when SSH to CELS is available.
118
+
119
+ ## Claude Code Settings
120
+
121
+ The shim automatically creates `~/.claude/settings.json` on first run and keeps the port in `ANTHROPIC_BASE_URL` correct on subsequent runs. No manual setup needed.
122
+
123
+ To use a specific model (e.g., Opus), add a `"model"` field to your settings:
124
+
125
+ ```json
126
+ {
127
+ "model": "claudeopus46"
128
+ }
129
+ ```
130
+
131
+ Without this, Claude Code defaults to Sonnet.
132
+
133
+ ## Health Checks
134
+
135
+ The shim runs these automatically on startup. To run them manually:
136
+
137
+ ```bash
138
+ # Tunnel only (direct HTTPS through tunnel; use your tunnel port from startup logs)
139
+ curl -k -H "Host: apps.inside.anl.gov" \
140
+ -H "x-api-key: <username>" \
141
+ https://127.0.0.1:<tunnel-port>/argoapi/v1/models
142
+
143
+ # Tunnel + shim end-to-end (use your shim port; token is printed at startup)
144
+ curl -H "x-api-key: <auth-token>" http://127.0.0.1:<shim-port>/v1/models
145
+ ```
146
+
147
+ ## Troubleshooting
148
+
149
+ **`[SSL: WRONG_VERSION_NUMBER]` proxy errors**
150
+
151
+ The SSH tunnel is stale, usually caused by SSH ControlMaster keeping a dead connection open. Fix:
152
+
153
+ ```bash
154
+ ssh -O exit homes.cels.anl.gov
155
+ argo-shim # re-creates the tunnel
156
+ ```
157
+
158
+ **`HEAD /argoapi HTTP/1.1 501`**
159
+
160
+ You're running an older version of the shim that didn't handle HEAD requests. Update to the latest version.
161
+
162
+ **Port already in use**
163
+
164
+ The shim derives a deterministic port from your username. If that port is taken, specify a different one:
165
+
166
+ ```bash
167
+ argo-shim --port 8083
168
+ ```
169
+
170
+ To find and kill stale SSH tunnels occupying ports:
171
+
172
+ ```bash
173
+ # List SSH tunnels
174
+ ps aux | grep 'ssh -N'
175
+ # Kill a specific one
176
+ kill <pid>
177
+ ```
178
+
179
+ **Claude Code can't connect / API connection refused**
180
+
181
+ A few things to check:
182
+ - **Restart Claude Code** after restarting the shim — Claude Code only reads `~/.claude/settings.json` at startup, so it won't pick up a new port or token until restarted.
183
+ - **Try a different port** — in rare cases the derived port may not work on your node. Use `--port <PORT>` to specify an alternative (e.g., `argo-shim --port 8083`).
184
+
185
+ **401 errors / auth failures with project-level Claude settings**
186
+
187
+ The shim writes `apiKeyHelper` and `ANTHROPIC_BASE_URL` to `~/.claude/settings.json` (global). If you have a project-level `.claude/settings.json` with its own `env` object, it **overrides** the global `env` entirely (Claude Code does not merge object/scalar settings across scopes — only arrays merge). This means the shim's auth token never reaches Claude Code.
188
+
189
+ Fix: run the shim with `--no-auth` to disable token authentication:
190
+
191
+ ```bash
192
+ argo-shim --no-auth
193
+ ```
194
+
195
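+ The shim only listens on `127.0.0.1`, so even with `--no-auth` it is not reachable from other hosts. On a shared login node, however, other users on the same machine can still connect to a loopback port, so prefer token authentication there. You will still need `ANTHROPIC_BASE_URL` set correctly, either in your global settings (where the shim writes it) or in your project settings.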
196
+
197
+ **`500: Streaming is required for operations that may take longer than 10 minutes`**
198
+
199
+ This error comes from Google Vertex AI (the backend hosting Claude behind Argo), not from the shim or Argo Gateway. It occurs when a non-streaming request (`stream: false` or omitted) has a large payload — typically when Claude Code sends tool results (file reads, web searches) back to the model.
200
+
201
+ The shim works around this by forcing `stream: true` on all POST requests to `/messages` before forwarding upstream. If you see this error, make sure you're running the latest version of the shim.
202
+
203
+ **"ERROR: The requested URL could not be retrieved" in Claude Code**
204
+
205
+ HPC login nodes often set `HTTP_PROXY` / `HTTPS_PROXY` environment variables that route traffic through an institutional proxy, bypassing the shim's localhost proxy entirely. Clear them when launching Claude Code:
206
+
207
+ ```bash
208
+ HTTP_PROXY= HTTPS_PROXY= http_proxy= https_proxy= claude
209
+ ```
210
+
211
+ To make this permanent, unset the proxy vars in your shell config (e.g., `~/.bashrc`):
212
+
213
+ ```bash
214
+ unset HTTP_PROXY HTTPS_PROXY http_proxy https_proxy
215
+ ```
216
+
217
+ See the [ALCF proxy docs](https://docs.alcf.anl.gov/aurora/getting-started-on-aurora/#proxy) for more details on proxy settings on Aurora login nodes.
@@ -0,0 +1,199 @@
1
+ # argo-shim
2
+
3
+ A lightweight HTTP proxy that lets Claude Code talk to the Argo API through an SSH tunnel from an ALCF machine. It handles path rewriting (`/v1/messages` -> `/argoapi/v1/messages`), injects your API key, and bridges plain HTTP (what Claude Code speaks) to HTTPS (what the tunnel carries).
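The path rewriting mentioned above can be sketched in a few lines. This mirrors the rule visible in the package source (prefix with `/argoapi` unless the path already starts with it); the function name `rewrite_path` is illustrative and not part of the package API.

```python
def rewrite_path(path: str, prefix: str = "/argoapi") -> str:
    """Prefix a request path with /argoapi unless it already carries the prefix."""
    if path.startswith(prefix):
        return path
    # Join prefix and path, then collapse any accidental double slash.
    return (prefix + "/" + path.lstrip("/")).replace("//", "/")


print(rewrite_path("/v1/messages"))          # /argoapi/v1/messages
print(rewrite_path("/argoapi/v1/messages"))  # /argoapi/v1/messages
```

Paths that already begin with `/argoapi` pass through unchanged, which matters because the shim writes an `ANTHROPIC_BASE_URL` that ends in `/argoapi`.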
4
+
5
+ ## Installation
6
+
7
+ ```bash
8
+ # Run directly (no install needed):
9
+ uvx argo-shim
10
+
11
+ # Or install globally:
12
+ pip install argo-shim
13
+ # then run:
14
+ argo-shim
15
+ ```
16
+
17
+ ## Prerequisites
18
+
19
+ - SSH access to CELS machines ([setup guide](https://help.cels.anl.gov/docs/linux/ssh/))
20
+ - Python 3.8+
21
+ - Claude Code (`curl -fsSL https://claude.ai/install.sh | bash`)
22
+
23
+ ## Quick Start
24
+
25
+ **1. Run the shim**
26
+
27
+ ```bash
28
+ argo-shim
29
+ ```
30
+
31
+ The shim will:
32
+ - Find or create an SSH tunnel to `apps.inside.anl.gov:443`
33
+ - Start a local HTTP proxy on a port derived from your username (deterministic across restarts)
34
+ - Generate a per-session auth token and update `~/.claude/settings.json` with the correct `ANTHROPIC_BASE_URL` and `apiKeyHelper`
35
+ - Run health checks to verify connectivity
36
+
37
+ To use a specific port instead of the auto-derived one:
38
+
39
+ ```bash
40
+ argo-shim --port 8083
41
+ ```
42
+
43
+ The tunnel will use the port immediately below (e.g., `--port 8083` → tunnel on 8082, shim on 8083).
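As a rough sketch of how the two ports relate: `default_port` below follows the hash-based derivation in the package source, while `port_pair` is a hypothetical helper that assumes the "tunnel is one port below the shim" rule also applies to the auto-derived port.

```python
import hashlib


def default_port(username: str) -> int:
    """Derive a deterministic listen port in the range 10000-32767 from a username."""
    digest = hashlib.sha256(username.encode()).hexdigest()
    return 10000 + (int(digest[:8], 16) % 22768)


def port_pair(username: str, port=None):
    """Return (tunnel_port, shim_port); hypothetical helper, assumes tunnel = shim - 1."""
    shim_port = port if port is not None else default_port(username)
    return shim_port - 1, shim_port


print(port_pair("alice", port=8083))  # (8082, 8083)
```

Because the port depends only on the username, restarting the shim normally lands on the same port.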
44
+
45
+ > If your ALCF username differs from your CELS username, set the `CELS_USERNAME` environment variable to your CELS username before starting the shim.
46
+
47
+ **2. Start Claude Code** (in another terminal on the same node)
48
+
49
+ ```bash
50
+ claude
51
+ ```
52
+
53
+ ## Running from Compute Nodes
54
+
55
+ Compute nodes don't have outbound network access, so they can't create SSH tunnels directly. Instead, create the tunnel on a UAN (user access node, i.e. a login node) and point the shim at it.
56
+
57
+ **1. On a UAN:**
58
+
59
+ ```bash
60
+ argo-shim --tunnel
61
+ ```
62
+
63
+ This creates an SSH tunnel bound to all interfaces (`0.0.0.0`), prints the command to run on the compute node, and exits.
64
+
65
+ **2. On the compute node:**
66
+
67
+ ```bash
68
+ argo-shim --tunnel-host <uan-hostname>
69
+ ```
70
+
71
+ Then start Claude Code:
72
+
73
+ ```bash
74
+ claude
75
+ ```
76
+
77
+ The shim sets `HTTP_PROXY`, `HTTPS_PROXY`, `http_proxy`, and `https_proxy` to empty strings in the `env` block of `~/.claude/settings.json` (unless you pass `--no-update-settings`), so no manual unsetting is needed.
78
+
79
+ ### Fallback: Relay through your Mac
80
+
81
+ If your UAN cannot SSH to CELS (e.g., network restrictions on Aurora), you can relay the tunnel through your Mac instead. This requires keeping your Mac connected for the duration of the session.
82
+
83
+ **1. On your Mac:**
84
+
85
+ ```bash
86
+ argo-shim --relay <uan-hostname>
87
+ ```
88
+
89
+ This creates the SSH tunnel locally, reverse-forwards it to the UAN, and starts the local shim (so your Mac can also use Claude Code).
90
+
91
+ **2. On the compute node:**
92
+
93
+ ```bash
94
+ argo-shim --tunnel-host <uan-hostname>
95
+ ```
96
+
97
+ If the UAN has `GatewayPorts` disabled (the default), the shim will automatically create an SSH local forward from the compute node to the UAN's localhost port.
98
+
99
+ > **Note:** The relay approach adds an extra network hop (compute node -> UAN -> Mac -> CELS -> API) and depends on your Mac staying connected. Prefer `--tunnel` on the UAN when SSH to CELS is available.
100
+
101
+ ## Claude Code Settings
102
+
103
+ The shim automatically creates `~/.claude/settings.json` on first run and keeps the port in `ANTHROPIC_BASE_URL` correct on subsequent runs. No manual setup needed.
104
+
105
+ To use a specific model (e.g., Opus), add a `"model"` field to your settings:
106
+
107
+ ```json
108
+ {
109
+ "model": "claudeopus46"
110
+ }
111
+ ```
112
+
113
+ Without this, Claude Code defaults to Sonnet.
114
+
115
+ ## Health Checks
116
+
117
+ The shim runs these automatically on startup. To run them manually:
118
+
119
+ ```bash
120
+ # Tunnel only (direct HTTPS through tunnel; use your tunnel port from startup logs)
121
+ curl -k -H "Host: apps.inside.anl.gov" \
122
+ -H "x-api-key: <username>" \
123
+ https://127.0.0.1:<tunnel-port>/argoapi/v1/models
124
+
125
+ # Tunnel + shim end-to-end (use your shim port; token is printed at startup)
126
+ curl -H "x-api-key: <auth-token>" http://127.0.0.1:<shim-port>/v1/models
127
+ ```
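For scripting, the end-to-end check above can be approximated in Python with the standard library. This is a minimal sketch, not the shim's own health check; `shim_models_status` is a hypothetical helper, and the port and token are whatever the shim printed at startup.

```python
import http.client


def shim_models_status(port: int, token: str = "", host: str = "127.0.0.1",
                       timeout: float = 10.0) -> int:
    """GET /v1/models through the shim and return the HTTP status code."""
    headers = {"x-api-key": token} if token else {}
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/v1/models", headers=headers)
        resp = conn.getresponse()
        resp.read()  # drain the body so the connection closes cleanly
        return resp.status
    finally:
        conn.close()
```

A call such as `shim_models_status(8083, "<auth-token>")` returning 401 would point at a token mismatch, while a 5xx status suggests an upstream or tunnel problem.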
128
+
129
+ ## Troubleshooting
130
+
131
+ **`[SSL: WRONG_VERSION_NUMBER]` proxy errors**
132
+
133
+ The SSH tunnel is stale, usually caused by SSH ControlMaster keeping a dead connection open. Fix:
134
+
135
+ ```bash
136
+ ssh -O exit homes.cels.anl.gov
137
+ argo-shim # re-creates the tunnel
138
+ ```
139
+
140
+ **`HEAD /argoapi HTTP/1.1 501`**
141
+
142
+ You're running an older version of the shim that didn't handle HEAD requests. Update to the latest version.
143
+
144
+ **Port already in use**
145
+
146
+ The shim derives a deterministic port from your username. If that port is taken, specify a different one:
147
+
148
+ ```bash
149
+ argo-shim --port 8083
150
+ ```
151
+
152
+ To find and kill stale SSH tunnels occupying ports:
153
+
154
+ ```bash
155
+ # List SSH tunnels
156
+ ps aux | grep 'ssh -N'
157
+ # Kill a specific one
158
+ kill <pid>
159
+ ```
160
+
161
+ **Claude Code can't connect / API connection refused**
162
+
163
+ A few things to check:
164
+ - **Restart Claude Code** after restarting the shim — Claude Code only reads `~/.claude/settings.json` at startup, so it won't pick up a new port or token until restarted.
165
+ - **Try a different port** — in rare cases the derived port may not work on your node. Use `--port <PORT>` to specify an alternative (e.g., `argo-shim --port 8083`).
166
+
167
+ **401 errors / auth failures with project-level Claude settings**
168
+
169
+ The shim writes `apiKeyHelper` and `ANTHROPIC_BASE_URL` to `~/.claude/settings.json` (global). If you have a project-level `.claude/settings.json` with its own `env` object, it **overrides** the global `env` entirely (Claude Code does not merge object/scalar settings across scopes — only arrays merge). This means the shim's auth token never reaches Claude Code.
170
+
171
+ Fix: run the shim with `--no-auth` to disable token authentication:
172
+
173
+ ```bash
174
+ argo-shim --no-auth
175
+ ```
176
+
177
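+ The shim only listens on `127.0.0.1`, so even with `--no-auth` it is not reachable from other hosts. On a shared login node, however, other users on the same machine can still connect to a loopback port, so prefer token authentication there. You will still need `ANTHROPIC_BASE_URL` set correctly, either in your global settings (where the shim writes it) or in your project settings.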
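To check whether a project is affected, one option is to look for an `env` object in its project-level settings file. The sketch below relies on the override behaviour described in this section; `project_env_shadows_global` is a hypothetical helper, not something the shim provides.

```python
import json
from pathlib import Path


def project_env_shadows_global(project_dir: str) -> bool:
    """Return True if <project_dir>/.claude/settings.json defines its own `env` object."""
    path = Path(project_dir) / ".claude" / "settings.json"
    try:
        settings = json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        # No project settings (or unreadable JSON): nothing shadows the global env.
        return False
    return isinstance(settings, dict) and isinstance(settings.get("env"), dict)
```

If this returns `True` for your project directory, either run the shim with `--no-auth` or copy the relevant `env` entries into the project settings.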
178
+
179
+ **`500: Streaming is required for operations that may take longer than 10 minutes`**
180
+
181
+ This error comes from Google Vertex AI (the backend hosting Claude behind Argo), not from the shim or Argo Gateway. It occurs when a non-streaming request (`stream: false` or omitted) has a large payload — typically when Claude Code sends tool results (file reads, web searches) back to the model.
182
+
183
+ The shim works around this by forcing `stream: true` on all POST requests to `/messages` before forwarding upstream. If you see this error, make sure you're running the latest version of the shim.
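The workaround amounts to a small JSON rewrite of the request body. The following sketch follows the same idea as the package source; `force_stream` is an illustrative name, and the extra guard for non-object JSON is an addition made here for safety.

```python
import json


def force_stream(body: bytes) -> bytes:
    """Return the request body with "stream": true set; pass unparseable bodies through."""
    try:
        payload = json.loads(body)
    except (json.JSONDecodeError, UnicodeDecodeError):
        return body  # not JSON: forward as-is
    if not isinstance(payload, dict) or payload.get("stream") is True:
        return body  # nothing to change
    payload["stream"] = True
    return json.dumps(payload).encode("utf-8")


print(force_stream(b'{"model": "m", "stream": false}'))
```

Since the body length can change after the rewrite, the forwarding code has to recompute `Content-Length` instead of reusing the client's value.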
184
+
185
+ **"ERROR: The requested URL could not be retrieved" in Claude Code**
186
+
187
+ HPC login nodes often set `HTTP_PROXY` / `HTTPS_PROXY` environment variables that route traffic through an institutional proxy, bypassing the shim's localhost proxy entirely. Clear them when launching Claude Code:
188
+
189
+ ```bash
190
+ HTTP_PROXY= HTTPS_PROXY= http_proxy= https_proxy= claude
191
+ ```
192
+
193
+ To make this permanent, unset the proxy vars in your shell config (e.g., `~/.bashrc`):
194
+
195
+ ```bash
196
+ unset HTTP_PROXY HTTPS_PROXY http_proxy https_proxy
197
+ ```
198
+
199
+ See the [ALCF proxy docs](https://docs.alcf.anl.gov/aurora/getting-started-on-aurora/#proxy) for more details on proxy settings on Aurora login nodes.
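When launching Claude Code from a Python wrapper instead of a shell, the same effect can be had by removing the proxy variables from the child environment. This is a sketch under that assumption; `env_without_proxies` is a hypothetical helper.

```python
import os

PROXY_VARS = ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy")


def env_without_proxies(env=None) -> dict:
    """Return a copy of the environment with the proxy variables removed."""
    cleaned = dict(os.environ if env is None else env)
    for var in PROXY_VARS:
        cleaned.pop(var, None)
    return cleaned
```

Passing the result as `env=` to `subprocess.run(["claude"], env=env_without_proxies())` would start Claude Code without the institutional proxy settings while leaving the parent shell untouched.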
@@ -0,0 +1,3 @@
1
+ """argo-shim: HTTP proxy for Argo API via SSH tunnel."""
2
+
3
+ __version__ = "0.1.0"
@@ -0,0 +1,5 @@
1
+ """Allow running with: python -m argo_shim"""
2
+
3
+ from argo_shim._shim import main
4
+
5
+ main()
@@ -0,0 +1,532 @@
+ import argparse
+ import getpass
+ import hashlib
+ import http.server
+ import http.client
+ import json
+ import os
+ import secrets
+ import signal
+ import socket
+ import socketserver
+ import ssl
+ import subprocess
+ import threading
+ import time
+
+ TARGET_HOST = "127.0.0.1"
+ REAL_HOST = "apps.inside.anl.gov"
+ API_KEY = os.environ.get("CELS_USERNAME", getpass.getuser())
+
+
+ def default_port(username):
+     """Derive a deterministic listen port from the username."""
+     h = hashlib.sha256(username.encode()).hexdigest()
+     return 10000 + (int(h[:8], 16) % 22768)  # range 10000-32767 (below ephemeral range)
+
+ SSH_JUMP_HOST = "homes.cels.anl.gov"
+ SSH_PROXY_JUMP = "logins.cels.anl.gov"
+
+ class ProxyHandler(http.server.BaseHTTPRequestHandler):
+     protocol_version = 'HTTP/1.1'
+
+     def do_GET(self):
+         self.handle_proxy("GET")
+
+     def do_HEAD(self):
+         # Claude Code sends HEAD to the base URL as a connectivity probe.
+         # Reply directly instead of proxying to avoid a 404 from upstream.
+         self.send_response(200)
+         self.send_header('Content-Length', '0')
+         self.end_headers()
+
+     def do_POST(self):
+         self.handle_proxy("POST")
+
+     def _send_error(self, code, message):
+         """Send an HTTP error response to the client."""
+         try:
+             body = message.encode()
+             self.send_response(code)
+             self.send_header('Content-Type', 'text/plain')
+             self.send_header('Content-Length', str(len(body)))
+             self.end_headers()
+             self.wfile.write(body)
+         except (BrokenPipeError, ConnectionResetError):
+             pass
+
+     def handle_proxy(self, method):
+         # Validate auth token (HEAD is exempt — used by Claude Code as a connectivity probe)
+         if self.server.auth_token:
+             client_key = self.headers.get('x-api-key', '')
+             if method != "HEAD" and client_key != self.server.auth_token:
+                 self.send_response(401)
+                 self.send_header('Content-Type', 'text/plain')
+                 msg = b'Unauthorized: invalid or missing x-api-key'
+                 self.send_header('Content-Length', str(len(msg)))
+                 self.end_headers()
+                 self.wfile.write(msg)
+                 print(f"[{method}] Rejected request (bad token)")
+                 return
+
+         body = None
+         if method == "POST":
+             try:
+                 content_length = int(self.headers.get('Content-Length', 0))
+                 body = self.rfile.read(content_length)
+             except ConnectionResetError:
+                 print("Client closed connection before sending body.")
+                 return
+
+         print(f"[{method}] Intercepted Request: {self.path}")
+
+         # Force stream=true on /messages requests to avoid Vertex AI 500 errors.
+         # Vertex rejects non-streaming requests it estimates will exceed 10 minutes.
+         if method == "POST" and body and "/messages" in self.path:
+             try:
+                 req_json = json.loads(body)
+                 stream_val = req_json.get("stream", "<not set>")
+                 model_val = req_json.get("model", "<not set>")
+                 if req_json.get("stream") is not True:
+                     print(f"[{method}] /messages: stream={stream_val} -> forcing stream=true (model={model_val})")
+                     req_json["stream"] = True
+                     body = json.dumps(req_json).encode("utf-8")
+                 else:
+                     print(f"[{method}] /messages: stream={stream_val}, model={model_val}")
+             except (json.JSONDecodeError, UnicodeDecodeError):
+                 print(f"[{method}] /messages: could not parse body, forwarding as-is")
+
+         # Path rewrite logic
+         path = self.path
+         if not path.startswith("/argoapi"):
+             path = ("/argoapi/" + path.lstrip("/")).replace("//", "/")
+
+         context = ssl._create_unverified_context()
+         conn = http.client.HTTPSConnection(self.server.target_host, self.server.target_port, context=context, timeout=300)
+
+         # Build headers. The client's x-api-key (the shim token) and Connection headers are
+         # dropped regardless of case so only the values set below are sent upstream.
+         headers = {k: v for k, v in self.headers.items()
+                    if k.lower() not in ['host', 'content-length', 'authorization', 'x-api-key', 'connection']}
+         headers['Host'] = REAL_HOST
+         headers['x-api-key'] = API_KEY
+         headers['Connection'] = 'close'
+
+         for attempt in range(2):
+             try:
+                 conn.request(method, path, body=body, headers=headers)
+                 response = conn.getresponse()
+                 break
+             except ConnectionRefusedError:
+                 conn.close()
+                 if attempt == 0 and self.server.recover_tunnel():
+                     print(f"[{method}] Retrying after tunnel recovery...")
+                     conn = http.client.HTTPSConnection(self.server.target_host, self.server.target_port, context=context, timeout=300)
+                     continue
+                 print(f"[{method}] Upstream connection refused (tunnel is down)")
+                 self._send_error(502, "Bad Gateway: SSH tunnel is down. Restart argo-shim.")
+                 return
+             except Exception as e:
+                 conn.close()
+                 print(f"[{method}] Upstream error: {e}")
+                 self._send_error(502, f"Bad Gateway: {e}")
+                 return
+
+         try:
+             self.send_response(response.status)
+             if response.status >= 400:
+                 error_body = response.read()
+                 print(f"[{method}] Upstream {response.status}: {error_body[:500]}")
+                 for k, v in response.getheaders():
+                     # Skip upstream Content-Length too: it is re-sent below for the buffered body.
+                     if k.lower() not in ('transfer-encoding', 'content-length'):
+                         self.send_header(k, v)
+                 self.send_header('Content-Length', str(len(error_body)))
+                 self.end_headers()
+                 self.wfile.write(error_body)
+                 return
+             for k, v in response.getheaders():
+                 if k.lower() != 'transfer-encoding':
+                     self.send_header(k, v)
+             self.end_headers()
+
+             # Streaming response (Critical for Claude's SSE)
+             while True:
+                 chunk = response.read(4096)
+                 if not chunk:
+                     break
+                 self.wfile.write(chunk)
+                 self.wfile.flush()
+
+         except (BrokenPipeError, ConnectionResetError):
+             print(f"[{method}] Client disconnected during streaming")
+         except Exception as e:
+             print(f"[{method}] Error during response: {e}")
+         finally:
+             conn.close()
+
+ class ThreadedTCPServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
+     daemon_threads = True
+     allow_reuse_address = True
+
+     def __init__(self, server_address, handler, target_host, target_port, auth_token, tunnel_is_remote=False):
+         self.target_host = target_host
+         self.target_port = target_port
+         self.auth_token = auth_token
+         self.tunnel_is_remote = tunnel_is_remote
+         self._tunnel_lock = threading.Lock()
+         super().__init__(server_address, handler)
+
+     def recover_tunnel(self):
+         """Attempt to recreate the SSH tunnel. Returns True if recovery succeeded."""
+         if self.tunnel_is_remote:
+             # Remote tunnel (--tunnel-host / --relay): can't recreate from here
+             print("Tunnel is remote — cannot recover locally. Check the relay or UAN.")
+             return False
+         with self._tunnel_lock:
+             # Re-check under lock — another thread may have already recovered
+             if find_existing_tunnel(self.target_port):
+                 return True
+             print("Tunnel is dead, attempting recovery...")
+             try:
+                 create_tunnel(self.target_port)
+                 print("Tunnel recovered successfully")
+                 return True
+             except Exception as e:
+                 print(f"Tunnel recovery failed: {e}")
+                 return False
+
+
+ def check_port_available(port, host="127.0.0.1"):
+     """Check that a port is available, or raise with a helpful message."""
+     with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
+         try:
+             s.bind((host, port))
+         except OSError:
+             raise RuntimeError(
+                 f"Port {port} is already in use. "
+                 f"Use --port <PORT> to specify a different port."
+             )
+
+
+ def verify_tunnel(port, host="127.0.0.1"):
+     """Verify that a listening port is actually a tunnel to REAL_HOST by doing a TLS handshake
+     and checking the server certificate."""
+     try:
+         ctx = ssl.create_default_context()
+         with ctx.wrap_socket(socket.socket(socket.AF_INET, socket.SOCK_STREAM),
+                              server_hostname=REAL_HOST) as s:
+             s.settimeout(5)
+             s.connect((host, port))
+             cert = s.getpeercert()
+             # Check that the cert is valid for REAL_HOST (wrap_socket already does this
+             # via server_hostname matching, so reaching here means it passed)
+             print(f" ✓ TLS verified: tunnel on {port} reaches {REAL_HOST}")
+             return True
+     except ssl.SSLCertVerificationError as e:
+         print(f" ✗ Port {port}: TLS cert does not match {REAL_HOST}: {e}")
+         return False
+     except ssl.SSLError as e:
+         print(f" ✗ Port {port}: TLS handshake failed (not a tunnel to a TLS server): {e}")
+         return False
+     except (ConnectionRefusedError, OSError) as e:
+         print(f" ✗ Port {port}: connection failed: {e}")
+         return False
+
+
+ def is_own_process(port):
+     """Check if the process listening on a port belongs to the current user."""
+     # Compare against the local login name, not API_KEY: API_KEY may hold the CELS
+     # username (via CELS_USERNAME), which can differ from the owner reported by ps.
+     local_user = getpass.getuser()
+     try:
+         # Use TCP:{port} without address filter — lsof represents 0.0.0.0 as *
+         # so TCP@127.0.0.1 and TCP@0.0.0.0 both fail to match wildcard binds.
+         result = subprocess.run(
+             ["lsof", "-ti", f"TCP:{port}", "-sTCP:LISTEN"],
+             capture_output=True, text=True, timeout=5
+         )
+         for pid in result.stdout.strip().split('\n'):
+             if not pid:
+                 continue
+             stat = subprocess.run(
+                 ["ps", "-o", "user=", "-p", pid],
+                 capture_output=True, text=True, timeout=5
+             )
+             owner = stat.stdout.strip()
+             if owner == local_user:
+                 return True
+             print(f" Skipping port {port} (owned by {owner}, not {local_user})")
+             return False
+     except Exception:
+         pass
+     return False
+
+
+ def find_existing_tunnel(port, host="127.0.0.1"):
+     """Check if a verified tunnel to REAL_HOST already exists on this port."""
+     with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
+         s.settimeout(1)
+         try:
+             s.connect((host, port))
+         except (ConnectionRefusedError, OSError):
+             return False
+     if not is_own_process(port):
+         return False
+     print(f"Port {port} is listening, verifying tunnel...")
+     if verify_tunnel(port, host):
+         return True
+     print(f" Port {port} is not a valid tunnel to {REAL_HOST}")
+     return False
+
+
+ def create_tunnel(port, host="127.0.0.1", bind_address="127.0.0.1"):
+     """Create a new SSH tunnel on the given port and verify it's working."""
+     check_port_available(port, host)
+     cmd = [
+         "ssh", "-N", "-f",
+         "-o", "ServerAliveInterval=15",
+         "-o", "ServerAliveCountMax=4",
+         "-J", f"{API_KEY}@{SSH_PROXY_JUMP}",
+         "-L", f"{bind_address}:{port}:{REAL_HOST}:443",
+         f"{API_KEY}@{SSH_JUMP_HOST}",
+     ]
+     print(f"Creating SSH tunnel on port {port}...")
+     print(f" $ {' '.join(cmd)}")
+     result = subprocess.run(cmd)
+     if result.returncode != 0:
+         raise RuntimeError(f"SSH tunnel failed (exit code {result.returncode})")
+
+     # Wait for tunnel to start accepting connections
+     for i in range(10):
+         with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
+             s.settimeout(1)
+             try:
+                 s.connect((host, port))
+                 break
+             except (ConnectionRefusedError, OSError):
+                 time.sleep(0.5)
+     else:
+         raise RuntimeError(f"SSH tunnel on port {port} never started accepting connections")
+
+     print(f"Verifying new tunnel...")
+     if not verify_tunnel(port, host):
+         raise RuntimeError(f"SSH tunnel on port {port} did not verify against {REAL_HOST}")
+
+     return port
+
+
+ def create_reverse_tunnel(remote_host, port):
+     """Create a reverse SSH tunnel, forwarding remote_host:port to localhost:port."""
+     cmd = ["ssh", "-N", "-f", "-R", f"0.0.0.0:{port}:127.0.0.1:{port}", remote_host]
+     print(f"Creating reverse tunnel to {remote_host}:{port}...")
+     print(f" $ {' '.join(cmd)}")
+     result = subprocess.run(cmd)
+     if result.returncode != 0:
+         raise RuntimeError(f"Reverse SSH tunnel to {remote_host} failed (exit code {result.returncode})")
+
+
+ def update_claude_settings(listen_port, auth_token):
+     """Update ~/.claude/settings.json with the correct ANTHROPIC_BASE_URL and auth token."""
+     settings_path = os.path.expanduser("~/.claude/settings.json")
+     os.makedirs(os.path.dirname(settings_path), exist_ok=True)
+     try:
+         with open(settings_path, "r") as f:
+             settings = json.load(f)
+     except FileNotFoundError:
+         settings = {
+             "env": {
+                 "CLAUDE_CODE_SKIP_ANTHROPIC_AUTH": "1"
+             }
+         }
+         print(f" Creating new {settings_path}")
+     except json.JSONDecodeError as e:
+         print(f" ⚠ Could not parse {settings_path}: {e}")
+         return False
+
+     new_url = f"http://127.0.0.1:{listen_port}/argoapi"
+     if auth_token:
+         settings["apiKeyHelper"] = f"echo {auth_token}"
+     else:
+         settings["apiKeyHelper"] = "echo no-auth"
+     env = settings.setdefault("env", {})
+     env["ANTHROPIC_BASE_URL"] = new_url
+     for var in ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy"):
+         env[var] = ""
+
+     with open(settings_path, "w") as f:
+         json.dump(settings, f, indent=2)
+         f.write("\n")
+     auth_status = "token rotated" if auth_token else "no auth"
+     print(f" ✓ Updated settings.json (port={listen_port}, {auth_status})")
+     return True
+
+
359
+ def health_check(tunnel_host, tunnel_port, listen_port, auth_token):
360
+ """Validate the full chain: tunnel -> remote endpoint, and shim -> tunnel."""
361
+ print("\nRunning health checks...")
362
+ ok = True
363
+
364
+ # 1. Tunnel health: TLS + HTTP request to the real endpoint
365
+ print(f" [1/2] Tunnel ({tunnel_host}:{tunnel_port} -> {REAL_HOST})...")
366
+ try:
367
+ context = ssl._create_unverified_context()
368
+ conn = http.client.HTTPSConnection(tunnel_host, tunnel_port, context=context, timeout=10)
369
+ conn.request("GET", "/argoapi/v1/models", headers={"Host": REAL_HOST, "x-api-key": API_KEY})
370
+ resp = conn.getresponse()
371
+ body = resp.read()
372
+ conn.close()
373
+ if resp.status < 500:
374
+ print(f" ✓ Tunnel healthy (HTTP {resp.status})")
375
+ else:
376
+ print(f" ⚠ Tunnel responded but upstream returned HTTP {resp.status}: {body[:200]}")
377
+ ok = False
378
+ except ssl.SSLError as e:
379
+ print(f" ✗ Tunnel SSL error: {e}")
380
+ print(f" → This often means the SSH tunnel is stale. If you use ControlMaster,")
381
+ print(f" try: ssh -O exit {SSH_JUMP_HOST} then re-run this script.")
382
+ ok = False
383
+ except Exception as e:
384
+ print(f" ✗ Tunnel error: {e}")
385
+ ok = False
386
+
387
+ # 2. Shim health: HTTP request through the shim
388
+ print(f" [2/2] Shim (127.0.0.1:{listen_port} -> tunnel:{tunnel_port})...")
389
+ try:
390
+ conn = http.client.HTTPConnection(TARGET_HOST, listen_port, timeout=10)
391
+ headers = {"x-api-key": auth_token} if auth_token else {}
392
+ conn.request("GET", "/v1/models", headers=headers)
393
+ resp = conn.getresponse()
394
+ body = resp.read()
395
+ conn.close()
396
+ if resp.status < 500:
397
+ print(f" ✓ Shim healthy (HTTP {resp.status})")
398
+ else:
399
+ print(f" ⚠ Shim returned HTTP {resp.status}: {body[:200].decode('utf-8', 'replace')}")
400
+ ok = False
401
+ except Exception as e:
402
+ print(f" ✗ Shim error: {e}")
403
+ ok = False
404
+
405
+ if ok:
406
+ print(" ✅ All health checks passed\n")
407
+ else:
408
+ print(" ❌ Some health checks failed — see above\n")
409
+ return ok
410
+
411
+
412
+ def main():
413
+ parser = argparse.ArgumentParser(description="HTTP proxy shim for Argo API via SSH tunnel")
414
+ parser.add_argument("--no-auth", action="store_true",
415
+ help="Disable token authentication on the shim (useful when project-level "
416
+ "Claude settings override the global apiKeyHelper)")
417
+ parser.add_argument("--port", type=int, default=None,
418
+ help="Listen port for the shim (default: derived from username)")
419
+ parser.add_argument("--tunnel", action="store_true",
420
+ help="Create an SSH tunnel bound to 0.0.0.0 (for compute node access) and exit. "
421
+ "Requires SSH access to CELS. Run on a UAN, or use --relay from your Mac.")
422
+ parser.add_argument("--tunnel-host", default=None,
423
+ help="Connect to an existing tunnel on a remote host (e.g., a UAN hostname). "
424
+ "Skips local tunnel creation. Use when running from compute nodes.")
425
+ parser.add_argument("--relay", metavar="REMOTE_HOST", default=None,
426
+ help="Relay mode: create SSH tunnel locally, then reverse-forward it to "
427
+ "REMOTE_HOST (e.g., a UAN). Run this on your Mac so compute nodes "
428
+ "can reach the API via the UAN.")
429
+ parser.add_argument("--no-update-settings", action="store_true",
430
+ help="Don't modify ~/.claude/settings.json (useful if you manage settings separately)")
431
+ args = parser.parse_args()
432
+
433
+ mode_flags = sum(bool(x) for x in [args.tunnel, args.tunnel_host, args.relay])
434
+ if mode_flags > 1:
435
+ parser.error("--tunnel, --tunnel-host, and --relay are mutually exclusive")
436
+
437
+ print(f"API key: {API_KEY}")
438
+
439
+ if args.port:
440
+ listen_port = args.port
441
+ else:
442
+ listen_port = default_port(API_KEY)
443
+ print(f"Derived port {listen_port} from username (override with --port <PORT>)")
444
+ tunnel_port = listen_port - 1
445
+ tunnel_host = args.tunnel_host or "127.0.0.1"
446
+
447
+ if args.tunnel:
448
+ # Tunnel-only mode: create a 0.0.0.0-bound tunnel on the UAN and exit
449
+ hostname = socket.gethostname()
450
+ if find_existing_tunnel(tunnel_port, "0.0.0.0") or find_existing_tunnel(tunnel_port):
451
+ print(f"Tunnel already running on port {tunnel_port}")
452
+ else:
453
+ create_tunnel(tunnel_port, bind_address="0.0.0.0")
454
+ print(f"Tunnel created on port {tunnel_port} (bound to 0.0.0.0)")
455
+ print("\nOn the compute node, run:")
456
+ print(f" argo-shim --tunnel-host {hostname}")
457
+ return
458
+
459
+ if args.relay:
460
+ # Relay mode: create local tunnel, then reverse-forward to remote host
461
+ if find_existing_tunnel(tunnel_port):
462
+ print(f"Using existing tunnel on port {tunnel_port}")
463
+ else:
464
+ create_tunnel(tunnel_port)
465
+ print(f"Tunnel created on port {tunnel_port}")
466
+ create_reverse_tunnel(args.relay, tunnel_port)
467
+ print(f"\nRelay active: {args.relay}:{tunnel_port} -> localhost:{tunnel_port}")
468
+ print("\nOn the compute node, run:")
469
+ print(f" argo-shim --tunnel-host {args.relay}")
470
+ # Continue to start the local shim so Mac can also use Claude
471
+
472
+ if args.tunnel_host:
473
+ # Compute node mode: use pre-existing tunnel on remote host
474
+ print(f"Using remote tunnel at {tunnel_host}:{tunnel_port}")
475
+ if not verify_tunnel(tunnel_port, tunnel_host):
476
+ # Direct connection failed (likely GatewayPorts disabled).
477
+ # Try SSH local forward to reach the remote host's localhost port.
478
+ print(f" Direct connection failed, creating SSH forward to {tunnel_host}...")
479
+ fwd_cmd = ["ssh", "-N", "-f", "-L",
480
+ f"127.0.0.1:{tunnel_port}:127.0.0.1:{tunnel_port}", tunnel_host]
481
+ print(f" $ {' '.join(fwd_cmd)}")
482
+ result = subprocess.run(fwd_cmd)
483
+ if result.returncode != 0:
484
+ raise RuntimeError(f"SSH forward to {tunnel_host} failed (exit code {result.returncode})")
485
+ tunnel_host = "127.0.0.1"
486
+ if not verify_tunnel(tunnel_port, tunnel_host):
487
+ raise RuntimeError(
488
+ f"No valid tunnel found at {args.tunnel_host}:{tunnel_port} "
489
+ f"(tried direct and SSH forward). "
490
+ f"Ensure --relay is running on your Mac."
491
+ )
492
+ elif args.relay:
493
+ pass # tunnel already created above
494
+ elif find_existing_tunnel(tunnel_port):
495
+ print(f"Using existing tunnel on port {tunnel_port}")
496
+ else:
497
+ create_tunnel(tunnel_port)
498
+ print(f"Tunnel created on port {tunnel_port}")
499
+
500
+ # 3. Start the shim
501
+ check_port_available(listen_port)
502
+ auth_token = None if args.no_auth else secrets.token_urlsafe(32)
503
+
504
+ if args.no_auth:
505
+ print("⚠ Auth disabled (--no-auth): shim accepts unauthenticated requests on localhost")
506
+
507
+ # 4. Update Claude settings with the correct port and token
508
+ if not args.no_update_settings:
509
+ update_claude_settings(listen_port, auth_token)
510
+ print(f"Set ANTHROPIC_BASE_URL=http://127.0.0.1:{listen_port}/argoapi")
511
+
512
+ tunnel_is_remote = bool(args.tunnel_host)
513
+ with ThreadedTCPServer(("127.0.0.1", listen_port), ProxyHandler, tunnel_host, tunnel_port, auth_token, tunnel_is_remote) as httpd:
514
+ print(f"✅ Shim running on {listen_port} -> {tunnel_port}. Supports GET/POST/HEAD.")
515
+
516
+ def shutdown_handler(signum, frame):
517
+ signame = signal.Signals(signum).name
518
+ print(f"\n{signame} received, shutting down...")
519
+ threading.Thread(target=httpd.shutdown).start()
520
+
521
+ signal.signal(signal.SIGINT, shutdown_handler)
522
+ signal.signal(signal.SIGTERM, shutdown_handler)
523
+
524
+ # 5. Run health checks in background after shim is listening
525
+ threading.Thread(target=health_check, args=(tunnel_host, tunnel_port, listen_port, auth_token), daemon=True).start()
526
+
527
+ httpd.serve_forever()
528
+ print("Shim stopped.")
529
+
530
+
531
+ if __name__ == "__main__":
532
+ main()
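Note on the shutdown path in `main()`: the signal handler starts a thread to call `httpd.shutdown()` instead of calling it directly. With `socketserver`, `shutdown()` blocks until `serve_forever()` has returned, and Python runs signal handlers on the main thread, which here is the thread inside `serve_forever()`. A direct call would therefore deadlock. The sketch below reproduces that pattern in isolation. `EchoHandler`, `DemoServer`, and `run_and_stop` are hypothetical names, and a client thread stands in for the SIGINT/SIGTERM trigger; none of this is argo-shim code.

```python
import socket
import socketserver
import threading


class EchoHandler(socketserver.BaseRequestHandler):
    """Hypothetical handler: echoes one chunk back to the client."""

    def handle(self):
        data = self.request.recv(1024)
        self.request.sendall(data)


class DemoServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    """Simplified stand-in for the shim's threaded server."""

    allow_reuse_address = True
    daemon_threads = True


def run_and_stop(payload=b"ping"):
    """Serve on an ephemeral port, echo one payload, then shut down cleanly."""
    with DemoServer(("127.0.0.1", 0), EchoHandler) as httpd:
        port = httpd.server_address[1]
        result = {}

        def client_then_shutdown():
            # The socket is already listening, so connecting before
            # serve_forever() starts is fine; the request waits in the backlog.
            with socket.create_connection(("127.0.0.1", port), timeout=5) as s:
                s.sendall(payload)
                result["reply"] = s.recv(1024)
            # shutdown() blocks until serve_forever() exits, so it must run
            # on a thread other than the one inside serve_forever().
            httpd.shutdown()

        threading.Thread(target=client_then_shutdown, daemon=True).start()
        httpd.serve_forever()  # returns once shutdown() has been requested
        return result.get("reply")
```

Calling `run_and_stop(b"ping")` serves one request and returns the echoed bytes once the server loop has exited, which mirrors how the shim reaches its final `print("Shim stopped.")` line.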
@@ -0,0 +1,31 @@
1
+ [build-system]
2
+ requires = ["hatchling"]
3
+ build-backend = "hatchling.build"
4
+
5
+ [project]
6
+ name = "argo-shim"
7
+ dynamic = ["version"]
8
+ description = "HTTP proxy shim for Argo API via SSH tunnel"
9
+ readme = "README.md"
10
+ requires-python = ">=3.8"
11
+ classifiers = [
12
+ "Development Status :: 4 - Beta",
13
+ "Environment :: Console",
14
+ "Intended Audience :: Developers",
15
+ "Intended Audience :: Science/Research",
16
+ "Operating System :: POSIX",
17
+ "Programming Language :: Python :: 3",
18
+ "Topic :: Internet :: Proxy Servers",
19
+ ]
20
+ keywords = ["argo", "proxy", "claude", "anthropic", "ssh-tunnel"]
21
+
22
+ [project.urls]
23
+ Homepage = "https://github.com/n-getty/argo-shim"
24
+ Repository = "https://github.com/n-getty/argo-shim"
25
+ Issues = "https://github.com/n-getty/argo-shim/issues"
26
+
27
+ [project.scripts]
28
+ argo-shim = "argo_shim._shim:main"
29
+
30
+ [tool.hatch.version]
31
+ path = "argo_shim/__init__.py"
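Note on versioning: `dynamic = ["version"]` combined with `[tool.hatch.version]` tells Hatchling to read the version from `argo_shim/__init__.py` at build time. That file is not part of this hunk; assuming it follows the usual convention, it holds a `__version__` assignment matching the `Version: 0.1.0` in the package metadata. The sketch below imitates such a lookup. `INIT_SOURCE` and `read_version` are hypothetical, and the pattern is a simplified approximation, not Hatchling's actual regex.

```python
import re

# Assumed contents of argo_shim/__init__.py (not shown in the diff).
INIT_SOURCE = '__version__ = "0.1.0"\n'

# Simplified stand-in for a version-file pattern; not Hatchling's exact regex.
VERSION_RE = re.compile(
    r'^__version__\s*=\s*["\'](?P<version>[^"\']+)["\']',
    re.MULTILINE,
)


def read_version(source):
    """Return the version string assigned to __version__ in `source`."""
    match = VERSION_RE.search(source)
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group("version")
```

Keeping the version in one file means the package metadata and any runtime `argo_shim.__version__` lookup cannot drift apart.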