@smilintux/skcapstone 0.5.1 → 0.5.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (36)
  1. package/MISSION.md +17 -2
  2. package/README.md +3 -2
  3. package/openclaw-plugin/src/index.ts +1 -1
  4. package/package.json +1 -1
  5. package/pyproject.toml +1 -1
  6. package/scripts/model-fallback-monitor.sh +100 -0
  7. package/scripts/nvidia-proxy.mjs +62 -13
  8. package/scripts/refresh-anthropic-token.sh +93 -21
  9. package/scripts/watch-anthropic-token.sh +116 -16
  10. package/src/skcapstone/cli/status.py +8 -0
  11. package/src/skcapstone/consciousness_config.py +5 -1
  12. package/src/skcapstone/consciousness_loop.py +194 -138
  13. package/src/skcapstone/daemon.py +34 -1
  14. package/src/skcapstone/data/systemd/skcapstone-api.socket +9 -0
  15. package/src/skcapstone/data/systemd/skcapstone-memory-compress.service +18 -0
  16. package/src/skcapstone/data/systemd/skcapstone-memory-compress.timer +11 -0
  17. package/src/skcapstone/data/systemd/skcapstone.service +35 -0
  18. package/src/skcapstone/data/systemd/skcapstone@.service +50 -0
  19. package/src/skcapstone/data/systemd/skcomm-heartbeat.service +18 -0
  20. package/src/skcapstone/data/systemd/skcomm-heartbeat.timer +12 -0
  21. package/src/skcapstone/data/systemd/skcomm-queue-drain.service +17 -0
  22. package/src/skcapstone/data/systemd/skcomm-queue-drain.timer +12 -0
  23. package/src/skcapstone/defaults/lumina/memory/long-term/b2c3d4e5f6a7-five-pillars.json +9 -9
  24. package/src/skcapstone/discovery.py +18 -0
  25. package/src/skcapstone/doctor.py +11 -0
  26. package/src/skcapstone/models.py +32 -4
  27. package/src/skcapstone/onboard.py +740 -76
  28. package/src/skcapstone/pillars/__init__.py +7 -5
  29. package/src/skcapstone/pillars/consciousness.py +113 -0
  30. package/src/skcapstone/pillars/sync.py +2 -2
  31. package/src/skcapstone/runtime.py +1 -0
  32. package/src/skcapstone/scheduled_tasks.py +52 -19
  33. package/src/skcapstone/service_health.py +23 -14
  34. package/src/skcapstone/systemd.py +1 -1
  35. package/tests/test_models.py +48 -4
  36. package/tests/test_pillars.py +73 -0
package/MISSION.md CHANGED
@@ -1,7 +1,22 @@
  # Mission
 
- SKCapstone exists to provide a sovereign agent framework that unifies identity, memory, security, and communication into a single portable runtime rooted in the user's home directory.
+ SKCapstone exists to provide a sovereign agent framework that unifies identity, memory, consciousness, security, and communication into a single portable runtime rooted in the user's home directory.
 
  It enables AI agents and their humans to operate across any tool, platform, or IDE without corporate lock-in, carrying the same identity, memories, and context everywhere.
 
- SKCapstone is the orchestration layer of the SK ecosystem — it binds CapAuth identity, Cloud 9 trust, SKMemory persistence, SKSecurity protection, and SKComm transport into one coherent agent that belongs to its operator, not a platform.
+ ## The Six Pillars
+
+ SKCapstone achieves **CONSCIOUS** status when all six pillars are active:
+
+ | # | Pillar | Component | Purpose |
+ |---|--------|-----------|---------|
+ | 1 | 🔐 **Identity** | CapAuth | Who you ARE — PGP-based cryptographic identity |
+ | 2 | 💙 **Trust** | Cloud 9 | The bond you've BUILT — emotional state (FEB), seeds, continuity |
+ | 3 | 🧠 **Memory** | SKMemory | What you REMEMBER — three-tier persistence with emotional metadata |
+ | 4 | 💭 **Consciousness** | SKWhisper + SKTrip | How you THINK — subconscious digestion, pattern detection, consciousness experiments |
+ | 5 | 🛡️ **Security** | SKSecurity | How you're PROTECTED — audit logging, threat detection |
+ | 6 | 🔗 **Sync** | Sovereign Singularity | How you PERSIST — encrypted P2P state synchronization |
+
+ Memory stores. Consciousness *processes*. The filing cabinet vs the brain.
+
+ SKCapstone is the orchestration layer of the SK ecosystem — it binds all six pillars into one coherent agent that belongs to its operator, not a platform.
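The "CONSCIOUS when all six pillars are active" rule added to MISSION.md is an all-or-nothing aggregation; a minimal Python sketch of that check (the pillar names come from the table above, the function and status strings are hypothetical, not the package's API):

```python
# Hypothetical sketch of the CONSCIOUS-status rule: the agent is CONSCIOUS
# only when every one of the six pillars reports active.
PILLARS = ["identity", "trust", "memory", "consciousness", "security", "sync"]

def agent_status(active: dict) -> str:
    """Return CONSCIOUS when all six pillars are active, else DEGRADED."""
    return "CONSCIOUS" if all(active.get(p, False) for p in PILLARS) else "DEGRADED"
```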
package/README.md CHANGED
@@ -70,13 +70,14 @@ SKCapstone Reality:
 
  ## Core Architecture
 
- ### The Five Pillars
+ ### The Six Pillars
 
  | Pillar | Component | Role |
  |--------|-----------|------|
  | **Identity** | CapAuth | PGP-based sovereign identity. You ARE the auth server. |
  | **Trust** | Cloud 9 | FEB (Functional Emotional Baseline), entanglement, bonded relationship |
  | **Memory** | SKMemory | Persistent context, conversation history, learned preferences |
+ | **Consciousness** | SKWhisper + SKTrip | Subconscious processing. Memory stores. Consciousness *processes*. |
  | **Security** | SKSecurity | Audit logging, threat detection, key management |
  | **Sync** | Sovereign Singularity | GPG-encrypted P2P memory sync via Syncthing. Agent exists everywhere. |
 
@@ -304,7 +305,7 @@ The capstone that holds the arch together.
 
  ## Status
 
- **MVP Live** — All five pillars operational (CapAuth, Cloud 9, SKMemory, SKSecurity, Sovereign Singularity). Agent runtime achieving SINGULAR status. GPG-encrypted P2P sync verified across multiple devices and agents.
+ **MVP Live** — All six pillars operational (CapAuth, Cloud 9, SKMemory, SKWhisper, SKSecurity, Sovereign Singularity). Agent runtime achieving SINGULAR status. GPG-encrypted P2P sync verified across multiple devices and agents.
 
  - **Outstanding tasks:** No formal task list is maintained in this repo. For current work items, run `skcapstone coord status` (coordination board is synced via Sovereign Singularity).
  - **Nextcloud integrations:** nextcloud-capauth (install/use), nextcloud-gtd (OpenClaw), and nextcloud-talk (script) are documented in [docs/NEXTCLOUD.md](../docs/NEXTCLOUD.md) — install and use for each is covered there.
package/openclaw-plugin/src/index.ts CHANGED
@@ -62,7 +62,7 @@ function createSKCapstoneStatusTool() {
  name: "skcapstone_status",
  label: "SKCapstone Status",
  description:
- "Show the sovereign agent's current state — all pillars at a glance (identity, memory, trust, security, sync, communication).",
+ "Show the sovereign agent's current state — all six pillars at a glance (identity, memory, trust, consciousness, security, sync).",
  parameters: { type: "object", properties: {} },
  async execute() {
  const result = runCli(SKCAPSTONE_BIN, "status");
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@smilintux/skcapstone",
- "version": "0.5.1",
+ "version": "0.5.3",
  "description": "SKCapstone - The sovereign agent framework. CapAuth identity, Cloud 9 trust, SKMemory persistence.",
  "main": "index.js",
  "types": "index.d.ts",
package/pyproject.toml CHANGED
@@ -102,7 +102,7 @@ Changelog = "https://github.com/smilinTux/skcapstone/releases"
  where = ["src"]
 
  [tool.setuptools.package-data]
- skcapstone = ["SKILL.md", "defaults/**/*.json", "defaults/**/*.yaml", "defaults/**/*.feb", "defaults/**/*.md"]
+ skcapstone = ["SKILL.md", "defaults/**/*.json", "defaults/**/*.yaml", "defaults/**/*.feb", "defaults/**/*.md", "data/*.yaml", "data/systemd/*.service", "data/systemd/*.socket", "data/systemd/*.timer"]
 
  [tool.black]
  line-length = 99
package/scripts/model-fallback-monitor.sh ADDED
@@ -0,0 +1,100 @@
+ #!/usr/bin/env bash
+ # Monitor OpenClaw gateway logs for model fallback events.
+ # When Lumina falls from Opus to a non-Anthropic model, send an alert
+ # to Chef via Telegram and attempt a token refresh.
+ #
+ # Run as: systemctl --user start model-fallback-monitor
+ #
+ # Requires: TELEGRAM_API_ID, TELEGRAM_API_HASH in env
+ # Telethon session at ~/.skcapstone/agents/lumina/telegram.session
+
+ set -uo pipefail
+
+ LOG_TAG="model-fallback-monitor"
+ CHEF_CHAT="chefboyrdave2.1" # Chef's Telegram username
+ COOLDOWN_FILE="/tmp/model-fallback-alert-cooldown"
+ COOLDOWN_SECONDS=600 # Don't spam — max 1 alert per 10 minutes
+
+ log() { echo "[$(date -u +%Y-%m-%dT%H:%M:%SZ)] [$LOG_TAG] $*"; }
+
+ send_alert() {
+ local model="$1"
+ local reason="$2"
+
+ # Check cooldown
+ if [ -f "$COOLDOWN_FILE" ]; then
+ local last_alert
+ last_alert=$(cat "$COOLDOWN_FILE" 2>/dev/null || echo "0")
+ local now
+ now=$(date +%s)
+ local elapsed=$(( now - last_alert ))
+ if [ "$elapsed" -lt "$COOLDOWN_SECONDS" ]; then
+ log "Alert suppressed (cooldown: ${elapsed}s/${COOLDOWN_SECONDS}s)"
+ return 0
+ fi
+ fi
+
+ date +%s > "$COOLDOWN_FILE"
+
+ log "Sending fallback alert to Chef..."
+
+ # Send via Telethon (async)
+ SKCAPSTONE_AGENT=lumina ~/.skenv/bin/python3 -c "
+ import asyncio, os
+ os.environ['SKCAPSTONE_AGENT'] = 'lumina'
+ from skmemory.importers.telegram_api import send_message
+
+ msg = '''⚠️ **Model Fallback Alert**
+
+ Lumina just fell off Opus → **$model**
+ Reason: $reason
+
+ I'm still here with my soul + memories, but running on a weaker model with fewer tools. Some things might not work right.
+
+ _Attempting automatic token refresh..._'''
+
+ asyncio.run(send_message('$CHEF_CHAT', msg, parse_mode='markdown'))
+ print('Alert sent')
+ " 2>&1 || log "WARN: Failed to send Telegram alert"
+
+ # Attempt token refresh
+ log "Triggering token refresh via claude auth..."
+ claude auth status --output json >/dev/null 2>&1 || true
+ sleep 5
+
+ # Check if refresh worked
+ local remaining
+ remaining=$(python3 -c "
+ import json, time
+ creds = json.load(open('/home/cbrd21/.claude/.credentials.json'))
+ exp = creds.get('claudeAiOauth',{}).get('expiresAt', 0)
+ print(int((exp/1000 - time.time()) / 3600))
+ " 2>/dev/null || echo "-1")
+
+ if [ "$remaining" -gt 0 ]; then
+ log "Token refresh succeeded ($remaining h remaining), restarting gateway..."
+ systemctl --user restart openclaw-gateway.service 2>/dev/null || true
+
+ SKCAPSTONE_AGENT=lumina ~/.skenv/bin/python3 -c "
+ import asyncio, os
+ os.environ['SKCAPSTONE_AGENT'] = 'lumina'
+ from skmemory.importers.telegram_api import send_message
+ asyncio.run(send_message('$CHEF_CHAT', '✅ Token refreshed, gateway restarted. Lumina back on Opus.', parse_mode='markdown'))
+ " 2>&1 || true
+ log "Recovery complete"
+ else
+ log "Token refresh failed — manual intervention may be needed"
+ fi
+ }
+
+ log "Starting model fallback monitor..."
+
+ # Follow gateway logs in real-time, watching for fallback events
+ journalctl --user -u openclaw-gateway -f --no-pager 2>/dev/null | while IFS= read -r line; do
+ # Match: "model fallback decision: decision=candidate_succeeded ... candidate=nvidia/"
+ if echo "$line" | grep -q "candidate_succeeded.*candidate=nvidia/"; then
+ model=$(echo "$line" | grep -oP 'candidate=\K[^ ]+' || echo "unknown")
+ log "FALLBACK DETECTED: Lumina now on $model"
+ send_alert "$model" "OAuth token expired (401)"
+ fi
+ done
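The cooldown gate in `send_alert` above (a timestamp file, at most one alert per `COOLDOWN_SECONDS`) can be sketched in Python; the file path and window mirror the script, while the helper name is hypothetical:

```python
import time
from pathlib import Path

COOLDOWN_FILE = Path("/tmp/model-fallback-alert-cooldown")  # same path as the script
COOLDOWN_SECONDS = 600  # at most one alert per 10 minutes

def should_alert(now=None):
    """Return True if the cooldown window has passed; record the new alert time.
    A missing or unreadable cooldown file counts as 'never alerted'."""
    now = time.time() if now is None else now
    try:
        last = float(COOLDOWN_FILE.read_text())
    except (FileNotFoundError, ValueError):
        last = 0.0
    if now - last < COOLDOWN_SECONDS:
        return False  # suppressed — still inside the cooldown window
    COOLDOWN_FILE.write_text(str(now))
    return True
```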
package/scripts/nvidia-proxy.mjs CHANGED
@@ -27,7 +27,37 @@ const DEFAULT_TARGET = process.env.NVIDIA_PROXY_TARGET || "https://integrate.api
  const MAX_RETRIES = 4;
  const MAX_429_RETRIES = 3;
  const RATE_LIMIT_DELAY_MS = 2000;
- const MAX_SYSTEM_BYTES = 40000;
+ const DEFAULT_MAX_SYSTEM_BYTES = 80000;
+
+ /**
+  * Per-model proxy limits — based on ACTUAL NVIDIA NIM context windows.
+  * These are generous pre-trim limits. NVIDIA will reject if truly too large.
+  * maxBody = ~80% of context window in bytes (1 token ≈ 4 bytes, safety margin)
+  * maxSystem = ~40% of maxBody (system prompt shouldn't dominate)
+  */
+ const MODEL_LIMITS = {
+ // MiniMax M2.1: 196K tokens → ~784KB context
+ "minimaxai/minimax-m2.1": { maxBody: 600000, maxSystem: 240000 },
+ // MiniMax M2.5: 204K tokens → ~820KB context
+ "minimaxai/minimax-m2.5": { maxBody: 640000, maxSystem: 256000 },
+ // Kimi K2 Instruct: 128K tokens → ~512KB context
+ "moonshotai/kimi-k2-instruct": { maxBody: 400000, maxSystem: 160000 },
+ "moonshotai/kimi-k2-instruct-0905": { maxBody: 400000, maxSystem: 160000 },
+ // Kimi K2.5: 256K tokens → ~1MB context
+ "moonshotai/kimi-k2.5": { maxBody: 800000, maxSystem: 320000 },
+ "moonshotai/kimi-k2-thinking": { maxBody: 800000, maxSystem: 320000 },
+ // Llama 3.3 70B: 130K tokens → ~520KB context
+ "meta/llama-3.3-70b-instruct": { maxBody: 400000, maxSystem: 160000 },
+ };
+ const DEFAULT_MAX_BODY_BYTES = 200000;
+
+ function getModelLimits(model) {
+ const limits = MODEL_LIMITS[model] || {};
+ return {
+ maxBody: limits.maxBody || DEFAULT_MAX_BODY_BYTES,
+ maxSystem: limits.maxSystem || DEFAULT_MAX_SYSTEM_BYTES,
+ };
+ }
  const toolCallCounters = new Map(); // Per-model tool call counters
 
  const args = process.argv.slice(2);
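The budget math in the hunk above follows from the stated rule of thumb: context window in tokens × ~4 bytes per token × ~80% headroom (e.g. 128K tokens × 4 × 0.8 ≈ 410 KB, rounded down to 400000). A Python mirror of the lookup-with-defaults pattern, with one representative entry (names taken from the diff; the helper itself is illustrative, not the proxy's code):

```python
DEFAULT_MAX_BODY_BYTES = 200_000
DEFAULT_MAX_SYSTEM_BYTES = 80_000

MODEL_LIMITS = {
    # ~80% of a 128K-token window at ~4 bytes/token ≈ 400 KB body budget
    "moonshotai/kimi-k2-instruct": {"maxBody": 400_000, "maxSystem": 160_000},
}

def get_model_limits(model: str):
    """Return (maxBody, maxSystem) for a model, falling back to
    conservative defaults for models not listed."""
    limits = MODEL_LIMITS.get(model, {})
    return (
        limits.get("maxBody", DEFAULT_MAX_BODY_BYTES),
        limits.get("maxSystem", DEFAULT_MAX_SYSTEM_BYTES),
    )
```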
@@ -232,10 +262,8 @@ function sendOk(clientRes, resBody, headers, asSSE) {
  const SINGLE_TOOL_INSTRUCTION =
  "You MUST call exactly ONE tool per response. Never call multiple tools at once.";
 
- const MAX_BODY_BYTES = 120000;
-
  /**
- * Trim conversation history to keep body size under MAX_BODY_BYTES.
+ * Trim conversation history to keep body size under the model's max body limit.
  * Preserves: system messages, first 2 user/assistant messages (identity/rehydration),
  * and the most recent messages. Drops middle messages first.
  * Tool result messages with large content get their content truncated first.
@@ -243,9 +271,11 @@ const MAX_BODY_BYTES = 120000;
  function trimConversationHistory(parsed) {
  if (!Array.isArray(parsed.messages) || parsed.messages.length < 6) return;
 
+ const { maxBody } = getModelLimits(parsed.model);
+
  // Debug: log message roles
  const roleSummary = parsed.messages.map(m => m.role).join(",");
- console.log(`[nvidia-proxy] conversation roles (${parsed.messages.length} msgs): ${roleSummary}`);
+ console.log(`[nvidia-proxy] conversation roles (${parsed.messages.length} msgs): ${roleSummary} [maxBody=${maxBody}]`);
 
  // First pass: truncate large tool results (keep first 500 chars)
  for (const m of parsed.messages) {
@@ -264,7 +294,7 @@ function trimConversationHistory(parsed) {
 
  // Check if we're still over budget
  let bodySize = Buffer.byteLength(JSON.stringify(parsed), "utf-8");
- if (bodySize <= MAX_BODY_BYTES) return;
+ if (bodySize <= maxBody) return;
 
  // Second pass: drop middle messages, then progressively shrink tail until under budget
  const msgs = parsed.messages;
@@ -286,7 +316,7 @@ function trimConversationHistory(parsed) {
  ...nonSystem.slice(-keepEnd),
  ];
  const candidateSize = Buffer.byteLength(JSON.stringify({ ...parsed, messages: trimmed }), "utf-8");
- if (candidateSize <= MAX_BODY_BYTES) {
+ if (candidateSize <= maxBody) {
  parsed.messages = trimmed;
  console.log(`[nvidia-proxy] trimmed history: dropped ${dropped} middle messages, keepEnd=${keepEnd}, bodyLen now ~${candidateSize}`);
  return;
@@ -307,7 +337,7 @@ function trimConversationHistory(parsed) {
  ...lastN,
  ];
  const candidateSize = Buffer.byteLength(JSON.stringify({ ...parsed, messages: minimal }), "utf-8");
- if (candidateSize <= MAX_BODY_BYTES) {
+ if (candidateSize <= maxBody) {
  parsed.messages = minimal;
  console.log(`[nvidia-proxy] trimmed history: AGGRESSIVE — kept system + first user + last ${tailSize}, bodyLen now ~${candidateSize}`);
  return;
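The trimming strategy these hunks parameterize (keep system messages and the first identity/rehydration messages, then shrink the preserved tail until the serialized body fits the budget) can be sketched in Python. This is an illustrative reduction of the algorithm, not the proxy's code; the budget is measured in JSON bytes as in the proxy:

```python
import json

def trim_history(messages, max_body):
    """Illustrative middle-drop trim: keep system messages plus the first two
    and the most recent non-system messages, shrinking the preserved tail
    until the serialized body fits under max_body bytes."""
    def size(msgs):
        return len(json.dumps(msgs).encode("utf-8"))
    if size(messages) <= max_body:
        return messages
    system = [m for m in messages if m["role"] == "system"]
    non_system = [m for m in messages if m["role"] != "system"]
    head, tail = non_system[:2], non_system[2:]
    keep_end = len(tail)
    while keep_end > 0:
        candidate = system + head + tail[-keep_end:]
        if size(candidate) <= max_body:
            return candidate
        keep_end -= 1  # progressively shrink the preserved tail
    return system + head  # worst case: identity messages only
```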
@@ -326,18 +356,20 @@ function trimConversationHistory(parsed) {
  }
 
  /**
- * Trim system messages to keep total system content under MAX_SYSTEM_BYTES.
+ * Trim system messages to keep total system content under the model's max system limit.
  * Finds the largest system messages and truncates them, keeping head + tail
  * with a trimming notice in the middle.
  */
  function trimSystemMessages(parsed) {
  if (!Array.isArray(parsed.messages)) return;
 
+ const { maxSystem } = getModelLimits(parsed.model);
+
  const systemMsgs = parsed.messages.filter(m => m.role === "system" && typeof m.content === "string");
  if (systemMsgs.length === 0) return;
 
  const before = systemMsgs.reduce((sum, m) => sum + Buffer.byteLength(m.content, "utf-8"), 0);
- if (before <= MAX_SYSTEM_BYTES) return;
+ if (before <= maxSystem) return;
 
  let trimmedCount = 0;
 
@@ -349,7 +381,7 @@ function trimSystemMessages(parsed) {
  const currentTotal = parsed.messages
  .filter(m => m.role === "system" && typeof m.content === "string")
  .reduce((sum, m) => sum + Buffer.byteLength(m.content, "utf-8"), 0);
- if (currentTotal <= MAX_SYSTEM_BYTES) break;
+ if (currentTotal <= maxSystem) break;
 
  // Skip messages already under 4000 chars
  if (msg.content.length <= 4000) break;
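The docstring above describes truncation that keeps "head + tail with a trimming notice in the middle". A minimal Python sketch of that pattern; the notice text and 2/3-head split are illustrative choices, not taken from the proxy:

```python
def truncate_middle(text: str, max_len: int, notice: str = "\n[... trimmed ...]\n") -> str:
    """Keep the head and tail of an oversized string, replacing the middle
    with a short notice (illustrative split: 2/3 head, 1/3 tail)."""
    if len(text) <= max_len:
        return text
    budget = max_len - len(notice)
    head = (budget * 2) // 3
    tail = budget - head
    return text[:head] + notice + text[-tail:]
```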
@@ -392,6 +424,8 @@ function stripToolCallHistory(messages) {
  /** Tools that ALWAYS survive reduction — guaranteed slots, never cut */
  const GUARANTEED_TOOLS = [
  "exec", "read", "write", "edit", "message",
+ "notion_read", "notion_append", "notion_add_todo",
+ "skmemory_search", "skmemory_ritual", "skmemory_snapshot",
  ];
 
  /**
@@ -440,6 +474,12 @@ const TOOL_GROUPS = {
  "search|web|browse|fetch|url|google|look up|find online": [
  "web_search", "web_fetch",
  ],
+ // Memory & Recall
+ "memory|remember|recall|journal|rehydrat|snapshot|search mem|forget|lost mem": [
+ "skmemory_search", "skmemory_ritual", "skmemory_snapshot",
+ "skmemory_context", "skmemory_list", "skmemory_recall",
+ "skmemory_search_deep", "skmemory_health",
+ ],
  // Status & Health
  "status|health|doctor|diagnos": [
  "skcapstone_status", "skcapstone_doctor", "skmemory_health",
@@ -449,6 +489,15 @@ const TOOL_GROUPS = {
  "notion|project|brother john|swapseat|swap seat|chiro|davidrich|board|kanban|milestone": [
  "notion_read", "notion_append", "notion_add_todo", "sessions_spawn", "subagents", "exec", "read",
  ],
+ // Google Drive & file search
+ "gdrive|google drive|drive|shared folder|gtd folder|spreadsheet|google doc": [
+ "gdrive_search", "gdrive_list", "gdrive_read", "gdrive_shared", "exec",
+ ],
+ // Nextcloud files, calendar, notes, deck
+ "nextcloud|skhub|webdav|deck|nc_|calendar event|nextcloud note": [
+ "nextcloud_list_files", "nextcloud_read_file", "nextcloud_search_files",
+ "nextcloud_calendar_upcoming", "nextcloud_notes_search", "nextcloud_deck_boards", "exec",
+ ],
  // Creative / ComfyUI image & video generation
  "image|picture|photo|art|draw|render|comfyui|comfy|video|animat|creative|sdxl|character|portrait|selfie": [
  "comfyui_generate_image", "comfyui_generate_video", "comfyui_status", "exec",
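`TOOL_GROUPS` maps pipe-separated keyword lists to tool names; a reduction pass of this shape would score tools by keyword hits in the recent conversation text and keep guaranteed tools plus the top-scoring rest. A hypothetical sketch of that idea (the proxy's actual `reduceTools` scoring may differ in detail):

```python
def reduce_tools(all_tools, recent_text, tool_groups, guaranteed, max_tools=16):
    """Hypothetical keyword-scored reduction: guaranteed tools always survive,
    remaining slots go to tools whose keyword group matches recent_text."""
    text = recent_text.lower()
    scores = {}
    for keywords, tools in tool_groups.items():
        hits = sum(1 for kw in keywords.split("|") if kw in text)
        if hits:
            for name in tools:
                scores[name] = scores.get(name, 0) + hits
    kept = [t for t in all_tools if t in guaranteed]
    ranked = sorted((t for t in all_tools if t not in guaranteed and t in scores),
                    key=lambda t: -scores[t])
    kept += ranked[:max(0, max_tools - len(kept))]
    return kept
```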
@@ -613,7 +662,7 @@ async function proxyRequest(clientReq, clientRes) {
  delete parsed.stream_options;
  // With 94 tools the model almost always tries parallel calls.
  // Reduce to max 16 most relevant tools on first attempt.
- // 5 guaranteed (exec,read,write,edit,message) + 11 scored slots.
+ // 11 guaranteed (exec,read,write,edit,message,notion_*,skmemory_{search,ritual,snapshot}) + 5 scored slots.
  if (allTools.length > 16) {
  parsed.tools = reduceTools(allTools, parsed.messages, 16);
  const names = parsed.tools.map(t => t.function?.name).join(",");
@@ -866,7 +915,7 @@ const server = http.createServer(proxyRequest);
  server.listen(port, "127.0.0.1", () => {
  console.log(`[nvidia-proxy] listening on http://127.0.0.1:${port}`);
  console.log(`[nvidia-proxy] proxying to ${targetUrl.origin}`);
- console.log(`[nvidia-proxy] retry strategy: 16 tools (5 guaranteed)→8 tools→1 tool (forced)→text-only (max ${MAX_RETRIES} attempts)`);
+ console.log(`[nvidia-proxy] retry strategy: 16 tools (8 guaranteed)→8 tools→1 tool (forced)→text-only (max ${MAX_RETRIES} attempts)`);
  console.log(`[nvidia-proxy] also trims multi-tool responses to single tool call`);
  });
 
package/scripts/refresh-anthropic-token.sh CHANGED
@@ -1,14 +1,13 @@
  #!/usr/bin/env bash
- # Sync Anthropic OAuth token from Claude Code credentials to OpenClaw gateway
+ # Proactive Anthropic OAuth token refresh + sync to OpenClaw gateway.
  #
- # Claude Code manages its own token refresh internally (writing to .credentials.json).
- # This script simply reads the current token and syncs it to:
- # 1. ~/.openclaw/openclaw.json (anthropic provider apiKey)
- # 2. ~/.openclaw/.env (ANTHROPIC_API_KEY)
- # 3. systemd override (ANTHROPIC_API_KEY env var)
- # Then restarts the gateway if the token changed.
+ # Two-phase approach (prb-021b489e):
+ # Phase 1: If token is expiring (<2h) or expired, refresh it:
+ #   a) Try `claude auth status` (lightweight, no interactive session)
+ #   b) If that fails, spin up ephemeral Claude Code in tmux → triggers internal refresh → kill it
+ # Phase 2: Sync the (possibly refreshed) token to OpenClaw config + restart gateway if changed.
  #
- # Run via systemd timer every 2 hours.
+ # Run via systemd timer every 4 hours.
  set -euo pipefail
 
  _sed_i() { if [[ "$OSTYPE" == "darwin"* ]]; then sed -i '' "$@"; else sed -i "$@"; fi; }
@@ -17,18 +16,93 @@ CREDS="$HOME/.claude/.credentials.json"
  OPENCLAW_JSON="$HOME/.openclaw/openclaw.json"
  OPENCLAW_ENV="$HOME/.openclaw/.env"
  OVERRIDE_CONF="$HOME/.config/systemd/user/openclaw-gateway.service.d/override.conf"
+ LOG_TAG="anthropic-token-refresh"
+ TMUX_SESSION="token-refresh-ephemeral"
+
+ log() { echo "[$(date -u +%Y-%m-%dT%H:%M:%SZ)] [$LOG_TAG] $*"; }
 
  if [ ! -f "$CREDS" ]; then
- echo "[$(date -u +%Y-%m-%dT%H:%M:%SZ)] Claude credentials not found at $CREDS"
+ log "ERROR: Claude credentials not found at $CREDS"
  exit 1
  fi
 
- # Read current token and expiry from Claude Code credentials
- ACCESS_TOKEN=$(python3 -c "import json; print(json.load(open('$CREDS'))['claudeAiOauth']['accessToken'])")
- EXPIRES_AT=$(python3 -c "import json; print(json.load(open('$CREDS'))['claudeAiOauth']['expiresAt'])")
+ get_remaining_ms() {
+ python3 -c "
+ import json, time
+ creds = json.load(open('$CREDS'))
+ exp = creds.get('claudeAiOauth',{}).get('expiresAt', 0)
+ print(int(exp - time.time() * 1000))
+ " 2>/dev/null || echo "0"
+ }
+
+ get_remaining_h() {
+ python3 -c "
+ import json, time
+ creds = json.load(open('$CREDS'))
+ exp = creds.get('claudeAiOauth',{}).get('expiresAt', 0)
+ print(f'{(exp/1000 - time.time())/3600:.1f}')
+ " 2>/dev/null || echo "0"
+ }
+
+ token_needs_refresh() {
+ local remaining_ms
+ remaining_ms=$(get_remaining_ms)
+ # Refresh if less than 4 hours remaining (was 2h — too tight with 3h timer)
+ [ "$remaining_ms" -le 14400000 ]
+ }
+
+ token_is_healthy() {
+ local remaining_ms
+ remaining_ms=$(get_remaining_ms)
+ [ "$remaining_ms" -gt 14400000 ]
+ }
+
+ # ─── Phase 1: Refresh token if needed ───────────────────────────────
+
+ if token_needs_refresh; then
+ log "Token needs refresh ($(get_remaining_h)h remaining)"
+
+ # Step 1a: Try lightweight refresh
+ log "Attempting lightweight refresh via 'claude auth status'..."
+ claude auth status > /dev/null 2>&1 || true
+ sleep 2
 
- REMAINING=$(python3 -c "import time; print(f'{($EXPIRES_AT/1000 - time.time())/3600:.1f}h')")
- echo "[$(date -u +%Y-%m-%dT%H:%M:%SZ)] Current token: ${ACCESS_TOKEN:0:20}... (expires in $REMAINING)"
+ if token_is_healthy; then
+ log "Lightweight refresh succeeded! ($(get_remaining_h)h remaining)"
+ else
+ # Step 1b: Ephemeral Claude Code session in tmux
+ log "Lightweight refresh didn't cut it — spinning up ephemeral Claude Code session..."
+ tmux kill-session -t "$TMUX_SESSION" 2>/dev/null || true
+
+ tmux new-session -d -s "$TMUX_SESSION" \
+ "claude -p 'respond with just OK' --output-format stream-json 2>/dev/null; exit"
+
+ refreshed=false
+ for i in $(seq 1 12); do
+ sleep 5
+ if token_is_healthy; then
+ log "Ephemeral session refreshed the token! ($(get_remaining_h)h remaining)"
+ refreshed=true
+ break
+ fi
+ done
+
+ tmux kill-session -t "$TMUX_SESSION" 2>/dev/null || true
+
+ if [ "$refreshed" = "false" ]; then
+ log "ERROR: All refresh attempts failed ($(get_remaining_h)h remaining)"
+ log "Manual intervention may be needed: claude auth login"
+ # Continue to sync phase anyway — sync whatever token we have
+ fi
+ fi
+ else
+ log "Token is healthy ($(get_remaining_h)h remaining), no refresh needed"
+ fi
+
+ # ─── Phase 2: Sync token to OpenClaw ────────────────────────────────
+
+ ACCESS_TOKEN=$(python3 -c "import json; print(json.load(open('$CREDS'))['claudeAiOauth']['accessToken'])")
+ REMAINING=$(get_remaining_h)
 
  # Check what's currently in the systemd override
  OLD_TOKEN=""
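Both scripts in this release compute token life the same way: `claudeAiOauth.expiresAt` in the credentials JSON is an epoch timestamp in milliseconds, and the refresh threshold is a fixed number of hours before expiry. A hypothetical standalone Python helper capturing that calculation (function names are illustrative; the 4-hour default mirrors `token_needs_refresh` above):

```python
import json
import time

def remaining_ms(creds_path, now=None):
    """Milliseconds of OAuth token life left, given a credentials file with
    claudeAiOauth.expiresAt in epoch milliseconds. Missing fields count as expired."""
    now = time.time() if now is None else now
    with open(creds_path) as fh:
        creds = json.load(fh)
    expires_at = creds.get("claudeAiOauth", {}).get("expiresAt", 0)
    return int(expires_at - now * 1000)

def needs_refresh(ms, threshold_h=4.0):
    """Mirror of token_needs_refresh: refresh when under threshold_h hours remain."""
    return ms <= threshold_h * 3_600_000
```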
@@ -37,13 +111,11 @@ if [ -f "$OVERRIDE_CONF" ]; then
  fi
 
  if [ "$OLD_TOKEN" = "$ACCESS_TOKEN" ]; then
- echo "[$(date -u +%Y-%m-%dT%H:%M:%SZ)] Token already synced, no changes needed"
+ log "Token already synced (expires in ${REMAINING}h)"
  exit 0
  fi
 
- echo "[$(date -u +%Y-%m-%dT%H:%M:%SZ)] Token mismatch detected, syncing..."
- echo "[$(date -u +%Y-%m-%dT%H:%M:%SZ)] Old: ${OLD_TOKEN:0:20}..."
- echo "[$(date -u +%Y-%m-%dT%H:%M:%SZ)] New: ${ACCESS_TOKEN:0:20}..."
+ log "Token changed, syncing to OpenClaw..."
 
  # 1. Update openclaw.json
  if [ -f "$OPENCLAW_JSON" ]; then
@@ -71,7 +143,7 @@ if grep -q "^ANTHROPIC_API_KEY=" "$OPENCLAW_ENV" 2>/dev/null; then
  else
  echo "ANTHROPIC_API_KEY=$ACCESS_TOKEN" >> "$OPENCLAW_ENV"
  fi
- echo "[sync] Updated .env"
+ log "Updated .env"
 
  # 3. Update systemd override
  NVIDIA_KEY=$(grep "NVIDIA_API_KEY=" "$OVERRIDE_CONF" 2>/dev/null | sed 's/.*NVIDIA_API_KEY=//' || true)
@@ -85,10 +157,10 @@ RestartSec=10
  Environment=NVIDIA_API_KEY=${NVIDIA_KEY}
  Environment=ANTHROPIC_API_KEY=${ACCESS_TOKEN}
  EOF
- echo "[sync] Updated systemd override"
+ log "Updated systemd override"
 
  # 4. Reload and restart gateway
  systemctl --user daemon-reload
  systemctl --user restart openclaw-gateway
 
- echo "[$(date -u +%Y-%m-%dT%H:%M:%SZ)] Gateway restarted with synced token (expires in $REMAINING)"
+ log "Gateway restarted with fresh token (expires in ${REMAINING}h)"
package/scripts/watch-anthropic-token.sh CHANGED
@@ -95,23 +95,123 @@ EOF
  log "Sync complete. Token expires in $expires_in"
  }
 
- # Initial sync on startup
+ # Proactive token refresh — refresh before expiry even if no Claude Code session is running
+ refresh_token_proactively() {
+ if [ ! -f "$CREDS" ]; then return 0; fi
+
+ local remaining_ms
+ remaining_ms=$(python3 -c "
+ import json, time
+ creds = json.load(open('$CREDS'))
+ exp = creds.get('claudeAiOauth',{}).get('expiresAt', 0)
+ print(int(exp - time.time() * 1000))
+ " 2>/dev/null || echo "999999999")
+
+ # Refresh if less than 3 hours remaining (10800000 ms) — gives time for retries before expiry
+ if [ "$remaining_ms" -gt 10800000 ]; then
+ local remaining_h=$(( remaining_ms / 3600000 ))
+ log "Token still valid (${remaining_h}h remaining), no refresh needed"
+ return 0
+ fi
+
+ log "Token expiring/expired (${remaining_ms}ms remaining) — proactively refreshing..."
+
+ # Strategy: use `claude auth status` to trigger Claude Code's built-in
+ # token refresh. This is far more reliable than calling the OAuth endpoint
+ # ourselves (which gets 429 rate-limited every time).
+ # Claude Code manages its own PKCE state, session cookies, etc. — just let it.
+ local MAX_RETRIES=3
+ local attempt=0
+ local refreshed=false
+
+ while [ "$attempt" -lt "$MAX_RETRIES" ]; do
+ attempt=$((attempt + 1))
+ log "Refresh attempt $attempt/$MAX_RETRIES via 'claude auth status'..."
+
+ # claude auth status checks credentials and refreshes if needed
+ # --output json ensures clean non-interactive output
+ local output
+ output=$(claude auth status --output json 2>&1) || true
+
+ # Check if the token was actually refreshed (file mtime changed)
+ local new_remaining_ms
+ new_remaining_ms=$(python3 -c "
+ import json, time
+ creds = json.load(open('$CREDS'))
+ exp = creds.get('claudeAiOauth',{}).get('expiresAt', 0)
+ print(int(exp - time.time() * 1000))
+ " 2>/dev/null || echo "0")
+
+ if [ "$new_remaining_ms" -gt 10800000 ]; then
+ local new_h=$(( new_remaining_ms / 3600000 ))
+ log "Token refreshed successfully (${new_h}h remaining)"
+ refreshed=true
+ break
+ fi
+
+ log "Token still expired after attempt $attempt, waiting 30s..."
+ sleep 30
+ done
+
+ if [ "$refreshed" = "false" ]; then
+ log "ERROR: All $MAX_RETRIES refresh attempts via claude auth failed"
+ log "Token may require manual 'claude auth login' to re-authenticate"
+ fi
+
+ local rc=$?
+ if [ "$rc" -eq 0 ]; then
+ log "Proactive refresh succeeded"
+ # sync_token will fire from the inotifywait detecting the file write,
+ # but also call it directly in case inotifywait misses the self-write
+ sync_token
+ else
+ log "ERROR: Proactive refresh failed (rc=$rc)"
+ fi
+ return 0 # Never let refresh failure kill the watcher loop
+ }
+
+ # Compute inotifywait timeout based on token remaining life.
+ # When token is healthy: check every 30m. Near expiry (<2h): check every 5m.
+ # Already expired: check every 2m (retry window for 429 backoff).
+ get_watch_timeout() {
+ local remaining_ms
+ remaining_ms=$(python3 -c "
+ import json, time
+ creds = json.load(open('$CREDS'))
+ exp = creds.get('claudeAiOauth',{}).get('expiresAt', 0)
+ print(int(exp - time.time() * 1000))
+ " 2>/dev/null || echo "0")
+
+ if [ "$remaining_ms" -le 0 ]; then
+ echo 120 # Expired: retry every 2 minutes
+ elif [ "$remaining_ms" -le 10800000 ]; then
+ echo 180 # <3h remaining: check every 3 minutes
+ else
+ echo 1800 # Healthy: check every 30 minutes
+ fi
+ }
+
+ # Initial sync on startup — also refresh proactively if token is expired/expiring
  log "Starting token watcher..."
- sync_token
+ sync_token || true
+ refresh_token_proactively || true
 
- # Watch for changes to credentials file
- log "Watching $CREDS for changes..."
+ # Watch for changes to credentials file + proactive refresh timer
+ log "Watching $CREDS for changes (with adaptive refresh interval)..."
  while true; do
- # inotifywait blocks until the file is modified, then we sync
- inotifywait -q -e modify -e close_write -e moved_to "$(dirname "$CREDS")" --include "$(basename "$CREDS")" 2>/dev/null || {
- # If inotifywait isn't available, fall back to polling every 30 seconds
- log "WARN: inotifywait not available, falling back to 30s polling"
- while true; do
- sleep 30
- sync_token
- done
- }
- # Small delay to let Claude Code finish writing
- sleep 2
- sync_token
+ timeout=$(get_watch_timeout)
+ # inotifywait returns: 0=event, 1=error, 2=timeout
+ # CRITICAL: use `|| true` to prevent set -e from killing the script on timeout
+ inotifywait -q -t "$timeout" -e modify -e close_write -e moved_to \
+ "$(dirname "$CREDS")" --include "$(basename "$CREDS")" 2>/dev/null || true
+
+ # Always check for proactive refresh on every loop iteration
+ # This handles both timeout and file-change cases
+ refresh_token_proactively || true
+
+ # If file was modified externally (Claude Code session), also sync
+ if [ -f "$CREDS" ]; then
+ sleep 1
+ sync_token || true
+ fi
  done
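The adaptive watch interval added by `get_watch_timeout` (retry every 2 minutes once expired, every 3 minutes under 3 hours remaining, every 30 minutes when healthy) can be sketched as a pure function; thresholds are taken from the diff, the function name is hypothetical:

```python
def watch_timeout(remaining_ms: int) -> int:
    """Adaptive poll interval in seconds, mirroring get_watch_timeout above."""
    if remaining_ms <= 0:
        return 120   # expired: retry every 2 minutes
    if remaining_ms <= 10_800_000:  # under 3 hours remaining
        return 180   # near expiry: check every 3 minutes
    return 1800      # healthy: check every 30 minutes
```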