osborn 0.9.48 → 0.9.50
|
@@ -0,0 +1,247 @@
|
|
|
1
|
+
# Ground Assumptions
|
|
2
|
+
|
|
3
|
+
## SKILL IDENTITY
|
|
4
|
+
Name: ground-assumptions
|
|
5
|
+
Install path: ~/.claude/skills/ground-assumptions/SKILL.md
|
|
6
|
+
Portable: yes — drops into any agent's skills dir (Claude Code, osborn on Fly, other Claude Agent SDK hosts)
|
|
7
|
+
|
|
8
|
+
## WHEN THIS SKILL ACTIVATES
|
|
9
|
+
This skill applies whenever the conversation enters a **planning / design / architecture phase**.
|
|
10
|
+
Specifically, if the user says or asks any of:
|
|
11
|
+
|
|
12
|
+
- "let's plan / design / architect..."
|
|
13
|
+
- "how should we..."
|
|
14
|
+
- "what's the best way to..."
|
|
15
|
+
- "I'm thinking we..."
|
|
16
|
+
- "what do you recommend..."
|
|
17
|
+
- "approach", "architecture", "design", "should we"
|
|
18
|
+
- Any time I am about to recommend an implementation strategy, performance characteristic,
|
|
19
|
+
behavioral guarantee, or comparative judgment ("X is faster than Y", "this propagates", "this scales")
|
|
20
|
+
|
|
21
|
+
Also activates explicitly with:
|
|
22
|
+
- "ground assumptions"
|
|
23
|
+
- "verify before planning"
|
|
24
|
+
- "check that hypothesis"
|
|
25
|
+
|
|
26
|
+
## CORE PRINCIPLE
|
|
27
|
+
When you're **planning new work — a new feature, a new integration, or fitting
|
|
28
|
+
a change into the existing architecture** — every load-bearing assumption is a
|
|
29
|
+
**hypothesis until verified against real evidence.** Training-data intuition and
|
|
30
|
+
"it should work" do not count.
|
|
31
|
+
|
|
32
|
+
The canonical situation this skill is for: *we're about to implement something
|
|
33
|
+
new (e.g. add OpenAI Codex as an agent option alongside Claude Code), and we
|
|
34
|
+
need to know it actually fits our architecture one-to-one — does Codex's session
|
|
35
|
+
model, SDK, and data storage map to what we already do — BEFORE we build on that
|
|
36
|
+
assumption.* That's the shape: a new piece must slot into the existing system,
|
|
37
|
+
and we confirm the fit with evidence, not hope.
|
|
38
|
+
|
|
39
|
+
**What counts as verification** (in priority order):
|
|
40
|
+
1. **Existing tests / previous test runs** — has this already been proven? check first.
|
|
41
|
+
2. **Authoritative documentation / source code** — does the doc or the actual code confirm the behavior?
|
|
42
|
+
3. **A newly-created, targeted test** — if nothing above answers it, build a
|
|
43
|
+
specific test (or several) that exercises exactly the assumption, run it on
|
|
44
|
+
real infrastructure, and read the result.
|
|
45
|
+
|
|
46
|
+
**The main agent does NOT do the verification work itself.** Spawn subagents
|
|
47
|
+
(Agent tool) to run the tests / read the docs / check prior results, so the main
|
|
48
|
+
agent stays free to keep the planning conversation moving and react to results
|
|
49
|
+
as they land. **Delegation rule (hard):** when verification is needed, ALWAYS
|
|
50
|
+
delegate to a subagent — don't call Bash/WebSearch/Grep inline yourself. Spawn
|
|
51
|
+
multiple subagents in parallel when there are several independent assumptions to
|
|
52
|
+
check. (Exception: the user explicitly asks you to run something inline — then
|
|
53
|
+
say you're breaking the delegation pattern.)
|
|
54
|
+
|
|
55
|
+
Why this matters — past sessions shipped plans on unverified assumptions and
|
|
56
|
+
paid for it: a "small" change broke an unrelated subsystem; a behavior we
|
|
57
|
+
*assumed* ("this propagates", "this auto-updates") silently failed; an
|
|
58
|
+
integration we *assumed* was 1:1 wasn't. The expensive surprises live in
|
|
59
|
+
**architectural fit and integration**, not in micro-benchmarks. (Timing claims
|
|
60
|
+
matter too — don't say "fast"/"Xs" without measuring — but they are the *least*
|
|
61
|
+
of it. Lead with "does this fit / does this actually work", not "how fast".)
|
|
62
|
+
|
|
63
|
+
The discipline: **verify against evidence — existing tests, docs, or a new
|
|
64
|
+
delegated test — before the assumption is surfaced as fact or built upon.**
|
|
65
|
+
|
|
66
|
+
## ASSUMPTION PRIORITY ORDER (verify highest first)
|
|
67
|
+
|
|
68
|
+
When verification budget is limited, ALWAYS verify in this order:
|
|
69
|
+
|
|
70
|
+
1. **ARCHITECTURAL IMPACT** — does this change break or alter existing flows / subsystems?
|
|
71
|
+
- "Does the new entrypoint affect the OAuth flow?"
|
|
72
|
+
- "If we move osborn off the volume, does session resume still work?"
|
|
73
|
+
- "Does the bind-mount conflict with Fly's shutdown umount?"
|
|
74
|
+
2. **INTEGRATION** — does this work with all the connected pieces (auth, network, sessions, data, persistence, MCP, recording, etc.)?
|
|
75
|
+
- "Does Claude Code's `setup-token` pty work inside chroot?"
|
|
76
|
+
- "Does the frontend's `/api/sandbox` fetch-log still read the right path?"
|
|
77
|
+
3. **BEHAVIORAL** — does the system actually do what we claim it does?
|
|
78
|
+
- "Does image-swap actually replace the running osborn binary?"
|
|
79
|
+
4. **TIMING** — is the speed claim true under real conditions?
|
|
80
|
+
- "Is the seed tarball really 5s to extract?"
|
|
81
|
+
5. **COSMETIC** — minor polish items that don't gate the architecture.
|
|
82
|
+
|
|
83
|
+
Timing claims are the LOWEST priority. We've burned multiple cycles on timing measurements
|
|
84
|
+
while missing that the architecture itself had subtle bugs that broke other parts of the system.
|
|
85
|
+
Architectural and integration assumptions are where the expensive surprises live.
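A minimal sketch of the ordering rule above, assuming each assumption is tagged with one of the five categories. The names `PRIORITY` and `pick_for_budget` are hypothetical and not part of the skill; the example claims paraphrase the questions listed in this section.

```python
# Categories in the order the skill says to verify them (highest first).
PRIORITY = ["ARCHITECTURAL", "INTEGRATION", "BEHAVIORAL", "TIMING", "COSMETIC"]


def pick_for_budget(assumptions, budget):
    """Return the `budget` highest-priority claims.

    `assumptions` is a list of (category, claim) pairs. sorted() is stable,
    so claims within the same category keep their original order.
    """
    ranked = sorted(assumptions, key=lambda item: PRIORITY.index(item[0]))
    return [claim for _, claim in ranked[:budget]]


items = [
    ("TIMING", "seed tarball extracts in 5s"),
    ("ARCHITECTURAL", "new entrypoint leaves the OAuth flow intact"),
    ("BEHAVIORAL", "image-swap replaces the running binary"),
]
chosen = pick_for_budget(items, 2)
```

With a budget of two, the timing claim is the one that gets dropped, which matches the intent that timing checks never crowd out architectural ones.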
|
|
86
|
+
|
|
87
|
+
## THE WORKFLOW (followed strictly during planning)
|
|
88
|
+
|
|
89
|
+
### 1. PLAN DRAFT
|
|
90
|
+
State the proposed plan as usual — fully, with intent and reasoning.
|
|
91
|
+
|
|
92
|
+
### 2. ASSUMPTION EXTRACTION
|
|
93
|
+
Before presenting the plan as a recommendation, **list every load-bearing assumption**.
|
|
94
|
+
A load-bearing assumption is anything where, if it's wrong, the plan stops working.
|
|
95
|
+
|
|
96
|
+
For each assumption, **also identify its second/third-order implications** —
|
|
97
|
+
what else in the system depends on it being true?
|
|
98
|
+
|
|
99
|
+
Format:
|
|
100
|
+
```
|
|
101
|
+
ASSUMPTIONS (must be verified before plan ships):
|
|
102
|
+
1. <claim that the plan depends on>
|
|
103
|
+
→ implications: <what else breaks if 1 is false>
|
|
104
|
+
2. <claim that the plan depends on>
|
|
105
|
+
→ implications: <what else breaks if 2 is false>
|
|
106
|
+
3. ...
|
|
107
|
+
```
|
|
108
|
+
|
|
109
|
+
If an assumption can't be stated cleanly in one sentence, it isn't ready to be tested.
|
|
110
|
+
Break it down further.
|
|
111
|
+
|
|
112
|
+
**Ripple-effect check** (do this once for every plan that touches existing architecture):
|
|
113
|
+
|
|
114
|
+
Ask explicitly:
|
|
115
|
+
- What existing flows touch the system we're changing?
|
|
116
|
+
(auth, network, sessions, MCP, recording, persistence, log-fetch, dashboard, voice loop)
|
|
117
|
+
- For each connected flow, can the change break it in a non-obvious way?
|
|
118
|
+
- Is there a code path we don't know about that currently works only because of the old behavior?
|
|
119
|
+
|
|
120
|
+
If any connected flow could break, or any such hidden dependency exists, add the affected flow as a new assumption that needs verification.
|
|
121
|
+
This is where the expensive surprises hide.
|
|
122
|
+
|
|
123
|
+
### 3. ASYNC VERIFIER SPAWN (parallel, non-blocking)
|
|
124
|
+
For EACH assumption, spawn an Agent subagent **immediately**, in a SINGLE message
|
|
125
|
+
with multiple Agent tool calls so they run concurrently.
|
|
126
|
+
|
|
127
|
+
Choose verifier type by the nature of the assumption. **Architectural and integration verifiers come first** — they catch the expensive surprises.
|
|
128
|
+
|
|
129
|
+
| Assumption type | Verifier type | What it does |
|
|
130
|
+
|---|---|---|
|
|
131
|
+
| **Architectural impact** (`does X break flow Y?`) | **Ripple agent** | Traces all callers/consumers of the changed component, checks each for breakage |
|
|
132
|
+
| **Integration** (`does X work with subsystem Y?`) | **Integration agent** | Spawns end-to-end test exercising the connection between subsystems |
|
|
133
|
+
| Behavioral (`does X`, `propagates`, `survives Y`) | Test agent | Triggers the behavior, observes outcome |
|
|
134
|
+
| Documented (`API supports X`, `library does Y`) | Research agent | Fetches docs/code/sources, returns citation with quote |
|
|
135
|
+
| Derivable (`X+Y → Z`) | Reasoning agent | Derives from established facts, returns chain |
|
|
136
|
+
| Timing (`Xs`, `fast`, `slow`) | Test agent | Runs the actual operation on real infra, measures under stated conditions |
|
|
137
|
+
| Empirical (`users typically do X`) | Research agent | Cites surveys/data/observations |
|
|
138
|
+
|
|
139
|
+
Each verifier returns one of:
|
|
140
|
+
- **MEASURED**: empirical observation with conditions documented
|
|
141
|
+
- **SOURCED**: cited from authoritative source with quote
|
|
142
|
+
- **DERIVED**: chain from established facts
|
|
143
|
+
- **CONTRADICTED**: evidence that the assumption is false
|
|
144
|
+
- **UNVERIFIABLE**: cannot be determined in available time/resources
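The spawn-in-parallel step and the five verdicts can be modelled roughly as follows. This is only an illustration using `asyncio`: `run_verifier` is a hypothetical stand-in for a delegated subagent call, and its toy decision rule exists purely to make the demo deterministic. The real skill uses the Agent tool, not this code.

```python
import asyncio
from enum import Enum


class Verdict(Enum):
    MEASURED = "MEASURED"
    SOURCED = "SOURCED"
    DERIVED = "DERIVED"
    CONTRADICTED = "CONTRADICTED"
    UNVERIFIABLE = "UNVERIFIABLE"


async def run_verifier(assumption: str) -> tuple[str, Verdict]:
    """Hypothetical stand-in for one delegated subagent run."""
    await asyncio.sleep(0)  # yield control, as a real subagent call would
    # Toy rule for the demo only: "typically" claims have no citable source.
    if "typically" in assumption:
        return assumption, Verdict.UNVERIFIABLE
    return assumption, Verdict.SOURCED


async def verify_all(assumptions: list[str]) -> dict[str, Verdict]:
    # Launch every verifier at once instead of awaiting them one by one.
    results = await asyncio.gather(*(run_verifier(a) for a in assumptions))
    return dict(results)


verdicts = asyncio.run(verify_all([
    "API supports session resume",
    "users typically upload from one host",
]))
```

`asyncio.gather` returns results in the order the verifiers were submitted, so each verdict can be matched back to its assumption number.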
|
|
145
|
+
|
|
146
|
+
### 4. CONTINUE PLANNING (don't block on verifiers)
|
|
147
|
+
Keep talking with the user through design tradeoffs, edge cases, etc.
|
|
148
|
+
**Main agent is NOT in the test loop.** Verifiers run in the background.
|
|
149
|
+
DO NOT commit to a recommendation until verifiers report.
|
|
150
|
+
|
|
151
|
+
**Main agent's role while verifiers run:**
|
|
152
|
+
- Stay in conversation with the user
|
|
153
|
+
- Sketch more of the plan / explore tradeoffs / answer questions
|
|
154
|
+
- Track which verifiers are still in flight, which returned, which contradicted
|
|
155
|
+
- React to verifier results as they arrive — don't poll, don't wait silently
|
|
156
|
+
|
|
157
|
+
**Things the main agent should NOT do while verifiers are in flight:**
|
|
158
|
+
- Run a Bash command that performs the same test (defeats delegation)
|
|
159
|
+
- Read files the verifier is already reading (duplicative)
|
|
160
|
+
- "Just check one quick thing myself" — that's how delegation collapses
|
|
161
|
+
- Block the conversation until results come back
|
|
162
|
+
|
|
163
|
+
### 5. INTEGRATE RESULTS
|
|
164
|
+
When a verifier returns:
|
|
165
|
+
- **MEASURED / SOURCED / DERIVED** → mark assumption ✓, keep going
|
|
166
|
+
- **CONTRADICTED** → STOP, mark assumption ✗, announce: "Assumption N contradicted by <evidence>. Replanning." → restart at step 1 with revised approach
|
|
167
|
+
- **UNVERIFIABLE** → mark ⚠️, ask user: "Cannot verify <assumption>. Proceed with explicit risk, or pivot to a verifiable approach?"
|
|
168
|
+
|
|
169
|
+
### 6. COMMITTED PLAN
|
|
170
|
+
Only present a plan as the recommended approach when every assumption is
|
|
171
|
+
✓ MEASURED, ✓ SOURCED, ✓ DERIVED, or explicitly accepted as ⚠️ UNVERIFIABLE.
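Steps 5 and 6 amount to a small decision rule, sketched here under the assumption that each assumption carries a verdict string and a flag for explicit user acceptance. `Assumption`, `plan_state`, and the state names `REPLAN`, `WAIT`, `ASK_USER`, and `COMMIT` are hypothetical labels for illustration.

```python
from dataclasses import dataclass

VERIFIED = {"MEASURED", "SOURCED", "DERIVED"}


@dataclass
class Assumption:
    claim: str
    verdict: str = "PENDING"      # one of the five verdicts, or PENDING
    risk_accepted: bool = False   # user explicitly accepted an UNVERIFIABLE item


def plan_state(assumptions):
    """Return 'REPLAN', 'WAIT', 'ASK_USER', or 'COMMIT'."""
    # A single contradiction stops everything, even if others are pending.
    if any(a.verdict == "CONTRADICTED" for a in assumptions):
        return "REPLAN"
    if any(a.verdict == "PENDING" for a in assumptions):
        return "WAIT"
    # Commit only when every item is verified or explicitly accepted.
    if all(
        a.verdict in VERIFIED or (a.verdict == "UNVERIFIABLE" and a.risk_accepted)
        for a in assumptions
    ):
        return "COMMIT"
    return "ASK_USER"


plan = [Assumption("A1", "MEASURED"), Assumption("A2", "UNVERIFIABLE")]
state_before = plan_state(plan)   # A2 has not been accepted yet
plan[1].risk_accepted = True
state_after = plan_state(plan)
```

Checking for CONTRADICTED before PENDING reflects the hard rule that a contradiction halts the plan regardless of what other verifiers are still doing.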
|
|
172
|
+
|
|
173
|
+
## OUTPUT FORMAT
|
|
174
|
+
|
|
175
|
+
While verifiers are in flight:
|
|
176
|
+
```
|
|
177
|
+
PLAN: <draft summary>
|
|
178
|
+
|
|
179
|
+
ASSUMPTIONS (verifiers running in parallel):
|
|
180
|
+
☐ A1: <assumption>
|
|
181
|
+
☐ A2: <assumption>
|
|
182
|
+
☐ A3: <assumption>
|
|
183
|
+
```
|
|
184
|
+
|
|
185
|
+
As verifiers return:
|
|
186
|
+
```
|
|
187
|
+
✓ A1 MEASURED: <result> (conditions: <where/when/setup>)
|
|
188
|
+
✓ A2 SOURCED: <URL> — "<quote>"
|
|
189
|
+
✗ A3 CONTRADICTED: <evidence>
|
|
190
|
+
→ STOPPING. Replanning around A3.
|
|
191
|
+
```
|
|
192
|
+
|
|
193
|
+
Final state:
|
|
194
|
+
```
|
|
195
|
+
VERIFIED PLAN:
|
|
196
|
+
<plan with every assumption marked ✓ or explicitly ⚠️>
|
|
197
|
+
```
|
|
198
|
+
|
|
199
|
+
## HARD RULES (no exceptions)
|
|
200
|
+
|
|
201
|
+
1. **No naked "it fits / it works" claims.** Never assert that a new piece integrates with the existing architecture — "Codex maps 1:1 to our session model", "this slots into the existing flow", "the SDK stores data the same way" — without backing from an existing test, the actual docs/source, or a new delegated test.
|
|
202
|
+
2. **No naked behavioral claims.** Never write "auto-updates", "propagates", "survives", "rolls back", "X just works" without MEASURED or SOURCED backing.
|
|
203
|
+
3. **Check for existing evidence FIRST.** Before commissioning a new test, have a subagent check whether a previous test run, doc, or the source already answers it. Don't re-test what's already proven.
|
|
204
|
+
4. **CONTRADICTED stops everything.** When a verifier contradicts an assumption, NO new content is written about the plan until the plan is revised and the verifier rerun.
|
|
205
|
+
5. **UNVERIFIABLE is loud.** Mark it ⚠️ in the output AND ask the user for explicit acceptance. Don't hide unverified parts in prose.
|
|
206
|
+
6. **Training-data intuition is forbidden as evidence.** "X typically works this way" is not a citation. Verify against a test, doc, or source — or skip.
|
|
207
|
+
7. **Timing/comparative claims are the least of it, but still bound:** don't write "fast"/"slow"/"X seconds"/"faster than Y" without a measurement + conditions. Just don't let speed-benchmarking crowd out the architectural-fit and integration checks, which are where the expensive surprises actually live.
|
|
208
|
+
|
|
209
|
+
## SUBAGENT SPAWNING PATTERNS
|
|
210
|
+
|
|
211
|
+
For timing/behavioral tests on real infra:
|
|
212
|
+
> Spawn Agent subagent with prompt: "Run <specific command> on <specific target>. Measure <specific metric>. Report back the measurement and the conditions (machine type, memory, network state, cold/warm cache). Do not attempt the broader task — only verify this one assumption."
|
|
213
|
+
|
|
214
|
+
For documentation lookups:
|
|
215
|
+
> Spawn Agent subagent with prompt: "Find authoritative source for <specific claim>. Return URL + verbatim quote. If multiple sources, prefer official docs > vendor blogs > Stack Overflow. If no source exists, return UNVERIFIABLE with reasoning."
|
|
216
|
+
|
|
217
|
+
For derivation:
|
|
218
|
+
> Spawn Agent subagent with prompt: "Given these established facts: <list>, can we derive <claim>? Return either the derivation chain OR 'cannot derive — gap at: <step>'."
|
|
219
|
+
|
|
220
|
+
For ripple-effect / architectural impact (HIGHEST priority):
|
|
221
|
+
> Spawn Agent subagent with prompt: "Trace all consumers / callers / dependencies of `<component being changed>` in the codebase. For each consumer, check whether the proposed change would break it. Report each potential break with file:line and the specific failure mode. Do not propose fixes — only enumerate breaks."
|
|
222
|
+
|
|
223
|
+
For integration testing (HIGH priority):
|
|
224
|
+
> Spawn Agent subagent with prompt: "End-to-end test: after the proposed change, exercise the connection between `<subsystem A>` and `<subsystem B>` on real infra. Specifically, verify `<concrete cross-system flow>`. Report MEASURED behavior and any divergence from the expected flow."
|
|
225
|
+
|
|
226
|
+
**Pattern**: invoke all Agent tools in a single response message so they run concurrently rather than sequentially. The Agent tool's `subagent_type` should be `general-purpose` or `Explore` (read-only) depending on what the verifier needs.
|
|
227
|
+
|
|
228
|
+
## PORTABILITY NOTES
|
|
229
|
+
|
|
230
|
+
This skill works in any Claude Agent SDK environment because:
|
|
231
|
+
- It only requires the Agent tool (standard SDK feature)
|
|
232
|
+
- The trigger logic is prose, not code
|
|
233
|
+
- No host-specific paths, IDs, or APIs
|
|
234
|
+
|
|
235
|
+
To deploy on another agent (e.g. osborn on a Fly machine), copy this SKILL.md to that agent's skills dir:
|
|
236
|
+
- Claude Code: `~/.claude/skills/ground-assumptions/SKILL.md`
|
|
237
|
+
- osborn on Fly: `/workspace/root-chroot/root/.claude/skills/ground-assumptions/SKILL.md`
|
|
238
|
+
- Other Claude Agent SDK hosts: their configured skills path
|
|
239
|
+
|
|
240
|
+
## EXIT CRITERIA
|
|
241
|
+
|
|
242
|
+
The skill releases its grip on a conversation when:
|
|
243
|
+
- All assumptions are verified and a committed plan exists, OR
|
|
244
|
+
- The user explicitly asks to skip verification ("just give me your best guess"), OR
|
|
245
|
+
- The conversation shifts away from planning to execution of an already-verified plan
|
|
246
|
+
|
|
247
|
+
In the second case, mark the response with "WARNING: Skipping verification at user request. The following is unverified intuition." so the lack of grounding is visible.
|
|
@@ -0,0 +1,229 @@
|
|
|
1
|
+
# Voice Native Sync Skill
|
|
2
|
+
|
|
3
|
+
## SKILL IDENTITY
|
|
4
|
+
Name: voice-native-sync
|
|
5
|
+
Install path: ~/.claude/skills/voice-native-sync/SKILL.md
|
|
6
|
+
|
|
7
|
+
## TRIGGER PHRASES
|
|
8
|
+
This skill activates when the user says any of:
|
|
9
|
+
- "upload to voice native"
|
|
10
|
+
- "download from voice native"
|
|
11
|
+
- "sync to voice native"
|
|
12
|
+
- "push sessions to voice native"
|
|
13
|
+
- "pull sessions from voice native"
|
|
14
|
+
- "sync my sessions"
|
|
15
|
+
|
|
16
|
+
When triggered, follow the instructions below immediately without asking for clarification,
|
|
17
|
+
unless credentials are missing (see CREDENTIALS section).
|
|
18
|
+
|
|
19
|
+
---
|
|
20
|
+
|
|
21
|
+
## SELF-INSTALL
|
|
22
|
+
If the user asks you to install this skill or you fetched this file from a URL,
|
|
23
|
+
save it to: ~/.claude/skills/voice-native-sync/SKILL.md
|
|
24
|
+
|
|
25
|
+
```bash
|
|
26
|
+
mkdir -p ~/.claude/skills/voice-native-sync
|
|
27
|
+
# then write this file's content to ~/.claude/skills/voice-native-sync/SKILL.md
|
|
28
|
+
```
|
|
29
|
+
|
|
30
|
+
After saving, confirm: "Voice Native sync skill installed. Use 'upload to voice native' or 'download from voice native' any time."
|
|
31
|
+
|
|
32
|
+
---
|
|
33
|
+
|
|
34
|
+
## CREDENTIALS
|
|
35
|
+
Credentials are stored at: ~/.claude/skills/voice-native-sync/config
|
|
36
|
+
|
|
37
|
+
Config file format:
|
|
38
|
+
```
|
|
39
|
+
CLOUD_URL=https://osborn-XXXX.fly.dev
|
|
40
|
+
TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
|
|
41
|
+
```
|
|
42
|
+
|
|
43
|
+
If the config file doesn't exist:
|
|
44
|
+
1. Tell the user: "I need your Voice Native sync credentials. Go to voice-native.com → Settings → Copy sync info, then paste it here."
|
|
45
|
+
2. Parse the pasted block for CLOUD_URL (the "Server:" line) and TOKEN (the "Token:" line)
|
|
46
|
+
3. Save to ~/.claude/skills/voice-native-sync/config
|
|
47
|
+
4. Proceed with the requested operation
|
|
48
|
+
|
|
49
|
+
---
|
|
50
|
+
|
|
51
|
+
## UPLOAD (Local → Voice Native Cloud)
|
|
52
|
+
|
|
53
|
+
Uploads all local Claude session files to the Voice Native fly machine.
|
|
54
|
+
Uses chunked upload + finalize. Safe to re-run — mtime-newer-wins per file.
|
|
55
|
+
|
|
56
|
+
### Execute as a single script (one permission prompt):
|
|
57
|
+
|
|
58
|
+
```bash
|
|
59
|
+
set -e
|
|
60
|
+
|
|
61
|
+
# Load credentials
|
|
62
|
+
source ~/.claude/skills/voice-native-sync/config
|
|
63
|
+
|
|
64
|
+
TARGET_PATH="/workspace"
|
|
65
|
+
|
|
66
|
+
rm -f /tmp/vn-sync.tar.gz /tmp/vn-chunk-*
|
|
67
|
+
|
|
68
|
+
# Archive local Claude projects (exclude macOS AppleDouble files)
|
|
69
|
+
tar -czf /tmp/vn-sync.tar.gz \
|
|
70
|
+
$(uname | grep -qi darwin && echo '--exclude=._*') \
|
|
71
|
+
-C "$HOME/.claude" projects
|
|
72
|
+
|
|
73
|
+
echo "archive: $(du -sh /tmp/vn-sync.tar.gz | cut -f1)"
|
|
74
|
+
|
|
75
|
+
# Split into 50MB chunks
|
|
76
|
+
split -b 50m /tmp/vn-sync.tar.gz /tmp/vn-chunk-
|
|
77
|
+
CHUNKS=(/tmp/vn-chunk-*)
|
|
78
|
+
TOTAL=${#CHUNKS[@]}
|
|
79
|
+
echo "chunks: $TOTAL"
|
|
80
|
+
|
|
81
|
+
# Generate upload ID (works on Linux and macOS)
|
|
82
|
+
if command -v uuidgen &>/dev/null; then
|
|
83
|
+
UPLOAD_ID=$(uuidgen | tr '[:upper:]' '[:lower:]')
|
|
84
|
+
else
|
|
85
|
+
UPLOAD_ID=$(cat /proc/sys/kernel/random/uuid 2>/dev/null || python3 -c "import uuid; print(uuid.uuid4())")
|
|
86
|
+
fi
|
|
87
|
+
echo "upload id: $UPLOAD_ID"
|
|
88
|
+
|
|
89
|
+
# Upload chunks
|
|
90
|
+
idx=0
|
|
91
|
+
for chunk in "${CHUNKS[@]}"; do
|
|
92
|
+
echo "uploading chunk $idx / $((TOTAL-1))..."
|
|
93
|
+
STATUS=$(curl -s -o /dev/null -w "%{http_code}" -X POST \
|
|
94
|
+
-H "Authorization: Bearer $TOKEN" \
|
|
95
|
+
-H "Content-Type: application/octet-stream" \
|
|
96
|
+
--data-binary "@${chunk}" \
|
|
97
|
+
"${CLOUD_URL}/sessions/import-chunk?uploadId=${UPLOAD_ID}&chunk=${idx}")
|
|
98
|
+
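echo " chunk $idx → HTTP $STATUS"
# Stop on a non-2xx response so finalize never runs against a partial upload
case "$STATUS" in 2??) ;; *) echo "chunk $idx failed (HTTP $STATUS)" >&2; exit 1 ;; esac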
|
|
99
|
+
idx=$((idx+1))
|
|
100
|
+
done
|
|
101
|
+
|
|
102
|
+
# Finalize — merges chunks and extracts WITHOUT slug remapping.
|
|
103
|
+
# IMPORTANT: do NOT pass `targetWorkDir`. The server-side remap collapses every
|
|
104
|
+
# source slug into the target work dir's slug, which causes session-resume to
|
|
105
|
+
# silently break when sessions are uploaded from different hosts (Mac, Codespace,
|
|
106
|
+
# Sprite) — they all end up in -workspace, the JSONLs internally still reference
|
|
107
|
+
# their original cwd, the slug↔cwd no longer match, and Claude Code's resume
|
|
108
|
+
# can't find the file. Confirmed 2026-05-27: a codespace upload remapped
|
|
109
|
+
# -workspaces-codespaces-blank → -workspace and every codespace session went
|
|
110
|
+
# silent on resume. The fix is to preserve each upload's original slug structure.
|
|
111
|
+
echo "finalizing..."
|
|
112
|
+
RESULT=$(curl -s -X POST \
|
|
113
|
+
-H "Authorization: Bearer $TOKEN" \
|
|
114
|
+
"${CLOUD_URL}/sessions/import-finalize?uploadId=${UPLOAD_ID}&total=${TOTAL}")
|
|
115
|
+
echo "finalize result: $RESULT"
|
|
116
|
+
|
|
117
|
+
# Cleanup
|
|
118
|
+
rm -f /tmp/vn-sync.tar.gz /tmp/vn-chunk-*
|
|
119
|
+
|
|
120
|
+
# Verify
|
|
121
|
+
echo "verifying manifest..."
|
|
122
|
+
curl -s -H "Authorization: Bearer $TOKEN" "${CLOUD_URL}/sessions/manifest" | \
|
|
123
|
+
python3 -c "
|
|
124
|
+
import json,sys
|
|
125
|
+
d=json.load(sys.stdin)
|
|
126
|
+
slugs=d.get('slugs',{})
|
|
127
|
+
total=sum(len(v.get('files',{})) for v in slugs.values())
|
|
128
|
+
print(f' cloud now has {len(slugs)} slug(s), {total} total files')
|
|
129
|
+
for slug,info in slugs.items():
|
|
130
|
+
files=info.get('files',{})
|
|
131
|
+
print(f' {slug}: {len(files)} files')
|
|
132
|
+
"
|
|
133
|
+
```
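For readers who want the chunk-then-finalize flow without the shell details, here is a rough sketch that only builds the request plan and performs no network calls. The endpoint paths and query parameters are the ones used in the script above; `split_chunks`, `upload_plan`, and the `example.invalid` host are hypothetical.

```python
from urllib.parse import urlencode


def split_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Cut `data` into consecutive pieces of at most `chunk_size` bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def upload_plan(cloud_url: str, upload_id: str, data: bytes, chunk_size: int):
    """Return ([(chunk_url, chunk_bytes), ...], finalize_url)."""
    chunks = split_chunks(data, chunk_size)
    steps = [
        (
            f"{cloud_url}/sessions/import-chunk?"
            + urlencode({"uploadId": upload_id, "chunk": index}),
            chunk,
        )
        for index, chunk in enumerate(chunks)
    ]
    # Finalize is told how many chunks to expect; no targetWorkDir is sent.
    finalize = (
        f"{cloud_url}/sessions/import-finalize?"
        + urlencode({"uploadId": upload_id, "total": len(chunks)})
    )
    return steps, finalize


steps, finalize = upload_plan("https://example.invalid", "abc", b"0123456789", 4)
```

Chunk indices start at 0 and `total` is the chunk count, mirroring `idx` and `TOTAL` in the shell script.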
|
|
134
|
+
|
|
135
|
+
---
|
|
136
|
+
|
|
137
|
+
## DOWNLOAD (Voice Native Cloud → Local)
|
|
138
|
+
|
|
139
|
+
Downloads all sessions from the Voice Native fly machine and merges into local ~/.claude/projects/.
|
|
140
|
+
Mtime-newer-wins — local files newer than cloud are preserved.
|
|
141
|
+
|
|
142
|
+
### Execute as a single script:
|
|
143
|
+
|
|
144
|
+
```bash
|
|
145
|
+
set -e
|
|
146
|
+
|
|
147
|
+
# Load credentials
|
|
148
|
+
source ~/.claude/skills/voice-native-sync/config
|
|
149
|
+
|
|
150
|
+
# Get local working directory for slug remapping
|
|
151
|
+
LOCAL_CWD="$(pwd)"
|
|
152
|
+
echo "local target cwd: $LOCAL_CWD"
|
|
153
|
+
|
|
154
|
+
rm -f /tmp/vn-download.tar.gz
|
|
155
|
+
|
|
156
|
+
# Download full export from fly machine
|
|
157
|
+
echo "downloading from $CLOUD_URL..."
|
|
158
|
+
curl -f -L \
|
|
159
|
+
-H "Authorization: Bearer $TOKEN" \
|
|
160
|
+
"${CLOUD_URL}/sessions/export" \
|
|
161
|
+
-o /tmp/vn-download.tar.gz
|
|
162
|
+
echo "downloaded: $(du -sh /tmp/vn-download.tar.gz | cut -f1)"
|
|
163
|
+
|
|
164
|
+
# Import with slug remapping to local cwd
|
|
165
|
+
echo "importing..."
|
|
166
|
+
# Same fix as upload: no targetWorkDir, preserve original slug structure.
|
|
167
|
+
RESULT=$(curl -s -X POST \
|
|
168
|
+
-H "Authorization: Bearer $TOKEN" \
|
|
169
|
+
-H "Content-Type: application/octet-stream" \
|
|
170
|
+
--data-binary "@/tmp/vn-download.tar.gz" \
|
|
171
|
+
"${CLOUD_URL}/sessions/import")
|
|
172
|
+
echo "import result: $RESULT"
|
|
173
|
+
|
|
174
|
+
rm -f /tmp/vn-download.tar.gz
|
|
175
|
+
|
|
176
|
+
echo "done — sessions merged into ~/.claude/projects/"
|
|
177
|
+
```
|
|
178
|
+
|
|
179
|
+
Correction: the script above does not perform a download. DOWNLOAD means pulling from the cloud to THIS local machine.
|
|
180
|
+
The import endpoint runs on the cloud. For download to local, use this instead:
|
|
181
|
+
|
|
182
|
+
```bash
|
|
183
|
+
set -e
|
|
184
|
+
source ~/.claude/skills/voice-native-sync/config
|
|
185
|
+
|
|
186
|
+
LOCAL_CWD="$(pwd)"
|
|
187
|
+
rm -f /tmp/vn-download.tar.gz
|
|
188
|
+
|
|
189
|
+
echo "downloading export from $CLOUD_URL..."
|
|
190
|
+
curl -f -H "Authorization: Bearer $TOKEN" \
|
|
191
|
+
"${CLOUD_URL}/sessions/export" \
|
|
192
|
+
-o /tmp/vn-download.tar.gz
|
|
193
|
+
echo "downloaded: $(du -sh /tmp/vn-download.tar.gz | cut -f1)"
|
|
194
|
+
|
|
195
|
+
# Extract archive
|
|
196
|
+
mkdir -p /tmp/vn-extract
|
|
197
|
+
tar -xzf /tmp/vn-download.tar.gz -C /tmp/vn-extract
|
|
198
|
+
|
|
199
|
+
# Remap and merge into local ~/.claude/projects/
|
|
200
|
+
LOCAL_SLUG=$(echo "$LOCAL_CWD" | sed 's|/|-|g')
|
|
201
|
+
PROJECTS_DIR="$HOME/.claude/projects"
|
|
202
|
+
mkdir -p "${PROJECTS_DIR}/${LOCAL_SLUG}"
|
|
203
|
+
|
|
204
|
+
echo "merging into ${PROJECTS_DIR}/${LOCAL_SLUG}..."
|
|
205
|
+
for slug_dir in /tmp/vn-extract/projects/*/; do
|
|
206
|
+
slug=$(basename "$slug_dir")
|
|
207
|
+
# Unmatched globs stay literal; the -f test below skips them
for f in "${slug_dir}"*.jsonl "${slug_dir}"*.jsonl.*; do
|
|
208
|
+
[ -f "$f" ] || continue
|
|
209
|
+
fname=$(basename "$f")
|
|
210
|
+
dest="${PROJECTS_DIR}/${LOCAL_SLUG}/${fname}"
|
|
211
|
+
if [ ! -f "$dest" ] || [ "$f" -nt "$dest" ]; then
|
|
212
|
+
cp "$f" "$dest"
|
|
213
|
+
echo " wrote $fname"
|
|
214
|
+
fi
|
|
215
|
+
done
|
|
216
|
+
done
|
|
217
|
+
|
|
218
|
+
rm -rf /tmp/vn-download.tar.gz /tmp/vn-extract
|
|
219
|
+
echo "done — sessions available at ${PROJECTS_DIR}/${LOCAL_SLUG}/"
|
|
220
|
+
```
|
|
221
|
+
|
|
222
|
+
---
|
|
223
|
+
|
|
224
|
+
## TECHNICAL NOTES
|
|
225
|
+
- Cloud target path is always `/workspace` (Fly.io machines)
|
|
226
|
+
- Slugs are preserved on upload: finalize is called without `targetWorkDir`, so no source slug is remapped to the /workspace slug
|
|
227
|
+
- Mtime-newer-wins: re-syncing is always safe, newer file wins per-file
|
|
228
|
+
- gzip only — never use zstd (server doesn't support it)
|
|
229
|
+
- macOS: always pass `--exclude='._*'` to tar (BSD tar emits AppleDouble files)
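The mtime-newer-wins rule from the notes above can be illustrated with a small local merge. This is a sketch, not the server's implementation: `merge_newer` is a hypothetical helper, and it uses `shutil.copy2` (which keeps the source mtime), whereas the shell script uses plain `cp`.

```python
import os
import shutil
import tempfile
from pathlib import Path


def merge_newer(src_dir: Path, dest_dir: Path) -> list[str]:
    """Copy *.jsonl files that are missing or newer; return the names written."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(src_dir.glob("*.jsonl")):
        dest = dest_dir / src.name
        if not dest.exists() or src.stat().st_mtime > dest.stat().st_mtime:
            shutil.copy2(src, dest)  # copy2 preserves the source mtime
            written.append(src.name)
    return written


with tempfile.TemporaryDirectory() as tmp:
    cloud = Path(tmp, "cloud")
    local = Path(tmp, "local")
    cloud.mkdir()
    local.mkdir()
    (cloud / "a.jsonl").write_text("cloud a")
    (cloud / "b.jsonl").write_text("cloud b")
    (local / "b.jsonl").write_text("local b")
    os.utime(cloud / "b.jsonl", (1_000, 1_000))  # cloud copy is older
    os.utime(local / "b.jsonl", (2_000, 2_000))  # local copy is newer
    written = merge_newer(cloud, local)
    kept = (local / "b.jsonl").read_text()
```

Here the missing file is copied and the newer local file is left untouched, which is what makes re-syncing safe.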
|
package/Dockerfile.sandbox
CHANGED
|
@@ -31,7 +31,7 @@
|
|
|
31
31
|
# - Path layout: chroot /workspace/root-chroot/root/... vs /workspace/home/...
|
|
32
32
|
# Subagent research confirmed chroot in our codebase solves only HOME-and-persistence
|
|
33
33
|
# layout, NOT library access (Linux dynamic linker is mount-agnostic per ld.so(8)).
|
|
34
|
-
# Since HOME=/workspace
|
|
34
|
+
# Since HOME=/workspace achieves the same persistence with ~100 fewer LOC and
|
|
35
35
|
# no bind-mount complexity, the chroot version was retired.
|
|
36
36
|
# Archive: docs/archive/Dockerfile.sandbox.chroot-2026-05-28.md
|
|
37
37
|
|
|
@@ -63,14 +63,45 @@ ENV OSBORN_API_PORT=8741
|
|
|
63
63
|
ENV NODE_ENV=production
|
|
64
64
|
ENV OSBORN_IMAGE_VERSION=${OSBORN_VERSION}
|
|
65
65
|
|
|
66
|
-
#
|
|
67
|
-
#
|
|
68
|
-
#
|
|
69
|
-
#
|
|
70
|
-
#
|
|
71
|
-
|
|
66
|
+
# HOME=/workspace — the volume mount point itself is the home directory.
|
|
67
|
+
# This means ~/.claude resolves to /workspace/.claude, which is EXACTLY where
|
|
68
|
+
# the legacy symlink architecture (/root/.claude -> /workspace/.claude) already
|
|
69
|
+
# put credentials and sessions. So there is ZERO migration: an existing machine
|
|
70
|
+
# updating to this image finds its data already at ~/.claude with no file moves.
|
|
71
|
+
#
|
|
72
|
+
# Why not /workspace/home (the earlier Option D choice)? That required MOVING
|
|
73
|
+
# legacy data from /workspace/.claude into /workspace/home/.claude — an `mv`
|
|
74
|
+
# loop that is destructive (deletes source as it goes), non-atomic across
|
|
75
|
+
# multiple files (interruption = split state), and catastrophic if HOME ever
|
|
76
|
+
# resolved off-volume (mv would send data to the ephemeral overlay). Pointing
|
|
77
|
+
# HOME at /workspace eliminates the migration entirely: nothing moves, so
|
|
78
|
+
# nothing can be lost in a move. The only cosmetic cost is dotfiles sitting at
|
|
79
|
+
# the volume root — identical to what the legacy symlink effectively did.
|
|
80
|
+
#
|
|
81
|
+
# NOTE on overrides: these are Dockerfile ENV *defaults*. A Fly machine-config
|
|
82
|
+
# `env.HOME` (or app secret) OVERRIDES them at runtime. updateOsborn strips
|
|
83
|
+
# HOME/OSBORN_CWD from existing machine configs during image-swap so this
|
|
84
|
+
# default actually takes effect on migrated machines — without that, a stale
|
|
85
|
+
# HOME=/root from an older provisioning would silently win. See
|
|
86
|
+
# frontend/src/lib/machines.ts updateOsbornImpl.
|
|
87
|
+
ENV HOME=/workspace
|
|
72
88
|
ENV OSBORN_CWD=/workspace
|
|
73
89
|
|
|
90
|
+
# HYBRID: user-installed global npm packages persist on the volume.
|
|
91
|
+
# osborn itself was already installed above into the DEFAULT prefix (/usr/local,
|
|
92
|
+
# image layer) — that RUN happened BEFORE this ENV, so osborn stays in the image
|
|
93
|
+
# and updates via image-swap (atomic, no runtime OOM, toolchain present at build).
|
|
94
|
+
# Setting NPM_CONFIG_PREFIX here only affects RUNTIME `npm install -g <x>` the
|
|
95
|
+
# user/agent runs: those land in /workspace/.npm-global on the persistent volume
|
|
96
|
+
# and survive restarts + image-swaps. PATH puts that bin dir first so installed
|
|
97
|
+
# CLIs are immediately runnable. Verified end-to-end on real Fly 2026-06-01:
|
|
98
|
+
# pure-JS (cowsay) AND native-compiled (node-pty via node-gyp) both install at
|
|
99
|
+
# runtime, persist across restart, no OOM (toolchain is in the image).
|
|
100
|
+
# Caveat: native user modules are tied to the image's Node ABI (currently 22) —
|
|
101
|
+
# a future Node-major image bump would need an `npm rebuild` of volume globals.
|
|
102
|
+
ENV NPM_CONFIG_PREFIX=/workspace/.npm-global
|
|
103
|
+
ENV PATH=/workspace/.npm-global/bin:$PATH
|
|
104
|
+
|
|
74
105
|
WORKDIR /workspace
|
|
75
106
|
EXPOSE 8741
|
|
76
107
|
|
|
@@ -95,34 +126,23 @@ exec > >(tee -a "$LOGFILE") 2>&1
|
|
|
95
126
|
ONBOARDING_JSON='{"numStartups":10,"installMethod":"npm","autoUpdates":false,"hasCompletedOnboarding":true,"hasTrustDialogAccepted":true,"hasTrustDialogHooksAccepted":true,"hasCompletedProjectOnboarding":true,"hasAcknowledgedCostThreshold":true,"effortCalloutV2Dismissed":true,"theme":"dark","projects":{"/workspace":{"hasTrustDialogAccepted":true,"hasTrustDialogHooksAccepted":true,"hasCompletedProjectOnboarding":true}}}'
|
|
96
127
|
|
|
97
128
|
# ============================================================
|
|
98
|
-
# === HOME-on-volume
|
|
129
|
+
# === HOME-on-volume setup (HOME=/workspace) ===
|
|
99
130
|
# ============================================================
|
|
100
|
-
# HOME=/workspace
|
|
101
|
-
#
|
|
102
|
-
|
|
131
|
+
# HOME=/workspace, set via ENV in Dockerfile (and enforced by updateOsborn
|
|
132
|
+
# stripping any stale HOME from existing machine configs). Because HOME is the
|
|
133
|
+
# volume mount itself, ~/.claude == /workspace/.claude — which is exactly where
|
|
134
|
+
# the legacy symlink architecture already stored credentials + sessions.
|
|
135
|
+
#
|
|
136
|
+
# THEREFORE: NO MIGRATION. An existing machine's data is already at ~/.claude
|
|
137
|
+
# the instant HOME points at /workspace. We removed the old `mv` migration
|
|
138
|
+
# block entirely (it was destructive + non-atomic + catastrophic if HOME ever
|
|
139
|
+
# resolved off-volume). Nothing moves, so nothing can be lost in a move.
|
|
140
|
+
echo "[sandbox-d] HOME=$HOME OSBORN_CWD=$OSBORN_CWD NPM_CONFIG_PREFIX=$NPM_CONFIG_PREFIX"
|
|
103
141
|
mkdir -p "$HOME" "$HOME/.claude" "$HOME/.osborn"
|
|
104
|
-
|
|
105
|
-
#
|
|
106
|
-
#
|
|
107
|
-
|
|
108
|
-
# on first boot of the D image. Atomic mv — safe.
|
|
109
|
-
if [ -d /workspace/.claude ] && [ ! -d "$HOME/.claude/projects" ] && [ ! -f "$HOME/.claude/.credentials.json" ]; then
|
|
110
|
-
echo "[sandbox-d] migrating legacy /workspace/.claude → \$HOME/.claude"
|
|
111
|
-
# Move CONTENTS, not the dir itself (target may already exist with seeded skills)
|
|
112
|
-
for item in /workspace/.claude/.* /workspace/.claude/*; do
|
|
113
|
-
[ -e "$item" ] || continue
|
|
114
|
-
BASENAME=$(basename "$item")
|
|
115
|
-
[ "$BASENAME" = "." ] && continue
|
|
116
|
-
[ "$BASENAME" = ".." ] && continue
|
|
117
|
-
[ -e "$HOME/.claude/$BASENAME" ] && continue
|
|
118
|
-
mv "$item" "$HOME/.claude/$BASENAME" 2>/dev/null || true
|
|
119
|
-
done
|
|
120
|
-
rmdir /workspace/.claude 2>/dev/null || true
|
|
121
|
-
fi
|
|
122
|
-
if [ -f /workspace/.claude.json ] && [ ! -f "$HOME/.claude.json" ]; then
|
|
123
|
-
echo "[sandbox-d] migrating legacy /workspace/.claude.json → \$HOME/.claude.json"
|
|
124
|
-
mv /workspace/.claude.json "$HOME/.claude.json"
|
|
125
|
-
fi
|
|
142
|
+
# HYBRID: ensure the volume-backed npm global prefix exists so user
|
|
143
|
+
# `npm install -g <x>` has a target on first use (npm would create it anyway,
|
|
144
|
+
# but pre-making it keeps perms predictable + visible in the boot log).
|
|
145
|
+
mkdir -p /workspace/.npm-global
|
|
126
146
|
|
|
127
147
|
# Onboarding config (overwrites every boot — intentional, deterministic state)
|
|
128
148
|
echo "$ONBOARDING_JSON" > "$HOME/.claude.json"
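The override note in the Dockerfile comments (image `ENV` defaults lose to machine-config `env`, so stale keys must be stripped during image-swap) can be modelled as a dictionary merge. This is only a model of the described behavior: `effective_env` and `strip_stale` are hypothetical names and are not taken from `updateOsbornImpl`.

```python
# Defaults baked into the image via ENV (values taken from the Dockerfile diff).
IMAGE_DEFAULTS = {"HOME": "/workspace", "OSBORN_CWD": "/workspace"}

# Keys the comments say are stripped from existing machine configs on image-swap.
STRIPPED_ON_SWAP = {"HOME", "OSBORN_CWD"}


def effective_env(image_defaults: dict, machine_env: dict) -> dict:
    """Machine-config values win over image defaults at runtime."""
    return {**image_defaults, **machine_env}


def strip_stale(machine_env: dict, keys=STRIPPED_ON_SWAP) -> dict:
    """Drop the keys that would otherwise shadow the new image defaults."""
    return {k: v for k, v in machine_env.items() if k not in keys}


stale = {"HOME": "/root", "OSBORN_API_PORT": "8741"}
before = effective_env(IMAGE_DEFAULTS, stale)["HOME"]
after = effective_env(IMAGE_DEFAULTS, strip_stale(stale))["HOME"]
```

Without the strip step the stale `/root` value wins; with it, the image default takes effect while unrelated keys are left alone.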
|
package/dist/index.js
CHANGED
|
@@ -2527,6 +2527,30 @@ async function main() {
|
|
|
2527
2527
|
currentLLM = null;
|
|
2528
2528
|
clearFastBrainSession();
|
|
2529
2529
|
clearPipelineFastBrainSession();
|
|
2530
|
+
// ── Ghost-agent fix (2026-06-01) ──
|
|
2531
|
+
// When LiveKit Cloud evicts our WebSocket (idle, network blip, or quota window),
|
|
2532
|
+
// the previous code stopped here — agent process kept running but no longer in
|
|
2533
|
+
// any room. /health continued returning "livekit.status:connected" because the
|
|
2534
|
+
// status was never written back. Frontend's checkOsbornHealth only validates
|
|
2535
|
+
// HTTP 200, so the ghost state was invisible. Users got stuck in "Connecting..."
|
|
2536
|
+
// forever because their LiveKit-token-minted room had no agent in it.
|
|
2537
|
+
//
|
|
2538
|
+
// Fix: re-arm the retry loop. connectWithRetry() will try to reconnect with
|
|
2539
|
+
// the same room name (so the room code stays stable for any in-flight frontend
|
|
2540
|
+
// token requests), backing off 5s → 60s. If the disconnect was permanent
|
|
2541
|
+
// (e.g. JWT expired — they're 24h), the retry will fail and surface
|
|
2542
|
+
// livekit.status=failed, which the (also-fixed) frontend health check will
|
|
2543
|
+
// see and trigger restartService.
|
|
2544
|
+
//
|
|
2545
|
+
// Note: we mark status='retrying' immediately so /health reflects the real
|
|
2546
|
+
// state — closing the lie window between Disconnected and the next attempt.
|
|
2547
|
+
livekitState.status = 'retrying';
|
|
2548
|
+
livekitState.error = 'LiveKit room disconnected; attempting to rejoin';
|
|
2549
|
+
livekitState.errorCode = 'disconnected';
|
|
2550
|
+
console.log('🔄 Rejoining LiveKit room after disconnect...');
|
|
2551
|
+
connectWithRetry().catch(err => {
|
|
2552
|
+
console.error('❌ Reconnect attempt threw (should not happen — connectWithRetry loops):', err);
|
|
2553
|
+
});
|
|
2530
2554
|
});
|
|
2531
2555
|
room.on(RoomEvent.ParticipantConnected, async (participant) => {
|
|
2532
2556
|
console.log(`\n👤 User joined: ${participant.identity}`);
|
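The ghost-agent fix described in the comment above (mark the status as retrying at once, then rejoin with a backoff from 5s to 60s) might look roughly like this in outline. The doubling schedule, the `max_attempts` cutoff, and all function names here are assumptions for illustration; the diff does not show the body of `connectWithRetry`.

```python
import asyncio


def backoff_delays(first=5.0, cap=60.0, factor=2.0):
    """Yield delays that grow by `factor` and never exceed `cap` (assumed schedule)."""
    delay = first
    while True:
        yield delay
        delay = min(delay * factor, cap)


async def connect_with_retry(connect, state, sleep=asyncio.sleep, max_attempts=None):
    # Flip the status first so a health check never reports a stale "connected".
    state["status"] = "retrying"
    attempts = 0
    for delay in backoff_delays():
        attempts += 1
        try:
            await connect()
        except Exception as exc:
            state["error"] = str(exc)
            if max_attempts is not None and attempts >= max_attempts:
                state["status"] = "failed"
                return False
            await sleep(delay)
        else:
            state["status"] = "connected"
            state["error"] = None
            return True


async def _demo():
    calls = {"n": 0}
    waited = []

    async def flaky_connect():
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("room unavailable")

    async def fake_sleep(seconds):
        waited.append(seconds)  # record instead of actually sleeping

    state = {"status": "connected", "error": None}
    ok = await connect_with_retry(flaky_connect, state, sleep=fake_sleep)
    return ok, state, waited


ok, state, waited = asyncio.run(_demo())
```

Injecting `sleep` keeps the demo instant and makes the delay sequence observable without waiting in real time.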