@archetypeai/ds-cli 0.3.9 → 0.3.10
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/files/AGENTS.md +19 -3
- package/files/CLAUDE.md +21 -3
- package/files/rules/accessibility.md +49 -0
- package/files/rules/frontend-architecture.md +77 -0
- package/files/skills/apply-ds/SKILL.md +92 -80
- package/files/skills/apply-ds/scripts/audit.sh +169 -0
- package/files/skills/apply-ds/scripts/setup.sh +48 -166
- package/files/skills/create-dashboard/SKILL.md +12 -0
- package/files/skills/embedding-from-file/SKILL.md +415 -0
- package/files/skills/embedding-from-sensor/SKILL.md +406 -0
- package/files/skills/embedding-upload/SKILL.md +414 -0
- package/files/skills/fix-accessibility/SKILL.md +57 -9
- package/files/skills/newton-activity-monitor-lens-on-video/SKILL.md +817 -0
- package/files/skills/newton-camera-frame-analysis/SKILL.md +611 -0
- package/files/skills/newton-camera-frame-analysis/scripts/activity-monitor-frame.py +165 -0
- package/files/skills/newton-camera-frame-analysis/scripts/captures/logs/api_responses_20260206_105610.json +62 -0
- package/files/skills/newton-camera-frame-analysis/scripts/continuous_monitor.py +119 -0
- package/files/skills/newton-direct-query/SKILL.md +212 -0
- package/files/skills/newton-direct-query/scripts/direct_query.py +129 -0
- package/files/skills/newton-machine-state-from-file/SKILL.md +545 -0
- package/files/skills/newton-machine-state-from-sensor/SKILL.md +707 -0
- package/files/skills/newton-machine-state-upload/SKILL.md +986 -0
- package/package.json +1 -1
@@ -0,0 +1,817 @@
---
name: newton-activity-monitor-lens-on-video
description: Analyze uploaded video files using Newton's activity monitor lens. Server reads the video via video_file_reader and streams results via SSE. Use for batch video analysis, activity detection, or temporal event extraction from video files. NOT for live webcam — use /camera-frame-analysis for that.
---

# Newton Activity Monitor Lens on Video (Upload Video → Server-Side Processing → SSE)

Upload a video file to the Archetype AI platform and analyze it server-side using Newton's activity monitor lens. The server reads the video via `video_file_reader` and streams inference results back via SSE.

**This skill is for UPLOADED VIDEO FILES only.** For live webcam analysis, use `/camera-frame-analysis` instead.

| | This skill (activity-monitor-lens-on-video) | camera-frame-analysis |
|---|---|---|
| **Input** | Uploaded video file (mp4, etc.) | Live webcam (base64 JPEG frames) |
| **Who reads frames** | Server (`video_file_reader`) | Client (Python cv2 / JS canvas) |
| **Event type** | Server-driven processing | `model.query` (request/response) |
| **Response** | Async via SSE stream (`inference.result`) | Direct in POST response |
| **Use case** | Batch video analysis, post-hoc review | Real-time webcam Q&A |

---

## Step 0: Choose an implementation approach

Before writing any code, ask the user which approach they want:

| Approach | Best for | API key security | Requires |
|----------|----------|-------------------|----------|
| **Server Proxy (SvelteKit)** | SvelteKit apps, demos with MediaRecorder | Private (server-side only) | SvelteKit server routes, `ffmpeg-static` for MediaRecorder video conversion |
| **Plain JS (client-side)** | Quick prototypes, static sites | Exposed in client | Direct `fetch` calls, pre-recorded MP4 files only |
| **Python** | Backend scripts, batch processing, CLI tools | Private (server-side) | `archetypeai` Python package |

### Key considerations

- **MediaRecorder video (webcam recordings):** Chrome's `MediaRecorder` produces fragmented MP4 or WebM that the Archetype `video_file_reader` **cannot parse**. You MUST convert to proper MP4 (with a moov atom) before uploading. The Server Proxy approach handles this automatically with `ffmpeg-static`. Plain JS does NOT support this — use pre-recorded MP4 files only.
- **CORS:** The Archetype AI API does not allow direct browser requests. The Server Proxy approach avoids this by routing all API calls through SvelteKit server routes. Plain JS will only work where CORS is not an issue (e.g., server-side Node.js scripts).
- **API key:** The Server Proxy approach keeps the API key in private server env vars. Plain JS exposes it to the browser.
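
The "proper MP4 (with a moov atom)" requirement can be sanity-checked before upload. A minimal sketch, assuming the standard ISO BMFF layout (4-byte big-endian size plus 4-byte type per top-level box); this helper is illustrative and not part of the skill:

```python
import struct

def top_level_boxes(data: bytes) -> list[str]:
    """List the top-level MP4 (ISO BMFF) box types in a byte buffer."""
    boxes, offset = [], 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack(">I4s", data[offset:offset + 8])
        if size < 8:  # malformed or extended-size box; stop scanning conservatively
            break
        boxes.append(box_type.decode("ascii", errors="replace"))
        offset += size
    return boxes

def looks_like_proper_mp4(data: bytes) -> bool:
    """Fragmented MediaRecorder output typically lacks a top-level moov atom."""
    return "moov" in top_level_boxes(data)
```

If this check fails on a recording, route the file through the ffmpeg conversion in the Server Proxy approach rather than uploading it directly.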

---

## Default Model Parameters

| Parameter | Default | Description |
|---|---|---|
| `model_version` | `Newton::c2_4_7b_251215a172f6d7` | Newton model ID |
| `instruction` | *(your activity description prompt)* | What the model should output |
| `focus` | *(what to look for)* | Focus of analysis |
| `temporal_focus` | `5` | Temporal window in seconds |
| `max_new_tokens` | `512` | Max response length |
| `camera_buffer_size` | `5` | Frames to buffer before processing |
| `camera_buffer_step_size` | `5` | Step size for frame sampling |
| `memory_prompt_buffer_size` | `0` | Prior prompts to retain (0 = no memory) |
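
As a dictionary, these defaults correspond to the `model_parameters` block used in the lens configs later in this skill (the `instruction` and `focus` strings here are the example prompts from those configs, not fixed defaults):

```python
# Default model_parameters block for the activity monitor lens.
DEFAULT_MODEL_PARAMETERS = {
    "model_version": "Newton::c2_4_7b_251215a172f6d7",
    "instruction": "Describe the activity currently being performed in one concise sentence.",
    "focus": "Focus on the main person and their current action.",
    "temporal_focus": 5,                # temporal window in seconds
    "max_new_tokens": 512,              # max response length
    "camera_buffer_size": 5,            # frames buffered before processing
    "camera_buffer_step_size": 5,       # step size for frame sampling
    "memory_prompt_buffer_size": 0,     # 0 = no memory of prior prompts
}
```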

---

## API Reference

| Operation | Method | Endpoint | Body |
|-----------|--------|----------|------|
| Upload file | POST | `/files` | `FormData` |
| List lenses | GET | `/lens/metadata` | — |
| Register lens | POST | `/lens/register` | `{ lens_config: config }` |
| Delete lens | POST | `/lens/delete` | `{ lens_id }` |
| Create session | POST | `/lens/sessions/create` | `{ lens_id }` |
| Process event | POST | `/lens/sessions/events/process` | `{ session_id, event }` |
| Destroy session | POST | `/lens/sessions/destroy` | `{ session_id }` |
| SSE consumer | GET | `/lens/sessions/consumer/{sessionId}` | — |
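
The call ordering matters as much as the endpoints themselves. A sketch that assembles the `(method, url, body)` tuples for one analysis run from the table above (paths and bodies from the table; no HTTP is actually made, so this is purely illustrative):

```python
API_ENDPOINT = "https://api.u1.archetypeai.app/v0.5"

def session_lifecycle_requests(lens_config: dict, lens_id: str, session_id: str):
    """Ordered (method, url, body) calls for one analysis run."""
    return [
        ("POST", f"{API_ENDPOINT}/lens/register", {"lens_config": lens_config}),
        ("POST", f"{API_ENDPOINT}/lens/sessions/create", {"lens_id": lens_id}),
        ("POST", f"{API_ENDPOINT}/lens/delete", {"lens_id": lens_id}),
        ("POST", f"{API_ENDPOINT}/lens/sessions/events/process",
         {"session_id": session_id, "event": {"type": "session.status"}}),
        ("GET", f"{API_ENDPOINT}/lens/sessions/consumer/{session_id}", None),
        ("POST", f"{API_ENDPOINT}/lens/sessions/destroy", {"session_id": session_id}),
    ]
```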

---

## Frontend Architecture

Decompose the UI into components rather than building a monolithic `+page.svelte`. See `@rules/frontend-architecture` for full conventions.

### Recommended decomposition

| UI Area | Component | Composes (DS primitives) | Key Props |
|---------|-----------|--------------------------|-----------|
| Video input | `VideoUpload.svelte` (or reuse VideoPlayer pattern) | Card, Button | `onupload`, `status`, `videoUrl` |
| Status | `ProcessingStatus.svelte` | Card, Badge, Spinner | `status`, `message` |
| Summary | `AnalysisSummary.svelte` | Card | `title`, `summary` |
| Log | Reuse ExpandableLog pattern | Collapsible, Badge, Item | `title`, `logs[]`, `count` |

### Layout and API logic

- Use `@skills/create-dashboard` as a layout starting point for dashboard-style UIs
- Extract SSE consumer, session management, and upload logic into `$lib/api/activity-monitor.js`
- `+page.svelte` orchestrates flow state (status, sessionId) and passes data to components via props

### What NOT to include unless requested

- Charts (SensorChart, ScatterChart) — only if the user asks for time-series visualization
- Complex data tables — only if the user needs tabular result inspection

---

## Approach A: Server Proxy (SvelteKit)

Routes all API calls through SvelteKit server routes. Converts MediaRecorder video to proper MP4 using `ffmpeg-static`. The API key stays private on the server.

### Prerequisites

```bash
npm install ffmpeg-static
```

Add to `.env`:
```
ATAI_API_KEY=your-api-key
ATAI_API_ENDPOINT=https://api.u1.archetypeai.app/v0.5
```

### Server Route 1: Upload (`src/routes/api/upload/+server.js`)

Accepts a video blob, converts it to proper MP4 via ffmpeg, then uploads it to the Archetype API.

```javascript
import { json, error } from '@sveltejs/kit';
import { ATAI_API_KEY, ATAI_API_ENDPOINT } from '$env/static/private';
import { writeFile, unlink, readFile } from 'node:fs/promises';
import { join } from 'node:path';
import { tmpdir } from 'node:os';
import { spawn } from 'node:child_process';
import ffmpegPath from 'ffmpeg-static';

function convertToMp4(inputPath, outputPath) {
  return new Promise((resolve, reject) => {
    const args = [
      '-i', inputPath,
      '-c:v', 'libx264', '-preset', 'ultrafast',
      '-movflags', '+faststart',
      '-an', '-y', outputPath
    ];
    const proc = spawn(ffmpegPath, args, { stdio: ['ignore', 'pipe', 'pipe'] });
    let stderr = '';
    proc.stderr.on('data', (d) => (stderr += d.toString()));
    proc.on('close', (code) => {
      if (code === 0) resolve();
      else reject(new Error(`ffmpeg exited with code ${code}: ${stderr.slice(-500)}`));
    });
    proc.on('error', reject);
  });
}

export async function POST({ request }) {
  const formData = await request.formData();
  const file = formData.get('file');
  if (!file) throw error(400, 'No file provided');

  const timestamp = Date.now();
  const inputPath = join(tmpdir(), `upload_input_${timestamp}`);
  const outputPath = join(tmpdir(), `upload_output_${timestamp}.mp4`);

  try {
    const buffer = Buffer.from(await file.arrayBuffer());
    await writeFile(inputPath, buffer);
    await convertToMp4(inputPath, outputPath);

    const convertedBuffer = await readFile(outputPath);
    const outputName = file.name.replace(/\.\w+$/, '.mp4');
    const convertedFile = new File([convertedBuffer], outputName, { type: 'video/mp4' });

    const uploadForm = new FormData();
    uploadForm.append('file', convertedFile);

    const response = await fetch(`${ATAI_API_ENDPOINT}/files`, {
      method: 'POST',
      headers: { Authorization: `Bearer ${ATAI_API_KEY}` },
      body: uploadForm
    });

    if (!response.ok) {
      const errorBody = await response.json().catch(() => ({}));
      throw error(response.status, `Upload failed: ${JSON.stringify(errorBody)}`);
    }

    return json(await response.json());
  } finally {
    await unlink(inputPath).catch(() => {});
    await unlink(outputPath).catch(() => {});
  }
}
```

### Server Route 2: Activity Monitor (`src/routes/api/activity-monitor/+server.js`)

Handles stale lens cleanup, lens registration, session creation, and session-ready polling — all server-side.

```javascript
import { json, error } from '@sveltejs/kit';
import { ATAI_API_KEY, ATAI_API_ENDPOINT } from '$env/static/private';

async function apiGet(path) {
  const response = await fetch(`${ATAI_API_ENDPOINT}${path}`, {
    headers: { Authorization: `Bearer ${ATAI_API_KEY}` }
  });
  if (!response.ok) throw new Error(`API GET ${path} failed: ${response.status}`);
  return response.json();
}

async function apiPost(path, body, timeoutMs = 10000) {
  const controller = new AbortController();
  const timeoutId = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const response = await fetch(`${ATAI_API_ENDPOINT}${path}`, {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${ATAI_API_KEY}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify(body),
      signal: controller.signal
    });
    if (!response.ok) {
      const errorBody = await response.json().catch(() => ({}));
      throw new Error(`API POST ${path} failed: ${response.status} - ${JSON.stringify(errorBody)}`);
    }
    return response.json();
  } finally {
    clearTimeout(timeoutId);
  }
}

async function waitForSessionReady(sessionId, maxWaitMs = 60000) {
  const start = Date.now();
  while (Date.now() - start < maxWaitMs) {
    const status = await apiPost(
      '/lens/sessions/events/process',
      { session_id: sessionId, event: { type: 'session.status' } },
      10000
    );
    if (status.session_status === 'LensSessionStatus.SESSION_STATUS_RUNNING' ||
        status.session_status === '3') return true;
    if (status.session_status === 'LensSessionStatus.SESSION_STATUS_FAILED' ||
        status.session_status === '6') return false;
    await new Promise((r) => setTimeout(r, 500));
  }
  return false;
}

// POST: Create activity monitor session
export async function POST({ request }) {
  const { lensConfig } = await request.json();

  // Clean up ALL existing lenses to avoid stale state blocking new registrations
  try {
    const existing = await apiGet('/lens/metadata');
    if (Array.isArray(existing)) {
      for (const lens of existing) {
        await apiPost('/lens/delete', { lens_id: lens.lens_id }).catch(() => {});
      }
    }
  } catch { /* ignore cleanup errors */ }

  const registered = await apiPost('/lens/register', { lens_config: lensConfig });
  const lensId = registered.lens_id;

  const session = await apiPost('/lens/sessions/create', { lens_id: lensId });
  const sessionId = session.session_id;

  // The lens definition is no longer needed once the session exists
  await apiPost('/lens/delete', { lens_id: lensId });

  const isReady = await waitForSessionReady(sessionId);
  if (!isReady) throw error(500, 'Session failed to start');

  return json({ sessionId });
}

// DELETE: Destroy a session
export async function DELETE({ request }) {
  const { sessionId } = await request.json();
  await apiPost('/lens/sessions/destroy', { session_id: sessionId });
  return json({ ok: true });
}
```

### Server Route 3: SSE Stream Proxy (`src/routes/api/activity-monitor/stream/[sessionId]/+server.js`)

```javascript
import { ATAI_API_KEY, ATAI_API_ENDPOINT } from '$env/static/private';

export async function GET({ params }) {
  const { sessionId } = params;

  const upstream = await fetch(
    `${ATAI_API_ENDPOINT}/lens/sessions/consumer/${sessionId}`,
    { headers: { Authorization: `Bearer ${ATAI_API_KEY}` } }
  );

  if (!upstream.ok) {
    return new Response('Failed to connect to SSE stream', { status: upstream.status });
  }

  return new Response(upstream.body, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      Connection: 'keep-alive'
    }
  });
}
```

### Server Route 4: Direct Query Proxy (`src/routes/api/query/+server.js`)

```javascript
import { json, error } from '@sveltejs/kit';
import { ATAI_API_KEY, ATAI_API_ENDPOINT } from '$env/static/private';

export async function POST({ request }) {
  const { query, systemPrompt = '', maxNewTokens = 1024 } = await request.json();

  const response = await fetch(`${ATAI_API_ENDPOINT}/query`, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${ATAI_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      query,
      system_prompt: systemPrompt,
      instruction_prompt: systemPrompt,
      file_ids: [],
      model: 'Newton::c2_4_7b_251215a172f6d7',
      max_new_tokens: maxNewTokens,
      sanitize: false
    }),
    signal: AbortSignal.timeout(120000)
  });

  if (!response.ok) {
    const errorBody = await response.json().catch(() => ({}));
    throw error(response.status, `Query failed: ${JSON.stringify(errorBody)}`);
  }

  const data = await response.json();
  let text = '';
  const r = data?.response;
  if (r?.response && Array.isArray(r.response)) text = r.response[0] || '';
  else if (Array.isArray(r)) text = r[0] || '';
  else if (typeof r === 'string') text = r;
  else if (data?.text) text = data.text;
  else text = JSON.stringify(data);

  return json({ text });
}
```
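
The response unwrapping at the end of Route 4 handles several payload shapes with a fallback chain. The same logic as a standalone sketch (the shapes are taken from the route above; any payload variant not listed there is outside this helper's scope):

```python
import json

def extract_text(data: dict) -> str:
    """Normalize a /query payload to plain text, mirroring Route 4's fallbacks."""
    r = data.get("response")
    if isinstance(r, dict) and isinstance(r.get("response"), list):
        return r["response"][0] if r["response"] else ""
    if isinstance(r, list):
        return r[0] if r else ""
    if isinstance(r, str):
        return r
    if data.get("text"):
        return data["text"]
    return json.dumps(data)  # last resort: return the raw payload as a string
```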

### Client-side usage (Svelte / JS)

The client calls the local server routes — no API key or endpoint needed:

```javascript
// 1. Upload video (server converts to proper MP4 automatically)
async function uploadVideo(blob) {
  const formData = new FormData();
  const ext = blob.type.includes('mp4') ? 'mp4' : 'webm';
  const timestamp = Date.now();
  formData.append('file', new File([blob], `recording_${timestamp}.${ext}`, { type: blob.type }));

  const res = await fetch('/api/upload', { method: 'POST', body: formData });
  if (!res.ok) throw new Error(`Upload failed: ${res.status}`);
  const result = await res.json();
  return result.file_id;
}

// 2. Create activity monitor session (server handles lens cleanup, registration, session polling)
async function createSession(lensConfig) {
  const res = await fetch('/api/activity-monitor', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ lensConfig })
  });
  if (!res.ok) throw new Error(`Session creation failed: ${res.status}`);
  const { sessionId } = await res.json();
  return sessionId;
}

// 3. Consume SSE results (proxied through server)
async function consumeSSE(sessionId, onResult) {
  const res = await fetch(`/api/activity-monitor/stream/${sessionId}`);
  if (!res.ok) throw new Error(`SSE stream failed: ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop();

    for (const line of lines) {
      if (!line.startsWith('data: ')) continue;
      try {
        const parsed = JSON.parse(line.slice(6));
        if (parsed.type === 'inference.result') {
          const text = parsed.event_data.response[0];
          const ts = parsed.event_data.query_metadata?.sensor_timestamp;
          onResult({ text, timestamp: ts });
        }
        if (parsed.type === 'sse.stream.end') return;
      } catch { /* skip malformed JSON */ }
    }
  }
}

// 4. Destroy session
async function destroySession(sessionId) {
  await fetch('/api/activity-monitor', {
    method: 'DELETE',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ sessionId })
  });
}

// 5. Direct query (for post-processing)
async function directQuery(systemPrompt, query) {
  const res = await fetch('/api/query', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query, systemPrompt })
  });
  if (!res.ok) throw new Error(`Query failed: ${res.status}`);
  return res.json(); // { text: "..." }
}
```
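
The incremental `data:` parsing inside `consumeSSE` (append the chunk, split on newlines, carry the trailing partial line to the next read) is easy to get subtly wrong, so it helps to isolate it. A sketch of the same logic, framed so it can be unit-tested without a network:

```python
import json

def parse_sse_chunk(buffer: str, chunk: str):
    """Feed one network chunk; return (new_buffer, decoded_events).

    Mirrors consumeSSE: only complete lines starting with 'data: ' are parsed;
    the possibly-incomplete last line is carried over in the buffer."""
    buffer += chunk
    lines = buffer.split("\n")
    buffer = lines.pop()  # trailing partial line, completed by the next chunk
    events = []
    for line in lines:
        if not line.startswith("data: "):
            continue
        try:
            events.append(json.loads(line[6:]))
        except json.JSONDecodeError:
            pass  # skip malformed JSON, as the JS version does
    return buffer, events
```

A JSON payload split across two chunks is decoded only once the newline arrives, which is exactly the behavior the `buffer = lines.pop()` line in `consumeSSE` provides.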

### Server Proxy Lifecycle Summary

```
Client                         SvelteKit Server                   Archetype AI API
──────                         ───────────────                    ─────────────────
POST /api/upload           →   ffmpeg convert →                   POST /files → returns file_id
POST /api/activity-monitor →   GET /lens/metadata (cleanup) →
                               POST /lens/register →              returns lens_id
                               POST /lens/sessions/create →       returns session_id
                               POST /lens/delete →
                               poll session.status →              SESSION_STATUS_RUNNING
  ← { sessionId }
GET /api/.../stream/:id    →   GET /lens/sessions/consumer/ →     SSE events proxied
  ← inference.result events
DELETE /api/activity-monitor → POST /lens/sessions/destroy →
```

---

## Approach B: Plain JS (client-side)

Direct `fetch` calls from the browser or Node.js. **Only works with pre-recorded MP4 files** (not MediaRecorder blobs). The API key is exposed to the client. May hit CORS issues in the browser.

### Helpers: API wrappers

```typescript
const API_ENDPOINT = 'https://api.u1.archetypeai.app/v0.5'

async function apiGet<T>(path: string, apiKey: string): Promise<T> {
  const response = await fetch(`${API_ENDPOINT}${path}`, {
    method: 'GET',
    headers: { Authorization: `Bearer ${apiKey}` },
  })
  if (!response.ok) throw new Error(`API GET ${path} failed: ${response.status}`)
  return response.json()
}

async function apiPost<T>(path: string, apiKey: string, body: unknown, timeoutMs = 5000): Promise<T> {
  const controller = new AbortController()
  const timeoutId = setTimeout(() => controller.abort(), timeoutMs)

  try {
    const response = await fetch(`${API_ENDPOINT}${path}`, {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
      signal: controller.signal,
    })

    if (!response.ok) {
      const errorBody = await response.json().catch(() => ({}))
      throw new Error(`API POST ${path} failed: ${response.status} - ${JSON.stringify(errorBody)}`)
    }

    return response.json()
  } finally {
    clearTimeout(timeoutId)
  }
}
```

### Step 1: Upload the video file

```typescript
const formData = new FormData()
formData.append('file', videoFile) // File object from <input type="file">

const uploadResponse = await fetch(`${API_ENDPOINT}/files`, {
  method: 'POST',
  headers: { Authorization: `Bearer ${apiKey}` },
  body: formData,
})
const uploadResult = await uploadResponse.json()
const fileId = uploadResult.file_id
```

### Step 2: Register the lens (cleaning up stale lenses first)

```typescript
const LENS_NAME = 'activity-monitor-video-lens'

const lensConfig = {
  lens_name: LENS_NAME,
  lens_config: {
    model_pipeline: [
      { processor_name: 'lens_camera_processor', processor_config: {} },
    ],
    model_parameters: {
      model_version: 'Newton::c2_4_7b_251215a172f6d7',
      instruction: 'Describe the activity currently being performed in one concise sentence.',
      focus: 'Focus on the main person and their current action.',
      temporal_focus: 5,
      max_new_tokens: 512,
      camera_buffer_size: 5,
      camera_buffer_step_size: 5,
      memory_prompt_buffer_size: 0,
    },
    input_streams: [
      {
        stream_type: 'video_file_reader',
        stream_config: { file_id: fileId },
      },
    ],
    output_streams: [
      { stream_type: 'server_sent_events_writer' },
    ],
  },
}

// Delete ALL existing lenses to avoid stale state blocking new registrations
const existingLenses = await apiGet<Array<{ lens_id: string; lens_name: string }>>(
  '/lens/metadata', apiKey
)
if (Array.isArray(existingLenses)) {
  for (const lens of existingLenses) {
    await apiPost('/lens/delete', apiKey, { lens_id: lens.lens_id }).catch(() => {})
  }
}

// Register fresh lens
const registeredLens = await apiPost<{ lens_id: string }>(
  '/lens/register', apiKey, { lens_config: lensConfig }
)
const lensId = registeredLens.lens_id
```

### Step 3: Create session and wait for ready

```typescript
const session = await apiPost<{ session_id: string }>(
  '/lens/sessions/create', apiKey, { lens_id: lensId }
)
const sessionId = session.session_id

// Optionally delete the lens definition (the session keeps running independently)
await apiPost('/lens/delete', apiKey, { lens_id: lensId })

// Wait for the session to be ready
async function waitForSessionReady(sessionId: string, maxWaitMs = 60000): Promise<boolean> {
  const start = Date.now()
  while (Date.now() - start < maxWaitMs) {
    const status = await apiPost<{ session_status: string }>(
      '/lens/sessions/events/process', apiKey,
      { session_id: sessionId, event: { type: 'session.status' } },
      10000
    )
    if (status.session_status === 'LensSessionStatus.SESSION_STATUS_RUNNING' ||
        status.session_status === '3') return true
    if (status.session_status === 'LensSessionStatus.SESSION_STATUS_FAILED' ||
        status.session_status === '6') return false
    await new Promise(r => setTimeout(r, 500))
  }
  return false
}

const isReady = await waitForSessionReady(sessionId)
if (!isReady) throw new Error('Session failed to initialize')
```
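
The readiness check accepts either the enum string or its numeric code ("3" running, "6" failed; any other value means keep polling). That three-way mapping can be captured as a small predicate, with the values taken directly from the polling code above:

```python
RUNNING = {"LensSessionStatus.SESSION_STATUS_RUNNING", "3"}
FAILED = {"LensSessionStatus.SESSION_STATUS_FAILED", "6"}

def session_ready_state(session_status: str):
    """Return True (ready), False (failed), or None (keep polling)."""
    if session_status in RUNNING:
        return True
    if session_status in FAILED:
        return False
    return None
```

Keeping the mapping in one place avoids duplicating the string/number comparisons across polling loops.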

### Step 4: Consume SSE results

```typescript
import { fetchEventSource } from '@microsoft/fetch-event-source'

const results = []
const abortController = new AbortController()

fetchEventSource(`${API_ENDPOINT}/lens/sessions/consumer/${sessionId}`, {
  headers: { Authorization: `Bearer ${apiKey}` },
  signal: abortController.signal,
  onmessage(event) {
    const parsed = JSON.parse(event.data)

    if (parsed.type === 'inference.result') {
      const response = parsed.event_data.response
      const metadata = parsed.event_data.query_metadata
      const text = Array.isArray(response) ? response[0] : response
      const timestamp = metadata?.sensor_timestamp ?? 'N/A'

      results.push({ timestamp, frameIds: metadata?.frame_ids ?? [], response: text })
      console.log(`[${timestamp}] ${text}`)
    }

    if (parsed.type === 'sse.stream.end') {
      console.log(`Complete. ${results.length} results collected.`)
      abortController.abort()
    }
  },
})
```

### Step 5: Cleanup

```typescript
abortController.abort()
await apiPost('/lens/sessions/destroy', apiKey, { session_id: sessionId })
```

### Plain JS Lifecycle Summary

```
1. Upload video          -> POST /files (FormData)
2. List existing lenses  -> GET /lens/metadata
3. Delete stale lenses   -> POST /lens/delete { lens_id } (for each existing lens)
4. Register lens         -> POST /lens/register { lens_config with video_file_reader }
5. Create session        -> POST /lens/sessions/create { lens_id }
6. Wait for ready        -> POST /lens/sessions/events/process (poll session.status)
7. Consume SSE results   -> GET /lens/sessions/consumer/{sessionId}
8. Destroy session       -> POST /lens/sessions/destroy { session_id }
```
|
|
646
|
+
|
|
647
|
+
---
|
|
648
|
+
|
|
649
|
+
## Approach C: Python
|
|
650
|
+
|
|
651
|
+
### Requirements
|
|
652
|
+
|
|
653
|
+
- `archetypeai` Python package
|
|
654
|
+
- Environment variables: `ATAI_API_KEY`, optionally `ATAI_API_ENDPOINT`
|
|
655
|
+
|
|
656
|
+
### Full Python Example
|
|
657
|
+
|
|
658
|
+
```python
import os
import time
from archetypeai.api_client import ArchetypeAI

api_key = os.getenv("ATAI_API_KEY")
api_endpoint = os.getenv("ATAI_API_ENDPOINT", ArchetypeAI.get_default_endpoint())
client = ArchetypeAI(api_key, api_endpoint=api_endpoint)

# 1. Upload the video file
upload_resp = client.files.local.upload("my_video.mp4")
file_id = upload_resp["file_id"]
print(f"Uploaded: {file_id}")

# 2. Lens config with video_file_reader input stream
lens_config = {
    "lens_name": "activity-monitor-video-lens",
    "lens_config": {
        "model_pipeline": [
            {"processor_name": "lens_camera_processor", "processor_config": {}}
        ],
        "model_parameters": {
            "model_version": "Newton::c2_4_7b_251215a172f6d7",
            "instruction": "Describe the activity currently being performed in one concise sentence.",
            "focus": "Focus on the main person and their current action.",
            "temporal_focus": 5,
            "max_new_tokens": 512,
            "camera_buffer_size": 5,
            "camera_buffer_step_size": 5,
            "memory_prompt_buffer_size": 0,
        },
        "input_streams": [
            {
                "stream_type": "video_file_reader",
                "stream_config": {"file_id": file_id},
            }
        ],
        "output_streams": [
            {"stream_type": "server_sent_events_writer"}
        ],
    },
}

# 3. Register lens
lens = client.lens.register(lens_config)
lens_id = lens["lens_id"]

# 4. Create session
session = client.lens.sessions.create(lens_id)
session_id = session["session_id"]

# Optionally delete the lens (session runs independently)
client.lens.delete(lens_id)

# 5. Wait for session ready
for _ in range(60):
    try:
        status = client.lens.sessions.process_event(
            session_id, {"type": "session.status"}
        )
        if status.get("session_status") in ["3", "LensSessionStatus.SESSION_STATUS_RUNNING"]:
            break
    except Exception:
        pass
    time.sleep(0.5)

# 6. Consume SSE results (server drives video processing)
sse_reader = client.lens.sessions.create_sse_consumer(
    session_id, max_read_time_sec=600
)

results = []
for event in sse_reader.read(block=True):
    if isinstance(event, dict) and event.get("type") == "inference.result":
        ed = event.get("event_data", {})
        response = ed.get("response", "")
        meta = ed.get("query_metadata", {})

        text = response[0] if isinstance(response, list) else response

        results.append({
            "timestamp": meta.get("sensor_timestamp", "N/A"),
            "frame_ids": meta.get("frame_ids", []),
            "response": text,
        })
        print(f"[{meta.get('sensor_timestamp', '?')}] {text}")

sse_reader.close()
print(f"Done. {len(results)} results collected.")

# 7. Cleanup
client.lens.sessions.destroy(session_id)
```
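
The script above destroys the session only on the happy path; if an exception is raised while consuming SSE results, the session leaks. A small context manager can guarantee cleanup. This is a sketch assuming the same `client` object as above; `lens_session` is an illustrative name, not part of the SDK:

```python
from contextlib import contextmanager

@contextmanager
def lens_session(client, lens_id):
    """Create a lens session and guarantee it is destroyed on exit."""
    session = client.lens.sessions.create(lens_id)
    session_id = session["session_id"]
    try:
        yield session_id
    finally:
        # Runs even if SSE consumption raises
        client.lens.sessions.destroy(session_id)
```

With this in place, steps 4-7 become `with lens_session(client, lens_id) as session_id: ...`, and the destroy call in step 7 is no longer needed.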

---

## SSE Event Types

| Event type | Description |
|---|---|
| `inference.result` | A model response with `response[]` text and `query_metadata` |
| `sse.stream.end` | The session has finished processing the entire video |

### `inference.result` payload

```json
{
  "type": "inference.result",
  "event_data": {
    "response": ["Person is stirring a pot on the stove."],
    "query_id": "qry-abc123",
    "query_metadata": {
      "sensor_timestamp": "00:02:15",
      "frame_ids": [270, 275, 280, 285, 290, 295, 300, 305]
    }
  }
}
```

- `sensor_timestamp`: position in the video (HH:MM:SS)
- `frame_ids`: the specific video frames the model analyzed
- `response`: array of model output strings (typically one element)

---

## Lens Configuration Reference

### Model Parameters

| Parameter | Description | Default |
|---|---|---|
| `model_version` | Newton model ID | `Newton::c2_4_7b_251215a172f6d7` |
| `instruction` | Main prompt guiding output format | *(user-defined)* |
| `focus` | What to look for in the video | *(user-defined)* |
| `temporal_focus` | Temporal window in seconds. Small (<5s) = fine-grained, large (>5s) = macro | `5` |
| `camera_buffer_size` | Number of frames to buffer before processing | `5` |
| `camera_buffer_step_size` | Step size for frame sampling from the buffer | `5` |
| `memory_prompt_buffer_size` | How many prior prompts to retain for context (0 = stateless) | `0` |
| `max_new_tokens` | Max tokens for model inference output | `512` |
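
As a worked example, a coarser event-level pass might widen the temporal window while staying stateless. The values below are illustrative, not recommended defaults:

```json
{
  "model_version": "Newton::c2_4_7b_251215a172f6d7",
  "instruction": "Summarize the activity over the last 30 seconds in one sentence.",
  "temporal_focus": 30,
  "camera_buffer_size": 30,
  "camera_buffer_step_size": 30,
  "memory_prompt_buffer_size": 0,
  "max_new_tokens": 512
}
```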

### Input Stream Types

**Pre-recorded video file (this skill):**
```json
{ "stream_type": "video_file_reader", "stream_config": { "file_id": "<uploaded_file_id>" } }
```

**Live RTSP camera stream:**
```json
{ "stream_type": "rtsp_video_reader", "stream_config": { "rtsp_url": "rtsp://example.com:554/stream", "target_image_size": [360, 640], "target_frame_rate_hz": 1.0 } }
```

## Troubleshooting

- **"Input stream is unhealthy!" / "Failed to load video"**: The uploaded video is likely a fragmented MP4 (as produced by MediaRecorder) or a WebM file. The `video_file_reader` requires a standard MP4 with a `moov` atom. Use the Server Proxy approach, which converts automatically via ffmpeg, or convert manually before uploading.
- **"Input stream is unhealthy!" (stale lens)**: Delete **ALL** existing lenses before registering a new one (not just name matches). Stale lenses from previous sessions can block new registrations even under different names. All approaches above include aggressive lens cleanup: `GET /lens/metadata`, then delete every lens.
- **CORS errors in browser**: The Archetype AI API does not support direct browser CORS. Use the Server Proxy approach, or run from Node.js/Python server-side.
- **No SSE events**: Ensure the session reached `SESSION_STATUS_RUNNING` before connecting the SSE consumer. All approaches above include session-ready polling.
- **Session fails**: Check that the `file_id` in the `video_file_reader` config matches the uploaded file.
- **Video too long**: Use `max_run_time_sec` or adjust `temporal_focus` for faster processing.
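
For the fragmented-MP4 failure above, a file can be checked before uploading by scanning its top-level MP4 boxes for a `moov` atom (and for `moof`, which marks fragmented files). This is an illustrative sketch based on the standard MP4 box layout, not part of the Archetype AI SDK:

```python
import struct

def top_level_boxes(path: str) -> list[str]:
    """Return the top-level MP4 box (atom) types in a file, in order."""
    boxes = []
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if len(header) < 8:
                break
            size, box_type = struct.unpack(">I4s", header)
            boxes.append(box_type.decode("ascii", errors="replace"))
            if size == 1:  # 64-bit extended size follows the header
                size = struct.unpack(">Q", f.read(8))[0] - 8
            elif size == 0:  # box runs to the end of the file
                break
            f.seek(size - 8, 1)  # skip the box body
    return boxes

def looks_like_seekable_mp4(path: str) -> bool:
    """True if the file has a top-level moov atom and no moof fragments."""
    boxes = top_level_boxes(path)
    return "moov" in boxes and "moof" not in boxes
```

If `looks_like_seekable_mp4` returns `False`, remux the file (for example with ffmpeg) before uploading it via `POST /files`.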
|