@archetypeai/ds-cli 0.3.7 → 0.3.10
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +25 -67
- package/commands/create.js +5 -27
- package/commands/init.js +5 -27
- package/files/AGENTS.md +19 -3
- package/files/CLAUDE.md +21 -3
- package/files/rules/accessibility.md +49 -0
- package/files/rules/frontend-architecture.md +77 -0
- package/files/skills/apply-ds/SKILL.md +92 -80
- package/files/skills/apply-ds/scripts/audit.sh +169 -0
- package/files/skills/apply-ds/scripts/setup.sh +48 -166
- package/files/skills/create-dashboard/SKILL.md +12 -0
- package/files/skills/embedding-from-file/SKILL.md +415 -0
- package/files/skills/embedding-from-sensor/SKILL.md +406 -0
- package/files/skills/embedding-upload/SKILL.md +414 -0
- package/files/skills/fix-accessibility/SKILL.md +57 -9
- package/files/skills/newton-activity-monitor-lens-on-video/SKILL.md +817 -0
- package/files/skills/newton-camera-frame-analysis/SKILL.md +611 -0
- package/files/skills/newton-camera-frame-analysis/scripts/activity-monitor-frame.py +165 -0
- package/files/skills/newton-camera-frame-analysis/scripts/captures/logs/api_responses_20260206_105610.json +62 -0
- package/files/skills/newton-camera-frame-analysis/scripts/continuous_monitor.py +119 -0
- package/files/skills/newton-direct-query/SKILL.md +212 -0
- package/files/skills/newton-direct-query/scripts/direct_query.py +129 -0
- package/files/skills/newton-machine-state-from-file/SKILL.md +545 -0
- package/files/skills/newton-machine-state-from-sensor/SKILL.md +707 -0
- package/files/skills/newton-machine-state-upload/SKILL.md +986 -0
- package/lib/add-ds-ui-svelte.js +5 -2
- package/lib/scaffold-ds-svelte-project.js +25 -18
- package/package.json +13 -2
@@ -0,0 +1,611 @@
---
name: newton-camera-frame-analysis
description: Live webcam frame analysis using Newton's vision model via model.query (request/response). Captures frames from a webcam as base64 JPEG and sends them to Newton. Use for live camera analysis, scene description, presence detection, or visual Q&A. NOT for video file uploads — use /activity-monitor-lens-on-video for that.
argument-hint: [question] [camera_index]
allowed-tools: Bash(python *), Read
---

# Newton Camera Frame Analysis (Live Webcam → base64 → model.query)

Capture live webcam frames, encode as base64 JPEG, and send to Newton's vision model via `model.query` (synchronous request/response). Supports Python (OpenCV) and JavaScript (getUserMedia + canvas).

> **Frontend architecture:** When building a web UI for this skill, decompose into components (webcam input, status display, results view) rather than a monolithic page. Extract API logic into `$lib/api/`. See `@rules/frontend-architecture` for conventions and `@skills/create-dashboard` / `@skills/build-pattern` for layout and component patterns.

**This skill is for LIVE WEBCAM input only.** For analyzing uploaded video files, use `/activity-monitor-lens-on-video` instead.

| | This skill (camera-frame-analysis) | activity-monitor-lens-on-video |
|---|---|---|
| **Input** | Live webcam (base64 JPEG frames) | Uploaded video file |
| **Who captures frames** | Client (Python cv2 / JS canvas) | Server (`video_file_reader`) |
| **Event type** | `model.query` (request/response) | Server-driven, results via SSE |
| **Response** | Direct in POST response | Async via SSE stream |
| **Use case** | Real-time webcam Q&A | Batch video analysis |

---

## Model Parameters

| Parameter | Default | Notes |
|---|---|---|
| `model_version` | `Newton::c2_4_7b_251215a172f6d7` | Newton model ID |
| `template_name` | `image_qa_template_task` | Prompt template |
| `instruction` | *(user-provided)* | System prompt guiding output format |
| `focus` | *(user-provided)* | The question or what to look for |
| `max_new_tokens` | `512` | Max response length |
| `camera_buffer_size` | `1` | Single-frame buffer for webcam |
| `min_replicas` / `max_replicas` | `1` / `1` | Scaling config |

**IMPORTANT:** `instruction` and `focus` must be passed as parameters — not hardcoded. The values in the lens config (registration) and in each `model.query` event must be consistent. Pass the user's values into both.
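Since the same `instruction` and `focus` must appear in both the lens config and every `model.query` event, one way to keep them from drifting apart is to derive both payloads from a single shared dict. A Python sketch (helper names are illustrative, field names as in the table above):

```python
# Sketch: build both payloads from one shared parameter dict so the
# lens config and each model.query event can never disagree.
SHARED_KEYS = ("model_version", "template_name", "instruction", "focus", "max_new_tokens")

def shared_params(instruction: str, focus: str) -> dict:
    return {
        "model_version": "Newton::c2_4_7b_251215a172f6d7",
        "template_name": "image_qa_template_task",
        "instruction": instruction,
        "focus": focus,
        "max_new_tokens": 512,
    }

def lens_model_parameters(params: dict) -> dict:
    # Lens registration additionally carries buffer/scaling settings.
    return {**params, "camera_buffer_size": 1, "min_replicas": 1, "max_replicas": 1}

def query_event_data(params: dict, raw_base64: str) -> dict:
    # model.query carries the image payload instead of scaling settings.
    return {**params, "data": [{"type": "base64_img", "base64_img": raw_base64}]}
```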

---

## Python Implementation

### Requirements

- `archetypeai` Python package
- `opencv-python` (`cv2`), `Pillow`
- Environment variables: `ATAI_API_KEY` or `ARCHETYPE_API_KEY`

### Quick Start

```bash
export ATAI_API_KEY=your_key_here
python camera_frame_analysis.py "Describe what you see"

# Custom question
python camera_frame_analysis.py "Is anyone present?"

# Different camera
python camera_frame_analysis.py "Describe the scene" 1
```

### Parameters

- **question** (positional, optional): What to analyze (default: "Describe what you see")
- **camera_index** (positional, optional): Camera index (default: 0)

### How It Works

1. **Capture**: Opens webcam with OpenCV, reads a frame
2. **Encode**: Converts frame to base64 JPEG (BGR → RGB → PIL → JPEG → base64)
3. **Setup**: Registers Newton lens, creates session, waits for ready
4. **Initialize**: Sends `session.modify` to initialize the processor
5. **Query**: Sends base64 image as `model.query` event, gets response directly
6. **Cleanup**: Destroys session

### Webcam Capture → base64

```python
import cv2
import base64
import io
from PIL import Image

def capture_frame_base64(camera_index=0, jpeg_quality=80, resize=(640, 480)):
    """Capture a webcam frame and return as raw base64 JPEG string."""
    cap = cv2.VideoCapture(camera_index)
    ret, frame = cap.read()
    cap.release()

    if not ret:
        raise RuntimeError(f"Failed to capture from camera {camera_index}")

    # Resize if needed
    h, w = frame.shape[:2]
    if (w, h) != resize:
        frame = cv2.resize(frame, resize)

    # BGR (OpenCV) → RGB → PIL → JPEG → base64
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    pil_image = Image.fromarray(rgb_frame)

    buffer = io.BytesIO()
    pil_image.save(buffer, format="JPEG", quality=jpeg_quality)
    raw_base64 = base64.b64encode(buffer.getvalue()).decode()

    return raw_base64  # No "data:image/jpeg;base64," prefix
```
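The API expects the raw base64 string, with no `data:image/jpeg;base64,` prefix such as a browser's `toDataURL` produces. A small stdlib-only normalizer (the helper name is illustrative) that accepts either raw JPEG bytes or an already-encoded string:

```python
import base64
import re

def to_raw_base64(image):
    """Normalize to a raw base64 string: encode bytes directly, or strip
    a "data:image/...;base64," prefix from an already-encoded string."""
    if isinstance(image, bytes):
        return base64.b64encode(image).decode()
    return re.sub(r"^data:image/\w+;base64,", "", image)
```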

### Full Python Example

```python
import os
import time
from archetypeai.api_client import ArchetypeAI

# capture_frame_base64 is defined in the "Webcam Capture → base64" section above

api_key = os.getenv("ATAI_API_KEY")
client = ArchetypeAI(api_key)

# --- User-provided values (NOT hardcoded) ---
instruction = "Answer the following question about the image:"
focus = "Describe what you see in this image."

def build_lens_config(instruction: str, focus: str) -> dict:
    """Build lens config with user-provided instruction and focus."""
    return {
        "lens_name": "camera-frame-capture-lens",
        "lens_config": {
            "model_pipeline": [
                {"processor_name": "lens_camera_processor", "processor_config": {}}
            ],
            "model_parameters": {
                "model_version": "Newton::c2_4_7b_251215a172f6d7",
                "template_name": "image_qa_template_task",
                "instruction": instruction,
                "focus": focus,
                "max_new_tokens": 512,
                "camera_buffer_size": 1,
                "min_replicas": 1,
                "max_replicas": 1,
            },
        },
    }

def build_query_event(raw_base64: str, instruction: str, focus: str) -> dict:
    """Build model.query event with the SAME instruction and focus as the lens config."""
    return {
        "type": "model.query",
        "event_data": {
            "model_version": "Newton::c2_4_7b_251215a172f6d7",
            "template_name": "image_qa_template_task",
            "instruction": instruction,
            "focus": focus,
            "max_new_tokens": 512,
            "data": [{"type": "base64_img", "base64_img": raw_base64}],
        },
    }

# 1. Register lens (pass user's instruction + focus)
lens_config = build_lens_config(instruction, focus)
lens = client.lens.register(lens_config)
lens_id = lens["lens_id"]

# 2. Create session
session = client.lens.sessions.create(lens_id)
session_id = session["session_id"]

# 3. Wait for session ready
for _ in range(60):
    try:
        status = client.lens.sessions.process_event(
            session_id, {"type": "session.status"}
        )
        if status.get("session_status") in ["3", "LensSessionStatus.SESSION_STATUS_RUNNING"]:
            break
    except Exception:
        pass
    time.sleep(0.5)

# 4. Initialize processor (REQUIRED)
client.lens.sessions.process_event(session_id, {
    "type": "session.modify",
    "event_data": {"camera_buffer_size": 1}
})

# 5. Capture frame and send as model.query (same instruction + focus)
raw_base64 = capture_frame_base64(camera_index=0)
event = build_query_event(raw_base64, instruction, focus)

response = client.lens.sessions.process_event(session_id, event)

if response.get("type") == "model.query.response":
    result = response["event_data"]["response"]
    if isinstance(result, list):
        result = result[0]
    print(f"Answer: {result}")

# 6. Cleanup
client.lens.sessions.destroy(session_id)
```
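The response-extraction branch at the end of the example can be factored into a small pure helper that handles both string and list payloads (a sketch; the helper name is illustrative, the field names follow the example above):

```python
def extract_answer(response: dict):
    """Pull the model's text answer out of a model.query response dict.
    Returns None for any event that is not a model.query.response,
    and the first element when the payload is a list."""
    if response.get("type") != "model.query.response":
        return None
    result = response.get("event_data", {}).get("response")
    if isinstance(result, list):
        return result[0] if result else None
    return result
```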

---

## Web / JavaScript Implementation

Uses direct `fetch` calls to the Archetype AI REST API.

### Requirements

- Browser with `getUserMedia` support (webcam access)
- HTTPS (required for camera access, except `localhost`)

### API Reference

| Operation | Method | Endpoint | Body |
|-----------|--------|----------|------|
| List lenses | GET | `/lens/metadata` | — |
| Register lens | POST | `/lens/register` | `{ lens_config: config }` |
| Delete lens | POST | `/lens/delete` | `{ lens_id }` |
| Create session | POST | `/lens/sessions/create` | `{ lens_id }` |
| Process event | POST | `/lens/sessions/events/process` | `{ session_id, event }` |
| Destroy session | POST | `/lens/sessions/destroy` | `{ session_id }` |

### Helpers: API wrappers

```typescript
const API_ENDPOINT = 'https://api.u1.archetypeai.app/v0.5'

async function apiGet<T>(path: string, apiKey: string): Promise<T> {
  const response = await fetch(`${API_ENDPOINT}${path}`, {
    method: 'GET',
    headers: { Authorization: `Bearer ${apiKey}` },
  })
  if (!response.ok) throw new Error(`API GET ${path} failed: ${response.status}`)
  return response.json()
}

async function apiPost<T>(path: string, apiKey: string, body: unknown, timeoutMs = 5000): Promise<T> {
  const controller = new AbortController()
  const timeoutId = setTimeout(() => controller.abort(), timeoutMs)

  try {
    const response = await fetch(`${API_ENDPOINT}${path}`, {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
      signal: controller.signal,
    })

    if (!response.ok) {
      const errorBody = await response.json().catch(() => ({}))
      throw new Error(`API POST ${path} failed: ${response.status} - ${JSON.stringify(errorBody)}`)
    }

    return response.json()
  } finally {
    clearTimeout(timeoutId)
  }
}
```

### Step 1: Find or create the lens (clean up stale lenses)

A stale lens from a previous run causes `"Input stream is unhealthy!"` errors. Always check for an existing lens with the same name and delete it before registering a fresh one.

**Pass the user's `instruction` and `focus` into the lens config** — do not hardcode them.

```typescript
const LENS_NAME = 'camera-frame-capture-lens'

// --- User-provided values (NOT hardcoded) ---
const instruction = 'Answer the following question about the image:'
const focus = 'Describe what you see in this image.'

function buildLensConfig(instruction: string, focus: string) {
  return {
    lens_name: LENS_NAME,
    lens_config: {
      model_pipeline: [
        { processor_name: 'lens_camera_processor', processor_config: {} },
      ],
      model_parameters: {
        model_version: 'Newton::c2_4_7b_251215a172f6d7',
        template_name: 'image_qa_template_task',
        instruction,
        focus,
        max_new_tokens: 512,
        camera_buffer_size: 1,
        min_replicas: 1,
        max_replicas: 1,
      },
    },
  }
}

// Delete any existing lens with the same name to avoid stale state
const existingLenses = await apiGet<Array<{ lens_id: string; lens_name: string }>>(
  '/lens/metadata', apiKey
)
const staleLens = existingLenses.find(l => l.lens_name === LENS_NAME)
if (staleLens) {
  console.log('Deleting stale lens:', staleLens.lens_id)
  await apiPost('/lens/delete', apiKey, { lens_id: staleLens.lens_id })
}

// Register fresh lens with user's instruction + focus
const lensConfig = buildLensConfig(instruction, focus)
const registeredLens = await apiPost<{ lens_id: string }>(
  '/lens/register', apiKey, { lens_config: lensConfig }
)
const lensId = registeredLens.lens_id
```

### Step 2: Create session and wait for ready

```typescript
const session = await apiPost<{ session_id: string; session_endpoint: string }>(
  '/lens/sessions/create', apiKey, { lens_id: lensId }
)
const sessionId = session.session_id

// Wait for session to be ready (poll until status = running)
async function waitForSessionReady(sessionId: string, maxWaitMs = 30000): Promise<boolean> {
  const start = Date.now()
  while (Date.now() - start < maxWaitMs) {
    const status = await apiPost<{ session_status: string }>(
      '/lens/sessions/events/process', apiKey,
      { session_id: sessionId, event: { type: 'session.status' } },
      10000
    )
    if (status.session_status === 'LensSessionStatus.SESSION_STATUS_RUNNING' ||
        status.session_status === '3') {
      return true
    }
    if (status.session_status === 'LensSessionStatus.SESSION_STATUS_FAILED' ||
        status.session_status === '6') {
      return false
    }
    await new Promise(r => setTimeout(r, 500))
  }
  return false
}

const isReady = await waitForSessionReady(sessionId)
if (!isReady) throw new Error('Session failed to initialize')
```
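The readiness loop here (and the one in the Python example) is an instance of a generic poll-until-predicate pattern. Factored out in Python for illustration (the helper name is hypothetical):

```python
import time

def poll_until(check, timeout_s=30.0, interval_s=0.5):
    """Call check() repeatedly until it returns a truthy value or the
    timeout elapses. Returns the truthy value, or None on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = check()
        if result:
            return result
        time.sleep(interval_s)
    return None
```

The session-ready wait then reduces to a one-liner whose predicate sends a `session.status` event and returns truthy only for a running status.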

### Step 3: Initialize the processor (REQUIRED for lens_camera_processor)

This sends a `session.modify` event that triggers `update_lens_params()`, which initializes `video_narrator_memory`. **Without this step, inference will fail.**

```typescript
await apiPost('/lens/sessions/events/process', apiKey, {
  session_id: sessionId,
  event: {
    type: 'session.modify',
    event_data: {
      camera_buffer_size: 1,
    },
  },
}, 30000) // 30s timeout for initialization
```

### Step 4: Start webcam and capture frames as base64

#### 4a. Create a video element

```html
<!-- Visible preview (optional) -->
<video id="webcam" autoplay playsinline muted></video>

<!-- Or create it in JS (no visible preview) -->
```

```typescript
// Option A: Reference an existing <video> element
const video = document.getElementById('webcam') as HTMLVideoElement

// Option B: Create a hidden video element in JS (use instead of Option A)
// const video = document.createElement('video')
// video.autoplay = true
// video.playsInline = true // Required for iOS
// video.muted = true
```

#### 4b. Request camera access and start the stream

```typescript
async function startCamera(
  preferredWidth = 640,
  preferredHeight = 480,
  facingMode: 'user' | 'environment' = 'user', // 'user' = front, 'environment' = back
): Promise<MediaStream> {
  const stream = await navigator.mediaDevices.getUserMedia({
    video: {
      width: { ideal: preferredWidth },
      height: { ideal: preferredHeight },
      facingMode,
    },
    audio: false,
  })

  video.srcObject = stream

  // Wait until video is actually playing and has dimensions
  await new Promise<void>((resolve) => {
    video.onloadedmetadata = () => {
      video.play()
      resolve()
    }
  })

  console.log(`Camera started: ${video.videoWidth}x${video.videoHeight}`)
  return stream
}

const stream = await startCamera()
```

**Permission notes:**

- The browser will show a permission prompt on first call
- HTTPS is **required** (except `localhost`)
- On mobile, `facingMode: 'environment'` selects the rear camera

#### 4c. Capture a frame as base64 JPEG

The flow is: **video element → canvas → toDataURL → base64 string**.

```typescript
function captureFrame(quality = 0.8): string | null {
  if (!video.videoWidth || !video.videoHeight) return null

  const canvas = document.createElement('canvas')
  canvas.width = video.videoWidth
  canvas.height = video.videoHeight

  const ctx = canvas.getContext('2d')
  if (!ctx) return null

  // Draw current video frame onto canvas
  ctx.drawImage(video, 0, 0)

  // Convert to base64 JPEG — returns "data:image/jpeg;base64,/9j/4AAQ..."
  return canvas.toDataURL('image/jpeg', quality)
}
```

**Quality vs size tradeoffs:**

| Quality | ~Size (640x480) | Use case |
|---------|-----------------|----------|
| `0.5` | ~20-30 KB | Fast continuous streaming |
| `0.8` | ~40-60 KB | Good balance (recommended) |
| `1.0` | ~80-120 KB | Maximum detail |
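Note that the JPEG sizes above grow by about a third once base64-encoded, which is what actually travels in the request body: the encoded length is exactly 4 * ceil(n / 3) bytes for an n-byte payload. A quick stdlib check (helper name is illustrative):

```python
import math

def base64_encoded_size(n_bytes: int) -> int:
    """Size in bytes of the base64 encoding of an n-byte payload
    (standard padding, no line breaks): 4 * ceil(n / 3)."""
    return 4 * math.ceil(n_bytes / 3)

# e.g. a ~50 KB JPEG becomes a ~67 KB base64 string
```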

#### 4d. Strip the data URI prefix before sending to the API

The API expects **raw base64**, not the `data:image/jpeg;base64,` prefix that `toDataURL` produces.

```typescript
function captureFrameRaw(quality = 0.8): string | null {
  const dataUri = captureFrame(quality)
  if (!dataUri) return null

  // Strip "data:image/jpeg;base64," prefix → raw base64
  return dataUri.replace(/^data:image\/\w+;base64,/, '')
}
```

This raw base64 string is what goes into the `model.query` event's `base64_img` field.

### Step 5: Send frames for analysis (model.query)

This uses **request/response — NOT SSE**. Each frame is sent as a `model.query` event and the response comes back directly in the POST response.

The `instruction` and `focus` in the `model.query` event **must match** the values used at lens registration. Pass them through — do not hardcode different values.

```typescript
function createModelQueryEvent(
  rawBase64Images: string[], // Already stripped of data URI prefix
  instruction: string, // Same as lens config
  focus: string, // Same as lens config
  modelVersion = 'Newton::c2_4_7b_251215a172f6d7',
  templateName = 'image_qa_template_task',
  maxNewTokens = 512,
) {
  return {
    type: 'model.query' as const,
    event_data: {
      model_version: modelVersion,
      template_name: templateName,
      instruction,
      focus,
      max_new_tokens: maxNewTokens,
      data: rawBase64Images.map(img => ({
        type: 'base64_img',
        base64_img: img,
      })),
    },
  }
}

// Send a frame and get the response (uses the same instruction + focus from Step 1)
async function analyzeFrame(instruction: string, focus: string): Promise<string> {
  const frame = captureFrameRaw() // Raw base64 (no data URI prefix)
  if (!frame) throw new Error('Failed to capture frame')

  const event = createModelQueryEvent([frame], instruction, focus)

  const response = await apiPost<{
    type: string
    event_data?: { response?: string | string[]; message?: string }
  }>(
    '/lens/sessions/events/process', apiKey,
    { session_id: sessionId, event },
    60000 // 60s timeout for model inference
  )

  // Extract text from response
  if (response.type === 'model.query.response' && response.event_data) {
    const text = response.event_data.response
    if (typeof text === 'string') return text
    if (Array.isArray(text)) return text.join('\n')
    return JSON.stringify(response.event_data)
  }

  return JSON.stringify(response)
}
```

### Step 6: Continuous capture loop

Send the first frame **immediately** after initialization — do not wait for the interval. The processor expects data promptly after `session.modify`.

```typescript
let isSending = false

async function captureAndSend() {
  if (isSending) {
    console.log('Previous request still in progress, skipping frame')
    return
  }

  isSending = true
  try {
    const result = await analyzeFrame(instruction, focus)
    console.log('Result:', result)
  } catch (error) {
    console.error('Frame analysis failed:', error)
  } finally {
    isSending = false
  }
}

// Send first frame immediately
captureAndSend()

// Then continue at 1 frame per second
const intervalId = setInterval(captureAndSend, 1000)
```
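The `isSending` gate generalizes to any client: frames that arrive while a request is still in flight are dropped rather than queued, so the loop never falls behind the camera. The same pattern in Python (a sketch; the class name is hypothetical):

```python
class FrameGate:
    """Drop frames while a previous request is still in flight,
    mirroring the isSending flag used in the JS loop."""

    def __init__(self):
        self._busy = False

    def try_acquire(self) -> bool:
        if self._busy:
            return False  # a request is in flight: skip this frame
        self._busy = True
        return True

    def release(self):
        self._busy = False

# Usage in a capture loop:
# if gate.try_acquire():
#     try:
#         analyze_frame(...)
#     finally:
#         gate.release()
```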

### Step 7: Cleanup

```typescript
// Stop capture loop
clearInterval(intervalId)

// Stop camera
stream.getTracks().forEach(track => track.stop())

// Destroy session
await apiPost('/lens/sessions/destroy', apiKey, { session_id: sessionId })
```

### Web Lifecycle Summary

```
 1. List existing lenses  -> GET /lens/metadata
 2. Delete stale lens     -> POST /lens/delete { lens_id } (if same name exists)
 3. Register fresh lens   -> POST /lens/register { lens_config: config }
 4. Create session        -> POST /lens/sessions/create { lens_id }
 5. Wait for ready        -> POST /lens/sessions/events/process (poll session.status)
 6. Initialize processor  -> POST /lens/sessions/events/process { session_id, event: session.modify }
 7. Start webcam          -> navigator.mediaDevices.getUserMedia()
 8. Send first frame NOW  -> POST /lens/sessions/events/process { session_id, event: model.query }
 9. Capture loop (1fps)   -> POST /lens/sessions/events/process { session_id, event: model.query }
10. Stop camera           -> stream.getTracks().forEach(t => t.stop())
11. Destroy session       -> POST /lens/sessions/destroy { session_id }
```

---

## Use Cases

- **Quick scene analysis**: Single frame description
- **Presence detection**: Check if someone is at their desk
- **Safety monitoring**: Verify safety equipment usage
- **Object identification**: Identify specific items in view
- **Continuous monitoring**: Stream frames with periodic analysis

## Troubleshooting

- **"Input stream is unhealthy!"**: Stale lens from a previous run. Always delete the existing lens before registering a new one (see Step 1).
- **Camera not found**: Try different camera indices (Python) or check browser permissions (Web)
- **API errors**: Verify the API key is set correctly
- **Session fails**: Ensure `session.modify` (Step 3) is called before sending queries
- **Timeout on inference**: Model queries can take 10-30s; use a 60s timeout
- **Frame too large**: Use JPEG encoding with quality 0.8 to reduce payload size
- **Requests overlap**: Gate with the `isSending` flag to skip frames while a previous request is in flight