@archetypeai/ds-cli 0.3.9 → 0.3.10

---
name: newton-machine-state-from-file
description: Run a Machine State Lens by streaming sensor data from a CSV file. Use when analyzing time-series CSV data for machine state classification, anomaly detection, or n-shot state recognition from files.
argument-hint: [csv-file-path]
---

# Newton Machine State Lens — Stream from CSV File

Generate a script that streams time-series data from a CSV file to the Archetype AI Machine State Lens for n-shot state classification. Supports both Python and JavaScript/Web.

> **Frontend architecture:** When building a web UI for this skill, decompose into components (file input, status display, results view) rather than a monolithic page. Extract API/streaming logic into `$lib/api/`. See `@rules/frontend-architecture` for conventions and `@skills/create-dashboard` / `@skills/build-pattern` for layout and component patterns.

---

## Python Implementation

### Requirements

- `archetypeai` Python package
- `pandas`, `numpy`
- Environment variables: `ATAI_API_KEY`, optionally `ATAI_API_ENDPOINT`

### Architecture

The script must follow this exact pattern:

#### 1. API Client Setup

```python
from archetypeai.api_client import ArchetypeAI
import os

api_key = os.getenv("ATAI_API_KEY")
api_endpoint = os.getenv("ATAI_API_ENDPOINT", ArchetypeAI.get_default_endpoint())
client = ArchetypeAI(api_key, api_endpoint=api_endpoint)
```

#### 2. Upload N-Shot Example Files

Upload one CSV per class. The file ID returned is used in the lens YAML config.

```python
# Upload example files for each class
# Class name is typically derived from filename stem
resp = client.files.local.upload("path/to/healthy.csv")
healthy_id = resp["file_id"]

resp = client.files.local.upload("path/to/broken.csv")
broken_id = resp["file_id"]
```

#### 3. Lens YAML Configuration

Build the YAML config string dynamically, inserting file IDs:

```yaml
lens_name: Machine State Lens
lens_config:
  model_pipeline:
    - processor_name: lens_timeseries_state_processor
      processor_config: {}
  model_parameters:
    model_name: OmegaEncoder
    model_version: OmegaEncoder::omega_embeddings_01
    normalize_input: true
    buffer_size: {window_size}
    input_n_shot:
      NORMAL: {healthy_file_id}
      WARNING: {broken_file_id}
    csv_configs:
      timestamp_column: timestamp
      data_columns: ['a1', 'a2', 'a3', 'a4']
      window_size: {window_size}
      step_size: {step_size}
    knn_configs:
      n_neighbors: 5
      metric: manhattan
      weights: distance
      algorithm: ball_tree
      normalize_embeddings: false
  output_streams:
    - stream_type: server_sent_events_writer
```

**Important**: `input_n_shot` keys become the predicted class labels. Users can define any number of classes (not just two).
86
+
87
+ #### 4. Session Callback Pattern
88
+
89
+ ```python
90
+ def session_callback(session_id, session_endpoint, client, args):
91
+ # Create SSE consumer FIRST
92
+ sse_reader = client.lens.sessions.create_sse_consumer(
93
+ session_id, max_read_time_sec=args["max_run_time_sec"]
94
+ )
95
+
96
+ # Load CSV with pandas
97
+ df = pd.read_csv(args["file_path"])
98
+ columns = ["a1", "a2", "a3", "a4"] # or user-specified columns
99
+ data = df[columns].values.T.tolist() # Transpose: [channels][samples]
100
+
101
+ # Stream data in windows
102
+ total_samples = len(df)
103
+ start = 0
104
+ counter = 0
105
+ while start < total_samples:
106
+ end = start + window_size
107
+ chunk = [series[start:end] for series in data]
108
+
109
+ payload = {
110
+ "type": "session.update",
111
+ "event_data": {
112
+ "type": "data.json",
113
+ "event_data": {
114
+ "sensor_data": chunk,
115
+ "sensor_metadata": {
116
+ "sensor_timestamp": time.time(),
117
+ "sensor_id": f"streamed_sensor_{counter}"
118
+ }
119
+ }
120
+ }
121
+ }
122
+ client.lens.sessions.process_event(session_id, payload)
123
+ start += step_size
124
+ counter += 1
125
+
126
+ # Listen for results
127
+ for event in sse_reader.read(block=True):
128
+ etype = event.get("type")
129
+ if etype == "inference.result":
130
+ result = event["event_data"].get("response")
131
+ meta = event["event_data"].get("query_metadata", {})
132
+ print(f"[{meta.get('query_timestamp', 'N/A')}] Predicted: {result}")
133
+ elif etype == "session.modify.result":
134
+ cls = event["event_data"].get("query_metadata", {}).get("class_name")
135
+ print(f"[TRAINING] Processed class: {cls}")
136
+ ```

#### 5. Create and Run Lens

```python
client.lens.create_and_run_lens(
    yaml_config, session_callback,
    client=client, args=args
)
```

### CLI Arguments to Include

```
--api-key               API key (fallback to ATAI_API_KEY env var)
--api-endpoint          API endpoint (default from SDK)
--file-path             Path to CSV file to analyze (required)
--n-shot-files          Paths to n-shot example CSVs (required, nargs='+')
--window-size           Window size in samples (default: 100)
--step-size-n-shot      Step size for training data (default: 100)
--step-size-inference   Step size for inference stream (default: 100)
--max-run-time-sec      Max runtime in seconds (default: 500)
```

---

## Web / JavaScript Implementation

Uses direct `fetch` calls to the Archetype AI REST API. Based on the working pattern from `test-stream/src/lib/atai-client.ts`.

### Requirements

- `@microsoft/fetch-event-source` for SSE consumption

### API Reference

| Operation | Method | Endpoint | Body |
|-----------|--------|----------|------|
| Upload file | POST | `/files` | `FormData` |
| Register lens | POST | `/lens/register` | `{ lens_config: config }` |
| Delete lens | POST | `/lens/delete` | `{ lens_id }` |
| Create session | POST | `/lens/sessions/create` | `{ lens_id }` |
| Process event | POST | `/lens/sessions/events/process` | `{ session_id, event }` |
| Destroy session | POST | `/lens/sessions/destroy` | `{ session_id }` |
| SSE consumer | GET | `/lens/sessions/consumer/{sessionId}` | — |

### Helper: API fetch wrapper

```typescript
const API_ENDPOINT = 'https://api.u1.archetypeai.app/v0.5'

async function apiPost<T>(path: string, apiKey: string, body: unknown, timeoutMs = 5000): Promise<T> {
  const controller = new AbortController()
  const timeoutId = setTimeout(() => controller.abort(), timeoutMs)

  try {
    const response = await fetch(`${API_ENDPOINT}${path}`, {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
      signal: controller.signal,
    })

    if (!response.ok) {
      const errorBody = await response.json().catch(() => ({}))
      throw new Error(`API POST ${path} failed: ${response.status} - ${JSON.stringify(errorBody)}`)
    }

    return response.json()
  } finally {
    clearTimeout(timeoutId)
  }
}
```

### Step 1: Upload n-shot CSV files

```typescript
const nShotMap: Record<string, string> = {}

for (const { file, className } of nShotFiles) {
  const formData = new FormData()
  formData.append('file', file) // File object from <input type="file">

  const response = await fetch(`${API_ENDPOINT}/files`, {
    method: 'POST',
    headers: { Authorization: `Bearer ${apiKey}` },
    body: formData,
  })
  const result = await response.json()
  nShotMap[className.toUpperCase()] = result.file_id
}
```

### Step 2: Build the lens config

```typescript
const windowSize = 100
const stepSize = 100

const lensConfig = {
  lens_name: 'machine_state_lens',
  lens_config: {
    model_pipeline: [
      { processor_name: 'lens_timeseries_state_processor', processor_config: {} },
    ],
    model_parameters: {
      model_name: 'OmegaEncoder',
      model_version: 'OmegaEncoder::omega_embeddings_01',
      normalize_input: true,
      buffer_size: windowSize,
      input_n_shot: nShotMap, // { HEALTHY: 'file_id', BROKEN: 'file_id' }
      csv_configs: {
        timestamp_column: 'timestamp',
        data_columns: ['a1', 'a2', 'a3', 'a4'],
        window_size: windowSize,
        step_size: stepSize,
      },
      knn_configs: {
        n_neighbors: 5,
        metric: 'manhattan',
        weights: 'distance',
        algorithm: 'ball_tree',
        normalize_embeddings: false,
      },
    },
    output_streams: [
      { stream_type: 'server_sent_events_writer' },
    ],
  },
}
```

### Step 3: Register lens, create session, wait for ready

```typescript
// Register lens — NOTE: body must wrap config as { lens_config: config }
const registeredLens = await apiPost<{ lens_id: string }>(
  '/lens/register', apiKey, { lens_config: lensConfig }
)
const lensId = registeredLens.lens_id

// Create session
const session = await apiPost<{ session_id: string; session_endpoint: string }>(
  '/lens/sessions/create', apiKey, { lens_id: lensId }
)
const sessionId = session.session_id

// Optionally delete the lens definition (session keeps running independently)
await apiPost('/lens/delete', apiKey, { lens_id: lensId })

// Wait for session to be ready (poll until status = running)
async function waitForSessionReady(sessionId: string, maxWaitMs = 30000): Promise<boolean> {
  const start = Date.now()
  while (Date.now() - start < maxWaitMs) {
    const status = await apiPost<{ session_status: string }>(
      '/lens/sessions/events/process', apiKey,
      { session_id: sessionId, event: { type: 'session.status' } },
      10000
    )
    if (status.session_status === 'LensSessionStatus.SESSION_STATUS_RUNNING' ||
        status.session_status === '3') {
      return true
    }
    if (status.session_status === 'LensSessionStatus.SESSION_STATUS_FAILED' ||
        status.session_status === '6') {
      return false
    }
    await new Promise(r => setTimeout(r, 500))
  }
  return false
}

const isReady = await waitForSessionReady(sessionId)
if (!isReady) throw new Error('Session failed to start')
```

### Step 4: Stream CSV data in windows

Parse the CSV client-side and send windowed chunks via `POST /lens/sessions/events/process`:

```typescript
// Parse CSV (using PapaParse or similar)
const rows = parsedCsv.data // array of { timestamp, a1, a2, a3, a4 }
const columns = ['a1', 'a2', 'a3', 'a4']

let start = 0
let counter = 0

while (start < rows.length) {
  const end = Math.min(start + windowSize, rows.length)
  const window = rows.slice(start, end)

  // Transpose to channel-first: [[a1_vals], [a2_vals], [a3_vals], [a4_vals]]
  const sensorData = columns.map(col =>
    window.map(row => Number(row[col]))
  )

  await apiPost('/lens/sessions/events/process', apiKey, {
    session_id: sessionId,
    event: {
      type: 'session.update',
      event_data: {
        type: 'data.json',
        event_data: {
          sensor_data: sensorData,
          sensor_metadata: {
            sensor_timestamp: Date.now() / 1000,
            sensor_id: `web_sensor_${counter}`,
          },
        },
      },
    },
  }, 10000)

  start += stepSize
  counter++
}
```

### Step 5: Consume SSE results

```typescript
import { fetchEventSource } from '@microsoft/fetch-event-source'

fetchEventSource(`${API_ENDPOINT}/lens/sessions/consumer/${sessionId}`, {
  headers: { Authorization: `Bearer ${apiKey}` },
  onmessage(event) {
    const parsed = JSON.parse(event.data)

    if (parsed.type === 'inference.result') {
      const result = parsed.event_data.response
      const meta = parsed.event_data.query_metadata
      console.log(`[${meta.query_timestamp ?? 'N/A'}] Predicted: ${result}`)
    }

    if (parsed.type === 'session.modify.result') {
      const cls = parsed.event_data?.query_metadata?.class_name
      console.log(`[TRAINING] Processed class: ${cls}`)
    }

    if (parsed.type === 'sse.stream.end') {
      console.log('Stream complete')
    }
  },
})
```

### Step 6: Cleanup

```typescript
await apiPost('/lens/sessions/destroy', apiKey, { session_id: sessionId })
```

### Web Lifecycle Summary

```
1. Upload n-shot CSVs     -> POST /files (FormData, one per class)
2. Register lens          -> POST /lens/register { lens_config: config }
3. Create session         -> POST /lens/sessions/create { lens_id }
4. Wait for ready         -> POST /lens/sessions/events/process { session_id, event: { type: 'session.status' } }
5. (Optional) Delete lens -> POST /lens/delete { lens_id }
6. Stream windowed data   -> POST /lens/sessions/events/process { session_id, event } (loop)
7. Consume SSE results    -> GET /lens/sessions/consumer/{sessionId}
8. Destroy session        -> POST /lens/sessions/destroy { session_id }
```

---

## CSV Format Expected

```csv
timestamp,a1,a2,a3,a4
1700000000.0,100,200,300,374
1700000000.01,101,199,301,375
```

- `timestamp`: UNIX epoch float
- `a1, a2, a3`: Sensor axes (e.g., accelerometer x, y, z)
- `a4`: Magnitude (sqrt(a1² + a2² + a3²))
- Column names are configurable via `csv_configs.data_columns`
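
A file in this format can be produced from raw axis samples with the standard library alone, deriving `a4` as the vector magnitude. A sketch; `write_sensor_csv` is an illustrative helper, not part of the SDK.

```python
import csv
import math
import time


def write_sensor_csv(path, samples, start_ts=None, dt=0.01):
    """Write (a1, a2, a3) sample tuples to CSV, deriving a4 = sqrt(a1^2 + a2^2 + a3^2).

    Timestamps start at start_ts (default: now) and advance by dt seconds per row.
    """
    ts = start_ts if start_ts is not None else time.time()
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp", "a1", "a2", "a3", "a4"])
        for i, (a1, a2, a3) in enumerate(samples):
            a4 = math.sqrt(a1 ** 2 + a2 ** 2 + a3 ** 2)
            writer.writerow([ts + i * dt, a1, a2, a3, round(a4, 3)])
```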

## Optional: Results Logging

Save predictions to a timestamped CSV for analysis or visualization.

### Python — Results CSV

```python
import csv
from pathlib import Path
from datetime import datetime

# Create results directory and timestamped filename
results_dir = Path("results")
results_dir.mkdir(exist_ok=True)
file_stem = Path(args["file_path"]).stem
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
results_file = results_dir / f"{file_stem}_{timestamp}.csv"

# Write CSV header
with open(results_file, 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['read_index', 'predicted_class', 'confidence_scores',
                     'file_id', 'window_size', 'total_rows'])

# Inside the SSE event loop, when handling inference.result:
if etype == "inference.result":
    ed = event.get("event_data", {})
    result = ed.get("response")
    meta = ed.get("query_metadata", {})
    query_meta = meta.get("query_metadata", {})

    predicted_class = result[0] if isinstance(result, list) and len(result) > 0 else "unknown"
    confidence_scores = result[1] if isinstance(result, list) and len(result) > 1 else {}
    read_index = query_meta.get("read_index", "N/A")
    file_id = query_meta.get("file_id", "N/A")
    window_size = query_meta.get("window_size", "N/A")
    total_rows = query_meta.get("total_rows", "N/A")

    print(f"[{read_index}] Predicted: {predicted_class} | Scores: {confidence_scores}")

    with open(results_file, 'a', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow([read_index, predicted_class, str(confidence_scores),
                         file_id, window_size, total_rows])
```

### Response Structure

The `inference.result` response contains:
- `response[0]`: predicted class name (string, e.g. `"HEALTHY"`)
- `response[1]`: confidence scores dict (e.g. `{"HEALTHY": 0.95, "BROKEN": 0.05}`)
- `query_metadata.query_metadata.read_index`: window position in the file
- `query_metadata.query_metadata.file_id`: the file being analyzed
- `query_metadata.query_metadata.window_size`: window size used
- `query_metadata.query_metadata.total_rows`: total rows in the file
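
Given that structure, the unpacking logic can be isolated into one small helper that tolerates missing fields. A sketch; `parse_inference_event` is an illustrative name, and the field paths follow the list above.

```python
def parse_inference_event(event: dict) -> dict:
    """Flatten a raw inference.result event into one row-like dict.

    Falls back to "unknown"/"N/A" when the event is missing pieces.
    """
    ed = event.get("event_data", {})
    result = ed.get("response")
    # Metadata is nested one level deeper: query_metadata.query_metadata.*
    query_meta = ed.get("query_metadata", {}).get("query_metadata", {})

    return {
        "predicted_class": result[0] if isinstance(result, list) and len(result) > 0 else "unknown",
        "confidence_scores": result[1] if isinstance(result, list) and len(result) > 1 else {},
        "read_index": query_meta.get("read_index", "N/A"),
        "file_id": query_meta.get("file_id", "N/A"),
        "window_size": query_meta.get("window_size", "N/A"),
        "total_rows": query_meta.get("total_rows", "N/A"),
    }
```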

### Web/JS — Results Array + CSV Download

```typescript
interface PredictionResult {
  readIndex: number | string
  predictedClass: string
  confidenceScores: Record<string, number>
  fileId: string
  windowSize: number
  totalRows: number
}

const results: PredictionResult[] = []

// Inside the SSE onmessage handler:
if (parsed.type === 'inference.result') {
  const result = parsed.event_data.response
  const meta = parsed.event_data.query_metadata
  const queryMeta = meta?.query_metadata ?? {}

  const prediction: PredictionResult = {
    readIndex: queryMeta.read_index ?? 'N/A',
    predictedClass: Array.isArray(result) && result.length > 0 ? result[0] : 'unknown',
    confidenceScores: Array.isArray(result) && result.length > 1 ? result[1] : {},
    fileId: queryMeta.file_id ?? 'N/A',
    windowSize: queryMeta.window_size ?? 0,
    totalRows: queryMeta.total_rows ?? 0,
  }

  results.push(prediction)
  console.log(`[${prediction.readIndex}] ${prediction.predictedClass}`, prediction.confidenceScores)
}

// Download results as CSV
function downloadResultsCsv(results: PredictionResult[], filename: string) {
  const header = 'read_index,predicted_class,confidence_scores,file_id,window_size,total_rows\n'
  const rows = results.map(r =>
    // Double embedded quotes so the JSON blob survives as a single quoted CSV field
    `${r.readIndex},${r.predictedClass},"${JSON.stringify(r.confidenceScores).replace(/"/g, '""')}",${r.fileId},${r.windowSize},${r.totalRows}`
  ).join('\n')

  const blob = new Blob([header + rows], { type: 'text/csv' })
  const url = URL.createObjectURL(blob)
  const a = document.createElement('a')
  a.href = url
  a.download = filename
  a.click()
  URL.revokeObjectURL(url)
}
```

### CLI Flag

Add a `--save-results` flag (default: `True`) to enable/disable results logging:

```
--save-results    Save predictions to CSV in results/ directory (default: True)
```
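
A defaulted-on boolean flag like this can be expressed with `argparse.BooleanOptionalAction` (Python 3.9+), which also generates the matching `--no-save-results` switch. A sketch of just this flag:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--save-results",
    action=argparse.BooleanOptionalAction,
    default=True,
    help="Save predictions to CSV in results/ directory",
)

# Passing nothing keeps the default (True);
# --no-save-results disables results logging.
```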

---

## Key Implementation Notes

- N-shot class names are derived from the filename stem (e.g., `healthy.csv` → class `HEALTHY`)
- The `data_columns` in `csv_configs` must match both the n-shot files and the data file
- `window_size` and `step_size` control the sliding window over the data
- Default `window_size` and `step_size`: **100**
- Use a `signal.SIGINT` handler for graceful shutdown (Python) or `AbortController` (Web)
- Always close `sse_reader` in a `finally` block (Python) or destroy the session on unmount (Web)
- The SSE reader emits `inference.result` for predictions and `session.modify.result` for training confirmations