@blueharford/scrypted-spatial-awareness 0.4.8-beta.1 → 0.5.0
- package/README.md +94 -0
- package/package.json +1 -1
- package/src/core/spatial-reasoning.ts +61 -5
- package/src/core/topology-discovery.ts +641 -0
- package/src/main.ts +294 -3
- package/src/models/discovery.ts +210 -0
- package/src/models/topology.ts +53 -0
- package/src/ui/editor-html.ts +494 -1
- package/dist/main.nodejs.js +0 -3
- package/dist/main.nodejs.js.LICENSE.txt +0 -1
- package/dist/main.nodejs.js.map +0 -1
- package/dist/plugin.zip +0 -0
- package/out/main.nodejs.js +0 -42753
- package/out/main.nodejs.js.map +0 -1
- package/out/plugin.zip +0 -0
package/README.md
CHANGED
@@ -51,6 +51,7 @@ Done! Your camera topology is configured.
 ### Visual Editor
 - **Floor Plan** - Upload image or draw with built-in tools
 - **Drag & Drop** - Place cameras, draw connections
+- **Polygon Zone Drawing** - Draw custom zones (yards, driveways, patios, etc.)
 - **Live Tracking** - Watch objects move in real-time
 
 ### AI Features (optional)
@@ -58,6 +59,7 @@ Done! Your camera topology is configured.
 - **Auto-Learning** - Transit times adjust based on observations
 - **Connection Suggestions** - System suggests new camera paths
 - **Landmark Discovery** - AI identifies landmarks from footage
+- **Auto-Topology Discovery** - Vision LLM analyzes camera views to build topology
 
 ### Integrations
 - **MQTT** - Home Assistant integration
@@ -170,6 +172,98 @@ Base URL: `/endpoint/@blueharford/scrypted-spatial-awareness`
 | `/api/training/apply` | POST | Apply results to topology |
 | `/api/training/status` | GET | Current training status |
 
+### Discovery API
+
+| Endpoint | Method | Description |
+|----------|--------|-------------|
+| `/api/discovery/scan` | POST | Run full discovery scan |
+| `/api/discovery/status` | GET | Current discovery status |
+| `/api/discovery/suggestions` | GET | Pending suggestions |
+| `/api/discovery/camera/{id}` | GET | Analyze single camera |
+
+## Auto-Topology Discovery
+
+The plugin can automatically analyze camera views using a vision-capable LLM to discover landmarks, zones, and camera connections.
+
+### How It Works
+
+1. **Capture Snapshots** - System takes a picture from each camera
+2. **Scene Analysis** - Vision LLM identifies landmarks, zones, and edges in each view
+3. **Cross-Camera Correlation** - LLM correlates findings across cameras to identify shared landmarks and connections
+4. **Suggestions** - Discoveries are presented as suggestions you can accept or reject
+
+### Using Discovery
+
+**Manual Scan:**
+1. Open the topology editor (`/ui/editor`)
+2. Find the "Auto-Discovery" section in the sidebar
+3. Click "Scan Now"
+4. Review and accept/reject suggestions
+
+**Automatic Scan:**
+- Set `Auto-Discovery Interval (hours)` in plugin settings
+- System will periodically scan and generate suggestions
+- Set to 0 to disable automatic scanning
+
+### Discovery Settings
+
+| Setting | Default | Description |
+|---------|---------|-------------|
+| Auto-Discovery Interval | 0 (disabled) | Hours between automatic scans (0 = disabled) |
+| Min Landmark Confidence | 0.6 | Minimum confidence for landmark suggestions |
+| Min Connection Confidence | 0.5 | Minimum confidence for connection suggestions |
+| Auto-Accept Threshold | 0.85 | Auto-accept suggestions above this confidence |
+
+> **Rate Limiting Note:** If you set the interval to less than 1 hour, a warning will appear in the discovery status. Frequent scans can consume significant LLM API quota and may be rate-limited by your provider.
+
+### Requirements
+
+- **Vision-capable LLM** - Install @scrypted/llm with a vision model (OpenAI GPT-4V, Claude, etc.)
+- **Camera access** - Plugin needs camera.takePicture() capability
+
+### What Gets Discovered
+
+- **Landmarks**: Doors, gates, mailbox, garage, structures, fences
+- **Zones**: Front yard, driveway, patio, street, walkways
+- **Connections**: Suggested camera paths with transit time estimates
+- **Edges**: What's visible at frame boundaries (for correlation)
+
+## Zone Drawing
+
+The visual editor includes a polygon zone drawing tool for marking areas on your floor plan.
+
+### How to Draw Zones
+
+1. Click the **Draw Zone** button in the toolbar (green)
+2. Enter a zone name and select the type (yard, driveway, patio, etc.)
+3. Click **Start Drawing**
+4. Click on the canvas to add polygon points
+5. **Double-click** or press **Enter** to finish the zone
+6. Press **Escape** to cancel, **Backspace** to undo last point
+
+### Zone Types
+
+| Type | Color | Description |
+|------|-------|-------------|
+| Yard | Green | Front yard, backyard, side yard |
+| Driveway | Gray | Driveway, parking area |
+| Street | Dark Gray | Street, sidewalk |
+| Patio | Orange | Patio, deck |
+| Walkway | Brown | Walkways, paths |
+| Parking | Light Gray | Parking lot, parking space |
+| Garden | Light Green | Garden, landscaped area |
+| Pool | Blue | Pool area |
+| Garage | Medium Gray | Garage area |
+| Entrance | Pink | Entry areas |
+| Custom | Purple | Custom zone type |
+
+### Using Zones
+
+- Click on a zone to select it and edit its properties
+- Zones are color-coded by type for easy identification
+- Zones help provide context for object movement descriptions
+- Auto-Discovery can suggest zones based on camera analysis
+
 ## MQTT Topics
 
 Base: `scrypted/spatial-awareness`
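The Discovery API added in this README is plain HTTP under the plugin's endpoint, so the scan, poll, and review steps can be scripted. A minimal sketch follows: the paths and methods come from the table above, while the response field names (`running`, the suggestions payload) are assumptions for illustration.

```ts
// Sketch of driving the Discovery API documented above. Paths and
// methods are from the README; response field names are assumptions.
const base = '/endpoint/@blueharford/scrypted-spatial-awareness';

async function runDiscoveryScan(): Promise<void> {
    // Kick off a full discovery scan.
    await fetch(`${base}/api/discovery/scan`, { method: 'POST' });

    // Poll until the scan finishes (the `running` flag is hypothetical).
    let status: any;
    do {
        await new Promise(resolve => setTimeout(resolve, 5000));
        status = await (await fetch(`${base}/api/discovery/status`)).json();
    } while (status?.running);

    // Fetch pending suggestions to accept or reject in the editor.
    const suggestions = await (await fetch(`${base}/api/discovery/suggestions`)).json();
    console.log('Pending discovery suggestions:', suggestions);
}
```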
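The new Zone Drawing section stores zones as named polygons on the floor plan. A rough sketch of what a zone record and a click hit-test could look like; the real model ships in package/src/models/topology.ts and may differ, so treat this shape as hypothetical.

```ts
// Hypothetical zone shape for illustration - the plugin's actual model
// lives in src/models/topology.ts and may differ.
interface Zone {
    name: string;
    type: 'yard' | 'driveway' | 'street' | 'patio' | 'walkway' | 'parking'
        | 'garden' | 'pool' | 'garage' | 'entrance' | 'custom';
    points: { x: number; y: number }[]; // polygon vertices in floor-plan pixels
}

// Standard ray-casting point-in-polygon test, e.g. for hit-testing a click
// when selecting a zone on the editor canvas.
function pointInZone(zone: Zone, x: number, y: number): boolean {
    let inside = false;
    const pts = zone.points;
    for (let i = 0, j = pts.length - 1; i < pts.length; j = i++) {
        const a = pts[i], b = pts[j];
        if ((a.y > y) !== (b.y > y) &&
            x < ((b.x - a.x) * (y - a.y)) / (b.y - a.y) + a.x)
            inside = !inside;
    }
    return inside;
}
```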
package/src/core/spatial-reasoning.ts
CHANGED
@@ -10,6 +10,7 @@ import sdk, {
     Camera,
     MediaObject,
     ScryptedDevice,
+    ScryptedMimeTypes,
 } from '@scrypted/sdk';
 import {
     CameraTopology,
@@ -26,7 +27,7 @@ import {
 } from '../models/topology';
 import { TrackedObject, ObjectSighting } from '../models/tracked-object';
 
-const { systemManager } = sdk;
+const { systemManager, mediaManager } = sdk;
 
 /** Configuration for the spatial reasoning engine */
 export interface SpatialReasoningConfig {
@@ -68,6 +69,29 @@ interface ChatCompletionDevice extends ScryptedDevice {
     streamChatCompletion?(params: any): AsyncGenerator<any>;
 }
 
+/**
+ * Convert a MediaObject to a base64 data URL for vision LLM consumption
+ * @param mediaObject - MediaObject from camera.takePicture()
+ * @returns Base64 data URL (data:image/jpeg;base64,...) or null if conversion fails
+ */
+export async function mediaObjectToBase64(mediaObject: MediaObject): Promise<string | null> {
+    try {
+        // Convert MediaObject to Buffer using mediaManager
+        const buffer = await mediaManager.convertMediaObjectToBuffer(mediaObject, ScryptedMimeTypes.Image);
+
+        // Convert buffer to base64
+        const base64 = buffer.toString('base64');
+
+        // Determine MIME type - default to JPEG for camera images
+        const mimeType = mediaObject.mimeType?.split(';')[0] || 'image/jpeg';
+
+        return `data:${mimeType};base64,${base64}`;
+    } catch (e) {
+        console.warn('Failed to convert MediaObject to base64:', e);
+        return null;
+    }
+}
+
 export class SpatialReasoningEngine {
     private config: SpatialReasoningConfig;
     private console: Console;
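A sketch of how the new mediaObjectToBase64 helper might be wired up from plugin code, assuming a device that implements the Camera interface. getDeviceById, takePicture(), and ScryptedInterface are standard @scrypted/sdk APIs; the snapshotAsDataUrl wrapper and the flow itself are illustrative, not part of this diff.

```ts
import sdk, { Camera, ScryptedInterface } from '@scrypted/sdk';

const { systemManager } = sdk;

// Illustrative only: capture a snapshot from a camera device and convert
// it to a data URL via the mediaObjectToBase64 helper added above.
async function snapshotAsDataUrl(deviceId: string): Promise<string | null> {
    const device = systemManager.getDeviceById<Camera>(deviceId);
    if (!device?.interfaces?.includes(ScryptedInterface.Camera))
        return null;
    const picture = await device.takePicture();
    // mediaObjectToBase64 returns null on conversion failure, letting
    // callers fall back to a text-only prompt.
    return mediaObjectToBase64(picture);
}
```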
@@ -712,7 +736,7 @@ export class SpatialReasoningEngine {
         return connection.name || undefined;
     }
 
-    /** Get LLM-enhanced description using ChatCompletion interface */
+    /** Get LLM-enhanced description using ChatCompletion interface with vision support */
     private async getLlmEnhancedDescription(
         tracked: TrackedObject,
         fromCamera: CameraNode,
@@ -726,6 +750,9 @@
         if (!llm || !llm.getChatCompletion) return null;
 
         try {
+            // Convert image to base64 for vision LLM
+            const imageBase64 = await mediaObjectToBase64(mediaObject);
+
             // Retrieve relevant context for RAG
             const relevantChunks = this.retrieveRelevantContext(
                 fromCamera.deviceId,
@@ -746,12 +773,25 @@
                 ragContext
             );
 
+            // Build message content - use multimodal format if we have an image
+            let messageContent: any;
+            if (imageBase64) {
+                // Vision-capable multimodal message format (OpenAI compatible)
+                messageContent = [
+                    { type: 'text', text: prompt },
+                    { type: 'image_url', image_url: { url: imageBase64 } },
+                ];
+            } else {
+                // Fallback to text-only if image conversion failed
+                messageContent = prompt;
+            }
+
             // Call LLM using ChatCompletion interface
             const result = await llm.getChatCompletion({
                 messages: [
                     {
                         role: 'user',
-                        content: prompt,
+                        content: messageContent,
                     },
                 ],
                 max_tokens: 150,
@@ -809,7 +849,7 @@ Examples of good descriptions:
 Generate ONLY the description, nothing else:`;
     }
 
-    /** Suggest a new landmark based on AI analysis using ChatCompletion */
+    /** Suggest a new landmark based on AI analysis using ChatCompletion with vision */
     async suggestLandmark(
         cameraId: string,
         mediaObject: MediaObject,
@@ -822,6 +862,9 @@ Generate ONLY the description, nothing else:`;
         if (!llm || !llm.getChatCompletion) return null;
 
         try {
+            // Convert image to base64 for vision LLM
+            const imageBase64 = await mediaObjectToBase64(mediaObject);
+
             const prompt = `Analyze this security camera image. A ${objectClass} was detected.
 
 Looking at the surroundings and environment, identify any notable landmarks or features visible that could help describe this location. Consider:
@@ -835,12 +878,25 @@ If you can identify a clear landmark feature, respond with ONLY a JSON object:
 
 If no clear landmark is identifiable, respond with: {"name": null}`;
 
+            // Build message content - use multimodal format if we have an image
+            let messageContent: any;
+            if (imageBase64) {
+                // Vision-capable multimodal message format (OpenAI compatible)
+                messageContent = [
+                    { type: 'text', text: prompt },
+                    { type: 'image_url', image_url: { url: imageBase64 } },
+                ];
+            } else {
+                // Fallback to text-only if image conversion failed
+                messageContent = prompt;
+            }
+
             // Call LLM using ChatCompletion interface
             const result = await llm.getChatCompletion({
                 messages: [
                     {
                         role: 'user',
-                        content: prompt,
+                        content: messageContent,
                     },
                 ],
                 max_tokens: 100,
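The suggestLandmark prompt above asks the model to reply with ONLY a JSON object (or {"name": null}). A defensive parse of that reply could look like the sketch below; the choices[0].message.content path is an assumption based on the OpenAI-compatible completion shape the ChatCompletion interface mirrors.

```ts
// Sketch: defensively parse the JSON reply requested by the
// suggestLandmark prompt. The choices[0].message.content path is an
// assumption based on the OpenAI-compatible completion shape.
function parseLandmarkSuggestion(result: any): { name: string } | null {
    const text: string | undefined = result?.choices?.[0]?.message?.content;
    if (!text)
        return null;
    try {
        // Some models wrap JSON answers in markdown code fences; strip them.
        const cleaned = text.replace(/`{3}(?:json)?/g, '').trim();
        const parsed = JSON.parse(cleaned);
        // {"name": null} signals that no clear landmark was identified.
        return parsed?.name ? parsed : null;
    } catch {
        return null;
    }
}
```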