cursor-buddy 0.0.8 → 0.0.9-beta.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -12,7 +12,7 @@ Customize its prompt, pass custom tools, choose between browser or server-side s
 
 - **Push-to-talk voice input** — Hold a hotkey to speak, release to send
 - **Browser-first live transcription** — Realtime transcript while speaking, with server fallback
-- **Annotated screenshot context** — AI sees your current viewport with numbered interactive elements
+- **DOM snapshot context** — AI sees a token-efficient representation of your visible page structure
 - **Voice responses** — Browser or server TTS, with optional streaming playback
 - **Cursor pointing** — AI can point at UI elements it references
 - **Voice interruption** — Start talking again to cut off current response
@@ -57,7 +57,7 @@ export const cursorBuddy = createCursorBuddyHandler({
 import { toNextJsHandler } from "cursor-buddy/server/next"
 import { cursorBuddy } from "@/lib/cursor-buddy"
 
-export const { GET, POST } = toNextJsHandler(cursorBuddy)
+export const { POST } = toNextJsHandler(cursorBuddy)
 ```
 
 ### 2. Client Setup
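
For reference, the route file assembled from the context and added lines above now exports only `POST`, so the route stops answering `GET` by default (the `export const GET = POST` context line in the last hunk suggests the README still covers opting back in):

```ts
// Route file assembled from the hunk above (file path is illustrative)
import { toNextJsHandler } from "cursor-buddy/server/next"
import { cursorBuddy } from "@/lib/cursor-buddy"

export const { POST } = toNextJsHandler(cursorBuddy)
```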
@@ -367,17 +367,15 @@ client.stopListening()
 
 1. User holds the hotkey
 2. Microphone captures audio, waveform shows audio level, and browser speech recognition starts when available
-3. User releases hotkey
-4. An annotated screenshot of the viewport is captured, with numbered markers on visible interactive elements, based on [agent-browser](https://github.com/vercel-labs/agent-browser) implementation.
+3. At the same time, a screenshot and token-efficient DOM snapshot of the viewport are captured in the background. This runs in parallel with speech capture to minimize latency
+4. User releases hotkey
 5. The client prefers the browser transcript; if it is unavailable or empty in `auto` mode, the recorded audio is transcribed on the server
-6. Screenshot + marker context are sent to the AI model
-7. AI responds with text, optionally including a pointing tag:
-   - Preferred: `[POINT:5:Submit]` for numbered interactive elements
-   - Fallback: `[POINT:640,360:Error text]` for arbitrary screen coordinates
+6. The already-captured screenshot + DOM snapshot are sent to the AI model. Each element has an `@ID` (e.g., `@12`) that the AI can reference.
+7. AI responds with text and can optionally call the `point` tool to indicate an element on screen by its `@ID` from the DOM snapshot
 8. Response is spoken in the browser or on the server based on `speech.mode`,
-   and can either wait for the full response or stream sentence-by-sentence
-   based on `speech.allowStreaming`
-9. If a marker tag is present, it is resolved back to the live DOM element; if a coordinate tag is present, it is mapped back to the live viewport; then the cursor animates to the target location
+   and can either wait for the full response or stream sentence-by-sentence
+   based on `speech.allowStreaming`
+9. If the AI calls the `point` tool, the cursor animates to the target element's current position (it resolves the element from the snapshot registry and computes its center point)
 10. **If user presses hotkey again at any point, current response is interrupted**
 
 ## Security Best Practices
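
Steps 3 and 4 above describe capture running concurrently with speech input. A minimal sketch of that keydown/keyup pattern, with every helper name assumed for illustration (none of them are cursor-buddy exports):

```ts
// Sketch only: start screenshot + DOM snapshot capture on keydown,
// await it on keyup. All helpers below are assumed stand-ins.
declare function captureScreenshot(): Promise<Blob>
declare function captureDomSnapshot(): Promise<string> // token-efficient text with @IDs
declare function startMicrophone(): void
declare function stopMicrophone(): Promise<string> // resolves to the transcript

let capture: Promise<{ screenshot: Blob; snapshot: string }> | undefined

function onHotkeyDown(): void {
  startMicrophone()
  // Runs in parallel with speech capture, so it is usually settled before keyup
  capture = Promise.all([captureScreenshot(), captureDomSnapshot()])
    .then(([screenshot, snapshot]) => ({ screenshot, snapshot }))
}

async function onHotkeyUp(): Promise<void> {
  const transcript = await stopMicrophone()
  const context = await capture // already resolved in the common case: no added latency
  // ...send { transcript, ...context } to the model (step 6)
}
```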
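
And a sketch of the step-9 resolution; the registry below is an assumed shape (a map from snapshot `@ID` to live element) that matches the description, not necessarily the package's internals:

```ts
// Sketch only: resolve a snapshot @ID back to the live element and
// compute its current center point for the cursor animation.
const registry = new Map<string, Element>() // filled while building the snapshot, e.g. "@12" -> button

function resolvePointTarget(id: string): { x: number; y: number } | null {
  const el = registry.get(id)
  if (!el || !el.isConnected) return null // element may have left the DOM since capture
  const rect = el.getBoundingClientRect() // current viewport position
  return { x: rect.left + rect.width / 2, y: rect.top + rect.height / 2 }
}

// e.g. when the model calls the point tool with { id: "@12" }:
// const target = resolvePointTarget("@12")
// if (target) animateCursorTo(target) // animateCursorTo is illustrative
```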
@@ -415,7 +413,6 @@ export const GET = POST
 
 ## TODOs
 
-- [ ] High: Make tool calls first class: Pointing becomes tool call (once per turn) + re-use pointing bubble UI for tool calls
 - [ ] Medium: Proper test structure without relying on `as any` for audio and voice capture
 
 ## License