windows-use 0.3.0 → 0.3.2

package/README.md CHANGED
@@ -1,8 +1,34 @@
- # windows-use
+ <p align="center">
+ <h1 align="center">windows-use</h1>
+ <p align="center">
+ <strong>Save 90% context — let cheap models do the clicking.</strong>
+ </p>
+ <p align="center">
+ <a href="https://www.npmjs.com/package/windows-use"><img src="https://img.shields.io/npm/v/windows-use.svg" alt="npm version"></a>
+ <a href="https://www.npmjs.com/package/windows-use"><img src="https://img.shields.io/npm/dm/windows-use.svg" alt="npm downloads"></a>
+ <a href="https://github.com/yuhuison/local-windows-use/blob/main/LICENSE"><img src="https://img.shields.io/npm/l/windows-use.svg" alt="license"></a>
+ <a href="https://github.com/yuhuison/local-windows-use"><img src="https://img.shields.io/github/stars/yuhuison/local-windows-use.svg?style=social" alt="GitHub stars"></a>
+ </p>
+ </p>
 
- Let big LLMs delegate Windows & browser automation to small LLMs via sessions.
+ ---
 
- When AI agents use tools to operate Windows or browsers, screenshots and other multimodal returns consume massive context. This project solves that with a "big model directs small model" architecture — the small model (Qwen, GPT-4o-mini, etc.) does the actual work autonomously and reports back concise summaries with optional embedded screenshots.
+ ## Why?
+
+ When AI agents operate Windows or browsers, **every screenshot eats 1000+ tokens**. A single "open Chrome and search something" task can burn through 10K–50K tokens of your expensive model's context — just on screenshots and tool calls.
+
+ **That's like hiring a CEO to move the mouse.**
+
+ ## How it works
+
+ `windows-use` introduces a **"big model directs, small model executes"** architecture:
+
+ | | Without windows-use | With windows-use |
+ |---|---|---|
+ | **Who clicks?** | Claude / GPT-4o (expensive) | Qwen, GPT-4o-mini, DeepSeek (cheap) |
+ | **Context cost per task** | 10K–50K tokens of screenshots | ~200 tokens (text summary) |
+ | **What big model sees** | Raw screenshots + coordinates | Clean text report + optional images |
+ | **Cost** | $$$ | ¢ |
 
 ```
 Big Model windows-use Small Model
@@ -18,6 +44,38 @@ Big Model windows-use Small Model
 ├─ done_session ──────────► │ cleanup │
 ```
 
+ Your big model just says *"open Notepad and type Hello"* — the small model handles all the screenshots, clicking, and verification autonomously, then reports back a concise summary.
+
+ ## Designed for simplicity
+
+ > **You don't need to be an expert.** If you can run `npm install`, you can use this.
+
+ Most computer-use tools ask you to set up Docker, configure sandboxes, manage virtual displays, or fight with permissions. **windows-use does none of that.**
+
+ - **One dependency** — `npm install -g windows-use`. That's it. No Docker, no Python environments, no system-level permissions.
+ - **One config** — Run `windows-use init`, paste your API key and endpoint, done. Works with any OpenAI-compatible API.
+ - **Your real Chrome** — No headless Puppeteer, no sandboxed browser. It connects to **your actual Chrome** via CDP — with your cookies, logins, extensions, and bookmarks intact. Just launch Chrome with `--remote-debugging-port=9222`.
+ - **No admin / root needed** — Runs entirely in user space. No elevated privileges, no security prompts.
+ - **Share config in one line** — `windows-use init --export` gives you a base64 string. Send it to a teammate, they run `windows-use init <string>`, instant setup.
+
+ ```bash
+ # That's literally the entire setup:
+ npm install -g windows-use
+ windows-use init
+ windows-use "Open Notepad and type Hello World"
+ ```
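The "Your real Chrome" bullet above relies on Chrome's standard remote-debugging flag. A minimal sketch of launching it on Windows (the install path below is a typical default and is an assumption; adjust to your machine):

```shell
# Start your everyday Chrome with the CDP port open so windows-use can attach.
# The path is a common default install location, not guaranteed on every machine.
"C:\Program Files\Google\Chrome\Application\chrome.exe" --remote-debugging-port=9222
```

If Chrome is already running without the flag, quit it fully first; otherwise the new process joins the existing one and the debugging port never opens.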
+
+ ## Key Features
+
+ - **Save 90% context** — Keep your expensive model's context window for reasoning, not screenshots
+ - **Any OpenAI-compatible model** — Qwen, DeepSeek, Ollama, vLLM, GPT-4o-mini, or any local model
+ - **16 built-in tools** — Screen capture with coordinate grid, mouse, keyboard, browser automation, file I/O
+ - **Your real Chrome** — Uses your existing cookies, login state, and extensions (no webdriver detection)
+ - **MCP server** — Drop-in integration with Claude Desktop, VS Code, and any MCP client
+ - **Rich reports** — Text + embedded screenshots, so the big model sees exactly what it needs
+ - **CLI + REPL + API** — Use from terminal, interactively, or programmatically
+ - **Zero config headache** — No Docker, no sandbox, no permissions. Just an API key.
+
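The MCP bullet above is typically wired up with a client config entry. This is a hedged sketch in the common Claude Desktop `mcpServers` shape; the exact subcommand and args for windows-use's MCP mode are an assumption here, so check `windows-use --help` for the real invocation:

```json
{
  "mcpServers": {
    "windows-use": {
      "command": "windows-use",
      "args": ["mcp"]
    }
  }
}
```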
 ## Install

 ```bash
@@ -29,6 +87,12 @@ npm install -g windows-use
 ```bash
 # Interactive setup — saves config to ~/.windows-use.json
 windows-use init
+
+ # Export config as a shareable base64 string
+ windows-use init --export
+
+ # Import config from a base64 string
+ windows-use init eyJiYXNlVVJMIjoiaHR0cHM6Ly...
32
96
  ```
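The shareable string is plain base64 of the saved JSON config (the example above starts with `eyJiYXNlVVJMIjoi`, which is base64 for `{"baseURL":"`). A sketch of the round-trip with standard Unix tools, using made-up config fields:

```shell
# Encode a sample config the way `init --export` does (plain base64),
# then decode it back. Field names here are illustrative, not the real schema.
config='{"baseURL":"https://api.example.com/v1","apiKey":"sk-demo"}'
exported=$(printf '%s' "$config" | base64 | tr -d '\n')
echo "$exported"                      # starts with eyJiYXNlVVJMIjoi
printf '%s' "$exported" | base64 -d   # prints the original JSON
```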

 You'll be prompted for:
@@ -265,6 +329,10 @@ src/
 └── control/ # report
 ```

+ ## Contributing
+
+ Contributions are welcome! Please open an issue or submit a pull request.
+
 ## License

 MIT