windows-use 0.3.0 → 0.3.2

package/README.md CHANGED
@@ -1,8 +1,34 @@
- # windows-use
+ <p align="center">
+ <h1 align="center">windows-use</h1>
+ <p align="center">
+ <strong>Save 90% context — let cheap models do the clicking.</strong>
+ </p>
+ <p align="center">
+ <a href="https://www.npmjs.com/package/windows-use"><img src="https://img.shields.io/npm/v/windows-use.svg" alt="npm version"></a>
+ <a href="https://www.npmjs.com/package/windows-use"><img src="https://img.shields.io/npm/dm/windows-use.svg" alt="npm downloads"></a>
+ <a href="https://github.com/yuhuison/local-windows-use/blob/main/LICENSE"><img src="https://img.shields.io/npm/l/windows-use.svg" alt="license"></a>
+ <a href="https://github.com/yuhuison/local-windows-use"><img src="https://img.shields.io/github/stars/yuhuison/local-windows-use.svg?style=social" alt="GitHub stars"></a>
+ </p>
+ </p>
 
- Let big LLMs delegate Windows & browser automation to small LLMs via sessions.
+ ---
 
- When AI agents use tools to operate Windows or browsers, screenshots and other multimodal returns consume massive context. This project solves that with a "big model directs small model" architecture — the small model (Qwen, GPT-4o-mini, etc.) does the actual work autonomously and reports back concise summaries with optional embedded screenshots.
+ ## Why?
+
+ When AI agents operate Windows or browsers, **every screenshot eats 1000+ tokens**. A single "open Chrome and search something" task can burn through 10K–50K tokens of your expensive model's context — just on screenshots and tool calls.
+
+ **That's like hiring a CEO to move the mouse.**
+
+ ## How it works
+
+ `windows-use` introduces a **"big model directs, small model executes"** architecture:
+
+ | | Without windows-use | With windows-use |
+ |---|---|---|
+ | **Who clicks?** | Claude / GPT-4o (expensive) | Qwen, GPT-4o-mini, DeepSeek (cheap) |
+ | **Context cost per task** | 10K–50K tokens of screenshots | ~200 tokens (text summary) |
+ | **What big model sees** | Raw screenshots + coordinates | Clean text report + optional images |
+ | **Cost** | $$$ | ¢ |
 
 ```
 Big Model windows-use Small Model
@@ -18,6 +44,38 @@ Big Model windows-use Small Model
 ├─ done_session ──────────► │ cleanup │
 ```
 
+ Your big model just says *"open Notepad and type Hello"* — the small model handles all the screenshots, clicking, and verification autonomously, then reports back a concise summary.
+
+ ## Designed for simplicity
+
+ > **You don't need to be an expert.** If you can run `npm install`, you can use this.
+
+ Most computer-use tools ask you to set up Docker, configure sandboxes, manage virtual displays, or fight with permissions. **windows-use does none of that.**
+
+ - **One dependency** — `npm install -g windows-use`. That's it. No Docker, no Python environments, no system-level permissions.
+ - **One config** — Run `windows-use init`, paste your API key and endpoint, done. Works with any OpenAI-compatible API.
+ - **Your real Chrome** — No headless Puppeteer, no sandboxed browser. It connects to **your actual Chrome** via CDP — with your cookies, logins, extensions, and bookmarks intact. Just launch Chrome with `--remote-debugging-port=9222`.
+ - **No admin / root needed** — Runs entirely in user space. No elevated privileges, no security prompts.
+ - **Share config in one line** — `windows-use init --export` gives you a base64 string. Send it to a teammate, they run `windows-use init <string>`, instant setup.
+
+ ```bash
+ # That's literally the entire setup:
+ npm install -g windows-use
+ windows-use init
+ windows-use "Open Notepad and type Hello World"
+ ```
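The "Your real Chrome" bullet above relies on Chrome's standard remote-debugging flag. A minimal sketch of launching it on Windows (the install path below is a typical default and is an assumption; adjust to your machine):

```shell
# Start your everyday Chrome with the CDP port open so windows-use can attach.
# The path is a common default install location, not guaranteed on every machine.
"C:\Program Files\Google\Chrome\Application\chrome.exe" --remote-debugging-port=9222
```

If Chrome is already running without the flag, quit it fully first; otherwise the new process joins the existing one and the debugging port never opens.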
+
+ ## Key Features
+
+ - **Save 90% context** — Keep your expensive model's context window for reasoning, not screenshots
+ - **Any OpenAI-compatible model** — Qwen, DeepSeek, Ollama, vLLM, GPT-4o-mini, or any local model
+ - **16 built-in tools** — Screen capture with coordinate grid, mouse, keyboard, browser automation, file I/O
+ - **Your real Chrome** — Uses your existing cookies, login state, and extensions (no webdriver detection)
+ - **MCP server** — Drop-in integration with Claude Desktop, VS Code, and any MCP client
+ - **Rich reports** — Text + embedded screenshots, so the big model sees exactly what it needs
+ - **CLI + REPL + API** — Use from terminal, interactively, or programmatically
+ - **Zero config headache** — No Docker, no sandbox, no permissions. Just an API key.
+
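The MCP bullet above is typically wired up with a client config entry. This is a hedged sketch in the common Claude Desktop `mcpServers` shape; the exact subcommand and args for windows-use's MCP mode are an assumption here, so check `windows-use --help` for the real invocation:

```json
{
  "mcpServers": {
    "windows-use": {
      "command": "windows-use",
      "args": ["mcp"]
    }
  }
}
```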
 ## Install

 ```bash
@@ -29,6 +87,12 @@ npm install -g windows-use
 ```bash
 # Interactive setup — saves config to ~/.windows-use.json
 windows-use init
+
+ # Export config as a shareable base64 string
+ windows-use init --export
+
+ # Import config from a base64 string
+ windows-use init eyJiYXNlVVJMIjoiaHR0cHM6Ly...
32
96
  ```
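The shareable string is plain base64 of the saved JSON config (the example above starts with `eyJiYXNlVVJMIjoi`, which is base64 for `{"baseURL":"`). A sketch of the round-trip with standard Unix tools, using made-up config fields:

```shell
# Encode a sample config the way `init --export` does (plain base64),
# then decode it back. Field names here are illustrative, not the real schema.
config='{"baseURL":"https://api.example.com/v1","apiKey":"sk-demo"}'
exported=$(printf '%s' "$config" | base64 | tr -d '\n')
echo "$exported"                      # starts with eyJiYXNlVVJMIjoi
printf '%s' "$exported" | base64 -d   # prints the original JSON
```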

 You'll be prompted for:
@@ -265,6 +329,10 @@ src/
 └── control/ # report
 ```

+ ## Contributing
+
+ Contributions are welcome! Please open an issue or submit a pull request.
+
 ## License

 MIT