textweb 0.1.0

package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Christopher Robison
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,231 @@
1
+ # TextWeb
2
+
3
+ **A text-grid web renderer for AI agents: see the web without screenshots.**
4
+
5
+ Instead of taking expensive screenshots and piping them through vision models, TextWeb renders web pages as structured text grids that LLMs can reason about natively. Pages run with full JavaScript execution, spatial layout is preserved, and interactive elements are annotated.
6
+
7
+ 📄 [Documentation](https://chrisrobison.github.io/textweb) · 📦 [npm](https://www.npmjs.com/package/textweb) · 🐙 [GitHub](https://github.com/chrisrobison/textweb)
8
+
9
+ ## Why?
10
+
11
+ | Approach | Size | Requires | Speed | Spatial Layout |
12
+ |----------|------|----------|-------|----------------|
13
+ | Screenshot + Vision | ~1MB | Vision model ($$$) | Slow | Pixel-level |
14
+ | Accessibility Tree | ~5KB | Nothing | Fast | ❌ Lost |
15
+ | Raw HTML | ~100KB+ | Nothing | Fast | ❌ Lost |
16
+ | **TextWeb** | **~2-5KB** | **Nothing** | **Fast** | **✅ Preserved** |
17
+
18
+ ## Quick Start
19
+
20
+ ```bash
21
+ npm install -g textweb
22
+ npx playwright install chromium
23
+ ```
24
+
25
+ ```bash
26
+ # Render any page
27
+ textweb https://news.ycombinator.com
28
+
29
+ # Interactive mode
30
+ textweb --interactive https://github.com
31
+
32
+ # JSON output for agents
33
+ textweb --json https://example.com
34
+ ```
35
+
36
+ ## Example Output
37
+
38
+ ```
39
+ [0]Hacker News [1]new | [2]past | [3]comments | [4]ask | [5]show | [6]jobs | [7]submit [8]login
40
+
41
+ 1. [9]Show HN: TextWeb – text-grid browser for AI agents (github.com)
42
+ 142 points by chrisrobison 3 hours ago | [10]89 comments
43
+ 2. [11]Why LLMs don't need screenshots to browse the web
44
+ 87 points by somebody 5 hours ago | [12]34 comments
45
+
46
+ [13:______________________] [14 Search]
47
+ ```
48
+
49
+ ~500 bytes. An LLM can read this, understand the layout, and say "click ref 9" to open the first link. No vision model needed.
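To make the "click ref 9" step concrete, here is a minimal sketch of turning a model's short reply into a structured action. The function name `parseAgentCommand`, the accepted phrasings, and the returned object shape are assumptions made for illustration; TextWeb's own tool interfaces may look different.

```javascript
// Sketch: parse short agent replies such as "click ref 9", "type 7 hello",
// or "scroll down" into structured actions. The grammar is an assumption
// for illustration, not something TextWeb itself defines.
function parseAgentCommand(reply) {
  const text = reply.trim();
  let m;
  // "click 9" or "click ref 9"
  if ((m = /^click(?: ref)? (\d+)$/i.exec(text))) {
    return { action: 'click', ref: Number(m[1]) };
  }
  // "type 7 hello world" or "type ref 7 hello world"
  if ((m = /^type(?: ref)? (\d+) (.+)$/i.exec(text))) {
    return { action: 'type', ref: Number(m[1]), text: m[2] };
  }
  // "scroll up" or "scroll down"
  if ((m = /^scroll (up|down)$/i.exec(text))) {
    return { action: 'scroll', direction: m[1].toLowerCase() };
  }
  return null; // reply did not match any known command
}
```

Returning `null` for unrecognized replies lets the caller re-prompt the model instead of guessing at an action.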
50
+
51
+ ## Integration Options
52
+
53
+ TextWeb works with any AI agent framework. Pick your integration:
54
+
55
+ ### 🔌 MCP Server (Claude Desktop, Cursor, Windsurf, Cline, OpenClaw)
56
+
57
+ The fastest way to add web browsing to any MCP-compatible client.
58
+
59
+ ```bash
60
+ # Install globally
61
+ npm install -g textweb
62
+
63
+ # Or run directly
64
+ npx textweb-mcp
65
+ ```
66
+
67
+ **Claude Desktop**: on macOS, add to `~/Library/Application Support/Claude/claude_desktop_config.json`:
68
+
69
+ ```json
70
+ {
71
+ "mcpServers": {
72
+ "textweb": {
73
+ "command": "textweb-mcp"
74
+ }
75
+ }
76
+ }
77
+ ```
78
+
79
+ **Cursor**: add to `.cursor/mcp.json`:
80
+
81
+ ```json
82
+ {
83
+ "mcpServers": {
84
+ "textweb": {
85
+ "command": "textweb-mcp"
86
+ }
87
+ }
88
+ }
89
+ ```
90
+
91
+ **OpenClaw**: add to `openclaw.json` skills or MCP config.
92
+
93
+ Then just ask: *"Go to hacker news and find posts about AI"*. The agent uses text grids instead of screenshots.
94
+
95
+ ### ๐Ÿ› ๏ธ OpenAI / Anthropic Function Calling
96
+
97
+ Drop-in tool definitions for any function-calling model. See [`tools/tool_definitions.json`](tools/tool_definitions.json).
98
+
99
+ Pair with the [system prompt](tools/system_prompt.md) to teach the model how to read the grid:
100
+
101
+ ```python
102
+ import json
103
+ import openai  # openai>=1.0; the module-level client reads OPENAI_API_KEY from the environment
104
+ # Load tool definitions
105
+ with open("tools/tool_definitions.json") as f:
106
+ textweb_tools = json.load(f)["tools"]
107
+
108
+ # Load system prompt
109
+ with open("tools/system_prompt.md") as f:
110
+ system_prompt = f.read()
111
+
112
+ # Use with OpenAI
113
+ response = openai.chat.completions.create(
114
+ model="gpt-4",
115
+ messages=[
116
+ {"role": "system", "content": system_prompt},
117
+ {"role": "user", "content": "Go to example.com and click the first link"},
118
+ ],
119
+ tools=textweb_tools,
120
+ )
121
+ ```
122
+
123
+ ### 🦜 LangChain
124
+
125
+ ```python
126
+ from tools.langchain import get_textweb_tools
127
+
128
+ # Start the server first: textweb --serve 3000
129
+ tools = get_textweb_tools(base_url="http://localhost:3000")
130
+
131
+ # Use with any LangChain agent
132
+ from langchain.agents import initialize_agent
133
+ agent = initialize_agent(tools, llm, agent="zero-shot-react-description")
134
+ agent.run("Find the top story on Hacker News")
135
+ ```
136
+
137
+ ### 🚢 CrewAI
138
+
139
+ ```python
140
+ from tools.crewai import TextWebBrowseTool, TextWebClickTool, TextWebTypeTool
141
+ from crewai import Agent
142
+ # Start the server first: textweb --serve 3000
143
+ researcher = Agent(
144
+ role="Web Researcher",
145
+ tools=[TextWebBrowseTool(), TextWebClickTool(), TextWebTypeTool()],
146
+ llm=llm,
147
+ )
148
+ ```
149
+
150
+ ### ๐ŸŒ HTTP API
151
+
152
+ ```bash
153
+ # Start the server
154
+ textweb --serve 3000
155
+
156
+ # Navigate
157
+ curl -X POST http://localhost:3000/navigate \
158
+ -H 'Content-Type: application/json' \
159
+ -d '{"url": "https://example.com"}'
160
+
161
+ # Click, type, scroll
162
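+ curl -X POST http://localhost:3000/click -H 'Content-Type: application/json' -d '{"ref": 3}'
163
+ curl -X POST http://localhost:3000/type -H 'Content-Type: application/json' -d '{"ref": 7, "text": "hello"}'
164
+ curl -X POST http://localhost:3000/scroll -H 'Content-Type: application/json' -d '{"direction": "down"}'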
165
+ ```
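The same calls can be made from Node.js. The sketch below builds the POST requests for the endpoints shown in the curl examples (`/navigate`, `/click`, `/type`, `/scroll`). The helper names `buildRequest` and `send` and the `BASE_URL` constant are illustrative assumptions. The response format is not documented in this section, so `send` returns the raw body text rather than assuming a schema.

```javascript
// Sketch: a thin client for the HTTP endpoints shown in the curl examples.
// BASE_URL, buildRequest, and send are hypothetical names for illustration.
const BASE_URL = 'http://localhost:3000';

// Build the URL and fetch() options for one endpoint without sending anything.
function buildRequest(endpoint, payload) {
  return {
    url: `${BASE_URL}/${endpoint}`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(payload),
    },
  };
}

// Send the request using the global fetch available in Node.js 18+.
async function send(endpoint, payload) {
  const { url, options } = buildRequest(endpoint, payload);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`${endpoint} failed: HTTP ${response.status}`);
  }
  return response.text(); // raw body; the schema is not specified here
}
```

With the server running (`textweb --serve 3000`), `await send('navigate', { url: 'https://example.com' })` followed by `await send('click', { ref: 3 })` would mirror the curl sequence above.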
166
+
167
+ ### 📦 Node.js Library
168
+
169
+ ```javascript
170
+ const { AgentBrowser } = require('textweb');
171
+ // Run the awaits below inside an async function (CommonJS has no top-level await)
172
+ const browser = new AgentBrowser({ cols: 120 });
173
+ const { view, elements, meta } = await browser.navigate('https://example.com');
174
+
175
+ console.log(view); // The text grid
176
+ console.log(elements); // { 0: { selector, tag, text, href }, ... }
177
+
178
+ await browser.click(3); // Click element [3]
179
+ await browser.type(7, 'hello'); // Type into element [7]
180
+ await browser.scroll('down'); // Scroll down
181
+ await browser.close();
182
+ ```
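When an agent drives the library, it helps to route structured actions to these methods in one place. The following is a minimal sketch that relies only on the `click`, `type`, and `scroll` methods shown above. The `runAction` name and the action object shape (`{ action, ref, text, direction }`) are assumptions made for this example.

```javascript
// Sketch: dispatch a structured agent action to an AgentBrowser-like object.
// Only click/type/scroll from the README example are used; the action object
// shape is an assumption for illustration.
async function runAction(browser, action) {
  switch (action.action) {
    case 'click':
      return browser.click(action.ref);
    case 'type':
      return browser.type(action.ref, action.text);
    case 'scroll':
      return browser.scroll(action.direction);
    default:
      // Fail loudly so a malformed model reply never triggers a silent no-op.
      throw new Error(`Unsupported action: ${action.action}`);
  }
}
```

Because `runAction` only needs an object with those three methods, it can be exercised against a stub in tests without launching Chromium.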
183
+
184
+ ## Grid Conventions
185
+
186
+ | Element | Rendering | Interaction |
187
+ |---------|-----------|-------------|
188
+ | Links | `[ref]link text` | `click(ref)` |
189
+ | Buttons | `[ref button text]` | `click(ref)` |
190
+ | Text inputs | `[ref:placeholder____]` | `type(ref, "text")` |
191
+ | Checkboxes | `[ref:X]` / `[ref: ]` | `click(ref)` to toggle |
192
+ | Radio buttons | `[ref:●]` / `[ref:○]` | `click(ref)` |
193
+ | Dropdowns | `[ref:▼ Selected]` | `select(ref, "value")` |
194
+ | File inputs | `[ref:📎 Choose file]` | `upload(ref, "/path")` |
195
+ | Headings | `═══ HEADING ═══` | - |
196
+ | Separators | `────────────────` | - |
197
+ | List items | `• Item text` | - |
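To illustrate how these conventions can be read mechanically, here is a rough sketch that pulls `[ref]` annotations out of a grid string. The regular expression is inferred from the table above and is an assumption, not TextWeb's own parser. It lumps every `[ref:...]` form (inputs, checkboxes, radios, dropdowns, file inputs) into one `field` kind. In practice the `elements` map returned by the library is the more reliable source.

```javascript
// Sketch: extract [ref] annotations from a rendered text grid.
// Patterns are inferred from the Grid Conventions table (an assumption):
//   [12]            -> link
//   [14 Search]     -> button
//   [13:anything]   -> field (input, checkbox, radio, dropdown, file input)
function extractRefs(grid) {
  const refs = [];
  const pattern = /\[(\d+)(?:(:)([^\]]*)| ([^\]]+))?\]/g;
  for (const match of grid.matchAll(pattern)) {
    const ref = Number(match[1]);
    if (match[2] === ':') {
      refs.push({ ref, kind: 'field', label: match[3] });
    } else if (match[4] !== undefined) {
      refs.push({ ref, kind: 'button', label: match[4] });
    } else {
      refs.push({ ref, kind: 'link', label: null });
    }
  }
  return refs;
}
```

Link text is left out of the result because the grid does not mark where a link's label ends.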
198
+
199
+ ## How It Works
200
+
201
+ ```
202
+ ┌─────────────────────────────────────────────┐
203
+ │  Your Agent (any LLM)                       │
204
+ │  "click 3" / "type 7 hello" / "scroll down" │
205
+ ├─────────────────────────────────────────────┤
206
+ │  TextWeb                                    │
207
+ │  Pixel positions → character grid           │
208
+ │  Interactive elements get [ref] annotations │
209
+ ├─────────────────────────────────────────────┤
210
+ │  Headless Chromium (Playwright)             │
211
+ │  Full JS/CSS execution                      │
212
+ │  getBoundingClientRect() for all elements   │
213
+ └─────────────────────────────────────────────┘
214
+ ```
215
+
216
+ 1. **Real browser** renders the page (full JS, CSS, dynamic content)
217
+ 2. **Extract** every visible element's position, size, text, and interactivity
218
+ 3. **Map** pixel coordinates to character grid positions (spatial layout preserved)
219
+ 4. **Annotate** interactive elements with `[ref]` numbers for agent interaction
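Step 3 above can be sketched as a simple scaling of pixel coordinates onto grid cells. This is an illustrative approximation, not TextWeb's actual algorithm: the `toGridCell` name, the 1280px viewport width, and the 20px line height are all assumptions, and only the `cols: 120` default echoes the Node.js example.

```javascript
// Sketch: map a pixel rectangle (shaped like getBoundingClientRect() output)
// to character-grid coordinates. viewportWidth and lineHeight are assumed
// values; TextWeb's real constants and rounding rules may differ.
function toGridCell(rect, { cols = 120, viewportWidth = 1280, lineHeight = 20 } = {}) {
  // Scale x by cols/viewportWidth, then clamp into the grid's column range.
  const col = Math.floor((rect.x * cols) / viewportWidth);
  return {
    col: Math.min(cols - 1, Math.max(0, col)),
    // One text row per lineHeight pixels of vertical offset.
    row: Math.max(0, Math.floor(rect.y / lineHeight)),
    // Every element occupies at least one character cell.
    width: Math.max(1, Math.round((rect.width * cols) / viewportWidth)),
  };
}
```

Under these assumptions, an element whose left edge sits halfway across the viewport lands in the middle column of the grid, which is what keeps side-by-side elements side by side in the text output.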
220
+
221
+ ## Design Principles
222
+
223
+ 1. **Text is native to LLMs**: no vision model middleman
224
+ 2. **Spatial layout matters**: flat element lists lose the "where"
225
+ 3. **Cheap and fast**: 2-5KB per render vs 1MB+ screenshots
226
+ 4. **Full web support**: real Chromium runs the JS
227
+ 5. **Interactive**: reference numbers map to real DOM elements
228
+
229
+ ## License
230
+
231
+ MIT © [Christopher Robison](https://cdr2.com)