ember-browser 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,11 @@
1
+ GNU AFFERO GENERAL PUBLIC LICENSE
2
+ Version 3, 19 November 2007
3
+
4
+ Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
5
+ Everyone is permitted to copy and distribute verbatim copies
6
+ of this license document, but changing it is not allowed.
7
+
8
+ ...
9
+
10
+ The full AGPL-3.0 license text is available at:
11
+ https://www.gnu.org/licenses/agpl-3.0.txt
@@ -0,0 +1,338 @@
1
+ Metadata-Version: 2.4
2
+ Name: ember-browser
3
+ Version: 0.1.0
4
+ Summary: Open source, lightweight headless browser for AI agents. pip install ember-browser.
5
+ Author: Anda Usman, AndaLabX
6
+ License: AGPL-3.0
7
+ Project-URL: Homepage, https://github.com/andalabx/ember
8
+ Project-URL: Documentation, https://andalabx.com/ember
9
+ Project-URL: Source, https://github.com/andalabx/ember
10
+ Classifier: Development Status :: 3 - Alpha
11
+ Classifier: Intended Audience :: Developers
12
+ Classifier: Topic :: Internet :: WWW/HTTP :: Browsers
13
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
14
+ Classifier: License :: OSI Approved :: GNU Affero General Public License v3
15
+ Classifier: Programming Language :: Python :: 3
16
+ Classifier: Programming Language :: Python :: 3.10
17
+ Classifier: Programming Language :: Python :: 3.11
18
+ Classifier: Programming Language :: Python :: 3.12
19
+ Requires-Python: >=3.10
20
+ Description-Content-Type: text/markdown
21
+ License-File: LICENSE
22
+ Requires-Dist: trafilatura>=2.0.0
23
+ Requires-Dist: beautifulsoup4>=4.12
24
+ Requires-Dist: lxml>=5.0
25
+ Requires-Dist: httpx>=0.28
26
+ Requires-Dist: ddgs>=1.0
27
+ Requires-Dist: typer>=0.15
28
+ Requires-Dist: rich>=13.0
29
+ Requires-Dist: fastapi>=0.115
30
+ Requires-Dist: uvicorn[standard]>=0.34
31
+ Requires-Dist: pydantic>=2.0
32
+ Requires-Dist: pypdf>=4.0
33
+ Provides-Extra: dev
34
+ Requires-Dist: pytest>=8; extra == "dev"
35
+ Provides-Extra: mcp
36
+ Requires-Dist: fastmcp>=2.0; extra == "mcp"
37
+ Dynamic: license-file
38
+
39
+ <div align="center">
40
+
41
+ <pre>
42
+ ███████╗███╗ ███╗██████╗ ███████╗██████╗
43
+ ██╔════╝████╗ ████║██╔══██╗██╔════╝██╔══██╗
44
+ █████╗ ██╔████╔██║██████╔╝█████╗ ██████╔╝
45
+ ██╔══╝ ██║╚██╔╝██║██╔══██╗██╔══╝ ██╔══██╗
46
+ ███████╗██║ ╚═╝ ██║██████╔╝███████╗██║ ██║
47
+ ╚══════╝╚═╝ ╚═╝╚═════╝ ╚══════╝╚═╝ ╚═╝
48
+ </pre>
49
+
50
+ **Open source, lightweight headless browser for AI agents.**
51
+
52
+ [![PyPI](https://img.shields.io/pypi/v/ember-browser)](https://pypi.org/project/ember-browser/)
53
+ [![Python](https://img.shields.io/pypi/pyversions/ember-browser)](https://pypi.org/project/ember-browser/)
54
+ [![License: AGPL v3](https://img.shields.io/badge/license-AGPL--3.0-blue)](LICENSE)
55
+
56
+ ```bash
57
+ pip install ember-browser
58
+ ```
59
+
60
+ *No Docker. No API key to start.*
61
+
62
+ </div>
63
+
64
+ ---
65
+
66
+ ## Why ember
67
+
68
+ Most web tools for agents ship with Chromium (641 MB) or require Docker just to get started. We needed something an agent could use on a VPS, a laptop, or a Raspberry Pi without thinking about it.
69
+
70
+ ember runs at ~17 MB idle. It decides whether a page needs a browser — you just pass it a URL.
71
+
72
+ | | ember | Crawl4AI |
73
+ |---------------------|--------------------|--------------------|
74
+ | Import footprint | ~54 MB | 171.8 MB |
75
+ | Browser binary | 20 MB (Lightpanda) | 641 MB (Chromium) |
76
+ | Scrape success rate | ~85% (trafilatura) / ~95%+ (+ Lightpanda) | 90% |
77
+ | Docker required | No | No |
78
+ | API key required | No | No |
79
+
80
+ ---
81
+
82
+ ## Quick start
83
+
84
+ ```bash
85
+ pip install ember-browser
86
+
87
+ ember # start the interactive session
88
+ ember url https://example.com # or run a one-shot command
89
+ ember serve # start the REST API
90
+ ```
91
+
92
+ ---
93
+
94
+ ## CLI
95
+
96
+ ### Interactive session
97
+
98
+ `ember` with no arguments opens a persistent session. Commands and a save guide are shown on startup — no need to type `help` first.
99
+
100
+ ```
101
+ ███████╗███╗ ███╗██████╗ ███████╗██████╗
102
+ ...
103
+ ╚══════╝╚═╝ ╚═╝╚═════╝ ╚══════╝╚═╝ ╚═╝
104
+
105
+ v0.1.0 lightweight headless browser for AI agents
106
+
107
+ url <url> scrape a page to markdown
108
+ search <query> web search
109
+ crawl <url> crawl a whole website
110
+ map <url> discover all URLs on a site
111
+ interact <url> control a browser with natural language
112
+ extract <url> pull structured data with an LLM
113
+ batch <urls.txt> scrape many URLs concurrently
114
+
115
+ ─── saving results ───────────────────────────────────────────
116
+ one result url example.com -o page.md
117
+ everything output ./research/ then all results auto-save
118
+ last result save page.md after any command
119
+
120
+ ember › url andausman.com
121
+ ember › save page.md
122
+
123
+ ember › output ./research/ # auto-save everything from here
124
+ ember/research › search "python asyncio" -n 10
125
+ ember/research › crawl docs.example.com
126
+ ember/research › output clear # stop auto-saving
127
+ ember › quit
128
+ ```
129
+
130
+ ### One-shot commands
131
+
132
+ Every command works standalone too:
133
+
134
+ ```bash
135
+ ember url https://example.com # scrape a page
136
+ ember search "AI agents python" -n 10 # web search
137
+ ember crawl https://docs.example.com --max-pages 20 # crawl a site
138
+ ember map https://example.com # discover all URLs
139
+ ember interact https://amazon.com \
140
+ --prompt 'find a mechanical keyboard under $100'
141
+ ember extract https://example.com/pricing \
142
+ --prompt "list all plans and prices as JSON"
143
+ ```
144
+
145
+ ### Saving results
146
+
147
+ All commands accept `-o` to save that run:
148
+
149
+ ```bash
150
+ ember url https://example.com -o page.md
151
+ ember search "python" -o results.json
152
+ ember crawl https://docs.example.com -o ./pages/ # one .md per page
153
+ ember map https://example.com -o urls.txt
154
+ ember extract https://example.com -o data.json
155
+ ```
156
+
157
+ Set a default save directory so you never need `-o`:
158
+
159
+ ```bash
160
+ ember config --save-dir ./research/ # persists across sessions
161
+ ember config # show current settings
162
+ ember config --save-dir "" # clear it
163
+ ```
164
+
165
+ Or use an environment variable for the current shell:
166
+
167
+ ```bash
168
+ EMBER_SAVE_DIR=./out ember url https://example.com
169
+ ```
170
+
171
+ In a session, there are three ways to save:
172
+
173
+ ```
174
+ ember › url example.com -o page.md # save just this run
175
+ ember › save page.md # save the last result
176
+ ember › output ./research/ # auto-save all results from now on
177
+ ```
178
+
179
+ ### Async batch scraping
180
+
181
+ ```bash
182
+ # urls.txt — one URL per line, # = comment
183
+ ember batch urls.txt # 5 concurrent by default
184
+ ember batch urls.txt -c 20 -o ./pages/ # 20 parallel, save to dir
185
+ ```
186
+
187
+ ---
188
+
189
+ ## Python API
190
+
191
+ ```python
192
+ from emb.scrape import scrape_url, scrape_markdown
193
+ from emb.search import search
194
+ from emb.crawl import crawl
195
+ from emb.map import map_url
196
+
197
+ # Scrape a page → ScrapeResult
198
+ result = scrape_url("https://example.com")
199
+ print(result.markdown) # full page content as markdown
200
+ print(result.title) # page title
201
+ print(result.success) # True / False
202
+
203
+ # Just the markdown text
204
+ md = scrape_markdown("https://example.com")
205
+
206
+ # Crawl a site
207
+ result = crawl("https://docs.example.com", max_pages=20, max_depth=3)
208
+ for page in result.pages:
209
+     print(page.url, len(page.markdown))
210
+
211
+ # Discover URLs
212
+ result = map_url("https://example.com", max_links=100)
213
+ print(result.links) # list[str]
214
+
215
+ # Search the web
216
+ results = search("python asyncio tutorial", limit=5)
217
+ for r in results:
218
+     print(r.title, r.url)
219
+
220
+ # Browser interaction with natural language
221
+ from emb.interact import interact
222
+
223
+ result = interact("https://example.com", prompt="click the login button")
224
+ print(result.content) # what the agent did / saw
225
+
226
+ # LLM-powered structured extraction
227
+ from emb.agent import extract
228
+
229
+ data = extract("https://example.com/pricing", prompt="list all plans and prices")
230
+ print(data) # dict
231
+ ```
232
+
233
+ ### Async
234
+
235
+ ```python
236
+ import asyncio
237
+ from emb.scrape import scrape_url_async
238
+
239
+ async def main():
240
+     results = await asyncio.gather(
241
+         scrape_url_async("https://example.com"),
242
+         scrape_url_async("https://httpbin.org/get"),
243
+     )
244
+     for r in results:
245
+         print(r.url, r.success)
246
+
247
+ asyncio.run(main())
248
+ ```
249
+
250
+ ---
251
+
252
+ ## REST API
253
+
254
+ ```bash
255
+ ember serve # http://127.0.0.1:51251
256
+ ember serve --port 8080 # custom port
257
+
258
+ EMBER_API_KEY=your-secret ember serve # require auth
259
+ ```
260
+
261
+ ```bash
262
+ curl -X POST http://localhost:51251/scrape \
263
+ -H "Content-Type: application/json" \
264
+ -H "X-API-Key: your-secret" \
265
+ -d '{"url": "https://example.com"}'
266
+
267
+ curl -X POST http://localhost:51251/search \
268
+ -H "Content-Type: application/json" \
269
+ -d '{"query": "AI agents", "limit": 5}'
270
+
271
+ curl -X POST http://localhost:51251/crawl \
272
+ -H "Content-Type: application/json" \
273
+ -d '{"url": "https://docs.example.com", "max_pages": 10}'
274
+ ```
275
+
276
+ Endpoints: `/scrape` `/search` `/crawl` `/map` `/interact` `/extract` `/agent` `/health`
277
+
278
+ ---
279
+
280
+ ## MCP
281
+
282
+ ```json
283
+ {
284
+   "mcpServers": {
285
+     "ember": {
286
+       "command": "ember",
287
+       "args": ["mcp"]
288
+     }
289
+   }
290
+ }
291
+ ```
292
+
293
+ Works with Claude Code, Cursor, and any MCP-compatible host.
294
+
295
+ Available tools: `scrape`, `search_web`, `crawl_site`, `map_site`, `batch_scrape`, `interact_page`, `extract_data`.
296
+
297
+ ---
298
+
299
+ ## How it works
300
+
301
+ Not every page needs a browser. ember knows the difference.
302
+
303
+ **Tier 1 — trafilatura** handles ~90% of the web: blogs, news, documentation, Wikipedia. Pure HTTP, no browser process, no memory overhead.
304
+
305
+ **Tier 2 — Lightpanda** handles JavaScript-heavy pages, SPAs, and interactive content. It's a real browser engine written in Zig, built for machines rather than humans — 20 MB total. ember downloads and caches it automatically on first use, and only falls back to it when tier 1 produces thin content.
306
+
307
+ Most requests never reach the browser.
308
+
309
+ ### Memory footprint
310
+
311
+ | State | RAM |
312
+ |------------------------|---------|
313
+ | Idle | ~17 MB |
314
+ | Scraping a static page | ~20 MB |
315
+ | Running the browser | ~140 MB |
316
+
317
+ Firecrawl needs 4–8 GB in Docker. Crawl4AI imports at 171 MB before scraping anything. ember fits where your agent already runs.
318
+
319
+ ---
320
+
321
+ ## Environment variables
322
+
323
+ | Variable | Default | Description |
324
+ |---------------------------|--------------------------------|-------------|
325
+ | `EMBER_SAVE_DIR` | _(none)_ | Default directory for saved results. Overrides `ember config --save-dir` for the current shell. |
326
+ | `EMBER_API_KEY` | _(none)_ | Enables API key auth on the REST server (`X-API-Key` header). |
327
+ | `EMBER_PORT` | `51251` | Default port for `ember serve`. Overridden by `--port` flag. |
328
+ | `EMBER_INTERACT_PROVIDER` | `openai` | LLM provider for `interact` (`openai`, `anthropic`, `ollama`, etc.). |
329
+ | `EMBER_LLM_API_KEY` | _(none)_ | API key for LLM-powered extraction. |
330
+ | `EMBER_LLM_BASE_URL` | `https://api.openai.com/v1` | LLM API endpoint for extraction. |
331
+ | `EMBER_LLM_MODEL` | `gpt-4o-mini` | Model used by `extract`. |
332
+ | `EMBER_LIGHTPANDA_PATH` | _(auto)_ | Path to a custom Lightpanda binary. Skips auto-download if set. |
333
+
334
+ ---
335
+
336
+ ## License
337
+
338
+ [AGPL-3.0](LICENSE) — open source forever.
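
The two-tier behaviour described under "How it works" (plain HTTP extraction first, browser only when the result is thin) can be sketched roughly as follows. This is an illustrative sketch, not ember's implementation: `scrape_tiered`, `TieredResult`, the injected callables, and the `MIN_CHARS` threshold are all hypothetical names, and the README does not say how "thin content" is actually measured.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical threshold: the README does not state how "thin" is defined.
MIN_CHARS = 200


@dataclass
class TieredResult:
    text: str
    tier: int  # 1 = plain HTTP extraction, 2 = browser rendering


def scrape_tiered(
    url: str,
    extract_static: Callable[[str], str],
    render_with_browser: Callable[[str], str],
    min_chars: int = MIN_CHARS,
) -> TieredResult:
    """Try the cheap extractor first; render in a browser only if the result is thin."""
    text = extract_static(url) or ""
    if len(text.strip()) >= min_chars:
        return TieredResult(text=text, tier=1)
    # Tier 1 produced too little content, so pay the cost of the browser.
    return TieredResult(text=render_with_browser(url), tier=2)
```

Passing the two extractors in as callables keeps the sketch independent of any real library; in practice they would wrap a static extractor and a headless browser respectively.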
@@ -0,0 +1,300 @@
1
+ <div align="center">
2
+
3
+ <pre>
4
+ ███████╗███╗ ███╗██████╗ ███████╗██████╗
5
+ ██╔════╝████╗ ████║██╔══██╗██╔════╝██╔══██╗
6
+ █████╗ ██╔████╔██║██████╔╝█████╗ ██████╔╝
7
+ ██╔══╝ ██║╚██╔╝██║██╔══██╗██╔══╝ ██╔══██╗
8
+ ███████╗██║ ╚═╝ ██║██████╔╝███████╗██║ ██║
9
+ ╚══════╝╚═╝ ╚═╝╚═════╝ ╚══════╝╚═╝ ╚═╝
10
+ </pre>
11
+
12
+ **Open source, lightweight headless browser for AI agents.**
13
+
14
+ [![PyPI](https://img.shields.io/pypi/v/ember-browser)](https://pypi.org/project/ember-browser/)
15
+ [![Python](https://img.shields.io/pypi/pyversions/ember-browser)](https://pypi.org/project/ember-browser/)
16
+ [![License: AGPL v3](https://img.shields.io/badge/license-AGPL--3.0-blue)](LICENSE)
17
+
18
+ ```bash
19
+ pip install ember-browser
20
+ ```
21
+
22
+ *No Docker. No API key to start.*
23
+
24
+ </div>
25
+
26
+ ---
27
+
28
+ ## Why ember
29
+
30
+ Most web tools for agents ship with Chromium (641 MB) or require Docker just to get started. We needed something an agent could use on a VPS, a laptop, or a Raspberry Pi without thinking about it.
31
+
32
+ ember runs at ~17 MB idle. It decides whether a page needs a browser — you just pass it a URL.
33
+
34
+ | | ember | Crawl4AI |
35
+ |---------------------|--------------------|--------------------|
36
+ | Import footprint | ~54 MB | 171.8 MB |
37
+ | Browser binary | 20 MB (Lightpanda) | 641 MB (Chromium) |
38
+ | Scrape success rate | ~85% (trafilatura) / ~95%+ (+ Lightpanda) | 90% |
39
+ | Docker required | No | No |
40
+ | API key required | No | No |
41
+
42
+ ---
43
+
44
+ ## Quick start
45
+
46
+ ```bash
47
+ pip install ember-browser
48
+
49
+ ember # start the interactive session
50
+ ember url https://example.com # or run a one-shot command
51
+ ember serve # start the REST API
52
+ ```
53
+
54
+ ---
55
+
56
+ ## CLI
57
+
58
+ ### Interactive session
59
+
60
+ `ember` with no arguments opens a persistent session. Commands and a save guide are shown on startup — no need to type `help` first.
61
+
62
+ ```
63
+ ███████╗███╗ ███╗██████╗ ███████╗██████╗
64
+ ...
65
+ ╚══════╝╚═╝ ╚═╝╚═════╝ ╚══════╝╚═╝ ╚═╝
66
+
67
+ v0.1.0 lightweight headless browser for AI agents
68
+
69
+ url <url> scrape a page to markdown
70
+ search <query> web search
71
+ crawl <url> crawl a whole website
72
+ map <url> discover all URLs on a site
73
+ interact <url> control a browser with natural language
74
+ extract <url> pull structured data with an LLM
75
+ batch <urls.txt> scrape many URLs concurrently
76
+
77
+ ─── saving results ───────────────────────────────────────────
78
+ one result url example.com -o page.md
79
+ everything output ./research/ then all results auto-save
80
+ last result save page.md after any command
81
+
82
+ ember › url andausman.com
83
+ ember › save page.md
84
+
85
+ ember › output ./research/ # auto-save everything from here
86
+ ember/research › search "python asyncio" -n 10
87
+ ember/research › crawl docs.example.com
88
+ ember/research › output clear # stop auto-saving
89
+ ember › quit
90
+ ```
91
+
92
+ ### One-shot commands
93
+
94
+ Every command works standalone too:
95
+
96
+ ```bash
97
+ ember url https://example.com # scrape a page
98
+ ember search "AI agents python" -n 10 # web search
99
+ ember crawl https://docs.example.com --max-pages 20 # crawl a site
100
+ ember map https://example.com # discover all URLs
101
+ ember interact https://amazon.com \
102
+ --prompt 'find a mechanical keyboard under $100'
103
+ ember extract https://example.com/pricing \
104
+ --prompt "list all plans and prices as JSON"
105
+ ```
106
+
107
+ ### Saving results
108
+
109
+ All commands accept `-o` to save that run:
110
+
111
+ ```bash
112
+ ember url https://example.com -o page.md
113
+ ember search "python" -o results.json
114
+ ember crawl https://docs.example.com -o ./pages/ # one .md per page
115
+ ember map https://example.com -o urls.txt
116
+ ember extract https://example.com -o data.json
117
+ ```
118
+
119
+ Set a default save directory so you never need `-o`:
120
+
121
+ ```bash
122
+ ember config --save-dir ./research/ # persists across sessions
123
+ ember config # show current settings
124
+ ember config --save-dir "" # clear it
125
+ ```
126
+
127
+ Or use an environment variable for the current shell:
128
+
129
+ ```bash
130
+ EMBER_SAVE_DIR=./out ember url https://example.com
131
+ ```
132
+
133
+ In a session, there are three ways to save:
134
+
135
+ ```
136
+ ember › url example.com -o page.md # save just this run
137
+ ember › save page.md # save the last result
138
+ ember › output ./research/ # auto-save all results from now on
139
+ ```
140
+
141
+ ### Async batch scraping
142
+
143
+ ```bash
144
+ # urls.txt — one URL per line, # = comment
145
+ ember batch urls.txt # 5 concurrent by default
146
+ ember batch urls.txt -c 20 -o ./pages/ # 20 parallel, save to dir
147
+ ```
148
+
149
+ ---
150
+
151
+ ## Python API
152
+
153
+ ```python
154
+ from emb.scrape import scrape_url, scrape_markdown
155
+ from emb.search import search
156
+ from emb.crawl import crawl
157
+ from emb.map import map_url
158
+
159
+ # Scrape a page → ScrapeResult
160
+ result = scrape_url("https://example.com")
161
+ print(result.markdown) # full page content as markdown
162
+ print(result.title) # page title
163
+ print(result.success) # True / False
164
+
165
+ # Just the markdown text
166
+ md = scrape_markdown("https://example.com")
167
+
168
+ # Crawl a site
169
+ result = crawl("https://docs.example.com", max_pages=20, max_depth=3)
170
+ for page in result.pages:
171
+     print(page.url, len(page.markdown))
172
+
173
+ # Discover URLs
174
+ result = map_url("https://example.com", max_links=100)
175
+ print(result.links) # list[str]
176
+
177
+ # Search the web
178
+ results = search("python asyncio tutorial", limit=5)
179
+ for r in results:
180
+     print(r.title, r.url)
181
+
182
+ # Browser interaction with natural language
183
+ from emb.interact import interact
184
+
185
+ result = interact("https://example.com", prompt="click the login button")
186
+ print(result.content) # what the agent did / saw
187
+
188
+ # LLM-powered structured extraction
189
+ from emb.agent import extract
190
+
191
+ data = extract("https://example.com/pricing", prompt="list all plans and prices")
192
+ print(data) # dict
193
+ ```
194
+
195
+ ### Async
196
+
197
+ ```python
198
+ import asyncio
199
+ from emb.scrape import scrape_url_async
200
+
201
+ async def main():
202
+     results = await asyncio.gather(
203
+         scrape_url_async("https://example.com"),
204
+         scrape_url_async("https://httpbin.org/get"),
205
+     )
206
+     for r in results:
207
+         print(r.url, r.success)
208
+
209
+ asyncio.run(main())
210
+ ```
211
+
212
+ ---
213
+
214
+ ## REST API
215
+
216
+ ```bash
217
+ ember serve # http://127.0.0.1:51251
218
+ ember serve --port 8080 # custom port
219
+
220
+ EMBER_API_KEY=your-secret ember serve # require auth
221
+ ```
222
+
223
+ ```bash
224
+ curl -X POST http://localhost:51251/scrape \
225
+ -H "Content-Type: application/json" \
226
+ -H "X-API-Key: your-secret" \
227
+ -d '{"url": "https://example.com"}'
228
+
229
+ curl -X POST http://localhost:51251/search \
230
+ -H "Content-Type: application/json" \
231
+ -d '{"query": "AI agents", "limit": 5}'
232
+
233
+ curl -X POST http://localhost:51251/crawl \
234
+ -H "Content-Type: application/json" \
235
+ -d '{"url": "https://docs.example.com", "max_pages": 10}'
236
+ ```
237
+
238
+ Endpoints: `/scrape` `/search` `/crawl` `/map` `/interact` `/extract` `/agent` `/health`
239
+
240
+ ---
241
+
242
+ ## MCP
243
+
244
+ ```json
245
+ {
246
+   "mcpServers": {
247
+     "ember": {
248
+       "command": "ember",
249
+       "args": ["mcp"]
250
+     }
251
+   }
252
+ }
253
+ ```
254
+
255
+ Works with Claude Code, Cursor, and any MCP-compatible host.
256
+
257
+ Available tools: `scrape`, `search_web`, `crawl_site`, `map_site`, `batch_scrape`, `interact_page`, `extract_data`.
258
+
259
+ ---
260
+
261
+ ## How it works
262
+
263
+ Not every page needs a browser. ember knows the difference.
264
+
265
+ **Tier 1 — trafilatura** handles ~90% of the web: blogs, news, documentation, Wikipedia. Pure HTTP, no browser process, no memory overhead.
266
+
267
+ **Tier 2 — Lightpanda** handles JavaScript-heavy pages, SPAs, and interactive content. It's a real browser engine written in Zig, built for machines rather than humans — 20 MB total. ember downloads and caches it automatically on first use, and only falls back to it when tier 1 produces thin content.
268
+
269
+ Most requests never reach the browser.
270
+
271
+ ### Memory footprint
272
+
273
+ | State | RAM |
274
+ |------------------------|---------|
275
+ | Idle | ~17 MB |
276
+ | Scraping a static page | ~20 MB |
277
+ | Running the browser | ~140 MB |
278
+
279
+ Firecrawl needs 4–8 GB in Docker. Crawl4AI imports at 171 MB before scraping anything. ember fits where your agent already runs.
280
+
281
+ ---
282
+
283
+ ## Environment variables
284
+
285
+ | Variable | Default | Description |
286
+ |---------------------------|--------------------------------|-------------|
287
+ | `EMBER_SAVE_DIR` | _(none)_ | Default directory for saved results. Overrides `ember config --save-dir` for the current shell. |
288
+ | `EMBER_API_KEY` | _(none)_ | Enables API key auth on the REST server (`X-API-Key` header). |
289
+ | `EMBER_PORT` | `51251` | Default port for `ember serve`. Overridden by `--port` flag. |
290
+ | `EMBER_INTERACT_PROVIDER` | `openai` | LLM provider for `interact` (`openai`, `anthropic`, `ollama`, etc.). |
291
+ | `EMBER_LLM_API_KEY` | _(none)_ | API key for LLM-powered extraction. |
292
+ | `EMBER_LLM_BASE_URL` | `https://api.openai.com/v1` | LLM API endpoint for extraction. |
293
+ | `EMBER_LLM_MODEL` | `gpt-4o-mini` | Model used by `extract`. |
294
+ | `EMBER_LIGHTPANDA_PATH` | _(auto)_ | Path to a custom Lightpanda binary. Skips auto-download if set. |
295
+
296
+ ---
297
+
298
+ ## License
299
+
300
+ [AGPL-3.0](LICENSE) — open source forever.
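
The environment variables table states that `EMBER_PORT` defaults to `51251` and is overridden by the `--port` flag. A minimal sketch of that precedence (flag, then environment variable, then built-in default) might look like this. The function name `resolve_port` and its signature are assumptions made for illustration; the package's real configuration code is not shown in this diff.

```python
import os
from typing import Mapping, Optional

DEFAULT_PORT = 51251  # default listed for EMBER_PORT in the table


def resolve_port(cli_port: Optional[int] = None,
                 env: Mapping[str, str] = os.environ) -> int:
    """Return the port to bind: --port flag, then EMBER_PORT, then the default."""
    if cli_port is not None:
        return cli_port
    raw = env.get("EMBER_PORT")
    if raw:
        return int(raw)  # raises ValueError on a non-numeric value
    return DEFAULT_PORT
```

Taking the environment as a parameter makes the precedence easy to check without touching the real process environment.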
@@ -0,0 +1,38 @@
1
+ """ember — open source, lightweight headless browser for AI agents."""
2
+
3
+ from __future__ import annotations
4
+
5
+ __version__ = "0.1.0"
6
+
7
+ # Lazily re-export the most-used public functions so `from emb import scrape_url`
8
+ # works without loading heavy dependencies at `import emb` time.
9
+ #
10
+ # Names that clash with a same-named submodule (search, crawl) can't be re-exported
11
+ # this way — Python returns the submodule before __getattr__ fires. Use the submodule
12
+ # form for those: `from emb.search import search`, `from emb.crawl import crawl`.
13
+
14
+ __all__ = [
15
+     "__version__",
16
+     "scrape_url",
17
+     "scrape_url_async",
18
+     "scrape_markdown",
19
+     "scrape_markdown_async",
20
+     "map_url",
21
+ ]
22
+
23
+ _LAZY: dict[str, str] = {
24
+     "scrape_url": "emb.scrape",
25
+     "scrape_url_async": "emb.scrape",
26
+     "scrape_markdown": "emb.scrape",
27
+     "scrape_markdown_async": "emb.scrape",
28
+     "map_url": "emb.map",
29
+ }
30
+
31
+
32
+ def __getattr__(name: str):
33
+     module_path = _LAZY.get(name)
34
+     if module_path is None:
35
+         raise AttributeError(f"module 'emb' has no attribute {name!r}")
36
+     import importlib
37
+     module = importlib.import_module(module_path)
38
+     return getattr(module, name)
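
The lazy re-export above relies on the module-level `__getattr__` hook from PEP 562. The following self-contained sketch shows the same pattern on a throwaway module built at runtime. The name `demo_pkg` and the mapping to the standard-library `json` module are illustrative stand-ins, and the `setattr` caching step is an addition that the package code above does not perform.

```python
import sys
import types

# Stand-in package built at runtime so the example is self-contained.
# "demo_pkg" and the mapping to the stdlib "json" module are illustrative only.
pkg = types.ModuleType("demo_pkg")
_LAZY = {"dumps": "json", "loads": "json"}


def _lazy_getattr(name: str):
    module_path = _LAZY.get(name)
    if module_path is None:
        raise AttributeError(f"module 'demo_pkg' has no attribute {name!r}")
    import importlib
    module = importlib.import_module(module_path)
    value = getattr(module, name)
    setattr(pkg, name, value)  # cache so later lookups skip __getattr__
    return value


# PEP 562 hook: consulted only when normal attribute lookup on the module fails.
pkg.__getattr__ = _lazy_getattr
sys.modules["demo_pkg"] = pkg

from demo_pkg import dumps

print(dumps({"ok": True}))  # prints {"ok": true}
```

Nothing from the target module is bound on `demo_pkg` until a lazy name is first requested, which is the property the package uses to keep `import emb` cheap.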