mdrip 0.1.3 → 0.1.5
This diff compares two publicly released versions of the package as they appear in the registry. It is provided for informational purposes only.
- package/README.md +118 -129
- package/dist/index.js +1 -1
- package/package.json +2 -1
package/README.md
CHANGED
@@ -1,216 +1,205 @@
 # mdrip
 
-Fetch markdown snapshots of web
+Fetch clean markdown snapshots of any web page — optimized for AI agents, RAG pipelines, and context-aware workflows.
 
-
+Reduces token overhead by ~90% compared to raw HTML while preserving the content structure LLMs need.
+
+## Why
 
-
+AI agents and LLMs work better with markdown than HTML. Feeding raw HTML into a context window wastes tokens on tags, scripts, styles, and boilerplate. mdrip solves this by fetching any URL and returning clean, structured markdown.
 
--
--
+- **~90% fewer tokens** than raw HTML
+- **Automatic HTML-to-markdown fallback** when native markdown isn't available
+- **Works everywhere** — CLI, Node.js, Cloudflare Workers, or via remote MCP
+- **Token-aware** — reports estimated token counts so you can manage context budgets
 
-
+Sites that support [Cloudflare's Markdown for Agents](https://developers.cloudflare.com/fundamentals/reference/markdown-for-agents/) return markdown natively at the edge. For all other sites, mdrip's built-in converter handles headings, links, lists, code blocks, tables, blockquotes, and more.
 
-
+## Installation
 
 ```bash
-
-npx skills add charl-kruger/mdrip
+npm install -g mdrip
 ```
 
-
+Or use directly with `npx`:
+
+```bash
+npx mdrip <url>
+```
 
-
-- cleaner structure
-- lower token overhead
-- easier chunking and context management
+## CLI Usage
 
-
+### Fetch pages
 
-
-
+```bash
+# Fetch one page
+mdrip https://example.com/docs/getting-started
 
-
+# Fetch multiple pages
+mdrip https://example.com/docs https://example.com/api
 
-
-
-- Cloudflare converts HTML to markdown in real time (for enabled zones)
-- response includes `x-markdown-tokens` for token-size awareness
+# Custom timeout (ms)
+mdrip https://example.com --timeout 45000
 
-
-
-- less token waste in context windows
-- predictable markdown snapshots you can store and reuse in your repo
+# Strict mode — only accept native markdown, no HTML fallback
+mdrip https://example.com --no-html-fallback
 
-
-
-
+# Raw mode — print markdown to stdout, no file writes
+mdrip https://example.com --raw
+```
 
-
+### List fetched pages
 
 ```bash
-
+mdrip list
+mdrip list --json
 ```
 
-
+### Remove pages
 
 ```bash
-
+mdrip remove https://example.com/docs/getting-started
 ```
 
-
+### Clean snapshots
 
 ```bash
-
-
+# Remove all
+mdrip clean
 
-
+# Remove only one domain
+mdrip clean --domain example.com
+```
 
-###
+### Raw mode for agent runtimes
 
-
-import { fetchToStore, listStoredPages } from "mdrip/node";
+`--raw` prints markdown to stdout and skips all file writes and prompts. Useful for piping content directly into agent loops.
 
-
-
-
+```bash
+mdrip https://example.com --raw | your-agent-cli
+```
 
-
-  throw new Error(result.error || "Failed to fetch page");
-}
+## Programmatic API
 
-
-
+```bash
+npm install mdrip
 ```
 
-###
+### Workers / Edge / In-memory
 
 ```ts
 import { fetchMarkdown } from "mdrip";
 
-const page = await fetchMarkdown(
-  "https://blog.cloudflare.com/markdown-for-agents/",
-);
+const page = await fetchMarkdown("https://example.com/docs");
 
-console.log(page.
-console.log(page.
+console.log(page.markdown); // clean markdown content
+console.log(page.markdownTokens); // estimated token count
+console.log(page.source); // "cloudflare-markdown" or "html-fallback"
 ```
 
-
-- `mdrip` (Workers-safe): `fetchMarkdown(url, options)`, `fetchRawMarkdown(url, options)`
-- `mdrip/node` (filesystem features): `fetchToStore(url, options)`, `fetchManyToStore(urls, options)`, `listStoredPages(cwd?)`
+### Node.js (fetch and store to disk)
 
-
-
-### Fetch pages
-
-```bash
-# Fetch one page
-mdrip https://developers.cloudflare.com/fundamentals/reference/markdown-for-agents/
-
-# Fetch multiple pages
-mdrip https://blog.cloudflare.com/markdown-for-agents/ https://developers.cloudflare.com/
+```ts
+import { fetchToStore, listStoredPages } from "mdrip/node";
 
-
-
+const result = await fetchToStore("https://example.com/docs", {
+  cwd: process.cwd(),
+});
 
-
-
+if (result.success) {
+  console.log(`Saved to ${result.path}`);
+}
 
-
-mdrip https://blog.cloudflare.com/markdown-for-agents/ --raw
+const pages = await listStoredPages(process.cwd());
 ```
 
-###
+### Available exports
 
-
-
+| Import | Environment | Functions |
+|--------|-------------|-----------|
+| `mdrip` | Workers, edge, browser | `fetchMarkdown()`, `fetchRawMarkdown()` |
+| `mdrip/node` | Node.js | `fetchToStore()`, `fetchManyToStore()`, `listStoredPages()` |
 
-
+## Remote MCP Server
 
-
-# stream markdown directly to another process
-mdrip https://blog.cloudflare.com/markdown-for-agents/ --raw
-```
+mdrip is available as a remote MCP server at **`mdrip.createmcp.dev`** — no install required. Any MCP-compatible client can connect and use the `fetch_markdown` and `batch_fetch_markdown` tools.
 
-###
+### Claude Desktop
 
-
-
-
+Add to `claude_desktop_config.json`:
+
+```json
+{
+  "mcpServers": {
+    "mdrip": {
+      "command": "npx",
+      "args": ["mcp-remote", "https://mdrip.createmcp.dev/mcp"]
+    }
+  }
+}
 ```
 
-###
+### Claude Code
 
 ```bash
-mdrip
+claude mcp add mdrip-remote --transport sse https://mdrip.createmcp.dev/sse
 ```
 
-###
+### Cloudflare AI Playground
 
-
-# Remove all
-mdrip clean
-
-# Remove only one domain
-mdrip clean --domain developers.cloudflare.com
-```
+Enter `mdrip.createmcp.dev/sse` at [playground.ai.cloudflare.com](https://playground.ai.cloudflare.com/).
 
 ## File modifications
 
 On first run, mdrip can optionally update:
-- `.gitignore`
-- `tsconfig.json`
-- `AGENTS.md`
+- `.gitignore` — adds `mdrip/`
+- `tsconfig.json` — excludes `mdrip/`
+- `AGENTS.md` — adds a section pointing agents to your snapshots
 
-Choice is stored in `mdrip/settings.json`.
+Choice is stored in `mdrip/settings.json`. Use `--modify` or `--modify=false` to skip the prompt.
 
-
+`--raw` mode bypasses this entirely.
 
-
-# allow updates
-mdrip https://example.com --modify
+## Output structure
 
-# deny updates
-mdrip https://example.com --modify=false
 ```
-
-`--raw` mode bypasses this entire flow and never writes settings or snapshots.
-
-## Output
-
-```text
 mdrip/
 ├── settings.json
 ├── sources.json
 └── pages/
-    └──
-        └──
-            └──
-                └──
-                    └── index.md
+    └── example.com/
+        └── docs/
+            └── getting-started/
+                └── index.md
 ```
 
-##
+## Benchmark
 
-
-- The target site must return markdown for `Accept: text/markdown` (Cloudflare Markdown for Agents enabled).
-- If a page does not return `text/markdown`, mdrip can convert `text/html` into markdown fallback unless `--no-html-fallback` is used.
+Measured across popular pages (values vary as pages change):
 
-
+| Page | Mode | Chars saved | Tokens saved |
+|------|------|------------:|-------------:|
+| blog.cloudflare.com/markdown-for-agents | cloudflare-markdown | 94.9% | 94.9% |
+| developers.cloudflare.com/.../markdown-for-agents | cloudflare-markdown | 95.7% | 95.7% |
+| en.wikipedia.org/wiki/Markdown | html-fallback | 72.7% | 72.7% |
+| github.com/cloudflare/skills | html-fallback | 96.3% | 96.3% |
+| **Average** | | **89.9%** | **89.9%** |
 
 ```bash
-
-
+pnpm build && pnpm benchmark
+```
+
+## AI Skills
 
-
-
+This repo includes an AI-consumable skills catalog in `skills/`, following the [agentskills](https://agentskills.io) format.
+
+```bash
+npx skills add charl-kruger/mdrip
 ```
 
-
-
--
-- `pnpm build`
+## Requirements
+
+- Node.js 18+
 
 ## Author
 
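The README changes pair the CLI example `mdrip https://example.com/docs/getting-started` with an output tree that ends in `mdrip/pages/example.com/docs/getting-started/index.md`. The sketch below shows one way that URL-to-path mapping could work. `snapshotPath` is a hypothetical helper written for illustration, not a function exported by mdrip, and its handling of the root path, trailing slashes, and query strings is an assumption.

```typescript
// Hypothetical helper (not part of mdrip's API): derive the snapshot path
// suggested by the README's output tree from a page URL.
function snapshotPath(url: string, root: string = "mdrip/pages"): string {
  const parsed = new URL(url);
  // Split the pathname and drop empty segments from leading/trailing slashes.
  const segments = parsed.pathname.split("/").filter((s) => s.length > 0);
  // Assumption: query string and fragment are ignored; every page maps to index.md.
  return [root, parsed.hostname, ...segments, "index.md"].join("/");
}

console.log(snapshotPath("https://example.com/docs/getting-started"));
// prints "mdrip/pages/example.com/docs/getting-started/index.md"
```

Under these assumptions a root URL such as `https://example.com/` would map to `mdrip/pages/example.com/index.md`; the package's actual behavior for that case is not shown in this diff.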
package/dist/index.js
CHANGED
@@ -8,7 +8,7 @@ const program = new Command();
 program
     .name("mdrip")
     .description("Fetch markdown snapshots for URLs using Cloudflare Markdown for Agents")
-    .version("0.1.
+    .version("0.1.4")
     .option("--cwd <path>", "working directory (default: current directory)");
 program
     .argument("[urls...]", "URLs to fetch as markdown")
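In this release the CLI's hard-coded version string reads `0.1.4`, while `package.json` declares `0.1.5`. A small pre-publish check can catch that kind of drift. The sketch below is illustrative only: `extractCliVersion` and `versionsMatch` are hypothetical names, and the regex assumes the version is set through a single `.version("...")` call, as in the hunk for `package/dist/index.js`.

```typescript
// Hypothetical pre-publish check; not part of mdrip.
// Extracts the string passed to the first `.version("...")` call in CLI source.
function extractCliVersion(cliSource: string): string | null {
  const match = /\.version\(\s*["']([^"']+)["']\s*\)/.exec(cliSource);
  return match?.[1] ?? null;
}

// Compares the CLI version string with the "version" field of package.json text.
function versionsMatch(cliSource: string, packageJsonText: string): boolean {
  const pkg = JSON.parse(packageJsonText) as { version?: string };
  const cliVersion = extractCliVersion(cliSource);
  return cliVersion !== null && cliVersion === pkg.version;
}

// Values taken from the added lines of this diff.
const cliLine = 'program.name("mdrip").version("0.1.4")';
const pkgText = '{ "name": "mdrip", "version": "0.1.5" }';
console.log(versionsMatch(cliLine, pkgText)); // prints false
```

An alternative design is to read the version from `package.json` at runtime, which removes the second copy of the string altogether.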
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "mdrip",
-  "version": "0.1.
+  "version": "0.1.5",
   "description": "Fetch markdown snapshots of web pages using Cloudflare Markdown for Agents",
   "type": "module",
   "main": "./dist/web.js",
@@ -38,6 +38,7 @@
     "build": "tsc",
     "dev": "tsc --watch",
     "start": "node dist/index.js",
+    "benchmark": "node scripts/benchmark.mjs",
     "test": "vitest run",
     "test:watch": "vitest",
     "test:coverage": "vitest run --coverage",
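The new `benchmark` script points at `scripts/benchmark.mjs`, which is not included in this diff, so how it computes its numbers is unknown. As a rough illustration of the arithmetic behind the README's benchmark table, the sketch below computes a percent-saved figure and averages the four published "Tokens saved" values. `percentSaved` and `mean` are hypothetical helpers, and the 1000-to-100 token example is made up.

```typescript
// Hypothetical helpers illustrating the benchmark arithmetic; the real
// scripts/benchmark.mjs is not shown in this diff and may differ.
function percentSaved(before: number, after: number): number {
  if (before <= 0) throw new RangeError("before must be positive");
  return ((before - after) * 100) / before;
}

function mean(values: number[]): number {
  if (values.length === 0) throw new RangeError("values must not be empty");
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Made-up example: 1000 tokens of HTML reduced to 100 tokens of markdown.
console.log(percentSaved(1000, 100)); // prints 90

// "Tokens saved" values from the README table in this diff.
const tokensSaved = [94.9, 95.7, 72.7, 96.3];
console.log(mean(tokensSaved).toFixed(1)); // prints 89.9
```

The computed mean agrees with the 89.9% average reported in the table.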