mdrip 0.1.3 → 0.1.5 (npm)

This diff compares the published contents of two package versions as they appear in the public registry. It is provided for informational purposes only.

Files changed (3)

package/README.md CHANGED

@@ -1,216 +1,205 @@
 # mdrip
-Fetch markdown snapshots of web pages using Cloudflare's Markdown for Agents feature, so coding agents can consume clean structured content instead of HTML.
+Fetch clean markdown snapshots of any web page — optimized for AI agents, RAG pipelines, and context-aware workflows.
-## AI Skills
+Reduces token overhead by ~90% compared to raw HTML while preserving the content structure LLMs need.
+## Why
-This repo also includes an AI-consumable skills catalog in `skills/`, following the [agentskills](https://agentskills.io) format.
+AI agents and LLMs work better with markdown than HTML. Feeding raw HTML into a context window wastes tokens on tags, scripts, styles, and boilerplate. mdrip solves this by fetching any URL and returning clean, structured markdown.
-- Skill index: `skills/README.md`
-- mdrip skill: `skills/mdrip/SKILL.md`
+- **~90% fewer tokens** than raw HTML
+- **Automatic HTML-to-markdown fallback** when native markdown isn't available
+- **Works everywhere** — CLI, Node.js, Cloudflare Workers, or via remote MCP
+- **Token-aware** — reports estimated token counts so you can manage context budgets
-### Install skills from this repo
+Sites that support [Cloudflare's Markdown for Agents](https://developers.cloudflare.com/fundamentals/reference/markdown-for-agents/) return markdown natively at the edge. For all other sites, mdrip's built-in converter handles headings, links, lists, code blocks, tables, blockquotes, and more.
-If you use a Skills-compatible agent setup, you can add these skills directly:
+## Installation
 ```bash
-# install skills from this repo
-npx skills add charl-kruger/mdrip
+npm install -g mdrip
 ```
-## Why
+Or use directly with `npx`:
+```bash
+npx mdrip <url>
+```
-For agent workflows, markdown is often better than HTML:
-- cleaner structure
-- lower token overhead
-- easier chunking and context management
+## CLI Usage
-`mdrip` requests pages with `Accept: text/markdown`, stores the markdown locally, and tracks fetched pages in an index.
+### Fetch pages
-If a site does not return `text/markdown`, `mdrip` can automatically fall back to converting `text/html` into markdown.
-The fallback uses an in-project converter optimized for common documentation/blog content (headings, links, lists, code blocks, tables, blockquotes).
+```bash
+# Fetch one page
+mdrip https://example.com/docs/getting-started
-## Why Cloudflare Markdown for Agents matters
+# Fetch multiple pages
+mdrip https://example.com/docs https://example.com/api
-Cloudflare's blog and docs describe Markdown for Agents as content negotiation at the edge:
-- clients request `Accept: text/markdown`
-- Cloudflare converts HTML to markdown in real time (for enabled zones)
-- response includes `x-markdown-tokens` for token-size awareness
+# Custom timeout (ms)
+mdrip https://example.com --timeout 45000
-For AI workflows this is high-value:
-- better structure for LLM parsing than raw HTML
-- less token waste in context windows
-- predictable markdown snapshots you can store and reuse in your repo
+# Strict mode — only accept native markdown, no HTML fallback
+mdrip https://example.com --no-html-fallback
-References:
-- [Cloudflare blog: Markdown for Agents](https://blog.cloudflare.com/markdown-for-agents/)
-- [Cloudflare docs: Markdown for Agents](https://developers.cloudflare.com/fundamentals/reference/markdown-for-agents/)
+# Raw mode — print markdown to stdout, no file writes
+mdrip https://example.com --raw
+```
-## Installation
+### List fetched pages
 ```bash
-npm install -g mdrip
+mdrip list
+mdrip list --json
 ```
-Or use with `npx`:
+### Remove pages
 ```bash
-npx mdrip <url>
+mdrip remove https://example.com/docs/getting-started
 ```
-For programmatic usage in Node.js or Workers:
+### Clean snapshots
 ```bash
-npm install mdrip
-```
+# Remove all
+mdrip clean
-## Programmatic API
+# Remove only one domain
+mdrip clean --domain example.com
+```
-### Node.js (fetch and store)
+### Raw mode for agent runtimes
-```ts
-import { fetchToStore, listStoredPages } from "mdrip/node";
+`--raw` prints markdown to stdout and skips all file writes and prompts. Useful for piping content directly into agent loops.
-const result = await fetchToStore("https://developers.cloudflare.com/", {
-  cwd: process.cwd(),
-});
+```bash
+mdrip https://example.com --raw | your-agent-cli
+```
-if (!result.success) {
-  throw new Error(result.error || "Failed to fetch page");
-}
+## Programmatic API
-const pages = await listStoredPages(process.cwd());
-console.log(pages.map((p) => p.path));
+```bash
+npm install mdrip
 ```
-### Cloudflare Workers / Agent runtimes (raw in-memory markdown)
+### Workers / Edge / In-memory
 ```ts
 import { fetchMarkdown } from "mdrip";
-const page = await fetchMarkdown(
-  "https://blog.cloudflare.com/markdown-for-agents/",
-);
+const page = await fetchMarkdown("https://example.com/docs");
-console.log(page.markdownTokens);
-console.log(page.markdown);
+console.log(page.markdown);       // clean markdown content
+console.log(page.markdownTokens); // estimated token count
+console.log(page.source);         // "cloudflare-markdown" or "html-fallback"
 ```
-Available programmatic methods:
-- `mdrip` (Workers-safe): `fetchMarkdown(url, options)`, `fetchRawMarkdown(url, options)`
-- `mdrip/node` (filesystem features): `fetchToStore(url, options)`, `fetchManyToStore(urls, options)`, `listStoredPages(cwd?)`
+### Node.js (fetch and store to disk)
-## Usage
-### Fetch pages
-```bash
-# Fetch one page
-mdrip https://developers.cloudflare.com/fundamentals/reference/markdown-for-agents/
-# Fetch multiple pages
-mdrip https://blog.cloudflare.com/markdown-for-agents/ https://developers.cloudflare.com/
+```ts
+import { fetchToStore, listStoredPages } from "mdrip/node";
-# Optional timeout override (ms)
-mdrip https://example.com --timeout 45000
+const result = await fetchToStore("https://example.com/docs", {
+  cwd: process.cwd(),
+});
-# Disable HTML fallback (strict Cloudflare markdown only)
-mdrip https://example.com --no-html-fallback
+if (result.success) {
+  console.log(`Saved to ${result.path}`);
+}
-# Print raw page markdown to stdout (no files/settings changes, no prompts)
-mdrip https://blog.cloudflare.com/markdown-for-agents/ --raw
+const pages = await listStoredPages(process.cwd());
 ```
-### Raw mode for agents (OpenClaw, etc.)
+### Available exports
-`--raw` is designed for agent runtimes that only need in-memory content.
-It prints markdown to stdout and skips settings prompts and all file writes.
+| Import | Environment | Functions |
+|--------|-------------|-----------|
+| `mdrip` | Workers, edge, browser | `fetchMarkdown()`, `fetchRawMarkdown()` |
+| `mdrip/node` | Node.js | `fetchToStore()`, `fetchManyToStore()`, `listStoredPages()` |
-This is useful for flows with OpenClaw and similar AI tools where you want to pipe page content directly into your agent loop.
+## Remote MCP Server
-```bash
-# stream markdown directly to another process
-mdrip https://blog.cloudflare.com/markdown-for-agents/ --raw
-```
+mdrip is available as a remote MCP server at **`mdrip.createmcp.dev`** — no install required. Any MCP-compatible client can connect and use the `fetch_markdown` and `batch_fetch_markdown` tools.
-### List fetched pages
+### Claude Desktop
-```bash
-mdrip list
-mdrip list --json
+Add to `claude_desktop_config.json`:
+```json
+{
+  "mcpServers": {
+    "mdrip": {
+      "command": "npx",
+      "args": ["mcp-remote", "https://mdrip.createmcp.dev/mcp"]
+    }
+  }
+}
 ```
-### Remove pages
+### Claude Code
 ```bash
-mdrip remove https://developers.cloudflare.com/fundamentals/reference/markdown-for-agents/
+claude mcp add mdrip-remote --transport sse https://mdrip.createmcp.dev/sse
 ```
-### Clean snapshots
+### Cloudflare AI Playground
-```bash
-# Remove all
-mdrip clean
-# Remove only one domain
-mdrip clean --domain developers.cloudflare.com
-```
+Enter `mdrip.createmcp.dev/sse` at [playground.ai.cloudflare.com](https://playground.ai.cloudflare.com/).
 ## File modifications
 On first run, mdrip can optionally update:
-- `.gitignore` (adds `mdrip/`)
-- `tsconfig.json` (excludes `mdrip`)
-- `AGENTS.md` (adds a section pointing agents to snapshots)
+- `.gitignore` — adds `mdrip/`
+- `tsconfig.json` — excludes `mdrip/`
+- `AGENTS.md` — adds a section pointing agents to your snapshots
-Choice is stored in `mdrip/settings.json`.
+Choice is stored in `mdrip/settings.json`. Use `--modify` or `--modify=false` to skip the prompt.
-Use flags to skip prompt:
+`--raw` mode bypasses this entirely.
-```bash
-# allow updates
-mdrip https://example.com --modify
+## Output structure
-# deny updates
-mdrip https://example.com --modify=false
 ```
-`--raw` mode bypasses this entire flow and never writes settings or snapshots.
-## Output
-```text
 mdrip/
 ├── settings.json
 ├── sources.json
 └── pages/
-    └── developers.cloudflare.com/
-        └── fundamentals/
-            └── reference/
-                └── markdown-for-agents/
-                    └── index.md
+    └── example.com/
+        └── docs/
+            └── getting-started/
+                └── index.md
 ```
-## Requirements and notes
+## Benchmark
-- Node.js 18+
-- The target site must return markdown for `Accept: text/markdown` (Cloudflare Markdown for Agents enabled).
-- If a page does not return `text/markdown`, mdrip can convert `text/html` into markdown fallback unless `--no-html-fallback` is used.
+Measured across popular pages (values vary as pages change):
-## Publishing to npm
+| Page | Mode | Chars saved | Tokens saved |
+|------|------|------------:|-------------:|
+| blog.cloudflare.com/markdown-for-agents | cloudflare-markdown | 94.9% | 94.9% |
+| developers.cloudflare.com/.../markdown-for-agents | cloudflare-markdown | 95.7% | 95.7% |
+| en.wikipedia.org/wiki/Markdown | html-fallback | 72.7% | 72.7% |
+| github.com/cloudflare/skills | html-fallback | 96.3% | 96.3% |
+| **Average** | | **89.9%** | **89.9%** |
 ```bash
-# optional package check
-pnpm publish:dry-run
+pnpm build && pnpm benchmark
+```
+## AI Skills
-# publish to npm
-pnpm publish:npm
+This repo includes an AI-consumable skills catalog in `skills/`, following the [agentskills](https://agentskills.io) format.
+```bash
+npx skills add charl-kruger/mdrip
 ```
-`prepublishOnly` runs automatically before publish and executes:
-- `pnpm type-check`
-- `pnpm test`
-- `pnpm build`
+## Requirements
+- Node.js 18+
 ## Author
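
Review note: the new README claims "~90% fewer tokens" and reports an average of 89.9% in its benchmark table. The sketch below recomputes that average from the four per-page values in the table and shows one plausible way a "saved" percentage could be derived. `percentSaved` and its sample inputs are hypothetical; `scripts/benchmark.mjs` is not part of this diff, so this is not necessarily how the package measures savings.

```typescript
// Hypothetical helper (not from mdrip): percentage saved when a page
// shrinks from `htmlSize` units (chars or tokens) to `markdownSize` units.
function percentSaved(htmlSize: number, markdownSize: number): number {
  if (htmlSize <= 0) {
    throw new RangeError("htmlSize must be positive");
  }
  return ((htmlSize - markdownSize) / htmlSize) * 100;
}

// "Tokens saved" values copied from the benchmark table in the README diff.
const tokensSaved: number[] = [94.9, 95.7, 72.7, 96.3];

// Arithmetic mean, rounded to one decimal place as in the table.
const average =
  tokensSaved.reduce((sum, value) => sum + value, 0) / tokensSaved.length;
const averageRounded = Math.round(average * 10) / 10;

console.log(averageRounded.toFixed(1)); // prints "89.9"
```

The recomputed mean agrees with the 89.9% in the table. Note that three of the four pages sit near 95% while the Wikipedia page is at 72.7%, so the "~90%" headline depends on which pages are sampled.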

package/dist/index.js CHANGED

@@ -8,7 +8,7 @@ const program = new Command();
 program
     .name("mdrip")
     .description("Fetch markdown snapshots for URLs using Cloudflare Markdown for Agents")
-    .version("0.1.3")
+    .version("0.1.4")
     .option("--cwd <path>", "working directory (default: current directory)");
 program
     .argument("[urls...]", "URLs to fetch as markdown")
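
Review note: this hunk sets the CLI version string to `0.1.4`, while `package/package.json` in the same diff declares `0.1.5`, so `mdrip --version` would presumably report `0.1.4` for the 0.1.5 release. The sketch below is one way such drift could be caught in a test or prepublish step. The function names `extractCliVersion` and `findVersionDrift` are hypothetical and are not part of the package.

```typescript
// Hypothetical check (not from mdrip): find the version literal passed to
// `.version("...")` in CLI source text.
function extractCliVersion(source: string): string | null {
  const match = source.match(/\.version\(\s*["']([^"']+)["']\s*\)/);
  return match ? match[1] : null;
}

// Returns null when both versions agree, otherwise a description of the drift.
function findVersionDrift(
  cliSource: string,
  packageJsonText: string,
): string | null {
  const cliVersion = extractCliVersion(cliSource);
  const pkg = JSON.parse(packageJsonText) as { version?: string };
  const pkgVersion = pkg.version ?? null;
  if (cliVersion === null || pkgVersion === null) {
    return "version not found";
  }
  return cliVersion === pkgVersion
    ? null
    : `CLI reports ${cliVersion}, package.json declares ${pkgVersion}`;
}

// Inputs mirror the two values visible in this diff.
const cliSource = `program.name("mdrip").version("0.1.4")`;
const pkgText = `{"name": "mdrip", "version": "0.1.5"}`;

console.log(findVersionDrift(cliSource, pkgText));
// prints "CLI reports 0.1.4, package.json declares 0.1.5"
```

A common alternative is to avoid the hard-coded literal altogether and read the version from `package.json` at runtime, which removes the need for a separate check.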

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "mdrip",
-  "version": "0.1.3",
+  "version": "0.1.5",
   "description": "Fetch markdown snapshots of web pages using Cloudflare Markdown for Agents",
   "type": "module",
   "main": "./dist/web.js",
@@ -38,6 +38,7 @@
     "build": "tsc",
     "dev": "tsc --watch",
     "start": "node dist/index.js",
+    "benchmark": "node scripts/benchmark.mjs",
     "test": "vitest run",
     "test:watch": "vitest",
     "test:coverage": "vitest run --coverage",