dembrandt 0.10.0 → 0.11.0

package/README.md CHANGED
@@ -4,37 +4,25 @@
  [![npm downloads](https://img.shields.io/npm/dm/dembrandt.svg)](https://www.npmjs.com/package/dembrandt)
  [![license](https://img.shields.io/npm/l/dembrandt.svg)](https://github.com/dembrandt/dembrandt/blob/main/LICENSE)

- Extract any websites design system into design tokens in a few seconds: logo, colors, typography, borders, and more. One command.
+ Extract a website's design system into design tokens in a few seconds: logo, colors, typography, borders, and more. One command.

  ![Dembrandt — Any website to design tokens](https://raw.githubusercontent.com/dembrandt/dembrandt/main/docs/images/banner.png)

- **CLI output**
-
- ![CLI extraction of netflix.com](https://raw.githubusercontent.com/dembrandt/dembrandt/main/docs/images/cli-output.png)
-
- **Brand Guide PDF**
-
- ![Brand guide PDF extracted from any URL](https://raw.githubusercontent.com/dembrandt/dembrandt/main/docs/images/brand-guide.png)
-
- **Local UI**
-
- ![Local UI showing extracted brand](https://raw.githubusercontent.com/dembrandt/dembrandt/main/docs/images/local-ui.png)
-
  ## Install

  Install globally: `npm install -g dembrandt`

  ```bash
- dembrandt bmw.de
+ dembrandt example.com
  ```

- Or use npx without installing: `npx dembrandt bmw.de`
+ Or use npx without installing: `npx dembrandt example.com`

  Requires Node.js 18+

  ## AI Agent Integration (MCP)

- Use Dembrandt as a tool in Claude Code, Cursor, Windsurf, or any MCP-compatible client. Ask your agent to "extract the color palette from stripe.com" and it calls Dembrandt automatically.
+ Use Dembrandt as a tool in Claude Code, Cursor, Windsurf, or any MCP-compatible client. Ask your agent to "extract the color palette from example.com" and it calls Dembrandt automatically.

  ```bash
  claude mcp add --transport stdio dembrandt -- npx -y dembrandt-mcp
@@ -69,44 +57,44 @@ Or add to your project's `.mcp.json`:
  ## Usage

  ```bash
- dembrandt <url> # Basic extraction (terminal display only)
- dembrandt bmw.de --json-only # Output raw JSON to terminal (no formatted display, no file save)
- dembrandt bmw.de --save-output # Save JSON to output/bmw.de/YYYY-MM-DDTHH-MM-SS.json
- dembrandt bmw.de --dtcg # Export in W3C Design Tokens (DTCG) format (auto-saves as .tokens.json)
- dembrandt bmw.de --dark-mode # Extract colors from dark mode variant
- dembrandt bmw.de --mobile # Use mobile viewport (390x844, iPhone 12/13/14/15) for responsive analysis
- dembrandt bmw.de --slow # 3x longer timeouts (24s hydration) for JavaScript-heavy sites
- dembrandt bmw.de --brand-guide # Generate a brand guide PDF
- dembrandt bmw.de --design-md # Generate a DESIGN.md file for AI agents
- dembrandt bmw.de --pages 5 # Analyze 5 pages (homepage + 4 discovered pages), merges results
- dembrandt bmw.de --sitemap # Discover pages from sitemap.xml instead of DOM links
- dembrandt bmw.de --pages 10 --sitemap # Combine: up to 10 pages discovered via sitemap
- dembrandt bmw.de --no-sandbox # Disable Chromium sandbox (required for Docker/CI)
- dembrandt bmw.de --browser=firefox # Use Firefox instead of Chromium (better for Cloudflare bypass)
+ dembrandt <url> # Basic extraction (terminal display only)
+ dembrandt example.com --json-only # Output raw JSON to terminal (no formatted display, no file save)
+ dembrandt example.com --save-output # Save JSON to output/example.com/YYYY-MM-DDTHH-MM-SS.json
+ dembrandt example.com --dtcg # Export in W3C Design Tokens (DTCG) format (auto-saves as .tokens.json)
+ dembrandt example.com --dark-mode # Extract colors from dark mode variant
+ dembrandt example.com --mobile # Use mobile viewport (390x844) for responsive analysis
+ dembrandt example.com --slow # 3x longer timeouts (24s hydration) for JavaScript-heavy sites
+ dembrandt example.com --brand-guide # Generate a brand guide PDF
+ dembrandt example.com --design-md # Generate a DESIGN.md file for AI agents
+ dembrandt example.com --pages 5 # Analyze 5 pages (homepage + 4 discovered pages), merges results
+ dembrandt example.com --sitemap # Discover pages from sitemap.xml instead of DOM links
+ dembrandt example.com --pages 10 --sitemap # Combine: up to 10 pages discovered via sitemap
+ dembrandt example.com --no-sandbox # Disable Chromium sandbox (required for Docker/CI)
+ dembrandt example.com --browser=firefox # Use Firefox instead of Chromium (better for Cloudflare bypass)
  ```

  Default: formatted terminal display only. Use `--save-output` to persist results as JSON files. Browser automatically retries in visible mode if headless extraction fails.

  ### Multi-Page Extraction

- Analyze multiple pages to get a more complete picture of a site's design system. Results are merged into a single unified output with cross-page confidence boosting — colors appearing on multiple pages get higher confidence scores.
+ Analyze multiple pages to get a more complete picture of a site's design system. Results are merged into a single unified output with cross-page confidence boosting — tokens appearing on multiple pages get higher confidence scores.

  ```bash
  # Analyze homepage + 4 auto-discovered pages (default: 5 total)
- dembrandt stripe.com --pages 5
+ dembrandt example.com --pages 5

  # Use sitemap.xml for page discovery instead of DOM link scraping
- dembrandt stripe.com --sitemap
+ dembrandt example.com --sitemap

  # Combine both: up to 10 pages from sitemap
- dembrandt stripe.com --pages 10 --sitemap
+ dembrandt example.com --pages 10 --sitemap
  ```

  **Page discovery** works two ways:
- - **DOM links** (default): Scrapes navigation, header, and footer links from the homepage, prioritizing key pages like /pricing, /about, /features
+ - **DOM links** (default): Reads navigation, header, and footer links from the homepage, prioritizing key pages like /pricing, /about, /features
  - **Sitemap** (`--sitemap`): Parses sitemap.xml (checks robots.txt first), follows sitemapindex references, and scores URLs by importance

- Pages are crawled sequentially with polite delays. Failed pages are skipped without aborting the run.
+ Pages are fetched sequentially with polite delays. Failed pages are skipped without aborting the run.

  ### Browser Selection

@@ -114,10 +102,10 @@ By default, dembrandt uses Chromium. If you encounter bot detection or timeouts

  ```bash
  # Use Firefox instead of Chromium
- dembrandt bmw.de --browser=firefox
+ dembrandt example.com --browser=firefox

  # Combine with other flags
- dembrandt bmw.de --browser=firefox --save-output --dtcg
+ dembrandt example.com --browser=firefox --save-output --dtcg
  ```

  **When to use Firefox:**
@@ -137,24 +125,33 @@ npx playwright install firefox
  Use `--dtcg` to export in the standardized [W3C Design Tokens Community Group](https://www.designtokens.org/) format:

  ```bash
- dembrandt stripe.com --dtcg
- # Saves to: output/stripe.com/TIMESTAMP.tokens.json
+ dembrandt example.com --dtcg
+ # Saves to: output/example.com/TIMESTAMP.tokens.json
  ```

  The DTCG format is an industry-standard JSON schema that can be consumed by design tools and token transformation libraries like [Style Dictionary](https://styledictionary.com).

  ### DESIGN.md

- Use `--design-md` to generate a [DESIGN.md](https://stitch.withgoogle.com/docs/design-md) file — a plain-text design system document readable by AI agents like Google Stitch.
+ Use `--design-md` to generate a [DESIGN.md](https://stitch.withgoogle.com/docs/design-md) file — a plain-text design system document readable by AI agents.

  ```bash
- dembrandt stripe.com --design-md
- # Saves to: output/stripe.com/DESIGN.md
+ dembrandt example.com --design-md
+ # Saves to: output/example.com/DESIGN.md
+ ```
+
+ ### Brand Guide PDF
+
+ Use `--brand-guide` to generate a printable PDF summarizing the extracted design system — colors, typography, components, and logo on a single document.
+
+ ```bash
+ dembrandt example.com --brand-guide
+ # Saves to: output/example.com/TIMESTAMP.brand-guide.pdf
  ```

  ## Local UI

- Browse your extracted brands in a visual interface.
+ Browse your extractions in a visual interface.

  ### Setup

@@ -173,7 +170,7 @@ Opens http://localhost:5173 with API on port 3002.

  ### Features

- - Visual grid of all extracted brands
+ - Visual grid of all extractions
  - Color palettes with click-to-copy
  - Typography specimens
  - Spacing, shadows, border radius visualization
@@ -185,14 +182,14 @@ Extractions are performed via CLI (`dembrandt <url> --save-output`) and automati

  ## Use Cases

- - Brand audits & competitive analysis
  - Design system documentation
- - Reverse engineering brands
- - Multi-site brand consolidation
+ - Multi-site design consolidation
+ - Internal design audits on your own properties
+ - Learning how design tokens map to real CSS

  ## How It Works

- Uses Playwright to render the page, extracts computed styles from the DOM, analyzes color usage and confidence, groups similar typography, detects spacing patterns, and returns actionable design tokens.
+ Uses Playwright to render the page, reads computed styles from the DOM, analyzes color usage and confidence, groups similar typography, detects spacing patterns, and returns design tokens.

  ### Extraction Process

@@ -207,35 +204,33 @@ Uses Playwright to render the page, extracts computed styles from the DOM, analy

  ### Color Confidence

- - High — Logo, brand elements, primary buttons
- - Medium — Interactive elements, icons, navigation
+ - High — Logo, primary interactive elements
+ - Medium — Secondary interactive elements, icons, navigation
  - Low — Generic UI components (filtered from display)
  - Only shows high and medium confidence colors in terminal. Full palette in JSON.

  ## Limitations

- - Dark mode requires --dark-mode flag (not automatically detected)
+ - Dark mode requires `--dark-mode` flag (not automatically detected)
  - Hover/focus states extracted from CSS (not fully interactive)
- - Canvas/WebGL-rendered sites cannot be analyzed (e.g., Tesla, Apple Vision Pro demos)
+ - Canvas/WebGL-rendered sites cannot be analyzed (no DOM to read)
  - JavaScript-heavy sites require hydration time (8s initial + 4s stabilization)
  - Some dynamically-loaded content may be missed
- - Default viewport is 1920x1080 (use --mobile for 390x844 iPhone viewport)
+ - Default viewport is 1920x1080 (use `--mobile` for 390x844 mobile viewport)

- ## Ethics & Legality
+ ## Intended Use

- Dembrandt extracts publicly available design information (colors, fonts, spacing) from website DOMs for analysis purposes. This falls under fair use in most jurisdictions (USA's DMCA § 1201(f), EU Software Directive 2009/24/EC) when used for competitive analysis, documentation, or learning.
+ Dembrandt reads publicly available CSS and computed styles from website DOMs for documentation, learning, and analysis of design systems you own or have permission to analyze.

- Legal: Analyzing public HTML/CSS is generally legal. Does not bypass protections or violate copyright. Check site ToS before mass extraction.
+ Only run Dembrandt against sites whose Terms of Service permit automated access, or against your own properties. Do not use extracted material to reproduce third-party brand identities, logos, or trademarks. Respect robots.txt, rate limits, and copyright.

- Ethical: Use for inspiration and analysis, not direct copying. Respect servers (no mass crawling), give credit to sources, be transparent about data origin.
+ Dembrandt does not host, redistribute, or claim rights to any third-party brand assets.

  ## Contributing

- Bugs you found? Weird websites that make it cry? Pull requests (even one-liners make me happy)?
-
- Spam me in [Issues](https://github.com/dembrandt/dembrandt/issues) or PRs. I reply to everything.
+ Bugs, weird sites, pull requests all welcome.

- Let's keep the light alive together.
+ Open an [Issue](https://github.com/dembrandt/dembrandt/issues) or PR.

  @thevangelist

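The README's `--save-output` line describes files named `output/<domain>/YYYY-MM-DDTHH-MM-SS.json`. The code that produces those names is not part of this diff, so the following is only a minimal sketch of how such a path could be built. The `outputPath` helper and the choice of UTC timestamps are assumptions made for illustration.

```javascript
// Hypothetical helper; not taken from the dembrandt source.
function outputPath(domain, date, extension = "json") {
  // toISOString() always reports UTC, e.g. 2024-01-02T03:04:05.000Z.
  // Keep the first 19 characters and swap ":" for "-" so the name is
  // safe on filesystems that reject colons.
  const stamp = date.toISOString().slice(0, 19).replace(/:/g, "-");
  return `output/${domain}/${stamp}.${extension}`;
}

// A fixed date keeps the example deterministic.
const fixedDate = new Date(Date.UTC(2024, 0, 2, 3, 4, 5));
const jsonPath = outputPath("example.com", fixedDate);
const dtcgPath = outputPath("example.com", fixedDate, "tokens.json");

console.log(jsonPath); // output/example.com/2024-01-02T03-04-05.json
console.log(dtcgPath); // output/example.com/2024-01-02T03-04-05.tokens.json
```

Whether the real tool stamps files in UTC or local time cannot be determined from this diff.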
package/index.js CHANGED
@@ -20,11 +20,12 @@ import { parseSitemap } from "./lib/discovery.js";
  import { mergeResults } from "./lib/merger.js";
  import { writeFileSync, mkdirSync } from "fs";
  import { join } from "path";
+ import { checkRobotsTxt } from "./lib/robots.js";

  program
    .name("dembrandt")
    .description("Extract design tokens from any website")
-   .version("0.10.0")
+   .version("0.11.0")
    .argument("<url>")
    .option("--browser <type>", "Browser to use (chromium|firefox)", "chromium")
    .option("--json-only", "Output raw JSON")
@@ -57,6 +58,21 @@ program
  }

  const spinner = ora({ text: "Starting extraction...", stream: opts.jsonOnly ? process.stderr : process.stdout }).start();
+
+ try {
+   const robots = await checkRobotsTxt(url);
+   if (robots.status === "ok" && robots.allowed === false) {
+     spinner.warn(
+       chalk.hex("#FFB86C")(
+         `robots.txt disallows this path (rule: "${robots.rule}"). Proceeding anyway — respect the site's terms.`
+       )
+     );
+     spinner.start("Starting extraction...");
+   }
+ } catch {
+   // robots check is advisory; never block extraction
+ }
+
  let browser = null;

  try {
@@ -101,8 +117,7 @@ program
  let additionalUrls;
  if (opts.sitemap) {
    // Try post-redirect URL first, fall back to user-provided URL
-   // (sites like spotify.com redirect browser to open.spotify.com
-   // but sitemap lives at www.spotify.com)
+   // (some sites redirect to a subdomain while the sitemap stays on www)
    additionalUrls = await parseSitemap(result.url, maxPages);
    if (additionalUrls.length === 0 && result.url !== url) {
      additionalUrls = await parseSitemap(url, maxPages);
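The robots check added to `index.js` is advisory: only a result with `status === "ok"` and `allowed === false` triggers a warning, and extraction continues in every case. The sketch below isolates that decision in a hypothetical `robotsWarning` helper (not part of the package) and feeds it made-up result objects in the shape `checkRobotsTxt` returns in this diff.

```javascript
// Hypothetical helper mirroring the condition used in index.js.
// Returns a warning string, or null when nothing should be shown.
function robotsWarning(robots) {
  if (robots.status === "ok" && robots.allowed === false) {
    return `robots.txt disallows this path (rule: "${robots.rule}").`;
  }
  // "unavailable" results and allowed paths stay silent.
  return null;
}

// Illustrative inputs; the URLs and rules are assumptions.
const sampleResults = [
  { status: "unavailable", robotsUrl: "https://example.com/robots.txt" },
  { status: "ok", robotsUrl: "https://example.com/robots.txt", allowed: true, rule: null },
  { status: "ok", robotsUrl: "https://example.com/robots.txt", allowed: false, rule: "/private/" },
];

const warnings = sampleResults.map(robotsWarning);
console.log(warnings);
```

Only the third result yields a message. A missing or unreachable robots.txt is treated the same as an explicit allow.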
package/lib/extractors.js CHANGED
@@ -131,7 +131,7 @@ export async function extractBranding(
    timeouts.push('Body content rendering');
  }

- // Give SPAs time to hydrate (Linear, Figma, Notion, etc.)
+ // Give SPAs time to hydrate
  spinner.start("Waiting for SPA hydration...");
  const hydrationTime = 8000 * timeoutMultiplier;
  await page.waitForTimeout(hydrationTime);
@@ -627,7 +627,7 @@ export async function extractBranding(
    frameworks,
  };

- // Detect canvas-only / WebGL sites (Tesla, Apple Vision Pro, etc.)
+ // Detect canvas-only / WebGL sites
  const isCanvasOnly = await page.evaluate(() => {
    const canvases = document.querySelectorAll("canvas");
    const hasRealContent = document.body.textContent.trim().length > 200;
@@ -641,7 +641,7 @@ export async function extractBranding(

  if (isCanvasOnly) {
    result.note =
-     "This website uses canvas/WebGL rendering (e.g. Tesla, Apple Vision Pro). Design system cannot be extracted from DOM.";
+     "This website uses canvas/WebGL rendering. Design system cannot be extracted from DOM.";
    result.isCanvasOnly = true;
  }

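The hydration wait in `extractors.js` is `8000 * timeoutMultiplier`, and the README describes `--slow` as "3x longer timeouts (24s hydration)". The arithmetic can be checked with a small sketch. The mapping from `--slow` to a multiplier of 3 is an assumption inferred from the README, since the code that sets `timeoutMultiplier` is outside this diff.

```javascript
// Base wait taken from lib/extractors.js in this diff.
const BASE_HYDRATION_MS = 8000;

// Assumption: --slow sets the multiplier to 3, otherwise it is 1.
function hydrationTimeMs(slow) {
  const timeoutMultiplier = slow ? 3 : 1;
  return BASE_HYDRATION_MS * timeoutMultiplier;
}

const defaultWait = hydrationTimeMs(false);
const slowWait = hydrationTimeMs(true);

console.log(defaultWait); // 8000
console.log(slowWait); // 24000
```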
package/lib/robots.js ADDED
@@ -0,0 +1,101 @@
+ const UA = "Dembrandt";
+
+ export async function checkRobotsTxt(targetUrl, { timeoutMs = 5000 } = {}) {
+   const u = new URL(targetUrl);
+   const robotsUrl = `${u.protocol}//${u.host}/robots.txt`;
+   const path = u.pathname || "/";
+
+   const controller = new AbortController();
+   const timer = setTimeout(() => controller.abort(), timeoutMs);
+
+   let body;
+   try {
+     const res = await fetch(robotsUrl, {
+       signal: controller.signal,
+       headers: { "User-Agent": UA },
+     });
+     if (!res.ok) return { status: "unavailable", robotsUrl };
+     body = await res.text();
+   } catch {
+     return { status: "unavailable", robotsUrl };
+   } finally {
+     clearTimeout(timer);
+   }
+
+   const groups = parseRobots(body);
+   const rules = matchGroup(groups, UA) || matchGroup(groups, "*") || [];
+   const decision = evaluate(rules, path);
+
+   return { status: "ok", robotsUrl, ...decision };
+ }
+
+ function parseRobots(text) {
+   const groups = [];
+   let current = null;
+   let lastWasAgent = false;
+
+   for (const raw of text.split(/\r?\n/)) {
+     const line = raw.replace(/#.*$/, "").trim();
+     if (!line) continue;
+     const idx = line.indexOf(":");
+     if (idx === -1) continue;
+     const field = line.slice(0, idx).trim().toLowerCase();
+     const value = line.slice(idx + 1).trim();
+
+     if (field === "user-agent") {
+       if (!current || !lastWasAgent) {
+         current = { agents: [], rules: [] };
+         groups.push(current);
+       }
+       current.agents.push(value.toLowerCase());
+       lastWasAgent = true;
+     } else if (field === "allow" || field === "disallow") {
+       if (!current) {
+         current = { agents: ["*"], rules: [] };
+         groups.push(current);
+       }
+       current.rules.push({ type: field, value });
+       lastWasAgent = false;
+     }
+   }
+   return groups;
+ }
+
+ function matchGroup(groups, agent) {
+   const wanted = agent.toLowerCase();
+   for (const g of groups) {
+     if (g.agents.includes(wanted)) return g.rules;
+   }
+   return null;
+ }
+
+ function evaluate(rules, path) {
+   let best = { type: null, length: -1, value: "" };
+   for (const r of rules) {
+     if (!r.value) continue;
+     if (!pathMatches(path, r.value)) continue;
+     if (r.value.length > best.length) best = { ...r, length: r.value.length };
+   }
+   if (best.type === "disallow") return { allowed: false, rule: best.value };
+   return { allowed: true, rule: best.value || null };
+ }
+
+ function pathMatches(path, pattern) {
+   const anchored = pattern.endsWith("$");
+   const p = anchored ? pattern.slice(0, -1) : pattern;
+   const parts = p.split("*");
+   let i = 0;
+   for (let k = 0; k < parts.length; k++) {
+     const seg = parts[k];
+     if (k === 0) {
+       if (!path.startsWith(seg)) return false;
+       i = seg.length;
+     } else {
+       const found = path.indexOf(seg, i);
+       if (found === -1) return false;
+       i = found + seg.length;
+     }
+   }
+   if (anchored && i !== path.length) return false;
+   return true;
+ }
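To see how the new `lib/robots.js` resolves overlapping rules, the sketch below copies `pathMatches` and `evaluate` from the added file and runs them against a small rule set. The rules and paths are invented for illustration and do not come from any real site. The behavior shown is longest-pattern-wins, with `*` as a wildcard and a trailing `$` anchoring the end of the path.

```javascript
// pathMatches and evaluate are copied from lib/robots.js in this diff.
function pathMatches(path, pattern) {
  const anchored = pattern.endsWith("$");
  const p = anchored ? pattern.slice(0, -1) : pattern;
  const parts = p.split("*");
  let i = 0;
  for (let k = 0; k < parts.length; k++) {
    const seg = parts[k];
    if (k === 0) {
      // The first segment must match at the start of the path.
      if (!path.startsWith(seg)) return false;
      i = seg.length;
    } else {
      // Later segments may appear anywhere after the previous match.
      const found = path.indexOf(seg, i);
      if (found === -1) return false;
      i = found + seg.length;
    }
  }
  if (anchored && i !== path.length) return false;
  return true;
}

function evaluate(rules, path) {
  let best = { type: null, length: -1, value: "" };
  for (const r of rules) {
    if (!r.value) continue; // empty Allow/Disallow values are skipped
    if (!pathMatches(path, r.value)) continue;
    if (r.value.length > best.length) best = { ...r, length: r.value.length };
  }
  if (best.type === "disallow") return { allowed: false, rule: best.value };
  return { allowed: true, rule: best.value || null };
}

// Hypothetical rule set in the shape parseRobots produces.
const sampleRules = [
  { type: "disallow", value: "/private/" },
  { type: "allow", value: "/private/press/" },
  { type: "disallow", value: "/*.pdf$" },
];

const decisions = {
  root: evaluate(sampleRules, "/"),
  privateNotes: evaluate(sampleRules, "/private/notes"),
  pressKit: evaluate(sampleRules, "/private/press/kit"),
  pdf: evaluate(sampleRules, "/docs/guide.pdf"),
  pdfHtml: evaluate(sampleRules, "/docs/guide.pdf.html"),
};
console.log(decisions);
```

With these inputs, `/private/notes` is disallowed by `/private/`, while `/private/press/kit` is allowed because the longer `/private/press/` pattern wins. The anchored `/*.pdf$` rule blocks `/docs/guide.pdf` but not `/docs/guide.pdf.html`. Because the comparison uses a strict `>`, two matching rules of equal length resolve to whichever appears first in the file.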
package/mcp-server.js CHANGED
@@ -114,7 +114,7 @@ function toolHandler(pick, extraOptions = {}) {

  // ── Shared params ──────────────────────────────────────────────────────

- const url = z.string().describe("Website URL (e.g. stripe.com)");
+ const url = z.string().describe("Website URL (e.g. example.com)");
  const slow = z.boolean().optional().default(false).describe("3x timeouts for heavy SPAs");

  // ── Tools ──────────────────────────────────────────────────────────────
package/package.json CHANGED
@@ -1,7 +1,7 @@
  {
    "name": "dembrandt",
-   "version": "0.10.0",
-   "description": "Extract design tokens and brand assets from any website",
+   "version": "0.11.0",
+   "description": "Extract design tokens and publicly visible CSS information from any website",
    "mcpName": "io.github.dembrandt/dembrandt",
    "main": "index.js",
    "type": "module",
@@ -16,8 +16,6 @@
    ],
    "scripts": {
      "start": "node index.js",
-     "brand-challenge": "node run-no-login-challenge.mjs",
-     "brand-challenge:report": "node run-no-login-challenge.mjs || true",
      "install-browser": "npx playwright install chromium firefox || echo 'Playwright browser installation failed. You may need to install system dependencies manually.'",
      "local-ui": "cd local-ui && npm start",
      "qa:baseline": "node test/qa.mjs --baseline",
@@ -28,10 +26,9 @@
      "design-tokens",
      "design-system",
      "branding",
-     "web-scraping",
+     "css-analysis",
      "cli",
-     "playwright",
-     "extraction"
+     "playwright"
    ],
    "repository": {
      "type": "git",
@@ -56,4 +53,4 @@
    "engines": {
      "node": ">=18.0.0"
    }
- }
+ }