crawlio-browser 1.6.2 → 1.6.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,7 +1,7 @@
  // src/shared/constants.ts
  import { homedir } from "os";
  import { join } from "path";
- var PKG_VERSION = "1.6.2";
+ var PKG_VERSION = "1.6.4";
  var WS_PORT = 9333;
  var WS_PORT_MAX = 9342;
  var WS_HOST = "127.0.0.1";
@@ -9,7 +9,7 @@ import {
  WS_PORT_MAX,
  WS_RECONNECT_GRACE,
  WS_STALE_THRESHOLD
- } from "./chunk-T4GKS2PG.js";
+ } from "./chunk-LOOYHD6I.js";

  // src/mcp-server/index.ts
  import { randomBytes as randomBytes3 } from "crypto";
@@ -8758,7 +8758,7 @@ function getMaxOutput() {
  process.title = "Crawlio Agent";
  var initMode = process.argv.includes("init") || process.argv.includes("--setup") || process.argv.includes("setup");
  if (initMode) {
- const { runInit } = await import("./init-PFND5ZFY.js");
+ const { runInit } = await import("./init-UJD7YE3X.js");
  await runInit(process.argv.slice(2));
  process.exit(0);
  }
@@ -1,6 +1,6 @@
  import {
  PKG_VERSION
- } from "./chunk-T4GKS2PG.js";
+ } from "./chunk-LOOYHD6I.js";

  // src/mcp-server/init.ts
  import { execFileSync, spawn } from "child_process";
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "crawlio-browser",
- "version": "1.6.2",
+ "version": "1.6.4",
  "description": "MCP server with 114 CDP-backed tools for browser automation — screenshots, DOM, network capture, framework detection, cookies, storage, session recording, structured data extraction, tracking analysis, SEO auditing, technographic fingerprinting, performance metrics via Chrome",
  "type": "module",
  "main": "dist/mcp-server/index.js",
@@ -1,101 +1,111 @@
  ---
  name: clone
- description: "Clone a site capture design tokens, component tree, assets, and compile a replayable skill"
- allowed-tools: Agent, Read, Write, Bash, Glob, Grep
- argument-hint: <url>
+ description: "Capture a site's design system colors, typography, spacing, layout, components as structured findings"
+ allowed-tools: mcp__crawlio-browser__search, mcp__crawlio-browser__execute, mcp__crawlio-browser__connect_tab
  ---

- # Clone Investigation
-
- You are running a **clone** investigation. Your goal is to capture the design system, component structure, and assets of a target URL, then compile the investigation into a replayable skill.
-
- ## Loop Definition
-
- Read `loops/clone.json` to understand the phase sequence. The clone loop has 5 phases:
-
- 1. **crawl** — Spawn `crawlio-crawler` to capture the target URL. Record the `EVIDENCE_ID`.
- 2. **analyze** — Spawn `crawlio-analyzer` with the crawl evidence ID. Identifies framework, rendering mode, component patterns.
- 3. **extract-design** — Spawn `crawlio-extractor` with the crawl evidence ID and `what: "design"`. Extracts design tokens (colors, typography, spacing, breakpoints).
- 4. **compile** (optional) — Spawn `crawlio-recorder` to compile the investigation into a replayable SKILL.md.
- 5. **synthesize** — Spawn `crawlio-synthesizer` with all phase evidence to produce the final `CloneBlueprint`.
-
- ## Execution
-
- 1. Read `loops/clone.json` to confirm phase order.
- 2. Parse the user's argument: `<url>`.
- 3. Spawn `crawlio-crawler` to capture the page:
- ```
- Crawl <url> and write PageEvidence to .crawlio/evidence/.
- ```
- Record `EVIDENCE_ID=<crawlId>`.
-
- 4. Spawn `crawlio-analyzer` with the crawl evidence:
- ```
- Read PageEvidence from .crawlio/evidence/<crawlId>.json.
- Analyze framework, rendering mode, and component patterns.
- Write FrameworkEvidence to .crawlio/evidence/.
- Target URL: <url>
- ```
- Record `EVIDENCE_ID=<analyzeId>`.
-
- 5. Spawn `crawlio-extractor` for design token extraction:
- ```
- Read PageEvidence from .crawlio/evidence/<crawlId>.json.
- Extract "design" data — colors, typography, spacing, breakpoints.
- Write DesignTokens evidence to .crawlio/evidence/.
- Target URL: <url>
- ```
- Record `EVIDENCE_ID=<designId>`.
-
- 6. Spawn `crawlio-recorder` to compile the investigation:
- ```
- Read evidence chain: <crawlId>, <analyzeId>, <designId>.
- Compile into a replayable SKILL.md.
- ```
- Record the skill path.
-
- 7. Spawn `crawlio-synthesizer` to produce the CloneBlueprint:
- ```
- Read all evidence: <crawlId>, <analyzeId>, <designId>.
- Produce a CloneBlueprint with design tokens, component tree, assets, and compiled skill path.
- Write to .crawlio/evidence/.
- Target URL: <url>
- ```
- Record `EVIDENCE_ID=<blueprintId>`.
-
- 8. Read the CloneBlueprint evidence and summarize results for the user.
-
- ## Output Format
-
+ # Clone — Design System Extraction
+
+ Capture the visual DNA of a page: design tokens, typography scale, spacing system, component patterns, and CSS framework.
+
+ ## When to Use
+
+ - Reproducing or referencing another site's design system
+ - Extracting CSS custom properties and design tokens
+ - Identifying typography, color palette, spacing conventions
+ - Determining what CSS framework or UI library is in use
+
+ ## Protocol
+
+ 1. **search** for the right commands: `search("design tokens extract CSS")` or `search("detect technologies")`
+ 2. **connect_tab** to the target URL (or use an already-connected tab)
+ 3. **execute** Code Mode with smart.* methods to extract evidence
+ 4. Emit one `smart.finding()` per design dimension discovered
+ 5. Return `smart.findings()` as the final output
+
+ ## Code Example
+
+ ```js
+ const page = await smart.extractPage();
+ const tech = await smart.detectTechnologies();
+
+ // Extract CSS custom properties (design tokens)
+ const tokens = await smart.evaluate(`(() => {
+ const styles = getComputedStyle(document.documentElement);
+ const props = {};
+ for (const name of [...document.styleSheets].flatMap(s => {
+ try { return [...s.cssRules] } catch { return [] }
+ }).filter(r => r.style).flatMap(r => [...r.style]).filter(p => p.startsWith('--'))) {
+ props[name] = styles.getPropertyValue(name).trim();
+ }
+ return { count: Object.keys(props).length, sample: Object.entries(props).slice(0, 20) };
+ })()`);
+
+ smart.finding({
+ claim: `Site uses ${tokens.result.count} CSS custom properties`,
+ evidence: tokens.result.sample.map(([k, v]) => `${k}: ${v}`),
+ sourceUrl: page.capture.url,
+ confidence: "high",
+ method: "evaluate + extractPage",
+ dimension: "design-system"
+ });
+
+ // Typography
+ if (page.fonts?.length) {
+ smart.finding({
+ claim: `${page.fonts.length} font families loaded`,
+ evidence: page.fonts.map(f => f.name || f),
+ sourceUrl: page.capture.url,
+ confidence: "high",
+ method: "extractPage",
+ dimension: "typography"
+ });
+ }
+
+ // Framework / UI library
+ const frameworks = tech.technologies?.map(t => t.name) || [];
+ if (frameworks.length) {
+ smart.finding({
+ claim: `Detected CSS/UI frameworks: ${frameworks.join(", ")}`,
+ evidence: frameworks,
+ sourceUrl: page.capture.url,
+ confidence: "high",
+ method: "detectTechnologies",
+ dimension: "technology"
+ });
+ }
+
+ // Repeating UI patterns (tables / grids)
+ const tables = await smart.detectTables();
+ if (tables.length) {
+ smart.finding({
+ claim: `${tables.length} repeating UI patterns detected`,
+ evidence: tables.map(t => `${t.selector}: ${t.rowCount} rows, ${t.columns.length} cols`),
+ sourceUrl: page.capture.url,
+ confidence: "high",
+ method: "detectTables",
+ dimension: "layout"
+ });
+ }
+
+ // Visual reference
+ await smart.scrollCapture();
+
+ return { findings: smart.findings(), fonts: page.fonts, tech: frameworks };
  ```
- ## Clone: <url>
-
- ### Design Tokens
- - Colors: [count] tokens extracted
- - Typography: [count] font stacks
- - Spacing: [count] spacing values
- - Breakpoints: [count] responsive breakpoints

- ### Component Tree
- - Root: <root component>
- - Components: [count] total
- - Types: [breakdown by type]
+ ## Anti-Patterns

- ### Assets
- - [count] total assets ([breakdown by type])
+ - Do NOT use `smart.screenshot()` — use `smart.scrollCapture()` for full-page visual reference
+ - Do NOT use `sleep()` loops to wait for styles — styles are available immediately after page load
+ - Do NOT use `location.href` — use `page.capture.url` from extractPage
+ - Always `search()` first if unsure which command extracts what you need

- ### Compiled Skill
- - Path: <skill path or "not compiled">
+ ## Output

- ### Evidence Chain
- - Crawler: <crawlId> (quality: ...)
- - Analyzer: <analyzeId> (quality: ...)
- - Design: <designId> (quality: ...)
- - Blueprint: <blueprintId> (quality: ...)
+ The skill produces `Finding[]` via `smart.findings()`. Each finding is tagged with a dimension:

- ### Coverage Gaps
- - [Any gaps from the investigation]
-
- ### Confidence
- - Overall: high/medium/low
- ```
+ - **design-system** — CSS custom properties, design tokens
+ - **typography** — font families, type scale, weights
+ - **layout** — grid systems, repeating patterns, spacing
+ - **technology** — CSS framework, UI library, build tooling
@@ -1,102 +1,106 @@
  ---
  name: compare
- description: "Compare two URLs side-by-side across 10 typed dimensions"
- allowed-tools: Agent, Read, Write, Bash, Glob, Grep
- argument-hint: <urlA> <urlB>
+ description: "Side-by-side comparison of two websites across 11 dimensions. Produces Finding[] evidence per dimension."
+ allowed-tools: mcp__crawlio-browser__search, mcp__crawlio-browser__execute, mcp__crawlio-browser__connect_tab
  ---

- # Compare Investigation
-
- You are running a **compare** investigation. Your goal is to capture two URLs, analyze their frameworks, and produce a `ComparisonReport` with typed findings across 10 dimensions.
-
- ## The 10 Dimensions
-
- | # | Dimension | What It Measures |
- |---|-----------|------------------|
- | 1 | Framework | Technology stack, versions, SSR mode |
- | 2 | Performance | Web Vitals, load metrics, bottlenecks |
- | 3 | Security | TLS, headers, cookies, mixed content |
- | 4 | SEO | Meta tags, structured data, heading hierarchy |
- | 5 | Accessibility | ARIA, semantic HTML, keyboard nav, contrast |
- | 6 | Error Surface | Console errors, network failures, JS exceptions |
- | 7 | Third-Party Load | External scripts, tracking, CDN, SDK risk |
- | 8 | Architecture | SSR vs CSR, routing, data fetching, state management |
- | 9 | Content Delivery | Caching, compression, asset optimization |
- | 10 | Mobile Readiness | Viewport, responsive signals, device emulation |
-
- ## Loop Definition
-
- Read `loops/compare.json` to understand the phase sequence. The compare loop has 6 phases:
-
- 1. **crawl-a** — Spawn `crawlio-crawler` to capture URL A. Record the `EVIDENCE_ID`.
- 2. **crawl-b** — Spawn `crawlio-crawler` to capture URL B. Record the `EVIDENCE_ID`.
- 3. **analyze-a** (optional) — Spawn `crawlio-analyzer` with crawl-a evidence to identify frameworks.
- 4. **analyze-b** (optional) — Spawn `crawlio-analyzer` with crawl-b evidence to identify frameworks.
- 5. **compare** — Spawn `crawlio-comparator` with all evidence IDs. It reads both URLs' evidence, compares across 10 dimensions, and writes an `EvidenceEnvelope<ComparisonReport>`.
- 6. **synthesize** (optional) — Spawn `crawlio-synthesizer` if a full blueprint is useful.
-
- ## Execution
-
- 1. Read `loops/compare.json` to confirm phase order.
- 2. Parse the user's arguments: `<urlA>` and `<urlB>`.
- 3. Spawn `crawlio-crawler` for URL A:
- ```
- Crawl <urlA> and write PageEvidence to .crawlio/evidence/.
- ```
- Record `EVIDENCE_ID=<crawlAId>`.
-
- 4. Spawn `crawlio-crawler` for URL B:
- ```
- Crawl <urlB> and write PageEvidence to .crawlio/evidence/.
- ```
- Record `EVIDENCE_ID=<crawlBId>`.
-
- 5. Spawn `crawlio-analyzer` for URL A (optional):
- ```
- Analyze page evidence <crawlAId> for <urlA>. Read from .crawlio/evidence/. Write FrameworkEvidence to .crawlio/evidence/.
- ```
- Record `EVIDENCE_ID=<analyzeAId>`.
-
- 6. Spawn `crawlio-analyzer` for URL B (optional):
- ```
- Analyze page evidence <crawlBId> for <urlB>. Read from .crawlio/evidence/. Write FrameworkEvidence to .crawlio/evidence/.
- ```
- Record `EVIDENCE_ID=<analyzeBId>`.
-
- 7. Spawn `crawlio-comparator` with all evidence:
- ```
- Compare URL A (<urlA>) against URL B (<urlB>).
- Evidence IDs — crawl-a: <crawlAId>, crawl-b: <crawlBId>, analyze-a: <analyzeAId>, analyze-b: <analyzeBId>.
- Read all evidence from .crawlio/evidence/. Write EvidenceEnvelope<ComparisonReport> to .crawlio/evidence/.
- ```
- Record `EVIDENCE_ID=<compareId>`.
-
- 8. Read the ComparisonReport evidence and summarize for the user.
-
- ## Output Format
+ # Compare
+
+ Side-by-side comparison of two websites across 11 typed dimensions. One call captures both sites and returns a scaffold with per-dimension comparability. Produces one finding per dimension.
+
+ ## When to Use
+
+ - Compare two competing sites (framework, performance, security)
+ - Audit staging vs production
+ - Benchmark a site against a competitor across all 11 dimensions
+ - Identify gaps where one site excels and the other falls short
+
+ ## The 11 Dimensions
+
+ framework, performance, security, seo, accessibility, error-surface, third-party-load, architecture, content-delivery, mobile-readiness, data-structure.
+
+ ## Protocol
+
+ **Acquire -> Normalize -> Analyze** with Evidence Mode.
+
+ ### 1. Connect

  ```
- ## Compare: <urlA> vs <urlB>
+ connect_tab({ url: "https://site-a.com" })
+ ```

- ### Winner: <A|B|Tie|Inconclusive>
- <winnerReason>
+ `comparePages` handles navigation to both sites internally.

- ### Dimension Results
- | Dimension | Verdict | Confidence | Key Differences |
- |-----------|---------|------------|-----------------|
- | [per-dimension rows] |
+ ### 2. Acquire + Normalize

- ### Summary
- - Total differences: N
- - Critical differences: N
+ ```js
+ const comparison = await smart.comparePages(
+ "https://site-a.com",
+ "https://site-b.com"
+ );
+ // comparison.siteA / siteB — full PageEvidence (capture, performance, security, etc.)
+ // comparison.scaffold.dimensions[] — 11 objects: { name, comparable, siteA.status, siteB.status }
+ // comparison.scaffold.sharedFields / missingFields
+ // comparison.siteA.gaps[] / siteB.gaps[] — what failed per site
+ ```

- ### Evidence Chain
- - Crawl A: <crawlAId> (quality: ...)
- - Crawl B: <crawlBId> (quality: ...)
- - Analyze A: <analyzeAId> (quality: ...)
- - Analyze B: <analyzeBId> (quality: ...)
- - Compare: <compareId> (quality: ...)
+ ### 3. Analyze — produce findings
+
+ Walk the scaffold. One finding per dimension.
+
+ ```js
+ for (const dim of comparison.scaffold.dimensions) {
+ if (!dim.comparable) {
+ smart.finding({
+ claim: `${dim.name}: not comparable — data missing`,
+ evidence: [`siteA: ${dim.siteA.status}`, `siteB: ${dim.siteB.status}`],
+ sourceUrl: comparison.siteA.capture?.url || "unknown",
+ confidence: "low", method: "comparePages", dimension: dim.name
+ });
+ continue;
+ }
+ smart.finding({
+ claim: `${dim.name}: both sites present — ready for comparison`,
+ evidence: [`siteA: ${dim.siteA.status}`, `siteB: ${dim.siteB.status}`],
+ sourceUrl: comparison.siteA.capture?.url || "unknown",
+ confidence: "high", method: "comparePages", dimension: dim.name
+ });
+ }
+ ```

- ### Confidence
- - Overall: high/medium/low
+ Drill into specific dimensions using raw PageEvidence:
+
+ ```js
+ // Performance drill-down
+ if (comparison.siteA.performance && comparison.siteB.performance) {
+ const lcpA = comparison.siteA.performance.webVitals?.lcp;
+ const lcpB = comparison.siteB.performance.webVitals?.lcp;
+ if (lcpA && lcpB) {
+ const faster = lcpA < lcpB ? "A" : "B";
+ smart.finding({
+ claim: `Site ${faster} loads ${Math.abs(lcpA - lcpB)}ms faster (LCP)`,
+ evidence: [`siteA LCP: ${lcpA}ms`, `siteB LCP: ${lcpB}ms`],
+ sourceUrl: comparison.siteA.capture.url, confidence: "high",
+ method: "comparePages", dimension: "performance"
+ });
+ }
+ }
+
+ return {
+ findings: smart.findings(),
+ scaffold: comparison.scaffold,
+ gaps: { siteA: comparison.siteA.gaps, siteB: comparison.siteB.gaps }
+ };
  ```
+
+ ## Anti-Patterns
+
+ - No `smart.screenshot()` -- use `bridge.send({ type: 'take_screenshot' })`
+ - No `sleep()` loops -- use `smart.waitForIdle()`
+ - No `location.href` -- use `smart.navigate()`
+ - Always `search()` before guessing command names
+ - No manual "extract A, then extract B" -- `smart.comparePages()` does both. See **browser-automation** for full list.
+
+ ## Output
+
+ Produces `Finding[]` via `smart.findings()`. Each finding has: `claim`, `evidence[]`, `sourceUrl`, `confidence`, `method`, `dimension`. Dimension tags match the 11 scaffold dimensions.