@opendatalabs/darshana 1.1.0 → 1.2.0

Files changed (2)
  1. package/README.md +100 -85
  2. package/package.json +4 -1
package/README.md CHANGED
@@ -1,26 +1,62 @@
  # darshana
 
- Crawl any web app and generate a labeled PDF, HTML viewer, or image set for AI-assisted design review.
+ Crawl any web app and generate a labeled PDF, HTML viewer, or image set for design review.
 
  *Darśana* — Sanskrit for "the act of seeing clearly."
 
+ ## Try it now
+
+ ```bash
+ npx @opendatalabs/darshana --url https://vana.org --public
+ ```
+
+ Output lands in `./darshana-output/` — a PDF and a self-contained HTML viewer with sidebar nav, filters, and keyboard navigation.
+
+ For a private app, darshana opens a browser so you can log in, then saves the session:
+
+ ```bash
+ npx @opendatalabs/darshana --url https://app.vana.org
+ # A browser opens → log in → press Enter → capture begins
+ ```
+
  ## Install
 
  ```bash
  npm install -g @opendatalabs/darshana
- # or use directly:
- npx @opendatalabs/darshana --config review.config.json
  ```
 
- After installing, set up Playwright's browser:
+ Chromium is installed automatically. Or skip the install entirely and use `npx @opendatalabs/darshana`.
 
- ```bash
- npx playwright install chromium
+ ## CLI reference
+
+ ```
+ darshana --url <url> [options]       # zero-config
+ darshana --config <path> [options]   # file-based (CLI args override config)
+
+ --url <url>             Base URL to crawl
+ --config <path>         Path to a JSON config file
+ --title <string>        Review title (default: hostname)
+ --start <path>          Starting path (default: /)
+ --public                Skip auth — use for public sites
+ --auth-storage <path>   Where to save/load the session (default: ./darshana-output/auth.json)
+ --auth-script <path>    Headless login script (see Auth below)
+ --themes <list>         Comma-separated: system,dark,light (default: system)
+ --viewports <list>      Comma-separated: desktop,mobile (default: desktop)
+ --max-depth <n>         BFS depth limit (default: 5)
+ --max-pages <n>         Page cap (default: 100)
+ --delay <ms>            Wait after page load before capture (default: 400)
+ --outputs <list>        Comma-separated: pdf,html,images (default: pdf,html)
+ --output-dir <path>     Output directory (default: ./darshana-output)
+ --include <regex>       Crawl only paths matching this pattern (repeatable)
+ --exclude <regex>       Skip paths matching this pattern (repeatable)
+ --dry-run               Discover URLs without capturing
+ --route <path>          Capture a single route only
+ --auth-only             Save auth session and exit
  ```
 
- ## Quick start
+ ## Config file
 
- **1. Create a config file** (`review.config.json`):
+ For complex projects, a JSON config gives you per-route sampling rules and capture overrides. CLI args always override config file values.
 
  ```json
  {
@@ -29,103 +65,92 @@ npx playwright install chromium
    "start": "/dashboard",
    "public": false,
    "authStorage": "./auth.json",
+   "authScript": "./auth.mjs",
    "crawl": {
      "include": ["^/dashboard"],
      "exclude": ["logout", "delete"],
      "maxDepth": 3,
      "maxPages": 50,
-     "extraRoutes": []
+     "routes": [
+       { "pattern": "/dashboard/records/:id", "sample": 1, "follow": false },
+       { "pattern": "/dashboard/runs/:id", "sample": 2, "follow": false },
+       { "pattern": "/dashboard/**", "follow": true }
+     ]
    },
    "capture": {
-     "themes": ["dark"],
+     "themes": ["dark", "light"],
      "viewports": ["desktop", "mobile"],
-     "delay": 400
+     "delay": 400,
+     "overrides": [
+       { "route": "/dashboard/records/", "delay": 1000 }
+     ]
    },
    "outputs": ["pdf", "html"],
    "outputDir": "./output"
  }
  ```
 
- **2. Authenticate** (opens a browser — log in, press Enter):
+ ### Routes DSL
 
- ```bash
- npx darshana --config review.config.json --auth-only
- ```
+ Without routes, darshana visits every discovered URL. For apps with millions of records or runs, use routes to sample:
 
- Or provide an `authScript` for headless login (see [examples/auth-example.mjs](examples/auth-example.mjs)).
+ | Field | Type | Default | Description |
+ |---|---|---|---|
+ | `pattern` | string | required | Express-style path using `:param` and `/**` |
+ | `sample` | number | unlimited | Max pages to capture matching this pattern |
+ | `follow` | boolean | `true` | Whether to BFS-follow links on matching pages |
 
- **3. Generate the review**:
+ First match wins.
 
- ```bash
- npx darshana --config review.config.json
- ```
+ ### Config reference
 
- ## Config reference
+ **Top-level**
 
  | Field | Type | Default | Description |
  |---|---|---|---|
- | `title` | string | `"Design Review"` | Title shown on cover page and HTML header |
- | `url` | string | required | Base URL of the app |
- | `start` | string | required | Path to start crawling from |
- | `public` | boolean | `false` | Skip auth entirely for public sites |
- | `authStorage` | string | `"./auth.json"` | Path to saved Playwright storageState |
- | `authScript` | string | — | Path to a JS file that handles login programmatically |
- | `outputs` | string[] | `["pdf"]` | Any of `"pdf"`, `"html"`, `"images"` |
- | `outputDir` | string | same dir as config | Directory for generated output files |
+ | `title` | string | hostname | Cover page and HTML header title |
+ | `url` | string | required | Base URL |
+ | `start` | string | `/` | Path to start crawling from |
+ | `public` | boolean | `false` | Skip auth |
+ | `authStorage` | string | `./auth.json` | Saved session path |
+ | `authScript` | string | — | Headless login script |
+ | `outputs` | string[] | `["pdf","html"]` | Any of `"pdf"`, `"html"`, `"images"` |
+ | `outputDir` | string | `./darshana-output` | Output directory |
 
- ### `crawl`
+ **`crawl`**
 
  | Field | Type | Default | Description |
  |---|---|---|---|
- | `include` | string[] | `[]` | Regex patterns — URL pathname must match all |
- | `exclude` | string[] | `[]` | Regex patterns — URL pathname must not match any |
- | `maxDepth` | number | `5` | Max BFS depth from start URL |
- | `maxPages` | number | `100` | Hard cap on total pages crawled |
- | `extraRoutes` | string[] | `[]` | Additional paths to capture (not crawled for links) |
- | `routes` | Route[] | `[]` | Per-pattern sampling rules (see Routes DSL) |
+ | `include` | string[] | `[]` | Regex patterns — pathname must match all |
+ | `exclude` | string[] | `[]` | Regex patterns — pathname must not match any |
+ | `maxDepth` | number | `5` | Max BFS depth |
+ | `maxPages` | number | `100` | Hard page cap |
+ | `extraRoutes` | string[] | `[]` | Extra paths to capture (not crawled for links) |
+ | `routes` | Route[] | `[]` | Per-pattern sampling rules |
 
- ### `capture`
+ **`capture`**
 
  | Field | Type | Default | Description |
  |---|---|---|---|
- | `themes` | string[] | `["dark"]` | Theme names to capture — injected as `data-theme` + CSS class |
+ | `themes` | string[] | `["system"]` | `"system"` (no injection), `"dark"`, `"light"` |
  | `viewports` | string[] | `["desktop"]` | `"desktop"` (1440×900) or `"mobile"` (390×844) |
- | `fullPage` | boolean | `true` | Capture full scrollable page height |
- | `delay` | number | `400` | ms to wait after page load before capture |
- | `waitFor` | string | — | CSS selector (prefix `$`) or JS expression to wait for |
- | `overrides` | Override[] | `[]` | Per-route capture overrides |
- | `contextOptions` | object | `{}` | Passed directly to `browser.newContext()` |
- | `launchOptions` | object | `{}` | Passed directly to `chromium.launch()` |
- | `playwrightOptions` | object | `{}` | Passed directly to `page.screenshot()` |
- | `routeOptions` | object | — | `{ blockPatterns: string[] }` — abort matching network requests |
+ | `fullPage` | boolean | `true` | Capture full scrollable height |
+ | `delay` | number | `400` | ms to wait before capture |
+ | `waitFor` | string | — | CSS selector (prefix `$`) or JS expression to await |
+ | `overrides` | Override[] | `[]` | Per-route overrides for any capture field |
+ | `contextOptions` | object | `{}` | Passed to `browser.newContext()` |
+ | `launchOptions` | object | `{}` | Passed to `chromium.launch()` |
+ | `playwrightOptions` | object | `{}` | Passed to `page.screenshot()` |
+ | `routeOptions` | object | — | `{ blockPatterns: string[] }` — abort matching requests |
 
- ### Routes DSL
+ ## Auth
 
- Limit how many pages of each "shape" are captured. Uses Express-style `:param` notation.
+ **Headed handover** (default): darshana opens a Chromium window, you log in, press Enter. The session is saved to `authStorage` and reused for 12 hours.
 
- ```json
- "routes": [
-   { "pattern": "/dashboard/records/:id", "sample": 1, "follow": false },
-   { "pattern": "/dashboard/runs/:id", "sample": 2, "follow": false },
-   { "pattern": "/dashboard/**", "follow": true }
- ]
- ```
-
- | Field | Type | Default | Description |
- |---|---|---|---|
- | `pattern` | string | required | Path pattern using `:param` and `/**` |
- | `sample` | number | unlimited | Max pages to visit matching this pattern |
- | `follow` | boolean | `true` | Whether to BFS-follow links on matching pages |
-
- Patterns are matched in order — first match wins.
-
- ## Auth options
-
- **Headed handover** (default when no `authScript`): darshana opens a browser, you log in manually, press Enter — session is saved to `authStorage`. Sessions are reused for 12 hours.
-
- **Headless auth script**: Create a JS file that exports a default function:
+ **Headless auth script**: export a default function that receives a `Browser` and returns the path to a saved `storageState`:
 
  ```javascript
+ // auth.mjs
  export default async function login(browser) {
    const context = await browser.newContext();
    const page = await context.newPage();
@@ -133,29 +158,19 @@ export default async function login(browser) {
    await page.fill('#password', process.env.APP_PASSWORD);
    await page.click('button[type="submit"]');
    await page.waitForURL(/\/dashboard/);
-   const storagePath = './auth.json';
-   await context.storageState({ path: storagePath });
+   await context.storageState({ path: './auth.json' });
    await context.close();
-   return storagePath;
+   return './auth.json';
  }
  ```
 
- Set `"authScript": "./my-auth.mjs"` in config.
-
- ## CLI
-
- ```bash
- darshana --config <path>                     # run full pipeline
- darshana --config <path> --dry-run           # discover URLs without capturing
- darshana --config <path> --route /dashboard  # capture one route only
- darshana --config <path> --auth-only         # save auth session and exit
- ```
+ See [examples/auth-example.mjs](examples/auth-example.mjs) for a full example.
 
  ## Outputs
 
- - **`pdf`** — `<outputDir>/console-review.pdf` labeled pages, cover page, one page per capture
- - **`html`** — `<outputDir>/console-review.html` — self-contained HTML with sidebar nav, filters, keyboard navigation, viewport-correct sizing
- - **`images`** — `<outputDir>/images/<viewport>/NNN-slug-theme.png` — individual screenshots grouped by viewport
+ - **`pdf`** — one page per capture, labeled header, cover page
+ - **`html`** — self-contained file with sidebar nav, theme/viewport filters, keyboard navigation (↑↓), viewport-correct image sizing
+ - **`images`** — `<outputDir>/images/<viewport>/NNN-slug-theme.png`
 
  ## License
 
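Note on the Routes DSL in the README diff above: the new text says that the first matching pattern wins and that `sample` caps how many pages are captured per pattern. A minimal JavaScript sketch of that behavior follows. It is not darshana's implementation: `compilePattern`, `createRouteSampler`, and the returned `{ capture, follow }` shape are hypothetical names, the hand-rolled matcher stands in for whatever the package actually uses, and the handling of unmatched paths and of paths past the sample limit is an assumption.

```javascript
// Hypothetical sketch: none of these names come from darshana's source.

// Compile an Express-style pattern (":param" segments, "**" wildcard) to a RegExp.
function compilePattern(pattern) {
  const source = pattern
    .split('/')
    .map((segment) => {
      if (segment === '**') return '.*';            // wildcard: rest of the path
      if (segment.startsWith(':')) return '[^/]+';  // named param: one segment
      return segment.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); // literal, escaped
    })
    .join('/');
  return new RegExp(`^${source}/?$`);
}

// Build a decision function that applies the rules in order (first match wins)
// and counts captures per rule so that `sample` can act as a cap.
function createRouteSampler(routes) {
  const compiled = routes.map((route) => ({
    regex: compilePattern(route.pattern),
    sample: route.sample ?? Infinity, // documented default: unlimited
    follow: route.follow ?? true,     // documented default: true
    captured: 0,
  }));

  return function decide(pathname) {
    const rule = compiled.find((r) => r.regex.test(pathname));
    // Assumption: a path that matches no rule is captured and followed.
    if (!rule) return { capture: true, follow: true };
    // Assumption: once a rule's sample is used up, further matches are skipped.
    if (rule.captured >= rule.sample) return { capture: false, follow: false };
    rule.captured += 1;
    return { capture: true, follow: rule.follow };
  };
}

const decide = createRouteSampler([
  { pattern: '/dashboard/records/:id', sample: 1, follow: false },
  { pattern: '/dashboard/runs/:id', sample: 2, follow: false },
  { pattern: '/dashboard/**', follow: true },
]);

console.log(decide('/dashboard/records/1')); // { capture: true, follow: false }
console.log(decide('/dashboard/records/2')); // { capture: false, follow: false }
console.log(decide('/dashboard/settings'));  // { capture: true, follow: true }
```

In this sketch the second record is skipped rather than falling through to `/dashboard/**`, which is why rule order matters: specific patterns have to come before the catch-all.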
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@opendatalabs/darshana",
-   "version": "1.1.0",
+   "version": "1.2.0",
    "description": "Crawl any web app and generate a labeled PDF, HTML viewer, or image set for design review.",
    "type": "module",
    "bin": {
@@ -16,6 +16,9 @@
      "pdf-lib": "^1.17.1",
      "path-to-regexp": "^8.0.0"
    },
+   "scripts": {
+     "postinstall": "playwright install chromium --with-deps 2>/dev/null || playwright install chromium"
+   },
    "devDependencies": {
      "semantic-release": "^25.0.0",
      "@semantic-release/commit-analyzer": "^13.0.0",