smippo 0.0.8 → 0.1.0

package/README.md CHANGED
@@ -27,45 +27,6 @@
27
27
 
28
28
  📚 **[View complete documentation →](https://smippo.com)**
29
29
 
30
- ## Table of Contents
31
-
32
- - [Table of Contents](#table-of-contents)
33
- - [Features](#features)
34
- - [Quick Start](#quick-start)
35
- - [Installation](#installation)
36
- - [Requirements](#requirements)
37
- - [npm (Global)](#npm-global)
38
- - [Homebrew (Coming soon)](#homebrew-coming-soon)
39
- - [Usage](#usage)
40
- - [Basic Usage](#basic-usage)
41
- - [Interactive Mode](#interactive-mode)
42
- - [Filtering](#filtering)
43
- - [Scope Control](#scope-control)
44
- - [Browser Options](#browser-options)
45
- - [Screenshots](#screenshots)
46
- - [Authentication](#authentication)
47
- - [Output Options](#output-options)
48
- - [Performance \& Parallelism: The Vacuum Architecture](#performance--parallelism-the-vacuum-architecture)
49
- - [Continue/Update](#continueupdate)
50
- - [Serve](#serve)
51
- - [Static Mode](#static-mode)
52
- - [Structured Output](#structured-output)
53
- - [Programmatic API](#programmatic-api)
54
- - [Contributing](#contributing)
55
- - [License](#license)
56
- - [Acknowledgments](#acknowledgments)
57
-
58
- ## Features
59
-
60
- - **🚀 Vacuum Architecture** — Parallel workers consume sites rapidly, just like hippos vacuum up everything in their path
61
- - **📸 Structured Mirroring** — Every page, every resource, every network request captured in organized, structured output
62
- - **🔍 Complete Fidelity** — Gets the page exactly as you see it, including CSS-in-JS, dynamic content, and lazy-loaded images
63
- - **🎯 Smart Consumption** — Respects robots.txt, filters by URL patterns, MIME types, and file sizes
64
- - **📦 Structured Output** — Organized mirror structure preserves original paths for seamless offline browsing
65
- - **🎨 Beautiful CLI** — Interactive guided mode, progress bars, and elegant terminal output
66
- - **🌐 Built-in Server** — Serve captured sites locally with directory browsing
67
- - **📊 HAR Files** — Generates HTTP Archive files for debugging and replay
68
-
69
30
  ## Quick Start
70
31
 
71
32
  Install globally:
@@ -92,336 +53,48 @@ Or use without installing:
92
53
  npx smippo https://example.com
93
54
  ```
94
55
 
95
- > 📖 **For complete documentation, guides, and API reference, visit [smippo.com](https://smippo.com)**
96
-
97
- ## Installation
98
-
99
- ### Requirements
100
-
101
- - Node.js 18 or later
102
- - Chromium (automatically downloaded on first install)
103
-
104
- ### npm (Global)
105
-
106
- ```bash
107
- npm install -g smippo
108
- ```
109
-
110
- ### Homebrew (Coming soon)
111
-
112
- ```bash
113
- brew install smippo
114
- ```
115
-
116
- ## Usage
117
-
118
- ### Basic Usage
119
-
120
- ```bash
121
- # Capture a single page with all assets
122
- smippo https://example.com
123
-
124
- # Mirror a site with depth control
125
- smippo https://example.com --depth 3
126
-
127
- # Save to custom directory
128
- smippo https://example.com --output ./my-mirror
129
- ```
130
-
131
- ### Interactive Mode
132
-
133
- Just run `smippo` with no arguments to start the guided wizard:
134
-
135
- ```bash
136
- smippo
137
- ```
138
-
139
- This will walk you through:
140
-
141
- - URL to capture
142
- - Crawl depth
143
- - Scope settings
144
- - Asset options
145
- - Advanced configuration
146
-
147
- Perfect for beginners or when you want to explore options!
148
-
149
- ### Filtering
150
-
151
- ```bash
152
- # Include only specific patterns
153
- smippo https://example.com --include "*.html" --include "*.css"
154
-
155
- # Exclude patterns
156
- smippo https://example.com --exclude "*tracking*" --exclude "*ads*"
157
-
158
- # Filter by MIME type
159
- smippo https://example.com --mime-include "image/*" --mime-exclude "video/*"
160
-
161
- # Filter by file size
162
- smippo https://example.com --max-size 5MB --min-size 1KB
163
- ```
164
-
165
- ### Scope Control
166
-
167
- ```bash
168
- # Stay on same subdomain (default)
169
- smippo https://www.example.com --scope subdomain
170
-
171
- # Allow all subdomains
172
- smippo https://www.example.com --scope domain
173
-
174
- # Go everywhere (use with caution!)
175
- smippo https://example.com --scope all --depth 2
176
- ```
177
-
178
- ### Browser Options
56
+ ## Commands
179
57
 
180
- ```bash
181
- # Wait for specific condition
182
- smippo https://example.com --wait networkidle
183
- smippo https://example.com --wait domcontentloaded
184
-
185
- # Add extra wait time for slow sites
186
- smippo https://example.com --wait-time 5000
187
-
188
- # Custom user agent
189
- smippo https://example.com --user-agent "Mozilla/5.0..."
58
+ Smippo provides several commands for different use cases:
190
59
 
191
- # Custom viewport
192
- smippo https://example.com --viewport 1280x720
193
-
194
- # Emulate device
195
- smippo https://example.com --device "iPhone 13"
196
- ```
197
-
198
- ### Screenshots
199
-
200
- Take quick screenshots without mirroring the full site:
201
-
202
- ```bash
203
- # Basic screenshot
204
- smippo capture https://example.com
60
+ - **`smippo <url>`** — Capture and mirror websites with full fidelity
61
+ - **`smippo capture <url>`** — Take screenshots of web pages
62
+ - **`smippo serve <directory>`** — Serve captured sites locally
63
+ - **`smippo continue`** — Resume an interrupted capture
64
+ - **`smippo update`** — Update an existing mirror
205
65
 
206
- # Full-page screenshot (captures entire scrollable page)
207
- smippo capture https://example.com --full-page
66
+ Run `smippo` with no arguments to start the interactive guided mode.
208
67
 
209
- # Save to specific file
210
- smippo capture https://example.com -O ./screenshots/example.png
211
-
212
- # Mobile device screenshot
213
- smippo capture https://example.com --device "iPhone 13" -O mobile.png
214
-
215
- # Screenshot with dark mode
216
- smippo capture https://example.com --dark-mode
217
-
218
- # Capture specific element
219
- smippo capture https://example.com --selector ".hero-section"
220
-
221
- # JPEG format with quality
222
- smippo capture https://example.com --format jpeg --quality 90
223
- ```
224
-
225
- ### Authentication
226
-
227
- ```bash
228
- # Basic auth
229
- smippo https://user:pass@example.com
230
-
231
- # Cookie-based auth
232
- smippo https://example.com --cookies cookies.json
233
-
234
- # Interactive login (opens browser window)
235
- smippo https://example.com --capture-auth
236
- ```
237
-
238
- ### Output Options
239
-
240
- ```bash
241
- # Generate screenshots
242
- smippo https://example.com --screenshot
243
-
244
- # Generate PDFs
245
- smippo https://example.com --pdf
246
-
247
- # Skip HAR file
248
- smippo https://example.com --no-har
249
-
250
- # Output structure
251
- smippo https://example.com --structure original # URL paths (default)
252
- smippo https://example.com --structure flat # All in one directory
253
- smippo https://example.com --structure domain # Organized by domain
254
- ```
255
-
256
- ### Performance & Parallelism: The Vacuum Architecture
257
-
258
- Smippo's parallel worker architecture mirrors how hippos consume everything in their path—rapidly and efficiently. Multiple workers operate simultaneously, each vacuuming up pages, resources, and network requests in parallel.
259
-
260
- ```bash
261
- # Default: 8 parallel workers (8 hippos vacuuming simultaneously)
262
- smippo https://example.com
263
-
264
- # Limit to 4 workers (for rate-limited sites)
265
- smippo https://example.com --workers 4
266
-
267
- # Single worker (sequential, safest)
268
- smippo https://example.com --workers 1
269
-
270
- # Maximum speed (use with caution)
271
- smippo https://example.com --workers 16
272
-
273
- # Limit total pages
274
- smippo https://example.com --max-pages 100
275
-
276
- # Limit total time
277
- smippo https://example.com --max-time 300 # 5 minutes
278
-
279
- # Rate limiting (delay between requests per worker)
280
- smippo https://example.com --rate-limit 1000 # 1 second between requests
281
- ```
282
-
283
- **The Vacuum Architecture:**
284
-
285
- Each worker operates like an independent hippo, vacuuming up:
286
-
287
- - Fully rendered pages (after JavaScript execution)
288
- - All network resources (images, fonts, stylesheets, API responses)
289
- - Network metadata (captured in HAR files)
290
- - Link structures (for recursive crawling)
291
-
292
- All captured content is then **structured** into organized mirrors that preserve original paths and relationships.
293
-
294
- **Tips for optimal performance:**
295
-
296
- - Use `--workers 1` for sites with strict rate limiting
297
- - Use `--workers 4-8` for most sites (default: 8)
298
- - Use `--workers 16` only for fast servers you control
299
- - Combine `--workers` with `--rate-limit` for polite crawling
300
-
301
- ### Continue/Update
302
-
303
- ```bash
304
- # Continue an interrupted capture
305
- smippo continue
306
-
307
- # Update an existing mirror
308
- smippo update
309
- ```
310
-
311
- ### Serve
312
-
313
- Serve captured sites locally with a built-in web server:
314
-
315
- ```bash
316
- # Serve with auto port detection
317
- smippo serve ./site
318
-
319
- # Specify port
320
- smippo serve ./site --port 3000
321
-
322
- # Open browser automatically
323
- smippo serve ./site --open
324
-
325
- # Show all requests
326
- smippo serve ./site --verbose
327
- ```
328
-
329
- The server provides:
330
-
331
- - **Auto port detection** — Finds next available port if default is busy
332
- - **Proper MIME types** — Correct content-type headers for all file types
333
- - **CORS support** — Enabled by default for local development
334
- - **Nice terminal UI** — Shows clickable URL and request logs
335
-
336
- ### Static Mode
337
-
338
- For any site, use `--static` to strip scripts for true offline viewing:
339
-
340
- ```bash
341
- # Capture as static HTML (removes JS, keeps rendered content)
342
- smippo https://example.com --static --external-assets
343
-
344
- # Then serve
345
- smippo serve ./site --open
346
- ```
347
-
348
- ## Structured Output
349
-
350
- Smippo creates **structured mirrors** that preserve the original URL structure and relationships. Every page, every resource, every network request is organized and stored in a logical hierarchy:
351
-
352
- ```
353
- site/
354
- ├── example.com/
355
- │ ├── index.html
356
- │ ├── about/
357
- │ │ └── index.html
358
- │ └── assets/
359
- │ ├── style.css
360
- │ └── logo.png
361
- ├── .smippo/
362
- │ ├── cache.json # Metadata cache
363
- │ ├── network.har # HAR file
364
- │ ├── manifest.json # Capture manifest
365
- │ └── log.txt # Capture log
366
- └── index.html # Entry point
367
- ```
368
-
369
- ## Programmatic API
370
-
371
- ```javascript
372
- import {capture, Crawler, createServer} from 'smippo';
373
-
374
- // Simple capture
375
- const result = await capture('https://example.com', {
376
- output: './mirror',
377
- depth: 2,
378
- });
379
-
380
- console.log(`Captured ${result.stats.pagesCapt} pages`);
381
-
382
- // Advanced usage with events
383
- const crawler = new Crawler({
384
- url: 'https://example.com',
385
- output: './mirror',
386
- depth: 3,
387
- scope: 'domain',
388
- });
389
-
390
- crawler.on('page:complete', ({url, size}) => {
391
- console.log(`Captured: ${url} (${size} bytes)`);
392
- });
68
+ ## Features
393
69
 
394
- crawler.on('error', ({url, error}) => {
395
- console.error(`Failed: ${url} - ${error.message}`);
396
- });
70
+ - **🚀 Vacuum Architecture** — Parallel workers consume sites rapidly
71
+ - **📸 Complete Fidelity** — Captures pages exactly as rendered, including CSS-in-JS, dynamic content, and lazy-loaded images
72
+ - **🎯 Smart Filtering** — Filter by URL patterns, MIME types, and file sizes. Respects robots.txt
73
+ - **🌐 Built-in Server** — Serve captured sites locally with directory browsing
74
+ - **📊 HAR Files** — Generates HTTP Archive files for debugging and replay
75
+ - **💻 Programmatic API** — Use Smippo in your Node.js applications
397
76
 
398
- await crawler.start();
77
+ ## Documentation
399
78
 
400
- // Start a server programmatically
401
- const server = await createServer({
402
- directory: './mirror',
403
- port: 8080,
404
- open: true, // Opens browser automatically
405
- });
79
+ For complete documentation, guides, and API reference, visit **[smippo.com](https://smippo.com)**:
406
80
 
407
- console.log(`Server running at ${server.url}`);
81
+ - **[Installation Guide](https://smippo.com/getting-started/installation)** — Detailed installation instructions
82
+ - **[Commands Reference](https://smippo.com/commands)** — All available commands and options
83
+ - **[Configuration](https://smippo.com/configuration)** — Filtering, scope control, performance tuning
84
+ - **[Guides](https://smippo.com/guides)** — Output structure, link rewriting, troubleshooting
85
+ - **[Programmatic API](https://smippo.com/api/programmatic)** — Use Smippo in your Node.js code
86
+ - **[Examples](https://smippo.com/getting-started/examples)** — Real-world use cases
408
87
 
409
- // Later: stop the server
410
- await server.close();
411
- ```
88
+ ## Requirements
412
89
 
413
- > 📖 **For complete API documentation, see the [Programmatic API guide](https://smippo.com/api-reference/programmatic-api) on smippo.com**
90
+ - Node.js 18 or later
91
+ - Chromium (automatically downloaded on first install)
414
92
 
415
93
  ## Contributing
416
94
 
417
95
  Contributions are welcome! Whether it's bug reports, feature requests, or pull requests — all contributions help make Smippo better.
418
96
 
419
- Please read our [Contributing Guide](CONTRIBUTING.md) for details on:
420
-
421
- - Development setup
422
- - Code style guidelines
423
- - Pull request process
424
- - Testing requirements
97
+ Please read our [Contributing Guide](CONTRIBUTING.md) for details on development setup, code style guidelines, and the pull request process.
425
98
 
426
99
  Quick start:
427
100
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "smippo",
3
- "version": "0.0.8",
3
+ "version": "0.1.0",
4
4
  "description": "S.M.I.P.P.O. — Structured Mirroring of Internet Pages and Public Objects. Modern website copier that captures sites exactly as they appear in your browser.",
5
5
  "main": "src/index.js",
6
6
  "bin": {
package/src/cli.js CHANGED
@@ -10,6 +10,12 @@ import {
10
10
  runInteractiveCapture,
11
11
  shouldRunInteractive,
12
12
  } from './interactive.js';
13
+ import {
14
+ getSiteDir,
15
+ getDomainFromUrl,
16
+ addSiteToGlobalManifest,
17
+ ensureSmippoHome,
18
+ } from './utils/home.js';
13
19
 
14
20
  const program = new Command();
15
21
 
@@ -52,7 +58,10 @@ export function run() {
52
58
  // Main capture command
53
59
  program
54
60
  .argument('[url]', 'URL to capture')
55
- .option('-o, --output <dir>', 'Output directory', './site')
61
+ .option(
62
+ '-o, --output <dir>',
63
+ 'Output directory (default: ~/.smippo/sites/[domain])',
64
+ )
56
65
  .option('-d, --depth <n>', 'Recursion depth (0 = single page)', '0')
57
66
  .option('--no-crawl', 'Disable link following (same as -d 0)')
58
67
  .option('--dry-run', 'Show what would be captured without downloading')
@@ -179,7 +188,7 @@ export function run() {
179
188
  // Serve command
180
189
  program
181
190
  .command('serve [directory]')
182
- .description('Serve a captured site locally')
191
+ .description('Serve a captured site locally (default: ~/.smippo/sites/)')
183
192
  .option(
184
193
  '-p, --port <port>',
185
194
  'Port to serve on (auto-finds available)',
@@ -192,8 +201,13 @@ export function run() {
192
201
  .option('-q, --quiet', 'Minimal output')
193
202
  .action(async (directory, options) => {
194
203
  const {serve} = await import('./server.js');
204
+ const {getSitesDir} = await import('./utils/home.js');
205
+
206
+ // If no directory specified, use global smippo sites directory
207
+ const serveDir = directory || getSitesDir();
208
+
195
209
  await serve({
196
- directory: directory || './site',
210
+ directory: serveDir,
197
211
  port: options.port,
198
212
  host: options.host,
199
213
  open: options.open,
@@ -269,6 +283,14 @@ export function run() {
269
283
  }
270
284
 
271
285
  async function capture(url, options) {
286
+ // Compute output directory based on URL domain if not specified
287
+ let outputDir = options.output;
288
+ if (!outputDir) {
289
+ const domain = getDomainFromUrl(url);
290
+ outputDir = getSiteDir(domain);
291
+ await ensureSmippoHome();
292
+ }
293
+
272
294
  const spinner = ora({
273
295
  text: 'Initializing browser...',
274
296
  isSilent: options.quiet,
@@ -276,7 +298,7 @@ async function capture(url, options) {
276
298
 
277
299
  const crawler = new Crawler({
278
300
  url,
279
- output: options.output,
301
+ output: outputDir,
280
302
  depth: parseInt(options.depth, 10),
281
303
  scope: options.scope,
282
304
  stayInDir: options.stayInDir,
@@ -356,7 +378,22 @@ async function capture(url, options) {
356
378
  console.log(chalk.yellow(` Errors: ${result.stats.errors}`));
357
379
  }
358
380
  console.log('');
359
- console.log(` Output: ${chalk.underline(options.output)}`);
381
+ console.log(` Output: ${chalk.underline(outputDir)}`);
382
+
383
+ // Update global manifest with this capture (tracks all sites regardless of location)
384
+ try {
385
+ const domain = getDomainFromUrl(url);
386
+ await addSiteToGlobalManifest({
387
+ domain,
388
+ rootUrl: url,
389
+ outputDir: outputDir,
390
+ title: result.pages?.[0]?.title || domain,
391
+ pagesCount: result.stats.pagesCapt,
392
+ assetsCount: result.stats.assetsCapt,
393
+ });
394
+ } catch {
395
+ // Silently ignore manifest errors
396
+ }
360
397
  }
361
398
 
362
399
  async function continueCapture(options) {
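Note on the cli.js change: `--output` no longer has a `./site` default. When it is omitted, `capture()` derives the directory from the URL's hostname via `getDomainFromUrl` and `getSiteDir`. A minimal sketch of that resolution follows. `resolveOutputDir` is a hypothetical helper, not part of smippo, and the sites directory is passed in as a plain string joined with `/`, whereas the package computes it with `os.homedir()` and `path.join`.

```javascript
// Same logic as the getDomainFromUrl helper added in utils/home.js:
// return the hostname, or the raw input if it does not parse as a URL.
function getDomainFromUrl(url) {
  try {
    return new URL(url).hostname;
  } catch {
    return url;
  }
}

// Hypothetical helper (assumption): models how capture() picks its output
// directory. An explicit --output value wins; otherwise the hostname is
// appended to the sites directory.
function resolveOutputDir(url, explicitOutput, sitesDir) {
  if (explicitOutput) {
    return explicitOutput;
  }
  return `${sitesDir}/${getDomainFromUrl(url)}`;
}

console.log(resolveOutputDir('https://www.example.com/docs', undefined, '/home/me/.smippo/sites'));
console.log(resolveOutputDir('https://example.com', './mirror', '/home/me/.smippo/sites'));
```

In this sketch the port is not part of the directory name, because `URL.hostname` excludes it. An input that fails URL parsing is used unchanged as the directory name.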
package/src/server.js CHANGED
@@ -4,6 +4,9 @@ import fs from 'fs-extra';
4
4
  import path from 'path';
5
5
  import chalk from 'chalk';
6
6
  import {exec} from 'child_process';
7
+ import * as p from '@clack/prompts';
8
+ import {readManifest} from './manifest.js';
9
+ import {getAllCapturedSites, getSitesDir} from './utils/home.js';
7
10
 
8
11
  // MIME type mapping
9
12
  const MIME_TYPES = {
@@ -254,7 +257,7 @@ async function generateDirectoryListing(dirPath, urlPath, rootDir) {
254
257
  : ''
255
258
  }
256
259
  </div>
257
-
260
+
258
261
  ${
259
262
  !isRoot
260
263
  ? `
@@ -271,7 +274,7 @@ async function generateDirectoryListing(dirPath, urlPath, rootDir) {
271
274
  </div>`
272
275
  : ''
273
276
  }
274
-
277
+
275
278
  <div class="listing">
276
279
  <div class="listing-header">
277
280
  ${isRoot ? 'Captured Sites' : `Contents of ${urlPath}`}
@@ -315,7 +318,7 @@ async function generateDirectoryListing(dirPath, urlPath, rootDir) {
315
318
  : ''
316
319
  }
317
320
  </div>
318
-
321
+
319
322
  <div class="footer">
320
323
  Powered by Smippo • Modern Website Copier
321
324
  </div>
@@ -335,6 +338,7 @@ export async function createServer(options = {}) {
335
338
  port: requestedPort = 8080,
336
339
  host = '127.0.0.1',
337
340
  open = false,
341
+ openPath = null, // Specific path to open (e.g., domain directory)
338
342
  cors = true,
339
343
  verbose = false,
340
344
  quiet = false,
@@ -470,7 +474,10 @@ export async function createServer(options = {}) {
470
474
  // Start listening
471
475
  return new Promise(resolve => {
472
476
  server.listen(port, host, () => {
473
- const url = `http://${host === '0.0.0.0' ? 'localhost' : host}:${port}`;
477
+ const baseUrl = `http://${host === '0.0.0.0' ? 'localhost' : host}:${port}`;
478
+
479
+ // Build the URL with optional path
480
+ const url = openPath ? `${baseUrl}/${openPath}/` : baseUrl;
474
481
 
475
482
  if (!quiet) {
476
483
  console.log('');
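Note on the `openPath` option: the hunk above appends an optional path to the base URL, so the server can point at a domain subdirectory. A small sketch of the same string construction, pulled out into a standalone function for illustration. `buildServerUrl` is a hypothetical name, not something the package exports.

```javascript
// Hypothetical standalone version of the URL construction in createServer().
// A wildcard host is displayed as localhost; openPath, when set, is appended
// with a trailing slash.
function buildServerUrl({host, port, openPath = null}) {
  const displayHost = host === '0.0.0.0' ? 'localhost' : host;
  const baseUrl = `http://${displayHost}:${port}`;
  return openPath ? `${baseUrl}/${openPath}/` : baseUrl;
}

console.log(buildServerUrl({host: '0.0.0.0', port: 8080, openPath: 'example.com'}));
console.log(buildServerUrl({host: '127.0.0.1', port: 3000}));
```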
@@ -570,16 +577,195 @@ function truncatePath(p, maxLen) {
570
577
  return '...' + p.slice(-(maxLen - 3));
571
578
  }
572
579
 
580
+ /**
581
+ * Get captured sites - either from global manifest or a specific directory
582
+ * @param {string|null} directory - Specific directory to serve, or null for global
583
+ */
584
+ async function getCapturedSites(directory) {
585
+ const sites = [];
586
+
587
+ // If no directory specified or it's the global sites dir, use global manifest
588
+ const globalSitesDir = getSitesDir();
589
+ const isGlobalMode =
590
+ !directory || path.resolve(directory) === path.resolve(globalSitesDir);
591
+
592
+ if (isGlobalMode) {
593
+ // Use global manifest to get all captured sites
594
+ const globalSites = await getAllCapturedSites();
595
+ for (const site of globalSites) {
596
+ // Find the domain subdirectory within the site path
597
+ const domainDir = path.join(site.path, site.domain);
598
+ const indexInDomain = path.join(domainDir, 'index.html');
599
+ const indexInRoot = path.join(site.path, 'index.html');
600
+
601
+ let hasIndex = false;
602
+
603
+ if (await fs.pathExists(indexInDomain)) {
604
+ hasIndex = true;
605
+ } else if (await fs.pathExists(indexInRoot)) {
606
+ hasIndex = true;
607
+ }
608
+
609
+ sites.push({
610
+ domain: site.domain,
611
+ fullPath: site.path, // Absolute path to serve from
612
+ domainPath: site.domain, // Domain subdirectory
613
+ hasIndex,
614
+ rootUrl: site.rootUrl,
615
+ title: site.title || site.domain,
616
+ pagesCount: site.pagesCount || 0,
617
+ assetsCount: site.assetsCount || 0,
618
+ lastUpdated: site.updated || null,
619
+ });
620
+ }
621
+ return sites;
622
+ }
623
+
624
+ // Specific directory mode: check for .smippo directory
625
+ const smippoDir = path.join(directory, '.smippo');
626
+ if (!(await fs.pathExists(smippoDir))) {
627
+ // Check if directory itself is a site directory (has index.html)
628
+ const indexPath = path.join(directory, 'index.html');
629
+ if (await fs.pathExists(indexPath)) {
630
+ const dirName = path.basename(directory);
631
+ sites.push({
632
+ domain: dirName,
633
+ fullPath: directory,
634
+ domainPath: '',
635
+ hasIndex: true,
636
+ rootUrl: null,
637
+ title: dirName,
638
+ pagesCount: 0,
639
+ assetsCount: 0,
640
+ lastUpdated: null,
641
+ });
642
+ }
643
+ return sites;
644
+ }
645
+
646
+ // Read local manifest for site info
647
+ const manifest = await readManifest(directory);
648
+
649
+ if (!manifest?.rootUrl) {
650
+ return sites;
651
+ }
652
+
653
+ // Extract domain from rootUrl
654
+ try {
655
+ const url = new URL(manifest.rootUrl);
656
+ const mainDomain = url.hostname;
657
+
658
+ // Check if the main domain directory exists
659
+ const domainPath = path.join(directory, mainDomain);
660
+ if (await fs.pathExists(domainPath)) {
661
+ const indexPath = path.join(domainPath, 'index.html');
662
+ const hasIndex = await fs.pathExists(indexPath);
663
+
664
+ sites.push({
665
+ domain: mainDomain,
666
+ fullPath: directory,
667
+ domainPath: mainDomain,
668
+ hasIndex,
669
+ rootUrl: manifest.rootUrl,
670
+ title: manifest.pages?.[0]?.title || mainDomain,
671
+ pagesCount: manifest.stats?.pagesCapt || 0,
672
+ assetsCount: manifest.stats?.assetsCapt || 0,
673
+ lastUpdated: manifest.updated || null,
674
+ });
675
+ }
676
+ } catch {
677
+ // Invalid URL in manifest, fall back to directory scan
678
+ }
679
+
680
+ return sites;
681
+ }
682
+
573
683
  /**
574
684
  * Serve command for CLI
575
685
  */
576
686
  export async function serve(options) {
577
687
  try {
688
+ const directory = options.output || options.directory;
689
+ const globalSitesDir = getSitesDir();
690
+
691
+ // Get captured sites (from global manifest if no directory specified)
692
+ const sites = await getCapturedSites(directory);
693
+
694
+ let serveDir = directory ? path.resolve(directory) : null;
695
+ let openPath = null;
696
+ let selectedSite = null;
697
+
698
+ if (sites.length > 0 && process.stdin.isTTY && !options.quiet) {
699
+ // Show interactive site selection
700
+ console.log('');
701
+ p.intro(chalk.cyan('Smippo Server'));
702
+
703
+ if (sites.length === 1) {
704
+ // Single site - auto-select but show info
705
+ selectedSite = sites[0];
706
+ console.log(
707
+ chalk.dim(' Found captured site: ') +
708
+ chalk.bold(selectedSite.domain),
709
+ );
710
+ if (selectedSite.pagesCount > 0) {
711
+ console.log(
712
+ chalk.dim(
713
+ ` ${selectedSite.pagesCount} pages, ${selectedSite.assetsCount} assets`,
714
+ ),
715
+ );
716
+ }
717
+ if (selectedSite.fullPath) {
718
+ console.log(chalk.dim(` Location: ${selectedSite.fullPath}`));
719
+ }
720
+ } else {
721
+ // Multiple sites - let user choose
722
+ const siteOptions = sites.map(site => ({
723
+ value: site,
724
+ label: site.domain,
725
+ hint:
726
+ site.pagesCount > 0
727
+ ? `${site.pagesCount} pages - ${site.fullPath}`
728
+ : site.fullPath,
729
+ }));
730
+
731
+ const selected = await p.select({
732
+ message: 'Which site would you like to serve?',
733
+ options: siteOptions,
734
+ });
735
+
736
+ if (p.isCancel(selected)) {
737
+ p.cancel('Cancelled');
738
+ process.exit(0);
739
+ }
740
+
741
+ selectedSite = selected;
742
+ }
743
+ console.log('');
744
+ } else if (sites.length === 1) {
745
+ // Non-interactive mode with single site
746
+ selectedSite = sites[0];
747
+ } else if (sites.length === 0 && !directory) {
748
+ console.log(chalk.yellow('No captured sites found.'));
749
+ console.log(
750
+ chalk.dim(' Capture a site first: ') + chalk.cyan('smippo <url>'),
751
+ );
752
+ process.exit(0);
753
+ }
754
+
755
+ // Determine serve directory and open path
756
+ if (selectedSite) {
757
+ serveDir = selectedSite.fullPath;
758
+ openPath = selectedSite.domainPath || null;
759
+ } else if (!serveDir) {
760
+ serveDir = globalSitesDir;
761
+ }
762
+
578
763
  const serverInfo = await createServer({
579
- directory: options.output || options.directory || './site',
764
+ directory: serveDir,
580
765
  port: options.port || 8080,
581
766
  host: options.host || '127.0.0.1',
582
767
  open: options.open,
768
+ openPath: openPath,
583
769
  cors: options.cors !== false,
584
770
  verbose: options.verbose,
585
771
  quiet: options.quiet,
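Note on the new `serve()` flow: the branching above depends on how many sites were found, whether stdin is a TTY, and whether `--quiet` is set. The sketch below models only that decision as a pure function, which may make the cases easier to follow. `chooseServeAction` and its return labels are hypothetical names introduced here; the real code prompts with `@clack/prompts` and calls `process.exit` directly.

```javascript
// Hypothetical model of the branching in serve(). Returns a label for the
// action taken instead of performing it.
function chooseServeAction({sites, directory, isTTY, quiet}) {
  if (sites.length > 0 && isTTY && !quiet) {
    // Interactive: one site is selected automatically, several trigger a prompt.
    return sites.length === 1 ? 'auto-select' : 'prompt';
  }
  if (sites.length === 1) {
    // Non-interactive with a single site.
    return 'auto-select';
  }
  if (sites.length === 0 && !directory) {
    // Nothing captured and nothing to fall back to.
    return 'exit-no-sites';
  }
  // No site selected: serve the given directory, or the global sites
  // directory when none was given.
  return 'serve-directory';
}

console.log(chooseServeAction({sites: ['a', 'b'], directory: null, isTTY: true, quiet: false}));
console.log(chooseServeAction({sites: ['a', 'b'], directory: null, isTTY: false, quiet: false}));
```

One consequence visible in this model: with several captured sites and no TTY (or with `--quiet`), no site is selected and the directory itself is served.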
package/src/utils/home.js ADDED
@@ -0,0 +1,175 @@
1
+ // @flow
2
+ import fs from 'fs-extra';
3
+ import path from 'path';
4
+ import os from 'os';
5
+
6
+ const SMIPPO_HOME_DIR = '.smippo';
7
+ const SITES_DIR = 'sites';
8
+ const GLOBAL_MANIFEST_FILE = 'manifest.json';
9
+
10
+ /**
11
+ * Get the global smippo home directory (~/.smippo/)
12
+ */
13
+ export function getSmippoHome() {
14
+ return path.join(os.homedir(), SMIPPO_HOME_DIR);
15
+ }
16
+
17
+ /**
18
+ * Get the sites directory (~/.smippo/sites/)
19
+ */
20
+ export function getSitesDir() {
21
+ return path.join(getSmippoHome(), SITES_DIR);
22
+ }
23
+
24
+ /**
25
+ * Get the output directory for a specific domain
26
+ * @param {string} domain - The domain name (e.g., 'example.com')
27
+ * @returns {string} The full path to the site directory
28
+ */
29
+ export function getSiteDir(domain) {
30
+ return path.join(getSitesDir(), domain);
31
+ }
32
+
33
+ /**
34
+ * Extract domain from a URL
35
+ * @param {string} url - The URL to extract domain from
36
+ * @returns {string} The domain (hostname)
37
+ */
38
+ export function getDomainFromUrl(url) {
39
+ try {
40
+ const parsed = new URL(url);
41
+ return parsed.hostname;
42
+ } catch {
43
+ // If URL parsing fails, return the original string
44
+ return url;
45
+ }
46
+ }
47
+
48
+ /**
49
+ * Ensure the smippo home directory exists
50
+ */
51
+ export async function ensureSmippoHome() {
52
+ const homeDir = getSmippoHome();
53
+ const sitesDir = getSitesDir();
54
+
55
+ await fs.ensureDir(homeDir);
56
+ await fs.ensureDir(sitesDir);
57
+
58
+ return homeDir;
59
+ }
60
+
61
+ /**
62
+ * Get the global manifest path (~/.smippo/manifest.json)
63
+ */
64
+ export function getGlobalManifestPath() {
65
+ return path.join(getSmippoHome(), GLOBAL_MANIFEST_FILE);
66
+ }
67
+
68
+ /**
69
+ * Read the global manifest (list of all captured sites)
70
+ */
71
+ export async function readGlobalManifest() {
72
+ const manifestPath = getGlobalManifestPath();
73
+
74
+ if (!(await fs.pathExists(manifestPath))) {
75
+ return {
76
+ version: '1.0.0',
77
+ sites: [],
78
+ };
79
+ }
80
+
81
+ try {
82
+ const content = await fs.readFile(manifestPath, 'utf8');
83
+ return JSON.parse(content);
84
+ } catch {
85
+ return {
86
+ version: '1.0.0',
87
+ sites: [],
88
+ };
89
+ }
90
+ }
91
+
92
+ /**
93
+ * Write the global manifest
94
+ */
95
+ export async function writeGlobalManifest(manifest) {
96
+ await ensureSmippoHome();
97
+ const manifestPath = getGlobalManifestPath();
98
+ await fs.writeFile(manifestPath, JSON.stringify(manifest, null, 2), 'utf8');
99
+ }
100
+
101
+ /**
102
+ * Add a site to the global manifest
103
+ * @param {Object} siteInfo - Site information
104
+ * @param {string} siteInfo.domain - Domain name
105
+ * @param {string} siteInfo.rootUrl - Original URL
106
+ * @param {string} siteInfo.outputDir - Directory where site was saved
107
+ * @param {string} [siteInfo.title] - Page title
108
+ * @param {number} [siteInfo.pagesCount] - Number of pages captured
109
+ * @param {number} [siteInfo.assetsCount] - Number of assets captured
110
+ */
111
+ export async function addSiteToGlobalManifest(siteInfo) {
112
+ const manifest = await readGlobalManifest();
113
+
114
+ // Use outputDir as the unique key (same domain can be saved to different dirs)
115
+ const outputPath = path.resolve(siteInfo.outputDir);
116
+ const existingIndex = manifest.sites.findIndex(s => s.path === outputPath);
117
+
118
+ const siteEntry = {
119
+ domain: siteInfo.domain,
120
+ rootUrl: siteInfo.rootUrl,
121
+ title: siteInfo.title || siteInfo.domain,
122
+ path: outputPath,
123
+ created: siteInfo.created || new Date().toISOString(),
124
+ updated: new Date().toISOString(),
125
+ pagesCount: siteInfo.pagesCount || 0,
126
+ assetsCount: siteInfo.assetsCount || 0,
127
+ };
128
+
129
+ if (existingIndex >= 0) {
130
+ // Update existing entry (preserve created date)
131
+ manifest.sites[existingIndex] = {
132
+ ...siteEntry,
133
+ created: manifest.sites[existingIndex].created,
134
+ };
135
+ } else {
136
+ // Add new entry
137
+ manifest.sites.push(siteEntry);
138
+ }
139
+
140
+ await writeGlobalManifest(manifest);
141
+ return manifest;
142
+ }
143
+
144
+ /**
145
+ * Remove a site from the global manifest
146
+ */
147
+ export async function removeSiteFromGlobalManifest(domain) {
148
+ const manifest = await readGlobalManifest();
149
+ manifest.sites = manifest.sites.filter(s => s.domain !== domain);
150
+ await writeGlobalManifest(manifest);
151
+ return manifest;
152
+ }
153
+
154
+ /**
155
+ * Get all captured sites from the global manifest
156
+ */
157
+ export async function getAllCapturedSites() {
158
+ const manifest = await readGlobalManifest();
159
+
160
+ // Verify each site still exists
161
+ const validSites = [];
162
+ for (const site of manifest.sites) {
163
+ if (await fs.pathExists(site.path)) {
164
+ validSites.push(site);
165
+ }
166
+ }
167
+
168
+ // Update manifest if some sites were removed
169
+ if (validSites.length !== manifest.sites.length) {
170
+ manifest.sites = validSites;
171
+ await writeGlobalManifest(manifest);
172
+ }
173
+
174
+ return validSites;
175
+ }
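Note on the global manifest: `addSiteToGlobalManifest` keys entries by resolved output path and preserves `created` on update, while `removeSiteFromGlobalManifest` filters by domain. The in-memory sketch below illustrates both behaviours without touching the filesystem. `upsertSite` and `removeByDomain` are hypothetical names, the `now` parameter exists only to make the example deterministic, and paths are assumed to be already absolute (the package calls `path.resolve` first).

```javascript
// Hypothetical in-memory version of the upsert in addSiteToGlobalManifest().
// Entries are matched on `path`, so one domain may appear several times.
function upsertSite(manifest, siteInfo, now = new Date().toISOString()) {
  const index = manifest.sites.findIndex(s => s.path === siteInfo.path);
  const entry = {...siteInfo, created: now, updated: now};
  if (index >= 0) {
    // Update in place, keeping the original creation timestamp.
    entry.created = manifest.sites[index].created;
    manifest.sites[index] = entry;
  } else {
    manifest.sites.push(entry);
  }
  return manifest;
}

// Hypothetical in-memory version of removeSiteFromGlobalManifest():
// drops every entry for the domain, whatever its path.
function removeByDomain(manifest, domain) {
  manifest.sites = manifest.sites.filter(s => s.domain !== domain);
  return manifest;
}

const manifest = {version: '1.0.0', sites: []};
upsertSite(manifest, {domain: 'example.com', path: '/a'}, '2024-01-01T00:00:00.000Z');
upsertSite(manifest, {domain: 'example.com', path: '/a'}, '2024-02-01T00:00:00.000Z');
upsertSite(manifest, {domain: 'example.com', path: '/b'}, '2024-03-01T00:00:00.000Z');
console.log(manifest.sites.length);
removeByDomain(manifest, 'example.com');
console.log(manifest.sites.length);
```

The asymmetry is worth keeping in mind: capturing the same domain into two directories yields two entries, and a single removal by domain clears both.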