@pinkpixel/sugarstitch 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (38) hide show
  1. package/CHANGELOG.md +59 -0
  2. package/LICENSE +21 -0
  3. package/OVERVIEW.md +306 -0
  4. package/README.md +462 -0
  5. package/assets/banner_dark.png +0 -0
  6. package/assets/banner_light.png +0 -0
  7. package/assets/logo.png +0 -0
  8. package/assets/screenshot_cli.png +0 -0
  9. package/assets/screenshot_completed.png +0 -0
  10. package/assets/screenshot_homepage.png +0 -0
  11. package/assets/screenshot_scraping.png +0 -0
  12. package/dist/index.js +216 -0
  13. package/dist/scraper.js +719 -0
  14. package/dist/server.js +1272 -0
  15. package/package.json +26 -0
  16. package/public/favicon.png +0 -0
  17. package/scripts/add-shebang.js +11 -0
  18. package/src/index.ts +217 -0
  19. package/src/scraper.ts +903 -0
  20. package/src/server.ts +1319 -0
  21. package/tsconfig.json +12 -0
  22. package/website/astro.config.mjs +5 -0
  23. package/website/package-lock.json +6358 -0
  24. package/website/package.json +18 -0
  25. package/website/public/banner_dark.png +0 -0
  26. package/website/public/banner_light.png +0 -0
  27. package/website/public/favicon.png +0 -0
  28. package/website/public/screenshot_cli.png +0 -0
  29. package/website/public/screenshot_completed.png +0 -0
  30. package/website/public/screenshot_homepage.png +0 -0
  31. package/website/public/screenshot_scraping.png +0 -0
  32. package/website/src/layouts/DocsLayout.astro +142 -0
  33. package/website/src/pages/docs/install.astro +96 -0
  34. package/website/src/pages/docs/use-the-app.astro +131 -0
  35. package/website/src/pages/index.astro +94 -0
  36. package/website/src/styles/site.css +611 -0
  37. package/website/tsconfig.json +3 -0
  38. package/website/wrangler.toml +6 -0
package/README.md ADDED
@@ -0,0 +1,462 @@
1
+ <p align="center">
2
+ <img src="assets/logo.png" alt="SugarStitch logo" width="300" height="300" />
3
+ </p>
4
+
5
+ # SugarStitch
6
+
7
+ SugarStitch is a TypeScript scraper for fiber arts pattern websites with both a CLI and a local browser UI. It can scrape individual pattern pages, batch lists of URLs, or discover pattern pages from an index page and then scrape those discovered links for titles, text, images, and PDFs.
8
+
9
+ ## Screenshots
10
+
11
+ ### Local UI
12
+
13
+ ![SugarStitch homepage UI](website/public/screenshot_homepage.png)
14
+
15
+ ![SugarStitch scraping progress state](website/public/screenshot_scraping.png)
16
+
17
+ ![SugarStitch completed run summary](website/public/screenshot_completed.png)
18
+
19
+ ### CLI
20
+
21
+ ![SugarStitch CLI](website/public/screenshot_cli.png)
22
+
23
+ ## What It Does
24
+
25
+ - Scrapes a single pattern URL or a list of URLs from a text file
26
+ - Includes a simple local browser UI for people who prefer forms over command-line flags
27
+ - Supports discovery crawl mode so one listing page can expand into many pattern pages
28
+ - Supports crawl language filtering so discovered pages can stay in one language
29
+ - Supports crawl pagination so listing pages like `/page/2/` and `/page/3/` can be added automatically
30
+ - Includes built-in selector presets for `generic`, `wordpress`, and `woocommerce`
31
+ - Supports reusable saved site profiles from a JSON config file
32
+ - Lets you override title, description, materials, instructions, and image selectors per run
33
+ - Includes a preview mode to test selectors before downloading files or writing JSON
34
+ - Lets you choose an output directory for the JSON file plus downloaded assets
35
+ - Shows an in-page loading state while preview or scrape requests are running
36
+ - Downloads linked PDFs and page images when found
37
+ - Skips already-known `sourceUrl` entries before re-scraping them
38
+
39
+ ## Best Supported Site Types
40
+
41
+ SugarStitch works best on sites where the pattern content is already present in the HTML response and does not require a JavaScript app to render first.
42
+
43
+ Typical use cases include:
44
+
45
+ - sewing pattern blogs
46
+ - crochet pattern pages
47
+ - knitting pattern archives
48
+ - quilting, embroidery, and other fiber arts tutorial or pattern sites
49
+
50
+ Usually a good fit:
51
+
52
+ - WordPress pattern blogs and article pages
53
+ - Blogger and Blogspot pattern pages
54
+ - WooCommerce product-style pattern pages
55
+ - older handcrafted sites with normal HTML articles
56
+ - free-pattern archive pages that link to regular child pages
57
+
58
+ More mixed or site-specific:
59
+
60
+ - Wix
61
+ - Squarespace
62
+ - Webflow
63
+ - custom JavaScript-heavy sites
64
+
65
+ Usually not a good fit with the current scraper approach:
66
+
67
+ - React single-page apps
68
+ - hash-routed sites like `#/free-patterns`
69
+ - pages where the content only appears after client-side JavaScript runs
70
+
71
+ Why:
72
+
73
+ SugarStitch currently fetches page HTML and parses it directly. It does not run a full browser-rendered scraping flow yet, so JavaScript-only pages may return just the site shell instead of the real pattern content.
74
+
75
+ If a site only partly works, try:
76
+
77
+ - switching selector presets
78
+ - using `Test Selectors` first
79
+ - creating a saved site profile
80
+ - adding one or two advanced selector overrides
81
+
82
+ ## Install
83
+
84
+ ### Global Install
85
+
86
+ ```bash
87
+ npm install -g @pinkpixel/sugarstitch
88
+ ```
89
+
90
+ Then run it as:
91
+
92
+ ```bash
93
+ sugarstitch --url "https://example.com/pattern"
94
+ ```
95
+
96
+ ### Local Development Install
97
+
98
+ ```bash
99
+ git clone https://github.com/pinkpixel-dev/sugarstitch.git
100
+ cd sugarstitch
101
+ npm install
102
+ ```
103
+
104
+ ## Available Scripts
105
+
106
+ ```bash
107
+ npm run build
108
+ ```
109
+
110
+ Compiles TypeScript into `dist/`.
111
+
112
+ ```bash
113
+ npm run scrape -- --url "https://example.com/pattern"
114
+ ```
115
+
116
+ Runs the CLI with `ts-node`.
117
+
118
+ ```bash
119
+ npm run ui
120
+ ```
121
+
122
+ Starts the local UI at `http://localhost:4177`.
123
+
124
+ ## Quick Start
125
+
126
+ ### Scrape One Pattern Page
127
+
128
+ ```bash
129
+ npm run scrape -- --url "https://example.com/pattern" --preset wordpress
130
+ ```
131
+
132
+ ### Scrape Many URLs From a File
133
+
134
+ Create `urls.txt`:
135
+
136
+ ```txt
137
+ https://example.com/pattern-1
138
+ https://example.com/pattern-2
139
+ https://example.com/pattern-3
140
+ ```
141
+
142
+ Then run:
143
+
144
+ ```bash
145
+ npm run scrape -- --file urls.txt
146
+ ```
147
+
148
+ ### Save Output Somewhere Else
149
+
150
+ ```bash
151
+ npm run scrape -- --url "https://example.com/pattern" --output-dir ./exports --output patterns.json
152
+ ```
153
+
154
+ That saves:
155
+
156
+ - `patterns.json`
157
+ - `images/`
158
+ - `pdfs/`
159
+ - `texts/`
160
+
161
+ inside `./exports`.
162
+
163
+ ## Discovery Crawl Mode
164
+
165
+ Discovery crawl mode is for index pages such as “Free Patterns” pages. Instead of entering every pattern URL yourself, you can start from one page and let SugarStitch follow links a couple levels deep before scraping the discovered pages.
166
+
167
+ This is useful for:
168
+
169
+ - free-pattern listing pages
170
+ - archive pages
171
+ - blog category pages
172
+ - collections where the real pattern content lives on child pages
173
+
174
+ ### Example
175
+
176
+ ```bash
177
+ npm run scrape -- \
178
+ --url "https://www.tildasworld.com/free-patterns/" \
179
+ --preset wordpress \
180
+ --crawl \
181
+ --crawl-depth 2 \
182
+ --crawl-pattern "free_pattern|pattern|quilt|pillow" \
183
+ --crawl-language english \
184
+ --crawl-paginate
185
+ ```
186
+
187
+ That tells SugarStitch to:
188
+
189
+ 1. Start from the given listing page
190
+ 2. Follow matching links up to 2 levels deep
191
+ 3. Stay on the same domain by default
192
+ 4. Scrape the discovered pages themselves
193
+
194
+ So if a child page is a blog-style pattern page with no PDF but useful article content, SugarStitch will still try to scrape that page normally.
195
+
196
+ ### Crawl Options
197
+
198
+ - `--crawl`: turns discovery mode on
199
+ - `--crawl-depth <number>`: how many link levels deep to follow
200
+ - `--crawl-pattern <pattern>`: only follow links whose URL or link text matches this text or regex
201
+ - `--crawl-language <language>`: prefer discovered URLs for one language such as `english`, `french`, or `portuguese`
202
+ - `--crawl-paginate`: expand paginated listing pages like `/page/2/`, `/page/3/`, and so on
203
+ - `--crawl-max-pages <number>`: cap how many listing pages are added in pagination mode
204
+ - `--crawl-any-domain`: allow discovery to follow links outside the starting domain
205
+ - `--crawl-max-urls <number>`: cap how many discovered pages get scraped
206
+
207
+ ### Why Crawl Language Filtering Helps
208
+
209
+ Some sites expose multiple language sections from the same listing page. For example, an English archive may also link to French or Portuguese archives. With `--crawl-language english`, SugarStitch can keep the discovered crawl focused on English pages instead of mixing languages into one run.
210
+
211
+ ### Why Crawl Pagination Helps
212
+
213
+ Some listing pages only expose the first batch of pattern cards until you click a `Load More` control. If the site also exposes those later batches as regular paginated URLs, SugarStitch can add those deeper listing pages automatically before discovery continues.
214
+
215
+ ## Local Web UI
216
+
217
+ Run:
218
+
219
+ ```bash
220
+ npm run ui
221
+ ```
222
+
223
+ Then open:
224
+
225
+ ```text
226
+ http://localhost:4177
227
+ ```
228
+
229
+ ![SugarStitch homepage showing the scrape form and saved profiles](website/public/screenshot_homepage.png)
230
+
231
+ The UI includes:
232
+
233
+ - single URL mode
234
+ - multi-URL paste mode
235
+ - saved site profile dropdown
236
+ - selector preset dropdown
237
+ - advanced selector override fields
238
+ - discovery crawl controls
239
+ - crawl language and crawl pagination controls
240
+ - output JSON filename field
241
+ - output directory field
242
+ - `Test Selectors` preview button
243
+ - `Start Scraping` button
244
+ - light and dark mode toggle
245
+ - spinner/progress overlay while requests are running
246
+
247
+ ![SugarStitch progress overlay while a scrape is running](website/public/screenshot_scraping.png)
248
+
249
+ ![SugarStitch completed run summary with log output](website/public/screenshot_completed.png)
250
+
251
+ ### Output Directory In the UI
252
+
253
+ Use the `Output Directory` field to choose where the JSON file and downloaded folders should be saved.
254
+
255
+ If left blank, SugarStitch saves into the project folder you launched it from.
256
+
257
+ Note:
258
+ This is currently a path field, not a native folder picker. In a normal browser-based local UI, the page cannot reliably hand a true local filesystem path back to the server the way a desktop app can.
259
+
260
+ ## Selector Presets
261
+
262
+ Selector presets are defined in [`src/scraper.ts`](src/scraper.ts).
263
+
264
+ Built-in presets:
265
+
266
+ - `generic`: a broad fallback for custom and article-style pages
267
+ - `wordpress`: tuned for common WordPress post wrappers like `.entry-content`
268
+ - `woocommerce`: tuned for WooCommerce product pages and galleries
269
+
270
+ These are starting points, not guarantees.
271
+
272
+ ## Advanced Selector Overrides
273
+
274
+ If a preset is close but not quite right, you can override only the fields you need for a single run.
275
+
276
+ Available override flags:
277
+
278
+ - `--title-selector`
279
+ - `--description-selector`
280
+ - `--materials-selector`
281
+ - `--instructions-selector`
282
+ - `--image-selector`
283
+
284
+ Example:
285
+
286
+ ```bash
287
+ npm run scrape -- \
288
+ --url "https://example.com/pattern" \
289
+ --preset wordpress \
290
+ --materials-selector ".entry-content ul li"
291
+ ```
292
+
293
+ Overrides take priority over the selected preset for that field only.
294
+
295
+ ## Saved Site Profiles
296
+
297
+ SugarStitch can load reusable profiles from [`sugarstitch.profiles.json`](sugarstitch.profiles.json).
298
+
299
+ Each profile can define:
300
+
301
+ - `id`
302
+ - `label`
303
+ - `description`
304
+ - `preset`
305
+ - `selectorOverrides`
306
+
307
+ Example:
308
+
309
+ ```json
310
+ {
311
+ "profiles": [
312
+ {
313
+ "id": "tildas-world",
314
+ "label": "Tilda's World",
315
+ "preset": "wordpress",
316
+ "selectorOverrides": {
317
+ "materialsSelector": ".entry-content ul li",
318
+ "instructionsSelector": ".entry-content ol li"
319
+ }
320
+ }
321
+ ]
322
+ }
323
+ ```
324
+
325
+ Use one with:
326
+
327
+ ```bash
328
+ npm run scrape -- --url "https://example.com/pattern" --profile tildas-world
329
+ ```
330
+
331
+ Or point to another file:
332
+
333
+ ```bash
334
+ npm run scrape -- --url "https://example.com/pattern" --profile tildas-world --profiles-file ./my-profiles.json
335
+ ```
336
+
337
+ ## Preview Mode
338
+
339
+ Preview mode lets you test extraction before writing JSON or downloading files.
340
+
341
+ It:
342
+
343
+ - fetches the page
344
+ - applies the selected preset, saved profile, and any advanced overrides
345
+ - shows the matched title, description, materials, instructions, images, and PDFs
346
+ - does not write files
347
+
348
+ CLI example:
349
+
350
+ ```bash
351
+ npm run scrape -- --url "https://example.com/pattern" --profile tildas-world --preview
352
+ ```
353
+
354
+ UI flow:
355
+
356
+ 1. Choose `Single URL`
357
+ 2. Enter a pattern page URL
358
+ 3. Pick a preset or saved profile
359
+ 4. Add overrides if needed
360
+ 5. Click `Test Selectors`
361
+
362
+ ## CLI Options
363
+
364
+ ```text
365
+ -u, --url <url> A single URL of the pattern page to scrape
366
+ -f, --file <file> A text file containing a list of URLs
367
+ -o, --output <path> Output JSON file name
368
+ --output-dir <path> Directory where JSON, images, and PDFs should be saved
369
+ -p, --preset <preset> Selector preset
370
+ --crawl Discover links from the starting URL(s) before scraping them
371
+ --crawl-depth <number> How many link levels deep to follow in crawl mode
372
+ --crawl-pattern <pattern> Only follow discovered links whose URL or link text matches this text or regex
373
+ --crawl-language <language> Prefer discovered URLs for one language such as english, french, or portuguese
374
+ --crawl-paginate Expand listing pages like /page/2/, /page/3/, and scrape them too
375
+ --crawl-max-pages <number> Maximum listing pages to add in pagination mode
376
+ --crawl-any-domain Allow crawl mode to follow links to other domains
377
+ --crawl-max-urls <number> Maximum number of discovered page URLs to scrape
378
+ --profile <id> Use a saved site profile
379
+ --profiles-file <path> Path to the profiles config file
380
+ --preview Preview extraction without saving files
381
+ --title-selector <selector>
382
+ --description-selector <selector>
383
+ --materials-selector <selector>
384
+ --instructions-selector <selector>
385
+ --image-selector <selector>
386
+ ```
387
+
388
+ ## Output Structure
389
+
390
+ SugarStitch writes one object per successfully scraped page:
391
+
392
+ ```json
393
+ {
394
+ "title": "Pattern Title",
395
+ "description": "Short description from the page",
396
+ "materials": ["Cotton fabric", "Stuffing", "Thread"],
397
+ "instructions": ["Cut the pieces", "Sew the body", "Stuff and close"],
398
+ "sourceUrl": "https://example.com/pattern",
399
+ "localImages": ["images/pattern_title/image_1.jpg"],
400
+ "localPdfs": ["pdfs/pattern_title/pattern.pdf"],
401
+ "localTextFile": "texts/pattern_title/pattern.txt"
402
+ }
403
+ ```
404
+
405
+ Each scraped page also gets a plain-text artifact at `texts/<pattern_title>/pattern.txt`.
406
+
407
+ That text file includes:
408
+
409
+ - title
410
+ - source URL
411
+ - selected preset and optional profile
412
+ - extracted description
413
+ - extracted materials list
414
+ - extracted instructions list
415
+ - a fuller page text block gathered from the article content
416
+
417
+ ## Notes
418
+
419
+ - The CLI prints a small SugarStitch ASCII banner when run in a normal terminal.
420
+ - The local UI now includes a light/dark mode toggle, with light mode as the default.
421
+
422
+ ![SugarStitch CLI banner and progress output](website/public/screenshot_cli.png)
423
+
424
+ ## Troubleshooting
425
+
426
+ ### It scraped PDFs and titles, but not much else
427
+
428
+ That still counts as a successful scrape. It usually means the page-level selectors for description, materials, instructions, or images do not match the site structure yet.
429
+
430
+ Try one of these:
431
+
432
+ - run `Test Selectors` in the UI first
433
+ - switch presets
434
+ - use a saved profile for that site
435
+ - add one or two advanced overrides
436
+
437
+ ### Discovery crawl found too much or too little
438
+
439
+ Adjust:
440
+
441
+ - `crawl depth`
442
+ - `crawl pattern`
443
+ - `crawl language`
444
+ - crawl pagination settings
445
+ - same-domain restriction
446
+ - max discovered URLs
447
+
448
+ ### The output file already exists but the scraper refuses to run
449
+
450
+ If the JSON file contains invalid JSON, SugarStitch will stop instead of silently overwriting it. Fix or remove the broken file first.
451
+
452
+ ## Development Notes
453
+
454
+ - CLI entrypoint: [`src/index.ts`](src/index.ts)
455
+ - UI entrypoint: [`src/server.ts`](src/server.ts)
456
+ - Shared scraper logic: [`src/scraper.ts`](src/scraper.ts)
457
+ - Starter profiles config: [`sugarstitch.profiles.json`](sugarstitch.profiles.json)
458
+ - Technical overview: [`OVERVIEW.md`](OVERVIEW.md)
459
+
460
+ ## License
461
+
462
+ This project is licensed under the MIT License. See [`LICENSE`](LICENSE).
Binary file
Binary file
Binary file
Binary file
Binary file
Binary file
Binary file
package/dist/index.js ADDED
@@ -0,0 +1,216 @@
1
+ #!/usr/bin/env node
2
+ "use strict";
+ // ------------------------------------------------------------------
+ // Compiled CommonJS output emitted by tsc from src/index.ts.
+ // The three helpers below are the standard TypeScript interop shims
+ // that back `import * as ns from '...'`; do not edit them by hand.
+ // ------------------------------------------------------------------
3
+ // Re-binds property `k` of module `m` onto namespace object `o` (as `k2`),
+ // preserving live getters when the runtime supports property descriptors.
+ var __createBinding = (this && this.__createBinding) || (Object.create ? (function(o, m, k, k2) {
4
+ if (k2 === undefined) k2 = k;
5
+ var desc = Object.getOwnPropertyDescriptor(m, k);
6
+ if (!desc || ("get" in desc ? !m.__esModule : desc.writable || desc.configurable)) {
7
+ desc = { enumerable: true, get: function() { return m[k]; } };
8
+ }
9
+ Object.defineProperty(o, k2, desc);
10
+ }) : (function(o, m, k, k2) {
11
+ if (k2 === undefined) k2 = k;
12
+ o[k2] = m[k];
13
+ }));
14
+ // Attaches the source module as the `default` member of a namespace object.
+ var __setModuleDefault = (this && this.__setModuleDefault) || (Object.create ? (function(o, v) {
15
+ Object.defineProperty(o, "default", { enumerable: true, value: v });
16
+ }) : function(o, v) {
17
+ o["default"] = v;
18
+ });
19
+ // Builds the namespace object for `import * as ns` from a CommonJS module:
+ // copies every own key except `default`, then sets `default` to the module itself.
+ var __importStar = (this && this.__importStar) || (function () {
20
+ var ownKeys = function(o) {
21
+ ownKeys = Object.getOwnPropertyNames || function (o) {
22
+ var ar = [];
23
+ for (var k in o) if (Object.prototype.hasOwnProperty.call(o, k)) ar[ar.length] = k;
24
+ return ar;
25
+ };
26
+ return ownKeys(o);
27
+ };
28
+ return function (mod) {
29
+ if (mod && mod.__esModule) return mod;
30
+ var result = {};
31
+ if (mod != null) for (var k = ownKeys(mod), i = 0; i < k.length; i++) if (k[i] !== "default") __createBinding(result, mod, k[i]);
32
+ __setModuleDefault(result, mod);
33
+ return result;
34
+ };
35
+ })();
36
+ // Mark this compiled module as an ES module for downstream interop checks.
+ Object.defineProperty(exports, "__esModule", { value: true });
37
+ const commander_1 = require("commander");
38
+ const fs = __importStar(require("fs/promises"));
39
+ const path = __importStar(require("path"));
40
+ const scraper_1 = require("./scraper");
41
+ // CLI surface: a single Commander program whose flags mirror the README's
+ // "CLI Options" section. Compiled from src/index.ts — edit the source, not dist.
+ const program = new commander_1.Command();
42
+ program
43
+ .name('sugarstitch')
44
+ .description('✨ Bulk scrape fiber arts patterns, images, AND PDFs into sweet little local files ✨')
45
+ .version('1.0.0')
+ // Input selection: exactly one of --url / --file is expected; this is
+ // enforced later by validateInputOptions, not by Commander itself.
46
+ .option('-u, --url <url>', 'A single URL of the pattern page to scrape')
47
+ .option('-f, --file <file>', 'A text file containing a list of URLs (one per line)')
48
+ .option('-o, --output <path>', 'Output JSON file name', 'pattern-data.json')
49
+ .option('--output-dir <path>', 'Directory where JSON, images, and PDFs should be saved')
50
+ .option('-p, --preset <preset>', `Selector preset: ${(0, scraper_1.getSelectorPresets)().map(preset => preset.id).join(', ')}`, 'generic')
+ // Discovery-crawl flags; numeric values arrive as strings and are parsed in run().
51
+ .option('--crawl', 'Discover links from the starting URL(s) before scraping them')
52
+ .option('--crawl-depth <number>', 'How many link levels deep to follow in crawl mode', '2')
53
+ .option('--crawl-pattern <pattern>', 'Only follow discovered links whose URL or link text matches this text or regex')
54
+ .option('--crawl-language <language>', 'Prefer discovered URLs for one language such as english, french, or portuguese')
55
+ .option('--crawl-paginate', 'Expand listing pages like /page/2/, /page/3/, and scrape them too')
56
+ .option('--crawl-max-pages <number>', 'Maximum listing pages to add in pagination mode', '20')
57
+ .option('--crawl-any-domain', 'Allow crawl mode to follow links to other domains')
58
+ .option('--crawl-max-urls <number>', 'Maximum number of discovered page URLs to scrape', '100')
+ // Saved-profile and preview controls.
59
+ .option('--profile <id>', 'Use a saved site profile from the profiles config file')
60
+ .option('--profiles-file <path>', `Path to the site profiles config file (default: ${scraper_1.DEFAULT_PROFILES_FILE})`, scraper_1.DEFAULT_PROFILES_FILE)
61
+ .option('--preview', 'Preview what would be extracted without downloading files or writing JSON')
+ // Per-run selector overrides; each beats the selected preset for that field only.
62
+ .option('--title-selector <selector>', 'Override the title selector for this run')
63
+ .option('--description-selector <selector>', 'Override the description selector for this run')
64
+ .option('--materials-selector <selector>', 'Override the materials selector for this run')
65
+ .option('--instructions-selector <selector>', 'Override the instructions selector for this run')
66
+ .option('--image-selector <selector>', 'Override the image selector for this run')
67
+ .parse(process.argv);
68
+ const options = program.opts();
69
// ANSI escape sequences for the pastel banner palette.
const ANSI_RESET = '\x1b[0m';
const ANSI_PINK = '\x1b[38;5;205m';
const ANSI_MINT = '\x1b[38;5;121m';
const ANSI_SKY = '\x1b[38;5;117m';
const ANSI_GOLD = '\x1b[38;5;223m';
/**
 * Wrap one line of text in the given ANSI color sequence, appending a
 * reset so the color never bleeds into the following output.
 *
 * @param {string} line - Text to tint.
 * @param {string} color - ANSI escape sequence to apply.
 * @returns {string} The colorized line.
 */
function colorize(line, color) {
    return color + line + ANSI_RESET;
}
77
+ // Print the SugarStitch startup banner.
+ // Falls back to a plain one-word banner when stdout is not a TTY
+ // (piped/redirected output) or when the NO_COLOR convention is set.
+ function printBanner() {
78
+ if (!process.stdout.isTTY || process.env.NO_COLOR) {
79
+ console.log('\nSugarStitch\n');
80
+ return;
81
+ }
82
+ // Block-letter "SUGAR / STITCH" art, cycling through the pastel palette.
+ const bannerLines = [
83
+ colorize('███████╗██╗ ██╗ ██████╗ █████╗ ██████╗ ', ANSI_PINK),
84
+ colorize('██╔════╝██║ ██║██╔════╝ ██╔══██╗██╔══██╗', ANSI_MINT),
85
+ colorize('███████╗██║ ██║██║ ███╗███████║██████╔╝', ANSI_SKY),
86
+ colorize('╚════██║██║ ██║██║ ██║██╔══██║██╔══██╗', ANSI_GOLD),
87
+ colorize('███████║╚██████╔╝╚██████╔╝██║ ██║██║ ██║', ANSI_PINK),
88
+ colorize('╚══════╝ ╚═════╝ ╚═════╝ ╚═╝ ╚═╝╚═╝ ╚═╝', ANSI_MINT),
89
+ colorize(' ███████╗████████╗██╗████████╗ ██████╗██╗ ██╗', ANSI_SKY),
90
+ colorize(' ██╔════╝╚══██╔══╝██║╚══██╔══╝██╔════╝██║ ██║', ANSI_GOLD),
91
+ colorize(' ███████╗ ██║ ██║ ██║ ██║ ███████║', ANSI_PINK),
92
+ colorize(' ╚════██║ ██║ ██║ ██║ ██║ ██╔══██║', ANSI_MINT),
93
+ colorize(' ███████║ ██║ ██║ ██║ ╚██████╗██║ ██║', ANSI_SKY),
94
+ colorize(' ╚══════╝ ╚═╝ ╚═╝ ╚═╝ ╚═════╝╚═╝ ╚═╝', ANSI_GOLD)
95
+ ];
96
+ console.log(`\n${bannerLines.join('\n')}`);
97
+ console.log(colorize('Sweet little fiber arts scraper', ANSI_GOLD));
98
+ console.log('');
99
+ }
100
+ // Validate the mutually-exclusive input flags and the selector preset id.
+ // On any violation: print the problem to stderr, set a failing exit code,
+ // and display usage help.
+ function validateInputOptions() {
101
+ if (options.url && options.file) {
102
+ console.error('\n❌ Please use either --url or --file, not both at the same time.');
103
+ process.exitCode = 1;
104
+ // NOTE(review): program.help() prints help and exits the process, so
+ // nothing after it runs; confirm against Commander docs that the exit
+ // honors the process.exitCode set just above.
+ program.help();
105
+ }
106
+ if (!options.url && !options.file) {
107
+ console.error('\n❌ You need to provide either a single URL (-u) or a text file (-f) to scrape.');
108
+ process.exitCode = 1;
109
+ program.help();
110
+ }
111
+ // Preset ids are validated against the list exported by the scraper module.
+ if (!(0, scraper_1.isSelectorPresetId)(options.preset)) {
112
+ console.error(`\n❌ Unknown preset "${options.preset}". Use one of: ${(0, scraper_1.getSelectorPresets)().map(preset => preset.id).join(', ')}`);
113
+ process.exitCode = 1;
114
+ program.help();
115
+ }
116
+ }
117
/**
 * Work out where the JSON output file and downloaded assets should live.
 *
 * @param {string} outputName - Output JSON file name; may be absolute.
 * @param {string|undefined} outputDirectory - Optional --output-dir value,
 *   resolved against the current working directory.
 * @returns {{outputDirectory: string, outputPath: string}} The resolved
 *   asset directory and the full path of the JSON file.
 */
function resolveOutputPaths(outputName, outputDirectory) {
    // Without an explicit directory, everything lands in the launch directory.
    let resolvedOutputDirectory = process.cwd();
    if (outputDirectory) {
        resolvedOutputDirectory = path.resolve(process.cwd(), outputDirectory);
    }
    // An absolute output name bypasses the output directory entirely;
    // a relative one is placed inside it.
    let outputPath = outputName;
    if (!path.isAbsolute(outputName)) {
        outputPath = path.resolve(resolvedOutputDirectory, outputName);
    }
    return {
        outputDirectory: resolvedOutputDirectory,
        outputPath
    };
}
129
/**
 * Build the list of URLs to scrape from the CLI options.
 *
 * Single-URL mode (--url): normalizes the one URL and returns it alone,
 * throwing if it is not a valid http(s) URL.
 * File mode (--file): reads the file, trims and drops blank lines,
 * normalizes each line, reports how many lines were skipped as invalid,
 * and returns the deduplicated valid URLs.
 *
 * @returns {Promise<string[]>} Normalized, deduplicated URLs to scrape.
 * @throws {Error} If --url is present but not a valid URL, or the file
 *   cannot be read.
 */
async function getUrlsFromOptions() {
    if (options.url) {
        const normalizedUrl = (0, scraper_1.normalizeUrl)(options.url);
        if (!normalizedUrl) {
            throw new Error(`That doesn't look like a valid URL: ${options.url}`);
        }
        return [normalizedUrl];
    }
    const filePath = path.resolve(process.cwd(), options.file);
    const fileContent = await fs.readFile(filePath, 'utf-8');
    const rawLines = fileContent.split(/\r?\n/).map(line => line.trim()).filter(line => line.length > 0);
    const validUrls = rawLines
        .map(scraper_1.normalizeUrl)
        .filter((line) => Boolean(line));
    const invalidCount = rawLines.length - validUrls.length;
    const urls = (0, scraper_1.dedupeStrings)(validUrls);
    // Fix: removed profanity from this user-facing message — this is the
    // published CLI output of the package.
    console.log(`\n📚 Loaded ${urls.length} URLs from ${options.file}. Let's get to work...`);
    if (invalidCount > 0) {
        console.log(`⚠️ Skipped ${invalidCount} line(s) because they were not valid http(s) URLs.`);
    }
    return urls;
}
151
/**
 * CLI entry point: prints the banner, validates the parsed options, then
 * either previews selector extraction for the first URL (--preview) or
 * runs a full scrape via the shared scraper module.
 *
 * Any failure is reported on stderr and reflected in the process exit code.
 */
async function run() {
    printBanner();
    validateInputOptions();
    try {
        const urls = await getUrlsFromOptions();
        const profilesPath = path.resolve(process.cwd(), options.profilesFile);
        const { outputDirectory, outputPath } = resolveOutputPaths(options.output, options.outputDir);
        // Only explicitly-provided selector flags survive sanitization.
        const selectorOverrides = (0, scraper_1.sanitizeSelectorOverrides)({
            titleSelector: options.titleSelector,
            descriptionSelector: options.descriptionSelector,
            materialsSelector: options.materialsSelector,
            instructionsSelector: options.instructionsSelector,
            imageSelector: options.imageSelector
        });
        if (options.preview) {
            // Preview mode inspects only the first URL and writes no files.
            const preview = await (0, scraper_1.previewPattern)({
                url: urls[0],
                preset: options.preset,
                selectorOverrides,
                profileId: options.profile,
                profilesPath
            }, message => console.log(message));
            console.log('\nPreview Summary');
            console.log(`Title: ${preview.title}`);
            console.log(`Description: ${preview.description}`);
            console.log(`Preset: ${preview.presetLabel}`);
            if (preview.profileLabel) {
                console.log(`Profile: ${preview.profileLabel}`);
            }
            if (preview.materials.length > 0) {
                console.log(`Materials (${preview.materials.length}): ${preview.materials.join(' | ')}`);
            }
            if (preview.instructions.length > 0) {
                console.log(`Instructions (${preview.instructions.length}): ${preview.instructions.slice(0, 5).join(' | ')}`);
            }
            console.log(`Images found: ${preview.imageUrls.length}`);
            console.log(`PDFs found: ${preview.pdfUrls.length}`);
            return;
        }
        await (0, scraper_1.scrapeUrls)({
            urls,
            outputPath,
            preset: options.preset,
            profileId: options.profile,
            profilesPath,
            selectorOverrides,
            // Crawl flags arrive from Commander as strings; parse them here.
            crawl: {
                enabled: Boolean(options.crawl),
                maxDepth: Number.parseInt(options.crawlDepth, 10),
                sameDomainOnly: !options.crawlAnyDomain,
                linkPattern: options.crawlPattern,
                maxDiscoveredUrls: Number.parseInt(options.crawlMaxUrls, 10),
                language: options.crawlLanguage,
                paginate: Boolean(options.crawlPaginate),
                maxPaginationPages: Number.parseInt(options.crawlMaxPages, 10)
            },
            workingDirectory: outputDirectory,
            logger: message => console.log(message)
        });
    }
    catch (error) {
        // Fix: a thrown value is not guaranteed to be an Error instance, so
        // dereferencing error.message directly could print "undefined".
        const message = error instanceof Error ? error.message : String(error);
        console.error(`\n❌ ${message}`);
        process.exitCode = 1;
    }
}
run();