@dealcrawl/sdk 2.9.0 → 2.11.0

package/README.md CHANGED
@@ -6,14 +6,25 @@ Official TypeScript SDK for the DealCrawl web scraping and crawling API.
  [![TypeScript](https://img.shields.io/badge/TypeScript-5.0+-blue.svg)](https://www.typescriptlang.org/)
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

- ## What's New in January 2026 🎉
+ ## What's New in v2.11.0 (January 2026) 🎉

- - **📸 Screenshot Storage (Phase 4)** - Automatic screenshot capture and storage via Supabase with public URLs
- - **🎯 Priority Crawl System (Phase 5)** - 3-tier queue system (high/medium/low) based on SmartFrontier deal scores for optimized resource allocation
- - **🤖 AI Deal Extraction** - LLM-powered deal extraction with customizable score thresholds and automatic database storage
- - **💾 Enhanced Data Persistence** - New `crawled_pages` and `crawled_deals` tables for comprehensive deal tracking
+ ### Breaking Changes ⚠️
+
+ - **SearchOptions**: `maxResults` → `limit`, `autoScrape` → `scrapeResults`, `autoScrapeLimit` → `maxScrapeResults`
+ - **BatchScrapeOptions**: `delay` → `delayMs`
+ - **ExtractModel**: Updated to match API (`claude-3-5-haiku-20241022`, `claude-3-5-sonnet-20241022`, etc.)
+ - **ApiKeyScope**: Removed `scrape:batch` and `search` (use `scrape` scope for both)
+
+ ### New Features
+
+ - **📸 Screenshot Storage (SEC-011)** - Private by default with configurable signed URL TTL
+ - **🎯 Priority Crawl System** - 3-tier queue system (high/medium/low) based on SmartFrontier deal scores
+ - **🤖 AI Deal Extraction** - LLM-powered extraction with customizable score thresholds
  - **📝 Markdown Output** - Convert scraped content to clean Markdown with GFM support
- - **🎬 Browser Actions** - Execute preset actions (click, scroll, write, etc.) before scraping for dynamic content
+ - **🎬 Browser Actions** - Execute preset actions (click, scroll, write, etc.) before scraping
+ - **🔴 Real-Time SSE Events** - Track jobs in real-time with Server-Sent Events (browser only)
+ - **🛡️ Batch Scrape** - Added `ignoreInvalidURLs` for Firecrawl-compatible error handling
+ - **🔄 HTML to Markdown** - New `client.convert.htmlToMarkdown()` utility

  ## Features

@@ -63,14 +74,138 @@ console.log(result.data.parsed.markdown); // Markdown content
  console.log(result.data.screenshot); // Public screenshot URL
  ```

+ ## Real-Time Events (SSE) - Browser Only 🔴
+
+ Track jobs in real-time using Server-Sent Events (SSE). **Browser only** - for Node.js, use polling via `client.waitForResult()`.
+
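+ In Node.js, the equivalent pattern is a blocking poll with the documented `waitForResult` helper - a minimal sketch:
+
+ ```typescript
+ // Node.js alternative: no SSE - create the job, then poll until it settles.
+ const job = await client.scrape.create({ url: "https://example.com" });
+ const result = await client.waitForResult(job.jobId); // resolves on completion/failure
+ ```
+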
+ ```typescript
+ // 1. Generate SSE token (required for EventSource)
+ const { token, expiresAt } = await client.auth.generateSSEToken();
+ console.log(`Token expires at: ${expiresAt}`); // 5 minutes
+
+ // 2. Subscribe to all events
+ const eventSource = client.events.subscribe(token, {
+   onEvent: (event) => {
+     console.log('Event:', event.type);
+     const data = JSON.parse(event.data);
+     console.log('Data:', data);
+   },
+   onError: (error) => {
+     console.error('SSE error:', error);
+   }
+ });
+
+ // 3. Listen for specific event types
+ eventSource.addEventListener('job.completed', (event) => {
+   const data = JSON.parse(event.data);
+   console.log('Job completed!', data.summary);
+   eventSource.close(); // Clean up
+ });
+
+ eventSource.addEventListener('job.progress', (event) => {
+   const data = JSON.parse(event.data);
+   console.log(`Progress: ${data.progress}%`);
+ });
+
+ eventSource.addEventListener('deal.found', (event) => {
+   const data = JSON.parse(event.data);
+   console.log('Deal found!', data.title, data.score);
+ });
+
+ // 4. Subscribe to a specific job only
+ const job = await client.scrape.create({ url: "https://example.com" });
+ const jobToken = await client.auth.generateSSEToken({ jobId: job.jobId });
+
+ const jobEvents = client.events.subscribeToJob(job.jobId, jobToken.token, {
+   onEvent: (event) => {
+     const data = JSON.parse(event.data);
+     console.log(`[${event.type}]`, data);
+   }
+ });
+
+ // 5. Check connection limits before subscribing
+ const limits = await client.auth.getLimits();
+ console.log(`Available SSE connections: ${limits.sse.available}/${limits.sse.maxConnections}`);
+ // Free: 2 concurrent, Pro: 10 concurrent, Enterprise: 50 concurrent
+
+ // 6. Helper: Wait for completion via SSE
+ const result = await client.events.waitForCompletion(job.jobId, (progress) => {
+   console.log(`Progress: ${progress}%`);
+ });
+ ```
+
+ **Available Event Types:**
+
+ | Event Type | Description |
+ | ---------- | ----------- |
+ | `job.created` | Job was created |
+ | `job.queued` | Job entered queue |
+ | `job.started` | Worker picked up job |
+ | `job.progress` | Progress update (includes `progress`, `stats`, `eta`) |
+ | `job.status` | Status changed |
+ | `job.completed` | Job finished successfully |
+ | `job.failed` | Job failed (includes error details) |
+ | `job.cancelled` | Job was cancelled |
+ | `job.log` | Important log message |
+ | `job.metric` | Performance/business metric |
+ | `job.alert` | Important alert (quota warning, etc.) |
+ | `job.checkpoint` | Checkpoint saved (for resumable jobs) |
+ | `deal.found` | Deal detected during crawl |
+ | `deal.validated` | Deal scored/validated |
+ | `ping` | Keepalive (every 15 seconds) |
+ | `connection.open` | SSE connection established |
+ | `connection.close` | SSE connection closing |
+ | `error` | Error occurred |
+
+ **TypeScript Support:**
+
+ ```typescript
+ import type {
+   SSEEvent,
+   JobProgressEvent,
+   JobCompletedEvent,
+   DealFoundEvent
+ } from "@dealcrawl/sdk";
+
+ // Type-safe event handling
+ eventSource.addEventListener('job.progress', (event: MessageEvent) => {
+   const data = JSON.parse(event.data) as JobProgressEvent['data'];
+   console.log(`Progress: ${data.progress}%`);
+   console.log(`ETA: ${data.eta?.remainingFormatted}`);
+   console.log(`Deals found: ${data.stats?.dealsFound}`);
+ });
+
+ eventSource.addEventListener('job.completed', (event: MessageEvent) => {
+   const data = JSON.parse(event.data) as JobCompletedEvent['data'];
+   console.log('Completed in:', data.durationMs, 'ms');
+   console.log('Summary:', data.summary);
+ });
+ ```
+
+ **Features:**
+
+ - ✅ Automatic reconnection on disconnect
+ - ✅ Event replay via `Last-Event-ID` (up to 50 missed events)
+ - ✅ Keepalive pings every 15 seconds
+ - ✅ Max connection time: 1 hour (auto-reconnect after)
+ - ✅ Multi-tenant isolation (only see your events)
+ - ✅ Token-based auth (works with EventSource)
+
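+ Because the server replays up to 50 missed events after a reconnect, a handler may see the same event twice. A minimal dedupe sketch using the standard `MessageEvent.lastEventId` property (illustrative, not part of the SDK):
+
+ ```typescript
+ // Illustrative: skip events already handled before a reconnect replay.
+ const seen = new Set<string>();
+ eventSource.addEventListener('job.progress', (event: MessageEvent) => {
+   const id = event.lastEventId;
+   if (id && seen.has(id)) return; // replayed duplicate - already handled
+   if (id) seen.add(id);
+   console.log(`Progress: ${JSON.parse(event.data).progress}%`);
+ });
+ ```
+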
+ **Security:**
+
+ - Tokens expire after 5 minutes
+ - Tokens can be restricted to specific jobs
+ - Tokens stored in Redis (revocable)
+ - Connection limits per tier (Free: 2, Pro: 10, Enterprise: 50)
+
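+ Note that the browser's automatic EventSource reconnect reuses the original token, which stops working once the 5-minute lifetime passes. One way to stay subscribed indefinitely is to rotate the token proactively - a minimal sketch (the `keepSubscribed` helper is illustrative, not an SDK method):
+
+ ```typescript
+ // Illustrative: re-subscribe with a fresh token shortly before the old one expires.
+ async function keepSubscribed(handlers: { onEvent: (e: MessageEvent) => void }) {
+   const { token, expiresAt } = await client.auth.generateSSEToken();
+   const source = client.events.subscribe(token, handlers);
+   const msLeft = new Date(expiresAt).getTime() - Date.now() - 30_000; // 30s safety margin
+   setTimeout(() => {
+     source.close();           // drop the connection with the expiring token...
+     keepSubscribed(handlers); // ...and reconnect with a new one
+   }, Math.max(msLeft, 0));
+ }
+ ```
+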
  ## January 2026 Features in Detail

- ### 📸 Screenshot Storage
+ ### 📸 Screenshot Storage (SEC-011)

- Automatically capture and store screenshots with public URLs:
+ **Private by default** with configurable signed URL expiration:

  ```typescript
- // With screenshot options
+ // Basic screenshot (private with tier-specific TTL)
  const job = await client.scrape.create({
    url: "https://example.com",
    screenshot: {
@@ -78,14 +213,52 @@ const job = await client.scrape.create({
      fullPage: true,
      format: "webp",
      quality: 85,
+     signedUrlTtl: 604800, // 7 days (default for Pro/Enterprise)
    },
  });

  const result = await client.waitForResult(job.jobId);
- console.log(result.data.screenshot);
- // → "https://...supabase.co/storage/v1/object/public/screenshots/..."
+ console.log(result.data.screenshotMetadata);
+ // {
+ //   url: "https://...supabase.co/storage/v1/object/sign/screenshots-private/...",
+ //   isPublic: false,
+ //   expiresAt: "2026-01-25T12:00:00Z",
+ //   width: 1280,
+ //   height: 720,
+ //   format: "webp",
+ //   sizeBytes: 125000
+ // }
+
+ // Refresh signed URL before expiration
+ const refreshed = await client.screenshots.refresh({
+   path: "job_abc123/1234567890_nanoid_example.png",
+   ttl: 604800 // Extend for another 7 days
+ });
+ console.log(refreshed.url); // New signed URL
+ console.log(refreshed.expiresAt); // "2026-02-01T12:00:00Z"
+
+ // Get tier-specific TTL limits
+ const limits = await client.screenshots.getLimits();
+ console.log(limits);
+ // {
+ //   tier: "pro",
+ //   limits: { min: 3600, max: 604800, default: 604800 },
+ //   formattedLimits: { min: "1 hour", max: "7 days", default: "7 days" }
+ // }
+
+ // Enterprise: Public URLs (opt-in)
+ const jobPublic = await client.scrape.create({
+   url: "https://example.com",
+   screenshot: {
+     enabled: true,
+     publicUrl: true, // ⚠️ Enterprise only - exposes data publicly
+   },
+ });
+ // → Public URL without expiration (Enterprise tier only)
  ```

+ **Security Note:** Screenshots are private by default to prevent exposure of personal data, copyrighted content, or sensitive tokens. Public URLs require Enterprise tier + explicit opt-in.
+
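+ Because signed URLs expire, check `screenshotMetadata.expiresAt` before reusing a stored URL - a minimal sketch built only on the fields and methods shown in this README:
+
+ ```typescript
+ // Sketch: is the stored signed URL still valid?
+ const meta = result.data.screenshotMetadata;
+ if (new Date(meta.expiresAt).getTime() <= Date.now()) {
+   // Expired - request a fresh signed URL (see "Screenshots - Signed URL Management")
+   const fresh = await client.screenshots.refresh({
+     path: "job_abc123/1234567890_nanoid_example.png", // storage path from your records
+   });
+   console.log(fresh.url);
+ }
+ ```
+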
  ### 🎯 Priority Crawl System

  3-tier queue system automatically prioritizes high-value pages:
@@ -250,6 +423,8 @@ const batch = await client.scrape.batch({
      detectSignals: true,
      timeout: 30000,
    },
+   delayMs: 500, // ✨ Was: delay
+   ignoreInvalidURLs: true, // ✨ NEW: Skip invalid URLs instead of failing
  });

  // Get batch status
@@ -261,14 +436,14 @@ const results = await client.waitForAll(batch.jobIds);

  **Batch Options:**

- | Option | Type | Default | Description |
- | ------------ | ------ | -------- | ----------------------------------------- |
- | `urls` | array | required | 1-100 URL objects with optional overrides |
- | `defaults` | object | - | Default options applied to all URLs |
- | `priority` | number | 5 | Priority 1-10 (higher = faster) |
- | `delay` | number | 0 | Delay between URLs (0-5000ms) |
- | `webhookUrl` | string | - | Webhook for batch completion |
- | `ref` | string | - | Custom reference ID for tracking |
+ | Option | Type | Default | Description |
+ | ------------------- | ------- | -------- | ------------------------------------------------ |
+ | `urls` | array | required | 1-100 URL objects with optional overrides |
+ | `defaults` | object | - | Default options applied to all URLs |
+ | `priority` | number | 5 | Priority 1-10 (higher = faster) |
+ | `delayMs` | number | 0 | Delay between URLs (0-5000ms) |
+ | `webhookUrl` | string | - | Webhook for batch completion |
+ | `ignoreInvalidURLs` | boolean | false | Continue on invalid URLs (Firecrawl-compatible) |

  ### Search - Web Search with AI

@@ -276,7 +451,7 @@ const results = await client.waitForAll(batch.jobIds);
  // Basic search
  const job = await client.search.create({
    query: "laptop deals black friday",
-   maxResults: 20,
+   limit: 20, // ✨ Was: maxResults
  });

  // AI-optimized search with deal scoring
@@ -291,8 +466,8 @@ const job = await client.search.create({
  // Search with auto-scraping of results
  const job = await client.search.create({
    query: "promo codes electronics",
-   autoScrape: true,
-   autoScrapeLimit: 5,
+   scrapeResults: true, // ✨ Was: autoScrape
+   maxScrapeResults: 5, // ✨ Was: autoScrapeLimit
  });

  // Filtered search
@@ -302,7 +477,7 @@ const job = await client.search.create({
      location: "fr",
      language: "fr",
      dateRange: "month",
-     domains: ["amazon.fr", "cdiscount.com"],
+     domain: "amazon.fr", // Single domain filter
    },
  });

@@ -321,14 +496,14 @@ const result = await client.searchAndWait({
  | Option | Type | Default | Description |
  | ------------------- | ------- | -------- | ----------------------------------------------- |
  | `query` | string | required | Search query |
- | `maxResults` | number | 10 | Results to return (1-100) |
+ | `limit` | number | 10 | Results to return (1-100) |
  | `useAiOptimization` | boolean | false | AI-enhance the query |
  | `aiProvider` | string | "openai" | "openai" or "anthropic" |
  | `aiModel` | string | - | Model ID (gpt-4o-mini, claude-3-5-sonnet, etc.) |
  | `useDealScoring` | boolean | false | Score results for deal relevance |
- | `autoScrape` | boolean | false | Auto-scrape top results |
- | `autoScrapeLimit` | number | 3 | Number of results to scrape |
- | `filters` | object | - | Location, language, date, domains |
+ | `scrapeResults` | boolean | false | Auto-scrape top results |
+ | `maxScrapeResults` | number | 5 | Number of results to scrape (1-10) |
+ | `filters` | object | - | Location, language, date, domain |

  ### Crawl - Website Crawling

@@ -553,16 +728,16 @@ const job = await client.agent.withClaude(

  **Action Types:**

- | Action | Key Parameters | Description |
- |--------------|---------------------------------------------------|---------------------------|
- | `click` | `selector`, `waitAfter?`, `button?`, `force?` | Click an element |
- | `scroll` | `direction`, `amount?`, `smooth?` | Scroll page/to element |
- | `write` | `selector`, `text`, `clearFirst?`, `typeDelay?` | Type text into input |
- | `wait` | `milliseconds?`, `selector?`, `condition?` | Wait for time or element |
- | `press` | `key`, `modifiers?` | Press keyboard key |
- | `screenshot` | `fullPage?`, `selector?`, `name?` | Capture screenshot |
- | `hover` | `selector`, `duration?` | Hover over element |
- | `select` | `selector`, `value`, `byLabel?` | Select dropdown option |
+ | Action | Key Parameters | Description |
+ |--------------|---------------------------------------------------|--------------------------|
+ | `click` | `selector`, `waitAfter?`, `button?`, `force?` | Click an element |
+ | `scroll` | `direction`, `amount?`, `smooth?` | Scroll page/to element |
+ | `write` | `selector`, `text`, `clearFirst?`, `typeDelay?` | Type text into input |
+ | `wait` | `milliseconds?`, `selector?`, `condition?` | Wait for time or element |
+ | `press` | `key`, `modifiers?` | Press keyboard key |
+ | `screenshot` | `fullPage?`, `selector?`, `name?` | Capture screenshot |
+ | `hover` | `selector`, `duration?` | Hover over element |
+ | `select` | `selector`, `value`, `byLabel?` | Select dropdown option |
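+
+ A sketch of how these actions might be composed in a single request - note the `actions` parameter name is an assumption here, not confirmed by this README; consult the full Actions documentation for the exact request shape:
+
+ ```typescript
+ // Hypothetical composition: interact with the page, then capture it.
+ const job = await client.scrape.create({
+   url: "https://example.com/deals",
+   actions: [ // assumed parameter name
+     { type: "write", selector: "#search", text: "laptop" },
+     { type: "press", key: "Enter" },
+     { type: "wait", selector: ".results" },
+     { type: "scroll", direction: "down", amount: 1000 },
+     { type: "screenshot", fullPage: true },
+   ],
+ });
+ ```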

  **Action Resilience (all actions support):**

@@ -687,6 +862,44 @@ await client.webhooks.delete(webhookId);
  - `crawl.completed` - Crawl job finished
  - `crawl.failed` - Crawl job failed

+ ### Screenshots - Signed URL Management
+
+ Manage screenshot signed URLs with configurable TTL and automatic refresh:
+
+ ```typescript
+ // Refresh a signed URL before expiration
+ const refreshed = await client.screenshots.refresh({
+   path: "job_abc123/1234567890_nanoid_example.png",
+   ttl: 604800 // Optional: 7 days (defaults to tier default)
+ });
+ console.log(refreshed.url); // New signed URL
+ console.log(refreshed.expiresAt); // "2026-01-25T12:00:00Z"
+ console.log(refreshed.tierLimits); // { min: 3600, max: 604800, default: 604800 }
+
+ // Get tier-specific TTL limits
+ const limits = await client.screenshots.getLimits();
+ console.log(limits.tier); // "pro"
+ console.log(limits.limits); // { min: 3600, max: 604800, default: 604800 }
+ console.log(limits.formattedLimits); // { min: "1 hour", max: "7 days", default: "7 days" }
+
+ // Specify a custom bucket (defaults to 'screenshots-private')
+ const refreshedCustom = await client.screenshots.refresh({
+   path: "job_xyz/screenshot.png",
+   ttl: 86400, // 1 day
+   bucket: "screenshots-private"
+ });
+ ```
+
+ **TTL Limits by Tier:**
+
+ | Tier | Min TTL | Max TTL | Default TTL |
+ |------------|---------|----------|-------------|
+ | Free | 1 hour | 24 hours | 24 hours |
+ | Pro | 1 hour | 7 days | 7 days |
+ | Enterprise | 1 hour | 30 days | 7 days |
+
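+ When accepting a caller-supplied TTL, it can be clamped to these tier limits before calling `refresh()` - a small sketch (the `clampTtl` helper is illustrative, not an SDK method):
+
+ ```typescript
+ // Illustrative: keep a requested TTL (seconds) within the caller's tier limits.
+ async function clampTtl(requested: number): Promise<number> {
+   const { limits } = await client.screenshots.getLimits(); // { min, max, default }
+   return Math.min(Math.max(requested, limits.min), limits.max);
+ }
+
+ const ttl = await clampTtl(30 * 24 * 3600); // 30 days → capped at 604800s (7 days) on Pro
+ await client.screenshots.refresh({ path: "job_abc123/1234567890_nanoid_example.png", ttl });
+ ```
+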
+ **Security Note:** All screenshots are private by default. Public URLs (Enterprise only) never expire, so they don't require refreshing.
+
  ### Keys - API Key Management

  ```typescript
@@ -717,20 +930,18 @@ const stats = await client.keys.getStats(keyId, { days: 30 });

  **Available Scopes:**

- | Scope | Endpoint | Description |
- | ----------------- | ----------------------- | ------------------------ |
- | `scrape` | `POST /v1/scrape` | Create scrape jobs |
- | `scrape:batch` | `POST /v1/scrape/batch` | Create batch scrape jobs |
- | `search` | `POST /v1/search` | Create search jobs |
- | `crawl` | `POST /v1/crawl` | Create crawl jobs |
- | `dork` | `POST /v1/dork` | Create dork searches |
- | `extract` | `POST /v1/extract` | Create extraction jobs |
- | `agent` | `POST /v1/agent` | Create AI agent jobs |
- | `status` | `GET /v1/status/:id` | Read job status |
- | `data:read` | `GET /v1/data/*` | Read jobs/deals |
- | `data:export` | `GET /v1/data/export` | Export data |
- | `keys:manage` | `/v1/keys` | Manage API keys |
- | `webhooks:manage` | `/v1/webhooks` | Manage webhooks |
+ | Scope | Endpoint | Description |
+ | ----------------- | --------------------------------------------------- | -------------------------------------- |
+ | `scrape` | `POST /v1/scrape`, `/v1/scrape/batch`, `/v1/search` | Create scrape, batch, and search jobs |
+ | `crawl` | `POST /v1/crawl` | Create crawl jobs |
+ | `dork` | `POST /v1/dork` | Create dork searches |
+ | `extract` | `POST /v1/extract` | Create extraction jobs |
+ | `agent` | `POST /v1/agent` | Create AI agent jobs |
+ | `status` | `GET /v1/status/:id` | Read job status |
+ | `data:read` | `GET /v1/data/*` | Read jobs/deals |
+ | `data:export` | `GET /v1/data/export` | Export data |
+ | `keys:manage` | `/v1/keys` | Manage API keys |
+ | `webhooks:manage` | `/v1/webhooks` | Manage webhooks |

  **Scope Examples:**

@@ -756,7 +967,6 @@ await client.keys.create({
    "dork",
    "extract",
    "agent",
-   "search",
    "status",
    "data:read",
    "data:export",
@@ -958,9 +1168,12 @@ import type {
    HoverAction,
    SelectAction,

-   // Screenshot Options
+   // Screenshot Options & Responses
    ScreenshotOptions,
    ScreenshotResult,
+   RefreshScreenshotOptions,
+   ScreenshotRefreshResponse,
+   ScreenshotLimitsResponse,

    // Re-exports from @dealcrawl/shared
    ScrapeResult,
@@ -1073,6 +1286,58 @@ const client = new DealCrawl({

  > **Warning:** Never expose your API key in client-side code. Use a backend proxy or edge function.

+ ## Migration Guide (v2.10.x → v2.11.0)
+
+ ### SearchOptions
+
+ ```diff
+ const result = await client.search.create({
+   query: "laptop deals",
+ -  maxResults: 20,
+ +  limit: 20,
+ -  autoScrape: true,
+ +  scrapeResults: true,
+ -  autoScrapeLimit: 5,
+ +  maxScrapeResults: 5,
+ });
+ ```
+
+ ### BatchScrapeOptions
+
+ ```diff
+ const batch = await client.scrape.batch({
+   urls: [...],
+ -  delay: 500,
+ +  delayMs: 500,
+ +  ignoreInvalidURLs: true, // NEW: Firecrawl-compatible
+ });
+ ```
+
+ ### ExtractModel
+
+ ```diff
+ const job = await client.extract.create({
+   url: "...",
+ -  model: "claude-3-haiku",
+ +  model: "claude-3-5-haiku-20241022",
+ });
+ ```
+
+ ### ApiKeyScope
+
+ ```diff
+ await client.keys.create({
+   name: "My Key",
+   scopes: [
+     "scrape",
+ -    "scrape:batch", // REMOVED - use "scrape" instead
+ -    "search", // REMOVED - use "scrape" instead
+     "crawl",
+     "status",
+   ],
+ });
+ ```
+
  ## Compatibility

  - **Node.js**: 18.0+