@dealcrawl/sdk 2.10.0 → 2.11.1

This diff shows the published contents of the two package versions as they appear in their public registry, and is provided for informational purposes only.
package/README.md CHANGED
@@ -6,14 +6,48 @@ Official TypeScript SDK for the DealCrawl web scraping and crawling API.
  [![TypeScript](https://img.shields.io/badge/TypeScript-5.0+-blue.svg)](https://www.typescriptlang.org/)
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

- ## What's New in January 2026 🎉
+ ## What's New in v2.11.1 (January 2026) 🐛

- - **📸 Screenshot Storage (Phase 4)** - Automatic screenshot capture and storage via Supabase with public URLs
- - **🎯 Priority Crawl System (Phase 5)** - 3-tier queue system (high/medium/low) based on SmartFrontier deal scores for optimized resource allocation
- - **🤖 AI Deal Extraction** - LLM-powered deal extraction with customizable score thresholds and automatic database storage
- - **💾 Enhanced Data Persistence** - New `crawled_pages` and `crawled_deals` tables for comprehensive deal tracking
+ ### Bug Fixes
+
+ - **DataResource**: Fixed syntax error in `getDealsByCategory()` method (unclosed docstring + duplicate line)
+ - **SDK-API Alignment**: Verified 87% endpoint coverage with detailed alignment report
+
+ ### Known Gaps
+
+ The following API endpoints do not have SDK methods yet (see [API-SDK Alignment Report](../../docs/API-SDK-ALIGNMENT-REPORT.md)):
+
+ - `GET /v1/status/:jobId/errors` - Get job errors
+ - `GET /v1/data/jobs/:jobId` - Get full job details
+ - `GET /v1/data/jobs/:jobId/result` - Get job result
+ - `GET /v1/data/jobs/:jobId/export` - Export job in multiple formats
+ - `POST /v1/webhooks/:id/rotate` - Rotate webhook secret
+ - `GET /v1/webhooks/:id/secret-status` - Get webhook secret status
+ - `POST /v1/webhooks/verify` - Verify webhook signature
+
+ These methods will be added in a future release.
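Until those methods land, the endpoints listed above are still reachable over plain HTTP. A minimal sketch, assuming Bearer-token auth, a placeholder host, and an illustrative environment variable name — none of which are confirmed by this diff, so check the API reference for the real base URL and auth scheme:

```typescript
// Sketch only: base URL, env var name, and Authorization header format are assumptions.
const API_BASE = "https://api.dealcrawl.example"; // placeholder host
const apiKey = process.env.DEALCRAWL_API_KEY!;

async function getJobResult(jobId: string): Promise<unknown> {
  const res = await fetch(`${API_BASE}/v1/data/jobs/${jobId}/result`, {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (!res.ok) throw new Error(`GET /v1/data/jobs/${jobId}/result failed: ${res.status}`);
  return res.json(); // response shape is not yet covered by the SDK types
}
```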
+
+ ---
+
+ ## What's New in v2.11.0 (January 2026) 🎉
+
+ ### Breaking Changes ⚠️
+
+ - **SearchOptions**: `maxResults` → `limit`, `autoScrape` → `scrapeResults`, `autoScrapeLimit` → `maxScrapeResults`
+ - **BatchScrapeOptions**: `delay` → `delayMs`
+ - **ExtractModel**: Updated to match API (`claude-3-5-haiku-20241022`, `claude-3-5-sonnet-20241022`, etc.)
+ - **ApiKeyScope**: Removed `scrape:batch` and `search` (use `scrape` scope for both)
+
+ ### New Features
+
+ - **📸 Screenshot Storage (SEC-011)** - Private by default with configurable signed URL TTL
+ - **🎯 Priority Crawl System** - 3-tier queue system (high/medium/low) based on SmartFrontier deal scores
+ - **🤖 AI Deal Extraction** - LLM-powered extraction with customizable score thresholds
  - **📝 Markdown Output** - Convert scraped content to clean Markdown with GFM support
- - **🎬 Browser Actions** - Execute preset actions (click, scroll, write, etc.) before scraping for dynamic content
+ - **🎬 Browser Actions** - Execute preset actions (click, scroll, write, etc.) before scraping
+ - **🔴 Real-Time SSE Events** - Track jobs in real-time with Server-Sent Events (browser only)
+ - **🛡️ Batch Scrape** - Added `ignoreInvalidURLs` for Firecrawl-compatible error handling
+ - **🔄 HTML to Markdown** - New `client.convert.htmlToMarkdown()` utility
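For the new converter, a usage sketch — the exact signature is not shown in this diff, so the async, string-in/string-out form below is an assumption:

```typescript
// Assumed shape: htmlToMarkdown(html: string) resolving to a Markdown string.
const markdown = await client.convert.htmlToMarkdown("<h1>Deals</h1><p><strong>50%</strong> off laptops</p>");
console.log(markdown); // expected output along the lines of "# Deals\n\n**50%** off laptops"
```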

  ## Features

@@ -63,6 +97,130 @@ console.log(result.data.parsed.markdown); // Markdown content
  console.log(result.data.screenshot); // Public screenshot URL
  ```

+ ## Real-Time Events (SSE) - Browser Only 🔴
+
+ Track jobs in real-time using Server-Sent Events (SSE). **Browser only** - for Node.js, use polling via `client.waitForResult()`.
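For the Node.js path, a minimal polling sketch — `waitForResult` is named above but its parameters aren't shown in this diff, so passing the job ID (as `waitForAll` takes job IDs elsewhere in this README) is an assumption:

```typescript
// Node.js alternative to SSE: create the job, then poll until it finishes.
const job = await client.scrape.create({ url: "https://example.com" });
const result = await client.waitForResult(job.jobId); // assumed: accepts the job ID
console.log(result);
```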
+
+ ```typescript
+ // 1. Generate SSE token (required for EventSource)
+ const { token, expiresAt } = await client.auth.generateSSEToken();
+ console.log(`Token expires at: ${expiresAt}`); // 5 minutes
+
+ // 2. Subscribe to all events
+ const eventSource = client.events.subscribe(token, {
+   onEvent: (event) => {
+     console.log('Event:', event.type);
+     const data = JSON.parse(event.data);
+     console.log('Data:', data);
+   },
+   onError: (error) => {
+     console.error('SSE error:', error);
+   }
+ });
+
+ // 3. Listen for specific event types
+ eventSource.addEventListener('job.completed', (event) => {
+   const data = JSON.parse(event.data);
+   console.log('Job completed!', data.summary);
+   eventSource.close(); // Clean up
+ });
+
+ eventSource.addEventListener('job.progress', (event) => {
+   const data = JSON.parse(event.data);
+   console.log(`Progress: ${data.progress}%`);
+ });
+
+ eventSource.addEventListener('deal.found', (event) => {
+   const data = JSON.parse(event.data);
+   console.log('Deal found!', data.title, data.score);
+ });
+
+ // 4. Subscribe to specific job only
+ const job = await client.scrape.create({ url: "https://example.com" });
+ const jobToken = await client.auth.generateSSEToken({ jobId: job.jobId });
+
+ const jobEvents = client.events.subscribeToJob(job.jobId, jobToken.token, {
+   onEvent: (event) => {
+     const data = JSON.parse(event.data);
+     console.log(`[${event.type}]`, data);
+   }
+ });
+
+ // 5. Check connection limits before subscribing
+ const limits = await client.auth.getLimits();
+ console.log(`Available SSE connections: ${limits.sse.available}/${limits.sse.maxConnections}`);
+ // Free: 2 concurrent, Pro: 10 concurrent, Enterprise: 50 concurrent
+
+ // 6. Helper: Wait for completion via SSE
+ const result = await client.events.waitForCompletion(job.jobId, (progress) => {
+   console.log(`Progress: ${progress}%`);
+ });
+ ```
+
+ **Available Event Types:**
+
+ | Event Type | Description |
+ | ---------- | ----------- |
+ | `job.created` | Job was created |
+ | `job.queued` | Job entered queue |
+ | `job.started` | Worker picked up job |
+ | `job.progress` | Progress update (includes `progress`, `stats`, `eta`) |
+ | `job.status` | Status changed |
+ | `job.completed` | Job finished successfully |
+ | `job.failed` | Job failed (includes error details) |
+ | `job.cancelled` | Job was cancelled |
+ | `job.log` | Important log message |
+ | `job.metric` | Performance/business metric |
+ | `job.alert` | Important alert (quota warning, etc.) |
+ | `job.checkpoint` | Checkpoint saved (for resumable jobs) |
+ | `deal.found` | Deal detected during crawl |
+ | `deal.validated` | Deal scored/validated |
+ | `ping` | Keepalive (every 15 seconds) |
+ | `connection.open` | SSE connection established |
+ | `connection.close` | SSE connection closing |
+ | `error` | Error occurred |
+
+ **TypeScript Support:**
+
+ ```typescript
+ import type {
+   SSEEvent,
+   JobProgressEvent,
+   JobCompletedEvent,
+   DealFoundEvent
+ } from "@dealcrawl/sdk";
+
+ // Type-safe event handling
+ eventSource.addEventListener('job.progress', (event: MessageEvent) => {
+   const data = JSON.parse(event.data) as JobProgressEvent['data'];
+   console.log(`Progress: ${data.progress}%`);
+   console.log(`ETA: ${data.eta?.remainingFormatted}`);
+   console.log(`Deals found: ${data.stats?.dealsFound}`);
+ });
+
+ eventSource.addEventListener('job.completed', (event: MessageEvent) => {
+   const data = JSON.parse(event.data) as JobCompletedEvent['data'];
+   console.log('Completed in:', data.durationMs, 'ms');
+   console.log('Summary:', data.summary);
+ });
+ ```
+
+ **Features:**
+
+ - ✅ Automatic reconnection on disconnect
+ - ✅ Event replay via `Last-Event-ID` (up to 50 missed events)
+ - ✅ Keepalive pings every 15 seconds
+ - ✅ Max connection time: 1 hour (auto-reconnect after)
+ - ✅ Multi-tenant isolation (only see your events)
+ - ✅ Token-based auth (works with EventSource)
+
+ **Security:**
+
+ - Tokens expire after 5 minutes
+ - Tokens can be restricted to specific jobs
+ - Tokens stored in Redis (revocable)
+ - Connection limits per tier (Free: 2, Pro: 10, Enterprise: 50)
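Because tokens expire after 5 minutes, a long-lived page has to mint a new token and resubscribe. A minimal sketch built only from the calls shown above; the 30-second refresh margin, the `handlers` object, and treating `expiresAt` as an ISO string are illustrative assumptions, not a documented API:

```typescript
// Keep an SSE subscription alive past the 5-minute token TTL by refreshing early.
const handlers = {
  onEvent: (event: MessageEvent) => console.log('Event:', event.type),
  onError: (error: unknown) => console.error('SSE error:', error),
};

let { token, expiresAt } = await client.auth.generateSSEToken();
let eventSource = client.events.subscribe(token, handlers);

function scheduleRefresh(expiry: string) {
  const msLeft = new Date(expiry).getTime() - Date.now();
  setTimeout(async () => {
    ({ token, expiresAt } = await client.auth.generateSSEToken()); // fresh 5-minute token
    eventSource.close();                                           // drop the old connection
    eventSource = client.events.subscribe(token, handlers);        // resubscribe with the new token
    scheduleRefresh(expiresAt);
  }, Math.max(msLeft - 30_000, 0)); // refresh ~30s before expiry
}

scheduleRefresh(expiresAt);
```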
+
  ## January 2026 Features in Detail

  ### 📸 Screenshot Storage (SEC-011)
@@ -288,6 +446,8 @@ const batch = await client.scrape.batch({
      detectSignals: true,
      timeout: 30000,
    },
+   delayMs: 500, // ✨ Was: delay
+   ignoreInvalidURLs: true, // ✨ NEW: Skip invalid URLs instead of failing
  });

  // Get batch status
@@ -299,14 +459,14 @@ const results = await client.waitForAll(batch.jobIds);

  **Batch Options:**

- | Option | Type | Default | Description |
- | ------------ | ------ | -------- | ----------------------------------------- |
- | `urls` | array | required | 1-100 URL objects with optional overrides |
- | `defaults` | object | - | Default options applied to all URLs |
- | `priority` | number | 5 | Priority 1-10 (higher = faster) |
- | `delay` | number | 0 | Delay between URLs (0-5000ms) |
- | `webhookUrl` | string | - | Webhook for batch completion |
- | `ref` | string | - | Custom reference ID for tracking |
+ | Option | Type | Default | Description |
+ | ------------------ | ------- | -------- | ---------------------------------------------------- |
+ | `urls` | array | required | 1-100 URL objects with optional overrides |
+ | `defaults` | object | - | Default options applied to all URLs |
+ | `priority` | number | 5 | Priority 1-10 (higher = faster) |
+ | `delayMs` | number | 0 | Delay between URLs (0-5000ms) |
+ | `webhookUrl` | string | - | Webhook for batch completion |
+ | `ignoreInvalidURLs`| boolean | false | Continue on invalid URLs (Firecrawl-compatible) |

  ### Search - Web Search with AI

@@ -314,7 +474,7 @@ const results = await client.waitForAll(batch.jobIds);
  // Basic search
  const job = await client.search.create({
    query: "laptop deals black friday",
-   maxResults: 20,
+   limit: 20, // ✨ Was: maxResults
  });

  // AI-optimized search with deal scoring
@@ -329,8 +489,8 @@ const job = await client.search.create({
  // Search with auto-scraping of results
  const job = await client.search.create({
    query: "promo codes electronics",
-   autoScrape: true,
-   autoScrapeLimit: 5,
+   scrapeResults: true, // ✨ Was: autoScrape
+   maxScrapeResults: 5, // ✨ Was: autoScrapeLimit
  });

  // Filtered search
@@ -340,7 +500,7 @@ const job = await client.search.create({
      location: "fr",
      language: "fr",
      dateRange: "month",
-     domains: ["amazon.fr", "cdiscount.com"],
+     domain: "amazon.fr", // Single domain filter
    },
  });

@@ -359,14 +519,14 @@ const result = await client.searchAndWait({
  | Option | Type | Default | Description |
  | ------------------- | ------- | -------- | ----------------------------------------------- |
  | `query` | string | required | Search query |
- | `maxResults` | number | 10 | Results to return (1-100) |
+ | `limit` | number | 10 | Results to return (1-100) |
  | `useAiOptimization` | boolean | false | AI-enhance the query |
  | `aiProvider` | string | "openai" | "openai" or "anthropic" |
  | `aiModel` | string | - | Model ID (gpt-4o-mini, claude-3-5-sonnet, etc.) |
  | `useDealScoring` | boolean | false | Score results for deal relevance |
- | `autoScrape` | boolean | false | Auto-scrape top results |
- | `autoScrapeLimit` | number | 3 | Number of results to scrape |
- | `filters` | object | - | Location, language, date, domains |
+ | `scrapeResults` | boolean | false | Auto-scrape top results |
+ | `maxScrapeResults` | number | 5 | Number of results to scrape (1-10) |
+ | `filters` | object | - | Location, language, date, domain |

  ### Crawl - Website Crawling

@@ -793,20 +953,18 @@ const stats = await client.keys.getStats(keyId, { days: 30 });

  **Available Scopes:**

- | Scope | Endpoint | Description |
- | ----------------- | ----------------------- | ------------------------ |
- | `scrape` | `POST /v1/scrape` | Create scrape jobs |
- | `scrape:batch` | `POST /v1/scrape/batch` | Create batch scrape jobs |
- | `search` | `POST /v1/search` | Create search jobs |
- | `crawl` | `POST /v1/crawl` | Create crawl jobs |
- | `dork` | `POST /v1/dork` | Create dork searches |
- | `extract` | `POST /v1/extract` | Create extraction jobs |
- | `agent` | `POST /v1/agent` | Create AI agent jobs |
- | `status` | `GET /v1/status/:id` | Read job status |
- | `data:read` | `GET /v1/data/*` | Read jobs/deals |
- | `data:export` | `GET /v1/data/export` | Export data |
- | `keys:manage` | `/v1/keys` | Manage API keys |
- | `webhooks:manage` | `/v1/webhooks` | Manage webhooks |
+ | Scope | Endpoint | Description |
+ | ----------------- | --------------------------------- | ------------------------- |
+ | `scrape` | `POST /v1/scrape`, `/v1/scrape/batch` | Create scrape jobs |
+ | `crawl` | `POST /v1/crawl` | Create crawl jobs |
+ | `dork` | `POST /v1/dork` | Create dork searches |
+ | `extract` | `POST /v1/extract` | Create extraction jobs |
+ | `agent` | `POST /v1/agent` | Create AI agent jobs |
+ | `status` | `GET /v1/status/:id` | Read job status |
+ | `data:read` | `GET /v1/data/*` | Read jobs/deals |
+ | `data:export` | `GET /v1/data/export` | Export data |
+ | `keys:manage` | `/v1/keys` | Manage API keys |
+ | `webhooks:manage` | `/v1/webhooks` | Manage webhooks |

  **Scope Examples:**

@@ -832,7 +990,6 @@ await client.keys.create({
      "dork",
      "extract",
      "agent",
-     "search",
      "status",
      "data:read",
      "data:export",
@@ -1152,6 +1309,58 @@ const client = new DealCrawl({

  > **Warning:** Never expose your API key in client-side code. Use a backend proxy or edge function.

+ ## Migration Guide (v2.10.x → v2.11.0)
+
+ ### SearchOptions
+
+ ```diff
+ const result = await client.search.create({
+   query: "laptop deals",
+ - maxResults: 20,
+ + limit: 20,
+ - autoScrape: true,
+ + scrapeResults: true,
+ - autoScrapeLimit: 5,
+ + maxScrapeResults: 5,
+ });
+ ```
+
+ ### BatchScrapeOptions
+
+ ```diff
+ const batch = await client.scrape.batch({
+   urls: [...],
+ - delay: 500,
+ + delayMs: 500,
+ + ignoreInvalidURLs: true, // NEW: Firecrawl-compatible
+ });
+ ```
+
+ ### ExtractModel
+
+ ```diff
+ const job = await client.extract.create({
+   url: "...",
+ - model: "claude-3-haiku",
+ + model: "claude-3-5-haiku-20241022",
+ });
+ ```
+
+ ### ApiKeyScope
+
+ ```diff
+ await client.keys.create({
+   name: "My Key",
+   scopes: [
+     "scrape",
+ - "scrape:batch", // REMOVED - use "scrape" instead
+ - "search", // REMOVED - use "scrape" instead
+     "crawl",
+     "status",
+   ],
+ });
+ ```
+

  ## Compatibility
  - **Node.js**: 18.0+