npm - @ooky/sdk - Versions diffs - 0.1.0 → 0.6.0 - Mend

@ooky/sdk 0.1.0 → 0.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (13)

package/CHANGELOG.md ADDED

@@ -0,0 +1,109 @@
+# Changelog
+All notable changes to `@ooky/sdk`. Versions follow [semver](https://semver.org);
+pre-1.0, minor versions may include breaking changes (called out explicitly).
+## 0.6.0 — 2026-06-19
+### Fixed
+- **Middleware fails safe on missing config (high).** `createOokyHandler` throws
+  when `apiKey`/`domain` are missing, and the adapters built the handler at
+  module load — so an unset or typo'd `OOKY_API_KEY` / `OOKY_DOMAIN` threw on
+  construction and **500'd the customer's entire site** (the middleware runs on
+  every request, e.g. Vercel's `MIDDLEWARE_INVOCATION_FAILED`). The Express,
+  Next, and Edge adapters now wrap construction: on failure they log one loud
+  line and return a pass-through middleware, so a misconfigured integration
+  disables Ooky **without taking down the host app**. Complements the existing
+  per-request pass-through hardening.
+## 0.5.0 — 2026-06-12
+### Fixed
+- **Crash hardening (high).** A malformed `/api/public/bots` payload — an
+  empty-string pattern, a non-string/`null` entry, a backend row-mapping
+  change, a customer `options.bots` typo, or JSON corruption in transit —
+  could throw in `detectBot`. In the Express adapter that throw ran *outside*
+  the try/catch and became an unhandled rejection that crashed the customer's
+  process for **all** of their traffic, recurring on restart. `detectBot` is
+  now defensive (skips non-string/empty patterns, tolerates a non-array or
+  huge registry, never throws); registries are validated and capped on both
+  construction and refresh; and both adapters wrap `detectBot`/`matchPath` so
+  any throw degrades to pass-through (serve the customer's app) rather than
+  crashing.
+### Added
+- **`geo.country` on events.** The Express and Next/Edge adapters now read the
+  visitor country from edge headers (`cf-ipcountry`, `x-vercel-ip-country`,
+  or `request.geo?.country`) and attach `geo: { country }` to both bot and
+  `ai_referral` events — the dashboard geo panel is no longer empty for SDK
+  customers. (CF placeholders `XX`/`T1` are dropped.)
+- **MCP body-size cap on Next/Edge.** The Next/Edge adapter now rejects an
+  oversized MCP `POST` with a `413` via a `Content-Length` pre-check (parity
+  with the Express adapter's existing 64KB streaming cap). `MAX_MCP_BODY_BYTES`
+  is exported from `core.js`.
+- **`you` / `phind` utm_source attribution** — added to `UTM_SOURCES` for
+  byte-parity with the Worker and WordPress tiers.
+### Changed
+- **`X-Ooky-Sdk` header reflects the runtime.** Was hardcoded `node/<ver>`
+  even on Vercel Edge / Web runtimes; now `edge/<ver>` (or `web/<ver>`) there.
+  `SDK_RUNTIME` is exported from `core.js`.
+- **Untrusted strings are capped before they enter the event payload** — bot
+  UA at 1024 chars, request path at 2048 (`MAX_UA_LENGTH`, `MAX_PATH_LENGTH`,
+  `clampString` exported from `core.js`). Defence-in-depth; the load-bearing
+  clamp remains server-side.
+- The JSON-kind manifest **network-failure** path now returns a JSON `{error}`
+  body with the declared `application/json` content-type (previously returned
+  `text/plain` with a `502`, mismatching the declared type for `manifest`/`mcp`
+  kinds).
+- Docs: `handleMcpInvocation` JSDoc + README now document both the JSON-RPC 2.0
+  and legacy `{tool,arguments}` protocols; the README config example lists
+  `onError` and `maxEventsPerMinute`.
+## 0.4.0 — 2026-06-09
+### Added
+- **Standard MCP server.** `POST /mcp` (and `/.well-known/mcp`) now speaks
+  MCP JSON-RPC 2.0 — `initialize`, `tools/list`, `tools/call`, `ping` — so
+  real MCP clients (Claude, MCP Inspector) can connect and call
+  `get_brand_info`. The legacy `{ tool, arguments }` protocol still works.
+- **AI referral attribution.** Human visits arriving from AI platforms
+  (ChatGPT, Perplexity, Claude, Gemini, Copilot, …) — detected via the
+  `Referer` header or `utm_source` — fire `ai_referral` events. Same
+  platform list as the Worker tier (`src/referrals.js`).
+- **`onError` option.** Called for every failure the SDK swallows: event
+  POST rejections and non-2xx responses (a `401` means a rotated/revoked
+  key), manifest fetch failures, registry refresh failures, throttle drops.
+- **`maxEventsPerMinute` option** (default 300). Token-bucket cap on event
+  POSTs so a bot storm can't turn your server into an unbounded POST
+  source. Drops are reported through `onError`.
+- Edge-runtime test suite — the adapter tests now also run under Vercel's
+  `edge-runtime` VM in CI, backing the "edge-safe" claim with a real check.
+### Changed
+- Unparseable `POST /mcp` bodies now return a JSON-RPC parse error
+  (`-32700`, HTTP 200) instead of a bare 400/500.
+## 0.3.0 — 2026-06-09
+### Added
+- MCP tool invocation on `POST /mcp` (legacy `{ tool, arguments }` shape).
+- `manifest_file` telemetry on bot events for manifest-path hits.
+## 0.2.0 — 2026-06-09
+### Added
+- Hard timeout on every upstream fetch (`fetchTimeoutMs`, default 10s).
+- In-memory manifest cache (`manifestCacheTtlMs`, default 5 min) with
+  stale-on-error serving (up to 24h) and in-flight request dedupe.
+- Automatic hourly bot-registry refresh (ETag-aware) — previously the
+  `autoRefreshBots` option existed but nothing triggered it.
+- Bare `/mcp` path (Worker parity).
+- TypeScript declarations for all entry points.
+- Next adapter: `event.waitUntil()` registration for background work.
+## 0.1.x
+Initial releases: well-known path serving (`/llms.txt`, `/llms-full.txt`,
+`/agents.md`, `/.well-known/ai-manifest.json`, `/.well-known/mcp`), bot
+detection with fire-and-forget events, Express/Next/Edge adapters.
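
Note on the 0.6.0 entry: the changelog describes adapters that wrap handler construction and fall back to a pass-through middleware when configuration is missing. The adapter source is not part of this excerpt, so the following is only a minimal sketch of that pattern under stated assumptions. `createOokyHandler` here is a hypothetical stand-in that only reproduces the documented "throws when `apiKey`/`domain` are missing" behaviour, and `safeOokyMiddleware` is an illustrative name, not an SDK export.

```javascript
// Hypothetical stand-in for the SDK's createOokyHandler. Per the changelog,
// the real one throws when apiKey or domain is missing; everything else
// about it is simplified away here.
function createOokyHandler(options) {
  if (!options || !options.apiKey || !options.domain) {
    throw new Error("ooky: apiKey and domain are required");
  }
  // Placeholder behaviour: this stub never handles the request itself.
  return async () => ({ handled: false });
}

// Illustrative Express-style adapter (name is an assumption).
function safeOokyMiddleware(options, log = console.error) {
  let handler;
  try {
    handler = createOokyHandler(options);
  } catch (err) {
    // Log one line, then return a middleware that only calls next(),
    // so a misconfiguration disables Ooky instead of failing every request.
    log(`[ooky] disabled, passing all requests through: ${err.message}`);
    return (req, res, next) => next();
  }
  return (req, res, next) => {
    // A rejected or throwing handler is treated as "not handled".
    Promise.resolve()
      .then(() => handler(req, res))
      .then((result) => Boolean(result && result.handled), () => false)
      .then((handled) => { if (!handled) next(); });
  };
}

// Usage: a missing key yields a pass-through middleware, not a thrown error.
const passThrough = safeOokyMiddleware({ domain: "example.com" }, () => {});
passThrough({}, {}, () => console.log("request reached the host app"));
```

The design point is that the `try`/`catch` sits around construction, which otherwise runs once at module load and would fail every request routed through the middleware.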

package/README.md CHANGED

@@ -73,10 +73,37 @@ The SDK responds to these paths with the latest published manifest:
 | `/.well-known/ai-manifest.json` | Full JSON brand manifest (global + per-page) |
 | `/ai-manifest.json` | Same as above (alternate path) |
 | `/agents.md` | Markdown agent guide |
-| `/.well-known/mcp` | MCP server descriptor |
+| `/.well-known/mcp` | MCP server descriptor (GET) / tool invocation (POST) |
+| `/mcp` | Same as above (alternate path — some platforms intercept `/.well-known/*`) |
 Every other request passes through to your app unchanged.
+### MCP tool invocation
+`POST /mcp` (and `/.well-known/mcp`) speaks **two protocols** — pick whichever
+your client uses:
+- **Standard MCP — JSON-RPC 2.0** (what real MCP clients use: Claude, MCP
+  Inspector, ChatGPT connectors). Send `initialize`, `tools/list`, then
+  `tools/call`:
+  ```json
+  { "jsonrpc": "2.0", "id": 1, "method": "tools/call",
+    "params": { "name": "get_brand_info", "arguments": { "section": "about" } } }
+  ```
+  The SDK answers with a single JSON response (no SSE stream required).
+  Notifications (`notifications/*`) get `202 Accepted`; an unparseable body
+  returns a JSON-RPC parse error (`-32700`, HTTP 200 per spec).
+- **Legacy Ooky protocol** — `{ "tool": "get_brand_info", "arguments": { "section": "about" } }`
+  → `{ "result": … }`, kept for Worker-tier compatibility.
+Both answer `get_brand_info` from the published manifest (same cache and
+stale-on-error behavior as the other paths). Product tools
+(`search_products`, …) are Worker-tier only; the SDK returns a tool-not-found
+error for them.
 ## What gets logged
 For **every** request (manifest or not), the SDK checks the `User-Agent` against the bot registry. When a known AI bot is detected, it fires a fire-and-forget POST to `/api/ingest/events` with:
@@ -85,14 +112,20 @@ For **every** request (manifest or not), the SDK checks the `User-Agent` against
 {
   "event_id": "<uuid>",
   "timestamp": "<ISO 8601>",
-  "bot": { "name": "GPTBot", "verified": true, "ua_string": "<full UA>" },
+  "bot": { "name": "GPTBot", "verified": false, "ua_string": "<full UA>" },
   "request": { "page_path": "/pricing", "method": "GET" }
 }
 ```
 The event scope (which domain it belongs to) is determined server-side from your API key — you cannot accidentally log events for a different customer's domain.
-Human traffic produces no events.
+**AI referral attribution:** when a *human* arrives from an AI platform —
+detected via the `Referer` header (chatgpt.com, perplexity.ai, claude.ai,
+gemini.google.com, …) or `utm_source` (`?utm_source=chatgpt`) — the SDK fires
+an `ai_referral` event instead, powering the dashboard's attribution views.
+Same platform list as the Worker tier.
+All other human traffic produces no events.
 ## Configuration options
@@ -107,6 +140,10 @@ ookyMiddleware({
   cdnBase: "https://api.ooky.ai/api/public/manifest",       // Manifest source (default = apiBase + "/public/manifest")
   bots: undefined,                       // Override the bot registry; default ships with major AI bots
   autoRefreshBots: true,                 // Periodically refresh bot UA list from /api/public/bots
+  fetchTimeoutMs: 10000,                 // Hard timeout on every upstream fetch
+  manifestCacheTtlMs: 300000,            // In-memory manifest cache TTL (0 disables)
+  maxEventsPerMinute: 300,               // Token-bucket cap on event POSTs (Infinity disables)
+  onError: (err, ctx) => {},             // Surface swallowed failures (e.g. a 401 = rotated key)
 });
 ```
@@ -117,25 +154,35 @@ ookyMiddleware({
 | `apiBase` | `string` | `https://api.ooky.ai/api` | Override for self-hosted Ooky or staging. |
 | `cdnBase` | `string` | `${apiBase}/public/manifest` | Manifest source. By default the SDK fetches from Ooky's public manifest endpoint. Override to put your own CDN (Cloudflare, CloudFront, Fastly) in front. |
 | `bots` | `Array<{name, pattern, category}>` | Built-in default list | Ships with the major AI bots. Override only if you have custom UA patterns. |
-| `autoRefreshBots` | `boolean` | `true` | Refresh from `/api/public/bots` once an hour. Disable for fully offline use. |
+| `autoRefreshBots` | `boolean` | `true` | Refresh from `/api/public/bots` once an hour (ETag-aware). Disable for fully offline use. |
+| `fetchTimeoutMs` | `number` | `10000` | Abort upstream fetches (manifest, registry, events) after this many ms so a slow Ooky API can never hang your request path. |
+| `manifestCacheTtlMs` | `number` | `300000` | Manifest responses are cached in-memory per process. On upstream failure (network error or 5xx), a stale copy up to 24h old is served instead of an error. Set `0` to disable. |
+| `onError` | `(error, context) => void` | silent | Called for every failure the SDK swallows: event POST rejections **and non-2xx responses** (a `401` means your key was rotated/revoked), manifest fetch failures, registry refresh failures. `context` is `{ op, status?, kind?, throttled? }`. Wire it to your logger so a dead integration is visible: `onError: (e, ctx) => logger.warn("ooky", ctx.op, e.message)`. |
+| `maxEventsPerMinute` | `number` | `300` | Token-bucket cap on event POSTs — a bot storm can't turn your server into an unbounded POST source. Drops are reported through `onError` (at most once per 10s, with a count). Pass `Infinity` to disable. |
+TypeScript declarations ship with the package (`@ooky/sdk`, `/express`, `/next`, `/edge` are all typed) — no `@types/*` install needed.
-## Performance
+## Performance & resilience
-- The manifest fetch is HTTP-cached (`Cache-Control: public, max-age=300, s-maxage=600`) — your CDN/edge will serve repeat requests without hitting Ooky.
-- Event firing uses `fetch(..., { keepalive: true })` so it survives the response cycle without delaying it.
+- Manifest responses are cached in-memory for 5 minutes, with concurrent cold-cache requests deduped into a single upstream fetch.
+- If the Ooky API is unreachable or erroring, the SDK serves the last good copy (up to 24h old) — a transient Ooky outage never breaks your `/llms.txt`.
+- Every upstream fetch carries a hard 10s timeout (`AbortSignal.timeout`), so your request path can never hang on Ooky.
+- The manifest response also carries `Cache-Control: public, max-age=300, s-maxage=600` — your CDN/edge will serve repeat requests without hitting your origin at all.
+- Event firing uses `fetch(..., { keepalive: true })` so it survives the response cycle without delaying it. On Vercel Edge / Next middleware the SDK registers the event POST with `event.waitUntil()` automatically.
 - Bot detection is a substring check against an in-memory list — sub-millisecond per request.
 ## Troubleshooting
 **"I installed it but no events show up"**
-1. Confirm your domain is verified and the integration method is set to `sdk` (or `wordpress`) in the dashboard.
-2. Check that `process.env.OOKY_API_KEY` is actually set in your runtime — log it once at boot.
-3. Hit your site with a bot UA: `curl -H "User-Agent: GPTBot/1.0" https://your-site.com/` and watch the dashboard's AI Sessions page.
-4. If your app is behind a CDN that strips `User-Agent`, the SDK can't see the bot. Check your CDN config.
+1. Set the `onError` option to log swallowed failures — a repeated `recordEvent` error with `status: 401` means the key was rotated or revoked.
+2. Confirm your domain is verified and the integration method is set to `sdk` (or `wordpress`) in the dashboard.
+3. Check that `process.env.OOKY_API_KEY` is actually set in your runtime — log it once at boot.
+4. Hit your site with a bot UA: `curl -H "User-Agent: GPTBot/1.0" https://your-site.com/` and watch the dashboard's AI Sessions page.
+5. If your app is behind a CDN that strips `User-Agent`, the SDK can't see the bot. Check your CDN config.
 **"`/llms.txt` returns 404"**
 - The middleware only intercepts paths the SDK knows about. Make sure your framework's matcher passes those paths to the middleware before falling through to your routes.
-- If you've published the manifest in the dashboard, also check Ooky's edge CDN is reachable from your server: `curl https://edge.ooky.ai/<your-domain>/llms`.
+- If you've published the manifest in the dashboard, also check the manifest source is reachable from your server: `curl https://api.ooky.ai/api/public/manifest/<your-domain>/llms` (or your `cdnBase` override).
 **"Events fail with 401 Unauthorized"**
 - The API key has been revoked or rotated. Generate a new one from the dashboard and update the env var.
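
Note on `maxEventsPerMinute`: the README describes a token-bucket cap whose drops are reported through `onError` at most once per 10s, with a count. The SDK's own limiter is not shown in this excerpt, so the sketch below is one possible reading of that description, not the actual implementation. `createEventThrottle`, the injectable `now` clock, the error message, and the `op: "recordEvent"` value on throttle reports are all assumptions made for illustration.

```javascript
// Illustrative token bucket (hypothetical names, not SDK exports).
// Refills continuously at maxPerMinute tokens per 60s, up to maxPerMinute.
function createEventThrottle(maxPerMinute, onError = () => {}, now = Date.now) {
  let tokens = maxPerMinute;
  let lastRefill = now();
  let dropped = 0;             // drops since the last report
  let lastReport = -Infinity;  // time of the last onError call

  return function tryAcquire() {
    if (maxPerMinute === Infinity) return true; // cap disabled
    const t = now();
    const refill = ((t - lastRefill) / 60000) * maxPerMinute;
    tokens = Math.min(maxPerMinute, tokens + refill);
    lastRefill = t;
    if (tokens >= 1) {
      tokens -= 1;
      return true; // caller may send the event POST
    }
    dropped += 1;
    // Report at most once per 10s, carrying the number of drops so far.
    if (t - lastReport >= 10000) {
      onError(new Error(`ooky: dropped ${dropped} event(s)`), {
        op: "recordEvent", // assumed op name for throttle reports
        throttled: true,
      });
      dropped = 0;
      lastReport = t;
    }
    return false; // caller should skip the event POST
  };
}

// Usage with a fake clock so the behaviour is deterministic.
let clock = 0;
const acquire = createEventThrottle(2, (err) => console.warn(err.message), () => clock);
acquire();      // true
acquire();      // true
acquire();      // false, reported through onError
clock = 30000;  // half a minute later, one token has refilled
acquire();      // true
```

Injecting the clock is only a testing convenience for this sketch; it says nothing about how the SDK measures time.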

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@ooky/sdk",
-  "version": "0.1.0",
+  "version": "0.6.0",
   "description": "Ooky SDK — middleware for serving AI brand intelligence and capturing AI-bot analytics from your Node, Next.js, or Vercel Edge app.",
   "keywords": [
     "ai",
@@ -16,12 +16,12 @@
   ],
   "homepage": "https://ooky.ai",
   "bugs": {
-    "url": "https://github.com/ooky-ai/ooky/issues",
+    "url": "https://github.com/cloudweld/ooky/issues",
     "email": "support@ooky.ai"
   },
   "repository": {
     "type": "git",
-    "url": "git+https://github.com/ooky-ai/ooky.git",
+    "url": "git+https://github.com/cloudweld/ooky.git",
     "directory": "packages/sdk"
   },
   "license": "MIT",
@@ -32,16 +32,33 @@
   },
   "type": "module",
   "main": "./src/core.js",
+  "types": "./src/index.d.ts",
   "exports": {
-    ".": "./src/core.js",
-    "./core": "./src/core.js",
-    "./express": "./src/express.js",
-    "./next": "./src/next.js",
-    "./edge": "./src/edge.js"
+    ".": {
+      "types": "./src/index.d.ts",
+      "default": "./src/core.js"
+    },
+    "./core": {
+      "types": "./src/index.d.ts",
+      "default": "./src/core.js"
+    },
+    "./express": {
+      "types": "./src/express.d.ts",
+      "default": "./src/express.js"
+    },
+    "./next": {
+      "types": "./src/next.d.ts",
+      "default": "./src/next.js"
+    },
+    "./edge": {
+      "types": "./src/edge.d.ts",
+      "default": "./src/edge.js"
+    }
   },
   "files": [
     "src/",
     "README.md",
+    "CHANGELOG.md",
     "LICENSE"
   ],
   "scripts": {
@@ -57,6 +74,7 @@
     "node": ">=18"
   },
   "devDependencies": {
+    "@edge-runtime/vm": "^5.0.0",
     "express": "^4.18.2",
     "supertest": "^6.3.4",
     "vitest": "^3.2.4"

package/src/bots.js CHANGED

@@ -43,15 +43,57 @@ export const DEFAULT_BOTS = [
   { name: "ia_archiver", pattern: "ia_archiver", category: "other" },
 ];
+/**
+ * Hard cap on how many registry entries we'll ever scan per request. A
+ * malformed (or maliciously huge) /api/public/bots payload must never turn
+ * bot detection into an unbounded per-request loop. Mirrors the sanity cap
+ * applied on adoption in core.js.
+ */
+export const MAX_BOT_REGISTRY_ENTRIES = 2000;
 /**
  * Returns the matched bot { name, pattern, category } or null.
  * Case-insensitive substring match (the same logic the Worker uses).
+ *
+ * Defensive by contract: this runs on EVERY customer request, and a throw
+ * here (in Express) becomes an unhandled rejection that can crash the
+ * customer's process for all of their traffic. So it must NEVER throw and
+ * must tolerate a garbage registry (non-array, null/non-string patterns,
+ * empty-string patterns, huge arrays). Bad entries are skipped, not matched.
  */
 export function detectBot(userAgent, registry = DEFAULT_BOTS) {
   if (!userAgent || typeof userAgent !== "string") return null;
+  if (!Array.isArray(registry) || registry.length === 0) return null;
   const ua = userAgent.toLowerCase();
-  for (const b of registry) {
+  const limit = Math.min(registry.length, MAX_BOT_REGISTRY_ENTRIES);
+  for (let i = 0; i < limit; i++) {
+    const b = registry[i];
+    // Skip anything that isn't a usable { pattern: <non-empty string> }.
+    // An empty pattern would substring-match every UA — treat as invalid.
+    if (!b || typeof b.pattern !== "string" || b.pattern.length === 0) continue;
     if (ua.includes(b.pattern.toLowerCase())) return b;
   }
   return null;
 }
+/**
+ * Validate + cap an arbitrary bot registry before we adopt it (from the
+ * /api/public/bots endpoint or a customer's `options.bots`). Returns a new
+ * array containing only well-formed entries (object with a non-empty string
+ * `pattern`), capped at MAX_BOT_REGISTRY_ENTRIES. Returns null when the input
+ * isn't a usable array so callers can keep their current registry instead.
+ *
+ * Never throws — the whole point is to neutralise a bad payload at the seam
+ * rather than let it reach detectBot on the hot path.
+ */
+export function sanitizeBotRegistry(input) {
+  if (!Array.isArray(input)) return null;
+  const out = [];
+  for (const b of input) {
+    if (out.length >= MAX_BOT_REGISTRY_ENTRIES) break;
+    if (!b || typeof b !== "object") continue;
+    if (typeof b.pattern !== "string" || b.pattern.length === 0) continue;
+    out.push(b);
+  }
+  return out;
+}
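
Note on the hardened `bots.js`: the two functions below are copied from the added lines of the hunk above, with `export` and the `DEFAULT_BOTS` default parameter dropped so the snippet runs standalone. The `messy` registry and the user-agent string are made-up inputs, chosen to exercise the malformed cases the changelog lists (a `null` entry, an empty-string pattern, a non-string pattern).

```javascript
const MAX_BOT_REGISTRY_ENTRIES = 2000;

// Same logic as the diff; the registry must be passed explicitly here.
function detectBot(userAgent, registry) {
  if (!userAgent || typeof userAgent !== "string") return null;
  if (!Array.isArray(registry) || registry.length === 0) return null;
  const ua = userAgent.toLowerCase();
  const limit = Math.min(registry.length, MAX_BOT_REGISTRY_ENTRIES);
  for (let i = 0; i < limit; i++) {
    const b = registry[i];
    // Skip entries without a non-empty string pattern.
    if (!b || typeof b.pattern !== "string" || b.pattern.length === 0) continue;
    if (ua.includes(b.pattern.toLowerCase())) return b;
  }
  return null;
}

function sanitizeBotRegistry(input) {
  if (!Array.isArray(input)) return null;
  const out = [];
  for (const b of input) {
    if (out.length >= MAX_BOT_REGISTRY_ENTRIES) break;
    if (!b || typeof b !== "object") continue;
    if (typeof b.pattern !== "string" || b.pattern.length === 0) continue;
    out.push(b);
  }
  return out;
}

// Made-up registry containing the kinds of bad rows the changelog mentions.
const messy = [
  null,
  { name: "Empty", pattern: "" },
  { name: "Numeric", pattern: 42 },
  { name: "GPTBot", pattern: "GPTBot" },
];

const hit = detectBot("Mozilla/5.0 (compatible; gptbot/1.0)", messy);
console.log(hit && hit.name);                    // GPTBot
console.log(sanitizeBotRegistry(messy).length);  // 1
console.log(sanitizeBotRegistry("not-an-array")); // null
console.log(detectBot("GPTBot/1.0", "not-an-array")); // null
```

Skipping the empty-string pattern matters because `String.prototype.includes("")` is always true, so without that guard the `Empty` row would match every user agent.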