@ooky/sdk 0.1.0 → 0.6.0

package/CHANGELOG.md ADDED
@@ -0,0 +1,109 @@
+ # Changelog
+
+ All notable changes to `@ooky/sdk`. Versions follow [semver](https://semver.org);
+ pre-1.0, minor versions may include breaking changes (called out explicitly).
+
+ ## 0.6.0 — 2026-06-19
+
+ ### Fixed
+ - **Middleware fails safe on missing config (high).** `createOokyHandler` throws
+ when `apiKey`/`domain` are missing, and the adapters built the handler at
+ module load — so an unset or typo'd `OOKY_API_KEY` / `OOKY_DOMAIN` threw on
+ construction and **500'd the customer's entire site** (the middleware runs on
+ every request, e.g. Vercel's `MIDDLEWARE_INVOCATION_FAILED`). The Express,
+ Next, and Edge adapters now wrap construction: on failure they log one loud
+ line and return a pass-through middleware, so a misconfigured integration
+ disables Ooky **without taking down the host app**. Complements the existing
+ per-request pass-through hardening.
+
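The fail-safe construction described in the 0.6.0 entry can be sketched roughly as follows. This is an illustration only, not the SDK's code: `failSafeMiddleware` and `buildHandler` are hypothetical names, and an Express-style `(req, res, next)` signature is assumed.

```javascript
// Hypothetical helper: `buildHandler` stands in for any factory that may throw
// on bad config (the changelog names `createOokyHandler` as one such factory).
function failSafeMiddleware(buildHandler, log = console.error) {
  try {
    return buildHandler();
  } catch (err) {
    // One loud line, then disable the integration instead of the host app.
    log(`[ooky] disabled: ${err && err.message ? err.message : err}`);
    return function passThrough(req, res, next) {
      next();
    };
  }
}

// Usage: a factory that throws because the API key is missing.
const middleware = failSafeMiddleware(
  () => {
    throw new Error("apiKey is required");
  },
  () => {} // silence the log line for this demo
);
let passedThrough = false;
middleware({}, {}, () => {
  passedThrough = true; // the host app's next handler still runs
});
```

The point of the design is that a construction error is converted into a no-op middleware once, at startup, rather than being rethrown on every request.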
+ ## 0.5.0 — 2026-06-12
+
+ ### Fixed
+ - **Crash hardening (high).** A malformed `/api/public/bots` payload — an
+ empty-string pattern, a non-string/`null` entry, a backend row-mapping
+ change, a customer `options.bots` typo, or JSON corruption in transit —
+ could throw in `detectBot`. In the Express adapter that throw ran *outside*
+ the try/catch and became an unhandled rejection that crashed the customer's
+ process for **all** of their traffic, recurring on restart. `detectBot` is
+ now defensive (skips non-string/empty patterns, tolerates a non-array or
+ huge registry, never throws); registries are validated and capped on both
+ construction and refresh; and both adapters wrap `detectBot`/`matchPath` so
+ any throw degrades to pass-through (serve the customer's app) rather than
+ crashing.
+
+ ### Added
+ - **`geo.country` on events.** The Express and Next/Edge adapters now read the
+ visitor country from edge headers (`cf-ipcountry`, `x-vercel-ip-country`,
+ or `request.geo?.country`) and attach `geo: { country }` to both bot and
+ `ai_referral` events — the dashboard geo panel is no longer empty for SDK
+ customers. (CF placeholders `XX`/`T1` are dropped.)
+ - **MCP body-size cap on Next/Edge.** The Next/Edge adapter now rejects an
+ oversized MCP `POST` with a `413` via a `Content-Length` pre-check (parity
+ with the Express adapter's existing 64KB streaming cap). `MAX_MCP_BODY_BYTES`
+ is exported from `core.js`.
+ - **`you` / `phind` utm_source attribution** — added to `UTM_SOURCES` for
+ byte-parity with the Worker and WordPress tiers.
+
+ ### Changed
+ - **`X-Ooky-Sdk` header reflects the runtime.** Was hardcoded `node/<ver>`
+ even on Vercel Edge / Web runtimes; now `edge/<ver>` (or `web/<ver>`) there.
+ `SDK_RUNTIME` is exported from `core.js`.
+ - **Untrusted strings are capped before they enter the event payload** — bot
+ UA at 1024 chars, request path at 2048 (`MAX_UA_LENGTH`, `MAX_PATH_LENGTH`,
+ `clampString` exported from `core.js`). Defence-in-depth; the load-bearing
+ clamp remains server-side.
+ - The JSON-kind manifest **network-failure** path now returns a JSON `{error}`
+ body with the declared `application/json` content-type (previously returned
+ `text/plain` with a `502`, mismatching the declared type for `manifest`/`mcp`
+ kinds).
+ - Docs: `handleMcpInvocation` JSDoc + README now document both the JSON-RPC 2.0
+ and legacy `{tool,arguments}` protocols; the README config example lists
+ `onError` and `maxEventsPerMinute`.
+
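As a rough sketch of the `geo.country` behaviour in the 0.5.0 entry, the header lookup might look like the block below. `readCountry` and `withGeo` are hypothetical names; the lookup order, the lowercase-keyed `headers` object, and the two-letter length check are assumptions made for the example rather than documented SDK behaviour.

```javascript
// Cloudflare placeholder codes that carry no usable country (per the changelog).
const PLACEHOLDER_COUNTRIES = new Set(["XX", "T1"]);

// `headers` is a plain object with lowercase keys; `geo` mimics `request.geo`.
function readCountry(headers = {}, geo = undefined) {
  const raw =
    headers["cf-ipcountry"] ||
    headers["x-vercel-ip-country"] ||
    (geo && geo.country);
  if (typeof raw !== "string") return null;
  const code = raw.trim().toUpperCase();
  // Assumption: anything that is not a two-letter code is treated as unusable.
  if (code.length !== 2 || PLACEHOLDER_COUNTRIES.has(code)) return null;
  return code;
}

// Attach `geo: { country }` only when a usable country was found.
function withGeo(event, headers, geo) {
  const country = readCountry(headers, geo);
  return country ? { ...event, geo: { country } } : event;
}

// Usage: an ai_referral event arriving through Vercel.
const enriched = withGeo({ type: "ai_referral" }, { "x-vercel-ip-country": "FR" });
```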
+ ## 0.4.0 — 2026-06-09
+
+ ### Added
+ - **Standard MCP server.** `POST /mcp` (and `/.well-known/mcp`) now speaks
+ MCP JSON-RPC 2.0 — `initialize`, `tools/list`, `tools/call`, `ping` — so
+ real MCP clients (Claude, MCP Inspector) can connect and call
+ `get_brand_info`. The legacy `{ tool, arguments }` protocol still works.
+ - **AI referral attribution.** Human visits arriving from AI platforms
+ (ChatGPT, Perplexity, Claude, Gemini, Copilot, …) — detected via the
+ `Referer` header or `utm_source` — fire `ai_referral` events. Same
+ platform list as the Worker tier (`src/referrals.js`).
+ - **`onError` option.** Called for every failure the SDK swallows: event
+ POST rejections and non-2xx responses (a `401` means a rotated/revoked
+ key), manifest fetch failures, registry refresh failures, throttle drops.
+ - **`maxEventsPerMinute` option** (default 300). Token-bucket cap on event
+ POSTs so a bot storm can't turn your server into an unbounded POST
+ source. Drops are reported through `onError`.
+ - Edge-runtime test suite — the adapter tests now also run under Vercel's
+ `edge-runtime` VM in CI, backing the "edge-safe" claim with a real check.
+
+ ### Changed
+ - Unparseable `POST /mcp` bodies now return a JSON-RPC parse error
+ (`-32700`, HTTP 200) instead of a bare 400/500.
+
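The `maxEventsPerMinute` entry above describes a token-bucket cap. A minimal sketch of that technique, assuming a continuously refilling bucket, is shown below. `createEventThrottle` and its injectable `now` clock are hypothetical; the SDK's real refill granularity and drop reporting are not shown in this diff.

```javascript
// Hypothetical token bucket; `now` is injectable so the behaviour is testable.
function createEventThrottle(maxPerMinute = 300, now = Date.now) {
  if (maxPerMinute === Infinity) return { tryTake: () => true }; // cap disabled
  let tokens = maxPerMinute; // start with a full bucket
  let last = now();
  return {
    tryTake() {
      const t = now();
      // Refill in proportion to elapsed time, never above the bucket size.
      tokens = Math.min(maxPerMinute, tokens + ((t - last) * maxPerMinute) / 60000);
      last = t;
      if (tokens < 1) return false; // caller would drop the event and report it
      tokens -= 1;
      return true;
    },
  };
}

// Usage with a fake clock and a tiny budget of 2 events per minute.
let fakeNow = 0;
const throttle = createEventThrottle(2, () => fakeNow);
const results = [throttle.tryTake(), throttle.tryTake(), throttle.tryTake()];
fakeNow += 30000; // half a minute later, one token has been refilled
results.push(throttle.tryTake());
// results is [true, true, false, true]
```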
+ ## 0.3.0 — 2026-06-09
+
+ ### Added
+ - MCP tool invocation on `POST /mcp` (legacy `{ tool, arguments }` shape).
+ - `manifest_file` telemetry on bot events for manifest-path hits.
+
+ ## 0.2.0 — 2026-06-09
+
+ ### Added
+ - Hard timeout on every upstream fetch (`fetchTimeoutMs`, default 10s).
+ - In-memory manifest cache (`manifestCacheTtlMs`, default 5 min) with
+ stale-on-error serving (up to 24h) and in-flight request dedupe.
+ - Automatic hourly bot-registry refresh (ETag-aware) — previously the
+ `autoRefreshBots` option existed but nothing triggered it.
+ - Bare `/mcp` path (Worker parity).
+ - TypeScript declarations for all entry points.
+ - Next adapter: `event.waitUntil()` registration for background work.
+
+ ## 0.1.x
+
+ Initial releases: well-known path serving (`/llms.txt`, `/llms-full.txt`,
+ `/agents.md`, `/.well-known/ai-manifest.json`, `/.well-known/mcp`), bot
+ detection with fire-and-forget events, Express/Next/Edge adapters.
package/README.md CHANGED
@@ -73,10 +73,37 @@ The SDK responds to these paths with the latest published manifest:
  | `/.well-known/ai-manifest.json` | Full JSON brand manifest (global + per-page) |
  | `/ai-manifest.json` | Same as above (alternate path) |
  | `/agents.md` | Markdown agent guide |
- | `/.well-known/mcp` | MCP server descriptor |
+ | `/.well-known/mcp` | MCP server descriptor (GET) / tool invocation (POST) |
+ | `/mcp` | Same as above (alternate path — some platforms intercept `/.well-known/*`) |

  Every other request passes through to your app unchanged.

+ ### MCP tool invocation
+
+ `POST /mcp` (and `/.well-known/mcp`) speaks **two protocols** — pick whichever
+ your client uses:
+
+ - **Standard MCP — JSON-RPC 2.0** (what real MCP clients use: Claude, MCP
+ Inspector, ChatGPT connectors). Send `initialize`, `tools/list`, then
+ `tools/call`:
+
+ ```json
+ { "jsonrpc": "2.0", "id": 1, "method": "tools/call",
+ "params": { "name": "get_brand_info", "arguments": { "section": "about" } } }
+ ```
+
+ The SDK answers with a single JSON response (no SSE stream required).
+ Notifications (`notifications/*`) get `202 Accepted`; an unparseable body
+ returns a JSON-RPC parse error (`-32700`, HTTP 200 per spec).
+
+ - **Legacy Ooky protocol** — `{ "tool": "get_brand_info", "arguments": { "section": "about" } }`
+ → `{ "result": … }`, kept for Worker-tier compatibility.
+
+ Both answer `get_brand_info` from the published manifest (same cache and
+ stale-on-error behavior as the other paths). Product tools
+ (`search_products`, …) are Worker-tier only; the SDK returns a tool-not-found
+ error for them.
+
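To make the two-protocol behaviour above concrete, here is a hedged sketch of how a raw `POST /mcp` body could be classified. `classifyMcpBody` is a hypothetical function, not the SDK's `handleMcpInvocation`; the return shape and the final `400` fallback for unrecognized bodies are assumptions of this example.

```javascript
// Hypothetical classifier for a raw POST /mcp body; not the SDK's own code.
function classifyMcpBody(rawBody) {
  let body;
  try {
    body = JSON.parse(rawBody);
  } catch {
    // JSON-RPC 2.0 parse error, delivered with HTTP 200 as the README describes.
    return {
      status: 200,
      json: { jsonrpc: "2.0", id: null, error: { code: -32700, message: "Parse error" } },
    };
  }
  if (body && body.jsonrpc === "2.0" && typeof body.method === "string") {
    // Notifications expect no response body, only an acknowledgement.
    if (body.method.startsWith("notifications/")) return { status: 202, json: null };
    return { status: 200, protocol: "jsonrpc", method: body.method, id: body.id };
  }
  if (body && typeof body.tool === "string") {
    return { status: 200, protocol: "legacy", tool: body.tool, args: body.arguments || {} };
  }
  // Assumption: anything else is rejected; the real SDK's behaviour may differ.
  return { status: 400, json: { error: "unrecognized MCP request" } };
}

// Usage: the same tool call expressed in both protocols.
const viaJsonRpc = classifyMcpBody(JSON.stringify({
  jsonrpc: "2.0", id: 1, method: "tools/call",
  params: { name: "get_brand_info", arguments: { section: "about" } },
}));
const viaLegacy = classifyMcpBody(JSON.stringify({
  tool: "get_brand_info", arguments: { section: "about" },
}));
```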
  ## What gets logged

  For **every** request (manifest or not), the SDK checks the `User-Agent` against the bot registry. When a known AI bot is detected, it fires a fire-and-forget POST to `/api/ingest/events` with:
@@ -85,14 +112,20 @@ For **every** request (manifest or not), the SDK checks the `User-Agent` against
  {
    "event_id": "<uuid>",
    "timestamp": "<ISO 8601>",
-   "bot": { "name": "GPTBot", "verified": true, "ua_string": "<full UA>" },
+   "bot": { "name": "GPTBot", "verified": false, "ua_string": "<full UA>" },
    "request": { "page_path": "/pricing", "method": "GET" }
  }
  ```

  The event scope (which domain it belongs to) is determined server-side from your API key — you cannot accidentally log events for a different customer's domain.

- Human traffic produces no events.
+ **AI referral attribution:** when a *human* arrives from an AI platform —
+ detected via the `Referer` header (chatgpt.com, perplexity.ai, claude.ai,
+ gemini.google.com, …) or `utm_source` (`?utm_source=chatgpt`) — the SDK fires
+ an `ai_referral` event instead, powering the dashboard's attribution views.
+ Same platform list as the Worker tier.
+
+ All other human traffic produces no events.

  ## Configuration options

@@ -107,6 +140,10 @@ ookyMiddleware({
    cdnBase: "https://api.ooky.ai/api/public/manifest", // Manifest source (default = apiBase + "/public/manifest")
    bots: undefined, // Override the bot registry; default ships with major AI bots
    autoRefreshBots: true, // Periodically refresh bot UA list from /api/public/bots
+   fetchTimeoutMs: 10000, // Hard timeout on every upstream fetch
+   manifestCacheTtlMs: 300000, // In-memory manifest cache TTL (0 disables)
+   maxEventsPerMinute: 300, // Token-bucket cap on event POSTs (Infinity disables)
+   onError: (err, ctx) => {}, // Surface swallowed failures (e.g. a 401 = rotated key)
  });
  ```

@@ -117,25 +154,35 @@ ookyMiddleware({
  | `apiBase` | `string` | `https://api.ooky.ai/api` | Override for self-hosted Ooky or staging. |
  | `cdnBase` | `string` | `${apiBase}/public/manifest` | Manifest source. By default the SDK fetches from Ooky's public manifest endpoint. Override to put your own CDN (Cloudflare, CloudFront, Fastly) in front. |
  | `bots` | `Array<{name, pattern, category}>` | Built-in default list | Ships with the major AI bots. Override only if you have custom UA patterns. |
- | `autoRefreshBots` | `boolean` | `true` | Refresh from `/api/public/bots` once an hour. Disable for fully offline use. |
+ | `autoRefreshBots` | `boolean` | `true` | Refresh from `/api/public/bots` once an hour (ETag-aware). Disable for fully offline use. |
+ | `fetchTimeoutMs` | `number` | `10000` | Abort upstream fetches (manifest, registry, events) after this many ms so a slow Ooky API can never hang your request path. |
+ | `manifestCacheTtlMs` | `number` | `300000` | Manifest responses are cached in-memory per process. On upstream failure (network error or 5xx), a stale copy up to 24h old is served instead of an error. Set `0` to disable. |
+ | `onError` | `(error, context) => void` | silent | Called for every failure the SDK swallows: event POST rejections **and non-2xx responses** (a `401` means your key was rotated/revoked), manifest fetch failures, registry refresh failures. `context` is `{ op, status?, kind?, throttled? }`. Wire it to your logger so a dead integration is visible: `onError: (e, ctx) => logger.warn("ooky", ctx.op, e.message)`. |
+ | `maxEventsPerMinute` | `number` | `300` | Token-bucket cap on event POSTs — a bot storm can't turn your server into an unbounded POST source. Drops are reported through `onError` (at most once per 10s, with a count). Pass `Infinity` to disable. |
+
+ TypeScript declarations ship with the package (`@ooky/sdk`, `/express`, `/next`, `/edge` are all typed) — no `@types/*` install needed.
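
For the `onError` row in the table above, one possible handler is sketched here. It relies only on the `{ op, status?, kind?, throttled? }` context shape quoted in the table; `makeOnError` and the `logger` object with `warn`/`error` methods are hypothetical, and the escalation policy for `401` is a design choice of this example.

```javascript
// Hypothetical factory that builds an `onError` callback around any logger
// exposing `warn` and `error` methods (console works too).
function makeOnError(logger) {
  return function onError(error, context = {}) {
    const message = error && error.message ? error.message : String(error);
    if (context.status === 401) {
      // Per the table: a 401 means the key was rotated or revoked.
      logger.error("ooky: API key rejected", context.op, message);
      return;
    }
    if (context.throttled) {
      logger.warn("ooky: events dropped by throttle", context.op, message);
      return;
    }
    logger.warn("ooky", context.op, message);
  };
}

// Usage with an in-memory logger so the routing is easy to inspect.
const seen = [];
const memoryLogger = {
  warn: (...args) => seen.push(["warn", ...args]),
  error: (...args) => seen.push(["error", ...args]),
};
const onError = makeOnError(memoryLogger);
onError(new Error("Unauthorized"), { op: "recordEvent", status: 401 });
onError(new Error("timeout"), { op: "fetchManifest" });
```

The resulting function would be passed as the `onError` option in the configuration object shown earlier.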

- ## Performance
+ ## Performance & resilience

- - The manifest fetch is HTTP-cached (`Cache-Control: public, max-age=300, s-maxage=600`); your CDN/edge will serve repeat requests without hitting Ooky.
- - Event firing uses `fetch(..., { keepalive: true })` so it survives the response cycle without delaying it.
+ - Manifest responses are cached in-memory for 5 minutes, with concurrent cold-cache requests deduped into a single upstream fetch.
+ - If the Ooky API is unreachable or erroring, the SDK serves the last good copy (up to 24h old) — a transient Ooky outage never breaks your `/llms.txt`.
+ - Every upstream fetch carries a hard 10s timeout (`AbortSignal.timeout`), so your request path can never hang on Ooky.
+ - The manifest response also carries `Cache-Control: public, max-age=300, s-maxage=600` — your CDN/edge will serve repeat requests without hitting your origin at all.
+ - Event firing uses `fetch(..., { keepalive: true })` so it survives the response cycle without delaying it. On Vercel Edge / Next middleware the SDK registers the event POST with `event.waitUntil()` automatically.
  - Bot detection is a substring check against an in-memory list — sub-millisecond per request.
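
The caching bullets above combine three ideas: a TTL cache, in-flight request dedupe, and stale-on-error serving. The sketch below shows one way those could fit together, using the 5 minute and 24 hour figures from the README as defaults. `createManifestCache`, `fetcher`, and the injectable `now` clock are hypothetical names, not the SDK's internals.

```javascript
// Hypothetical cache; `fetcher(key)` returns a Promise and `now` is injectable.
function createManifestCache(
  fetcher,
  { ttlMs = 300000, maxStaleMs = 86400000, now = Date.now } = {}
) {
  const entries = new Map(); // key -> { value, storedAt }
  const inFlight = new Map(); // key -> Promise shared by concurrent callers
  return function get(key) {
    const hit = entries.get(key);
    if (hit && now() - hit.storedAt < ttlMs) return Promise.resolve(hit.value);
    if (inFlight.has(key)) return inFlight.get(key); // dedupe cold-cache requests
    const pending = Promise.resolve()
      .then(() => fetcher(key))
      .then(
        (value) => {
          entries.set(key, { value, storedAt: now() });
          return value;
        },
        (err) => {
          // Serve the last good copy if it is still within the stale window.
          const stale = entries.get(key);
          if (stale && now() - stale.storedAt < maxStaleMs) return stale.value;
          throw err;
        }
      )
      .finally(() => inFlight.delete(key));
    inFlight.set(key, pending);
    return pending;
  };
}

// Usage: wrap any async loader; the loader here is a stand-in for a real fetch.
const getManifest = createManifestCache(async (key) => `contents of ${key}`);
const firstRead = getManifest("llms.txt"); // a Promise for the manifest text
```

Wrapping the fetch in `Promise.resolve().then(...)` keeps the bookkeeping in one place: the in-flight entry is always registered before the cleanup in `finally` can run, even if `fetcher` throws synchronously.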

  ## Troubleshooting

  **"I installed it but no events show up"**
- 1. Confirm your domain is verified and the integration method is set to `sdk` (or `wordpress`) in the dashboard.
- 2. Check that `process.env.OOKY_API_KEY` is actually set in your runtime; log it once at boot.
- 3. Hit your site with a bot UA: `curl -H "User-Agent: GPTBot/1.0" https://your-site.com/` and watch the dashboard's AI Sessions page.
- 4. If your app is behind a CDN that strips `User-Agent`, the SDK can't see the bot. Check your CDN config.
+ 1. Set the `onError` option to log swallowed failures; a repeated `recordEvent` error with `status: 401` means the key was rotated or revoked.
+ 2. Confirm your domain is verified and the integration method is set to `sdk` (or `wordpress`) in the dashboard.
+ 3. Check that `process.env.OOKY_API_KEY` is actually set in your runtime; log it once at boot.
+ 4. Hit your site with a bot UA: `curl -H "User-Agent: GPTBot/1.0" https://your-site.com/` and watch the dashboard's AI Sessions page.
+ 5. If your app is behind a CDN that strips `User-Agent`, the SDK can't see the bot. Check your CDN config.

  **"`/llms.txt` returns 404"**
  - The middleware only intercepts paths the SDK knows about. Make sure your framework's matcher passes those paths to the middleware before falling through to your routes.
- - If you've published the manifest in the dashboard, also check Ooky's edge CDN is reachable from your server: `curl https://edge.ooky.ai/<your-domain>/llms`.
+ - If you've published the manifest in the dashboard, also check the manifest source is reachable from your server: `curl https://api.ooky.ai/api/public/manifest/<your-domain>/llms` (or your `cdnBase` override).

  **"Events fail with 401 Unauthorized"**
  - The API key has been revoked or rotated. Generate a new one from the dashboard and update the env var.
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@ooky/sdk",
-   "version": "0.1.0",
+   "version": "0.6.0",
    "description": "Ooky SDK — middleware for serving AI brand intelligence and capturing AI-bot analytics from your Node, Next.js, or Vercel Edge app.",
    "keywords": [
      "ai",
@@ -16,12 +16,12 @@
    ],
    "homepage": "https://ooky.ai",
    "bugs": {
-     "url": "https://github.com/ooky-ai/ooky/issues",
+     "url": "https://github.com/cloudweld/ooky/issues",
      "email": "support@ooky.ai"
    },
    "repository": {
      "type": "git",
-     "url": "git+https://github.com/ooky-ai/ooky.git",
+     "url": "git+https://github.com/cloudweld/ooky.git",
      "directory": "packages/sdk"
    },
    "license": "MIT",
@@ -32,16 +32,33 @@
    },
    "type": "module",
    "main": "./src/core.js",
+   "types": "./src/index.d.ts",
    "exports": {
-     ".": "./src/core.js",
-     "./core": "./src/core.js",
-     "./express": "./src/express.js",
-     "./next": "./src/next.js",
-     "./edge": "./src/edge.js"
+     ".": {
+       "types": "./src/index.d.ts",
+       "default": "./src/core.js"
+     },
+     "./core": {
+       "types": "./src/index.d.ts",
+       "default": "./src/core.js"
+     },
+     "./express": {
+       "types": "./src/express.d.ts",
+       "default": "./src/express.js"
+     },
+     "./next": {
+       "types": "./src/next.d.ts",
+       "default": "./src/next.js"
+     },
+     "./edge": {
+       "types": "./src/edge.d.ts",
+       "default": "./src/edge.js"
+     }
    },
    "files": [
      "src/",
      "README.md",
+     "CHANGELOG.md",
      "LICENSE"
    ],
    "scripts": {
@@ -57,6 +74,7 @@
      "node": ">=18"
    },
    "devDependencies": {
+     "@edge-runtime/vm": "^5.0.0",
      "express": "^4.18.2",
      "supertest": "^6.3.4",
      "vitest": "^3.2.4"
package/src/bots.js CHANGED
@@ -43,15 +43,57 @@ export const DEFAULT_BOTS = [
    { name: "ia_archiver", pattern: "ia_archiver", category: "other" },
  ];

+ /**
+  * Hard cap on how many registry entries we'll ever scan per request. A
+  * malformed (or maliciously huge) /api/public/bots payload must never turn
+  * bot detection into an unbounded per-request loop. Mirrors the sanity cap
+  * applied on adoption in core.js.
+  */
+ export const MAX_BOT_REGISTRY_ENTRIES = 2000;
+
  /**
   * Returns the matched bot { name, pattern, category } or null.
   * Case-insensitive substring match (the same logic the Worker uses).
+  *
+  * Defensive by contract: this runs on EVERY customer request, and a throw
+  * here (in Express) becomes an unhandled rejection that can crash the
+  * customer's process for all of their traffic. So it must NEVER throw and
+  * must tolerate a garbage registry (non-array, null/non-string patterns,
+  * empty-string patterns, huge arrays). Bad entries are skipped, not matched.
   */
  export function detectBot(userAgent, registry = DEFAULT_BOTS) {
    if (!userAgent || typeof userAgent !== "string") return null;
+   if (!Array.isArray(registry) || registry.length === 0) return null;
    const ua = userAgent.toLowerCase();
-   for (const b of registry) {
+   const limit = Math.min(registry.length, MAX_BOT_REGISTRY_ENTRIES);
+   for (let i = 0; i < limit; i++) {
+     const b = registry[i];
+     // Skip anything that isn't a usable { pattern: <non-empty string> }.
+     // An empty pattern would substring-match every UA — treat as invalid.
+     if (!b || typeof b.pattern !== "string" || b.pattern.length === 0) continue;
      if (ua.includes(b.pattern.toLowerCase())) return b;
    }
    return null;
  }
+
+ /**
+  * Validate + cap an arbitrary bot registry before we adopt it (from the
+  * /api/public/bots endpoint or a customer's `options.bots`). Returns a new
+  * array containing only well-formed entries (object with a non-empty string
+  * `pattern`), capped at MAX_BOT_REGISTRY_ENTRIES. Returns null when the input
+  * isn't a usable array so callers can keep their current registry instead.
+  *
+  * Never throws — the whole point is to neutralise a bad payload at the seam
+  * rather than let it reach detectBot on the hot path.
+  */
+ export function sanitizeBotRegistry(input) {
+   if (!Array.isArray(input)) return null;
+   const out = [];
+   for (const b of input) {
+     if (out.length >= MAX_BOT_REGISTRY_ENTRIES) break;
+     if (!b || typeof b !== "object") continue;
+     if (typeof b.pattern !== "string" || b.pattern.length === 0) continue;
+     out.push(b);
+   }
+   return out;
+ }
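
To see how the hardened functions in this hunk behave on a deliberately malformed registry, the block below repeats condensed copies of them so it runs on its own. The condensed versions share one hypothetical `isUsableEntry` check, which is slightly stricter than the diff's inline test in `detectBot`, and the `ExampleBot` entry and sample user agents are made up for the demonstration.

```javascript
const MAX_BOT_REGISTRY_ENTRIES = 2000;

// Hypothetical shared check: an object with a non-empty string `pattern`.
function isUsableEntry(b) {
  return Boolean(b) && typeof b === "object" &&
    typeof b.pattern === "string" && b.pattern.length > 0;
}

// Condensed copy of sanitizeBotRegistry: keep valid entries, cap the total.
function sanitizeBotRegistry(input) {
  if (!Array.isArray(input)) return null;
  return input.filter(isUsableEntry).slice(0, MAX_BOT_REGISTRY_ENTRIES);
}

// Condensed copy of detectBot: bounded scan that skips bad entries.
function detectBot(userAgent, registry) {
  if (!userAgent || typeof userAgent !== "string") return null;
  if (!Array.isArray(registry)) return null;
  const ua = userAgent.toLowerCase();
  const limit = Math.min(registry.length, MAX_BOT_REGISTRY_ENTRIES);
  for (let i = 0; i < limit; i++) {
    const b = registry[i];
    if (!isUsableEntry(b)) continue;
    if (ua.includes(b.pattern.toLowerCase())) return b;
  }
  return null;
}

// A registry full of the failure modes the changelog lists, plus one good entry.
const garbage = [
  null,
  42,
  "text",
  { pattern: "" },   // would match every UA if it were not skipped
  { pattern: null },
  { name: "ExampleBot", pattern: "examplebot" },
];

const matched = detectBot("Mozilla/5.0 (compatible; ExampleBot/1.0)", garbage);
const human = detectBot("Mozilla/5.0 (X11; Linux x86_64)", garbage);
const cleaned = sanitizeBotRegistry(garbage);   // one surviving entry
const rejected = sanitizeBotRegistry("oops");   // null: caller keeps its registry
```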