@blamejs/core 0.12.26 → 0.12.28

package/CHANGELOG.md CHANGED
@@ -8,6 +8,10 @@ upgrading across more than a few patches at a time.
 
  ## v0.12.x
 
+ - v0.12.28 (2026-05-24) — **`b.ai.capability` — model-capability registry + cheapest-satisfying-model router.** `b.ai.capability.create({ models })` turns a fleet of AI model descriptors into a routing decision: given a set of requirements (context window, input/output modalities, tool use, structured output, reasoning tier, citation support, prompt-caching size), it picks the cheapest model that satisfies all of them. NIST AI RMF (AI 100-1) MAP 2.x requires documenting each model's capabilities and limitations; the Model Cards convention (Mitchell et al., 2019) formalizes that descriptor — this primitive makes the descriptor actionable. Routing to the cheapest sufficient model is a front-line defense against over-provisioning spend and composes directly with `b.ai.quota`'s `cost-usd` dimension (the chosen descriptor's rate feeds the budget charge); refusing to route a request to a model that cannot satisfy it (missing modality, too-small context window, no tool use) catches a capability mismatch before the inference call burns tokens on a guaranteed-bad result. Cost ranking uses a supplied `costBasis` (`{ inputTokens, outputTokens }`) for real per-call spend, else the sum of the per-1k rates; ties break by model id so the choice is deterministic across calls and nodes. **Added:** *`b.ai.capability.create({ models })` — capability registry + router* — Returns `{ describe, list, register, satisfies, route }`. A descriptor carries `maxContextTokens`, `maxOutputTokens`, `modalitiesIn` / `modalitiesOut` (arrays), `toolUse`, `structuredOutput`, `fineTunable`, `reasoningTier` (`none` / `basic` / `standard` / `advanced`, ordered), `citationSupport`, `promptCachingMaxTokens`, and the cost rates `costPer1kInputTokens` / `costPer1kOutputTokens`. Descriptors are validated + frozen at registration so a typo (negative cost, unknown reasoning tier, non-array modality list) surfaces at config time rather than as a silent mis-route. 
`describe(modelId)` returns the frozen descriptor; `register(modelId, descriptor)` adds or replaces one at runtime. · *`route({ requirements, fallback?, costBasis? })` — cheapest-satisfying selection* — Collects every model whose descriptor satisfies all requirements, then returns the cheapest (`{ modelId, descriptor, estimatedCost, reason }`). Requirements: `minContextTokens`, `minOutputTokens`, `modalitiesIn` / `modalitiesOut` (model must support every listed modality), `toolUse`, `structuredOutput`, `fineTunable`, `minReasoningTier` (tier ordering — `standard` is met by `standard` or `advanced`), `citationSupport`, `minPromptCachingTokens`. When no model matches, `fallback` (a registered model id) is returned with `reason: "fallback"`, or the call refuses with `aiCapability/no-candidate` if no fallback was supplied. Routing decisions emit `ai/capability-routed` / `ai/capability-fallback` / `ai/capability-no-candidate` through the drop-silent audit chain. · *`satisfies(modelId, requirements)` — precise capability-mismatch reasons* — Returns `{ ok, failures }` where each failure names the `requirement`, the `need`, and what the model `have`s — so a caller surfaces a precise reason (e.g. `minReasoningTier need advanced have basic`) instead of a bare boolean. Use it to explain a routing miss or to gate a request against a specific model before calling it.
+
+ - v0.12.27 (2026-05-24) — **`b.ai.quota` — per-tenant, per-model AI usage budgets with atomic consume-and-check.** `b.ai.quota.create(opts)` builds an enforcer that caps AI inference usage per `(tenant, model, dimension, period)` and defends OWASP LLM Top 10 2025 LLM10 (Unbounded Consumption) — the class that includes denial-of-wallet, where an attacker drives a high volume of pay-per-use inferences until the bill itself is the attack. Meter by `tokens`, `requests`, `cost-usd`, or `compute-hours` over a calendar-aligned UTC window (`second` through `month`). `consume(tenant, model, amount)` is a single atomic check-and-charge: under the default `hard` enforcement it reserves the amount only if it fits under the ceiling, otherwise it refuses without charging — the limit test and the charge are one indivisible operation, so there is no charge-then-refund window for a concurrent call to observe. The in-memory counter is per-process; multi-node deployments supply an `opts.store` adapter whose `reserve` (an atomic conditional test-and-charge — a Redis Lua script, a SQL `UPDATE ... WHERE used + :amt <= :limit RETURNING used`) and `add` are atomic on the shared backend to enforce one aggregate ceiling across the cluster without false denials under contention. Limit resolution is most-specific-first: `perTenantModel` over `perTenant` over `perModel` over the default `limit`; tenant and model identifiers are percent-encoded into the counter key so a hostile tenant name cannot collide with another tenant's budget. **Added:** *`b.ai.quota.create(opts)` — per-tenant AI usage-budget enforcer* — Returns `{ consume, check, snapshot, reset }` scoped to one `dimension` (`tokens` / `requests` / `cost-usd` / `compute-hours`) and one `period` (`second` / `minute` / `hour` / `day` / `week` (Monday-aligned) / `month` (1st-of-month), all UTC-aligned). `consume(tenant, model, amount, opts?)` returns `{ used, limit, remaining, allowed, exceeded, windowStart, resetsAt, ... }`. 
`check(tenant, model)` is the read-only snapshot. Spin up one enforcer per dimension you meter — a monthly `cost-usd` budget and a per-minute `tokens` burst cap coexist as two `create()` calls sharing one store. Defends OWASP LLM10:2025 Unbounded Consumption / denial-of-wallet; maps to NIST AI RMF (AI 100-1) MANAGE 2.x and EU AI Act Art. 15 (robustness / resource-exhaustion resilience). · *`hard` / `soft` / `warn` enforcement* — `hard` (default) refuses the over-budget call and throws `aiQuota/exceeded` without charging: the conditional reserve never applies the rejected amount, so the counter is untouched. `soft` admits the charge but reports `allowed: false` so the caller decides whether to honor it. `warn` admits and allows (advisory), flagging `exceeded: true`. A per-call `consume(..., { enforcement })` override lets one endpoint soften the mode for a trusted internal caller without a second enforcer. Every over-budget event emits `ai/quota-exceeded` through the drop-silent audit chain (`ai/quota-applied` on success), tagged with the active cluster node id for attribution. · *Cross-node aggregate budgets via `opts.store`* — The default counter is in-memory (per-process). Supply `opts.store` exposing atomic `reserve` / `add` / `get` / `reset` (a Redis Lua script, a shared SQL row) and the ceiling is enforced on the cluster-wide aggregate. `hard` mode goes through `reserve`, an atomic conditional test-and-charge that adds the amount only if it fits — so a concurrent over-budget call cannot transiently inflate the counter and falsely deny a smaller call that should fit. Per-tenant and per-model limit overrides (`perTenant` / `perModel` / `perTenantModel`) are validated at config time so a malformed cap surfaces at boot, not as a silent fall-through to the default.
+
  - v0.12.26 (2026-05-24) — **`b.compliance` posture cascades — `eu-ai-act` + `ca-ab-853` + `cac-genai-label` POSTURE_DEFAULTS + backup encryption refusal.** Three new posture cascades wired into `b.compliance.POSTURE_DEFAULTS` + `KNOWN_POSTURES` + `REGIME_MAP` so operators globally pinning the EU AI Act / California AB-853 / China CAC GenAI postures get the right floors automatically: backupEncryptionRequired:true, auditChainSignedRequired:true, tlsMinVersion:TLSv1.3, requireVacuumAfterErase:true. `b.backup.bundleAdapterStorage` extends the encryption-required posture list to include the three new postures so `cryptoStrategy: "none"` is refused upfront under any of them (parity with HIPAA + PCI-DSS, which the operator surface has carried since v0.12.10). The canonical `eu-ai-act` posture is the production name; the legacy `ai-act` short name stays in KNOWN_POSTURES for back-compat with operators who pinned it pre-v0.12.26. **Added:** *`eu-ai-act` posture cascade — Regulation (EU) 2024/1689* — POSTURE_DEFAULTS entry: backupEncryptionRequired:true (Art. 12 logging + Art. 15 robustness/cybersecurity demand encryption-at-rest for high-risk system training logs), auditChainSignedRequired:true (Art. 12 + Art. 13 audit-chain integrity), tlsMinVersion:TLSv1.3, requireVacuumAfterErase:true (Art. 50(4) synthetic-content provenance — residual EXIF / metadata pointing at the generating model must be cleared on erase). REGIME_MAP entry under jurisdiction:"EU" domain:"ai-governance". KNOWN_POSTURES carries both `eu-ai-act` (canonical) and `ai-act` (legacy short name). · *`ca-ab-853` posture cascade — California AB-853 effective 2026* — Same encryption + audit floor as eu-ai-act; jurisdiction:"US-CA". Model-generated content watermarking + disclosure regime. Operators serving California traffic pin this posture for the AB-853 §22949.91 obligations the v0.12.12 deepfake primitive's crossWalk references. 
· *`cac-genai-label` posture cascade — China CAC GenAI Service Measures* — Synthetic-content labelling per Art. 12 + algorithm filing per Art. 4. Same backup encryption + signed audit chain floor. Operators serving Chinese traffic pin this posture so the bundleAdapterStorage refuses plaintext bundles and the disclosure primitive's `jurisdiction: "cn"` cross-walk produces the right legal-reference array. · *`bundleAdapterStorage` BACKUP_ENCRYPTION_REQUIRED_POSTURES extended* — `hipaa` + `pci-dss` (the v0.12.10 baseline) joined by the three AI postures. `cryptoStrategy: "none"` refused upfront under any of `eu-ai-act` / `ca-ab-853` / `cac-genai-label` with `backup/posture-requires-encryption`. Operators wiring backup storage in a regulated AI deployment now get the same posture-driven gate that the storage primitive has always applied to health + payment data.
 
  - v0.12.25 (2026-05-24) — **`b.ai.disclosure.applyAll(scenario)` — bundle Art. 50(1) / 50(3) / 50(4) disclosures for mixed-modality AI systems.** Composes the three v0.12.12 disclosure primitives (chatbot / deepfake / emotion) into a single bundled emit. Operators running mixed-modality AI systems (e.g. a chatbot that also generates images, or an emotion-recognition system embedded in a chat flow) declare which Art. 50 obligations apply via `scenario.kinds` and the primitive fans out to the per-obligation emit calls in one pass. Shared opts (jurisdiction, language, audit, correlationId) propagate to every per-kind emission so the cross-walk + audit-chain entries stay correlated across the bundle. **Added:** *`b.ai.disclosure.applyAll(scenario)` — multi-obligation bundled emit* — `scenario.kinds: ["chatbot", "deepfake", "emotion"]` (subset) selects which Art. 50 obligations to satisfy. Per-kind required fields (session for chatbot, content + contentType for deepfake) refused upfront when missing. Returns `{ disclosures: { chatbot?, deepfake?, emotion? } }` with each entry being the corresponding primitive's emission payload. Shared opts propagate: `scenario.jurisdiction` / `scenario.language` / `scenario.audit` / `scenario.correlationId` reach every per-kind call so a US-CA deployment serving chat + image gets both the AB-853 cross-walk AND the Art. 50(1) audit event under the same correlationId.
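
The atomic check-and-charge described in the v0.12.27 entry above can be pictured with a small in-memory sketch. This is not the `b.ai.quota` source: `createCounter`, `keyFor`, and `reserve` as written here are hypothetical names for illustration, windows and enforcement modes are left out, and it assumes a single Node.js process, where a synchronous function body cannot be interleaved with another call.

```javascript
// Illustrative sketch only; not the b.ai.quota implementation.
// createCounter, keyFor and reserve are hypothetical names.
function createCounter(limit) {
  var used = new Map(); // counter key -> amount charged so far

  // Percent-encode each identifier before joining, so the ":" separator
  // can never be forged by a tenant or model name.
  function keyFor(tenant, model) {
    return encodeURIComponent(tenant) + ":" + encodeURIComponent(model);
  }

  // Test and charge in one synchronous step: the amount is added only
  // if it fits under the ceiling, so a refused call never touches the
  // counter and there is nothing to refund.
  function reserve(tenant, model, amount) {
    var key = keyFor(tenant, model);
    var current = used.get(key) || 0;
    if (current + amount > limit) {
      return { allowed: false, used: current, limit: limit, remaining: limit - current };
    }
    var next = current + amount;
    used.set(key, next);
    return { allowed: true, used: next, limit: limit, remaining: limit - next };
  }

  return { reserve: reserve };
}

var counter = createCounter(100);
var first = counter.reserve("acme", "model-a", 60);  // fits: used becomes 60
var second = counter.reserve("acme", "model-a", 50); // would reach 110: refused, used stays 60
var third = counter.reserve("acme", "model-a", 40);  // fits exactly: used becomes 100
console.log(first.allowed, second.allowed, third.allowed, third.remaining);
// prints: true false true 0
```

A multi-node deployment would need the same test-and-charge to run atomically on the shared backend, which is the role the entry assigns to the `opts.store` adapter's `reserve`.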
package/README.md CHANGED
@@ -162,6 +162,8 @@ The framework bundles the surface a typical Node app reaches for. Every primitiv
  - **Prompt-injection classification** — OWASP LLM01:2025 / NIST COSAIS RFI (`b.ai.input.classify`)
  - **Agent identity** — A2A signed agent-card primitive (Linux Foundation Agentic AI Foundation v1.x, ML-DSA-87) (`b.a2a`)
  - **Content provenance** — C2PA 2.1 + California SB-942 / AB-853 manifest builder for AI-generated media (provider, model id + version, timestamp, content ID, signed) (`b.contentCredentials`)
+ - **AI usage quotas** — per-tenant / per-model budgets metered by tokens / requests / cost-usd / compute-hours over calendar-aligned windows, with an atomic conditional reserve (no charge-then-refund race) + hard/soft/warn enforcement and an optional cross-node store; defends OWASP LLM10:2025 unbounded consumption / denial-of-wallet (`b.ai.quota`)
+ - **AI capability routing** — model-capability registry (context window / modalities / tool use / reasoning tier / cost rates) + a router that picks the cheapest model satisfying a request's requirements, refusing capability mismatches before the inference call (NIST AI RMF MAP + Model Cards); composes with `b.ai.quota` cost budgets (`b.ai.capability`)
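
The cheapest-satisfying selection behind `b.ai.capability` can be approximated in a few lines. The sketch below is a standalone illustration, not the package's code: `meets`, `estimateCost`, and `pickCheapest` are hypothetical helpers, only three of the requirement kinds are checked, and the descriptor field names follow the package's descriptor format.

```javascript
// Illustrative sketch only; not the b.ai.capability implementation.
// meets, estimateCost and pickCheapest are hypothetical helpers.
var TIERS = ["none", "basic", "standard", "advanced"]; // ordered, weakest first

// True when the model meets every requirement the caller set.
function meets(model, req) {
  if (req.minContextTokens != null && model.maxContextTokens < req.minContextTokens) return false;
  if (req.toolUse === true && model.toolUse !== true) return false;
  if (req.minReasoningTier != null &&
      TIERS.indexOf(model.reasoningTier) < TIERS.indexOf(req.minReasoningTier)) return false;
  return true;
}

// Per-call spend when token counts are known, else the sum of the per-1k rates.
function estimateCost(model, costBasis) {
  if (costBasis) {
    return (costBasis.inputTokens / 1000) * model.costPer1kInputTokens +
           (costBasis.outputTokens / 1000) * model.costPer1kOutputTokens;
  }
  return model.costPer1kInputTokens + model.costPer1kOutputTokens;
}

// Cheapest satisfying model; equal costs fall back to model id order.
function pickCheapest(models, req, costBasis) {
  var candidates = Object.keys(models)
    .filter(function (id) { return meets(models[id], req); })
    .map(function (id) { return { modelId: id, cost: estimateCost(models[id], costBasis) }; });
  candidates.sort(function (a, b) {
    if (a.cost !== b.cost) return a.cost - b.cost;
    return a.modelId < b.modelId ? -1 : (a.modelId > b.modelId ? 1 : 0);
  });
  return candidates.length > 0 ? candidates[0] : null;
}

var fleet = {
  small: { maxContextTokens: 200000, toolUse: false, reasoningTier: "basic",
           costPer1kInputTokens: 0.001, costPer1kOutputTokens: 0.005 },
  large: { maxContextTokens: 200000, toolUse: true, reasoningTier: "advanced",
           costPer1kInputTokens: 0.015, costPer1kOutputTokens: 0.075 },
};

var plain = pickCheapest(fleet, {});                       // both qualify; "small" is cheaper
var tools = pickCheapest(fleet, { toolUse: true });        // only "large" has tool use
var none = pickCheapest(fleet, { minContextTokens: 1e6 }); // nothing qualifies
console.log(plain.modelId, tools.modelId, none);
// prints: small large null
```

The package's router is described as also reporting why a model failed and as accepting a fallback model id; neither is modelled in this sketch.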
  ### Compliance regimes
 
  - **Posture coordinator** — `b.compliance` cascades operator-declared regime into retention / audit / db / cryptoField via POSTURE_DEFAULTS:
package/index.js CHANGED
@@ -443,6 +443,8 @@ module.exports = {
  aiContentDetect: require("./lib/ai-content-detect"),
  modelManifest: require("./lib/ai-model-manifest"),
  disclosure: require("./lib/ai-disclosure"),
+ quota: require("./lib/ai-quota"),
+ capability: require("./lib/ai-capability"),
  },
  promisePool: require("./lib/promise-pool"),
  sdNotify: require("./lib/sd-notify"),
@@ -0,0 +1,482 @@
+ "use strict";
+ /**
+  * @module b.ai.capability
+  * @nav AI
+  * @title AI capability routing
+  *
+  * @intro
+  * A capability registry + capability-aware router for AI model
+  * fleets. NIST AI RMF (AI 100-1) MAP 2.x requires documenting each
+  * model's capabilities and limitations; the Model Cards convention
+  * (Mitchell et al., 2019) formalizes that descriptor. This module
+  * turns those descriptors into a routing decision: given a set of
+  * requirements (context window, modalities, tool use, reasoning
+  * tier, …), pick the <em>cheapest</em> model in the fleet that
+  * satisfies all of them, or fall back deterministically.
+  *
+  * <code>b.ai.capability.create({ models })</code> builds a registry
+  * from operator-supplied descriptors and returns:
+  *
+  * - <code>describe(modelId)</code> — the frozen descriptor.
+  * - <code>list()</code> — every registered model id.
+  * - <code>register(modelId, descriptor)</code> — add / replace one.
+  * - <code>satisfies(modelId, requirements)</code> —
+  *   <code>{ ok, failures }</code> where each failure names the
+  *   requirement, the need, and what the model has.
+  * - <code>route({ requirements, fallback?, costBasis? })</code> —
+  *   the cheapest satisfying model, or the fallback, or a refusal.
+  *
+  * A descriptor carries: <code>maxContextTokens</code>,
+  * <code>maxOutputTokens</code>, <code>modalitiesIn</code> /
+  * <code>modalitiesOut</code> (arrays — e.g. <code>"text"</code>,
+  * <code>"image"</code>, <code>"audio"</code>, <code>"video"</code>),
+  * <code>toolUse</code>, <code>structuredOutput</code>,
+  * <code>fineTunable</code>, <code>reasoningTier</code>
+  * (<code>"none" | "basic" | "standard" | "advanced"</code>,
+  * ordered), <code>citationSupport</code>,
+  * <code>promptCachingMaxTokens</code>, and the cost rates
+  * <code>costPer1kInputTokens</code> / <code>costPer1kOutputTokens</code>.
+  *
+  * <strong>Routing picks the cheapest match.</strong> When a
+  * <code>costBasis</code> (<code>{ inputTokens, outputTokens }</code>)
+  * is supplied the router estimates the per-call cost and ranks by
+  * it; otherwise it ranks by the sum of the per-1k rates. Ties break
+  * by model id so the choice is deterministic. Routing to the
+  * cheapest sufficient model is the front-line defense against
+  * over-provisioning spend — it composes with
+  * <code>b.ai.quota</code>'s <code>cost-usd</code> dimension, where
+  * the chosen descriptor's rate feeds the budget charge.
+  *
+  * Refusing to route a request to a model that cannot satisfy it
+  * (missing modality, too-small context window, no tool use) catches
+  * a capability mismatch before the inference call burns tokens on a
+  * guaranteed-bad result.
+  *
+  * @card
+  * Capability registry + cheapest-satisfying-model router for AI
+  * model fleets (context / modalities / tool use / reasoning tier /
+  * cost). Composes with b.ai.quota cost budgets.
+  */
+
+ var lazyRequire = require("./lazy-require");
+ var validateOpts = require("./validate-opts");
+ var { defineClass } = require("./framework-error");
+
+ var AiCapabilityError = defineClass("AiCapabilityError", { alwaysPermanent: true });
+
+ var audit = lazyRequire(function () { return require("./audit"); });
+
+ // Ordered reasoning tiers — a requirement of `minReasoningTier:
+ // "standard"` is satisfied by "standard" or "advanced", not "basic".
+ var REASONING_TIERS = ["none", "basic", "standard", "advanced"];
+
+ // Cost rates are quoted per 1000 tokens (industry convention; the
+ // descriptor fields are costPer1kInputTokens / costPer1kOutputTokens).
+ // Dividing a token count by this unit gives thousands of tokens, which
+ // the per-1k rate then multiplies: a rate denominator, not a byte size.
+ var COST_RATE_TOKEN_UNIT = 1000; // allow:raw-byte-literal — per-1k-token cost-rate denominator, not a byte count
+
+ var DESCRIPTOR_KEYS = [
+   "maxContextTokens", "maxOutputTokens", "modalitiesIn", "modalitiesOut",
+   "toolUse", "structuredOutput", "fineTunable", "reasoningTier",
+   "citationSupport", "promptCachingMaxTokens",
+   "costPer1kInputTokens", "costPer1kOutputTokens", "provider", "version",
+ ];
+
+ var REQUIREMENT_KEYS = [
+   "minContextTokens", "minOutputTokens", "modalitiesIn", "modalitiesOut",
+   "toolUse", "structuredOutput", "fineTunable", "minReasoningTier",
+   "citationSupport", "minPromptCachingTokens",
+ ];
+
+ function _isPositiveInt(n) {
+   return typeof n === "number" && isFinite(n) && n > 0 && Math.floor(n) === n;
+ }
+ function _isNonNegFinite(n) {
+   return typeof n === "number" && isFinite(n) && n >= 0;
+ }
+ function _isStringArray(a) {
+   if (!Array.isArray(a)) return false;
+   for (var i = 0; i < a.length; i++) {
+     if (typeof a[i] !== "string" || a[i].length === 0) return false;
+   }
+   return true;
+ }
+
+ // Normalize + validate one descriptor at registration time so a typo
+ // (negative cost, unknown reasoning tier, non-array modality list)
+ // surfaces at config time rather than as a silent mis-route.
+ function _normalizeDescriptor(modelId, d) {
+   if (!d || typeof d !== "object" || Array.isArray(d)) {
+     throw new AiCapabilityError("aiCapability/bad-descriptor",
+       "ai.capability: descriptor for '" + modelId + "' must be a plain object");
+   }
+   validateOpts(d, DESCRIPTOR_KEYS, "ai.capability descriptor['" + modelId + "']");
+
+   if (!_isPositiveInt(d.maxContextTokens)) {
+     throw new AiCapabilityError("aiCapability/bad-descriptor",
+       "ai.capability: '" + modelId + "'.maxContextTokens must be a positive integer");
+   }
+   var maxOut = (d.maxOutputTokens == null) ? d.maxContextTokens : d.maxOutputTokens;
+   if (!_isPositiveInt(maxOut)) {
+     throw new AiCapabilityError("aiCapability/bad-descriptor",
+       "ai.capability: '" + modelId + "'.maxOutputTokens must be a positive integer");
+   }
+
+   var modIn = (d.modalitiesIn == null) ? ["text"] : d.modalitiesIn;
+   var modOut = (d.modalitiesOut == null) ? ["text"] : d.modalitiesOut;
+   if (!_isStringArray(modIn) || !_isStringArray(modOut)) {
+     throw new AiCapabilityError("aiCapability/bad-descriptor",
+       "ai.capability: '" + modelId + "'.modalitiesIn / modalitiesOut must be arrays of non-empty strings");
+   }
+
+   var tier = (d.reasoningTier == null) ? "standard" : d.reasoningTier;
+   if (REASONING_TIERS.indexOf(tier) === -1) {
+     throw new AiCapabilityError("aiCapability/bad-descriptor",
+       "ai.capability: '" + modelId + "'.reasoningTier must be one of " + REASONING_TIERS.join(" / "));
+   }
+
+   var cachingMax = (d.promptCachingMaxTokens == null) ? 0 : d.promptCachingMaxTokens;
+   var costIn = (d.costPer1kInputTokens == null) ? 0 : d.costPer1kInputTokens;
+   var costOut = (d.costPer1kOutputTokens == null) ? 0 : d.costPer1kOutputTokens;
+   if (!_isNonNegFinite(cachingMax) || !_isNonNegFinite(costIn) || !_isNonNegFinite(costOut)) {
+     throw new AiCapabilityError("aiCapability/bad-descriptor",
+       "ai.capability: '" + modelId + "'.promptCachingMaxTokens / costPer1kInputTokens / " +
+       "costPer1kOutputTokens must be non-negative finite numbers");
+   }
+
+   return Object.freeze({
+     modelId: modelId,
+     maxContextTokens: d.maxContextTokens,
+     maxOutputTokens: maxOut,
+     modalitiesIn: Object.freeze(modIn.slice()),
+     modalitiesOut: Object.freeze(modOut.slice()),
+     toolUse: d.toolUse === true,
+     structuredOutput: d.structuredOutput === true,
+     fineTunable: d.fineTunable === true,
+     reasoningTier: tier,
+     citationSupport: d.citationSupport === true,
+     promptCachingMaxTokens: cachingMax,
+     costPer1kInputTokens: costIn,
+     costPer1kOutputTokens: costOut,
+     provider: (typeof d.provider === "string") ? d.provider : null,
+     version: (typeof d.version === "string") ? d.version : null,
+   });
+ }
+
+ /**
+  * @primitive b.ai.capability.create
+  * @signature b.ai.capability.create(opts)
+  * @since 0.12.28
+  * @status stable
+  * @compliance soc2
+  * @related b.ai.quota.create, b.ai.modelManifest.build
+  *
+  * Build a capability registry + router from operator-supplied model
+  * descriptors. Returns <code>{ describe, list, register, satisfies,
+  * route }</code>. Pair it with <code>b.ai.quota</code>:
+  * <code>route()</code> picks the cheapest model that meets the
+  * request, and the chosen descriptor's cost rate feeds the
+  * <code>cost-usd</code> budget charge.
+  *
+  * @opts
+  * {
+  *   models: { // required, ≥ 1 entry
+  *     [modelId: string]: {
+  *       maxContextTokens: number, // required, positive int
+  *       maxOutputTokens?: number, // default: maxContextTokens
+  *       modalitiesIn?: string[], // default: ["text"]
+  *       modalitiesOut?: string[], // default: ["text"]
+  *       toolUse?: boolean, // default: false
+  *       structuredOutput?: boolean, // default: false
+  *       fineTunable?: boolean, // default: false
+  *       reasoningTier?: string, // none|basic|standard|advanced (default: standard)
+  *       citationSupport?: boolean, // default: false
+  *       promptCachingMaxTokens?: number, // default: 0
+  *       costPer1kInputTokens?: number, // default: 0
+  *       costPer1kOutputTokens?: number, // default: 0
+  *       provider?: string,
+  *       version?: string,
+  *     }
+  *   },
+  *   audit?: boolean, // default: true (route decisions)
+  * }
+  *
+  * @example
+  * var fleet = b.ai.capability.create({
+  *   models: {
+  *     "haiku": { maxContextTokens: 200000, reasoningTier: "basic",
+  *       costPer1kInputTokens: 0.001, costPer1kOutputTokens: 0.005 },
+  *     "opus": { maxContextTokens: 200000, reasoningTier: "advanced",
+  *       toolUse: true, modalitiesIn: ["text", "image"],
+  *       costPer1kInputTokens: 0.015, costPer1kOutputTokens: 0.075 },
+  *   },
+  * });
+  * var pick = fleet.route({
+  *   requirements: { minContextTokens: 100000, toolUse: true,
+  *     modalitiesIn: ["text", "image"] },
+  *   costBasis: { inputTokens: 4000, outputTokens: 500 },
+  * });
+  * // → { modelId: "opus", descriptor: {...}, estimatedCost: 0.0975, reason: "cheapest-of-1" }
+  */
+ function create(opts) {
+   validateOpts.requireObject(opts, "ai.capability.create", AiCapabilityError);
+   validateOpts(opts, ["models", "audit"], "ai.capability.create");
+
+   if (!opts.models || typeof opts.models !== "object" || Array.isArray(opts.models)) {
+     throw new AiCapabilityError("aiCapability/bad-models",
+       "ai.capability.create: models must be a plain object { modelId: descriptor }");
+   }
+   var ids = Object.keys(opts.models);
+   if (ids.length === 0) {
+     throw new AiCapabilityError("aiCapability/bad-models",
+       "ai.capability.create: models must declare at least one model");
+   }
+
+   var registry = new Map();
+   for (var i = 0; i < ids.length; i++) {
+     registry.set(ids[i], _normalizeDescriptor(ids[i], opts.models[ids[i]]));
+   }
+   var auditOn = opts.audit !== false;
+
+   function _emitAudit(action, outcome, metadata) {
+     if (!auditOn) return;
+     try {
+       audit().safeEmit({ action: action, outcome: outcome, metadata: metadata || {} });
+     } catch (_e) { /* audit best-effort — drop-silent */ }
+   }
+
+   function describe(modelId) {
+     var d = registry.get(modelId);
+     if (!d) {
+       throw new AiCapabilityError("aiCapability/unknown-model",
+         "ai.capability.describe: unknown model '" + modelId + "'");
+     }
+     return d;
+   }
+
+   function list() {
+     return Array.from(registry.keys());
+   }
+
+   function register(modelId, descriptor) {
+     validateOpts.requireNonEmptyString(modelId,
+       "ai.capability.register: modelId", AiCapabilityError, "aiCapability/bad-model");
+     registry.set(modelId, _normalizeDescriptor(modelId, descriptor));
+     return registry.get(modelId);
+   }
+
+   // Returns { ok, failures } — every unmet requirement names what was
+   // needed and what the model has, so a caller can surface a precise
+   // capability-mismatch reason instead of a bare boolean.
+   function _evaluate(descriptor, requirements) {
+     var failures = [];
+     function fail(requirement, need, have) {
+       failures.push({ requirement: requirement, need: need, have: have });
+     }
+     if (requirements.minContextTokens != null &&
+         descriptor.maxContextTokens < requirements.minContextTokens) {
+       fail("minContextTokens", requirements.minContextTokens, descriptor.maxContextTokens);
+     }
+     if (requirements.minOutputTokens != null &&
+         descriptor.maxOutputTokens < requirements.minOutputTokens) {
+       fail("minOutputTokens", requirements.minOutputTokens, descriptor.maxOutputTokens);
+     }
+     if (requirements.modalitiesIn != null) {
+       for (var a = 0; a < requirements.modalitiesIn.length; a++) {
+         if (descriptor.modalitiesIn.indexOf(requirements.modalitiesIn[a]) === -1) {
+           fail("modalitiesIn", requirements.modalitiesIn[a], descriptor.modalitiesIn);
+         }
+       }
+     }
+     if (requirements.modalitiesOut != null) {
+       for (var b = 0; b < requirements.modalitiesOut.length; b++) {
+         if (descriptor.modalitiesOut.indexOf(requirements.modalitiesOut[b]) === -1) {
+           fail("modalitiesOut", requirements.modalitiesOut[b], descriptor.modalitiesOut);
+         }
+       }
+     }
+     if (requirements.toolUse === true && descriptor.toolUse !== true) {
+       fail("toolUse", true, false);
+     }
+     if (requirements.structuredOutput === true && descriptor.structuredOutput !== true) {
+       fail("structuredOutput", true, false);
+     }
+     if (requirements.fineTunable === true && descriptor.fineTunable !== true) {
+       fail("fineTunable", true, false);
+     }
+     if (requirements.citationSupport === true && descriptor.citationSupport !== true) {
+       fail("citationSupport", true, false);
+     }
+     if (requirements.minReasoningTier != null &&
+         REASONING_TIERS.indexOf(descriptor.reasoningTier) <
+         REASONING_TIERS.indexOf(requirements.minReasoningTier)) {
+       fail("minReasoningTier", requirements.minReasoningTier, descriptor.reasoningTier);
+     }
+     if (requirements.minPromptCachingTokens != null &&
+         descriptor.promptCachingMaxTokens < requirements.minPromptCachingTokens) {
+       fail("minPromptCachingTokens", requirements.minPromptCachingTokens, descriptor.promptCachingMaxTokens);
+     }
+     return { ok: failures.length === 0, failures: failures };
+   }
+
+   function _validateRequirements(requirements) {
+     if (requirements == null) return {};
+     if (typeof requirements !== "object" || Array.isArray(requirements)) {
+       throw new AiCapabilityError("aiCapability/bad-requirements",
+         "ai.capability: requirements must be a plain object");
+     }
+     validateOpts(requirements, REQUIREMENT_KEYS, "ai.capability requirements");
+     if (requirements.minReasoningTier != null &&
+         REASONING_TIERS.indexOf(requirements.minReasoningTier) === -1) {
+       throw new AiCapabilityError("aiCapability/bad-requirements",
+         "ai.capability: minReasoningTier must be one of " + REASONING_TIERS.join(" / "));
+     }
+     if (requirements.modalitiesIn != null && !_isStringArray(requirements.modalitiesIn)) {
+       throw new AiCapabilityError("aiCapability/bad-requirements",
+         "ai.capability: requirements.modalitiesIn must be an array of non-empty strings");
+     }
+     if (requirements.modalitiesOut != null && !_isStringArray(requirements.modalitiesOut)) {
+       throw new AiCapabilityError("aiCapability/bad-requirements",
+         "ai.capability: requirements.modalitiesOut must be an array of non-empty strings");
+     }
+     // Numeric minimums are compared with `<` against the descriptor; a
+     // non-numeric value (NaN, "128k", a bad parse) makes that compare
+     // false and SILENTLY satisfies the requirement, so an undersized
+     // model could be selected. Reject non-finite / negative here so a
+     // malformed requirement fails fast instead of fail-open.
+     var numericMins = ["minContextTokens", "minOutputTokens", "minPromptCachingTokens"];
+     for (var ni = 0; ni < numericMins.length; ni++) {
+       var nk = numericMins[ni];
+       if (requirements[nk] != null && !_isNonNegFinite(requirements[nk])) {
+         throw new AiCapabilityError("aiCapability/bad-requirements",
+           "ai.capability: requirements." + nk + " must be a non-negative finite number");
+       }
+     }
+     // Boolean opt-in requirements are matched with `=== true`; a
+     // non-boolean (truthy 1, "false") would silently fail to require
+     // the capability. Reject non-booleans so the intent is explicit.
+     var booleanReqs = ["toolUse", "structuredOutput", "fineTunable", "citationSupport"];
+     for (var bi = 0; bi < booleanReqs.length; bi++) {
+       var bk = booleanReqs[bi];
+       if (requirements[bk] != null && typeof requirements[bk] !== "boolean") {
+         throw new AiCapabilityError("aiCapability/bad-requirements",
+           "ai.capability: requirements." + bk + " must be a boolean");
+       }
+     }
+     return requirements;
+   }
+
+   function satisfies(modelId, requirements) {
+     return _evaluate(describe(modelId), _validateRequirements(requirements));
+   }
+
+   // Per-call cost estimate. With a costBasis the estimate is the
+   // real per-call spend (input + output tokens at the model's rates);
+   // without one it is the sum of the per-1k rates — a stable proxy
+   // for "cheaper model" when the caller hasn't sized the request.
+   function _estimateCost(descriptor, costBasis) {
+     if (costBasis) {
+       var inTok = _isNonNegFinite(costBasis.inputTokens) ? costBasis.inputTokens : 0;
+       var outTok = _isNonNegFinite(costBasis.outputTokens) ? costBasis.outputTokens : 0;
+       return (inTok / COST_RATE_TOKEN_UNIT) * descriptor.costPer1kInputTokens +
+         (outTok / COST_RATE_TOKEN_UNIT) * descriptor.costPer1kOutputTokens;
+     }
+     return descriptor.costPer1kInputTokens + descriptor.costPer1kOutputTokens;
+   }
+
388
+ function route(routeOpts) {
389
+ routeOpts = routeOpts || {};
390
+ validateOpts(routeOpts, ["requirements", "fallback", "costBasis"], "ai.capability.route");
391
+ var requirements = _validateRequirements(routeOpts.requirements);
392
+ var costBasis = null;
393
+ if (routeOpts.costBasis != null) {
394
+ if (typeof routeOpts.costBasis !== "object" || Array.isArray(routeOpts.costBasis)) {
395
+ throw new AiCapabilityError("aiCapability/bad-requirements",
396
+ "ai.capability.route: costBasis must be a plain object { inputTokens, outputTokens }");
397
+ }
398
+ validateOpts(routeOpts.costBasis, ["inputTokens", "outputTokens"],
399
+ "ai.capability.route costBasis");
400
+ // A malformed costBasis field silently underprices a candidate
401
+ // and biases the "cheapest" choice toward the wrong model — fail
402
+ // fast instead. An absent field is fine (treated as 0 tokens on
403
+ // that side); a present-but-non-numeric field is rejected.
404
+ var cbFields = ["inputTokens", "outputTokens"];
405
+ for (var ci = 0; ci < cbFields.length; ci++) {
406
+ var ck = cbFields[ci];
407
+ if (routeOpts.costBasis[ck] != null && !_isNonNegFinite(routeOpts.costBasis[ck])) {
408
+ throw new AiCapabilityError("aiCapability/bad-requirements",
409
+ "ai.capability.route: costBasis." + ck + " must be a non-negative finite number");
410
+ }
411
+ }
412
+ costBasis = routeOpts.costBasis;
413
+ }
414
+
415
+ // Collect every satisfying model, then pick the cheapest. Tie
416
+ // break by model id (lexicographic) so the choice is deterministic
417
+ // across calls and across nodes.
418
+ var candidates = [];
419
+ var modelIds = Array.from(registry.keys());
420
+ for (var i = 0; i < modelIds.length; i++) {
421
+ var d = registry.get(modelIds[i]);
422
+ if (_evaluate(d, requirements).ok) {
423
+ candidates.push({ modelId: modelIds[i], descriptor: d, cost: _estimateCost(d, costBasis) });
424
+ }
425
+ }
426
+ candidates.sort(function (x, y) {
427
+ if (x.cost !== y.cost) return x.cost - y.cost;
428
+ return x.modelId < y.modelId ? -1 : (x.modelId > y.modelId ? 1 : 0);
429
+ });
430
+
431
+ if (candidates.length > 0) {
432
+ var pick = candidates[0];
433
+ _emitAudit("ai/capability-routed", "allowed", {
434
+ modelId: pick.modelId, candidateCount: candidates.length,
435
+ estimatedCost: pick.cost, requirements: requirements,
436
+ });
437
+ return {
438
+ modelId: pick.modelId,
439
+ descriptor: pick.descriptor,
440
+ estimatedCost: pick.cost,
441
+ reason: "cheapest-of-" + candidates.length,
442
+ };
443
+ }
444
+
445
+ // No model satisfies the requirements.
446
+ if (routeOpts.fallback != null) {
447
+ var fb = registry.get(routeOpts.fallback);
448
+ if (!fb) {
449
+ throw new AiCapabilityError("aiCapability/unknown-model",
450
+ "ai.capability.route: fallback '" + routeOpts.fallback + "' is not a registered model");
451
+ }
452
+ _emitAudit("ai/capability-fallback", "allowed", {
453
+ modelId: routeOpts.fallback, requirements: requirements,
454
+ });
455
+ return {
456
+ modelId: routeOpts.fallback,
457
+ descriptor: fb,
458
+ estimatedCost: _estimateCost(fb, costBasis),
459
+ reason: "fallback",
460
+ };
461
+ }
462
+
463
+ _emitAudit("ai/capability-no-candidate", "denied", { requirements: requirements });
464
+ throw new AiCapabilityError("aiCapability/no-candidate",
465
+ "ai.capability.route: no registered model satisfies the requirements " +
466
+ "and no fallback was supplied");
467
+ }
468
+
469
+ return {
470
+ describe: describe,
471
+ list: list,
472
+ register: register,
473
+ satisfies: satisfies,
474
+ route: route,
475
+ };
476
+ }
477
+
478
+ module.exports = {
479
+ create: create,
480
+ REASONING_TIERS: REASONING_TIERS,
481
+ AiCapabilityError: AiCapabilityError,
482
+ };
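The cost ranking in `route()` above can be illustrated in isolation. The following is a minimal sketch, not the module's own code: `estimateCost`, `cheapest`, `TOKENS_PER_RATE_UNIT`, and the three model descriptors are hypothetical names and values made up for the illustration, and the value 1000 for the token unit is an assumption inferred from the "per-1k" rate names (the hunk references `COST_RATE_TOKEN_UNIT` without showing its definition).

```javascript
"use strict";

// Assumed value: the diff references COST_RATE_TOKEN_UNIT but does not show it.
var TOKENS_PER_RATE_UNIT = 1000;

// With a costBasis, price the sized request; without one, fall back to the
// sum of the per-1k rates as a proxy for "cheaper model".
function estimateCost(descriptor, costBasis) {
  if (costBasis) {
    var inTok = costBasis.inputTokens || 0;
    var outTok = costBasis.outputTokens || 0;
    return (inTok / TOKENS_PER_RATE_UNIT) * descriptor.costPer1kInputTokens +
           (outTok / TOKENS_PER_RATE_UNIT) * descriptor.costPer1kOutputTokens;
  }
  return descriptor.costPer1kInputTokens + descriptor.costPer1kOutputTokens;
}

// Rank by cost, then by model id so equal-cost models resolve the same way
// on every call.
function cheapest(models, costBasis) {
  var ranked = models.map(function (m) {
    return { modelId: m.modelId, cost: estimateCost(m, costBasis) };
  });
  ranked.sort(function (x, y) {
    if (x.cost !== y.cost) return x.cost - y.cost;
    return x.modelId < y.modelId ? -1 : (x.modelId > y.modelId ? 1 : 0);
  });
  return ranked[0];
}

// Hypothetical descriptors (rates are illustrative, not real prices).
var models = [
  { modelId: "model-b", costPer1kInputTokens: 1, costPer1kOutputTokens: 8 },
  { modelId: "model-a", costPer1kInputTokens: 4, costPer1kOutputTokens: 4 },
  { modelId: "model-c", costPer1kInputTokens: 2, costPer1kOutputTokens: 6 },
];

// No costBasis: model-a and model-c tie at 8, and the id tie-break picks model-a.
var unsized = cheapest(models, null);

// Input-heavy request: model-b's low input rate makes it cheapest at 16.
var sized = cheapest(models, { inputTokens: 8000, outputTokens: 1000 });
```

The two calls pick different models from the same registry, which is why supplying a `costBasis` matters whenever the input/output token mix of a request is known in advance.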
@@ -0,0 +1,526 @@
1
+ "use strict";
2
+ /**
3
+ * @module b.ai.quota
4
+ * @nav Compliance
5
+ * @title AI usage quota
6
+ *
7
+ * @intro
8
+ * Per-tenant, per-model usage budgets for AI inference endpoints.
9
+ * OWASP LLM Top 10 2025 ranks <strong>LLM10: Unbounded
10
+ * Consumption</strong> — the class that includes "denial of
11
+ * wallet" (DoW), where an attacker drives a high volume of
12
+ * pay-per-use inferences until the bill itself becomes the
13
+ * attack — as a top application risk. A single misbehaving (or
14
+ * compromised) tenant can saturate context windows, exhaust GPU
15
+ * minutes, or run up an unbounded cloud-inference bill long
16
+ * before a human notices.
17
+ *
18
+ * This primitive enforces a hard ceiling per
19
+ * <code>(tenant, model, dimension, period)</code>:
20
+ *
21
+ * - <code>dimension</code> — what is being metered:
22
+ * <code>"tokens"</code> (context + completion tokens),
23
+ * <code>"requests"</code> (inference calls),
24
+ * <code>"cost-usd"</code> (provider spend), or
25
+ * <code>"compute-hours"</code> (GPU / accelerator time).
26
+ * - <code>period</code> — the budget window, calendar-aligned in
27
+ * UTC: <code>"second"</code>, <code>"minute"</code>,
28
+ * <code>"hour"</code>, <code>"day"</code>, <code>"week"</code>
29
+ * (Monday-aligned), or <code>"month"</code> (1st-of-month).
30
+ * - <code>enforcement</code> — <code>"hard"</code> (default,
31
+ * refuse the over-budget call), <code>"soft"</code> (admit but
32
+ * report <code>allowed:false</code> so the caller decides), or
33
+ * <code>"warn"</code> (admit + audit only).
34
+ *
35
+ * <code>consume(tenant, model, amount)</code> is the single
36
+ * atomic check-and-charge entry point: in <code>"hard"</code>
37
+ * mode it reserves <code>amount</code> only if it fits under the
38
+ * limit, otherwise it refuses without charging. There is no
39
+ * separate "check then add" two-call shape to race against — the
40
+ * reservation and the limit test happen in one operation.
41
+ *
42
+ * <strong>Single-process by default; cross-node via store.</strong>
43
+ * The in-memory counter is per-process. Multi-node deployments
44
+ * that need an aggregate ceiling across the cluster supply an
45
+ * <code>opts.store</code> adapter whose <code>reserve</code> (an
46
+ * atomic conditional test-and-charge — "add only if current +
47
+ * amount fits under the limit") and <code>add</code> are atomic on
48
+ * the shared backend: a Redis Lua script, or a SQL
49
+ * <code>UPDATE ... SET used = used + :amt WHERE used + :amt &lt;= :limit
50
+ * RETURNING used</code>. The conditional reserve is what keeps
51
+ * <code>hard</code> enforcement correct under cross-node
52
+ * contention — there is no charge-then-refund window for a
53
+ * concurrent call to observe. The framework records the active
54
+ * cluster node id on every breach event so a denial-of-wallet
55
+ * spike is attributable.
56
+ *
57
+ * Limit resolution is most-specific-first:
58
+ * <code>perTenantModel[t|m]</code> →
59
+ * <code>perTenant[t]</code> → <code>perModel[m]</code> →
60
+ * <code>limit</code> (the default). Tenant and model identifiers
61
+ * are percent-encoded into the counter key so a hostile tenant
62
+ * name cannot collide with another tenant's budget.
63
+ *
64
+ * Audit emissions (drop-silent via <code>b.audit.safeEmit</code>):
65
+ * - <code>ai/quota-applied</code> — a consume succeeded.
66
+ * - <code>ai/quota-exceeded</code> — a consume hit the ceiling
67
+ * (refused under <code>"hard"</code>; reported under
68
+ * <code>"soft"</code> / <code>"warn"</code>).
69
+ *
70
+ * NIST AI RMF (AI 100-1) MANAGE 2.x ("AI system performance and
71
+ * trustworthiness are monitored") and EU AI Act Art. 15
72
+ * (accuracy, robustness and cybersecurity of high-risk systems —
73
+ * resource-exhaustion resilience) map onto this primitive;
74
+ * operators wire its emissions into the same audit chain auditors
75
+ * read.
76
+ *
77
+ * @card
78
+ * Per-tenant, per-model AI usage budgets (tokens / requests /
79
+ * cost-usd / compute-hours) with atomic consume-and-check.
80
+ * Defends OWASP LLM10 unbounded consumption / denial-of-wallet.
81
+ */
82
+
83
+ var C = require("./constants");
84
+ var lazyRequire = require("./lazy-require");
85
+ var validateOpts = require("./validate-opts");
86
+ var { defineClass } = require("./framework-error");
87
+
88
+ var AiQuotaError = defineClass("AiQuotaError", { alwaysPermanent: true });
89
+
90
+ var audit = lazyRequire(function () { return require("./audit"); });
91
+ var observability = lazyRequire(function () { return require("./observability"); });
92
+ var cluster = lazyRequire(function () { return require("./cluster"); });
93
+
94
+ var DIMENSIONS = ["tokens", "requests", "cost-usd", "compute-hours"];
95
+ var PERIODS = ["second", "minute", "hour", "day", "week", "month"];
96
+ var ENFORCEMENTS = ["hard", "soft", "warn"];
97
+
98
+ // ---- Calendar-aligned period windows (UTC) ----
99
+ //
100
+ // Fixed-duration periods (second / minute) align to the epoch, which
101
+ // is itself UTC midnight, so a modulo is exact. Hour / day / week /
102
+ // month align to human UTC boundaries via Date.UTC truncation —
103
+ // week starts Monday, month starts on the 1st — so "100k tokens per
104
+ // day" resets at 00:00 UTC, not at a rolling 24h offset from first
105
+ // use.
106
+
107
+ function _windowStartFor(period, now) {
108
+ var d = new Date(now);
109
+ switch (period) {
110
+ case "second": return now - (now % C.TIME.seconds(1));
111
+ case "minute": return now - (now % C.TIME.minutes(1));
112
+ case "hour":
113
+ return Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate(), d.getUTCHours());
114
+ case "day":
115
+ return Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate());
116
+ case "week": {
117
+ var dayMid = Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate());
118
+ var dow = new Date(dayMid).getUTCDay(); // 0=Sun .. 6=Sat
119
+ var sinceMonday = (dow + 6) % 7; // 0=Mon .. 6=Sun
120
+ return dayMid - sinceMonday * C.TIME.days(1);
121
+ }
122
+ case "month":
123
+ return Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), 1);
124
+ default:
125
+ // unreachable — period validated at create()
126
+ return now;
127
+ }
128
+ }
129
+
130
+ function _resetsAtFor(period, windowStart) {
131
+ var d = new Date(windowStart);
132
+ switch (period) {
133
+ case "second": return windowStart + C.TIME.seconds(1);
134
+ case "minute": return windowStart + C.TIME.minutes(1);
135
+ case "hour": return windowStart + C.TIME.hours(1);
136
+ case "day": return windowStart + C.TIME.days(1);
137
+ case "week": return windowStart + C.TIME.weeks(1);
138
+ case "month": return Date.UTC(d.getUTCFullYear(), d.getUTCMonth() + 1, 1);
139
+ default: return windowStart;
140
+ }
141
+ }
142
+
143
+ // ---- Default in-memory atomic counter store ----
144
+ //
145
+ // Single-threaded JS makes each operation below one indivisible step,
146
+ // so a concurrent caller never observes a partial update. Entries
147
+ // self-expire at the window boundary; reads past expiry return 0 (a
148
+ // fresh window). _keysWithPrefix backs the reset(tenant) enumeration
149
+ // the default store can satisfy without an external scan.
150
+
151
+ function _memoryStore() {
152
+ var m = new Map(); // key -> { value, expiresAt }
153
+ function _slot(key, windowMs) {
154
+ var now = Date.now();
155
+ var e = m.get(key);
156
+ if (!e || e.expiresAt <= now) {
157
+ e = { value: 0, expiresAt: now + windowMs };
158
+ m.set(key, e);
159
+ }
160
+ return e;
161
+ }
162
+ return {
163
+ // Atomic conditional reserve — tests current + amount <= limit and
164
+ // charges only if it fits, as one indivisible operation. Returns
165
+ // { allowed, used }; on refusal the amount is NOT charged, so a
166
+ // concurrent over-budget call cannot transiently inflate the
167
+ // counter and falsely deny a smaller call that should fit.
168
+ reserve: function (key, amount, limit, windowMs) {
169
+ var e = _slot(key, windowMs);
170
+ if (e.value + amount > limit) return { allowed: false, used: e.value };
171
+ e.value += amount;
172
+ return { allowed: true, used: e.value };
173
+ },
174
+ // Unconditional add — for soft / warn modes, which always charge.
175
+ add: function (key, amount, windowMs) {
176
+ var e = _slot(key, windowMs);
177
+ e.value += amount;
178
+ return e.value;
179
+ },
180
+ get: function (key) {
181
+ var e = m.get(key);
182
+ if (!e || e.expiresAt <= Date.now()) return 0;
183
+ return e.value;
184
+ },
185
+ reset: function (key) {
186
+ m.delete(key);
187
+ },
188
+ _keysWithPrefix: function (prefix) {
189
+ var out = [];
190
+ m.forEach(function (_e, k) { if (k.indexOf(prefix) === 0) out.push(k); });
191
+ return out;
192
+ },
193
+ _clear: function () { m.clear(); },
194
+ };
195
+ }
196
+
197
+ /**
198
+ * @primitive b.ai.quota.create
199
+ * @signature b.ai.quota.create(opts)
200
+ * @since 0.12.27
201
+ * @status stable
202
+ * @compliance soc2, gdpr
203
+ * @related b.tenantQuota.budget, b.ai.disclosure.chatbot
204
+ *
205
+ * Build a per-tenant AI usage-budget enforcer scoped to one
206
+ * <code>dimension</code> and one <code>period</code>. Returns an
207
+ * object exposing <code>consume(tenant, model, amount, opts?)</code>
208
+ * (the atomic check-and-charge), <code>check(tenant, model)</code>
209
+ * (read-only snapshot), <code>snapshot(tenant, model)</code> (alias
210
+ * of <code>check</code>), and <code>reset(tenant?, model?)</code>
211
+ * (drop the current window's counters).
212
+ *
213
+ * Spin up one enforcer per dimension you meter — e.g. a
214
+ * <code>"cost-usd"</code> monthly budget and a
215
+ * <code>"tokens"</code> per-minute burst cap can coexist as two
216
+ * <code>create()</code> calls sharing the same store.
217
+ *
218
+ * @opts
219
+ * {
220
+ * dimension: string, // required, one of:
221
+ * // "tokens" | "requests" |
222
+ * // "cost-usd" | "compute-hours"
223
+ * period: string, // required, one of:
224
+ * // "second" | "minute" | "hour" |
225
+ * // "day" | "week" | "month"
226
+ * limit: number, // required, default ceiling (> 0)
227
+ * perTenant?: { [tenantId: string]: number },
228
+ * perModel?: { [model: string]: number },
229
+ * perTenantModel?: { [tenantPipeModel: string]: number },
230
+ * // key is `tenantId + "|" + model`
231
+ * enforcement?: string, // "hard" (default) | "soft" | "warn"
232
+ * store?: object, // { reserve, add, get, reset };
233
+ * // default in-memory (per-process)
234
+ * audit?: boolean, // default: true
235
+ * }
236
+ *
237
+ * @example
238
+ * var budget = b.ai.quota.create({
239
+ * dimension: "cost-usd",
240
+ * period: "month",
241
+ * limit: 500,
242
+ * perTenant: { "tenant-vip": 5000 },
243
+ * enforcement: "hard",
244
+ * });
245
+ * var r = await budget.consume("tenant-acme", "opus-4", 0.42);
246
+ * // → { tenantId: "tenant-acme", model: "opus-4",
247
+ * // dimension: "cost-usd", period: "month", used: 0.42,
248
+ * // limit: 500, remaining: 499.58, allowed: true,
249
+ * // exceeded: false, enforcement: "hard", windowStart: ..., resetsAt: ... }
250
+ */
251
+ function create(opts) {
252
+ validateOpts.requireObject(opts, "ai.quota.create", AiQuotaError);
253
+ validateOpts(opts, [
254
+ "dimension", "period", "limit", "perTenant", "perModel",
255
+ "perTenantModel", "enforcement", "store", "audit",
256
+ ], "ai.quota.create");
257
+
258
+ var dimension = opts.dimension;
259
+ if (DIMENSIONS.indexOf(dimension) === -1) {
260
+ throw new AiQuotaError("aiQuota/bad-dimension",
261
+ "ai.quota.create: dimension must be one of " + DIMENSIONS.join(" / ") +
262
+ " (got " + JSON.stringify(dimension) + ")");
263
+ }
264
+
265
+ var period = opts.period;
266
+ if (PERIODS.indexOf(period) === -1) {
267
+ throw new AiQuotaError("aiQuota/bad-period",
268
+ "ai.quota.create: period must be one of " + PERIODS.join(" / ") +
269
+ " (got " + JSON.stringify(period) + ")");
270
+ }
271
+
272
+ if (typeof opts.limit !== "number" || !isFinite(opts.limit) || opts.limit <= 0) {
273
+ throw new AiQuotaError("aiQuota/bad-limit",
274
+ "ai.quota.create: limit must be a positive finite number");
275
+ }
276
+ var defaultLimit = opts.limit;
277
+
278
+ var perTenant = _validateLimitMap(opts.perTenant, "perTenant");
279
+ var perModel = _validateLimitMap(opts.perModel, "perModel");
280
+ var perTenantModel = _validateLimitMap(opts.perTenantModel, "perTenantModel");
281
+
282
+ var enforcement = (opts.enforcement == null) ? "hard" : opts.enforcement;
283
+ if (ENFORCEMENTS.indexOf(enforcement) === -1) {
284
+ throw new AiQuotaError("aiQuota/bad-enforcement",
285
+ "ai.quota.create: enforcement must be one of " + ENFORCEMENTS.join(" / ") +
286
+ " (got " + JSON.stringify(enforcement) + ")");
287
+ }
288
+
289
+ var store = opts.store || _memoryStore();
290
+ _validateStore(store);
291
+ var storeIsDefault = !opts.store;
292
+
293
+ var auditOn = opts.audit !== false;
294
+
295
+ function _limitFor(tenantId, model) {
296
+ var tmKey = tenantId + "|" + model;
297
+ if (Object.prototype.hasOwnProperty.call(perTenantModel, tmKey)) return perTenantModel[tmKey];
298
+ if (Object.prototype.hasOwnProperty.call(perTenant, tenantId)) return perTenant[tenantId];
299
+ if (Object.prototype.hasOwnProperty.call(perModel, model)) return perModel[model];
300
+ return defaultLimit;
301
+ }
302
+
303
+ // Counter key — tenant + model percent-encoded so a value
304
+ // containing the ":" separator cannot collide with another
305
+ // (tenant, model) pair's budget.
306
+ function _keyFor(tenantId, model, windowStart) {
307
+ return "aiq:" + dimension + ":" + period + ":" +
308
+ encodeURIComponent(tenantId) + ":" + encodeURIComponent(model) + ":" + windowStart;
309
+ }
310
+ function _keyPrefixForTenant(tenantId) {
311
+ return "aiq:" + dimension + ":" + period + ":" + encodeURIComponent(tenantId) + ":";
312
+ }
313
+
314
+ function _nodeId() {
315
+ try {
316
+ if (cluster().isClusterMode()) return cluster().currentNodeId();
317
+ } catch (_e) { /* cluster optional */ }
318
+ return null;
319
+ }
320
+
321
+ function _emitAudit(action, outcome, metadata) {
322
+ if (!auditOn) return;
323
+ try {
324
+ audit().safeEmit({ action: action, outcome: outcome, metadata: metadata || {} });
325
+ } catch (_e) { /* audit best-effort — drop-silent */ }
326
+ }
327
+
328
+ function _emitMetric(name, n) {
329
+ try { observability().safeEvent(name, n || 1, {}); }
330
+ catch (_e) { /* drop-silent */ }
331
+ }
332
+
333
+ // `mode` is the enforcement actually applied to this call (the
334
+ // per-call override when present, else the instance default) so the
335
+ // returned `enforcement` reflects how the call was evaluated.
336
+ function _result(tenantId, model, used, limit, windowStart, resetsAt, mode, allowed, exceeded) {
337
+ var remaining = limit - used;
338
+ return {
339
+ tenantId: tenantId,
340
+ model: model,
341
+ dimension: dimension,
342
+ period: period,
343
+ used: used,
344
+ limit: limit,
345
+ remaining: remaining < 0 ? 0 : remaining,
346
+ allowed: allowed,
347
+ exceeded: exceeded,
348
+ enforcement: mode,
349
+ windowStart: windowStart,
350
+ resetsAt: resetsAt,
351
+ };
352
+ }
353
+
354
+ function consume(tenantId, model, amount, consumeOpts) {
355
+ validateOpts.requireNonEmptyString(tenantId,
356
+ "ai.quota.consume: tenantId", AiQuotaError, "aiQuota/bad-tenant");
357
+ validateOpts.requireNonEmptyString(model,
358
+ "ai.quota.consume: model", AiQuotaError, "aiQuota/bad-model");
359
+ if (typeof amount !== "number" || !isFinite(amount) || amount < 0) {
360
+ throw new AiQuotaError("aiQuota/bad-amount",
361
+ "ai.quota.consume: amount must be a non-negative finite number");
362
+ }
363
+ consumeOpts = consumeOpts || {};
364
+ // Per-call enforcement override lets a single endpoint dial a
365
+ // softer mode for a trusted internal caller without a second
366
+ // enforcer; still validated against the allowlist.
367
+ var mode = (consumeOpts.enforcement == null) ? enforcement : consumeOpts.enforcement;
368
+ if (ENFORCEMENTS.indexOf(mode) === -1) {
369
+ throw new AiQuotaError("aiQuota/bad-enforcement",
370
+ "ai.quota.consume: enforcement override must be one of " + ENFORCEMENTS.join(" / "));
371
+ }
372
+
373
+ var now = Date.now();
374
+ var windowStart = _windowStartFor(period, now);
375
+ var resetsAt = _resetsAtFor(period, windowStart);
376
+ var windowMs = resetsAt - windowStart;
377
+ var limit = _limitFor(tenantId, model);
378
+ var key = _keyFor(tenantId, model, windowStart);
379
+
380
+ if (mode === "hard") {
381
+ // Atomic conditional reserve — the store tests current + amount
382
+ // <= limit and charges only if it fits, as one indivisible
383
+ // operation. Charging first and refunding the overage (a
384
+ // read-then-add or add-then-refund shape) would let a concurrent
385
+ // over-budget call transiently inflate the counter and falsely
386
+ // deny a smaller call that should fit; the conditional reserve
387
+ // never charges on refusal, so there is no transient to race.
388
+ var rv = store.reserve(key, amount, limit, windowMs);
389
+ if (rv.allowed) {
390
+ _emitAudit("ai/quota-applied", "allowed", {
391
+ tenantId: tenantId, model: model, dimension: dimension,
392
+ period: period, amount: amount, used: rv.used, limit: limit,
393
+ nodeId: _nodeId(),
394
+ });
395
+ _emitMetric("ai.quota.applied", 1);
396
+ return _result(tenantId, model, rv.used, limit, windowStart, resetsAt, mode, true, false);
397
+ }
398
+ _emitAudit("ai/quota-exceeded", "denied", {
399
+ tenantId: tenantId, model: model, dimension: dimension,
400
+ period: period, amount: amount, used: rv.used, limit: limit,
401
+ enforcement: mode, nodeId: _nodeId(),
402
+ });
403
+ _emitMetric("ai.quota.exceeded", 1);
404
+ throw new AiQuotaError("aiQuota/exceeded",
405
+ "ai.quota.consume: tenant '" + tenantId + "' model '" + model +
406
+ "' is at " + rv.used + " of " + limit + " " + dimension +
407
+ " this " + period + "; consuming " + amount + " would exceed the budget — call refused");
408
+ }
409
+
410
+ // soft / warn always charge — the call proceeds regardless of the
411
+ // ceiling; the mode only changes how the overage is reported.
412
+ var used = store.add(key, amount, windowMs);
413
+ if (used > limit) {
414
+ _emitAudit("ai/quota-exceeded", "allowed", {
415
+ tenantId: tenantId, model: model, dimension: dimension,
416
+ period: period, amount: amount, used: used, limit: limit,
417
+ enforcement: mode, nodeId: _nodeId(),
418
+ });
419
+ _emitMetric("ai.quota.exceeded", 1);
420
+ // soft reports allowed:false so the caller can choose to honor
421
+ // the ceiling; warn reports allowed:true (advisory only).
422
+ return _result(tenantId, model, used, limit, windowStart, resetsAt, mode, mode === "warn", true);
423
+ }
424
+ _emitAudit("ai/quota-applied", "allowed", {
425
+ tenantId: tenantId, model: model, dimension: dimension,
426
+ period: period, amount: amount, used: used, limit: limit,
427
+ nodeId: _nodeId(),
428
+ });
429
+ _emitMetric("ai.quota.applied", 1);
430
+ return _result(tenantId, model, used, limit, windowStart, resetsAt, mode, true, false);
431
+ }
432
+
433
+ function check(tenantId, model) {
434
+ validateOpts.requireNonEmptyString(tenantId,
435
+ "ai.quota.check: tenantId", AiQuotaError, "aiQuota/bad-tenant");
436
+ validateOpts.requireNonEmptyString(model,
437
+ "ai.quota.check: model", AiQuotaError, "aiQuota/bad-model");
438
+ var now = Date.now();
439
+ var windowStart = _windowStartFor(period, now);
440
+ var resetsAt = _resetsAtFor(period, windowStart);
441
+ var limit = _limitFor(tenantId, model);
442
+ var used = store.get(_keyFor(tenantId, model, windowStart));
443
+ return _result(tenantId, model, used, limit, windowStart, resetsAt, enforcement, used < limit, used >= limit);
444
+ }
445
+
446
+ function reset(tenantId, model) {
447
+ var now = Date.now();
448
+ var windowStart = _windowStartFor(period, now);
449
+ if (tenantId === undefined) {
450
+ // Clear everything. The default store supports a full clear;
451
+ // an external store gets a no-arg reset() if it offers one.
452
+ if (storeIsDefault) { store._clear(); return; }
453
+ if (typeof store.reset === "function") { store.reset(); return; }
454
+ return;
455
+ }
456
+ validateOpts.requireNonEmptyString(tenantId,
457
+ "ai.quota.reset: tenantId", AiQuotaError, "aiQuota/bad-tenant");
458
+ if (model !== undefined) {
459
+ validateOpts.requireNonEmptyString(model,
460
+ "ai.quota.reset: model", AiQuotaError, "aiQuota/bad-model");
461
+ store.reset(_keyFor(tenantId, model, windowStart));
462
+ return;
463
+ }
464
+ // tenant-wide reset needs key enumeration. The default in-memory
465
+ // store can scan its own keys; an external store would need a
466
+ // server-side prefix delete the framework can't portably issue.
467
+ if (storeIsDefault) {
468
+ var prefix = _keyPrefixForTenant(tenantId);
469
+ var keys = store._keysWithPrefix(prefix);
470
+ for (var i = 0; i < keys.length; i++) store.reset(keys[i]);
471
+ return;
472
+ }
473
+ throw new AiQuotaError("aiQuota/reset-unsupported",
474
+ "ai.quota.reset: tenant-wide reset with an external store requires " +
475
+ "an explicit model argument (per-key) or a store-side prefix delete");
476
+ }
477
+
478
+ return {
479
+ consume: consume,
480
+ check: check,
481
+ snapshot: check,
482
+ reset: reset,
483
+ dimension: dimension,
484
+ period: period,
485
+ };
486
+ }
487
+
488
+ // Per-tenant / per-model / per-tenant-model limit-override maps are
489
+ // validated at config time so a typo (negative cap, NaN) surfaces at
490
+ // boot, not as a silent fall-through to the default ceiling.
491
+ function _validateLimitMap(map, label) {
492
+ if (map == null) return {};
493
+ if (typeof map !== "object" || Array.isArray(map)) {
494
+ throw new AiQuotaError("aiQuota/bad-override",
495
+ "ai.quota.create: " + label + " must be a plain object { key: limit }");
496
+ }
497
+ var keys = Object.keys(map);
498
+ for (var i = 0; i < keys.length; i++) {
499
+ var v = map[keys[i]];
500
+ if (typeof v !== "number" || !isFinite(v) || v <= 0) {
501
+ throw new AiQuotaError("aiQuota/bad-override",
502
+ "ai.quota.create: " + label + "['" + keys[i] +
503
+ "'] must be a positive finite number");
504
+ }
505
+ }
506
+ return map;
507
+ }
508
+
509
+ function _validateStore(store) {
510
+ if (!store || typeof store !== "object" ||
511
+ typeof store.reserve !== "function" ||
512
+ typeof store.add !== "function" ||
513
+ typeof store.get !== "function" ||
514
+ typeof store.reset !== "function") {
515
+ throw new AiQuotaError("aiQuota/bad-store",
516
+ "ai.quota.create: store must expose reserve / add / get / reset functions");
517
+ }
518
+ }
519
+
520
+ module.exports = {
521
+ create: create,
522
+ DIMENSIONS: DIMENSIONS,
523
+ PERIODS: PERIODS,
524
+ ENFORCEMENTS: ENFORCEMENTS,
525
+ AiQuotaError: AiQuotaError,
526
+ };
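The conditional-reserve contract that `hard` enforcement relies on can be sketched as a standalone store. This is an illustration under stated assumptions rather than the shipped `_memoryStore`: `createSketchStore` is a hypothetical name, window expiry (and the `windowMs` argument the real store receives) is left out for brevity, and the key string is illustrative only. It exposes the same `reserve` / `add` / `get` / `reset` shape that `_validateStore` checks for.

```javascript
"use strict";

// Hypothetical helper; expiry handling is deliberately omitted.
function createSketchStore() {
  var counters = new Map(); // key -> amount used so far

  return {
    // Charge only if current + amount fits under the limit. A refusal
    // leaves the counter untouched.
    reserve: function (key, amount, limit) {
      var current = counters.get(key) || 0;
      if (current + amount > limit) return { allowed: false, used: current };
      counters.set(key, current + amount);
      return { allowed: true, used: current + amount };
    },
    // Unconditional charge, the shape the soft / warn modes use.
    add: function (key, amount) {
      var next = (counters.get(key) || 0) + amount;
      counters.set(key, next);
      return next;
    },
    get: function (key) { return counters.get(key) || 0; },
    reset: function (key) { counters.delete(key); },
  };
}

var store = createSketchStore();
var key = "aiq:tokens:day:tenant-acme:opus-4:0"; // illustrative key only

var first = store.reserve(key, 8, 10);   // fits: charged, used becomes 8
var refused = store.reserve(key, 5, 10); // 8 + 5 > 10: refused, nothing charged
var second = store.reserve(key, 2, 10);  // fits: the refusal left the counter at 8
```

Because the refused call never touches the counter, the later 2-unit call is judged against 8 rather than a transiently inflated 13. An external adapter would need the same property from its backend, for example by performing the test and the increment in a single server-side statement or script.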
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@blamejs/core",
3
- "version": "0.12.26",
3
+ "version": "0.12.28",
4
4
  "description": "The Node framework that owns its stack.",
5
5
  "license": "Apache-2.0",
6
6
  "author": "blamejs contributors",
package/sbom.cdx.json CHANGED
@@ -2,10 +2,10 @@
2
2
  "$schema": "http://cyclonedx.org/schema/bom-1.5.schema.json",
3
3
  "bomFormat": "CycloneDX",
4
4
  "specVersion": "1.5",
5
- "serialNumber": "urn:uuid:f16ae992-0fc9-4ad6-b05b-b6dfd629e058",
5
+ "serialNumber": "urn:uuid:2bff79a1-ab38-4b20-8cdc-37b5e80872a3",
6
6
  "version": 1,
7
7
  "metadata": {
8
- "timestamp": "2026-05-24T10:57:05.537Z",
8
+ "timestamp": "2026-05-24T16:44:14.267Z",
9
9
  "lifecycles": [
10
10
  {
11
11
  "phase": "build"
@@ -19,14 +19,14 @@
19
19
  }
20
20
  ],
21
21
  "component": {
22
- "bom-ref": "@blamejs/core@0.12.26",
22
+ "bom-ref": "@blamejs/core@0.12.28",
23
23
  "type": "application",
24
24
  "name": "blamejs",
25
- "version": "0.12.26",
25
+ "version": "0.12.28",
26
26
  "scope": "required",
27
27
  "author": "blamejs contributors",
28
28
  "description": "The Node framework that owns its stack.",
29
- "purl": "pkg:npm/%40blamejs/core@0.12.26",
29
+ "purl": "pkg:npm/%40blamejs/core@0.12.28",
30
30
  "properties": [],
31
31
  "externalReferences": [
32
32
  {
@@ -54,7 +54,7 @@
54
54
  "components": [],
55
55
  "dependencies": [
56
56
  {
57
- "ref": "@blamejs/core@0.12.26",
57
+ "ref": "@blamejs/core@0.12.28",
58
58
  "dependsOn": []
59
59
  }
60
60
  ]