ata-validator 0.4.7 → 0.4.9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -10,12 +10,14 @@ Ultra-fast JSON Schema validator powered by [simdjson](https://github.com/simdjs
 
  | Scenario | ata | ajv | |
  |---|---|---|---|
- | **validate(obj)** valid | 15M ops/sec | 8M ops/sec | **ata 1.9x faster** |
- | **validate(obj)** invalid | 13.1M ops/sec | 8.1M ops/sec | **ata 1.6x faster** |
+ | **validate(obj)** valid | 76M ops/sec | 8M ops/sec | **ata 9.5x faster** |
+ | **validate(obj)** invalid | 34M ops/sec | 8M ops/sec | **ata 4.3x faster** |
  | **isValidObject(obj)** | 15.4M ops/sec | 9.2M ops/sec | **ata 1.7x faster** |
  | **validateJSON(str)** valid | 2.15M ops/sec | 1.88M ops/sec | **ata 1.1x faster** |
- | **validateJSON(str)** invalid | 2.62M ops/sec | 2.35M ops/sec | **ata 1.1x faster** |
- | **Schema compilation** | 112K ops/sec | 773 ops/sec | **ata 145x faster** |
+ | **validateJSON(str)** invalid | 2.17M ops/sec | 2.29M ops/sec | **ata 1.1x faster** |
+ | **Schema compilation** | 113K ops/sec | 818 ops/sec | **ata 138x faster** |
+
+ > validate(obj) numbers are isolated single-schema benchmarks. Multi-schema benchmark overhead reduces throughput; real-world numbers depend on workload.
 
  ### Large Data — JS Object Validation
 
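The "Nx faster" column is the ratio of the two throughput figures in each row, so it can be recomputed directly from the listed numbers as a consistency check. The sketch below does that with a hypothetical `speedupLabel` helper (not part of ata-validator), using the 0.4.9 figures exactly as listed. Most rows reproduce their labels; for the invalid `validateJSON(str)` row, 2.17M vs 2.29M ops/sec works out to ajv being about 1.1x faster rather than ata.

```javascript
// Hypothetical helper, not part of ata-validator: turns two throughput
// figures (ops/sec) into a "<winner> Nx faster" label like the README column.
function speedupLabel(ataOps, ajvOps) {
  if (!(ataOps > 0) || !(ajvOps > 0)) throw new RangeError('ops/sec must be positive');
  const winner = ataOps >= ajvOps ? 'ata' : 'ajv';
  // Ratio of the faster library's throughput to the slower one's.
  const ratio = winner === 'ata' ? ataOps / ajvOps : ajvOps / ataOps;
  // One decimal place below 10x, whole numbers from 10x upward.
  const shown = ratio >= 10 ? String(Math.round(ratio)) : ratio.toFixed(1);
  return `${winner} ${shown}x faster`;
}

// [scenario, ata ops/sec, ajv ops/sec] as listed in the 0.4.9 table.
const rows = [
  ['validate(obj) valid', 76e6, 8e6],
  ['validate(obj) invalid', 34e6, 8e6],
  ['isValidObject(obj)', 15.4e6, 9.2e6],
  ['validateJSON(str) valid', 2.15e6, 1.88e6],
  ['validateJSON(str) invalid', 2.17e6, 2.29e6],
  ['Schema compilation', 113e3, 818],
];

for (const [scenario, ata, ajv] of rows) {
  console.log(`${scenario}: ${speedupLabel(ata, ajv)}`);
}
```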
@@ -38,7 +40,7 @@ Ultra-fast JSON Schema validator powered by [simdjson](https://github.com/simdjs
 
  ### How it works
 
- **Combined single-pass validation**: ata compiles schemas into monolithic JS functions that both validate and collect errors in a single pass. Valid data returns immediately (lazy error array: zero allocation). Invalid data collects errors without a second pass.
+ **Hybrid validator**: ata compiles schemas into monolithic JS functions identical to the boolean fast path, but returning `VALID_RESULT` on success and calling the error collector on failure. V8 TurboFan optimizes it identically to a pure boolean function; error code is dead code on the valid path. No try/catch (3.3x V8 deopt), no lazy arrays, no double-pass.
 
  **JS codegen**: Schemas are compiled to monolithic JS functions (like ajv). Supported keywords: `type`, `required`, `properties`, `items`, `enum`, `const`, `allOf`, `anyOf`, `oneOf`, `not`, `if/then/else`, `uniqueItems`, `contains`, `prefixItems`, `additionalProperties`, `dependentRequired`, `$ref` (local), `minimum/maximum`, `minLength/maxLength`, `pattern`, `format`.
 
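The hybrid idea can be illustrated outside ata: emit one string of checks twice, once ending in `return true` for the boolean path and once with failures rewritten to `return E(d)` and success to `return R`. This is a minimal sketch assuming a hand-written checks string and a naive textual replacement; ata's real generator and its `replaceTopLevel()` helper are not reproduced, and `collectErrors` is a hypothetical stand-in for its error collector.

```javascript
// Minimal sketch of the hybrid-validator idea. The checks string is
// hand-written for a schema requiring string `name` and `email`; ata's real
// generator and its replaceTopLevel() helper are not reproduced here.
const checks = [
  "if (typeof d !== 'object' || d === null || Array.isArray(d)) return false;",
  "if (typeof d.name !== 'string') return false;",
  "if (typeof d.email !== 'string') return false;",
].join('\n');

// Boolean fast path: the checks followed by `return true`.
const isValid = new Function('d', checks + '\nreturn true;');

// Hybrid path: the same checks, with failures calling the error collector
// E(d) and success returning a shared frozen result R. The global textual
// swap is only safe here because every `return false` is at the top level.
const hybridBody = checks.replace(/return false/g, 'return E(d)') + '\nreturn R;';
const hybridFactory = new Function('R', 'E', `return function (d) {\n${hybridBody}\n}`);

const VALID_RESULT = Object.freeze({ valid: true, errors: Object.freeze([]) });

// Hypothetical slow-path collector: runs only after a fast check has failed.
function collectErrors(d) {
  const errors = [];
  if (typeof d !== 'object' || d === null || Array.isArray(d)) {
    errors.push({ path: '', message: 'must be object' });
  } else {
    for (const key of ['name', 'email']) {
      if (typeof d[key] !== 'string') errors.push({ path: '/' + key, message: 'must be string' });
    }
  }
  return { valid: false, errors };
}

const validate = hybridFactory(VALID_RESULT, collectErrors);
console.log(validate({ name: 'Mert', email: 'mert@example.com' }) === VALID_RESULT); // true
console.log(validate({ name: 42 }).errors.length); // 2
```

Returning the same frozen `VALID_RESULT` object means the valid path allocates nothing, and because the collector is only reachable through a failing branch, the happy path stays a plain sequence of type checks, which is the property the text above credits for the speedup.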
@@ -52,7 +54,7 @@ Ultra-fast JSON Schema validator powered by [simdjson](https://github.com/simdjs
 
  ## When to use ata
 
- - **Any `validate(obj)` workload** — 1.6x–2.7x faster than ajv on all data
+ - **Any `validate(obj)` workload** — 4.3x–9.5x faster than ajv
  - **Serverless / cold starts** — 12.5x faster schema compilation
  - **Security-sensitive apps** — RE2 regex, immune to ReDoS attacks
  - **Batch/streaming validation** — NDJSON log processing, data pipelines (2.6x faster)
@@ -66,7 +68,7 @@ Ultra-fast JSON Schema validator powered by [simdjson](https://github.com/simdjs
 
  ## Features
 
- - **Combined single-pass validation**: One JS function validates + collects errors: no double pass, lazy error allocation
+ - **Hybrid validator**: 76M ops/sec — same function body as boolean check, returns result or calls error collector. No try/catch, no double pass
  - **Multi-core**: Parallel validation across all CPU cores — 13.4M validations/sec
  - **simdjson**: SIMD-accelerated JSON parsing at GB/s speeds, adaptive On Demand for large docs
  - **RE2 regex**: Linear-time guarantees, immune to ReDoS attacks (2391x faster on pathological input)
@@ -101,7 +103,7 @@ const v = new Validator({
  required: ['name', 'email']
  });
 
- // Fast boolean check — JS codegen (1.7x faster than ajv)
+ // Fast boolean check — JS codegen (9.5x faster than ajv)
  v.isValidObject({ name: 'Mert', email: 'mert@example.com', age: 26 }); // true
 
  // Full validation with error details + defaults applied
@@ -1048,7 +1048,7 @@ static ThreadPool& pool() {
 
  // --- Fast Validation Registry ---
  // Global schema slots for V8 Fast API (bypasses NAPI overhead)
- static constexpr size_t MAX_FAST_SLOTS = 256;
+ static constexpr size_t MAX_FAST_SLOTS = 4096;
  static ata::schema_ref g_fast_schemas[MAX_FAST_SLOTS];
  static std::string g_fast_schema_jsons[MAX_FAST_SLOTS];
  static uint32_t g_fast_slot_count = 0;
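Raising `MAX_FAST_SLOTS` from 256 to 4096 enlarges the fixed table of schemas registered for the V8 Fast API path. The hunk does not show what happens once the table is full, so the JavaScript model below is an assumption: registration returns -1 when no slot is left (the same sentinel `fromStandalone()` assigns to `_fastSlot` in the index.js changes) and such schemas fall back to a slower path. `FastSlotRegistry` is a hypothetical name, not ata's C++ code.

```javascript
// Hypothetical model, not ata's C++ registry: a fixed-capacity slot table.
// Schemas receive a slot index until capacity is reached; after that the
// caller gets -1 and is assumed to use a slower, non-Fast-API path.
class FastSlotRegistry {
  constructor(capacity) {
    this.capacity = capacity;
    this.slots = [];
  }

  register(schemaJson) {
    if (this.slots.length >= this.capacity) return -1; // table full
    this.slots.push(schemaJson);
    return this.slots.length - 1; // index of the newly filled slot
  }
}

// Register 1000 distinct schemas against the old and new capacities.
const small = new FastSlotRegistry(256);
const large = new FastSlotRegistry(4096);
let smallFallbacks = 0;
let largeFallbacks = 0;
for (let i = 0; i < 1000; i++) {
  const json = JSON.stringify({ type: 'object', title: `schema${i}` });
  if (small.register(json) === -1) smallFallbacks++;
  if (large.register(json) === -1) largeFallbacks++;
}
console.log(smallFallbacks, largeFallbacks); // 744 0
```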
package/binding.gyp CHANGED
@@ -11,10 +11,10 @@
  "<!@(node -p \"require('node-addon-api').include\")",
  "include",
  "deps/simdjson",
- "<!@(node -e \"var p=process.platform,a=process.arch;if(p==='darwin'){console.log(a==='arm64'?'/opt/homebrew/opt/re2/include':'/usr/local/opt/re2/include');console.log(a==='arm64'?'/opt/homebrew/opt/abseil/include':'/usr/local/opt/abseil/include')}else{console.log('/usr/include')}\")"
+ "<!@(node -e \"var p=process.platform,a=process.arch;if(p==='darwin'){console.log(a==='arm64'?'/opt/homebrew/opt/re2/include':'/usr/local/opt/re2/include');console.log(a==='arm64'?'/opt/homebrew/opt/abseil/include':'/usr/local/opt/abseil/include');console.log(a==='arm64'?'/opt/homebrew/opt/mimalloc/include':'/usr/local/opt/mimalloc/include')}else{console.log('/usr/include')}\")"
  ],
  "libraries": [
- "<!@(node -e \"var p=process.platform,a=process.arch;if(p==='darwin'){var pre=a==='arm64'?'/opt/homebrew/opt/re2':'/usr/local/opt/re2';console.log('-L'+pre+'/lib -lre2')}else{console.log('-lre2')}\")"
+ "<!@(node -e \"var p=process.platform,a=process.arch;if(p==='darwin'){var pre=a==='arm64'?'/opt/homebrew/opt/re2':'/usr/local/opt/re2';var mi=a==='arm64'?'/opt/homebrew/opt/mimalloc':'/usr/local/opt/mimalloc';console.log('-L'+pre+'/lib -lre2 -L'+mi+'/lib -lmimalloc')}else{console.log('-lre2')}\")"
  ],
  "dependencies": [
  "<!(node -p \"require('node-addon-api').gyp\")"
package/index.js CHANGED
@@ -211,6 +211,7 @@ class Validator {
  // Pure JS fast path — no NAPI, runs in V8 JIT
  // Set ATA_FORCE_NAPI=1 to disable JS codegen (for correctness testing)
  const schemaObj = typeof schema === "string" ? JSON.parse(schema) : schema;
+ this._schemaObj = schemaObj;
  const jsFn = process.env.ATA_FORCE_NAPI
  ? null
  : (compileToJSCodegen(schemaObj) || compileToJS(schemaObj));
@@ -330,6 +331,107 @@ class Validator {
  });
  }
 
+ // --- Standalone pre-compilation ---
+ // Generate a JS module string that can be written to a file.
+ // On next startup, load with Validator.fromStandalone() — zero compile time.
+ toStandalone() {
+ const jsFn = this._jsFn;
+ if (!jsFn || !jsFn._source) return null;
+ const src = jsFn._source;
+ const hybridSrc = jsFn._hybridSource || '';
+
+ // Also capture error function source for zero-compile standalone load
+ const jsErrFn = compileToJSCodegenWithErrors(
+ typeof this._schemaObj === 'object' ? this._schemaObj : {}
+ );
+ const errSrc = jsErrFn && jsErrFn._errSource ? jsErrFn._errSource : '';
+
+ return `// Auto-generated by ata-validator — do not edit
+ 'use strict';
+ const boolFn = function(d) {
+ ${src}
+ };
+ const hybridFactory = function(R, E) {
+ return function(d) {
+ ${hybridSrc}
+ };
+ };
+ ${errSrc ? `const errFn = function(d, _all) {\n ${errSrc}\n};` : 'const errFn = null;'}
+ module.exports = { boolFn, hybridFactory, errFn };
+ `;
+ }
+
+ // Load a pre-compiled standalone module. Zero schema compilation.
+ // No NAPI, no native compile — pure JS. Startup in microseconds.
+ // Usage: const v = Validator.fromStandalone(require('./compiled.js'), schema, opts)
+ static fromStandalone(mod, schema, opts) {
+ const options = opts || {};
+ const schemaObj = typeof schema === "string" ? JSON.parse(schema) : schema;
+
+ // Create a lightweight instance — skip NAPI compile entirely
+ const v = Object.create(Validator.prototype);
+ v._jsFn = mod.boolFn;
+ v._compiled = null;
+ v._fastSlot = -1;
+
+ // Mutators
+ const applyDefaults = buildDefaultsApplier(schemaObj);
+ const applyCoerce = options.coerceTypes ? buildCoercer(schemaObj) : null;
+ const applyRemove = options.removeAdditional ? buildRemover(schemaObj) : null;
+ const mutators = [applyRemove, applyCoerce, applyDefaults].filter(Boolean);
+ const preprocess = mutators.length === 0 ? null
+ : mutators.length === 1 ? mutators[0]
+ : (data) => { for (let i = 0; i < mutators.length; i++) mutators[i](data); };
+ v._preprocess = preprocess;
+
+ // Error function — use pre-compiled from standalone if available, else compile
+ let errFn = (d) => ({ valid: false, errors: [{ code: 'validation_failed', path: '', message: 'validation failed' }] });
+ if (mod.errFn) {
+ errFn = (d) => mod.errFn(d, true);
+ } else {
+ const jsErrFn = compileToJSCodegenWithErrors(schemaObj);
+ if (jsErrFn) {
+ try { jsErrFn({}, true); errFn = (d) => jsErrFn(d, true); } catch {}
+ }
+ }
+
+ // Hybrid or speculative
+ const hybridFn = mod.hybridFactory
+ ? mod.hybridFactory(VALID_RESULT, errFn)
+ : null;
+
+ v.validate = hybridFn
+ ? (preprocess ? (data) => { preprocess(data); return hybridFn(data); } : hybridFn)
+ : (preprocess
+ ? (data) => { preprocess(data); return mod.boolFn(data) ? VALID_RESULT : errFn(data); }
+ : (data) => mod.boolFn(data) ? VALID_RESULT : errFn(data));
+ v.isValidObject = mod.boolFn;
+ v.isValidJSON = (jsonStr) => {
+ try { return mod.boolFn(JSON.parse(jsonStr)); } catch { return false; }
+ };
+ v.validateJSON = (jsonStr) => {
+ try {
+ const obj = JSON.parse(jsonStr);
+ return hybridFn ? hybridFn(obj) : (mod.boolFn(obj) ? VALID_RESULT : errFn(obj));
+ } catch { return { valid: false, errors: [{ code: 'invalid_json', path: '', message: 'invalid JSON' }] }; }
+ };
+
+ // Standard Schema V1
+ Object.defineProperty(v, "~standard", {
+ value: Object.freeze({
+ version: 1, vendor: "ata-validator",
+ validate(value) {
+ const result = v.validate(value);
+ if (result.valid) return { value };
+ return { issues: result.errors.map(e => ({ message: e.message, path: parsePointerPath(e.path) })) };
+ },
+ }),
+ writable: false, enumerable: false, configurable: false,
+ });
+
+ return v;
+ }
+
  // Fallback methods — only used when JS codegen is unavailable
  validate(data) {
  if (this._preprocess) this._preprocess(data);
@@ -384,4 +486,106 @@ function version() {
  return native.version();
  }
 
+ // Bundle multiple validators into a single JS file for fast startup.
+ // Usage:
+ // const bundle = Validator.bundle([schema1, schema2, ...]);
+ // fs.writeFileSync('validators.js', bundle);
+ // // On startup:
+ // const validators = Validator.loadBundle(require('./validators.js'), [schema1, schema2, ...]);
+ Validator.bundle = function(schemas, opts) {
+ const parts = schemas.map(schema => {
+ const v = new Validator(schema, opts);
+ const standalone = v.toStandalone();
+ if (!standalone) return 'null';
+ return '(function(){' + standalone.replace("'use strict';", '').replace('module.exports = ', 'return ') + '})()';
+ });
+ return "'use strict';\nmodule.exports = [\n" + parts.join(',\n') + '\n];\n';
+ };
+
+ // Zero-dependency self-contained bundle — no require('ata-validator') needed at runtime.
+ Validator.bundleStandalone = function(schemas, opts) {
+ const R = "Object.freeze({valid:true,errors:Object.freeze([])})";
+ const fns = schemas.map(schema => {
+ const v = new Validator(schema, opts);
+ const jsFn = v._jsFn;
+ if (!jsFn || !jsFn._hybridSource) return 'null';
+ const jsErrFn = compileToJSCodegenWithErrors(
+ typeof schema === 'string' ? JSON.parse(schema) : schema
+ );
+ const errBody = jsErrFn && jsErrFn._errSource
+ ? jsErrFn._errSource
+ : "return{valid:false,errors:[{code:'error',path:'',message:'validation failed'}]}";
+ return `(function(R){var E=function(d){var _all=true;${errBody}};return function(d){${jsFn._hybridSource}}})(R)`;
+ });
+ return `'use strict';\nvar R=${R};\nmodule.exports=[${fns.join(',')}];\n`;
+ };
+
+ // Compact bundle: deduplicated code. Shared template functions + per-schema params.
+ // Much smaller file → faster V8 parse → faster startup.
+ Validator.bundleCompact = function(schemas, opts) {
+ // Analyze schemas and group by structure
+ const entries = schemas.map(schema => {
+ const v = new Validator(schema, opts);
+ const jsFn = v._jsFn;
+ if (!jsFn || !jsFn._hybridSource) return null;
+ const jsErrFn = compileToJSCodegenWithErrors(
+ typeof schema === 'string' ? JSON.parse(schema) : schema
+ );
+ return {
+ hybrid: jsFn._hybridSource,
+ err: jsErrFn && jsErrFn._errSource ? jsErrFn._errSource : null,
+ };
+ });
+
+ // Deduplicate function bodies — many schemas produce identical or near-identical code
+ const bodyMap = new Map(); // body → index
+ const bodies = [];
+ const errMap = new Map();
+ const errBodies = [];
+
+ const indices = entries.map(e => {
+ if (!e) return [-1, -1];
+ let hi = bodyMap.get(e.hybrid);
+ if (hi === undefined) { hi = bodies.length; bodies.push(e.hybrid); bodyMap.set(e.hybrid, hi); }
+ let ei = -1;
+ if (e.err) {
+ ei = errMap.get(e.err);
+ if (ei === undefined) { ei = errBodies.length; errBodies.push(e.err); errMap.set(e.err, ei); }
+ }
+ return [hi, ei];
+ });
+
+ // Generate compact bundle
+ let out = "'use strict';\n";
+ out += "var R=Object.freeze({valid:true,errors:Object.freeze([])});\n";
+
+ // Shared hybrid factories
+ out += "var H=[\n";
+ out += bodies.map(b => `function(R,E){return function(d){${b}}}`).join(',\n');
+ out += "\n];\n";
+
+ // Shared error functions
+ out += "var EF=[\n";
+ out += errBodies.map(b => `function(d){var _all=true;${b}}`).join(',\n');
+ out += "\n];\n";
+
+ // Build validators from shared templates
+ out += "module.exports=[";
+ out += indices.map(([hi, ei]) => {
+ if (hi < 0) return 'null';
+ if (ei >= 0) return `H[${hi}](R,EF[${ei}])`;
+ return `H[${hi}](R,function(){return{valid:false,errors:[]}})`;
+ }).join(',');
+ out += "];\n";
+
+ return out;
+ };
+
+ Validator.loadBundle = function(mods, schemas, opts) {
+ return schemas.map((schema, i) => {
+ if (mods[i]) return Validator.fromStandalone(mods[i], schema, opts);
+ return new Validator(schema, opts);
+ });
+ };
+
  module.exports = { Validator, validate, version, createPaddedBuffer, SIMDJSON_PADDING };
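`bundleCompact()` shrinks the bundle by storing each distinct hybrid body once and referring to it by index. The sketch below isolates that interning step with a hypothetical `internBodies` helper and hand-written bodies, then emits and evaluates a small module of the same shape. Reading the code above, note that `bundleStandalone()` and `bundleCompact()` export ready-made validate functions, whereas `Validator.loadBundle()` passes each entry to `fromStandalone()`, which reads `mod.boolFn` and so expects the `{ boolFn, hybridFactory, errFn }` objects that `Validator.bundle()` produces.

```javascript
// Sketch of the interning step bundleCompact() performs: identical function
// bodies are stored once, and each schema keeps only an index into the table.
function internBodies(bodies) {
  const indexOf = new Map(); // body text -> position in `unique`
  const unique = [];
  const indices = bodies.map((body) => {
    if (body == null) return -1; // schema without generated code
    let i = indexOf.get(body);
    if (i === undefined) {
      i = unique.length;
      unique.push(body);
      indexOf.set(body, i);
    }
    return i;
  });
  return { unique, indices };
}

// Hypothetical hybrid bodies: two schemas share a body, one has no code.
const strBody = "if (typeof d !== 'string') return E(d); return R;";
const numBody = "if (typeof d !== 'number') return E(d); return R;";
const { unique, indices } = internBodies([strBody, numBody, strBody, null]);
console.log(unique.length, indices.join(',')); // 2 0,1,0,-1

// Emit a module in the same spirit: one factory per unique body, then an
// export list that instantiates factories by index with a shared R.
let out = "'use strict';\nvar R=Object.freeze({valid:true,errors:Object.freeze([])});\n";
out += 'var H=[' + unique.map((b) => `function(R,E){return function(d){${b}}}`).join(',') + '];\n';
out += 'module.exports=[' + indices.map((i) => (i < 0
  ? 'null'
  : `H[${i}](R,function(){return{valid:false,errors:[]}})`)).join(',') + '];\n';

// Evaluate the generated text with a stand-in module object.
const fakeModule = { exports: {} };
new Function('module', out)(fakeModule);
const [isString, isNumber, isString2, missing] = fakeModule.exports;
console.log(isString('a').valid, isNumber('a').valid, missing); // true false null
```

Because deduplication keys on the exact body text, two schemas share a factory only when codegen produces byte-identical source; near-identical bodies are still emitted separately.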
@@ -531,14 +531,16 @@ function compileToJSCodegen(schema) {
  const boolFn = new Function('d', body)
 
  // Build hybrid: same body, return R instead of true, return E(d) instead of false.
- // V8 optimizes this identically to jsFn — E(d) is dead code on valid path.
- // 83M ops/sec vs 26M for combined. Invalid path: 34M vs 6M.
  const hybridBody = replaceTopLevel(helperStr + checkStr + '\n return R')
  try {
  const factory = new Function('R', 'E', `return function(d){${hybridBody}}`)
  boolFn._hybridFactory = factory
  } catch {}
 
+ // Store source for standalone compilation (pre-build to file)
+ boolFn._source = body
+ boolFn._hybridSource = hybridBody
+
  return boolFn
  } catch {
  return null
@@ -940,7 +942,9 @@ function compileToJSCodegenWithErrors(schema) {
  lines.join('\n ') +
  `\n return{valid:_e.length===0,errors:_e}`
  try {
- return new Function('d', '_all', body)
+ const fn = new Function('d', '_all', body)
+ fn._errSource = body
+ return fn
  } catch {
  return null
  }
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "ata-validator",
- "version": "0.4.7",
+ "version": "0.4.9",
  "description": "Ultra-fast JSON Schema validator. Beats ajv on every valid-path benchmark: 1.1x–2.7x faster validate(obj), 151x faster compilation, 5.9x faster parallel batch. Speculative validation with V8-optimized JS codegen, simdjson, multi-core. Standard Schema V1 compatible.",
  "main": "index.js",
  "types": "index.d.ts",
package/src/ata.cpp CHANGED
@@ -1,5 +1,10 @@
  #include "ata.h"
 
+ // mimalloc: faster new/delete for small allocations.
+ #if __has_include(<mimalloc-new-delete.h>)
+ #include <mimalloc-new-delete.h>
+ #endif
+
  #include <algorithm>
  #include <cmath>
  #include <re2/re2.h>
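The inline `node -e` one-liners in the binding.gyp hunk are hard to read in gyp form. Below is a readable JavaScript equivalent of the 0.4.9 include and library snippets, taking platform and architecture as parameters so both branches can be exercised; `includeDirs` and `linkFlags` are illustrative names, not part of the package. The expanded form makes one detail easy to see: mimalloc paths and `-lmimalloc` are only emitted on darwin, while the `__has_include(<mimalloc-new-delete.h>)` guard above only checks that the header can be found. A non-darwin build that happens to see that header may therefore compile the allocator overrides without linking mimalloc.

```javascript
// Readable equivalent of the inline `node -e` snippets in binding.gyp (0.4.9).
// Platform and arch are parameters instead of `process` fields so that every
// branch can be exercised from one machine.
function homebrewPrefix(arch) {
  // Apple Silicon Homebrew lives under /opt/homebrew, Intel Homebrew under /usr/local.
  return arch === 'arm64' ? '/opt/homebrew/opt' : '/usr/local/opt';
}

function includeDirs(platform, arch) {
  if (platform !== 'darwin') return ['/usr/include'];
  const prefix = homebrewPrefix(arch);
  return ['re2', 'abseil', 'mimalloc'].map((pkg) => `${prefix}/${pkg}/include`);
}

function linkFlags(platform, arch) {
  if (platform !== 'darwin') return '-lre2'; // no -lmimalloc outside darwin
  const prefix = homebrewPrefix(arch);
  return `-L${prefix}/re2/lib -lre2 -L${prefix}/mimalloc/lib -lmimalloc`;
}

console.log(includeDirs(process.platform, process.arch));
console.log(linkFlags(process.platform, process.arch));
```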