ata-validator 0.8.0 → 0.9.1

package/README.md CHANGED
@@ -13,7 +13,7 @@ Ultra-fast JSON Schema validator powered by [simdjson](https://github.com/simdjs
13
13
  | **validate(obj)** valid | 22ns | 102ns | **ata 4.6x faster** |
14
14
  | **validate(obj)** invalid | 87ns | 182ns | **ata 2.1x faster** |
15
15
  | **isValidObject(obj)** | 21ns | 100ns | **ata 4.7x faster** |
16
- | **Schema compilation** | 695ns | 1.30ms | **ata 1,867x faster** |
16
+ | **Schema compilation** | 453ns | 1.24ms | **ata 2,729x faster** |
17
17
  | **First validation** | 2.07μs | 1.11ms | **ata 534x faster** |
18
18
 
19
19
  ### Complex Schema (patternProperties + dependentSchemas + propertyNames + additionalProperties)
@@ -54,7 +54,7 @@ Three-tier hybrid codegen: static schemas compile to zero-overhead key checks, d
54
54
  |---|---|---|---|---|---|
55
55
  | **validate (valid)** | **9ns** | 38ns | 50ns | 334ns | 326ns |
56
56
  | **validate (invalid)** | **37ns** | 103ns | 4ns | 11.8μs | 842ns |
57
- | **compilation** | **584ns** | 1.20ms | 52μs | — | — |
57
+ | **compilation** | **453ns** | 1.24ms | 52μs | — | — |
58
58
  | **first validation** | **2.1μs** | 1.11ms | 54μs | — | — |
59
59
 
60
60
  > Different categories: ata/ajv/typebox are JSON Schema validators, zod/valibot are schema-builder DSLs. [Benchmark code](benchmark/bench_all_mitata.mjs)
@@ -71,7 +71,7 @@ Three-tier hybrid codegen: static schemas compile to zero-overhead key checks, d
71
71
 
72
72
  | Scenario | ata | ajv | |
73
73
  |---|---|---|---|
74
- | **Serverless cold start** (50 schemas) | 0.1ms | 23ms | **ata 230x faster** |
74
+ | **Serverless cold start** (50 schemas) | 0.087ms | 3.67ms | **ata 42x faster** |
75
75
  | **ReDoS protection** (`^(a+)+$`) | 0.3ms | 765ms | **ata immune (RE2)** |
76
76
  | **Batch NDJSON** (10K items, multi-core) | 13.4M/sec | 5.1M/sec | **ata 2.6x faster** |
77
77
  | **Fastify startup** (5 routes) | 0.5ms | 6.0ms | **ata 12x faster** |
@@ -88,14 +88,26 @@ Three-tier hybrid codegen: static schemas compile to zero-overhead key checks, d
88
88
 
89
89
  **Adaptive simdjson**: For large documents (>8KB) with selective schemas, simdjson On Demand seeks only the needed fields - skipping irrelevant data at GB/s speeds.
90
90
 
91
+ ### $dynamicRef / $dynamicAnchor / $anchor
92
+
93
+ | Scenario | ata | ajv | |
94
+ |---|---|---|---|
95
+ | **$dynamicRef tree** valid | 22ns | 54ns | **ata 2.4x faster** |
96
+ | **$dynamicRef tree** invalid | 70ns | 76ns | **ata 1.1x faster** |
97
+ | **$dynamicRef override** valid | 2.6ns | 183ns | **ata 70x faster** |
98
+ | **$dynamicRef override** invalid | 48ns | 185ns | **ata 3.8x faster** |
99
+ | **$anchor array** valid | 2.3ns | 3.1ns | **ata 1.4x faster** |
100
+
101
+ Self-recursive named functions for $dynamicRef, compile-time cross-schema resolution, zero-wrapper hybrid path. [Benchmark code](benchmark/bench_dynamicref_vs_ajv.mjs)
102
+
91
103
  ### JSON Schema Test Suite
92
104
 
93
- **96.9%** pass rate (1109/1144) on official [JSON Schema Test Suite](https://github.com/json-schema-org/JSON-Schema-Test-Suite) (Draft 2020-12).
105
+ **95.3%** pass rate (1170/1227) on official [JSON Schema Test Suite](https://github.com/json-schema-org/JSON-Schema-Test-Suite) (Draft 2020-12). **95.3%** on [@exodus/schemasafe](https://github.com/ExodusMovement/schemasafe) test suite.
94
106
 
95
107
  ## When to use ata
96
108
 
97
- - **High-throughput `validate(obj)`** - 6.8x faster than ajv on complex schemas, 38x faster than zod
98
- - **Complex schemas** - `patternProperties`, `dependentSchemas`, `propertyNames`, `unevaluatedProperties` all inline JS codegen (6.8x faster than ajv)
109
+ - **High-throughput `validate(obj)`** - 3.1x faster than ajv, 38x faster than zod
110
+ - **Complex schemas** - `patternProperties`, `dependentSchemas`, `propertyNames`, `unevaluatedProperties` all inline JS codegen
99
111
  - **Multi-schema projects** - cross-schema `$ref` with `$id` registry, `addSchema()` API
100
112
  - **Draft 7 migration** - auto-detects `$schema`, normalizes Draft 7 keywords transparently
101
113
  - **Serverless / cold starts** - 6,904x faster compilation, 5,148x faster first validation
@@ -106,12 +118,13 @@ Three-tier hybrid codegen: static schemas compile to zero-overhead key checks, d
106
118
 
107
119
  ## When to use ajv
108
120
 
109
- - **`$dynamicRef`** - not yet supported in ata
110
121
  - **Existing ajv ecosystem** - plugins, custom keywords, large community
122
+ - **Full unevaluatedProperties/Items** - ata covers most cases but some edge cases remain
111
123
 
112
124
  ## Features
113
125
 
114
- - **Hybrid validator**: 6.8x faster than ajv valid, 6.0x faster invalid on complex schemas - jsFn boolean guard for valid path (zero allocation), combined codegen with pre-allocated errors for invalid path. Schema compilation cache for repeated schemas
126
+ - **Hybrid validator**: 4.1x faster than ajv, up to 70x faster on $dynamicRef - zero-wrapper hybrid path for valid data (no allocation), combined codegen for error collection. Schema compilation cache for repeated schemas
127
+ - **$dynamicRef / $dynamicAnchor / $anchor**: Full Draft 2020-12 dynamic reference support. Self-recursive named functions, compile-time cross-schema resolution (42/42 spec tests)
115
128
  - **Cross-schema `$ref`**: `schemas` option and `addSchema()` API. Compile-time resolution with `$id` registry, zero runtime overhead
116
129
  - **Draft 7 support**: Auto-detects `$schema` field, normalizes `dependencies`/`additionalItems`/`definitions` transparently
117
130
  - **Multi-core**: Parallel validation across all CPU cores - 13.4M validations/sec
@@ -17,6 +17,7 @@
17
17
  #include <vector>
18
18
 
19
19
  #include "ata.h"
20
+ #include <simdjson.h>
20
21
 
21
22
  // ============================================================================
22
23
  // V8 Direct Object Traversal Engine
@@ -797,6 +798,67 @@ static void validate_napi(const schema_node_ptr& node,
797
798
  // N-API Binding
798
799
  // ============================================================================
799
800
 
801
+ // ============================================================================
802
+ // simdjson DOM to V8 JS Object conversion
803
+ // ============================================================================
804
+
805
+ static Napi::Value dom_to_napi(Napi::Env env, simdjson::dom::element el) {
806
+ using namespace simdjson;
807
+ switch (el.type()) {
808
+ case dom::element_type::OBJECT: {
809
+ auto obj = Napi::Object::New(env);
810
+ for (auto [key, val] : dom::object(el)) {
811
+ obj.Set(std::string(key), dom_to_napi(env, val));
812
+ }
813
+ return obj;
814
+ }
815
+ case dom::element_type::ARRAY: {
816
+ dom::array arr = el;
817
+ auto jsArr = Napi::Array::New(env, arr.size());
818
+ uint32_t i = 0;
819
+ for (auto val : arr) {
820
+ jsArr.Set(i++, dom_to_napi(env, val));
821
+ }
822
+ return jsArr;
823
+ }
824
+ case dom::element_type::STRING: {
825
+ std::string_view sv;
826
+ el.get(sv);
827
+ return Napi::String::New(env, sv.data(), sv.length());
828
+ }
829
+ case dom::element_type::INT64: {
830
+ int64_t v;
831
+ el.get(v);
832
+ return Napi::Number::New(env, static_cast<double>(v));
833
+ }
834
+ case dom::element_type::UINT64: {
835
+ uint64_t v;
836
+ el.get(v);
837
+ return Napi::Number::New(env, static_cast<double>(v));
838
+ }
839
+ case dom::element_type::DOUBLE: {
840
+ double v;
841
+ el.get(v);
842
+ return Napi::Number::New(env, v);
843
+ }
844
+ case dom::element_type::BOOL: {
845
+ bool v;
846
+ el.get(v);
847
+ return Napi::Boolean::New(env, v);
848
+ }
849
+ case dom::element_type::NULL_VALUE:
850
+ return env.Null();
851
+ default:
852
+ return env.Undefined();
853
+ }
854
+ }
855
+
856
+ // Thread-local simdjson DOM parser for parseJSON / validateAndParse
857
+ static simdjson::dom::parser& tl_dom_parser() {
858
+ thread_local simdjson::dom::parser parser;
859
+ return parser;
860
+ }
861
+
800
862
  static Napi::Object make_result(Napi::Env env,
801
863
  const ata::validation_result& result) {
802
864
  Napi::Object obj = Napi::Object::New(env);
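A note on the `INT64` and `UINT64` branches of `dom_to_napi` above: both cast through `double` before building a `Napi::Number`, so integers beyond 2^53 cannot round-trip exactly. The short JavaScript sketch below only illustrates that numeric effect; it does not call the addon, and `JSON.parse` is shown because it has the same limitation.

```javascript
// Illustration only: what happens when a 64-bit integer is forced into a double.
const big = 2n ** 53n + 1n;             // 9007199254740993, fits in int64
const asDouble = Number(big);            // nearest representable double
const roundTrips = BigInt(asDouble) === big;

// JSON.parse also produces doubles, so it rounds the same literal.
const viaJsonParse = JSON.parse('9007199254740993');

console.log(Number.MAX_SAFE_INTEGER);    // 9007199254740991
console.log(asDouble, roundTrips);       // 9007199254740992 false
console.log(viaJsonParse === asDouble);  // true
```

Callers that need exact 64-bit identifiers would have to carry them as strings; nothing in this diff suggests a BigInt path.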
@@ -822,7 +884,8 @@ class CompiledSchema : public Napi::ObjectWrap<CompiledSchema> {
822
884
  {InstanceMethod("validate", &CompiledSchema::Validate),
823
885
  InstanceMethod("validateJSON", &CompiledSchema::ValidateJSON),
824
886
  InstanceMethod("validateDirect", &CompiledSchema::ValidateDirect),
825
- InstanceMethod("isValidJSON", &CompiledSchema::IsValidJSON)});
887
+ InstanceMethod("isValidJSON", &CompiledSchema::IsValidJSON),
888
+ InstanceMethod("validateAndParse", &CompiledSchema::ValidateAndParse)});
826
889
  auto* constructor = new Napi::FunctionReference();
827
890
  *constructor = Napi::Persistent(func);
828
891
  env.SetInstanceData(constructor);
@@ -944,6 +1007,77 @@ class CompiledSchema : public Napi::ObjectWrap<CompiledSchema> {
944
1007
  return ValidateDirectImpl(env, info[0]);
945
1008
  }
946
1009
 
1010
+ // Parse JSON with simdjson, validate against schema, return parsed JS object
1011
+ Napi::Value ValidateAndParse(const Napi::CallbackInfo& info) {
1012
+ Napi::Env env = info.Env();
1013
+ if (info.Length() < 1) {
1014
+ Napi::TypeError::New(env, "JSON string or Buffer expected")
1015
+ .ThrowAsJavaScriptException();
1016
+ return env.Undefined();
1017
+ }
1018
+
1019
+ const char* data;
1020
+ size_t len;
1021
+
1022
+ if (info[0].IsBuffer()) {
1023
+ auto buf = info[0].As<Napi::Buffer<char>>();
1024
+ data = buf.Data();
1025
+ len = buf.Length();
1026
+ } else if (info[0].IsString()) {
1027
+ auto [d, l] = extract_string(env, info[0]);
1028
+ data = d;
1029
+ len = l;
1030
+ } else {
1031
+ Napi::TypeError::New(env, "JSON string or Buffer expected")
1032
+ .ThrowAsJavaScriptException();
1033
+ return env.Undefined();
1034
+ }
1035
+
1036
+ // Parse with simdjson
1037
+ simdjson::padded_string padded(data, len);
1038
+ auto& parser = tl_dom_parser();
1039
+ auto doc_result = parser.parse(padded);
1040
+ if (doc_result.error()) {
1041
+ auto obj = Napi::Object::New(env);
1042
+ obj.Set("valid", false);
1043
+ obj.Set("value", env.Null());
1044
+ auto errors = Napi::Array::New(env, 1);
1045
+ auto err = Napi::Object::New(env);
1046
+ err.Set("code", Napi::Number::New(env, static_cast<int>(ata::error_code::invalid_json)));
1047
+ err.Set("path", Napi::String::New(env, ""));
1048
+ err.Set("message", Napi::String::New(env, "Invalid JSON"));
1049
+ errors[0u] = err;
1050
+ obj.Set("errors", errors);
1051
+ return obj;
1052
+ }
1053
+
1054
+ // Validate
1055
+ auto valResult = ata::validate(schema_, std::string_view(data, len));
1056
+
1057
+ // Convert DOM to JS object
1058
+ Napi::Value jsValue = dom_to_napi(env, doc_result.value());
1059
+
1060
+ // Build result
1061
+ auto obj = Napi::Object::New(env);
1062
+ obj.Set("valid", valResult.valid);
1063
+ obj.Set("value", jsValue);
1064
+ if (valResult.valid) {
1065
+ obj.Set("errors", Napi::Array::New(env, 0));
1066
+ } else {
1067
+ Napi::Array errors = Napi::Array::New(env, valResult.errors.size());
1068
+ for (size_t i = 0; i < valResult.errors.size(); ++i) {
1069
+ Napi::Object err = Napi::Object::New(env);
1070
+ err.Set("code",
1071
+ Napi::Number::New(env, static_cast<int>(valResult.errors[i].code)));
1072
+ err.Set("path", Napi::String::New(env, valResult.errors[i].path));
1073
+ err.Set("message", Napi::String::New(env, valResult.errors[i].message));
1074
+ errors[i] = err;
1075
+ }
1076
+ obj.Set("errors", errors);
1077
+ }
1078
+ return obj;
1079
+ }
1080
+
947
1081
  private:
948
1082
  Napi::Value ValidateDirectImpl(Napi::Env env, Napi::Value value) {
949
1083
  compiled_schema_internal ctx;
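The result shape built by `ValidateAndParse` above is `{ valid, value, errors }`: a parse failure yields `value: null` with a single "Invalid JSON" error, and a schema failure still returns the parsed `value` alongside the errors. The sketch below mimics that shape in plain JavaScript. `validateAndParseSketch`, the `check` callback, and the placeholder error code are hypothetical stand-ins, not part of the package.

```javascript
// Pure-JS stand-in for the native result shape; not the addon's implementation.
const INVALID_JSON_CODE = -1; // placeholder: the real numeric value of invalid_json is not shown in the diff

// `check` is a hypothetical validator returning an array of { code, path, message }.
function validateAndParseSketch(jsonStr, check) {
  let value;
  try {
    value = JSON.parse(jsonStr);
  } catch {
    return {
      valid: false,
      value: null,
      errors: [{ code: INVALID_JSON_CODE, path: '', message: 'Invalid JSON' }],
    };
  }
  const errors = check(value);
  // As in the binding, the parsed value is returned even when validation fails.
  return { valid: errors.length === 0, value, errors };
}

const requireName = (v) =>
  v !== null && typeof v === 'object' && typeof v.name === 'string'
    ? []
    : [{ code: 0, path: '/name', message: 'name must be a string' }];

const ok = validateAndParseSketch('{"name":"ata"}', requireName);
const bad = validateAndParseSketch('{"name":1}', requireName);
const broken = validateAndParseSketch('{"name":', requireName);
console.log(ok.valid, bad.valid, broken.value);
```

Because `value` is populated on schema failure too, callers should branch on `valid` rather than on whether `value` is non-null.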
@@ -996,6 +1130,44 @@ Napi::Value GetVersion(const Napi::CallbackInfo& info) {
996
1130
  return Napi::String::New(info.Env(), std::string(ata::version()));
997
1131
  }
998
1132
 
1133
+ // Standalone JSON parser using simdjson — returns parsed JS object
1134
+ Napi::Value ParseJSON(const Napi::CallbackInfo& info) {
1135
+ Napi::Env env = info.Env();
1136
+ if (info.Length() < 1) {
1137
+ Napi::TypeError::New(env, "JSON string or Buffer expected")
1138
+ .ThrowAsJavaScriptException();
1139
+ return env.Undefined();
1140
+ }
1141
+
1142
+ const char* data;
1143
+ size_t len;
1144
+
1145
+ if (info[0].IsBuffer()) {
1146
+ auto buf = info[0].As<Napi::Buffer<char>>();
1147
+ data = buf.Data();
1148
+ len = buf.Length();
1149
+ } else if (info[0].IsString()) {
1150
+ auto [d, l] = CompiledSchema::extract_string(env, info[0]);
1151
+ data = d;
1152
+ len = l;
1153
+ } else {
1154
+ Napi::TypeError::New(env, "JSON string or Buffer expected")
1155
+ .ThrowAsJavaScriptException();
1156
+ return env.Undefined();
1157
+ }
1158
+
1159
+ // Parse with simdjson using thread-local parser
1160
+ simdjson::padded_string padded(data, len);
1161
+ auto& parser = tl_dom_parser();
1162
+ auto result = parser.parse(padded);
1163
+ if (result.error()) {
1164
+ Napi::Error::New(env, "Invalid JSON").ThrowAsJavaScriptException();
1165
+ return env.Undefined();
1166
+ }
1167
+
1168
+ return dom_to_napi(env, result.value());
1169
+ }
1170
+
999
1171
  // --- Thread Pool ---
1000
1172
  class ThreadPool {
1001
1173
  public:
@@ -1531,6 +1703,7 @@ Napi::Object Init(Napi::Env env, Napi::Object exports) {
1531
1703
  CompiledSchema::Init(env, exports);
1532
1704
  exports.Set("validate", Napi::Function::New(env, ValidateOneShot));
1533
1705
  exports.Set("version", Napi::Function::New(env, GetVersion));
1706
+ exports.Set("parseJSON", Napi::Function::New(env, ParseJSON));
1534
1707
  exports.Set("fastRegister", Napi::Function::New(env, FastRegister));
1535
1708
  exports.Set("fastValidate", Napi::Function::New(env, FastValidateSlow));
1536
1709
 
package/include/ata.h CHANGED
@@ -8,14 +8,16 @@
8
8
  #include <variant>
9
9
  #include <vector>
10
10
 
11
+ #define ATA_VERSION "0.9.0"
12
+
11
13
  namespace ata {
12
14
 
13
15
  inline constexpr uint32_t VERSION_MAJOR = 0;
14
- inline constexpr uint32_t VERSION_MINOR = 4;
15
- inline constexpr uint32_t VERSION_REVISION = 3;
16
+ inline constexpr uint32_t VERSION_MINOR = 9;
17
+ inline constexpr uint32_t VERSION_REVISION = 0;
16
18
 
17
19
  inline constexpr std::string_view version() noexcept {
18
- return "0.4.3";
20
+ return "0.9.0";
19
21
  }
20
22
 
21
23
  enum class error_code : uint8_t {
@@ -64,8 +66,14 @@ struct validation_result {
64
66
 
65
67
  struct compiled_schema;
66
68
 
69
+ struct schema_warning {
70
+ std::string path;
71
+ std::string message;
72
+ };
73
+
67
74
  struct schema_ref {
68
75
  std::shared_ptr<compiled_schema> impl;
76
+ std::vector<schema_warning> warnings;
69
77
 
70
78
  explicit operator bool() const noexcept { return impl != nullptr; }
71
79
  };
package/index.js CHANGED
@@ -264,6 +264,10 @@ function buildPreprocessCodegen(schema, options) {
264
264
  // Schema compilation cache: same schema string -> reuse compiled functions
265
265
  const _compileCache = new Map();
266
266
 
267
+ // Object identity cache: same schema object reference -> reuse entire compiled state
268
+ // Skips JSON.stringify, cache lookup, and all setup. Near-zero cost for repeated schemas.
269
+ const _identityCache = new WeakMap();
270
+
267
271
  const SIMDJSON_PADDING = 64;
268
272
  const VALID_RESULT = Object.freeze({ valid: true, errors: Object.freeze([]) });
269
273
 
@@ -322,12 +326,18 @@ class Validator {
322
326
  const options = opts || {};
323
327
  const schemaObj = typeof schema === "string" ? JSON.parse(schema) : schema;
324
328
 
329
+ // Ultra-fast path: same schema object reference -> return cached instance
330
+ // JS constructor returning an object makes `new` return that object
331
+ // Cost: one WeakMap lookup. No property copy, no setup, nothing.
332
+ if (!opts && typeof schema === "object" && schema !== null) {
333
+ const hit = _identityCache.get(schema);
334
+ if (hit) return hit;
335
+ }
336
+
325
337
  // Draft 7 normalization — convert keywords to 2020-12 equivalents in-place
326
338
  normalizeDraft7(schemaObj);
327
339
 
328
- const schemaStr = JSON.stringify(schemaObj);
329
-
330
- this._schemaStr = schemaStr;
340
+ this._schemaStr = null; // lazy: computed on first use
331
341
  this._schemaObj = schemaObj;
332
342
  this._options = options;
333
343
  this._initialized = false;
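The constructor fast path above relies on a JavaScript rule: when a constructor returns an object, `new` yields that object instead of `this`. A minimal sketch of the pattern follows. `CachedValidator` and `constructions` are illustrative names, and the cache is filled in the constructor here for brevity, whereas the diff appears to populate `_identityCache` later, after compilation.

```javascript
// Sketch of an identity cache keyed by the schema object reference.
const identityCache = new WeakMap();
let constructions = 0;

class CachedValidator {
  constructor(schema, opts) {
    const cacheable = !opts && typeof schema === 'object' && schema !== null;
    if (cacheable) {
      const hit = identityCache.get(schema);
      if (hit) return hit; // returning an object makes `new` evaluate to it
    }
    constructions++;
    this.schema = schema;
    if (cacheable) identityCache.set(schema, this);
  }
}

const schema = { type: 'string' };
const a = new CachedValidator(schema);
const b = new CachedValidator(schema);             // same reference: cache hit
const c = new CachedValidator({ type: 'string' }); // equal shape, different reference
console.log(a === b, a === c, constructions);      // true false 2
```

The cache is keyed by reference, so structurally equal schema objects built separately do not share an instance, and a `WeakMap` lets entries be collected once the schema object is unreachable.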
@@ -358,6 +368,11 @@ class Validator {
358
368
  this._ensureCompiled();
359
369
  return this.isValidJSON(jsonStr);
360
370
  };
371
+ this.validateAndParse = (jsonStr) => {
372
+ if (!native) throw new Error('Native addon required for validateAndParse()');
373
+ this._ensureCompiled();
374
+ return this.validateAndParse(jsonStr);
375
+ };
361
376
  this.isValid = (buf) => {
362
377
  if (!native) throw new Error('Native addon required for isValid() — use validate() or isValidObject() instead');
363
378
  this._ensureCompiled();
@@ -407,6 +422,9 @@ class Validator {
407
422
  const schemaObj = this._schemaObj;
408
423
  const options = this._options;
409
424
 
425
+ // Lazy stringify — only computed here, not in constructor
426
+ if (!this._schemaStr) this._schemaStr = JSON.stringify(schemaObj);
427
+
410
428
  // Check cache first -- reuse compiled functions for same schema
411
429
  const sm = this._schemaMap.size > 0 ? this._schemaMap : null;
412
430
  const mapKey = this._schemaMap.size > 0
@@ -475,7 +493,7 @@ class Validator {
475
493
  }
476
494
  // errFn: use JS codegen if safe, else lazy-native fallback
477
495
  // For unevaluated schemas without errFn, use jsFn as boolean-only fallback
478
- const hasUnevaluated = schemaObj && JSON.stringify(schemaObj).includes('unevaluatedProperties') || JSON.stringify(schemaObj).includes('unevaluatedItems')
496
+ const hasUnevaluated = schemaObj && (schemaObj.unevaluatedProperties !== undefined || schemaObj.unevaluatedItems !== undefined || this._schemaStr.includes('unevaluatedProperties') || this._schemaStr.includes('unevaluatedItems'))
479
497
  const hasDynRef = this._schemaStr.includes('"$dynamicRef"') || this._schemaStr.includes('"$dynamicAnchor"')
480
498
  const errFn =
481
499
  safeErrFn ||
@@ -616,6 +634,17 @@ class Validator {
616
634
  return false;
617
635
  }
618
636
  };
637
+ // validateAndParse: requires native addon for simdjson parsing
638
+ if (native) {
639
+ const self = this;
640
+ this.validateAndParse = (jsonStr) => {
641
+ self._ensureNative();
642
+ self.validateAndParse = (s) => self._compiled.validateAndParse(s);
643
+ return self.validateAndParse(jsonStr);
644
+ };
645
+ } else {
646
+ this.validateAndParse = () => { throw new Error('Native addon required for validateAndParse()'); };
647
+ }
619
648
  // Buffer APIs: lazy native init — only compile native schema on first buffer call.
620
649
  // This keeps cold start fast (JS codegen only) for users who only use validate().
621
650
  if (native) {
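The `validateAndParse` stub above uses a self-replacing method: the first call performs the expensive setup, swaps in a direct implementation, and then delegates to it. A small sketch of that idea, with `LazyParser`, `_impl`, and `initCount` as illustrative names and `JSON.parse` standing in for the native call:

```javascript
// Sketch of a lazily initialised method that replaces itself after first use.
class LazyParser {
  constructor() {
    this.initCount = 0;
    this.parse = (s) => {
      this._ensureNative();                // one-time setup
      this.parse = (x) => this._impl(x);   // later calls skip the setup entirely
      return this.parse(s);
    };
  }

  _ensureNative() {
    this.initCount++;
    this._impl = (x) => JSON.parse(x);     // stand-in for the compiled native schema
  }
}

const p = new LazyParser();
const first = p.parse('{"a":1}');
const second = p.parse('[2]');
console.log(first.a, second[0], p.initCount); // 1 2 1
```

The trade-off is that a reference to `p.parse` captured before the first call keeps pointing at the stub, so it would run the setup step again on each use.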
@@ -657,6 +686,7 @@ class Validator {
657
686
  this.isValidObject = (data) => _validate(data).valid;
658
687
  this.validateJSON = (jsonStr) => this._compiled.validateJSON(jsonStr);
659
688
  this.isValidJSON = (jsonStr) => this._compiled.isValidJSON(jsonStr);
689
+ this.validateAndParse = (jsonStr) => this._compiled.validateAndParse(jsonStr);
660
690
  {
661
691
  const slot = this._fastSlot;
662
692
  this.isValid = (buf) => {
@@ -685,6 +715,11 @@ class Validator {
685
715
  };
686
716
  }
687
717
  }
718
+
719
+ // Save to identity cache for ultra-fast reuse with same schema object
720
+ if (this._schemaObj && typeof this._schemaObj === 'object') {
721
+ _identityCache.set(this._schemaObj, this);
722
+ }
688
723
  }
689
724
 
690
725
  _ensureNative() {
@@ -861,6 +896,14 @@ module.exports = { boolFn, hybridFactory, errFn };
861
896
  }
862
897
  };
863
898
 
899
+ v.validateAndParse = native
900
+ ? (jsonStr) => {
901
+ v._ensureNative();
902
+ v.validateAndParse = (s) => v._compiled.validateAndParse(s);
903
+ return v.validateAndParse(jsonStr);
904
+ }
905
+ : () => { throw new Error('Native addon required for validateAndParse()'); };
906
+
864
907
  // Standard Schema V1
865
908
  Object.defineProperty(v, "~standard", {
866
909
  value: Object.freeze({
@@ -1060,10 +1103,31 @@ Validator.loadBundle = function (mods, schemas, opts) {
1060
1103
  });
1061
1104
  };
1062
1105
 
1106
+ const parseJSON = native ? native.parseJSON : JSON.parse;
1107
+
1108
+ // Ultra-fast compile: returns validate function directly, no Validator wrapper
1109
+ // WeakMap cached — second call with same schema object is ~3ns
1110
+ const _compileFnCache = new WeakMap();
1111
+ function compile(schema, opts) {
1112
+ if (!opts && typeof schema === 'object' && schema !== null) {
1113
+ const hit = _compileFnCache.get(schema);
1114
+ if (hit) return hit;
1115
+ }
1116
+ const v = new Validator(schema, opts);
1117
+ v._ensureCompiled();
1118
+ const fn = v.validate;
1119
+ if (!opts && typeof schema === 'object' && schema !== null) {
1120
+ _compileFnCache.set(schema, fn);
1121
+ }
1122
+ return fn;
1123
+ }
1124
+
1063
1125
  module.exports = {
1064
1126
  Validator,
1127
+ compile,
1065
1128
  validate,
1066
1129
  version,
1067
1130
  createPaddedBuffer,
1068
1131
  SIMDJSON_PADDING,
1132
+ parseJSON,
1069
1133
  };
@@ -805,8 +805,8 @@ function compileToJSCodegen(schema, schemaMap) {
805
805
  if (ctx.usesRecursion) {
806
806
  // Self-recursive: wrap in named function
807
807
  body = `function _validate(d){\n ${checkStr}\n return true\n }\n return _validate(d)`
808
- const hybridCheck = replaceTopLevel(checkStr + '\n return R')
809
- hybridBody = `function _validate(d){\n ${hybridCheck}\n }\n return _validate(d)`
808
+ // Hybrid: keep _validate as boolean, wrap only the outer call
809
+ hybridBody = `function _validate(d){\n ${checkStr}\n return true\n }\n return _validate(d)?R:E(d)`
810
810
  } else {
811
811
  body = checkStr + '\n return true'
812
812
  hybridBody = replaceTopLevel(checkStr + '\n return R')
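The new `hybridBody` keeps the recursive `_validate` purely boolean and converts to a result object only at the outermost call (`_validate(d)?R:E(d)`). The sketch below assembles a body of that shape by hand for a small recursive tree schema. How the library binds `R` and `E` is not visible in this diff, so passing them as extra `Function` parameters is an assumption made for illustration.

```javascript
// Hand-written stand-in for the generated checkStr of a recursive tree schema.
const checkStr = [
  "if(typeof d!=='object'||d===null||Array.isArray(d))return false",
  "if(typeof d.value!=='number')return false",
  "if(d.children!==undefined){",
  "  if(!Array.isArray(d.children))return false",
  "  for(const c of d.children)if(!_validate(c))return false",
  "}",
].join('\n');

// Same shape as the new hybridBody: boolean recursion, one wrap at the top.
const hybridBody =
  `function _validate(d){\n${checkStr}\nreturn true\n}\nreturn _validate(d)?R:E(d)`;

const R = Object.freeze({ valid: true, errors: Object.freeze([]) });
// Hypothetical error collector; the real E would walk the data and report paths.
const E = () => ({ valid: false, errors: [{ path: '', message: 'validation failed' }] });

const hybrid = new Function('d', 'R', 'E', hybridBody);
const validate = (d) => hybrid(d, R, E);

const good = validate({ value: 1, children: [{ value: 2, children: [] }] });
const bad = validate({ value: 1, children: [{ value: 'x' }] });
console.log(good === R, bad.valid); // true false
```

Keeping the inner function boolean means every recursive `if(!_validate(...))` test behaves as written, and the error path runs only once, after the fast check has already failed.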
@@ -872,6 +872,84 @@ function replaceTopLevel(code) {
872
872
  return result
873
873
  }
874
874
 
875
+ // Returns true if a property sub-schema will generate 2+ lines that each access v,
876
+ // meaning a local variable hoist is worthwhile.
877
+ function needsLocal(schema) {
878
+ if (typeof schema !== 'object' || schema === null) return false
879
+ // If it has $ref, allOf, anyOf etc., genCode handles it — don't hoist
880
+ if (schema.$ref || schema.allOf || schema.anyOf || schema.oneOf || schema.if) return false
881
+ if (schema.properties || schema.items || schema.prefixItems) return false
882
+ const types = schema.type ? (Array.isArray(schema.type) ? schema.type : [schema.type]) : null
883
+ if (!types || types.length !== 1) return false
884
+ const t = types[0]
885
+ let checkCount = 1 // type check itself
886
+ if (t === 'string') {
887
+ if (schema.minLength !== undefined) checkCount++
888
+ if (schema.maxLength !== undefined) checkCount++
889
+ if (schema.pattern) checkCount++
890
+ if (schema.format) checkCount++
891
+ } else if (t === 'integer' || t === 'number') {
892
+ if (schema.minimum !== undefined) checkCount++
893
+ if (schema.maximum !== undefined) checkCount++
894
+ if (schema.exclusiveMinimum !== undefined) checkCount++
895
+ if (schema.exclusiveMaximum !== undefined) checkCount++
896
+ if (schema.multipleOf !== undefined) checkCount++
897
+ }
898
+ return checkCount >= 2
899
+ }
900
+
901
+ // Try to generate a single combined check for simple leaf schemas.
902
+ // Returns a string like "{const _v=d["x"];if(typeof _v!=='string'||_v.length<1||_v.length>100)return false}"
903
+ // or null if the schema is too complex.
904
+ function tryGenCombined(schema, access, ctx) {
905
+ if (typeof schema !== 'object' || schema === null) return null
906
+ // Only handle simple leaf schemas with a single type and basic constraints
907
+ if (schema.$ref || schema.allOf || schema.anyOf || schema.oneOf || schema.if) return null
908
+ if (schema.properties || schema.items || schema.prefixItems || schema.patternProperties) return null
909
+ if (schema.enum || schema.const !== undefined) return null
910
+ if (schema.not || schema.dependentRequired || schema.dependentSchemas) return null
911
+ const types = schema.type ? (Array.isArray(schema.type) ? schema.type : [schema.type]) : null
912
+ if (!types || types.length !== 1) return null
913
+ const t = types[0]
914
+
915
+ if (t === 'string') {
916
+ const conds = [`typeof _v!=='string'`]
917
+ if (schema.minLength !== undefined) conds.push(`_v.length<${schema.minLength}`)
918
+ if (schema.maxLength !== undefined) conds.push(`_v.length>${schema.maxLength}`)
919
+ if (conds.length < 2 && !schema.pattern && !schema.format) return null // not worth combining
920
+ // pattern and format need separate statements, fall back if present
921
+ if (schema.pattern || schema.format) return null
922
+ const vi = ctx.varCounter++
923
+ return `{const _v=${access};if(${conds.join('||')})return false}`
924
+ }
925
+
926
+ if (t === 'integer') {
927
+ const conds = [`!Number.isInteger(_v)`]
928
+ if (schema.minimum !== undefined) conds.push(`_v<${schema.minimum}`)
929
+ if (schema.maximum !== undefined) conds.push(`_v>${schema.maximum}`)
930
+ if (schema.exclusiveMinimum !== undefined) conds.push(`_v<=${schema.exclusiveMinimum}`)
931
+ if (schema.exclusiveMaximum !== undefined) conds.push(`_v>=${schema.exclusiveMaximum}`)
932
+ if (schema.multipleOf !== undefined) conds.push(`_v%${schema.multipleOf}!==0`)
933
+ if (conds.length < 2) return null
934
+ const vi = ctx.varCounter++
935
+ return `{const _v=${access};if(${conds.join('||')})return false}`
936
+ }
937
+
938
+ if (t === 'number') {
939
+ const conds = [`typeof _v!=='number'||!isFinite(_v)`]
940
+ if (schema.minimum !== undefined) conds.push(`_v<${schema.minimum}`)
941
+ if (schema.maximum !== undefined) conds.push(`_v>${schema.maximum}`)
942
+ if (schema.exclusiveMinimum !== undefined) conds.push(`_v<=${schema.exclusiveMinimum}`)
943
+ if (schema.exclusiveMaximum !== undefined) conds.push(`_v>=${schema.exclusiveMaximum}`)
944
+ if (schema.multipleOf !== undefined) conds.push(`_v%${schema.multipleOf}!==0`)
945
+ if (conds.length < 2) return null
946
+ const vi = ctx.varCounter++
947
+ return `{const _v=${access};if(${conds.join('||')})return false}`
948
+ }
949
+
950
+ return null
951
+ }
952
+
875
953
  // knownType: if parent already verified the type, skip redundant guards.
876
954
  // 'object' = we know v is a non-null non-array object
877
955
  // 'array' = we know v is an array
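To make the output of `tryGenCombined` concrete, the sketch below reimplements only its string branch and runs the generated snippet. `genCombinedString` is a simplified, hypothetical variant; the function in the diff also handles `integer` and `number` and rejects more keywords.

```javascript
// Simplified string-only variant of the combined-check generator.
function genCombinedString(schema, access) {
  if (schema.type !== 'string' || schema.pattern || schema.format) return null;
  const conds = ["typeof _v!=='string'"];
  if (schema.minLength !== undefined) conds.push(`_v.length<${schema.minLength}`);
  if (schema.maxLength !== undefined) conds.push(`_v.length>${schema.maxLength}`);
  if (conds.length < 2) return null; // a lone type check is not worth hoisting
  // One property read, one branch: the value is hoisted into _v inside its own block.
  return `{const _v=${access};if(${conds.join('||')})return false}`;
}

const code = genCombinedString({ type: 'string', minLength: 1, maxLength: 5 }, 'd["name"]');
const check = new Function('d', `${code}\nreturn true`);

console.log(code);
console.log(check({ name: 'ata' }), check({ name: '' }), check({ name: 'toolong' }), check({ name: 5 }));
// true false false false
```

Wrapping the snippet in its own block lets every property reuse the name `_v` without collisions, which may be why the `vi` counter value computed in the diff is never interpolated into the output.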
@@ -884,8 +962,10 @@ function genCode(schema, v, lines, ctx, knownType) {
884
962
  // Only when THIS schema has unevaluated keywords directly (not via $ref target)
885
963
  const hasSiblings = schema.$ref && (schema.unevaluatedProperties !== undefined || schema.unevaluatedItems !== undefined)
886
964
  if (schema.$ref) {
887
- // Self-reference "#" — no-op (permissive) to avoid infinite recursion
965
+ // Self-reference "#" — recursive call to root validator
888
966
  if (schema.$ref === '#') {
967
+ ctx.usesRecursion = true
968
+ lines.push(`if(!_validate(${v}))return false`)
889
969
  if (!hasSiblings) return
890
970
  }
891
971
  // 1. Local ref
@@ -961,20 +1041,32 @@ function genCode(schema, v, lines, ctx, knownType) {
961
1041
  let effectiveType = knownType
962
1042
  if (types) {
963
1043
  if (!knownType) {
964
- // Emit the type check
965
- const conds = types.map(t => {
966
- switch (t) {
967
- case 'object': return `(typeof ${v}==='object'&&${v}!==null&&!Array.isArray(${v}))`
968
- case 'array': return `Array.isArray(${v})`
969
- case 'string': return `typeof ${v}==='string'`
970
- case 'number': return `(typeof ${v}==='number'&&isFinite(${v}))`
971
- case 'integer': return `Number.isInteger(${v})`
972
- case 'boolean': return `(${v}===true||${v}===false)`
973
- case 'null': return `${v}===null`
974
- default: return 'true'
1044
+ // Emit the type check — use direct negation for single types (avoids !() wrapper)
1045
+ if (types.length === 1) {
1046
+ switch (types[0]) {
1047
+ case 'object': lines.push(`if(typeof ${v}!=='object'||${v}===null||Array.isArray(${v}))return false`); break
1048
+ case 'array': lines.push(`if(!Array.isArray(${v}))return false`); break
1049
+ case 'string': lines.push(`if(typeof ${v}!=='string')return false`); break
1050
+ case 'number': lines.push(`if(typeof ${v}!=='number'||!isFinite(${v}))return false`); break
1051
+ case 'integer': lines.push(`if(!Number.isInteger(${v}))return false`); break
1052
+ case 'boolean': lines.push(`if(typeof ${v}!=='boolean')return false`); break
1053
+ case 'null': lines.push(`if(${v}!==null)return false`); break
975
1054
  }
976
- })
977
- lines.push(`if(!(${conds.join('||')}))return false`)
1055
+ } else {
1056
+ const conds = types.map(t => {
1057
+ switch (t) {
1058
+ case 'object': return `(typeof ${v}==='object'&&${v}!==null&&!Array.isArray(${v}))`
1059
+ case 'array': return `Array.isArray(${v})`
1060
+ case 'string': return `typeof ${v}==='string'`
1061
+ case 'number': return `(typeof ${v}==='number'&&isFinite(${v}))`
1062
+ case 'integer': return `Number.isInteger(${v})`
1063
+ case 'boolean': return `typeof ${v}==='boolean'`
1064
+ case 'null': return `${v}===null`
1065
+ default: return 'true'
1066
+ }
1067
+ })
1068
+ lines.push(`if(!(${conds.join('||')}))return false`)
1069
+ }
978
1070
  }
979
1071
  // If single type, downstream checks can skip guards
980
1072
  if (types.length === 1) effectiveType = types[0]
@@ -1291,14 +1383,31 @@ function genCode(schema, v, lines, ctx, knownType) {
1291
1383
  if (schema.properties) {
1292
1384
  for (const [key, prop] of Object.entries(schema.properties)) {
1293
1385
  if (requiredSet.has(key) && isObj) {
1294
- // Required + type:object — property exists, use destructured local
1295
- genCode(prop, hoisted[key] || `${v}[${JSON.stringify(key)}]`, lines, ctx)
1386
+ // Required + type:object — hoist to local to reduce repeated property lookups
1387
+ const access = hoisted[key] || `${v}[${JSON.stringify(key)}]`
1388
+ const combined = tryGenCombined(prop, access, ctx)
1389
+ if (combined) {
1390
+ lines.push(combined)
1391
+ } else if (needsLocal(prop)) {
1392
+ const oi = ctx.varCounter++
1393
+ const local = `_r${oi}`
1394
+ lines.push(`{const ${local}=${access}`)
1395
+ genCode(prop, local, lines, ctx)
1396
+ lines.push(`}`)
1397
+ } else {
1398
+ genCode(prop, access, lines, ctx)
1399
+ }
1296
1400
  } else if (isObj) {
1297
1401
  // Optional — hoist to local, check undefined
1298
1402
  const oi = ctx.varCounter++
1299
1403
  const local = `_o${oi}`
1300
1404
  lines.push(`{const ${local}=${v}[${JSON.stringify(key)}];if(${local}!==undefined){`)
1301
- genCode(prop, local, lines, ctx)
1405
+ const combined = tryGenCombined(prop, local, ctx)
1406
+ if (combined) {
1407
+ lines.push(combined)
1408
+ } else {
1409
+ genCode(prop, local, lines, ctx)
1410
+ }
1302
1411
  lines.push(`}}`)
1303
1412
  } else {
1304
1413
  lines.push(`if(typeof ${v}==='object'&&${v}!==null&&${JSON.stringify(key)} in ${v}){`)
@@ -1322,15 +1431,15 @@ function genCode(schema, v, lines, ctx, knownType) {
1322
1431
 
1323
1432
  // prefixItems
1324
1433
  if (schema.prefixItems) {
1434
+ const pfxVar = ctx.varCounter++
1325
1435
  for (let i = 0; i < schema.prefixItems.length; i++) {
1326
- const elem = `_p${ctx.varCounter}_${i}`
1436
+ const elem = `_p${pfxVar}_${i}`
1327
1437
  lines.push(isArr
1328
1438
  ? `if(${v}.length>${i}){const ${elem}=${v}[${i}]`
1329
1439
  : `if(Array.isArray(${v})&&${v}.length>${i}){const ${elem}=${v}[${i}]`)
1330
1440
  genCode(schema.prefixItems[i], elem, lines, ctx)
1331
1441
  lines.push(`}`)
1332
1442
  }
1333
- ctx.varCounter++
1334
1443
  }
1335
1444
 
1336
1445
  // contains — use helper function to avoid try/catch overhead
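The `prefixItems` change reserves the counter value before recursing instead of bumping `ctx.varCounter` after the loop. One plausible failure mode of the old ordering, shown in the simplified model below, is that a nested `prefixItems` reuses the parent's element name, producing `const _p0_0=_p0_0[0]`, which throws at runtime. `genPrefix` and `build` are hypothetical reductions of `genCode`, and whether this exact case motivated the fix is an assumption.

```javascript
// Simplified model of the prefixItems naming logic; not the library's genCode.
function genPrefix(schema, v, lines, ctx, reserveFirst) {
  if (!schema.prefixItems) return;
  // New behaviour: claim one counter value up front.
  const pfxVar = reserveFirst ? ctx.varCounter++ : null;
  for (let i = 0; i < schema.prefixItems.length; i++) {
    // Old behaviour: read the live counter, which a nested call reads too.
    const id = reserveFirst ? pfxVar : ctx.varCounter;
    const elem = `_p${id}_${i}`;
    lines.push(`if(Array.isArray(${v})&&${v}.length>${i}){const ${elem}=${v}[${i}]`);
    genPrefix(schema.prefixItems[i], elem, lines, ctx, reserveFirst);
    lines.push('}');
  }
  if (!reserveFirst) ctx.varCounter++; // old behaviour: bump only after the loop
}

function build(schema, reserveFirst) {
  const lines = [];
  genPrefix(schema, 'd', lines, { varCounter: 0 }, reserveFirst);
  return new Function('d', lines.join('\n') + '\nreturn true');
}

const nested = { prefixItems: [{ prefixItems: [{}] }] };

let oldError = null;
try {
  build(nested, false)([[1]]); // inner block declares _p0_0 from itself
} catch (e) {
  oldError = e.name;
}
const newResult = build(nested, true)([[1]]);
console.log(oldError, newResult); // ReferenceError true
```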
@@ -1642,23 +1751,73 @@ function genCode(schema, v, lines, ctx, knownType) {
1642
1751
  const inner = `for(var _k in ${v}){if(!${evVar}[_k])return false}`
1643
1752
  if (!ctx.deferredChecks) ctx.deferredChecks = []
1644
1753
  ctx.deferredChecks.push(isObj ? inner + '}' : `if(typeof ${v}==='object'&&${v}!==null&&!Array.isArray(${v})){${inner}}}`)
1645
- } else if (schema.patternProperties) {
1646
- // patternProperties: runtime key matching
1647
- const ei = ctx.varCounter++
1648
- const evVar = `_ev${ei}`
1649
- lines.push(`{const ${evVar}={}`)
1650
- for (const k of baseProps) lines.push(`${evVar}[${JSON.stringify(k)}]=1`)
1651
- const patterns = Object.keys(schema.patternProperties)
1652
- const reVars = []
1653
- for (const pat of patterns) {
1654
- const ri = ctx.varCounter++
1655
- ctx.closureVars.push(`_ure${ri}`)
1656
- ctx.closureVals.push(new RegExp(pat))
1657
- reVars.push(`_ure${ri}`)
1754
+ } else {
1755
+ // General fallback: collect all patternProperties from root + allOf sub-schemas + if
1756
+ // and use runtime regex matching
1757
+ const allPatterns = []
1758
+ if (schema.patternProperties) {
1759
+ allPatterns.push(...Object.keys(schema.patternProperties))
1760
+ }
1761
+ if (schema.allOf) {
1762
+ for (const sub of schema.allOf) {
1763
+ if (sub && sub.patternProperties) {
1764
+ allPatterns.push(...Object.keys(sub.patternProperties))
1765
+ }
1766
+ }
1767
+ }
1768
+ // lone if (no then/else) still contributes annotations when it passes
1769
+ if (schema.if && !schema.then && !schema.else && schema.if.patternProperties) {
1770
+ allPatterns.push(...Object.keys(schema.if.patternProperties))
1771
+ }
1772
+ if (allPatterns.length > 0) {
1773
+ const ei = ctx.varCounter++
1774
+ const evVar = `_ev${ei}`
1775
+ lines.push(`{const ${evVar}={}`)
1776
+ for (const k of baseProps) lines.push(`${evVar}[${JSON.stringify(k)}]=1`)
1777
+ const reVars = []
1778
+ for (const pat of allPatterns) {
1779
+ const ri = ctx.varCounter++
1780
+ ctx.closureVars.push(`_ure${ri}`)
1781
+ ctx.closureVals.push(new RegExp(pat))
1782
+ reVars.push(`_ure${ri}`)
1783
+ }
1784
+ if (schema.if && !schema.then && !schema.else) {
1785
+ // Lone if: run the if check first; if it passes, its patternProperties contribute
1786
+ const ifLines2 = []
1787
+ genCode(schema.if, '_iv2', ifLines2, ctx)
1788
+ const ufi = ctx.varCounter++
1789
+ const ifFn = ifLines2.length === 0
1790
+ ? `function(_iv2){return true}`
1791
+ : `function(_iv2){${ifLines2.join(';')};return true}`
1792
+ // Mark keys matching if's patterns as evaluated only when if passes
1793
+ const ifPatterns = schema.if.patternProperties ? Object.keys(schema.if.patternProperties) : []
1794
+ const ifReVars = []
1795
+ for (const pat of ifPatterns) {
1796
+ const ri = ctx.varCounter++
1797
+ ctx.closureVars.push(`_ure${ri}`)
1798
+ ctx.closureVals.push(new RegExp(pat))
1799
+ ifReVars.push(`_ure${ri}`)
1800
+ }
1801
+ const rootReVars = []
1802
+ if (schema.patternProperties) {
1803
+ for (const pat of Object.keys(schema.patternProperties)) {
1804
+ const ri = ctx.varCounter++
1805
+ ctx.closureVars.push(`_ure${ri}`)
1806
+ ctx.closureVals.push(new RegExp(pat))
1807
+ rootReVars.push(`_ure${ri}`)
1808
+ }
1809
+ }
1810
+ const rootPatCheck = rootReVars.map(rv => `if(${rv}.test(_k))continue;`).join('')
1811
+ const ifPatCheck = ifReVars.map(rv => `if(${rv}.test(_k))continue;`).join('')
1812
+ const inner = `const _uif${ufi}=${ifFn};if(_uif${ufi}(${v})){for(var _k in ${v}){if(${evVar}[_k])continue;${rootPatCheck}${ifPatCheck}return false}}else{for(var _k in ${v}){if(${evVar}[_k])continue;${rootPatCheck}return false}}`
1813
+ if (!ctx.deferredChecks) ctx.deferredChecks = []
1814
+ ctx.deferredChecks.push(isObj ? inner + '}' : `if(typeof ${v}==='object'&&${v}!==null&&!Array.isArray(${v})){${inner}}}`)
1815
+ } else {
1816
+ const inner = `for(var _k in ${v}){if(${evVar}[_k])continue;${reVars.map(rv => `if(${rv}.test(_k)){${evVar}[_k]=1;continue}`).join('')}return false}`
1817
+ if (!ctx.deferredChecks) ctx.deferredChecks = []
1818
+ ctx.deferredChecks.push(isObj ? inner + '}' : `if(typeof ${v}==='object'&&${v}!==null&&!Array.isArray(${v})){${inner}}}`)
1819
+ }
1658
1820
  }
1659
- const inner = `for(var _k in ${v}){if(${evVar}[_k])continue;${reVars.map(rv => `if(${rv}.test(_k)){${evVar}[_k]=1;continue}`).join('')}return false}`
1660
- if (!ctx.deferredChecks) ctx.deferredChecks = []
1661
- ctx.deferredChecks.push(isObj ? inner + '}' : `if(typeof ${v}==='object'&&${v}!==null&&!Array.isArray(${v})){${inner}}}`)
1662
1821
  }
1663
1822
  } else if (typeof schema.unevaluatedProperties === 'object') {
1664
1823
  // Tier 3 with schema: validate unknown keys against sub-schema
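Note on the lone-`if` branch: keys matching the `if` subschema's `patternProperties` count as evaluated only when the `if` subschema itself passes. The sketch below is a hand-written equivalent for one example schema, assuming `properties: { id }`, a root pattern `^x-`, and an `if` pattern `^tmp_` whose values must be numbers. All names here (`knownKeys`, `ifPasses`, `noUnevaluatedKeys`) are hypothetical and chosen for illustration.

```javascript
const knownKeys = new Set(['id'])  // keys declared under `properties`
const rootPatterns = [/^x-/]       // root patternProperties
const ifPatterns = [/^tmp_/]       // patternProperties inside the lone `if`

// Hypothetical `if` subschema: every tmp_ key must hold a number.
function ifPasses(obj) {
  for (const k in obj) {
    if (/^tmp_/.test(k) && typeof obj[k] !== 'number') return false
  }
  return true
}

function noUnevaluatedKeys(obj) {
  // The `if` patterns only count when the `if` subschema passed.
  const active = ifPasses(obj) ? rootPatterns.concat(ifPatterns) : rootPatterns
  for (const k in obj) {
    if (knownKeys.has(k)) continue
    if (active.some(re => re.test(k))) continue
    return false // key was not evaluated by anything
  }
  return true
}

const okResult = noUnevaluatedKeys({ id: 7, 'x-note': 'a', tmp_count: 3 })
const failedIfResult = noUnevaluatedKeys({ id: 7, tmp_count: 'three' })
const strayKeyResult = noUnevaluatedKeys({ id: 7, other: 1 })
console.log(okResult, failedIfResult, strayKeyResult) // true false false
```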
@@ -1710,7 +1869,12 @@ function genCode(schema, v, lines, ctx, knownType) {
1710
1869
  if (schema.unevaluatedItems !== undefined) {
1711
1870
  const evalResult = collectEvaluated(schema, ctx.schemaMap, ctx.rootDefs)
1712
1871
 
1713
- if (evalResult.allItems || schema.unevaluatedItems === true) {
1872
+ // Check if allItems from anyOf/oneOf branches with `items` keyword needs dynamic tracking
1873
+ const branchKw = schema.anyOf ? 'anyOf' : schema.oneOf ? 'oneOf' : null
1874
+ const hasConditionalItems = evalResult.allItems && evalResult.dynamic && branchKw &&
1875
+ schema[branchKw].some(sub => sub && typeof sub === 'object' && ((sub.items && typeof sub.items === 'object') || sub.items === true))
1876
+
1877
+ if (schema.unevaluatedItems === true || (evalResult.allItems && !hasConditionalItems)) {
1714
1878
  // All items evaluated or unevaluatedItems:true — no-op
1715
1879
  } else if (!evalResult.dynamic) {
1716
1880
  // Static: all evaluated items known at compile-time
@@ -1736,16 +1900,29 @@ function genCode(schema, v, lines, ctx, knownType) {
1736
1900
  }
1737
1901
  } else {
1738
1902
  // Dynamic: runtime tracking of max evaluated index
1739
- const baseIdx = evalResult.items || 0
1903
+ // Compute baseIdx from unconditional sources only (root prefixItems/items, allOf)
1904
+ let baseIdx = 0
1905
+ if (schema.prefixItems) baseIdx = Math.max(baseIdx, schema.prefixItems.length)
1906
+ if (schema.items && typeof schema.items === 'object') baseIdx = Infinity // items: schema → all evaluated
1907
+ if (schema.allOf) {
1908
+ for (const sub of schema.allOf) {
1909
+ const subR = collectEvaluated(sub, ctx.schemaMap, ctx.rootDefs)
1910
+ if (subR.items !== null) baseIdx = Math.max(baseIdx, subR.items)
1911
+ if (subR.allItems) baseIdx = Infinity
1912
+ }
1913
+ }
1914
+ if (baseIdx === Infinity) baseIdx = 0 // allItems already handled above
1740
1915
  const branchKeyword = schema.anyOf ? 'anyOf' : schema.oneOf ? 'oneOf' : null
1741
1916
 
1742
1917
  if (branchKeyword && (schema.unevaluatedItems === false || typeof schema.unevaluatedItems === 'object')) {
1743
1918
  // anyOf/oneOf: each branch may evaluate different number of items
1744
1919
  const branches = schema[branchKeyword]
1745
1920
  const branchMaxIdx = []
1921
+ const branchAllItems = []
1746
1922
  for (const sub of branches) {
1747
1923
  const subR = collectEvaluated(sub, ctx.schemaMap, ctx.rootDefs)
1748
1924
  branchMaxIdx.push(subR.items || 0)
1925
+ branchAllItems.push(subR.allItems)
1749
1926
  }
1750
1927
  // Runtime: find max evaluated index across all matching branches
1751
1928
  const fns = []
@@ -1759,7 +1936,10 @@ function genCode(schema, v, lines, ctx, knownType) {
1759
1936
  const evVar = `_eidx${ei}`
1760
1937
  lines.push(`{let ${evVar}=${baseIdx}`)
1761
1938
  lines.push(`const _bf${bfi}=[${fns.join(',')}]`)
1762
- const maxExprs = branchMaxIdx.map((m, i) => `_bi===${i}?${Math.max(m, baseIdx)}`).join(':') + `:${baseIdx}`
1939
+ const maxExprs = branchMaxIdx.map((m, i) => {
1940
+ if (branchAllItems[i]) return `_bi===${i}?${v}.length`
1941
+ return `_bi===${i}?${Math.max(m, baseIdx)}`
1942
+ }).join(':') + `:${baseIdx}`
1763
1943
  if (branchKeyword === 'oneOf') {
1764
1944
  lines.push(`for(let _bi=0;_bi<_bf${bfi}.length;_bi++){if(_bf${bfi}[_bi](${v})){${evVar}=${maxExprs};break}}`)
1765
1945
  } else {
@@ -1784,8 +1964,8 @@ function genCode(schema, v, lines, ctx, knownType) {
1784
1964
  lines.push('}')
1785
1965
  }
1786
1966
  }
1787
- } else if (schema.if && (schema.then || schema.else) && (schema.unevaluatedItems === false || typeof schema.unevaluatedItems === 'object')) {
1788
- // if/then/else: branch-specific max index
1967
+ } else if (schema.if && (schema.unevaluatedItems === false || typeof schema.unevaluatedItems === 'object')) {
1968
+ // if/then/else (or lone if): branch-specific max index
1789
1969
  const ifEval = collectEvaluated(schema.if, ctx.schemaMap, ctx.rootDefs)
1790
1970
  const thenEval = schema.then ? collectEvaluated(schema.then, ctx.schemaMap, ctx.rootDefs) : { items: null }
1791
1971
  const elseEval = schema.else ? collectEvaluated(schema.else, ctx.schemaMap, ctx.rootDefs) : { items: null }
@@ -1804,6 +1984,54 @@ function genCode(schema, v, lines, ctx, knownType) {
1804
1984
  const guard = isArr ? '' : `if(Array.isArray(${v}))`
1805
1985
  lines.push(`${guard}{const _uif${ufi}=${ifFn3};if(_uif${ufi}(${v})){if(${v}.length>${thenIdx})return false}else{if(${v}.length>${elseIdx})return false}}`)
1806
1986
  }
1987
+ } else if ((schema.contains || (schema.allOf && schema.allOf.some(s => s && s.contains))) && (schema.unevaluatedItems === false || typeof schema.unevaluatedItems === 'object')) {
1988
+ // contains + unevaluatedItems: per-item tracking of which items are matched by contains
1989
+ // Collect contains from root and allOf sub-schemas
1990
+ const allContains = []
1991
+ if (schema.contains) allContains.push(schema.contains)
1992
+ if (schema.allOf) {
1993
+ for (const sub of schema.allOf) {
1994
+ if (sub && sub.contains) allContains.push(sub.contains)
1995
+ }
1996
+ }
1997
+ const ci = ctx.varCounter++
1998
+ const evArr = `_cev${ci}`
1999
+ const containsFns = []
2000
+ for (const c of allContains) {
2001
+ const cLines = []
2002
+ genCode(c, '_cv', cLines, ctx)
2003
+ containsFns.push(cLines.length === 0
2004
+ ? `function(_cv){return true}`
2005
+ : `function(_cv){${cLines.join(';')};return true}`)
2006
+ }
2007
+ const cfnArr = `_cfn${ci}`
2008
+ lines.push(`{const ${cfnArr}=[${containsFns.join(',')}]`)
2009
+ // Mark items evaluated by prefixItems
2010
+ lines.push(`const ${evArr}=[]`)
2011
+ if (baseIdx > 0) {
2012
+ lines.push(`for(let _i=0;_i<${Math.min(baseIdx, 1000)};_i++)${evArr}[_i]=true`)
2013
+ }
2014
+ // Mark items matched by each contains function
2015
+ lines.push(`if(Array.isArray(${v})){for(let _ci=0;_ci<${v}.length;_ci++){for(let _cj=0;_cj<${cfnArr}.length;_cj++){if(${cfnArr}[_cj](${v}[_ci])){${evArr}[_ci]=true;break}}}}`)
2016
+ if (schema.unevaluatedItems === false) {
2017
+ const inner = `if(Array.isArray(${v})){for(let _ci=0;_ci<${v}.length;_ci++){if(!${evArr}[_ci])return false}}`
2018
+ if (!ctx.deferredChecks) ctx.deferredChecks = []
2019
+ ctx.deferredChecks.push(inner + '}')
2020
+ } else {
2021
+ // unevaluatedItems: {schema}
2022
+ const ui = ctx.varCounter++
2023
+ const elemVar = `_ue${ui}`
2024
+ const subLines = []
2025
+ genCode(schema.unevaluatedItems, elemVar, subLines, ctx)
2026
+ if (subLines.length > 0) {
2027
+ const check = subLines.join(';')
2028
+ const inner = `if(Array.isArray(${v})){for(let _ci=0;_ci<${v}.length;_ci++){if(!${evArr}[_ci]){const ${elemVar}=${v}[_ci];${check}}}}`
2029
+ if (!ctx.deferredChecks) ctx.deferredChecks = []
2030
+ ctx.deferredChecks.push(inner + '}')
2031
+ } else {
2032
+ lines.push('}')
2033
+ }
2034
+ }
1807
2035
  } else if (schema.unevaluatedItems === false) {
1808
2036
  // Fallback: use static base index (may not be fully correct for all dynamic cases)
1809
2037
  const maxIdx = evalResult.items || 0
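Note on the `contains` + `unevaluatedItems` branch: the generated code tracks evaluation per item rather than as a single maximum index. A minimal sketch of that idea for `unevaluatedItems: false`, assuming the `contains` subschemas are already available as predicate functions. `checkUnevaluatedItems` is a hypothetical name and it does not enforce `contains` itself.

```javascript
// prefixCount: number of leading items covered by prefixItems.
// containsFns: one predicate per `contains` subschema (root and allOf).
function checkUnevaluatedItems(arr, prefixCount, containsFns) {
  if (!Array.isArray(arr)) return true
  const evaluated = []
  for (let i = 0; i < prefixCount; i++) evaluated[i] = true
  // An item matched by any `contains` subschema counts as evaluated.
  for (let i = 0; i < arr.length; i++) {
    if (containsFns.some(fn => fn(arr[i]))) evaluated[i] = true
  }
  // unevaluatedItems: false rejects every item nothing has evaluated.
  for (let i = 0; i < arr.length; i++) {
    if (!evaluated[i]) return false
  }
  return true
}

const isString = x => typeof x === 'string'
const allCovered = checkUnevaluatedItems([1, 'a', 'b'], 1, [isString])
const trailingNumber = checkUnevaluatedItems([1, 'a', 2], 1, [isString])
console.log(allCovered, trailingNumber) // true false
```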
@@ -1982,6 +2210,8 @@ function compileToJSCodegenWithErrors(schema, schemaMap) {
1982
2210
  if (typeof schema === 'object' && schema !== null) {
1983
2211
  const s = JSON.stringify(schema)
1984
2212
  if (s.includes('unevaluatedProperties') || s.includes('unevaluatedItems')) return null
2213
+ // Bail on self-referencing schemas — error codegen doesn't support recursion
2214
+ if (s.includes('"$ref":"#"')) return null
1985
2215
  }
1986
2216
  if (typeof schema === 'boolean') {
1987
2217
  return schema
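Note on the `"$ref":"#"` bail-out: because `JSON.stringify` emits no whitespace, the substring test matches a reference to the document root but not a longer pointer such as `#/$defs/n`. A small sketch with a hypothetical `hasRootSelfRef` wrapper around the same check:

```javascript
// Same substring test as in the hunk, wrapped for illustration.
const hasRootSelfRef = schema => JSON.stringify(schema).includes('"$ref":"#"')

const recursiveSchema = {
  type: 'object',
  properties: { child: { $ref: '#' } }
}
const localDefSchema = {
  $defs: { n: { type: 'number' } },
  properties: { a: { $ref: '#/$defs/n' } }
}

const recursiveHit = hasRootSelfRef(recursiveSchema)
const localDefHit = hasRootSelfRef(localDefSchema)
console.log(recursiveHit, localDefHit) // true false
```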
@@ -2129,7 +2359,7 @@ function genCodeE(schema, v, pathExpr, lines, ctx, schemaPrefix) {
2129
2359
  case 'string': return `typeof ${v}==='string'`
2130
2360
  case 'number': return `(typeof ${v}==='number'&&isFinite(${v}))`
2131
2361
  case 'integer': return `Number.isInteger(${v})`
2132
- case 'boolean': return `(${v}===true||${v}===false)`
2362
+ case 'boolean': return `typeof ${v}==='boolean'`
2133
2363
  case 'null': return `${v}===null`
2134
2364
  default: return 'true'
2135
2365
  }
@@ -2463,6 +2693,8 @@ function compileToJSCombined(schema, VALID_RESULT, schemaMap) {
2463
2693
  if (typeof schema === 'object' && schema !== null) {
2464
2694
  const s = JSON.stringify(schema)
2465
2695
  if (s.includes('unevaluatedProperties') || s.includes('unevaluatedItems')) return null
2696
+ // Bail on self-referencing schemas — combined codegen doesn't support recursion
2697
+ if (s.includes('"$ref":"#"')) return null
2466
2698
  }
2467
2699
  if (typeof schema === 'boolean') {
2468
2700
  return schema
@@ -2633,7 +2865,7 @@ function genCodeC(schema, v, pathExpr, lines, ctx, schemaPrefix) {
2633
2865
  case 'string': return `typeof ${v}==='string'`
2634
2866
  case 'number': return `(typeof ${v}==='number'&&isFinite(${v}))`
2635
2867
  case 'integer': return `Number.isInteger(${v})`
2636
- case 'boolean': return `(${v}===true||${v}===false)`
2868
+ case 'boolean': return `typeof ${v}==='boolean'`
2637
2869
  case 'null': return `${v}===null`
2638
2870
  default: return 'true'
2639
2871
  }
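Note on the `boolean` case: the diff does not state why the check changed. The sketch below only shows that the old and new expressions agree on a handful of sample values, including a boxed `Boolean`, which both reject.

```javascript
const oldBooleanCheck = v => v === true || v === false
const newBooleanCheck = v => typeof v === 'boolean'

const samples = [true, false, 0, 1, '', 'true', null, undefined, [], {}, new Boolean(true)]
const mismatches = samples.filter(v => oldBooleanCheck(v) !== newBooleanCheck(v))
console.log(mismatches.length) // 0
```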
@@ -3141,15 +3373,10 @@ function _collectEval(schema, result, defs, schemaMap, refStack, isRoot) {
3141
3373
  result.allItems = true
3142
3374
  }
3143
3375
 
3144
- // contains interaction with unevaluatedItems is complex
3145
- // At root level: contains + unevaluatedItems needs dynamic tracking
3146
- // In nested schemas: contains marks all items as evaluated
3376
+ // contains: marks matching items as evaluated (not ALL items)
3377
+ // Always set dynamic since which items match depends on the data
3147
3378
  if (schema.contains) {
3148
- if (isRoot && (schema.unevaluatedItems !== undefined)) {
3149
- result.dynamic = true
3150
- } else {
3151
- result.allItems = true
3152
- }
3379
+ result.dynamic = true
3153
3380
  }
3154
3381
 
3155
3382
  // unevaluatedProperties: true/schema → all props evaluated (for nested schemas only)
@@ -3184,6 +3411,18 @@ function _collectEval(schema, result, defs, schemaMap, refStack, isRoot) {
3184
3411
  _collectEval(schema.if, result, defs, schemaMap, refStack)
3185
3412
  if (schema.then) _collectEval(schema.then, result, defs, schemaMap, refStack)
3186
3413
  if (schema.else) _collectEval(schema.else, result, defs, schemaMap, refStack)
3414
+ } else if (schema.if) {
3415
+ // Standalone if (no then/else) still produces annotations per spec
3416
+ // Only collect properties and patterns, not deep items (contains etc.)
3417
+ result.dynamic = true
3418
+ if (schema.if.properties) {
3419
+ for (const k of Object.keys(schema.if.properties)) {
3420
+ if (!result.props.includes(k)) result.props.push(k)
3421
+ }
3422
+ }
3423
+ if (schema.if.patternProperties) {
3424
+ // patternProperties contribute to dynamic evaluation
3425
+ }
3187
3426
  }
3188
3427
 
3189
3428
  // dependentSchemas → dynamic
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "ata-validator",
3
- "version": "0.8.0",
3
+ "version": "0.9.1",
4
4
  "description": "Ultra-fast JSON Schema validator. 4.7x faster validation, 1,800x faster compilation. Works without native addon. Cross-schema $ref, Draft 2020-12 + Draft 7, V8-optimized JS codegen, simdjson, RE2, multi-core. Standard Schema V1 compatible.",
5
5
  "main": "index.js",
6
6
  "module": "index.mjs",
@@ -19,7 +19,8 @@
19
19
  "types": "./compat.d.ts",
20
20
  "import": "./compat.mjs",
21
21
  "require": "./compat.js"
22
- }
22
+ },
23
+ "./package.json": "./package.json"
23
24
  },
24
25
  "sideEffects": false,
25
26
  "browser": {
package/src/ata.cpp CHANGED
@@ -436,6 +436,10 @@ struct compiled_schema {
436
436
  std::unordered_map<std::string, schema_node_ptr>> resource_dynamic_anchors;
437
437
  bool has_dynamic_refs = false;
438
438
  std::string current_resource_id; // compile-time only
439
+
440
+ // compile-time warnings (misplaced keywords, etc.)
441
+ std::vector<schema_warning> warnings;
442
+ std::string compile_path; // current JSON pointer during compilation
439
443
  };
440
444
 
441
445
  // Thread-local persistent parsers — reused across all validate calls on the
@@ -826,6 +830,61 @@ static schema_node_ptr compile_node(dom::element el,
826
830
  }
827
831
  }
828
832
 
833
+ // Warn about keywords used at the wrong type level.
834
+ // Only check when an explicit "type" is declared (type_mask != 0).
835
+ if (node->type_mask != 0) {
836
+ const uint8_t array_bit = json_type_bit(json_type::array);
837
+ const uint8_t string_bit = json_type_bit(json_type::string);
838
+ const uint8_t number_bits = json_type_bit(json_type::number) |
839
+ json_type_bit(json_type::integer);
840
+ const uint8_t object_bit = json_type_bit(json_type::object);
841
+
842
+ auto warn = [&](const char* keyword, const char* expected_type) {
843
+ ctx.warnings.push_back({
844
+ ctx.compile_path,
845
+ std::string(keyword) + " has no effect on type \"" +
846
+ (node->type_mask & json_type_bit(json_type::string) ? "string" :
847
+ node->type_mask & json_type_bit(json_type::boolean) ? "boolean" :
848
+ node->type_mask & json_type_bit(json_type::number) ? "number" :
849
+ node->type_mask & object_bit ? "object" :
850
+ node->type_mask & array_bit ? "array" : "unknown") +
851
+ "\", only applies to " + expected_type
852
+ });
853
+ };
854
+
855
+ // Array keywords on non-array type
856
+ if (!(node->type_mask & array_bit)) {
857
+ if (node->min_items.has_value()) warn("minItems", "array");
858
+ if (node->max_items.has_value()) warn("maxItems", "array");
859
+ if (node->unique_items) warn("uniqueItems", "array");
860
+ if (!node->prefix_items.empty()) warn("prefixItems", "array");
861
+ if (node->items_schema) warn("items", "array");
862
+ if (node->contains_schema) warn("contains", "array");
863
+ }
864
+
865
+ // String keywords on non-string type
866
+ if (!(node->type_mask & string_bit)) {
867
+ if (node->min_length.has_value()) warn("minLength", "string");
868
+ if (node->max_length.has_value()) warn("maxLength", "string");
869
+ if (node->pattern.has_value()) warn("pattern", "string");
870
+ }
871
+
872
+ // Numeric keywords on non-numeric type
873
+ if (!(node->type_mask & number_bits)) {
874
+ if (node->minimum.has_value()) warn("minimum", "number");
875
+ if (node->maximum.has_value()) warn("maximum", "number");
876
+ if (node->exclusive_minimum.has_value()) warn("exclusiveMinimum", "number");
877
+ if (node->exclusive_maximum.has_value()) warn("exclusiveMaximum", "number");
878
+ if (node->multiple_of.has_value()) warn("multipleOf", "number");
879
+ }
880
+
881
+ // Object keywords on non-object type
882
+ if (!(node->type_mask & object_bit)) {
883
+ if (!node->properties.empty()) warn("properties", "object");
884
+ if (!node->required.empty()) warn("required", "object");
885
+ }
886
+ }
887
+
829
888
  ctx.current_resource_id = prev_resource;
830
889
  return node;
831
890
  }
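Note on the misplaced-keyword warnings: the C++ hunk flags type-specific keywords when an explicit `type` excludes the type they apply to. A simplified JavaScript sketch of the same rule follows. `KEYWORD_TYPES` and `misplacedKeywords` are hypothetical names, the message text is shortened, and presence is tested with `in` rather than the emptiness checks used in the C++ code.

```javascript
const KEYWORD_TYPES = {
  array: ['minItems', 'maxItems', 'uniqueItems', 'prefixItems', 'items', 'contains'],
  string: ['minLength', 'maxLength', 'pattern'],
  number: ['minimum', 'maximum', 'exclusiveMinimum', 'exclusiveMaximum', 'multipleOf'],
  object: ['properties', 'required']
}

function misplacedKeywords(schema) {
  // Only checked when an explicit "type" is declared.
  if (schema.type === undefined) return []
  // "integer" counts as numeric, mirroring number_bits in the C++ hunk.
  const declared = [].concat(schema.type).map(t => (t === 'integer' ? 'number' : t))
  const found = []
  for (const [type, keywords] of Object.entries(KEYWORD_TYPES)) {
    if (declared.includes(type)) continue
    for (const kw of keywords) {
      if (kw in schema) found.push(`${kw} only applies to ${type}`)
    }
  }
  return found
}

const stringWarnings = misplacedKeywords({ type: 'string', minItems: 1, maxLength: 5 })
const untypedWarnings = misplacedKeywords({ minItems: 1 })
const integerWarnings = misplacedKeywords({ type: ['integer', 'null'], minimum: 0 })
console.log(stringWarnings)  // [ 'minItems only applies to array' ]
console.log(untypedWarnings) // []
console.log(integerWarnings) // []
```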
@@ -1611,7 +1670,7 @@ static void validate_node(const schema_node_ptr& node,
1611
1670
  std::string key_json = "\"" + std::string(key) + "\"";
1612
1671
  auto key_result = tl_dom_key_parser().parse(key_json);
1613
1672
  if (!key_result.error()) {
1614
- validate_node(pn, key_result.value(), path, ctx, errors, all_errors, dynamic_scope);
1673
+ validate_node(pn, key_result.value_unsafe(), path, ctx, errors, all_errors, dynamic_scope);
1615
1674
  }
1616
1675
  }
1617
1676
  }
@@ -2157,7 +2216,9 @@ static bool cg_exec(const cg::plan& p, const std::vector<cg::ins>& code,
2157
2216
  // Returns: true = valid, false = invalid OR unsupported (fallback to DOM).
2158
2217
 
2159
2218
  static json_type od_type(simdjson::ondemand::value& v) {
2160
- switch (v.type()) {
2219
+ simdjson::ondemand::json_type jt;
2220
+ if (v.type().get(jt)) return json_type::null_value;
2221
+ switch (jt) {
2161
2222
  case simdjson::ondemand::json_type::object: return json_type::object;
2162
2223
  case simdjson::ondemand::json_type::array: return json_type::array;
2163
2224
  case simdjson::ondemand::json_type::string: return json_type::string;
@@ -2260,7 +2321,8 @@ static bool od_exec(const cg::plan& p, const std::vector<cg::ins>& code,
2260
2321
  }
2261
2322
  for(auto field:o){
2262
2323
  simdjson::ondemand::raw_json_string rk; if(field.key().get(rk)!=SUCCESS) return false;
2263
- std::string_view key = field.unescaped_key();
2324
+ std::string_view key;
2325
+ if (field.unescaped_key().get(key)) continue;
2264
2326
  bool matched=false;
2265
2327
  for(auto& pp:props){
2266
2328
  if(key==pp.nm){
@@ -2539,7 +2601,8 @@ static bool od_exec_plan(const od_plan& plan, simdjson::ondemand::value value) {
2539
2601
  uint64_t prop_count = 0;
2540
2602
 
2541
2603
  for (auto field : obj) {
2542
- std::string_view key = field.unescaped_key();
2604
+ std::string_view key;
2605
+ if (field.unescaped_key().get(key)) continue;
2543
2606
  prop_count++;
2544
2607
 
2545
2608
  // Single merged scan: required + property in one pass
@@ -2600,7 +2663,7 @@ schema_ref compile(std::string_view schema_json) {
2600
2663
  if (result.error()) {
2601
2664
  return schema_ref{nullptr};
2602
2665
  }
2603
- doc = result.value();
2666
+ doc = result.value_unsafe();
2604
2667
 
2605
2668
  ctx->root = compile_node(doc, *ctx);
2606
2669
 
@@ -2616,6 +2679,7 @@ schema_ref compile(std::string_view schema_json) {
2616
2679
 
2617
2680
  schema_ref ref;
2618
2681
  ref.impl = ctx;
2682
+ ref.warnings = std::move(ctx->warnings);
2619
2683
  return ref;
2620
2684
  }
2621
2685
 
@@ -2655,7 +2719,7 @@ validation_result validate(const schema_ref& schema, std::string_view json,
2655
2719
  // Fast path: codegen bytecode execution (DOM)
2656
2720
  if (!schema.impl->use_ondemand && !schema.impl->gen_plan.code.empty()) {
2657
2721
  if (cg_exec(schema.impl->gen_plan, schema.impl->gen_plan.code,
2658
- result.value())) {
2722
+ result.value_unsafe())) {
2659
2723
  return {true, {}};
2660
2724
  }
2661
2725
  // Codegen said invalid OR hit COMPOSITION — fall through to tree walker
@@ -2675,10 +2739,10 @@ validation_result validate(const schema_ref& schema, std::string_view json,
2675
2739
  scope.push_back(&iit->second);
2676
2740
  }
2677
2741
  }
2678
- validate_node(schema.impl->root, result.value(), "", *schema.impl, errors,
2742
+ validate_node(schema.impl->root, result.value_unsafe(), "", *schema.impl, errors,
2679
2743
  opts.all_errors, &scope);
2680
2744
  } else {
2681
- validate_node(schema.impl->root, result.value(), "", *schema.impl, errors,
2745
+ validate_node(schema.impl->root, result.value_unsafe(), "", *schema.impl, errors,
2682
2746
  opts.all_errors);
2683
2747
  }
2684
2748
 
@@ -2721,10 +2785,10 @@ bool is_valid_prepadded(const schema_ref& schema, const char* data, size_t lengt
2721
2785
  if (result.error()) return false;
2722
2786
 
2723
2787
  if (!schema.impl->gen_plan.code.empty()) {
2724
- return cg_exec(schema.impl->gen_plan, schema.impl->gen_plan.code, result.value());
2788
+ return cg_exec(schema.impl->gen_plan, schema.impl->gen_plan.code, result.value_unsafe());
2725
2789
  }
2726
2790
 
2727
- return validate_fast(schema.impl->root, result.value(), *schema.impl);
2791
+ return validate_fast(schema.impl->root, result.value_unsafe(), *schema.impl);
2728
2792
  }
2729
2793
 
2730
2794
  bool is_valid_buf(const schema_ref& schema, const uint8_t* data, size_t length) {