ata-validator 0.4.9 → 0.4.11

package/README.md CHANGED
@@ -10,16 +10,16 @@ Ultra-fast JSON Schema validator powered by [simdjson](https://github.com/simdjs
 
  | Scenario | ata | ajv | |
  |---|---|---|---|
- | **validate(obj)** valid | 76M ops/sec | 8M ops/sec | **ata 9.5x faster** |
- | **validate(obj)** invalid | 34M ops/sec | 8M ops/sec | **ata 4.3x faster** |
+ | **validate(obj)** valid | 68M ops/sec | 8M ops/sec | **ata 8.5x faster** |
+ | **validate(obj)** invalid | 17M ops/sec | 8M ops/sec | **ata 2.1x faster** |
  | **isValidObject(obj)** | 15.4M ops/sec | 9.2M ops/sec | **ata 1.7x faster** |
- | **validateJSON(str)** valid | 2.15M ops/sec | 1.88M ops/sec | **ata 1.1x faster** |
- | **validateJSON(str)** invalid | 2.17M ops/sec | 2.29M ops/sec | **ata 1.1x faster** |
+ | **validateJSON(str)** valid | 3.0M ops/sec | 1.9M ops/sec | **ata 1.6x faster** |
+ | **validateJSON(str)** invalid | 2.7M ops/sec | 2.3M ops/sec | **ata 1.2x faster** |
  | **Schema compilation** | 113K ops/sec | 818 ops/sec | **ata 138x faster** |
 
  > validate(obj) numbers are isolated single-schema benchmarks. Multi-schema benchmark overhead reduces throughput; real-world numbers depend on workload.
 
- ### Large Data JS Object Validation
+ ### Large Data - JS Object Validation
 
  | Size | ata | ajv | |
  |---|---|---|---|
@@ -35,18 +35,19 @@ Ultra-fast JSON Schema validator powered by [simdjson](https://github.com/simdjs
  | **ReDoS protection** (`^(a+)+$`) | 0.3ms | 765ms | **ata immune (RE2)** |
  | **Batch NDJSON** (10K items, multi-core) | 13.4M/sec | 5.1M/sec | **ata 2.6x faster** |
  | **Fastify HTTP** (100 users POST) | 24.6K req/sec | 22.6K req/sec | **ata 9% faster** |
+ | **Fastify startup** (500 routes) | 46ms | 77ms (standalone) | **ata 1.7x faster** |
 
- > ata is faster than ajv on **every** benchmark — valid and invalid data, objects and JSON strings, single documents and parallel batches.
+ > Isolated single-schema benchmarks. Results vary by workload and hardware.
 
  ### How it works
 
- **Hybrid validator**: ata compiles schemas into monolithic JS functions identical to the boolean fast path, but returning `VALID_RESULT` on success and calling the error collector on failure. V8 TurboFan optimizes it identically to a pure boolean function error code is dead code on the valid path. No try/catch (3.3x V8 deopt), no lazy arrays, no double-pass.
+ **Hybrid validator**: ata compiles schemas into monolithic JS functions identical to the boolean fast path, but returning `VALID_RESULT` on success and calling the error collector on failure. V8 TurboFan optimizes it identically to a pure boolean function - error code is dead code on the valid path. No try/catch (3.3x V8 deopt), no lazy arrays, no double-pass.
 
  **JS codegen**: Schemas are compiled to monolithic JS functions (like ajv). Supported keywords: `type`, `required`, `properties`, `items`, `enum`, `const`, `allOf`, `anyOf`, `oneOf`, `not`, `if/then/else`, `uniqueItems`, `contains`, `prefixItems`, `additionalProperties`, `dependentRequired`, `$ref` (local), `minimum/maximum`, `minLength/maxLength`, `pattern`, `format`.
 
  **V8 TurboFan optimizations**: Destructuring batch reads, `undefined` checks instead of `in` operator, context-aware type guard elimination, property hoisting to local variables, tiered uniqueItems (nested loop for small arrays).
 
- **Adaptive simdjson**: For large documents (>8KB) with selective schemas, simdjson On Demand seeks only the needed fields skipping irrelevant data at GB/s speeds.
+ **Adaptive simdjson**: For large documents (>8KB) with selective schemas, simdjson On Demand seeks only the needed fields - skipping irrelevant data at GB/s speeds.
 
  ### JSON Schema Test Suite
 
@@ -54,27 +55,27 @@ Ultra-fast JSON Schema validator powered by [simdjson](https://github.com/simdjs
 
  ## When to use ata
 
- - **Any `validate(obj)` workload** 4.3x–9.5x faster than ajv
- - **Serverless / cold starts** 12.5x faster schema compilation
- - **Security-sensitive apps** RE2 regex, immune to ReDoS attacks
- - **Batch/streaming validation** NDJSON log processing, data pipelines (2.6x faster)
- - **Standard Schema V1** native support for Fastify v5, tRPC, TanStack
- - **C/C++ embedding** native library, no JS runtime needed
+ - **High-throughput `validate(obj)`** - 68M ops/sec valid, 17M ops/sec invalid
+ - **Serverless / cold starts** - 12.5x faster schema compilation
+ - **Security-sensitive apps** - RE2 regex, immune to ReDoS attacks
+ - **Batch/streaming validation** - NDJSON log processing, data pipelines (2.6x faster)
+ - **Standard Schema V1** - native support for Fastify v5, tRPC, TanStack
+ - **C/C++ embedding** - native library, no JS runtime needed
 
  ## When to use ajv
 
- - **Schemas with `patternProperties`, `dependentSchemas`** these bypass JS codegen and hit the slower NAPI path
- - **100% spec compliance needed** ajv covers more edge cases (ata: 98.4%)
+ - **Schemas with `patternProperties`, `dependentSchemas`** - these bypass JS codegen and hit the slower NAPI path
+ - **100% spec compliance needed** - ajv covers more edge cases (ata: 98.4%)
 
  ## Features
 
- - **Hybrid validator**: 76M ops/sec same function body as boolean check, returns result or calls error collector. No try/catch, no double pass
- - **Multi-core**: Parallel validation across all CPU cores 13.4M validations/sec
+ - **Hybrid validator**: 68M ops/sec - same function body as boolean check, returns result or calls error collector. No try/catch, no double pass
+ - **Multi-core**: Parallel validation across all CPU cores - 13.4M validations/sec
  - **simdjson**: SIMD-accelerated JSON parsing at GB/s speeds, adaptive On Demand for large docs
  - **RE2 regex**: Linear-time guarantees, immune to ReDoS attacks (2391x faster on pathological input)
  - **V8-optimized codegen**: Destructuring batch reads, type guard elimination, property hoisting
  - **Standard Schema V1**: Compatible with Fastify, tRPC, TanStack, Drizzle
- - **Zero-copy paths**: Buffer and pre-padded input support no unnecessary copies
+ - **Zero-copy paths**: Buffer and pre-padded input support - no unnecessary copies
  - **Defaults + coercion**: `default` values, `coerceTypes`, `removeAdditional` support
  - **C/C++ library**: Native API for non-Node.js environments
  - **98.4% spec compliant**: Draft 2020-12
@@ -103,7 +104,7 @@ const v = new Validator({
  required: ['name', 'email']
  });
 
- // Fast boolean check JS codegen (9.5x faster than ajv)
+ // Fast boolean check - JS codegen, 68M ops/sec
  v.isValidObject({ name: 'Mert', email: 'mert@example.com', age: 26 }); // true
 
  // Full validation with error details + defaults applied
@@ -117,7 +118,7 @@ v.isValidJSON('{"name": "Mert", "email": "mert@example.com"}'); // true
  // Buffer input (zero-copy, raw NAPI)
  v.isValid(Buffer.from('{"name": "Mert", "email": "mert@example.com"}'));
 
- // Parallel batch multi-core, NDJSON (2.6x faster than ajv)
+ // Parallel batch - multi-core, NDJSON, 13.4M items/sec
  const ndjson = Buffer.from(lines.join('\n'));
  v.isValidParallel(ndjson); // bool[]
  v.countValid(ndjson); // number
@@ -132,6 +133,27 @@ const v = new Validator(schema, {
  });
  ```
 
+ ### Standalone Pre-compilation
+
+ Pre-compile schemas to JS files for near-zero startup. No native addon needed at runtime.
+
+ ```javascript
+ const fs = require('fs');
+
+ // Build phase (once)
+ const v = new Validator(schema);
+ fs.writeFileSync('./compiled.js', v.toStandalone());
+
+ // Read phase (every startup) - 0.6μs per schema, pure JS
+ const v2 = Validator.fromStandalone(require('./compiled.js'), schema);
+
+ // Bundle multiple schemas - deduplicated, single file
+ fs.writeFileSync('./bundle.js', Validator.bundleCompact(schemas));
+ const validators = Validator.loadBundle(require('./bundle.js'), schemas);
+ ```
+
+ **Fastify startup (500 routes): ajv standalone 77ms → ata standalone 46ms (1.7x faster)**
+
  ### Standard Schema V1
 
  ```javascript
@@ -28,7 +28,7 @@ using schema_node_ptr = std::shared_ptr<schema_node>;
 
  // MUST match layout in src/ata.cpp exactly (reinterpret_cast)
  struct schema_node {
- std::vector<std::string> types;
+ uint8_t type_mask = 0;
 
  std::optional<double> minimum;
  std::optional<double> maximum;
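Reviewer note: the hunk above replaces the `types` string vector with a one-byte `type_mask`. The following is a minimal, standalone sketch of that idea, not the package's actual code. The `json_type` enum, `type_bit`, `value_bits`, and `type_matches` are hypothetical names; only the bit order (0=string through 6=array) and the rule that an integer value also satisfies `number` are taken from the comments in this diff.

```cpp
#include <cstdint>
#include <string_view>

// Hypothetical enum mirroring the bit order quoted in the diff comment:
// 0=string, 1=number, 2=integer, 3=boolean, 4=null_value, 5=object, 6=array.
enum class json_type : std::uint8_t {
  string = 0, number, integer, boolean, null_value, object, array
};

// One bit per JSON type, so seven types fit in a single uint8_t.
constexpr std::uint8_t type_bit(json_type t) {
  return static_cast<std::uint8_t>(1u << static_cast<unsigned>(t));
}

// Bits that a runtime value of the given type name satisfies.
// An integer value sets both the integer and the number bit.
constexpr std::uint8_t value_bits(std::string_view actual) {
  if (actual == "string") return type_bit(json_type::string);
  if (actual == "number") return type_bit(json_type::number);
  if (actual == "integer")
    return static_cast<std::uint8_t>(type_bit(json_type::integer) |
                                     type_bit(json_type::number));
  if (actual == "boolean") return type_bit(json_type::boolean);
  if (actual == "null") return type_bit(json_type::null_value);
  if (actual == "object") return type_bit(json_type::object);
  if (actual == "array") return type_bit(json_type::array);
  return 0;  // unknown type name: satisfies no constraint
}

// A mask of 0 means the schema has no "type" keyword, so everything passes.
constexpr bool type_matches(std::uint8_t schema_mask, std::string_view actual) {
  return schema_mask == 0 || (value_bits(actual) & schema_mask) != 0;
}
```

With this layout a schema of `"type": ["string", "null"]` would be stored as `type_bit(json_type::string) | type_bit(json_type::null_value)`, and the check is a single bitwise AND instead of a loop over strings.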
@@ -67,11 +67,11 @@ struct schema_node {
  };
  std::vector<pattern_prop> pattern_properties;
 
- std::optional<std::string> enum_values_raw;
  std::vector<std::string> enum_values_minified;
  std::optional<std::string> const_value_raw;
 
  std::optional<std::string> format;
+ uint8_t format_id = 255;
 
  std::vector<schema_node_ptr> all_of;
  std::vector<schema_node_ptr> any_of;
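Reviewer note: with `enum_values_raw` removed, the enum check can only compare a value's serialized form against the strings in `enum_values_minified`. The sketch below illustrates what byte-for-byte comparison implies. It is an assumption-laden illustration, not the package's code: `enum_contains` and `kExampleEnum` are hypothetical, and whether `napi_to_json` normalizes whitespace, key order, or number formatting is not visible in this diff.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Returns true when `candidate` is byte-for-byte equal to one stored enum value.
// No JSON parsing happens here, so both sides must already be serialized
// in the same canonical (minified) form.
template <std::size_t N>
constexpr bool enum_contains(const std::array<std::string_view, N>& minified,
                             std::string_view candidate) {
  for (std::string_view ev : minified) {
    if (ev == candidate) return true;
  }
  return false;
}

// Hypothetical minified values for a schema such as {"enum": ["red", 1, {"a":1}]}.
inline constexpr std::array<std::string_view, 3> kExampleEnum = {
    "\"red\"", "1", "{\"a\":1}"};
```

Under these assumptions, `"1"` matches but `"1.0"` and `{ "a": 1 }` (with spaces) do not, so the approach is only correct if the schema compiler and the runtime serializer agree on one canonical form.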
@@ -413,46 +413,39 @@ static void validate_napi(const schema_node_ptr& node,
 
  auto actual_type = napi_type_of(value);
 
- // type
- if (!node->types.empty()) {
- bool match = false;
- for (const auto& t : node->types) {
- if (napi_type_matches(value, t)) {
- match = true;
- break;
- }
- }
- if (!match) {
+ // type — uses bitmask matching ata.cpp json_type enum order:
+ // 0=string, 1=number, 2=integer, 3=boolean, 4=null_value, 5=object, 6=array
+ if (node->type_mask) {
+ uint8_t val_bits = 0;
+ if (actual_type == "string") val_bits = 1u << 0;
+ else if (actual_type == "number") val_bits = 1u << 1;
+ else if (actual_type == "integer") val_bits = (1u << 2) | (1u << 1); // integer matches number
+ else if (actual_type == "boolean") val_bits = 1u << 3;
+ else if (actual_type == "null") val_bits = 1u << 4;
+ else if (actual_type == "object") val_bits = 1u << 5;
+ else if (actual_type == "array") val_bits = 1u << 6;
+ if (!(val_bits & node->type_mask)) {
+ static const char* type_names[] = {"string","number","integer","boolean","null","object","array"};
  std::string expected;
- for (size_t i = 0; i < node->types.size(); ++i) {
- if (i > 0) expected += ", ";
- expected += node->types[i];
+ for (int b = 0; b < 7; ++b) {
+ if (node->type_mask & (1u << b)) {
+ if (!expected.empty()) expected += ", ";
+ expected += type_names[b];
+ }
  }
  errors.push_back({ata::error_code::type_mismatch, path,
  "expected type " + expected + ", got " + actual_type});
  }
  }
 
- // enum
- if (node->enum_values_raw.has_value()) {
+ // enum — compare against pre-minified canonical values
+ if (!node->enum_values_minified.empty()) {
  std::string val_json = napi_to_json(env, value);
- // Parse enum from raw and compare
  bool found = false;
- // We need to compare against each element in the enum array
- // The enum_values_raw is a JSON array string like [1,2,3]
- // We'll use JSON.parse in JS to handle this
- auto json_obj = env.Global().Get("JSON").As<Napi::Object>();
- auto parse_fn = json_obj.Get("parse").As<Napi::Function>();
- auto enum_arr = parse_fn.Call(json_obj,
- {Napi::String::New(env, node->enum_values_raw.value())});
- if (enum_arr.IsArray()) {
- auto arr = enum_arr.As<Napi::Array>();
- for (uint32_t i = 0; i < arr.Length(); ++i) {
- std::string elem_json = napi_to_json(env, arr.Get(i));
- if (elem_json == val_json) {
- found = true;
- break;
- }
+ for (const auto& ev : node->enum_values_minified) {
+ if (ev == val_json) {
+ found = true;
+ break;
  }
  }
  if (!found) {