ata-validator 0.4.9 → 0.4.11
- package/README.md +43 -21
- package/binding/ata_napi.cpp +26 -33
- package/index.js +363 -184
- package/package.json +1 -1
- package/src/ata.cpp +329 -202
package/README.md
CHANGED
@@ -10,16 +10,16 @@ Ultra-fast JSON Schema validator powered by [simdjson](https://github.com/simdjs
 
 | Scenario | ata | ajv | |
 |---|---|---|---|
-| **validate(obj)** valid |
-| **validate(obj)** invalid |
+| **validate(obj)** valid | 68M ops/sec | 8M ops/sec | **ata 8.5x faster** |
+| **validate(obj)** invalid | 17M ops/sec | 8M ops/sec | **ata 2.1x faster** |
 | **isValidObject(obj)** | 15.4M ops/sec | 9.2M ops/sec | **ata 1.7x faster** |
-| **validateJSON(str)** valid |
-| **validateJSON(str)** invalid | 2.
+| **validateJSON(str)** valid | 3.0M ops/sec | 1.9M ops/sec | **ata 1.6x faster** |
+| **validateJSON(str)** invalid | 2.7M ops/sec | 2.3M ops/sec | **ata 1.2x faster** |
 | **Schema compilation** | 113K ops/sec | 818 ops/sec | **ata 138x faster** |
 
 > validate(obj) numbers are isolated single-schema benchmarks. Multi-schema benchmark overhead reduces throughput; real-world numbers depend on workload.
 
-### Large Data
+### Large Data - JS Object Validation
 
 | Size | ata | ajv | |
 |---|---|---|---|
@@ -35,18 +35,19 @@ Ultra-fast JSON Schema validator powered by [simdjson](https://github.com/simdjs
 | **ReDoS protection** (`^(a+)+$`) | 0.3ms | 765ms | **ata immune (RE2)** |
 | **Batch NDJSON** (10K items, multi-core) | 13.4M/sec | 5.1M/sec | **ata 2.6x faster** |
 | **Fastify HTTP** (100 users POST) | 24.6K req/sec | 22.6K req/sec | **ata 9% faster** |
+| **Fastify startup** (500 routes) | 46ms | 77ms (standalone) | **ata 1.7x faster** |
 
->
+> Isolated single-schema benchmarks. Results vary by workload and hardware.
 
 ### How it works
 
-**Hybrid validator**: ata compiles schemas into monolithic JS functions identical to the boolean fast path, but returning `VALID_RESULT` on success and calling the error collector on failure. V8 TurboFan optimizes it identically to a pure boolean function
+**Hybrid validator**: ata compiles schemas into monolithic JS functions identical to the boolean fast path, but returning `VALID_RESULT` on success and calling the error collector on failure. V8 TurboFan optimizes it identically to a pure boolean function - error code is dead code on the valid path. No try/catch (3.3x V8 deopt), no lazy arrays, no double-pass.
 
 **JS codegen**: Schemas are compiled to monolithic JS functions (like ajv). Supported keywords: `type`, `required`, `properties`, `items`, `enum`, `const`, `allOf`, `anyOf`, `oneOf`, `not`, `if/then/else`, `uniqueItems`, `contains`, `prefixItems`, `additionalProperties`, `dependentRequired`, `$ref` (local), `minimum/maximum`, `minLength/maxLength`, `pattern`, `format`.
 
 **V8 TurboFan optimizations**: Destructuring batch reads, `undefined` checks instead of `in` operator, context-aware type guard elimination, property hoisting to local variables, tiered uniqueItems (nested loop for small arrays).
 
-**Adaptive simdjson**: For large documents (>8KB) with selective schemas, simdjson On Demand seeks only the needed fields
+**Adaptive simdjson**: For large documents (>8KB) with selective schemas, simdjson On Demand seeks only the needed fields - skipping irrelevant data at GB/s speeds.
 
 ### JSON Schema Test Suite
 
@@ -54,27 +55,27 @@ Ultra-fast JSON Schema validator powered by [simdjson](https://github.com/simdjs
 
 ## When to use ata
 
-- **
-- **Serverless / cold starts**
-- **Security-sensitive apps**
-- **Batch/streaming validation**
-- **Standard Schema V1**
-- **C/C++ embedding**
+- **High-throughput `validate(obj)`** - 68M ops/sec valid, 17M ops/sec invalid
+- **Serverless / cold starts** - 12.5x faster schema compilation
+- **Security-sensitive apps** - RE2 regex, immune to ReDoS attacks
+- **Batch/streaming validation** - NDJSON log processing, data pipelines (2.6x faster)
+- **Standard Schema V1** - native support for Fastify v5, tRPC, TanStack
+- **C/C++ embedding** - native library, no JS runtime needed
 
 ## When to use ajv
 
-- **Schemas with `patternProperties`, `dependentSchemas`**
-- **100% spec compliance needed**
+- **Schemas with `patternProperties`, `dependentSchemas`** - these bypass JS codegen and hit the slower NAPI path
+- **100% spec compliance needed** - ajv covers more edge cases (ata: 98.4%)
 
 ## Features
 
-- **Hybrid validator**:
-- **Multi-core**: Parallel validation across all CPU cores
+- **Hybrid validator**: 68M ops/sec - same function body as boolean check, returns result or calls error collector. No try/catch, no double pass
+- **Multi-core**: Parallel validation across all CPU cores - 13.4M validations/sec
 - **simdjson**: SIMD-accelerated JSON parsing at GB/s speeds, adaptive On Demand for large docs
 - **RE2 regex**: Linear-time guarantees, immune to ReDoS attacks (2391x faster on pathological input)
 - **V8-optimized codegen**: Destructuring batch reads, type guard elimination, property hoisting
 - **Standard Schema V1**: Compatible with Fastify, tRPC, TanStack, Drizzle
-- **Zero-copy paths**: Buffer and pre-padded input support
+- **Zero-copy paths**: Buffer and pre-padded input support - no unnecessary copies
 - **Defaults + coercion**: `default` values, `coerceTypes`, `removeAdditional` support
 - **C/C++ library**: Native API for non-Node.js environments
 - **98.4% spec compliant**: Draft 2020-12
@@ -103,7 +104,7 @@ const v = new Validator({
   required: ['name', 'email']
 });
 
-// Fast boolean check
+// Fast boolean check - JS codegen, 68M ops/sec
 v.isValidObject({ name: 'Mert', email: 'mert@example.com', age: 26 }); // true
 
 // Full validation with error details + defaults applied
@@ -117,7 +118,7 @@ v.isValidJSON('{"name": "Mert", "email": "mert@example.com"}'); // true
 // Buffer input (zero-copy, raw NAPI)
 v.isValid(Buffer.from('{"name": "Mert", "email": "mert@example.com"}'));
 
-// Parallel batch
+// Parallel batch - multi-core, NDJSON, 13.4M items/sec
 const ndjson = Buffer.from(lines.join('\n'));
 v.isValidParallel(ndjson); // bool[]
 v.countValid(ndjson); // number
@@ -132,6 +133,27 @@ const v = new Validator(schema, {
 });
 ```
 
+### Standalone Pre-compilation
+
+Pre-compile schemas to JS files for near-zero startup. No native addon needed at runtime.
+
+```javascript
+const fs = require('fs');
+
+// Build phase (once)
+const v = new Validator(schema);
+fs.writeFileSync('./compiled.js', v.toStandalone());
+
+// Read phase (every startup) - 0.6μs per schema, pure JS
+const v2 = Validator.fromStandalone(require('./compiled.js'), schema);
+
+// Bundle multiple schemas - deduplicated, single file
+fs.writeFileSync('./bundle.js', Validator.bundleCompact(schemas));
+const validators = Validator.loadBundle(require('./bundle.js'), schemas);
+```
+
+**Fastify startup (500 routes): ajv standalone 77ms → ata standalone 46ms (1.7x faster)**
+
 ### Standard Schema V1
 
 ```javascript
package/binding/ata_napi.cpp
CHANGED
@@ -28,7 +28,7 @@ using schema_node_ptr = std::shared_ptr<schema_node>;
 
 // MUST match layout in src/ata.cpp exactly (reinterpret_cast)
 struct schema_node {
-
+  uint8_t type_mask = 0;
 
   std::optional<double> minimum;
   std::optional<double> maximum;
@@ -67,11 +67,11 @@ struct schema_node {
   };
   std::vector<pattern_prop> pattern_properties;
 
-  std::optional<std::string> enum_values_raw;
   std::vector<std::string> enum_values_minified;
   std::optional<std::string> const_value_raw;
 
   std::optional<std::string> format;
+  uint8_t format_id = 255;
 
   std::vector<schema_node_ptr> all_of;
   std::vector<schema_node_ptr> any_of;
@@ -413,46 +413,39 @@ static void validate_napi(const schema_node_ptr& node,
 
   auto actual_type = napi_type_of(value);
 
-  // type
-
-
-
-
-
-
-
-
-    if (
+  // type — uses bitmask matching ata.cpp json_type enum order:
+  // 0=string, 1=number, 2=integer, 3=boolean, 4=null_value, 5=object, 6=array
+  if (node->type_mask) {
+    uint8_t val_bits = 0;
+    if (actual_type == "string") val_bits = 1u << 0;
+    else if (actual_type == "number") val_bits = 1u << 1;
+    else if (actual_type == "integer") val_bits = (1u << 2) | (1u << 1); // integer matches number
+    else if (actual_type == "boolean") val_bits = 1u << 3;
+    else if (actual_type == "null") val_bits = 1u << 4;
+    else if (actual_type == "object") val_bits = 1u << 5;
+    else if (actual_type == "array") val_bits = 1u << 6;
+    if (!(val_bits & node->type_mask)) {
+      static const char* type_names[] = {"string","number","integer","boolean","null","object","array"};
       std::string expected;
-      for (
-        if (
-
+      for (int b = 0; b < 7; ++b) {
+        if (node->type_mask & (1u << b)) {
+          if (!expected.empty()) expected += ", ";
+          expected += type_names[b];
+        }
       }
       errors.push_back({ata::error_code::type_mismatch, path,
           "expected type " + expected + ", got " + actual_type});
     }
   }
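The new type check packs the allowed JSON types into one byte and tests a value with a single bitwise AND. A minimal, self-contained sketch of that idea follows, assuming the bit order quoted in the diff comment. The helper names `value_bits`, `type_matches`, and `expected_types` are hypothetical and do not appear in the package; the actual binding operates on `schema_node` and N-API values rather than plain strings.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Bit positions follow the order quoted in the diff comment:
// 0=string, 1=number, 2=integer, 3=boolean, 4=null, 5=object, 6=array
static const char* const kTypeNames[] = {
    "string", "number", "integer", "boolean", "null", "object", "array"};

// Hypothetical helper: map a runtime type name to the bits it satisfies.
// An integer value sets both the integer and number bits, so it passes
// schemas that ask for either type. Unknown names yield 0 (no match).
uint8_t value_bits(const std::string& actual_type) {
  if (actual_type == "integer") {
    return static_cast<uint8_t>((1u << 2) | (1u << 1));
  }
  for (int b = 0; b < 7; ++b) {
    if (actual_type == kTypeNames[b]) return static_cast<uint8_t>(1u << b);
  }
  return 0;
}

// Hypothetical helper: a mask of 0 means "no type constraint".
bool type_matches(uint8_t type_mask, const std::string& actual_type) {
  if (type_mask == 0) return true;
  return (value_bits(actual_type) & type_mask) != 0;
}

// Hypothetical helper: build the "expected type ..." list for an error message.
std::string expected_types(uint8_t type_mask) {
  assert((type_mask >> 7) == 0);  // only bits 0..6 are defined in this sketch
  std::string expected;
  for (int b = 0; b < 7; ++b) {
    if (type_mask & (1u << b)) {
      if (!expected.empty()) expected += ", ";
      expected += kTypeNames[b];
    }
  }
  return expected;
}
```

With a mask of `(1u << 0) | (1u << 1)` (string or number), `type_matches` accepts `"string"` and `"integer"` but rejects `"boolean"`, and `expected_types` yields `string, number`. A mask holding only the integer bit rejects `"number"`, since a plain number value sets only bit 1.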
 
-  // enum
-  if (node->
+  // enum — compare against pre-minified canonical values
+  if (!node->enum_values_minified.empty()) {
     std::string val_json = napi_to_json(env, value);
-    // Parse enum from raw and compare
     bool found = false;
-
-
-
-
-    auto parse_fn = json_obj.Get("parse").As<Napi::Function>();
-    auto enum_arr = parse_fn.Call(json_obj,
-        {Napi::String::New(env, node->enum_values_raw.value())});
-    if (enum_arr.IsArray()) {
-      auto arr = enum_arr.As<Napi::Array>();
-      for (uint32_t i = 0; i < arr.Length(); ++i) {
-        std::string elem_json = napi_to_json(env, arr.Get(i));
-        if (elem_json == val_json) {
-          found = true;
-          break;
-        }
+    for (const auto& ev : node->enum_values_minified) {
+      if (ev == val_json) {
+        found = true;
+        break;
       }
     }
     if (!found) {