ata-validator 0.4.4 → 0.4.5

package/README.md CHANGED
@@ -10,10 +10,10 @@ Ultra-fast JSON Schema validator powered by [simdjson](https://github.com/simdjs
 
  | Scenario | ata | ajv | |
  |---|---|---|---|
- | **validate(obj)** | 9.6M ops/sec | 8.5M ops/sec | **ata 1.1x faster** |
- | **isValidObject(obj)** | 10.4M ops/sec | 9.3M ops/sec | **ata 1.1x faster** |
- | **validateJSON(str)** | 1.9M ops/sec | 1.87M ops/sec | **ata 1.02x faster** |
- | **isValidJSON(str)** | 1.9M ops/sec | 1.89M ops/sec | **ata 1.01x faster** |
+ | **validate(obj)** | 15M ops/sec | 8.5M ops/sec | **ata 1.8x faster** |
+ | **isValidObject(obj)** | 17.4M ops/sec | 9.4M ops/sec | **ata 1.8x faster** |
+ | **validateJSON(str)** | 2.1M ops/sec | 1.9M ops/sec | **ata 1.1x faster** |
+ | **isValidJSON(str)** | 2.0M ops/sec | 1.9M ops/sec | **ata 1.1x faster** |
  | **Schema compilation** | 125,690 ops/sec | 831 ops/sec | **ata 151x faster** |
 
  ### Large Data — JS Object Validation
@@ -24,61 +24,64 @@ Ultra-fast JSON Schema validator powered by [simdjson](https://github.com/simdjs
  | 100 users (20KB) | 658K ops/sec | 243K ops/sec | **ata 2.7x faster** |
  | 1,000 users (205KB) | 64K ops/sec | 23.5K ops/sec | **ata 2.7x faster** |
 
- ### Parallel Batch Validation (multi-core)
+ ### Real-World Scenarios
 
- | Batch Size | ata | ajv | |
+ | Scenario | ata | ajv | |
  |---|---|---|---|
- | 1,000 items | 8.4M items/sec | 2.2M items/sec | **ata 3.9x faster** |
- | 10,000 items | 12.5M items/sec | 2.1M items/sec | **ata 5.9x faster** |
-
- > ajv is single-threaded (JS). ata uses all CPU cores via a persistent C++ thread pool.
+ | **Serverless cold start** (50 schemas) | 7.7ms | 96ms | **ata 12.5x faster** |
+ | **ReDoS protection** (`^(a+)+$`) | 0.3ms | 765ms | **ata immune (RE2)** |
+ | **Batch NDJSON** (10K items, multi-core) | 13.4M/sec | 5.1M/sec | **ata 2.6x faster** |
+ | **Fastify HTTP** (100 users POST) | 24.6K req/sec | 22.6K req/sec | **ata 9% faster** |
 
  ### Where ajv wins
 
  | Scenario | ata | ajv | |
  |---|---|---|---|
- | **validate(obj)** (invalid data, error collection) | 133K ops/sec | 7.5M ops/sec | **ajv 56x faster** |
- | **validateJSON(str)** (invalid data) | 169K ops/sec | 2.3M ops/sec | **ajv 14x faster** |
+ | **validate(obj)** (invalid data) | 6M ops/sec | 7.9M ops/sec | **ajv 1.3x faster** |
+ | **validateJSON(str)** (invalid data) | 2.2M ops/sec | 2.3M ops/sec | **ajv 1.1x faster** |
 
- > Invalid-data error collection goes through the C++ NAPI path. This is the slow path by design — production traffic is overwhelmingly valid.
+ > Invalid-data error path ajv is slightly faster. Production traffic is overwhelmingly valid.
 
  ### How it works
 
- **Speculative validation**: For valid data (the common case), ata runs a JS codegen fast path entirely in V8 JIT — no NAPI boundary crossing. Only when validation fails does it fall through to the C++ engine for detailed error collection.
+ **Speculative validation**: For valid data (the common case), ata runs a JS codegen fast path entirely in V8 JIT — no NAPI boundary crossing. Only when validation fails does it fall through to the JS error-collecting codegen or C++ engine.
 
- **JS codegen**: Schemas are compiled to monolithic JS functions (like ajv). Supported keywords: `type`, `required`, `properties`, `items`, `enum`, `const`, `allOf`, `anyOf`, `oneOf`, `not`, `if/then/else`, `uniqueItems`, `contains`, `prefixItems`, `additionalProperties`, `dependentRequired`, `minimum/maximum`, `minLength/maxLength`, `pattern`, `format`.
+ **JS codegen**: Schemas are compiled to monolithic JS functions (like ajv). Supported keywords: `type`, `required`, `properties`, `items`, `enum`, `const`, `allOf`, `anyOf`, `oneOf`, `not`, `if/then/else`, `uniqueItems`, `contains`, `prefixItems`, `additionalProperties`, `dependentRequired`, `$ref` (local), `minimum/maximum`, `minLength/maxLength`, `pattern`, `format`.
 
- **V8 TurboFan optimizations**: Destructuring batch reads, `undefined` checks instead of `in` operator, context-aware type guard elimination, property hoisting to local variables.
+ **V8 TurboFan optimizations**: Destructuring batch reads, `undefined` checks instead of `in` operator, context-aware type guard elimination, property hoisting to local variables, tiered uniqueItems (nested loop for small arrays).
 
  **Adaptive simdjson**: For large documents (>8KB) with selective schemas, simdjson On Demand seeks only the needed fields — skipping irrelevant data at GB/s speeds.
 
  ### JSON Schema Test Suite
 
- **98.5%** pass rate (938/952) on official [JSON Schema Test Suite](https://github.com/json-schema-org/JSON-Schema-Test-Suite) (Draft 2020-12).
+ **98.4%** pass rate (937/952) on official [JSON Schema Test Suite](https://github.com/json-schema-org/JSON-Schema-Test-Suite) (Draft 2020-12).
 
  ## When to use ata
 
- - **Any `validate(obj)` workload** — 1.1x–2.7x faster than ajv on valid data
- - **Batch/streaming validation** — NDJSON log processing, data pipelines (5.9x faster)
- - **Schema-heavy startup** — many schemas compiled at boot (151x faster compile)
+ - **Any `validate(obj)` workload** — 1.8x–2.7x faster than ajv on valid data
+ - **Serverless / cold starts** — 12.5x faster schema compilation
+ - **Security-sensitive apps** — RE2 regex, immune to ReDoS attacks
+ - **Batch/streaming validation** — NDJSON log processing, data pipelines (2.6x faster)
+ - **Standard Schema V1** — native support for Fastify v5, tRPC, TanStack
  - **C/C++ embedding** — native library, no JS runtime needed
 
  ## When to use ajv
 
- - **Error-heavy workloads** — where most data is invalid and error details matter
- - **Schemas with `$ref`, `patternProperties`, `dependentSchemas`** — these bypass JS codegen and hit the slower NAPI path
+ - **Error-heavy workloads** — where most data is invalid (ajv 1.3x faster on error path)
+ - **Schemas with `patternProperties`, `dependentSchemas`** — these bypass JS codegen
 
  ## Features
 
  - **Speculative validation**: JS codegen fast path — valid data never crosses the NAPI boundary
- - **Multi-core**: Parallel validation across all CPU cores — 12.5M validations/sec
+ - **Multi-core**: Parallel validation across all CPU cores — 13.4M validations/sec
  - **simdjson**: SIMD-accelerated JSON parsing at GB/s speeds, adaptive On Demand for large docs
- - **RE2 regex**: Linear-time guarantees, immune to ReDoS attacks
+ - **RE2 regex**: Linear-time guarantees, immune to ReDoS attacks (2391x faster on pathological input)
  - **V8-optimized codegen**: Destructuring batch reads, type guard elimination, property hoisting
  - **Standard Schema V1**: Compatible with Fastify, tRPC, TanStack, Drizzle
  - **Zero-copy paths**: Buffer and pre-padded input support — no unnecessary copies
+ - **Defaults + coercion**: `default` values, `coerceTypes`, `removeAdditional` support
  - **C/C++ library**: Native API for non-Node.js environments
- - **98.5% spec compliant**: Draft 2020-12
+ - **98.4% spec compliant**: Draft 2020-12
 
  ## Installation
 
@@ -98,18 +101,18 @@ const v = new Validator({
  properties: {
  name: { type: 'string', minLength: 1 },
  email: { type: 'string', format: 'email' },
- age: { type: 'integer', minimum: 0 }
+ age: { type: 'integer', minimum: 0 },
+ role: { type: 'string', default: 'user' }
  },
  required: ['name', 'email']
  });
 
- // Fast boolean check — JS codegen, no NAPI (1.1x faster than ajv)
+ // Fast boolean check — JS codegen, no NAPI (1.8x faster than ajv)
  v.isValidObject({ name: 'Mert', email: 'mert@example.com', age: 26 }); // true
 
- // Full validation with error details
- const result = v.validate({ name: 'Mert', email: 'mert@example.com', age: 26 });
- console.log(result.valid); // true
- console.log(result.errors); // []
+ // Full validation with error details + defaults applied
+ const result = v.validate({ name: 'Mert', email: 'mert@example.com' });
+ // result.valid === true, data.role === 'user' (default applied)
 
  // JSON string validation (simdjson fast path)
  v.validateJSON('{"name": "Mert", "email": "mert@example.com"}');
@@ -118,12 +121,21 @@ v.isValidJSON('{"name": "Mert", "email": "mert@example.com"}'); // true
  // Buffer input (zero-copy, raw NAPI)
  v.isValid(Buffer.from('{"name": "Mert", "email": "mert@example.com"}'));
 
- // Parallel batch — multi-core, NDJSON (5.9x faster than ajv)
+ // Parallel batch — multi-core, NDJSON (2.6x faster than ajv)
  const ndjson = Buffer.from(lines.join('\n'));
  v.isValidParallel(ndjson); // bool[]
  v.countValid(ndjson); // number
  ```
 
+ ### Options
+
+ ```javascript
+ const v = new Validator(schema, {
+ coerceTypes: true, // "42" → 42 for integer fields
+ removeAdditional: true, // strip properties not in schema
+ });
+ ```
+
  ### Standard Schema V1
 
  ```javascript
@@ -143,7 +155,10 @@ npm install fastify-ata
 
  ```javascript
  const fastify = require('fastify')();
- fastify.register(require('fastify-ata'));
+ fastify.register(require('fastify-ata'), {
+ coerceTypes: true,
+ removeAdditional: true,
+ });
 
  // All existing JSON Schema route definitions work as-is
  ```
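
The README hunks above mention "tiered uniqueItems (nested loop for small arrays)" without showing what that looks like. The following is a minimal sketch of the idea, assuming primitive items only (JSON Schema `uniqueItems` also requires deep equality for objects and arrays, which is omitted here). The cutoff `SMALL_ARRAY_LIMIT`, the function name `hasUniquePrimitives`, and the use of a `Set` for the larger tier are illustrative assumptions, not taken from the package.

```javascript
// Hypothetical cutoff; the diff does not state the threshold ata uses.
const SMALL_ARRAY_LIMIT = 8;

// Returns true when no two primitive items in `arr` are strictly equal.
function hasUniquePrimitives(arr) {
  if (arr.length <= SMALL_ARRAY_LIMIT) {
    // Small tier: quadratic nested loop, but no allocation at all.
    for (let i = 1; i < arr.length; i++) {
      for (let j = 0; j < i; j++) {
        if (arr[i] === arr[j]) return false;
      }
    }
    return true;
  }
  // Larger tier (assumed): one linear pass through a Set.
  return new Set(arr).size === arr.length;
}
```

For short arrays the nested loop tends to be cheaper than building a `Set`, which is presumably the motivation for the tiering; the exact crossover point would need to be measured.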
package/index.js CHANGED
@@ -250,17 +250,24 @@ class Validator {
  const useSimdjsonForLarge = !hasArrayTraversal;
 
  if (jsFn) {
- // Error handler: combined (optimized) jsErrFn NAPI fallback
- const errFn = jsCombinedFn
- ? (d) => { try { return jsCombinedFn(d); } catch { return compiled.validate(d); } }
- : jsErrFn
- ? (d) => { try { return jsErrFn(d, true); } catch { return compiled.validate(d); } }
- : (d) => compiled.validate(d);
- this.validate = preprocess
- ? (data) => { preprocess(data); return jsFn(data) ? VALID_RESULT : errFn(data); }
- : (data) => jsFn(data) ? VALID_RESULT : errFn(data);
+ // Best path: combined validator (single pass, lazy error array)
+ // Valid data: no array allocation, returns VALID_RESULT
+ // Invalid data: collects errors in one pass (no double validation)
+ // Fallback: jsFn + errFn for schemas combined can't handle
+ const errFn = jsErrFn
+ ? (d) => { try { return jsErrFn(d, true); } catch { return compiled.validate(d); } }
+ : (d) => compiled.validate(d);
+ this.validate = jsCombinedFn
+ ? (preprocess
+ ? (data) => { preprocess(data); try { return jsCombinedFn(data); } catch { return jsFn(data) ? VALID_RESULT : errFn(data); } }
+ : (data) => { try { return jsCombinedFn(data); } catch { return jsFn(data) ? VALID_RESULT : errFn(data); } })
+ : (preprocess
+ ? (data) => { preprocess(data); return jsFn(data) ? VALID_RESULT : errFn(data); }
+ : (data) => jsFn(data) ? VALID_RESULT : errFn(data));
  this.isValidObject = jsFn;
- const jsonValidateFn = (obj) => jsFn(obj) ? VALID_RESULT : errFn(obj);
+ const jsonValidateFn = jsCombinedFn
+ ? (obj) => { try { return jsCombinedFn(obj); } catch { return jsFn(obj) ? VALID_RESULT : errFn(obj); } }
+ : (obj) => jsFn(obj) ? VALID_RESULT : errFn(obj);
  this.validateJSON = useSimdjsonForLarge
  ? (jsonStr) => {
  if (jsonStr.length >= SIMDJSON_THRESHOLD) {
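
The hunk above changes the dispatch order: the combined validator now runs first, and the older two-step path (`jsFn`, then `errFn`) is used only if the combined function throws. The following self-contained sketch shows that shape. The three validator functions here are hand-written stand-ins and `makeValidate` is a hypothetical helper; none of them is the package's generated code.

```javascript
// Hypothetical stand-ins for the generated functions.
const VALID_RESULT = Object.freeze({ valid: true, errors: [] });

// Boolean fast path: true when `d` is an object with a string `name`.
const jsFn = (d) => typeof d === 'object' && d !== null && typeof d.name === 'string';

// Error-collecting path, reached only after the fast path rejects.
const errFn = () => ({
  valid: false,
  errors: [{ code: 'type_mismatch', path: '/name', message: 'expected string' }],
});

// Single-pass validator. It throws on null to exercise the fallback below.
const jsCombinedFn = (d) => {
  if (d === null) throw new TypeError('combined validator cannot handle null');
  return typeof d.name === 'string' ? VALID_RESULT : errFn(d);
};

// Try the combined validator first; fall back to fast + slow if it throws.
function makeValidate(combined, fast, slow, preprocess) {
  const twoStep = (data) => (fast(data) ? VALID_RESULT : slow(data));
  const run = combined
    ? (data) => { try { return combined(data); } catch { return twoStep(data); } }
    : twoStep;
  return preprocess ? (data) => { preprocess(data); return run(data); } : run;
}

const validate = makeValidate(jsCombinedFn, jsFn, errFn, null);
```

In this sketch `validate({ name: 'Mert' })` returns the shared `VALID_RESULT` object, while `validate(null)` goes through the `catch` branch and still produces an error result. The diff inlines the four combinations of `jsCombinedFn` and `preprocess` instead of composing closures, presumably to keep the hot path to a single call.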
@@ -1208,16 +1208,18 @@ function compileToJSCombined(schema, VALID_RESULT) {
 
  // Use factory pattern: closure vars (regexes, etc.) created once, not per call
  const closureParams = ctx.closureVars.join(',')
- const inner = `const _e=[];\n ` +
+ // Lazy error array no allocation for valid data (the common case)
+ const inner = `let _e;\n ` +
  (ctx.helperCode.length ? ctx.helperCode.join('\n ') + '\n ' : '') +
  lines.join('\n ') +
- `\n return _e.length===0?R:{valid:false,errors:_e}`
+ `\n return _e?{valid:false,errors:_e}:R`
 
  try {
  const factory = new Function('R' + (closureParams ? ',' + closureParams : ''),
  `return function(d){${inner}}`)
  return factory(VALID_RESULT, ...ctx.closureVals)
- } catch {
+ } catch (e) {
+ if (process.env.ATA_DEBUG) console.error('compileToJSCombined error:', e.message, '\n', inner.slice(0, 500))
  return null
  }
  }
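
To make the lazy error array concrete, here is a miniature of the same pattern: `_e` starts as `undefined`, each failing check allocates it on first use via `(_e||(_e=[])).push(...)`, and the final `return` tests `_e` itself instead of its length. The two checks and the three-argument `fail` helper are simplified, hypothetical stand-ins for what the real generator emits.

```javascript
// Hypothetical miniature of the generated validator body.
const VALID_RESULT = Object.freeze({ valid: true, errors: [] });

// Emits a push that allocates the error array only on the first failure.
const fail = (code, path, message) =>
  `(_e||(_e=[])).push({code:${JSON.stringify(code)},path:${JSON.stringify(path)},message:${JSON.stringify(message)}})`;

const inner = [
  'let _e;',
  `if(typeof d.name!=='string'){${fail('type_mismatch', '/name', 'expected string')}}`,
  `if(d.age!==undefined&&!Number.isInteger(d.age)){${fail('type_mismatch', '/age', 'expected integer')}}`,
  'return _e?{valid:false,errors:_e}:R',
].join('\n');

// Factory pattern: R is bound once as a closure variable, not passed per call.
const factory = new Function('R', `return function(d){${inner}}`);
const validate = factory(VALID_RESULT);
```

On valid input no array is created and the shared `VALID_RESULT` object is returned as-is; on invalid input every failing check appends to the same array in one pass.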
@@ -1239,7 +1241,7 @@ function genCodeC(schema, v, pathExpr, lines, ctx) {
  const types = schema.type ? (Array.isArray(schema.type) ? schema.type : [schema.type]) : null
  let isObj = false, isArr = false, isStr = false, isNum = false
 
- const fail = (code, msg) => `_e.push({code:'${code}',path:${pathExpr||'""'},message:${msg}})`
+ const fail = (code, msg) => `(_e||(_e=[])).push({code:'${code}',path:${pathExpr||'""'},message:${msg}})`
 
  if (types) {
  const conds = types.map(t => {
@@ -1267,8 +1269,6 @@ function genCodeC(schema, v, pathExpr, lines, ctx) {
  isNum = types[0] === 'number' || types[0] === 'integer'
  }
  lines.push(`if(${typeOk}){`)
- // We'll close this block at the end of genCodeC — mark it
- ctx._typeBlock = true
  }
 
  // enum
@@ -1311,7 +1311,7 @@ function genCodeC(schema, v, pathExpr, lines, ctx) {
  for (const key of schema.required) {
  const check = hoisted[key] ? `${hoisted[key]}===undefined` : `${v}[${JSON.stringify(key)}]===undefined`
  const p = pathExpr ? `${pathExpr}+'/${esc(key)}'` : `'/${esc(key)}'`
- lines.push(`if(${check}){${`_e.push({code:'required_missing',path:${p},message:'missing: ${esc(key)}'})`}}`)
+ lines.push(`if(${check}){${`(_e||(_e=[])).push({code:'required_missing',path:${p},message:'missing: ${esc(key)}'})`}}`)
  }
  } else if (schema.required) {
  for (const key of schema.required) {
@@ -1383,7 +1383,7 @@ function genCodeC(schema, v, pathExpr, lines, ctx) {
  for (const [key, deps] of Object.entries(schema.dependentRequired)) {
  for (const dep of deps) {
  const p = pathExpr ? `${pathExpr}+'/${esc(dep)}'` : `'/${esc(dep)}'`
- lines.push(`if(typeof ${v}==='object'&&${v}!==null&&${JSON.stringify(key)} in ${v}&&!(${JSON.stringify(dep)} in ${v})){_e.push({code:'required_missing',path:${p},message:'${esc(key)} requires ${esc(dep)}'})}`)
+ lines.push(`if(typeof ${v}==='object'&&${v}!==null&&${JSON.stringify(key)} in ${v}&&!(${JSON.stringify(dep)} in ${v})){(_e||(_e=[])).push({code:'required_missing',path:${p},message:'${esc(key)} requires ${esc(dep)}'})}`)
  }
  }
  }
@@ -1482,9 +1482,8 @@ function genCodeC(schema, v, pathExpr, lines, ctx) {
  }
 
  // Close type-success block if opened
- if (ctx._typeBlock) {
+ if (types) {
  lines.push(`}`)
- ctx._typeBlock = false
  }
  }
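
The `genCodeC` hunks above replace the shared `ctx._typeBlock` flag with a check on the local `types` variable. The diff does not say why. One plausible reason, shown in the sketch below, is that a flag stored on a context object shared by every recursion level can be reset by a nested call, leaving the outer `if(...){` block unclosed. The generator `gen` and the helper `braceBalance` are hypothetical miniatures written for this illustration; they are not the package's code.

```javascript
// Hypothetical mini-generator: opens a type guard, recurses into
// properties, then closes the guard.
function gen(schema, v, lines, ctx, useSharedFlag) {
  const types = schema.type ? [].concat(schema.type) : null;
  if (types) {
    lines.push(`if(typeof ${v}==='${types[0]}'){`);
    if (useSharedFlag) ctx._typeBlock = true;
  }
  for (const [key, sub] of Object.entries(schema.properties || {})) {
    gen(sub, `${v}[${JSON.stringify(key)}]`, lines, ctx, useSharedFlag);
  }
  // Old style reads (and resets) a flag shared by all recursion levels;
  // new style reads the local `types`.
  if (useSharedFlag ? ctx._typeBlock : types) {
    lines.push('}');
    if (useSharedFlag) ctx._typeBlock = false;
  }
}

const schema = { type: 'object', properties: { name: { type: 'string' } } };

// Number of opening braces minus number of closing braces in the output.
function braceBalance(useSharedFlag) {
  const lines = [];
  gen(schema, 'd', lines, {}, useSharedFlag);
  const src = lines.join('');
  return src.split('{').length - src.split('}').length;
}
```

With the shared flag, the nested `name` schema closes its own block and clears the flag, so the outer object guard is never closed and `braceBalance(true)` is nonzero. With the local check, each level closes exactly what it opened and `braceBalance(false)` is zero.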
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "ata-validator",
- "version": "0.4.4",
+ "version": "0.4.5",
  "description": "Ultra-fast JSON Schema validator. Beats ajv on every valid-path benchmark: 1.1x–2.7x faster validate(obj), 151x faster compilation, 5.9x faster parallel batch. Speculative validation with V8-optimized JS codegen, simdjson, multi-core. Standard Schema V1 compatible.",
  "main": "index.js",
  "types": "index.d.ts",