ata-validator 0.4.4 → 0.4.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -6,15 +6,16 @@ Ultra-fast JSON Schema validator powered by [simdjson](https://github.com/simdjs

  ## Performance

- ### Single-Document Validation (valid data)
+ ### Single-Document Validation

  | Scenario | ata | ajv | |
  |---|---|---|---|
- | **validate(obj)** | 9.6M ops/sec | 8.5M ops/sec | **ata 1.1x faster** |
- | **isValidObject(obj)** | 10.4M ops/sec | 9.3M ops/sec | **ata 1.1x faster** |
- | **validateJSON(str)** | 1.9M ops/sec | 1.87M ops/sec | **ata 1.02x faster** |
- | **isValidJSON(str)** | 1.9M ops/sec | 1.89M ops/sec | **ata 1.01x faster** |
- | **Schema compilation** | 125,690 ops/sec | 831 ops/sec | **ata 151x faster** |
+ | **validate(obj)** valid | 15M ops/sec | 8M ops/sec | **ata 1.9x faster** |
+ | **validate(obj)** invalid | 13.1M ops/sec | 8.1M ops/sec | **ata 1.6x faster** |
+ | **isValidObject(obj)** | 15.4M ops/sec | 9.2M ops/sec | **ata 1.7x faster** |
+ | **validateJSON(str)** valid | 2.15M ops/sec | 1.88M ops/sec | **ata 1.1x faster** |
+ | **validateJSON(str)** invalid | 2.62M ops/sec | 2.35M ops/sec | **ata 1.1x faster** |
+ | **Schema compilation** | 112K ops/sec | 773 ops/sec | **ata 145x faster** |

  ### Large Data — JS Object Validation

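The "ata Nx faster" column appears to be ata's throughput divided by ajv's, rounded to one decimal place (to a whole number for schema compilation). A quick arithmetic check against the new 0.4.6 rows, assuming that reading of the column; `speedup` is a hypothetical helper, not part of ata-validator:

```javascript
// Hypothetical helper: ratio of ata throughput to ajv throughput (ops/sec),
// formatted to a fixed number of decimal places.
function speedup(ataOpsPerSec, ajvOpsPerSec, digits = 1) {
  return (ataOpsPerSec / ajvOpsPerSec).toFixed(digits);
}

// Figures copied from the 0.4.6 "Single-Document Validation" table.
const rows = [
  ['validate(obj) valid', 15e6, 8e6],
  ['validate(obj) invalid', 13.1e6, 8.1e6],
  ['isValidObject(obj)', 15.4e6, 9.2e6],
  ['validateJSON(str) valid', 2.15e6, 1.88e6],
  ['validateJSON(str) invalid', 2.62e6, 2.35e6],
];

for (const [scenario, ata, ajv] of rows) {
  console.log(`${scenario}: ata ${speedup(ata, ajv)}x faster`);
}

// Schema compilation is reported as a whole-number ratio (112K vs 773 ops/sec).
console.log(`Schema compilation: ata ${speedup(112e3, 773, 0)}x faster`);
```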
@@ -24,61 +25,57 @@ Ultra-fast JSON Schema validator powered by [simdjson](https://github.com/simdjs
  | 100 users (20KB) | 658K ops/sec | 243K ops/sec | **ata 2.7x faster** |
  | 1,000 users (205KB) | 64K ops/sec | 23.5K ops/sec | **ata 2.7x faster** |

- ### Parallel Batch Validation (multi-core)
-
- | Batch Size | ata | ajv | |
- |---|---|---|---|
- | 1,000 items | 8.4M items/sec | 2.2M items/sec | **ata 3.9x faster** |
- | 10,000 items | 12.5M items/sec | 2.1M items/sec | **ata 5.9x faster** |
-
- > ajv is single-threaded (JS). ata uses all CPU cores via a persistent C++ thread pool.
-
- ### Where ajv wins
+ ### Real-World Scenarios

  | Scenario | ata | ajv | |
  |---|---|---|---|
- | **validate(obj)** (invalid data, error collection) | 133K ops/sec | 7.5M ops/sec | **ajv 56x faster** |
- | **validateJSON(str)** (invalid data) | 169K ops/sec | 2.3M ops/sec | **ajv 14x faster** |
+ | **Serverless cold start** (50 schemas) | 7.7ms | 96ms | **ata 12.5x faster** |
+ | **ReDoS protection** (`^(a+)+$`) | 0.3ms | 765ms | **ata immune (RE2)** |
+ | **Batch NDJSON** (10K items, multi-core) | 13.4M/sec | 5.1M/sec | **ata 2.6x faster** |
+ | **Fastify HTTP** (100 users POST) | 24.6K req/sec | 22.6K req/sec | **ata 9% faster** |

- > Invalid-data error collection goes through the C++ NAPI path. This is the slow path by design: production traffic is overwhelmingly valid.
+ > ata is faster than ajv on **every** benchmark: valid and invalid data, objects and JSON strings, single documents and parallel batches.

  ### How it works

- **Speculative validation**: For valid data (the common case), ata runs a JS codegen fast path entirely in V8 JIT, with no NAPI boundary crossing. Only when validation fails does it fall through to the C++ engine for detailed error collection.
+ **Combined single-pass validation**: ata compiles schemas into monolithic JS functions that both validate and collect errors in a single pass. Valid data returns immediately (lazy error array: zero allocation). Invalid data collects errors without a second pass.

- **JS codegen**: Schemas are compiled to monolithic JS functions (like ajv). Supported keywords: `type`, `required`, `properties`, `items`, `enum`, `const`, `allOf`, `anyOf`, `oneOf`, `not`, `if/then/else`, `uniqueItems`, `contains`, `prefixItems`, `additionalProperties`, `dependentRequired`, `minimum/maximum`, `minLength/maxLength`, `pattern`, `format`.
+ **JS codegen**: Schemas are compiled to monolithic JS functions (like ajv). Supported keywords: `type`, `required`, `properties`, `items`, `enum`, `const`, `allOf`, `anyOf`, `oneOf`, `not`, `if/then/else`, `uniqueItems`, `contains`, `prefixItems`, `additionalProperties`, `dependentRequired`, `$ref` (local), `minimum/maximum`, `minLength/maxLength`, `pattern`, `format`.

- **V8 TurboFan optimizations**: Destructuring batch reads, `undefined` checks instead of `in` operator, context-aware type guard elimination, property hoisting to local variables.
+ **V8 TurboFan optimizations**: Destructuring batch reads, `undefined` checks instead of `in` operator, context-aware type guard elimination, property hoisting to local variables, tiered uniqueItems (nested loop for small arrays).

  **Adaptive simdjson**: For large documents (>8KB) with selective schemas, simdjson On Demand seeks only the needed fields — skipping irrelevant data at GB/s speeds.

  ### JSON Schema Test Suite

- **98.5%** pass rate (938/952) on official [JSON Schema Test Suite](https://github.com/json-schema-org/JSON-Schema-Test-Suite) (Draft 2020-12).
+ **98.4%** pass rate (937/952) on official [JSON Schema Test Suite](https://github.com/json-schema-org/JSON-Schema-Test-Suite) (Draft 2020-12).

  ## When to use ata

- - **Any `validate(obj)` workload** — 1.1x–2.7x faster than ajv on valid data
- - **Batch/streaming validation** — NDJSON log processing, data pipelines (5.9x faster)
- - **Schema-heavy startup** — many schemas compiled at boot (151x faster compile)
+ - **Any `validate(obj)` workload** — 1.6x–2.7x faster than ajv on all data
+ - **Serverless / cold starts** — 12.5x faster cold start with 50 schemas (145x faster schema compilation)
+ - **Security-sensitive apps** — RE2 regex, immune to ReDoS attacks
+ - **Batch/streaming validation** — NDJSON log processing, data pipelines (2.6x faster)
+ - **Standard Schema V1** — native support for Fastify v5, tRPC, TanStack
  - **C/C++ embedding** — native library, no JS runtime needed

  ## When to use ajv

- - **Error-heavy workloads**, where most data is invalid and error details matter
- - **Schemas with `$ref`, `patternProperties`, `dependentSchemas`**: these bypass JS codegen and hit the slower NAPI path
+ - **Schemas with `patternProperties`, `dependentSchemas`**: these bypass JS codegen and hit the slower NAPI path
+ - **100% spec compliance needed**: ajv covers more edge cases (ata: 98.4%)

  ## Features

- - **Speculative validation**: JS codegen fast path; valid data never crosses the NAPI boundary
- - **Multi-core**: Parallel validation across all CPU cores — 12.5M validations/sec
+ - **Combined single-pass validation**: One JS function validates and collects errors; no double pass, lazy error allocation
+ - **Multi-core**: Parallel validation across all CPU cores — 13.4M validations/sec
  - **simdjson**: SIMD-accelerated JSON parsing at GB/s speeds, adaptive On Demand for large docs
- - **RE2 regex**: Linear-time guarantees, immune to ReDoS attacks
+ - **RE2 regex**: Linear-time guarantees, immune to ReDoS attacks (2391x faster on pathological input)
  - **V8-optimized codegen**: Destructuring batch reads, type guard elimination, property hoisting
  - **Standard Schema V1**: Compatible with Fastify, tRPC, TanStack, Drizzle
  - **Zero-copy paths**: Buffer and pre-padded input support — no unnecessary copies
+ - **Defaults + coercion**: `default` values, `coerceTypes`, `removeAdditional` support
  - **C/C++ library**: Native API for non-Node.js environments
- - **98.5% spec compliant**: Draft 2020-12
+ - **98.4% spec compliant**: Draft 2020-12

  ## Installation

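The "Combined single-pass validation" paragraph describes one function that both checks data and collects errors, allocating the error array only on the first failure. Below is a hand-written sketch of what such a function could look like for the README's user schema (`name` and `email` required, `age` an integer ≥ 0). It is not ata's generated output: the error codes other than `required_missing`, and the shape of `VALID_RESULT`, are assumptions.

```javascript
// Assumed shape: one frozen object shared by every valid result,
// so the valid path allocates nothing.
const VALID_RESULT = Object.freeze({ valid: true, errors: Object.freeze([]) });

// Hand-written sketch of a combined validator: a single pass both checks
// the data and collects errors.
function validateUser(d) {
  let _e; // error array, created lazily on the first failure

  if (typeof d !== 'object' || d === null || Array.isArray(d)) {
    return { valid: false, errors: [{ code: 'type_mismatch', path: '', message: 'expected object' }] };
  }

  const { name, email, age } = d; // destructuring batch read

  if (name === undefined) {
    (_e || (_e = [])).push({ code: 'required_missing', path: '/name', message: 'missing: name' });
  }
  if (email === undefined) {
    (_e || (_e = [])).push({ code: 'required_missing', path: '/email', message: 'missing: email' });
  }
  if (age !== undefined && !(Number.isInteger(age) && age >= 0)) {
    (_e || (_e = [])).push({ code: 'age_invalid', path: '/age', message: 'expected integer >= 0' });
  }

  // Valid data returns the shared constant; invalid data returns the errors
  // collected above, without a second validation pass.
  return _e ? { valid: false, errors: _e } : VALID_RESULT;
}
```

Compared with the 0.4.4 design (boolean check first, then a separate error-collecting pass on failure), invalid input is walked once instead of twice.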
@@ -98,18 +95,18 @@ const v = new Validator({
    properties: {
      name: { type: 'string', minLength: 1 },
      email: { type: 'string', format: 'email' },
-     age: { type: 'integer', minimum: 0 }
+     age: { type: 'integer', minimum: 0 },
+     role: { type: 'string', default: 'user' }
    },
    required: ['name', 'email']
  });

- // Fast boolean check — JS codegen, no NAPI (1.1x faster than ajv)
+ // Fast boolean check — JS codegen (1.7x faster than ajv)
  v.isValidObject({ name: 'Mert', email: 'mert@example.com', age: 26 }); // true

- // Full validation with error details
- const result = v.validate({ name: 'Mert', email: 'mert@example.com', age: 26 });
- console.log(result.valid); // true
- console.log(result.errors); // []
+ // Full validation with error details + defaults applied
+ const result = v.validate({ name: 'Mert', email: 'mert@example.com' });
+ // result.valid === true; the input object now has role === 'user' (default applied)

  // JSON string validation (simdjson fast path)
  v.validateJSON('{"name": "Mert", "email": "mert@example.com"}');
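The updated usage example relies on `default` being filled in during `validate()`. In the index.js hunk, a `preprocess(data)` step runs before validation, and the README comment suggests it mutates the input in place; that is an inference from the diff, not documented behavior. A minimal sketch of top-level and nested default filling, with `applyDefaults` as a hypothetical helper:

```javascript
// Hypothetical helper: fills in missing properties from `default`, mutating
// `data` in place and recursing into nested object schemas.
function applyDefaults(schema, data) {
  if (!schema || typeof data !== 'object' || data === null || Array.isArray(data)) {
    return data;
  }
  for (const [key, sub] of Object.entries(schema.properties || {})) {
    if (data[key] === undefined && sub && Object.prototype.hasOwnProperty.call(sub, 'default')) {
      // Copy object/array defaults so separate calls never share one mutable value.
      const def = sub.default;
      data[key] = typeof def === 'object' && def !== null ? JSON.parse(JSON.stringify(def)) : def;
    }
    if (data[key] !== undefined) applyDefaults(sub, data[key]);
  }
  return data;
}

const schema = {
  type: 'object',
  properties: {
    name: { type: 'string', minLength: 1 },
    role: { type: 'string', default: 'user' },
  },
  required: ['name'],
};

const data = { name: 'Mert' };
applyDefaults(schema, data);
console.log(data); // { name: 'Mert', role: 'user' }
```

Checking `=== undefined` rather than `in` means an explicit `null` in the input is left alone rather than replaced by the default.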
@@ -118,12 +115,21 @@ v.isValidJSON('{"name": "Mert", "email": "mert@example.com"}'); // true
  // Buffer input (zero-copy, raw NAPI)
  v.isValid(Buffer.from('{"name": "Mert", "email": "mert@example.com"}'));

- // Parallel batch — multi-core, NDJSON (5.9x faster than ajv)
+ // Parallel batch — multi-core, NDJSON (2.6x faster than ajv)
  const ndjson = Buffer.from(lines.join('\n'));
  v.isValidParallel(ndjson); // bool[]
  v.countValid(ndjson); // number
  ```

+ ### Options
+
+ ```javascript
+ const v = new Validator(schema, {
+   coerceTypes: true, // "42" → 42 for integer fields
+   removeAdditional: true, // strip properties not in schema
+ });
+ ```
+
  ### Standard Schema V1

  ```javascript
@@ -143,7 +149,10 @@ npm install fastify-ata

  ```javascript
  const fastify = require('fastify')();
- fastify.register(require('fastify-ata'));
+ fastify.register(require('fastify-ata'), {
+   coerceTypes: true,
+   removeAdditional: true,
+ });

  // All existing JSON Schema route definitions work as-is
  ```
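0.4.6 adds `coerceTypes` and `removeAdditional`, both on `new Validator` and on the `fastify-ata` registration. The README describes them only through inline comments (`"42" → 42` for integer fields; strip properties not in the schema). A minimal sketch of those two behaviors for a flat object schema, under that reading; `coerceAndStrip` is hypothetical, and ata's exact coercion rules (booleans, nested objects, arrays) are not shown in this diff:

```javascript
// Hypothetical sketch of the two options for a flat object schema.
// Mutates `data` in place, as a pre-validation step.
function coerceAndStrip(schema, data, { coerceTypes = false, removeAdditional = false } = {}) {
  if (typeof data !== 'object' || data === null || Array.isArray(data)) return data;
  const props = schema.properties || {};

  for (const key of Object.keys(data)) { // Object.keys returns a snapshot, so deleting is safe
    const sub = props[key];
    if (!sub) {
      if (removeAdditional) delete data[key]; // not declared in `properties`
      continue;
    }
    if (!coerceTypes || typeof data[key] !== 'string') continue;

    const raw = data[key];
    if (sub.type === 'integer' && /^-?\d+$/.test(raw)) {
      data[key] = Number(raw); // "42" -> 42
    } else if (sub.type === 'number' && raw.trim() !== '' && Number.isFinite(Number(raw))) {
      data[key] = Number(raw); // "4.5" -> 4.5
    }
    // Anything else is left as-is so the validator can still reject it.
  }
  return data;
}

const schema = {
  type: 'object',
  properties: { age: { type: 'integer' }, score: { type: 'number' } },
};
const body = { age: '42', score: '4.5', debug: 'x' };
coerceAndStrip(schema, body, { coerceTypes: true, removeAdditional: true });
console.log(body); // { age: 42, score: 4.5 }
```

Leaving non-coercible strings untouched (for example `"4.2"` for an integer field) keeps coercion from hiding bad input: the subsequent type check still fails.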
package/index.js CHANGED
@@ -250,17 +250,24 @@ class Validator {
  const useSimdjsonForLarge = !hasArrayTraversal;

  if (jsFn) {
- // Error handler: combined (optimized) → jsErrFn → NAPI fallback
- const errFn = jsCombinedFn
- ? (d) => { try { return jsCombinedFn(d); } catch { return compiled.validate(d); } }
- : jsErrFn
- ? (d) => { try { return jsErrFn(d, true); } catch { return compiled.validate(d); } }
- : (d) => compiled.validate(d);
- this.validate = preprocess
- ? (data) => { preprocess(data); return jsFn(data) ? VALID_RESULT : errFn(data); }
- : (data) => jsFn(data) ? VALID_RESULT : errFn(data);
+ // Best path: combined validator (single pass, lazy error array)
+ // Valid data: no array allocation, returns VALID_RESULT
+ // Invalid data: collects errors in one pass (no double validation)
+ // Fallback: jsFn + errFn for schemas combined can't handle
+ const errFn = jsErrFn
+ ? (d) => { try { return jsErrFn(d, true); } catch { return compiled.validate(d); } }
+ : (d) => compiled.validate(d);
+ this.validate = jsCombinedFn
+ ? (preprocess
+ ? (data) => { preprocess(data); try { return jsCombinedFn(data); } catch { return jsFn(data) ? VALID_RESULT : errFn(data); } }
+ : (data) => { try { return jsCombinedFn(data); } catch { return jsFn(data) ? VALID_RESULT : errFn(data); } })
+ : (preprocess
+ ? (data) => { preprocess(data); return jsFn(data) ? VALID_RESULT : errFn(data); }
+ : (data) => jsFn(data) ? VALID_RESULT : errFn(data));
  this.isValidObject = jsFn;
- const jsonValidateFn = (obj) => jsFn(obj) ? VALID_RESULT : errFn(obj);
+ const jsonValidateFn = jsCombinedFn
+ ? (obj) => { try { return jsCombinedFn(obj); } catch { return jsFn(obj) ? VALID_RESULT : errFn(obj); } }
+ : (obj) => jsFn(obj) ? VALID_RESULT : errFn(obj);
  this.validateJSON = useSimdjsonForLarge
  ? (jsonStr) => {
  if (jsonStr.length >= SIMDJSON_THRESHOLD) {
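The new `this.validate` prefers the combined single-pass function and, only if it throws, falls back to the 0.4.4-style path (`jsFn` boolean check, then `errFn`). The same control flow as a small composable sketch; `makeValidate` and the stub functions in the usage are hypothetical, and the shape of `VALID_RESULT` is assumed:

```javascript
const VALID_RESULT = Object.freeze({ valid: true, errors: Object.freeze([]) });

// Sketch of the 0.4.6 dispatch order:
//   1. optional preprocess (defaults / coercion)
//   2. combined single-pass validator, if one was compiled
//   3. if it throws (or none exists): boolean check, then error collector
function makeValidate({ jsFn, jsCombinedFn, errFn, preprocess }) {
  const fallback = (data) => (jsFn(data) ? VALID_RESULT : errFn(data));
  const core = jsCombinedFn
    ? (data) => {
        try {
          return jsCombinedFn(data);
        } catch {
          return fallback(data);
        }
      }
    : fallback;
  return preprocess ? (data) => { preprocess(data); return core(data); } : core;
}

// Usage with stub functions standing in for compiled validators.
const validate = makeValidate({
  jsFn: (d) => typeof d === 'number',
  jsCombinedFn: (d) => {
    if (d === 'unsupported') throw new Error('combined path bailed out');
    return typeof d === 'number' ? VALID_RESULT : { valid: false, errors: [{ code: 'combined' }] };
  },
  errFn: () => ({ valid: false, errors: [{ code: 'fallback' }] }),
});

console.log(validate(1) === VALID_RESULT); // true
console.log(validate('x').errors[0].code); // combined
console.log(validate('unsupported').errors[0].code); // fallback
```

The shipped code spells out each of the four variants (with or without the combined function, with or without `preprocess`) as its own arrow function instead of composing closures like this; the composition here is only for readability.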
@@ -1208,16 +1208,18 @@ function compileToJSCombined(schema, VALID_RESULT) {

  // Use factory pattern: closure vars (regexes, etc.) created once, not per call
  const closureParams = ctx.closureVars.join(',')
- const inner = `const _e=[];\n ` +
+ // Lazy error array: no allocation for valid data (the common case)
+ const inner = `let _e;\n ` +
  (ctx.helperCode.length ? ctx.helperCode.join('\n ') + '\n ' : '') +
  lines.join('\n ') +
- `\n return _e.length===0?R:{valid:false,errors:_e}`
+ `\n return _e?{valid:false,errors:_e}:R`

  try {
  const factory = new Function('R' + (closureParams ? ',' + closureParams : ''),
  `return function(d){${inner}}`)
  return factory(VALID_RESULT, ...ctx.closureVals)
- } catch {
+ } catch (e) {
+ if (process.env.ATA_DEBUG) console.error('compileToJSCombined error:', e.message, '\n', inner.slice(0, 500))
  return null
  }
  }
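`compileToJSCombined` builds the validator with `new Function` as a factory: closure values such as compiled regexes are passed in once as parameters, and the returned inner function starts with `let _e;` so nothing is allocated unless a check fails. A self-contained sketch of that pattern for a single `pattern` check; `compilePatternCheck`, `_re0`, and the `pattern_mismatch` code are hypothetical names:

```javascript
const VALID_RESULT = Object.freeze({ valid: true, errors: Object.freeze([]) });

// Sketch of the factory pattern: generate source once, create closure
// values (here a RegExp) once, and return a per-call function that reuses them.
function compilePatternCheck(field, patternSource) {
  const closureParams = ['_re0'];
  const closureVals = [new RegExp(patternSource)]; // built at compile time, not per call

  const key = JSON.stringify(field);
  const path = JSON.stringify('/' + field);
  const inner =
    'let _e;\n' + // lazy error array
    `const v=d!=null?d[${key}]:undefined;\n` +
    `if(typeof v==='string'&&!_re0.test(v))(_e||(_e=[])).push({code:'pattern_mismatch',path:${path},message:'does not match pattern'});\n` +
    'return _e?{valid:false,errors:_e}:R';

  try {
    const factory = new Function('R,' + closureParams.join(','), `return function(d){${inner}}`);
    return factory(VALID_RESULT, ...closureVals);
  } catch (e) {
    // Similar to 0.4.6: report only when debugging, then give up on codegen.
    if (process.env.ATA_DEBUG) console.error('codegen error:', e.message);
    return null;
  }
}

const checkSku = compilePatternCheck('sku', '^[A-Z]{3}-\\d+$');
console.log(checkSku({ sku: 'ABC-12' }) === VALID_RESULT); // true
console.log(checkSku({ sku: 'abc' }).errors[0].path); // /sku
```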
@@ -1239,7 +1241,7 @@ function genCodeC(schema, v, pathExpr, lines, ctx) {
  const types = schema.type ? (Array.isArray(schema.type) ? schema.type : [schema.type]) : null
  let isObj = false, isArr = false, isStr = false, isNum = false

- const fail = (code, msg) => `_e.push({code:'${code}',path:${pathExpr||'""'},message:${msg}})`
+ const fail = (code, msg) => `(_e||(_e=[])).push({code:'${code}',path:${pathExpr||'""'},message:${msg}})`

  if (types) {
  const conds = types.map(t => {
@@ -1267,8 +1269,6 @@ function genCodeC(schema, v, pathExpr, lines, ctx) {
  isNum = types[0] === 'number' || types[0] === 'integer'
  }
  lines.push(`if(${typeOk}){`)
- // We'll close this block at the end of genCodeC — mark it
- ctx._typeBlock = true
  }

  // enum
@@ -1311,7 +1311,7 @@ function genCodeC(schema, v, pathExpr, lines, ctx) {
  for (const key of schema.required) {
  const check = hoisted[key] ? `${hoisted[key]}===undefined` : `${v}[${JSON.stringify(key)}]===undefined`
  const p = pathExpr ? `${pathExpr}+'/${esc(key)}'` : `'/${esc(key)}'`
- lines.push(`if(${check}){${`_e.push({code:'required_missing',path:${p},message:'missing: ${esc(key)}'})`}}`)
+ lines.push(`if(${check}){${`(_e||(_e=[])).push({code:'required_missing',path:${p},message:'missing: ${esc(key)}'})`}}`)
  }
  } else if (schema.required) {
  for (const key of schema.required) {
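For `required`, the generator emits `===undefined` checks, reading a hoisted local where one exists instead of re-reading the property, and pushes into the lazily created `_e`. A sketch of generating such code; `genRequired` and the `_h0`-style local names are hypothetical, and this version quotes keys with `JSON.stringify` rather than the source's `esc` helper, whose definition is not shown here:

```javascript
// Hypothetical generator for `required`. Assumes `d` is already known to be an
// object (a separate `type` check would guard that). Each property is read once
// into a local, then tested with `===undefined`; errors go into a lazy array.
function genRequired(required) {
  const lines = ['let _e;'];
  const hoisted = {};
  required.forEach((key, i) => {
    hoisted[key] = `_h${i}`;
    lines.push(`const _h${i}=d[${JSON.stringify(key)}];`);
  });
  for (const key of required) {
    const path = JSON.stringify('/' + key);
    const message = JSON.stringify('missing: ' + key);
    lines.push(
      `if(${hoisted[key]}===undefined)(_e||(_e=[])).push({code:'required_missing',path:${path},message:${message}});`
    );
  }
  lines.push('return _e;'); // undefined when every required key is present
  return new Function('d', lines.join('\n'));
}

const checkRequired = genRequired(['name', 'email']);
console.log(checkRequired({ name: 'Mert', email: 'mert@example.com' })); // undefined
console.log(checkRequired({ name: 'Mert' })[0].message); // missing: email
```

The README lists `undefined` checks instead of `in` as a V8 optimization, but the two are not identical: a key that is present with the value `undefined` counts as missing under `===undefined`. Data parsed from JSON cannot contain `undefined`, so they agree for JSON input.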
@@ -1383,7 +1383,7 @@ function genCodeC(schema, v, pathExpr, lines, ctx) {
  for (const [key, deps] of Object.entries(schema.dependentRequired)) {
  for (const dep of deps) {
  const p = pathExpr ? `${pathExpr}+'/${esc(dep)}'` : `'/${esc(dep)}'`
- lines.push(`if(typeof ${v}==='object'&&${v}!==null&&${JSON.stringify(key)} in ${v}&&!(${JSON.stringify(dep)} in ${v})){_e.push({code:'required_missing',path:${p},message:'${esc(key)} requires ${esc(dep)}'})}`)
+ lines.push(`if(typeof ${v}==='object'&&${v}!==null&&${JSON.stringify(key)} in ${v}&&!(${JSON.stringify(dep)} in ${v})){(_e||(_e=[])).push({code:'required_missing',path:${p},message:'${esc(key)} requires ${esc(dep)}'})}`)
  }
  }
  }
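`dependentRequired` means: if the object has property `key`, it must also have every property listed in `deps`. The generated condition uses `in`, so a present-but-`undefined` value counts as present. A plain-JS equivalent of that logic, useful as a reference when testing generated code; `checkDependentRequired` is a hypothetical helper, and it also skips arrays, since JSON Schema applies this keyword only to objects:

```javascript
// Plain-JS reference for `dependentRequired`, producing the same error
// shape as the generated code (code 'required_missing').
function checkDependentRequired(dependentRequired, v) {
  const errors = [];
  if (typeof v !== 'object' || v === null || Array.isArray(v)) return errors; // objects only
  for (const [key, deps] of Object.entries(dependentRequired)) {
    if (!(key in v)) continue; // trigger property absent: nothing required
    for (const dep of deps) {
      if (!(dep in v)) {
        errors.push({ code: 'required_missing', path: '/' + dep, message: `${key} requires ${dep}` });
      }
    }
  }
  return errors;
}

const rule = { credit_card: ['billing_address'] };
console.log(checkDependentRequired(rule, { name: 'Mert' }).length); // 0
console.log(checkDependentRequired(rule, { credit_card: '4111' })[0].message); // credit_card requires billing_address
```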
@@ -1482,9 +1482,8 @@ function genCodeC(schema, v, pathExpr, lines, ctx) {
  }

  // Close type-success block if opened
- if (ctx._typeBlock) {
+ if (types) {
  lines.push(`}`)
- ctx._typeBlock = false
  }
  }

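0.4.6 closes the type-success block based on the local `types` variable instead of the shared `ctx._typeBlock` flag. A plausible motivation, assuming `genCodeC` recurses into subschemas with the same `ctx`: a nested typed subschema sets and then clears the shared flag, so the outer block is never closed and the generated source fails to parse. In 0.4.4 that failure would be swallowed by the bare `catch` in `compileToJSCombined`, which returns `null`. The toy generator below reproduces the effect; `gen`, `compiles`, and the placeholder condition `!==undefined` are hypothetical:

```javascript
// Toy recursive generator. Each schema with a `type` opens `if(...){` and must
// close it after its subschemas are emitted. `useSharedFlag` switches between
// the 0.4.4 approach (flag on shared ctx) and the 0.4.6 one (local `types`).
function gen(schema, v, lines, ctx, useSharedFlag) {
  const types = schema.type ? [].concat(schema.type) : null;
  if (types) {
    lines.push(`if(${v}!==undefined){`); // placeholder for the real type test
    if (useSharedFlag) ctx._typeBlock = true;
  }

  Object.entries(schema.properties || {}).forEach(([key, sub], i) => {
    const child = `${v}_${i}`;
    lines.push(`{const ${child}=${v}[${JSON.stringify(key)}];`);
    gen(sub, child, lines, ctx, useSharedFlag); // same ctx object is shared
    lines.push('}');
  });

  const shouldClose = useSharedFlag ? ctx._typeBlock : types !== null;
  if (shouldClose) {
    lines.push('}');
    if (useSharedFlag) ctx._typeBlock = false; // the outer call later reads this cleared flag
  }
  return lines;
}

const schema = { type: 'object', properties: { name: { type: 'string' } } };

function compiles(useSharedFlag) {
  const src = gen(schema, 'd', [], {}, useSharedFlag).join('\n') + '\nreturn true';
  try {
    new Function('d', src);
    return true;
  } catch {
    return false; // SyntaxError: unbalanced braces
  }
}

console.log(compiles(true)); // false (shared flag: outer block left open)
console.log(compiles(false)); // true (local `types`: every block closed)
```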
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "ata-validator",
- "version": "0.4.4",
+ "version": "0.4.6",
  "description": "Ultra-fast JSON Schema validator. Beats ajv on every valid-path benchmark: 1.1x–2.7x faster validate(obj), 151x faster compilation, 5.9x faster parallel batch. Speculative validation with V8-optimized JS codegen, simdjson, multi-core. Standard Schema V1 compatible.",
  "main": "index.js",
  "types": "index.d.ts",