ata-validator 0.1.0 → 0.4.0

package/README.md CHANGED
@@ -1,65 +1,89 @@
1
- # ata
1
+ # ata-validator
2
2
 
3
- A blazing-fast C++ JSON Schema validator powered by [simdjson](https://github.com/simdjson/simdjson). Schema compilation 11,000x faster than ajv, JSON validation 2-4x faster. CSP-safe, multi-language, zero JS dependencies.
3
+ Ultra-fast JSON Schema validator powered by [simdjson](https://github.com/simdjson/simdjson). Multi-core parallel validation, RE2 regex, codegen bytecode engine. Standard Schema V1 compatible.
4
+
5
+ **[ata-validator.com](https://ata-validator.com)**
4
6
 
5
7
  ## Performance
6
8
 
7
- ### Schema Compilation
9
+ ### Single-Document Validation (valid data)
10
+
11
+ | Scenario | ata | ajv | Winner |
12
+ |---|---|---|---|
13
+ | **validate(obj)** | 9.6M ops/sec | 8.5M ops/sec | **ata 1.1x faster** |
14
+ | **isValidObject(obj)** | 10.4M ops/sec | 9.3M ops/sec | **ata 1.1x faster** |
15
+ | **validateJSON(str)** | 1.9M ops/sec | 1.87M ops/sec | **ata 1.02x faster** |
16
+ | **isValidJSON(str)** | 1.9M ops/sec | 1.89M ops/sec | **ata 1.01x faster** |
17
+ | **Schema compilation** | 125,690 ops/sec | 831 ops/sec | **ata 151x faster** |
18
+
19
+ ### Large Data — JS Object Validation
8
20
 
9
- | Validator | ops/sec |
10
- |---|---|
11
- | **ata** | **175,548** |
12
- | ajv | 16 |
21
+ | Size | ata | ajv | Winner |
22
+ |---|---|---|---|
23
+ | 10 users (2KB) | 6.2M ops/sec | 2.5M ops/sec | **ata 2.5x faster** |
24
+ | 100 users (20KB) | 658K ops/sec | 243K ops/sec | **ata 2.7x faster** |
25
+ | 1,000 users (205KB) | 64K ops/sec | 23.5K ops/sec | **ata 2.7x faster** |
26
+
27
+ ### Parallel Batch Validation (multi-core)
28
+
29
+ | Batch Size | ata | ajv | Winner |
30
+ |---|---|---|---|
31
+ | 1,000 items | 8.4M items/sec | 2.2M items/sec | **ata 3.9x faster** |
32
+ | 10,000 items | 12.5M items/sec | 2.1M items/sec | **ata 5.9x faster** |
13
33
 
14
- > ata compiles schemas **11,000x faster** than ajv.
34
+ > ajv is single-threaded (JS). ata uses all CPU cores via a persistent C++ thread pool.
15
35
 
16
- ### JSON String Validation (real-world scenario)
36
+ ### Where ajv wins
17
37
 
18
- | Payload Size | ata | ajv | Winner |
38
+ | Scenario | ata | ajv | Winner |
19
39
  |---|---|---|---|
20
- | 2 KB | 449,447 | 193,181 | **ata 2.3x faster** |
21
- | 10 KB | 136,301 | 40,644 | **ata 3.4x faster** |
22
- | 20 KB | 73,142 | 20,459 | **ata 3.6x faster** |
23
- | 100 KB | 14,388 | 4,062 | **ata 3.5x faster** |
24
- | 200 KB | 7,590 | 2,021 | **ata 3.8x faster** |
40
+ | **validate(obj)** (invalid data, error collection) | 133K ops/sec | 7.5M ops/sec | **ajv 56x faster** |
41
+ | **validateJSON(str)** (invalid data) | 169K ops/sec | 2.3M ops/sec | **ajv 14x faster** |
25
42
 
26
- > Tested on Apple Silicon. JSON string validation = `JSON.parse()` + `validate()` for ajv vs single `validateJSON()` call for ata. The gap grows with payload size.
43
+ > Invalid-data error collection goes through the C++ NAPI path. This is the slow path by design, since production traffic is overwhelmingly valid.
44
+
45
+ ### How it works
46
+
47
+ **Speculative validation**: For valid data (the common case), ata runs a JS codegen fast path entirely in V8 JIT — no NAPI boundary crossing. Only when validation fails does it fall through to the C++ engine for detailed error collection.
48
+
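The speculative validation idea described in the added paragraph can be illustrated with a minimal sketch. It assumes a fixed schema (an object with required string properties `name` and `email`), and every function name here is hypothetical. It is not ata's internal code: in ata the slow path is the C++ engine, while this sketch uses plain JavaScript for both paths.

```javascript
// Hypothetical names throughout; this only illustrates the control flow.

// Fast path: a cheap boolean check that allocates nothing.
function fastIsValid(data) {
  return (
    typeof data === 'object' && data !== null && !Array.isArray(data) &&
    typeof data.name === 'string' &&
    typeof data.email === 'string'
  );
}

// Slow path: re-checks the same rules, but collects detailed errors.
function collectErrors(data) {
  const errors = [];
  if (typeof data !== 'object' || data === null || Array.isArray(data)) {
    errors.push({ path: '', message: 'expected object' });
    return errors;
  }
  for (const key of ['name', 'email']) {
    if (data[key] === undefined) {
      errors.push({ path: '', message: `missing required property: ${key}` });
    } else if (typeof data[key] !== 'string') {
      errors.push({ path: `/${key}`, message: 'expected string' });
    }
  }
  return errors;
}

// Speculate that the data is valid; pay for error details only on failure.
function validate(data) {
  if (fastIsValid(data)) return { valid: true, errors: [] };
  return { valid: false, errors: collectErrors(data) };
}
```

Valid input costs one boolean check, while invalid input is checked twice. That matches the trade-off shown in the "Where ajv wins" table.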
49
+ **JS codegen**: Schemas are compiled to monolithic JS functions (like ajv). Supported keywords: `type`, `required`, `properties`, `items`, `enum`, `const`, `allOf`, `anyOf`, `oneOf`, `not`, `if/then/else`, `uniqueItems`, `contains`, `prefixItems`, `additionalProperties`, `dependentRequired`, `minimum/maximum`, `minLength/maxLength`, `pattern`, `format`.
50
+
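To make "schemas are compiled to monolithic JS functions" concrete, here is a minimal sketch of schema-to-function codegen. The `compile` helper is hypothetical and handles only `required` plus per-property `type` for `string`, `number`, and `boolean`. It builds the function with `new Function`; the README does not say how ata constructs its generated functions, so treat that as an assumption.

```javascript
// Hypothetical mini-compiler: emits one flat function body per schema.
function compile(schema) {
  const lines = [
    "if (typeof d !== 'object' || d === null || Array.isArray(d)) return false;",
  ];
  for (const key of schema.required || []) {
    lines.push(`if (d[${JSON.stringify(key)}] === undefined) return false;`);
  }
  for (const [key, sub] of Object.entries(schema.properties || {})) {
    if (!['string', 'number', 'boolean'].includes(sub.type)) {
      throw new Error(`unsupported type in this sketch: ${sub.type}`);
    }
    const k = JSON.stringify(key);
    const t = JSON.stringify(sub.type);
    // Absent optional properties pass; present ones must match the type.
    lines.push(`if (d[${k}] !== undefined && typeof d[${k}] !== ${t}) return false;`);
  }
  lines.push('return true;');
  return new Function('d', lines.join('\n'));
}

const isValid = compile({
  type: 'object',
  properties: { name: { type: 'string' }, age: { type: 'number' } },
  required: ['name'],
});
```

Note that `new Function` is blocked under a strict Content Security Policy, which the 0.1.0 README listed as a limitation of ajv's code generation.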
51
+ **V8 TurboFan optimizations**: Destructuring batch reads, `undefined` checks instead of `in` operator, context-aware type guard elimination, property hoisting to local variables.
52
+
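The listed optimizations can be pictured by comparing two hand-written checks for the same rule. Both functions below are hypothetical illustrations, not ata's generated output, and the sketch makes no performance claim of its own. It only shows the shape of "`undefined` checks instead of `in`" and "property hoisting to local variables".

```javascript
// Baseline shape: `in` operator plus repeated property loads.
function checkBaseline(d) {
  return 'name' in d && 'email' in d &&
    typeof d.name === 'string' && typeof d.email === 'string';
}

// Optimized shape: read each property once into a local via destructuring,
// then compare the locals against undefined.
function checkHoisted(d) {
  const { name, email } = d;
  return name !== undefined && email !== undefined &&
    typeof name === 'string' && typeof email === 'string';
}
```

For data that came from `JSON.parse`, a property can never hold `undefined`, so an `undefined` comparison and an `in` test give the same answer.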
53
+ **Adaptive simdjson**: For large documents (>8KB) with selective schemas, simdjson On Demand seeks only the needed fields — skipping irrelevant data at GB/s speeds.
27
54
 
28
55
  ### JSON Schema Test Suite
29
56
 
30
- **97.1%** pass rate on official [JSON Schema Test Suite](https://github.com/json-schema-org/JSON-Schema-Test-Suite) (Draft 2020-12).
57
+ **98.5%** pass rate (938/952) on official [JSON Schema Test Suite](https://github.com/json-schema-org/JSON-Schema-Test-Suite) (Draft 2020-12).
31
58
 
32
- ## Features
59
+ ## When to use ata
33
60
 
34
- - **Fast**: SIMD-accelerated JSON parsing via simdjson, pre-compiled schemas, cached regex patterns, branchless UTF-8 counting
35
- - **CSP-Safe**: No `new Function()` or `eval()`; works in strict Content Security Policy environments where ajv cannot
36
- - **V8 Direct Traversal**: Validates JS objects directly in C++ without `JSON.stringify` overhead
37
- - **Comprehensive**: Supports JSON Schema Draft 2020-12 keywords including `$ref`, `if/then/else`, `patternProperties`, `prefixItems`, `format`
38
- - **Multi-Language**: C API (`ata_c.h`) enables bindings for Rust, Python, Go, Ruby, and more
39
- - **Drop-in Replacement**: ajv-compatible API — switch with one line change
40
- - **Node.js Binding**: Native N-API addon
41
- - **Error Details**: Rich error messages with JSON Pointer paths
61
+ - **Any `validate(obj)` workload**: 1.1x–2.7x faster than ajv on valid data
62
+ - **Batch/streaming validation**: NDJSON log processing, data pipelines (5.9x faster)
63
+ - **Schema-heavy startup**: many schemas compiled at boot (151x faster compile)
64
+ - **C/C++ embedding**: native library, no JS runtime needed
42
65
 
43
- ## Installation
66
+ ## When to use ajv
44
67
 
45
- ### Node.js
68
+ - **Error-heavy workloads** — where most data is invalid and error details matter
69
+ - **Schemas with `$ref`, `patternProperties`, `dependentSchemas`** — these bypass JS codegen and hit the slower NAPI path
46
70
 
47
- ```bash
48
- npm install ata-validator
49
- ```
71
+ ## Features
50
72
 
51
- ### CMake (C++)
73
+ - **Speculative validation**: JS codegen fast path — valid data never crosses the NAPI boundary
74
+ - **Multi-core**: Parallel validation across all CPU cores — 12.5M validations/sec
75
+ - **simdjson**: SIMD-accelerated JSON parsing at GB/s speeds, adaptive On Demand for large docs
76
+ - **RE2 regex**: Linear-time guarantees, immune to ReDoS attacks
77
+ - **V8-optimized codegen**: Destructuring batch reads, type guard elimination, property hoisting
78
+ - **Standard Schema V1**: Compatible with Fastify, tRPC, TanStack, Drizzle
79
+ - **Zero-copy paths**: Buffer and pre-padded input support — no unnecessary copies
80
+ - **C/C++ library**: Native API for non-Node.js environments
81
+ - **98.5% spec compliant**: Draft 2020-12
52
82
 
53
- ```cmake
54
- include(FetchContent)
55
- FetchContent_Declare(
56
- ata
57
- GIT_REPOSITORY https://github.com/mertcanaltin/ata.git
58
- GIT_TAG main
59
- )
60
- FetchContent_MakeAvailable(ata)
83
+ ## Installation
61
84
 
62
- target_link_libraries(your_target PRIVATE ata::ata)
85
+ ```bash
86
+ npm install ata-validator
63
87
  ```
64
88
 
65
89
  ## Usage
@@ -67,9 +91,8 @@ target_link_libraries(your_target PRIVATE ata::ata)
67
91
  ### Node.js
68
92
 
69
93
  ```javascript
70
- const { Validator, validate } = require('ata-validator');
94
+ const { Validator } = require('ata-validator');
71
95
 
72
- // Pre-compiled schema (recommended)
73
96
  const v = new Validator({
74
97
  type: 'object',
75
98
  properties: {
@@ -80,196 +103,106 @@ const v = new Validator({
80
103
  required: ['name', 'email']
81
104
  });
82
105
 
83
- // Validate JS objects directly (V8 direct traversal)
84
- const result = v.validate({ name: 'Mert', email: 'mert@example.com', age: 28 });
106
+ // Fast boolean check — JS codegen, no NAPI (1.1x faster than ajv)
107
+ v.isValidObject({ name: 'Mert', email: 'mert@example.com', age: 26 }); // true
108
+
109
+ // Full validation with error details
110
+ const result = v.validate({ name: 'Mert', email: 'mert@example.com', age: 26 });
85
111
  console.log(result.valid); // true
112
+ console.log(result.errors); // []
86
113
 
87
- // Validate JSON strings (simdjson fast path)
88
- const r = v.validateJSON('{"name": "Mert", "email": "mert@example.com"}');
89
- console.log(r.valid); // true
114
+ // JSON string validation (simdjson fast path)
115
+ v.validateJSON('{"name": "Mert", "email": "mert@example.com"}');
116
+ v.isValidJSON('{"name": "Mert", "email": "mert@example.com"}'); // true
90
117
 
91
- // Error details
92
- const r2 = v.validate({ name: '', age: -1 });
93
- console.log(r2.errors);
94
- // [{ code: 4, path: '', message: 'missing required property: email' }, ...]
118
+ // Buffer input (zero-copy, raw NAPI)
119
+ v.isValid(Buffer.from('{"name": "Mert", "email": "mert@example.com"}'));
120
+
121
+ // Parallel batch: multi-core, NDJSON (5.9x faster than ajv)
122
+ const ndjson = Buffer.from(lines.join('\n'));
123
+ v.isValidParallel(ndjson); // bool[]
124
+ v.countValid(ndjson); // number
95
125
  ```
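The batch example above uses a `lines` array that the README never defines. The sketch below shows one plausible way to build the NDJSON buffer, followed by a sequential reference that produces one boolean per line. The records, `isValidRecord`, and `validateNdjson` are hypothetical stand-ins so the snippet runs without the native addon. How `isValidParallel` treats blank lines or a trailing newline is not stated in the README.

```javascript
// Hypothetical sample records; any JSON-serializable values work.
const records = [
  { name: 'Mert', email: 'mert@example.com' },
  { name: 'Ada' }, // missing email
];

// NDJSON: one JSON document per line. JSON.stringify escapes newlines
// inside strings, so joining with '\n' cannot split a document.
const lines = records.map((r) => JSON.stringify(r));
const ndjson = Buffer.from(lines.join('\n'));

// Stand-in for a compiled validator (assumption, not ata's API).
function isValidRecord(d) {
  return typeof d === 'object' && d !== null &&
    typeof d.name === 'string' && typeof d.email === 'string';
}

// Sequential reference: one boolean per non-empty line.
function validateNdjson(buf, isValid) {
  return buf
    .toString('utf8')
    .split('\n')
    .filter((line) => line.length > 0)
    .map((line) => {
      try {
        return isValid(JSON.parse(line));
      } catch {
        return false; // unparseable line counts as invalid
      }
    });
}

const results = validateNdjson(ndjson, isValidRecord);
const validCount = results.filter(Boolean).length;
```

With the real package, the same `ndjson` buffer would presumably be passed to `v.isValidParallel(ndjson)` and `v.countValid(ndjson)` as in the example above.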
96
126
 
97
- ### Drop-in ajv Replacement
127
+ ### Standard Schema V1
98
128
 
99
- ```diff
100
- - const Ajv = require('ajv');
101
- + const Ajv = require('ata-validator/compat');
129
+ ```javascript
130
+ const v = new Validator(schema);
102
131
 
103
- const ajv = new Ajv();
104
- const validate = ajv.compile(schema);
105
- const valid = validate(data);
106
- if (!valid) console.log(validate.errors);
132
+ // Works with Fastify, tRPC, TanStack, etc.
133
+ const result = v['~standard'].validate(data);
134
+ // { value: data } on success
135
+ // { issues: [{ message, path }] } on failure
107
136
  ```
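A caller usually wants either the validated value or an error, so the result shape shown above can be unwrapped with a small helper. This is a sketch under assumptions: `fakeValidator` is a stand-in exposing the same `~standard` shape so the snippet runs without ata-validator, and `unwrap` is a hypothetical helper. The Standard Schema specification also permits `validate` to return a Promise, in which case the caller would `await` before unwrapping.

```javascript
// Stand-in exposing the Standard Schema V1 surface (not the real package).
const fakeValidator = {
  '~standard': {
    version: 1,
    vendor: 'example',
    validate(value) {
      if (typeof value === 'object' && value !== null && typeof value.name === 'string') {
        return { value };
      }
      return { issues: [{ message: 'name must be a string', path: ['name'] }] };
    },
  },
};

// Turn a synchronous result into a value or a thrown Error.
function unwrap(result) {
  if (result.issues) {
    const detail = result.issues
      .map((issue) => {
        // Path segments may be plain keys or { key } objects per the spec.
        const path = (issue.path || [])
          .map((seg) => String(typeof seg === 'object' && seg !== null ? seg.key : seg))
          .join('.');
        return path ? `${path}: ${issue.message}` : issue.message;
      })
      .join('; ');
    throw new Error(detail);
  }
  return result.value;
}

const user = unwrap(fakeValidator['~standard'].validate({ name: 'Mert' }));
```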
108
137
 
109
- ### C++
138
+ ### Fastify Plugin
110
139
 
111
- ```cpp
112
- #include "ata.h"
113
- #include <iostream>
114
-
115
- int main() {
116
- auto schema = ata::compile(R"({
117
- "type": "object",
118
- "properties": {
119
- "name": {"type": "string"},
120
- "age": {"type": "integer", "minimum": 0}
121
- },
122
- "required": ["name"]
123
- })");
124
-
125
- auto result = ata::validate(schema, R"({"name": "Mert", "age": 28})");
126
-
127
- if (result) {
128
- std::cout << "Valid!" << std::endl;
129
- } else {
130
- for (const auto& err : result.errors) {
131
- std::cout << err.path << ": " << err.message << std::endl;
132
- }
133
- }
134
- return 0;
135
- }
140
+ ```bash
141
+ npm install fastify-ata
136
142
  ```
137
143
 
138
- ### C API
144
+ ```javascript
145
+ const fastify = require('fastify')();
146
+ fastify.register(require('fastify-ata'));
139
147
 
140
- ```c
141
- #include "ata_c.h"
142
- #include <stdio.h>
143
- #include <string.h>
148
+ // All existing JSON Schema route definitions work as-is
149
+ ```
144
150
 
145
- int main(void) {
146
- const char* schema = "{\"type\":\"string\",\"minLength\":3}";
147
- ata_schema s = ata_compile(schema, strlen(schema));
151
+ ### C++
148
152
 
149
- const char* doc = "\"hello\"";
150
- ata_result r = ata_validate(s, doc, strlen(doc));
153
+ ```cpp
154
+ #include "ata.h"
151
155
 
152
- if (r.valid) {
153
- printf("Valid!\n");
154
- } else {
155
- for (size_t i = 0; i < r.error_count; i++) {
156
- ata_string msg = ata_get_error_message(i);
157
- printf("Error: %.*s\n", (int)msg.length, msg.data);
158
- }
159
- }
156
+ auto schema = ata::compile(R"({
157
+ "type": "object",
158
+ "properties": { "name": {"type": "string"} },
159
+ "required": ["name"]
160
+ })");
160
161
 
161
- ata_schema_free(s);
162
- return 0;
163
- }
162
+ auto result = ata::validate(schema, R"({"name": "Mert"})");
163
+ // result.valid == true
164
164
  ```
165
165
 
166
166
  ## Supported Keywords
167
167
 
168
168
  | Category | Keywords |
169
169
  |----------|----------|
170
- | Type | `type` (string, number, integer, boolean, null, array, object, union) |
170
+ | Type | `type` |
171
171
  | Numeric | `minimum`, `maximum`, `exclusiveMinimum`, `exclusiveMaximum`, `multipleOf` |
172
172
  | String | `minLength`, `maxLength`, `pattern`, `format` |
173
- | Array | `items`, `prefixItems`, `minItems`, `maxItems`, `uniqueItems` |
174
- | Object | `properties`, `required`, `additionalProperties`, `patternProperties`, `minProperties`, `maxProperties` |
173
+ | Array | `items`, `prefixItems`, `minItems`, `maxItems`, `uniqueItems`, `contains`, `minContains`, `maxContains` |
174
+ | Object | `properties`, `required`, `additionalProperties`, `patternProperties`, `minProperties`, `maxProperties`, `propertyNames`, `dependentRequired`, `dependentSchemas` |
175
175
  | Enum/Const | `enum`, `const` |
176
176
  | Composition | `allOf`, `anyOf`, `oneOf`, `not` |
177
177
  | Conditional | `if`, `then`, `else` |
178
178
  | References | `$ref`, `$defs`, `definitions`, `$id` |
179
- | Boolean | `true` (accept all), `false` (reject all) |
179
+ | Boolean | `true`, `false` |
180
180
 
181
- ### Format Validators
181
+ ### Format Validators (hand-written, no regex)
182
182
 
183
183
  `email`, `date`, `date-time`, `time`, `uri`, `uri-reference`, `ipv4`, `ipv6`, `uuid`, `hostname`
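As an illustration of what a hand-written, regex-free format check can look like, here is a single-pass `ipv4` validator. It is a sketch, not ata's implementation (which lives in C++). The function name and the decision to reject leading zeros such as `01.2.3.4` are assumptions.

```javascript
// Single pass over the string: no regex, no split, no allocation.
function isIPv4(s) {
  if (typeof s !== 'string') return false;
  let octets = 0; // completed octets so far
  let value = 0;  // numeric value of the current octet
  let digits = 0; // digits seen in the current octet
  for (let i = 0; i <= s.length; i++) {
    // Treat end-of-string as a final '.' so the last octet is closed
    // by the same branch as the others.
    const c = i < s.length ? s.charCodeAt(i) : 46;
    if (c === 46) { // '.'
      if (digits === 0) return false; // empty octet
      octets++;
      if (octets > 4) return false;
      value = 0;
      digits = 0;
    } else if (c >= 48 && c <= 57) { // '0'..'9'
      if (digits > 0 && value === 0) return false; // leading zero
      value = value * 10 + (c - 48);
      digits++;
      if (digits > 3 || value > 255) return false;
    } else {
      return false; // any other character
    }
  }
  return octets === 4;
}
```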
184
184
 
185
- ## Why ata over ajv?
186
-
187
- | | ata | ajv |
188
- |---|---|---|
189
- | Schema compilation | **11,000x faster** | Slow (code generation) |
190
- | JSON string validation | **2-4x faster** | JSON.parse + validate |
191
- | CSP compatible | Yes | No (`new Function()`) |
192
- | Multi-language | C, C++, Rust, Python, Go | JavaScript only |
193
- | Bundle size | ~20KB JS + native | ~150KB minified |
194
- | Node.js core candidate | Yes (like ada-url, simdutf) | No (JS dependency) |
195
-
196
185
  ## Building from Source
197
186
 
198
187
  ```bash
199
188
  # C++ library + tests
200
189
  cmake -B build
201
190
  cmake --build build
202
- ctest --test-dir build
203
-
204
- # With benchmarks
205
- cmake -B build -DATA_BENCHMARKS=ON
206
- cmake --build build
207
- ./build/ata_bench
191
+ ./build/ata_tests
208
192
 
209
193
  # Node.js addon
210
194
  npm install
211
- node test.js
212
-
213
- # Run JSON Schema Test Suite
214
- node tests/run_suite.js
215
- ```
216
-
217
- ### Build Options
218
-
219
- | Option | Default | Description |
220
- |--------|---------|-------------|
221
- | `ATA_TESTING` | `ON` | Build test suite |
222
- | `ATA_BENCHMARKS` | `OFF` | Build benchmarks |
223
- | `ATA_SANITIZE` | `OFF` | Enable address sanitizer |
195
+ npm run build
196
+ npm test
224
197
 
225
- ## API Reference
226
-
227
- ### C++ API
228
-
229
- #### `ata::compile(schema_json) -> schema_ref`
230
- Compile a JSON Schema string. Returns a reusable `schema_ref` (falsy on error).
231
-
232
- #### `ata::validate(schema_ref, json, opts) -> validation_result`
233
- Validate a JSON string against a pre-compiled schema. Pass `{.all_errors = false}` to stop at first error (faster).
234
-
235
- #### `ata::validation_result`
236
- ```cpp
237
- struct validation_result {
238
- bool valid;
239
- std::vector<validation_error> errors;
240
- explicit operator bool() const noexcept { return valid; }
241
- };
242
- ```
243
-
244
- ### Node.js API
245
-
246
- #### `new Validator(schema)`
247
- Create a validator with a pre-compiled schema. `schema` can be an object or JSON string.
248
-
249
- #### `validator.validate(data) -> { valid, errors }`
250
- Validate any JS value directly via V8 traversal (no serialization).
251
-
252
- #### `validator.validateJSON(jsonString) -> { valid, errors }`
253
- Validate a JSON string via simdjson (fastest path for string input).
254
-
255
- #### `validate(schema, data) -> { valid, errors }`
256
- One-shot validation without pre-compilation.
257
-
258
- ### ajv-compatible API (`compat.js`)
259
-
260
- ```javascript
261
- const Ata = require('ata-validator/compat');
262
- const ata = new Ata();
263
- const validate = ata.compile(schema);
264
- const valid = validate(data);
265
- if (!valid) console.log(validate.errors);
198
+ # JSON Schema Test Suite
199
+ npm run test:suite
266
200
  ```
267
201
 
268
202
  ## License
269
203
 
270
- Licensed under either of
204
+ MIT
271
205
 
272
- - Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0)
273
- - MIT license ([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT)
206
+ ## Author
274
207
 
275
- at your option.
208
+ [Mert Can Altin](https://github.com/mertcanaltin)