@docrouter/mcp 0.1.0

# DocRouter Schema Definition Manual

## Overview

DocRouter uses **OpenAI's Structured Outputs JSON Schema format** to define extraction schemas for document processing. Schemas ensure that AI-extracted data from documents follows a consistent, validated structure.

## Table of Contents

1. [Schema Format Specification](#schema-format-specification)
2. [Basic Schema Structure](#basic-schema-structure)
3. [Field Types](#field-types)
4. [Required Fields and Strict Mode](#required-fields-and-strict-mode)
5. [Advanced Schema Features](#advanced-schema-features)
6. [Best Practices](#best-practices)
7. [Examples](#examples)
8. [API Integration](#api-integration)
9. [Schema Workflow](#schema-workflow)
10. [Troubleshooting](#troubleshooting)
11. [Version Control](#version-control)
12. [References](#references)

---

## Schema Format Specification

### Root Structure

All DocRouter schemas follow this format:

```json
{
  "type": "json_schema",
  "json_schema": {
    "name": "document_extraction",
    "schema": {
      "type": "object",
      "properties": { ... },
      "required": [ ... ],
      "additionalProperties": false
    },
    "strict": true
  }
}
```

### Components

| Component | Type | Required | Description |
|-----------|------|----------|-------------|
| `type` | string | Yes | Must be `"json_schema"` |
| `json_schema` | object | Yes | Container for the schema definition |
| `json_schema.name` | string | Yes | Identifier for the schema (typically `"document_extraction"`) |
| `json_schema.schema` | object | Yes | JSON Schema describing the output; strict mode supports only a subset of JSON Schema keywords |
| `json_schema.strict` | boolean | Yes | **Must be `true`**. Enables strict mode, which constrains the output to match the schema |

### Strict Mode Constraints

When `strict: true` is enabled (mandatory for DocRouter), the following rules apply:

1. **All properties MUST be in the `required` array** - Every property is always present in the output; none can be omitted
2. **`additionalProperties: false` MUST be set** - At every level, including nested objects
3. **Schema adherence** - The output conforms to the schema, except when the model refuses the request or the response is cut off by the token limit
4. **Empty values for missing data** - Because every field must be present, the model typically fills fields it cannot find with empty strings, zeros, `false`, or empty arrays/objects; this depends on the model and prompt and is not enforced by the schema

---

## Basic Schema Structure

### Minimal Schema Example

```json
{
  "type": "json_schema",
  "json_schema": {
    "name": "document_extraction",
    "schema": {
      "type": "object",
      "properties": {
        "field_name": {
          "type": "string",
          "description": "Human-readable description of this field"
        }
      },
      "required": ["field_name"],
      "additionalProperties": false
    },
    "strict": true
  }
}
```
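
Writing `required` and `additionalProperties` by hand is easy to get wrong as a schema grows. The sketch below generates both from the field definitions and wraps the result in the root structure shown above. `make_strict` and `build_docrouter_schema` are hypothetical helpers written for illustration, not functions from the DocRouter SDK, and they only walk `properties` and array `items` (not `$ref`, `anyOf`, or `$defs`).

```python
import copy
import json
from typing import Any


def make_strict(node: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a schema node with strict-mode fields filled in.

    For every object, `required` lists all property names and
    `additionalProperties` is false. Array `items` are processed too.
    """
    node = copy.deepcopy(node)
    if node.get("type") == "object":
        props = node.setdefault("properties", {})
        node["properties"] = {k: make_strict(v) for k, v in props.items()}
        node["required"] = list(props)
        node["additionalProperties"] = False
    elif node.get("type") == "array" and isinstance(node.get("items"), dict):
        node["items"] = make_strict(node["items"])
    return node


def build_docrouter_schema(properties: dict[str, Any],
                           name: str = "document_extraction") -> dict[str, Any]:
    """Wrap field definitions in the json_schema envelope with strict: true."""
    root = make_strict({"type": "object", "properties": properties})
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "schema": root, "strict": True},
    }


if __name__ == "__main__":
    schema = build_docrouter_schema({
        "field_name": {
            "type": "string",
            "description": "Human-readable description of this field",
        },
    })
    # Prints the same structure as the minimal example above.
    print(json.dumps(schema, indent=2))
```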

### Schema Object Properties

| Property | Type | Required | Description |
|----------|------|----------|-------------|
| `type` | string | Yes | Must be `"object"` for the root schema |
| `properties` | object | Yes | Defines all extractable fields |
| `required` | array | Yes | **Must list ALL properties** when `strict: true` |
| `additionalProperties` | boolean | Yes | **Must be `false`** when `strict: true` |

---

## Field Types

DocRouter schemas support the standard JSON Schema data types:

### String Fields

```json
{
  "field_name": {
    "type": "string",
    "description": "A text field"
  }
}
```

**Use for:** Names, emails, addresses, free-text descriptions, and values whose formatting must be preserved (e.g., "$1,234.56")

### Number Fields

```json
{
  "amount": {
    "type": "number",
    "description": "A numeric value"
  }
}
```

**Use for:** Quantities, amounts, percentages, measurements

### Integer Fields

```json
{
  "count": {
    "type": "integer",
    "description": "A whole number"
  }
}
```

**Use for:** Counts, years, ages, numbers of items

### Boolean Fields

```json
{
  "is_verified": {
    "type": "boolean",
    "description": "True/false indicator"
  }
}
```

**Use for:** Yes/no questions, checkboxes, status flags

### Array Fields

```json
{
  "skills": {
    "type": "array",
    "description": "List of programming skills",
    "items": {
      "type": "string"
    }
  }
}
```

**Use for:** Lists, multiple values, repeated items

### Object Fields (Nested)

```json
{
  "address": {
    "type": "object",
    "description": "Address information",
    "properties": {
      "street": {
        "type": "string",
        "description": "Street address"
      },
      "city": {
        "type": "string",
        "description": "City name"
      },
      "postal_code": {
        "type": "string",
        "description": "Postal code"
      }
    },
    "required": ["street", "city", "postal_code"],
    "additionalProperties": false
  }
}
```

**Use for:** Grouped related fields, structured sub-data

---

## Required Fields and Strict Mode

### Strict Mode Requirements

**IMPORTANT:** When using OpenAI's Structured Outputs with `strict: true` (which DocRouter uses), **ALL properties MUST be listed in the `required` array**, including properties that will be absent from many documents. This is a mandatory requirement of OpenAI's API. (OpenAI's documentation describes emulating an optional field with a union type that includes `null`, such as `"type": ["string", "null"]`; the examples in this manual use plain types and empty values instead.) In the example below, `middle_name` is required even though many documents will not contain one:

```json
{
  "type": "object",
  "properties": {
    "name": { "type": "string", "description": "Full name" },
    "email": { "type": "string", "description": "Email address" },
    "middle_name": { "type": "string", "description": "Middle name" }
  },
  "required": ["name", "email", "middle_name"],
  "additionalProperties": false
}
```
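
The two structural rules of strict mode (every property listed in `required`, `additionalProperties: false` on every object) can be checked before a schema is uploaded. A minimal sketch in Python, assuming schemas are plain dictionaries; `strict_mode_violations` and `check_docrouter_schema` are hypothetical names, and the check only follows `properties` and array `items`.

```python
from typing import Any


def strict_mode_violations(schema: dict[str, Any], path: str = "$") -> list[str]:
    """Return strict-mode violations found in a JSON Schema node, recursively."""
    problems: list[str] = []
    if schema.get("type") == "object":
        props = schema.get("properties", {})
        # Every property must appear in `required`.
        missing = sorted(set(props) - set(schema.get("required", [])))
        if missing:
            problems.append(f"{path}: not in required: {', '.join(missing)}")
        # `additionalProperties` must be explicitly false.
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        for name, sub in props.items():
            problems.extend(strict_mode_violations(sub, f"{path}.{name}"))
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        problems.extend(strict_mode_violations(schema["items"], f"{path}[]"))
    return problems


def check_docrouter_schema(doc: dict[str, Any]) -> list[str]:
    """Check the outer json_schema envelope, then the inner schema."""
    problems: list[str] = []
    if doc.get("type") != "json_schema":
        problems.append('type must be "json_schema"')
    wrapper = doc.get("json_schema", {})
    if wrapper.get("strict") is not True:
        problems.append("json_schema.strict must be true")
    problems.extend(strict_mode_violations(wrapper.get("schema", {})))
    return problems


if __name__ == "__main__":
    schema = {
        "type": "json_schema",
        "json_schema": {
            "name": "document_extraction",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "address": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                    },
                },
                "required": ["name", "address"],
                "additionalProperties": False,
            },
        },
    }
    for problem in check_docrouter_schema(schema):
        print(problem)
    # $.address: not in required: city
    # $.address: additionalProperties must be false
```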

### How the LLM Handles Missing Data

Since all fields must be required in strict mode, the LLM typically handles missing data as follows (this is model behavior guided by the prompt, not something the schema enforces):

- **String fields**: Returns an empty string `""` if the data is not found in the document
- **Number/Integer fields**: Returns `0` if the data is not found
- **Boolean fields**: Returns `false` if the data is not found
- **Array fields**: Returns an empty array `[]` if the data is not found
- **Object fields**: Returns the object with all nested required fields filled with these empty values

Because `0` and `false` are also legitimate extracted values, an application cannot always tell "not found" apart from a real zero or "no".

**Best Practice:** Design your schema knowing that all fields will always be present in the response, but may contain empty values when the data is not found in the document.
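
In application code, the empty values listed above can be flagged as "possibly not found". A minimal Python sketch; `is_empty` and `missing_fields` are hypothetical helpers, and since `0` and `False` may be real values, fields where those matter deserve a field-specific check instead.

```python
from typing import Any


def is_empty(value: Any) -> bool:
    """True if a value looks like a placeholder for missing data.

    Objects count as empty when all of their fields are empty.
    """
    if isinstance(value, dict):
        return all(is_empty(v) for v in value.values())
    if isinstance(value, str):
        return value.strip() == ""
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return value is False
    if isinstance(value, (int, float)):
        return value == 0
    if isinstance(value, list):
        return len(value) == 0
    return value is None


def missing_fields(result: dict[str, Any]) -> list[str]:
    """List top-level fields whose extracted value looks empty."""
    return [name for name, value in result.items() if is_empty(value)]


# Hypothetical extraction result for the name/email/middle_name schema.
result = {
    "name": "Ada Lovelace",
    "email": "",
    "middle_name": "",
    "address": {"street": "", "city": "", "postal_code": ""},
}
print(missing_fields(result))  # ['email', 'middle_name', 'address']
```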

---

## Advanced Schema Features

**⚠️ PORTABILITY WARNING:** The constraint keywords below (enums, patterns, numeric ranges, array size and uniqueness) are **not recommended** for DocRouter schemas. OpenAI's strict mode accepts some of them (the list of supported keywords has changed over time), but other LLM providers (Anthropic Claude, Google Gemini, etc.) may ignore or reject them. For maximum compatibility and reliability:

- **Use basic types only**: string, number, integer, boolean, array, object
- **Avoid** enums, patterns, minimum/maximum, minItems/maxItems, uniqueItems
- **Handle validation in your application code** instead of in the schema
- **Use detailed descriptions** to guide the LLM rather than strict constraints

### Enums (Restricted Values) - ⚠️ NOT RECOMMENDED

OpenAI supports restricting a field to specific values, but this feature may not work with other LLM providers:

```json
{
  "document_type": {
    "type": "string",
    "description": "Type of document",
    "enum": ["invoice", "receipt", "contract", "bill"]
  }
}
```

**Better approach:**
```json
{
  "document_type": {
    "type": "string",
    "description": "Type of document: one of invoice, receipt, contract, or bill"
  }
}
```
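
With the description-only approach, the allowed values are checked after extraction. A sketch in Python; `DOCUMENT_TYPES` and `normalize_document_type` are illustrative names. Folding case and whitespace is a design choice, since the model may return "Invoice" where the description lists "invoice".

```python
from typing import Optional

# Allowed values, kept in application code instead of a schema "enum".
DOCUMENT_TYPES = frozenset({"invoice", "receipt", "contract", "bill"})


def normalize_document_type(raw: str) -> Optional[str]:
    """Return the canonical document type, or None if it is not allowed.

    An empty string (the usual "not found" value) also yields None.
    """
    value = raw.strip().lower()
    return value if value in DOCUMENT_TYPES else None


print(normalize_document_type(" Invoice "))       # invoice
print(normalize_document_type("purchase order"))  # None
```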

### String Patterns - ⚠️ NOT RECOMMENDED

`pattern` is a standard JSON Schema keyword, but support for enforcing regular expressions during generation varies between providers, which reduces portability:

```json
{
  "phone": {
    "type": "string",
    "description": "Phone number in E.164 format",
    "pattern": "^\\+[1-9]\\d{1,14}$"
  }
}
```

**Better approach:**
```json
{
  "phone": {
    "type": "string",
    "description": "Phone number in E.164 format (e.g., +1234567890)"
  }
}
```
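
The same regular expression can be applied in application code after extraction. A minimal Python sketch; `to_e164` is a hypothetical helper, and stripping spaces, dashes, dots, and parentheses first is an assumption about how numbers appear in source documents.

```python
import re
from typing import Optional

# Equivalent to the schema pattern above; fullmatch() supplies the ^...$ anchors.
E164 = re.compile(r"\+[1-9]\d{1,14}")


def to_e164(raw: str) -> Optional[str]:
    """Remove common separators and return the number if it is valid E.164."""
    compact = re.sub(r"[\s\-().]", "", raw)
    return compact if E164.fullmatch(compact) else None


print(to_e164("+1 (234) 567-890"))  # +1234567890
print(to_e164("234-567-890"))       # None (no leading +)
```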

### Number Constraints - ⚠️ NOT RECOMMENDED

Minimum, maximum, and multipleOf constraints may not be portable:

```json
{
  "age": {
    "type": "integer",
    "description": "Age in years",
    "minimum": 0,
    "maximum": 150
  },
  "price": {
    "type": "number",
    "description": "Price in USD",
    "minimum": 0,
    "multipleOf": 0.01
  }
}
```

**Better approach:**
```json
{
  "age": {
    "type": "integer",
    "description": "Age in years (0-150)"
  },
  "price": {
    "type": "number",
    "description": "Price in USD (e.g., 19.99)"
  }
}
```
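
The dropped range and step constraints can be enforced after extraction instead. A sketch in Python with hypothetical helper names; it rounds prices to whole cents rather than rejecting values that are not a multiple of 0.01, which is a design choice. Note that an age of `0` also passes the range check, so it does not distinguish "not found" from a real value.

```python
from decimal import Decimal, ROUND_HALF_UP


def is_valid_age(age: int) -> bool:
    """Application-side replacement for "minimum": 0, "maximum": 150."""
    return 0 <= age <= 150


def normalize_price(price: float) -> Decimal:
    """Replacement for "minimum": 0 and "multipleOf": 0.01.

    Rejects negative prices and rounds to whole cents. Converting through
    str() keeps the shortest decimal form of the float (19.99, not 19.9899...).
    """
    if price < 0:
        raise ValueError(f"negative price: {price}")
    return Decimal(str(price)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(is_valid_age(42), is_valid_age(200))  # True False
print(normalize_price(19.99))               # 19.99
```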

### Array Constraints - ⚠️ NOT RECOMMENDED

Array size and uniqueness constraints are not universally supported:

```json
{
  "tags": {
    "type": "array",
    "description": "Document tags",
    "items": {
      "type": "string"
    },
    "minItems": 1,
    "maxItems": 10,
    "uniqueItems": true
  }
}
```

**Better approach:**
```json
{
  "tags": {
    "type": "array",
    "description": "Document tags (1-10 unique tags)",
    "items": {
      "type": "string"
    }
  }
}
```
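
Uniqueness and the upper size limit can be applied to the returned list in application code. A sketch in Python; `clean_tags` is a hypothetical helper that treats tags differing only in case or surrounding whitespace as duplicates (an assumption) and keeps the first spelling seen. The `minItems: 1` rule becomes a simple emptiness check on the result.

```python
def clean_tags(tags: list[str], max_items: int = 10) -> list[str]:
    """Application-side replacement for uniqueItems and maxItems.

    Drops blank tags, removes case-insensitive duplicates while keeping the
    first spelling seen, and keeps at most max_items tags.
    """
    seen: set[str] = set()
    result: list[str] = []
    for tag in tags:
        tag = tag.strip()
        key = tag.lower()
        if tag and key not in seen:
            seen.add(key)
            result.append(tag)
    return result[:max_items]


tags = clean_tags(["Finance", "finance ", "", "Q3"])
print(tags)  # ['Finance', 'Q3']
if not tags:  # replacement for "minItems": 1
    print("no tags extracted")
```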

### Complex Nested Objects

Arrays of objects use only the basic types, so they remain portable. Each nested object needs its own `required` list and `additionalProperties: false`:

```json
{
  "work_history": {
    "type": "array",
    "description": "Employment history",
    "items": {
      "type": "object",
      "properties": {
        "company": {
          "type": "string",
          "description": "Company name"
        },
        "position": {
          "type": "string",
          "description": "Job title"
        },
        "start_date": {
          "type": "string",
          "description": "Start date (YYYY-MM-DD)"
        },
        "end_date": {
          "type": "string",
          "description": "End date (YYYY-MM-DD or 'Present')"
        },
        "responsibilities": {
          "type": "array",
          "items": {
            "type": "string"
          },
          "description": "List of job responsibilities"
        }
      },
      "required": ["company", "position", "start_date", "end_date", "responsibilities"],
      "additionalProperties": false
    }
  }
}
```
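
Because every `work_history` element arrives with all five keys present, each one can be mapped onto a typed record without key checks. A sketch in Python, assuming dates follow the `YYYY-MM-DD` convention from the descriptions; `Job` and `parse_work_history` are hypothetical names. Empty or malformed dates become `None` instead of raising, a lenient choice that suits extraction output.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Optional


@dataclass
class Job:
    company: str
    position: str
    start: Optional[date]   # None when empty or not YYYY-MM-DD
    end: Optional[date]     # None when ongoing, empty, or malformed
    ongoing: bool           # True when end_date is "Present"
    responsibilities: list[str]


def _parse_date(raw: str) -> Optional[date]:
    try:
        return date.fromisoformat(raw.strip())
    except ValueError:
        return None


def parse_work_history(items: list[dict[str, Any]]) -> list[Job]:
    jobs: list[Job] = []
    for item in items:
        ongoing = item["end_date"].strip().lower() == "present"
        jobs.append(Job(
            company=item["company"],
            position=item["position"],
            start=_parse_date(item["start_date"]),
            end=None if ongoing else _parse_date(item["end_date"]),
            ongoing=ongoing,
            responsibilities=item["responsibilities"],
        ))
    return jobs


jobs = parse_work_history([{
    "company": "Acme", "position": "Engineer",
    "start_date": "2021-03-01", "end_date": "Present",
    "responsibilities": ["Built APIs"],
}])
print(jobs[0].ongoing, jobs[0].start)  # True 2021-03-01
```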

---

## Best Practices

### 1. Use Clear, Descriptive Field Names

**Good:**
```json
"current_academic_program": { "type": "string", "description": "Current degree program" }
```

**Avoid:**
```json
"prog": { "type": "string", "description": "program" }
```

### 2. Provide Detailed Descriptions

Descriptions guide the LLM on what to extract:

**Good:**
```json
{
  "total_amount": {
    "type": "string",
    "description": "Total invoice amount including tax, with currency symbol and commas (e.g., $1,234.56)"
  }
}
```

**Avoid:**
```json
{
  "total_amount": {
    "type": "string",
    "description": "total"
  }
}
```

### 3. Choose Appropriate Field Types

- Use **string** for currency values with formatting (e.g., "$1,234.56")
- Use **number** for numeric calculations
- Use **array** for multiple items instead of comma-separated strings
- Use **object** to group related fields
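
When a formatted currency string is later needed for arithmetic, it can be converted in application code. A minimal Python sketch; `parse_amount` is a hypothetical helper that assumes "." is the decimal separator, and it returns `Decimal` rather than `float` to avoid binary rounding errors in money calculations.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Optional


def parse_amount(text: str) -> Optional[Decimal]:
    """Convert a formatted amount such as "$1,234.56" to Decimal("1234.56").

    Removes everything except digits, "." and "-", so "1.234,56"-style
    amounts are not handled. Returns None for empty or unparseable text.
    """
    cleaned = re.sub(r"[^\d.\-]", "", text)
    if not cleaned:
        return None
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


print(parse_amount("$1,234.56"))  # 1234.56
print(parse_amount(""))           # None
```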

### 4. Set `additionalProperties: false`

Prevent the LLM from adding unexpected fields:

```json
{
  "type": "object",
  "properties": { ... },
  "additionalProperties": false
}
```

### 5. All Fields Must Be Required (Strict Mode)

- **ALL fields must be listed in the `required` array** when using `strict: true`
- The LLM typically returns empty values for fields not found in the document
- Design your application logic to handle empty values gracefully
- Strict mode has no optional (omittable) fields; this is an OpenAI API requirement

### 6. Avoid Advanced Constraints for Portability

For maximum portability across LLM providers (OpenAI, Anthropic, Gemini, etc.):

- **Use basic types only** and avoid enums, patterns, and min/max constraints
- **Put constraints in descriptions** instead: `"Status (paid, unpaid, overdue, or cancelled)"`
- **Validate data in your application** rather than in the schema
- This keeps your schemas working consistently across all supported LLM providers

**Not recommended:**
```json
{
  "invoice_status": {
    "type": "string",
    "enum": ["paid", "unpaid", "overdue", "cancelled"]
  }
}
```

**Recommended:**
```json
{
  "invoice_status": {
    "type": "string",
    "description": "Invoice status (paid, unpaid, overdue, or cancelled)"
  }
}
```

### 7. Document Your Schema

Include clear descriptions that explain:
- What data to extract
- Expected format
- How to handle edge cases

---

## Examples

### Example 1: Invoice Schema

```json
{
  "type": "json_schema",
  "json_schema": {
    "name": "document_extraction",
    "schema": {
      "type": "object",
      "properties": {
        "invoice_number": {
          "type": "string",
          "description": "Unique invoice identifier"
        },
        "invoice_date": {
          "type": "string",
          "description": "Date of invoice in YYYY-MM-DD format"
        },
        "vendor_name": {
          "type": "string",
          "description": "Name of the vendor/supplier"
        },
        "vendor_address": {
          "type": "string",
          "description": "Complete vendor address"
        },
        "customer_name": {
          "type": "string",
          "description": "Name of the customer/buyer"
        },
        "line_items": {
          "type": "array",
          "description": "List of items on the invoice",
          "items": {
            "type": "object",
            "properties": {
              "description": {
                "type": "string",
                "description": "Item description"
              },
              "quantity": {
                "type": "string",
                "description": "Quantity ordered"
              },
              "unit_price": {
                "type": "string",
                "description": "Price per unit with currency"
              },
              "total": {
                "type": "string",
                "description": "Line total with currency"
              }
            },
            "required": ["description", "quantity", "unit_price", "total"],
            "additionalProperties": false
          }
        },
        "subtotal": {
          "type": "string",
          "description": "Subtotal before tax with currency"
        },
        "tax_amount": {
          "type": "string",
          "description": "Tax amount with currency"
        },
        "total_amount": {
          "type": "string",
          "description": "Total amount due with currency"
        },
        "payment_terms": {
          "type": "string",
          "description": "Payment terms (e.g., Net 30, Due on Receipt)"
        }
      },
      "required": [
        "invoice_number",
        "invoice_date",
        "vendor_name",
        "vendor_address",
        "customer_name",
        "line_items",
        "subtotal",
        "tax_amount",
        "total_amount",
        "payment_terms"
      ],
      "additionalProperties": false
    },
    "strict": true
  }
}
```
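
Since this schema keeps amounts as formatted strings, arithmetic consistency is left to application code. The sketch below cross-checks line totals against the subtotal, and subtotal plus tax against the total. It reads only `line_items[].total`, `subtotal`, `tax_amount`, and `total_amount`; `money` and `invoice_problems` are hypothetical helpers, and the one-cent tolerance is an assumption.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Any


def money(text: str) -> Decimal:
    """Parse "$1,234.56"-style strings; empty or unparseable text counts as 0.

    Assumes "." is the decimal separator and ignores currency symbols/commas.
    """
    cleaned = re.sub(r"[^\d.\-]", "", text)
    try:
        return Decimal(cleaned) if cleaned else Decimal("0")
    except InvalidOperation:
        return Decimal("0")


def invoice_problems(inv: dict[str, Any],
                     tolerance: Decimal = Decimal("0.01")) -> list[str]:
    """Cross-check the totals in a result that follows the invoice schema."""
    problems: list[str] = []
    line_sum = sum((money(item["total"]) for item in inv["line_items"]),
                   Decimal("0"))
    subtotal = money(inv["subtotal"])
    if abs(line_sum - subtotal) > tolerance:
        problems.append(f"line items sum to {line_sum}, subtotal is {subtotal}")
    expected_total = subtotal + money(inv["tax_amount"])
    total = money(inv["total_amount"])
    if abs(expected_total - total) > tolerance:
        problems.append(f"subtotal + tax is {expected_total}, total is {total}")
    return problems


# Hypothetical extraction result (only the fields the check reads).
invoice = {
    "line_items": [
        {"description": "Widget", "quantity": "2",
         "unit_price": "$10.00", "total": "$20.00"},
        {"description": "Gadget", "quantity": "1",
         "unit_price": "$5.50", "total": "$5.50"},
    ],
    "subtotal": "$25.50",
    "tax_amount": "$2.04",
    "total_amount": "$27.54",
}
print(invoice_problems(invoice))  # []
```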

### Example 2: Resume/CV Schema

This example uses Title Case property names containing spaces, such as `"Current Academic Program"`. These are valid JSON keys, though the snake_case names recommended under Best Practices are easier to reference from code. It also stores `Programming Languages` as a comma-separated string rather than the array recommended there.

```json
{
  "type": "json_schema",
  "json_schema": {
    "name": "document_extraction",
    "schema": {
      "type": "object",
      "properties": {
        "Name": {
          "type": "string",
          "description": "Candidate's full name"
        },
        "Email": {
          "type": "string",
          "description": "Email address"
        },
        "Telephone": {
          "type": "string",
          "description": "Phone number"
        },
        "Current Academic Program": {
          "type": "string",
          "description": "Current degree program (e.g., MEng Computing)"
        },
        "Current Grade": {
          "type": "string",
          "description": "Academic year or GPA/grade information"
        },
        "High School Qualification": {
          "type": "string",
          "description": "A-levels, GCSEs, or equivalent qualifications"
        },
        "Programming Languages": {
          "type": "string",
          "description": "Comma-separated list of programming languages"
        },
        "Experiences": {
          "type": "string",
          "description": "Professional or research experiences"
        },
        "Projects": {
          "type": "string",
          "description": "Academic or personal projects with descriptions"
        },
        "Awards": {
          "type": "string",
          "description": "Academic awards, honors, competition placements"
        },
        "Work Experience": {
          "type": "string",
          "description": "Employment history with companies and roles"
        },
        "Extracurricular": {
          "type": "string",
          "description": "Clubs, hobbies, volunteer work, sports"
        },
        "Languages": {
          "type": "string",
          "description": "Spoken languages and proficiency levels"
        }
      },
      "required": [
        "Name",
        "Email",
        "Telephone",
        "Current Academic Program",
        "Current Grade",
        "High School Qualification",
        "Programming Languages",
        "Experiences",
        "Projects",
        "Awards",
        "Work Experience",
        "Extracurricular",
        "Languages"
      ],
      "additionalProperties": false
    },
    "strict": true
  }
}
```
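
The `Programming Languages` field in this schema is a single comma-separated string, so code that needs individual items has to split it. A minimal Python sketch; `split_list` is a hypothetical helper, and accepting semicolons as separators is an assumption. It cannot handle items that themselves contain commas, which is one reason to prefer an array of strings for new schemas.

```python
import re


def split_list(text: str) -> list[str]:
    """Split a comma-separated field such as "Programming Languages".

    Also accepts semicolons, trims whitespace, and drops empty entries, so an
    empty string (the usual "not found" value) becomes an empty list.
    """
    return [part.strip() for part in re.split(r"[,;]", text) if part.strip()]


print(split_list("Python, C++; Java,  "))  # ['Python', 'C++', 'Java']
```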

### Example 3: Financial Statement Schema

```json
{
  "type": "json_schema",
  "json_schema": {
    "name": "document_extraction",
    "schema": {
      "type": "object",
      "properties": {
        "net_interest_income": {
          "type": "string",
          "description": "Net interest income in thousands with formatting"
        },
        "net_fee_and_commission_income": {
          "type": "string",
          "description": "Net fee and commission income"
        },
        "other_operating_income": {
          "type": "string",
          "description": "Other operating income"
        },
        "credit_loss_expense": {
          "type": "string",
          "description": "Credit loss expense (negative values in parentheses)"
        },
        "net_operating_income": {
          "type": "string",
          "description": "Net operating income"
        },
        "personnel_expenses": {
          "type": "string",
          "description": "Personnel expenses"
        },
        "other_operating_expenses": {
          "type": "string",
          "description": "Other operating expenses"
        },
        "total_expenses": {
          "type": "string",
          "description": "Total expenses"
        },
        "profit_loss_before_tax": {
          "type": "string",
          "description": "Profit/loss before tax"
        },
        "tax_expense_credit": {
          "type": "string",
          "description": "Tax expense or credit"
        },
        "profit_loss_for_the_year": {
          "type": "string",
          "description": "Final profit/loss for the year"
        }
      },
      "required": [
        "net_interest_income",
        "net_fee_and_commission_income",
        "other_operating_income",
        "credit_loss_expense",
        "net_operating_income",
        "personnel_expenses",
        "other_operating_expenses",
        "total_expenses",
        "profit_loss_before_tax",
        "tax_expense_credit",
        "profit_loss_for_the_year"
      ],
      "additionalProperties": false
    },
    "strict": true
  }
}
```
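
Financial statements often write negative values in parentheses and report figures in thousands, as the descriptions in this schema anticipate. A sketch in Python for turning such strings into numbers; `parse_statement_amount` is a hypothetical helper, and its `scale` parameter is an assumption about how the caller knows the reporting unit.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Optional


def parse_statement_amount(text: str, scale: int = 1) -> Optional[Decimal]:
    """Parse figures such as "1,234", "(567)" or "-89" from a financial statement.

    Parentheses mark negative values. `scale` converts units, for example
    scale=1000 when the statement reports figures in thousands.
    Returns None for empty or unparseable text.
    """
    s = text.strip()
    negative = s.startswith("(") and s.endswith(")")
    cleaned = re.sub(r"[^\d.\-]", "", s)
    if not cleaned:
        return None
    try:
        value = Decimal(cleaned)
    except InvalidOperation:
        return None
    return (-value if negative else value) * scale


print(parse_statement_amount("(1,234)"))             # -1234
print(parse_statement_amount("12,500", scale=1000))  # 12500000
```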

---

## API Integration

DocRouter provides multiple ways to work with schemas programmatically:

- **TypeScript/JavaScript SDK** - Type-safe client library for Node.js and browsers (see `packages/typescript/docrouter-sdk/`)
- **Python SDK** - Type-safe Python client library (see `packages/docrouter_sdk/`)
- **REST API** - Direct HTTP requests (see the API documentation for endpoints)
- **MCP (Model Context Protocol)** - Integration with AI assistants such as Claude Code

All methods support the same schema operations: create, list, retrieve, update, delete, and validate against schemas.

---

## Schema Workflow

### 1. Design Phase
- Identify the document type and the key fields to extract
- Choose appropriate data types for each field
- Design nested structures for complex data
- Remember: ALL fields will be required in strict mode

### 2. Creation Phase
- Create the schema using the API or UI
- Test with sample documents
- Iterate based on extraction results

### 3. Prompt Integration
- Link the schema to an extraction prompt
- Configure the LLM model (e.g., gpt-4o-mini, gemini-2.0-flash)
- Associate the prompt with document tags for automatic processing

### 4. Processing Phase
- Upload documents with the appropriate tags
- The LLM extracts data according to the schema
- Results are available via the `getLLMResult` API

### 5. Validation Phase
- Review the extracted data
- Verify it against the schema requirements
- Mark results as verified when they are accurate

---

## Troubleshooting

### Common Issues

**Issue:** LLM returns empty strings for all fields
- **Solution:** Check the prompt content, make sure it references the schema, and verify that the document has OCR text

**Issue:** Extra fields appear in extraction
- **Solution:** Make sure `additionalProperties: false` is set in the schema

**Issue:** Error "all properties must be required when strict is true"
- **Solution:** Make sure ALL properties are listed in the `required` array at every level, including nested objects

**Issue:** Error "additionalProperties must be false when strict is true"
- **Solution:** Set `additionalProperties: false` on every object in the schema, including nested objects

**Issue:** Required fields missing from extraction
- **Solution:** With `strict: true`, this should only happen when the model refused the request or the response was cut off by the token limit. Otherwise, verify that the schema matches exactly and check the field names

**Issue:** Number fields returned as strings
- **Solution:** A field is returned as a string when the schema declares it as `string`. Use `string` for formatted numbers (with commas or currency symbols) and `number` for values used in calculations

**Issue:** Array fields contain a single concatenated string
- **Solution:** Update the field description or prompt to instruct the LLM explicitly to return one array element per item

---

## Version Control

DocRouter maintains schema versioning:

- Each schema update creates a new version
- `schema_version` increments with each change
- `schema_revid` uniquely identifies each version
- Previous versions remain accessible for historical extractions

---

## References

- [JSON Schema Specification](https://json-schema.org/)
- [OpenAI Structured Outputs Documentation](https://platform.openai.com/docs/guides/structured-outputs)
- [DocRouter API Documentation](../README.md)

---

**Document Version:** 1.0
**Last Updated:** 2025-10-11
**Maintained by:** DocRouter Development Team