npm - parsefy - Versions diffs - 1.0.1 → 1.0.3 - Mend

parsefy 1.0.1 → 1.0.3

This diff shows the contents of publicly available package versions released to one of the supported registries. It is provided for informational purposes only and reflects the changes between versions as they appear in their public registries.

Files changed (7)

package/README.md CHANGED

@@ -1,8 +1,16 @@
-# Parsefy
+<p align="center">
+  <img src="/assets/logo.png" alt="Parsefy Logo" width="120" />
+</p>
-Official TypeScript SDK for [Parsefy](https://parsefy.io) – AI-powered document data extraction.
+<h1 align="center">Parsefy TypeScript / JavaScript SDK</h1>
-Extract structured data from PDFs and DOCX files using Zod schemas with full TypeScript type inference.
+<p align="center">
+  <strong>Official TypeScript / JavaScript SDK for Parsefy - Financial Document Infrastructure for Developers</strong><br>
+Turn financial PDFs (invoices, receipts, bills) into structured JSON with validation and risk signals.
+</p>
+---
 ## Installation
@@ -19,21 +27,142 @@ import * as z from 'zod';
 const client = new Parsefy('pk_your_api_key');
 const schema = z.object({
+  // REQUIRED - triggers fallback if below confidence threshold
   invoice_number: z.string().describe('The invoice number'),
-  date: z.string().describe('Invoice date in YYYY-MM-DD format'),
-  total: z.number().describe('Total amount'),
+  total: z.number().describe('Total amount including tax'),
+  // OPTIONAL - won't trigger fallback if missing or low confidence
+  vendor: z.string().optional().describe('Vendor name'),
+  due_date: z.string().optional().describe('Payment due date'),
 });
-const { object, error } = await client.extract({
+const { object, metadata, error } = await client.extract({
   file: './invoice.pdf',
   schema,
 });
 if (!error && object) {
   console.log(object.invoice_number); // Fully typed!
+  // Access field-level confidence and evidence
+  console.log(`Overall confidence: ${metadata.confidenceScore}`);
+  metadata.fieldConfidence.forEach((fc) => {
+    console.log(`${fc.field}: ${fc.score} (${fc.reason}) - "${fc.text}"`);
+  });
 }
 ```
+## ⚠️ Required vs Optional Fields (Important for Billing)
+**All fields are required by default.** This is critical to understand:
+| User writes (SDK) | SDK converts to (JSON Schema) | API interprets as |
+|-------------------|-------------------------------|-------------------|
+| `name: z.string()` | `required: ["name"]` | **Required** |
+| `name: z.string().optional()` | `required: []` | **Optional** |
+### Why This Matters
+If a **required** field returns `null` or falls below the `confidenceThreshold`, the API triggers the **fallback model** (Tier 2), which is more expensive.
+**To avoid unexpected high billing:**
+```typescript
+const schema = z.object({
+  // REQUIRED - Always present on invoices, keep required
+  invoice_number: z.string(),
+  total: z.number(),
+  // OPTIONAL - May not appear on all documents, mark optional!
+  vendor: z.string().optional(),      // Not all invoices have vendor name
+  tax_id: z.string().optional(),      // Rarely present
+  notes: z.string().optional(),       // Usually empty
+  due_date: z.string().optional(),    // Sometimes missing
+});
+```
+**Rule of thumb:** If a field might be missing in >20% of your documents, mark it `.optional()`.
+## Confidence Threshold
+Control when the fallback model is triggered:
+```typescript
+const { object, metadata } = await client.extract({
+  file: './invoice.pdf',
+  schema,
+  confidenceThreshold: 0.85, // default
+});
+```
+| Threshold | Behavior |
+|-----------|----------|
+| Lower (e.g., 0.70) | **Faster** – Accepts Tier 1 results more often |
+| Higher (e.g., 0.95) | **More accurate** – Triggers Tier 2 fallback more often |
+**Default:** `0.85`
+## Response Format
+```typescript
+interface ExtractResult<T> {
+  // Extracted data matching your schema, or null if extraction failed
+  object: T | null;
+  // Metadata about the extraction
+  metadata: {
+    processingTimeMs: number;     // Processing time in milliseconds
+    inputTokens: number;          // Input tokens used
+    outputTokens: number;         // Output tokens generated
+    credits: number;              // Credits consumed (1 credit = 1 page)
+    fallbackTriggered: boolean;   // Whether fallback model was used
+    // 🆕 Field-level confidence and evidence
+    confidenceScore: number;      // Overall confidence (0.0 to 1.0)
+    fieldConfidence: Array<{
+      field: string;              // JSON path (e.g., "$.invoice_number")
+      score: number;              // Confidence score (0.0 to 1.0)
+      reason: string;             // "Exact match", "Inferred from header", etc.
+      page: number;               // Page number where found
+      text: string;               // Source text evidence
+    }>;
+    issues: string[];             // Warnings or anomalies detected
+  };
+  // Error details if extraction failed
+  error: {
+    code: string;
+    message: string;
+  } | null;
+}
+```
+### Example Response
+```typescript
+const { object, metadata } = await client.extract({ file, schema });
+// object:
+{
+  invoice_number: "INV-2024-0042",
+  date: "2024-01-15",
+  total: 1250.00,
+  vendor: "Acme Corp"
+}
+// metadata.confidenceScore: 0.94
+// metadata.fieldConfidence:
+[
+  { field: "$.invoice_number", score: 0.98, reason: "Exact match", page: 1, text: "Invoice # INV-2024-0042" },
+  { field: "$.date", score: 0.95, reason: "Exact match", page: 1, text: "Date: 01/15/2024" },
+  { field: "$.total", score: 0.92, reason: "Formatting ambiguous", page: 1, text: "Total: $1,250.00" },
+  { field: "$.vendor", score: 0.90, reason: "Inferred from header", page: 1, text: "Acme Corp" }
+]
+// metadata.issues: []
+```
 ## Configuration
 ### API Key
@@ -60,35 +189,19 @@ const client = new Parsefy({
 | `apiKey` | `string` | `process.env.PARSEFY_API_KEY` | Your Parsefy API key |
 | `timeout` | `number` | `60000` | Request timeout in ms |
-## Usage
-### Basic Extraction
-```typescript
-import { Parsefy } from 'parsefy';
-import * as z from 'zod';
-const client = new Parsefy();
-const schema = z.object({
-  name: z.string(),
-  email: z.string().email(),
-  phone: z.string().optional(),
-});
+### Extract Options
-const { object, metadata, error } = await client.extract({
-  file: './contact.pdf',
-  schema,
-});
+| Option | Type | Default | Description |
+|--------|------|---------|-------------|
+| `file` | `File \| Blob \| Buffer \| string` | required | Document to extract from |
+| `schema` | `z.ZodType` | required | Zod schema defining extraction structure |
+| `confidenceThreshold` | `number` | `0.85` | Minimum confidence before triggering fallback |
-if (!error) {
-  console.log(object);
-}
-```
+## Usage
 ### File Input Options
-The SDK supports multiple file input types. **Files don't need to be on disk** – you can work entirely in memory, which is ideal for building APIs and serverless functions.
+The SDK supports multiple file input types. **Files don't need to be on disk** – you can work entirely in memory.
 | Input Type | Usage | Environment |
 |------------|-------|-------------|
@@ -96,19 +209,18 @@ The SDK supports multiple file input types. **Files don't need to be on disk**
 | `Buffer` | In-memory bytes | Node.js |
 | `File` | From file input or FormData | Browser, Node.js 20+, Edge |
 | `Blob` | Raw binary with MIME type | Universal |
-| `ArrayBuffer` | Wrap in `Blob` first | Universal |
 ```typescript
-// Node.js: File path (convenience for scripts/CLI)
+// Node.js: File path
 const result = await client.extract({
-  file: './document.pdf',
+  file: './invoice.pdf',
   schema,
 });
 // Node.js: Buffer (in-memory)
 import { readFileSync } from 'fs';
 const result = await client.extract({
-  file: readFileSync('./document.pdf'),
+  file: readFileSync('./invoice.pdf'),
   schema,
 });
@@ -118,18 +230,43 @@ const result = await client.extract({
   file: fileInput.files[0],
   schema,
 });
+```
-// Universal: Blob (with explicit MIME type)
-const result = await client.extract({
-  file: new Blob([arrayBuffer], { type: 'application/pdf' }),
-  schema,
+### Complex Schemas for Financial Documents
+Use `.describe()` to guide the AI extraction:
+```typescript
+const invoiceSchema = z.object({
+  // REQUIRED - Core financial data
+  invoice_number: z.string().describe('The invoice or receipt number'),
+  date: z.string().describe('Invoice date in YYYY-MM-DD format'),
+  total: z.number().describe('Total amount due including tax'),
+  currency: z.string().describe('3-letter currency code (USD, EUR, etc.)'),
+  // REQUIRED - Line items (usually present)
+  line_items: z.array(z.object({
+    description: z.string().describe('Item description'),
+    quantity: z.number().describe('Number of units'),
+    unit_price: z.number().describe('Price per unit'),
+    amount: z.number().describe('Total amount for this line'),
+  })).describe('List of items on the invoice'),
+  // OPTIONAL - May not appear on all invoices
+  vendor: z.object({
+    name: z.string().describe('Company name of the vendor'),
+    address: z.string().optional().describe('Full address'),
+    tax_id: z.string().optional().describe('Tax ID or VAT number'),
+  }).optional(),
+  subtotal: z.number().optional().describe('Subtotal before tax'),
+  tax: z.number().optional().describe('Tax amount'),
+  due_date: z.string().optional().describe('Payment due date'),
+  payment_terms: z.string().optional().describe('e.g., Net 30'),
 });
 ```
 ### Server-Side / API Usage
-When building APIs that receive file uploads, files are typically kept in memory. The SDK handles this seamlessly:
 **Express with Multer:**
 ```typescript
@@ -137,38 +274,22 @@ import express from 'express';
 import multer from 'multer';
 import { Parsefy } from 'parsefy';
-const upload = multer(); // Store in memory, not disk
+const upload = multer(); // Store in memory
 const client = new Parsefy();
 app.post('/extract', upload.single('document'), async (req, res) => {
-  const { object, error } = await client.extract({
-    file: req.file.buffer, // Buffer from multer
+  const { object, metadata, error } = await client.extract({
+    file: req.file.buffer,
     schema,
+    confidenceThreshold: 0.80, // Adjust based on your needs
   });
-  res.json({ data: object, error });
-});
-```
-**Fastify:**
-```typescript
-import Fastify from 'fastify';
-import multipart from '@fastify/multipart';
-import { Parsefy } from 'parsefy';
-const fastify = Fastify();
-await fastify.register(multipart);
-const client = new Parsefy();
-fastify.post('/extract', async (request) => {
-  const data = await request.file();
-  const buffer = await data.toBuffer();
-  const { object, error } = await client.extract({
-    file: buffer,
-    schema,
+  res.json({
+    data: object,
+    confidence: metadata.confidenceScore,
+    fieldDetails: metadata.fieldConfidence,
+    error,
   });
-  return { data: object, error };
 });
 ```
@@ -183,43 +304,19 @@ const client = new Parsefy();
 app.post('/extract', async (c) => {
   const formData = await c.req.formData();
-  const file = formData.get('document'); // File object
+  const file = formData.get('document');
-  const { object, error } = await client.extract({
-    file, // File from FormData works directly
+  const { object, metadata, error } = await client.extract({
+    file,
     schema,
   });
-  return c.json({ data: object, error });
-});
-```
-### Complex Schemas
-Use `.describe()` to guide the AI extraction:
-```typescript
-const invoiceSchema = z.object({
-  invoice_number: z.string().describe('The invoice or receipt number'),
-  date: z.string().describe('Invoice date in YYYY-MM-DD format'),
-  vendor: z.object({
-    name: z.string().describe('Company name of the vendor'),
-    address: z.string().describe('Full address of the vendor'),
-  }),
-  line_items: z.array(z.object({
-    description: z.string().describe('Item description'),
-    quantity: z.number().describe('Number of units'),
-    unit_price: z.number().describe('Price per unit'),
-    amount: z.number().describe('Total amount for this line'),
-  })).describe('List of items on the invoice'),
-  subtotal: z.number().describe('Subtotal before tax'),
-  tax: z.number().describe('Tax amount'),
-  total: z.number().describe('Total amount due'),
-  currency: z.string().describe('3-letter currency code (USD, EUR, etc.)'),
-});
-const { object } = await client.extract({
-  file: './invoice.pdf',
-  schema: invoiceSchema,
+  return c.json({
+    data: object,
+    confidence: metadata.confidenceScore,
+    issues: metadata.issues,
+    error,
+  });
 });
 ```
@@ -230,17 +327,24 @@ import { Parsefy, APIError, ValidationError, ParsefyError } from 'parsefy';
 try {
   const { object, error, metadata } = await client.extract({
-    file: './document.pdf',
+    file: './invoice.pdf',
     schema,
   });
   // Extraction-level errors (request succeeded, but extraction failed)
   if (error) {
     console.error(`Extraction failed: [${error.code}] ${error.message}`);
-    console.log(`Tokens used: ${metadata.inputTokens} in, ${metadata.outputTokens} out`);
+    console.log(`Fallback triggered: ${metadata.fallbackTriggered}`);
+    console.log(`Issues: ${metadata.issues.join(', ')}`);
     return;
   }
+  // Check for low confidence fields
+  const lowConfidence = metadata.fieldConfidence.filter((fc) => fc.score < 0.80);
+  if (lowConfidence.length > 0) {
+    console.warn('Low confidence fields:', lowConfidence);
+  }
   console.log('Success:', object);
 } catch (err) {
   // HTTP/Network errors
@@ -254,30 +358,6 @@ try {
 }
 ```
-## Response Format
-```typescript
-interface ExtractResult<T> {
-  // Extracted data matching your schema, or null if extraction failed
-  object: T | null;
-  // Metadata about the extraction
-  metadata: {
-    processingTimeMs: number;    // Processing time in milliseconds
-    inputTokens: number;         // Input tokens used
-    outputTokens: number;        // Output tokens generated
-    credits: number;             // Credits consumed (1 credit = 1 page)
-    fallbackTriggered: boolean;  // Whether fallback model was used
-  };
-  // Error details if extraction failed
-  error: {
-    code: string;    // EXTRACTION_FAILED, LLM_ERROR, PARSING_ERROR, TIMEOUT_ERROR
-    message: string;
-  } | null;
-}
-```
 ## Error Types
 | Error Class | Description |
@@ -301,7 +381,23 @@ The API allows 1 request per second. The SDK automatically retries with exponent
 - Node.js 18+ (for native `fetch` and `FormData`)
 - Zod 3.x (peer dependency)
+## TypeScript Types
+All types are exported for your convenience:
+```typescript
+import type {
+  ParsefyConfig,
+  ExtractOptions,
+  ExtractResult,
+  ExtractionMetadata,
+  FieldConfidence,
+  APIErrorResponse,
+} from 'parsefy';
+import { DEFAULT_CONFIDENCE_THRESHOLD } from 'parsefy'; // 0.85
+```
 ## License
 MIT © [Parsefy](https://parsefy.io)
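
The README added in 1.0.3 states that a **required** field which returns `null` or scores below `confidenceThreshold` makes the API trigger the Tier 2 fallback. That decision is made by the API, not the SDK. The TypeScript sketch below only approximates the rule on the client, assuming top-level fields and the `$.field` paths shown in the README's example response. `requiredFieldsBelowThreshold` is a hypothetical helper, not an SDK export, and treating a field with no confidence entry as score 0 is an assumption.

```typescript
// Local copy of the FieldConfidence shape declared in index.d.cts (1.0.3).
interface FieldConfidence {
  field: string; // JSON path, e.g. "$.invoice_number"
  score: number; // 0.0 to 1.0
  reason: string;
  page: number;
  text: string;
}

// Same value as the DEFAULT_CONFIDENCE_THRESHOLD constant exported in 1.0.3.
const DEFAULT_CONFIDENCE_THRESHOLD = 0.85;

// Hypothetical helper (not an SDK export): returns the required top-level fields that
// would fail the README's rule, i.e. whose value is null/undefined or whose score is
// below the threshold. A field with no confidence entry is treated as score 0.
function requiredFieldsBelowThreshold(
  object: Record<string, unknown>,
  requiredFields: readonly string[],
  fieldConfidence: readonly FieldConfidence[],
  threshold: number = DEFAULT_CONFIDENCE_THRESHOLD,
): string[] {
  const scores = new Map<string, number>(
    fieldConfidence.map((fc): [string, number] => [fc.field, fc.score]),
  );
  return requiredFields.filter((name) => {
    const value = object[name];
    const score = scores.get(`$.${name}`) ?? 0;
    return value === null || value === undefined || score < threshold;
  });
}

// Values copied from the README's "Example Response".
const extracted = {
  invoice_number: "INV-2024-0042",
  date: "2024-01-15",
  total: 1250.0,
  vendor: "Acme Corp",
};
const fieldConfidence: FieldConfidence[] = [
  { field: "$.invoice_number", score: 0.98, reason: "Exact match", page: 1, text: "Invoice # INV-2024-0042" },
  { field: "$.date", score: 0.95, reason: "Exact match", page: 1, text: "Date: 01/15/2024" },
  { field: "$.total", score: 0.92, reason: "Formatting ambiguous", page: 1, text: "Total: $1,250.00" },
  { field: "$.vendor", score: 0.90, reason: "Inferred from header", page: 1, text: "Acme Corp" },
];
const required = ["invoice_number", "total"];

const atDefault = requiredFieldsBelowThreshold(extracted, required, fieldConfidence); // []
const atStrict = requiredFieldsBelowThreshold(extracted, required, fieldConfidence, 0.95); // ["total"]
```

Treat the result as a rough guide only: the API's own fallback decision may depend on details that are not visible in `metadata`.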

package/dist/index.cjs CHANGED

@@ -1,2 +1,2 @@
-'use strict';var zodToJsonSchema=require('zod-to-json-schema');var m={".pdf":"application/pdf",".docx":"application/vnd.openxmlformats-officedocument.wordprocessingml.document"},f=10*1024*1024,h="https://api.parsefy.io",g=6e4;var i=class extends Error{constructor(t,r){super(t),this.name="ParsefyError",this.code=r,typeof Error.captureStackTrace=="function"&&Error.captureStackTrace(this,this.constructor);}},c=class extends i{constructor(t,r,o){super(t),this.name="APIError",this.statusCode=r,this.response=o;}},l=class extends i{constructor(t,r,o){super(t,r),this.name="ExtractionError",this.metadata=o;}},s=class extends i{constructor(t){super(t),this.name="ValidationError";}};function d(){return typeof process<"u"&&process.versions?.node!==void 0}function T(e){return zodToJsonSchema.zodToJsonSchema(e,{$refStrategy:"none",target:"openApi3"})}function b(e){let t=e.toLowerCase().match(/\.[^.]+$/)?.[0];return t&&m[t]||null}function R(e){if(!b(e)){let r=Object.keys(m).join(", ");throw new s(`Unsupported file type. Supported types: ${r}`)}}function u(e){if(e===0)throw new s("File is empty");if(e>f){let t=f/1048576;throw new s(`File size exceeds maximum limit of ${t}MB`)}}function w(e){return {object:e.object,metadata:{processingTimeMs:e.metadata.processing_time_ms,inputTokens:e.metadata.input_tokens,outputTokens:e.metadata.output_tokens,credits:e.metadata.credits,fallbackTriggered:e.metadata.fallback_triggered},error:e.error}}function E(e,t){let r=b(t)||"application/octet-stream",o=e.buffer.slice(e.byteOffset,e.byteOffset+e.byteLength);return typeof File<"u"?new File([o],t,{type:r}):new Blob([o],{type:r})}async function _(e){if(!d())throw new s("File path strings are only supported in Node.js. Use File or Blob in the browser.");let t=await import('fs'),r=await import('path');if(!t.existsSync(e))throw new s(`File not found: ${e}`);let o=r.basename(e);R(o);let a=t.readFileSync(e);return u(a.length),{buffer:a,filename:o}}async function P(e){if(typeof e=="string"){let{buffer:t,filename:r}=await _(e);return E(t,r)}if(Buffer.isBuffer(e))return u(e.length),E(e,"document.pdf");if(e instanceof File)return R(e.name),u(e.size),e;if(e instanceof Blob)return u(e.size),e;throw new s("Invalid file input. Expected File, Blob, Buffer, or file path string.")}function F(e){return new Promise(t=>setTimeout(t,e))}function k(e,t=1e3){let r=t*Math.pow(2,e),o=Math.random()*.1*r;return Math.min(r+o,3e4)}var y=class{constructor(t){this.maxRetries=3;let r={};if(typeof t=="string"?r={apiKey:t}:t&&(r=t),this.apiKey=r.apiKey||this.getEnvApiKey(),!this.apiKey)throw new s("API key is required. Provide it in the constructor or set the PARSEFY_API_KEY environment variable.");this.baseUrl=r.baseUrl||h,this.timeout=r.timeout||g;}getEnvApiKey(){return d()&&process.env.PARSEFY_API_KEY||""}async extract(t){let{file:r,schema:o}=t,a=T(o),n=await P(r),p=new FormData;return p.append("file",n),p.append("output_schema",JSON.stringify(a)),this.makeRequestWithRetry(p)}async makeRequestWithRetry(t,r=0){try{return await this.makeRequest(t)}catch(o){if(o instanceof c&&o.statusCode===429&&r<this.maxRetries){let a=k(r);return await F(a),this.makeRequestWithRetry(t,r+1)}throw o}}async makeRequest(t){let r=`${this.baseUrl}/v1/extract`,o=new AbortController,a=setTimeout(()=>o.abort(),this.timeout);try{let n=await fetch(r,{method:"POST",headers:{Authorization:`Bearer ${this.apiKey}`},body:t,signal:o.signal});if(clearTimeout(a),!n.ok){let x=await this.parseErrorResponse(n);throw new c(x.message||`API request failed with status ${n.status}`,n.status,x)}let p=await n.json();return w(p)}catch(n){throw clearTimeout(a),n instanceof Error&&n.name==="AbortError"?new i(`Request timed out after ${this.timeout}ms`,"TIMEOUT"):n instanceof i?n:n instanceof TypeError?new i("Network error: Unable to connect to the Parsefy API","NETWORK_ERROR"):new i(`Unexpected error: ${n instanceof Error?n.message:String(n)}`,"UNKNOWN_ERROR")}}async parseErrorResponse(t){try{return await t.json()}catch{try{return {message:await t.text()||t.statusText}}catch{return {message:t.statusText}}}}};
-exports.APIError=c;exports.ExtractionError=l;exports.Parsefy=y;exports.ParsefyError=i;exports.ValidationError=s;
+'use strict';var zodToJsonSchema=require('zod-to-json-schema');var u=.85,d={".pdf":"application/pdf",".docx":"application/vnd.openxmlformats-officedocument.wordprocessingml.document"},l=10*1024*1024,g="https://api.parsefy.io",x=6e4;var s=class extends Error{constructor(e,r){super(e),this.name="ParsefyError",this.code=r,typeof Error.captureStackTrace=="function"&&Error.captureStackTrace(this,this.constructor);}},p=class extends s{constructor(e,r,o){super(e),this.name="APIError",this.statusCode=r,this.response=o;}},y=class extends s{constructor(e,r,o){super(e,r),this.name="ExtractionError",this.metadata=o;}},a=class extends s{constructor(e){super(e),this.name="ValidationError";}};function h(){return typeof process<"u"&&process.versions?.node!==void 0}function R(t){let e=zodToJsonSchema.zodToJsonSchema(t,{$refStrategy:"none",target:"jsonSchema7"});return "$schema"in e&&delete e.$schema,e}function b(t){let e=t.toLowerCase().match(/\.[^.]+$/)?.[0];return e&&d[e]||null}function w(t){if(!b(t)){let r=Object.keys(d).join(", ");throw new a(`Unsupported file type. Supported types: ${r}`)}}function m(t){if(t===0)throw new a("File is empty");if(t>l){let e=l/1048576;throw new a(`File size exceeds maximum limit of ${e}MB`)}}function F(t){let e=t._meta||{confidence_score:1,field_confidence:[],issues:[]};return {object:t.object,metadata:{processingTimeMs:t.metadata.processing_time_ms,inputTokens:t.metadata.input_tokens,outputTokens:t.metadata.output_tokens,credits:t.metadata.credits,fallbackTriggered:t.metadata.fallback_triggered,confidenceScore:e.confidence_score,fieldConfidence:e.field_confidence.map(r=>({field:r.field,score:r.score,reason:r.reason,page:r.page,text:r.text})),issues:e.issues},error:t.error}}function T(t,e){let r=b(e)||"application/octet-stream",o=t.buffer.slice(t.byteOffset,t.byteOffset+t.byteLength);return typeof File<"u"?new File([o],e,{type:r}):new Blob([o],{type:r})}async function I(t){if(!h())throw new a("File path strings are only supported in Node.js. Use File or Blob in the browser.");let e=await import('fs'),r=await import('path');if(!e.existsSync(t))throw new a(`File not found: ${t}`);let o=r.basename(t);w(o);let c=e.readFileSync(t);return m(c.length),{buffer:c,filename:o}}async function _(t){if(typeof t=="string"){let{buffer:e,filename:r}=await I(t);return T(e,r)}if(Buffer.isBuffer(t))return m(t.length),T(t,"document.pdf");if(t instanceof File)return w(t.name),m(t.size),t;if(t instanceof Blob)return m(t.size),t;throw new a("Invalid file input. Expected File, Blob, Buffer, or file path string.")}function P(t){return new Promise(e=>setTimeout(e,t))}function S(t,e=1e3){let r=e*Math.pow(2,t),o=Math.random()*.1*r;return Math.min(r+o,3e4)}var E=class{constructor(e){this.maxRetries=3;let r={};if(typeof e=="string"?r={apiKey:e}:e&&(r=e),this.apiKey=r.apiKey||this.getEnvApiKey(),!this.apiKey)throw new a("API key is required. Provide it in the constructor or set the PARSEFY_API_KEY environment variable.");this.baseUrl=r.baseUrl||g,this.timeout=r.timeout||x;}getEnvApiKey(){return h()&&process.env.PARSEFY_API_KEY||""}async extract(e){let{file:r,schema:o,confidenceThreshold:c}=e,n=R(o),f=await _(r),i=new FormData;return i.append("file",f),i.append("output_schema",JSON.stringify(n)),i.append("confidence_threshold",String(c??.85)),this.makeRequestWithRetry(i)}async makeRequestWithRetry(e,r=0){try{return await this.makeRequest(e)}catch(o){if(o instanceof p&&o.statusCode===429&&r<this.maxRetries){let c=S(r);return await P(c),this.makeRequestWithRetry(e,r+1)}throw o}}async makeRequest(e){let r=`${this.baseUrl}/v1/extract`,o=new AbortController,c=setTimeout(()=>o.abort(),this.timeout);try{let n=await fetch(r,{method:"POST",headers:{Authorization:`Bearer ${this.apiKey}`},body:e,signal:o.signal});if(clearTimeout(c),!n.ok){let i=await this.parseErrorResponse(n);throw new p(i.message||`API request failed with status ${n.status}`,n.status,i)}let f;try{f=await n.json();}catch{throw new s("Failed to parse API response as JSON. The API may have returned an invalid response.","PARSE_ERROR")}try{return F(f)}catch(i){throw new s(`Failed to transform API response: ${i instanceof Error?i.message:String(i)}`,"TRANSFORM_ERROR")}}catch(n){throw clearTimeout(c),n instanceof Error&&n.name==="AbortError"?new s(`Request timed out after ${this.timeout}ms`,"TIMEOUT"):n instanceof s?n:n instanceof TypeError&&n.message.includes("fetch")?new s(`Network error: Unable to connect to the Parsefy API. ${n.message}`,"NETWORK_ERROR"):n instanceof TypeError?new s(`Type error: ${n.message}. This may indicate an API response format issue.`,"TYPE_ERROR"):new s(`Unexpected error: ${n instanceof Error?n.message:String(n)}`,"UNKNOWN_ERROR")}}async parseErrorResponse(e){try{return await e.json()}catch{try{return {message:await e.text()||e.statusText}}catch{return {message:e.statusText}}}}};
+exports.APIError=p;exports.DEFAULT_CONFIDENCE_THRESHOLD=u;exports.ExtractionError=y;exports.Parsefy=E;exports.ParsefyError=s;exports.ValidationError=a;
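
The main behavioral change in `index.cjs` is the response transform (`w` in 1.0.1, `F` in 1.0.3). It now also reads a `_meta` object from the API response and maps it to the camelCase `confidenceScore`, `fieldConfidence`, and `issues` fields declared in `index.d.cts`. A de-minified TypeScript sketch of the 1.0.3 transform follows. The `RawMeta` and `RawExtractResponse` names, and the exact wire types, are assumptions inferred from the property names the minified code reads.

```typescript
// Same five keys as the FieldConfidence interface in index.d.cts.
interface FieldConfidence {
  field: string;
  score: number;
  reason: string;
  page: number;
  text: string;
}

// Assumed wire format: snake_case fields, with `_meta` possibly absent.
interface RawMeta {
  confidence_score: number;
  field_confidence: FieldConfidence[];
  issues: string[];
}

interface RawExtractResponse<T> {
  object: T | null;
  metadata: {
    processing_time_ms: number;
    input_tokens: number;
    output_tokens: number;
    credits: number;
    fallback_triggered: boolean;
  };
  _meta?: RawMeta;
  error: { code: string; message: string } | null;
}

// SDK-facing shape, as documented in the README's "Response Format".
interface ExtractResult<T> {
  object: T | null;
  metadata: {
    processingTimeMs: number;
    inputTokens: number;
    outputTokens: number;
    credits: number;
    fallbackTriggered: boolean;
    confidenceScore: number;
    fieldConfidence: FieldConfidence[];
    issues: string[];
  };
  error: { code: string; message: string } | null;
}

function transformResponse<T>(raw: RawExtractResponse<T>): ExtractResult<T> {
  // Same defaults as the minified code when the API sends no `_meta`.
  const meta: RawMeta = raw._meta || { confidence_score: 1, field_confidence: [], issues: [] };
  return {
    object: raw.object,
    metadata: {
      processingTimeMs: raw.metadata.processing_time_ms,
      inputTokens: raw.metadata.input_tokens,
      outputTokens: raw.metadata.output_tokens,
      credits: raw.metadata.credits,
      fallbackTriggered: raw.metadata.fallback_triggered,
      confidenceScore: meta.confidence_score,
      // Copies exactly the five documented keys; any extra keys are dropped.
      fieldConfidence: meta.field_confidence.map((fc) => ({
        field: fc.field,
        score: fc.score,
        reason: fc.reason,
        page: fc.page,
        text: fc.text,
      })),
      issues: meta.issues,
    },
    error: raw.error,
  };
}
```

One consequence of the default object: when the API omits `_meta`, the SDK reports `confidenceScore: 1` with an empty `fieldConfidence` array, so a low-confidence filter like the one in the README's error-handling example flags nothing. Checking `fieldConfidence.length` is one way to tell the two cases apart.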

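The retry logic is the same in both builds apart from minifier renames (`k`/`F` in 1.0.1, `S`/`P` in 1.0.3): on an `APIError` with status 429, the client waits `1000 * 2^attempt` ms plus up to 10% random jitter, capped at 30 seconds, and retries while the attempt count is below `maxRetries` (3). The sketch below restates that arithmetic in TypeScript. `backoffDelayMs`, `retrySchedule`, and the injectable `random` parameter are illustrative additions, not SDK exports.

```typescript
// Readable restatement of the minified backoff helper (`k` in 1.0.1, `S` in 1.0.3).
// `random` is injectable only so the output can be made deterministic.
function backoffDelayMs(
  attempt: number,
  baseMs: number = 1000,
  random: () => number = Math.random,
): number {
  const exponential = baseMs * Math.pow(2, attempt); // 1000, 2000, 4000, ...
  const jitter = random() * 0.1 * exponential; // adds up to 10% on top
  return Math.min(exponential + jitter, 30_000); // never waits longer than 30 s
}

// The client retries only on HTTP 429 and only while attempt < maxRetries (3),
// so at most three waits happen before the APIError is rethrown.
const MAX_RETRIES = 3;

// Hypothetical helper: the jitter-free wait before each retry.
function retrySchedule(maxRetries: number = MAX_RETRIES, baseMs: number = 1000): number[] {
  return Array.from({ length: maxRetries }, (_, attempt) =>
    backoffDelayMs(attempt, baseMs, () => 0),
  );
}

const schedule = retrySchedule(); // [1000, 2000, 4000]
```

With the default three retries the waits are roughly 1 s, 2 s, and 4 s, so the 30-second cap would only come into play if `maxRetries` were raised.
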
package/dist/index.d.cts CHANGED

@@ -11,6 +11,11 @@ interface ParsefyConfig {
     /** Request timeout in milliseconds. Defaults to 60000 (60 seconds). */
     timeout?: number;
 }
+/**
+ * Default confidence threshold for extraction.
+ * Fields below this threshold on required fields will trigger the fallback model.
+ */
+declare const DEFAULT_CONFIDENCE_THRESHOLD = 0.85;
 /**
  * Options for the extract method.
  */
@@ -19,6 +24,35 @@ interface ExtractOptions<T extends z.ZodType> {
     file: File | Blob | Buffer | string;
     /** Zod schema defining the structure of data to extract. */
     schema: T;
+    /**
+     * Confidence threshold for extraction (0.0 to 1.0). Defaults to 0.85.
+     *
+     * If a **required** field's confidence falls below this threshold (or returns null),
+     * the fallback model is triggered for higher accuracy.
+     *
+     * **Tip**: Lower threshold = faster (accepts Tier 1 more often).
+     * Higher threshold = more accurate (triggers Tier 2 fallback more often).
+     *
+     * **Important**: Mark fields as `.optional()` in your Zod schema if they might not
+     * appear in all documents. This prevents unnecessary fallback triggers and reduces costs.
+     */
+    confidenceThreshold?: number;
+}
+/**
+ * Confidence details for a single extracted field.
+ * Provides evidence and explanation for each extraction.
+ */
+interface FieldConfidence {
+    /** JSON path to the field (e.g., "$.invoice_number"). */
+    field: string;
+    /** Confidence score for this field (0.0 to 1.0). */
+    score: number;
+    /** Explanation of how the value was extracted (e.g., "Exact match", "Inferred from header"). */
+    reason: string;
+    /** Page number where the field was found. */
+    page: number;
+    /** Source text evidence from the document. */
+    text: string;
 }
 /**
  * Metadata about the extraction process.
@@ -34,6 +68,12 @@ interface ExtractionMetadata {
     credits: number;
     /** Whether the fallback model was triggered for higher accuracy. */
     fallbackTriggered: boolean;
+    /** Overall confidence score for the extraction (0.0 to 1.0). */
+    confidenceScore: number;
+    /** Per-field confidence details with evidence and explanations. */
+    fieldConfidence: FieldConfidence[];
+    /** List of issues or warnings encountered during extraction. */
+    issues: string[];
 }
 /**
  * Error response from the API.
@@ -57,7 +97,10 @@ interface ExtractResult<T> {
 }
 /**
- * Parsefy client for extracting structured data from documents.
+ * Parsefy client for extracting structured data from financial documents.
+ *
+ * **Important**: All fields are **required by default**. Use `.optional()` for fields
+ * that may not appear in all documents to avoid triggering expensive fallback models.
  *
  * @example
  * ```ts
@@ -67,13 +110,24 @@ interface ExtractResult<T> {
  * const client = new Parsefy('pk_your_api_key');
  *
  * const schema = z.object({
- *   name: z.string(),
+ *   // REQUIRED - fallback triggered if below confidence threshold
+ *   invoice_number: z.string(),
  *   total: z.number(),
+ *
+ *   // OPTIONAL - won't trigger fallback if missing
+ *   vendor: z.string().optional(),
+ *   notes: z.string().optional(),
  * });
  *
- * const { object, error } = await client.extract({
+ * const { object, metadata, error } = await client.extract({
  *   file: './invoice.pdf',
  *   schema,
+ *   confidenceThreshold: 0.85, // default
+ * });
+ *
+ * // Check per-field confidence and evidence
+ * metadata.fieldConfidence.forEach((fc) => {
+ *   console.log(`${fc.field}: ${fc.score} - "${fc.text}"`);
  * });
  * ```
  */
@@ -109,25 +163,41 @@ declare class Parsefy {
      */
     private getEnvApiKey;
     /**
-     * Extracts structured data from a document using the provided Zod schema.
+     * Extracts structured data from a financial document using the provided Zod schema.
+     *
+     * **Billing Warning**: All fields are **required by default**. If a required field
+     * returns `null` or falls below the `confidenceThreshold`, the fallback model is triggered,
+     * which is more expensive. Use `.optional()` for fields that may not appear in all documents.
      *
-     * @param options - Extraction options including file and schema.
-     * @returns Promise resolving to the extraction result with typed data.
+     * @param options - Extraction options including file, schema, and confidence threshold.
+     * @returns Promise resolving to the extraction result with typed data and field-level confidence.
      *
      * @example
      * ```ts
      * const schema = z.object({
+     *   // REQUIRED - triggers fallback if confidence < threshold
      *   invoice_number: z.string().describe('The invoice number'),
-     *   total: z.number().describe('Total amount'),
+     *   total: z.number().describe('Total amount including tax'),
+     *
+     *   // OPTIONAL - won't trigger fallback if missing or low confidence
+     *   vendor: z.string().optional().describe('Vendor/supplier name'),
+     *   due_date: z.string().optional().describe('Payment due date'),
      * });
      *
      * const { object, metadata, error } = await client.extract({
      *   file: './invoice.pdf',
      *   schema,
+     *   confidenceThreshold: 0.85, // Lower = faster, Higher = more accurate
      * });
      *
      * if (!error && object) {
      *   console.log(object.invoice_number); // Fully typed!
+     *
+     *   // Access field-level confidence and evidence
+     *   console.log(`Overall confidence: ${metadata.confidenceScore}`);
+     *   metadata.fieldConfidence.forEach((fc) => {
+     *     console.log(`${fc.field}: ${fc.score} (${fc.reason}) - "${fc.text}"`);
+     *   });
      * }
      * ```
      */
@@ -180,4 +250,4 @@ declare class ValidationError extends ParsefyError {
     constructor(message: string);
 }
-export { APIError, type APIErrorResponse, type ExtractOptions, type ExtractResult, ExtractionError, type ExtractionMetadata, Parsefy, type ParsefyConfig, ParsefyError, ValidationError };
+export { APIError, type APIErrorResponse, DEFAULT_CONFIDENCE_THRESHOLD, type ExtractOptions, type ExtractResult, ExtractionError, type ExtractionMetadata, type FieldConfidence, Parsefy, type ParsefyConfig, ParsefyError, ValidationError };
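
`index.d.cts` now declares `DEFAULT_CONFIDENCE_THRESHOLD = 0.85` and documents `confidenceThreshold` as a value from 0.0 to 1.0. The compiled 1.0.3 client forwards the option as the `confidence_threshold` form field via `String(c??.85)`, and the client code in this diff performs no range check of its own. A caller who wants to catch out-of-range values before a request is sent could use a guard like the one below. `resolveConfidenceThreshold` is a hypothetical name, not an SDK export, and rejecting values outside 0.0 to 1.0 is an assumption based on the documented range.

```typescript
// Mirrors the constant declared in index.d.cts.
const DEFAULT_CONFIDENCE_THRESHOLD = 0.85;

// Hypothetical pre-flight check (not part of the SDK): resolves the threshold the
// same way the 1.0.3 client does (`?? 0.85`), rejects values outside 0.0 to 1.0,
// and returns the string form that ends up in the `confidence_threshold` field.
function resolveConfidenceThreshold(value?: number): string {
  // `??` keeps an explicit 0; `||` would silently replace it with the default.
  const threshold = value ?? DEFAULT_CONFIDENCE_THRESHOLD;
  if (!Number.isFinite(threshold) || threshold < 0 || threshold > 1) {
    throw new RangeError(`confidenceThreshold must be between 0.0 and 1.0, got ${threshold}`);
  }
  return String(threshold);
}

const defaultValue = resolveConfidenceThreshold(); // "0.85"
const lenient = resolveConfidenceThreshold(0.7); // "0.7"
```

The `??` versus `||` distinction matters here: an explicit `confidenceThreshold: 0` is sent as `"0"` rather than being replaced by the default.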

package/dist/index.d.mts CHANGED

@@ -11,6 +11,11 @@ interface ParsefyConfig {
     /** Request timeout in milliseconds. Defaults to 60000 (60 seconds). */
     timeout?: number;
 }
+/**
+ * Default confidence threshold for extraction.
+ * Fields below this threshold on required fields will trigger the fallback model.
+ */
+declare const DEFAULT_CONFIDENCE_THRESHOLD = 0.85;
 /**
  * Options for the extract method.
  */
@@ -19,6 +24,35 @@ interface ExtractOptions<T extends z.ZodType> {
     file: File | Blob | Buffer | string;
     /** Zod schema defining the structure of data to extract. */
     schema: T;
+    /**
+     * Confidence threshold for extraction (0.0 to 1.0). Defaults to 0.85.
+     *
+     * If a **required** field's confidence falls below this threshold (or returns null),
+     * the fallback model is triggered for higher accuracy.
+     *
+     * **Tip**: Lower threshold = faster (accepts Tier 1 more often).
+     * Higher threshold = more accurate (triggers Tier 2 fallback more often).
+     *
+     * **Important**: Mark fields as `.optional()` in your Zod schema if they might not
+     * appear in all documents. This prevents unnecessary fallback triggers and reduces costs.
+     */
+    confidenceThreshold?: number;
+}
+/**
+ * Confidence details for a single extracted field.
+ * Provides evidence and explanation for each extraction.
+ */
+interface FieldConfidence {
+    /** JSON path to the field (e.g., "$.invoice_number"). */
+    field: string;
+    /** Confidence score for this field (0.0 to 1.0). */
+    score: number;
+    /** Explanation of how the value was extracted (e.g., "Exact match", "Inferred from header"). */
+    reason: string;
+    /** Page number where the field was found. */
+    page: number;
+    /** Source text evidence from the document. */
+    text: string;
 }
 /**
  * Metadata about the extraction process.
@@ -34,6 +68,12 @@ interface ExtractionMetadata {
     credits: number;
     /** Whether the fallback model was triggered for higher accuracy. */
     fallbackTriggered: boolean;
+    /** Overall confidence score for the extraction (0.0 to 1.0). */
+    confidenceScore: number;
+    /** Per-field confidence details with evidence and explanations. */
+    fieldConfidence: FieldConfidence[];
+    /** List of issues or warnings encountered during extraction. */
+    issues: string[];
 }
 /**
  * Error response from the API.
@@ -57,7 +97,10 @@ interface ExtractResult<T> {
 }
 /**
- * Parsefy client for extracting structured data from documents.
+ * Parsefy client for extracting structured data from financial documents.
+ *
+ * **Important**: All fields are **required by default**. Use `.optional()` for fields
+ * that may not appear in all documents to avoid triggering expensive fallback models.
  *
  * @example
  * ```ts
@@ -67,13 +110,24 @@ interface ExtractResult<T> {
  * const client = new Parsefy('pk_your_api_key');
  *
  * const schema = z.object({
- *   name: z.string(),
+ *   // REQUIRED - fallback triggered if below confidence threshold
+ *   invoice_number: z.string(),
  *   total: z.number(),
+ *
+ *   // OPTIONAL - won't trigger fallback if missing
+ *   vendor: z.string().optional(),
+ *   notes: z.string().optional(),
  * });
  *
- * const { object, error } = await client.extract({
+ * const { object, metadata, error } = await client.extract({
  *   file: './invoice.pdf',
  *   schema,
+ *   confidenceThreshold: 0.85, // default
+ * });
+ *
+ * // Check per-field confidence and evidence
+ * metadata.fieldConfidence.forEach((fc) => {
+ *   console.log(`${fc.field}: ${fc.score} - "${fc.text}"`);
  * });
  * ```
  */
@@ -109,25 +163,41 @@ declare class Parsefy {
      */
     private getEnvApiKey;
     /**
-     * Extracts structured data from a document using the provided Zod schema.
+     * Extracts structured data from a financial document using the provided Zod schema.
+     *
+     * **Billing Warning**: All fields are **required by default**. If a required field
+     * returns `null` or falls below the `confidenceThreshold`, the fallback model is triggered,
+     * which is more expensive. Use `.optional()` for fields that may not appear in all documents.
      *
-     * @param options - Extraction options including file and schema.
-     * @returns Promise resolving to the extraction result with typed data.
+     * @param options - Extraction options including file, schema, and confidence threshold.
+     * @returns Promise resolving to the extraction result with typed data and field-level confidence.
      *
      * @example
      * ```ts
      * const schema = z.object({
+     *   // REQUIRED - triggers fallback if confidence < threshold
      *   invoice_number: z.string().describe('The invoice number'),
-     *   total: z.number().describe('Total amount'),
+     *   total: z.number().describe('Total amount including tax'),
+     *
+     *   // OPTIONAL - won't trigger fallback if missing or low confidence
+     *   vendor: z.string().optional().describe('Vendor/supplier name'),
+     *   due_date: z.string().optional().describe('Payment due date'),
      * });
      *
      * const { object, metadata, error } = await client.extract({
      *   file: './invoice.pdf',
      *   schema,
+     *   confidenceThreshold: 0.85, // Lower = faster, Higher = more accurate
      * });
      *
      * if (!error && object) {
      *   console.log(object.invoice_number); // Fully typed!
+     *
+     *   // Access field-level confidence and evidence
+     *   console.log(`Overall confidence: ${metadata.confidenceScore}`);
+     *   metadata.fieldConfidence.forEach((fc) => {
+     *     console.log(`${fc.field}: ${fc.score} (${fc.reason}) - "${fc.text}"`);
+     *   });
      * }
      * ```
      */
@@ -180,4 +250,4 @@ declare class ValidationError extends ParsefyError {
     constructor(message: string);
 }
-export { APIError, type APIErrorResponse, type ExtractOptions, type ExtractResult, ExtractionError, type ExtractionMetadata, Parsefy, type ParsefyConfig, ParsefyError, ValidationError };
+export { APIError, type APIErrorResponse, DEFAULT_CONFIDENCE_THRESHOLD, type ExtractOptions, type ExtractResult, ExtractionError, type ExtractionMetadata, type FieldConfidence, Parsefy, type ParsefyConfig, ParsefyError, ValidationError };
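
The `confidenceThreshold` documentation above describes when the fallback model runs. A **required** field that comes back `null` or scores below the threshold triggers it, while `.optional()` fields never do. The SDK itself only forwards the threshold to the API (as the `confidence_threshold` form field in `index.mjs`), so the actual decision is made server-side. The sketch below restates the documented rule on the client, for example to reason about how often a schema would hit the more expensive fallback. `wouldTriggerFallback` and the `ExtractedField` shape are assumptions for illustration, not SDK API.

```typescript
// Hypothetical shape: one extracted field with its value, its score,
// and whether the Zod schema declares it with .optional().
interface ExtractedField {
  path: string;
  value: unknown; // null when the model could not find the field
  score: number; // 0.0 to 1.0
  optional: boolean;
}

// Restates the documented rule: only required fields can trigger the fallback,
// either by being null or by scoring below the threshold.
function wouldTriggerFallback(fields: ExtractedField[], threshold = 0.85): string[] {
  return fields
    .filter((f) => !f.optional)
    .filter((f) => f.value === null || f.score < threshold)
    .map((f) => f.path);
}

const fields: ExtractedField[] = [
  { path: "$.invoice_number", value: "INV-001", score: 0.95, optional: false },
  { path: "$.total", value: null, score: 0, optional: false },
  { path: "$.vendor", value: null, score: 0, optional: true }, // missing but optional: ignored
  { path: "$.due_date", value: "2024-07-01", score: 0.4, optional: true }, // weak but optional: ignored
];

const triggers = wouldTriggerFallback(fields);
console.log(triggers.length > 0 ? `fallback: ${triggers.join(", ")}` : "no fallback");
// fallback: $.total
```

Marking `vendor` and `due_date` as optional is what keeps them out of the trigger list, which is the cost-saving advice the docstrings give.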

package/dist/index.d.ts CHANGED

@@ -11,6 +11,11 @@ interface ParsefyConfig {
     /** Request timeout in milliseconds. Defaults to 60000 (60 seconds). */
     timeout?: number;
 }
+/**
+ * Default confidence threshold for extraction.
+ * A required field whose confidence falls below this threshold triggers the fallback model.
+ */
+declare const DEFAULT_CONFIDENCE_THRESHOLD = 0.85;
 /**
  * Options for the extract method.
  */
@@ -19,6 +24,35 @@ interface ExtractOptions<T extends z.ZodType> {
     file: File | Blob | Buffer | string;
     /** Zod schema defining the structure of data to extract. */
     schema: T;
+    /**
+     * Confidence threshold for extraction (0.0 to 1.0). Defaults to 0.85.
+     *
+     * If a **required** field returns null or its confidence falls below this threshold,
+     * the fallback model is triggered for higher accuracy.
+     *
+     * **Tip**: A lower threshold is faster (the primary Tier 1 result is accepted more often);
+     * a higher threshold is more accurate (the Tier 2 fallback is triggered more often).
+     *
+     * **Important**: Mark fields as `.optional()` in your Zod schema if they might not
+     * appear in all documents. This prevents unnecessary fallback triggers and reduces costs.
+     */
+    confidenceThreshold?: number;
+}
+/**
+ * Confidence details for a single extracted field.
+ * Provides evidence and explanation for each extraction.
+ */
+interface FieldConfidence {
+    /** JSON path to the field (e.g., "$.invoice_number"). */
+    field: string;
+    /** Confidence score for this field (0.0 to 1.0). */
+    score: number;
+    /** Explanation of how the value was extracted (e.g., "Exact match", "Inferred from header"). */
+    reason: string;
+    /** Page number where the field was found. */
+    page: number;
+    /** Source text evidence from the document. */
+    text: string;
 }
 /**
  * Metadata about the extraction process.
@@ -34,6 +68,12 @@ interface ExtractionMetadata {
     credits: number;
     /** Whether the fallback model was triggered for higher accuracy. */
     fallbackTriggered: boolean;
+    /** Overall confidence score for the extraction (0.0 to 1.0). */
+    confidenceScore: number;
+    /** Per-field confidence details with evidence and explanations. */
+    fieldConfidence: FieldConfidence[];
+    /** List of issues or warnings encountered during extraction. */
+    issues: string[];
 }
 /**
  * Error response from the API.
@@ -57,7 +97,10 @@ interface ExtractResult<T> {
 }
 /**
- * Parsefy client for extracting structured data from documents.
+ * Parsefy client for extracting structured data from financial documents.
+ *
+ * **Important**: All fields are **required by default**. Use `.optional()` for fields
+ * that may not appear in all documents to avoid triggering expensive fallback models.
  *
  * @example
  * ```ts
@@ -67,13 +110,24 @@ interface ExtractResult<T> {
  * const client = new Parsefy('pk_your_api_key');
  *
  * const schema = z.object({
- *   name: z.string(),
+ *   // REQUIRED - fallback triggered if below confidence threshold
+ *   invoice_number: z.string(),
  *   total: z.number(),
+ *
+ *   // OPTIONAL - won't trigger fallback if missing
+ *   vendor: z.string().optional(),
+ *   notes: z.string().optional(),
  * });
  *
- * const { object, error } = await client.extract({
+ * const { object, metadata, error } = await client.extract({
  *   file: './invoice.pdf',
  *   schema,
+ *   confidenceThreshold: 0.85, // default
+ * });
+ *
+ * // Check per-field confidence and evidence
+ * metadata.fieldConfidence.forEach((fc) => {
+ *   console.log(`${fc.field}: ${fc.score} - "${fc.text}"`);
  * });
  * ```
  */
@@ -109,25 +163,41 @@ declare class Parsefy {
      */
     private getEnvApiKey;
     /**
-     * Extracts structured data from a document using the provided Zod schema.
+     * Extracts structured data from a financial document using the provided Zod schema.
+     *
+     * **Billing Warning**: All fields are **required by default**. If a required field
+     * returns `null` or falls below the `confidenceThreshold`, the fallback model is triggered,
+     * which is more expensive. Use `.optional()` for fields that may not appear in all documents.
      *
-     * @param options - Extraction options including file and schema.
-     * @returns Promise resolving to the extraction result with typed data.
+     * @param options - Extraction options including file, schema, and confidence threshold.
+     * @returns Promise resolving to the extraction result with typed data and field-level confidence.
      *
      * @example
      * ```ts
      * const schema = z.object({
+     *   // REQUIRED - triggers fallback if confidence < threshold
      *   invoice_number: z.string().describe('The invoice number'),
-     *   total: z.number().describe('Total amount'),
+     *   total: z.number().describe('Total amount including tax'),
+     *
+     *   // OPTIONAL - won't trigger fallback if missing or low confidence
+     *   vendor: z.string().optional().describe('Vendor/supplier name'),
+     *   due_date: z.string().optional().describe('Payment due date'),
      * });
      *
      * const { object, metadata, error } = await client.extract({
      *   file: './invoice.pdf',
      *   schema,
+     *   confidenceThreshold: 0.85, // Lower = faster, Higher = more accurate
      * });
      *
      * if (!error && object) {
      *   console.log(object.invoice_number); // Fully typed!
+     *
+     *   // Access field-level confidence and evidence
+     *   console.log(`Overall confidence: ${metadata.confidenceScore}`);
+     *   metadata.fieldConfidence.forEach((fc) => {
+     *     console.log(`${fc.field}: ${fc.score} (${fc.reason}) - "${fc.text}"`);
+     *   });
      * }
      * ```
      */
@@ -180,4 +250,4 @@ declare class ValidationError extends ParsefyError {
     constructor(message: string);
 }
-export { APIError, type APIErrorResponse, type ExtractOptions, type ExtractResult, ExtractionError, type ExtractionMetadata, Parsefy, type ParsefyConfig, ParsefyError, ValidationError };
+export { APIError, type APIErrorResponse, DEFAULT_CONFIDENCE_THRESHOLD, type ExtractOptions, type ExtractResult, ExtractionError, type ExtractionMetadata, type FieldConfidence, Parsefy, type ParsefyConfig, ParsefyError, ValidationError };
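
In the 1.0.3 build of `index.mjs` below, `extract` forwards the option as a multipart form field named `confidence_threshold`, computed as `String(confidenceThreshold ?? 0.85)`. The nullish-coalescing operator matters here: an explicit `0` is sent as `"0"` instead of being replaced by the default, which `||` would do. The sketch reproduces that serialization with the standard `FormData` API. The range check is a hypothetical addition: the declarations document a 0.0 to 1.0 range, but the minified build does not appear to validate it.

```typescript
// Mirrors how parsefy 1.0.3 appends the threshold to the multipart request body.
// The range check is a hypothetical addition, not SDK behavior.
function appendConfidenceThreshold(form: FormData, threshold?: number): void {
  if (threshold !== undefined && (Number.isNaN(threshold) || threshold < 0 || threshold > 1)) {
    throw new RangeError(`confidenceThreshold must be between 0.0 and 1.0, got ${threshold}`);
  }
  // `??` keeps an explicit 0; `threshold || 0.85` would silently replace it with the default.
  form.append("confidence_threshold", String(threshold ?? 0.85));
}

const withDefault = new FormData();
appendConfidenceThreshold(withDefault);
console.log(withDefault.get("confidence_threshold")); // 0.85

const explicitZero = new FormData();
appendConfidenceThreshold(explicitZero, 0);
console.log(explicitZero.get("confidence_threshold")); // 0
```
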

package/dist/index.mjs CHANGED

@@ -1,2 +1,2 @@
-import {zodToJsonSchema}from'zod-to-json-schema';var m={".pdf":"application/pdf",".docx":"application/vnd.openxmlformats-officedocument.wordprocessingml.document"},f=10*1024*1024,h="https://api.parsefy.io",g=6e4;var i=class extends Error{constructor(t,r){super(t),this.name="ParsefyError",this.code=r,typeof Error.captureStackTrace=="function"&&Error.captureStackTrace(this,this.constructor);}},c=class extends i{constructor(t,r,o){super(t),this.name="APIError",this.statusCode=r,this.response=o;}},l=class extends i{constructor(t,r,o){super(t,r),this.name="ExtractionError",this.metadata=o;}},s=class extends i{constructor(t){super(t),this.name="ValidationError";}};function d(){return typeof process<"u"&&process.versions?.node!==void 0}function T(e){return zodToJsonSchema(e,{$refStrategy:"none",target:"openApi3"})}function b(e){let t=e.toLowerCase().match(/\.[^.]+$/)?.[0];return t&&m[t]||null}function R(e){if(!b(e)){let r=Object.keys(m).join(", ");throw new s(`Unsupported file type. Supported types: ${r}`)}}function u(e){if(e===0)throw new s("File is empty");if(e>f){let t=f/1048576;throw new s(`File size exceeds maximum limit of ${t}MB`)}}function w(e){return {object:e.object,metadata:{processingTimeMs:e.metadata.processing_time_ms,inputTokens:e.metadata.input_tokens,outputTokens:e.metadata.output_tokens,credits:e.metadata.credits,fallbackTriggered:e.metadata.fallback_triggered},error:e.error}}function E(e,t){let r=b(t)||"application/octet-stream",o=e.buffer.slice(e.byteOffset,e.byteOffset+e.byteLength);return typeof File<"u"?new File([o],t,{type:r}):new Blob([o],{type:r})}async function _(e){if(!d())throw new s("File path strings are only supported in Node.js. Use File or Blob in the browser.");let t=await import('fs'),r=await import('path');if(!t.existsSync(e))throw new s(`File not found: ${e}`);let o=r.basename(e);R(o);let a=t.readFileSync(e);return u(a.length),{buffer:a,filename:o}}async function P(e){if(typeof e=="string"){let{buffer:t,filename:r}=await _(e);return E(t,r)}if(Buffer.isBuffer(e))return u(e.length),E(e,"document.pdf");if(e instanceof File)return R(e.name),u(e.size),e;if(e instanceof Blob)return u(e.size),e;throw new s("Invalid file input. Expected File, Blob, Buffer, or file path string.")}function F(e){return new Promise(t=>setTimeout(t,e))}function k(e,t=1e3){let r=t*Math.pow(2,e),o=Math.random()*.1*r;return Math.min(r+o,3e4)}var y=class{constructor(t){this.maxRetries=3;let r={};if(typeof t=="string"?r={apiKey:t}:t&&(r=t),this.apiKey=r.apiKey||this.getEnvApiKey(),!this.apiKey)throw new s("API key is required. Provide it in the constructor or set the PARSEFY_API_KEY environment variable.");this.baseUrl=r.baseUrl||h,this.timeout=r.timeout||g;}getEnvApiKey(){return d()&&process.env.PARSEFY_API_KEY||""}async extract(t){let{file:r,schema:o}=t,a=T(o),n=await P(r),p=new FormData;return p.append("file",n),p.append("output_schema",JSON.stringify(a)),this.makeRequestWithRetry(p)}async makeRequestWithRetry(t,r=0){try{return await this.makeRequest(t)}catch(o){if(o instanceof c&&o.statusCode===429&&r<this.maxRetries){let a=k(r);return await F(a),this.makeRequestWithRetry(t,r+1)}throw o}}async makeRequest(t){let r=`${this.baseUrl}/v1/extract`,o=new AbortController,a=setTimeout(()=>o.abort(),this.timeout);try{let n=await fetch(r,{method:"POST",headers:{Authorization:`Bearer ${this.apiKey}`},body:t,signal:o.signal});if(clearTimeout(a),!n.ok){let x=await this.parseErrorResponse(n);throw new c(x.message||`API request failed with status ${n.status}`,n.status,x)}let p=await n.json();return w(p)}catch(n){throw clearTimeout(a),n instanceof Error&&n.name==="AbortError"?new i(`Request timed out after ${this.timeout}ms`,"TIMEOUT"):n instanceof i?n:n instanceof TypeError?new i("Network error: Unable to connect to the Parsefy API","NETWORK_ERROR"):new i(`Unexpected error: ${n instanceof Error?n.message:String(n)}`,"UNKNOWN_ERROR")}}async parseErrorResponse(t){try{return await t.json()}catch{try{return {message:await t.text()||t.statusText}}catch{return {message:t.statusText}}}}};
-export{c as APIError,l as ExtractionError,y as Parsefy,i as ParsefyError,s as ValidationError};
+import {zodToJsonSchema}from'zod-to-json-schema';var u=.85,d={".pdf":"application/pdf",".docx":"application/vnd.openxmlformats-officedocument.wordprocessingml.document"},l=10*1024*1024,g="https://api.parsefy.io",x=6e4;var s=class extends Error{constructor(e,r){super(e),this.name="ParsefyError",this.code=r,typeof Error.captureStackTrace=="function"&&Error.captureStackTrace(this,this.constructor);}},p=class extends s{constructor(e,r,o){super(e),this.name="APIError",this.statusCode=r,this.response=o;}},y=class extends s{constructor(e,r,o){super(e,r),this.name="ExtractionError",this.metadata=o;}},a=class extends s{constructor(e){super(e),this.name="ValidationError";}};function h(){return typeof process<"u"&&process.versions?.node!==void 0}function R(t){let e=zodToJsonSchema(t,{$refStrategy:"none",target:"jsonSchema7"});return "$schema"in e&&delete e.$schema,e}function b(t){let e=t.toLowerCase().match(/\.[^.]+$/)?.[0];return e&&d[e]||null}function w(t){if(!b(t)){let r=Object.keys(d).join(", ");throw new a(`Unsupported file type. Supported types: ${r}`)}}function m(t){if(t===0)throw new a("File is empty");if(t>l){let e=l/1048576;throw new a(`File size exceeds maximum limit of ${e}MB`)}}function F(t){let e=t._meta||{confidence_score:1,field_confidence:[],issues:[]};return {object:t.object,metadata:{processingTimeMs:t.metadata.processing_time_ms,inputTokens:t.metadata.input_tokens,outputTokens:t.metadata.output_tokens,credits:t.metadata.credits,fallbackTriggered:t.metadata.fallback_triggered,confidenceScore:e.confidence_score,fieldConfidence:e.field_confidence.map(r=>({field:r.field,score:r.score,reason:r.reason,page:r.page,text:r.text})),issues:e.issues},error:t.error}}function T(t,e){let r=b(e)||"application/octet-stream",o=t.buffer.slice(t.byteOffset,t.byteOffset+t.byteLength);return typeof File<"u"?new File([o],e,{type:r}):new Blob([o],{type:r})}async function I(t){if(!h())throw new a("File path strings are only supported in Node.js. Use File or Blob in the browser.");let e=await import('fs'),r=await import('path');if(!e.existsSync(t))throw new a(`File not found: ${t}`);let o=r.basename(t);w(o);let c=e.readFileSync(t);return m(c.length),{buffer:c,filename:o}}async function _(t){if(typeof t=="string"){let{buffer:e,filename:r}=await I(t);return T(e,r)}if(Buffer.isBuffer(t))return m(t.length),T(t,"document.pdf");if(t instanceof File)return w(t.name),m(t.size),t;if(t instanceof Blob)return m(t.size),t;throw new a("Invalid file input. Expected File, Blob, Buffer, or file path string.")}function P(t){return new Promise(e=>setTimeout(e,t))}function S(t,e=1e3){let r=e*Math.pow(2,t),o=Math.random()*.1*r;return Math.min(r+o,3e4)}var E=class{constructor(e){this.maxRetries=3;let r={};if(typeof e=="string"?r={apiKey:e}:e&&(r=e),this.apiKey=r.apiKey||this.getEnvApiKey(),!this.apiKey)throw new a("API key is required. Provide it in the constructor or set the PARSEFY_API_KEY environment variable.");this.baseUrl=r.baseUrl||g,this.timeout=r.timeout||x;}getEnvApiKey(){return h()&&process.env.PARSEFY_API_KEY||""}async extract(e){let{file:r,schema:o,confidenceThreshold:c}=e,n=R(o),f=await _(r),i=new FormData;return i.append("file",f),i.append("output_schema",JSON.stringify(n)),i.append("confidence_threshold",String(c??.85)),this.makeRequestWithRetry(i)}async makeRequestWithRetry(e,r=0){try{return await this.makeRequest(e)}catch(o){if(o instanceof p&&o.statusCode===429&&r<this.maxRetries){let c=S(r);return await P(c),this.makeRequestWithRetry(e,r+1)}throw o}}async makeRequest(e){let r=`${this.baseUrl}/v1/extract`,o=new AbortController,c=setTimeout(()=>o.abort(),this.timeout);try{let n=await fetch(r,{method:"POST",headers:{Authorization:`Bearer ${this.apiKey}`},body:e,signal:o.signal});if(clearTimeout(c),!n.ok){let i=await this.parseErrorResponse(n);throw new p(i.message||`API request failed with status ${n.status}`,n.status,i)}let f;try{f=await n.json();}catch{throw new s("Failed to parse API response as JSON. The API may have returned an invalid response.","PARSE_ERROR")}try{return F(f)}catch(i){throw new s(`Failed to transform API response: ${i instanceof Error?i.message:String(i)}`,"TRANSFORM_ERROR")}}catch(n){throw clearTimeout(c),n instanceof Error&&n.name==="AbortError"?new s(`Request timed out after ${this.timeout}ms`,"TIMEOUT"):n instanceof s?n:n instanceof TypeError&&n.message.includes("fetch")?new s(`Network error: Unable to connect to the Parsefy API. ${n.message}`,"NETWORK_ERROR"):n instanceof TypeError?new s(`Type error: ${n.message}. This may indicate an API response format issue.`,"TYPE_ERROR"):new s(`Unexpected error: ${n instanceof Error?n.message:String(n)}`,"UNKNOWN_ERROR")}}async parseErrorResponse(e){try{return await e.json()}catch{try{return {message:await e.text()||e.statusText}}catch{return {message:e.statusText}}}}};
+export{p as APIError,u as DEFAULT_CONFIDENCE_THRESHOLD,y as ExtractionError,E as Parsefy,s as ParsefyError,a as ValidationError};
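
The main runtime change in this file is the response transform (minified as `F`). It reads a top-level `_meta` object from the API response, converts its snake_case keys to the camelCase `ExtractionMetadata` fields, and falls back to `{ confidence_score: 1, field_confidence: [], issues: [] }` when `_meta` is absent. In 1.0.3 the call is also wrapped so that a failure inside it surfaces as a `ParsefyError` with code `TRANSFORM_ERROR`. The sketch below is a readable TypeScript restatement of that function. The `Raw*` type names and shapes are inferred from the keys the minified code reads, not taken from published API documentation, and the sample values are illustrative.

```typescript
// Raw wire shapes, inferred from the keys the minified transform reads (assumption).
interface RawFieldConfidence {
  field: string;
  score: number;
  reason: string;
  page: number;
  text: string;
}

interface RawMeta {
  confidence_score: number;
  field_confidence: RawFieldConfidence[];
  issues: string[];
}

interface RawResponse<T> {
  object: T | null;
  metadata: {
    processing_time_ms: number;
    input_tokens: number;
    output_tokens: number;
    credits: number;
    fallback_triggered: boolean;
  };
  _meta?: RawMeta;
  error: unknown;
}

// Restatement of the 1.0.3 transform: snake_case wire keys -> camelCase metadata.
function transformResponse<T>(raw: RawResponse<T>) {
  // Fresh default per call, as in the build: full confidence, no fields, no issues.
  const meta: RawMeta = raw._meta || { confidence_score: 1, field_confidence: [], issues: [] };
  return {
    object: raw.object,
    metadata: {
      processingTimeMs: raw.metadata.processing_time_ms,
      inputTokens: raw.metadata.input_tokens,
      outputTokens: raw.metadata.output_tokens,
      credits: raw.metadata.credits,
      fallbackTriggered: raw.metadata.fallback_triggered,
      confidenceScore: meta.confidence_score,
      // Copy only the five documented FieldConfidence keys.
      fieldConfidence: meta.field_confidence.map((r) => ({
        field: r.field,
        score: r.score,
        reason: r.reason,
        page: r.page,
        text: r.text,
      })),
      issues: meta.issues,
    },
    error: raw.error,
  };
}

// Illustrative response with no `_meta` block (values are made up).
const result = transformResponse({
  object: { invoice_number: "INV-001" },
  metadata: {
    processing_time_ms: 1200,
    input_tokens: 900,
    output_tokens: 40,
    credits: 1,
    fallback_triggered: false,
  },
  error: null,
});

console.log(result.metadata.confidenceScore, result.metadata.fieldConfidence.length); // 1 0
```

Because a missing `_meta` yields `confidenceScore: 1`, a caller that gates on confidence may want to check whether `fieldConfidence` is empty before trusting that score.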

package/package.json CHANGED

@@ -1,7 +1,7 @@
 {
   "name": "parsefy",
-  "version": "1.0.1",
-  "description": "Official TypeScript SDK for Parsefy - AI-powered document data extraction",
+  "version": "1.0.3",
+  "description": "Official TypeScript SDK for Parsefy - Financial Document Infrastructure for Developers",
   "author": "",
   "license": "MIT",
   "repository": {