@pdfvector/client 0.0.29 → 0.0.31
- package/CHANGELOG.md +20 -0
- package/README.md +117 -76
- package/package.json +2 -2
package/CHANGELOG.md
CHANGED
@@ -1,5 +1,25 @@
 # @pdfvector/client
 
+## 0.0.31
+### Patch Changes
+
+
+
+- [#244](https://github.com/phuctm97/pdfvector/pull/244) [`d751cdd`](https://github.com/phuctm97/pdfvector/commit/d751cdde1c208c3298d1a0c2c34406e724e53264) Thanks [@khanhduyvt0101](https://github.com/khanhduyvt0101)! - Improve PDF Vector SDK error handling.
+
+- Updated dependencies [[`d751cdd`](https://github.com/phuctm97/pdfvector/commit/d751cdde1c208c3298d1a0c2c34406e724e53264)]:
+  - @pdfvector/instance-client@0.0.51
+
+## 0.0.30
+### Patch Changes
+
+
+
+- [#240](https://github.com/phuctm97/pdfvector/pull/240) [`2c8691c`](https://github.com/phuctm97/pdfvector/commit/2c8691c9bbd251ff7b7a153fd4254d9360c11c08) Thanks [@khanhduyvt0101](https://github.com/khanhduyvt0101)! - Add academic.parse to resolve academic paper IDs or provider URLs to public PDFs and parse them to markdown.
+
+- Updated dependencies []:
+  - @pdfvector/instance-client@0.0.50
+
 ## 0.0.29
 ### Patch Changes
 
package/README.md
CHANGED
@@ -1,6 +1,6 @@
 # PDF Vector TypeScript/JavaScript SDK
 
-The official TypeScript/JavaScript SDK for the [PDF Vector](https://www.pdfvector.com) API: Parse PDF, Word, Image, and Excel documents to clean, structured markdown format, ask questions about documents using AI, extract structured data from documents with JSON Schema, search across multiple academic databases with a unified API, fetch specific publications by DOI, PubMed ID, ArXiv ID, and more, find relevant academic citations for paragraphs of text, explore paper citation graphs, find similar papers, and search for research grants across US, EU, and UK funding databases.
+The official TypeScript/JavaScript SDK for the [PDF Vector](https://www.pdfvector.com) API: Parse PDF, Word, Image, and Excel documents to clean, structured markdown format, ask questions about documents using AI, extract structured data from documents with JSON Schema, search across multiple academic databases with a unified API, fetch specific publications by DOI, PubMed ID, ArXiv ID, and more, convert academic paper IDs or provider URLs to markdown, find relevant academic citations for paragraphs of text, explore paper citation graphs, find similar papers, and search for research grants across US, EU, and UK funding databases.
 
 ## Installation
 
@@ -380,6 +380,36 @@ result.errors?.forEach((error) => {
 
 **Supported ID types:** DOI, PubMed ID, ArXiv ID, Semantic Scholar ID, ERIC ID, Europe PMC ID, OpenAlex ID.
 
+### Parse Academic Paper to Markdown
+
+Resolve a paper ID or provider URL to its public PDF and parse it into markdown. Uses the same per-page model pricing as Document Parse.
+
+```typescript
+const result = await client.academic.parse({
+  id: "1706.03762", // DOI, PubMed ID, ArXiv ID, Semantic Scholar ID, or provider URL
+  model: "auto", // "auto" | "nano" | "mini" | "pro" | "max"
+});
+
+console.log(`Title: ${result.title}`);
+console.log(`Provider: ${result.detectedProvider}`);
+console.log(`PDF: ${result.pdfURL}`);
+console.log(result.markdown);
+console.log(`Pages: ${result.pageCount}, Credits: ${result.credits}`);
+```
+
+You can pass a provider URL instead of an ID:
+
+```typescript
+const result = await client.academic.parse({
+  url: "https://arxiv.org/abs/1706.03762",
+  model: "nano",
+});
+
+console.log(result.markdown);
+```
+
+Provide exactly one of `id` or `url`. If the paper cannot be found, has no public PDF, or the resolved PDF cannot be fetched, the API returns a typed `PDFVectorError` with a clear message and no parse credits are charged.
+
 ### Find Citations for a Paragraph
 
 Find relevant academic citations for each sentence in a paragraph using semantic similarity. Costs 2 credits per sentence analyzed.
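The "exactly one of `id` or `url`" rule that the README text above states for `academic.parse` can be checked locally before a request is sent. The following is only a minimal sketch: the `AcademicParseInput` type and the `toAcademicParseInput` helper are hypothetical names introduced for illustration and are not part of `@pdfvector/client`.

```typescript
// Hypothetical helper, NOT part of @pdfvector/client.
// The union type makes "both" and "neither" unrepresentable at compile time.
type AcademicParseInput =
  | { id: string; url?: never }
  | { url: string; id?: never };

// Runtime check for untyped input (form fields, CLI flags, JSON bodies).
function toAcademicParseInput(raw: { id?: string; url?: string }): AcademicParseInput {
  const hasId = typeof raw.id === "string" && raw.id.length > 0;
  const hasURL = typeof raw.url === "string" && raw.url.length > 0;

  // hasId === hasURL is true when both are set or both are missing.
  if (hasId === hasURL) {
    throw new Error("Provide exactly one of `id` or `url`.");
  }

  return hasId ? { id: raw.id as string } : { url: raw.url as string };
}
```

Rejecting the ambiguous cases locally avoids a round trip; the server-side validation described in the README still applies.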
@@ -573,6 +603,7 @@ console.log(resultB.documentId); // "doc-b"
 | Bank Statement Extract | 6 | 10 | 14 | 18 | /page |
 | Academic Search | 2 | 2 | 2 | 2 | /request |
 | Academic Fetch | 2 | 2 | 2 | 2 | /request |
+| Academic Parse | 1 | 2 | 4 | 8 | /page |
 | Academic Find Citations | 2 | 2 | 2 | 2 | /sentence |
 | Academic Paper Graph | 2+ | 2+ | 2+ | 2+ | /request |
 | Academic Similar Papers | 3 | 3 | 3 | 3 | /request |
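The new `Academic Parse` pricing row is per page, so a rough cost estimate is a single multiplication. The sketch below assumes the four numeric columns are ordered `nano`, `mini`, `pro`, `max`; the table header is not visible in this hunk, so that ordering is an assumption. `estimateAcademicParseCredits` is a hypothetical helper, not an SDK function, and `"auto"` is left out because the tier it resolves to is not known in advance.

```typescript
// Assumption: the table columns are ordered nano, mini, pro, max.
type PricedModel = "nano" | "mini" | "pro" | "max";

// Credits per page, copied from the "Academic Parse" row: 1 | 2 | 4 | 8.
const ACADEMIC_PARSE_CREDITS_PER_PAGE: Record<PricedModel, number> = {
  nano: 1,
  mini: 2,
  pro: 4,
  max: 8,
};

// Hypothetical helper: estimate credits before calling the API.
function estimateAcademicParseCredits(pageCount: number, model: PricedModel): number {
  if (!Number.isInteger(pageCount) || pageCount < 0) {
    throw new RangeError("pageCount must be a non-negative integer");
  }
  return pageCount * ACADEMIC_PARSE_CREDITS_PER_PAGE[model];
}
```

The authoritative figure is the `credits` field returned by the call itself; the estimate is only useful for budgeting ahead of time.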
@@ -580,10 +611,14 @@ console.log(resultB.documentId); // "doc-b"
 
 ## Error Handling
 
-All API errors are thrown as `PDFVectorError` instances. The SDK
+All API errors are thrown as `PDFVectorError` instances. The SDK maps server errors into specific subclasses and adds user/agent-friendly fields such as `title`, `suggestion`, `userError`, retry flags, and `requestId`.
 
 ```typescript
-import {
+import {
+  PDFVectorError,
+  createClient,
+  isPDFVectorUserError,
+} from "@pdfvector/client";
 
 const client = createClient({ apiKey: "your-api-key" });
 
@@ -593,35 +628,59 @@ try {
   });
   console.log(result.markdown);
 } catch (error) {
+  if (isPDFVectorUserError(error)) {
+    console.error(error.title);
+    console.error(error.suggestion);
+    return;
+  }
+
   if (error instanceof PDFVectorError) {
-    console.error(
-    console.error(
-
-
-
-
-
-
+    console.error(error.supportMessage);
+    console.error(error.toAgentError());
+    return;
+  }
+
+  // Network errors (DNS, connection refused, timeout) bubble up as TypeError.
+  console.error("Unexpected Error:", error);
+}
+```
+
+### User errors
+
+Use `isPDFVectorUserError(error)` or `error.userError` for caller-fixable failures that should usually be shown to the user instead of reported as system failures. For example, URL input failures such as `URL did not return a supported document` are `URLFetchError` instances with `userError: true`.
+
+```typescript
+import { isPDFVectorUserError, isPDFVectorError } from "@pdfvector/client";
+
+try {
+  await client.document.parse({ url: "https://example.com/page.html" });
+} catch (error) {
+  if (isPDFVectorUserError(error)) {
+    console.error(error.suggestion);
+  }
+
+  if (isPDFVectorError(error) && error.retryableWithHigherModel) {
+    console.error("Retry with a stronger model or a smaller document.");
   }
 }
 ```
 
 ### Branching on specific error types
 
-Every error class extends `PDFVectorError`, so you can use `instanceof` to handle specific cases. Specialized subclasses expose typed fields pulled from the error
+Every error class extends `PDFVectorError`, so you can use `instanceof` to handle specific cases. Specialized subclasses expose typed fields pulled from the error payload:
 
 ```typescript
 import {
-
+  EmptyDocumentError,
+  ExtractionFailedError,
   FileTooLargeError,
+  InvalidSchemaError,
+  NoPublicPDFError,
   PageLimitExceededError,
   PasswordProtectedError,
-  URLFetchError,
-  UnauthorizedError,
   TooManyRequestsError,
-
-
-  PDFVectorError,
+  UnauthorizedError,
+  URLFetchError,
 } from "@pdfvector/client";
 
 try {
@@ -633,14 +692,18 @@ try {
     );
   } else if (error instanceof PageLimitExceededError) {
     console.error(
-      `Document has ${error.pageCount} pages
+      `Document has ${error.pageCount} pages; ${error.model} supports up to ${error.pageLimit}`,
     );
   } else if (error instanceof PasswordProtectedError) {
     console.error("Remove the password from the file and try again");
   } else if (error instanceof URLFetchError) {
-    console.error(
+    console.error(error.suggestion);
+  } else if (error instanceof InvalidSchemaError) {
+    console.error(error.reason);
+  } else if (error instanceof NoPublicPDFError) {
+    console.error("Provide a direct PDF URL or upload the paper file directly");
   } else if (error instanceof UnauthorizedError) {
-    console.error("Invalid API key
+    console.error("Invalid API key; check your dashboard");
   } else if (error instanceof TooManyRequestsError) {
     console.error(`Rate limit ${error.limit} exceeded; resets at ${error.resetAt}`);
   } else if (error instanceof EmptyDocumentError) {
@@ -648,34 +711,6 @@ try {
   } else if (error instanceof ExtractionFailedError) {
     console.error(`Extraction failed. Hint: ${error.hint}`);
     if (error.rawText) console.error(`Model output sample: ${error.rawText}`);
-  } else if (error instanceof PDFVectorError) {
-    // Catch-all for any error code not specifically handled
-    console.error(`API Error [${error.code}]: ${error.message}`);
-  }
-}
-```
-
-You can also branch on the error code if you prefer:
-
-```typescript
-try {
-  await client.document.parse({ url: "..." });
-} catch (error) {
-  if (error instanceof PDFVectorError) {
-    switch (error.code) {
-      case "UNAUTHORIZED":
-        console.error("Invalid API key");
-        break;
-      case "BAD_REQUEST":
-        console.error("Validation error:", error.message);
-        break;
-      case "UNPROCESSABLE_CONTENT":
-        console.error("Could not process document:", error.message);
-        break;
-      case "INTERNAL_SERVER_ERROR":
-        console.error(`Server error (requestId: ${error.requestId}):`, error.message);
-        break;
-    }
   }
 }
 ```
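With the code-based `switch` removed from the README, one way to keep a single decision point is to triage on the boolean flags the updated README documents (`userError`, `retryable`, `retryableWithHigherModel`). The sketch below is not SDK code: `ErrorFlags`, `hasErrorFlags`, and `triage` are hypothetical names, the interface is a local stand-in for the documented fields so the example runs without the package installed, and the priority order (user errors first) is an assumption.

```typescript
// Local stand-in for the flags documented on PDFVectorError (assumption:
// all three are always present as booleans on SDK errors).
interface ErrorFlags {
  userError: boolean;
  retryable: boolean;
  retryableWithHigherModel: boolean;
}

type ErrorAction = "show-to-user" | "retry-higher-model" | "retry" | "report";

// Structural type guard, so plain objects and non-SDK errors are handled safely.
function hasErrorFlags(error: unknown): error is ErrorFlags {
  if (typeof error !== "object" || error === null) return false;
  const candidate = error as Record<string, unknown>;
  return (
    typeof candidate["userError"] === "boolean" &&
    typeof candidate["retryable"] === "boolean" &&
    typeof candidate["retryableWithHigherModel"] === "boolean"
  );
}

// Hypothetical triage: decide what the caller should do next.
function triage(error: unknown): ErrorAction {
  if (!hasErrorFlags(error)) return "report"; // e.g. a network TypeError
  if (error.userError) return "show-to-user";
  if (error.retryableWithHigherModel) return "retry-higher-model";
  if (error.retryable) return "retry";
  return "report";
}
```

In real code the SDK's own `isPDFVectorError` and `isPDFVectorUserError` guards would replace `hasErrorFlags`; the structural version is used here only to keep the sketch self-contained.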
@@ -690,13 +725,17 @@ PDFVectorError
 │ ├── PasswordProtectedError
 │ ├── UnsupportedFormatError — format, supportedFormats
 │ ├── URLFetchError — url, statusCode, statusText
+│ ├── InvalidDocumentURLError
+│ ├── InvalidBase64Error
 │ ├── TierNotSupportedError — documentType, model, allowedTypes
 │ ├── InvalidSchemaError — reason
 │ └── NoInputProvidedError
 ├── UnauthorizedError (401)
 ├── NotFoundError (404)
+│ ├── AcademicPaperNotFoundError — input, paperErrorCode
+│ └── NoPublicPDFError — input, paperTitle, doi, providerURL
 ├── ConflictError (409)
-├── TooManyRequestsError (429) — limit, resetAt
+├── TooManyRequestsError (429) — limit, resetAt, retryAfterSeconds
 ├── UnprocessableContentError (422)
 │ ├── EmptyDocumentError
 │ ├── NoTextDetectedError
@@ -709,42 +748,36 @@ PDFVectorError
 
 | Field | Type | Description |
 |-------|------|-------------|
-| `code` | `string` |
-| `status` | `number` | HTTP status code
-| `
-| `
-| `
+| `code` | `string` | API error code (`BAD_REQUEST`, `UNAUTHORIZED`, etc.) |
+| `status` | `number` | HTTP-style status code |
+| `title` | `string` | Short readable summary |
+| `message` | `string` | Server-provided error message |
+| `suggestion` | `string` | Recommended next action |
+| `category` | `string` | `authentication`, `validation`, `document_input`, `document_processing`, `rate_limit`, `not_found`, `conflict`, `unsupported`, or `server` |
+| `origin` | `"user" \| "system"` | Whether the failure is caller-fixable or likely server/provider-side |
+| `userError` | `boolean` | `true` for expected caller-fixable failures |
+| `retryable` | `boolean` | `true` when retrying may help |
+| `retryableWithHigherModel` | `boolean` | `true` when retrying with a stronger model or smaller document may help |
+| `requestId` | `number \| undefined` | Server-assigned request ID; include in support tickets |
 | `documentId` | `string \| undefined` | Echoed back if you passed `context.documentId` |
-| `
-| `
-
-
-
-If you'd rather not import `PDFVectorError` just to do an `instanceof` check, use the `isPDFVectorError` guard:
-
-```typescript
-import { isPDFVectorError } from "@pdfvector/client";
+| `reasonCode` | `string \| undefined` | More specific server reason when available, such as `NO_PUBLIC_PDF` |
+| `supportMessage` | `string` | Compact support/logging message |
+| `data` | `Record<string, unknown>` | Raw error payload from the server |
+| `cause` | `unknown` | Original underlying error |
 
-
-  await client.document.parse({ url: "..." });
-} catch (error) {
-  if (isPDFVectorError(error)) {
-    console.error(error.code, error.message, error.requestId);
-  }
-}
-```
+Use `error.toAgentError()` or `JSON.stringify(error)` when you need a serializable error object for logs, workflows, retry planners, or agent tool responses.
 
 ### Error Codes
 
 | Code | Status | Description |
 |------|--------|-------------|
-| `BAD_REQUEST` | 400 | Input validation failed
+| `BAD_REQUEST` | 400 | Input validation failed, including invalid URLs, unsupported formats, file size limits, page limits, invalid base64, and invalid JSON Schema |
 | `UNAUTHORIZED` | 401 | Missing or invalid API key |
-| `NOT_FOUND` | 404 | Resource not found
+| `NOT_FOUND` | 404 | Resource not found, including academic paper IDs and papers without public PDFs |
 | `CONFLICT` | 409 | Operation conflicts with the current state |
-| `UNPROCESSABLE_CONTENT` | 422 | Document could not be processed
+| `UNPROCESSABLE_CONTENT` | 422 | Document could not be processed, including empty documents, no readable text, and extraction failures |
 | `TOO_MANY_REQUESTS` | 429 | Rate limit exceeded |
-| `INTERNAL_SERVER_ERROR` | 500 | Server-side failure
+| `INTERNAL_SERVER_ERROR` | 500 | Server-side failure; capture the `requestId` for support |
 | `NOT_IMPLEMENTED` | 501 | Endpoint not available on this instance |
 
 ## TypeScript Support
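The README mentions "retry planners" as one consumer of these fields. A minimal sketch of such a planner follows, using only the documented `retryable` flag and the `retryAfterSeconds` field listed for `TooManyRequestsError`. `RetryHints` and `retryDelayMs` are hypothetical names, the exponential-backoff fallback and its default values are assumptions rather than SDK behaviour, and `retryAfterSeconds` is assumed to be a number of seconds as its name suggests.

```typescript
// Structural subset of the documented error fields (hypothetical local type).
interface RetryHints {
  retryable?: boolean;
  retryAfterSeconds?: number;
}

// Returns how long to wait before the next attempt, or null when the error
// should not be retried. `attempt` is zero-based.
function retryDelayMs(
  hints: RetryHints,
  attempt: number,
  baseMs = 500,
  maxMs = 30_000,
): number | null {
  if (!hints.retryable) return null;

  // Prefer the server-provided hint and do not shorten it.
  if (typeof hints.retryAfterSeconds === "number" && hints.retryAfterSeconds >= 0) {
    return hints.retryAfterSeconds * 1000;
  }

  // Fallback (assumption): capped exponential backoff.
  return Math.min(baseMs * 2 ** attempt, maxMs);
}
```

A caller would typically pass the caught error straight in, for example `retryDelayMs(error, attempt)` after an `isPDFVectorError(error)` check, and stop after a fixed number of attempts.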
@@ -755,6 +788,7 @@ The SDK is written in TypeScript and includes full type definitions:
 import {
   createClient,
   isPDFVectorError,
+  isPDFVectorUserError,
   // Base error class — all errors inherit from this
   PDFVectorError,
   // HTTP-aligned error categories
@@ -772,12 +806,16 @@ import {
   PasswordProtectedError,
   UnsupportedFormatError,
   URLFetchError,
+  InvalidDocumentURLError,
+  InvalidBase64Error,
   TierNotSupportedError,
   InvalidSchemaError,
   NoInputProvidedError,
   EmptyDocumentError,
   NoTextDetectedError,
   ExtractionFailedError,
+  AcademicPaperNotFoundError,
+  NoPublicPDFError,
   // Underlying ORPC error — re-exported for advanced use cases
   ORPCError,
 } from "@pdfvector/client";
@@ -789,7 +827,10 @@ import type {
   ContractInputs,
   ContractOutputs,
   PDFVectorModel,
+  PDFVectorAgentError,
+  PDFVectorErrorCategory,
   PDFVectorErrorCode,
+  PDFVectorErrorOrigin,
 } from "@pdfvector/client";
 ```
 
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@pdfvector/client",
-  "version": "0.0.
+  "version": "0.0.31",
   "type": "module",
   "description": "Official TypeScript/JavaScript SDK for PDF Vector API",
   "license": "MIT",
@@ -23,7 +23,7 @@
   },
   "main": ".tsc/lib/index.js",
   "dependencies": {
-    "@pdfvector/instance-client": "^0.0.
+    "@pdfvector/instance-client": "^0.0.51"
   },
   "files": [
     ".tsc",