@pdfvector/client 0.0.29 → 0.0.31

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (3)
  1. package/CHANGELOG.md +20 -0
  2. package/README.md +117 -76
  3. package/package.json +2 -2
package/CHANGELOG.md CHANGED
@@ -1,5 +1,25 @@
  # @pdfvector/client

+ ## 0.0.31
+ ### Patch Changes
+
+
+
+ - [#244](https://github.com/phuctm97/pdfvector/pull/244) [`d751cdd`](https://github.com/phuctm97/pdfvector/commit/d751cdde1c208c3298d1a0c2c34406e724e53264) Thanks [@khanhduyvt0101](https://github.com/khanhduyvt0101)! - Improve PDF Vector SDK error handling.
+
+ - Updated dependencies [[`d751cdd`](https://github.com/phuctm97/pdfvector/commit/d751cdde1c208c3298d1a0c2c34406e724e53264)]:
+   - @pdfvector/instance-client@0.0.51
+
+ ## 0.0.30
+ ### Patch Changes
+
+
+
+ - [#240](https://github.com/phuctm97/pdfvector/pull/240) [`2c8691c`](https://github.com/phuctm97/pdfvector/commit/2c8691c9bbd251ff7b7a153fd4254d9360c11c08) Thanks [@khanhduyvt0101](https://github.com/khanhduyvt0101)! - Add academic.parse to resolve academic paper IDs or provider URLs to public PDFs and parse them to markdown.
+
+ - Updated dependencies []:
+   - @pdfvector/instance-client@0.0.50
+
  ## 0.0.29
  ### Patch Changes

package/README.md CHANGED
@@ -1,6 +1,6 @@
  # PDF Vector TypeScript/JavaScript SDK

- The official TypeScript/JavaScript SDK for the [PDF Vector](https://www.pdfvector.com) API: Parse PDF, Word, Image, and Excel documents to clean, structured markdown format, ask questions about documents using AI, extract structured data from documents with JSON Schema, search across multiple academic databases with a unified API, fetch specific publications by DOI, PubMed ID, ArXiv ID, and more, find relevant academic citations for paragraphs of text, explore paper citation graphs, find similar papers, and search for research grants across US, EU, and UK funding databases.
+ The official TypeScript/JavaScript SDK for the [PDF Vector](https://www.pdfvector.com) API: Parse PDF, Word, Image, and Excel documents to clean, structured markdown format, ask questions about documents using AI, extract structured data from documents with JSON Schema, search across multiple academic databases with a unified API, fetch specific publications by DOI, PubMed ID, ArXiv ID, and more, convert academic paper IDs or provider URLs to markdown, find relevant academic citations for paragraphs of text, explore paper citation graphs, find similar papers, and search for research grants across US, EU, and UK funding databases.

  ## Installation

@@ -380,6 +380,36 @@ result.errors?.forEach((error) => {

  **Supported ID types:** DOI, PubMed ID, ArXiv ID, Semantic Scholar ID, ERIC ID, Europe PMC ID, OpenAlex ID.

+ ### Parse Academic Paper to Markdown
+
+ Resolve a paper ID or provider URL to its public PDF and parse it into markdown. Uses the same per-page model pricing as Document Parse.
+
+ ```typescript
+ const result = await client.academic.parse({
+   id: "1706.03762", // DOI, PubMed ID, ArXiv ID, Semantic Scholar ID, or provider URL
+   model: "auto", // "auto" | "nano" | "mini" | "pro" | "max"
+ });
+
+ console.log(`Title: ${result.title}`);
+ console.log(`Provider: ${result.detectedProvider}`);
+ console.log(`PDF: ${result.pdfURL}`);
+ console.log(result.markdown);
+ console.log(`Pages: ${result.pageCount}, Credits: ${result.credits}`);
+ ```
+
+ You can pass a provider URL instead of an ID:
+
+ ```typescript
+ const result = await client.academic.parse({
+   url: "https://arxiv.org/abs/1706.03762",
+   model: "nano",
+ });
+
+ console.log(result.markdown);
+ ```
+
+ Provide exactly one of `id` or `url`. If the paper cannot be found, has no public PDF, or the resolved PDF cannot be fetched, the API returns a typed `PDFVectorError` with a clear message and no parse credits are charged.
+
  ### Find Citations for a Paragraph

  Find relevant academic citations for each sentence in a paragraph using semantic similarity. Costs 2 credits per sentence analyzed.
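The added README text in the hunk above says to provide exactly one of `id` or `url` to `academic.parse`. A minimal sketch of enforcing that rule on the caller's side before a request is made follows. The names `ParseModel`, `AcademicParseInput`, and `toAcademicParseInput` are hypothetical and are not part of `@pdfvector/client`; the SDK's real input type may differ.

```typescript
// Hypothetical helper, not an SDK export. Model names are taken from the README comment.
type ParseModel = "auto" | "nano" | "mini" | "pro" | "max";

// Exactly one of `id` or `url` may be present; `never` rules out the other key at compile time.
type AcademicParseInput =
  | { id: string; url?: never; model?: ParseModel }
  | { url: string; id?: never; model?: ParseModel };

function toAcademicParseInput(raw: {
  id?: string;
  url?: string;
  model?: ParseModel;
}): AcademicParseInput {
  const hasId = typeof raw.id === "string" && raw.id.length > 0;
  const hasURL = typeof raw.url === "string" && raw.url.length > 0;

  // Both present or both missing violates "exactly one of".
  if (hasId === hasURL) {
    throw new Error("Provide exactly one of `id` or `url`");
  }

  // Only forward `model` when the caller set it, so no default is invented here.
  const base = raw.model === undefined ? {} : { model: raw.model };
  return hasId
    ? { ...base, id: raw.id as string }
    : { ...base, url: raw.url as string };
}
```

Checking locally avoids a round trip for an input the README already describes as invalid; the server-side behaviour for such input is not shown in this diff.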
@@ -573,6 +603,7 @@ console.log(resultB.documentId); // "doc-b"
  | Bank Statement Extract | 6 | 10 | 14 | 18 | /page |
  | Academic Search | 2 | 2 | 2 | 2 | /request |
  | Academic Fetch | 2 | 2 | 2 | 2 | /request |
+ | Academic Parse | 1 | 2 | 4 | 8 | /page |
  | Academic Find Citations | 2 | 2 | 2 | 2 | /sentence |
  | Academic Paper Graph | 2+ | 2+ | 2+ | 2+ | /request |
  | Academic Similar Papers | 3 | 3 | 3 | 3 | /request |
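The new "Academic Parse" row prices parsing per page at 1, 2, 4, or 8 credits. The arithmetic can be sketched as below. The table header is not part of this hunk, so mapping the four columns to `nano`, `mini`, `pro`, and `max` in that order is an assumption, and `estimateAcademicParseCredits` is a hypothetical helper rather than an SDK function. `"auto"` is left out because the diff does not say which rate it bills at.

```typescript
type BillableModel = "nano" | "mini" | "pro" | "max";

// Credits per page from the "Academic Parse" row; the column-to-model mapping is assumed.
const ACADEMIC_PARSE_CREDITS_PER_PAGE: Record<BillableModel, number> = {
  nano: 1,
  mini: 2,
  pro: 4,
  max: 8,
};

// Estimated cost = page count x per-page rate for the chosen model.
function estimateAcademicParseCredits(
  pageCount: number,
  model: BillableModel,
): number {
  if (!Number.isInteger(pageCount) || pageCount < 0) {
    throw new RangeError("pageCount must be a non-negative integer");
  }
  return pageCount * ACADEMIC_PARSE_CREDITS_PER_PAGE[model];
}

// A 15-page paper under the assumed "pro" rate: 15 * 4 = 60 credits.
console.log(estimateAcademicParseCredits(15, "pro"));
```

The authoritative figure is the `credits` field returned by `academic.parse`, as shown in the README example; this estimate is only useful for budgeting ahead of a call.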
@@ -580,10 +611,14 @@ console.log(resultB.documentId); // "doc-b"

  ## Error Handling

- All API errors are thrown as `PDFVectorError` instances. The SDK transparently maps every server error into the most specific subclass it can, so you can branch on the type using `instanceof` and read typed metadata fields directly.
+ All API errors are thrown as `PDFVectorError` instances. The SDK maps server errors into specific subclasses and adds user/agent-friendly fields such as `title`, `suggestion`, `userError`, retry flags, and `requestId`.

  ```typescript
- import { createClient, PDFVectorError } from "@pdfvector/client";
+ import {
+   PDFVectorError,
+   createClient,
+   isPDFVectorUserError,
+ } from "@pdfvector/client";

  const client = createClient({ apiKey: "your-api-key" });

@@ -593,35 +628,59 @@ try {
    });
    console.log(result.markdown);
  } catch (error) {
+   if (isPDFVectorUserError(error)) {
+     console.error(error.title);
+     console.error(error.suggestion);
+     return;
+   }
+
    if (error instanceof PDFVectorError) {
-     console.error(`API Error [${error.code}]: ${error.message}`);
-     console.error(`HTTP Status: ${error.status}`);
-     console.error(`Request ID: ${error.requestId}`); // server-assigned, useful for support
-     console.error(`Document ID: ${error.documentId}`); // echoed back if you set one
-     console.error(`User error: ${error.userError}`); // true if caused by your input
-   } else {
-     // Network errors (DNS, connection refused, timeout) bubble up as TypeError.
-     console.error("Unexpected Error:", error);
+     console.error(error.supportMessage);
+     console.error(error.toAgentError());
+     return;
+   }
+
+   // Network errors (DNS, connection refused, timeout) bubble up as TypeError.
+   console.error("Unexpected Error:", error);
+ }
+ ```
+
+ ### User errors
+
+ Use `isPDFVectorUserError(error)` or `error.userError` for caller-fixable failures that should usually be shown to the user instead of reported as system failures. For example, URL input failures such as `URL did not return a supported document` are `URLFetchError` instances with `userError: true`.
+
+ ```typescript
+ import { isPDFVectorUserError, isPDFVectorError } from "@pdfvector/client";
+
+ try {
+   await client.document.parse({ url: "https://example.com/page.html" });
+ } catch (error) {
+   if (isPDFVectorUserError(error)) {
+     console.error(error.suggestion);
+   }
+
+   if (isPDFVectorError(error) && error.retryableWithHigherModel) {
+     console.error("Retry with a stronger model or a smaller document.");
    }
  }
  ```

  ### Branching on specific error types

- Every error class extends `PDFVectorError`, so you can use `instanceof` to handle specific cases. Specialized subclasses expose typed fields pulled from the error's `data` payload:
+ Every error class extends `PDFVectorError`, so you can use `instanceof` to handle specific cases. Specialized subclasses expose typed fields pulled from the error payload:

  ```typescript
  import {
-   createClient,
+   EmptyDocumentError,
+   ExtractionFailedError,
    FileTooLargeError,
+   InvalidSchemaError,
+   NoPublicPDFError,
    PageLimitExceededError,
    PasswordProtectedError,
-   URLFetchError,
-   UnauthorizedError,
    TooManyRequestsError,
-   EmptyDocumentError,
-   ExtractionFailedError,
-   PDFVectorError,
+   UnauthorizedError,
+   URLFetchError,
  } from "@pdfvector/client";

  try {
@@ -633,14 +692,18 @@ try {
      );
    } else if (error instanceof PageLimitExceededError) {
      console.error(
-       `Document has ${error.pageCount} pages ${error.model} only supports up to ${error.pageLimit}`,
+       `Document has ${error.pageCount} pages; ${error.model} supports up to ${error.pageLimit}`,
      );
    } else if (error instanceof PasswordProtectedError) {
      console.error("Remove the password from the file and try again");
    } else if (error instanceof URLFetchError) {
-     console.error(`Could not fetch ${error.url}: ${error.statusCode} ${error.statusText}`);
+     console.error(error.suggestion);
+   } else if (error instanceof InvalidSchemaError) {
+     console.error(error.reason);
+   } else if (error instanceof NoPublicPDFError) {
+     console.error("Provide a direct PDF URL or upload the paper file directly");
    } else if (error instanceof UnauthorizedError) {
-     console.error("Invalid API key check your dashboard");
+     console.error("Invalid API key; check your dashboard");
    } else if (error instanceof TooManyRequestsError) {
      console.error(`Rate limit ${error.limit} exceeded; resets at ${error.resetAt}`);
    } else if (error instanceof EmptyDocumentError) {
@@ -648,34 +711,6 @@ try {
    } else if (error instanceof ExtractionFailedError) {
      console.error(`Extraction failed. Hint: ${error.hint}`);
      if (error.rawText) console.error(`Model output sample: ${error.rawText}`);
-   } else if (error instanceof PDFVectorError) {
-     // Catch-all for any error code not specifically handled
-     console.error(`API Error [${error.code}]: ${error.message}`);
-   }
- }
- ```
-
- You can also branch on the error code if you prefer:
-
- ```typescript
- try {
-   await client.document.parse({ url: "..." });
- } catch (error) {
-   if (error instanceof PDFVectorError) {
-     switch (error.code) {
-       case "UNAUTHORIZED":
-         console.error("Invalid API key");
-         break;
-       case "BAD_REQUEST":
-         console.error("Validation error:", error.message);
-         break;
-       case "UNPROCESSABLE_CONTENT":
-         console.error("Could not process document:", error.message);
-         break;
-       case "INTERNAL_SERVER_ERROR":
-         console.error(`Server error (requestId: ${error.requestId}):`, error.message);
-         break;
-     }
    }
  }
  ```
@@ -690,13 +725,17 @@ PDFVectorError
  │ ├── PasswordProtectedError
  │ ├── UnsupportedFormatError — format, supportedFormats
  │ ├── URLFetchError — url, statusCode, statusText
+ │ ├── InvalidDocumentURLError
+ │ ├── InvalidBase64Error
  │ ├── TierNotSupportedError — documentType, model, allowedTypes
  │ ├── InvalidSchemaError — reason
  │ └── NoInputProvidedError
  ├── UnauthorizedError (401)
  ├── NotFoundError (404)
+ │ ├── AcademicPaperNotFoundError — input, paperErrorCode
+ │ └── NoPublicPDFError — input, paperTitle, doi, providerURL
  ├── ConflictError (409)
- ├── TooManyRequestsError (429) — limit, resetAt
+ ├── TooManyRequestsError (429) — limit, resetAt, retryAfterSeconds
  ├── UnprocessableContentError (422)
  │ ├── EmptyDocumentError
  │ ├── NoTextDetectedError
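The hunk above places `AcademicPaperNotFoundError` and `NoPublicPDFError` under `NotFoundError`. With `instanceof` chains, that nesting means the subclass checks have to come before the parent check, or the parent branch swallows them. The sketch below uses local stand-in classes that copy only the names from the tree; the real classes are exported by the SDK and carry fields that are omitted here. `classify` is a hypothetical function. It assumes a compile target of ES2015 or later, where `instanceof` works for `Error` subclasses.

```typescript
// Local stand-ins mirroring part of the documented hierarchy (not the SDK classes).
class PDFVectorError extends Error {}
class NotFoundError extends PDFVectorError {}
class AcademicPaperNotFoundError extends NotFoundError {}
class NoPublicPDFError extends NotFoundError {}

// Most specific classes are tested first, then their parents.
function classify(error: unknown): string {
  if (error instanceof NoPublicPDFError) return "no-public-pdf";
  if (error instanceof AcademicPaperNotFoundError) return "paper-not-found";
  if (error instanceof NotFoundError) return "not-found";
  if (error instanceof PDFVectorError) return "api-error";
  return "unknown";
}

// A NoPublicPDFError is also a NotFoundError, which is why ordering matters.
const err = new NoPublicPDFError("no public PDF");
console.log(classify(err), err instanceof NotFoundError);
```

Code written against 0.0.29 that already branches on `NotFoundError` would keep catching the two new subclasses, since they extend it.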
@@ -709,42 +748,36 @@ PDFVectorError

  | Field | Type | Description |
  |-------|------|-------------|
- | `code` | `string` | The ORPC error code (`BAD_REQUEST`, `UNAUTHORIZED`, etc.) |
- | `status` | `number` | HTTP status code (400, 401, 404, 409, 422, 429, 500, 501) |
- | `message` | `string` | Human-readable error message |
- | `data` | `Record<string, unknown>` | Raw error payload from the server |
- | `requestId` | `number \| undefined` | Server-assigned request ID — include in support tickets |
+ | `code` | `string` | API error code (`BAD_REQUEST`, `UNAUTHORIZED`, etc.) |
+ | `status` | `number` | HTTP-style status code |
+ | `title` | `string` | Short readable summary |
+ | `message` | `string` | Server-provided error message |
+ | `suggestion` | `string` | Recommended next action |
+ | `category` | `string` | `authentication`, `validation`, `document_input`, `document_processing`, `rate_limit`, `not_found`, `conflict`, `unsupported`, or `server` |
+ | `origin` | `"user" \| "system"` | Whether the failure is caller-fixable or likely server/provider-side |
+ | `userError` | `boolean` | `true` for expected caller-fixable failures |
+ | `retryable` | `boolean` | `true` when retrying may help |
+ | `retryableWithHigherModel` | `boolean` | `true` when retrying with a stronger model or smaller document may help |
+ | `requestId` | `number \| undefined` | Server-assigned request ID; include in support tickets |
  | `documentId` | `string \| undefined` | Echoed back if you passed `context.documentId` |
- | `userError` | `boolean` | `true` if the failure was caused by your input (vs. a server-side issue) |
- | `cause` | `unknown` | Original error (the underlying `ORPCError` from the wire) |
-
- ### Type guard
-
- If you'd rather not import `PDFVectorError` just to do an `instanceof` check, use the `isPDFVectorError` guard:
-
- ```typescript
- import { isPDFVectorError } from "@pdfvector/client";
+ | `reasonCode` | `string \| undefined` | More specific server reason when available, such as `NO_PUBLIC_PDF` |
+ | `supportMessage` | `string` | Compact support/logging message |
+ | `data` | `Record<string, unknown>` | Raw error payload from the server |
+ | `cause` | `unknown` | Original underlying error |

- try {
-   await client.document.parse({ url: "..." });
- } catch (error) {
-   if (isPDFVectorError(error)) {
-     console.error(error.code, error.message, error.requestId);
-   }
- }
- ```
+ Use `error.toAgentError()` or `JSON.stringify(error)` when you need a serializable error object for logs, workflows, retry planners, or agent tool responses.

  ### Error Codes

  | Code | Status | Description |
  |------|--------|-------------|
- | `BAD_REQUEST` | 400 | Input validation failed (e.g., missing fields, invalid URL, file too large, page limit exceeded, invalid JSON Schema) |
+ | `BAD_REQUEST` | 400 | Input validation failed, including invalid URLs, unsupported formats, file size limits, page limits, invalid base64, and invalid JSON Schema |
  | `UNAUTHORIZED` | 401 | Missing or invalid API key |
- | `NOT_FOUND` | 404 | Resource not found (e.g., academic paper ID, version) |
+ | `NOT_FOUND` | 404 | Resource not found, including academic paper IDs and papers without public PDFs |
  | `CONFLICT` | 409 | Operation conflicts with the current state |
- | `UNPROCESSABLE_CONTENT` | 422 | Document could not be processed (empty, no readable text, extraction failed) |
+ | `UNPROCESSABLE_CONTENT` | 422 | Document could not be processed, including empty documents, no readable text, and extraction failures |
  | `TOO_MANY_REQUESTS` | 429 | Rate limit exceeded |
- | `INTERNAL_SERVER_ERROR` | 500 | Server-side failure capture the `requestId` for support |
+ | `INTERNAL_SERVER_ERROR` | 500 | Server-side failure; capture the `requestId` for support |
  | `NOT_IMPLEMENTED` | 501 | Endpoint not available on this instance |

  ## TypeScript Support
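The new fields table mentions "retry planners" and adds the `retryable` and `retryableWithHigherModel` flags, while the hierarchy lists `retryAfterSeconds` on `TooManyRequestsError`. One possible planner over those three fields is sketched below. `RetryHints`, `RetryPlan`, `planRetry`, the backoff constants, and the model ordering `nano < mini < pro < max` are all assumptions made for illustration; none of them come from the SDK.

```typescript
type Model = "nano" | "mini" | "pro" | "max";

// Assumed ordering from weakest to strongest model.
const MODEL_LADDER: readonly Model[] = ["nano", "mini", "pro", "max"];

// Structural subset of the documented error fields.
interface RetryHints {
  retryable: boolean;
  retryableWithHigherModel: boolean;
  retryAfterSeconds?: number; // documented on TooManyRequestsError only
}

type RetryPlan =
  | { action: "wait"; delayMs: number }
  | { action: "upgrade"; model: Model }
  | { action: "stop" };

function planRetry(
  hints: RetryHints,
  currentModel: Model,
  attempt: number,
): RetryPlan {
  // Prefer a stronger model when the error suggests it and one is available.
  if (hints.retryableWithHigherModel) {
    const next = MODEL_LADDER[MODEL_LADDER.indexOf(currentModel) + 1];
    if (next !== undefined) return { action: "upgrade", model: next };
  }
  // Otherwise honor the server's delay, or fall back to capped exponential backoff.
  if (hints.retryable) {
    const delayMs =
      hints.retryAfterSeconds !== undefined
        ? hints.retryAfterSeconds * 1000
        : Math.min(30_000, 500 * 2 ** attempt);
    return { action: "wait", delayMs };
  }
  return { action: "stop" };
}
```

Because the function only reads plain fields, it could be fed either a caught error or the serializable object the README says `toAgentError()` returns, assuming that object carries the same flags.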
@@ -755,6 +788,7 @@ The SDK is written in TypeScript and includes full type definitions:
  import {
    createClient,
    isPDFVectorError,
+   isPDFVectorUserError,
    // Base error class — all errors inherit from this
    PDFVectorError,
    // HTTP-aligned error categories
@@ -772,12 +806,16 @@ import {
    PasswordProtectedError,
    UnsupportedFormatError,
    URLFetchError,
+   InvalidDocumentURLError,
+   InvalidBase64Error,
    TierNotSupportedError,
    InvalidSchemaError,
    NoInputProvidedError,
    EmptyDocumentError,
    NoTextDetectedError,
    ExtractionFailedError,
+   AcademicPaperNotFoundError,
+   NoPublicPDFError,
    // Underlying ORPC error — re-exported for advanced use cases
    ORPCError,
  } from "@pdfvector/client";
@@ -789,7 +827,10 @@ import type {
    ContractInputs,
    ContractOutputs,
    PDFVectorModel,
+   PDFVectorAgentError,
+   PDFVectorErrorCategory,
    PDFVectorErrorCode,
+   PDFVectorErrorOrigin,
  } from "@pdfvector/client";
  ```

package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@pdfvector/client",
-   "version": "0.0.29",
+   "version": "0.0.31",
    "type": "module",
    "description": "Official TypeScript/JavaScript SDK for PDF Vector API",
    "license": "MIT",
@@ -23,7 +23,7 @@
    },
    "main": ".tsc/lib/index.js",
    "dependencies": {
-     "@pdfvector/instance-client": "^0.0.49"
+     "@pdfvector/instance-client": "^0.0.51"
    },
    "files": [
      ".tsc",