@opendataloader/pdf 1.11.2 → 1.12.0
- package/README.md +3 -5
- package/lib/opendataloader-pdf-cli.jar +0 -0
- package/package.json +6 -6
package/README.md
CHANGED
@@ -6,7 +6,6 @@
 [](https://pypi.org/project/opendataloader-pdf/)
 [](https://www.npmjs.com/package/@opendataloader/pdf)
 [](https://search.maven.org/artifact/org.opendataloader/opendataloader-pdf-core)
-[](https://github.com/opendataloader-project/opendataloader-pdf/pkgs/container/opendataloader-pdf-cli)
 [](https://github.com/opendataloader-project/opendataloader-pdf#java)
 
 Convert PDFs into **LLM-ready Markdown and JSON** with accurate reading order, table extraction, and bounding boxes — all running locally on your machine.
@@ -57,14 +56,14 @@ Building RAG pipelines? You've probably hit these problems:
 - **Bounding Boxes** — Every element includes `[x1, y1, x2, y2]` coordinates for citations
 - **Reading Order** — XY-Cut++ algorithm handles multi-column layouts correctly
 - **Noise Filtering** — Headers, footers, hidden text, watermarks auto-removed
-- **LangChain Integration** — [Official document loader](https://
+- **LangChain Integration** — [Official document loader](https://docs.langchain.com/oss/python/integrations/document_loaders/opendataloader_pdf)
 
 ### Performance & Privacy
 
 - **No GPU** — Fast, rule-based heuristics
 - **Local-First** — Your documents never leave your machine
 - **High Throughput** — Process thousands of PDFs efficiently
-- **Multi-Language SDK** — Python, Node.js, Java
+- **Multi-Language SDK** — Python, Node.js, Java
 
 ### Document Understanding
 
@@ -135,7 +134,6 @@ Building RAG pipelines? You've probably hit these problems:
 
 - [Python](https://opendataloader.org/docs/quick-start-python)
 - [Node.js / TypeScript](https://opendataloader.org/docs/quick-start-nodejs)
-- [Docker](https://opendataloader.org/docs/quick-start-docker)
 - [Java](https://opendataloader.org/docs/quick-start-java)
 
 <br/>
@@ -232,7 +230,7 @@ opendataloader_pdf.convert(
 
 - **Local-first**: Simple pages processed locally, complex pages routed to backend
 - **Fallback**: If backend unavailable, gracefully falls back to local processing
-- **Privacy**: Run the backend locally
+- **Privacy**: Run the backend locally for 100% on-premise
 
 ### Formula Extraction (LaTeX)
 
package/lib/opendataloader-pdf-cli.jar
Binary file
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@opendataloader/pdf",
-  "version": "1.
+  "version": "1.12.0",
   "description": "A Node.js wrapper for the opendataloader-pdf Java CLI.",
   "main": "./dist/index.cjs",
   "module": "./dist/index.js",
@@ -48,12 +48,12 @@
   },
   "devDependencies": {
     "@eslint/js": "^10.0.1",
-    "@types/node": "^25.3.
-    "@typescript-eslint/eslint-plugin": "^8.56.
-    "@typescript-eslint/parser": "^8.56.
-    "eslint": "^10.0.
+    "@types/node": "^25.3.3",
+    "@typescript-eslint/eslint-plugin": "^8.56.1",
+    "@typescript-eslint/parser": "^8.56.1",
+    "eslint": "^10.0.2",
     "glob": "^13.0.6",
-    "globals": "^17.
+    "globals": "^17.4.0",
     "prettier": "^3.8.1",
     "tsup": "^8.5.1",
     "typescript": "^5.9.3",