@pdftron/data-extraction 10.1.1-1 → 10.1.1

This diff shows the changes between two publicly released versions of the package as they appear in their public registries. It is provided for informational purposes only.
Files changed (2)
  1. package/package.json +1 -1
  2. package/readme.md +21 -50
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@pdftron/data-extraction",
- "version": "10.1.1-1",
+ "version": "10.1.1",
  "main": "./lib/main.js",
  "binary": {
  "module_name": "ApryseIDP",
package/readme.md CHANGED
@@ -1,67 +1,38 @@
- ## @pdftron/data-extraction
+ ## @pdftron/pdfnet-node
 
- This package is meant to be used in conjunction with @pdftron/pdfnet-node to support IDP data extraction from Apryse. Follow this guide for more info on usage.
- https://docs.apryse.com/documentation/core/guides/intelligent-data-extraction/
-
- For further reading checkout our blog post on the project.
- https://apryse.com/blog/introducing-automated-data-extraction-pdf-idp
+ This package leverages the full power of PDFTron's native SDK for maximal performance and accuracy. In order to maintain consistency across platforms the Javascript API is used in the same manner as the PDFNet API available in PDFTron's Web platform. Since access to the filesystem is included in Node.js/Electron some additional APIs requiring filesystem access have also been included.
 
  #### Supported platform, Node.js, and Electron versions
  This package depends on unmanaged add-on binaries, and the add-on binaries are not cross-platform. At the moment we have support for
- * **OS**: Linux (excluding Alpine), Windows(x64)
+ * **OS**: Linux (excluding Alpine), Windows(x64), Mac
  * **Node.js version**: 8 - 18
  * **Electron version**: 6 - 19
 
  Installation will fail if your OS, Node.js or Electron version is not supported.
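
Since installation fails on unsupported environments, a pre-install check can surface the reason earlier. The following is a minimal sketch, not part of the package: `SUPPORTED`, `inRange`, and `findUnsupported` are hypothetical names, the ranges are copied from the 10.1.1 side of the README above, and mapping "Linux, Windows, Mac" to the `process.platform` values `linux`, `win32`, and `darwin` is an assumption. It does not detect Alpine Linux, and checking both the Node.js and Electron ranges under Electron is a simplification.

```javascript
// Supported ranges as listed in the README text (10.1.1 side of this diff).
const SUPPORTED = {
  platforms: ['linux', 'win32', 'darwin'], // assumed process.platform values for Linux, Windows, Mac
  node: { min: 8, max: 18 },
  electron: { min: 6, max: 19 },
};

// True when `value` is an integer inside the inclusive range.
function inRange(value, { min, max }) {
  return Number.isInteger(value) && value >= min && value <= max;
}

// Hypothetical helper: returns a list of problems (empty when none are found).
// Alpine Linux, which the README excludes, is NOT detected here.
function findUnsupported({ platform, arch, nodeMajor, electronMajor }) {
  const problems = [];
  if (!SUPPORTED.platforms.includes(platform)) {
    problems.push(`unsupported OS: ${platform}`);
  }
  if (platform === 'win32' && arch !== 'x64') {
    problems.push(`unsupported Windows architecture: ${arch}`);
  }
  if (!inRange(nodeMajor, SUPPORTED.node)) {
    problems.push(`unsupported Node.js major version: ${nodeMajor}`);
  }
  if (electronMajor !== undefined && !inRange(electronMajor, SUPPORTED.electron)) {
    problems.push(`unsupported Electron major version: ${electronMajor}`);
  }
  return problems;
}

// Check the environment this script is running in.
const currentProblems = findUnsupported({
  platform: process.platform,
  arch: process.arch,
  nodeMajor: Number.parseInt(process.versions.node.split('.')[0], 10),
  electronMajor: process.versions.electron
    ? Number.parseInt(process.versions.electron.split('.')[0], 10)
    : undefined,
});
console.log(currentProblems.length ? currentProblems.join('\n') : 'environment looks supported');
```

Returning a list rather than throwing lets a caller report every mismatch at once.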
 
- #### Usage
-
- Add the `@pdftron/data-extraction` package as a dependency in your `package.json`
-
- Inside of your @pdftron/pdfnet-node code after initialization you should include the following line:
-
- ```javascript
- await PDFNet.addResourceSearchPath("./node_modules/@pdftron/data-extraction/lib")
- ```
-
- Here is an example of data extraction being used with this line.
+ To install for Electron, *runtime* and *target* options are needed. For example, For Electron 6, we need to run *npm i @pdftron/pdfnet-node --runtime=electron --target=6.0.0*. Note that we need to use *6.0.0* for all Electron 6 versions.
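
The Electron install rule in the added line can be sketched as a small helper. `electronInstallCommand` is a hypothetical name, not part of the package. The README only states the `6.0.0` rule for Electron 6, so applying the same `N.0.0` pattern to the other supported major versions (6 - 19) is an assumption.

```javascript
// Hypothetical helper: builds the install command for a given Electron version.
// The README states that all Electron 6 versions use target 6.0.0; using N.0.0
// for other major versions is an assumption made for this sketch.
function electronInstallCommand(electronVersion) {
  const major = Number.parseInt(String(electronVersion).split('.')[0], 10);
  if (!Number.isInteger(major) || major < 6 || major > 19) {
    throw new RangeError(`Electron ${electronVersion} is outside the supported range 6 - 19`);
  }
  return `npm i @pdftron/pdfnet-node --runtime=electron --target=${major}.0.0`;
}

console.log(electronInstallCommand('6.1.12'));
```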
 
+ #### Usage
+ Here is a code snippet to demonstrate how to use this package.
  ```javascript
- const { PDFNet } = require('@pdftron/pdfnet-node');
- const licenseKey = "Insert license key here"
- const inputFile = "Insert input file location here"
-
- async function main() {
- // This is where we import data-extraction
- await PDFNet.addResourceSearchPath("./node_modules/@pdftron/data-extraction/lib")
-
- // Extract document structure as a JSON file
- console.log('Extract document structure as a JSON file');
-
- let outputFile = 'out/paragraphs_and_tables.json';
- await PDFNet.DataExtractionModule.extractData(inputFile, outputFile, PDFNet.DataExtractionModule.DataExtractionEngine.e_DocStructure);
-
- console.log('Result saved in ' + outputFile);
-
- ///////////////////////////////////////////////////////
- // Extract document structure as a JSON string
- console.log('Extract document structure as a JSON string');
-
- outputFile = 'out/tagged.json';
- const json = await PDFNet.DataExtractionModule.extractDataAsString(inputFile, PDFNet.DataExtractionModule.DataExtractionEngine.e_DocStructure);
-
- fs.writeFileSync(outputFile, json);
- }
-
- PDFNet.runWithCleanup(main, licenseKey).catch(function (error) {
- console.log('Error: ' + JSON.stringify(error));
- }).then(function () { return PDFNet.shutdown(); });;
-
+ const { PDFNet } = require('@pdftron/pdfnet-node'); // you may need to set up NODE_PATH environment variable to make this work.
+
+ const main = async() => {
+ const doc = await PDFNet.PDFDoc.create();
+ const page = await doc.pageCreate();
+ doc.pagePushBack(page);
+ doc.save('blank.pdf', PDFNet.SDFDoc.SaveOptions.e_linearized);
+ };
+
+ // add your own license key as the second parameter, e.g. in place of 'YOUR_LICENSE_KEY'.
+ PDFNet.runWithCleanup(main, 'YOUR_LICENSE_KEY').catch(function(error) {
+ console.log('Error: ' + JSON.stringify(error));
+ }).then(function(){ return PDFNet.shutdown(); });
  ```
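
The added snippet above calls `doc.pagePushBack(page)` and `doc.save(...)` without `await`. Assuming these calls return promises like the other PDFNet calls in the snippet (an assumption, not confirmed by this diff), `main` could resolve before the file is written. A possible variant that awaits each step is sketched below. `createBlankPdf` and `outputPath` are illustrative names, and `PDFNet` is passed in as a parameter so the function can be exercised with a stub.

```javascript
// Variant of the README snippet that awaits every call.
// `createBlankPdf` and `outputPath` are illustrative names, not package APIs.
async function createBlankPdf(PDFNet, outputPath) {
  const doc = await PDFNet.PDFDoc.create();
  const page = await doc.pageCreate();
  // Awaited on the assumption that these calls return promises.
  await doc.pagePushBack(page);
  await doc.save(outputPath, PDFNet.SDFDoc.SaveOptions.e_linearized);
  return outputPath;
}

// With the real package (not executed here):
// const { PDFNet } = require('@pdftron/pdfnet-node');
// PDFNet.runWithCleanup(() => createBlankPdf(PDFNet, 'blank.pdf'), 'YOUR_LICENSE_KEY')
//   .catch((error) => console.log('Error: ' + JSON.stringify(error)))
//   .then(() => PDFNet.shutdown());
```

Awaiting a value that is not a promise is harmless, so the variant behaves the same if any of these calls turn out to be synchronous.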
 
- A larger code sample can be found [here](https://docs.apryse.com/documentation/samples/node/js/DataExtractionTest/)
+ There are some code samples in the [@pdftron/pdfnet-node-samples](https://www.npmjs.com/package/@pdftron/pdfnet-node-samples) package.
 
  To get started please see the documentation at https://www.pdftron.com/documentation/nodejs/get-started/integration.
 
  #### Licensing
- Please go to https://docs.apryse.com/documentation/core/info/license/ to obtain a demo or production license.
+ Please go to https://www.pdftron.com/pws/get-key to obtain a demo license or https://www.pdftron.com/form/contact-sales to obtain a production key. For further information, please visit https://www.pdftron.com/licensing.