@luii/node-tesseract-ocr 2.1.0 → 2.3.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CMakeLists.txt +3 -3
- package/README.md +461 -104
- package/binding-options.js +4 -0
- package/dist/cjs/index.cjs +21 -9
- package/dist/cjs/index.d.ts +4 -926
- package/dist/cjs/types.d.ts +1272 -0
- package/dist/cjs/types.js +17 -0
- package/dist/cjs/utils.js +15 -0
- package/dist/esm/index.d.ts +4 -926
- package/dist/esm/index.mjs +16 -9
- package/dist/esm/types.d.ts +1272 -0
- package/dist/esm/types.js +16 -0
- package/dist/esm/utils.js +15 -0
- package/package.json +6 -3
- package/prebuilds/node-tesseract-ocr-darwin-arm64/node-napi-v10.node +0 -0
- package/prebuilds/node-tesseract-ocr-linux-x64/node-napi-v10.node +0 -0
- package/src/commands.hpp +657 -88
- package/src/tesseract_wrapper.cpp +630 -187
- package/src/tesseract_wrapper.hpp +27 -2
- package/src/worker_thread.cpp +146 -2
- package/src/worker_thread.hpp +4 -1
package/README.md (CHANGED)

````diff
@@ -17,7 +17,6 @@ Native C++ addon for Node.js that exposes Tesseract OCR (`libtesseract-dev`) to
 - [Types](#types)
 - [Tesseract API](#tesseract-api)
 - [License](#license)
-- [Special Thanks](#special-thanks)
 
 ## Features
 
````

````diff
@@ -31,7 +30,7 @@ Native C++ addon for Node.js that exposes Tesseract OCR (`libtesseract-dev`) to
 - nodejs
 - node-addon-api
 - c++ build toolchain (e.g. build-essentials)
-- libtesseract-dev
+- libtesseract-dev (exactly `5.5.2`)
 - libleptonica-dev
 - Tesseract training data (eng, deu, ...) or let the library handle that
 
````

````diff
@@ -44,6 +43,15 @@ sudo apt update
 sudo apt install -y nodejs npm build-essential pkg-config libtesseract-dev libleptonica-dev tesseract-ocr-eng
 ```
 
+Verify the required Tesseract version:
+
+```bash
+pkg-config --modversion tesseract
+# expected: 5.5.2
+```
+
+If your distro ships another version, install/build `tesseract 5.5.2` and ensure `pkg-config` resolves that installation.
+
 ```bash
 git clone git@github.com:luii/node-tesseract-ocr.git
 cd node-tesseract-ocr
````

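The new pin to exactly `5.5.2` can also be checked from Node before loading the addon, using the same `pkg-config --modversion tesseract` call the README shows. This is a minimal sketch. It assumes `pkg-config` is on `PATH`, and the helper names `parseModversion` and `hasRequiredTesseract` are hypothetical, not part of the package.

```typescript
import { execFileSync } from "node:child_process";

// Version that the 2.3.x README pins libtesseract-dev to.
const REQUIRED_TESSERACT_VERSION = "5.5.2";

// Hypothetical helper: normalise the raw stdout of `pkg-config --modversion tesseract`.
function parseModversion(stdout: string): string {
  return stdout.trim();
}

// Hypothetical helper: true only when pkg-config resolves exactly the pinned version.
// `run` is injectable so the check can be exercised without pkg-config installed.
function hasRequiredTesseract(
  run: () => string = () =>
    execFileSync("pkg-config", ["--modversion", "tesseract"], { encoding: "utf8" }),
): boolean {
  try {
    return parseModversion(run()) === REQUIRED_TESSERACT_VERSION;
  } catch {
    // pkg-config is missing, or no tesseract.pc is on PKG_CONFIG_PATH.
    return false;
  }
}
```

An exact string comparison mirrors the README's "exactly `5.5.2`" wording. A semver range check would be looser than what the package asks for.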
````diff
@@ -250,6 +258,18 @@ Full list of page segmentation modes from Tesseract.
 | `bottom` | `number` | No | n/a | Bottom coordinate of current element bbox. |
 | `left` | `number` | No | n/a | Left coordinate of current element bbox. |
 
+#### `TesseractProcessPagesStatus`
+
+| Field | Type | Optional | Default | Description |
+| ----------------- | --------- | -------- | ------- | ----------------------------------------------------- |
+| `active` | `boolean` | No | n/a | Whether a multipage session is currently active. |
+| `healthy` | `boolean` | No | n/a | Whether the renderer is healthy. |
+| `processedPages` | `number` | No | n/a | Number of pages already processed in this session. |
+| `nextPageIndex` | `number` | No | n/a | Zero-based index that will be used for the next page. |
+| `outputBase` | `string` | No | n/a | Effective output base used by the PDF renderer. |
+| `timeoutMillisec` | `number` | No | n/a | Timeout per page in milliseconds (`0` = unlimited). |
+| `textonly` | `boolean` | No | n/a | Whether text-only PDF mode is enabled. |
+
 #### `DetectOrientationScriptResult`
 
 | Field | Type | Optional | Default | Description |
````

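The `TesseractProcessPagesStatus` table above maps directly onto a TypeScript shape. The sketch below transcribes it into a local interface (`ProcessPagesStatusView`) and formats it for logs. The package ships its own declaration in its type files, so this mirror and the `describeProcessPagesStatus` helper are illustrative assumptions only.

```typescript
// Local mirror of the TesseractProcessPagesStatus table (assumption: field names and
// types exactly as listed there; the package's own declaration is authoritative).
interface ProcessPagesStatusView {
  active: boolean;
  healthy: boolean;
  processedPages: number;
  nextPageIndex: number; // zero-based
  outputBase: string;
  timeoutMillisec: number; // 0 = unlimited
  textonly: boolean;
}

// Hypothetical helper: one-line, log-friendly summary of a multipage session.
function describeProcessPagesStatus(s: ProcessPagesStatusView): string {
  if (!s.active) return "no multipage session";
  const timeout = s.timeoutMillisec === 0 ? "unlimited" : `${s.timeoutMillisec} ms/page`;
  const mode = s.textonly ? "text-only PDF" : "PDF";
  const health = s.healthy ? "healthy" : "renderer unhealthy";
  return `${mode} -> ${s.outputBase}: ${s.processedPages} page(s) done, next index ${s.nextPageIndex}, timeout ${timeout}, ${health}`;
}
```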
````diff
@@ -269,13 +289,92 @@ new Tesseract();
 
 Creates a new Tesseract instance.
 
+#### Initialization Requirements
+
+Call `init(...)` once before using OCR/engine-dependent methods.
+
+Methods that do **not** require `init(...)`:
+
+- `version()`
+- `isInitialized()`
+- `setInputName(...)`
+- `getInputName()`
+- `abortProcessPages()`
+- `getProcessPagesStatus()`
+- `document.abort()`
+- `document.status()`
+- `init(...)`
+- `end()`
+
+Methods that **require** `init(...)`:
+
+- `setInputImage(...)`
+- `getInputImage()`
+- `getSourceYResolution()`
+- `getDataPath()`
+- `setOutputName(...)`
+- `clearPersistentCache()`
+- `clearAdaptiveClassifier()`
+- `setImage(...)`
+- `getThresholdedImage()`
+- `getThresholdedImageScaleFactor()`
+- `setPageMode(...)`
+- `setRectangle(...)`
+- `setSourceResolution(...)`
+- `recognize(...)`
+- `detectOrientationScript()`
+- `meanTextConf()`
+- `allWordConfidences()`
+- `getPAGEText(...)`
+- `getLSTMBoxText(...)`
+- `getBoxText(...)`
+- `getWordStrBoxText(...)`
+- `getOSDText(...)`
+- `getUTF8Text()`
+- `getHOCRText(...)`
+- `getTSVText(...)`
+- `getUNLVText()`
+- `getALTOText(...)`
+- `getInitLanguages()`
+- `getLoadedLanguages()`
+- `getAvailableLanguages()`
+- `setDebugVariable(...)`
+- `setVariable(...)`
+- `getIntVariable(...)`
+- `getBoolVariable(...)`
+- `getDoubleVariable(...)`
+- `getStringVariable(...)`
+- `clear()`
+- `beginProcessPages(...)`
+- `addProcessPage(...)`
+- `finishProcessPages()`
+- `document.begin(...)`
+- `document.addPage(...)`
+- `document.finish()`
+
+#### version
+
+Returns the currently loaded libtesseract version string.
+
+```ts
+version(): Promise<string>
+```
+
+#### isInitialized
+
+Returns whether `init(...)` has already completed successfully and has not been reset via `end()`.
+
+```ts
+isInitialized(): Promise<boolean>
+```
+
 #### init
 
-Initializes
+Initializes the OCR engine with language, OEM, configs, and variables.
 
-| Name
-|
-| options | [`TesseractInitOptions`](#tesseractinitoptions) | No | n/a | Initialization options. |
+| Name | Type | Optional | Default | Description |
+| --------- | ----------------------------------------------- | -------- | ------- | ----------------------- |
+| `options` | [`TesseractInitOptions`](#tesseractinitoptions) | No | n/a | Initialization options. |
 
 ```ts
 init(options: TesseractInitOptions): Promise<void>
````

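The new "Initialization Requirements" section says `init(...)` runs once, and `isInitialized()` reports whether it has completed. Together they allow a guard for code that shares an instance. This is a minimal sketch written against a structural type (`InitCapable`, hypothetical) instead of importing the package. The options type stays generic because the fields of `TesseractInitOptions` are not shown in this diff.

```typescript
// Minimal structural view of the two documented methods involved; a real instance
// exposing isInitialized()/init(...) with these shapes satisfies it.
interface InitCapable<TOptions> {
  isInitialized(): Promise<boolean>;
  init(options: TOptions): Promise<void>;
}

// Hypothetical helper: call init(...) only if it has not completed yet.
// Caveat: two concurrent callers can both observe `false`; serialize calls if that matters.
async function ensureInitialized<TOptions>(
  engine: InitCapable<TOptions>,
  options: TOptions,
): Promise<void> {
  if (!(await engine.isInitialized())) {
    await engine.init(options);
  }
}
```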
@@ -283,56 +382,282 @@ init(options: TesseractInitOptions): Promise<void>
|
|
|
283
382
|
|
|
284
383
|
#### initForAnalysePage
|
|
285
384
|
|
|
286
|
-
Initializes
|
|
385
|
+
Initializes the engine in analysis-only mode.
|
|
287
386
|
|
|
288
387
|
```ts
|
|
289
388
|
initForAnalysePage(): Promise<void>
|
|
290
389
|
```
|
|
291
390
|
|
|
292
|
-
####
|
|
391
|
+
#### analyseLayout
|
|
392
|
+
|
|
393
|
+
Runs page layout analysis on the current image.
|
|
394
|
+
|
|
395
|
+
| Name | Type | Optional | Default | Description |
|
|
396
|
+
| ------------------- | --------- | -------- | ------- | ------------------------------------------- |
|
|
397
|
+
| `mergeSimilarWords` | `boolean` | No | n/a | Merge similar words during layout analysis. |
|
|
398
|
+
|
|
399
|
+
```ts
|
|
400
|
+
analyseLayout(mergeSimilarWords: boolean): Promise<void>
|
|
401
|
+
```
|
|
402
|
+
|
|
403
|
+
#### setInputName
|
|
404
|
+
|
|
405
|
+
Sets the source/input name used by renderer/training APIs.
|
|
406
|
+
|
|
407
|
+
| Name | Type | Optional | Default | Description |
|
|
408
|
+
| ----------- | -------- | -------- | ------- | ------------------------------------------ |
|
|
409
|
+
| `inputName` | `string` | No | n/a | Input name used by renderer/training APIs. |
|
|
410
|
+
|
|
411
|
+
```ts
|
|
412
|
+
setInputName(inputName: string): Promise<void>
|
|
413
|
+
```
|
|
414
|
+
|
|
415
|
+
#### getInputName
|
|
416
|
+
|
|
417
|
+
Returns the current input name from engine state.
|
|
418
|
+
|
|
419
|
+
```ts
|
|
420
|
+
getInputName(): Promise<string>
|
|
421
|
+
```
|
|
422
|
+
|
|
423
|
+
#### setInputImage
|
|
424
|
+
|
|
425
|
+
Sets the encoded source image buffer.
|
|
426
|
+
|
|
427
|
+
| Name | Type | Optional | Default | Description |
|
|
428
|
+
| -------- | -------- | -------- | ------- | ---------------------------- |
|
|
429
|
+
| `buffer` | `Buffer` | No | n/a | Encoded source image buffer. |
|
|
430
|
+
|
|
431
|
+
```ts
|
|
432
|
+
setInputImage(buffer: Buffer): Promise<void>
|
|
433
|
+
```
|
|
434
|
+
|
|
435
|
+
#### getInputImage
|
|
293
436
|
|
|
294
|
-
|
|
437
|
+
Returns the current input image bytes.
|
|
295
438
|
|
|
296
|
-
|
|
297
|
-
|
|
298
|
-
|
|
439
|
+
```ts
|
|
440
|
+
getInputImage(): Promise<Buffer>
|
|
441
|
+
```
|
|
442
|
+
|
|
443
|
+
#### getSourceYResolution
|
|
444
|
+
|
|
445
|
+
Returns source image Y resolution (DPI).
|
|
299
446
|
|
|
300
447
|
```ts
|
|
301
|
-
|
|
448
|
+
getSourceYResolution(): Promise<number>
|
|
449
|
+
```
|
|
450
|
+
|
|
451
|
+
#### getDataPath
|
|
452
|
+
|
|
453
|
+
Returns the active tessdata path from the engine.
|
|
454
|
+
|
|
455
|
+
```ts
|
|
456
|
+
getDataPath(): Promise<string>
|
|
457
|
+
```
|
|
458
|
+
|
|
459
|
+
#### setOutputName
|
|
460
|
+
|
|
461
|
+
Sets the output base name for renderer-based outputs.
|
|
462
|
+
|
|
463
|
+
| Name | Type | Optional | Default | Description |
|
|
464
|
+
| ------------ | -------- | -------- | ------- | -------------------------------------- |
|
|
465
|
+
| `outputName` | `string` | No | n/a | Output base name for renderer outputs. |
|
|
466
|
+
|
|
467
|
+
```ts
|
|
468
|
+
setOutputName(outputName: string): Promise<void>
|
|
469
|
+
```
|
|
470
|
+
|
|
471
|
+
#### clearPersistentCache
|
|
472
|
+
|
|
473
|
+
Clears global library-level caches (for example dictionaries).
|
|
474
|
+
|
|
475
|
+
```ts
|
|
476
|
+
clearPersistentCache(): Promise<void>
|
|
477
|
+
```
|
|
478
|
+
|
|
479
|
+
#### clearAdaptiveClassifier
|
|
480
|
+
|
|
481
|
+
Cleans adaptive classifier state between pages/documents.
|
|
482
|
+
|
|
483
|
+
```ts
|
|
484
|
+
clearAdaptiveClassifier(): Promise<void>
|
|
485
|
+
```
|
|
486
|
+
|
|
487
|
+
#### setImage
|
|
488
|
+
|
|
489
|
+
Sets the image used by OCR recognition.
|
|
490
|
+
|
|
491
|
+
| Name | Type | Optional | Default | Description |
|
|
492
|
+
| -------- | -------- | -------- | ------- | ------------------------ |
|
|
493
|
+
| `buffer` | `Buffer` | No | n/a | Image data used for OCR. |
|
|
494
|
+
|
|
495
|
+
```ts
|
|
496
|
+
setImage(buffer: Buffer): Promise<void>
|
|
497
|
+
```
|
|
498
|
+
|
|
499
|
+
#### getThresholdedImage
|
|
500
|
+
|
|
501
|
+
Returns thresholded image bytes from Tesseract internals.
|
|
502
|
+
|
|
503
|
+
```ts
|
|
504
|
+
getThresholdedImage(): Promise<Buffer>
|
|
505
|
+
```
|
|
506
|
+
|
|
507
|
+
#### getThresholdedImageScaleFactor
|
|
508
|
+
|
|
509
|
+
Returns scale factor for thresholded/component images.
|
|
510
|
+
|
|
511
|
+
```ts
|
|
512
|
+
getThresholdedImageScaleFactor(): Promise<number>
|
|
302
513
|
```
|
|
303
514
|
|
|
304
515
|
#### setPageMode
|
|
305
516
|
|
|
306
|
-
Sets the page segmentation mode.
|
|
517
|
+
Sets the page segmentation mode (PSM).
|
|
307
518
|
|
|
308
|
-
| Name
|
|
309
|
-
|
|
|
310
|
-
| psm
|
|
519
|
+
| Name | Type | Optional | Default | Description |
|
|
520
|
+
| ----- | ----------------------------------------------- | -------- | ------- | ----------------------- |
|
|
521
|
+
| `psm` | [`PageSegmentationMode`](#pagesegmentationmode) | No | n/a | Page segmentation mode. |
|
|
311
522
|
|
|
312
523
|
```ts
|
|
313
524
|
setPageMode(psm: PageSegmentationMode): Promise<void>
|
|
314
525
|
```
|
|
315
526
|
|
|
527
|
+
#### setRectangle
|
|
528
|
+
|
|
529
|
+
Restricts recognition to the given rectangle.
|
|
530
|
+
|
|
531
|
+
| Name | Type | Optional | Default | Description |
|
|
532
|
+
| --------- | --------------------------------------------------------------- | -------- | ------- | ------------------ |
|
|
533
|
+
| `options` | [`TesseractSetRectangleOptions`](#tesseractsetrectangleoptions) | No | n/a | Region definition. |
|
|
534
|
+
|
|
535
|
+
```ts
|
|
536
|
+
setRectangle(options: TesseractSetRectangleOptions): Promise<void>
|
|
537
|
+
```
|
|
538
|
+
|
|
539
|
+
#### setSourceResolution
|
|
540
|
+
|
|
541
|
+
Sets the source resolution in PPI.
|
|
542
|
+
|
|
543
|
+
| Name | Type | Optional | Default | Description |
|
|
544
|
+
| ----- | -------- | -------- | ------- | ------------------------- |
|
|
545
|
+
| `ppi` | `number` | No | n/a | Source resolution in PPI. |
|
|
546
|
+
|
|
547
|
+
```ts
|
|
548
|
+
setSourceResolution(ppi: number): Promise<void>
|
|
549
|
+
```
|
|
550
|
+
|
|
551
|
+
#### document
|
|
552
|
+
|
|
553
|
+
Facade for multipage PDF/document processing lifecycle.
|
|
554
|
+
|
|
555
|
+
```ts
|
|
556
|
+
document: {
|
|
557
|
+
begin(options: TesseractBeginProcessPagesOptions): Promise<void>;
|
|
558
|
+
addPage(buffer: Buffer, filename?: string): Promise<void>;
|
|
559
|
+
finish(): Promise<string>;
|
|
560
|
+
abort(): Promise<void>;
|
|
561
|
+
status(): Promise<TesseractProcessPagesStatus>;
|
|
562
|
+
}
|
|
563
|
+
```
|
|
564
|
+
|
|
565
|
+
#### document.begin
|
|
566
|
+
|
|
567
|
+
Starts a multipage processing session.
|
|
568
|
+
|
|
569
|
+
| Name | Type | Optional | Default | Description |
|
|
570
|
+
| --------- | ----------------------------------- | -------- | ------- | --------------------------- |
|
|
571
|
+
| `options` | `TesseractBeginProcessPagesOptions` | No | n/a | Multipage renderer options. |
|
|
572
|
+
|
|
573
|
+
```ts
|
|
574
|
+
document.begin(options: TesseractBeginProcessPagesOptions): Promise<void>
|
|
575
|
+
```
|
|
576
|
+
|
|
577
|
+
#### document.addPage
|
|
578
|
+
|
|
579
|
+
Adds an encoded page to the active session.
|
|
580
|
+
|
|
581
|
+
| Name | Type | Optional | Default | Description |
|
|
582
|
+
| ---------- | -------- | -------- | ----------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
|
583
|
+
| `buffer` | `Buffer` | No | n/a | Encoded page image buffer. |
|
|
584
|
+
| `filename` | `string` | Yes | `undefined` | Optional source filename/path passed to Tesseract `ProcessPage` for this page. Tesseract/Leptonica may open this file internally and use it as the source image for parts of PDF rendering. If output pages look wrong (for example inverted or visually corrupted), pass a real image path here to force a stable source image path for that page. |
|
|
585
|
+
|
|
586
|
+
```ts
|
|
587
|
+
document.addPage(buffer: Buffer, filename?: string): Promise<void>
|
|
588
|
+
```
|
|
589
|
+
|
|
590
|
+
#### document.finish
|
|
591
|
+
|
|
592
|
+
Finalizes the active session and returns output PDF path.
|
|
593
|
+
|
|
594
|
+
```ts
|
|
595
|
+
document.finish(): Promise<string>
|
|
596
|
+
```
|
|
597
|
+
|
|
598
|
+
#### document.abort
|
|
599
|
+
|
|
600
|
+
Aborts and resets the active multipage session.
|
|
601
|
+
|
|
602
|
+
```ts
|
|
603
|
+
document.abort(): Promise<void>
|
|
604
|
+
```
|
|
605
|
+
|
|
606
|
+
#### document.status
|
|
607
|
+
|
|
608
|
+
Returns the current multipage session status (active flag, page counters, and effective renderer settings).
|
|
609
|
+
|
|
610
|
+
```ts
|
|
611
|
+
document.status(): Promise<TesseractProcessPagesStatus>
|
|
612
|
+
```
|
|
613
|
+
|
|
614
|
+
#### getProcessPagesStatus
|
|
615
|
+
|
|
616
|
+
Returns the current multipage session status from the instance API.
|
|
617
|
+
|
|
618
|
+
```ts
|
|
619
|
+
getProcessPagesStatus(): Promise<TesseractProcessPagesStatus>
|
|
620
|
+
```
|
|
621
|
+
|
|
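The `document` facade implies a fixed lifecycle: one `begin(...)`, one `addPage(...)` per page, then `finish()` to obtain the PDF path. `abort()` resets the session if anything fails. This is a minimal sketch against a structural mirror of that facade (`DocumentFacade`, hypothetical). The begin-options type stays generic because the fields of `TesseractBeginProcessPagesOptions` are not listed in this diff, and `renderPdf` is a hypothetical helper, not a package export.

```typescript
// Structural mirror of the documented `document` facade (status() omitted: unused here).
interface DocumentFacade<TBeginOptions> {
  begin(options: TBeginOptions): Promise<void>;
  addPage(buffer: Buffer, filename?: string): Promise<void>;
  finish(): Promise<string>;
  abort(): Promise<void>;
}

interface PageInput {
  buffer: Buffer;
  // Optional real image path, forwarded as `filename` per the document.addPage notes.
  filename?: string;
}

// Hypothetical helper: render all pages into one PDF and return its path.
// Any failure after begin() aborts the session so the instance can start a new one.
async function renderPdf<TBeginOptions>(
  doc: DocumentFacade<TBeginOptions>,
  options: TBeginOptions,
  pages: readonly PageInput[],
): Promise<string> {
  await doc.begin(options);
  try {
    for (const page of pages) {
      await doc.addPage(page.buffer, page.filename);
    }
    return await doc.finish();
  } catch (err) {
    await doc.abort();
    throw err;
  }
}
```

Pages are added sequentially on purpose, because the status fields (`processedPages`, `nextPageIndex`) describe an ordered session.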
````diff
+#### setDebugVariable
+
+Sets a debug configuration variable.
+
+| Name | Type | Optional | Default | Description |
+| ------- | -------------------------------------------------------------- | -------- | ------- | --------------- |
+| `name` | `keyof SetVariableConfigVariables` | No | n/a | Variable name. |
+| `value` | `SetVariableConfigVariables[keyof SetVariableConfigVariables]` | No | n/a | Variable value. |
+
+```ts
+setDebugVariable(
+  name: keyof SetVariableConfigVariables,
+  value: SetVariableConfigVariables[keyof SetVariableConfigVariables],
+): Promise<boolean>
+```
+
 #### setVariable
 
-Sets a
+Sets a regular configuration variable.
 
-| Name
-|
-| name | keyof SetVariableConfigVariables
-| value | SetVariableConfigVariables
+| Name | Type | Optional | Default | Description |
+| ------- | -------------------------------------------------------------- | -------- | ------- | --------------- |
+| `name` | `keyof SetVariableConfigVariables` | No | n/a | Variable name. |
+| `value` | `SetVariableConfigVariables[keyof SetVariableConfigVariables]` | No | n/a | Variable value. |
 
 ```ts
-setVariable(
+setVariable(
+  name: keyof SetVariableConfigVariables,
+  value: SetVariableConfigVariables[keyof SetVariableConfigVariables],
+): Promise<boolean>
 ```
 
 #### getIntVariable
 
-Reads
+Reads a configuration variable as integer.
 
-| Name
-|
-| name | keyof SetVariableConfigVariables | No | n/a | Variable name. |
+| Name | Type | Optional | Default | Description |
+| ------ | ---------------------------------- | -------- | ------- | -------------- |
+| `name` | `keyof SetVariableConfigVariables` | No | n/a | Variable name. |
 
 ```ts
 getIntVariable(name: keyof SetVariableConfigVariables): Promise<number>
````

````diff
@@ -340,11 +665,11 @@ getIntVariable(name: keyof SetVariableConfigVariables): Promise<number>
 
 #### getBoolVariable
 
-Reads a
+Reads a configuration variable as boolean (`0`/`1`).
 
-| Name
-|
-| name | keyof SetVariableConfigVariables | No | n/a | Variable name. |
+| Name | Type | Optional | Default | Description |
+| ------ | ---------------------------------- | -------- | ------- | -------------- |
+| `name` | `keyof SetVariableConfigVariables` | No | n/a | Variable name. |
 
 ```ts
 getBoolVariable(name: keyof SetVariableConfigVariables): Promise<number>
````

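`getBoolVariable` is documented as reading a boolean but resolving to a `number` (`0`/`1`). Callers that want a real JavaScript `boolean` can convert the value in one place. This is a minimal sketch: `BoolVariableReader` and `readBoolVariable` are hypothetical. The variable name is left as a generic string key because the members of `SetVariableConfigVariables` are not listed in this diff.

```typescript
// Structural view of the documented reader; the real parameter type is
// `keyof SetVariableConfigVariables`, modelled here as a generic string key.
interface BoolVariableReader<TName extends string> {
  getBoolVariable(name: TName): Promise<number>;
}

// Hypothetical helper: map the documented 0/1 result onto a JS boolean.
// Any non-zero value is treated as true.
async function readBoolVariable<TName extends string>(
  engine: BoolVariableReader<TName>,
  name: TName,
): Promise<boolean> {
  const raw = await engine.getBoolVariable(name);
  return raw !== 0;
}
```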
````diff
@@ -352,11 +677,11 @@ getBoolVariable(name: keyof SetVariableConfigVariables): Promise<number>
 
 #### getDoubleVariable
 
-Reads a
+Reads a configuration variable as double.
 
-| Name
-|
-| name | keyof SetVariableConfigVariables | No | n/a | Variable name. |
+| Name | Type | Optional | Default | Description |
+| ------ | ---------------------------------- | -------- | ------- | -------------- |
+| `name` | `keyof SetVariableConfigVariables` | No | n/a | Variable name. |
 
 ```ts
 getDoubleVariable(name: keyof SetVariableConfigVariables): Promise<number>
````

````diff
@@ -364,139 +689,175 @@ getDoubleVariable(name: keyof SetVariableConfigVariables): Promise<number>
 
 #### getStringVariable
 
-Reads a
+Reads a configuration variable as string.
 
-| Name
-|
-| name | keyof SetVariableConfigVariables | No | n/a | Variable name. |
+| Name | Type | Optional | Default | Description |
+| ------ | ---------------------------------- | -------- | ------- | -------------- |
+| `name` | `keyof SetVariableConfigVariables` | No | n/a | Variable name. |
 
 ```ts
 getStringVariable(name: keyof SetVariableConfigVariables): Promise<string>
 ```
 
-####
+#### recognize
 
-
+Runs OCR recognition (optionally with progress callback).
 
-| Name
-|
-|
+| Name | Type | Optional | Default | Description |
+| ------------------ | ------------------------------------- | -------- | ----------- | ---------------------- |
+| `progressCallback` | `(info: ProgressChangedInfo) => void` | Yes | `undefined` | OCR progress callback. |
 
 ```ts
-
+recognize(progressCallback?: (info: ProgressChangedInfo) => void): Promise<void>
 ```
 
-####
+#### detectOrientationScript
 
-
+Detects orientation and script with confidence values.
 
-
-
-
+```ts
+detectOrientationScript(): Promise<DetectOrientationScriptResult>
+```
+
+#### meanTextConf
+
+Returns mean text confidence.
 
 ```ts
-
+meanTextConf(): Promise<number>
 ```
 
-####
+#### allWordConfidences
 
-
+Returns all word confidences for current recognition result.
+
+```ts
+allWordConfidences(): Promise<number[]>
+```
+
````

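`allWordConfidences()` resolves to one number per recognized word. Upstream Tesseract documents these as values between 0 and 100. A common follow-up is to flag weak words for review. This is a sketch under stated assumptions: `summarizeWordConfidences` is hypothetical, the default threshold of `60` is an arbitrary example value, and its unweighted mean need not equal what `meanTextConf()` reports.

```typescript
// Summary of a word-confidence array as returned by allWordConfidences().
interface ConfidenceSummary {
  words: number;
  mean: number; // unweighted arithmetic mean; 0 when there are no words
  lowConfidence: number[]; // indices of words strictly below the threshold
}

// Hypothetical helper; `threshold` defaults to an arbitrary example value.
function summarizeWordConfidences(
  confidences: readonly number[],
  threshold = 60,
): ConfidenceSummary {
  const lowConfidence: number[] = [];
  let total = 0;
  confidences.forEach((conf, index) => {
    total += conf;
    if (conf < threshold) lowConfidence.push(index);
  });
  return {
    words: confidences.length,
    mean: confidences.length === 0 ? 0 : total / confidences.length,
    lowConfidence,
  };
}
```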
````diff
+#### getPAGEText
 
-
-
-|
+Returns PAGE XML output.
+
+| Name | Type | Optional | Default | Description |
+| ------------------ | ------------------------------------- | -------- | ----------- | ---------------------------------- |
+| `progressCallback` | `(info: ProgressChangedInfo) => void` | Yes | `undefined` | PAGE generation progress callback. |
+| `pageNumber` | `number` | Yes | `undefined` | 0-based page number. |
 
 ```ts
-
+getPAGEText(
+  progressCallback?: (info: ProgressChangedInfo) => void,
+  pageNumber?: number,
+): Promise<string>
 ```
 
-####
+#### getLSTMBoxText
 
-
+Returns LSTM box output.
 
-| Name
-|
-|
+| Name | Type | Optional | Default | Description |
+| ------------ | -------- | -------- | ----------- | -------------------- |
+| `pageNumber` | `number` | Yes | `undefined` | 0-based page number. |
 
 ```ts
-
+getLSTMBoxText(pageNumber?: number): Promise<string>
 ```
 
-####
+#### getBoxText
+
+Returns classic box output.
 
-
+| Name | Type | Optional | Default | Description |
+| ------------ | -------- | -------- | ----------- | -------------------- |
+| `pageNumber` | `number` | Yes | `undefined` | 0-based page number. |
 
 ```ts
-
+getBoxText(pageNumber?: number): Promise<string>
 ```
 
-####
+#### getWordStrBoxText
 
-Returns
+Returns WordStr box output.
 
-| Name
-|
-|
-| pageNumber | number | Yes | undefined | Page number (0-based). |
+| Name | Type | Optional | Default | Description |
+| ------------ | -------- | -------- | ----------- | -------------------- |
+| `pageNumber` | `number` | Yes | `undefined` | 0-based page number. |
 
 ```ts
-
-  progressCallback?: (info: ProgressChangedInfo) => void,
-  pageNumber?: number,
-): Promise<string>
+getWordStrBoxText(pageNumber?: number): Promise<string>
 ```
 
-####
+#### getOSDText
 
-Returns
+Returns OSD text output.
+
+| Name | Type | Optional | Default | Description |
+| ------------ | -------- | -------- | ----------- | -------------------- |
+| `pageNumber` | `number` | Yes | `undefined` | 0-based page number. |
 
 ```ts
-
+getOSDText(pageNumber?: number): Promise<string>
 ```
 
-####
+#### getUTF8Text
 
-Returns
+Returns recognized UTF-8 text.
 
 ```ts
-
+getUTF8Text(): Promise<string>
 ```
 
-####
+#### getHOCRText
 
-Returns
+Returns hOCR output.
 
-| Name
-|
-| progressCallback | (info:
-| pageNumber | number
+| Name | Type | Optional | Default | Description |
+| ------------------ | ------------------------------------- | -------- | ----------- | ---------------------------------- |
+| `progressCallback` | `(info: ProgressChangedInfo) => void` | Yes | `undefined` | hOCR generation progress callback. |
+| `pageNumber` | `number` | Yes | `undefined` | 0-based page number. |
 
 ```ts
-
+getHOCRText(
   progressCallback?: (info: ProgressChangedInfo) => void,
   pageNumber?: number,
 ): Promise<string>
 ```
 
-####
+#### getTSVText
+
+Returns TSV output.
 
-
+| Name | Type | Optional | Default | Description |
+| ------------ | -------- | -------- | ----------- | -------------------- |
+| `pageNumber` | `number` | Yes | `undefined` | 0-based page number. |
 
 ```ts
-
+getTSVText(pageNumber?: number): Promise<string>
 ```
 
-####
+#### getUNLVText
 
-
+Returns UNLV output.
 
 ```ts
-
+getUNLVText(): Promise<string>
+```
+
+#### getALTOText
+
+Returns ALTO XML output.
+
+| Name | Type | Optional | Default | Description |
+| ------------ | -------- | -------- | ----------- | -------------------- |
+| `pageNumber` | `number` | Yes | `undefined` | 0-based page number. |
+
+```ts
+getALTOText(pageNumber?: number): Promise<string>
 ```
 
 #### getInitLanguages
 
-Returns
+Returns languages used during initialization (for example `deu+eng`).
 
 ```ts
 getInitLanguages(): Promise<string>
````

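The recognition methods documented above compose into a single-image run: `setImage(...)`, then `recognize(...)`, then reading results such as `getUTF8Text()` and `meanTextConf()`. All of them are listed as requiring a prior `init(...)`. This is a minimal sketch against a structural subset (`SingleImageOcr`, hypothetical). `ProgressChangedInfo` is modelled as `unknown` because its fields are not shown in this diff, so the callback only counts events.

```typescript
// Stand-in for ProgressChangedInfo, whose fields this diff does not list.
type ProgressInfo = unknown;

// Structural subset of the documented instance methods used for one image.
interface SingleImageOcr {
  setImage(buffer: Buffer): Promise<void>;
  recognize(progressCallback?: (info: ProgressInfo) => void): Promise<void>;
  getUTF8Text(): Promise<string>;
  meanTextConf(): Promise<number>;
}

interface OcrResult {
  text: string;
  meanConfidence: number;
  progressEvents: number;
}

// Hypothetical helper: assumes init(...) has already completed on `engine`.
// Calls are awaited one after another rather than issued concurrently.
async function ocrImage(engine: SingleImageOcr, image: Buffer): Promise<OcrResult> {
  let progressEvents = 0;
  await engine.setImage(image);
  await engine.recognize(() => {
    progressEvents += 1;
  });
  const text = await engine.getUTF8Text();
  const meanConfidence = await engine.meanTextConf();
  return { text, meanConfidence, progressEvents };
}
```

The sequential awaits are a cautious choice. The diff adds a native worker thread, but this README section does not say whether overlapping calls on one instance are safe.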
````diff
@@ -504,7 +865,7 @@ getInitLanguages(): Promise<string>
 
 #### getLoadedLanguages
 
-Returns
+Returns languages currently loaded in the engine.
 
 ```ts
 getLoadedLanguages(): Promise<Language[]>
````

````diff
@@ -512,7 +873,7 @@ getLoadedLanguages(): Promise<Language[]>
 
 #### getAvailableLanguages
 
-Returns
+Returns languages available from tessdata.
 
 ```ts
 getAvailableLanguages(): Promise<Language[]>
````

````diff
@@ -520,7 +881,7 @@ getAvailableLanguages(): Promise<Language[]>
 
 #### clear
 
-Clears internal state.
+Clears internal recognition state/results.
 
 ```ts
 clear(): Promise<void>
````

````diff
@@ -528,7 +889,7 @@ clear(): Promise<void>
 
 #### end
 
-
+Releases native resources and ends the instance.
 
 ```ts
 end(): Promise<void>
````

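Because `end()` releases native resources, it should run even when OCR work throws. A `try`/`finally` wrapper is a common way to guarantee that. This is a minimal sketch: `Endable` and `withInstance` are hypothetical names. With the package, `create` would be a factory returning `new Tesseract()` as in the constructor section, although how `Tesseract` is imported is not shown in this diff.

```typescript
// Structural subset: only the documented end() is needed for cleanup.
interface Endable {
  end(): Promise<void>;
}

// Hypothetical helper: run `work` with a fresh instance and always call end()
// afterwards, whether `work` resolves or rejects.
async function withInstance<T extends Endable, R>(
  create: () => T,
  work: (instance: T) => Promise<R>,
): Promise<R> {
  const instance = create();
  try {
    return await work(instance);
  } finally {
    await instance.end();
  }
}
```

If `end()` itself rejects inside `finally`, that rejection replaces the original error. Callers that need the first error preserved would have to catch around `end()`.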
````diff
@@ -537,7 +898,3 @@ end(): Promise<void>
 ## License
 
 Apache-2.0. See [`LICENSE.md`](/LICENSE.md) for full terms.
-
-## Special Thanks
-
-- **Stunt3000**
````