@luii/node-tesseract-ocr 2.1.0 → 2.4.0

package/README.md CHANGED
@@ -17,7 +17,6 @@ Native C++ addon for Node.js that exposes Tesseract OCR (`libtesseract-dev`) to
17
17
  - [Types](#types)
18
18
  - [Tesseract API](#tesseract-api)
19
19
  - [License](#license)
20
- - [Special Thanks](#special-thanks)
21
20
 
22
21
  ## Features
23
22
 
@@ -31,7 +30,7 @@ Native C++ addon for Node.js that exposes Tesseract OCR (`libtesseract-dev`) to
31
30
  - nodejs
32
31
  - node-addon-api
33
32
  - c++ build toolchain (e.g. build-essential)
34
- - libtesseract-dev
33
+ - libtesseract-dev (exactly `5.5.2`)
35
34
  - libleptonica-dev
36
35
  - Tesseract training data (eng, deu, ...) or let the library handle that
37
36
 
@@ -44,6 +43,15 @@ sudo apt update
44
43
  sudo apt install -y nodejs npm build-essential pkg-config libtesseract-dev libleptonica-dev tesseract-ocr-eng
45
44
  ```
46
45
 
46
+ Verify the required Tesseract version:
47
+
48
+ ```bash
49
+ pkg-config --modversion tesseract
50
+ # expected: 5.5.2
51
+ ```
52
+
53
+ If your distro ships another version, install/build `tesseract 5.5.2` and ensure `pkg-config` resolves that installation.
54
+
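The version check above can also be scripted. The following is a minimal sketch, assuming the output of `pkg-config --modversion tesseract` has already been captured as a string; `REQUIRED_TESSERACT_VERSION` and `isRequiredTesseractVersion` are hypothetical names for illustration and are not part of the package.

```ts
// Hypothetical helper, not part of @luii/node-tesseract-ocr: compares the text
// printed by `pkg-config --modversion tesseract` with the version the README requires.
const REQUIRED_TESSERACT_VERSION = "5.5.2";

function isRequiredTesseractVersion(pkgConfigOutput: string): boolean {
  // pkg-config ends its output with a newline, so trim before comparing.
  return pkgConfigOutput.trim() === REQUIRED_TESSERACT_VERSION;
}

console.log(isRequiredTesseractVersion("5.5.2\n")); // true
console.log(isRequiredTesseractVersion("5.3.4\n")); // false
```

An exact string comparison is used here because the README asks for exactly `5.5.2` rather than a version range.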
47
55
  ```bash
48
56
  git clone git@github.com:luii/node-tesseract-ocr.git
49
57
  cd node-tesseract-ocr
@@ -250,6 +258,18 @@ Full list of page segmentation modes from Tesseract.
250
258
  | `bottom` | `number` | No | n/a | Bottom coordinate of current element bbox. |
251
259
  | `left` | `number` | No | n/a | Left coordinate of current element bbox. |
252
260
 
261
+ #### `TesseractProcessPagesStatus`
262
+
263
+ | Field | Type | Optional | Default | Description |
264
+ | ----------------- | --------- | -------- | ------- | ----------------------------------------------------- |
265
+ | `active` | `boolean` | No | n/a | Whether a multipage session is currently active. |
266
+ | `healthy` | `boolean` | No | n/a | Whether the renderer is healthy. |
267
+ | `processedPages` | `number` | No | n/a | Number of pages already processed in this session. |
268
+ | `nextPageIndex` | `number` | No | n/a | Zero-based index that will be used for the next page. |
269
+ | `outputBase` | `string` | No | n/a | Effective output base used by the PDF renderer. |
270
+ | `timeoutMillisec` | `number` | No | n/a | Timeout per page in milliseconds (`0` = unlimited). |
271
+ | `textonly` | `boolean` | No | n/a | Whether text-only PDF mode is enabled. |
272
+
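As an illustration of how these fields might be consumed, here is a sketch that redeclares the table as a local interface and formats it for logging. The interface is copied from the table rather than imported, and `describeStatus` is a hypothetical helper, not part of the package.

```ts
// Field names and types copied from the TesseractProcessPagesStatus table.
interface TesseractProcessPagesStatus {
  active: boolean;
  healthy: boolean;
  processedPages: number;
  nextPageIndex: number;
  outputBase: string;
  timeoutMillisec: number;
  textonly: boolean;
}

// Hypothetical helper (not part of the package): one-line summary for logs.
function describeStatus(s: TesseractProcessPagesStatus): string {
  if (!s.active) return "no active multipage session";
  // Per the table, a timeout of 0 means "unlimited".
  const timeout = s.timeoutMillisec === 0 ? "unlimited" : `${s.timeoutMillisec} ms`;
  const health = s.healthy ? "healthy" : "unhealthy";
  return (
    `${s.processedPages} page(s) processed, next index ${s.nextPageIndex}, ` +
    `renderer ${health}, timeout ${timeout}, output base "${s.outputBase}"`
  );
}
```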
253
273
  #### `DetectOrientationScriptResult`
254
274
 
255
275
  | Field | Type | Optional | Default | Description |
@@ -269,13 +289,92 @@ new Tesseract();
269
289
 
270
290
  Creates a new Tesseract instance.
271
291
 
292
+ #### Initialization Requirements
293
+
294
+ Call `init(...)` once before using OCR/engine-dependent methods.
295
+
296
+ Methods that do **not** require `init(...)`:
297
+
298
+ - `version()`
299
+ - `isInitialized()`
300
+ - `setInputName(...)`
301
+ - `getInputName()`
302
+ - `abortProcessPages()`
303
+ - `getProcessPagesStatus()`
304
+ - `document.abort()`
305
+ - `document.status()`
306
+ - `init(...)`
307
+ - `end()`
308
+
309
+ Methods that **require** `init(...)`:
310
+
311
+ - `setInputImage(...)`
312
+ - `getInputImage()`
313
+ - `getSourceYResolution()`
314
+ - `getDataPath()`
315
+ - `setOutputName(...)`
316
+ - `clearPersistentCache()`
317
+ - `clearAdaptiveClassifier()`
318
+ - `setImage(...)`
319
+ - `getThresholdedImage()`
320
+ - `getThresholdedImageScaleFactor()`
321
+ - `setPageMode(...)`
322
+ - `setRectangle(...)`
323
+ - `setSourceResolution(...)`
324
+ - `recognize(...)`
325
+ - `detectOrientationScript()`
326
+ - `meanTextConf()`
327
+ - `allWordConfidences()`
328
+ - `getPAGEText(...)`
329
+ - `getLSTMBoxText(...)`
330
+ - `getBoxText(...)`
331
+ - `getWordStrBoxText(...)`
332
+ - `getOSDText(...)`
333
+ - `getUTF8Text()`
334
+ - `getHOCRText(...)`
335
+ - `getTSVText(...)`
336
+ - `getUNLVText()`
337
+ - `getALTOText(...)`
338
+ - `getInitLanguages()`
339
+ - `getLoadedLanguages()`
340
+ - `getAvailableLanguages()`
341
+ - `setDebugVariable(...)`
342
+ - `setVariable(...)`
343
+ - `getIntVariable(...)`
344
+ - `getBoolVariable(...)`
345
+ - `getDoubleVariable(...)`
346
+ - `getStringVariable(...)`
347
+ - `clear()`
348
+ - `beginProcessPages(...)`
349
+ - `addProcessPage(...)`
350
+ - `finishProcessPages()`
351
+ - `document.begin(...)`
352
+ - `document.addPage(...)`
353
+ - `document.finish()`
354
+
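The two lists above can be turned into a small lookup, for example to fail fast in application code before calling into the addon. This is a sketch only: `NO_INIT_REQUIRED` and `canCallBeforeInit` are hypothetical names, and treating any method that is not on the first list as requiring `init(...)` is a conservative assumption, not documented behavior.

```ts
// Method names copied from the "do not require init(...)" list.
const NO_INIT_REQUIRED: readonly string[] = [
  "version",
  "isInitialized",
  "setInputName",
  "getInputName",
  "abortProcessPages",
  "getProcessPagesStatus",
  "document.abort",
  "document.status",
  "init",
  "end",
];

// Hypothetical helper: true only for methods the README lists as safe before init(...).
// Assumption: anything not listed is treated as requiring init(...).
function canCallBeforeInit(method: string): boolean {
  return NO_INIT_REQUIRED.indexOf(method) !== -1;
}

console.log(canCallBeforeInit("version")); // true
console.log(canCallBeforeInit("recognize")); // false
```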
355
+ #### version
356
+
357
+ Returns the currently loaded libtesseract version string.
358
+
359
+ ```ts
360
+ version(): Promise<string>
361
+ ```
362
+
363
+ #### isInitialized
364
+
365
+ Returns whether `init(...)` has already completed successfully and has not been reset via `end()`.
366
+
367
+ ```ts
368
+ isInitialized(): Promise<boolean>
369
+ ```
370
+
272
371
  #### init
273
372
 
274
- Initializes Tesseract with language, engine mode, configs, and variables.
373
+ Initializes the OCR engine with language, OEM, configs, and variables.
275
374
 
276
- | Name | Type | Optional | Default | Description |
277
- | ------- | ----------------------------------------------- | -------- | ------- | ----------------------- |
278
- | options | [`TesseractInitOptions`](#tesseractinitoptions) | No | n/a | Initialization options. |
375
+ | Name | Type | Optional | Default | Description |
376
+ | --------- | ----------------------------------------------- | -------- | ------- | ----------------------- |
377
+ | `options` | [`TesseractInitOptions`](#tesseractinitoptions) | No | n/a | Initialization options. |
279
378
 
280
379
  ```ts
281
380
  init(options: TesseractInitOptions): Promise<void>
@@ -283,56 +382,282 @@ init(options: TesseractInitOptions): Promise<void>
283
382
 
284
383
  #### initForAnalysePage
285
384
 
286
- Initializes for layout analysis only.
385
+ Initializes the engine in analysis-only mode.
287
386
 
288
387
  ```ts
289
388
  initForAnalysePage(): Promise<void>
290
389
  ```
291
390
 
292
- #### analysePage
391
+ #### analyseLayout
392
+
393
+ Runs page layout analysis on the current image.
394
+
395
+ | Name | Type | Optional | Default | Description |
396
+ | ------------------- | --------- | -------- | ------- | ------------------------------------------- |
397
+ | `mergeSimilarWords` | `boolean` | No | n/a | Merge similar words during layout analysis. |
398
+
399
+ ```ts
400
+ analyseLayout(mergeSimilarWords: boolean): Promise<void>
401
+ ```
402
+
403
+ #### setInputName
404
+
405
+ Sets the source/input name used by renderer/training APIs.
406
+
407
+ | Name | Type | Optional | Default | Description |
408
+ | ----------- | -------- | -------- | ------- | ------------------------------------------ |
409
+ | `inputName` | `string` | No | n/a | Input name used by renderer/training APIs. |
410
+
411
+ ```ts
412
+ setInputName(inputName: string): Promise<void>
413
+ ```
414
+
415
+ #### getInputName
416
+
417
+ Returns the current input name from engine state.
418
+
419
+ ```ts
420
+ getInputName(): Promise<string>
421
+ ```
422
+
423
+ #### setInputImage
424
+
425
+ Sets the encoded source image buffer.
426
+
427
+ | Name | Type | Optional | Default | Description |
428
+ | -------- | -------- | -------- | ------- | ---------------------------- |
429
+ | `buffer` | `Buffer` | No | n/a | Encoded source image buffer. |
430
+
431
+ ```ts
432
+ setInputImage(buffer: Buffer): Promise<void>
433
+ ```
434
+
435
+ #### getInputImage
293
436
 
294
- Runs the layout analysis.
437
+ Returns the current input image bytes.
295
438
 
296
- | Name | Type | Optional | Default | Description |
297
- | ----------------- | ------- | -------- | ------- | ------------------------------- |
298
- | mergeSimilarWords | boolean | No | n/a | Whether to merge similar words. |
439
+ ```ts
440
+ getInputImage(): Promise<Buffer>
441
+ ```
442
+
443
+ #### getSourceYResolution
444
+
445
+ Returns source image Y resolution (DPI).
299
446
 
300
447
  ```ts
301
- analysePage(mergeSimilarWords: boolean): Promise<void>
448
+ getSourceYResolution(): Promise<number>
449
+ ```
450
+
451
+ #### getDataPath
452
+
453
+ Returns the active tessdata path from the engine.
454
+
455
+ ```ts
456
+ getDataPath(): Promise<string>
457
+ ```
458
+
459
+ #### setOutputName
460
+
461
+ Sets the output base name for renderer-based outputs.
462
+
463
+ | Name | Type | Optional | Default | Description |
464
+ | ------------ | -------- | -------- | ------- | -------------------------------------- |
465
+ | `outputName` | `string` | No | n/a | Output base name for renderer outputs. |
466
+
467
+ ```ts
468
+ setOutputName(outputName: string): Promise<void>
469
+ ```
470
+
471
+ #### clearPersistentCache
472
+
473
+ Clears global library-level caches (for example dictionaries).
474
+
475
+ ```ts
476
+ clearPersistentCache(): Promise<void>
477
+ ```
478
+
479
+ #### clearAdaptiveClassifier
480
+
481
+ Cleans adaptive classifier state between pages/documents.
482
+
483
+ ```ts
484
+ clearAdaptiveClassifier(): Promise<void>
485
+ ```
486
+
487
+ #### setImage
488
+
489
+ Sets the image used by OCR recognition.
490
+
491
+ | Name | Type | Optional | Default | Description |
492
+ | -------- | -------- | -------- | ------- | ------------------------ |
493
+ | `buffer` | `Buffer` | No | n/a | Image data used for OCR. |
494
+
495
+ ```ts
496
+ setImage(buffer: Buffer): Promise<void>
497
+ ```
498
+
499
+ #### getThresholdedImage
500
+
501
+ Returns thresholded image bytes from Tesseract internals.
502
+
503
+ ```ts
504
+ getThresholdedImage(): Promise<Buffer>
505
+ ```
506
+
507
+ #### getThresholdedImageScaleFactor
508
+
509
+ Returns scale factor for thresholded/component images.
510
+
511
+ ```ts
512
+ getThresholdedImageScaleFactor(): Promise<number>
302
513
  ```
303
514
 
304
515
  #### setPageMode
305
516
 
306
- Sets the page segmentation mode.
517
+ Sets the page segmentation mode (PSM).
307
518
 
308
- | Name | Type | Optional | Default | Description |
309
- | ---- | ------------------------------------------------ | -------- | ------- | ----------------------- |
310
- | psm | [`PageSegmentationMode`](#pagesegmentationmodes) | No | n/a | Page segmentation mode. |
519
+ | Name | Type | Optional | Default | Description |
520
+ | ----- | ----------------------------------------------- | -------- | ------- | ----------------------- |
521
+ | `psm` | [`PageSegmentationMode`](#pagesegmentationmode) | No | n/a | Page segmentation mode. |
311
522
 
312
523
  ```ts
313
524
  setPageMode(psm: PageSegmentationMode): Promise<void>
314
525
  ```
315
526
 
527
+ #### setRectangle
528
+
529
+ Restricts recognition to the given rectangle.
530
+
531
+ | Name | Type | Optional | Default | Description |
532
+ | --------- | --------------------------------------------------------------- | -------- | ------- | ------------------ |
533
+ | `options` | [`TesseractSetRectangleOptions`](#tesseractsetrectangleoptions) | No | n/a | Region definition. |
534
+
535
+ ```ts
536
+ setRectangle(options: TesseractSetRectangleOptions): Promise<void>
537
+ ```
538
+
539
+ #### setSourceResolution
540
+
541
+ Sets the source resolution in PPI.
542
+
543
+ | Name | Type | Optional | Default | Description |
544
+ | ----- | -------- | -------- | ------- | ------------------------- |
545
+ | `ppi` | `number` | No | n/a | Source resolution in PPI. |
546
+
547
+ ```ts
548
+ setSourceResolution(ppi: number): Promise<void>
549
+ ```
550
+
551
+ #### document
552
+
553
+ Facade for multipage PDF/document processing lifecycle.
554
+
555
+ ```ts
556
+ document: {
557
+ begin(options: TesseractBeginProcessPagesOptions): Promise<void>;
558
+ addPage(buffer: Buffer, filename?: string): Promise<void>;
559
+ finish(): Promise<string>;
560
+ abort(): Promise<void>;
561
+ status(): Promise<TesseractProcessPagesStatus>;
562
+ }
563
+ ```
564
+
565
+ #### document.begin
566
+
567
+ Starts a multipage processing session.
568
+
569
+ | Name | Type | Optional | Default | Description |
570
+ | --------- | ----------------------------------- | -------- | ------- | --------------------------- |
571
+ | `options` | `TesseractBeginProcessPagesOptions` | No | n/a | Multipage renderer options. |
572
+
573
+ ```ts
574
+ document.begin(options: TesseractBeginProcessPagesOptions): Promise<void>
575
+ ```
576
+
577
+ #### document.addPage
578
+
579
+ Adds an encoded page to the active session.
580
+
581
+ | Name | Type | Optional | Default | Description |
582
+ | ---------- | -------- | -------- | ----------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
583
+ | `buffer` | `Buffer` | No | n/a | Encoded page image buffer. |
584
+ | `filename` | `string` | Yes | `undefined` | Optional source filename/path passed to Tesseract `ProcessPage` for this page. Tesseract/Leptonica may open this file internally and use it as the source image for parts of PDF rendering. If output pages look wrong (for example inverted or visually corrupted), pass a real image path here to force a stable source image path for that page. |
585
+
586
+ ```ts
587
+ document.addPage(buffer: Buffer, filename?: string): Promise<void>
588
+ ```
589
+
590
+ #### document.finish
591
+
592
+ Finalizes the active session and returns output PDF path.
593
+
594
+ ```ts
595
+ document.finish(): Promise<string>
596
+ ```
597
+
598
+ #### document.abort
599
+
600
+ Aborts and resets the active multipage session.
601
+
602
+ ```ts
603
+ document.abort(): Promise<void>
604
+ ```
605
+
606
+ #### document.status
607
+
608
+ Returns the current multipage session status (active flag, page counters, and effective renderer settings).
609
+
610
+ ```ts
611
+ document.status(): Promise<TesseractProcessPagesStatus>
612
+ ```
613
+
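The lifecycle described by the `document` methods (begin, add pages, finish, abort on failure) could be wrapped as shown below. This is a sketch under stated assumptions: `DocumentFacade` is a local mirror of the documented signatures rather than an import from the package, `Uint8Array` stands in for Node's `Buffer` (which is a `Uint8Array` subclass), the options type is left generic because the fields of `TesseractBeginProcessPagesOptions` are not shown in this diff, and `renderPages` is a hypothetical helper name.

```ts
// Local mirror of the documented `document` facade (status() omitted for brevity).
interface DocumentFacade<Options> {
  begin(options: Options): Promise<void>;
  addPage(buffer: Uint8Array, filename?: string): Promise<void>;
  finish(): Promise<string>;
  abort(): Promise<void>;
}

// Hypothetical helper: begin -> addPage for each page -> finish.
// If adding a page or finishing fails, the session is aborted and the error rethrown.
async function renderPages<Options>(
  doc: DocumentFacade<Options>,
  options: Options,
  pages: Uint8Array[],
): Promise<string> {
  await doc.begin(options);
  try {
    for (const page of pages) {
      await doc.addPage(page);
    }
    // Resolves to the output PDF path, per the document.finish description.
    return await doc.finish();
  } catch (err) {
    await doc.abort();
    throw err;
  }
}
```

Calling `abort()` in the `catch` branch is a design choice for this sketch, so that a failed run does not leave a session marked as active; whether the library needs this after every kind of failure is not stated in the README.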
614
+ #### getProcessPagesStatus
615
+
616
+ Returns the current multipage session status from the instance API.
617
+
618
+ ```ts
619
+ getProcessPagesStatus(): Promise<TesseractProcessPagesStatus>
620
+ ```
621
+
622
+ #### setDebugVariable
623
+
624
+ Sets a debug configuration variable.
625
+
626
+ | Name | Type | Optional | Default | Description |
627
+ | ------- | -------------------------------------------------------------- | -------- | ------- | --------------- |
628
+ | `name` | `keyof SetVariableConfigVariables` | No | n/a | Variable name. |
629
+ | `value` | `SetVariableConfigVariables[keyof SetVariableConfigVariables]` | No | n/a | Variable value. |
630
+
631
+ ```ts
632
+ setDebugVariable(
633
+ name: keyof SetVariableConfigVariables,
634
+ value: SetVariableConfigVariables[keyof SetVariableConfigVariables],
635
+ ): Promise<boolean>
636
+ ```
637
+
316
638
  #### setVariable
317
639
 
318
- Sets a Tesseract variable. Returns `false` if the lookup failed.
640
+ Sets a regular configuration variable.
319
641
 
320
- | Name | Type | Optional | Default | Description |
321
- | ----- | -------------------------------------------------------------- | -------- | ------- | --------------- |
322
- | name | keyof SetVariableConfigVariables | No | n/a | Variable name. |
323
- | value | SetVariableConfigVariables\[keyof SetVariableConfigVariables\] | No | n/a | Variable value. |
642
+ | Name | Type | Optional | Default | Description |
643
+ | ------- | -------------------------------------------------------------- | -------- | ------- | --------------- |
644
+ | `name` | `keyof SetVariableConfigVariables` | No | n/a | Variable name. |
645
+ | `value` | `SetVariableConfigVariables[keyof SetVariableConfigVariables]` | No | n/a | Variable value. |
324
646
 
325
647
  ```ts
326
- setVariable(name: keyof SetVariableConfigVariables, value: SetVariableConfigVariables[keyof SetVariableConfigVariables]): Promise<boolean>
648
+ setVariable(
649
+ name: keyof SetVariableConfigVariables,
650
+ value: SetVariableConfigVariables[keyof SetVariableConfigVariables],
651
+ ): Promise<boolean>
327
652
  ```
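Because `setVariable` resolves to a `boolean`, callers may want to surface a rejected variable as an error instead of ignoring it. The sketch below assumes that `false` means the variable was not accepted, which is how the 2.1.0 README described it ("Returns `false` if the lookup failed"). `VariableSetter` and `setVariableOrThrow` are hypothetical names, and the variable map is left generic because the keys of `SetVariableConfigVariables` are not shown in this diff.

```ts
// Local mirror of the documented setVariable signature, generic over the variable map.
interface VariableSetter<Vars> {
  setVariable(name: keyof Vars, value: Vars[keyof Vars]): Promise<boolean>;
}

// Hypothetical helper: turns a `false` result into a thrown Error.
async function setVariableOrThrow<Vars>(
  api: VariableSetter<Vars>,
  name: keyof Vars,
  value: Vars[keyof Vars],
): Promise<void> {
  const accepted = await api.setVariable(name, value);
  if (!accepted) {
    throw new Error(`Tesseract rejected variable "${String(name)}"`);
  }
}
```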
328
653
 
329
654
  #### getIntVariable
330
655
 
331
- Reads an integer variable from Tesseract.
656
+ Reads a configuration variable as integer.
332
657
 
333
- | Name | Type | Optional | Default | Description |
334
- | ---- | -------------------------------- | -------- | ------- | -------------- |
335
- | name | keyof SetVariableConfigVariables | No | n/a | Variable name. |
658
+ | Name | Type | Optional | Default | Description |
659
+ | ------ | ---------------------------------- | -------- | ------- | -------------- |
660
+ | `name` | `keyof SetVariableConfigVariables` | No | n/a | Variable name. |
336
661
 
337
662
  ```ts
338
663
  getIntVariable(name: keyof SetVariableConfigVariables): Promise<number>
@@ -340,11 +665,11 @@ getIntVariable(name: keyof SetVariableConfigVariables): Promise<number>
340
665
 
341
666
  #### getBoolVariable
342
667
 
343
- Reads a boolean variable from Tesseract. Returns `0` or `1`.
668
+ Reads a boolean configuration variable; the result is the number `0` or `1`.
344
669
 
345
- | Name | Type | Optional | Default | Description |
346
- | ---- | -------------------------------- | -------- | ------- | -------------- |
347
- | name | keyof SetVariableConfigVariables | No | n/a | Variable name. |
670
+ | Name | Type | Optional | Default | Description |
671
+ | ------ | ---------------------------------- | -------- | ------- | -------------- |
672
+ | `name` | `keyof SetVariableConfigVariables` | No | n/a | Variable name. |
348
673
 
349
674
  ```ts
350
675
  getBoolVariable(name: keyof SetVariableConfigVariables): Promise<number>
@@ -352,11 +677,11 @@ getBoolVariable(name: keyof SetVariableConfigVariables): Promise<number>
352
677
 
353
678
  #### getDoubleVariable
354
679
 
355
- Reads a double variable from Tesseract.
680
+ Reads a configuration variable as double.
356
681
 
357
- | Name | Type | Optional | Default | Description |
358
- | ---- | -------------------------------- | -------- | ------- | -------------- |
359
- | name | keyof SetVariableConfigVariables | No | n/a | Variable name. |
682
+ | Name | Type | Optional | Default | Description |
683
+ | ------ | ---------------------------------- | -------- | ------- | -------------- |
684
+ | `name` | `keyof SetVariableConfigVariables` | No | n/a | Variable name. |
360
685
 
361
686
  ```ts
362
687
  getDoubleVariable(name: keyof SetVariableConfigVariables): Promise<number>
@@ -364,139 +689,175 @@ getDoubleVariable(name: keyof SetVariableConfigVariables): Promise<number>
364
689
 
365
690
  #### getStringVariable
366
691
 
367
- Reads a string variable from Tesseract.
692
+ Reads a configuration variable as string.
368
693
 
369
- | Name | Type | Optional | Default | Description |
370
- | ---- | -------------------------------- | -------- | ------- | -------------- |
371
- | name | keyof SetVariableConfigVariables | No | n/a | Variable name. |
694
+ | Name | Type | Optional | Default | Description |
695
+ | ------ | ---------------------------------- | -------- | ------- | -------------- |
696
+ | `name` | `keyof SetVariableConfigVariables` | No | n/a | Variable name. |
372
697
 
373
698
  ```ts
374
699
  getStringVariable(name: keyof SetVariableConfigVariables): Promise<string>
375
700
  ```
376
701
 
377
- #### setImage
702
+ #### recognize
378
703
 
379
- Sets the image from a Buffer.
704
+ Runs OCR recognition (optionally with progress callback).
380
705
 
381
- | Name | Type | Optional | Default | Description |
382
- | ------ | ------ | -------- | ------- | ----------- |
383
- | buffer | Buffer | No | n/a | Image data. |
706
+ | Name | Type | Optional | Default | Description |
707
+ | ------------------ | ------------------------------------- | -------- | ----------- | ---------------------- |
708
+ | `progressCallback` | `(info: ProgressChangedInfo) => void` | Yes | `undefined` | OCR progress callback. |
384
709
 
385
710
  ```ts
386
- setImage(buffer: Buffer): Promise<void>
711
+ recognize(progressCallback?: (info: ProgressChangedInfo) => void): Promise<void>
387
712
  ```
388
713
 
389
- #### setRectangle
714
+ #### detectOrientationScript
390
715
 
391
- Sets the image region using coordinates and size.
716
+ Detects orientation and script with confidence values.
392
717
 
393
- | Name | Type | Optional | Default | Description |
394
- | ------- | --------------------------------------------------------------- | -------- | ------- | ------------------ |
395
- | options | [`TesseractSetRectangleOptions`](#tesseractsetrectangleoptions) | No | n/a | Region definition. |
718
+ ```ts
719
+ detectOrientationScript(): Promise<DetectOrientationScriptResult>
720
+ ```
721
+
722
+ #### meanTextConf
723
+
724
+ Returns mean text confidence.
396
725
 
397
726
  ```ts
398
- setRectangle(options: TesseractSetRectangleOptions): Promise<void>
727
+ meanTextConf(): Promise<number>
399
728
  ```
400
729
 
401
- #### setSourceResolution
730
+ #### allWordConfidences
402
731
 
403
- Sets the source resolution in PPI.
732
+ Returns all word confidences for current recognition result.
733
+
734
+ ```ts
735
+ allWordConfidences(): Promise<number[]>
736
+ ```
737
+
738
+ #### getPAGEText
404
739
 
405
- | Name | Type | Optional | Default | Description |
406
- | ---- | ------ | -------- | ------- | ---------------- |
407
- | ppi | number | No | n/a | Pixels per inch. |
740
+ Returns PAGE XML output.
741
+
742
+ | Name | Type | Optional | Default | Description |
743
+ | ------------------ | ------------------------------------- | -------- | ----------- | ---------------------------------- |
744
+ | `progressCallback` | `(info: ProgressChangedInfo) => void` | Yes | `undefined` | PAGE generation progress callback. |
745
+ | `pageNumber` | `number` | Yes | `undefined` | 0-based page number. |
408
746
 
409
747
  ```ts
410
- setSourceResolution(ppi: number): Promise<void>
748
+ getPAGEText(
749
+ progressCallback?: (info: ProgressChangedInfo) => void,
750
+ pageNumber?: number,
751
+ ): Promise<string>
411
752
  ```
412
753
 
413
- #### recognize
754
+ #### getLSTMBoxText
414
755
 
415
- Starts OCR and calls the callback with progress info.
756
+ Returns LSTM box output.
416
757
 
417
- | Name | Type | Optional | Default | Description |
418
- | ---------------- | ------------------------------------------------------------- | -------- | ------- | ------------------ |
419
- | progressCallback | (info: [`ProgressChangedInfo`](#progresschangedinfo)) => void | No | n/a | Progress callback. |
758
+ | Name | Type | Optional | Default | Description |
759
+ | ------------ | -------- | -------- | ----------- | -------------------- |
760
+ | `pageNumber` | `number` | Yes | `undefined` | 0-based page number. |
420
761
 
421
762
  ```ts
422
- recognize(progressCallback: (info: ProgressChangedInfo) => void): Promise<void>
763
+ getLSTMBoxText(pageNumber?: number): Promise<string>
423
764
  ```
424
765
 
425
- #### getUTF8Text
766
+ #### getBoxText
767
+
768
+ Returns classic box output.
426
769
 
427
- Returns recognized text as UTF-8.
770
+ | Name | Type | Optional | Default | Description |
771
+ | ------------ | -------- | -------- | ----------- | -------------------- |
772
+ | `pageNumber` | `number` | Yes | `undefined` | 0-based page number. |
428
773
 
429
774
  ```ts
430
- getUTF8Text(): Promise<string>
775
+ getBoxText(pageNumber?: number): Promise<string>
431
776
  ```
432
777
 
433
- #### getHOCRText
778
+ #### getWordStrBoxText
434
779
 
435
- Returns HOCR output. Optional progress callback and page number.
780
+ Returns WordStr box output.
436
781
 
437
- | Name | Type | Optional | Default | Description |
438
- | ---------------- | ------------------------------------------------------------- | -------- | --------- | ---------------------- |
439
- | progressCallback | (info: [`ProgressChangedInfo`](#progresschangedinfo)) => void | Yes | undefined | Progress callback. |
440
- | pageNumber | number | Yes | undefined | Page number (0-based). |
782
+ | Name | Type | Optional | Default | Description |
783
+ | ------------ | -------- | -------- | ----------- | -------------------- |
784
+ | `pageNumber` | `number` | Yes | `undefined` | 0-based page number. |
441
785
 
442
786
  ```ts
443
- getHOCRText(
444
- progressCallback?: (info: ProgressChangedInfo) => void,
445
- pageNumber?: number,
446
- ): Promise<string>
787
+ getWordStrBoxText(pageNumber?: number): Promise<string>
447
788
  ```
448
789
 
449
- #### getTSVText
790
+ #### getOSDText
450
791
 
451
- Returns TSV output.
792
+ Returns OSD text output.
793
+
794
+ | Name | Type | Optional | Default | Description |
795
+ | ------------ | -------- | -------- | ----------- | -------------------- |
796
+ | `pageNumber` | `number` | Yes | `undefined` | 0-based page number. |
452
797
 
453
798
  ```ts
454
- getTSVText(): Promise<string>
799
+ getOSDText(pageNumber?: number): Promise<string>
455
800
  ```
456
801
 
457
- #### getUNLVText
802
+ #### getUTF8Text
458
803
 
459
- Returns UNLV output.
804
+ Returns recognized UTF-8 text.
460
805
 
461
806
  ```ts
462
- getUNLVText(): Promise<string>
807
+ getUTF8Text(): Promise<string>
463
808
  ```
464
809
 
465
- #### getALTOText
810
+ #### getHOCRText
466
811
 
467
- Returns ALTO output. Optional progress callback and page number.
812
+ Returns hOCR output.
468
813
 
469
- | Name | Type | Optional | Default | Description |
470
- | ---------------- | ------------------------------------------------------------- | -------- | --------- | ---------------------- |
471
- | progressCallback | (info: [`ProgressChangedInfo`](#progresschangedinfo)) => void | Yes | undefined | Progress callback. |
472
- | pageNumber | number | Yes | undefined | Page number (0-based). |
814
+ | Name | Type | Optional | Default | Description |
815
+ | ------------------ | ------------------------------------- | -------- | ----------- | ---------------------------------- |
816
+ | `progressCallback` | `(info: ProgressChangedInfo) => void` | Yes | `undefined` | hOCR generation progress callback. |
817
+ | `pageNumber` | `number` | Yes | `undefined` | 0-based page number. |
473
818
 
474
819
  ```ts
475
- getALTOText(
820
+ getHOCRText(
476
821
  progressCallback?: (info: ProgressChangedInfo) => void,
477
822
  pageNumber?: number,
478
823
  ): Promise<string>
479
824
  ```
480
825
 
481
- #### detectOrientationScript
826
+ #### getTSVText
827
+
828
+ Returns TSV output.
482
829
 
483
- Detects orientation and script with confidences. Returns [`DetectOrientationScriptResult`](#detectorientationscriptresult).
830
+ | Name | Type | Optional | Default | Description |
831
+ | ------------ | -------- | -------- | ----------- | -------------------- |
832
+ | `pageNumber` | `number` | Yes | `undefined` | 0-based page number. |
484
833
 
485
834
  ```ts
486
- detectOrientationScript(): Promise<DetectOrientationScriptResult>
835
+ getTSVText(pageNumber?: number): Promise<string>
487
836
  ```
488
837
 
489
- #### meanTextConf
838
+ #### getUNLVText
490
839
 
491
- Mean text confidence (0-100).
840
+ Returns UNLV output.
492
841
 
493
842
  ```ts
494
- meanTextConf(): Promise<number>
843
+ getUNLVText(): Promise<string>
844
+ ```
845
+
846
+ #### getALTOText
847
+
848
+ Returns ALTO XML output.
849
+
850
+ | Name | Type | Optional | Default | Description |
851
+ | ------------ | -------- | -------- | ----------- | -------------------- |
852
+ | `pageNumber` | `number` | Yes | `undefined` | 0-based page number. |
853
+
854
+ ```ts
855
+ getALTOText(pageNumber?: number): Promise<string>
495
856
  ```
496
857
 
497
858
  #### getInitLanguages
498
859
 
499
- Returns [`Language`](#availablelanguages) in raw Tesseract format (e.g. "deu+eng").
860
+ Returns languages used during initialization (for example `deu+eng`).
500
861
 
501
862
  ```ts
502
863
  getInitLanguages(): Promise<string>
@@ -504,7 +865,7 @@ getInitLanguages(): Promise<string>
504
865
 
505
866
  #### getLoadedLanguages
506
867
 
507
- Returns [`Language[]`](#availablelanguages) in raw Tesseract format.
868
+ Returns languages currently loaded in the engine.
508
869
 
509
870
  ```ts
510
871
  getLoadedLanguages(): Promise<Language[]>
@@ -512,7 +873,7 @@ getLoadedLanguages(): Promise<Language[]>
512
873
 
513
874
  #### getAvailableLanguages
514
875
 
515
- Returns [`Language[]`](#availablelanguages) in raw Tesseract format.
876
+ Returns languages available from tessdata.
516
877
 
517
878
  ```ts
518
879
  getAvailableLanguages(): Promise<Language[]>
@@ -520,7 +881,7 @@ getAvailableLanguages(): Promise<Language[]>
520
881
 
521
882
  #### clear
522
883
 
523
- Clears internal state.
884
+ Clears internal recognition state/results.
524
885
 
525
886
  ```ts
526
887
  clear(): Promise<void>
@@ -528,7 +889,7 @@ clear(): Promise<void>
528
889
 
529
890
  #### end
530
891
 
531
- Ends the instance.
892
+ Releases native resources and ends the instance.
532
893
 
533
894
  ```ts
534
895
  end(): Promise<void>
@@ -537,7 +898,3 @@ end(): Promise<void>
537
898
  ## License
538
899
 
539
900
  Apache-2.0. See [`LICENSE.md`](/LICENSE.md) for full terms.
540
-
541
- ## Special Thanks
542
-
543
- - **Stunt3000**