label-studio-converter 1.3.0 → 1.3.1

package/README.md CHANGED
@@ -21,8 +21,29 @@
21
21
  - [Usage](#eyes-usage)
22
22
  - [Library Usage](#library-usage)
23
23
  - [CLI Usage](#cli-usage)
24
+ - [Available Commands](#available-commands)
25
+ - [Command Options Reference](#command-options-reference)
26
+ - [Common Options (All Commands)](#common-options-all-commands)
27
+ - [Enhancement Options (All Commands)](#enhancement-options-all-commands)
28
+ - [toLabelStudio Specific Options](#tolabelstudio-specific-options)
29
+ - [toPPOCR Specific Options](#toppocr-specific-options)
30
+ - [enhance-labelstudio Specific Options](#enhance-labelstudio-specific-options)
31
+ - [Error Handling](#error-handling)
32
+ - [Detailed Command Help](#detailed-command-help)
33
+ - [toLabelStudio Command](#tolabelstudio-command)
34
+ - [toPPOCR Command](#toppocr-command)
35
+ - [enhance-labelstudio Command](#enhance-labelstudio-command)
36
+ - [enhance-ppocr Command](#enhance-ppocr-command)
24
37
  - [Examples](#examples)
25
- - [Enhancement Features](#enhancement-features)
38
+ - [Basic Conversion Examples](#basic-conversion-examples)
39
+ - [toLabelStudio Examples](#tolabelstudio-examples)
40
+ - [toPPOCR Examples](#toppocr-examples)
41
+ - [Recursive Search and Pattern Matching Examples](#recursive-search-and-pattern-matching-examples)
42
+ - [Enhancement Examples](#enhancement-examples)
43
+ - [Shape Normalization Examples](#shape-normalization-examples)
44
+ - [Bounding Box Resizing Examples](#bounding-box-resizing-examples)
45
+ - [Combined Enhancement Examples](#combined-enhancement-examples)
46
+ - [Special Format Examples](#special-format-examples)
26
47
  - [Using generated files with Label Studio](#using-generated-files-with-label-studio)
27
48
  - [Interface setup](#interface-setup)
28
49
  - [Serving annotation files locally](#serving-annotation-files-locally)
@@ -217,13 +238,15 @@ const sorted = sortBoundingBoxes(annotations, 'top-bottom', 'ltr');
217
238
 
218
239
  ### CLI Usage
219
240
 
220
- **Available Commands:**
241
+ #### Available Commands
221
242
 
222
243
  ```bash
223
244
  label-studio-converter --help
224
245
  ```
225
246
 
226
- ```bash
247
+ **Output:**
248
+
249
+ ```
227
250
  USAGE
228
251
  label-studio-converter toLabelStudio [--outDir value] [--fileName value] [--backup] [--defaultLabelName value] [--toFullJson] [--createFilePerImage] [--createFileListForServing] [--fileListName value] [--baseServerUrl value] [--sortVertical value] [--sortHorizontal value] [--normalizeShape value] [--widthIncrement value] [--heightIncrement value] [--precision value] [--recursive] [--filePattern value] [--outputMode value] <args>...
229
252
  label-studio-converter toPPOCR [--outDir value] [--fileName value] [--backup] [--baseImageDir value] [--sortVertical value] [--sortHorizontal value] [--normalizeShape value] [--widthIncrement value] [--heightIncrement value] [--precision value] [--recursive] [--filePattern value] <args>...
@@ -245,11 +268,120 @@ COMMANDS
245
268
  enhance-ppocr Enhance PPOCRLabel files with sorting, normalization, and resizing
246
269
  ```
247
270
 
248
- **Commands:**
271
+ #### Command Options Reference
272
+
273
+ ##### Common Options (All Commands)
274
+
275
+ These options are available for all commands:
276
+
277
+ - **`--outDir <path>`**: Output directory. If not specified, files are saved in
278
+ the same directory as the source files
279
+ - **`--fileName <name>`**: Custom output filename. If not specified, uses source
280
+ filename with format suffix
281
+ - **`--backup` / `--noBackup`**: Create backup of existing files before
282
+ overwriting. Default: `false`
283
+ - **`--recursive` / `--noRecursive`**: Recursively search directories for files.
284
+ Default: `false`
285
+ - **`--filePattern <regex>`**: Regex pattern to match files. Default: `.*\.txt$`
286
+ (PPOCR) or `.*\.json$` (Label Studio)
287
+ - **`-h` / `--help`**: Print help information and exit
288
+
289
+ ##### Enhancement Options (All Commands)
290
+
291
+ These options control bounding box transformations:
292
+
293
+ - **`--sortVertical <order>`**: Sort bounding boxes vertically
294
+ - Options: `none` (default), `top-bottom`, `bottom-top`
295
+ - Useful for organizing annotations by reading order
296
+
297
+ - **`--sortHorizontal <order>`**: Sort bounding boxes horizontally
298
+ - Options: `none` (default), `ltr` (left-to-right), `rtl` (right-to-left)
299
+ - `ltr`: For English, most European languages
300
+ - `rtl`: For Arabic, Hebrew, and SinoNom (classical Vietnamese/Chinese
301
+ vertical text)
302
+
303
+ - **`--normalizeShape <shape>`**: Normalize shapes to standard forms
304
+ - Options: `none` (default), `rectangle`
305
+ - `rectangle`: Converts diamond-like or rotated quadrilaterals to axis-aligned
306
+ rectangles
307
+
308
+ - **`--widthIncrement <pixels>`**: Increase/decrease bounding box width (can be
309
+ negative)
310
+ - Default: `0`
311
+ - Example: `10` adds 10px, `-5` removes 5px
312
+
313
+ - **`--heightIncrement <pixels>`**: Increase/decrease bounding box height (can
314
+ be negative)
315
+ - Default: `0`
316
+ - Example: `15` adds 15px, `-3` removes 3px
317
+
318
+ - **`--precision <decimals>`**: Number of decimal places for coordinates
319
+ - `-1`: Full precision, no rounding (default for Label Studio output)
320
+ - `0`: Round to integers (default for PPOCR output)
321
+ - `1+`: Round to specified decimal places
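
The three `--precision` cases listed above can be illustrated with a small TypeScript sketch. It is not taken from the package source: the helper name `roundCoordinate` is hypothetical, and treating every negative value like `-1` is an assumption.

```typescript
// Hypothetical helper illustrating the documented --precision semantics.
// Not taken from label-studio-converter's source.
function roundCoordinate(value: number, precision: number): number {
  // Assumption: any negative precision behaves like -1 (no rounding).
  if (precision < 0) {
    return value;
  }
  // 0 rounds to an integer; n > 0 keeps n decimal places.
  const factor = Math.pow(10, precision);
  return Math.round(value * factor) / factor;
}

console.log(roundCoordinate(27.44656917885264, -1)); // 27.44656917885264
console.log(roundCoordinate(27.44656917885264, 0)); // 27
console.log(roundCoordinate(27.44656917885264, 2)); // 27.45
```
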
322
+
323
+ ##### toLabelStudio Specific Options
324
+
325
+ - **`--defaultLabelName <name>`**: Default label name for text annotations.
326
+ Default: `"Text"`
327
+ - **`--toFullJson` / `--noToFullJson`**: Convert to Full OCR Label Studio
328
+ format. Default: `true`
329
+ - **`--createFilePerImage` / `--noCreateFilePerImage`**: Create separate JSON
330
+ file for each image. Default: `false`
331
+ - **`--createFileListForServing` / `--noCreateFileListForServing`**: Create file
332
+ list for serving in Label Studio. Default: `true`
333
+ - **`--fileListName <name>`**: Name of the file list for serving. Default:
334
+ `"files.txt"`
335
+ - **`--baseServerUrl <url>`**: Base server URL for image URLs in file list.
336
+ Default: `"http://localhost:8081"`
337
+ - **`--outputMode <mode>`**: Output format mode
338
+ - Options: `annotations` (default), `predictions`
339
+ - `annotations`: Editable annotations (ground truth)
340
+ - `predictions`: Read-only predictions (pre-annotations)
341
+ - Only available with `--toFullJson`
342
+
343
+ ##### toPPOCR Specific Options
344
+
345
+ - **`--baseImageDir <path>`**: Base directory path to prepend to image filenames
346
+ (e.g., `"ch"` or `"images/ch"`)
347
+ - **`--fileName <name>`**: Output PPOCR file name. Default: `"Label.txt"`
348
+
349
+ ##### enhance-labelstudio Specific Options
350
+
351
+ - **`--outputMode <mode>`**: Output format mode
352
+ - Options: `annotations` (default), `predictions`
353
+ - Only available for Full JSON format files
354
+
355
+ > [!NOTE]
356
+ > **Output Mode Availability:**
357
+ >
358
+ > - `--outputMode` is only available for:
359
+ > - `toLabelStudio` (when using `--toFullJson`)
360
+ > - `enhance-labelstudio` (for Full JSON format only)
361
+ > - Not available for `toPPOCR` or `enhance-ppocr` (PPOCR format doesn't
362
+ > distinguish annotations/predictions)
363
+ > - When using `--outputMode predictions`, the `dt_score` field from PPOCRLabel
364
+ > is mapped to Label Studio's prediction `score` field
365
+
366
+ ##### Error Handling
249
367
 
250
- - `toLabelStudio` - Convert PPOCRLabel files to Label Studio format
368
+ The `toLabelStudio` command handles missing or unreadable image files gracefully:
369
+
370
+ - If an image file cannot be found or read, a warning is logged
371
+ - Default dimensions of **1920×1080** are used as fallback
372
+ - Conversion continues for remaining images without interruption
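
The fallback behaviour described above might look roughly like the following TypeScript sketch. The names `dimensionsOrFallback` and `readDimensions` are hypothetical, and the real command may read image sizes and word its warning differently; only the 1920×1080 fallback comes from the README.

```typescript
interface Dimensions {
  width: number;
  height: number;
}

// Fallback size documented for toLabelStudio when an image is unreadable.
const FALLBACK_DIMENSIONS: Dimensions = { width: 1920, height: 1080 };

// `readDimensions` is a hypothetical, caller-supplied function that throws
// when the image is missing or unreadable.
function dimensionsOrFallback(
  imagePath: string,
  readDimensions: (path: string) => Dimensions
): Dimensions {
  try {
    return readDimensions(imagePath);
  } catch {
    // Log a warning and keep going instead of aborting the conversion.
    console.warn(`Could not read ${imagePath}; using 1920x1080 fallback`);
    return { width: FALLBACK_DIMENSIONS.width, height: FALLBACK_DIMENSIONS.height };
  }
}

// Simulate a missing file: the reader throws, so the fallback is returned.
const dims = dimensionsOrFallback("missing.jpg", () => {
  throw new Error("ENOENT");
});
console.log(dims); // { width: 1920, height: 1080 }
```
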
373
+
374
+ #### Detailed Command Help
375
+
376
+ ##### toLabelStudio Command
251
377
 
252
378
  ```bash
379
+ label-studio-converter toLabelStudio --help
380
+ ```
381
+
382
+ **Output:**
383
+
384
+ ```
253
385
  USAGE
254
386
  label-studio-converter toLabelStudio [--outDir value] [--fileName value] [--backup] [--defaultLabelName value] [--toFullJson] [--createFilePerImage] [--createFileListForServing] [--fileListName value] [--baseServerUrl value] [--sortVertical value] [--sortHorizontal value] [--normalizeShape value] [--widthIncrement value] [--heightIncrement value] [--precision value] [--recursive] [--filePattern value] [--outputMode value] <args>...
255
387
  label-studio-converter toLabelStudio --help
@@ -281,9 +413,15 @@ ARGUMENTS
281
413
  args... Input directories containing PPOCRLabel files
282
414
  ```
283
415
 
284
- - `toPPOCR` - Convert Label Studio files to PPOCRLabel format
416
+ ##### toPPOCR Command
285
417
 
286
418
  ```bash
419
+ label-studio-converter toPPOCR --help
420
+ ```
421
+
422
+ **Output:**
423
+
424
+ ```
287
425
  USAGE
288
426
  label-studio-converter toPPOCR [--outDir value] [--fileName value] [--backup] [--baseImageDir value] [--sortVertical value] [--sortHorizontal value] [--normalizeShape value] [--widthIncrement value] [--heightIncrement value] [--precision value] [--recursive] [--filePattern value] <args>...
289
427
  label-studio-converter toPPOCR --help
@@ -309,10 +447,15 @@ ARGUMENTS
309
447
  args... Input directories containing Label Studio files
310
448
  ```
311
449
 
312
- - `enhance-labelstudio` - Enhance Label Studio files with sorting,
313
- normalization, and resizing
450
+ ##### enhance-labelstudio Command
314
451
 
315
452
  ```bash
453
+ label-studio-converter enhance-labelstudio --help
454
+ ```
455
+
456
+ **Output:**
457
+
458
+ ```
316
459
  USAGE
317
460
  label-studio-converter enhance-labelstudio [--outDir value] [--fileName value] [--backup] [--sortVertical value] [--sortHorizontal value] [--normalizeShape value] [--widthIncrement value] [--heightIncrement value] [--precision value] [--recursive] [--filePattern value] [--outputMode value] <args>...
318
461
  label-studio-converter enhance-labelstudio --help
@@ -338,9 +481,15 @@ ARGUMENTS
338
481
  args... Input directories containing Label Studio JSON files
339
482
  ```
340
483
 
341
- - `enhance-ppocr` - Enhance PPOCRLabel files with sorting, normalization, and resizing
484
+ ##### enhance-ppocr Command
342
485
 
343
486
  ```bash
487
+ label-studio-converter enhance-ppocr --help
488
+ ```
489
+
490
+ **Output:**
491
+
492
+ ```
344
493
  USAGE
345
494
  label-studio-converter enhance-ppocr [--outDir value] [--fileName value] [--backup] [--sortVertical value] [--sortHorizontal value] [--normalizeShape value] [--widthIncrement value] [--heightIncrement value] [--precision value] [--recursive] [--filePattern value] <args>...
346
495
  label-studio-converter enhance-ppocr --help
@@ -365,19 +514,9 @@ ARGUMENTS
365
514
  args... Input directories containing PPOCRLabel files
366
515
  ```
367
516
 
368
- **Error Handling:**
369
-
370
- The `toLabelStudio` command handles missing or unreadable image files gracefully:
371
-
372
- - If an image file referenced in PPOCRLabel cannot be found or read, a warning is logged
373
- - Default dimensions of **1920×1080** are used as fallback
374
- - Conversion continues for remaining images without interruption
375
-
376
- This allows the conversion process to complete even when some image files are missing from the dataset.
377
-
378
517
  #### Examples
379
518
 
380
- **Basic Conversions:**
519
+ ##### Basic Conversion Examples
381
520
 
382
521
  ```bash
383
522
  # Convert PPOCRLabel files to full Label Studio format
@@ -396,7 +535,7 @@ label-studio-converter toPPOCR ./input-label-studio --baseImageDir images/ch
396
535
  > [!NOTE]
397
536
  > By default, all PPOCRLabel positions are treated as **polygons** in Label Studio.
398
537
 
399
- **toLabelStudio Options:**
538
+ ##### toLabelStudio Examples
400
539
 
401
540
  ```bash
402
541
  # Create separate JSON file for each image
@@ -437,24 +576,7 @@ label-studio-converter toLabelStudio ./input-ppocr \
437
576
  --outputMode annotations
438
577
  ```
439
578
 
440
- > [!IMPORTANT]
441
- > **Output Mode Restrictions:**
442
- >
443
- > - The `--outputMode` flag is only available for:
444
- > - `toLabelStudio` command (when using `--toFullJson`)
445
- > - `enhance-labelstudio` command (for Full JSON format files only)
446
- > - **Not available** for:
447
- > - `toPPOCR` command (PPOCR format doesn't distinguish annotations/predictions)
448
- > - `enhance-ppocr` command (PPOCR format doesn't distinguish annotations/predictions)
449
- > - Min JSON Label Studio format (doesn't support annotations/predictions)
450
- >
451
- > **Prediction Scores:**
452
- >
453
- > - When converting from PPOCRLabel to Label Studio with `--outputMode predictions`, the `dt_score` field from PPOCRLabel is automatically mapped to the prediction `score` field in Label Studio
454
- > - This allows pre-annotation confidence scores to be preserved and displayed in Label Studio
455
- > - Score values should be between 0.0 and 1.0 (confidence percentage)
456
-
457
- **toPPOCR Options:**
579
+ ##### toPPOCR Examples
458
580
 
459
581
  ```bash
460
582
  # Basic conversion with output directory
@@ -472,7 +594,7 @@ label-studio-converter toPPOCR ./input-label-studio \
472
594
  --baseImageDir dataset/images
473
595
  ```
474
596
 
475
- **Recursive Search and Pattern Matching:**
597
+ ##### Recursive Search and Pattern Matching Examples
476
598
 
477
599
  ```bash
478
600
  # Recursively search all subdirectories for .txt files
@@ -497,12 +619,15 @@ label-studio-converter enhance-ppocr ./dataset \
497
619
  > [!NOTE]
498
620
  >
499
621
  > - `--recursive`: Searches all subdirectories for matching files
500
- > - `--filePattern`: Regex pattern to filter files (default: `.*\.txt$` for PPOCR, `.*\.json$` for Label Studio)
501
- > - Patterns are flexible - use any regex, but ensure they match appropriate file types (.txt for PPOCR, .json for Label Studio)
622
+ > - `--filePattern`: Regex pattern to filter files (default: `.*\.txt$` for
623
+ > PPOCR, `.*\.json$` for Label Studio)
624
+ > - Patterns are flexible - use any regex, but ensure they match appropriate
625
+ > file types (.txt for PPOCR, .json for Label Studio)
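
To see what the default patterns accept, here is a short TypeScript sketch. The helper `filterByPattern` and the sample file names are hypothetical, and whether the CLI tests the pattern against the bare file name or the full path is an assumption (bare names are used here).

```typescript
// Default patterns quoted from the note above.
const PPOCR_PATTERN = /.*\.txt$/;
const LABEL_STUDIO_PATTERN = /.*\.json$/;

// Keep only the file names that match a pattern (hypothetical helper).
function filterByPattern(fileNames: string[], pattern: RegExp): string[] {
  return fileNames.filter((name) => pattern.test(name));
}

const files = ["Label.txt", "project.json", "notes.txt.bak"];
console.log(filterByPattern(files, PPOCR_PATTERN)); // only Label.txt matches
console.log(filterByPattern(files, LABEL_STUDIO_PATTERN)); // only project.json matches
```

The trailing `$` is what excludes `notes.txt.bak`: the name contains `.txt` but does not end with it.
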
502
626
 
503
- ### Enhancement Features
627
+ ##### Enhancement Examples
504
628
 
505
- The tool provides powerful enhancement capabilities that can be used standalone or integrated with conversion:
629
+ The tool provides powerful enhancement capabilities that can be used standalone
630
+ or integrated with conversion.
506
631
 
507
632
  **Enhance PPOCRLabel files:**
508
633
 
@@ -536,91 +661,7 @@ label-studio-converter enhance-labelstudio ./data \
536
661
  label-studio-converter enhance-labelstudio ./label-studio-files --outDir ./enhanced
537
662
  ```
538
663
 
539
- **Enhancement Options:**
540
-
541
- - `--sortVertical`: Sort bounding boxes vertically
542
- - `none` (default): No sorting
543
- - `top-bottom`: Sort from top to bottom
544
- - `bottom-top`: Sort from bottom to top
545
- - Example:
546
- ```bash
547
- # Sort annotations from top to bottom
548
- label-studio-converter enhance-ppocr ./data --sortVertical top-bottom
549
- ```
550
-
551
- - `--sortHorizontal`: Sort bounding boxes horizontally
552
- - `none` (default): No sorting
553
- - `ltr`: Sort left to right (useful for English, most European languages)
554
- - `rtl`: Sort right to left (useful for Arabic, Hebrew)
555
- - Example:
556
-
557
- ```bash
558
- # Sort annotations left to right
559
- label-studio-converter enhance-ppocr ./data --sortHorizontal ltr
560
-
561
- # Sort annotations right to left
562
- label-studio-converter enhance-ppocr ./data --sortHorizontal rtl
563
- ```
564
-
565
- - `--normalizeShape`: Normalize shapes
566
- - `none` (default): Keep original shape
567
- - `rectangle`: Convert diamond-like or rotated shapes to axis-aligned rectangles
568
- - Example:
569
- ```bash
570
- # Convert irregular shapes to clean rectangles
571
- label-studio-converter enhance-ppocr ./data --normalizeShape rectangle
572
- ```
573
-
574
- - `--widthIncrement`: Increase/decrease width (pixels, can be negative)
575
- - Default: `0`
576
- - Examples:
577
-
578
- ```bash
579
- # Increase width by 10 pixels
580
- label-studio-converter enhance-ppocr ./data --widthIncrement 10
581
-
582
- # Decrease width by 5 pixels
583
- label-studio-converter enhance-ppocr ./data --widthIncrement -5
584
- ```
585
-
586
- - `--heightIncrement`: Increase/decrease height (pixels, can be negative)
587
- - Default: `0`
588
- - Examples:
589
-
590
- ```bash
591
- # Increase height by 15 pixels
592
- label-studio-converter enhance-ppocr ./data --heightIncrement 15
593
-
594
- # Decrease height by 3 pixels
595
- label-studio-converter enhance-ppocr ./data --heightIncrement -3
596
- ```
597
-
598
- - `--precision`: Control the number of decimal places for coordinate values
599
- - `-1`: Full precision - no rounding, keeps all decimal places (default for Label Studio output)
600
- - Example output: `27.44656917885264`
601
- - `0`: Round to integers (default for PPOCR output)
602
- - Example output: `27`
603
- - `1`: Round to 1 decimal place
604
- - Example output: `27.4`
605
- - `2`: Round to 2 decimal places
606
- - Example output: `27.45`
607
- - Any positive integer for that many decimal places
608
- - Examples:
609
-
610
- ```bash
611
- # Use full precision
612
- label-studio-converter toLabelStudio ./data --precision -1
613
-
614
- # Use integer coordinates
615
- label-studio-converter toPPOCR ./data --precision 0
616
-
617
- # Use 2 decimal places
618
- label-studio-converter enhance-labelstudio ./data --precision 2
619
- ```
620
-
621
- **Conversion with Enhancement:**
622
-
623
- All enhancement options are available in conversion commands:
664
+ **Conversion with enhancements:**
624
665
 
625
666
  ```bash
626
667
  # Convert with enhancements applied during conversion
@@ -637,24 +678,7 @@ label-studio-converter toPPOCR ./input-label-studio \
637
678
  --normalizeShape rectangle
638
679
  ```
639
680
 
640
- **Convert PPOCRLabel files to Label Studio format with one file per image:**
641
-
642
- ```bash
643
- label-studio-converter toLabelStudio ./input-ppocr --outDir ./output-label-studio --defaultLabelName Text --toFullJson --createFilePerImage --sortVertical none --sortHorizontal none
644
- ```
645
-
646
- **Convert PPOCRLabel files to minimal Label Studio format (cannot be used for serving):**
647
-
648
- ```bash
649
- label-studio-converter toLabelStudio ./input-ppocr --outDir ./output-label-studio --defaultLabelName Text --noToFullJson --sortVertical none --sortHorizontal none
650
- ```
651
-
652
- > [!IMPORTANT]
653
- > Minimal Label Studio format cannot be used for serving in Label Studio, as it
654
- > lacks necessary fields such as `id` and `data`. So you can only use minimal
655
- > format for conversion back to PPOCRLabelv2 format or other purposes.
656
-
657
- **Shape Normalization**
681
+ ##### Shape Normalization Examples
658
682
 
659
683
  Convert diamond-like or irregular quadrilateral shapes to axis-aligned
660
684
  rectangles. This is useful when your annotations have irregular shapes that you
@@ -716,10 +740,9 @@ Command:
716
740
 
717
741
  </details>
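
One common way to turn a diamond-like or rotated quadrilateral into an axis-aligned rectangle is to take the bounding box of its points. The TypeScript sketch below shows that idea; the function name `toAxisAlignedRectangle` is hypothetical, and whether `--normalizeShape rectangle` uses exactly this rule is an assumption.

```typescript
type Point = [number, number];

// Return the four corners (top-left, top-right, bottom-right, bottom-left)
// of the smallest axis-aligned rectangle that contains every input point.
function toAxisAlignedRectangle(points: Point[]): Point[] {
  if (points.length === 0) {
    throw new Error("at least one point is required");
  }
  const xs = points.map((p) => p[0]);
  const ys = points.map((p) => p[1]);
  const minX = Math.min(...xs);
  const maxX = Math.max(...xs);
  const minY = Math.min(...ys);
  const maxY = Math.max(...ys);
  return [
    [minX, minY],
    [maxX, minY],
    [maxX, maxY],
    [minX, maxY],
  ];
}

// A diamond-like quadrilateral becomes its enclosing rectangle.
const diamond: Point[] = [[50, 0], [100, 25], [50, 50], [0, 25]];
console.log(toAxisAlignedRectangle(diamond));
// corners: (0,0), (100,0), (100,50), (0,50)
```
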
718
742
 
719
- **Bounding Box Resizing**
743
+ ##### Bounding Box Resizing Examples
720
744
 
721
- Increase or decrease bounding box dimensions while keeping them centered. This
722
- is useful for adjusting annotation margins:
745
+ Increase or decrease bounding box dimensions while keeping them centered:
723
746
 
724
747
  ```bash
725
748
  # Increase width by 10 pixels and height by 20 pixels
@@ -732,44 +755,43 @@ label-studio-converter toLabelStudio ./input-ppocr --outDir ./output --widthIncr
732
755
  label-studio-converter toPPOCR ./input-label-studio --outDir ./output --widthIncrement 10 --heightIncrement 10
733
756
  ```
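
"Keeping them centered" can be read as applying half of each increment to each side of the box. The TypeScript sketch below works through that arithmetic. The `Box` shape and the function `resizeCentered` are hypothetical, and clamping sizes at zero for large negative increments is an assumption, not documented behaviour.

```typescript
// Hypothetical axis-aligned box in pixel coordinates (illustration only).
interface Box {
  x: number; // left edge
  y: number; // top edge
  width: number;
  height: number;
}

// Grow or shrink a box around its center. Half of each size change is
// applied to each side; sizes are clamped at 0 (an assumption).
function resizeCentered(box: Box, widthIncrement: number, heightIncrement: number): Box {
  const width = Math.max(0, box.width + widthIncrement);
  const height = Math.max(0, box.height + heightIncrement);
  return {
    x: box.x - (width - box.width) / 2,
    y: box.y - (height - box.height) / 2,
    width,
    height,
  };
}

// --widthIncrement 10 --heightIncrement 20 on a 40x20 box at (100, 50).
const resized = resizeCentered({ x: 100, y: 50, width: 40, height: 20 }, 10, 20);
console.log(resized); // { x: 95, y: 40, width: 50, height: 40 }
```

The center stays at (120, 60) before and after the resize.
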
734
757
 
735
- **Combining Features**
758
+ ##### Combined Enhancement Examples
736
759
 
737
- You can combine shape normalization and resizing:
760
+ Combine multiple enhancements:
738
761
 
739
762
  ```bash
740
763
  # Normalize to rectangle and increase size
741
764
  label-studio-converter toLabelStudio ./input-ppocr --outDir ./output --normalizeShape rectangle --widthIncrement 5 --heightIncrement 5
742
765
 
743
- # Also works with sorting
766
+ # Combine sorting with shape normalization
744
767
  label-studio-converter toLabelStudio ./input-ppocr --outDir ./output --normalizeShape rectangle --widthIncrement 10 --sortVertical top-bottom --sortHorizontal ltr
745
768
  ```
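
When `--sortVertical` and `--sortHorizontal` are combined as in the last command, one plausible ordering is to sort by vertical position first and use horizontal position to break ties. The TypeScript sketch below illustrates that reading only: `sortBoxes` and `SortableBox` are hypothetical names, and the library's own `sortBoundingBoxes` may group rows or break ties differently.

```typescript
type VerticalOrder = "none" | "top-bottom" | "bottom-top";
type HorizontalOrder = "none" | "ltr" | "rtl";

// Hypothetical minimal box: only the top-left corner is used for ordering.
interface SortableBox {
  x: number;
  y: number;
  label: string;
}

// Assumption: vertical order is the primary key, horizontal order the tie-breaker.
function sortBoxes(
  boxes: SortableBox[],
  vertical: VerticalOrder,
  horizontal: HorizontalOrder
): SortableBox[] {
  const vSign = vertical === "top-bottom" ? 1 : vertical === "bottom-top" ? -1 : 0;
  const hSign = horizontal === "ltr" ? 1 : horizontal === "rtl" ? -1 : 0;
  // Copy first so the caller's array is not mutated.
  return boxes.slice().sort((a, b) => vSign * (a.y - b.y) || hSign * (a.x - b.x));
}

const boxes: SortableBox[] = [
  { x: 200, y: 10, label: "b" },
  { x: 10, y: 10, label: "a" },
  { x: 10, y: 50, label: "c" },
];
console.log(sortBoxes(boxes, "top-bottom", "ltr").map((box) => box.label)); // [ 'a', 'b', 'c' ]
```
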
746
769
 
747
- **Number Precision Control**
770
+ ##### Special Format Examples
748
771
 
749
- Control the precision of coordinate values in the output. This is useful for
750
- matching format expectations or reducing file size:
772
+ **Convert with one file per image:**
751
773
 
752
774
  ```bash
753
- # Convert to Label Studio with full precision (default: -1)
754
- label-studio-converter toLabelStudio ./input-ppocr --outDir ./output --precision -1
755
-
756
- # Convert to PPOCR with integer coordinates (default: 0)
757
- label-studio-converter toPPOCR ./input-label-studio --outDir ./output --precision 0
758
-
759
- # Use 2 decimal places for more compact but still precise coordinates
760
- label-studio-converter toLabelStudio ./input-ppocr --outDir ./output --precision 2
775
+ label-studio-converter toLabelStudio ./input-ppocr \
776
+ --outDir ./output-label-studio \
777
+ --defaultLabelName Text \
778
+ --toFullJson \
779
+ --createFilePerImage
761
780
  ```
762
781
 
763
- Precision values:
782
+ **Convert to minimal Label Studio format:**
764
783
 
765
- - `-1`: Full floating-point precision (default for Label Studio output)
766
- - `0`: Round to integers (default for PPOCR output)
767
- - `1+`: Round to specified number of decimal places
784
+ ```bash
785
+ label-studio-converter toLabelStudio ./input-ppocr \
786
+ --outDir ./output-label-studio \
787
+ --defaultLabelName Text \
788
+ --noToFullJson
789
+ ```
768
790
 
769
- > [!NOTE]
770
- > The default precision matches typical format conventions: Label Studio uses
771
- > full precision for percentage-based coordinates, while PPOCR format typically
772
- > uses integer pixel coordinates.
791
+ > [!IMPORTANT]
792
+ > Minimal Label Studio format cannot be used for serving in Label Studio, as it
793
+ > lacks necessary fields such as `id` and `data`. Use minimal format only for
794
+ > converting back to PPOCRLabelv2 format or for purposes other than serving.
773
795
 
774
796
  ### Using generated files with Label Studio
775
797