label-studio-converter 1.0.0

package/LICENSE.md ADDED
@@ -0,0 +1,33 @@
1
+ # Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License
2
+
3
+ By exercising the Licensed Rights (defined below), You accept and agree to be bound by the terms and conditions of this Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License ("Public License"). To the extent this Public License may be interpreted as a contract, You are granted the Licensed Rights in consideration of Your acceptance of these terms and conditions, and the Licensor grants You such rights in consideration of benefits the Licensor receives from making the Licensed Material available under these terms and conditions.
4
+
5
+ ## Section 1 – Definitions.
6
+
7
+ - **Licensed Material**: the artistic or literary work, database, or other material to which the Licensor applied this Public License.
8
+ - **Licensor**: the individual(s) or entity(ies) granting rights under this Public License.
9
+ - **You**: the individual or entity exercising the Licensed Rights under this Public License.
10
+ - **Share**: to provide material to the public by any means or process.
11
+ - **Adapted Material**: material derived from or modified based on the Licensed Material.
12
+ - **NonCommercial**: not primarily intended for or directed towards commercial advantage or monetary compensation.
13
+
14
+ ## Section 2 – Scope.
15
+
16
+ ### 2.1 License Grant
17
+
18
+ Subject to the terms of this Public License, the Licensor grants You a worldwide, royalty-free, non-exclusive, irrevocable license to:
19
+
20
+ - **Share**: copy and redistribute the Licensed Material in any medium or format.
21
+ - **Adapt**: remix, transform, and build upon the Licensed Material.
22
+
23
+ ### 2.2 Conditions
24
+
25
+ - **Attribution (BY)**: You must give appropriate credit, provide a link to the license, and indicate if changes were made.
26
+ - **NonCommercial (NC)**: You may **not** use the material for commercial purposes.
27
+ - **ShareAlike (SA)**: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
28
+
29
+ ## Section 3 – Disclaimer.
30
+
31
+ The Licensed Material is provided "as-is" without any warranties or guarantees. The Licensor is not liable for any damages arising from its use.
32
+
33
+ **Full License Text:** [Creative Commons License](https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode)
package/README.md ADDED
@@ -0,0 +1,351 @@
1
+ <div align="center">
2
+
3
+ <h1>label-studio-converter</h1>
4
+
5
+ <p>
6
+ Convert between Label Studio OCR format and PPOCRLabelv2 format
7
+ </p>
8
+
9
+ </div>
10
+
11
+ <br />
12
+
13
+ <!-- Table of Contents -->
14
+
15
+ # :notebook_with_decorative_cover: Table of Contents
16
+
17
+ - [Getting Started](#toolbox-getting-started)
18
+ - [Prerequisites](#bangbang-prerequisites)
19
+ - [Run Locally](#running-run-locally)
20
+ - [Usage](#eyes-usage)
21
+ - [Basic Usage](#basic-usage)
22
+ - [CLI Usage](#cli-usage)
23
+ - [Using generated files with Label Studio](#using-generated-files-with-label-studio)
24
+ - [Interface setup](#interface-setup)
25
+ - [Serving annotation files locally](#serving-annotation-files-locally)
26
+ - [Using generated files with PPOCRLabelv2](#using-generated-files-with-ppocrlabelv2)
27
+ - [Roadmap](#compass-roadmap)
28
+ - [Contributing](#wave-contributing)
29
+ - [Code of Conduct](#scroll-code-of-conduct)
30
+ - [License](#warning-license)
31
+ - [Contact](#handshake-contact)
32
+ - [Acknowledgements](#gem-acknowledgements)
33
+
34
+ <!-- Getting Started -->
35
+
36
+ ## :toolbox: Getting Started
37
+
38
+ <!-- Prerequisites -->
39
+
40
+ ### :bangbang: Prerequisites
41
+
42
+ This project uses [pnpm](https://pnpm.io/) as package manager:
43
+
44
+ ```bash
45
+ npm install --global pnpm
46
+ ```
47
+
48
+ <!-- Run Locally -->
49
+
50
+ ### :running: Run Locally
51
+
52
+ Clone the project:
53
+
54
+ ```bash
55
+ git clone https://github.com/DuckyMomo20012/label-studio-converter.git
56
+ ```
57
+
58
+ Go to the project directory:
59
+
60
+ ```bash
61
+ cd label-studio-converter
62
+ ```
63
+
64
+ Install dependencies:
65
+
66
+ ```bash
67
+ pnpm install
68
+ ```
69
+
70
+ <!-- Usage -->
71
+
72
+ ## :eyes: Usage
73
+
74
+ ### Basic Usage
75
+
76
+ ```ts
77
+ import { toLabelStudio, toPPOCR } from 'label-studio-converter';
78
+
79
+ // Convert PPOCRLabel files to Label Studio format
80
+ await toLabelStudio({
81
+ inputDirs: ['./input-ppocr'],
82
+ outDir: './output-label-studio',
83
+ defaultLabelName: 'Text',
84
+ toFullJson: true,
85
+ createFilePerImage: false,
86
+ createFileListForServing: true,
87
+ fileListName: 'files.txt',
88
+ baseServerUrl: 'http://localhost:8081',
89
+ sortVertical: 'none',
90
+ sortHorizontal: 'none',
91
+ });
92
+
93
+ // Convert Label Studio files to PPOCRLabel format
94
+ await toPPOCR({
95
+ inputDirs: ['./input-label-studio'],
96
+ outDir: './output-ppocr',
97
+ fileName: 'Label.txt',
98
+ baseImageDir: 'images/ch',
99
+ sortVertical: 'none',
100
+ sortHorizontal: 'none',
101
+ });
102
+ ```
103
+
104
+ ### CLI Usage
105
+
106
+ ```
107
+ USAGE
108
+ label-studio-converter toLabelStudio [--outDir value] [--defaultLabelName value] [--toFullJson] [--createFilePerImage] [--createFileListForServing] [--fileListName value] [--baseServerUrl value] [--sortVertical value] [--sortHorizontal value] <args>...
109
+ label-studio-converter toPPOCR [--outDir value] [--fileName value] [--baseImageDir value] [--sortVertical value] [--sortHorizontal value] <args>...
110
+ label-studio-converter --help
111
+ label-studio-converter --version
112
+
113
+ Convert between Label Studio OCR format and PPOCRLabelv2 format
114
+
115
+ FLAGS
116
+ -h --help Print help information and exit
117
+ -v --version Print version information and exit
118
+
119
+ COMMANDS
120
+ toLabelStudio Convert PPOCRLabel files to Label Studio format
121
+ toPPOCR Convert Label Studio files to PPOCRLabel format
122
+ ```
123
+
124
+ Subcommands:
125
+
126
+ ```
127
+ USAGE
128
+ label-studio-converter toLabelStudio [--outDir value] [--defaultLabelName value] [--toFullJson] [--createFilePerImage] [--createFileListForServing] [--fileListName value] [--baseServerUrl value] [--sortVertical value] [--sortHorizontal value] <args>...
129
+ label-studio-converter toLabelStudio --help
130
+
131
+ Convert PPOCRLabel files to Label Studio format
132
+
133
+ FLAGS
134
+ [--outDir] Output directory. Default to "./output"
135
+ [--defaultLabelName] Default label name for text annotations. Default to "Text"
136
+ [--toFullJson/--noToFullJson] Convert to Full OCR Label Studio format. Default to "true"
137
+ [--createFilePerImage/--noCreateFilePerImage] Create a separate Label Studio JSON file for each image. Default to "false"
138
+ [--createFileListForServing/--noCreateFileListForServing] Create a file list for serving in Label Studio. Default to "true"
139
+ [--fileListName] Name of the file list for serving. Default to "files.txt"
140
+ [--baseServerUrl] Base server URL for constructing image URLs in the file list. Default to "http://localhost:8081"
141
+ [--sortVertical] Sort bounding boxes vertically. Options: "none" (default), "top-bottom", "bottom-top"
142
+ [--sortHorizontal] Sort bounding boxes horizontally. Options: "none" (default), "ltr", "rtl"
143
+ -h --help Print help information and exit
144
+
145
+ ARGUMENTS
146
+ args... Input directories containing PPOCRLabel files
147
+ ```
148
+
149
+ ```
150
+ USAGE
151
+ label-studio-converter toPPOCR [--outDir value] [--fileName value] [--baseImageDir value] [--sortVertical value] [--sortHorizontal value] <args>...
152
+ label-studio-converter toPPOCR --help
153
+
154
+ Convert Label Studio files to PPOCRLabel format
155
+
156
+ FLAGS
157
+ [--outDir] Output directory. Default to "./output"
158
+ [--fileName] Output PPOCR file name. Default to "Label.txt"
159
+ [--baseImageDir] Base directory path to prepend to image filenames in output (e.g., "ch" or "images/ch")
160
+ [--sortVertical] Sort bounding boxes vertically. Options: "none" (default), "top-bottom", "bottom-top"
161
+ [--sortHorizontal] Sort bounding boxes horizontally. Options: "none" (default), "ltr", "rtl"
162
+ -h --help Print help information and exit
163
+
164
+ ARGUMENTS
165
+ args... Input directories containing Label Studio files
166
+ ```
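The `--sortVertical` and `--sortHorizontal` options reorder bounding boxes before they are written. The sketch below shows one way such an ordering could work. It assumes each box is a list of `[x, y]` points and that boxes are compared by their top-left corner, vertical order first. The names `Box` and `sortBoxes` are hypothetical, and the package's actual comparison may differ.

```typescript
// Hypothetical types for illustration; not the package's exported API.
type Point = [number, number];
type SortVertical = 'none' | 'top-bottom' | 'bottom-top';
type SortHorizontal = 'none' | 'ltr' | 'rtl';

interface Box {
  text: string;
  points: Point[];
}

// Smallest coordinate of a box along one axis (0 = x, 1 = y).
const minOf = (box: Box, axis: 0 | 1): number =>
  Math.min(...box.points.map((p) => p[axis]));

// Returns a sorted copy; with both options set to 'none' the input order is kept.
function sortBoxes(
  boxes: Box[],
  vertical: SortVertical,
  horizontal: SortHorizontal,
): Box[] {
  return [...boxes].sort((a, b) => {
    if (vertical !== 'none') {
      const dy = minOf(a, 1) - minOf(b, 1);
      if (dy !== 0) return vertical === 'top-bottom' ? dy : -dy;
    }
    if (horizontal !== 'none') {
      const dx = minOf(a, 0) - minOf(b, 0);
      if (dx !== 0) return horizontal === 'ltr' ? dx : -dx;
    }
    return 0;
  });
}
```

Comparing the vertical position first and falling back to the horizontal position only on ties gives a reading order for boxes that share a top edge. Real text lines rarely align exactly, so a tolerance on `dy` may be needed in practice.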
167
+
168
+ #### Examples
169
+
170
+ **Convert PPOCRLabel files to full Label Studio format:**
171
+
172
+ ```bash
173
+ label-studio-converter toLabelStudio ./input-ppocr --outDir ./output-label-studio --defaultLabelName Text --toFullJson --createFileListForServing --fileListName files.txt --baseServerUrl http://localhost:8081 --sortVertical none --sortHorizontal none
174
+ ```
175
+
176
+ **Convert Label Studio files to PPOCRLabel format:**
177
+
178
+ ```bash
179
+ label-studio-converter toPPOCR ./input-label-studio --outDir ./output-ppocr --fileName Label.txt --baseImageDir images/ch --sortVertical none --sortHorizontal none
180
+ ```
181
+
182
+ **Convert PPOCRLabel files to Label Studio format with one file per image:**
183
+
184
+ ```bash
185
+ label-studio-converter toLabelStudio ./input-ppocr --outDir ./output-label-studio --defaultLabelName Text --toFullJson --createFilePerImage --sortVertical none --sortHorizontal none
186
+ ```
187
+
188
+ **Convert PPOCRLabel files to minimal Label Studio format (cannot be used for serving):**
189
+
190
+ ```bash
191
+ label-studio-converter toLabelStudio ./input-ppocr --outDir ./output-label-studio --defaultLabelName Text --noToFullJson --sortVertical none --sortHorizontal none
192
+ ```
193
+
194
+ ### Using generated files with Label Studio
195
+
196
+ #### Interface setup
197
+
198
+ When creating a new labeling project in Label Studio, choose the "OCR" template.
199
+ This will set up the appropriate interface for text recognition tasks.
200
+
201
+ This project uses the following Label Studio interface configuration:
202
+
203
+ ```xml
204
+ <View>
205
+ <Image name="image" value="$ocr" zoom="false" rotateControl="true" zoomControl="false"/>
206
+ <Labels name="label" toName="image">
207
+ <Label value="Text" background="green"/>
208
+ <Label value="Handwriting" background="blue"/>
209
+ </Labels>
210
+ <Rectangle name="bbox" toName="image" strokeWidth="3"/>
211
+ <Polygon name="poly" toName="image" strokeWidth="3"/>
212
+ <TextArea name="transcription" toName="image" editable="true" perRegion="true" required="false" maxSubmissions="1" rows="5" placeholder="Recognized Text" displayMode="region-list"/>
213
+ </View>
214
+ ```
215
+
216
+ This setup includes:
217
+
218
+ - An `Image` tag to display the image to be annotated.
219
+ - A `Labels` tag with two label options: "Text" and "Handwriting". By default,
220
+ all annotations will be labeled as "Text". You can modify this based on your
221
+ needs.
222
+ - A `Rectangle` tag to allow annotators to draw bounding boxes around text regions.
223
+ - A `Polygon` tag to allow annotators to draw polygons around text regions.
224
+ - A `TextArea` tag for annotators to input the recognized text for each region.
225
+
226
+ > [!IMPORTANT]
227
+ > Make sure that the `Image` tag's `value` attribute is set to `$ocr`, as this
228
+ > is where the image URLs will be populated from the generated JSON files.
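To make the `$ocr` requirement concrete: Label Studio resolves `$ocr` from the `data.ocr` field of each task, and rectangle regions are stored as percentages of the image size. The sketch below builds one such task for the interface above. The helper names `toPercentRect` and `buildTask` are hypothetical, and whether this package emits exactly these fields is an assumption.

```typescript
// Hypothetical helpers for illustration; not the package's exported API.
interface RectValue {
  x: number;
  y: number;
  width: number;
  height: number;
  rotation: number;
}

// Convert pixel points to an axis-aligned rectangle in percent units.
function toPercentRect(
  points: [number, number][],
  imageWidth: number,
  imageHeight: number,
): RectValue {
  const xs = points.map((p) => p[0]);
  const ys = points.map((p) => p[1]);
  const left = Math.min(...xs);
  const top = Math.min(...ys);
  return {
    x: (left * 100) / imageWidth,
    y: (top * 100) / imageHeight,
    width: ((Math.max(...xs) - left) * 100) / imageWidth,
    height: ((Math.max(...ys) - top) * 100) / imageHeight,
    rotation: 0,
  };
}

// Build one task whose `data.ocr` feeds the `$ocr` variable of the Image tag.
// The two results share an id so the transcription attaches to the rectangle.
function buildTask(
  imageUrl: string,
  regionId: string,
  text: string,
  rect: RectValue,
  imageWidth: number,
  imageHeight: number,
) {
  const common = {
    id: regionId,
    to_name: 'image',
    original_width: imageWidth,
    original_height: imageHeight,
    image_rotation: 0,
  };
  return {
    data: { ocr: imageUrl },
    annotations: [
      {
        result: [
          { ...common, from_name: 'bbox', type: 'rectangle', value: rect },
          {
            ...common,
            from_name: 'transcription',
            type: 'textarea',
            value: { ...rect, text: [text] },
          },
        ],
      },
    ],
  };
}
```

The `from_name` values match the `name` attributes of the `Rectangle` and `TextArea` tags in the configuration above, and `to_name` matches the `Image` tag.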
229
+
230
+ #### Serving annotation files locally
231
+
232
+ To serve the generated Label Studio annotation files and images locally, you can
233
+ follow the official [Label Studio

234
+ documentation](https://labelstud.io/guide/tasks#Import-data-from-a-local-directory).
235
+
236
+ 1. Start a simple HTTP server in the output directory containing the generated
237
+ Label Studio files. You can use Python's built-in HTTP server for this:
238
+
239
+ ```bash
240
+ cd ./output-label-studio
241
+ python3 -m http.server 8081
242
+ ```
243
+
244
+ or using `http-server` from npm:
245
+
246
+ ```bash
247
+ npx http-server -p 8081 --cors
248
+ ```
249
+
250
+ > [!IMPORTANT]
251
+ > Ensure that the port number (e.g., `8081`) matches the `baseServerUrl` used
252
+ > during conversion.
253
+
254
+ > [!NOTE]
255
+ > The server may need its CORS settings configured to allow Label Studio to
256
+ > access the files. Refer to the documentation of the server you are using for
257
+ > instructions on how to enable CORS.
258
+
259
+ 2. Add the file directory as source storage in Label Studio, by following the official
260
+ [Label Studio
261
+ documentation](https://labelstud.io/guide/tasks#Import-data-from-a-local-directory).
262
+
263
+ By default, the generated file list is named `files.txt`. Before running the
264
+ command below, ensure that `files.txt` is copied to the `./myfiles`
265
+ directory.
266
+
267
+ The following command starts a Docker container with the latest image of
268
+ Label Studio with port 8080 and an environment variable that allows Label
269
+ Studio to access local files. In this example, a local directory `./myfiles`
270
+ is mounted to the `/label-studio/files` location.
271
+
272
+ ```bash
273
+ docker run -it -p 8080:8080 -v $(pwd)/mydata:/label-studio/data \
274
+ --env LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED=true \
275
+ --env LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT=/label-studio/files \
276
+ -v $(pwd)/myfiles:/label-studio/files \
277
+ heartexlabs/label-studio:latest label-studio
278
+ ```
279
+
280
+ 3. Open your web browser and navigate to `http://localhost:8080` to access
281
+ Label Studio.
282
+
283
+ 4. Create a new project or open an existing one, and go to the "Import" tab.
284
+
285
+ 5. Import the generated tasks to Label Studio.
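The port in `baseServerUrl` has to match the local server because the file list is built from that URL. The sketch below shows how such a list could be assembled, assuming the list holds one URL per line. That layout and the helper name `buildFileList` are assumptions for illustration, not confirmed behavior of this package.

```typescript
// Hypothetical helper for illustration; not the package's exported API.
// Joins a base server URL with relative file paths, one URL per line.
function buildFileList(baseServerUrl: string, relativePaths: string[]): string {
  // Drop trailing slashes so the join never produces "//".
  const base = baseServerUrl.replace(/\/+$/, '');
  return relativePaths
    .map((path) =>
      path
        .replace(/^\.?\/+/, '') // strip a leading "./" or "/"
        .split('/')
        .map((segment) => encodeURIComponent(segment)) // escape spaces etc.
        .join('/'),
    )
    .map((path) => `${base}/${path}`)
    .join('\n');
}
```

If the server is started on a different port than the one used during conversion, every URL in the list points at the wrong address, so the list must be regenerated or edited.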
286
+
287
+ ### Using generated files with PPOCRLabelv2
288
+
289
+ PPOCRLabelv2 has many GitHub repositories, but we have tested the generated
290
+ files with the following repository:
291
+
292
+ - [`PFCCLab/PPOCRLabel`](https://github.com/PFCCLab/PPOCRLabel).
293
+
294
+ Generated files can be used by placing them in the appropriate directory
295
+ structure as expected by PPOCRLabelv2, by replacing the existing `Label.txt`
296
+ files in the dataset directories.
297
+
298
+ If the images are put in a different directory, make sure to update the image
299
+ directory path by specifying the `baseImageDir` option during conversion.
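For orientation, a PPOCRLabel `Label.txt` stores one image per line: the image path, a tab character, then a JSON array of annotations with `transcription`, `points`, and `difficult` fields. The sketch below parses such a line and rewrites its image directory, which is roughly the effect `baseImageDir` is described as having. `parseLabelLine` and `withBaseImageDir` are hypothetical helpers, not part of this package's API.

```typescript
// Hypothetical types and helpers for illustration only.
interface PPOCRAnnotation {
  transcription: string;
  points: [number, number][];
  difficult?: boolean;
}

interface PPOCRLine {
  imagePath: string;
  annotations: PPOCRAnnotation[];
}

// Split "<image path>\t<JSON array>" into its two parts.
function parseLabelLine(line: string): PPOCRLine {
  const tab = line.indexOf('\t');
  if (tab < 0) throw new Error('expected "<image path>\\t<JSON array>"');
  return {
    imagePath: line.slice(0, tab),
    annotations: JSON.parse(line.slice(tab + 1)) as PPOCRAnnotation[],
  };
}

// Re-serialize a line with the image file placed under a new base directory.
function withBaseImageDir(line: PPOCRLine, baseImageDir: string): string {
  const fileName = line.imagePath.split('/').pop() ?? line.imagePath;
  const dir = baseImageDir.replace(/\/+$/, '');
  const imagePath = dir ? `${dir}/${fileName}` : fileName;
  return `${imagePath}\t${JSON.stringify(line.annotations)}`;
}
```

PPOCRLabel resolves image paths relative to the dataset directory it opens, so the prefix written to `Label.txt` has to match where the images actually live.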
300
+
301
+ <!-- Roadmap -->
302
+
303
+ ## :compass: Roadmap
304
+
305
+ - [x] Add tests.
306
+
307
+ <!-- Contributing -->
308
+
309
+ ## :wave: Contributing
310
+
311
+ <a href="https://github.com/DuckyMomo20012/label-studio-converter/graphs/contributors">
312
+ <img src="https://contrib.rocks/image?repo=DuckyMomo20012/label-studio-converter" />
313
+ </a>
314
+
315
+ Contributions are always welcome!
316
+
317
+ Please read the [contribution guidelines](./CONTRIBUTING.md).
318
+
319
+ <!-- Code of Conduct -->
320
+
321
+ ### :scroll: Code of Conduct
322
+
323
+ Please read the [Code of Conduct](./CODE_OF_CONDUCT.md).
324
+
325
+ <!-- License -->
326
+
327
+ ## :warning: License
328
+
329
+ This project is licensed under the **Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)** License.
330
+
331
+ [![License: CC BY-NC-SA 4.0](https://licensebuttons.net/l/by-nc-sa/4.0/88x31.png)](https://creativecommons.org/licenses/by-nc-sa/4.0/)
332
+
333
+ See the **[LICENSE.md](./LICENSE.md)** file for full details.
334
+
335
+ <!-- Contact -->
336
+
337
+ ## :handshake: Contact
338
+
339
+ Duong Vinh - tienvinh.duong4@gmail.com
340
+
341
+ Project Link: [https://github.com/DuckyMomo20012/label-studio-converter](https://github.com/DuckyMomo20012/label-studio-converter).
342
+
343
+ <!-- Acknowledgments -->
344
+
345
+ ## :gem: Acknowledgements
346
+
347
+ Here are useful resources and libraries that we have used in our projects:
348
+
349
+ - [Label Studio Documentation](https://labelstud.io/guide): Official documentation for Label Studio.
350
+ - [PPOCRLabel GitHub Repository](https://github.com/PFCCLab/PPOCRLabel):
351
+ Repository for PPOCRLabelv2 tool.