label-studio-converter 1.0.0
- package/LICENSE.md +33 -0
- package/README.md +351 -0
- package/dist/bash-complete.cjs +1296 -0
- package/dist/bash-complete.cjs.map +1 -0
- package/dist/bash-complete.d.cts +1 -0
- package/dist/bash-complete.d.ts +1 -0
- package/dist/bash-complete.js +1279 -0
- package/dist/bash-complete.js.map +1 -0
- package/dist/cli.cjs +1281 -0
- package/dist/cli.cjs.map +1 -0
- package/dist/cli.d.cts +1 -0
- package/dist/cli.d.ts +1 -0
- package/dist/cli.js +1264 -0
- package/dist/cli.js.map +1 -0
- package/dist/index.cjs +418 -0
- package/dist/index.cjs.map +1 -0
- package/dist/index.d.cts +309 -0
- package/dist/index.d.ts +309 -0
- package/dist/index.js +377 -0
- package/dist/index.js.map +1 -0
- package/package.json +78 -0

package/LICENSE.md

# Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License

By exercising the Licensed Rights (defined below), You accept and agree to be bound by the terms and conditions of this Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License ("Public License"). To the extent this Public License may be interpreted as a contract, You are granted the Licensed Rights in consideration of Your acceptance of these terms and conditions, and the Licensor grants You such rights in consideration of benefits the Licensor receives from making the Licensed Material available under these terms and conditions.

## Section 1 – Definitions.

- **Licensed Material**: the artistic or literary work, database, or other material to which the Licensor applied this Public License.
- **Licensor**: the individual(s) or entity(ies) granting rights under this Public License.
- **You**: the individual or entity exercising the Licensed Rights under this Public License.
- **Share**: to provide material to the public by any means or process.
- **Adapted Material**: material derived from or modified based on the Licensed Material.
- **NonCommercial**: not primarily intended for or directed towards commercial advantage or monetary compensation.

## Section 2 – Scope.

### 2.1 License Grant

Subject to the terms of this Public License, the Licensor grants You a worldwide, royalty-free, non-exclusive, irrevocable license to:

- **Share**: copy and redistribute the Licensed Material in any medium or format.
- **Adapt**: remix, transform, and build upon the Licensed Material.

### 2.2 Conditions

- **Attribution (BY)**: You must give appropriate credit, provide a link to the license, and indicate if changes were made.
- **NonCommercial (NC)**: You may **not** use the material for commercial purposes.
- **ShareAlike (SA)**: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.

## Section 3 – Disclaimer.

The Licensed Material is provided "as-is" without any warranties or guarantees. The Licensor is not liable for any damages arising from its use.

**Full License Text:** [Creative Commons License](https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode)

package/README.md

<div align="center">

<h1>label-studio-converter</h1>

<p>
Convert between Label Studio OCR format and PPOCRLabelv2 format
</p>

</div>

<br />

<!-- Table of Contents -->

# :notebook_with_decorative_cover: Table of Contents

- [Getting Started](#toolbox-getting-started)
  - [Prerequisites](#bangbang-prerequisites)
  - [Run Locally](#running-run-locally)
- [Usage](#eyes-usage)
  - [Basic Usage](#basic-usage)
  - [CLI Usage](#cli-usage)
  - [Using generated files with Label Studio](#using-generated-files-with-label-studio)
    - [Interface setup](#interface-setup)
    - [Serving annotation files locally](#serving-annotation-files-locally)
  - [Using generated files with PPOCRLabelv2](#using-generated-files-with-ppocrlabelv2)
- [Roadmap](#compass-roadmap)
- [Contributing](#wave-contributing)
  - [Code of Conduct](#scroll-code-of-conduct)
- [License](#warning-license)
- [Contact](#handshake-contact)
- [Acknowledgements](#gem-acknowledgements)

<!-- Getting Started -->

## :toolbox: Getting Started

<!-- Prerequisites -->

### :bangbang: Prerequisites

This project uses [pnpm](https://pnpm.io/) as its package manager:

```bash
npm install --global pnpm
```

<!-- Run Locally -->

### :running: Run Locally

Clone the project:

```bash
git clone https://github.com/DuckyMomo20012/label-studio-converter.git
```

Go to the project directory:

```bash
cd label-studio-converter
```

Install dependencies:

```bash
pnpm install
```

<!-- Usage -->

## :eyes: Usage

### Basic Usage

```ts
import { toLabelStudio, toPPOCR } from 'label-studio-converter';

// Top-level await requires an ES module context
// (e.g. an .mjs/.mts file or "type": "module" in package.json).

// Convert PPOCRLabel files to Label Studio format
await toLabelStudio({
  inputDirs: ['./input-ppocr'],
  outDir: './output-label-studio',
  defaultLabelName: 'Text',
  toFullJson: true,
  createFilePerImage: false,
  createFileListForServing: true,
  fileListName: 'files.txt',
  baseServerUrl: 'http://localhost:8081',
  sortVertical: 'none',
  sortHorizontal: 'none',
});

// Convert Label Studio files to PPOCRLabel format
await toPPOCR({
  inputDirs: ['./input-label-studio'],
  outDir: './output-ppocr',
  fileName: 'Label.txt',
  baseImageDir: 'images/ch',
  sortVertical: 'none',
  sortHorizontal: 'none',
});
```

### CLI Usage

```
USAGE
  label-studio-converter toLabelStudio [--outDir value] [--defaultLabelName value] [--toFullJson] [--createFilePerImage] [--createFileListForServing] [--fileListName value] [--baseServerUrl value] [--sortVertical value] [--sortHorizontal value] <args>...
  label-studio-converter toPPOCR [--outDir value] [--fileName value] [--baseImageDir value] [--sortVertical value] [--sortHorizontal value] <args>...
  label-studio-converter --help
  label-studio-converter --version

Convert between Label Studio OCR format and PPOCRLabelv2 format

FLAGS
  -h --help  Print help information and exit
  -v --version  Print version information and exit

COMMANDS
  toLabelStudio  Convert PPOCRLabel files to Label Studio format
  toPPOCR  Convert Label Studio files to PPOCRLabel format
```

Subcommands:

```
USAGE
  label-studio-converter toLabelStudio [--outDir value] [--defaultLabelName value] [--toFullJson] [--createFilePerImage] [--createFileListForServing] [--fileListName value] [--baseServerUrl value] [--sortVertical value] [--sortHorizontal value] <args>...
  label-studio-converter toLabelStudio --help

Convert PPOCRLabel files to Label Studio format

FLAGS
  [--outDir]  Output directory. Default to "./output"
  [--defaultLabelName]  Default label name for text annotations. Default to "Text"
  [--toFullJson/--noToFullJson]  Convert to Full OCR Label Studio format. Default to "true"
  [--createFilePerImage/--noCreateFilePerImage]  Create a separate Label Studio JSON file for each image. Default to "false"
  [--createFileListForServing/--noCreateFileListForServing]  Create a file list for serving in Label Studio. Default to "true"
  [--fileListName]  Name of the file list for serving. Default to "files.txt"
  [--baseServerUrl]  Base server URL for constructing image URLs in the file list. Default to "http://localhost:8081"
  [--sortVertical]  Sort bounding boxes vertically. Options: "none" (default), "top-bottom", "bottom-top"
  [--sortHorizontal]  Sort bounding boxes horizontally. Options: "none" (default), "ltr", "rtl"
  -h --help  Print help information and exit

ARGUMENTS
  args...  Input directories containing PPOCRLabel files
```

```
USAGE
  label-studio-converter toPPOCR [--outDir value] [--fileName value] [--baseImageDir value] [--sortVertical value] [--sortHorizontal value] <args>...
  label-studio-converter toPPOCR --help

Convert Label Studio files to PPOCRLabel format

FLAGS
  [--outDir]  Output directory. Default to "./output"
  [--fileName]  Output PPOCR file name. Default to "Label.txt"
  [--baseImageDir]  Base directory path to prepend to image filenames in output (e.g., "ch" or "images/ch")
  [--sortVertical]  Sort bounding boxes vertically. Options: "none" (default), "top-bottom", "bottom-top"
  [--sortHorizontal]  Sort bounding boxes horizontally. Options: "none" (default), "ltr", "rtl"
  -h --help  Print help information and exit

ARGUMENTS
  args...  Input directories containing Label Studio files
```
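
Both subcommands accept `--sortVertical` and `--sortHorizontal`, but the help text only lists their values. The following is a minimal sketch of one plausible ordering, assuming each box is ranked by the top-left corner of its points (smallest y for the vertical order, smallest x for the horizontal order, with y growing downward as in image coordinates). The `Box` type and `sortBoxes` function are hypothetical and are not exported by `label-studio-converter`.

```ts
// Hypothetical types for illustration; not exported by label-studio-converter.
type Point = [number, number];

interface Box {
  text: string;
  points: Point[]; // pixel coordinates, y grows downward as in image space
}

type VerticalOrder = "none" | "top-bottom" | "bottom-top";
type HorizontalOrder = "none" | "ltr" | "rtl";

// Smallest coordinate of a box along one axis (0 = x, 1 = y),
// i.e. the left edge or the top edge of its bounding rectangle.
function minCoord(box: Box, axis: 0 | 1): number {
  return Math.min(...box.points.map((p) => p[axis]));
}

// Sort by vertical position first, then by horizontal position.
// A "none" axis contributes 0 to the comparison; with both axes set to
// "none" the input order is kept, because Array.prototype.sort is stable.
function sortBoxes(
  boxes: Box[],
  vertical: VerticalOrder = "none",
  horizontal: HorizontalOrder = "none",
): Box[] {
  const vSign = vertical === "top-bottom" ? 1 : vertical === "bottom-top" ? -1 : 0;
  const hSign = horizontal === "ltr" ? 1 : horizontal === "rtl" ? -1 : 0;
  // Copy first so the caller's array is not mutated.
  return [...boxes].sort(
    (a, b) =>
      vSign * (minCoord(a, 1) - minCoord(b, 1)) ||
      hSign * (minCoord(a, 0) - minCoord(b, 0)),
  );
}

// Usage: the box that sits higher in the image comes first.
const boxes: Box[] = [
  { text: "second line", points: [[10, 50], [90, 50], [90, 70], [10, 70]] },
  { text: "first line", points: [[10, 10], [90, 10], [90, 30], [10, 30]] },
];
console.log(sortBoxes(boxes, "top-bottom").map((b) => b.text));
```

Ordering strictly by y first means that boxes on the same visual line, but a few pixels apart vertically, are never compared horizontally. A real implementation may group boxes into rows with some tolerance before applying the horizontal order.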

#### Examples

**Convert PPOCRLabel files to full Label Studio format:**

```bash
label-studio-converter toLabelStudio ./input-ppocr --outDir ./output-label-studio --defaultLabelName Text --toFullJson --createFileListForServing --fileListName files.txt --baseServerUrl http://localhost:8081 --sortVertical none --sortHorizontal none
```

**Convert Label Studio files to PPOCRLabel format:**

```bash
label-studio-converter toPPOCR ./input-label-studio --outDir ./output-ppocr --fileName Label.txt --baseImageDir images/ch --sortVertical none --sortHorizontal none
```

**Convert PPOCRLabel files to Label Studio format with one file per image:**

```bash
label-studio-converter toLabelStudio ./input-ppocr --outDir ./output-label-studio --defaultLabelName Text --toFullJson --createFilePerImage --sortVertical none --sortHorizontal none
```

**Convert PPOCRLabel files to minimal Label Studio format (cannot be used for serving):**

```bash
label-studio-converter toLabelStudio ./input-ppocr --outDir ./output-label-studio --defaultLabelName Text --noToFullJson --sortVertical none --sortHorizontal none
```

### Using generated files with Label Studio

#### Interface setup

When creating a new labeling project in Label Studio, choose the "OCR" template.
This sets up the appropriate interface for text recognition tasks.

This project uses the following Label Studio interface configuration:

```xml
<View>
  <Image name="image" value="$ocr" zoom="false" rotateControl="true" zoomControl="false"/>
  <Labels name="label" toName="image">
    <Label value="Text" background="green"/>
    <Label value="Handwriting" background="blue"/>
  </Labels>
  <Rectangle name="bbox" toName="image" strokeWidth="3"/>
  <Polygon name="poly" toName="image" strokeWidth="3"/>
  <TextArea name="transcription" toName="image" editable="true" perRegion="true" required="false" maxSubmissions="1" rows="5" placeholder="Recognized Text" displayMode="region-list"/>
</View>
```

This setup includes:

- An `Image` tag to display the image to be annotated.
- A `Labels` tag with two label options: "Text" and "Handwriting". By default,
  all generated annotations are labeled "Text" (see `defaultLabelName`). You can
  modify the labels to suit your needs.
- A `Rectangle` tag to allow annotators to draw bounding boxes around text regions.
- A `Polygon` tag to allow annotators to draw polygons around text regions.
- A `TextArea` tag for annotators to input the recognized text for each region.

> [!IMPORTANT]
> Make sure that the `Image` tag's `value` attribute is set to `$ocr`, because
> the generated JSON files provide each image URL under the `ocr` key of the
> task data.
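
Label Studio stores region geometry as percentages (0 to 100) of the image width and height, while PPOCRLabel stores pixel coordinates. The sketch below shows one way a PPOCRLabel polygon could be turned into the linked `rectangle`, `labels`, and `textarea` results that the interface configuration above expects (`from_name` values `bbox`, `label`, and `transcription`; `to_name` value `image`). It assumes the image size is known and ignores rotation. `polygonToRectValue`, `regionResults`, and the `RegionResult` type are hypothetical, and the package's actual output may contain additional or different fields.

```ts
// Hypothetical helpers for illustration; not the package's own code.
type Point = [number, number];

// Label Studio region geometry: all values are percentages (0 to 100)
// of the original image width/height.
interface RectValue {
  x: number;
  y: number;
  width: number;
  height: number;
  rotation: number;
}

interface RegionValue extends RectValue {
  labels?: string[];
  text?: string[];
}

interface RegionResult {
  id: string;
  from_name: "bbox" | "label" | "transcription";
  to_name: string;
  type: "rectangle" | "labels" | "textarea";
  original_width: number;
  original_height: number;
  image_rotation: number;
  value: RegionValue;
}

// Axis-aligned bounding box of a pixel polygon, expressed in percent.
// Rotation is not estimated in this sketch.
function polygonToRectValue(points: Point[], imageWidth: number, imageHeight: number): RectValue {
  const xs = points.map(([x]) => x);
  const ys = points.map(([, y]) => y);
  const minX = Math.min(...xs);
  const minY = Math.min(...ys);
  return {
    x: (minX / imageWidth) * 100,
    y: (minY / imageHeight) * 100,
    width: ((Math.max(...xs) - minX) / imageWidth) * 100,
    height: ((Math.max(...ys) - minY) / imageHeight) * 100,
    rotation: 0,
  };
}

// One OCR region becomes three results sharing the same id, so Label Studio
// links the box, its label and its transcription together.
function regionResults(
  id: string,
  points: Point[],
  text: string,
  label: string,
  imageWidth: number,
  imageHeight: number,
): RegionResult[] {
  const rect = polygonToRectValue(points, imageWidth, imageHeight);
  const common = {
    id,
    to_name: "image",
    original_width: imageWidth,
    original_height: imageHeight,
    image_rotation: 0,
  };
  return [
    { ...common, from_name: "bbox", type: "rectangle", value: rect },
    { ...common, from_name: "label", type: "labels", value: { ...rect, labels: [label] } },
    { ...common, from_name: "transcription", type: "textarea", value: { ...rect, text: [text] } },
  ];
}

// Usage: a 200x40 px box at (100, 50) in a 1000x500 px image.
const results = regionResults("region-1", [[100, 50], [300, 50], [300, 90], [100, 90]], "Hello", "Text", 1000, 500);
console.log(JSON.stringify(results, null, 2));
```

Using the axis-aligned bounding box keeps the conversion simple, but it loses the shape of skewed or rotated text; a polygon result (the `Polygon` tag in the configuration) would preserve it.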

#### Serving annotation files locally

To serve the generated Label Studio annotation files and images locally, you can
follow the official [Label Studio
documentation](https://labelstud.io/guide/tasks#Import-data-from-a-local-directory).

1. Start a simple HTTP server in the output directory containing the generated
   Label Studio files. You can use Python's built-in HTTP server for this:

   ```bash
   cd ./output-label-studio
   python3 -m http.server 8081
   ```

   Alternatively, run `http-server` from npm in the same directory:

   ```bash
   npx http-server -p 8081 --cors
   ```

> [!IMPORTANT]
> Ensure that the port number (e.g., `8081`) matches the `baseServerUrl` used
> during conversion.

> [!NOTE]
> The server may need to be configured with CORS settings to allow Label Studio
> to access the files. Refer to the documentation of the server you are using
> for instructions on how to enable CORS.

2. Add the file directory as source storage in Label Studio by following the
   official [Label Studio
   documentation](https://labelstud.io/guide/tasks#Import-data-from-a-local-directory).

   By default, the generated file list is named `files.txt`. Before running the
   command below, ensure that `files.txt` is copied to the `./myfiles`
   directory.

   The following command starts a Docker container from the latest Label Studio
   image, exposed on port 8080, with environment variables that allow Label
   Studio to access local files. In this example, the local directory
   `./myfiles` is mounted at `/label-studio/files`.

   ```bash
   docker run -it -p 8080:8080 -v $(pwd)/mydata:/label-studio/data \
     --env LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED=true \
     --env LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT=/label-studio/files \
     -v $(pwd)/myfiles:/label-studio/files \
     heartexlabs/label-studio:latest label-studio
   ```

3. Open your web browser and navigate to `http://localhost:8080` to access
   Label Studio.

4. Create a new project or open an existing one, and go to the "Import" tab.

5. Import the generated tasks into Label Studio.
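
The port requirement in step 1 comes from how the file list is built: the CLI help says `baseServerUrl` is used to construct the image URLs in the file list. Assuming one absolute URL per line (an assumption; the exact layout of `files.txt` is not documented here), the sketch below builds such a list and checks that every entry points at the port your HTTP server listens on. `buildFileList` and `allOnPort` are hypothetical helpers, not part of the package.

```ts
// Hypothetical helpers for illustration; the real file list format may differ.

// Build a file list with one absolute URL per line by resolving each relative
// path against the server base URL.
function buildFileList(baseServerUrl: string, relativePaths: string[]): string {
  // A trailing "/" makes URL resolution append to the base path instead of
  // replacing its last segment (e.g. "http://host:8081/data" + "a.jpg").
  const base = baseServerUrl.endsWith("/") ? baseServerUrl : `${baseServerUrl}/`;
  return relativePaths
    .map((p) => {
      // Percent-encode each path segment (spaces, non-ASCII names, etc.).
      const encoded = p.split("/").map(encodeURIComponent).join("/");
      return new URL(encoded, base).href;
    })
    .join("\n");
}

// Check that every URL in the list targets the given port. Note that
// URL.port is "" for a scheme's default port (80 for http, 443 for https).
function allOnPort(fileList: string, port: number): boolean {
  return fileList
    .split("\n")
    .filter((line) => line.trim() !== "")
    .every((line) => new URL(line).port === String(port));
}

// Usage
const fileList = buildFileList("http://localhost:8081", ["images/img 1.jpg", "img2.jpg"]);
console.log(fileList); // http://localhost:8081/images/img%201.jpg, then http://localhost:8081/img2.jpg
console.log(allOnPort(fileList, 8081)); // true
console.log(allOnPort(fileList, 8080)); // false
```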

### Using generated files with PPOCRLabelv2

There are many GitHub repositories for PPOCRLabelv2, but we have tested the
generated files with the following one:

- [`PFCCLab/PPOCRLabel`](https://github.com/PFCCLab/PPOCRLabel).

To use the generated files, place them in the directory structure that
PPOCRLabelv2 expects, replacing the existing `Label.txt` files in the dataset
directories.

If the images are stored in a different directory, make sure to update the image
directory path by specifying the `baseImageDir` option during conversion.
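
For reference, a PPOCRLabel `Label.txt` file stores one image per line: the image path, a tab, then a JSON array of annotations with `transcription`, `points`, and `difficult` fields. The sketch below formats and parses such a line and shows where `baseImageDir` ends up in the path. The helper names are hypothetical and this is not the package's own code; JSON spacing may also differ from what PPOCRLabel itself writes.

```ts
// Hypothetical helpers for illustration; not the package's own code.
type Point = [number, number];

// One annotation entry in a PPOCRLabel Label.txt line.
interface PPOCRAnnotation {
  transcription: string;
  points: Point[]; // pixel coordinates, usually four corners
  difficult: boolean;
}

// Format one line: "<image path>\t<JSON array of annotations>".
// baseImageDir (e.g. "ch" or "images/ch") is prepended to the file name,
// with any trailing slashes removed first.
function toLabelTxtLine(imageName: string, annotations: PPOCRAnnotation[], baseImageDir = ""): string {
  const dir = baseImageDir.replace(/\/+$/, "");
  const imagePath = dir ? `${dir}/${imageName}` : imageName;
  // JSON.stringify keeps non-ASCII text (e.g. Chinese) unescaped.
  return `${imagePath}\t${JSON.stringify(annotations)}`;
}

// Split a line at the first tab and parse the annotation array.
function parseLabelTxtLine(line: string): { imagePath: string; annotations: PPOCRAnnotation[] } {
  const tab = line.indexOf("\t");
  if (tab === -1) throw new Error(`Missing tab separator in line: ${line}`);
  return {
    imagePath: line.slice(0, tab),
    annotations: JSON.parse(line.slice(tab + 1)) as PPOCRAnnotation[],
  };
}

// Usage
const line = toLabelTxtLine(
  "img_1.jpg",
  [{ transcription: "Hello", points: [[10, 10], [90, 10], [90, 30], [10, 30]], difficult: false }],
  "images/ch",
);
console.log(parseLabelTxtLine(line).imagePath); // images/ch/img_1.jpg
```

Splitting at the first tab, rather than on every tab, keeps the parser safe if a transcription ever contains an escaped tab inside the JSON.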

<!-- Roadmap -->

## :compass: Roadmap

- [x] Add tests.

<!-- Contributing -->

## :wave: Contributing

<a href="https://github.com/DuckyMomo20012/label-studio-converter/graphs/contributors">
<img src="https://contrib.rocks/image?repo=DuckyMomo20012/label-studio-converter" />
</a>

Contributions are always welcome!

Please read the [contribution guidelines](./CONTRIBUTING.md).

<!-- Code of Conduct -->

### :scroll: Code of Conduct

Please read the [Code of Conduct](./CODE_OF_CONDUCT.md).

<!-- License -->

## :warning: License

This project is licensed under the **Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)** License.

[CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/).

See the **[LICENSE.md](./LICENSE.md)** file for full details.

<!-- Contact -->

## :handshake: Contact

Duong Vinh - tienvinh.duong4@gmail.com

Project Link: [https://github.com/DuckyMomo20012/label-studio-converter](https://github.com/DuckyMomo20012/label-studio-converter).

<!-- Acknowledgments -->

## :gem: Acknowledgements

Here are useful resources and libraries that we have used in this project:

- [Label Studio Documentation](https://labelstud.io/guide): Official documentation for Label Studio.
- [PPOCRLabel GitHub Repository](https://github.com/PFCCLab/PPOCRLabel):
  Repository for the PPOCRLabelv2 tool.