udpipe-node 0.1.0
- package/README.md +182 -0
- package/dist/cli.d.ts +2 -0
- package/dist/cli.d.ts.map +1 -0
- package/dist/cli.js +906 -0
- package/dist/cli.js.map +1 -0
- package/dist/conllu.d.ts +66 -0
- package/dist/conllu.d.ts.map +1 -0
- package/dist/conllu.js +135 -0
- package/dist/conllu.js.map +1 -0
- package/dist/engine.d.ts +46 -0
- package/dist/engine.d.ts.map +1 -0
- package/dist/engine.js +69 -0
- package/dist/engine.js.map +1 -0
- package/dist/index.d.ts +28 -0
- package/dist/index.d.ts.map +1 -0
- package/dist/index.js +67 -0
- package/dist/index.js.map +1 -0
- package/dist/wasm-engine.d.ts +22 -0
- package/dist/wasm-engine.d.ts.map +1 -0
- package/dist/wasm-engine.js +82 -0
- package/dist/wasm-engine.js.map +1 -0
- package/models/english-gum-ud-2.5-191206.udpipe +0 -0
- package/package.json +49 -0
- package/runtime/udpipe-wasm.mjs +2 -0
- package/runtime/udpipe-wasm.wasm +0 -0
package/README.md
# udpipe-node
Run the [UDPipe 1.x](https://ufal.mff.cuni.cz/udpipe/1) dependency parser in Node.js / TypeScript. Included in the package: an English UD 2.5 model (GUM), a multi-OS WASM engine, and a macOS CLI engine.

Wrapping the original C++ implementation of [UDPipe 1.4.0 (released on 11/20/2025)](https://github.com/ufal/udpipe/releases/tag/v1.4.0), this portable package provides a typed, synchronous API, two swappable execution backends for broader compatibility, and a bundled model.

**Why this exists**: The official UDPipe ships bindings for Python, Java, C#, and Perl, but none for JS/TS/Node. This package fills that gap and, thanks to WASM, runs anywhere Node.js runs.

---
## Out-of-the-Box Contents

For quick setup and portability, this package bundles:

1. **A Pre-compiled WebAssembly Layer** (`runtime/udpipe-wasm.*`), generated with Emscripten.
2. **A Single Default English Model**: the GUM variant from Universal Dependencies 2.5 (`models/english-gum-ud-2.5-191206.udpipe`).

> [!NOTE]
> To use other languages or alternative English models, download them from the official **[LINDAT UDPipe Models Repository](https://lindat.mff.cuni.cz/repository/items/41f05304-629f-4313-b9cf-9eeb0a2ca7c6)** and pass the model's path to the constructor.

---
## Architecture & OS Compatibility

The package supports two backends, each with different OS compatibility characteristics:

```
       ┌────────────────────────────────────────────────────────────────────────────┐
text → │ UDPipe.parse() → engine.process() → CoNLL-U → parseConllu() → UDSentence[] │
       └────────────────────────────────────────────────────────────────────────────┘
                                 ▲
                 ┌───────────────┴───────────────┐
            WasmEngine                       CliEngine
        (Pure WebAssembly)              (Subprocess spawn)
```

- **`conllu.ts`**: backend-independent conversion of CoNLL-U into typed objects, reused by every engine.
- **`engine.ts`**: the `UDPipeEngine` interface plus `CliEngine` (spawns the binary).
- **`index.ts`**: the `UDPipe` convenience class.
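
The diagram boils down to one contract: an engine turns text into CoNLL-U, and everything downstream is engine-agnostic. The sketch below illustrates that idea under stated assumptions. `Engine`, `StubEngine`, and `Pipeline` are hypothetical names, and the single synchronous `process(text)` method is a simplification, not the real `UDPipeEngine` signature.

```typescript
// Hypothetical sketch of the swappable-backend idea; not this package's real types.
interface Engine {
  // Raw text in, CoNLL-U out (sentence blocks separated by a blank line).
  process(text: string): string;
}

// Stand-in backend: one sentence per non-empty line, one token per whitespace-separated
// word, first token as root. Lets the pipeline run without WASM or a native binary.
class StubEngine implements Engine {
  process(text: string): string {
    const blocks = text
      .split("\n")
      .map((line) => line.trim())
      .filter((line) => line !== "")
      .map((line) => {
        const rows = line.split(/\s+/).map((form, i) =>
          [i + 1, form, "_", "_", "_", "_", i === 0 ? 0 : 1, i === 0 ? "root" : "dep", "_", "_"].join("\t"),
        );
        return [`# text = ${line}`, ...rows].join("\n");
      });
    return blocks.join("\n\n") + "\n\n";
  }
}

class Pipeline {
  private readonly engine: Engine;

  // The engine is injected, so any object with a matching process() can be swapped in.
  constructor(engine: Engine) {
    this.engine = engine;
  }

  // Split the engine's CoNLL-U output into one string per sentence.
  sentenceBlocks(text: string): string[] {
    return this.engine
      .process(text)
      .split(/\n\s*\n/)
      .map((block) => block.trim())
      .filter((block) => block !== "");
  }
}

const pipeline = new Pipeline(new StubEngine());
console.log(pipeline.sentenceBlocks("From fairest creatures\nwe desire increase").length); // 2
```

A WASM-backed engine and a subprocess-backed engine would both fit the same slot. Only the object passed to the constructor changes.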
### 1. WasmEngine (Pure WebAssembly, Recommended)

* **Compatibility**: 💻 **Fully Cross-Platform** (macOS ARM64/Intel, Windows x64, and Linux x64).
* **Requirements**: Node.js >= 20. Runs anywhere Node.js runs, with no native binaries, native compilation, or subprocess spawning.
* **Internals**: Instantiates the Emscripten-compiled WebAssembly module once, at import time, using top-level await. After that, every parse call is synchronous and low-latency (~5 ms for a warm re-parse).

### 2. CliEngine (Subprocess Native Binary)

* **Compatibility**: **macOS only** out of the box, via the default `bin-macos/udpipe` binary in the repository root. Binaries for other operating systems are *not* packaged, and the default paths do not resolve in a published npm install.
* **Custom Config**: To use `CliEngine` on Windows or Linux, obtain a compiled `udpipe` executable for your system and pass its path via the `binaryPath` option.

---
## Quick Start / Setup

```bash
# 1. Install dependencies
npm install

# 2. Build TypeScript and run the WASM smoke test
npm start
```

---
## Usage Guide

### 1. Using the Portable WASM Engine (Recommended)

Importing from the `/wasm` subpath resolves the bundled English model automatically and runs via pure WebAssembly.

```typescript
import { createUDPipe } from 'udpipe-node/wasm';

// Initializes the WASM engine with the bundled English GUM model
const nlp = createUDPipe();

// Verse mode: each input line is parsed as exactly one sentence (line boundaries are kept)
const verse = nlp.parseLines('From fairest creatures we desire increase,');

// Prose mode: UDPipe segments the text into sentences itself
const prose = nlp.parse('I love poetry. It scans well.');

for (const s of [...verse, ...prose]) {
  console.log(`# Sentence: ${s.text}`);
  for (const w of s.words) {
    console.log(`${w.id}\t${w.form}\t${w.xpos}\t${w.deprel}\t→\t${w.head}`);
  }
}
```
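Because `head` holds the `id` of each word's governor (0 for the root), the flat `words` array can be turned into a tree. The following is a minimal sketch, assuming numeric `id`/`head` and only the `id`, `form`, `head`, and `deprel` fields listed under Typing Details. `Word`, `childrenOf`, and `renderTree` are hypothetical helpers, not exports of this package, and the sample words are hand-written rather than real parser output.

```typescript
// Minimal shape of a parsed word (a subset of the UDWord fields).
interface Word {
  id: number;
  form: string;
  head: number; // 0 = ROOT
  deprel: string;
}

// Group words by the id of their head; key 0 holds the root(s).
function childrenOf(words: Word[]): Map<number, Word[]> {
  const children = new Map<number, Word[]>();
  for (const w of words) {
    const list = children.get(w.head) ?? [];
    list.push(w);
    children.set(w.head, list);
  }
  return children;
}

// Render the tree depth-first, indenting each dependent under its head.
function renderTree(words: Word[]): string {
  const children = childrenOf(words);
  const lines: string[] = [];
  const visit = (head: number, depth: number): void => {
    for (const w of children.get(head) ?? []) {
      lines.push(`${"  ".repeat(depth)}${w.form} (${w.deprel})`);
      visit(w.id, depth + 1);
    }
  };
  visit(0, 0);
  return lines.join("\n");
}

const words: Word[] = [
  { id: 1, form: "I", head: 2, deprel: "nsubj" },
  { id: 2, form: "love", head: 0, deprel: "root" },
  { id: 3, form: "poetry", head: 2, deprel: "obj" },
];
console.log(renderTree(words));
// love (root)
//   I (nsubj)
//   poetry (obj)
```

With real output, `s.words` from the loop above could be passed in directly, provided its `id` and `head` fields are numbers.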
### 2. Using the CLI Engine (macOS Only Out-of-the-Box)

To use the native binary runner instead, construct `UDPipe` from the package root:

```typescript
import { UDPipe } from 'udpipe-node';

// Defaults to bin-macos/udpipe (macOS only) and the English GUM model in the repo root.
// Fails on Windows/Linux, or in a published npm install, unless paths are provided.
const nlp = new UDPipe();

const sentences = nlp.parseLines(
  "From fairest creatures we desire increase,\nThat thereby beauty's rose might never die,",
);
```
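A subprocess backend essentially has to turn each call into a `udpipe` command line, pipe the text to stdin, and read CoNLL-U from stdout. The sketch below shows that idea. It is not `CliEngine`'s actual code, and `buildArgs`, `runUdpipe`, and `CliOptions` are hypothetical names. It assumes the standard UDPipe 1 command-line options `--tokenize`, `--tag`, `--parse`, and `--tokenizer=presegmented` (one sentence per input line); check them against the UDPipe 1 manual for your binary's version.

```typescript
import { spawnSync } from "node:child_process";

interface CliOptions {
  verse?: boolean; // one input line = one sentence
  tagOnly?: boolean; // skip the dependency parser
}

// Arguments for `udpipe [options] model_file`: text is read from stdin, CoNLL-U written to stdout.
function buildArgs(modelPath: string, opts: CliOptions = {}): string[] {
  const args = ["--tokenize", "--tag"];
  if (opts.verse) args.push("--tokenizer=presegmented");
  if (!opts.tagOnly) args.push("--parse");
  args.push(modelPath);
  return args;
}

// Run the binary synchronously and return its CoNLL-U output.
function runUdpipe(binaryPath: string, modelPath: string, text: string, opts: CliOptions = {}): string {
  const result = spawnSync(binaryPath, buildArgs(modelPath, opts), {
    input: text,
    encoding: "utf8",
    maxBuffer: 64 * 1024 * 1024, // long inputs can produce a lot of CoNLL-U
  });
  if (result.error) throw result.error; // e.g. ENOENT when the binary path is wrong
  if (result.status !== 0) {
    throw new Error(`udpipe exited with status ${result.status}: ${result.stderr}`);
  }
  return result.stdout;
}

console.log(buildArgs("english-gum-ud-2.5-191206.udpipe", { verse: true }).join(" "));
// --tokenize --tag --tokenizer=presegmented --parse english-gum-ud-2.5-191206.udpipe
```

Calling `runUdpipe` for real requires an actual binary and model, such as the downloads linked under Custom Binary and Model Paths.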
### 3. Custom Binary and Model Paths

**Download binaries** for your OS from the [UDPipe GitHub Distribution](https://github.com/ufal/udpipe/releases/tag/v1.4.0). <br>
**Download models** from the [LINDAT repo](https://lindat.mff.cuni.cz/repository/items/41f05304-629f-4313-b9cf-9eeb0a2ca7c6). <br>
You can **specify custom paths** for other languages, models, or native executables:

```typescript
import { UDPipe } from 'udpipe-node';

const nlp = new UDPipe({
  // Path to a custom language model downloaded from LINDAT
  modelPath: '/path/to/czech-pdt-ud-2.5-191206.udpipe',
  // Path to a local udpipe binary (e.g. a Windows or Linux build)
  binaryPath: '/path/to/udpipe',
});
```
---

## Interactive Parser CLI

The package includes an interactive, chalk-colored terminal CLI for querying and testing the parser. To build the codebase and launch the CLI, run:

```bash
npm start
```
### Main Menu Options

1. **Verse Mode**: Parse pasted text line by line (respects line breaks).
2. **Prose Mode**: Parse pasted text with automatic sentence segmentation.
3. **Parse from File (Verse Mode)**: Read a text file and parse it line by line.
4. **Parse from File (Prose Mode)**: Read a text file and segment it into sentences automatically.
5. **Load Model File**: Load a different `.udpipe` language model file at runtime.
6. **Information**: Full-screen, three-page reference guide to POS tags and dependency relations.
7. **Exit**: Quit the CLI.

### Parsing Navigation Controls

Once parsing is complete, move between sentences with the keyboard:

* **`→` or `Enter`**: Go to the next sentence (or line).
* **`←`**: Go to the previous sentence (or line).
* **`↓`**: Open the full-screen Reference Legend overlay.
* **`ESC` or `q`**: Close the parse view and return to the main menu.
* **Scrolling past a boundary**: Triggers a `Return to the main menu? (y/n)` prompt.
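
The controls above behave like a small state machine. Here is a hedged sketch of that logic as a pure function; it is not the CLI's actual code. `NavKey`, `NavState`, and `navigate` are hypothetical, and the sketch assumes that moving forward from the last parse, or back from the first, is what raises the return-to-menu prompt.

```typescript
// Hypothetical model of the parse viewer's keyboard handling; not the CLI's source.
type NavKey = "right" | "enter" | "left" | "down" | "escape" | "q";

interface NavState {
  index: number; // current sentence/line (0-based)
  total: number; // number of parses
  view: "parse" | "legend" | "confirmExit" | "menu";
}

function navigate(state: NavState, key: NavKey): NavState {
  // Only the parse view is modeled; the legend and the prompt would handle their own keys.
  if (state.view !== "parse") return state;
  switch (key) {
    case "right":
    case "enter":
      // Next parse, or the return-to-menu prompt when scrolling past the last one.
      return state.index + 1 < state.total
        ? { ...state, index: state.index + 1 }
        : { ...state, view: "confirmExit" };
    case "left":
      // Previous parse, or the prompt when scrolling past the first one.
      return state.index > 0
        ? { ...state, index: state.index - 1 }
        : { ...state, view: "confirmExit" };
    case "down":
      return { ...state, view: "legend" }; // open the Reference Legend overlay
    case "escape":
    case "q":
      return { ...state, view: "menu" }; // close parses, back to the main menu
  }
}

let nav: NavState = { index: 0, total: 2, view: "parse" };
nav = navigate(nav, "enter"); // moves to the second parse
nav = navigate(nav, "right"); // already on the last parse, so the prompt opens
console.log(nav.index, nav.view); // 1 confirmExit
```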
### Reference Legend Navigation

Available from option 6 (Information) and from the parse overlay (`↓`):

* **`←` / `→` (or `Enter`)**: Switch between three structured, side-by-side full-screen reference cards:
  * **Page 1**: Universal POS (UPOS) & English Penn Treebank XPOS.
  * **Page 2**: Universal Dependency Relations (deprels) grouped by clausal/nominal/modifier subtypes.
  * **Page 3**: Legacy Stanford relations mapping table & key structural notes.
* **`↑` (in the overlay), `ESC`, or `q`**: Return to the parses or the main menu.

---
## API Reference

| Method | Returns | Description |
|---|---|---|
| `parse(text, opts?)` | `UDSentence[]` | Segments sentences, tokenizes, tags, and parses dependencies. |
| `parseLines(text, opts?)` | `UDSentence[]` | Treats each input line as a single sentence (best for verse). |
| `tag(text, opts?)` | `UDSentence[]` | Tokenizes and tags parts of speech (skips the dependency parser). |
| `conllu(text, opts?)` | `string` | Returns the raw CoNLL-U output. |
| `parseConllu(str)` | `UDSentence[]` | Parses any CoNLL-U string into typed JS objects. |
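
`parseConllu` consumes the standard CoNLL-U format: ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), `#` comment lines, and a blank line after each sentence. The following is a minimal sketch of such a parser. It is not the package's implementation, `Token`, `Sentence`, and `parseConlluSketch` are hypothetical names, and for brevity it skips multiword-token ranges (`1-2`) and empty nodes (`1.1`). The sample input is hand-written, not real parser output.

```typescript
interface Token {
  id: number;
  form: string;
  lemma: string;
  upos: string;
  xpos: string;
  feats: string;
  head: number;
  deprel: string;
  misc: string; // DEPS (column 9) is ignored in this sketch
}

interface Sentence {
  text: string; // from the "# text = ..." comment, if present
  words: Token[];
}

function parseConlluSketch(conllu: string): Sentence[] {
  const sentences: Sentence[] = [];
  let current: Sentence = { text: "", words: [] };

  const flush = (): void => {
    if (current.words.length > 0) sentences.push(current);
    current = { text: "", words: [] };
  };

  for (const line of conllu.split(/\r?\n/)) {
    if (line.trim() === "") {
      flush(); // a blank line ends a sentence
    } else if (line.startsWith("#")) {
      const m = /^#\s*text\s*=\s*(.*)$/.exec(line);
      if (m) current.text = m[1];
    } else {
      const cols = line.split("\t");
      if (cols.length !== 10) continue; // malformed row
      if (!/^\d+$/.test(cols[0])) continue; // skip "1-2" ranges and "1.1" empty nodes
      current.words.push({
        id: Number(cols[0]), form: cols[1], lemma: cols[2], upos: cols[3], xpos: cols[4],
        feats: cols[5], head: Number(cols[6]), deprel: cols[7], misc: cols[9],
      });
    }
  }
  flush(); // input may not end with a blank line
  return sentences;
}

const sample = [
  "# text = I love poetry.",
  "1\tI\tI\tPRON\tPRP\t_\t2\tnsubj\t_\t_",
  "2\tlove\tlove\tVERB\tVBP\t_\t0\troot\t_\t_",
  "3\tpoetry\tpoetry\tNOUN\tNN\tNumber=Sing\t2\tobj\t_\tSpaceAfter=No",
  "4\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_",
  "",
].join("\n");

const parsed = parseConlluSketch(sample);
console.log(parsed.length, parsed[0].words.length); // 1 4
```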
### Typing Details (`UDWord`)

Each parsed word carries the full UDPipe annotation, including Universal Dependencies tags:

* `id`: Position of the word within its sentence (1-based).
* `form`: Word form or punctuation symbol.
* `lemma`: Lemma (base form) of the word.
* `upos`: Universal part-of-speech tag.
* `xpos`: Language-specific part-of-speech tag (Penn Treebank for English).
* `feats` / `featsMap`: Morphological features.
* `head`: ID of the head word (0 = ROOT).
* `deprel`: Universal dependency relation (e.g., `nsubj`, `obj`, `root`).
* `spaceAfter`: Boolean flag indicating whether whitespace follows this word.
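
Two of these fields map onto raw CoNLL-U columns. In CoNLL-U, FEATS is either `_` or a `Name=Value|Name=Value` list, and a word is followed by whitespace unless its MISC column contains `SpaceAfter=No`. Presumably `featsMap` and `spaceAfter` are derived this way, but that is an assumption. The sketch below shows both derivations, plus rebuilding surface text from `form` and `spaceAfter`. `featsToMap`, `hasSpaceAfter`, `SurfaceWord`, and `detokenize` are hypothetical helpers, not exports of this package.

```typescript
// FEATS column: "_" or "Name=Value|Name=Value".
function featsToMap(feats: string): Record<string, string> {
  const map: Record<string, string> = {};
  if (feats === "_" || feats === "") return map;
  for (const pair of feats.split("|")) {
    const eq = pair.indexOf("=");
    if (eq > 0) map[pair.slice(0, eq)] = pair.slice(eq + 1);
  }
  return map;
}

// MISC column: whitespace follows the word unless "SpaceAfter=No" is present.
function hasSpaceAfter(misc: string): boolean {
  return !misc.split("|").some((item) => item === "SpaceAfter=No");
}

interface SurfaceWord {
  form: string;
  spaceAfter: boolean;
}

// Rebuild the sentence's surface text (no trailing space after the last word).
function detokenize(words: SurfaceWord[]): string {
  return words
    .map((w, i) => w.form + (w.spaceAfter && i < words.length - 1 ? " " : ""))
    .join("");
}

console.log(featsToMap("Mood=Ind|Tense=Past")); // { Mood: 'Ind', Tense: 'Past' }
console.log(
  detokenize([
    { form: "It", spaceAfter: true },
    { form: "scans", spaceAfter: true },
    { form: "well", spaceAfter: hasSpaceAfter("SpaceAfter=No") },
    { form: ".", spaceAfter: true },
  ]),
); // It scans well.
```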
---
## Development & Building

To rebuild the WebAssembly binary from source, ensure you have the Emscripten SDK (`emsdk`) installed and activated, then run:

```bash
npm run build:wasm
```

The compiled assets will be written to `runtime/udpipe-wasm.mjs` and `runtime/udpipe-wasm.wasm`.
package/dist/cli.d.ts
{"version":3,"file":"cli.d.ts","sourceRoot":"","sources":["../src/cli.ts"],"names":[],"mappings":""}