udpipe-node 0.1.0

package/README.md ADDED
@@ -0,0 +1,182 @@
1
+ # udpipe-node
2
+
3
+ Run the [UDPipe 1.x](https://ufal.mff.cuni.cz/udpipe/1) dependency parser in Node.js / TypeScript. Bundled in the package: an English UD 2.5 model (GUM), a cross-platform WASM engine, and a macOS CLI engine.
4
+
5
+ Wrapping the original C++ implementation of [UDPipe 1.4.0 (version released on 11/20/2025)](https://github.com/ufal/udpipe/releases/tag/v1.4.0), this portable package provides a typed, synchronous API, two swappable execution backends for extra compatibility, and a built-in model.
6
+ **Why this exists**: The official UDPipe ships bindings for Python, Java, C#, and Perl — but none for JS/TS/Node. This package fills that gap and, thanks to WASM, can run anywhere.
7
+
8
+ ---
9
+
10
+ ## Out-of-the-Box Contents
11
+
12
+ For quick setup and portability, this package bundles:
13
+ 1. **A Pre-compiled WebAssembly Layer** (`runtime/udpipe-wasm.*`) generated using Emscripten.
14
+ 2. **A Single Default English Model**: The GUM variant from Universal Dependencies 2.5 (`models/english-gum-ud-2.5-191206.udpipe`).
15
+
16
+ > [!NOTE]
17
+ > To use other languages or alternative English models, you must download them from the official **[LINDAT UDPipe Models Repository](https://lindat.mff.cuni.cz/repository/items/41f05304-629f-4313-b9cf-9eeb0a2ca7c6)** and pass the custom path to the constructor.
18
+
19
+ ---
20
+
21
+ ## Architecture & OS Compatibility
22
+
23
+ The package supports two backends, each with different OS compatibility characteristics:
24
+
25
+ ```
26
+        ┌─────────────────────────────────────────────────────────────┐
27
+ text → │ UDPipe.parse() → engine.process() → CoNLL-U → parseConllu() │ → UDSentence[]
28
+        └─────────────────────────────────────────────────────────────┘
29
+
30
+                       ┌───────────────┴───────────────┐
31
+                  WasmEngine                       CliEngine
32
+              (Pure WebAssembly)              (Subprocess spawn)
33
+ ```
34
+ - **`conllu.ts`** — backend-independent CoNLL-U → typed objects. Reused by every engine.
35
+ - **`engine.ts`** — the `UDPipeEngine` interface + `CliEngine` (spawns the binary).
36
+ - **`index.ts`** — the `UDPipe` convenience class.
37
+
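The swappable-backend design above can be illustrated with a small stand-alone sketch. Everything here is hypothetical: `EngineSketch`, `FakeEngine`, and `PipelineSketch` are illustration-only names, and the real `UDPipeEngine` interface may have a different `process()` signature. The point is only that the facade depends on an interface that returns CoNLL-U text, so either backend can sit behind it.

```typescript
// Hypothetical shape of a backend: raw text in, CoNLL-U text out.
// The package's actual `UDPipeEngine` interface may differ.
interface EngineSketch {
  process(text: string): string;
}

// Stand-in backend: emits one 10-column CoNLL-U row per whitespace-separated word.
// A real backend would run UDPipe (via WASM or a subprocess) instead.
class FakeEngine implements EngineSketch {
  process(text: string): string {
    const rows = text
      .split(/\s+/)
      .filter((w) => w.length > 0)
      .map((w, i) => [String(i + 1), w, '_', '_', '_', '_', '_', '_', '_', '_'].join('\t'));
    // A blank line terminates the sentence, as in CoNLL-U.
    return rows.join('\n') + '\n\n';
  }
}

// The facade only knows the interface, so backends are interchangeable.
class PipelineSketch {
  private readonly engine: EngineSketch;

  constructor(engine: EngineSketch) {
    this.engine = engine;
  }

  // Extract the FORM column (index 1) from every token row.
  forms(text: string): string[] {
    return this.engine
      .process(text)
      .split('\n')
      .filter((line) => line.length > 0 && line[0] !== '#')
      .map((line) => line.split('\t')[1] ?? '');
  }
}

const pipeline = new PipelineSketch(new FakeEngine());
console.log(pipeline.forms('I love poetry .')); // [ 'I', 'love', 'poetry', '.' ]
```

Swapping `FakeEngine` for another class that implements the same interface would leave `PipelineSketch` unchanged, which is the property the two-backend layout relies on.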
38
+ ### 1. WasmEngine (Pure WebAssembly — Recommended)
39
+ * **Compatibility**: 💻 **Fully Cross-Platform** (macOS ARM64/Intel, Windows x64, and Linux x64).
40
+ * **Requirements**: Node.js >= 20. Runs anywhere Node.js runs without native binaries, native compilation, or subprocess spawning.
41
+ * **Internals**: Instantiates the Emscripten-compiled WebAssembly binary at import time using top-level await, so that subsequent parse calls are synchronous and low-latency (~5 ms for a warm re-parse).
42
+
43
+ ### 2. CliEngine (Subprocess Native Binary)
44
+ * **Compatibility**: **macOS Only** out-of-the-box (other OS binaries *not* packaged in the npm distribution).
45
+ * **Custom Config**: To use `CliEngine` on Windows or Linux, you must obtain a compiled `udpipe` executable for your system and specify its path manually using the `binaryPath` option.
46
+
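As a rough illustration of the compatibility rules above, the following sketch decides whether the CLI backend is usable without extra setup. `canUseCli` is a hypothetical helper, not part of the package API; it only encodes the two conditions stated in this section (a bundled macOS binary, or a user-supplied `binaryPath`).

```typescript
// Hypothetical helper: not exported by udpipe-node.
// `platform` is expected to be a Node.js platform string such as
// 'darwin' (macOS), 'linux', or 'win32'.
function canUseCli(platform: string, binaryPath?: string): boolean {
  // A user-supplied executable makes the CLI backend usable on any OS.
  if (binaryPath !== undefined && binaryPath.length > 0) {
    return true;
  }
  // Otherwise only the bundled macOS binary is available.
  return platform === 'darwin';
}

console.log(canUseCli('darwin')); // true
console.log(canUseCli('linux')); // false
console.log(canUseCli('linux', '/usr/local/bin/udpipe')); // true
```

In a Node.js program the first argument would typically be `process.platform`. When the check returns `false`, falling back to the WASM engine avoids the need for a native binary.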
47
+ ---
48
+
49
+ ## Quick Start / Setup
50
+
51
+ ```bash
52
+ # 1. Install dependencies
53
+ npm install
54
+
55
+ # 2. Build TypeScript and run the WASM smoke test
56
+ npm start
57
+ ```
58
+
59
+ ---
60
+
61
+ ## Usage Guide
62
+
63
+ ### 1. Using the Portable WASM Engine (Recommended)
64
+ Importing from the `/wasm` subpath resolves the bundled English model automatically and runs via pure WebAssembly.
65
+
66
+ ```typescript
67
+ import { createUDPipe } from 'udpipe-node/wasm';
68
+
69
+ // Initializes the WASM engine with the bundled English GUM model
70
+ const nlp = createUDPipe();
71
+
72
+ // Verse mode: one input line = one sentence (keeps line boundaries)
73
+ const verseLines = nlp.parseLines('From fairest creatures we desire increase,');
74
+
75
+ // Prose mode: UDPipe segments sentences itself
76
+ const sentences = nlp.parse('I love poetry. It scans well.');
77
+
78
+ for (const s of sentences) {
79
+   console.log(`# Sentence: ${s.text}`);
80
+   for (const w of s.words) {
81
+     console.log(`${w.id}\t${w.form}\t${w.xpos}\t${w.deprel}\t→\t${w.head}`);
82
+   }
83
+ }
84
+ ```
85
+
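The `head` and `deprel` values printed by the loop above describe a tree. The sketch below shows one way to rebuild and print that tree from a flat word list. It is self-contained: `TreeWord` is a hypothetical reduced type (the package's `UDWord` carries more fields, and its `id` type is assumed here to be numeric), and the sample annotations are hand-written for illustration rather than actual parser output.

```typescript
// Reduced, hypothetical word type: only the fields this sketch needs.
interface TreeWord {
  id: number;
  form: string;
  head: number; // 0 = ROOT
  deprel: string;
}

// Hand-written annotations for "I love poetry." (illustrative only).
const words: TreeWord[] = [
  { id: 1, form: 'I', head: 2, deprel: 'nsubj' },
  { id: 2, form: 'love', head: 0, deprel: 'root' },
  { id: 3, form: 'poetry', head: 2, deprel: 'obj' },
  { id: 4, form: '.', head: 2, deprel: 'punct' },
];

// Group words by their head ID; key 0 holds the root word.
function childrenByHead(ws: TreeWord[]): Record<number, TreeWord[]> {
  const byHead: Record<number, TreeWord[]> = {};
  for (const w of ws) {
    const siblings = byHead[w.head] ?? [];
    siblings.push(w);
    byHead[w.head] = siblings;
  }
  return byHead;
}

// Render the tree as indented lines, depth-first starting at ROOT (0).
function renderTree(ws: TreeWord[]): string[] {
  const byHead = childrenByHead(ws);
  const lines: string[] = [];
  const visit = (headId: number, indent: string): void => {
    for (const w of byHead[headId] ?? []) {
      lines.push(`${indent}${w.form} (${w.deprel})`);
      visit(w.id, indent + '  ');
    }
  };
  visit(0, '');
  return lines;
}

console.log(renderTree(words).join('\n'));
// love (root)
//   I (nsubj)
//   poetry (obj)
//   . (punct)
```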
86
+ ### 2. Using the CLI Engine (macOS Only Out-of-the-Box)
87
+ To use the native binary runner instead, construct `UDPipe` directly as shown below.
88
+ It defaults to `bin-macos/udpipe` (macOS only) and the English GUM model in the repo root,
89
+ so it will fail on Windows/Linux or in published npm environments unless paths are provided.
90
+
91
+ ```typescript
92
+ import { UDPipe, CliEngine } from 'udpipe-node';
93
+
94
+ const nlp = new UDPipe();
95
+
96
+ const sentences = nlp.parseLines("From fairest creatures we desire increase,\nThat thereby beauty's rose might never die,");
97
+ ```
98
+
99
+ ### 3. Custom Binary and Model Paths
100
+ **Download binaries** for your OS from the [UDPipe GitHub Distribution](https://github.com/ufal/udpipe/releases/tag/v1.4.0). <br>
101
+ **Download models** from the [LINDAT repo](https://lindat.mff.cuni.cz/repository/items/41f05304-629f-4313-b9cf-9eeb0a2ca7c6). <br>
102
+ You can **specify custom paths** for other languages, models, or native executables:
103
+
104
+ ```typescript
105
+ import { UDPipe, CliEngine } from 'udpipe-node';
106
+
107
+ const nlp = new UDPipe({
108
+ // Path to your custom language model downloaded from LINDAT
109
+ modelPath: '/path/to/czech-pdt-ud-2.5-191206.udpipe',
110
+ // Path to your local Windows or Linux udpipe binary
111
+ binaryPath: '/path/to/udpipe'
112
+ });
113
+ ```
114
+
115
+ ## Interactive Parser CLI
116
+
117
+ The package includes a rich, interactive, chalk-colored terminal CLI for querying and testing the parser. To build the codebase and launch the CLI, run:
118
+
119
+ ```bash
120
+ npm start
121
+ ```
122
+
123
+ ### Main Menu Options
124
+ 1. **Verse Mode**: Parse pasted text line-by-line (respects line breaks).
125
+ 2. **Prose Mode**: Parse pasted text with sentence segmentation.
126
+ 3. **Parse from File (Verse Mode)**: Read a text file and parse line-by-line.
127
+ 4. **Parse from File (Prose Mode)**: Read a text file and segment into sentences automatically.
128
+ 5. **Load Model File**: Dynamically load a different `.udpipe` language model file.
129
+ 6. **Information**: Full-screen 3-page reference guide for POS tags and dependencies.
130
+ 7. **Exit**: Terminate the CLI.
131
+
132
+ ### Parsing Navigation Controls
133
+ Once parsing is complete, traverse sentences using the keyboard:
134
+ * **`→` or `Enter`**: Go to the next sentence/line parse.
135
+ * **`←`**: Go to the previous sentence/line.
136
+ * **`↓`**: Open the full-screen Reference Legend overlay.
137
+ * **`ESC` or `q`**: Close parses and return to the main menu.
138
+ * **Scroll-past boundary**: Triggers a `Return to the main menu? (y/n)` prompt.
139
+
140
+ ### Reference Legend Navigation
141
+ Available in option 6 (Information) and the parse overlay (`↓`):
142
+ * **`←` / `→` (or `Enter`)**: Switch between three structured, side-by-side full-screen reference cards:
143
+ * **Page 1**: Universal POS (UPOS) & English Penn Treebank XPOS.
144
+ * **Page 2**: Universal Dependency Relations (deprels) grouped by clausal/nominal/modifier subtypes.
145
+ * **Page 3**: Legacy Stanford relations mapping table & key structural notes.
146
+ * **`↑` (in overlay), `ESC` or `q`**: Return to parses or the main menu.
147
+
148
+ ---
149
+
150
+ ## API Reference
151
+
152
+ | Method | Returns | Description |
153
+ |---|---|---|
154
+ | `parse(text, opts?)` | `UDSentence[]` | Tokenizes, tags, and parses dependencies. |
155
+ | `parseLines(text, opts?)` | `UDSentence[]` | Treats each input line as a single sentence (best for verse). |
156
+ | `tag(text, opts?)` | `UDSentence[]` | Tokenizes and tags parts-of-speech (skips dependency parser). |
157
+ | `conllu(text, opts?)` | `string` | Returns the raw CoNLL-U format output. |
158
+ | `parseConllu(str)` | `UDSentence[]` | Utility to parse any CoNLL-U format string into typed JS objects. |
159
+
160
+ ### Typing Details (`UDWord`)
161
+ Each parsed word carries full UDPipe annotations, including Universal Dependencies tags:
162
+ * `id`: 1-based position of the word within its sentence.
163
+ * `form`: Word form or punctuation symbol.
164
+ * `lemma`: Lemma or stem of the word form.
165
+ * `upos`: Universal part-of-speech tag.
166
+ * `xpos`: Language-specific part-of-speech tag (Penn Treebank for English).
167
+ * `feats` / `featsMap`: Morphological features.
168
+ * `head`: ID of the head word (0 = ROOT).
169
+ * `deprel`: Universal dependency relation (e.g., `nsubj`, `obj`, `root`).
170
+ * `spaceAfter`: Boolean flag indicating whether a whitespace character follows this word.
171
+
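To make the field list above concrete, here is a minimal sketch of how a single CoNLL-U token line maps onto those fields. It is not the package's `parseConllu()` implementation: `ParsedWord`, `parseFeats`, `parseTokenLine`, and `detokenize` are hypothetical names, the `id` and `head` types are assumed to be numeric, and the sketch ignores comment lines, multiword-token ranges (`1-2`), and empty nodes (`1.1`). The sample rows are hand-written for illustration.

```typescript
// Hypothetical type mirroring the fields listed above.
interface ParsedWord {
  id: number;
  form: string;
  lemma: string;
  upos: string;
  xpos: string;
  feats: string;
  featsMap: Record<string, string>;
  head: number;
  deprel: string;
  spaceAfter: boolean;
}

// "Number=Sing|Person=3" -> { Number: "Sing", Person: "3" }; "_" means no features.
function parseFeats(feats: string): Record<string, string> {
  const map: Record<string, string> = {};
  if (feats === '_') return map;
  for (const pair of feats.split('|')) {
    const eq = pair.indexOf('=');
    if (eq > 0) map[pair.slice(0, eq)] = pair.slice(eq + 1);
  }
  return map;
}

// Parse one 10-column CoNLL-U token line:
// ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
function parseTokenLine(line: string): ParsedWord {
  const cols = line.split('\t');
  if (cols.length !== 10) {
    throw new Error(`expected 10 columns, got ${cols.length}`);
  }
  const col = (i: number): string => cols[i] ?? '_';
  return {
    id: Number(col(0)),
    form: col(1),
    lemma: col(2),
    upos: col(3),
    xpos: col(4),
    feats: col(5),
    featsMap: parseFeats(col(5)),
    head: Number(col(6)),
    deprel: col(7),
    // CoNLL-U marks a missing trailing space with SpaceAfter=No in MISC.
    spaceAfter: col(9).split('|').indexOf('SpaceAfter=No') === -1,
  };
}

// Rebuild the surface text from the forms and the spaceAfter flags.
function detokenize(words: ParsedWord[]): string {
  return words.map((w) => w.form + (w.spaceAfter ? ' ' : '')).join('').trim();
}

const sample: ParsedWord[] = [
  '1\tIt\tit\tPRON\tPRP\tCase=Nom|Number=Sing\t2\tnsubj\t_\t_',
  '2\tscans\tscan\tVERB\tVBZ\tNumber=Sing|Person=3\t0\troot\t_\t_',
  '3\twell\twell\tADV\tRB\t_\t2\tadvmod\t_\tSpaceAfter=No',
  '4\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_',
].map(parseTokenLine);

console.log(detokenize(sample)); // It scans well.
```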
172
+ ---
173
+
174
+ ## Development & Building
175
+
176
+ To rebuild the WebAssembly binary from source, ensure you have the Emscripten SDK (`emsdk`) installed and activated, then run:
177
+
178
+ ```bash
179
+ npm run build:wasm
180
+ ```
181
+
182
+ The compiled assets will be written to `runtime/udpipe-wasm.mjs` and `runtime/udpipe-wasm.wasm`.
package/dist/cli.d.ts ADDED
@@ -0,0 +1,2 @@
1
+ export {};
2
+ //# sourceMappingURL=cli.d.ts.map
@@ -0,0 +1 @@
1
+ {"version":3,"file":"cli.d.ts","sourceRoot":"","sources":["../src/cli.ts"],"names":[],"mappings":""}