auralwise_cli 1.0.1 → 1.0.2

package/README.md ADDED
@@ -0,0 +1,255 @@
1
+ # AuralWise CLI
2
+
3
+ Command-line interface for [AuralWise](https://auralwise.cn) Speech Intelligence API.
4
+
5
+ One API call returns transcription, speaker diarization, speaker embeddings, word-level timestamps, and 521-class audio event detection — all at once.
6
+
7
+ ## Features
8
+
9
+ - **Speech Transcription** — 99 languages, with a dedicated Chinese engine (`optimize_zh`) for faster speed and higher accuracy
10
+ - **Speaker Diarization** — Automatic speaker count detection, per-segment speaker labels
11
+ - **Speaker Embeddings** — 192-dim voice print vectors for cross-recording speaker matching
12
+ - **Timestamps** — Word-level (~10ms) or segment-level (~100ms) precision
13
+ - **Audio Event Detection** — 521 AudioSet sound event classes (applause, cough, music, keyboard, etc.)
14
+ - **VAD** — Voice Activity Detection segments
15
+ - **Batch Mode** — Half-price processing using off-peak GPU capacity, delivered within 24h
16
+
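The README does not say how two voice print vectors should be compared for cross-recording matching. A common choice for speaker embeddings is cosine similarity, so the sketch below assumes that metric. The `cosineSimilarity` helper, the tiny 4-dim vectors, and the `0.7` threshold are all illustrative assumptions, not part of `auralwise_cli` or the AuralWise API; real vectors would be 192-dim.

```javascript
// Hypothetical helper: cosine similarity between two embedding vectors.
// Assumption: cosine similarity is a suitable metric; the README does not specify one.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Embeddings must have the same dimension');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Made-up vectors standing in for embeddings from two recordings.
const fromRecordingA = [0.2, 0.1, 0.9, 0.3];
const fromRecordingB = [0.2, 0.1, 0.9, 0.3];

// Illustrative threshold only; a real value would need tuning on your own data.
const SAME_SPEAKER_THRESHOLD = 0.7;
const score = cosineSimilarity(fromRecordingA, fromRecordingB);
console.log(score >= SAME_SPEAKER_THRESHOLD ? 'likely same speaker' : 'likely different speakers');
```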
17
+ ## Installation
18
+
19
+ ```bash
20
+ npm install -g auralwise_cli
21
+ ```
22
+
23
+ Requires Node.js >= 18.
24
+
25
+ ## Quick Start
26
+
27
+ ```bash
28
+ # Set your API key (get one at https://auralwise.cn)
29
+ export AURALWISE_API_KEY=asr_xxxxxxxxxxxxxxxxxxxx
30
+
31
+ # Transcribe from URL — waits for completion and prints results
32
+ auralwise transcribe https://example.com/meeting.mp3
33
+
34
+ # Transcribe a local file (auto base64 upload)
35
+ auralwise transcribe ./recording.wav
36
+
37
+ # Chinese optimization mode (faster, cheaper for Chinese audio)
38
+ auralwise transcribe ./meeting.mp3 --optimize-zh --language zh
39
+
40
+ # Submit without waiting
41
+ auralwise transcribe https://example.com/audio.mp3 --no-wait
42
+
43
+ # Get JSON output
44
+ auralwise transcribe ./audio.mp3 --json --output result.json
45
+ ```
46
+
47
+ ## Commands
48
+
49
+ ### `auralwise transcribe <source>`
50
+
51
+ Submit an audio file for processing. `<source>` can be an HTTP(S) URL or a local file path.
52
+
53
+ **Input modes:**
54
+ - **URL mode** — Pass an `http(s)://...` URL; the GPU node downloads directly
55
+ - **File mode** — Pass a local file path; the CLI reads and uploads as base64
56
+
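Since file mode uploads the audio as base64, the request body is larger than the file on disk. The sketch below estimates that overhead, assuming standard padded base64 (the README does not say which variant the CLI uses). `base64Length` is a hypothetical helper and the zero-filled buffer is a stand-in for real audio bytes.

```javascript
// Hypothetical helper: length in characters of the padded base64 encoding of nBytes bytes.
function base64Length(nBytes) {
  return 4 * Math.ceil(nBytes / 3);
}

// Stand-in for a 3 MB audio file (zero-filled, not real audio).
const fakeAudio = Buffer.alloc(3_000_000);
const encoded = fakeAudio.toString('base64');

// Both numbers agree: base64 adds roughly one third to the payload size.
console.log(encoded.length, base64Length(fakeAudio.length));
```

For large recordings, URL mode avoids this extra transfer because the file is fetched directly from the URL.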
57
+ **Common options:**
58
+
59
+ | Option | Description |
60
+ |--------|-------------|
61
+ | `--language <lang>` | ASR language code (`zh`, `en`, `ja`, ...); auto-detected if omitted |
62
+ | `--optimize-zh` | Use dedicated Chinese engine (faster, cheaper, segment-level timestamps) |
63
+ | `--no-asr` | Disable transcription |
64
+ | `--no-diarize` | Disable speaker diarization |
65
+ | `--no-events` | Disable audio event detection |
66
+ | `--hotwords <words>` | Boost recognition of specific words (comma-separated) |
67
+ | `--num-speakers <n>` | Set fixed number of speakers |
68
+ | `--max-speakers <n>` | Max speakers for auto-detection (default: 10) |
69
+ | `--batch` | Use batch mode (half-price, 24h delivery) |
70
+ | `--no-wait` | Return immediately after task creation |
71
+ | `--json` | Output result as JSON |
72
+ | `--output <file>` | Save result to file |
73
+ | `--callback-url <url>` | Webhook URL for completion notification |
74
+
75
+ **Advanced ASR options:**
76
+
77
+ | Option | Description |
78
+ |--------|-------------|
79
+ | `--beam-size <n>` | Beam search width (default: 5) |
80
+ | `--temperature <n>` | Decoding temperature (default: 0.0) |
81
+ | `--initial-prompt <text>` | Guide transcription style |
82
+ | `--vad-threshold <n>` | VAD sensitivity 0-1 (default: 0.35) |
83
+ | `--events-threshold <n>` | Audio event confidence threshold (default: 0.3) |
84
+ | `--events-classes <list>` | Only detect specific event classes |
85
+
86
+ ### `auralwise tasks`
87
+
88
+ List your tasks with optional filtering.
89
+
90
+ ```bash
91
+ auralwise tasks # List all tasks
92
+ auralwise tasks --status done # Only completed tasks
93
+ auralwise tasks --page 2 --page-size 50 # Pagination
94
+ auralwise tasks --json # JSON output
95
+ ```
96
+
97
+ ### `auralwise task <id>`
98
+
99
+ Get details of a specific task.
100
+
101
+ ```bash
102
+ auralwise task 550e8400-e29b-41d4-a716-446655440000
103
+ auralwise task 550e8400-e29b-41d4-a716-446655440000 --json
104
+ ```
105
+
106
+ ### `auralwise result <id>`
107
+
108
+ Retrieve the full result of a completed task.
109
+
110
+ ```bash
111
+ auralwise result <task-id> # Pretty-printed output
112
+ auralwise result <task-id> --json # JSON output
113
+ auralwise result <task-id> --output result.json # Save to file
114
+ ```
115
+
116
+ ### `auralwise delete <id>`
117
+
118
+ Delete a task and its associated files.
119
+
120
+ ```bash
121
+ auralwise delete <task-id> # With confirmation prompt
122
+ auralwise delete <task-id> --force # Skip confirmation
123
+ ```
124
+
125
+ ### `auralwise events`
126
+
127
+ Browse the 521 AudioSet sound event classes.
128
+
129
+ ```bash
130
+ auralwise events # List all 521 classes
131
+ auralwise events --search Cough # Search by name
132
+ auralwise events --category Music # Filter by category
133
+ auralwise events --json # JSON output
134
+ ```
135
+
136
+ ## Configuration
137
+
138
+ ### API Key
139
+
140
+ Set your API key via `--api-key` flag or environment variable:
141
+
142
+ ```bash
143
+ # Environment variable (recommended)
144
+ export AURALWISE_API_KEY=asr_xxxxxxxxxxxxxxxxxxxx
145
+
146
+ # Or pass directly
147
+ auralwise --api-key asr_xxxx transcribe ./audio.mp3
148
+ ```
149
+
150
+ ### Base URL
151
+
152
+ Override the API endpoint (default: `https://auralwise.cn/api/v1`):
153
+
154
+ ```bash
155
+ auralwise --base-url https://your-private-instance.com/api/v1 transcribe ./audio.mp3
156
+ ```
157
+
158
+ ### Language
159
+
160
+ The CLI supports English and Chinese interfaces:
161
+
162
+ ```bash
163
+ auralwise --locale zh --help # Chinese interface
164
+ auralwise --locale en transcribe --help # English interface (default)
165
+ ```
166
+
167
+ ## Examples
168
+
169
+ ### Meeting transcription with speaker diarization
170
+
171
+ ```bash
172
+ auralwise transcribe ./meeting.mp3 \
173
+ --optimize-zh \
174
+ --language zh \
175
+ --max-speakers 5 \
176
+ --output meeting_result.json
177
+ ```
178
+
179
+ ### Batch processing (half-price)
180
+
181
+ ```bash
182
+ # Submit in batch mode — processed during off-peak hours, 50% discount
183
+ auralwise transcribe https://storage.example.com/archive.mp3 \
184
+ --batch \
185
+ --no-wait \
186
+ --callback-url https://your-server.com/webhook
187
+ ```
188
+
189
+ ### Audio event detection only
190
+
191
+ ```bash
192
+ auralwise transcribe ./audio.mp3 \
193
+ --no-asr \
194
+ --no-diarize \
195
+ --events-classes "Cough,Music,Applause" \
196
+ --json
197
+ ```
198
+
199
+ ### Transcription only (no diarization, no events)
200
+
201
+ ```bash
202
+ auralwise transcribe ./podcast.mp3 \
203
+ --no-diarize \
204
+ --no-events \
205
+ --hotwords "AuralWise,PGPU" \
206
+ --output transcript.json
207
+ ```
208
+
209
+ ## Output Format
210
+
211
+ ### Pretty-printed (default)
212
+
213
+ ```
214
+ Audio Duration: 5.3min
215
+ Language: zh (99%)
216
+ Speakers: 2
217
+
218
+ Transcription
219
+
220
+ [0:00.5 - 0:02.3] SPEAKER_0: This is the first sentence
221
+ [0:02.5 - 0:04.1] SPEAKER_1: And this is the reply
222
+
223
+ Audio Events
224
+
225
+ [0:45.0 - 0:45.9] Cough (87%)
226
+ [1:20.0 - 1:25.0] Music (92%)
227
+
228
+ Speaker Embeddings
229
+
230
+ SPEAKER_0: 25 segments, 192-dim vector
231
+ SPEAKER_1: 18 segments, 192-dim vector
232
+ ```
233
+
234
+ ### JSON (`--json`)
235
+
236
+ Returns the full API response. See [API documentation](https://auralwise.cn/api-docs) for the complete schema.
237
+
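As a rough illustration of working with saved JSON output, the sketch below totals speaking time per speaker. It assumes the JSON has a top-level `diarize_segments` array whose items carry `start`, `end` (in seconds), and `speaker`; those field names come from the `printResult` code in `package/lib/utils.js` in this diff, not from the API schema, so check the API documentation before relying on them. `talkTimeBySpeaker` and the sample data are hypothetical.

```javascript
// Hypothetical helper, not part of auralwise_cli.
// Sums (end - start) for each speaker label found in diarize_segments.
function talkTimeBySpeaker(result) {
  const totals = new Map();
  for (const seg of result.diarize_segments ?? []) {
    const duration = seg.end - seg.start;
    totals.set(seg.speaker, (totals.get(seg.speaker) ?? 0) + duration);
  }
  return totals;
}

// Made-up sample shaped like the fields printResult reads.
const sample = {
  diarize_segments: [
    { start: 0.5, end: 2.5, speaker: 'SPEAKER_0' },
    { start: 2.5, end: 4.0, speaker: 'SPEAKER_1' },
    { start: 4.0, end: 7.0, speaker: 'SPEAKER_0' },
  ],
};

for (const [speaker, seconds] of talkTimeBySpeaker(sample)) {
  console.log(`${speaker}: ${seconds.toFixed(1)}s`);
}
```

With a real result, `sample` would be replaced by the parsed contents of a file written with `--json --output result.json`.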
238
+ ## Pricing
239
+
240
+ | Capability | Standard | Batch (50% off) |
241
+ |-----------|----------|-----------------|
242
+ | Chinese transcription | ¥0.27/hr | ¥0.14/hr |
243
+ | General transcription (with word timestamps) | ¥1.20/hr | ¥0.60/hr |
244
+ | Speaker diarization (labels + embeddings) | +¥0.40/hr | +¥0.20/hr |
245
+ | Audio event detection (521 classes) | +¥0.10/hr | +¥0.05/hr |
246
+
247
+ **Example: 100 hours of Chinese meetings (full features) = ¥39 in batch mode.**
248
+
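The ¥39 figure follows from adding the batch rates for each enabled capability: ¥0.14 + ¥0.20 + ¥0.05 = ¥0.39/hr, times 100 hours. The sketch below reproduces that arithmetic from the table. `estimateCost` and the `RATES` object are hypothetical and only mirror the rates listed above; actual billing may round differently.

```javascript
// Per-hour rates in CNY, copied from the pricing table above.
const RATES = {
  standard: { chinese: 0.27, general: 1.20, diarization: 0.40, events: 0.10 },
  batch: { chinese: 0.14, general: 0.60, diarization: 0.20, events: 0.05 },
};

// Hypothetical helper, not part of auralwise_cli.
// mode: 'standard' | 'batch'; engine: 'chinese' | 'general'.
function estimateCost(hours, { mode = 'standard', engine = 'general', diarize = true, events = true } = {}) {
  const rates = RATES[mode];
  let perHour = rates[engine];
  if (diarize) perHour += rates.diarization;
  if (events) perHour += rates.events;
  // Round to two decimals to hide floating-point noise.
  return Math.round(hours * perHour * 100) / 100;
}

// 100 hours of Chinese audio, all features, batch mode.
console.log(estimateCost(100, { mode: 'batch', engine: 'chinese' })); // prints 39
```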
249
+ ## API Documentation
250
+
251
+ Full API reference: https://auralwise.cn/api-docs
252
+
253
+ ## License
254
+
255
+ MIT
package/bin/auralwise.js CHANGED
@@ -20,7 +20,7 @@ const program = new Command();
20
20
  program
21
21
  .name('auralwise')
22
22
  .description(t('descMain'))
23
- .version('1.0.1')
23
+ .version('1.0.2')
24
24
  .option('--api-key <key>', t('optApiKey'))
25
25
  .option('--base-url <url>', t('optBaseUrl'), 'https://auralwise.cn/api/v1')
26
26
  .option('--locale <locale>', t('optLocale'));
package/lib/i18n.js CHANGED
@@ -98,14 +98,19 @@ const messages = {
98
98
  category: 'Category',
99
99
  noEventsFound: 'No matching events found.',
100
100
 
101
+ // Result display - additional
102
+ vadSegments: 'VAD Segments',
103
+ diarizeSegments: 'Diarize Segments',
104
+ langProb: 'Language Probability',
105
+
101
106
  // Command descriptions
102
- descMain: 'CLI for AuralWise audio intelligence API',
107
+ descMain: 'AuralWise Speech Intelligence API CLI\n\n Transcription, speaker diarization, speaker embeddings,\n word-level timestamps, and 521-class audio event detection\n — all in one API call.',
103
108
  descTranscribe: 'Submit an audio transcription task (URL or local file)',
104
109
  descTasks: 'List tasks',
105
110
  descTask: 'Get task details',
106
111
  descResult: 'Get task result',
107
112
  descDelete: 'Delete a task',
108
- descEvents: 'List audio event classes (521 classes)',
113
+ descEvents: 'List all 521 AudioSet sound event classes',
109
114
  argSource: 'Audio URL or local file path',
110
115
  argTaskId: 'Task ID',
111
116
  },
@@ -208,14 +213,19 @@ const messages = {
208
213
  category: '类别',
209
214
  noEventsFound: '未找到匹配的事件。',
210
215
 
216
+ // Result display - additional
217
+ vadSegments: 'VAD 语音段',
218
+ diarizeSegments: '说话人分离段',
219
+ langProb: '语言置信度',
220
+
211
221
  // Command descriptions
212
- descMain: 'AuralWise 语音智能 API 命令行工具',
222
+ descMain: 'AuralWise 语音智能 API 命令行工具\n\n 转写、说话人分离、声纹向量、词级时间戳、521 类声音事件检测\n —— 一次调用,全部返回。',
213
223
  descTranscribe: '提交音频转写任务(URL 或本地文件)',
214
224
  descTasks: '列出任务',
215
225
  descTask: '查看任务详情',
216
226
  descResult: '获取任务结果',
217
227
  descDelete: '删除任务',
218
- descEvents: '列出声音事件类别(521 类)',
228
+ descEvents: '列出全部 521 类 AudioSet 声音事件',
219
229
  argSource: '音频 URL 或本地文件路径',
220
230
  argTaskId: '任务 ID',
221
231
  },
package/lib/utils.js CHANGED
@@ -85,6 +85,32 @@ export function printResult(result) {
85
85
  console.log(` ${chalk.cyan(spk.speaker_id)}: ${spk.segment_count} ${t('segments')}, ${spk.embedding.length}${t('dimVector')}`);
86
86
  }
87
87
  }
88
+
89
+ if (result.vad_segments && result.vad_segments.length > 0) {
90
+ console.log();
91
+ console.log(chalk.bold.underline(`${t('vadSegments')} (${result.vad_segments.length})`));
92
+ console.log();
93
+ for (const seg of result.vad_segments.slice(0, 20)) {
94
+ const dur = (seg.end - seg.start).toFixed(1);
95
+ console.log(` [${formatTime(seg.start)} - ${formatTime(seg.end)}] ${chalk.dim(`${dur}s`)}`);
96
+ }
97
+ if (result.vad_segments.length > 20) {
98
+ console.log(chalk.dim(` ... +${result.vad_segments.length - 20} more`));
99
+ }
100
+ }
101
+
102
+ if (result.diarize_segments && result.diarize_segments.length > 0) {
103
+ console.log();
104
+ console.log(chalk.bold.underline(`${t('diarizeSegments')} (${result.diarize_segments.length})`));
105
+ console.log();
106
+ for (const seg of result.diarize_segments.slice(0, 20)) {
107
+ const dur = (seg.end - seg.start).toFixed(1);
108
+ console.log(` [${formatTime(seg.start)} - ${formatTime(seg.end)}] ${chalk.cyan(seg.speaker)} ${chalk.dim(`${dur}s`)}`);
109
+ }
110
+ if (result.diarize_segments.length > 20) {
111
+ console.log(chalk.dim(` ... +${result.diarize_segments.length - 20} more`));
112
+ }
113
+ }
88
114
  }
89
115
 
90
116
  export function printTaskDetail(task) {
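The two blocks added to `printResult` share one pattern: print at most 20 segments, then a "+N more" line. One possible way to factor that out is sketched below. `previewLines` is a hypothetical helper and not what the package does; it returns plain strings and leaves `chalk` styling and `formatTime` formatting to the caller.

```javascript
// Hypothetical refactor sketch, not part of auralwise_cli.
// Formats up to `limit` segments and appends a summary line if any were omitted.
function previewLines(segments, formatLine, limit = 20) {
  const lines = segments.slice(0, limit).map(formatLine);
  if (segments.length > limit) {
    lines.push(`  ... +${segments.length - limit} more`);
  }
  return lines;
}

// Made-up data: 25 one-second segments.
const segments = Array.from({ length: 25 }, (_, i) => ({ start: i, end: i + 1 }));
const lines = previewLines(segments, (seg) => `  [${seg.start} - ${seg.end}] ${(seg.end - seg.start).toFixed(1)}s`);
console.log(lines.join('\n'));
```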
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "auralwise_cli",
3
- "version": "1.0.1",
3
+ "version": "1.0.2",
4
4
  "description": "CLI for AuralWise audio intelligence API - transcription, speaker diarization, audio event detection",
5
5
  "type": "module",
6
6
  "bin": {