aiex-cli 0.1.0 → 0.1.1-beta.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md
CHANGED
@@ -38,7 +38,7 @@ aiex watch -s invoice -d ./watch_folder # watch folder daemon for automatic extr
 - **Interactive Mode** — Run `aiex extract` without arguments for a guided extraction workflow
 - **Batch Mode** — `aiex extract -d <dir>` processes entire directories with optional glob filtering
 - **Incremental Extraction** — File hash deduplication skips already-processed files; use `--force` to override
-- **Data
+- **Web Data Export** — Export SQLite table data to CSV, Excel (.xlsx), or JSON from the Web UI
 - **Notion Sync** — Optionally sync CLI extraction results to configured Notion data sources
 - **Extraction Audit Trail** — Every extraction is recorded with status, input source, output path, token usage, database inserts, Notion pages, and errors
 - **Built-in Model Registry** — Knows capabilities of 2000+ models (vision, structured output) so you don't have to guess
@@ -73,10 +73,6 @@ aiex extract -s <schema> -f <file> --no-insert # extract and save JSON witho
 aiex extract -s <schema> -f <file> --force # force re-extraction even if already processed
 aiex extract -s <schema> -d <directory> # batch extract all supported files in a directory
 aiex extract -s <schema> -d <dir> -g "*.pdf" # batch with glob filter
-aiex extract history # list extraction audit records
-aiex extract show <audit-id> # show full audit record JSON
-aiex extract retry <audit-id> # retry a previous extraction
-aiex extract rm <audit-id> # delete an audit record and cached upload
 ```
 
 The AI reads your document and outputs structured JSON matching your schema.
@@ -89,33 +85,21 @@ aiex extract -s paper -f research.pdf --no-insert # save result only, skip data
 aiex extract -s paper -f research.pdf -m gpt-4o # use a specific model
 aiex extract -s paper -f research.pdf --force # force re-extraction even if already processed
 aiex extract -s paper -d ./papers -g "*.pdf" # batch extract PDFs from a directory
-aiex extract history # inspect recent extraction runs
 ```
 Saves the extracted result to `.aiex/extracted/<schema-name>-<timestamp>.json` with fields like `title`, `firstAuthor`, `journal`, `year` — exactly as defined in your schema. Data is automatically inserted into the SQLite database.
 
 By default, aiex automatically selects a model based on your input type (vision-capable for images, structured output for text). Use `--model` / `-m` to override and specify any model from your AI configuration.
 
-Every extraction is also recorded under `.aiex/extracted/_audit/`. Audit records include the run status (`running`, `succeeded`, `failed`, or `stale`), schema name, input source, output file, token usage, inserted table rows, synced Notion pages, retry lineage, and error message.
+Every extraction is also recorded under `.aiex/extracted/_audit/`. Audit records include the run status (`running`, `succeeded`, `failed`, or `stale`), schema name, input source, output file, token usage, inserted table rows, synced Notion pages, retry lineage, and error message. Use the Web UI to inspect, retry, or delete extraction records.
 
 ### 4. Watch Folder Daemon (Auto-Extraction)
 
 ```bash
+aiex watch
 aiex watch -s <schema> -d <folder>
 ```
 
-Runs a background watcher daemon to monitor a folder for new incoming files (such as scanned documents or downloads), automatically performing offline data extraction, database insertion, and system notifications.
-
-### 5. Dump Data
-
-```bash
-aiex dump -s <schema> # dump to CSV (default)
-aiex dump -s <schema> -f xlsx -o output.xlsx # dump to Excel
-aiex dump -t <table> -f csv -o output.csv # dump a specific table by name
-```
-
-Dumps all extracted data for a given schema (or table) from the SQLite database to CSV or Excel format.
-
-<br>
+Runs a background watcher daemon to monitor a folder for new incoming files (such as scanned documents or downloads), automatically performing offline data extraction, database insertion, and system notifications. Run without arguments to choose a schema, watch directory, model, and insert mode interactively.
 
 ## 📖 Commands
 
@@ -131,15 +115,9 @@ Dumps all extracted data for a given schema (or table) from the SQLite database
 | `aiex extract -s <name> -f <file> --force` | Force re-extraction even if the file has already been processed |
 | `aiex extract -s <name> -d <dir>` | Batch extract all supported files in a directory |
 | `aiex extract -s <name> -d <dir> -g "*.pdf"` | Batch extract with glob filter |
-| `aiex
-| `aiex extract show <audit-id>` | Show a full extraction audit record |
-| `aiex extract retry <audit-id>` | Retry a previous extraction run |
-| `aiex extract retry <audit-id> --no-insert` | Retry without inserting into SQLite |
-| `aiex extract rm <audit-id>` | Delete an audit record and its cached upload |
+| `aiex watch` | Guided setup for watching a directory and automatically extracting new files |
 | `aiex watch -s <name> -d <dir>` | Watch a directory for new files and automatically extract data |
 | `aiex watch -s <name> -d <dir> --no-insert` | Watch and save JSON without inserting into SQLite |
-| `aiex dump -s <name>` | Dump extracted data for a schema to CSV |
-| `aiex dump -s <name> -f xlsx -o <file>` | Dump to Excel (.xlsx) |
 | `aiex doctor` | System and configuration diagnostics |
 | `aiex completion bash\|zsh\|fish` | Generate shell completion scripts |
 
package/dist/cli.mjs
CHANGED
@@ -1,4 +1,4 @@
-import { A as package_default, C as DEFAULT_PROMPT_CONFIG, D as seedConfig, E as createConfig, O as description, S as DEFAULT_MINERU_CONFIG, T as PLACEHOLDER_TEXT, _ as doctorDiagnosticsSeverityRows, a as recognizeImageText, b as DEFAULT_LITEPARSE_CONFIG, c as t, d as writeAIConfig, f as AIConfigSchema, h as toSnakeCase, j as version, k as name, l as getDefaultAIConfig, m as parseJsonSchema, n as collectDoctorDiagnostics, o as shouldUseImageOcrFallback, p as JsonSchemaDefinitionSchema, r as createMigrationConfig, s as initI18n, t as generateDrizzleSchema, u as readAIConfig, v as doctorDiagnosticsTableRows, w as PLACEHOLDER_SCHEMA, x as DEFAULT_MINERU_API_CONFIG, y as formatDoctorDiagnosticsJson } from "./generate-drizzle-schema-
+import { A as package_default, C as DEFAULT_PROMPT_CONFIG, D as seedConfig, E as createConfig, O as description, S as DEFAULT_MINERU_CONFIG, T as PLACEHOLDER_TEXT, _ as doctorDiagnosticsSeverityRows, a as recognizeImageText, b as DEFAULT_LITEPARSE_CONFIG, c as t, d as writeAIConfig, f as AIConfigSchema, h as toSnakeCase, j as version, k as name, l as getDefaultAIConfig, m as parseJsonSchema, n as collectDoctorDiagnostics, o as shouldUseImageOcrFallback, p as JsonSchemaDefinitionSchema, r as createMigrationConfig, s as initI18n, t as generateDrizzleSchema, u as readAIConfig, v as doctorDiagnosticsTableRows, w as PLACEHOLDER_SCHEMA, x as DEFAULT_MINERU_API_CONFIG, y as formatDoctorDiagnosticsJson } from "./generate-drizzle-schema-CaSMqQWx.mjs";
 import { createRequire } from "node:module";
 import fs from "node:fs/promises";
 import os from "node:os";
@@ -13,13 +13,12 @@ import { defineCommand, runMain } from "citty";
 import { consola } from "consola";
 import updateNotifier from "update-notifier";
 import CliTable3 from "cli-table3";
-import fs$1 from "node:fs";
 import { confirm, intro, isCancel, outro, select, spinner, text } from "@clack/prompts";
 import pc from "picocolors";
-import
-import * as XLSX from "xlsx";
+import fs$1 from "node:fs";
 import { TextDecoder, promisify } from "node:util";
 import { fileTypeFromBuffer, fileTypeFromFile } from "file-type";
+import { Buffer } from "node:buffer";
 import { glob, globSync } from "tinyglobby";
 import { extractText, getDocumentProxy, getMeta } from "unpdf";
 import AdmZip from "adm-zip";
@@ -144,260 +143,9 @@ const doctorCommand = defineCommand({
 }
 });
 
-//#endregion
-//#region src/application/export/export-manager.ts
-function formatRowsConformingToSchema(rows, columns, schema, format) {
-return rows.map((row) => {
-const newRow = {};
-columns.forEach((col) => {
-const colName = col.name;
-const val = row[colName];
-const type = (schema?.properties?.[colName])?.type || "";
-if (val === null || val === void 0) newRow[colName] = "";
-else if (type === "boolean") if (format === "xlsx") newRow[colName] = val === 1 || val === "1" || val === true;
-else newRow[colName] = val === 1 || val === "1" || val === true ? "true" : "false";
-else if (type === "number" || type === "integer") if (val === "") newRow[colName] = "";
-else {
-const num = Number(val);
-newRow[colName] = Number.isNaN(num) ? val : num;
-}
-else if (typeof val === "object") newRow[colName] = JSON.stringify(val);
-else {
-const dbType = (col.type || "").toLowerCase();
-if ((dbType.includes("int") || dbType.includes("real") || dbType.includes("num") || dbType.includes("double") || dbType.includes("float")) && typeof val === "string" && val !== "") {
-const num = Number(val);
-newRow[colName] = Number.isNaN(num) ? val : num;
-} else newRow[colName] = val;
-}
-});
-return newRow;
-});
-}
-function generateExportBuffer(tableName, formattedRows, columns, format) {
-const ws = XLSX.utils.json_to_sheet(formattedRows, { header: columns.map((col) => col.name) });
-if (format === "xlsx") {
-const wb = XLSX.utils.book_new();
-XLSX.utils.book_append_sheet(wb, ws, tableName.slice(0, 31));
-return XLSX.write(wb, {
-bookType: "xlsx",
-type: "buffer"
-});
-} else {
-const csv = XLSX.utils.sheet_to_csv(ws);
-return Buffer.from("" + csv, "utf8");
-}
-}
-
-//#endregion
-//#region src/application/schema/load-schema.ts
-const JSON_EXT_RE$1 = /\.json$/;
-async function loadSchema(config, schemaName) {
-const schemaPath = path.join(config.schemaPath, `${schemaName}.json`);
-try {
-const parsed = await readFile(schemaPath);
-return { schema: JsonSchemaDefinitionSchema.parse(parsed) };
-} catch (e) {
-if (e instanceof ZodError) return {
-schema: null,
-error: t("errors.schema.validationFailed", {
-name: `${schemaName}.json`,
-issues: e.issues.map((i) => ` - ${i.path.join(".")}: ${i.message}`).join("\n")
-})
-};
-if (e.code === "ENOENT") return {
-schema: null,
-error: t("errors.schema.cannotRead", { name: `${schemaName}.json` })
-};
-if (e instanceof SyntaxError) return {
-schema: null,
-error: t("errors.schema.invalidJson", { name: `${schemaName}.json` })
-};
-return {
-schema: null,
-error: String(e)
-};
-}
-}
-async function listSchemas(aiexDir) {
-try {
-const dir = path.join(aiexDir, "schema");
-return (await fs.readdir(dir)).filter((f) => f.endsWith(".json")).map((f) => f.replace(JSON_EXT_RE$1, "")).sort();
-} catch {
-return [];
-}
-}
-
-//#endregion
-//#region src/commands/utils.ts
-function failCommand(message) {
-if (message) consola.error(message);
-outro(t("common.failed"));
-process.exitCode = 1;
-}
-
-//#endregion
-//#region src/commands/dump.ts
-const dumpCommand = defineCommand({
-meta: {
-name: "dump",
-description: t("command.dump.description")
-},
-args: {
-table: {
-type: "string",
-alias: "t",
-description: t("command.dump.args.table")
-},
-schema: {
-type: "string",
-alias: "s",
-description: t("command.dump.args.schema")
-},
-format: {
-type: "string",
-alias: "f",
-description: t("command.dump.args.format")
-},
-output: {
-type: "string",
-alias: "o",
-description: t("command.dump.args.output")
-}
-},
-async run({ args }) {
-intro(pc.inverse(" aiex dump "));
-await initI18n();
-if (!args.table && !args.schema) {
-failCommand(t("command.dump.errors.tableOrSchemaRequired"));
-return;
-}
-const cwd = process.cwd();
-const config = createMigrationConfig(cwd);
-const schemaDir = config.schemaPath;
-let tableName = args.table || "";
-let schema = null;
-if (args.schema) {
-const schemaLoad = await loadSchema(config, args.schema);
-if (!schemaLoad.schema) {
-failCommand(schemaLoad.error || t("command.dump.errors.schemaNotFound", { name: args.schema }));
-return;
-}
-schema = schemaLoad.schema;
-const tName = schema.table?.name;
-if (!tName) {
-failCommand(t("command.dump.errors.noTableName", { name: args.schema }));
-return;
-}
-if (tableName && tableName !== tName) {
-failCommand(t("command.dump.errors.tableMismatch", {
-table: tableName,
-schemaTable: tName
-}));
-return;
-}
-tableName = tName;
-} else try {
-if (fs$1.existsSync(schemaDir)) {
-const files = fs$1.readdirSync(schemaDir).filter((f) => f.endsWith(".json"));
-for (const file of files) {
-const s$1 = await readFile(path.join(schemaDir, file));
-if (s$1.table?.name === tableName) {
-schema = s$1;
-break;
-}
-}
-}
-} catch {}
-let format = args.format?.toLowerCase();
-const outputPathArg = args.output;
-if (outputPathArg) {
-const ext = path.extname(outputPathArg).toLowerCase();
-if (!format) {
-if (ext === ".xlsx") format = "xlsx";
-else if (ext === ".csv") format = "csv";
-}
-}
-if (!format) format = "csv";
-if (format !== "csv" && format !== "xlsx") {
-failCommand(t("command.dump.errors.unsupportedFormat", { format }));
-return;
-}
-const resolvedOutput = outputPathArg ? path.resolve(outputPathArg) : path.resolve(cwd, `${tableName}.${format}`);
-if (!fs$1.existsSync(config.databasePath)) {
-failCommand(t("command.dump.errors.dbNotFound", {
-path: config.databasePath,
-cmd: "aiex schema"
-}));
-return;
-}
-const s = spinner();
-s.start(t("command.dump.loading", { name: tableName }));
-let columns = [];
-let rows = [];
-try {
-const db = new Database(config.databasePath, { readonly: true });
-if (!db.prepare(`
-select name from sqlite_master
-where type = 'table' and name = ?
-`).get(tableName)) {
-s.stop(t("command.dump.dbQueryFailed"));
-failCommand(t("command.dump.errors.tableNotFound", {
-name: tableName,
-cmd: "aiex schema"
-}));
-db.close();
-return;
-}
-columns = db.pragma(`table_info(${tableName})`);
-rows = db.prepare(`select * from ${tableName}`).all();
-db.close();
-} catch (error) {
-s.stop(t("command.dump.dbQueryFailed"));
-failCommand(error instanceof Error ? error.message : String(error));
-return;
-}
-if (rows.length === 0) {
-s.stop(t("command.dump.emptyTable"));
-consola.warn(t("command.dump.errors.tableEmpty", { name: tableName }));
-} else s.stop(t("command.dump.loaded", { count: rows.length }));
-const s2 = spinner();
-s2.start(t("command.dump.formatting"));
-let formattedRows;
-try {
-formattedRows = formatRowsConformingToSchema(rows, columns, schema, format);
-s2.stop(t("command.dump.formatted"));
-} catch (error) {
-s2.stop(t("command.dump.dbQueryFailed"));
-failCommand(error instanceof Error ? error.message : String(error));
-return;
-}
-const s3 = spinner();
-s3.start(t("command.dump.writing", {
-format: format.toUpperCase(),
-path: resolvedOutput
-}));
-try {
-const buffer = generateExportBuffer(tableName, formattedRows, columns, format);
-const outputDir = path.dirname(resolvedOutput);
-if (!fs$1.existsSync(outputDir)) fs$1.mkdirSync(outputDir, { recursive: true });
-fs$1.writeFileSync(resolvedOutput, buffer);
-s3.stop(t("command.dump.dumpCompleted"));
-consola.success(t("command.dump.successMsg", {
-count: rows.length,
-path: pc.cyan(resolvedOutput)
-}));
-} catch (error) {
-s3.stop(t("command.dump.fileWriteFailed"));
-failCommand(error instanceof Error ? error.message : String(error));
-return;
-}
-outro(t("common.done"));
-}
-});
-
 //#endregion
 //#region src/application/extraction/quality.ts
-function formatInputProcessing
+function formatInputProcessing(input) {
 const handler = input.converter ? `${input.handler}(${input.converter})` : input.handler;
 return `${input.mime ?? input.kind} -> ${handler}`;
 }
@@ -2166,6 +1914,45 @@ Please output the corrected JSON object now:`;
 }
 }
 
+//#endregion
+//#region src/application/schema/load-schema.ts
+const JSON_EXT_RE$1 = /\.json$/;
+async function loadSchema(config, schemaName) {
+const schemaPath = path.join(config.schemaPath, `${schemaName}.json`);
+try {
+const parsed = await readFile(schemaPath);
+return { schema: JsonSchemaDefinitionSchema.parse(parsed) };
+} catch (e) {
+if (e instanceof ZodError) return {
+schema: null,
+error: t("errors.schema.validationFailed", {
+name: `${schemaName}.json`,
+issues: e.issues.map((i) => ` - ${i.path.join(".")}: ${i.message}`).join("\n")
+})
+};
+if (e.code === "ENOENT") return {
+schema: null,
+error: t("errors.schema.cannotRead", { name: `${schemaName}.json` })
+};
+if (e instanceof SyntaxError) return {
+schema: null,
+error: t("errors.schema.invalidJson", { name: `${schemaName}.json` })
+};
+return {
+schema: null,
+error: String(e)
+};
+}
+}
+async function listSchemas(aiexDir) {
+try {
+const dir = path.join(aiexDir, "schema");
+return (await fs.readdir(dir)).filter((f) => f.endsWith(".json")).map((f) => f.replace(JSON_EXT_RE$1, "")).sort();
+} catch {
+return [];
+}
+}
+
 //#endregion
 //#region src/infrastructure/extraction/insert-extracted-data.ts
 function convertValue(value, column) {
@@ -2511,7 +2298,7 @@ async function runAuditedExtraction(options) {
 filePath = input.filePath;
 inputProcessing = input.inputProcessing;
 inputQuality = input.quality;
-if (!quiet) consola.info(`Input: ${formatInputProcessing
+if (!quiet) consola.info(`Input: ${formatInputProcessing(inputProcessing)}`);
 await updateExtractionAuditRecord(aiexDir, audit.id, {
 inputProcessing,
 quality: inputQuality
@@ -2719,36 +2506,15 @@ async function runBatchExtraction(aiexDir, config, aiConfig, schemaName, dir, gl
 }
 
 //#endregion
-//#region src/commands/
-function
-if (
-
-
-}
-function isExtractSubCommand(rawArgs) {
-if (!Array.isArray(rawArgs)) return false;
-return rawArgs.some((arg) => typeof arg === "string" && [
-"history",
-"show",
-"retry",
-"rm"
-].includes(arg));
-}
-function formatSource(source) {
-return source.type === "file" ? source.fileName || "file" : "unknown";
-}
-function formatInputProcessing(input) {
-if (!input) return "";
-const handler = input.converter ? `${input.handler}(${input.converter})` : input.handler;
-return ` [${input.mime ?? input.kind} -> ${handler}]`;
-}
-function formatQuality(quality, failureStage) {
-if (failureStage) return ` [failed:${failureStage}]`;
-if (quality?.input?.pdf) return ` [pdf:${quality.input.pdf.pageCount}p/${quality.input.pdf.textLength}chars${quality.input.pdf.fallbackUsed ? "/fallback" : ""}]`;
-if (quality?.input?.ocr) return ` [ocr:${Math.round(quality.input.ocr.confidence * 100)}%/${quality.input.ocr.textLength}chars]`;
-if (quality?.ai?.missingFieldRate !== void 0) return ` [missing:${Math.round(quality.ai.missingFieldRate * 100)}%]`;
-return "";
+//#region src/commands/utils.ts
+function failCommand(message) {
+if (message) consola.error(message);
+outro(t("common.failed"));
+process.exitCode = 1;
 }
+
+//#endregion
+//#region src/commands/extract.ts
 async function loadConfiguredAI(aiexDir) {
 const aiConfig = await readAIConfig(aiexDir);
 if (!aiConfig) {
@@ -2777,143 +2543,11 @@ function resolveModelOverride(aiConfig, modelName) {
 }
 return matched;
 }
-const historyCommand = defineCommand({
-meta: {
-name: "history",
-description: t("command.extract.history.description")
-},
-async run() {
-const config = createMigrationConfig(process.cwd());
-const records = await listExtractionAuditRecords(path.dirname(config.schemaPath));
-if (records.length === 0) {
-consola.info(t("command.extract.history.empty"));
-return;
-}
-for (const record of records) {
-const suffix = record.error ? ` — ${record.error}` : record.outputName ? ` — ${record.outputName}` : "";
-consola.info(`${record.status.padEnd(9)} ${record.id} ${record.schemaName} ${formatSource(record.source)}${formatInputProcessing(record.inputProcessing)}${formatQuality(record.quality, record.failureStage)}${suffix}`);
-}
-}
-});
-const showCommand = defineCommand({
-meta: {
-name: "show",
-description: t("command.extract.show.description")
-},
-args: { id: {
-type: "string",
-description: t("command.extract.show.args.id")
-} },
-async run({ args }) {
-const id = getIdArg(args);
-if (!id) {
-failCommand(t("command.extract.history.errors.idRequired"));
-return;
-}
-const config = createMigrationConfig(process.cwd());
-const record = await readExtractionAuditRecord(path.dirname(config.schemaPath), id);
-if (!record) {
-failCommand(t("command.extract.history.errors.recordNotFound", { id }));
-return;
-}
-consola.info(JSON.stringify(record, null, 2));
-}
-});
-const retryCommand = defineCommand({
-meta: {
-name: "retry",
-description: t("command.extract.retry.description")
-},
-args: {
-id: {
-type: "string",
-description: t("command.extract.retry.args.id")
-},
-noInsert: {
-type: "boolean",
-description: t("command.extract.retry.args.noInsert"),
-default: false
-}
-},
-async run({ args }) {
-intro(pc.inverse(" aiex extract retry "));
-await initI18n();
-const id = getIdArg(args);
-if (!id) {
-failCommand(t("command.extract.history.errors.idRequired"));
-return;
-}
-const config = createMigrationConfig(process.cwd());
-const aiexDir = path.dirname(config.schemaPath);
-const record = await readExtractionAuditRecord(aiexDir, id);
-if (!record) {
-failCommand(t("command.extract.history.errors.recordNotFound", { id }));
-return;
-}
-const aiConfig = await loadConfiguredAI(aiexDir);
-if (!aiConfig) return;
-const modelOverride = resolveModelOverride(aiConfig, record.modelName);
-if (modelOverride === null) return;
-try {
-const result = await runAuditedExtraction({
-aiexDir,
-config,
-aiConfig,
-schemaName: record.schemaName,
-source: record.source,
-modelOverride,
-retryOf: record.id,
-insert: !args.noInsert,
-force: true
-});
-if (!result.success) {
-failCommand(result.error);
-return;
-}
-outro(t("common.done"));
-} catch (error) {
-if (isMissingUploadFileError(error)) {
-failCommand(MISSING_UPLOAD_FILE_TEXT);
-return;
-}
-failCommand(error instanceof Error ? error.message : String(error));
-}
-}
-});
-const rmCommand = defineCommand({
-meta: {
-name: "rm",
-description: t("command.extract.rm.description")
-},
-args: { id: {
-type: "string",
-description: t("command.extract.rm.args.id")
-} },
-async run({ args }) {
-const id = getIdArg(args);
-if (!id) {
-failCommand(t("command.extract.history.errors.idRequired"));
-return;
-}
-const config = createMigrationConfig(process.cwd());
-if (!await deleteExtractionAuditRecord(path.dirname(config.schemaPath), id)) {
-failCommand(t("command.extract.history.errors.recordNotFound", { id }));
-return;
-}
-consola.success(t("command.extract.history.deleted", { id }));
-}
-});
 const extractCommand = defineCommand({
 meta: {
 name: "extract",
 description: t("command.extract.description")
 },
-subCommands: {
-history: historyCommand,
-show: showCommand,
-retry: retryCommand,
-rm: rmCommand
-},
 args: {
 schema: {
 type: "string",
@@ -2951,8 +2585,7 @@ const extractCommand = defineCommand({
 default: false
 }
 },
-async run({ args
-if (isExtractSubCommand(rawArgs)) return;
+async run({ args }) {
 intro(pc.inverse(" aiex extract "));
 await initI18n();
 const config = createMigrationConfig(process.cwd());
@@ -3033,7 +2666,7 @@ async function runInteractive(aiexDir, config, aiConfig, modelOverride) {
 }))
 });
 if (isCancel(schemaName)) {
-cancel(t("common.cancelled"));
+cancel$1(t("common.cancelled"));
 return false;
 }
 const inputSource = await select({
@@ -3049,7 +2682,7 @@ async function runInteractive(aiexDir, config, aiConfig, modelOverride) {
 }]
 });
 if (isCancel(inputSource)) {
-cancel(t("common.cancelled"));
+cancel$1(t("common.cancelled"));
 return false;
 }
 if (inputSource === "file") {
@@ -3060,7 +2693,7 @@ async function runInteractive(aiexDir, config, aiConfig, modelOverride) {
 }
 });
 if (isCancel(filePathStr)) {
-cancel(t("common.cancelled"));
+cancel$1(t("common.cancelled"));
 return false;
 }
 const fp = filePathStr;
@@ -3069,7 +2702,7 @@ async function runInteractive(aiexDir, config, aiConfig, modelOverride) {
 initialValue: false
 });
 if (isCancel(force)) {
-cancel(t("common.cancelled"));
+cancel$1(t("common.cancelled"));
 return false;
 }
 return (await runAuditedExtraction({
@@ -3092,7 +2725,7 @@ async function runInteractive(aiexDir, config, aiConfig, modelOverride) {
 }
 });
 if (isCancel(dirPath)) {
-cancel(t("common.cancelled"));
+cancel$1(t("common.cancelled"));
 return false;
 }
 const force = await confirm({
@@ -3100,7 +2733,7 @@ async function runInteractive(aiexDir, config, aiConfig, modelOverride) {
 initialValue: false
 });
 if (isCancel(force)) {
-cancel(t("common.cancelled"));
+cancel$1(t("common.cancelled"));
 return false;
 }
 const result = await runBatchExtraction(aiexDir, config, aiConfig, schemaName, dirPath, void 0, modelOverride, { force });
@@ -3109,7 +2742,7 @@ async function runInteractive(aiexDir, config, aiConfig, modelOverride) {
 }
 return false;
 }
-function cancel(msg) {
+function cancel$1(msg) {
 consola.info(msg);
 outro(t("common.cancelled"));
 process.exitCode = 0;
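In this build the watch region gains its own top-level `function cancel(msg)`, and the existing extract helper is renamed to `cancel$1` together with its call sites in `runInteractive`. This looks like the bundler deduplicating names because both regions share one flattened module scope; that is an inference from the diff, not something the package states. A minimal sketch of why the rename matters, assuming a classic (sloppy-mode) script: two same-named function declarations are both hoisted and the later one silently wins. In ES module code the duplicate would instead be an early SyntaxError. The `mergedSource` string is hypothetical.

```javascript
const vm = require("node:vm");

// Hypothetical source in which two inlined modules both declare `cancel`
// at the top level, as would happen without the `cancel$1` rename.
const mergedSource = `
  function cancel() { return "extract cancel"; }
  function cancel() { return "watch cancel"; }
  cancel();
`;

// runInNewContext evaluates the code as a script (not a module) and returns
// the completion value of its last expression statement.
const winner = vm.runInNewContext(mergedSource);
console.log(winner); // prints "watch cancel": the later declaration overwrote the earlier one
```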
@@ -3549,6 +3182,14 @@ const watchCommand = defineCommand({
 async run({ args }) {
 intro(pc.inverse(" aiex watch "));
 await initI18n();
+const config = createMigrationConfig(process.cwd());
+const aiexDir = path.dirname(config.schemaPath);
+if (!args.schema && !args.dir && !args.model) {
+const aiConfig$1 = await loadConfiguredAI(aiexDir);
+if (!aiConfig$1) return;
+if (await runInteractiveWatch(aiexDir, config, aiConfig$1)) consola.info(t("command.watch.events.pressCtrlC"));
+return;
+}
 if (!args.schema) {
 failCommand(t("command.watch.errors.schemaRequired"));
 return;
@@ -3557,33 +3198,18 @@ const watchCommand = defineCommand({
 failCommand(t("command.watch.errors.dirRequired"));
 return;
 }
-const config = createMigrationConfig(process.cwd());
-const aiexDir = path.dirname(config.schemaPath);
 const schemaLoad = await loadSchema(config, args.schema);
 if (!schemaLoad.schema) {
 failCommand(schemaLoad.error || t("command.watch.errors.schemaNotFound", { name: args.schema }));
 return;
 }
-let watchDirStat;
-try {
-watchDirStat = fs$1.statSync(args.dir);
-} catch (e) {
-failCommand(t("command.watch.errors.dirNotExist", {
-dir: args.dir,
-error: e instanceof Error ? e.message : String(e)
-}));
-return;
-}
-if (!watchDirStat.isDirectory()) {
-failCommand(t("command.watch.errors.notADirectory", { dir: args.dir }));
-return;
-}
 const watchDirAbs = path.resolve(args.dir);
+if (!validateWatchDir(watchDirAbs)) return;
 const aiConfig = await loadConfiguredAI(aiexDir);
 if (!aiConfig) return;
 const modelOverride = resolveModelOverride(aiConfig, args.model);
 if (modelOverride === null) return;
-
+registerCleanup(startWatch({
 aiexDir,
 config,
 aiConfig,
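The inline `fs$1.statSync` check removed here moves into `validateWatchDir`, which the interactive path also calls. One behavioral detail: the old code passed the raw `args.dir` to `statSync`, while the new code checks `path.resolve(args.dir)`, so the `{{dir}}` shown in `dirNotExist` and `notADirectory` errors is now the absolute path. Below is a standalone sketch of the same check; `validateWatchDirSketch`, its `onError` callback (standing in for `failCommand(t(...))`), and the message text are illustrative, not aiex's API.

```javascript
const fs = require("node:fs");

// Stat the path, report a missing path or a non-directory through `onError`,
// and return a boolean so callers can bail out early.
function validateWatchDirSketch(dir, onError = console.error) {
  let stat;
  try {
    stat = fs.statSync(dir);
  } catch (e) {
    // statSync throws (for example with ENOENT) when the path does not exist.
    onError(`Directory does not exist: ${dir} (${e instanceof Error ? e.message : String(e)})`);
    return false;
  }
  if (!stat.isDirectory()) {
    onError(`Not a directory: ${dir}`);
    return false;
  }
  return true;
}
```

Returning a boolean instead of exiting keeps the helper usable from both callers: the flag-driven `run` simply returns, while `runInteractiveWatch` returns `false` so the "press Ctrl+C" hint is not printed.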
@@ -3591,18 +3217,112 @@ const watchCommand = defineCommand({
 watchDir: watchDirAbs,
 modelOverride,
 insert: !args.noInsert
-});
-const cleanup = async () => {
-consola.info(t("command.watch.events.stopped"));
-await watcher.close();
-consola.success(t("command.watch.events.stoppedOk"));
-process.exit(0);
-};
-process.on("SIGINT", cleanup);
-process.on("SIGTERM", cleanup);
+}));
 consola.info(t("command.watch.events.pressCtrlC"));
 }
 });
+async function runInteractiveWatch(aiexDir, config, aiConfig) {
+const schemas = await listSchemas(aiexDir);
+if (schemas.length === 0) {
+failCommand(t("command.extract.errors.noSchemas", {
+path: pc.cyan(".aiex/schema/"),
+cmd: pc.cyan("aiex web")
+}));
+return false;
+}
+const schemaName = await select({
+message: t("command.watch.interactive.selectSchema"),
+options: schemas.map((s) => ({
+label: s,
+value: s
+}))
+});
+if (isCancel(schemaName)) {
+cancel(t("common.cancelled"));
+return false;
+}
+const dirPath = await text({
+message: t("command.watch.interactive.enterDirPath"),
+validate(value) {
+if (!value || value.trim().length === 0) return t("command.watch.interactive.dirPathRequired");
+}
+});
+if (isCancel(dirPath)) {
+cancel(t("common.cancelled"));
+return false;
+}
+const selectedModel = await select({
+message: t("command.watch.interactive.selectModel"),
+options: [{
+label: t("command.watch.interactive.autoModel"),
+value: ""
+}, ...aiConfig.provider.models.map((model) => ({
+label: model.name,
+value: model.name
+}))]
+});
+if (isCancel(selectedModel)) {
+cancel(t("common.cancelled"));
+return false;
+}
+const noInsert = await confirm({
+message: t("command.watch.interactive.askNoInsert"),
+initialValue: false
+});
+if (isCancel(noInsert)) {
+cancel(t("common.cancelled"));
+return false;
+}
+const watchDir = path.resolve(dirPath);
+if (!validateWatchDir(watchDir)) return false;
+const modelOverride = resolveModelOverride(aiConfig, selectedModel ? selectedModel : void 0);
+if (modelOverride === null) return false;
+registerCleanup(startWatch({
+aiexDir,
+config,
+aiConfig,
+schemaName,
+watchDir,
+modelOverride,
+insert: !noInsert
+}));
+return true;
+}
+function validateWatchDir(dir) {
+let watchDirStat;
+try {
+watchDirStat = fs$1.statSync(dir);
+} catch (e) {
+failCommand(t("command.watch.errors.dirNotExist", {
+dir,
+error: e instanceof Error ? e.message : String(e)
+}));
+return false;
+}
+if (!watchDirStat.isDirectory()) {
+failCommand(t("command.watch.errors.notADirectory", { dir }));
+return false;
+}
+return true;
+}
+function startWatch(options) {
+return startWatcher(options);
+}
+function registerCleanup(watcher) {
+const cleanup = async () => {
+consola.info(t("command.watch.events.stopped"));
+await watcher.close();
+consola.success(t("command.watch.events.stoppedOk"));
+process.exit(0);
+};
+process.on("SIGINT", cleanup);
+process.on("SIGTERM", cleanup);
+}
+function cancel(msg) {
+consola.info(msg);
+outro(t("common.cancelled"));
+process.exitCode = 0;
+}
 
 //#endregion
 //#region src/domain/ai-extraction/model-capabilities.json
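Previously the SIGINT/SIGTERM handler was defined inline and closed over a local `watcher`; `registerCleanup` now receives the watcher returned by `startWatch(...)`, so the flag-driven and interactive paths share one shutdown routine. The sketch below mirrors that routine but makes the signal source, exit function, and logger injectable (hypothetical parameters, not aiex's signature) so it can be exercised without sending real signals.

```javascript
const { EventEmitter } = require("node:events");

// Mirrors registerCleanup: on SIGINT or SIGTERM, log, await watcher.close(),
// log again, then exit with code 0. Defaults fall back to the real process.
function registerCleanupSketch(watcher, {
  signals = process,
  exit = (code) => process.exit(code),
  log = console.log
} = {}) {
  const cleanup = async () => {
    log("Stopping watcher...");
    await watcher.close();
    log("Watcher stopped");
    exit(0);
  };
  signals.on("SIGINT", cleanup);
  signals.on("SIGTERM", cleanup);
  return cleanup;
}

// Usage with stand-ins: an EventEmitter plays the role of `process`.
const fakeSignals = new EventEmitter();
const calls = [];
registerCleanupSketch(
  { close: async () => { calls.push("close"); } },
  { signals: fakeSignals, exit: (code) => calls.push(`exit ${code}`), log: () => {} }
);
fakeSignals.emit("SIGTERM");
```

As in the original, nothing stops a second Ctrl+C from calling `watcher.close()` again while the first close is still pending; a boolean guard at the top of `cleanup` would be one way to make it idempotent.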
@@ -17062,7 +16782,6 @@ const subCommands = {
 schema: schemaCommand,
 extract: extractCommand,
 watch: watchCommand,
-dump: dumpCommand,
 completion: completionCommand,
 doctor: doctorCommand
 };
@@ -11,7 +11,7 @@ import { z } from "zod";
 
 //#region package.json
 var name = "aiex-cli";
-var version = "0.1.
+var version = "0.1.1-beta.1";
 var description = "JSON Schema → SQLite with AI-powered data extraction";
 var package_default = {
 name,
@@ -862,30 +862,6 @@ const en = {
 fileRequiredSingle: "Please provide a file (-f) to extract from",
 noSchemas: "No schema files found in {{path}}. Run {{cmd}} to create and configure schemas first."
 },
-history: {
-description: "List extraction audit records",
-empty: "No extraction history found",
-errors: {
-idRequired: "Audit record id is required",
-recordNotFound: "Extraction record not found: {{id}}"
-},
-deleted: "Deleted extraction record: {{id}}"
-},
-show: {
-description: "Show an extraction audit record",
-args: { id: "Audit record id" }
-},
-retry: {
-description: "Retry an extraction audit record",
-args: {
-id: "Audit record id",
-noInsert: "Extract and save JSON without inserting into SQLite"
-}
-},
-rm: {
-description: "Delete an extraction audit record and cached upload",
-args: { id: "Audit record id" }
-},
 batch: {
 scanning: "Scanning {{dir}} for supported files...",
 found: "Found {{count}} file(s) to process",
@@ -932,6 +908,14 @@ const en = {
 model: "AI model to use for extraction (overrides default/auto-selected model)",
 noInsert: "Extract and save JSON without inserting into SQLite database"
 },
+interactive: {
+selectSchema: "Select a schema to watch with:",
+enterDirPath: "Enter directory path to watch:",
+dirPathRequired: "Please enter a directory path",
+selectModel: "Select an AI model:",
+autoModel: "Auto-select model",
+askNoInsert: "Save extracted JSON without inserting into SQLite?"
+},
 errors: {
 schemaRequired: "Schema name (-s) is required",
 dirRequired: "Watch directory path (-d) is required",
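The new `command.watch.interactive.*` strings are looked up with dotted keys such as `t("command.watch.interactive.selectSchema")`, and existing strings like `noSchemas` carry `{{path}}`/`{{cmd}}` placeholders. Judging by the `fallbackLng`, `interpolation` and `returnNull` options in `initI18n`, `t` appears to be i18next. A simplified sketch of that resolution, assuming the default `.` key separator and `{{ }}` interpolation delimiters; `translate` and the trimmed `resources` object are hypothetical and this is not i18next's implementation.

```javascript
// A few strings copied from the `en` resource in this diff.
const resources = {
  command: {
    watch: {
      interactive: {
        selectSchema: "Select a schema to watch with:",
        dirPathRequired: "Please enter a directory path"
      }
    },
    extract: {
      errors: {
        noSchemas: "No schema files found in {{path}}. Run {{cmd}} to create and configure schemas first."
      }
    }
  }
};

function translate(key, vars = {}) {
  // Walk the nested object one dot-separated segment at a time.
  const value = key.split(".").reduce((node, part) => (node == null ? undefined : node[part]), resources);
  // Missing or non-string entries fall back to the key itself.
  if (typeof value !== "string") return key;
  // Replace {{name}} with the matching variable; unknown placeholders are left intact.
  return value.replace(/\{\{\s*(\w+)\s*\}\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(vars, name) ? String(vars[name]) : match
  );
}

console.log(translate("command.extract.errors.noSchemas", { path: ".aiex/schema/", cmd: "aiex web" }));
```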
@@ -963,35 +947,6 @@ const en = {
 failTitle: "AIEX Watch Failed: {{file}}"
 }
 },
-dump: {
-description: "Dump SQLite database table to Excel (.xlsx) or CSV (.csv)",
-args: {
-table: "SQLite table name to export",
-schema: "Schema name (without .json extension) to export",
-format: "Export format: csv or xlsx (default: inferred from output or csv)",
-output: "Output file path (default: ./<tableName>.<format>)"
-},
-errors: {
-tableOrSchemaRequired: "Either table name (--table / -t) or schema name (--schema / -s) is required",
-schemaNotFound: "Schema file for \"{{name}}\" not found",
-noTableName: "Schema \"{{name}}\" does not define a database table name.",
-tableMismatch: "Specified table name \"{{table}}\" does not match schema table name \"{{schemaTable}}\"",
-unsupportedFormat: "Unsupported dump format: \"{{format}}\". Supported formats: csv, xlsx",
-dbNotFound: "Database file not found at {{path}}. Please run \"{{cmd}}\" to create the database first.",
-tableNotFound: "Table \"{{name}}\" not found in database. Run \"{{cmd}}\" first to migrate.",
-tableEmpty: "Table \"{{name}}\" is empty. Exporting empty file..."
-},
-loading: "Loading data from table \"{{name}}\"...",
-loaded: "Loaded {{count}} row(s)",
-emptyTable: "Empty table",
-formatting: "Formatting data...",
-formatted: "Data formatted",
-writing: "Writing {{format}} file to {{path}}...",
-dumpCompleted: "Dump completed successfully",
-successMsg: "Successfully dumped {{count}} row(s) to {{path}}",
-fileWriteFailed: "File write failed",
-dbQueryFailed: "Database query failed"
-},
 doctor: {
 description: "Print environment and configuration diagnostics",
 args: { json: "Print diagnostics as JSON" },
@@ -1295,7 +1250,7 @@ async function initI18n(lng) {
 fallbackLng: "en",
 resources: {
 "en": { translation: en },
-"zh-CN": { translation: await import("./zh-CN-
+"zh-CN": { translation: await import("./zh-CN-Cs4MViVi.mjs").then((m) => m.zhCN) }
 },
 interpolation: { escapeValue: false },
 returnNull: false
package/dist/index.mjs
CHANGED
@@ -1,3 +1,3 @@
-import { _ as doctorDiagnosticsSeverityRows, g as buildDoctorDiagnostics, i as generateDrizzleConfig, m as parseJsonSchema, n as collectDoctorDiagnostics, p as JsonSchemaDefinitionSchema, r as createMigrationConfig, t as generateDrizzleSchema, v as doctorDiagnosticsTableRows, y as formatDoctorDiagnosticsJson } from "./generate-drizzle-schema-
+import { _ as doctorDiagnosticsSeverityRows, g as buildDoctorDiagnostics, i as generateDrizzleConfig, m as parseJsonSchema, n as collectDoctorDiagnostics, p as JsonSchemaDefinitionSchema, r as createMigrationConfig, t as generateDrizzleSchema, v as doctorDiagnosticsTableRows, y as formatDoctorDiagnosticsJson } from "./generate-drizzle-schema-CaSMqQWx.mjs";
 
 export { JsonSchemaDefinitionSchema, buildDoctorDiagnostics, collectDoctorDiagnostics, createMigrationConfig, doctorDiagnosticsSeverityRows, doctorDiagnosticsTableRows, formatDoctorDiagnosticsJson, generateDrizzleConfig, generateDrizzleSchema, parseJsonSchema };
@@ -78,30 +78,6 @@ const zhCN = {
 fileRequiredSingle: "请提供文件路径(-f)进行抽取",
 noSchemas: "在 {{path}} 未找到 schema 文件。运行 {{cmd}} 创建和配置 schemas"
 },
-history: {
-description: "列出抽取审计记录",
-empty: "未找到抽取历史记录",
-errors: {
-idRequired: "需要提供审计记录 ID",
-recordNotFound: "未找到抽取记录: {{id}}"
-},
-deleted: "已删除抽取记录: {{id}}"
-},
-show: {
-description: "查看抽取审计记录详情",
-args: { id: "审计记录 ID" }
-},
-retry: {
-description: "重试抽取审计记录",
-args: {
-id: "审计记录 ID",
-noInsert: "仅保存 JSON,不插入 SQLite"
-}
-},
-rm: {
-description: "删除抽取审计记录及其缓存的上传文件",
-args: { id: "审计记录 ID" }
-},
 batch: {
 scanning: "正在扫描 {{dir}} 中支持的文件...",
 found: "找到 {{count}} 个待处理的文件",
@@ -148,6 +124,14 @@ const zhCN = {
 model: "用于抽取的 AI 模型(覆盖默认/自动选择的模型)",
 noInsert: "仅保存 JSON,不插入 SQLite"
 },
+interactive: {
+selectSchema: "选择用于监听抽取的 schema:",
+enterDirPath: "输入要监听的目录路径:",
+dirPathRequired: "请输入目录路径",
+selectModel: "选择 AI 模型:",
+autoModel: "自动选择模型",
+askNoInsert: "是否仅保存 JSON,不插入 SQLite?"
+},
 errors: {
 schemaRequired: "需要指定 schema 名称(-s)",
 dirRequired: "需要指定监听目录路径(-d)",
@@ -179,35 +163,6 @@ const zhCN = {
 failTitle: "AIEX 监听失败: {{file}}"
 }
 },
-dump: {
-description: "导出 SQLite 数据库表到 Excel(.xlsx)或 CSV(.csv)",
-args: {
-table: "要导出的 SQLite 表名",
-schema: "要导出的 schema 名称(不含 .json 扩展名)",
-format: "导出格式: csv 或 xlsx(默认: 从输出文件名推断或 csv)",
-output: "输出文件路径(默认: ./<表名>.<格式>)"
-},
-errors: {
-tableOrSchemaRequired: "需要指定表名(--table / -t)或 schema 名称(--schema / -s)",
-schemaNotFound: "未找到 schema 文件 \"{{name}}\"",
-noTableName: "Schema \"{{name}}\" 没有定义数据库表名",
-tableMismatch: "指定的表名 \"{{table}}\" 与 schema 表名 \"{{schemaTable}}\" 不匹配",
-unsupportedFormat: "不支持的导出格式: \"{{format}}\"。支持的格式: csv, xlsx",
-dbNotFound: "在 {{path}} 未找到数据库文件。请先运行 \"{{cmd}}\" 创建数据库。",
-tableNotFound: "在数据库中未找到表 \"{{name}}\" 。请先运行 \"{{cmd}}\" 迁移。",
-tableEmpty: "表 \"{{name}}\" 为空。正在导出空文件..."
-},
-loading: "正在从表 \"{{name}}\" 加载数据...",
-loaded: "已加载 {{count}} 行",
-emptyTable: "空表",
-formatting: "正在格式化数据...",
-formatted: "数据已格式化",
-writing: "正在写入 {{format}} 文件到 {{path}}...",
-dumpCompleted: "导出完成",
-successMsg: "已成功导出 {{count}} 行到 {{path}}",
-fileWriteFailed: "文件写入失败",
-dbQueryFailed: "数据库查询失败"
-},
 doctor: {
 description: "打印环境和配置诊断信息",
 args: { json: "以 JSON 格式输出诊断信息" },