@gobing-ai/ts-llm-jsonl-importer 0.2.6 → 0.2.8
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +131 -1
- package/package.json +3 -3
package/README.md
CHANGED
|
@@ -1,4 +1,134 @@
|
|
|
1
1
|
# @gobing-ai/ts-llm-jsonl-importer
|
|
2
2
|
|
|
3
|
-
Generic JSONL
|
|
3
|
+
Generic JSONL import pipeline for AI-agent history-style files: discover files, parse rows, normalize source fields, redact sensitive values, deduplicate by ledger hash, and persist ETL rows.
|
|
4
4
|
|
|
5
|
+
## What It Provides
|
|
6
|
+
|
|
7
|
+
`ts-llm-jsonl-importer` is a common JSONL importer. It is intentionally not a conversation-history domain model. Built-in source definitions cover common agent file shapes, but downstream systems consume normalized ETL rows and ledger metadata.
|
|
8
|
+
|
|
9
|
+
| Export | Purpose |
|
|
10
|
+
|--------|---------|
|
|
11
|
+
| `runJsonlImport()` | Runs discovery, parsing, validation, redaction, dedupe, and persistence |
|
|
12
|
+
| `applyHistoryImportSchema()` | Installs importer-owned checkpoint, ledger, and ETL tables |
|
|
13
|
+
| `SOURCE_DEFINITIONS` | Built-in source definitions |
|
|
14
|
+
| `getSourceDefinition()` | Returns one source definition by key |
|
|
15
|
+
| `redactRecord()` / `redactValue()` | Applies redaction rules before persistence |
|
|
16
|
+
| `sha256()` / `stableJson()` | Stable hash helpers used by the ledger |
|
|
17
|
+
| `HISTORY_IMPORT_SCHEMA_SQL` | SQL schema string for explicit migration flows |
|
|
18
|
+
|
|
19
|
+
Built-in source keys are `claude`, `codex`, `gemini`, `pi`, `opencode`, `antigravity`, and `openclaw`.
|
|
20
|
+
|
|
21
|
+
## Installation
|
|
22
|
+
|
|
23
|
+
```bash
|
|
24
|
+
bun add @gobing-ai/ts-llm-jsonl-importer @gobing-ai/ts-db
|
|
25
|
+
```
|
|
26
|
+
|
|
27
|
+
The importer expects a `DbAdapter`-compatible object from `@gobing-ai/ts-db`.
|
|
28
|
+
|
|
29
|
+
## Basic Import
|
|
30
|
+
|
|
31
|
+
```ts
|
|
32
|
+
import { createDbAdapter } from '@gobing-ai/ts-db';
|
|
33
|
+
import { runJsonlImport } from '@gobing-ai/ts-llm-jsonl-importer';
|
|
34
|
+
|
|
35
|
+
const db = await createDbAdapter({ driver: 'bun-sqlite', url: './history-import.db' });
|
|
36
|
+
|
|
37
|
+
const result = await runJsonlImport('codex', {
|
|
38
|
+
db,
|
|
39
|
+
roots: ['./agent-history'],
|
|
40
|
+
mode: 'incremental',
|
|
41
|
+
});
|
|
42
|
+
|
|
43
|
+
console.log(result.importedRecords, result.skippedDuplicates);
|
|
44
|
+
```
|
|
45
|
+
|
|
46
|
+
`runJsonlImport()` applies the package-owned schema automatically before processing. Use `applyHistoryImportSchema(db)` directly when your application has an explicit migration step.
|
|
47
|
+
|
|
48
|
+
## Import Modes
|
|
49
|
+
|
|
50
|
+
| Mode | Behavior |
|
|
51
|
+
|------|----------|
|
|
52
|
+
| `incremental` | Reads each file from the last imported line checkpoint |
|
|
53
|
+
| `full` | Clears checkpoints for selected files, scans all lines, and still deduplicates by ledger hash |
|
|
54
|
+
| `force-file` | Scans selected files without using the checkpoint, while preserving ledger-based duplicate protection |
|
|
55
|
+
|
|
56
|
+
All modes preserve parse and validation issues in the returned `ImportResult`; malformed rows are not inserted.
|
|
57
|
+
|
|
58
|
+
## Import Specific Files
|
|
59
|
+
|
|
60
|
+
```ts
|
|
61
|
+
const result = await runJsonlImport('pi', {
|
|
62
|
+
db,
|
|
63
|
+
files: ['/tmp/session-1.jsonl', '/tmp/session-2.jsonl'],
|
|
64
|
+
mode: 'full',
|
|
65
|
+
now: () => new Date(),
|
|
66
|
+
});
|
|
67
|
+
```
|
|
68
|
+
|
|
69
|
+
When `files` is provided, roots are ignored. When roots are provided, the importer walks each root and selects files matching the source definition's patterns.
|
|
70
|
+
|
|
71
|
+
## Redaction
|
|
72
|
+
|
|
73
|
+
Redaction runs before hashing and persistence, so the ledger hash represents the persisted redacted payload.
|
|
74
|
+
|
|
75
|
+
```ts
|
|
76
|
+
import { DEFAULT_REDACTION_RULES, runJsonlImport } from '@gobing-ai/ts-llm-jsonl-importer';
|
|
77
|
+
|
|
78
|
+
await runJsonlImport('openclaw', {
|
|
79
|
+
db,
|
|
80
|
+
roots: ['./history'],
|
|
81
|
+
redactionRules: [
|
|
82
|
+
...DEFAULT_REDACTION_RULES,
|
|
83
|
+
{ name: 'account-id', pattern: /acct_[a-z0-9]+/gi, replacement: '[REDACTED:account]' },
|
|
84
|
+
],
|
|
85
|
+
});
|
|
86
|
+
```
|
|
87
|
+
|
|
88
|
+
Rules are applied recursively to string fields in the normalized record.
|
|
89
|
+
|
|
90
|
+
## Result Shape
|
|
91
|
+
|
|
92
|
+
```ts
|
|
93
|
+
interface ImportResult {
|
|
94
|
+
source: string;
|
|
95
|
+
mode: 'incremental' | 'full' | 'force-file';
|
|
96
|
+
scannedFiles: number;
|
|
97
|
+
processedLines: number;
|
|
98
|
+
importedRecords: number;
|
|
99
|
+
skippedDuplicates: number;
|
|
100
|
+
parseErrors: ImportIssue[];
|
|
101
|
+
validationErrors: ImportIssue[];
|
|
102
|
+
checkpointUpdates: number;
|
|
103
|
+
}
|
|
104
|
+
```
|
|
105
|
+
|
|
106
|
+
Use `parseErrors` for invalid JSON or non-object rows. Use `validationErrors` for source rows that parse but fail the source definition schema.
|
|
107
|
+
|
|
108
|
+
## Stored Tables
|
|
109
|
+
|
|
110
|
+
The schema contains:
|
|
111
|
+
|
|
112
|
+
| Table | Purpose |
|
|
113
|
+
|-------|---------|
|
|
114
|
+
| `history_import_checkpoint` | Per-source/per-file last imported line |
|
|
115
|
+
| `history_import_ledger` | Stable hash ledger for dedupe and provenance |
|
|
116
|
+
| `history_etl_<source>` | Redacted normalized payloads for each built-in source |
|
|
117
|
+
|
|
118
|
+
ETL tables store the normalized payload JSON plus source file, source line, split index, hash, and timestamps.
|
|
119
|
+
|
|
120
|
+
## Split Records
|
|
121
|
+
|
|
122
|
+
Most sources are one input row to one ETL row. A source can also split one JSONL row into multiple ETL records. The built-in `pi` definition splits nested `messages` into one row per message when present.
|
|
123
|
+
|
|
124
|
+
```ts
|
|
125
|
+
const result = await runJsonlImport('pi', { db, files: ['session.jsonl'], mode: 'full' });
|
|
126
|
+
```
|
|
127
|
+
|
|
128
|
+
Downstream consumers should treat `source_file`, `source_line`, and `split_index` as the stable provenance tuple.
|
|
129
|
+
|
|
130
|
+
## Boundary Notes
|
|
131
|
+
|
|
132
|
+
- This package imports JSONL files and writes importer-owned tables; it does not model conversations, turns, tool calls, or analytics semantics.
|
|
133
|
+
- Source definitions are the normalization boundary. Downstream applications own domain-specific interpretation of ETL rows.
|
|
134
|
+
- The importer never stores raw malformed rows. Parse and validation failures are reported in memory through `ImportResult`.
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "@gobing-ai/ts-llm-jsonl-importer",
|
|
3
|
-
"version": "0.2.
|
|
3
|
+
"version": "0.2.8",
|
|
4
4
|
"description": "@gobing-ai/ts-llm-jsonl-importer — Generic JSONL importer for LLM agent history files.",
|
|
5
5
|
"keywords": [
|
|
6
6
|
"typescript",
|
|
@@ -47,8 +47,8 @@
|
|
|
47
47
|
"release": "echo 'Manual publish is disabled. Releases go through GitHub Actions via Trusted Publishing — push a tag: git tag @gobing-ai/ts-llm-jsonl-importer-v<version> && git push --tags' && exit 1"
|
|
48
48
|
},
|
|
49
49
|
"dependencies": {
|
|
50
|
-
"@gobing-ai/ts-db": "^0.2.
|
|
51
|
-
"@gobing-ai/ts-runtime": "^0.2.
|
|
50
|
+
"@gobing-ai/ts-db": "^0.2.8",
|
|
51
|
+
"@gobing-ai/ts-runtime": "^0.2.8",
|
|
52
52
|
"zod": "^4.1.0"
|
|
53
53
|
},
|
|
54
54
|
"devDependencies": {
|