@gobing-ai/ts-llm-jsonl-importer 0.2.6 → 0.2.8

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (2) hide show
  1. package/README.md +131 -1
  2. package/package.json +3 -3
package/README.md CHANGED
@@ -1,4 +1,134 @@
1
1
  # @gobing-ai/ts-llm-jsonl-importer
2
2
 
3
- Generic JSONL importer for LLM agent history files. The package owns the import schema, source definitions, validation, redaction, deduplication, and checkpointing.
3
+ Generic JSONL import pipeline for AI-agent history-style files: discover files, parse rows, normalize source fields, redact sensitive values, deduplicate by ledger hash, and persist ETL rows.
4
4
 
5
+ ## What It Provides
6
+
7
+ `ts-llm-jsonl-importer` is a common JSONL importer. It is intentionally not a conversation-history domain model. Built-in source definitions cover common agent file shapes, but downstream systems consume normalized ETL rows and ledger metadata.
8
+
9
+ | Export | Purpose |
10
+ |--------|---------|
11
+ | `runJsonlImport()` | Runs discovery, parsing, validation, redaction, dedupe, and persistence |
12
+ | `applyHistoryImportSchema()` | Installs importer-owned checkpoint, ledger, and ETL tables |
13
+ | `SOURCE_DEFINITIONS` | Built-in source definitions |
14
+ | `getSourceDefinition()` | Returns one source definition by key |
15
+ | `redactRecord()` / `redactValue()` | Applies redaction rules before persistence |
16
+ | `sha256()` / `stableJson()` | Stable hash helpers used by the ledger |
17
+ | `HISTORY_IMPORT_SCHEMA_SQL` | SQL schema string for explicit migration flows |
18
+
19
+ Built-in source keys are `claude`, `codex`, `gemini`, `pi`, `opencode`, `antigravity`, and `openclaw`.
20
+
21
+ ## Installation
22
+
23
+ ```bash
24
+ bun add @gobing-ai/ts-llm-jsonl-importer @gobing-ai/ts-db
25
+ ```
26
+
27
+ The importer expects a `DbAdapter`-compatible object from `@gobing-ai/ts-db`.
28
+
29
+ ## Basic Import
30
+
31
+ ```ts
32
+ import { createDbAdapter } from '@gobing-ai/ts-db';
33
+ import { runJsonlImport } from '@gobing-ai/ts-llm-jsonl-importer';
34
+
35
+ const db = await createDbAdapter({ driver: 'bun-sqlite', url: './history-import.db' });
36
+
37
+ const result = await runJsonlImport('codex', {
38
+ db,
39
+ roots: ['./agent-history'],
40
+ mode: 'incremental',
41
+ });
42
+
43
+ console.log(result.importedRecords, result.skippedDuplicates);
44
+ ```
45
+
46
+ `runJsonlImport()` applies the package-owned schema automatically before processing. Use `applyHistoryImportSchema(db)` directly when your application has an explicit migration step.
47
+
48
+ ## Import Modes
49
+
50
+ | Mode | Behavior |
51
+ |------|----------|
52
+ | `incremental` | Reads each file from the last imported line checkpoint |
53
+ | `full` | Clears checkpoints for selected files, scans all lines, and still deduplicates by ledger hash |
54
+ | `force-file` | Scans selected files without using the checkpoint, while preserving ledger-based duplicate protection |
55
+
56
+ All modes preserve parse and validation issues in the returned `ImportResult`; malformed rows are not inserted.
57
+
58
+ ## Import Specific Files
59
+
60
+ ```ts
61
+ const result = await runJsonlImport('pi', {
62
+ db,
63
+ files: ['/tmp/session-1.jsonl', '/tmp/session-2.jsonl'],
64
+ mode: 'full',
65
+ now: () => new Date(),
66
+ });
67
+ ```
68
+
69
+ When `files` is provided, roots are ignored. When roots are provided, the importer walks each root and selects files matching the source definition's patterns.
70
+
71
+ ## Redaction
72
+
73
+ Redaction runs before hashing and persistence, so the ledger hash represents the persisted redacted payload.
74
+
75
+ ```ts
76
+ import { DEFAULT_REDACTION_RULES, runJsonlImport } from '@gobing-ai/ts-llm-jsonl-importer';
77
+
78
+ await runJsonlImport('openclaw', {
79
+ db,
80
+ roots: ['./history'],
81
+ redactionRules: [
82
+ ...DEFAULT_REDACTION_RULES,
83
+ { name: 'account-id', pattern: /acct_[a-z0-9]+/gi, replacement: '[REDACTED:account]' },
84
+ ],
85
+ });
86
+ ```
87
+
88
+ Rules are applied recursively to string fields in the normalized record.
89
+
90
+ ## Result Shape
91
+
92
+ ```ts
93
+ interface ImportResult {
94
+ source: string;
95
+ mode: 'incremental' | 'full' | 'force-file';
96
+ scannedFiles: number;
97
+ processedLines: number;
98
+ importedRecords: number;
99
+ skippedDuplicates: number;
100
+ parseErrors: ImportIssue[];
101
+ validationErrors: ImportIssue[];
102
+ checkpointUpdates: number;
103
+ }
104
+ ```
105
+
106
+ Use `parseErrors` for invalid JSON or non-object rows. Use `validationErrors` for source rows that parse but fail the source definition schema.
107
+
108
+ ## Stored Tables
109
+
110
+ The schema contains:
111
+
112
+ | Table | Purpose |
113
+ |-------|---------|
114
+ | `history_import_checkpoint` | Per-source/per-file last imported line |
115
+ | `history_import_ledger` | Stable hash ledger for dedupe and provenance |
116
+ | `history_etl_<source>` | Redacted normalized payloads for each built-in source |
117
+
118
+ ETL tables store the normalized payload JSON plus source file, source line, split index, hash, and timestamps.
119
+
120
+ ## Split Records
121
+
122
+ Most sources are one input row to one ETL row. A source can also split one JSONL row into multiple ETL records. The built-in `pi` definition splits nested `messages` into one row per message when present.
123
+
124
+ ```ts
125
+ const result = await runJsonlImport('pi', { db, files: ['session.jsonl'], mode: 'full' });
126
+ ```
127
+
128
+ Downstream consumers should treat `source_file`, `source_line`, and `split_index` as the stable provenance tuple.
129
+
130
+ ## Boundary Notes
131
+
132
+ - This package imports JSONL files and writes importer-owned tables; it does not model conversations, turns, tool calls, or analytics semantics.
133
+ - Source definitions are the normalization boundary. Downstream applications own domain-specific interpretation of ETL rows.
134
+ - The importer never stores raw malformed rows. Parse and validation failures are reported in memory through `ImportResult`.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@gobing-ai/ts-llm-jsonl-importer",
3
- "version": "0.2.6",
3
+ "version": "0.2.8",
4
4
  "description": "@gobing-ai/ts-llm-jsonl-importer — Generic JSONL importer for LLM agent history files.",
5
5
  "keywords": [
6
6
  "typescript",
@@ -47,8 +47,8 @@
47
47
  "release": "echo 'Manual publish is disabled. Releases go through GitHub Actions via Trusted Publishing — push a tag: git tag @gobing-ai/ts-llm-jsonl-importer-v<version> && git push --tags' && exit 1"
48
48
  },
49
49
  "dependencies": {
50
- "@gobing-ai/ts-db": "^0.2.6",
51
- "@gobing-ai/ts-runtime": "^0.2.6",
50
+ "@gobing-ai/ts-db": "^0.2.8",
51
+ "@gobing-ai/ts-runtime": "^0.2.8",
52
52
  "zod": "^4.1.0"
53
53
  },
54
54
  "devDependencies": {