codealmanac 0.1.5 → 0.1.7
- package/dist/chunk-2JJTTN7P.js +539 -0
- package/dist/chunk-2JJTTN7P.js.map +1 -0
- package/dist/chunk-3C5SY5SE.js +1239 -0
- package/dist/chunk-3C5SY5SE.js.map +1 -0
- package/dist/chunk-4CODZRHH.js +19 -0
- package/dist/chunk-4CODZRHH.js.map +1 -0
- package/dist/chunk-7JUX4ADQ.js +38 -0
- package/dist/chunk-7JUX4ADQ.js.map +1 -0
- package/dist/chunk-A6PUCAVJ.js +145 -0
- package/dist/chunk-A6PUCAVJ.js.map +1 -0
- package/dist/chunk-AXFPUHBN.js +227 -0
- package/dist/chunk-AXFPUHBN.js.map +1 -0
- package/dist/chunk-FM3VRDK7.js +20 -0
- package/dist/chunk-FM3VRDK7.js.map +1 -0
- package/dist/chunk-H6WU6PYH.js +441 -0
- package/dist/chunk-H6WU6PYH.js.map +1 -0
- package/dist/chunk-P3LDTCLB.js +34 -0
- package/dist/chunk-P3LDTCLB.js.map +1 -0
- package/dist/chunk-QHQ6YH7U.js +81 -0
- package/dist/chunk-QHQ6YH7U.js.map +1 -0
- package/dist/chunk-Z4MWLVS2.js +355 -0
- package/dist/chunk-Z4MWLVS2.js.map +1 -0
- package/dist/chunk-Z6MBJ3D2.js +203 -0
- package/dist/chunk-Z6MBJ3D2.js.map +1 -0
- package/dist/cli-AIH5QQ5H.js +393 -0
- package/dist/cli-AIH5QQ5H.js.map +1 -0
- package/dist/codealmanac.js +68 -5954
- package/dist/codealmanac.js.map +1 -1
- package/dist/doctor-6FN5JO5F.js +15 -0
- package/dist/doctor-6FN5JO5F.js.map +1 -0
- package/dist/hook-CRJMWSSO.js +12 -0
- package/dist/hook-CRJMWSSO.js.map +1 -0
- package/dist/register-commands-PZMQNGCH.js +2644 -0
- package/dist/register-commands-PZMQNGCH.js.map +1 -0
- package/dist/uninstall-NBEZNNKM.js +12 -0
- package/dist/uninstall-NBEZNNKM.js.map +1 -0
- package/dist/update-IL243I4E.js +10 -0
- package/dist/update-IL243I4E.js.map +1 -0
- package/dist/wiki-EHZ7LG7R.js +238 -0
- package/dist/wiki-EHZ7LG7R.js.map +1 -0
- package/guides/processing/claude-code.md +152 -0
- package/guides/processing/codex.md +214 -0
- package/guides/processing/generic.md +128 -0
- package/package.json +2 -2
@@ -0,0 +1,128 @@
+# Processing Unknown Session Formats
+
+This guide is a fallback for session files that do not match known formats (Claude Code JSONL, Codex rollout JSONL). Use it when you encounter a new tool or an unrecognized file structure.
+
+## Step 1: Identify the format
+
+Check the file extension and first few lines:
+
+- **JSONL** (`.jsonl`): One JSON object per line. Parse each line independently
+- **JSON** (`.json`): Single JSON document. Could be an array of messages or a nested conversation object
+- **Markdown** (`.md`): Likely a conversation export with `## Human` / `## Assistant` headers or similar
+- **Plain text** (`.txt`, `.log`): Look for turn separators (blank lines, `---`, timestamps)
+- **SQLite** (`.sqlite`, `.db`): Database with conversation tables. List tables first, then query
+
+For JSONL, check each line for a `type`, `role`, or `kind` field. The presence of certain fields reveals the format:
+- `type: "user"` / `type: "assistant"` + `message.content` -> Claude Code format
+- `type: "response_item"` / `type: "event_msg"` -> Codex format
+- `role: "user"` / `role: "assistant"` without a wrapper type -> Raw API conversation log
+- `type: "human"` / `type: "ai"` -> LangChain/LangSmith format
+
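Reviewer note: the field hints in Step 1 can be sketched as a small detector. This is a minimal illustration, not code from the package; the function name `detect_jsonl_format` and the returned labels are hypothetical, and "without a wrapper type" is read here as "no `type` field at all", which is an assumption.

```python
import json


def detect_jsonl_format(line: str) -> str:
    """Guess the session format from a single JSONL line (hypothetical helper)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return "unknown"
    if not isinstance(record, dict):
        return "unknown"

    rtype = record.get("type")
    message = record.get("message")

    # type user/assistant plus message.content -> Claude Code
    if rtype in ("user", "assistant") and isinstance(message, dict) and "content" in message:
        return "claude-code"
    # response_item / event_msg -> Codex
    if rtype in ("response_item", "event_msg"):
        return "codex"
    # human / ai -> LangChain/LangSmith
    if rtype in ("human", "ai"):
        return "langchain"
    # role user/assistant with no wrapper type -> raw API log (assumed reading)
    if rtype is None and record.get("role") in ("user", "assistant"):
        return "raw-api"
    return "unknown"
```

Checking only the first few lines of a file with this helper is usually enough to pick a processing path; anything that returns `"unknown"` falls through to the rest of the guide.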
+## Step 2: Classify each record
+
+For any conversation format, records fall into these universal categories:
+
+### Signal (extract)
+1. **Human messages** -- What the user asked, decided, or directed. These explain intent, requirements, and constraints. Look for:
+- Records with `role: "user"` or `type: "human"` or `type: "user_message"`
+- Text that reads like natural language instructions, questions, or feedback
+- Short messages (under ~500 chars) are almost always signal
+
+2. **AI reasoning text** -- Explanations, analyses, decisions, and summaries the model produced. Look for:
+- Records with `role: "assistant"` and text content (not tool calls)
+- Fields named `text`, `content`, `message`, `output`, `response`
+- Text that explains *why* something was done, not *what* command was run
+
+3. **Final/summary responses** -- The model's synthesized answer after a chain of tool use. Look for:
+- The last assistant message before the next human message
+- Fields named `final_response`, `last_message`, `summary`, `result`
+- These are typically the densest signal per byte
+
+4. **Error messages and failures** -- What went wrong and why. Look for:
+- Records containing `error`, `failed`, `exception`, `traceback`
+- These often reveal important constraints, gotchas, or architectural issues
+
+### Noise (skip)
+1. **File contents returned by tools** -- Already in the repo. Look for:
+- Records with `type: "tool_result"`, `type: "function_call_output"`, or `type: "tool_output"`
+- Content that looks like source code (imports, function definitions, indented blocks)
+- Long strings (>2KB) that are clearly file dumps
+- **Size clue:** If a record is >10KB, it is almost certainly a tool result, not reasoning
+
+2. **Tool invocations** -- What commands were run. Operational, not knowledge. Look for:
+- Records with `type: "tool_use"`, `type: "function_call"`, or `type: "tool_call"`
+- Fields named `name`, `arguments`, `input`, `command`
+- Exception: file edit commands may contain the *what* of a change (summarize those)
+
+3. **System prompts and instructions** -- Same across sessions. Look for:
+- Records with `role: "system"` or `role: "developer"`
+- Content wrapped in XML tags (`<instructions>`, `<context>`, `<rules>`)
+- Long preambles about model behavior, tool availability, permissions
+
+4. **Metadata and telemetry** -- Session infrastructure. Look for:
+- Token counts, usage statistics, rate limits
+- Timestamps, UUIDs, session IDs (useful for linking but not knowledge)
+- Permission changes, mode switches, checkpoint markers
+
+5. **Base64-encoded data** -- Images, files, binary content. Look for:
+- Long strings matching `[A-Za-z0-9+/=]{1000,}` or data URIs
+- Fields named `image`, `data`, `base64`, `source.data`
+
+6. **Duplicate records** -- Many formats log the same event multiple ways. Look for:
+- Records sharing an ID field (`call_id`, `tool_use_id`, `request_id`)
+- The same text appearing in both a streaming record and a final record
+
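Reviewer note: the signal and noise rules above could be approximated with a first-pass classifier like the one below. It is a rough sketch under stated assumptions: `classify_record` and the label strings are hypothetical, noise checks are applied before signal checks (the guide does not fix an order), and the error-keyword test is a plain substring match on the serialized record.

```python
import json
import re

# Field values taken from the guide's lists.
SIGNAL_TYPES = {"human", "user_message"}
SIGNAL_ROLES = {"user", "assistant"}
NOISE_TYPES = {
    "tool_result", "function_call_output", "tool_output",
    "tool_use", "function_call", "tool_call",
}
NOISE_ROLES = {"system", "developer"}
ERROR_WORDS = ("error", "failed", "exception", "traceback")
BASE64_RUN = re.compile(r"[A-Za-z0-9+/=]{1000,}")


def classify_record(record: dict) -> str:
    """Return 'signal', 'noise', or 'unknown' for one parsed record."""
    raw = json.dumps(record)
    kind = record.get("type")
    role = record.get("role")

    # Tool traffic and system prompts are noise regardless of size.
    if kind in NOISE_TYPES or role in NOISE_ROLES:
        return "noise"
    # Long base64-looking runs indicate embedded binary data.
    if BASE64_RUN.search(raw):
        return "noise"
    # Size clue: records over 10KB are almost certainly tool results.
    if len(raw.encode("utf-8")) > 10 * 1024:
        return "noise"
    if kind in SIGNAL_TYPES or role in SIGNAL_ROLES:
        return "signal"
    # Error and failure records often carry constraints and gotchas.
    if any(word in raw.lower() for word in ERROR_WORDS):
        return "signal"
    return "unknown"
```

Duplicate detection and the "last assistant message before the next human message" rule need cross-record state, so they are left out of this per-record sketch.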
+### Summarize (compress)
+1. **Sequences of tool calls** -- "Read 15 files in src/lib/" is better than 15 individual read records
+2. **Repetitive status updates** -- "Still searching..." x10 -> "Searched extensively"
+3. **Build/test output** -- "Tests: 58/58 passed" not the full test runner output
+4. **File edit details** -- "Modified auth.ts: added token validation" not the full diff
+
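Reviewer note: one simple way to compress repeated tool calls or status updates is to collapse consecutive identical events into a single counted entry. The sketch below assumes each event has already been reduced to a short label string; `summarize_runs` is a hypothetical name, and the `x<count>` suffix is only one possible rendering.

```python
from itertools import groupby


def summarize_runs(events: list[str]) -> list[str]:
    """Collapse consecutive identical event labels into one entry with a count."""
    summary = []
    for label, run in groupby(events):
        count = sum(1 for _ in run)
        summary.append(label if count == 1 else f"{label} x{count}")
    return summary


events = ["read_file", "read_file", "read_file", "edit_file",
          "Still searching...", "Still searching..."]
print(summarize_runs(events))
# ['read_file x3', 'edit_file', 'Still searching... x2']
```

Turning `read_file x3` into prose such as "Read 3 files in src/lib/" still requires looking at the call arguments, which this sketch does not attempt.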
+## Step 3: Estimate signal ratio
+
+Before processing the full file, sample it:
+
+1. Take the first 20 records, middle 20, and last 20
+2. Classify each as signal / noise / summarize
+3. Measure bytes in each category
+
+Typical ratios from known formats:
+- **Claude Code:** 3-15% signal, 85-97% noise (tool results + wrapper overhead dominate)
+- **Codex:** 10-19% signal, 50-70% noise, 20-30% ambiguous (reasoning encrypted, compacted records)
+- **Raw API logs:** 30-50% signal (no tool overhead, just conversation)
+- **Chat exports (markdown):** 60-80% signal (already cleaned by the export process)
+
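Reviewer note: the sampling procedure in Step 3 might look like the following. The helper names `sample_records` and `signal_ratio` are hypothetical, the classifier is passed in as a callable, and record size is approximated by the byte length of the re-serialized JSON, which may differ slightly from the bytes on disk.

```python
import json
from typing import Callable


def sample_records(records: list, n: int = 20) -> list:
    """Return the first n, middle n, and last n records (all of them if the file is short)."""
    if len(records) <= 3 * n:
        return list(records)
    mid = (len(records) - n) // 2
    return records[:n] + records[mid:mid + n] + records[-n:]


def signal_ratio(records: list, classify: Callable[[dict], str]) -> float:
    """Fraction of sampled bytes that the classifier labels 'signal'."""
    totals: dict[str, int] = {}
    for record in sample_records(records):
        size = len(json.dumps(record).encode("utf-8"))
        label = classify(record)
        totals[label] = totals.get(label, 0) + size
    total = sum(totals.values())
    return totals.get("signal", 0) / total if total else 0.0
```

A ratio far outside the typical ranges listed above is a hint that the classifier is misreading the format and should be adjusted before the full file is processed.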
+## Step 4: Extract
+
+For each signal record, extract:
+- **Who said it:** human or AI
+- **What they said:** the text content
+- **When:** timestamp if available
+- **Context:** what came before (the preceding human message gives context to an AI response)
+
+Structure the output as a sequence of turns:
+```
+[timestamp] HUMAN: <message>
+[timestamp] AI: <response>
+[timestamp] HUMAN: <follow-up>
+[timestamp] AI: <response>
+```
+
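Reviewer note: the turn layout shown in Step 4 can be produced with a few lines of formatting code. This sketch assumes each extracted turn is a dict with `speaker`, `text`, and an optional `timestamp` key; those key names and both function names are illustrative, not part of the package. The timestamp prefix is dropped when none is available.

```python
from typing import Optional


def format_turn(speaker: str, text: str, timestamp: Optional[str] = None) -> str:
    """Render one turn as '[timestamp] SPEAKER: text'."""
    stamp = f"[{timestamp}] " if timestamp else ""
    return f"{stamp}{speaker.upper()}: {text.strip()}"


def format_transcript(turns: list[dict]) -> str:
    """Join turns in order, one per line."""
    return "\n".join(
        format_turn(t["speaker"], t["text"], t.get("timestamp")) for t in turns
    )
```

Keeping turns in file order preserves the context rule from the list above: each AI response stays directly after the human message that prompted it.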
+## Step 5: Handle unknowns gracefully
+
+If you cannot classify a record:
+- If it is <1KB, include it (small records are cheap to carry)
+- If it is >10KB, skip it (large unclassified records are almost always tool output)
+- If it contains natural language prose, include it
+- If it contains code, JSON, or structured data, skip it
+
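Reviewer note: the four fallback rules can overlap (a 500-byte JSON blob is both "small" and "structured"), and the guide does not say which wins. The sketch below assumes the size rules are checked first and the content rules only decide the 1KB to 10KB middle band. `keep_unclassified` and `looks_like_prose` are hypothetical helpers, and the prose test is a deliberately crude heuristic.

```python
import re

PROSE_WORD = re.compile(r"[A-Za-z]{2,}")


def looks_like_prose(text: str) -> bool:
    """Crude heuristic: mostly alphabetic words, and not opening like JSON or markup."""
    stripped = text.strip()
    if not stripped or stripped[0] in "{[<":
        return False
    tokens = stripped.split()
    words = [t for t in tokens if PROSE_WORD.fullmatch(t.strip(".,;:!?'\"()"))]
    return len(words) / len(tokens) >= 0.7


def keep_unclassified(text: str) -> bool:
    """Decide whether to carry an unclassified record's text forward."""
    size = len(text.encode("utf-8"))
    if size < 1024:        # small records are cheap to carry
        return True
    if size > 10 * 1024:   # large unclassified records are almost always tool output
        return False
    return looks_like_prose(text)
```

The 0.7 word-ratio threshold is arbitrary and would need tuning against real session files.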
+## Privacy checks
+
+Before outputting extracted content, scan for:
+- API keys: patterns like `sk-`, `ghp_`, `Bearer `, `token: "`
+- Passwords: fields named `password`, `passwd`, `secret`
+- Personal data: email addresses, IP addresses, phone numbers
+- File paths: may reveal usernames (e.g., `/Users/johndoe/`)
+- JWT tokens: strings matching `eyJ...`
+
+Flag these but do not include them in extracted output.
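Reviewer note: the privacy checks in the guide could be implemented as a regex pass that both flags and redacts matches. The patterns below are illustrative approximations of the prefixes the guide names, not an exhaustive or authoritative secret scanner; `redact`, the pattern labels, and the `[REDACTED:...]` placeholder format are all assumptions. Phone numbers and password fields are not covered here.

```python
import re

# Approximate patterns for the categories the guide lists (assumed, not exhaustive).
SENSITIVE_PATTERNS = {
    "api_key": re.compile(r"\b(?:sk-|ghp_)[A-Za-z0-9_-]{8,}"),
    "bearer_token": re.compile(r"Bearer\s+[A-Za-z0-9._~+/=-]{8,}"),
    "jwt": re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]*"),
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "home_path": re.compile(r"/(?:Users|home)/[^/\s]+"),
}


def redact(text: str) -> tuple[str, list[str]]:
    """Replace sensitive matches with placeholders and report which kinds were found."""
    found = []
    for label, pattern in SENSITIVE_PATTERNS.items():
        text, count = pattern.subn(f"[REDACTED:{label}]", text)
        if count:
            found.append(label)
    return text, found


cleaned, flags = redact("key sk-abcdefgh12345678 from alice@example.com in /Users/johndoe/project")
print(cleaned)  # key [REDACTED:api_key] from [REDACTED:email] in [REDACTED:home_path]/project
print(flags)    # ['api_key', 'email', 'home_path']
```

Returning the list of flagged categories alongside the cleaned text matches the guide's instruction to flag findings while keeping them out of the extracted output.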
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
 "name": "codealmanac",
-"version": "0.1.
+"version": "0.1.7",
 "description": "A living wiki for codebases, maintained by AI agents. Documents what the code can't say: decisions, flows, invariants, incidents, gotchas.",
 "keywords": [
 "wiki",
@@ -34,7 +34,7 @@
 "LICENSE"
 ],
 "engines": {
-"node": "
+"node": "20.x || 22.x || 23.x || 24.x || 25.x"
 },
 "scripts": {
 "build": "tsup",