codealmanac 0.1.5 → 0.1.7

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (44)
  1. package/dist/chunk-2JJTTN7P.js +539 -0
  2. package/dist/chunk-2JJTTN7P.js.map +1 -0
  3. package/dist/chunk-3C5SY5SE.js +1239 -0
  4. package/dist/chunk-3C5SY5SE.js.map +1 -0
  5. package/dist/chunk-4CODZRHH.js +19 -0
  6. package/dist/chunk-4CODZRHH.js.map +1 -0
  7. package/dist/chunk-7JUX4ADQ.js +38 -0
  8. package/dist/chunk-7JUX4ADQ.js.map +1 -0
  9. package/dist/chunk-A6PUCAVJ.js +145 -0
  10. package/dist/chunk-A6PUCAVJ.js.map +1 -0
  11. package/dist/chunk-AXFPUHBN.js +227 -0
  12. package/dist/chunk-AXFPUHBN.js.map +1 -0
  13. package/dist/chunk-FM3VRDK7.js +20 -0
  14. package/dist/chunk-FM3VRDK7.js.map +1 -0
  15. package/dist/chunk-H6WU6PYH.js +441 -0
  16. package/dist/chunk-H6WU6PYH.js.map +1 -0
  17. package/dist/chunk-P3LDTCLB.js +34 -0
  18. package/dist/chunk-P3LDTCLB.js.map +1 -0
  19. package/dist/chunk-QHQ6YH7U.js +81 -0
  20. package/dist/chunk-QHQ6YH7U.js.map +1 -0
  21. package/dist/chunk-Z4MWLVS2.js +355 -0
  22. package/dist/chunk-Z4MWLVS2.js.map +1 -0
  23. package/dist/chunk-Z6MBJ3D2.js +203 -0
  24. package/dist/chunk-Z6MBJ3D2.js.map +1 -0
  25. package/dist/cli-AIH5QQ5H.js +393 -0
  26. package/dist/cli-AIH5QQ5H.js.map +1 -0
  27. package/dist/codealmanac.js +68 -5954
  28. package/dist/codealmanac.js.map +1 -1
  29. package/dist/doctor-6FN5JO5F.js +15 -0
  30. package/dist/doctor-6FN5JO5F.js.map +1 -0
  31. package/dist/hook-CRJMWSSO.js +12 -0
  32. package/dist/hook-CRJMWSSO.js.map +1 -0
  33. package/dist/register-commands-PZMQNGCH.js +2644 -0
  34. package/dist/register-commands-PZMQNGCH.js.map +1 -0
  35. package/dist/uninstall-NBEZNNKM.js +12 -0
  36. package/dist/uninstall-NBEZNNKM.js.map +1 -0
  37. package/dist/update-IL243I4E.js +10 -0
  38. package/dist/update-IL243I4E.js.map +1 -0
  39. package/dist/wiki-EHZ7LG7R.js +238 -0
  40. package/dist/wiki-EHZ7LG7R.js.map +1 -0
  41. package/guides/processing/claude-code.md +152 -0
  42. package/guides/processing/codex.md +214 -0
  43. package/guides/processing/generic.md +128 -0
  44. package/package.json +2 -2
package/guides/processing/generic.md ADDED
@@ -0,0 +1,128 @@
+ # Processing Unknown Session Formats
+
+ This guide is a fallback for session files that do not match known formats (Claude Code JSONL, Codex rollout JSONL). Use it when you encounter a new tool or an unrecognized file structure.
+
+ ## Step 1: Identify the format
+
+ Check the file extension and first few lines:
+
+ - **JSONL** (`.jsonl`): One JSON object per line. Parse each line independently
+ - **JSON** (`.json`): Single JSON document. Could be an array of messages or a nested conversation object
+ - **Markdown** (`.md`): Likely a conversation export with `## Human` / `## Assistant` headers or similar
+ - **Plain text** (`.txt`, `.log`): Look for turn separators (blank lines, `---`, timestamps)
+ - **SQLite** (`.sqlite`, `.db`): Database with conversation tables. List tables first, then query
+
+ For JSONL, check each line for a `type`, `role`, or `kind` field. The presence of certain fields reveals the format:
+ - `type: "user"` / `type: "assistant"` + `message.content` -> Claude Code format
+ - `type: "response_item"` / `type: "event_msg"` -> Codex format
+ - `role: "user"` / `role: "assistant"` without a wrapper type -> Raw API conversation log
+ - `type: "human"` / `type: "ai"` -> LangChain/LangSmith format
+
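One way to mechanize these field checks is a small sniffer over the first few records. A minimal sketch (the function name and return labels are illustrative, not part of codealmanac):

```python
import json

def sniff_jsonl_format(lines):
    """Guess the session format from the first few JSONL records."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            return "not-jsonl"
        t = rec.get("type")
        if t in ("user", "assistant") and "message" in rec:
            return "claude-code"        # wrapper type + message.content
        if t in ("response_item", "event_msg"):
            return "codex"              # rollout record types
        if t in ("human", "ai"):
            return "langchain"
        if rec.get("role") in ("user", "assistant") and t is None:
            return "raw-api"            # bare role field, no wrapper type
    return "unknown"
```

Run it on the first ~20 lines only; the format is almost always evident by then.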
+ ## Step 2: Classify each record
+
+ For any conversation format, records fall into these universal categories:
+
+ ### Signal (extract)
+ 1. **Human messages** -- What the user asked, decided, or directed. These explain intent, requirements, and constraints. Look for:
+    - Records with `role: "user"` or `type: "human"` or `type: "user_message"`
+    - Text that reads like natural language instructions, questions, or feedback
+    - Short messages (under ~500 chars) are almost always signal
+
+ 2. **AI reasoning text** -- Explanations, analyses, decisions, and summaries the model produced. Look for:
+    - Records with `role: "assistant"` and text content (not tool calls)
+    - Fields named `text`, `content`, `message`, `output`, `response`
+    - Text that explains *why* something was done, not *what* command was run
+
+ 3. **Final/summary responses** -- The model's synthesized answer after a chain of tool use. Look for:
+    - The last assistant message before the next human message
+    - Fields named `final_response`, `last_message`, `summary`, `result`
+    - These are typically the densest signal per byte
+
+ 4. **Error messages and failures** -- What went wrong and why. Look for:
+    - Records containing `error`, `failed`, `exception`, `traceback`
+    - These often reveal important constraints, gotchas, or architectural issues
+
+ ### Noise (skip)
+ 1. **File contents returned by tools** -- Already in the repo. Look for:
+    - Records with `type: "tool_result"`, `type: "function_call_output"`, or `type: "tool_output"`
+    - Content that looks like source code (imports, function definitions, indented blocks)
+    - Long strings (>2KB) that are clearly file dumps
+    - **Size clue:** If a record is >10KB, it is almost certainly a tool result, not reasoning
+
+ 2. **Tool invocations** -- What commands were run. Operational, not knowledge. Look for:
+    - Records with `type: "tool_use"`, `type: "function_call"`, or `type: "tool_call"`
+    - Fields named `name`, `arguments`, `input`, `command`
+    - Exception: file edit commands may contain the *what* of a change (summarize those)
+
+ 3. **System prompts and instructions** -- Same across sessions. Look for:
+    - Records with `role: "system"` or `role: "developer"`
+    - Content wrapped in XML tags (`<instructions>`, `<context>`, `<rules>`)
+    - Long preambles about model behavior, tool availability, permissions
+
+ 4. **Metadata and telemetry** -- Session infrastructure. Look for:
+    - Token counts, usage statistics, rate limits
+    - Timestamps, UUIDs, session IDs (useful for linking but not knowledge)
+    - Permission changes, mode switches, checkpoint markers
+
+ 5. **Base64-encoded data** -- Images, files, binary content. Look for:
+    - Long strings matching `[A-Za-z0-9+/=]{1000,}` or data URIs
+    - Fields named `image`, `data`, `base64`, `source.data`
+
+ 6. **Duplicate records** -- Many formats log the same event multiple ways. Look for:
+    - Records sharing an ID field (`call_id`, `tool_use_id`, `request_id`)
+    - The same text appearing in both a streaming record and a final record
+
+ ### Summarize (compress)
+ 1. **Sequences of tool calls** -- "Read 15 files in src/lib/" is better than 15 individual read records
+ 2. **Repetitive status updates** -- "Still searching..." x10 -> "Searched extensively"
+ 3. **Build/test output** -- "Tests: 58/58 passed" not the full test runner output
+ 4. **File edit details** -- "Modified auth.ts: added token validation" not the full diff
+
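The signal/noise/summarize triage can be sketched as one function per record. This is a rough sketch following the heuristics above; the field names cover the common cases listed, and the function itself is illustrative:

```python
import json

def classify_record(rec):
    """Triage one parsed record as 'signal', 'noise', 'summarize', or 'unknown'."""
    t = rec.get("type", "")
    role = rec.get("role", "")
    if role in ("system", "developer"):
        return "noise"       # system prompts: same across sessions
    if t in ("tool_result", "function_call_output", "tool_output"):
        return "noise"       # file contents are already in the repo
    if t in ("tool_use", "function_call", "tool_call"):
        return "summarize"   # operational; compress to a one-liner
    if role in ("user", "assistant") or t in ("user", "assistant", "human", "ai"):
        return "signal"      # human intent and AI reasoning
    if len(json.dumps(rec)) > 10_000:
        return "noise"       # size clue: almost certainly tool output
    return "unknown"
```

Duplicate detection (shared `call_id` etc.) needs cross-record state, so it is better handled in a pass over the whole file rather than per record.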
+ ## Step 3: Estimate signal ratio
+
+ Before processing the full file, sample it:
+
+ 1. Take the first 20 records, middle 20, and last 20
+ 2. Classify each as signal / noise / summarize
+ 3. Measure bytes in each category
+
+ Typical ratios from known formats:
+ - **Claude Code:** 3-15% signal, 85-97% noise (tool results + wrapper overhead dominate)
+ - **Codex:** 10-19% signal, 50-70% noise, 20-30% ambiguous (reasoning encrypted, compacted records)
+ - **Raw API logs:** 30-50% signal (no tool overhead, just conversation)
+ - **Chat exports (markdown):** 60-80% signal (already cleaned by the export process)
+
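The sampling step can be sketched like this, assuming records are already parsed and a per-record classifier is available (both function names are hypothetical):

```python
import json

def sample_records(records, k=20):
    """First k, middle k, and last k records for a quick estimate."""
    n = len(records)
    if n <= 3 * k:
        return list(records)
    mid = n // 2
    return records[:k] + records[mid - k // 2 : mid + k - k // 2] + records[-k:]

def signal_ratio(records, classify):
    """Fraction of sampled bytes that `classify` labels as signal."""
    sample = sample_records(records)
    total = sum(len(json.dumps(r)) for r in sample) or 1
    kept = sum(len(json.dumps(r)) for r in sample if classify(r) == "signal")
    return kept / total
```

Measuring bytes rather than record counts matters: one 50KB tool dump outweighs dozens of short human messages.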
+ ## Step 4: Extract
+
+ For each signal record, extract:
+ - **Who said it:** human or AI
+ - **What they said:** the text content
+ - **When:** timestamp if available
+ - **Context:** what came before (the preceding human message gives context to an AI response)
+
+ Structure the output as a sequence of turns:
+ ```
+ [timestamp] HUMAN: <message>
+ [timestamp] AI: <response>
+ [timestamp] HUMAN: <follow-up>
+ [timestamp] AI: <response>
+ ```
+
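Emitting that turn structure is mechanical once records are classified. A minimal sketch, assuming each record carries its text under `content` or `text` (field names vary by format and are assumptions here):

```python
def format_turns(records):
    """Render signal records as '[timestamp] SPEAKER: text' lines."""
    lines = []
    for rec in records:
        role = rec.get("role") or rec.get("type", "")
        speaker = "HUMAN" if role in ("user", "human") else "AI"
        text = rec.get("content") or rec.get("text") or ""
        ts = rec.get("timestamp", "unknown")
        lines.append(f"[{ts}] {speaker}: {text}")
    return "\n".join(lines)
```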
+ ## Step 5: Handle unknowns gracefully
+
+ If you cannot classify a record:
+ - If it is <1KB, include it (small records are cheap to carry)
+ - If it is >10KB, skip it (large unclassified records are almost always tool output)
+ - If it contains natural language prose, include it
+ - If it contains code, JSON, or structured data, skip it
+
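A coarse version of this keep-or-skip rule, using a crude "looks structured" test as a stand-in for real prose detection (the heuristic is an assumption, not part of the guide's tooling):

```python
def keep_unknown(text):
    """Keep-or-skip decision for records that resist classification."""
    size = len(text.encode("utf-8"))
    if size > 10_000:
        return False  # large unclassified records are almost always tool output
    if text.lstrip().startswith(("{", "[", "<")):
        return False  # JSON/XML-shaped structured data: skip
    return True       # small prose is cheap to carry
```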
+ ## Privacy checks
+
+ Before outputting extracted content, scan for:
+ - API keys: patterns like `sk-`, `ghp_`, `Bearer `, `token: "`
+ - Passwords: fields named `password`, `passwd`, `secret`
+ - Personal data: email addresses, IP addresses, phone numbers
+ - File paths: may reveal usernames (e.g., `/Users/johndoe/`)
+ - JWT tokens: strings matching `eyJ...`
+
+ Flag these but do not include them in extracted output.
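These checks reduce to a regex scan. The patterns below are illustrative, not exhaustive; real scanners (and field-name checks for `password`/`secret`) need tuning per codebase:

```python
import re

SECRET_PATTERNS = [
    re.compile(r"\bsk-[A-Za-z0-9]{16,}"),          # OpenAI-style API keys
    re.compile(r"\bghp_[A-Za-z0-9]{36}"),          # GitHub personal access tokens
    re.compile(r"Bearer\s+[A-Za-z0-9._~+/-]+=*"),  # bearer tokens
    re.compile(r"\beyJ[\w-]+\.[\w-]+\.[\w-]+"),    # JWTs (three base64url segments)
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),       # email addresses
    re.compile(r"/Users/[^/\s]+"),                 # macOS home paths (usernames)
]

def flag_secrets(text):
    """Return matched substrings so they can be flagged, never emitted."""
    hits = []
    for pat in SECRET_PATTERNS:
        hits.extend(pat.findall(text))
    return hits
```

Run it on every extracted turn before output; a nonempty result means redact or drop the turn, not include it with a warning.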
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "codealmanac",
- "version": "0.1.5",
+ "version": "0.1.7",
  "description": "A living wiki for codebases, maintained by AI agents. Documents what the code can't say: decisions, flows, invariants, incidents, gotchas.",
  "keywords": [
  "wiki",
@@ -34,7 +34,7 @@
  "LICENSE"
  ],
  "engines": {
- "node": ">=20"
+ "node": "20.x || 22.x || 23.x || 24.x || 25.x"
  },
  "scripts": {
  "build": "tsup",