@lightupai/polaris 0.0.29 → 0.0.31

@@ -0,0 +1,169 @@
1
+ # Design: Backfill
2
+
3
+ ## Context
4
+
5
+ Events can be lost when the daemon is down, the API is unreachable, or the daemon crashes. The backfill feature recovers lost events from local sources and replays them to the API.
6
+
7
+ ## Command
8
+
9
+ ```
10
+ /polaris backfill — auto-detect gap, backfill from best source
11
+ /polaris backfill 2h — backfill the last 2 hours
12
+ /polaris backfill --from <ts> — backfill from a specific timestamp
13
+ ```
14
+
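The three command forms reduce to one question: where does the backfill window start? The sketch below is one possible way to resolve it. `parseDuration`, `resolveWindowStart`, and `BackfillArgs` are hypothetical names, the accepted units (`s`, `m`, `h`, `d`) are an assumption since the design only shows `2h` and `30m`, and giving `from` precedence over `duration` is also an assumption.

```typescript
// Hypothetical helper: only "2h" and "30m" appear in the design,
// so the set of accepted units is an assumption.
const UNIT_MS: Record<string, number> = {
  s: 1000,
  m: 60000,
  h: 3600000,
  d: 86400000,
};

// Converts a string such as "2h" into milliseconds.
function parseDuration(input: string): number {
  const match = /^(\d+)([smhd])$/.exec(input.trim());
  if (match === null) {
    throw new Error(`Unrecognized duration: ${input}`);
  }
  const [, amount = "", unit = ""] = match;
  const unitMs = UNIT_MS[unit];
  if (unitMs === undefined) {
    throw new Error(`Unrecognized unit: ${unit}`);
  }
  return Number(amount) * unitMs;
}

interface BackfillArgs {
  duration?: string; // e.g. "2h"
  from?: string; // ISO timestamp
}

// Returns the window start in epoch milliseconds, or null when neither
// argument is given and the gap has to be auto-detected.
// Assumption: an explicit `from` wins over `duration`.
function resolveWindowStart(args: BackfillArgs, nowMs: number): number | null {
  if (args.from !== undefined) {
    const fromMs = Date.parse(args.from);
    if (Number.isNaN(fromMs)) {
      throw new Error(`Invalid timestamp: ${args.from}`);
    }
    return fromMs;
  }
  if (args.duration !== undefined) {
    return nowMs - parseDuration(args.duration);
  }
  return null;
}
```

A `null` result corresponds to the bare `/polaris backfill` form, where the start of the window comes from the most recent event the API already has.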
15
+ ## Data Sources (in priority order)
16
+
17
+ ### 1. Daemon JSONL log (`~/.polaris/logs/daemon-YYYY-MM-DD.jsonl`)
18
+
19
+ **When it's available**: Daemon was running and received the hook event. The log is written before the API relay, so even failed relays are captured.
20
+
21
+ **What it contains**: Full request payloads — endpoint, timestamp, hook_event_name, prompt/response content, tool calls. Also response status on failures.
22
+
23
+ **When it's incomplete**:
24
+ - Daemon wasn't running (hooks failed to reach localhost:4322)
25
+ - Daemon crashed before writing the log entry
26
+ - Log file was manually deleted or rotated
27
+
28
+ **Parsing**: Already structured JSONL. Each line is `{t, endpoint, payload, response?}`, so entries can be replayed to the API directly.
29
+
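Reading the daemon log for replay could look roughly like the sketch below. The entry shape follows the `{t, endpoint, payload, response?}` description above. It assumes that `t` is an ISO 8601 string and that the `endpoint` field holds the literal `/events`, neither of which the design states. `selectReplayable` is a hypothetical name, and the function takes the file contents as a string so it stays independent of how the file is read.

```typescript
// Shape taken from the design: {t, endpoint, payload, response?}.
// Assumption: `t` is an ISO 8601 string; adjust if the daemon writes epoch numbers.
interface DaemonLogEntry {
  t: string;
  endpoint: string;
  payload: unknown;
  response?: unknown;
}

function isDaemonLogEntry(value: unknown): value is DaemonLogEntry {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    typeof record["t"] === "string" &&
    typeof record["endpoint"] === "string" &&
    "payload" in record
  );
}

// Parses JSONL text and keeps /events entries whose timestamp falls inside
// [fromMs, toMs]. Lines that are not valid JSON or lack the expected fields
// are skipped rather than aborting the whole backfill.
function selectReplayable(jsonl: string, fromMs: number, toMs: number): DaemonLogEntry[] {
  const selected: DaemonLogEntry[] = [];
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue;
    }
    if (!isDaemonLogEntry(parsed)) continue;
    const ts = Date.parse(parsed.t);
    if (Number.isNaN(ts) || ts < fromMs || ts > toMs) continue;
    if (parsed.endpoint !== "/events") continue; // assumed literal value
    selected.push(parsed);
  }
  return selected;
}
```

Skipping malformed lines is a design choice for the sketch: a single bad line should not block recovery of the entries around it.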
30
+ ### 2. Claude Code transcript (`~/.claude/projects/.../SESSION_ID.jsonl`)
31
+
32
+ **When it's available**: Always — Claude Code manages this file regardless of Polaris. Every conversation is persisted.
33
+
34
+ **What it contains**: The complete conversation — every user message, every assistant response, every tool call and result. Raw and unstructured relative to Polaris events.
35
+
36
+ **When it's incomplete**: Only if Claude Code itself crashed or the file was deleted.
37
+
38
+ **Parsing**: Requires extracting Polaris-relevant events from the conversation format:
39
+ - User messages (role: "user", content is string) → UserPromptSubmit
40
+ - Assistant messages (role: "assistant") → Stop events
41
+ - Tool use blocks → PreToolUse/PostToolUse
42
+ - Need to distinguish real user prompts from tool_result messages (content is array)
43
+
44
+ The `capture-stop.ts` hook already does this parsing for Stop events. The same logic can be reused.
45
+
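The mapping rules above can be sketched as a small classifier. This is not the logic in `capture-stop.ts`, which is not shown here. The message shape is a simplified assumption, since the transcript format is not a public API. Two further assumptions are labeled in the code: `tool_result` blocks map to PostToolUse, and an assistant message counts as a Stop only when it contains no `tool_use` blocks.

```typescript
// Simplified, assumed shape of one transcript message.
type ContentBlock = { type: string; [key: string]: unknown };

interface TranscriptMessage {
  role: "user" | "assistant";
  content: string | ContentBlock[];
}

type HookEventName = "UserPromptSubmit" | "Stop" | "PreToolUse" | "PostToolUse";

// Returns the Polaris hook events one transcript message would produce.
function classifyMessage(message: TranscriptMessage): HookEventName[] {
  const events: HookEventName[] = [];
  if (message.role === "user") {
    // String content is a real user prompt.
    if (typeof message.content === "string") {
      events.push("UserPromptSubmit");
      return events;
    }
    // Array content carries tool results, not a typed prompt.
    // Assumption: each tool_result block corresponds to one PostToolUse.
    for (const block of message.content) {
      if (block.type === "tool_result") events.push("PostToolUse");
    }
    return events;
  }
  const blocks = typeof message.content === "string" ? [] : message.content;
  for (const block of blocks) {
    if (block.type === "tool_use") events.push("PreToolUse");
  }
  // Assumption: a message that requests tools is not the end of a turn,
  // so only tool-free assistant messages count as Stop.
  if (events.length === 0) events.push("Stop");
  return events;
}
```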
46
+ ### 3. Nothing
47
+
48
+ Both sources are gone. Data is irrecoverably lost. Backfill reports the gap.
49
+
50
+ ## Backfill Algorithm
51
+
52
+ ```
53
+ 1. Query API for the most recent event timestamp for the current session/project
54
+ 2. Determine the gap: from last API event to now (or specified time range)
55
+ 3. Try daemon log first:
56
+ a. Read log entries in the gap period
57
+ b. Filter to /events endpoint entries
58
+ c. Check each against the API (by timestamp + payload hash) to avoid duplicates
59
+ d. Replay missing entries to the API
60
+ 4. If daemon log is incomplete (gap still exists after step 3):
61
+ a. Find the transcript file (path is in hook payloads or daemon log)
62
+ b. Parse transcript for events in the remaining gap
63
+ c. Construct Polaris events from transcript entries
64
+ d. Replay to API
65
+ 5. Report results:
66
+ a. How many events were recovered
67
+ b. From which source (daemon log vs transcript)
68
+ c. Any remaining gaps
69
+ d. Post abridged summary to Slack as a thread reply
70
+ ```
71
+
72
+ ## Deduplication
73
+
74
+ Events replayed during backfill must not create duplicates. Strategies:
75
+ - **Timestamp matching**: Check if an event with the same timestamp (±1s) and same hook_event_name already exists
76
+ - **Content hash**: Hash the payload content and check against existing events
77
+ - **Idempotent insert**: The API could support an `idempotency_key` parameter that rejects duplicates
78
+
79
+ For v1, timestamp matching is sufficient. The API's event IDs are UUIDs generated on insert, so backfilled events get new IDs but the content matches.
80
+
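A minimal sketch of the v1 timestamp-matching check, assuming events have already been reduced to a timestamp in epoch milliseconds and a `hook_event_name`. `EventStamp`, `isDuplicate`, and `filterMissing` are hypothetical names, and the existing events are assumed to have been fetched from the API beforehand.

```typescript
interface EventStamp {
  timestampMs: number;
  hookEventName: string;
}

// The ±1s window from the timestamp-matching strategy.
const TOLERANCE_MS = 1000;

// True when an existing event has the same hook_event_name and a timestamp
// within the tolerance window of the candidate.
function isDuplicate(candidate: EventStamp, existing: readonly EventStamp[]): boolean {
  return existing.some(
    (event) =>
      event.hookEventName === candidate.hookEventName &&
      Math.abs(event.timestampMs - candidate.timestampMs) <= TOLERANCE_MS,
  );
}

// Keeps only the candidates the API does not appear to have yet.
function filterMissing(
  candidates: readonly EventStamp[],
  existing: readonly EventStamp[],
): EventStamp[] {
  return candidates.filter((candidate) => !isDuplicate(candidate, existing));
}
```

One limitation follows directly from the rule: two distinct events of the same type that fall within one second of an already stored event are both treated as duplicates. That is the case the content hash or `idempotency_key` strategies would cover.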
81
+ ## Slack Recovery Summary
82
+
83
+ After backfill, post an abridged summary to the project's Slack channel as a thread reply on the last message before the gap:
84
+
85
+ ```
86
+ :warning: Recovery log — N events recovered from daemon log / transcript.
87
+ Gap: 3:29am – 3:44am PT
88
+
89
+ • 3:29 — user:manu: "add a todo for the stale daemon issue"
90
+ • 3:30 — agent:claude: fixed daemon default port
91
+ • 3:37 — user:manu: "so at this point, nothing is making it to slack"
92
+ • ...
93
+
94
+ N events recovered, K gaps remain.
95
+ ```
96
+
97
+ ## MCP Tool
98
+
99
+ ```typescript
100
+ {
101
+ name: "polaris_backfill",
102
+ description: "Recover lost events from local logs",
103
+ inputSchema: {
104
+ properties: {
105
+ duration: { type: "string", description: "e.g., '2h', '30m'" },
106
+ from: { type: "string", description: "ISO timestamp" },
107
+ },
108
+ },
109
+ }
110
+ ```
111
+
112
+ The tool calls the daemon's `/backfill` endpoint. The daemon does the actual work (reads logs, parses transcripts, replays to API).
113
+
114
+ ## Daemon Endpoint
115
+
116
+ ```
117
+ POST /backfill
118
+ {
119
+ "ccSessionId": "...",
120
+ "duration": "2h", // optional
121
+ "from": "2026-06-10..." // optional
122
+ }
123
+ ```
124
+
125
+ Response:
126
+ ```jsonc
127
+ {
128
+ "recovered": 42,
129
+ "source": "daemon_log", // or "transcript" or "both"
130
+ "gaps": [], // time ranges with no data
131
+ "slackThreadTs": "..." // where the recovery summary was posted
132
+ }
133
+ ```
134
+
135
+ ## Implementation Order
136
+
137
+ 1. **Daemon log replay** — simplest, most structured, covers the common case (API was unreachable but daemon was running)
138
+ 2. **Transcript parsing** — more complex, covers the case where daemon was also down
139
+ 3. **Slack recovery summary** — nice to have, reuse the pattern from the manual recovery we did
140
+ 4. **Deduplication** — important for correctness, add after basic replay works
141
+
142
+ ## Name Changes During Gap
143
+
144
+ Project renames, session changes, and Slack channel renames can happen during a gap. Backfill must handle these correctly.
145
+
146
+ **Key insight**: The daemon log stores raw hook payloads, not project/session names. The daemon adds the project/session from its session mapping at relay time. So on replay, the daemon should use the *current* mapping, not reconstruct a historical one.
147
+
148
+ | Change during gap | Impact | Handling |
149
+ |---|---|---|
150
+ | Project renamed | Log doesn't contain project name — daemon resolves it from current mapping | Works automatically |
151
+ | Slack channel renamed | Bridge looks up channel by project → channel ID in DB | Works if DB mapping is current |
152
+ | User switched projects | Log has events for both time periods | Filter by timestamp range per project |
153
+ | Session handed off to new driver | Sender identity changes | Use current session's driver/agent at replay time |
154
+
155
+ **Transcript fallback complication**: The transcript doesn't know about Polaris projects/sessions at all. When parsing the transcript, backfill must ask: "which project was this CC session connected to at this timestamp?" The daemon log has `/connect` entries that establish the timeline of project associations. If the daemon log is also missing, the current session mapping is the only reference — which may not reflect historical state.
156
+
157
+ **Recommendation**: Always log `/connect` and `/disconnect` events to a separate persistent file (`~/.polaris/session-history.jsonl`) that survives daemon restarts. This gives backfill a reliable timeline of which CC session was in which project at what time.
158
+
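With such a history file in place, answering "which project was this CC session connected to at this timestamp?" becomes a scan over the timeline. The sketch below assumes a record shape for `session-history.jsonl` that the design does not define: the field names `t`, `action`, `ccSessionId`, and `project` are hypothetical, and `t` is assumed to be an ISO 8601 string.

```typescript
// Hypothetical record shape for ~/.polaris/session-history.jsonl.
interface SessionHistoryEntry {
  t: string; // ISO 8601 timestamp (assumed)
  action: "connect" | "disconnect";
  ccSessionId: string;
  project?: string; // expected on "connect" entries
}

// Returns the project the given CC session was connected to at `atMs`,
// or null if it was not connected to any project at that time.
function projectAt(
  history: readonly SessionHistoryEntry[],
  ccSessionId: string,
  atMs: number,
): string | null {
  const ordered = history
    .filter((entry) => entry.ccSessionId === ccSessionId)
    .map((entry) => ({ entry, ms: Date.parse(entry.t) }))
    .filter((item) => !Number.isNaN(item.ms))
    .sort((a, b) => a.ms - b.ms);

  let current: string | null = null;
  for (const { entry, ms } of ordered) {
    if (ms > atMs) break;
    // A connect sets the active project; a disconnect clears it.
    current = entry.action === "connect" ? (entry.project ?? null) : null;
  }
  return current;
}
```

When parsing a transcript, each reconstructed event's timestamp would be passed through a lookup like this, so events are attributed to the project that was active when they happened rather than to the current mapping.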
159
+ ## Open Questions
160
+
161
+ 1. **Should backfill be automatic?** The daemon could detect gaps on startup (compare last log entry to last API event) and auto-backfill. Risk: could replay stale events unintentionally.
162
+
163
+ 2. **Transcript format stability**: Claude Code's transcript format is not a public API. It could change between versions. How brittle is the parser?
164
+
165
+ 3. **Multi-day gaps**: The daemon log is per-day. A multi-day outage requires reading multiple files. The transcript spans the whole session.
166
+
167
+ 4. **Cross-session backfill**: If the user was in session A, disconnected, joined session B, and wants to backfill A — the daemon log has events for both. Need to filter by session/project.
168
+
169
+ 5. **Session history persistence**: Should we add `~/.polaris/session-history.jsonl` now (cheap, foundational for backfill) or defer until backfill is implemented?
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@lightupai/polaris",
3
- "version": "0.0.29",
3
+ "version": "0.0.31",
4
4
  "type": "module",
5
5
  "bin": {
6
6
  "polaris": "bin/polaris",