@lightupai/polaris 0.0.29 → 0.0.31
- package/docs/design-backfill.md +169 -0
- package/package.json +1 -1
@@ -0,0 +1,169 @@
# Design: Backfill

## Context

Events can be lost when the daemon is down, the API is unreachable, or the daemon crashes. The backfill feature recovers lost events from local sources and replays them to the API.

## Command

```
/polaris backfill               # auto-detect gap, backfill from best source
/polaris backfill 2h            # backfill the last 2 hours
/polaris backfill --from <ts>   # backfill from a specific timestamp
```

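The three command forms above could be resolved into a time window along these lines. This is a minimal sketch, not the package's parser: `BackfillWindow`, `parseDuration`, and `resolveWindow` are hypothetical names, and the `s` and `d` units are assumptions (the document only shows `h` and `m`).

```typescript
// Hypothetical shape for a resolved backfill window (epoch milliseconds).
interface BackfillWindow {
  fromMs: number | null; // null means "auto-detect from the last API event"
  toMs: number;
}

// Assumed unit table; only "h" and "m" appear in the document.
const UNIT_MS: Record<string, number> = { s: 1000, m: 60000, h: 3600000, d: 86400000 };

// Parses "2h", "30m", etc. Returns null for anything that is not <integer><unit>.
function parseDuration(text: string): number | null {
  const match = /^(\d+)([smhd])$/.exec(text.trim());
  const unitMs = match ? UNIT_MS[match[2] ?? ""] : undefined;
  if (!match || unitMs === undefined) return null;
  return Number(match[1]) * unitMs;
}

// args are the tokens after "/polaris backfill".
function resolveWindow(args: string[], nowMs: number): BackfillWindow {
  const first = args[0];
  if (first === undefined) return { fromMs: null, toMs: nowMs };
  if (first === "--from") {
    const fromMs = Date.parse(args[1] ?? "");
    if (Number.isNaN(fromMs)) throw new Error("--from expects a parseable timestamp");
    return { fromMs, toMs: nowMs };
  }
  const durationMs = parseDuration(first);
  if (durationMs === null) throw new Error(`unrecognized duration: ${first}`);
  return { fromMs: nowMs - durationMs, toMs: nowMs };
}
```

Returning `fromMs: null` for the bare command leaves gap detection to the algorithm, which needs the API's last event timestamp.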
## Data Sources (in priority order)

### 1. Daemon JSONL log (`~/.polaris/logs/daemon-YYYY-MM-DD.jsonl`)

**When it's available**: The daemon was running and received the hook event. The log is written before the API relay, so even failed relays are captured.

**What it contains**: Full request payloads: endpoint, timestamp, hook_event_name, prompt/response content, and tool calls. The response status is also recorded when a relay fails.

**When it's incomplete**:
- The daemon wasn't running (hooks failed to reach localhost:4322)
- The daemon crashed before writing the log entry
- The log file was manually deleted or rotated

**Parsing**: The log is already structured JSONL. Each line is `{t, endpoint, payload, response?}`, so entries can be replayed to the API directly.

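Reading the daemon log might look like the sketch below. The entry fields come from the `{t, endpoint, payload, response?}` shape above; everything else is an assumption, in particular that `t` is either an ISO 8601 string or epoch milliseconds, and the helper names `parseDaemonLog` and `entriesInGap`.

```typescript
// Shape taken from the document: {t, endpoint, payload, response?}.
// Assumption: `t` is either an ISO 8601 string or epoch milliseconds.
interface DaemonLogEntry {
  t: string | number;
  endpoint: string;
  payload: unknown;
  response?: unknown;
}

function isEntry(value: unknown): value is DaemonLogEntry {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (typeof v["t"] === "string" || typeof v["t"] === "number")
    && typeof v["endpoint"] === "string"
    && "payload" in v;
}

function entryTimeMs(entry: DaemonLogEntry): number {
  return typeof entry.t === "number" ? entry.t : Date.parse(entry.t);
}

// Parses JSONL text, skipping blank or malformed lines (for example, a line
// cut short by a crash) instead of aborting the whole backfill.
function parseDaemonLog(text: string): DaemonLogEntry[] {
  const entries: DaemonLogEntry[] = [];
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue;
    try {
      const value: unknown = JSON.parse(line);
      if (isEntry(value)) entries.push(value);
    } catch {
      // Truncated or corrupt line: ignore it.
    }
  }
  return entries;
}

// Keeps /events entries whose timestamp falls inside [fromMs, toMs].
// Entries with an unparseable timestamp compare as NaN and are dropped.
function entriesInGap(entries: DaemonLogEntry[], fromMs: number, toMs: number): DaemonLogEntry[] {
  return entries.filter((entry) => {
    const ms = entryTimeMs(entry);
    return entry.endpoint === "/events" && ms >= fromMs && ms <= toMs;
  });
}
```

Skipping malformed lines matters here because one of the listed failure modes is a crash mid-write, which can leave a partial final line.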
### 2. Claude Code transcript (`~/.claude/projects/.../SESSION_ID.jsonl`)

**When it's available**: Always. Claude Code manages this file regardless of Polaris, and every conversation is persisted.

**What it contains**: The complete conversation: every user message, every assistant response, every tool call and result. The data is raw and unstructured relative to Polaris events.

**When it's incomplete**: Only if Claude Code itself crashed or the file was deleted.

**Parsing**: Requires extracting Polaris-relevant events from the conversation format:
- User messages (role: "user", content is a string) → UserPromptSubmit
- Assistant messages (role: "assistant") → Stop events
- Tool use blocks → PreToolUse/PostToolUse
- Real user prompts must be distinguished from tool_result messages (content is an array)

The `capture-stop.ts` hook already does this parsing for Stop events. The same logic can be reused.

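The mapping rules above can be illustrated with a small classifier. This is a sketch over an assumed, simplified message shape and is not the logic in `capture-stop.ts`. The transcript format is not a public API, so `TranscriptMessage` and `ContentBlock` are hypothetical. Two further assumptions are labeled in the code: a `tool_result` block maps to PostToolUse, and only an assistant message without `tool_use` blocks counts as a Stop.

```typescript
type HookEventName = "UserPromptSubmit" | "Stop" | "PreToolUse" | "PostToolUse";

// Assumed, simplified transcript shapes; the real format is not a public API.
interface ContentBlock {
  type: string;
  [key: string]: unknown;
}
interface TranscriptMessage {
  role: "user" | "assistant";
  content: string | ContentBlock[];
}

function classify(message: TranscriptMessage): HookEventName[] {
  const events: HookEventName[] = [];
  if (message.role === "user") {
    if (typeof message.content === "string") {
      events.push("UserPromptSubmit"); // a real user prompt
    } else {
      // Array content on a user message carries tool results, not a prompt.
      // Assumption: each tool_result corresponds to one PostToolUse event.
      for (const block of message.content) {
        if (block.type === "tool_result") events.push("PostToolUse");
      }
    }
    return events;
  }
  // Assistant message: each tool_use block becomes a PreToolUse event.
  if (typeof message.content !== "string") {
    for (const block of message.content) {
      if (block.type === "tool_use") events.push("PreToolUse");
    }
  }
  // Assumption: an assistant message that requests no tools ends the turn.
  if (events.length === 0) events.push("Stop");
  return events;
}
```

Checking `typeof content === "string"` first is what separates real prompts from tool results, which is the distinction the last bullet calls out.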
### 3. Nothing

Both sources are gone. Data is irrecoverably lost. Backfill reports the gap.

## Backfill Algorithm

```
1. Query API for the most recent event timestamp for the current session/project
2. Determine the gap: from last API event to now (or specified time range)
3. Try daemon log first:
   a. Read log entries in the gap period
   b. Filter to /events endpoint entries
   c. Check each against the API (by timestamp + payload hash) to avoid duplicates
   d. Replay missing entries to the API
4. If daemon log is incomplete (gap still exists after step 3):
   a. Find the transcript file (path is in hook payloads or daemon log)
   b. Parse transcript for events in the remaining gap
   c. Construct Polaris events from transcript entries
   d. Replay to API
5. Report results:
   a. How many events were recovered
   b. From which source (daemon log vs transcript)
   c. Any remaining gaps
   d. Post abridged summary to Slack as a thread reply
```

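Steps 3 to 5 can be sketched as a single function with its I/O injected, which keeps the source ordering easy to test. Every name here (`BackfillDeps`, `runBackfill`, `remainingGaps`, and so on) is hypothetical. How a remaining gap is detected is not specified in this design, so it is left to a caller-supplied function. The `range` argument stands for the output of steps 1 and 2, and the Slack post in step 5d is omitted.

```typescript
// All names are hypothetical; only the step order comes from the document.
interface RecoveredEvent { timeMs: number; hookEventName: string; payload: unknown }
interface TimeRange { fromMs: number; toMs: number }

interface BackfillDeps {
  readDaemonLog(range: TimeRange): RecoveredEvent[];  // steps 3a-3b
  readTranscript(range: TimeRange): RecoveredEvent[]; // steps 4a-4c
  existsInApi(event: RecoveredEvent): boolean;        // step 3c
  replay(event: RecoveredEvent): void;                // steps 3d and 4d
  remainingGaps(range: TimeRange): TimeRange[];       // gap detection is left open
}

interface BackfillReport {
  recovered: number;
  source: "daemon_log" | "transcript" | "both" | null; // null: nothing recovered
  gaps: TimeRange[];
}

function replayMissing(events: RecoveredEvent[], deps: BackfillDeps): number {
  let count = 0;
  for (const event of events) {
    if (deps.existsInApi(event)) continue; // skip duplicates
    deps.replay(event);
    count += 1;
  }
  return count;
}

function runBackfill(range: TimeRange, deps: BackfillDeps): BackfillReport {
  // Step 3: daemon log first.
  const fromLog = replayMissing(deps.readDaemonLog(range), deps);
  // Step 4: transcript only for whatever is still uncovered.
  let fromTranscript = 0;
  for (const gap of deps.remainingGaps(range)) {
    fromTranscript += replayMissing(deps.readTranscript(gap), deps);
  }
  // Step 5: report counts, source, and any gaps that neither source covered.
  const source = fromLog > 0 && fromTranscript > 0 ? "both"
    : fromLog > 0 ? "daemon_log"
    : fromTranscript > 0 ? "transcript"
    : null;
  return { recovered: fromLog + fromTranscript, source, gaps: deps.remainingGaps(range) };
}
```

Injecting the readers and the API client means the same function can run against in-memory fakes in tests and against the real log, transcript, and API in the daemon.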
## Deduplication

Events replayed during backfill must not create duplicates. Strategies:
- **Timestamp matching**: Check if an event with the same timestamp (±1s) and same hook_event_name already exists
- **Content hash**: Hash the payload content and check against existing events
- **Idempotent insert**: The API could support an `idempotency_key` parameter that rejects duplicates

For v1, timestamp matching is sufficient. The API's event IDs are UUIDs generated on insert, so a backfilled event receives a new ID even when its content matches an existing event. IDs therefore cannot be used to detect duplicates.

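The timestamp-matching strategy could be implemented roughly as follows. This is a sketch: `EventKey`, `isDuplicate`, and `missingEvents` are hypothetical names, and timestamps are assumed to be epoch milliseconds.

```typescript
// Hypothetical minimal view of an event for matching purposes.
interface EventKey {
  timeMs: number;
  hookEventName: string;
}

const TOLERANCE_MS = 1000; // the ±1s window from the text

// True when an existing event has the same hook_event_name within ±1s.
function isDuplicate(candidate: EventKey, existing: EventKey[]): boolean {
  return existing.some(
    (e) =>
      e.hookEventName === candidate.hookEventName &&
      Math.abs(e.timeMs - candidate.timeMs) <= TOLERANCE_MS,
  );
}

// Returns the candidates that are not already present.
function missingEvents<T extends EventKey>(candidates: T[], existing: EventKey[]): T[] {
  // Grow a working copy so duplicates inside the replay batch are dropped too.
  const seen: EventKey[] = [...existing];
  const missing: T[] = [];
  for (const candidate of candidates) {
    if (isDuplicate(candidate, seen)) continue;
    seen.push(candidate);
    missing.push(candidate);
  }
  return missing;
}
```

One consequence of this rule is that two genuine events of the same type less than a second apart collapse into one, which is a case the content-hash strategy would handle.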
## Slack Recovery Summary

After backfill, post an abridged summary to the project's Slack channel as a thread reply on the last message before the gap:

```
:warning: Recovery log - N events recovered from daemon log / transcript.
Gap: 3:29am – 3:44am PT

• 3:29 - user:manu: "add a todo for the stale daemon issue"
• 3:30 - agent:claude: fixed daemon default port
• 3:37 - user:manu: "so at this point, nothing is making it to slack"
• ...

N events recovered, K gaps remain.
```

## MCP Tool

```typescript
{
  name: "polaris_backfill",
  description: "Recover lost events from local logs",
  inputSchema: {
    type: "object", // MCP tool input schemas are JSON Schema objects
    properties: {
      duration: { type: "string", description: "e.g., '2h', '30m'" },
      from: { type: "string", description: "ISO timestamp" },
    },
  },
}
```

The tool calls the daemon's `/backfill` endpoint. The daemon does the actual work (reads logs, parses transcripts, replays to API).

## Daemon Endpoint

```
POST /backfill
{
  "ccSessionId": "...",
  "duration": "2h",          // optional
  "from": "2026-06-10..."    // optional
}
```

Response:
```jsonc
{
  "recovered": 42,
  "source": "daemon_log",    // or "transcript" or "both"
  "gaps": [],                // time ranges with no data
  "slackThreadTs": "..."     // where the recovery summary was posted
}
```

## Implementation Order

1. **Daemon log replay**: simplest, most structured, covers the common case (API was unreachable but daemon was running)
2. **Transcript parsing**: more complex, covers the case where daemon was also down
3. **Slack recovery summary**: nice to have, reuse the pattern from the manual recovery we did
4. **Deduplication**: important for correctness, add after basic replay works

## Name Changes During Gap

Project renames, session changes, and Slack channel renames can happen during a gap. Backfill must handle these correctly.

**Key insight**: The daemon log stores raw hook payloads, not project/session names. The daemon adds the project/session from its session mapping at relay time. So on replay, the daemon should use the *current* mapping, not reconstruct a historical one.

| Change during gap | Impact | Handling |
|---|---|---|
| Project renamed | Log doesn't contain project name; daemon resolves it from current mapping | Works automatically |
| Slack channel renamed | Bridge looks up channel by project → channel ID in DB | Works if DB mapping is current |
| User switched projects | Log has events for both time periods | Filter by timestamp range per project |
| Session handed off to new driver | Sender identity changes | Use current session's driver/agent at replay time |

**Transcript fallback complication**: The transcript doesn't know about Polaris projects/sessions at all. When parsing the transcript, backfill must ask: "which project was this CC session connected to at this timestamp?" The daemon log has `/connect` entries that establish the timeline of project associations. If the daemon log is also missing, the current session mapping is the only reference, which may not reflect historical state.

**Recommendation**: Always log `/connect` and `/disconnect` events to a separate persistent file (`~/.polaris/session-history.jsonl`) that survives daemon restarts. This gives backfill a reliable timeline of which CC session was in which project at what time.

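With a session history file in place, the question "which project was this CC session connected to at this timestamp?" becomes a lookup over connect and disconnect records. The sketch below assumes a record shape that the document does not define: `SessionHistoryEntry`, its field names, and `projectAt` are all hypothetical.

```typescript
// Assumed record shape for ~/.polaris/session-history.jsonl; not defined in the document.
interface SessionHistoryEntry {
  timeMs: number;
  event: "connect" | "disconnect";
  ccSessionId: string;
  project?: string; // expected on "connect" entries
}

// Returns the project the CC session was connected to at timeMs,
// or null if it was not connected to any project at that moment.
function projectAt(
  history: SessionHistoryEntry[],
  ccSessionId: string,
  timeMs: number,
): string | null {
  // filter() returns a new array, so sorting does not mutate the caller's history.
  const ordered = history
    .filter((h) => h.ccSessionId === ccSessionId && h.timeMs <= timeMs)
    .sort((a, b) => a.timeMs - b.timeMs);
  let current: string | null = null;
  for (const entry of ordered) {
    current = entry.event === "connect" ? entry.project ?? null : null;
  }
  return current;
}
```

Replaying the history in time order handles the "user switched projects" row of the table: events before the switch resolve to the old project and events after it resolve to the new one.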
## Open Questions

1. **Should backfill be automatic?** The daemon could detect gaps on startup (compare last log entry to last API event) and auto-backfill. Risk: could replay stale events unintentionally.

2. **Transcript format stability**: Claude Code's transcript format is not a public API. It could change between versions. How brittle is the parser?

3. **Multi-day gaps**: The daemon log is per-day. A multi-day outage requires reading multiple files. The transcript spans the whole session.

4. **Cross-session backfill**: If the user was in session A, disconnected, joined session B, and wants to backfill A, the daemon log has events for both. Need to filter by session/project.

5. **Session history persistence**: Should we add `~/.polaris/session-history.jsonl` now (cheap, foundational for backfill) or defer until backfill is implemented?
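On the multi-day gap question (item 3), enumerating the per-day log files for a time range is straightforward. The sketch below assumes the date in `daemon-YYYY-MM-DD.jsonl` is a UTC date, which the document does not state; if the daemon names files by local date, the day boundaries would need to shift accordingly. `daemonLogFileNames` is a hypothetical helper.

```typescript
const DAY_MS = 86400000;

// Lists the daemon log file names covering [fromMs, toMs], one per UTC day.
// Assumption: the date in the file name is a UTC date.
function daemonLogFileNames(fromMs: number, toMs: number): string[] {
  const names: string[] = [];
  // Start at UTC midnight of the first day, then step one day at a time.
  let dayStart = Math.floor(fromMs / DAY_MS) * DAY_MS;
  for (; dayStart <= toMs; dayStart += DAY_MS) {
    const date = new Date(dayStart).toISOString().slice(0, 10); // "YYYY-MM-DD"
    names.push(`daemon-${date}.jsonl`);
  }
  return names;
}
```

Files in the returned list may not exist (for example, on days when the daemon was down), so a caller would still need to treat a missing file as a gap, not an error.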