@lightupai/polaris 0.0.29 → 0.0.30
- package/docs/design-backfill.md +150 -0
- package/package.json +1 -1
@@ -0,0 +1,150 @@
# Design: Backfill

## Context

Events can be lost when the daemon is down, the API is unreachable, or the daemon crashes. The backfill feature recovers lost events from local sources and replays them to the API.

## Command

```
/polaris backfill               # auto-detect gap, backfill from best source
/polaris backfill 2h            # backfill the last 2 hours
/polaris backfill --from <ts>   # backfill from a specific timestamp
```
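
A duration argument such as `2h` has to be converted into a time window before a gap can be computed. The following is a minimal sketch of that conversion, not the package's implementation. The function name `parseDurationMs` and the set of accepted units (`m`, `h`, `d`) are assumptions; the design itself only shows `2h` and `30m`.

```typescript
// Hypothetical helper: converts "2h" or "30m" into milliseconds.
// Accepted units (m, h, d) are an assumption; the design only shows "2h" and "30m".
function parseDurationMs(input: string): number | null {
  const match = /^(\d+)([mhd])$/.exec(input.trim());
  if (match === null) return null; // not of the form <digits><unit>

  const amount = Number(match[1]);
  const unitMs =
    match[2] === "m" ? 60_000 : match[2] === "h" ? 3_600_000 : 86_400_000;

  // A zero-length window would make backfill a no-op, so treat it as invalid.
  return amount > 0 ? amount * unitMs : null;
}
```

With this helper, `/polaris backfill 2h` would map to a window that starts `parseDurationMs("2h")` milliseconds before the current time.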

## Data Sources (in priority order)

### 1. Daemon JSONL log (`~/.polaris/logs/daemon-YYYY-MM-DD.jsonl`)

**When it's available**: The daemon was running and received the hook event. The log is written before the API relay, so even failed relays are captured.

**What it contains**: Full request payloads: endpoint, timestamp, hook_event_name, prompt/response content, and tool calls. On failures, the response status is logged as well.

**When it's incomplete**:
- The daemon wasn't running (hooks failed to reach localhost:4322)
- The daemon crashed before writing the log entry
- The log file was manually deleted or rotated

**Parsing**: Already structured JSONL. Each line is `{t, endpoint, payload, response?}`, so entries can be replayed to the API directly.
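
Because each log line is a standalone JSON object, reading the log amounts to splitting on newlines and parsing each line. The sketch below illustrates this under two assumptions the design does not confirm: that `t` is an ISO 8601 string and that `endpoint` is a path such as `/events`. `DaemonLogEntry` and `parseDaemonLog` are hypothetical names.

```typescript
// Shape of one log line, following `{t, endpoint, payload, response?}`.
interface DaemonLogEntry {
  t: string; // assumed: ISO 8601 timestamp
  endpoint: string; // e.g. "/events"
  payload: unknown; // full request payload as logged
  response?: unknown; // logged on failures, per the design
}

// Parses JSONL text and keeps entries whose timestamp falls in [startMs, endMs].
function parseDaemonLog(text: string, startMs: number, endMs: number): DaemonLogEntry[] {
  const entries: DaemonLogEntry[] = [];
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue;

    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // a crash mid-write can leave a truncated line; skip it
    }
    if (typeof parsed !== "object" || parsed === null) continue;

    const record = parsed as Record<string, unknown>;
    const t = record["t"];
    const endpoint = record["endpoint"];
    if (typeof t !== "string" || typeof endpoint !== "string") continue;

    const ms = Date.parse(t);
    if (Number.isNaN(ms) || ms < startMs || ms > endMs) continue;

    entries.push({ t, endpoint, payload: record["payload"], response: record["response"] });
  }
  return entries;
}
```

Filtering the result with `entries.filter((e) => e.endpoint === "/events")` would then select the entries that backfill replays.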

### 2. Claude Code transcript (`~/.claude/projects/.../SESSION_ID.jsonl`)

**When it's available**: Always. Claude Code manages this file regardless of Polaris, and every conversation is persisted.

**What it contains**: The complete conversation: every user message, every assistant response, every tool call and result. It is raw and unstructured relative to Polaris events.

**When it's incomplete**: Only if Claude Code itself crashed or the file was deleted.

**Parsing**: Requires extracting Polaris-relevant events from the conversation format:
- User messages (role: "user", content is a string) → UserPromptSubmit
- Assistant messages (role: "assistant") → Stop events
- Tool use blocks → PreToolUse/PostToolUse
- Real user prompts must be distinguished from tool_result messages (content is an array)

The `capture-stop.ts` hook already does this parsing for Stop events. The same logic can be reused.
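
The mapping rules listed above can be sketched as a small classifier. This is an illustration only and does not reproduce the logic in `capture-stop.ts`. The transcript format is not documented here, so the `TranscriptMessage` shape and the `type: "tool_use"` discriminator on content blocks are assumptions, as is the function name `classifyMessage`.

```typescript
type HookEventName = "UserPromptSubmit" | "Stop" | "PreToolUse" | "PostToolUse";

// Assumed shapes; the real transcript entries may carry more fields.
interface ContentBlock {
  type: string; // assumed discriminator, e.g. "tool_use" or "tool_result"
}
interface TranscriptMessage {
  role: "user" | "assistant";
  content: string | ContentBlock[];
}

// Returns the Polaris events that one transcript message would produce.
function classifyMessage(msg: TranscriptMessage): HookEventName[] {
  if (msg.role === "user") {
    // String content is a real prompt; array content is a tool_result message.
    return typeof msg.content === "string" ? ["UserPromptSubmit"] : [];
  }

  const events: HookEventName[] = [];
  if (Array.isArray(msg.content)) {
    for (const block of msg.content) {
      // Simplification: each tool use block yields one Pre/Post pair.
      if (block.type === "tool_use") events.push("PreToolUse", "PostToolUse");
    }
  }
  events.push("Stop"); // every assistant message maps to a Stop event
  return events;
}
```

A production parser would also need to carry timestamps and payload content across, which this sketch leaves out.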

### 3. Nothing

Both sources are gone. Data is irrecoverably lost. Backfill reports the gap.

## Backfill Algorithm

```
1. Query API for the most recent event timestamp for the current session/project
2. Determine the gap: from last API event to now (or specified time range)
3. Try daemon log first:
   a. Read log entries in the gap period
   b. Filter to /events endpoint entries
   c. Check each against the API (by timestamp + payload hash) to avoid duplicates
   d. Replay missing entries to the API
4. If daemon log is incomplete (gap still exists after step 3):
   a. Find the transcript file (path is in hook payloads or daemon log)
   b. Parse transcript for events in the remaining gap
   c. Construct Polaris events from transcript entries
   d. Replay to API
5. Report results:
   a. How many events were recovered
   b. From which source (daemon log vs transcript)
   c. Any remaining gaps
   d. Post abridged summary to Slack as a thread reply
```
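
Steps 3 to 5 can be sketched as a single function with the I/O injected. This is a simplified illustration, not the daemon's code. Rather than detecting whether a gap remains after step 3, it always consults the transcript and relies on the duplicate check to skip events the daemon log already supplied. All names (`BackfillDeps`, `runBackfill`, and the four callbacks) are hypothetical, and the callbacks are synchronous only to keep the sketch short.

```typescript
type Source = "daemon_log" | "transcript";

interface RecoveredEvent {
  timestampMs: number;
  hookEventName: string;
}

// Hypothetical stand-ins for real I/O (file reads and HTTP calls).
interface BackfillDeps {
  fromDaemonLog(startMs: number, endMs: number): RecoveredEvent[];
  fromTranscript(startMs: number, endMs: number): RecoveredEvent[];
  existsInApi(event: RecoveredEvent): boolean;
  replay(event: RecoveredEvent): void;
}

// Replays missing events, daemon log first, and counts recoveries per source.
function runBackfill(startMs: number, endMs: number, deps: BackfillDeps): Record<Source, number> {
  const recovered: Record<Source, number> = { daemon_log: 0, transcript: 0 };

  const replayMissing = (events: RecoveredEvent[], source: Source): void => {
    for (const event of events) {
      if (deps.existsInApi(event)) continue; // step 3c: avoid duplicates
      deps.replay(event); // steps 3d and 4d
      recovered[source] += 1;
    }
  };

  replayMissing(deps.fromDaemonLog(startMs, endMs), "daemon_log");
  // Simplification: the transcript is always read; duplicates are filtered above.
  replayMissing(deps.fromTranscript(startMs, endMs), "transcript");
  return recovered;
}
```

The returned counts provide the inputs for the report in step 5 (how many events, and from which source).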

## Deduplication

Events replayed during backfill must not create duplicates. Strategies:
- **Timestamp matching**: Check whether an event with the same timestamp (±1s) and the same hook_event_name already exists
- **Content hash**: Hash the payload content and check it against existing events
- **Idempotent insert**: The API could support an `idempotency_key` parameter that rejects duplicates

For v1, timestamp matching is sufficient. The API's event IDs are UUIDs generated on insert, so backfilled events get new IDs even though the content matches; IDs therefore cannot be used to detect duplicates.
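
The v1 timestamp-matching rule is small enough to sketch directly. The names `StoredEvent` and `isDuplicate` are hypothetical, and timestamps are assumed to be available as epoch milliseconds.

```typescript
interface StoredEvent {
  timestampMs: number; // assumed: epoch milliseconds
  hookEventName: string;
}

// True when an existing event has the same hook_event_name and a timestamp
// within the tolerance (1 second by default, matching the ±1s rule).
function isDuplicate(
  candidate: StoredEvent,
  existing: StoredEvent[],
  toleranceMs = 1000,
): boolean {
  return existing.some(
    (e) =>
      e.hookEventName === candidate.hookEventName &&
      Math.abs(e.timestampMs - candidate.timestampMs) <= toleranceMs,
  );
}
```

One limitation of this rule is that two genuinely distinct events of the same type within one second of each other would be treated as duplicates; the content hash strategy would avoid that.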

## Slack Recovery Summary

After backfill, post an abridged summary to the project's Slack channel as a thread reply on the last message before the gap:

```
:warning: Recovery log — N events recovered from daemon log / transcript.
Gap: 3:29am – 3:44am PT

• 3:29 — user:manu: "add a todo for the stale daemon issue"
• 3:30 — agent:claude: fixed daemon default port
• 3:37 — user:manu: "so at this point, nothing is making it to slack"
• ...

M events recovered, K gaps remain.
```
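
Producing a message in this shape is plain string formatting. A minimal sketch follows, with hypothetical names (`RecoveredLine`, `formatRecoverySummary`) and two assumptions: that `lines` holds every recovered event, and that three lines are shown before the list is truncated.

```typescript
interface RecoveredLine {
  time: string; // e.g. "3:29"
  actor: string; // e.g. "user:manu"
  text: string;
}

// Builds the abridged summary text. \u2014 and \u2013 are the dash
// characters used in the message template.
function formatRecoverySummary(
  lines: RecoveredLine[],
  gap: { start: string; end: string },
  sourceLabel: string,
  remainingGaps: number,
  maxLines = 3, // assumed truncation limit
): string {
  const bullets = lines
    .slice(0, maxLines)
    .map((l) => `• ${l.time} \u2014 ${l.actor}: ${l.text}`);
  if (lines.length > maxLines) bullets.push("• ...");

  return [
    `:warning: Recovery log \u2014 ${lines.length} events recovered from ${sourceLabel}.`,
    `Gap: ${gap.start} \u2013 ${gap.end}`,
    "",
    ...bullets,
    "",
    `${lines.length} events recovered, ${remainingGaps} gaps remain.`,
  ].join("\n");
}
```

Posting the result as a thread reply is a separate step that depends on the Slack client in use and is not shown here.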

## MCP Tool

```typescript
// MCP tool definition for backfill
const polarisBackfillTool = {
  name: "polaris_backfill",
  description: "Recover lost events from local logs",
  inputSchema: {
    type: "object", // MCP input schemas are JSON Schema objects
    properties: {
      // Both fields are optional; with neither, the gap is auto-detected
      duration: { type: "string", description: "e.g., '2h', '30m'" },
      from: { type: "string", description: "ISO timestamp" },
    },
  },
};
```

The tool calls the daemon's `/backfill` endpoint. The daemon does the actual work: it reads logs, parses transcripts, and replays events to the API.
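
The hand-off from tool to daemon can be illustrated by building the HTTP request without sending it. This is a sketch under assumptions: the daemon is taken to listen on `localhost:4322` (the port mentioned for hooks), and `BackfillArgs`, `HttpRequest`, and `buildBackfillRequest` are hypothetical names.

```typescript
interface BackfillArgs {
  duration?: string;
  from?: string;
}

interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds the POST /backfill request; optional fields are omitted when unset.
function buildBackfillRequest(
  ccSessionId: string,
  args: BackfillArgs,
  port = 4322, // assumed daemon port
): HttpRequest {
  const body: Record<string, string> = { ccSessionId };
  if (args.duration !== undefined) body["duration"] = args.duration;
  if (args.from !== undefined) body["from"] = args.from;

  return {
    url: `http://localhost:${port}/backfill`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
}
```

Keeping request construction separate from the network call makes the tool handler easy to test without a running daemon.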

## Daemon Endpoint

```
POST /backfill
{
  "ccSessionId": "...",
  "duration": "2h",         // optional
  "from": "2026-06-10..."   // optional
}
```

Response:
```json
{
  "recovered": 42,
  "source": "daemon_log",
  "gaps": [],
  "slackThreadTs": "..."
}
```

`source` is `"daemon_log"`, `"transcript"`, or `"both"`; `gaps` lists time ranges with no data; `slackThreadTs` identifies where the recovery summary was posted.
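
The `source` field follows from the per-source recovery counts. A sketch of that derivation is given below; the type and function names are hypothetical, the element shape of `gaps` is an assumption, and returning `null` when nothing was recovered is also an assumption, since the design lists no value for that case.

```typescript
type BackfillSource = "daemon_log" | "transcript" | "both";

interface TimeRange {
  from: string; // assumed shape for "time ranges with no data"
  to: string;
}

interface BackfillResponse {
  recovered: number;
  source: BackfillSource | null; // null (nothing recovered) is an assumption
  gaps: TimeRange[];
  slackThreadTs?: string;
}

// Combines per-source counts into the response body.
function buildBackfillResponse(
  daemonLogCount: number,
  transcriptCount: number,
  gaps: TimeRange[],
): BackfillResponse {
  let source: BackfillSource | null = null;
  if (daemonLogCount > 0 && transcriptCount > 0) source = "both";
  else if (daemonLogCount > 0) source = "daemon_log";
  else if (transcriptCount > 0) source = "transcript";

  return { recovered: daemonLogCount + transcriptCount, source, gaps };
}
```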

## Implementation Order

1. **Daemon log replay**: simplest, most structured, covers the common case (API was unreachable but daemon was running)
2. **Transcript parsing**: more complex, covers the case where the daemon was also down
3. **Slack recovery summary**: nice to have, reuse the pattern from the manual recovery we did
4. **Deduplication**: important for correctness, add after basic replay works

## Open Questions

1. **Should backfill be automatic?** The daemon could detect gaps on startup (compare last log entry to last API event) and auto-backfill. Risk: could replay stale events unintentionally.

2. **Transcript format stability**: Claude Code's transcript format is not a public API. It could change between versions. How brittle is the parser?

3. **Multi-day gaps**: The daemon log is per-day. A multi-day outage requires reading multiple files. The transcript spans the whole session.

4. **Cross-session backfill**: If the user was in session A, disconnected, joined session B, and wants to backfill A — the daemon log has events for both. Need to filter by session/project.
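
For the multi-day question, enumerating the per-day log files that cover a gap is straightforward. The sketch below assumes the date in `daemon-YYYY-MM-DD.jsonl` is a UTC calendar date, which the design does not state; if the daemon uses local time, the day boundaries would shift. `daemonLogFilesBetween` is a hypothetical name.

```typescript
// Lists the daily log file names covering [startMs, endMs], oldest first.
// Assumes file dates are UTC calendar dates.
function daemonLogFilesBetween(startMs: number, endMs: number): string[] {
  if (endMs < startMs) return [];

  const files: string[] = [];
  const day = new Date(startMs);
  day.setUTCHours(0, 0, 0, 0); // rewind to the start of the first day

  while (day.getTime() <= endMs) {
    // toISOString() yields "YYYY-MM-DDTHH:mm:ss.sssZ"; keep the date part.
    files.push(`daemon-${day.toISOString().slice(0, 10)}.jsonl`);
    day.setUTCDate(day.getUTCDate() + 1);
  }
  return files;
}
```

The file names would be resolved against `~/.polaris/logs/`, and any file that is missing marks a day for which only the transcript can help.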