llm-agent-dashboard 0.1.0__tar.gz
- llm_agent_dashboard-0.1.0/.gitignore +8 -0
- llm_agent_dashboard-0.1.0/PKG-INFO +432 -0
- llm_agent_dashboard-0.1.0/README.md +419 -0
- llm_agent_dashboard-0.1.0/agent_dashboard/__init__.py +4 -0
- llm_agent_dashboard-0.1.0/agent_dashboard/anthropic.py +89 -0
- llm_agent_dashboard-0.1.0/agent_dashboard/api.py +80 -0
- llm_agent_dashboard-0.1.0/agent_dashboard/cli.py +38 -0
- llm_agent_dashboard-0.1.0/agent_dashboard/db.py +345 -0
- llm_agent_dashboard-0.1.0/agent_dashboard/sdk.py +211 -0
- llm_agent_dashboard-0.1.0/pyproject.toml +37 -0
- llm_agent_dashboard-0.1.0/requirements.txt +2 -0
- llm_agent_dashboard-0.1.0/run_context.py +281 -0
- llm_agent_dashboard-0.1.0/static/index.html +960 -0
@@ -0,0 +1,432 @@
Metadata-Version: 2.4
Name: llm-agent-dashboard
Version: 0.1.0
Summary: Self-hosted observability dashboard for agentic flows — every LLM turn, tool call, and failure captured
License: MIT
Keywords: agent,anthropic,dashboard,llm,observability,openai
Requires-Python: >=3.11
Requires-Dist: fastapi>=0.110.0
Requires-Dist: uvicorn[standard]>=0.27.0
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.20.0; extra == 'anthropic'
Description-Content-Type: text/markdown

# Agent Dashboard

Self-hosted observability dashboard for agentic flows. It captures every LLM turn, every tool call, and every failure with full inputs and outputs, and displays them in a searchable web UI that refreshes automatically.

```
Overview → All Runs → Run Detail (iteration timeline + tool call table)
         → Failures & Skips → Tool Analytics
```

---

## Quickstart: plug into any Anthropic agent in 3 lines

```bash
pip install llm-agent-dashboard
```

```python
from agent_dashboard import RunContext
from agent_dashboard.anthropic import Anthropic  # drop-in for anthropic.Anthropic

client = Anthropic()  # same constructor args as anthropic.Anthropic()

with RunContext("my_agent", db_path="./agent_runs.db") as ctx:
    client.set_context(ctx)  # attach: all messages.create() calls are auto-logged
    # ... your existing agent code, zero other changes ...
    response = client.messages.create(model=..., messages=..., tools=...)
    # tokens, stop reason, tool names, duration: all captured automatically
```

Start the dashboard:

```bash
agent-dashboard serve --db ./agent_runs.db
# → http://localhost:7777
```

---

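To make the drop-in idea from the quickstart concrete, here is a minimal sketch of how a wrapper could time `messages.create` and forward usage data to `ctx.log_iteration`. It illustrates the general technique and is not the code in `agent_dashboard/anthropic.py`; `InstrumentedMessages`, `FakeMessages`, and `RecordingContext` are hypothetical names, and the fakes stand in for the real Anthropic client and `RunContext` so the example runs offline.

```python
import time
from types import SimpleNamespace


class InstrumentedMessages:
    """Hypothetical wrapper: delegates create() and logs each call to a context."""

    def __init__(self, inner, ctx=None):
        self._inner = inner  # object exposing create(**kwargs)
        self._ctx = ctx      # object exposing log_iteration(**kwargs), or None

    def set_context(self, ctx):
        self._ctx = ctx

    def create(self, **kwargs):
        start = time.perf_counter()
        response = self._inner.create(**kwargs)
        if self._ctx is not None:
            self._ctx.log_iteration(
                tokens_input=response.usage.input_tokens,
                tokens_output=response.usage.output_tokens,
                stop_reason=response.stop_reason,
                tool_names=[b.name for b in response.content if b.type == "tool_use"],
                duration_ms=int((time.perf_counter() - start) * 1000),
            )
        return response  # the caller sees the untouched response


class FakeMessages:
    """Stand-in for the real API so the sketch runs without network access."""

    def create(self, **kwargs):
        return SimpleNamespace(
            usage=SimpleNamespace(input_tokens=12, output_tokens=5),
            stop_reason="tool_use",
            content=[SimpleNamespace(type="tool_use", name="search")],
        )


class RecordingContext:
    """Stand-in for RunContext that keeps logged iterations in memory."""

    def __init__(self):
        self.iterations = []

    def log_iteration(self, **fields):
        self.iterations.append(fields)


ctx = RecordingContext()
messages = InstrumentedMessages(FakeMessages())
messages.set_context(ctx)
response = messages.create(model="placeholder-model", messages=[])
```

Because the wrapper returns the response unchanged, the calling code does not need to know whether logging is attached.
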
## What it shows

- **KPI cards**: total runs, success rate, token usage, tool call error rate
- **7-day timeline chart**: stacked bar of success/failed/running per day
- **Per-run drilldown**: every LLM iteration with tokens, stop reason, tool calls used, and the full assistant text
- **Tool call inspection**: expandable inputs/results (JSON), quality signal, duration, error message
- **Failure analysis**: all failed runs grouped by error pattern
- **Tool analytics**: per-tool call counts, error rates, avg duration, quality breakdown

Auto-refreshes every 30 seconds. Live indicator for currently-running agents.

---

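As a rough illustration of the KPI cards, the sketch below computes a success rate and a tool call error rate with plain `sqlite3`. The table names `agent_runs` and `agent_tool_calls` come from this README; the columns (`status`, `error`) and the helper name `compute_kpis` are assumptions made for the example and may not match the real schema in `db.py`.

```python
import sqlite3


def compute_kpis(conn: sqlite3.Connection) -> dict:
    """Return total runs, success rate, and tool call error rate (0.0 when empty)."""
    total_runs = conn.execute("SELECT COUNT(*) FROM agent_runs").fetchone()[0]
    ok_runs = conn.execute(
        "SELECT COUNT(*) FROM agent_runs WHERE status = 'success'"
    ).fetchone()[0]
    total_calls = conn.execute("SELECT COUNT(*) FROM agent_tool_calls").fetchone()[0]
    failed_calls = conn.execute(
        "SELECT COUNT(*) FROM agent_tool_calls WHERE error IS NOT NULL"
    ).fetchone()[0]
    return {
        "total_runs": total_runs,
        "success_rate": ok_runs / total_runs if total_runs else 0.0,
        "tool_error_rate": failed_calls / total_calls if total_calls else 0.0,
    }


# Hypothetical minimal schema, built in memory so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agent_runs (id TEXT PRIMARY KEY, status TEXT)")
conn.execute("CREATE TABLE agent_tool_calls (run_id TEXT, tool_name TEXT, error TEXT)")
conn.executemany(
    "INSERT INTO agent_runs VALUES (?, ?)",
    [("r1", "success"), ("r2", "failed"), ("r3", "success"), ("r4", "skipped")],
)
conn.executemany(
    "INSERT INTO agent_tool_calls VALUES (?, ?, ?)",
    [("r1", "search", None), ("r2", "search", "timeout")],
)
kpis = compute_kpis(conn)
```

Guarding the divisions keeps the cards meaningful on a fresh, empty DB.
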
## Installation

**From PyPI** (recommended):

```bash
pip install llm-agent-dashboard               # core
pip install "llm-agent-dashboard[anthropic]"  # + auto-instrumented Anthropic client
```

**From source** (for local development):

```bash
cd ~/Projects/agent-dashboard
python -m venv .venv
source .venv/bin/activate
pip install -e ".[anthropic]"
```

---

## Wiring up an agent that runs on GitHub Actions

This is a 4-step process. Steps 1–3 happen in your **agent's repo**. Step 4 happens locally after a run completes.

---

### Step 1: Copy `run_context.py` into your agent repo

`run_context.py` is a single self-contained file in the root of this repo. It has no dependencies beyond the Python standard library: `sqlite3`, `json`, `uuid`, `time`, `os`, and `datetime`.

Copy it into the root of your agent project:

```bash
cp ~/Projects/agent-dashboard/run_context.py ~/your-agent-project/
```

Commit it:

```bash
cd ~/your-agent-project
git add run_context.py
git commit -m "add: agent dashboard run_context SDK"
git push
```

---

### Step 2: Wrap your agent loop with `RunContext`

Open your agent's main Python file, import `RunContext`, and wrap your existing agent loop with it. You only need to add lines; your existing tool-calling and LLM logic stays as it is.

#### Full example for an Anthropic `client.messages.create` loop

```python
import time
from run_context import RunContext

DB_PATH = "./agent_runs.db"  # SQLite file that will be committed to git

# MODEL, client, tools, and execute_tool() come from your existing agent code.

def run_my_agent(user_prompt: str):
    with RunContext(
        agent_name="my_agent",         # short label shown in the dashboard
        db_path=DB_PATH,
        topic_title=user_prompt[:80],  # optional: human-readable label
        metadata={"model": MODEL},     # optional: any extra info
    ) as ctx:

        messages = [{"role": "user", "content": user_prompt}]

        while True:
            turn_start = time.time()

            response = client.messages.create(
                model=MODEL,
                max_tokens=4096,
                tools=tools,
                messages=messages,
            )

            # ── log the LLM turn ──────────────────────────────────────────
            text = next(
                (b.text for b in response.content if hasattr(b, "text")), ""
            )
            tool_names = [
                b.name for b in response.content if b.type == "tool_use"
            ]
            ctx.log_iteration(
                tokens_input=response.usage.input_tokens,
                tokens_output=response.usage.output_tokens,
                stop_reason=response.stop_reason,
                assistant_preview=text[:200],
                tool_names=tool_names,
                duration_ms=int((time.time() - turn_start) * 1000),
            )
            # ─────────────────────────────────────────────────────────────

            # stop unless the model asked for tools (covers end_turn and max_tokens),
            # so an empty tool_result message is never sent back
            if response.stop_reason != "tool_use":
                break

            # run tool calls
            tool_results = []
            for block in response.content:
                if block.type != "tool_use":
                    continue

                # ── log each tool call ────────────────────────────────────
                t0 = time.time()
                result = execute_tool(block.name, block.input)  # YOUR existing function
                ctx.log_tool_call(
                    tool_name=block.name,
                    inputs=dict(block.input),
                    result=result,
                    duration_ms=int((time.time() - t0) * 1000),
                )
                # ─────────────────────────────────────────────────────────

                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": str(result),
                })

            messages = messages + [
                {"role": "assistant", "content": response.content},
                {"role": "user", "content": tool_results},
            ]
```

#### What each method does

| Method | When to call | What it records |
|--------|-------------|-----------------|
| `ctx.log_iteration(...)` | Once per `client.messages.create` call | Token counts, stop reason, assistant text preview, which tools were called in this turn |
| `ctx.log_tool_call(...)` | Once per tool execution | Tool name, full inputs, full result (truncated if large), duration, success/error |
| `ctx.mark_failed(error)` | If you want to flag failure without raising | Sets run status to `failed` with your message |

The `with RunContext(...) as ctx:` block automatically:
- Creates the `agent_runs`, `agent_tool_calls`, `agent_iterations` tables in the SQLite file if they don't exist
- Writes a run record with `status='running'` at the start
- Updates it to `status='success'` or `status='failed'` (with the exception message) at the end

---

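The lifecycle that the `with RunContext(...)` block handles (create the table, record `running`, then settle on `success` or `failed`) can be pictured with a small context manager. This is a sketch of the general technique, not the code in `run_context.py`; the class name `MiniRunContext` and the column layout are assumptions, only the `agent_runs` table is modelled, and re-raising the exception is a choice made for this example.

```python
import sqlite3
import uuid


class MiniRunContext:
    """Hypothetical, stripped-down stand-in for RunContext (status tracking only)."""

    def __init__(self, agent_name: str, conn: sqlite3.Connection):
        self.agent_name = agent_name
        self.conn = conn
        self.run_id = uuid.uuid4().hex

    def __enter__(self):
        # Create the table on first use, then record the run as 'running'.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS agent_runs "
            "(id TEXT PRIMARY KEY, agent_name TEXT, status TEXT, error TEXT)"
        )
        self.conn.execute(
            "INSERT INTO agent_runs VALUES (?, ?, 'running', NULL)",
            (self.run_id, self.agent_name),
        )
        self.conn.commit()
        return self

    def __exit__(self, exc_type, exc, tb):
        # An exception inside the block marks the run failed; otherwise success.
        status = "failed" if exc_type is not None else "success"
        message = str(exc) if exc is not None else None
        self.conn.execute(
            "UPDATE agent_runs SET status = ?, error = ? WHERE id = ?",
            (status, message, self.run_id),
        )
        self.conn.commit()
        return False  # do not swallow the exception


conn = sqlite3.connect(":memory:")

with MiniRunContext("ok_agent", conn):
    pass

try:
    with MiniRunContext("bad_agent", conn):
        raise RuntimeError("tool exploded")
except RuntimeError:
    pass

statuses = dict(conn.execute("SELECT agent_name, status FROM agent_runs"))
errors = dict(conn.execute("SELECT agent_name, error FROM agent_runs"))
```

Writing the `running` row in `__enter__` is what lets a dashboard show in-progress runs before the final status is known.
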
### Step 3: Add a "Persist DB" step to your GitHub Actions workflow

At the end of your job in `.github/workflows/your-workflow.yml`, add this step **after** the step that runs your agent. The `if: always()` condition ensures the DB is committed even if the agent fails.

```yaml
- name: Persist agent run DB
  if: always()
  run: |
    git config user.name "agent-bot"
    git config user.email "bot@noreply.github.com"
    git add -f agent_runs.db
    git diff --staged --quiet || git commit -m "chore: persist agent run data [skip ci]"
    git push
```

Also make sure your workflow job has write permissions. Add this at the top level of your workflow file if it isn't already there:

```yaml
permissions:
  contents: write
```

Full minimal workflow example:

```yaml
name: Run Agent

on:
  workflow_dispatch:
  schedule:
    - cron: "0 9 * * *"

permissions:
  contents: write

jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
          cache: pip
          cache-dependency-path: requirements.txt

      - name: Install dependencies
        run: pip install -r requirements.txt

      - name: Run agent
        env:
          # use whichever key your agent reads (the Anthropic SDK reads ANTHROPIC_API_KEY)
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: python main.py  # or however you run your agent

      - name: Persist agent run DB
        if: always()
        run: |
          git config user.name "agent-bot"
          git config user.email "bot@noreply.github.com"
          git add -f agent_runs.db
          git diff --staged --quiet || git commit -m "chore: persist agent run data [skip ci]"
          git push
```

---

### Step 4: Pull the DB locally and open the dashboard

After a GitHub Actions run finishes, pull the committed DB and start the dashboard:

```bash
# In your agent's repo: pull the latest DB
cd ~/your-agent-project
git pull

# In the agent-dashboard repo: start the dashboard
cd ~/Projects/agent-dashboard
source .venv/bin/activate
python main.py serve --db ~/your-agent-project/agent_runs.db
```

Open `http://localhost:7777`.

Every time you want to see fresh data from a new run, just `git pull` in your agent repo and refresh the browser; there is no need to restart the dashboard.

---

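Before serving a freshly pulled file, it can help to confirm that it really is a dashboard DB. The sketch below checks for the three tables named in Step 2 using SQLite's built-in `sqlite_master` catalog; the helper name `missing_tables` is hypothetical, and the demo uses an in-memory database instead of a real `agent_runs.db`.

```python
import sqlite3

EXPECTED_TABLES = {"agent_runs", "agent_tool_calls", "agent_iterations"}


def missing_tables(conn: sqlite3.Connection) -> set:
    """Return the expected table names that are absent from the database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    present = {name for (name,) in rows}
    return EXPECTED_TABLES - present


# Demo on an in-memory database that only has one of the three tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agent_runs (id TEXT PRIMARY KEY)")
missing = missing_tables(conn)
```

For a real file, open it with `sqlite3.connect("agent_runs.db")`; an empty set means all three tables are present.
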
## `RunContext` full reference

```python
from run_context import RunContext

with RunContext(
    agent_name="researcher",       # required: label shown in dashboard
    db_path="./agent_runs.db",     # SQLite path, created if missing
    topic_title="My task label",   # optional: human-readable run label
    metadata={"model": "gpt-4o"},  # optional: any JSON-serialisable dict
) as ctx:
    ...
```

### `ctx.log_iteration(...)`

```python
ctx.log_iteration(
    tokens_input=response.usage.input_tokens,    # int
    tokens_output=response.usage.output_tokens,  # int
    stop_reason=response.stop_reason,            # "tool_use" | "end_turn" | "max_tokens"
    assistant_preview=text[:200],                # str: first ~200 chars of text response
    tool_names=["search", "write_file"],         # list[str]: tools called in this turn
    duration_ms=1234,                            # int: how long the LLM call took
    started_at=datetime.now().isoformat(),       # str: optional, defaults to now
)
```

### `ctx.log_tool_call(...)`

```python
ctx.log_tool_call(
    tool_name="search",         # str
    inputs={"query": "..."},    # dict: the arguments passed to the tool
    result={"results": [...]},  # any: what the tool returned
    duration_ms=320,            # int: how long the tool took
    error=None,                 # str | None: pass error string if it failed
)
```

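The `error` argument of `ctx.log_tool_call` is easiest to fill from one place. Below is a minimal sketch of a wrapper that times a tool function and reports either its result or the exception text; `call_and_log` and `RecordingContext` are hypothetical names, and re-raising after logging is a design choice of this sketch, not something the README prescribes.

```python
import time


def call_and_log(ctx, tool_name, inputs, tool_fn):
    """Run tool_fn(**inputs), log the outcome via ctx.log_tool_call, return the result."""
    start = time.perf_counter()
    result, error = None, None
    try:
        result = tool_fn(**inputs)
        return result
    except Exception as exc:
        error = f"{type(exc).__name__}: {exc}"
        raise  # let the agent loop decide how to handle the failure
    finally:
        # Runs on both paths, so every call is logged exactly once.
        ctx.log_tool_call(
            tool_name=tool_name,
            inputs=inputs,
            result=result,
            duration_ms=int((time.perf_counter() - start) * 1000),
            error=error,
        )


class RecordingContext:
    """Stand-in for RunContext that stores tool calls in a list."""

    def __init__(self):
        self.calls = []

    def log_tool_call(self, **fields):
        self.calls.append(fields)


def divide(a, b):
    return a / b


ctx = RecordingContext()
ok = call_and_log(ctx, "divide", {"a": 6, "b": 3}, divide)
try:
    call_and_log(ctx, "divide", {"a": 1, "b": 0}, divide)
except ZeroDivisionError:
    pass
```

With a real `RunContext`, the same helper could replace the inline timing code in the Step 2 loop.
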
### Other methods

```python
ctx.mark_failed("Timeout after 30s")            # flag run as failed without raising
ctx.mark_skipped("No topics ready to publish")  # record a no-op run as 'skipped'
ctx.add_tokens(inp=500, out=200)                # manually accumulate tokens (if not using log_iteration)
```

#### Capturing silent skips (important pattern)

Wrap your agent at the **outermost level**, before any conditional logic. This ensures every invocation is recorded, including runs where the agent decides there is nothing to do:

```python
with RunContext("scheduler", db_path=DB_PATH) as ctx:
    topics = get_ready_topics()
    if not topics:
        ctx.mark_skipped("No topics ready to publish")
        # nothing else runs: the run is recorded as 'skipped' and is visible on the dashboard
    else:
        for topic in topics:
            process(topic, ctx)
```

Without this pattern, "nothing to do" runs are invisible: you can't tell whether the agent ran and skipped or never ran at all. With it, every skipped run shows up on the Failures & Skips page.

---

## CLI reference

After `pip install llm-agent-dashboard`, the `agent-dashboard` command is available in the environment you installed into:

```bash
# Serve dashboard pointing at a specific DB
agent-dashboard serve --db /path/to/agent_runs.db

# Custom port
agent-dashboard serve --db /path/to/agent_runs.db --port 8080

# Bind to all interfaces (e.g. accessible from another machine on your network)
agent-dashboard serve --db /path/to/agent_runs.db --host 0.0.0.0 --port 7777

# Fresh DB in current directory
agent-dashboard serve
```

Or keep using `python main.py serve` if running from source.

---

## File structure

```
agent-dashboard/
├── pyproject.toml          # Package metadata (pip install llm-agent-dashboard)
├── main.py                 # CLI entry point (python main.py serve)
├── run_context.py          # Standalone SDK: copy this into any agent project
├── requirements.txt
├── Makefile                # Shortcuts for blogging-agent integration
├── agent_dashboard/
│   ├── __init__.py         # exports RunContext, set_db_path, init_db
│   ├── sdk.py              # RunContext implementation
│   ├── anthropic.py        # Auto-instrumented Anthropic client (drop-in)
│   ├── cli.py              # agent-dashboard CLI entry point
│   ├── db.py               # SQLite schema + all read/write queries
│   └── api.py              # FastAPI REST endpoints
└── static/
    └── index.html          # Single-page dashboard (Alpine.js + Chart.js + Tailwind)
```

---

## API endpoints

| Endpoint | Description |
|---|---|
| `GET /api/overview` | KPIs, 7-day timeline, recent runs, top errors |
| `GET /api/runs` | Paginated run list (filters: `agent`, `status`, `search`) |
| `GET /api/runs/{run_id}` | Single run detail |
| `GET /api/runs/{run_id}/iterations` | All LLM turns for a run |
| `GET /api/runs/{run_id}/tools` | All tool calls for a run |
| `GET /api/failures` | Failed runs grouped by error pattern |
| `GET /api/tool-stats` | Per-tool call counts, error rates, avg duration |
| `GET /api/agent-stats` | Per-agent aggregated stats |
| `GET /api/agent-names` | List of distinct agent names in the DB |

---

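The `/api/runs` filters (`agent`, `status`, `search`) can be combined in a single query string. The sketch below only builds the URL; the base address is the default from the quickstart, `runs_url` is a hypothetical helper, and the filter values are made up. Fetching the URL (for instance with `urllib.request.urlopen`) needs a running dashboard, and the response shape is not documented in this README.

```python
from urllib.parse import urlencode


def runs_url(base="http://localhost:7777", agent=None, status=None, search=None):
    """Build a GET /api/runs URL, including only the filters that were given."""
    filters = {"agent": agent, "status": status, "search": search}
    query = urlencode({k: v for k, v in filters.items() if v is not None})
    return f"{base}/api/runs" + (f"?{query}" if query else "")


url_all = runs_url()
url_failed = runs_url(agent="my_agent", status="failed", search="rate limit")
```

Using `urlencode` keeps free-text search terms safely escaped instead of concatenating them by hand.
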
## Makefile shortcuts (for blogging-agent)

```bash
make blog        # serve dashboard using local blogging-agent DB
make blog-pull   # git pull blogging-agent DB first, then serve
make blog-live   # pull DB every 30s in background + serve (near-live mode)
make fresh       # serve with a brand-new empty DB
```