llm-agent-dashboard 0.1.0__tar.gz

@@ -0,0 +1,8 @@
1
+ .env
2
+ .venv/
3
+ __pycache__/
4
+ *.pyc
5
+ *.db
6
+ *.db-shm
7
+ *.db-wal
8
+ .DS_Store
@@ -0,0 +1,432 @@
1
+ Metadata-Version: 2.4
2
+ Name: llm-agent-dashboard
3
+ Version: 0.1.0
4
+ Summary: Self-hosted observability dashboard for agentic flows — every LLM turn, tool call, and failure captured
5
+ License: MIT
6
+ Keywords: agent,anthropic,dashboard,llm,observability,openai
7
+ Requires-Python: >=3.11
8
+ Requires-Dist: fastapi>=0.110.0
9
+ Requires-Dist: uvicorn[standard]>=0.27.0
10
+ Provides-Extra: anthropic
11
+ Requires-Dist: anthropic>=0.20.0; extra == 'anthropic'
12
+ Description-Content-Type: text/markdown
13
+
14
+ # Agent Dashboard
15
+
16
+ Self-hosted observability dashboard for agentic flows. Captures every LLM turn, every tool call, every failure with full inputs/outputs — displayed in a searchable, real-time web UI.
17
+
18
+ ```
19
+ Overview → All Runs → Run Detail (iteration timeline + tool call table)
20
+ → Failures & Skips → Tool Analytics
21
+ ```
22
+
23
+ ---
24
+
25
+ ## Quickstart — plug into any Anthropic agent in 3 lines
26
+
27
+ ```bash
28
+ pip install llm-agent-dashboard
29
+ ```
30
+
31
+ ```python
32
+ from agent_dashboard import RunContext
33
+ from agent_dashboard.anthropic import Anthropic # drop-in for anthropic.Anthropic
34
+
35
+ client = Anthropic() # same constructor args as anthropic.Anthropic()
36
+
37
+ with RunContext("my_agent", db_path="./agent_runs.db") as ctx:
38
+     client.set_context(ctx)   # attach: all messages.create() calls are auto-logged
39
+     # ... your existing agent code, zero other changes ...
40
+     response = client.messages.create(model=..., messages=..., tools=...)
41
+     # tokens, stop reason, tool names, duration: all captured automatically
42
+ ```
43
+
44
+ Start the dashboard:
45
+
46
+ ```bash
47
+ agent-dashboard serve --db ./agent_runs.db
48
+ # → http://localhost:7777
49
+ ```
50
+
51
+ ---
52
+
53
+ ## What it shows
54
+
55
+ - **KPI cards**: total runs, success rate, token usage, tool call error rate
56
+ - **7-day timeline chart**: stacked bar of success/failed/running per day
57
+ - **Per-run drilldown**: every LLM iteration with tokens, stop reason, tool calls used, and the full assistant text
58
+ - **Tool call inspection**: expandable inputs/results (JSON), quality signal, duration, error message
59
+ - **Failure analysis**: all failed runs grouped by error pattern
60
+ - **Tool analytics**: per-tool call counts, error rates, avg duration, quality breakdown
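The README does not say how failed runs are grouped by error pattern. One plausible approach, sketched here under that assumption, is to mask the variable parts of each error message (numbers, hex addresses) and count the resulting templates. The function names `error_pattern` and `group_failures` are hypothetical and not part of the package.

```python
import re
from collections import Counter


def error_pattern(message: str) -> str:
    """Mask variable parts of an error message so similar failures share a key."""
    # Replace hex literals first so their digits are not caught by the number rule.
    pattern = re.sub(r"0x[0-9a-fA-F]+", "<hex>", message)
    # Replace integers and simple decimals with a placeholder.
    pattern = re.sub(r"\d+(\.\d+)?", "<n>", pattern)
    return pattern.strip()


def group_failures(messages: list[str]) -> Counter:
    """Count how many error messages fall under each masked pattern."""
    return Counter(error_pattern(m) for m in messages)
```

With this sketch, "Timeout after 30s" and "Timeout after 45s" land in the same group, while unrelated errors stay separate.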
61
+
62
+ Auto-refreshes every 30 seconds. Live indicator for currently-running agents.
63
+
64
+ ---
65
+
66
+ ## Installation
67
+
68
+ **From PyPI** (recommended):
69
+
70
+ ```bash
71
+ pip install llm-agent-dashboard                # core
72
+ pip install "llm-agent-dashboard[anthropic]"   # + auto-instrumented Anthropic client
73
+ ```
74
+
75
+ **From source** (for local development):
76
+
77
+ ```bash
78
+ cd ~/Projects/agent-dashboard
79
+ python -m venv .venv
80
+ source .venv/bin/activate
81
+ pip install -e ".[anthropic]"
82
+ ```
83
+
84
+ ---
85
+
86
+ ## Wiring up an agent that runs on GitHub Actions
87
+
88
+ This is a 4-step process. Steps 1–3 happen in your **agent's repo**. Step 4 happens locally after a run completes.
89
+
90
+ ---
91
+
92
+ ### Step 1 — Copy `run_context.py` into your agent repo
93
+
94
+ `run_context.py` is a single self-contained file in the root of this repo. It has no dependencies beyond the Python standard library — just `sqlite3`, `json`, `uuid`, `time`, `os`, and `datetime`.
95
+
96
+ Copy it into the root of your agent project:
97
+
98
+ ```bash
99
+ cp ~/Projects/agent-dashboard/run_context.py ~/your-agent-project/
100
+ ```
101
+
102
+ Commit it:
103
+
104
+ ```bash
105
+ cd ~/your-agent-project
106
+ git add run_context.py
107
+ git commit -m "add: agent dashboard run_context SDK"
108
+ git push
109
+ ```
110
+
111
+ ---
112
+
113
+ ### Step 2 — Wrap your agent loop with `RunContext`
114
+
115
+ Open your agent's main Python file. Import `RunContext` and wrap your existing agent loop with it. You only need to add lines — do not change your existing tool-calling or LLM logic.
116
+
117
+ #### Full example for an Anthropic `client.messages.create` loop
118
+
119
+ ```python
120
+ import time
121
+ from datetime import datetime
122
+ from run_context import RunContext
123
+
124
+ DB_PATH = "./agent_runs.db" # SQLite file that will be committed to git
125
+
126
+ def run_my_agent(user_prompt: str):
127
+     with RunContext(
128
+         agent_name="my_agent",          # short label shown in the dashboard
129
+         db_path=DB_PATH,
130
+         topic_title=user_prompt[:80],   # optional: human-readable label
131
+         metadata={"model": MODEL},      # optional: any extra info
132
+     ) as ctx:
133
+
134
+         messages = [{"role": "user", "content": user_prompt}]
135
+
136
+         while True:
137
+             turn_start = time.time()
138
+
139
+             response = client.messages.create(  # client, MODEL, tools: your existing objects
140
+                 model=MODEL,
141
+                 max_tokens=4096,
142
+                 tools=tools,
143
+                 messages=messages,
144
+             )
145
+
146
+             # ── log the LLM turn ──────────────────────────────────────────
147
+             text = next(
148
+                 (b.text for b in response.content if hasattr(b, "text")), ""
149
+             )
150
+             tool_names = [
151
+                 b.name for b in response.content if b.type == "tool_use"
152
+             ]
153
+             ctx.log_iteration(
154
+                 tokens_input=response.usage.input_tokens,
155
+                 tokens_output=response.usage.output_tokens,
156
+                 stop_reason=response.stop_reason,
157
+                 assistant_preview=text[:200],
158
+                 tool_names=tool_names,
159
+                 duration_ms=int((time.time() - turn_start) * 1000),
160
+             )
161
+             # ─────────────────────────────────────────────────────────────
162
+
163
+             if response.stop_reason != "tool_use":  # end_turn, max_tokens, ...: no tools to run
164
+                 break
165
+
166
+             # run tool calls
167
+             tool_results = []
168
+             for block in response.content:
169
+                 if block.type != "tool_use":
170
+                     continue
171
+
172
+                 # ── log each tool call ────────────────────────────────────
173
+                 t0 = time.time()
174
+                 result = execute_tool(block.name, block.input)  # YOUR existing function
175
+                 ctx.log_tool_call(
176
+                     tool_name=block.name,
177
+                     inputs=dict(block.input),
178
+                     result=result,
179
+                     duration_ms=int((time.time() - t0) * 1000),
180
+                 )
181
+                 # ─────────────────────────────────────────────────────────
182
+
183
+                 tool_results.append({
184
+                     "type": "tool_result",
185
+                     "tool_use_id": block.id,
186
+                     "content": str(result),
187
+                 })
188
+
189
+             messages = messages + [
190
+                 {"role": "assistant", "content": response.content},
191
+                 {"role": "user", "content": tool_results},
192
+             ]
193
+ ```
194
+
195
+ #### What each method does
196
+
197
+ | Method | When to call | What it records |
198
+ |--------|-------------|-----------------|
199
+ | `ctx.log_iteration(...)` | Once per `client.messages.create` call | Token counts, stop reason, assistant text preview, which tools were called in this turn |
200
+ | `ctx.log_tool_call(...)` | Once per tool execution | Tool name, full inputs, full result (truncated if large), duration, success/error |
201
+ | `ctx.mark_failed(error)` | If you want to flag failure without raising | Sets run status to `failed` with your message |
202
+
203
+ The `with RunContext(...) as ctx:` block automatically:
204
+ - Creates the `agent_runs`, `agent_tool_calls`, `agent_iterations` tables in the SQLite file if they don't exist
205
+ - Writes a run record with `status='running'` at the start
206
+ - Updates it to `status='success'` or `status='failed'` (with the exception message) at the end
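The lifecycle described in the list above can be sketched as a small context manager. This is a minimal illustration and not the package's actual code: `tracked_run` is a hypothetical name, and every column other than `status` is assumed for the example. Only the `agent_runs` table name and the status values come from this README.

```python
import sqlite3
import uuid
from contextlib import contextmanager
from datetime import datetime, timezone


def _now() -> str:
    """Current UTC time as an ISO-8601 string."""
    return datetime.now(timezone.utc).isoformat()


@contextmanager
def tracked_run(agent_name: str, db_path: str):
    """Hypothetical stand-in for RunContext: record one run's lifecycle."""
    conn = sqlite3.connect(db_path)
    # Assumed column layout, for illustration only.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS agent_runs (
               run_id TEXT PRIMARY KEY,
               agent_name TEXT,
               status TEXT,
               error TEXT,
               started_at TEXT,
               finished_at TEXT
           )"""
    )
    run_id = uuid.uuid4().hex
    conn.execute(
        "INSERT INTO agent_runs (run_id, agent_name, status, started_at) "
        "VALUES (?, ?, 'running', ?)",
        (run_id, agent_name, _now()),
    )
    conn.commit()

    status, error = "success", None
    try:
        yield run_id
    except BaseException as exc:
        # Record the failure, then let the exception propagate to the caller.
        status, error = "failed", str(exc)
        raise
    finally:
        conn.execute(
            "UPDATE agent_runs SET status = ?, error = ?, finished_at = ? "
            "WHERE run_id = ?",
            (status, error, _now(), run_id),
        )
        conn.commit()
        conn.close()
```

Writing the `running` row before the body executes is what lets a dashboard show in-progress runs; the `finally` clause guarantees the row is finalised whether the body succeeds or raises.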
207
+
208
+ ---
209
+
210
+ ### Step 3 — Add a "Persist DB" step to your GitHub Actions workflow
211
+
212
+ At the end of your job in `.github/workflows/your-workflow.yml`, add this step **after** the step that runs your agent. The `if: always()` ensures the DB is committed even if the agent fails.
213
+
214
+ ```yaml
215
+ - name: Persist agent run DB
216
+   if: always()
217
+   run: |
218
+     git config user.name "agent-bot"
219
+     git config user.email "bot@noreply.github.com"
220
+     git add -f agent_runs.db
221
+     git diff --staged --quiet || git commit -m "chore: persist agent run data [skip ci]"
222
+     git push
223
+ ```
224
+
225
+ Also make sure your workflow job has write permissions. Add this at the top level of your workflow file if it isn't already there:
226
+
227
+ ```yaml
228
+ permissions:
229
+   contents: write
230
+ ```
231
+
232
+ Full minimal workflow example:
233
+
234
+ ```yaml
235
+ name: Run Agent
236
+
237
+ on:
238
+   workflow_dispatch:
239
+   schedule:
240
+     - cron: "0 9 * * *"
241
+
242
+ permissions:
243
+   contents: write
244
+
245
+ jobs:
246
+   run:
247
+     runs-on: ubuntu-latest
248
+     steps:
249
+       - uses: actions/checkout@v4
250
+
251
+       - uses: actions/setup-python@v5
252
+         with:
253
+           python-version: "3.11"
254
+           cache: pip
255
+           cache-dependency-path: requirements.txt
256
+
257
+       - name: Install dependencies
258
+         run: pip install -r requirements.txt
259
+
260
+       - name: Run agent
261
+         env:
262
+           OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
263
+         run: python main.py # or however you run your agent
264
+
265
+       - name: Persist agent run DB
266
+         if: always()
267
+         run: |
268
+           git config user.name "agent-bot"
269
+           git config user.email "bot@noreply.github.com"
270
+           git add -f agent_runs.db
271
+           git diff --staged --quiet || git commit -m "chore: persist agent run data [skip ci]"
272
+           git push
273
+ ```
274
+
275
+ ---
276
+
277
+ ### Step 4 — Pull the DB locally and open the dashboard
278
+
279
+ After a GitHub Actions run finishes, pull the committed DB and start the dashboard:
280
+
281
+ ```bash
282
+ # In your agent's repo — pull the latest DB
283
+ cd ~/your-agent-project
284
+ git pull
285
+
286
+ # In the agent-dashboard repo — start the dashboard
287
+ cd ~/Projects/agent-dashboard
288
+ source .venv/bin/activate
289
+ python main.py serve --db ~/your-agent-project/agent_runs.db
290
+ ```
291
+
292
+ Open `http://localhost:7777`.
293
+
294
+ Every time you want to see fresh data from a new run, just `git pull` in your agent repo and refresh the browser — no need to restart the dashboard.
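For a quick check of the pulled DB without starting the dashboard, the file can be read directly with the standard-library `sqlite3` module. The sketch below assumes only what this README states: a table named `agent_runs` with a `status` column. The helper name `status_counts` is hypothetical.

```python
import sqlite3


def status_counts(db_path: str) -> dict[str, int]:
    """Return {status: number_of_runs} from the agent_runs table."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT status, COUNT(*) FROM agent_runs GROUP BY status"
        ).fetchall()
    finally:
        conn.close()
    return dict(rows)
```

Calling `status_counts("agent_runs.db")` after a `git pull` gives a one-line summary of how many runs succeeded, failed, or were skipped.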
295
+
296
+ ---
297
+
298
+ ## `RunContext` full reference
299
+
300
+ ```python
301
+ from run_context import RunContext
302
+
303
+ with RunContext(
304
+     agent_name="researcher",          # required: label shown in dashboard
305
+     db_path="./agent_runs.db",        # SQLite path, created if missing
306
+     topic_title="My task label",      # optional: human-readable run label
307
+     metadata={"model": "gpt-4o"},     # optional: any JSON-serialisable dict
308
+ ) as ctx:
309
+     ...
310
+ ```
311
+
312
+ ### `ctx.log_iteration(...)`
313
+
314
+ ```python
315
+ ctx.log_iteration(
316
+     tokens_input=response.usage.input_tokens,    # int
317
+     tokens_output=response.usage.output_tokens,  # int
318
+     stop_reason=response.stop_reason,            # "tool_use" | "end_turn" | "max_tokens"
319
+     assistant_preview=text[:200],                # str: first ~200 chars of text response
320
+     tool_names=["search", "write_file"],         # list[str]: tools called in this turn
321
+     duration_ms=1234,                            # int: how long the LLM call took
322
+     started_at=datetime.now().isoformat(),       # str: optional, defaults to now
323
+ )
324
+ ```
325
+
326
+ ### `ctx.log_tool_call(...)`
327
+
328
+ ```python
329
+ ctx.log_tool_call(
330
+     tool_name="search",             # str
331
+     inputs={"query": "..."},        # dict: the arguments passed to the tool
332
+     result={"results": [...]},      # any: what the tool returned
333
+     duration_ms=320,                # int: how long the tool took
334
+     error=None,                     # str | None: pass error string if it failed
335
+ )
336
+ ```
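The `error` parameter is only useful if tool exceptions are caught and passed along. A small wrapper can do the timing and error capture in one place. This is a sketch: `call_and_log` is a hypothetical helper, and it relies only on the `log_tool_call` keyword arguments listed above.

```python
import time
from typing import Any, Callable


def call_and_log(ctx: Any, tool_name: str, inputs: dict, tool_fn: Callable[..., Any]) -> Any:
    """Run tool_fn(**inputs), log the outcome on ctx, and return the result.

    Returns None if the tool raised; the exception text is logged via `error`.
    """
    t0 = time.perf_counter()
    result, error = None, None
    try:
        result = tool_fn(**inputs)
    except Exception as exc:
        error = f"{type(exc).__name__}: {exc}"
    ctx.log_tool_call(
        tool_name=tool_name,
        inputs=inputs,
        result=result,
        duration_ms=int((time.perf_counter() - t0) * 1000),
        error=error,
    )
    return result
```

Swallowing the exception keeps the agent loop running so the model can react to the failure; re-raise after logging instead if a tool error should abort the run.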
337
+
338
+ ### Other methods
339
+
340
+ ```python
341
+ ctx.mark_failed("Timeout after 30s") # flag run as failed without raising
342
+ ctx.mark_skipped("No topics ready to publish") # record a no-op run as 'skipped'
343
+ ctx.add_tokens(inp=500, out=200) # manually accumulate tokens (if not using log_iteration)
344
+ ```
345
+
346
+ #### Capturing silent skips — important pattern
347
+
348
+ Wrap your agent at the **outermost level**, before any conditional logic. This ensures every invocation is recorded — including runs where the agent decides there is nothing to do:
349
+
350
+ ```python
351
+ with RunContext("scheduler", db_path=DB_PATH) as ctx:
352
+     topics = get_ready_topics()
353
+     if not topics:
354
+         ctx.mark_skipped("No topics ready to publish")
355
+         # the with-block then exits normally; the run is recorded as 'skipped' and visible on the dashboard
356
+     else:
357
+         for topic in topics:
358
+             process(topic, ctx)
359
+ ```
360
+
361
+ Without this pattern, "nothing to do" runs are invisible: you can't tell whether the agent ran and skipped, or never ran at all. With it, every skipped run shows up on the Failures & Skips page.
362
+
363
+ ---
364
+
365
+ ## CLI reference
366
+
367
+ After `pip install llm-agent-dashboard`, the `agent-dashboard` command is available on your PATH:
368
+
369
+ ```bash
370
+ # Serve dashboard pointing at a specific DB
371
+ agent-dashboard serve --db /path/to/agent_runs.db
372
+
373
+ # Custom port
374
+ agent-dashboard serve --db /path/to/agent_runs.db --port 8080
375
+
376
+ # Bind to all interfaces (e.g. accessible from another machine on your network)
377
+ agent-dashboard serve --db /path/to/agent_runs.db --host 0.0.0.0 --port 7777
378
+
379
+ # Fresh DB in current directory
380
+ agent-dashboard serve
381
+ ```
382
+
383
+ Or keep using `python main.py serve` if running from source.
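For readers curious how a command line like this is typically parsed, here is a minimal `argparse` sketch. It is not the package's `cli.py`. The flag names and port 7777 come from the examples above, while the default host and default DB path are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a single `serve` subcommand."""
    parser = argparse.ArgumentParser(prog="agent-dashboard")
    sub = parser.add_subparsers(dest="command", required=True)

    serve = sub.add_parser("serve", help="start the dashboard web server")
    # Default path is assumed; the README only says a fresh DB is used.
    serve.add_argument("--db", default="./agent_runs.db", help="path to the SQLite file")
    # Default host is assumed; localhost matches the URL shown in the README.
    serve.add_argument("--host", default="127.0.0.1", help="interface to bind")
    serve.add_argument("--port", type=int, default=7777, help="TCP port")
    return parser
```

For example, `build_parser().parse_args(["serve", "--port", "8080"])` yields a namespace with `port=8080` and the other options at their defaults.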
384
+
385
+ ---
386
+
387
+ ## File structure
388
+
389
+ ```
390
+ agent-dashboard/
391
+ ├── pyproject.toml       # Package metadata (pip install llm-agent-dashboard)
392
+ ├── main.py              # CLI entry point (python main.py serve)
393
+ ├── run_context.py       # Standalone SDK: copy this into any agent project
394
+ ├── requirements.txt
395
+ ├── Makefile             # Shortcuts for blogging-agent integration
396
+ ├── agent_dashboard/
397
+ │   ├── __init__.py      # exports RunContext, set_db_path, init_db
398
+ │   ├── sdk.py           # RunContext implementation
399
+ │   ├── anthropic.py     # Auto-instrumented Anthropic client (drop-in)
400
+ │   ├── cli.py           # agent-dashboard CLI entry point
401
+ │   ├── db.py            # SQLite schema + all read/write queries
402
+ │   └── api.py           # FastAPI REST endpoints
403
+ └── static/
404
+     └── index.html       # Single-page dashboard (Alpine.js + Chart.js + Tailwind)
405
+ ```
406
+
407
+ ---
408
+
409
+ ## API endpoints
410
+
411
+ | Endpoint | Description |
412
+ |---|---|
413
+ | `GET /api/overview` | KPIs, 7-day timeline, recent runs, top errors |
414
+ | `GET /api/runs` | Paginated run list (filters: `agent`, `status`, `search`) |
415
+ | `GET /api/runs/{run_id}` | Single run detail |
416
+ | `GET /api/runs/{run_id}/iterations` | All LLM turns for a run |
417
+ | `GET /api/runs/{run_id}/tools` | All tool calls for a run |
418
+ | `GET /api/failures` | Failed runs grouped by error pattern |
419
+ | `GET /api/tool-stats` | Per-tool call counts, error rates, avg duration |
420
+ | `GET /api/agent-stats` | Per-agent aggregated stats |
421
+ | `GET /api/agent-names` | List of distinct agent names in the DB |
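The endpoints can also be called from a script. The sketch below builds a `/api/runs` URL and fetches it with the standard library. It assumes the filters listed in the table (`agent`, `status`, `search`) are passed as query parameters and that the response body is JSON; neither is confirmed by this README. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def runs_url(base: str = "http://localhost:7777", **filters: str) -> str:
    """Build the /api/runs URL, skipping filters that are empty."""
    query = urlencode({k: v for k, v in filters.items() if v})
    return f"{base}/api/runs" + (f"?{query}" if query else "")


def fetch_runs(base: str = "http://localhost:7777", **filters: str):
    """Fetch and decode the run list (requires a running dashboard)."""
    with urlopen(runs_url(base, **filters), timeout=10) as resp:
        return json.load(resp)
```

For instance, `fetch_runs(agent="my_agent", status="failed")` would request only the failed runs of one agent, assuming the server interprets the parameters that way.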
422
+
423
+ ---
424
+
425
+ ## Makefile shortcuts (for blogging-agent)
426
+
427
+ ```bash
428
+ make blog # serve dashboard using local blogging-agent DB
429
+ make blog-pull # git pull blogging-agent DB first, then serve
430
+ make blog-live # pull DB every 30s in background + serve (near-live mode)
431
+ make fresh # serve with a brand-new empty DB
432
+ ```