tuneloop 0.1.0

package/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 Tuneloop
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,256 @@
+ # tuneloop
+
+ Local analytics for your AI coding sessions. **Count outcomes, not tokens.**
+
+ tuneloop turns the session transcripts your AI coding tools already write into a
+ local dashboard of what you actually shipped, what it cost, and your work patterns.
+
+ <br>
+
+ <p align="center">
+ <img src="docs/img/cost_per_artifact.png" alt="tuneloop dashboard — headline metrics (outcome rate, cost per shipped artifact, spend, sessions, tool error rate) above a per-PR cost breakdown treemap" width="900">
+ </p>
+
+ <br>
+
+ Concretely, it enriches each session with:
+
+ - **Outcome links** — merged PRs, features shipped, files changed
+ - **Granular cost attribution to outcomes**
+ - **Task complexity**
+ - **Agent autonomy**
+ - **Work type**
+ - **Key decisions**
+ - **Tool error categories**
+
+ Combined with the data already in the transcript — model, agent harness, repo, and
+ more — this data lets you answer questions like:
+
+ - How much of my AI spend went into PR #2, or feature *X*?
+ - Are my agents getting more autonomous over time on complex tasks?
+ - What's my success rate on repo *X* vs. repo *Y* — or any other dimension you care about?
+
+ Works with Claude Code, Codex, and OpenCode. Everything runs and stays
+ on your machine; enrichments that need an LLM can use your own provider key or a
+ local model. The built-ins above are just the defaults — tuneloop is extensible,
+ and adding your own enrichment is straightforward.
+
+ > Built by the team at [Tuneloop](https://tuneloop.io).
+
+ ## Quick start
+
+ ```bash
+ npx tuneloop analyze
+ ```
+
+ This scans typical session folders like `~/.claude/projects`, builds a local
+ store, and prints a summary. The **first run** processes every transcript, so
+ expect a few minutes (around 4 for ~80 sessions with [LLM
+ enrichment](#llm-enrichment) on; static-only runs are faster); later runs are
+ incremental and only re-process sessions that changed, so they finish quickly. On completion the
+ CLI prints the dashboard URL — press **Enter** to open it in your browser
+ (`Ctrl+C` to stop). Point it at other locations with a comma-separated list:
+
+ ```bash
+ npx tuneloop analyze ~/.claude/projects,/path/to/more/sessions
+ ```
+
+ Handy flags:
+
+ - `--no-serve` — build the store and exit, no dashboard
+ - `--port <n>` — serve on a different port
+ - `npx tuneloop serve` — open the dashboard over an already-analyzed store, without re-analyzing
+
+ ## What you get
+
+ The dashboard reads everything live from a local SQLite store:
+
+ - **Session outcome rate** — how many of your sessions ended in a win (you pick what counts).
+ - **Cost per shipped artifact** — dollars of AI spend per merged PR or per shipped feature.
+ - **Total spend** — over time, split by model, work type, or repo.
+ - **Tool & skill usage** — call counts, error rates, and error categories across every session.
+ - A **filterable session viewer**, with the full transcript and file changes behind each one.
+ - Easy transcript navigation (turn-by-turn, errors, free text search, and
+ outcomes). For example: you can jump to the part of the session where
+ you worked on a particular feature or code change.
+ - Filter sessions that touched a particular file / PR / feature.
+
+ Cost, tools, files, and git/PR outcomes come from static analysis — no setup or
+ API key. Work type, complexity, autonomy, and feature names come from [LLM
+ enrichment](#llm-enrichment), which is worth setting up: much of what makes the
+ dashboard useful depends on it.
+
+ **Highlights** turns the same data into plain-English insights about your recent work:
+
+ <br>
+
+ <p align="center">
+ <img src="docs/img/highlights_tab.png" alt="tuneloop Highlights tab — a question-led digest: sessions run, most AI spend on shipped vs unshipped work, share of spend that shipped, and success rate by complexity" width="900">
+ </p>
+
+ <br>
+
+ And every session is a readable transcript you can navigate turn-by-turn, by work
+ type, or by error — with the files it changed alongside:
+
+ <br>
+
+ <p align="center">
+ <img src="docs/img/session_transcript_viewer.png" alt="tuneloop session viewer — turn-by-turn transcript with work-type filter pills, tool calls, and a Files tab, next to the filterable session list" width="900">
+ </p>
+
+ <br>
+
+ ## How it works
+
+ **Enrichment** labels each session in one LLM call:
+
+ - **Work type** — one of `plan` · `implement` · `debug` · `research` · `review` · `docs` · `other`.
+ - **Complexity** — one of `trivial` · `routine` · `substantial` · `open-ended`.
+ - **Autonomy** — how much the agent drove itself: `autonomous` · `guided` · `minimal`.
+ - **Feature** — links the session to a shipped feature, reusing your existing feature
+ names and proposing new ones. The taxonomy grows as you analyze, so related work
+ lands under one feature instead of fragmenting.
+ - **Success** — a judged outcome (`success` / `partial` / `failure`), surfaced as the
+ `session_success` outcome you can count.
+
+ **PR linking** connects a session to the PRs it produced, two ways:
+
+ - **Explicit** — the transcript shows the agent creating, merging, or reviewing a PR
+ (`gh pr create` / `gh pr review` / a GitHub MCP tool); live status comes from your
+ local `gh`.
+ - **Content-match** — for the common case where the agent writes the code and *you*
+ commit and push it (no `gh pr create` in the transcript), tuneloop matches the lines
+ the agent authored against your own PRs' diffs and links the best match.
+
+ See [ARCHITECTURE.md](./ARCHITECTURE.md#built-in-processors) for the detection rules.
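As a rough illustration of the content-match idea, the sketch below counts how many agent-authored lines appear verbatim among the added lines of each candidate PR diff, then picks the diff with the highest count. The helpers `overlap` and `best_match`, their file arguments, and the highest-count rule are assumptions made for illustration; tuneloop's actual detection rules are the ones described in ARCHITECTURE.md.

```bash
# Hypothetical sketch, not tuneloop's implementation.

# overlap <authored-lines-file> <diff-file>
# Count the diff's added lines ("+" lines, minus the "+++" file header)
# that exactly match a line in the authored-lines file.
overlap() {
  grep '^+' "$2" | grep -v '^+++' | cut -c2- | grep -Fxc -f "$1" || true
}

# best_match <authored-lines-file> <diff-file>...
# Print the diff file with the highest overlap; print nothing if none overlaps.
best_match() {
  authored=$1; shift
  best=""; best_n=0
  for diff in "$@"; do
    n=$(overlap "$authored" "$diff")
    if [ "$n" -gt "$best_n" ]; then best=$diff; best_n=$n; fi
  done
  [ -n "$best" ] && echo "$best"
}
```

Exact line matching is the simplest possible scoring rule; a real matcher would likely also need to handle whitespace changes and lines the user edited before committing.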
+
+ **Block-level cost attribution** — a long session that touches several things isn't
+ billed as one lump. tuneloop splits it into blocks and attributes token cost per
+ block, so a per-PR or per-feature cost reflects only the work that went into it.
+ → [how blocks work](./ARCHITECTURE.md#blocks-and-cost-attribution-srccoreblocksts)
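A minimal sketch of the arithmetic behind per-block attribution, assuming a hypothetical session already split into blocks that each carry an outcome label and a token cost. The block rows, the labels (`pr-1`, `pr-2`, `unlinked`), and the `cost_per_outcome` helper are invented for illustration and do not reflect tuneloop's data model.

```bash
# Hypothetical blocks of one session: block id, linked outcome, cost in USD.
blocks='1 pr-1 0.40
2 pr-2 1.25
3 pr-1 0.35
4 unlinked 0.50'

# Sum block costs per outcome, so each PR carries only the blocks spent on it
# rather than the whole session's 2.50.
cost_per_outcome() {
  printf '%s\n' "$blocks" |
    awk '{ sum[$2] += $3 } END { for (k in sum) printf "%s %.2f\n", k, sum[k] }' |
    LC_ALL=C sort
}

cost_per_outcome
```

With these made-up numbers, `pr-1` is charged for blocks 1 and 3 only, which is the property the paragraph above describes.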
+
+ **Metrics** — the five dashboard headlines (outcome rate, cost per shipped artifact,
+ total spend, sessions, tool error rate) are each explained in
+ [ARCHITECTURE.md](./ARCHITECTURE.md#the-metrics-explained).
+
+ ## Query it from your coding agent
+
+ Everything on the dashboard is a query over the store — and so is anything it
+ *doesn't* show. `tuneloop query` runs read-only SQL over that store, straight from
+ your terminal or your coding agent's shell:
+
+ ```bash
+ tuneloop query "SELECT model, SUM(cost_usd) FROM usage_facts GROUP BY 1 ORDER BY 2 DESC"
+ tuneloop query --schema # tables, facets, and measures — learn the shape first
+ ```
+
+ Only `SELECT` / `WITH … SELECT` run; writes and raw transcripts are off-limits.
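For scripts that build queries dynamically, a small client-side pre-check can catch an obvious write statement before it is sent. The sketch below is an assumption-laden convenience wrapper: `is_read_only` and `tl_query` are hypothetical helper names, and the check only inspects the first keyword, so it is much cruder than whatever validation `tuneloop query` performs itself.

```bash
# Hypothetical helpers, not part of tuneloop.

# Succeeds when the statement's first keyword is SELECT or WITH (case-insensitive).
is_read_only() {
  first=$(printf '%s\n' "$1" | awk 'NF { print toupper($1); exit }')
  [ "$first" = "SELECT" ] || [ "$first" = "WITH" ]
}

# Forward the statement to `tuneloop query` only if it passes the pre-check.
tl_query() {
  if is_read_only "$1"; then
    tuneloop query "$1"
  else
    echo "refusing to send a non-SELECT statement" >&2
    return 1
  fi
}

# Usage, with the table and columns from the example above in the WITH form:
#   tl_query "WITH per_model AS (SELECT model, SUM(cost_usd) AS spend
#             FROM usage_facts GROUP BY model)
#             SELECT model, spend FROM per_model ORDER BY spend DESC"
```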
+
+ Because it needs no server and speaks plain SQL, it's a natural fit for Claude Code
+ and other agents. Install the bundled skill so your agent knows the schema and the
+ grain rules before it writes a query:
+
+ ```bash
+ npx skills add tuneloop/tuneloop
+ ```
+
+ Then just ask — *"Query tuneloop: what did I spend per model last week?"* — and the agent writes
+ the SQL, runs it, and reads back the answer.
+
+ ## LLM enrichment
+
+ To label each session with a work type, complexity, autonomy, and an LLM-judged
+ success signal — and to name the features you shipped — point tuneloop at **your
+ own** LLM key. Your session data goes only to the provider you choose:
+
+ ```bash
+ export TUNELOOP_LLM_PROVIDER=anthropic
+ export ANTHROPIC_API_KEY=sk-ant-...
+ # optional: export TUNELOOP_LLM_MODEL=claude-haiku-4-5
+ npx tuneloop analyze
+ ```
+
+ Pick a preset and supply its key; the model defaults sensibly and is overridable
+ with `TUNELOOP_LLM_MODEL` (or `--llm-model`). Anthropic and OpenAI are native;
+ everything else speaks the OpenAI-compatible API.
+
+ | `TUNELOOP_LLM_PROVIDER` | Key env | Notes |
+ |---|---|---|
+ | `anthropic` | `ANTHROPIC_API_KEY` | native |
+ | `openai` | `OPENAI_API_KEY` | native |
+ | `openrouter` | `OPENROUTER_API_KEY` | 400+ models via one key |
+ | `groq` | `GROQ_API_KEY` | fast; free tier |
+ | `deepseek` | `DEEPSEEK_API_KEY` | |
+ | `gemini` | `GEMINI_API_KEY` | Google, OpenAI-compatible endpoint |
+ | `together` / `fireworks` / `xai` | `TOGETHER_API_KEY` / `FIREWORKS_API_KEY` / `XAI_API_KEY` | |
+ | `ollama` | _(none)_ | local; `http://localhost:11434` |
+ | `openai-compatible` | `TUNELOOP_LLM_API_KEY` | any other host; set `TUNELOOP_LLM_BASE_URL` |
+
+ ```bash
+ # A hosted provider — name it, never type a URL:
+ TUNELOOP_LLM_PROVIDER=openrouter OPENROUTER_API_KEY=sk-or-... \
+ npx tuneloop analyze --llm-model deepseek/deepseek-chat
+
+ # Fully local, no key, nothing leaves your machine:
+ npx tuneloop analyze --llm-provider ollama --llm-model qwen2.5
+
+ # Any other OpenAI-compatible host:
+ TUNELOOP_LLM_PROVIDER=openai-compatible TUNELOOP_LLM_BASE_URL=https://host/v1 \
+ TUNELOOP_LLM_API_KEY=… npx tuneloop analyze --llm-model my-model
+ ```
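Since the key is read only from the environment, a wrapper script may want to confirm the right variable is set before starting a long run. The sketch below encodes the provider table above as a lookup; `key_env_for` and `check_provider` are hypothetical helper names, not tuneloop commands, and tuneloop may well report a missing key on its own.

```bash
# Hypothetical pre-flight helpers built from the provider table above.

# Print the key variable a provider expects (empty for ollama, which needs none).
key_env_for() {
  case "$1" in
    anthropic)         echo ANTHROPIC_API_KEY ;;
    openai)            echo OPENAI_API_KEY ;;
    openrouter)        echo OPENROUTER_API_KEY ;;
    groq)              echo GROQ_API_KEY ;;
    deepseek)          echo DEEPSEEK_API_KEY ;;
    gemini)            echo GEMINI_API_KEY ;;
    together)          echo TOGETHER_API_KEY ;;
    fireworks)         echo FIREWORKS_API_KEY ;;
    xai)               echo XAI_API_KEY ;;
    ollama)            echo "" ;;
    openai-compatible) echo TUNELOOP_LLM_API_KEY ;;
    *)                 return 1 ;;
  esac
}

# Succeed when the provider is known and its key variable, if any, is non-empty.
check_provider() {
  var=$(key_env_for "$1") || { echo "unknown provider: $1" >&2; return 1; }
  [ -z "$var" ] && return 0
  eval "val=\${$var:-}"
  [ -n "$val" ] || { echo "$var is not set" >&2; return 1; }
}

# Usage:
#   check_provider "$TUNELOOP_LLM_PROVIDER" && npx tuneloop analyze
```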
+
+ Enrichment is one structured **tool call** per session, so use a
+ tool-call-capable model (all the hosted defaults qualify). Flags override the env
+ for one run; the API key is always env-only. It's cheap — a typical corpus of ~80
+ sessions runs about **$0.60** with Claude Haiku. This cost shows up as **Analysis
+ spend** in the summary, priced from a built-in table with an OpenRouter public
+ price list filling gaps (cached under `~/.tuneloop/`).
+
+ **Local Ollama** needs a bigger context window and a capable model: the enrichment
+ prompt is ~4–6k tokens but Ollama's ~2k default silently truncates it, so start the
+ server with `OLLAMA_CONTEXT_LENGTH=8192 ollama serve` and use a tool-strong ≥7B
+ model like `qwen2.5:7b` (tiny models tool-call unreliably).
+
+ ## Privacy
+
+ Transcripts are processed locally and results are written to a local SQLite store
+ (`~/.tuneloop/` by default). tuneloop never posts your **session data** anywhere —
+ the only thing that ever leaves is a transcript sent to the LLM provider whose key
+ you supply, and only if you enable enrichment. Its other network calls are
+ read-only and carry none of your data: your local `gh` for PR status and diffs
+ (your own GitHub auth), and OpenRouter's public price list to cost models the
+ built-in table doesn't know. To avoid sending transcripts off the machine at all,
+ enrich against a local model (`--llm-provider ollama`).
+
+ ## Run from source
+
+ `npx tuneloop` is all most people need. To hack on tuneloop itself, run it from a
+ local checkout:
+
+ ```bash
+ npm install
+ npm run dev -- analyze # builds, runs the CLI (args after `--`), then serves the dashboard
+ ```
+
+ Or build once and call the binary directly:
+
+ ```bash
+ npm run build
+ node dist/cli.js analyze
+ ```
+
+ `npm link` gives you a global `tuneloop` backed by your local build. LLM
+ enrichment works the same way — set `TUNELOOP_LLM_PROVIDER` and its key before
+ running.
+
+ ## Extending
+
+ Adding new analysis is one file: implement the `Processor` interface, declare any
+ sliceable facets, and register it — it shows up in the store and the dashboard (as
+ a card and a filter) automatically, no migration. To support a new AI tool, write
+ a `SourceAdapter`. See [ARCHITECTURE.md](./ARCHITECTURE.md).
+
+ ## License
+
+ MIT