npm - oh-my-tps - Versions diffs - 0.1.0 - Mend

oh-my-tps 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (7)

package/LICENSE +21 -0
package/README.md +182 -0
package/README.zh-CN.md +178 -0
package/extensions/oh-my-tps.ts +209 -0
package/extensions/shared/content.ts +26 -0
package/extensions/shared/token-estimator.ts +6 -0
package/package.json +49 -0

package/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 EnderLiquid
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

package/README.md ADDED

@@ -0,0 +1,182 @@
+# Oh My TPS
+English | [简体中文](./README.zh-CN.md)
+## Install
+### npm package
+```bash
+pi install npm:oh-my-tps
+```
+### Git repository
+```bash
+pi install git:github.com/EnderLiquid/oh-my-tps
+```
+## What it does
+`oh-my-tps` does one thing:
+it adds a tiny live speed readout to the Pi TUI so you can see first-token latency and output speed while the model is responding.
+- `τ`: TTFT, time to first token, in seconds
+- `Δ`: TPS, tokens per second
+What it looks like:
+```text
+τ0.8 Δ48.6
+```
+That's it.
+Ten characters. It just works.
+If you want, you can keep reading for the details—but at this point you already know how to use it.
+## Reading the numbers
+You will see readings like these in the TUI footer area:
+```text
+τ0.8 Δ48.6
+τ1.1 Δ49.7L
+τ0.8A Δ52.4A
+```
+Suffixes:
+- `A`: Average, the average final value across recent requests
+- `L`: Last, the final value from the previous request
+A quick way to read them:
+- `τ0.8 Δ48.6`: the response is currently streaming; TTFT was about 0.8s and the current live TPS estimate is about 48.6
+- `τ1.1 Δ49.7L`: the request has been sent, but streaming has not started yet; TTFT is still counting, so the extension shows the previous request's final TPS as a reference
+- `τ0.8A Δ52.4A`: Pi is currently idle, so the extension shows the recent average performance
+The live `Δ` shown during streaming is an estimate. The final `Δ` shown after the response ends is more trustworthy.
+## How it works
+This section is for people who want to know what the extension is actually measuring.
+### State machine
+Internally, the extension moves through four phases:
+1. **waiting**: the request has been sent and is waiting for the first token
+2. **streaming**: the assistant is actively streaming output
+3. **settled**: the response has finished
+4. **idle**: the turn is over and the extension is showing historical values
+Example:
+```text
+idle τ… Δ? (before the very first request)
+    -> waiting(req1) τ0.2 Δ? (first request in the prompt, no usable average yet, τ updates every 200ms)
+    ...
+-> idle τ1.5A Δ50.0A (idle, with historical averages available)
+    -> waiting(req1) τ0.2 Δ50.0A  (first request in the prompt, shows average as baseline while waiting)
+    -> streaming(req1) τ1.3 Δ51.0  (live Δ updates, τ is now locked)
+    -> settled(req1) τ1.3 Δ52.0  (final Δ locked, τ locked)
+    -> waiting(req2+) τ0.2 Δ52.0L  (second or later request in the same prompt, uses last final value as baseline)
+    -> streaming(req2+) τ1.7 Δ49.0  (live Δ updates, τ locked)
+    -> settled(req2+) τ1.7 Δ49.5  (final Δ locked, τ locked)
+-> idle τ1.5A Δ50.0A
+```
+### Where `τ` comes from
+`τ` is straightforward.
+Once a provider request is sent, the extension enters `waiting` and refreshes the elapsed time every 200ms. The moment the first assistant streaming update arrives, that time delta is locked in as the TTFT for the request.
+So in practice:
+- during `waiting`, `τ` keeps increasing
+- once `streaming` begins, `τ` stops changing
+- the `τ` shown later in `settled`, the historical `τ` reused before the next request, and the `τ` that contributes to idle averages are all based on that final locked TTFT
+### Where live `Δ` and final `Δ` come from
+These two values come from different sources, and that distinction matters.
+#### Live `Δ`
+During streaming, the provider does not continuously tell Pi exactly how many new output tokens just arrived. That means live `Δ` has to be estimated locally.
+The current implementation does this:
+1. Take all assistant content that has streamed so far for the current response (text, thinking, and tool-call names and arguments)
+2. Estimate how many tokens that text roughly corresponds to with [`tokenx`](https://github.com/johannschopplich/tokenx)
+3. Divide that estimate by the elapsed streaming time
+In other words, live `Δ` is essentially:
+```text
+estimated output tokens so far / elapsed streaming time so far
+```
+It is not the provider's real-time token truth. It is a local approximation meant for UI feedback.
+#### Final settled `Δ`
+When the response ends, if the provider returns `usage.output`, the extension uses that to compute the final TPS:
+```text
+final output tokens / total streaming time
+```
+This is usually more trustworthy than live `Δ`, because it is based on the provider's final reported output token count rather than a local estimate.
+If a provider or a specific response does not return usable output token data, the extension falls back to the last live estimate as a best-effort display value.
+### Why live and final values can differ
+#### 1. Token estimation is heuristic
+`tokenx` is not an exact tokenizer. It is a lightweight heuristic estimator. That is why it works well for fast UI updates: it is small, fast, and easy to run on every streaming update.
+The tradeoff is obvious: it is not designed to match every model family exactly.
+`tokenx` is designed and benchmarked closer to **GPT-style tokenization / English text**. When you use other model families or output that contains non-English text, the live estimate can drift further away from the final settled value.
+#### 2. Streaming itself is uneven
+Model output does not arrive in the UI as a perfectly uniform token-by-token stream. The observed readout is affected by things like:
+- the provider's own SSE / chunk flush strategy
+- how Pi receives and surfaces updates
+- structural changes caused by thinking blocks, tool calls, and normal text appearing together
+So live `Δ` typically behaves like this:
+- unstable at first, then gradually settles
+- often approaches the final settled value, but does not perfectly match it
+### Average value `A`
+In the current implementation, `A` means the average final performance across up to the 5 most recent provider requests.
+- average `τ`: the average final TTFT across those recent requests
+- average `Δ`: the average settled TPS across those recent requests
+### How to interpret the data
+A good rule of thumb is:
+- `τ`: highly useful
+- settled / average `Δ`: the most useful numbers when comparing results
+- live `Δ`: reflects real-time trend and perceived speed
+## Where it fits
+- useful as a rough quantitative reference for LLM latency and speed
+- useful for quickly spotting obviously slow requests in long Pi sessions
+- not meant for strict model benchmarking
+## License
+MIT License
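
Reviewer note on the README above: the two `Δ` formulas it describes can be sketched as follows. This is a minimal illustration, not the package's code. `roughTokenEstimate` is a hypothetical stand-in (about 4 characters per token) and is not how `tokenx` works; `liveTps` and `finalTps` are names chosen here for illustration.

```typescript
// Stand-in estimator for illustration only: roughly 4 characters per token.
// The package delegates to tokenx; this heuristic is NOT tokenx's algorithm.
function roughTokenEstimate(text: string): number {
  return Math.ceil(text.length / 4);
}

// Live Δ: estimated tokens so far divided by elapsed streaming time.
// Returns null while the stream is too young to give a meaningful ratio.
function liveTps(text: string, elapsedSeconds: number, minSeconds = 0.1): number | null {
  if (text.length === 0 || elapsedSeconds < minSeconds) return null;
  return roughTokenEstimate(text) / elapsedSeconds;
}

// Final Δ: provider-reported output tokens divided by total streaming time,
// falling back to the last live estimate when no usable count is reported.
function finalTps(outputTokens: number, elapsedSeconds: number, lastLive: number | null): number | null {
  if (elapsedSeconds > 0 && outputTokens > 0) return outputTokens / elapsedSeconds;
  return lastLive;
}

const streamed = "x".repeat(400);          // 400 characters streamed so far
const live = liveTps(streamed, 2.0);       // 100 estimated tokens / 2.0 s
const settled = finalTps(104, 2.0, live);  // provider reported 104 tokens
const fallback = finalTps(0, 2.0, live);   // no usage data: reuse the live estimate
console.log(live, settled, fallback);      // 50 52 50
```

The gap between `live` and `settled` in this toy run mirrors the README's point: the live value depends on the estimator, while the settled value depends only on the provider's reported count.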

package/README.zh-CN.md ADDED

@@ -0,0 +1,178 @@
+# Oh My TPS
+[English](./README.md) | 简体中文
+## Install
+### npm package
+```bash
+pi install npm:oh-my-tps
+```
+### Git repository
+```bash
+pi install git:github.com/EnderLiquid/oh-my-tps
+```
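+## Quick overview
+`oh-my-tps` does just one thing:
+it adds a set of live speed readings to the Pi TUI, measuring LLM first-token latency and output speed.
+- `τ`: TTFT, how long you waited before the first token arrived, in seconds
+- `Δ`: TPS, how many tokens are output per second
+What it looks like: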
+```
+τ0.8 Δ48.6
+```
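+That's all.
+Ten characters of space, and it works out of the box.
+If you're interested you can keep reading, but by this point you already know how to use it.
+## Reading the numbers
+You will see readings like these in the status area at the bottom of the TUI: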
+```text
+τ0.8 Δ48.6
+τ1.1 Δ49.7L
+τ0.8A Δ52.4A
+```
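+What the suffixes mean:
+- `A`: Average, the average final value across the last few requests
+- `L`: Last, the final value from the previous request
+You can read them like this:
+- `τ0.8 Δ48.6`: the response is streaming; the first token took about 0.8 seconds, and the current live TPS is about 48.6.
+- `τ1.1 Δ49.7L`: the request has been sent, but streaming has not started yet. The first-token wait is still being timed, so the TPS of the previous response in this turn is shown for now.
+- `τ0.8A Δ52.4A`: currently idle, showing the average performance of the last few responses.
+The live `Δ` shown during streaming is an estimate, while the `Δ` shown after the response completes is more trustworthy.
+## How it works
+This section is for users who want to understand how the plugin works.
+### State machine
+Internally there are four phases:
+1. **waiting**: the request has been sent and is waiting for the first token
+2. **streaming**: output is streaming
+3. **settled**: the response has ended
+4. **idle**: the turn is over and the plugin is idle
+Example: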
+```
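+idle τ… Δ? (before the first turn has started)
+    -> waiting(req1) τ0.2 Δ? (1st request of the turn, Δ uses avg (not yet available), τ updates every 200ms)
+    ...
+-> idle τ1.5A Δ50.0A (idle, historical averages available)
+    -> waiting(req1) τ0.2 Δ50.0A  (1st request of the turn, Δ uses avg, τ updates every 200ms)
+    -> streaming(req1) τ1.3 Δ51.0  (Δ updates live, τ locked)
+    -> settled(req1) τ1.3 Δ52.0  (Δ locked, τ locked)
+    -> waiting(req2+) τ0.2 Δ52.0L  (2nd or later request of the turn, Δ uses last, τ updates every 200ms)
+    -> streaming(req2+) τ1.7 Δ49.0  (Δ live, τ locked)
+    -> settled(req2+) τ1.7 Δ49.5  (Δ locked, τ locked)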
+-> idle τ1.5A Δ50.0A
+```
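+### Where `τ` comes from
+Once a request is sent, the plugin enters the `waiting` state and refreshes the wait time every 200ms; as soon as the first assistant streaming update arrives, the time difference between that moment and the moment the request was sent is locked in as the TTFT for this request.
+So:
+- during the `waiting` phase, `τ` keeps increasing
+- once `streaming` begins, `τ` is locked and no longer changes
+- the later `settled` display, the historical value shown before the next request starts, and the values that feed the averages in the idle phase all use this request's final TTFT.
+### Where live `Δ` and final `Δ` come from
+These two values come from different sources, and that matters a great deal.
+#### Live `Δ`
+During streaming, the provider does not continuously tell Pi "how many more tokens were just generated". So live `Δ` can only be estimated locally.
+The current approach is:
+1. Take the text that has streamed out so far for the current response
+2. Use [`tokenx`](https://github.com/johannschopplich/tokenx) to estimate roughly how many tokens that text contains
+3. Divide by the time elapsed since `streaming` began
+In other words, live `Δ` is essentially: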
+```text
+estimated tokens so far / streaming time so far
+```
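+It is not the provider's real-time ground truth, only a locally estimated approximation.
+#### Final settled `Δ`
+After the request ends, if the provider returned `usage.output`, the plugin uses it directly to compute the final TPS: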
+```text
+final output tokens / total streaming time
+```
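+This value is usually more trustworthy than the locally estimated live `Δ`, because it is based on the actual output token count returned by the provider.
+Only if a provider or a particular response does not return a usable output token count does the plugin fall back to the last live estimate.
+### Why live and final values differ
+#### 1. Token estimation is heuristic
+`tokenx` is not an exact tokenizer but a lightweight, heuristic-leaning estimation library. Its strength is being small and fast, which suits live UI refreshes. The cost is equally clear: it is not designed to align exactly with every model.
+The design and benchmarks of `tokenx` lean toward **GPT tokenizer / English text** scenarios. When you connect LLMs from other model families, or when the output contains non-English characters, the deviation tends to be larger.
+#### 2. Streaming output is uneven
+Model output is not presented to the UI strictly as "one token at a time at a constant rate". In practice it is also affected by:
+- the provider's own SSE / chunk flush strategy
+- the rhythm at which Pi receives events
+- changes in content structure when thinking, tool calls, and body text are mixed together
+So live `Δ` typically behaves as follows:
+- unstable at first, then gradually converges
+- close to the final settled value, but never exactly equal to it
+### Average value `A`
+In the current implementation, `A` is the average final performance across up to the 5 most recent provider requests.
+- average `τ`: the average final TTFT of those recent requests
+- average `Δ`: the average settled TPS of those recent requests
+### How to interpret the data
+As a rule of thumb:
+- `τ`: highly useful as a reference
+- settled / average `Δ`: the most worthwhile numbers, used for comparing results
+- live `Δ`: reflects the real-time trend and perceived speed
+## Where the plugin fits
+- suitable as a rough quantitative reference for LLM speed and latency
+- suitable for quickly spotting a noticeably slow request in a long session
+- not suitable for strict model performance comparison or benchmarking
+## License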
+MIT License

package/extensions/oh-my-tps.ts ADDED

@@ -0,0 +1,209 @@
+import type { ExtensionAPI, ExtensionContext } from "@earendil-works/pi-coding-agent";
+import { collectAssistantText, type AssistantContentBlock } from "./shared/content.js";
+import { estimateTokens } from "./shared/token-estimator.js";
+const STATUS_KEY = "oh-my-tps";
+const WAITING_UPDATE_MS = 200;
+const MIN_STREAM_SECONDS = 0.1;
+const MAX_RECENT_SAMPLES = 5;
+type RequestPhase = "idle" | "waiting" | "streaming" | "settled";
+type RequestSample = {
+	tps: number;
+	ttft: number;
+};
+function formatNumber(value: number): string {
+	return value.toFixed(1);
+}
+function isFinitePositive(value: number | null | undefined): value is number {
+	return typeof value === "number" && Number.isFinite(value) && value > 0;
+}
+export default function ohMyTps(pi: ExtensionAPI): void {
+	let phase: RequestPhase = "idle";
+	let requestIndexInPrompt = 0;
+	let requestStartedAt = 0;
+	let streamStartedAt = 0;
+	let lockedTtft: number | null = null;
+	let lastLiveTps: number | null = null;
+	let lastFinalTps: number | null = null;
+	let recentSamples: RequestSample[] = [];
+	let waitingTimer: NodeJS.Timeout | undefined;
+	let waitingDeltaLabel = "Δ?";
+	let lastMessageText = "";
+	function stopWaitingTimer(): void {
+		if (waitingTimer) clearInterval(waitingTimer);
+		waitingTimer = undefined;
+	}
+	function setStatus(ctx: ExtensionContext, text: string): void {
+		if (!ctx.hasUI) return;
+		ctx.ui.setStatus(STATUS_KEY, text);
+	}
+	function getAverageSample(): RequestSample | null {
+		if (recentSamples.length === 0) return null;
+		let totalTps = 0;
+		let totalTtft = 0;
+		for (const sample of recentSamples) {
+			totalTps += sample.tps;
+			totalTtft += sample.ttft;
+		}
+		return {
+			tps: totalTps / recentSamples.length,
+			ttft: totalTtft / recentSamples.length,
+		};
+	}
+	function pushSample(sample: RequestSample): void {
+		recentSamples.push(sample);
+		if (recentSamples.length > MAX_RECENT_SAMPLES) {
+			recentSamples = recentSamples.slice(-MAX_RECENT_SAMPLES);
+		}
+	}
+	function renderIdle(ctx: ExtensionContext): void {
+		phase = "idle";
+		stopWaitingTimer();
+		const avg = getAverageSample();
+		if (!avg) {
+			setStatus(ctx, "τ… Δ?");
+			return;
+		}
+		setStatus(ctx, `τ${formatNumber(avg.ttft)}A Δ${formatNumber(avg.tps)}A`);
+	}
+	function selectWaitingDeltaLabel(): string {
+		if (requestIndexInPrompt <= 1) {
+			const avg = getAverageSample();
+			return avg ? `Δ${formatNumber(avg.tps)}A` : "Δ?";
+		}
+		if (isFinitePositive(lastFinalTps)) {
+			return `Δ${formatNumber(lastFinalTps)}L`;
+		}
+		const avg = getAverageSample();
+		return avg ? `Δ${formatNumber(avg.tps)}A` : "Δ?";
+	}
+	function renderWaiting(ctx: ExtensionContext): void {
+		stopWaitingTimer();
+		const update = () => {
+			const elapsed = Math.max(0, (performance.now() - requestStartedAt) / 1000);
+			setStatus(ctx, `τ${formatNumber(elapsed)} ${waitingDeltaLabel}`);
+		};
+		update();
+		waitingTimer = setInterval(update, WAITING_UPDATE_MS);
+	}
+	function renderStreaming(ctx: ExtensionContext, estimatedTps: number | null): void {
+		const ttftLabel = isFinitePositive(lockedTtft) ? `τ${formatNumber(lockedTtft)}` : "τ…";
+		const deltaLabel = isFinitePositive(estimatedTps) ? `Δ${formatNumber(estimatedTps)}` : waitingDeltaLabel;
+		setStatus(ctx, `${ttftLabel} ${deltaLabel}`);
+	}
+	function beginWaiting(ctx: ExtensionContext): void {
+		requestIndexInPrompt += 1;
+		phase = "waiting";
+		requestStartedAt = performance.now();
+		streamStartedAt = 0;
+		lockedTtft = null;
+		lastLiveTps = null;
+		lastMessageText = "";
+		waitingDeltaLabel = selectWaitingDeltaLabel();
+		renderWaiting(ctx);
+	}
+	function beginStreaming(now: number): void {
+		phase = "streaming";
+		stopWaitingTimer();
+		streamStartedAt = now;
+		lockedTtft = requestStartedAt > 0 ? Math.max(0, (now - requestStartedAt) / 1000) : null;
+	}
+	function finalizeRequest(ctx: ExtensionContext, outputTokens: number): void {
+		phase = "settled";
+		stopWaitingTimer();
+		const elapsed = streamStartedAt > 0 ? Math.max(0, (performance.now() - streamStartedAt) / 1000) : 0;
+		let finalTps: number | null = null;
+		if (elapsed > 0 && outputTokens > 0) {
+			finalTps = outputTokens / elapsed;
+		} else if (isFinitePositive(lastLiveTps)) {
+			finalTps = lastLiveTps;
+		}
+		if (isFinitePositive(finalTps)) {
+			lastFinalTps = finalTps;
+		}
+		if (isFinitePositive(finalTps) && isFinitePositive(lockedTtft)) {
+			pushSample({ tps: finalTps, ttft: lockedTtft });
+		}
+		const ttftLabel = isFinitePositive(lockedTtft) ? `τ${formatNumber(lockedTtft)}` : "τ…";
+		const deltaLabel = isFinitePositive(finalTps) ? `Δ${formatNumber(finalTps)}` : waitingDeltaLabel;
+		setStatus(ctx, `${ttftLabel} ${deltaLabel}`);
+	}
+	pi.on("session_start", async (_event, ctx) => {
+		requestIndexInPrompt = 0;
+		renderIdle(ctx);
+	});
+	pi.on("agent_start", async () => {
+		requestIndexInPrompt = 0;
+	});
+	pi.on("before_provider_request", async (_event, ctx) => {
+		beginWaiting(ctx);
+	});
+	pi.on("message_update", async (event, ctx) => {
+		if (event.message.role !== "assistant") return;
+		const now = performance.now();
+		if (phase === "waiting") {
+			beginStreaming(now);
+		} else if (phase !== "streaming") {
+			if (requestStartedAt <= 0) requestStartedAt = now;
+			if (!isFinitePositive(lockedTtft)) lockedTtft = Math.max(0, (now - requestStartedAt) / 1000);
+			streamStartedAt = now;
+			phase = "streaming";
+			stopWaitingTimer();
+		}
+		const currentText = collectAssistantText(event.message as { content?: AssistantContentBlock[] });
+		lastMessageText = currentText;
+		const elapsed = streamStartedAt > 0 ? (now - streamStartedAt) / 1000 : 0;
+		let estimatedTps: number | null = null;
+		if (elapsed >= MIN_STREAM_SECONDS && lastMessageText.length > 0) {
+			const estimatedTokens = estimateTokens(lastMessageText);
+			estimatedTps = estimatedTokens / elapsed;
+			if (isFinitePositive(estimatedTps)) {
+				lastLiveTps = estimatedTps;
+			}
+		}
+		renderStreaming(ctx, estimatedTps);
+	});
+	pi.on("message_end", async (event, ctx) => {
+		if (event.message.role !== "assistant") return;
+		const outputTokens = event.message.usage?.output ?? 0;
+		finalizeRequest(ctx, outputTokens);
+	});
+	pi.on("agent_end", async (_event, ctx) => {
+		renderIdle(ctx);
+	});
+	pi.on("session_shutdown", async (_event, ctx) => {
+		stopWaitingTimer();
+		if (ctx.hasUI) ctx.ui.setStatus(STATUS_KEY, undefined);
+	});
+}
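
Reviewer note on the file above: the bounded sample window behind the `A` readout (`pushSample` plus `getAverageSample`) can be exercised in isolation. The sketch below restates that logic in a pure-function style so it runs without the Pi API; `PerfSample`, `pushBounded`, `averageOf`, and `idleStatus` are names invented here, and the status format is copied from `renderIdle`.

```typescript
type PerfSample = { ttft: number; tps: number };

const MAX_SAMPLES = 5; // same window size as MAX_RECENT_SAMPLES in the diff

// Append a sample and keep only the newest MAX_SAMPLES entries.
function pushBounded(samples: PerfSample[], sample: PerfSample): PerfSample[] {
  return [...samples, sample].slice(-MAX_SAMPLES);
}

// Mean TTFT and TPS over the window, or null when the window is empty.
function averageOf(samples: PerfSample[]): PerfSample | null {
  if (samples.length === 0) return null;
  const sum = samples.reduce(
    (acc, s) => ({ ttft: acc.ttft + s.ttft, tps: acc.tps + s.tps }),
    { ttft: 0, tps: 0 },
  );
  return { ttft: sum.ttft / samples.length, tps: sum.tps / samples.length };
}

// Idle status text in the same shape the extension renders.
function idleStatus(samples: PerfSample[]): string {
  const avg = averageOf(samples);
  return avg ? `τ${avg.ttft.toFixed(1)}A Δ${avg.tps.toFixed(1)}A` : "τ… Δ?";
}

let recent: PerfSample[] = [];
console.log(idleStatus(recent)); // τ… Δ?
for (let i = 1; i <= 6; i++) {
  recent = pushBounded(recent, { ttft: i, tps: 10 * i });
}
// The first sample (ttft 1, tps 10) has been evicted; samples 2..6 remain.
console.log(recent.length, idleStatus(recent)); // 5 τ4.0A Δ40.0A
```

Because the window holds at most five entries, one unusually slow request stops influencing the idle average after five further settled requests.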

package/extensions/shared/content.ts ADDED

@@ -0,0 +1,26 @@
+export type AssistantContentBlock = {
+	type?: string;
+	text?: string;
+	thinking?: string;
+	name?: string;
+	args?: unknown;
+};
+export function collectAssistantText(message: { content?: AssistantContentBlock[] }): string {
+	let text = "";
+	for (const block of message.content ?? []) {
+		if (block.type === "text") {
+			text += block.text ?? "";
+			continue;
+		}
+		if (block.type === "thinking") {
+			text += block.thinking ?? "";
+			continue;
+		}
+		if (block.type === "toolCall") {
+			text += block.name ?? "";
+			text += JSON.stringify(block.args ?? "");
+		}
+	}
+	return text;
+}
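
Reviewer note on the file above: it may help to see what the flattened string looks like for a mixed message, since that string is what the live token estimate is computed from. The sketch below is a compact restatement of the same rule under the names `ContentBlock` and `flatten`, both invented here so the example runs on its own; the sample blocks are made up.

```typescript
type ContentBlock = { type?: string; text?: string; thinking?: string; name?: string; args?: unknown };

// Compact restatement of the flattening rule, for experimentation only.
function flatten(blocks: ContentBlock[]): string {
  return blocks
    .map((b) => {
      if (b.type === "text") return b.text ?? "";
      if (b.type === "thinking") return b.thinking ?? "";
      if (b.type === "toolCall") return (b.name ?? "") + JSON.stringify(b.args ?? "");
      return ""; // unknown block types contribute nothing
    })
    .join("");
}

const withArgs = flatten([
  { type: "thinking", thinking: "plan" },
  { type: "text", text: "Hi" },
  { type: "toolCall", name: "read", args: { path: "a.ts" } },
]);
console.log(withArgs); // planHiread{"path":"a.ts"}

// A tool call with no args still contributes two quote characters,
// because JSON.stringify("") yields the two-character string "".
const noArgs = flatten([{ type: "toolCall", name: "ls" }]);
console.log(noArgs, noArgs.length); // ls"" 4
```

Thinking text and tool-call arguments therefore count toward the live `Δ` estimate alongside visible text, which is one more reason the live value can drift from the settled one.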

package/extensions/shared/token-estimator.ts ADDED

@@ -0,0 +1,6 @@
+import { estimateTokenCount, type TokenEstimationOptions } from "tokenx";
+export function estimateTokens(text: string, options?: TokenEstimationOptions): number {
+	if (!text) return 0;
+	return options ? estimateTokenCount(text, options) : estimateTokenCount(text);
+}

package/package.json ADDED

@@ -0,0 +1,49 @@
+{
+  "name": "oh-my-tps",
+  "version": "0.1.0",
+  "description": "Tiny live TTFT and TPS readouts for the Pi coding agent.",
+  "type": "module",
+  "license": "MIT",
+  "author": "EnderLiquid",
+  "keywords": [
+    "pi-package",
+    "pi-extension",
+    "pi-coding-agent",
+    "tps",
+    "ttft"
+  ],
+  "files": [
+    "extensions",
+    "README.md",
+    "LICENSE"
+  ],
+  "exports": {
+    ".": "./extensions/oh-my-tps.ts"
+  },
+  "pi": {
+    "extensions": [
+      "./extensions"
+    ]
+  },
+  "peerDependencies": {
+    "@earendil-works/pi-coding-agent": "*"
+  },
+  "dependencies": {
+    "tokenx": "^1.3.0"
+  },
+  "homepage": "https://github.com/EnderLiquid/oh-my-tps#readme",
+  "bugs": {
+    "url": "https://github.com/EnderLiquid/oh-my-tps/issues"
+  },
+  "repository": {
+    "type": "git",
+    "url": "git+https://github.com/EnderLiquid/oh-my-tps.git"
+  },
+  "scripts": {
+    "test:types": "tsc --noEmit"
+  },
+  "devDependencies": {
+    "@types/node": "^24.10.0",
+    "typescript": "^5.9.3"
+  }
+}