npm - free-coding-models - Versions diffs - 0.3.78 → 0.3.80 - Mend

free-coding-models 0.3.78 → 0.3.80

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (14) hide show

package/README.md +1 -0
package/changelog/v0.3.79.md +21 -0
package/changelog/v0.3.80.md +9 -0
package/package.json +1 -1
package/src/benchmark.js +38 -32
package/src/constants.js +1 -1
package/src/key-handler.js +82 -1
package/src/overlays.js +1 -1
package/src/ping.js +69 -9
package/src/render-table.js +73 -36
package/src/router-daemon.js +6 -7
package/src/tui-state.js +5 -0
package/web/dist/assets/{index-BjX9NY0U.js → index-DDz3_efL.js} +1 -1
package/web/dist/index.html +1 -1

package/README.md CHANGED Viewed

@@ -456,6 +456,7 @@ When a tool mode is active (via `Z`), models incompatible with that tool are hig
 ## ✨ Features
 - **Parallel pings** — all ~165 API/Zen-callable models tested simultaneously via native `fetch` (~170 total cataloged models including CLI-only Gemini rows)
+- **AI benchmark columns** — `Ctrl+A` benchmarks the selected model, `Ctrl+U` benchmarks visible models, and results split cleanly into **AI Latency** plus **TPS**.
 - **Adaptive monitoring** — 2s burst for 60s → 10s normal → 30s idle
 - **Stability score** — composite 0–100 (p95 latency, jitter, spike rate, uptime)
 - **Smart ranking** — top 3 highlighted 🥇🥈🥉

package/changelog/v0.3.79.md ADDED Viewed

@@ -0,0 +1,21 @@
+# Changelog v0.3.79 - 2026-05-28
+### Added
+- **Global AI Speed Test (Ctrl+U)** — Benchmark all currently visible models (respecting all active filters) in parallel with a 5-request concurrency limit to avoid rate limits. Results populate the **AI Speed** column for each model. Displays a prominent violet/purple footer badge warning about high request volume.
+- New TUI footer layout with two speed test controls aligned left:
+  - `NEW ⭐️ Ctrl+A 🤖 AI Speed Test` (green badge) — benchmark selected model only
+  - `NEW Ctrl+U : Global AI Speed Test (Uses a lot of requests!)` (violet/purple badge) — benchmark all visible models
+- Concurrency utility `runWithConcurrency(tasks, maxConcurrent)` to manage parallel benchmark requests safely.
+- Global benchmark state tracking in TUI:
+  - `globalBenchmarkRunning` (boolean)
+  - `globalBenchmarkTotal` (number of models to test)
+  - `globalBenchmarkCompleted` (progress counter)
+### Changed
+- **Renamed column** `Answer Speed` → `AI Speed` (compact: `AI Sp.`) to better reflect that it measures model inference speed, not network ping.
+- **Removed sort hotkey** `w` from the AI Speed column header — the column is not sortable, so the misleading yellow "w" has been removed. The column remains responsive and hides/shows with other optional columns.
+- Help overlay text updated: `Ctrl+A` now labeled "AI Speed Test" with description "(benchmark selected model → time + TPS)".
+- TUI footer structure updated to 3 lines (`TABLE_FOOTER_LINES = 3`) to accommodate both speed test badges and the last release date.
+### Fixed
+- Footer layout consistency with new multi-part line construction using `parts` array and proper hotkey zone registration for both `Ctrl+A` and `Ctrl+U` badges.

package/changelog/v0.3.80.md ADDED Viewed

@@ -0,0 +1,9 @@
+# Changelog v0.3.80 - 2026-05-30
+### Changed
+- Split the former single **AI Speed** table value into two clearer columns: **AI Latency** for benchmark wall-clock response time and **TPS** for rounded tokens per second. When a model's Health is not good, AI Latency now mirrors the exact Health text and TPS stays `—`, so benchmark columns do not introduce a second, conflicting error language.
+- Made the AI benchmark prompt request one cohesive 80–100 word paragraph instead of a tiny one-sentence answer, and raised the benchmark cap to 140 output tokens. This gives latency and TPS measurements enough generated text to be more stable and meaningful.
+### Fixed
+- Disabled reasoning/thinking for lightweight model ping payloads by sending `thinking: { type: "disabled" }` on OpenAI-compatible chat-completion probes. This keeps latency checks focused on network/model availability instead of accidentally paying for hidden reasoning tokens.
+- Reused the same disabled-thinking ping payload for router health probes, while leaving Replicate prediction probes unchanged because they do not use the OpenAI chat-completions schema.

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "free-coding-models",
-  "version": "0.3.78",
+  "version": "0.3.80",
   "description": "Find the fastest coding LLM models in seconds — ping free models from multiple providers, pick the best one for OpenCode, Cursor, or any AI coding assistant.",
   "keywords": [
     "nvidia",

package/src/benchmark.js CHANGED Viewed

@@ -18,25 +18,27 @@
  *   → Functions:
  *   - `buildBenchmarkRequest`: Build provider-specific benchmark request
  *   - `benchmarkModel`: Run a single benchmark and return timing + token metrics
- *   - `formatBenchmarkResult`: Format a benchmark result for the TUI column
+ *   - `formatBenchmarkLatency`: Format benchmark latency for the AI Latency TUI column
+ *   - `formatBenchmarkTps`: Format benchmark throughput for the TPS TUI column
+ *   - `formatBenchmarkResult`: Legacy combined formatter for compatibility
  *   - `estimateTokensFromText`: Fallback token estimator (clearly labeled)
  *
  *   📦 Dependencies:
  *   - ./ping.js: buildPingRequest, resolveCloudflareUrl
  *
  *   @see {@link ./ping.js} Provider-specific request building
- *   @see {@link ./render-table.js} Answer Speed column rendering
+ *   @see {@link ./render-table.js} AI Latency + TPS column rendering
  */
 import { buildPingRequest, resolveCloudflareUrl } from './ping.js'
-// 📖 BENCHMARK_PROMPT: A short, unambiguous question that any model can answer.
-// 📖 Constrained to one sentence to keep benchmarks fast and consistent.
-export const BENCHMARK_PROMPT = 'Why is the sky blue? Answer in exactly one short sentence.'
+// 📖 BENCHMARK_PROMPT: A deterministic one-paragraph task that any model can answer.
+// 📖 The longer target gives latency + TPS measurements enough generated tokens to be reliable.
+export const BENCHMARK_PROMPT = 'Why is the sky blue? Answer in exactly one cohesive paragraph of 80 to 100 words. Do not use bullet points, headings, or multiple paragraphs.'
-// 📖 BENCHMARK_MAX_TOKENS: Hard cap on generation length to prevent slow models
-// 📖 from producing essays and skewing the TPS calculation.
-export const BENCHMARK_MAX_TOKENS = 32
+// 📖 BENCHMARK_MAX_TOKENS: Hard cap high enough for a real paragraph, but low enough
+// 📖 to avoid accidental essays when benchmarking many models at once.
+export const BENCHMARK_MAX_TOKENS = 140
 // 📖 BENCHMARK_TEMPERATURE: Zero temperature for deterministic, reproducible results.
 export const BENCHMARK_TEMPERATURE = 0
@@ -52,37 +54,41 @@ export function estimateTokensFromText(text) {
   return Math.ceil(text.length / 4)
 }
-// 📖 formatBenchmarkResult: Turn a raw benchmark result into a compact display string.
-// 📖 Handles all three states: empty, running, success, and error.
-// 📖
-// 📖 Success: "4.3s / 13 TPS"
-// 📖 Running: spinner (caller passes spinner char)
-// 📖 Error: compact error code like "ERR", "TIMEOUT", "401", "429"
-// 📖 Empty: "—"
-export function formatBenchmarkResult(result, { running = false, frame = 0 } = {}) {
-  if (running) {
-    const spinIdx = frame % 10
-    const spinner = ['⠋', '⠙', '⠹', '⠸', '⠼', '⠴', '⠦', '⠧', '⠇', '⠏'][spinIdx]
-    return spinner
-  }
-  if (!result) {
-    return '—'
-  }
+// 📖 benchmarkSpinner: Shared tiny spinner for benchmark columns while a request runs.
+function benchmarkSpinner(frame) {
+  const spinIdx = frame % 10
+  return ['⠋', '⠙', '⠹', '⠸', '⠼', '⠴', '⠦', '⠧', '⠇', '⠏'][spinIdx]
+}
-  if (!result.ok) {
-    return result.code || 'ERR'
-  }
+// 📖 formatBenchmarkLatency: Turn a raw benchmark result into the AI Latency column value.
+// 📖 Success: "4.3s" / "12s". Error: compact error code. Empty: "—".
+export function formatBenchmarkLatency(result, { running = false, frame = 0 } = {}) {
+  if (running) return benchmarkSpinner(frame)
+  if (!result) return '—'
+  if (!result.ok) return result.code || 'ERR'
   const totalSeconds = result.totalMs / 1000
-  const secondsLabel = totalSeconds >= 10
+  return totalSeconds >= 10
     ? totalSeconds.toFixed(0) + 's'
     : totalSeconds.toFixed(1) + 's'
+}
-  const tps = result.tokensPerSecond ?? 0
-  const tpsLabel = Math.round(tps)
+// 📖 formatBenchmarkTps: Turn a raw benchmark result into the TPS column value.
+// 📖 Success is the rounded tokens/second number only because the header carries "TPS".
+// 📖 Errors and empty state stay as a dim dash in the table to avoid duplicating codes.
+export function formatBenchmarkTps(result, { running = false, frame = 0 } = {}) {
+  if (running) return benchmarkSpinner(frame)
+  if (!result || !result.ok) return '—'
+  return String(Math.round(result.tokensPerSecond ?? 0))
+}
-  return `${secondsLabel} / ${tpsLabel} TPS`
+// 📖 formatBenchmarkResult: legacy combined formatter retained for integrations/tests
+// 📖 that still expect the old single-column "latency / TPS" string.
+export function formatBenchmarkResult(result, options = {}) {
+  if (options.running) return benchmarkSpinner(options.frame ?? 0)
+  if (!result) return '—'
+  if (!result.ok) return result.code || 'ERR'
+  return `${formatBenchmarkLatency(result)} / ${formatBenchmarkTps(result)} TPS`
 }
 // 📖 buildBenchmarkRequest: Build provider-specific benchmark request.

package/src/constants.js CHANGED Viewed

@@ -102,7 +102,7 @@ export const WIDTH_WARNING_MIN_COLS = 80
 // 📖 Table row-budget constants — must stay in sync with renderTable()'s actual output.
 // 📖 If this drifts, model rows overflow and can push the title row out of view.
 export const TABLE_HEADER_LINES = 2  // 📖 title, column headers
-export const TABLE_FOOTER_LINES = 2  // 📖 actions, links
+export const TABLE_FOOTER_LINES = 3  // 📖 actions, links, speed test
 export const TABLE_FIXED_LINES  = TABLE_HEADER_LINES + TABLE_FOOTER_LINES
 // ─── Small cell-formatting helpers ────────────────────────────────────────────

package/src/key-handler.js CHANGED Viewed

@@ -33,7 +33,8 @@
 import { loadChangelog } from './changelog-loader.js'
 import { getToolMeta, isModelCompatibleWithTool, getCompatibleTools, findSimilarCompatibleModels } from './tool-metadata.js'
-import { loadConfig, saveConfig, replaceConfigContents } from './config.js'
+import { loadConfig, saveConfig, replaceConfigContents, getApiKey } from './config.js'
+import { sources } from '../sources.js'
 import { join, dirname } from 'node:path'
 import { fileURLToPath } from 'node:url'
 import { spawn } from 'node:child_process'
@@ -1075,6 +1076,80 @@ export function createKeyHandler(ctx) {
     }
   }
+  // 📖 runWithConcurrency: Execute tasks with limited parallelism (maxConcurrent simultaneous).
+  function runWithConcurrency(tasks, maxConcurrent) {
+    const results = new Array(tasks.length)
+    let nextIndex = 0
+    const workers = new Array(maxConcurrent).fill(null).map(async () => {
+      while (true) {
+        const index = nextIndex++
+        if (index >= tasks.length) break
+        try {
+          results[index] = await tasks[index]()
+        } catch (err) {
+          results[index] = { error: err }
+        }
+      }
+    })
+    return Promise.all(workers).then(() => results)
+  }
+  // 📖 runGlobalBenchmark: Benchmark all visible models with up to 5 concurrent requests.
+  // 📖 Results are stored in state.benchmarkResults (same format as individual benchmarks).
+  async function runGlobalBenchmark(state) {
+    if (state.globalBenchmarkRunning) return
+    state.globalBenchmarkRunning = true
+    const models = state.visibleSorted
+    const total = models.length
+    state.globalBenchmarkTotal = total
+    state.globalBenchmarkCompleted = 0
+    const tasks = models.map(model => async () => {
+      const benchmarkKey = `${model.providerKey}/${model.modelId}`
+      // Skip if already running (e.g., from Ctrl+A)
+      if (state.benchmarkRunning.has(benchmarkKey)) {
+        state.globalBenchmarkCompleted++
+        return { skipped: true }
+      }
+      const apiKey = getApiKey(state.config, model.providerKey) ?? null
+      const providerUrl = sources[model.providerKey]?.url ?? null
+      if (!providerUrl) {
+        state.globalBenchmarkCompleted++
+        return { skipped: true }
+      }
+      state.benchmarkRunning.add(benchmarkKey)
+      try {
+        const result = await benchmarkModel({
+          apiKey,
+          modelId: model.modelId,
+          providerKey: model.providerKey,
+          url: providerUrl,
+        })
+        state.benchmarkResults[benchmarkKey] = result
+        return { ok: result.ok }
+      } catch (err) {
+        state.benchmarkResults[benchmarkKey] = {
+          ok: false,
+          code: 'ERR',
+          totalMs: 0,
+          error: err?.message || 'Benchmark failed',
+        }
+        return { ok: false }
+      } finally {
+        state.benchmarkRunning.delete(benchmarkKey)
+        state.globalBenchmarkCompleted++
+      }
+    })
+    await runWithConcurrency(tasks, 5)
+    state.globalBenchmarkRunning = false
+    state.globalBenchmarkTotal = 0
+    state.globalBenchmarkCompleted = 0
+  }
   // 📖 Favorites display mode:
   // 📖 - true  => favorites stay pinned + always visible (legacy behavior)
   // 📖 - false => favorites are just starred rows and obey normal sort/filter rules
@@ -1386,6 +1461,12 @@ export function createKeyHandler(ctx) {
       return
     }
+    // 📖 Ctrl+U: Global AI Speed Benchmark (benchmark all visible models, 5 concurrent)
+    if (key.ctrl && key.name === 'u') {
+      await runGlobalBenchmark(state)
+      return
+    }
     // 📖 Command palette captures the keyboard while active.
     if (state.commandPaletteOpen) {
       if (key.ctrl && key.name === 'c') { exit(0); return }

package/src/overlays.js CHANGED Viewed

@@ -930,7 +930,7 @@ export function createOverlayRenderers(state, deps) {
     lines.push(`  ${heading('Controls')}`)
     lines.push(`  ${key('W')}  Toggle ping mode  ${hint('(speed 2s → normal 10s → slow 30s → forced 4s)')}`)
     lines.push(`  ${key('Ctrl+P')}  Open ⚡️ command palette  ${hint('(search and run actions quickly)')}`)
-    lines.push(`  ${key('Ctrl+A')}  Benchmark answer speed  ${hint('(real completion on selected model → time + TPS)')}`)
+    lines.push(`  ${key('Ctrl+A')}  AI Speed Test  ${hint('(benchmark selected model → time + TPS)')}`)
     lines.push(`  ${key('E')}  Cycle filter mode  ${hint('(Normal → Configured only → Usable only)')}`)
     lines.push(`  ${key('Z')}  Cycle tool mode  ${hint('(📦 OpenCode → π Pi → 🪼 jcode → 📦 Desktop → 🦞 OpenClaw → 💘 Crush → 🪿 Goose → 🛠 Aider → 🐉 Qwen → 🤲 OpenHands → ⚡ Amp → 🦘 Rovo → ♊ Gemini)')}`)
     lines.push(`  ${key('F')}  Toggle favorite on selected row  ${hint('(1️⃣2️⃣3️⃣ = router fallback order, capped at 🔟)')}`)

package/src/ping.js CHANGED Viewed

@@ -16,6 +16,9 @@
  *
  *   → Functions:
  *   - `resolveCloudflareUrl`: Resolve {account_id} placeholder from CLOUDFLARE_ACCOUNT_ID env var
+ *   - `buildChatCompletionPingBody`: Build minimal chat-completion probe payloads with thinking disabled
+ *   - `markDisabledThinkingUnsupported`: Cache strict providers that reject the optional thinking control
+ *   - `shouldUseDisabledThinkingForProvider`: Decide whether a provider should receive disabled-thinking probes
  *   - `buildPingRequest`: Build provider-specific HTTP request for pinging
  *   - `ping`: Send async ping request with timeout; returns { code, ms, quotaPercent }
  *   - `getHeaderValue`: Helper to extract header value from Headers object or plain object
@@ -41,6 +44,9 @@ import { PING_TIMEOUT } from './constants.js'
 import { fetchProviderQuota as _fetchProviderQuotaFromModule } from './provider-quota-fetchers.js'
 import { supportsUsagePercent } from './quota-capabilities.js'
+const DISABLED_THINKING_RETRY_STATUSES = new Set([400, 422])
+const disabledThinkingUnsupportedProviders = new Set()
 // 📖 resolveCloudflareUrl: Cloudflare's OpenAI-compatible endpoint is account-scoped.
 // 📖 We resolve {account_id} from env so provider setup can stay simple in config.
 export function resolveCloudflareUrl(url) {
@@ -50,10 +56,37 @@ export function resolveCloudflareUrl(url) {
   return url.replace('{account_id}', encodeURIComponent(accountId))
 }
+// 📖 buildChatCompletionPingBody: Use the smallest useful chat-completion probe.
+// 📖 The explicit thinking toggle prevents reasoning-capable endpoints from spending
+// 📖 hidden tokens or adding thinking latency when we only need availability + RTT.
+export function buildChatCompletionPingBody(modelId, overrides = {}, options = {}) {
+  const body = {
+    model: modelId,
+    messages: [{ role: 'user', content: 'hi' }],
+    max_tokens: 1,
+    thinking: { type: 'disabled' },
+    ...overrides,
+  }
+  if (options.disableThinking === false) delete body.thinking
+  return body
+}
+// 📖 markDisabledThinkingUnsupported: remember strict providers that reject the
+// 📖 optional `thinking` field so future pings avoid repeated 400/422 retries.
+export function markDisabledThinkingUnsupported(providerKey) {
+  if (providerKey) disabledThinkingUnsupportedProviders.add(providerKey)
+}
+// 📖 shouldUseDisabledThinkingForProvider: central policy for OpenAI-compatible
+// 📖 probes, shared by regular pings and router health probes.
+export function shouldUseDisabledThinkingForProvider(providerKey) {
+  return !disabledThinkingUnsupportedProviders.has(providerKey)
+}
 // 📖 buildPingRequest: Build provider-specific ping request.
 // 📖 Handles Replicate's /v1/predictions format, Cloudflare's account_id in URL,
 // 📖 and standard OpenAI-compliant chat completions with provider-specific headers.
-export function buildPingRequest(apiKey, modelId, providerKey, url) {
+export function buildPingRequest(apiKey, modelId, providerKey, url, options = {}) {
   // 📖 ZAI models are stored as "zai/glm-..." in sources.js but the API expects just "glm-..."
   const apiModelId = providerKey === 'zai' ? modelId.replace(/^zai\//, '') : modelId
@@ -75,7 +108,9 @@ export function buildPingRequest(apiKey, modelId, providerKey, url) {
     return {
       url: resolveCloudflareUrl(url),
       headers,
-      body: { model: apiModelId, messages: [{ role: 'user', content: 'hi' }], max_tokens: 1 },
+      body: buildChatCompletionPingBody(apiModelId, {}, {
+        disableThinking: options.disableThinking ?? shouldUseDisabledThinkingForProvider(providerKey),
+      }),
     }
   }
@@ -90,7 +125,31 @@ export function buildPingRequest(apiKey, modelId, providerKey, url) {
   return {
     url,
     headers,
-    body: { model: apiModelId, messages: [{ role: 'user', content: 'hi' }], max_tokens: 1 },
+    body: buildChatCompletionPingBody(apiModelId, {}, {
+      disableThinking: options.disableThinking ?? shouldUseDisabledThinkingForProvider(providerKey),
+    }),
+  }
+}
+// 📖 sendPingFetch: keep retry code tiny and ensure both attempts use the same abort signal.
+async function sendPingFetch(req, signal) {
+  return fetch(req.url, {
+    method: 'POST', signal,
+    headers: req.headers,
+    body: JSON.stringify(req.body),
+  })
+}
+// 📖 isDisabledThinkingRejected: strict OpenAI-compatible gateways may reject
+// 📖 unknown root fields. We only retry when the status and error text names
+// 📖 the optional `thinking` control, avoiding retries for real model failures.
+async function isDisabledThinkingRejected(resp, req) {
+  if (!req?.body?.thinking || !DISABLED_THINKING_RETRY_STATUSES.has(resp.status)) return false
+  try {
+    const text = await resp.clone().text()
+    return /thinking/i.test(text)
+  } catch {
+    return false
   }
 }
@@ -104,12 +163,13 @@ export async function ping(apiKey, modelId, providerKey, url) {
   const timer = setTimeout(() => ctrl.abort(), PING_TIMEOUT)
   const t0    = performance.now()
   try {
-    const req = buildPingRequest(apiKey, modelId, providerKey, url)
-    const resp = await fetch(req.url, {
-      method: 'POST', signal: ctrl.signal,
-      headers: req.headers,
-      body: JSON.stringify(req.body),
-    })
+    let req = buildPingRequest(apiKey, modelId, providerKey, url)
+    let resp = await sendPingFetch(req, ctrl.signal)
+    if (await isDisabledThinkingRejected(resp, req)) {
+      markDisabledThinkingUnsupported(providerKey)
+      req = buildPingRequest(apiKey, modelId, providerKey, url, { disableThinking: false })
+      resp = await sendPingFetch(req, ctrl.signal)
+    }
     // 📖 Normalize all HTTP 2xx statuses to "200" so existing verdict/avg logic still works.
     const code = resp.status >= 200 && resp.status < 300 ? '200' : String(resp.status)
     return {

package/src/render-table.js CHANGED Viewed

@@ -49,8 +49,8 @@ import { themeColors, getProviderRgb, getTierRgb, getReadableTextRgb, getTheme }
 import { TIER_COLOR } from './tier-colors.js'
 import { getAvg, getVerdict, getUptime, getStabilityScore, getVersionStatusInfo } from './utils.js'
 import { usagePlaceholderForProvider } from './ping.js'
-import { formatBenchmarkResult } from './benchmark.js'
-import { calculateViewport, sortResultsWithPinnedFavorites, padEndDisplay, displayWidth } from './render-helpers.js'
+import { formatBenchmarkLatency, formatBenchmarkTps } from './benchmark.js'
+import { calculateViewport, sortResultsWithPinnedFavorites, padEndDisplay, displayWidth, stripAnsi } from './render-helpers.js'
 import { getToolMeta, TOOL_METADATA, TOOL_MODE_ORDER, isModelCompatibleWithTool } from './tool-metadata.js'
 import { getColumnSpacing } from './ui-config.js'
 import { detectPackageManager, getManualInstallCmd } from './updater.js'
@@ -91,6 +91,8 @@ const COLUMN_SORT_MAP = {
   verdict: 'verdict',
   stability: 'stability',
   uptime: 'uptime',
+  aiLatency: null,
+  tps: null,
 }
 export { COLUMN_SORT_MAP }
@@ -277,7 +279,8 @@ export function renderTable({
   const W_STATUS = 18
   const W_VERDICT = 14
   const W_UPTIME = 6
-  const W_ANSWER = 14
+  const W_AI_LATENCY = 18
+  const W_TPS = 5
   // const W_TOKENS = 7 // Used column removed
   // const W_USAGE = 7 // Usage column removed
@@ -285,17 +288,18 @@ export function renderTable({
   // 📖 Responsive column visibility: progressively hide least-useful columns
   // 📖 and shorten header labels when terminal width is insufficient.
-  // 📖 Hiding order (least useful first): Rank → Answer Speed → Up% → Tier → Stability
+  // 📖 Hiding order (least useful first): Rank → AI Latency/TPS → Up% → Tier → Stability
   // 📖 Compact mode shrinks: Latest Ping→Lat. P (9), Avg Ping→Avg. P (8),
   // 📖 Stability→StaB. (8), Provider→4chars+… (7), Health→6chars+… (13)
-  // 📖 Breakpoints: full=183 | compact=160 | -Rank=151 | -Answer=142 | -Up%=133 | -Tier=125 | -Stab=114
+  // 📖 Breakpoints are computed dynamically from active column widths.
   let wPing = 14
   let wAvg = 11
   let wStab = 11
   let wSource = W_SOURCE
   let wStatus = W_STATUS
+  let wAiLatency = W_AI_LATENCY
   let showRank = true
-  let showAnswerSpeed = true
+  let showBenchmarkColumns = true
   let showUptime = true
   let showTier = true
   let showStability = true
@@ -310,7 +314,7 @@ export function renderTable({
       cols.push(W_SWE, W_CTX, W_MODEL, wSource, wPing, wAvg, wStatus, W_VERDICT)
       if (showStability) cols.push(wStab)
       if (showUptime) cols.push(W_UPTIME)
-      if (showAnswerSpeed) cols.push(W_ANSWER)
+      if (showBenchmarkColumns) cols.push(wAiLatency, W_TPS)
       return ROW_MARGIN + cols.reduce((a, b) => a + b, 0) + (cols.length - 1) * SEP_W
     }
@@ -322,10 +326,11 @@ export function renderTable({
       wStab = 8      // 'StaB.' instead of 'Stability'
       wSource = 7    // Provider truncated to 4 chars + '…', 7 cols total
       wStatus = 13   // Health truncated after 6 chars + '…'
+      wAiLatency = 13 // Mirror compact Health text when health is not good
     }
     // 📖 Steps 2–6: Progressive column hiding (least useful first)
     if (calcWidth() > terminalCols) showRank = false
-    if (calcWidth() > terminalCols) showAnswerSpeed = false
+    if (calcWidth() > terminalCols) showBenchmarkColumns = false
     if (calcWidth() > terminalCols) showUptime = false
     if (calcWidth() > terminalCols) showTier = false
     if (calcWidth() > terminalCols) showStability = false
@@ -348,7 +353,10 @@ export function renderTable({
     colDefs.push({ name: 'verdict', width: W_VERDICT })
     if (showStability) colDefs.push({ name: 'stability', width: wStab })
     if (showUptime) colDefs.push({ name: 'uptime', width: W_UPTIME })
-    if (showAnswerSpeed) colDefs.push({ name: 'answerSpeed', width: W_ANSWER })
+    if (showBenchmarkColumns) {
+      colDefs.push({ name: 'aiLatency', width: wAiLatency })
+      colDefs.push({ name: 'tps', width: W_TPS })
+    }
     let x = ROW_MARGIN + 1 // 📖 1-based: first column starts after the 2-char left margin
     const columns = []
     for (let i = 0; i < colDefs.length; i++) {
@@ -475,12 +483,17 @@ export function renderTable({
     return themeColors.hotkey('U') + themeColors.dim('p%' + padding)
   })()
-  // 📖 Answer Speed header — no sort hotkey, just the label
-  const answerLabel = isCompact ? 'Answ.' : 'Answer Speed'
-  const answerH_c = (() => {
-    const plain = answerLabel
-    const padding = ' '.repeat(Math.max(0, W_ANSWER - plain.length))
-    return themeColors.dim('Ans') + themeColors.hotkey('w') + themeColors.dim('er' + (isCompact ? '.' : ' Speed') + padding)
+  // 📖 Benchmark headers — split the old combined AI Speed field into latency + throughput.
+  const aiLatencyLabel = isCompact ? 'AI Lat.' : 'AI Latency'
+  const aiLatencyH_c = (() => {
+    const plain = aiLatencyLabel
+    const padding = ' '.repeat(Math.max(0, wAiLatency - plain.length))
+    return themeColors.dim(plain + padding)
+  })()
+  const tpsH_c = (() => {
+    const plain = 'TPS'
+    const padding = ' '.repeat(Math.max(0, W_TPS - plain.length))
+    return themeColors.dim(plain + padding)
   })()
   // 📖 Usage column removed from UI – no header or separator for it.
@@ -491,7 +504,7 @@ export function renderTable({
   headerParts.push(sweH_c, ctxH_c, modelH_c, originH_c, pingH_c, avgH_c, healthH_c, verdictH_c)
   if (showStability) headerParts.push(stabH_c)
   if (showUptime) headerParts.push(uptimeH_c)
-  if (showAnswerSpeed) headerParts.push(answerH_c)
+  if (showBenchmarkColumns) headerParts.push(aiLatencyH_c, tpsH_c)
   lines.push('  ' + headerParts.join(COL_SEP))
   // 📖 Mouse support: the column header row is the last line we just pushed.
@@ -793,24 +806,28 @@ export function renderTable({
     // (We keep the logic but do not render it.)
     const usageCell = ''
-    // 📖 Answer Speed column — show benchmark result, running spinner, or dash
+    // 📖 AI Latency + TPS columns — same benchmark result, split into two readable metrics.
     const benchmarkKey = `${r.providerKey}/${r.modelId}`
     const benchmarkResult = benchmarkResults[benchmarkKey]
     const isBenchmarkRunning = benchmarkRunning.has(benchmarkKey)
-    let answerSpeedCell
-    if (isBenchmarkRunning) {
-      const spinner = FRAMES[frame % FRAMES.length]
-      answerSpeedCell = themeColors.success(spinner.padEnd(W_ANSWER))
-    } else if (benchmarkResult) {
-      const text = formatBenchmarkResult(benchmarkResult)
-      // 📖 Colorize: success = green, error = red/dim
-      const isError = !benchmarkResult.ok
-      answerSpeedCell = isError
-        ? themeColors.metricBad(text.padEnd(W_ANSWER))
-        : themeColors.metricGood(text.padEnd(W_ANSWER))
-    } else {
-      answerSpeedCell = themeColors.dim('—'.padEnd(W_ANSWER))
-    }
+    const healthIsGood = r.status === 'up'
+    const latencyText = healthIsGood
+      ? formatBenchmarkLatency(benchmarkResult, { running: isBenchmarkRunning, frame })
+      : statusDisplayText
+    const tpsText = healthIsGood
+      ? formatBenchmarkTps(benchmarkResult, { running: isBenchmarkRunning, frame })
+      : '—'
+    const benchmarkIsError = healthIsGood && benchmarkResult && !benchmarkResult.ok
+    const latencyCell = !healthIsGood
+      ? statusColor(padEndDisplay(latencyText, wAiLatency))
+      : benchmarkIsError
+        ? themeColors.metricBad(latencyText.padEnd(wAiLatency))
+        : benchmarkResult || isBenchmarkRunning
+          ? themeColors.metricGood(latencyText.padEnd(wAiLatency))
+          : themeColors.dim(latencyText.padEnd(wAiLatency))
+    const tpsCell = healthIsGood && (benchmarkResult?.ok || isBenchmarkRunning)
+      ? themeColors.metricGood(tpsText.padEnd(W_TPS))
+      : themeColors.dim(tpsText.padEnd(W_TPS))
     // 📖 Build row: conditionally include columns based on responsive visibility
     const rowParts = []
@@ -819,7 +836,7 @@ export function renderTable({
     rowParts.push(sweCell, ctxCell, nameCell, sourceCell, pingCell, avgCell, status, speedCell)
     if (showStability) rowParts.push(stabCell)
     if (showUptime) rowParts.push(uptimeCell)
-    if (showAnswerSpeed) rowParts.push(answerSpeedCell)
+    if (showBenchmarkColumns) rowParts.push(latencyCell, tpsCell)
     const row = '  ' + rowParts.join(COL_SEP)
     if (isCursor) {
@@ -942,9 +959,8 @@ export function renderTable({
     }
   }
-  // 📖 Line 2: command palette (highlighted as new) + GitHub link.
-  // 📖 Ctrl+P Cmd Palette uses neon-green-on-dark-green background to highlight the feature.
-  const paletteLabel = chalk.bgRgb(0, 60, 0).rgb(57, 255, 20).bold(' Ctrl+P Cmd Palette ')
+  // 📖 Line 2: command palette (simple color, no background) + GitHub link.
+  const paletteLabel = chalk.rgb(57, 255, 20).bold('Ctrl+P Cmd Palette')
   const starLink = '⭐ ' + themeColors.link('\x1b]8;;https://github.com/vava-nessa/free-coding-models\x1b\\GitHub\x1b]8;;\x1b\\')
   lines.push(
     '  ' + paletteLabel + themeColors.dim(`  •  `) + starLink + themeColors.dim(`  •  `) +
@@ -998,8 +1014,29 @@ export function renderTable({
   const releaseLabel = lastReleaseDate
     ? chalk.rgb(255, 182, 193)(`Last release: ${lastReleaseDate}`)
     : ''
+  const speedTestLabel = chalk.bgRgb(0, 60, 0).rgb(57, 255, 20).bold(' NEW ⭐️ Ctrl+A 🤖 AI Speed Test ')
+  const globalBenchmarkLabel = chalk.bgRgb(180, 0, 255).white.bold(' NEW Ctrl+U : Global AI Speed Test (Uses a lot of requests!) ')
-  if (releaseLabel) lines.push('  ' + releaseLabel)
+  // 📖 Line 3: Speed Test (Ctrl+A) + Global Benchmark (Ctrl+U) + Last release
+  if (releaseLabel || speedTestLabel || globalBenchmarkLabel) {
+    const parts = [
+      { text: '  ', key: null },
+      { text: speedTestLabel, key: 'a' },
+      { text: '  ', key: null },
+      { text: globalBenchmarkLabel, key: 'u' },
+      { text: '  ', key: null },
+      { text: releaseLabel, key: null },
+    ]
+    const footerRow3 = lines.length + 1
+    let xPos = 1
+    for (const part of parts) {
+      const w = displayWidth(part.text)
+      if (part.key) footerHotkeys.push({ key: part.key, row: footerRow3, xStart: xPos, xEnd: xPos + w - 1 })
+      xPos += w
+    }
+    const line = parts.map(p => p.text).join('')
+    lines.push(line)
+  }
   _lastLayout.footerHotkeys = footerHotkeys
   // 📖 Append \x1b[K (erase to EOL) to each line so leftover chars from previous

package/src/router-daemon.js CHANGED Viewed

@@ -47,7 +47,7 @@ import {
   normalizeRouterConfig,
   saveConfig,
 } from './config.js'
-import { resolveCloudflareUrl } from './ping.js'
+import { buildChatCompletionPingBody, resolveCloudflareUrl, shouldUseDisabledThinkingForProvider } from './ping.js'
 import { sendUsageTelemetry } from './telemetry.js'
 export const ROUTER_DEFAULT_PORT = 19280
@@ -1200,12 +1200,11 @@ class RouterRuntime {
         : await fetch(providerUrl, {
             method: 'POST',
             headers: cloneHeadersForUpstream({}, apiKey, candidate.provider),
-            body: JSON.stringify({
-              model: getApiModelId(candidate.provider, candidate.model),
-              messages: [{ role: 'user', content: 'hi' }],
-              max_tokens: 1,
-              stream: false,
-            }),
+            body: JSON.stringify(buildChatCompletionPingBody(
+              getApiModelId(candidate.provider, candidate.model),
+              { stream: false },
+              { disableThinking: shouldUseDisabledThinkingForProvider(candidate.provider) }
+            )),
             signal: controller.signal,
           })
       const latencyMs = Math.round(performance.now() - started)