free-coding-models 0.1.66 → 0.1.67

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -36,7 +36,9 @@
36
36
  <a href="#-requirements">Requirements</a> •
37
37
  <a href="#-installation">Installation</a> •
38
38
  <a href="#-usage">Usage</a> •
39
- <a href="#-models">Models</a> •
39
+ <a href="#-tui-columns">Columns</a> •
40
+ <a href="#-stability-score">Stability</a> •
41
+ <a href="#-coding-models">Models</a> •
40
42
  <a href="#-opencode-integration">OpenCode</a> •
41
43
  <a href="#-openclaw-integration">OpenClaw</a> •
42
44
  <a href="#-how-it-works">How it works</a>
@@ -52,9 +54,10 @@
52
54
  - **🚀 Parallel pings** — All models tested simultaneously via native `fetch`
53
55
  - **📊 Real-time animation** — Watch latency appear live in alternate screen buffer
54
56
  - **🏆 Smart ranking** — Top 3 fastest models highlighted with medals 🥇🥈🥉
55
- - **⏱ Continuous monitoring** — Pings all models every 2 seconds forever, never stops
57
+ - **⏱ Continuous monitoring** — Pings all models every 3 seconds forever, never stops
56
58
  - **📈 Rolling averages** — Avg calculated from ALL successful pings since start
57
59
  - **📊 Uptime tracking** — Percentage of successful pings shown in real-time
60
+ - **📐 Stability score** — Composite 0–100 score measuring consistency (p95, jitter, spikes, uptime) — a model with 400ms avg and stable responses beats a 250ms avg model that randomly spikes to 6s
58
61
  - **🔄 Auto-retry** — Timeout models keep getting retried, nothing is ever "given up on"
59
62
  - **🎮 Interactive selection** — Navigate with arrow keys directly in the table, press Enter to act
60
63
  - **🔀 Startup mode menu** — Choose between OpenCode and OpenClaw before the TUI launches
@@ -177,7 +180,7 @@ Use `↑↓` arrows to select, `Enter` to confirm. Then the TUI launches with yo
177
180
 
178
181
  **How it works:**
179
182
  1. **Ping phase** — All enabled models are pinged in parallel (up to 134 across 17 providers)
180
- 2. **Continuous monitoring** — Models are re-pinged every 2 seconds forever
183
+ 2. **Continuous monitoring** — Models are re-pinged every 3 seconds forever
181
184
  3. **Real-time updates** — Watch "Latest", "Avg", and "Up%" columns update live
182
185
  4. **Select anytime** — Use ↑↓ arrows to navigate, press Enter on a model to act
183
186
  5. **Smart detection** — Automatically detects if NVIDIA NIM is configured in OpenCode or OpenClaw
@@ -414,6 +417,92 @@ Current tier filter is shown in the header badge (e.g., `[Tier S]`)
414
417
 
415
418
  ---
416
419
 
420
+ ## 📊 TUI Columns
421
+
422
+ The main table displays one row per model with the following columns:
423
+
424
+ | Column | Sort key | Description |
425
+ |--------|----------|-------------|
426
+ | **Rank** | `R` | Position based on current sort order (medals for top 3: 🥇🥈🥉) |
427
+ | **Tier** | `Y` | SWE-bench tier (S+, S, A+, A, A-, B+, B, C) |
428
+ | **SWE%** | `S` | SWE-bench Verified score — the industry-standard benchmark for real GitHub issue resolution |
429
+ | **CTX** | `C` | Context window size in thousands of tokens (e.g. `128k`) |
430
+ | **Model** | `M` | Model display name (favorites show ⭐ prefix) |
431
+ | **Origin** | `O` | Provider name (NIM, Groq, Cerebras, etc.) — press `N` to cycle the origin filter |
432
+ | **Latest Ping** | `L` | Most recent round-trip latency in milliseconds |
433
+ | **Avg Ping** | `A` | Rolling average of ALL successful pings since launch |
434
+ | **Health** | `H` | Current status: UP ✅, NO KEY 🔑, Timeout ⏳, Overloaded 🔥, Not Found 🚫 |
435
+ | **Verdict** | `V` | Health verdict based on avg latency + stability analysis (see below) |
436
+ | **Stability** | `B` | Composite 0–100 consistency score (see [Stability Score](#-stability-score)) |
437
+ | **Up%** | `U` | Uptime — percentage of successful pings out of total attempts |
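The sort keys above map onto internal column names. This diff only shows the `sortColumn === '…'` checks inside `renderTable`, so the dispatch can be sketched as a plain lookup table; note this is a hypothetical reconstruction, and the names `rank`, `tier`, `model`, and `latest` are assumptions not visible in the diff:

```javascript
// Hypothetical key-to-sortColumn map. The values 'swe', 'ctx', 'origin', 'avg',
// 'condition', 'verdict', 'stability', and 'uptime' match the sortColumn checks
// visible in renderTable; the rest are guesses for illustration.
const SORT_KEYS = {
  r: 'rank', y: 'tier', s: 'swe', c: 'ctx', m: 'model', o: 'origin',
  l: 'latest', a: 'avg', h: 'condition', v: 'verdict', b: 'stability', u: 'uptime',
}

function sortColumnForKey(key) {
  // N is the origin *filter* key, not a sort key, so it is absent here.
  return SORT_KEYS[key.toLowerCase()] ?? null
}
```
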
438
+
439
+ ### Verdict values
440
+
441
+ The Verdict column combines average latency with stability analysis:
442
+
443
+ | Verdict | Meaning |
444
+ |---------|---------|
445
+ | **Perfect** | Avg < 400ms with stable p95/jitter |
446
+ | **Normal** | Avg < 1000ms, consistent responses |
447
+ | **Slow** | Avg 1000–2000ms |
448
+ | **Spiky** | Good avg but erratic tail latency (p95 >> avg) |
449
+ | **Very Slow** | Avg 2000–5000ms |
450
+ | **Overloaded** | Server returned 429/503 (rate limited or capacity hit) |
451
+ | **Unstable** | Was previously up but now timing out, or avg > 5000ms |
452
+ | **Not Active** | No successful pings yet |
453
+ | **Pending** | First ping still in flight |
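The decision order in the table can be sketched as follows. This is only an illustration: the real `getVerdict()` lives in `lib/utils.js` and is not part of this diff, and the "Spiky" heuristic (p95 more than 3x the average) is an assumption, not the package's actual rule.

```javascript
// Sketch of the verdict table above; thresholds follow the README table,
// the p95 spike heuristic is assumed. 'Pending' (first ping still in flight)
// would need ping-history state that is omitted here.
function verdict({ avg, p95, httpCode, wasUpBefore }) {
  if (httpCode === '429' || httpCode === '503') return 'Overloaded'
  if (avg == null) return wasUpBefore ? 'Unstable' : 'Not Active'
  if (avg > 5000) return 'Unstable'
  if (avg < 1000 && p95 > 3 * avg) return 'Spiky' // good avg, erratic tail
  if (avg < 400) return 'Perfect'
  if (avg < 1000) return 'Normal'
  if (avg < 2000) return 'Slow'
  return 'Very Slow'
}
```
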
454
+
455
+ ---
456
+
457
+ ## 📐 Stability Score
458
+
459
+ The **Stability** column (sort with `B` key) shows a composite 0–100 score that answers: *"How consistent and predictable is this model?"*
460
+
461
+ Average latency alone is misleading — a model averaging 250ms that randomly spikes to 6 seconds *feels* slower in practice than a steady 400ms model. The stability score captures this.
462
+
463
+ ### Formula
464
+
465
+ Four signals are normalized to 0–100 each, then combined with weights:
466
+
467
+ ```
468
+ Stability = 0.30 × p95_score
469
+ + 0.30 × jitter_score
470
+ + 0.20 × spike_score
471
+ + 0.20 × reliability_score
472
+ ```
473
+
474
+ | Component | Weight | What it measures | How it's normalized |
475
+ |-----------|--------|-----------------|---------------------|
476
+ | **p95 latency** | 30% | Tail-latency spikes — the worst 5% of response times | `100 × (1 - p95 / 5000)`, clamped to 0–100 |
477
+ | **Jitter (σ)** | 30% | Erratic response times — standard deviation of ping times | `100 × (1 - jitter / 2000)`, clamped to 0–100 |
478
+ | **Spike rate** | 20% | Fraction of pings above 3000ms | `100 × (1 - spikes / total_pings)` |
479
+ | **Reliability** | 20% | Uptime — fraction of successful HTTP 200 pings | Direct uptime percentage (0–100) |
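As a rough sketch of the formula and normalization table above (the real `getStabilityScore` lives in `lib/utils.js` and is not shown in this diff; the `{ code, ms }` ping shape is taken from the rendering code, the rest is an illustration):

```javascript
// Sketch of the documented stability formula, not the package's implementation.
// `pings` entries are assumed to look like { code: '200', ms: 412 }.
function stabilityScore(pings) {
  const ok = pings.filter(p => p.code === '200').map(p => p.ms)
  if (ok.length === 0) return -1 // no data yet, rendered as a dim placeholder
  const sorted = [...ok].sort((a, b) => a - b)
  const p95 = sorted[Math.min(sorted.length - 1, Math.floor(sorted.length * 0.95))]
  const mean = ok.reduce((s, v) => s + v, 0) / ok.length
  const jitter = Math.sqrt(ok.reduce((s, v) => s + (v - mean) ** 2, 0) / ok.length)
  const spikes = ok.filter(ms => ms > 3000).length
  const clamp = (x) => Math.max(0, Math.min(100, x))
  const p95Score = clamp(100 * (1 - p95 / 5000))
  const jitterScore = clamp(100 * (1 - jitter / 2000))
  const spikeScore = 100 * (1 - spikes / pings.length)
  const reliability = (100 * ok.length) / pings.length
  return Math.round(
    0.30 * p95Score + 0.30 * jitterScore + 0.20 * spikeScore + 0.20 * reliability
  )
}
```

With this sketch, twenty steady 400ms pings score 98, while two 6-second spikes in an otherwise 250ms history drag the score down to 42.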
480
+
481
+ ### Color coding
482
+
483
+ | Score | Color | Interpretation |
484
+ |-------|-------|----------------|
485
+ | **80–100** | Green | Rock-solid — very consistent, safe to rely on |
486
+ | **60–79** | Cyan | Good — occasional variance but generally stable |
487
+ | **40–59** | Yellow | Shaky — noticeable inconsistency |
488
+ | **< 40** | Red | Unreliable — frequent spikes or failures |
489
+ | **—** | Dim | No data yet (no successful pings) |
490
+
491
+ ### Example
492
+
493
+ Two models with similar average latency, very different real-world experience:
494
+
495
+ ```
496
+ Model A: avg 250ms, p95 6000ms, jitter 1800ms → Stability ~30 (red)
497
+ Model B: avg 400ms, p95 650ms, jitter 120ms → Stability ~85 (green)
498
+ ```
499
+
500
+ Model B is the better choice despite its higher average — it won't randomly stall your coding workflow.
501
+
502
+ > 💡 **Tip:** Sort by Stability (`B` key) after a few minutes of monitoring to find the models that deliver the most predictable performance.
503
+
504
+ ---
505
+
417
506
  ## 🔌 OpenCode Integration
418
507
 
419
508
  **The easiest way** — let `free-coding-models` do everything:
@@ -589,19 +678,19 @@ This script:
589
678
  ## ⚙️ How it works
590
679
 
591
680
  ```
592
- ┌─────────────────────────────────────────────────────────────┐
593
- │ 1. Enter alternate screen buffer (like vim/htop/less)
594
- │ 2. Ping ALL models in parallel
595
- │ 3. Display real-time table with Latest/Avg/Up% columns
596
- │ 4. Re-ping ALL models every 2 seconds (forever)
597
- │ 5. Update rolling averages from ALL successful pings
598
- │ 6. User can navigate with ↑↓ and select with Enter
599
- │ 7. On Enter (OpenCode): set model, launch OpenCode
600
- │ 8. On Enter (OpenClaw): update ~/.openclaw/openclaw.json
601
- └─────────────────────────────────────────────────────────────┘
681
+ ┌──────────────────────────────────────────────────────────────────┐
682
+ │ 1. Enter alternate screen buffer (like vim/htop/less)
683
+ │ 2. Ping ALL models in parallel
684
+ │ 3. Display real-time table with Latest/Avg/Stability/Up%
685
+ │ 4. Re-ping ALL models every 3 seconds (forever)
686
+ │ 5. Update rolling averages + stability scores per model
687
+ │ 6. User can navigate with ↑↓ and select with Enter
688
+ │ 7. On Enter (OpenCode): set model, launch OpenCode
689
+ │ 8. On Enter (OpenClaw): update ~/.openclaw/openclaw.json
690
+ └──────────────────────────────────────────────────────────────────┘
602
691
  ```
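Steps 2 through 5 can be sketched as a parallel ping pass driven by an interval timer. This is a simplified illustration with an injectable fetch function so it can run offline: the real tool hits each provider's endpoint with API keys, a 15s timeout, and a 3s interval, and the `url`/`pings` fields here are assumptions.

```javascript
// Minimal sketch of the monitor loop; pingAll records one { code, ms } entry
// per model per pass, mirroring the ping history the table renders.
async function pingAll(models, fetchFn = fetch) {
  await Promise.all(models.map(async (m) => {
    const t0 = Date.now()
    try {
      const res = await fetchFn(m.url, { signal: AbortSignal.timeout(15_000) })
      m.pings.push({ code: String(res.status), ms: Date.now() - t0 })
    } catch {
      m.pings.push({ code: 'TIMEOUT', ms: Date.now() - t0 })
    }
  }))
}

// Continuous monitoring: re-ping every 3 seconds, then re-render the table.
// setInterval(() => pingAll(models).then(render), 3_000)
```
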
603
692
 
604
- **Result:** Continuous monitoring interface that stays open until you select a model or press Ctrl+C. Rolling averages give you accurate long-term latency data, uptime percentage tracks reliability, and you can configure your tool of choice with your chosen model in one keystroke.
693
+ **Result:** Continuous monitoring interface that stays open until you select a model or press Ctrl+C. Rolling averages give you accurate long-term latency data, the stability score reveals which models are truly consistent vs. deceptively spiky, and you can configure your tool of choice with one keystroke.
605
694
 
606
695
  ---
607
696
 
@@ -675,7 +764,7 @@ This script:
675
764
 
676
765
  **Configuration:**
677
766
  - **Ping timeout**: 15 seconds per attempt (slow models get more time)
678
- - **Ping interval**: 2 seconds between complete re-pings of all models (adjustable with W/X keys)
767
+ - **Ping interval**: 3 seconds between complete re-pings of all models (adjustable with W/X keys)
679
768
  - **Monitor mode**: Interface stays open forever, press Ctrl+C to exit
680
769
 
681
770
  **Flags:**
@@ -697,7 +786,7 @@ This script:
697
786
  **Keyboard shortcuts (main TUI):**
698
787
  - **↑↓** — Navigate models
699
788
  - **Enter** — Select model (launches OpenCode or sets OpenClaw default, depending on mode)
700
- - **R/Y/O/M/L/A/S/N/H/V/U** — Sort by Rank/Tier/Origin/Model/LatestPing/Avg/SWE/Ctx/Health/Verdict/Uptime
789
+ - **R/Y/O/M/L/A/S/C/H/V/B/U** — Sort by Rank/Tier/Origin/Model/LatestPing/Avg/SWE/Ctx/Health/Verdict/Stability/Uptime
701
790
  - **F** — Toggle favorite on selected model (⭐ in Model column, pinned at top)
702
791
  - **T** — Cycle tier filter (All → S+ → S → A+ → A → A- → B+ → B → C → All)
703
792
  - **Z** — Cycle mode (OpenCode CLI → OpenCode Desktop → OpenClaw)
@@ -772,5 +861,3 @@ We welcome contributions! Feel free to open issues, submit pull requests, or get
772
861
  For questions or issues, open a [GitHub issue](https://github.com/vava-nessa/free-coding-models/issues).
773
862
 
774
863
  💬 Let's talk about the project on Discord: https://discord.gg/5MbTnDC3Md
775
-
776
- > ⚠️ **free-coding-models is a BETA TUI** — it might crash or have problems. Use at your own risk and feel free to report issues!
@@ -23,7 +23,7 @@
23
23
  * - Settings screen (P key) to manage API keys, provider toggles, analytics, and manual updates
24
24
  * - Favorites system: toggle with F, pin rows to top, persist between sessions
25
25
  * - Uptime percentage tracking (successful pings / total pings)
26
- * - Sortable columns (R/Y/O/M/L/A/S/N/H/V/U keys)
26
+ * - Sortable columns (R/Y/O/M/L/A/S/C/H/V/B/U keys)
27
27
  * - Tier filtering via T key (cycles S+→S→A+→A→A-→B+→B→C→All)
28
28
  *
29
29
  * → Functions:
@@ -93,7 +93,7 @@ import { join, dirname } from 'path'
93
93
  import { createServer } from 'net'
94
94
  import { MODELS, sources } from '../sources.js'
95
95
  import { patchOpenClawModelsJson } from '../patch-openclaw-models.js'
96
- import { getAvg, getVerdict, getUptime, sortResults, filterByTier, findBestModel, parseArgs, TIER_ORDER, VERDICT_ORDER, TIER_LETTER_MAP } from '../lib/utils.js'
96
+ import { getAvg, getVerdict, getUptime, getP95, getJitter, getStabilityScore, sortResults, filterByTier, findBestModel, parseArgs, TIER_ORDER, VERDICT_ORDER, TIER_LETTER_MAP } from '../lib/utils.js'
97
97
  import { loadConfig, saveConfig, getApiKey, isProviderEnabled } from '../lib/config.js'
98
98
 
99
99
  const require = createRequire(import.meta.url)
@@ -717,7 +717,7 @@ const ALT_HOME = '\x1b[H'
717
717
  // 📖 This allows easy addition of new model sources beyond NVIDIA NIM
718
718
 
719
719
  const PING_TIMEOUT = 15_000 // 📖 15s per attempt before abort - slow models get more time
720
- const PING_INTERVAL = 2_000 // 📖 Ping all models every 2 seconds in continuous mode
720
+ const PING_INTERVAL = 3_000 // 📖 Ping all models every 3 seconds in continuous mode
721
721
 
722
722
  const FPS = 12
723
723
  const COL_MODEL = 22
@@ -767,6 +767,47 @@ function stripAnsi(input) {
767
767
  return String(input).replace(/\x1b\[[0-9;]*m/g, '').replace(/\x1b\][^\x1b]*\x1b\\/g, '')
768
768
  }
769
769
 
770
+ // 📖 Calculate display width of a string in terminal columns.
771
+ // 📖 Emojis and other wide characters occupy 2 columns, variation selectors (U+FE0F) are zero-width.
772
+ // 📖 This avoids pulling in a full `string-width` dependency for a lightweight CLI tool.
773
+ function displayWidth(str) {
774
+ const plain = stripAnsi(String(str))
775
+ let w = 0
776
+ for (const ch of plain) {
777
+ const cp = ch.codePointAt(0)
778
+ // Zero-width: variation selectors (FE00-FE0F), zero-width joiner/non-joiner, combining marks
779
+ if ((cp >= 0xFE00 && cp <= 0xFE0F) || cp === 0x200D || cp === 0x200C || cp === 0x20E3) continue
780
+ // Wide: CJK, emoji (most above U+1F000), fullwidth forms
781
+ if (
782
+ cp > 0x1F000 || // emoji & symbols
783
+ (cp >= 0x2600 && cp <= 0x27BF) || // misc symbols, dingbats
784
+ (cp >= 0x2300 && cp <= 0x23FF) || // misc technical (⏳, ⏰, etc.)
785
+ (cp >= 0x2700 && cp <= 0x27BF) || // dingbats
786
+ (cp >= 0xFE10 && cp <= 0xFE19) || // vertical forms
787
+ (cp >= 0xFF01 && cp <= 0xFF60) || // fullwidth ASCII
788
+ (cp >= 0xFFE0 && cp <= 0xFFE6) || // fullwidth signs
789
+ (cp >= 0x4E00 && cp <= 0x9FFF) || // CJK unified
790
+ (cp >= 0x3000 && cp <= 0x303F) || // CJK symbols
791
+ (cp >= 0x2B50 && cp <= 0x2B55) || // stars, circles
792
+ cp === 0x2705 || cp === 0x2714 || cp === 0x2716 || // check/cross marks
793
+ cp === 0x26A0 // ⚠ warning sign
794
+ ) {
795
+ w += 2
796
+ } else {
797
+ w += 1
798
+ }
799
+ }
800
+ return w
801
+ }
802
+
803
+ // 📖 Right-pad (padEnd equivalent) using display width instead of string length.
804
+ // 📖 Ensures columns with emoji text align correctly in the terminal.
805
+ function padEndDisplay(str, width) {
806
+ const dw = displayWidth(str)
807
+ const need = Math.max(0, width - dw)
808
+ return str + ' '.repeat(need)
809
+ }
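The need for display-width-aware padding is easy to demonstrate. Below is a deliberately simplified sketch (it treats any code point at or above U+2300 as wide, while the full `displayWidth` above handles variation selectors, CJK, and many more ranges):

```javascript
// Why padEnd alone misaligns: '⭐' is one JS character but two terminal columns.
const naiveA = '⭐ gpt'.padEnd(10) + '|'
const naiveB = '  gpt'.padEnd(10) + '|'
console.log(naiveA.length === naiveB.length) // true, yet the bars misalign on screen

// Simplified display-width-aware padding (assumption: code points >= U+2300 are wide).
function padDisplay(str, width) {
  let w = 0
  for (const ch of str) w += ch.codePointAt(0) >= 0x2300 ? 2 : 1
  return str + ' '.repeat(Math.max(0, width - w))
}
console.log(padDisplay('⭐ gpt', 10).length) // 9: one char shorter, same on-screen width
```
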
810
+
770
811
  // 📖 Tint overlay lines with a fixed dark panel width so the background is clearly visible.
771
812
  function tintOverlayLines(lines, bgColor) {
772
813
  return lines.map((line) => {
@@ -904,6 +945,7 @@ function renderTable(results, pendingPings, frame, cursor = null, sortColumn = '
904
945
  const W_AVG = 11
905
946
  const W_STATUS = 18
906
947
  const W_VERDICT = 14
948
+ const W_STAB = 11
907
949
  const W_UPTIME = 6
908
950
 
909
951
  // 📖 Sort models using the shared helper
@@ -933,6 +975,7 @@ function renderTable(results, pendingPings, frame, cursor = null, sortColumn = '
933
975
  const avgH = sortColumn === 'avg' ? dir + ' Avg Ping' : 'Avg Ping'
934
976
  const healthH = sortColumn === 'condition' ? dir + ' Health' : 'Health'
935
977
  const verdictH = sortColumn === 'verdict' ? dir + ' Verdict' : 'Verdict'
978
+ const stabH = sortColumn === 'stability' ? dir + ' Stability' : 'Stability'
936
979
  const uptimeH = sortColumn === 'uptime' ? dir + ' Up%' : 'Up%'
937
980
 
938
981
  // 📖 Helper to colorize first letter for keyboard shortcuts
@@ -948,10 +991,14 @@ function renderTable(results, pendingPings, frame, cursor = null, sortColumn = '
948
991
  // 📖 Now colorize after padding is calculated on plain text
949
992
  const rankH_c = colorFirst(rankH, W_RANK)
950
993
  const tierH_c = colorFirst('Tier', W_TIER)
951
- const originLabel = 'Origin(N)'
994
+ const originLabel = 'Origin'
952
995
  const originH_c = sortColumn === 'origin'
953
996
  ? chalk.bold.cyan(originLabel.padEnd(W_SOURCE))
954
- : (originFilterMode > 0 ? chalk.bold.rgb(100, 200, 255)(originLabel.padEnd(W_SOURCE)) : colorFirst(originLabel, W_SOURCE))
997
+ : (originFilterMode > 0 ? chalk.bold.rgb(100, 200, 255)(originLabel.padEnd(W_SOURCE)) : (() => {
998
+ // 📖 Custom colorization for Origin: highlight 'N' (the filter key) at the end
999
+ const padding = ' '.repeat(Math.max(0, W_SOURCE - originLabel.length))
1000
+ return chalk.dim('Origi') + chalk.yellow('N') + chalk.dim(padding)
1001
+ })())
955
1002
  const modelH_c = colorFirst(modelH, W_MODEL)
956
1003
  const sweH_c = sortColumn === 'swe' ? chalk.bold.cyan(sweH.padEnd(W_SWE)) : colorFirst(sweH, W_SWE)
957
1004
  const ctxH_c = sortColumn === 'ctx' ? chalk.bold.cyan(ctxH.padEnd(W_CTX)) : colorFirst(ctxH, W_CTX)
@@ -959,10 +1006,16 @@ function renderTable(results, pendingPings, frame, cursor = null, sortColumn = '
959
1006
  const avgH_c = sortColumn === 'avg' ? chalk.bold.cyan(avgH.padEnd(W_AVG)) : colorFirst('Avg Ping', W_AVG)
960
1007
  const healthH_c = sortColumn === 'condition' ? chalk.bold.cyan(healthH.padEnd(W_STATUS)) : colorFirst('Health', W_STATUS)
961
1008
  const verdictH_c = sortColumn === 'verdict' ? chalk.bold.cyan(verdictH.padEnd(W_VERDICT)) : colorFirst(verdictH, W_VERDICT)
962
- const uptimeH_c = sortColumn === 'uptime' ? chalk.bold.cyan(uptimeH.padStart(W_UPTIME)) : colorFirst(uptimeH, W_UPTIME, chalk.green)
1009
+ // 📖 Custom colorization for Stability: highlight 'B' (the sort key) since 'S' is taken by SWE
1010
+ const stabH_c = sortColumn === 'stability' ? chalk.bold.cyan(stabH.padEnd(W_STAB)) : (() => {
1011
+ const plain = 'Stability'
1012
+ const padding = ' '.repeat(Math.max(0, W_STAB - plain.length))
1013
+ return chalk.dim('Sta') + chalk.white.bold('B') + chalk.dim('ility' + padding)
1014
+ })()
1015
+ const uptimeH_c = sortColumn === 'uptime' ? chalk.bold.cyan(uptimeH.padEnd(W_UPTIME)) : colorFirst(uptimeH, W_UPTIME, chalk.green)
963
1016
 
964
- // 📖 Header with proper spacing (column order: Rank, Tier, SWE%, CTX, Model, Origin, Latest Ping, Avg Ping, Health, Verdict, Up%)
965
- lines.push(' ' + rankH_c + ' ' + tierH_c + ' ' + sweH_c + ' ' + ctxH_c + ' ' + modelH_c + ' ' + originH_c + ' ' + pingH_c + ' ' + avgH_c + ' ' + healthH_c + ' ' + verdictH_c + ' ' + uptimeH_c)
1017
+ // 📖 Header with proper spacing (column order: Rank, Tier, SWE%, CTX, Model, Origin, Latest Ping, Avg Ping, Health, Verdict, Stability, Up%)
1018
+ lines.push(' ' + rankH_c + ' ' + tierH_c + ' ' + sweH_c + ' ' + ctxH_c + ' ' + modelH_c + ' ' + originH_c + ' ' + pingH_c + ' ' + avgH_c + ' ' + healthH_c + ' ' + verdictH_c + ' ' + stabH_c + ' ' + uptimeH_c)
966
1019
 
967
1020
  // 📖 Separator line
968
1021
  lines.push(
@@ -977,6 +1030,7 @@ function renderTable(results, pendingPings, frame, cursor = null, sortColumn = '
977
1030
  chalk.dim('─'.repeat(W_AVG)) + ' ' +
978
1031
  chalk.dim('─'.repeat(W_STATUS)) + ' ' +
979
1032
  chalk.dim('─'.repeat(W_VERDICT)) + ' ' +
1033
+ chalk.dim('─'.repeat(W_STAB)) + ' ' +
980
1034
  chalk.dim('─'.repeat(W_UPTIME))
981
1035
  )
982
1036
 
@@ -999,16 +1053,32 @@ function renderTable(results, pendingPings, frame, cursor = null, sortColumn = '
999
1053
  // 📖 Show provider name from sources map (NIM / Groq / Cerebras)
1000
1054
  const providerName = sources[r.providerKey]?.name ?? r.providerKey ?? 'NIM'
1001
1055
  const source = chalk.green(providerName.padEnd(W_SOURCE))
1002
- // 📖 Favorites get a leading star in Model column.
1003
- const favoritePrefix = r.isFavorite ? '⭐ ' : ''
1004
- const nameWidth = Math.max(0, W_MODEL - favoritePrefix.length)
1056
+ // 📖 Favorites: always reserve 2 display columns at the start of Model column.
1057
+ // 📖 '⭐' (2 cols) for favorites, '  ' (2 spaces) for non-favorites — keeps alignment stable.
1058
+ const favoritePrefix = r.isFavorite ? '⭐' : '  '
1059
+ const prefixDisplayWidth = 2
1060
+ const nameWidth = Math.max(0, W_MODEL - prefixDisplayWidth)
1005
1061
  const name = favoritePrefix + r.label.slice(0, nameWidth).padEnd(nameWidth)
1006
1062
  const sweScore = r.sweScore ?? '—'
1007
- const sweCell = sweScore !== '—' && parseFloat(sweScore) >= 50
1008
- ? chalk.greenBright(sweScore.padEnd(W_SWE))
1009
- : sweScore !== '—' && parseFloat(sweScore) >= 30
1010
- ? chalk.yellow(sweScore.padEnd(W_SWE))
1011
- : chalk.dim(sweScore.padEnd(W_SWE))
1063
+ // 📖 SWE% colorized on the same gradient as Tier:
1064
+ // ≥70% bright neon green (S+), ≥60% green (S), ≥50% yellow-green (A+),
1065
+ // ≥40% yellow (A), ≥35% amber (A-), ≥30% orange-red (B+),
1066
+ // ≥20% red (B), <20% dark red (C), '—' dim
1067
+ let sweCell
1068
+ if (sweScore === '—') {
1069
+ sweCell = chalk.dim(sweScore.padEnd(W_SWE))
1070
+ } else {
1071
+ const sweVal = parseFloat(sweScore)
1072
+ const swePadded = sweScore.padEnd(W_SWE)
1073
+ if (sweVal >= 70) sweCell = chalk.bold.rgb(0, 255, 80)(swePadded)
1074
+ else if (sweVal >= 60) sweCell = chalk.bold.rgb(80, 220, 0)(swePadded)
1075
+ else if (sweVal >= 50) sweCell = chalk.bold.rgb(170, 210, 0)(swePadded)
1076
+ else if (sweVal >= 40) sweCell = chalk.rgb(240, 190, 0)(swePadded)
1077
+ else if (sweVal >= 35) sweCell = chalk.rgb(255, 130, 0)(swePadded)
1078
+ else if (sweVal >= 30) sweCell = chalk.rgb(255, 70, 0)(swePadded)
1079
+ else if (sweVal >= 20) sweCell = chalk.rgb(210, 20, 0)(swePadded)
1080
+ else sweCell = chalk.rgb(140, 0, 0)(swePadded)
1081
+ }
1012
1082
 
1013
1083
  // 📖 Context window column - colorized by size (larger = better)
1014
1084
  const ctxRaw = r.ctx ?? '—'
@@ -1023,7 +1093,7 @@ function renderTable(results, pendingPings, frame, cursor = null, sortColumn = '
1023
1093
  const latestPing = r.pings.length > 0 ? r.pings[r.pings.length - 1] : null
1024
1094
  let pingCell
1025
1095
  if (!latestPing) {
1026
- pingCell = chalk.dim(''.padEnd(W_PING))
1096
+ pingCell = chalk.dim('———'.padEnd(W_PING))
1027
1097
  } else if (latestPing.code === '200') {
1028
1098
  // 📖 Success - show response time
1029
1099
  const str = String(latestPing.ms).padEnd(W_PING)
@@ -1032,8 +1102,8 @@ function renderTable(results, pendingPings, frame, cursor = null, sortColumn = '
1032
1102
  // 📖 401 = no API key but server IS reachable — still show latency in dim
1033
1103
  pingCell = chalk.dim(String(latestPing.ms).padEnd(W_PING))
1034
1104
  } else {
1035
- // 📖 Error or timeout - show "" (error code is already in Status column)
1036
- pingCell = chalk.dim(''.padEnd(W_PING))
1105
+ // 📖 Error or timeout - show "———" (error code is already in Status column)
1106
+ pingCell = chalk.dim('———'.padEnd(W_PING))
1037
1107
  }
1038
1108
 
1039
1109
  // 📖 Avg ping (just number, no "ms")
@@ -1043,7 +1113,7 @@ function renderTable(results, pendingPings, frame, cursor = null, sortColumn = '
1043
1113
  const str = String(avg).padEnd(W_AVG)
1044
1114
  avgCell = avg < 500 ? chalk.greenBright(str) : avg < 1500 ? chalk.yellow(str) : chalk.red(str)
1045
1115
  } else {
1046
- avgCell = chalk.dim(''.padEnd(W_AVG))
1116
+ avgCell = chalk.dim('———'.padEnd(W_AVG))
1047
1117
  }
1048
1118
 
1049
1119
  // 📖 Status column - build plain text with emoji, pad, then colorize
@@ -1080,64 +1150,99 @@ function renderTable(results, pendingPings, frame, cursor = null, sortColumn = '
1080
1150
  statusText = '?'
1081
1151
  statusColor = (s) => chalk.dim(s)
1082
1152
  }
1083
- const status = statusColor(statusText.padEnd(W_STATUS))
1153
+ const status = statusColor(padEndDisplay(statusText, W_STATUS))
1084
1154
 
1085
- // 📖 Verdict column - build plain text with emoji, pad, then colorize
1086
- const wasUpBefore = r.pings.length > 0 && r.pings.some(p => p.code === '200')
1155
+ // 📖 Verdict column - use getVerdict() for stability-aware verdicts, then render with emoji
1156
+ const verdict = getVerdict(r)
1087
1157
  let verdictText, verdictColor
1088
- if (r.httpCode === '429') {
1089
- verdictText = '🔥 Overloaded'
1090
- verdictColor = (s) => chalk.yellow.bold(s)
1091
- } else if ((r.status === 'timeout' || r.status === 'down') && wasUpBefore) {
1092
- verdictText = '⚠️ Unstable'
1093
- verdictColor = (s) => chalk.magenta(s)
1094
- } else if (r.status === 'timeout' || r.status === 'down') {
1095
- verdictText = '👻 Not Active'
1096
- verdictColor = (s) => chalk.dim(s)
1097
- } else if (avg === Infinity) {
1098
- verdictText = '⏳ Pending'
1099
- verdictColor = (s) => chalk.dim(s)
1100
- } else if (avg < 400) {
1101
- verdictText = '🚀 Perfect'
1102
- verdictColor = (s) => chalk.greenBright(s)
1103
- } else if (avg < 1000) {
1104
- verdictText = '✅ Normal'
1105
- verdictColor = (s) => chalk.cyan(s)
1106
- } else if (avg < 3000) {
1107
- verdictText = '🐢 Slow'
1108
- verdictColor = (s) => chalk.yellow(s)
1109
- } else if (avg < 5000) {
1110
- verdictText = '🐌 Very Slow'
1111
- verdictColor = (s) => chalk.red(s)
1158
+ // 📖 Verdict colors follow the same green→red gradient as TIER_COLOR / SWE%
1159
+ switch (verdict) {
1160
+ case 'Perfect':
1161
+ verdictText = 'Perfect 🚀'
1162
+ verdictColor = (s) => chalk.bold.rgb(0, 255, 180)(s) // bright cyan-green — stands out from Normal
1163
+ break
1164
+ case 'Normal':
1165
+ verdictText = 'Normal ✅'
1166
+ verdictColor = (s) => chalk.bold.rgb(140, 200, 0)(s) // lime-yellow — clearly warmer than Perfect
1167
+ break
1168
+ case 'Spiky':
1169
+ verdictText = 'Spiky 📈'
1170
+ verdictColor = (s) => chalk.bold.rgb(170, 210, 0)(s) // A+ yellow-green
1171
+ break
1172
+ case 'Slow':
1173
+ verdictText = 'Slow 🐢'
1174
+ verdictColor = (s) => chalk.bold.rgb(255, 130, 0)(s) // A- amber
1175
+ break
1176
+ case 'Very Slow':
1177
+ verdictText = 'Very Slow 🐌'
1178
+ verdictColor = (s) => chalk.bold.rgb(255, 70, 0)(s) // B+ orange-red
1179
+ break
1180
+ case 'Overloaded':
1181
+ verdictText = 'Overloaded 🔥'
1182
+ verdictColor = (s) => chalk.bold.rgb(210, 20, 0)(s) // B red
1183
+ break
1184
+ case 'Unstable':
1185
+ verdictText = 'Unstable ⚠️'
1186
+ verdictColor = (s) => chalk.bold.rgb(175, 10, 0)(s) // between B and C
1187
+ break
1188
+ case 'Not Active':
1189
+ verdictText = 'Not Active 👻'
1190
+ verdictColor = (s) => chalk.dim(s)
1191
+ break
1192
+ case 'Pending':
1193
+ verdictText = 'Pending ⏳'
1194
+ verdictColor = (s) => chalk.dim(s)
1195
+ break
1196
+ default:
1197
+ verdictText = 'Unusable 💀'
1198
+ verdictColor = (s) => chalk.bold.rgb(140, 0, 0)(s) // C dark red
1199
+ break
1200
+ }
1201
+ // 📖 Use padEndDisplay to account for emoji display width (2 cols each) so all rows align
1202
+ const speedCell = verdictColor(padEndDisplay(verdictText, W_VERDICT))
1203
+
1204
+ // 📖 Stability column - composite score (0–100) from p95 + jitter + spikes + uptime
1205
+ // 📖 Left-aligned to sit flush under the column header
1206
+ const stabScore = getStabilityScore(r)
1207
+ let stabCell
1208
+ if (stabScore < 0) {
1209
+ stabCell = chalk.dim('———'.padEnd(W_STAB))
1210
+ } else if (stabScore >= 80) {
1211
+ stabCell = chalk.greenBright(String(stabScore).padEnd(W_STAB))
1212
+ } else if (stabScore >= 60) {
1213
+ stabCell = chalk.cyan(String(stabScore).padEnd(W_STAB))
1214
+ } else if (stabScore >= 40) {
1215
+ stabCell = chalk.yellow(String(stabScore).padEnd(W_STAB))
1112
1216
  } else {
1113
- verdictText = '💀 Unusable'
1114
- verdictColor = (s) => chalk.red.bold(s)
1217
+ stabCell = chalk.red(String(stabScore).padEnd(W_STAB))
1115
1218
  }
1116
- const speedCell = verdictColor(verdictText.padEnd(W_VERDICT))
1117
1219
 
1118
1220
  // 📖 Uptime column - percentage of successful pings
1221
+ // 📖 Left-aligned to sit flush under the column header
1119
1222
  const uptimePercent = getUptime(r)
1120
1223
  const uptimeStr = uptimePercent + '%'
1121
1224
  let uptimeCell
1122
1225
  if (uptimePercent >= 90) {
1123
- uptimeCell = chalk.greenBright(uptimeStr.padStart(W_UPTIME))
1226
+ uptimeCell = chalk.greenBright(uptimeStr.padEnd(W_UPTIME))
1124
1227
  } else if (uptimePercent >= 70) {
1125
- uptimeCell = chalk.yellow(uptimeStr.padStart(W_UPTIME))
1228
+ uptimeCell = chalk.yellow(uptimeStr.padEnd(W_UPTIME))
1126
1229
  } else if (uptimePercent >= 50) {
1127
- uptimeCell = chalk.rgb(255, 165, 0)(uptimeStr.padStart(W_UPTIME)) // orange
1230
+ uptimeCell = chalk.rgb(255, 165, 0)(uptimeStr.padEnd(W_UPTIME)) // orange
1128
1231
  } else {
1129
- uptimeCell = chalk.red(uptimeStr.padStart(W_UPTIME))
1232
+ uptimeCell = chalk.red(uptimeStr.padEnd(W_UPTIME))
1130
1233
  }
1131
1234
 
1132
- // 📖 Build row with double space between columns (order: Rank, Tier, SWE%, CTX, Model, Origin, Latest Ping, Avg Ping, Health, Verdict, Up%)
1133
- const row = ' ' + num + ' ' + tier + ' ' + sweCell + ' ' + ctxCell + ' ' + name + ' ' + source + ' ' + pingCell + ' ' + avgCell + ' ' + status + ' ' + speedCell + ' ' + uptimeCell
1235
+ // 📖 When cursor is on this row, render Model and Origin in bright white for readability
1236
+ const nameCell = isCursor ? chalk.white.bold(favoritePrefix + r.label.slice(0, nameWidth).padEnd(nameWidth)) : name
1237
+ const sourceCell = isCursor ? chalk.white.bold(providerName.padEnd(W_SOURCE)) : source
1134
1238
 
1135
- if (isCursor && r.isFavorite) {
1136
- lines.push(chalk.bgRgb(120, 60, 0)(row))
1137
- } else if (isCursor) {
1138
- lines.push(chalk.bgRgb(139, 0, 139)(row))
1239
+ // 📖 Build row with double space between columns (order: Rank, Tier, SWE%, CTX, Model, Origin, Latest Ping, Avg Ping, Health, Verdict, Stability, Up%)
1240
+ const row = ' ' + num + ' ' + tier + ' ' + sweCell + ' ' + ctxCell + ' ' + nameCell + ' ' + sourceCell + ' ' + pingCell + ' ' + avgCell + ' ' + status + ' ' + speedCell + ' ' + stabCell + ' ' + uptimeCell
1241
+
1242
+ if (isCursor) {
1243
+ lines.push(chalk.bgRgb(50, 0, 60)(row))
1139
1244
  } else if (r.isFavorite) {
1140
- lines.push(chalk.bgRgb(90, 45, 0)(row))
1245
+ lines.push(chalk.bgRgb(35, 20, 0)(row))
1141
1246
  } else {
1142
1247
  lines.push(row)
1143
1248
  }
@@ -1156,19 +1261,24 @@ function renderTable(results, pendingPings, frame, cursor = null, sortColumn = '
1156
1261
  : mode === 'opencode-desktop'
1157
1262
  ? chalk.rgb(0, 200, 255)('Enter→OpenDesktop')
1158
1263
  : chalk.rgb(0, 200, 255)('Enter→OpenCode')
1159
- lines.push(chalk.dim(` ↑↓ Navigate • `) + actionHint + chalk.dim(` • F Favorite • R/Y/O/M/L/A/S/C/H/V/U Sort • T Tier • N Origin • W↓/X↑ (${intervalSec}s) • Z Mode • `) + chalk.yellow('P') + chalk.dim(` Settings • `) + chalk.bgGreenBright.black.bold(' K Help ') + chalk.dim(` • Ctrl+C Exit`))
+ lines.push(chalk.dim(` ↑↓ Navigate • `) + actionHint + chalk.dim(` • F Favorite • R/Y/O/M/L/A/S/C/H/V/B/U Sort • T Tier • N Origin • W↓/X↑ (${intervalSec}s) • `) + chalk.rgb(255, 100, 50).bold('Z Mode') + chalk.dim(` • `) + chalk.yellow('P') + chalk.dim(` Settings • `) + chalk.rgb(0, 255, 80).bold('K Help'))
  lines.push('')
  lines.push(
    chalk.rgb(255, 150, 200)(' Made with 💖 & ☕ by \x1b]8;;https://github.com/vava-nessa\x1b\\vava-nessa\x1b]8;;\x1b\\') +
    chalk.dim(' • ') +
    '⭐ ' +
-   '\x1b]8;;https://github.com/vava-nessa/free-coding-models\x1b\\Star on GitHub\x1b]8;;\x1b\\' +
+   chalk.yellow('\x1b]8;;https://github.com/vava-nessa/free-coding-models\x1b\\Star on GitHub\x1b]8;;\x1b\\') +
    chalk.dim(' • ') +
    '🤝 ' +
-   '\x1b]8;;https://github.com/vava-nessa/free-coding-models/graphs/contributors\x1b\\Contributors\x1b]8;;\x1b\\'
+   chalk.rgb(255, 165, 0)('\x1b]8;;https://github.com/vava-nessa/free-coding-models/graphs/contributors\x1b\\Contributors\x1b]8;;\x1b\\') +
+   chalk.dim(' • ') +
+   '💬 ' +
+   chalk.rgb(200, 150, 255)('\x1b]8;;https://discord.gg/5MbTnDC3Md\x1b\\Discord\x1b]8;;\x1b\\') +
+   chalk.dim(' → ') +
+   chalk.rgb(200, 150, 255)('https://discord.gg/5MbTnDC3Md') +
+   chalk.dim(' • ') +
+   chalk.dim('Ctrl+C Exit')
  )
- // 📖 Discord invite + BETA warning — always visible at the bottom of the TUI
- lines.push(' 💬 ' + chalk.cyanBright('\x1b]8;;https://discord.gg/5MbTnDC3Md\x1b\\Join our Discord\x1b]8;;\x1b\\') + chalk.dim(' → ') + chalk.cyanBright('https://discord.gg/5MbTnDC3Md') + chalk.dim(' • ') + chalk.yellow('⚠ BETA TUI') + chalk.dim(' — might crash or have problems'))
  lines.push('')
  // 📖 Append \x1b[K (erase to EOL) to each line so leftover chars from previous
  // 📖 frames are cleared. Then pad with blank cleared lines to fill the terminal,
@@ -2684,17 +2794,51 @@ async function main() {
  const lines = []
  lines.push('')
  lines.push(` ${chalk.bold('❓ Keyboard Shortcuts')} ${chalk.dim('— ↑↓ / PgUp / PgDn / Home / End scroll • K or Esc close')}`)
+ lines.push('')
+ lines.push(` ${chalk.bold('Columns')}`)
+ lines.push('')
+ lines.push(` ${chalk.cyan('Rank')} SWE-bench rank (1 = best coding score) ${chalk.dim('Sort:')} ${chalk.yellow('R')}`)
+ lines.push(` ${chalk.dim('Quick glance at which model is objectively the best coder right now.')}`)
+ lines.push('')
+ lines.push(` ${chalk.cyan('Tier')} S+ / S / A+ / A / A- / B+ / B / C based on SWE-bench score ${chalk.dim('Sort:')} ${chalk.yellow('Y')}`)
+ lines.push(` ${chalk.dim('Skip the noise — S/S+ models solve real GitHub issues, C models are for light tasks.')}`)
+ lines.push('')
+ lines.push(` ${chalk.cyan('SWE%')} SWE-bench score — coding ability benchmark (color-coded) ${chalk.dim('Sort:')} ${chalk.yellow('S')}`)
+ lines.push(` ${chalk.dim('The raw number behind the tier. Higher = better at writing, fixing, and refactoring code.')}`)
+ lines.push('')
+ lines.push(` ${chalk.cyan('CTX')} Context window size (128k, 200k, 256k, 1m, etc.) ${chalk.dim('Sort:')} ${chalk.yellow('C')}`)
+ lines.push(` ${chalk.dim('Bigger context = the model can read more of your codebase at once without forgetting.')}`)
+ lines.push('')
+ lines.push(` ${chalk.cyan('Model')} Model name (⭐ = favorited, pinned at top) ${chalk.dim('Sort:')} ${chalk.yellow('M')} ${chalk.dim('Favorite:')} ${chalk.yellow('F')}`)
+ lines.push(` ${chalk.dim('Star the ones you like — they stay pinned at the top across restarts.')}`)
+ lines.push('')
+ lines.push(` ${chalk.cyan('Origin')} Provider source (NIM, Groq, Cerebras, etc.) ${chalk.dim('Sort:')} ${chalk.yellow('O')} ${chalk.dim('Filter:')} ${chalk.yellow('N')}`)
+ lines.push(` ${chalk.dim('Same model on different providers can have very different speed and uptime.')}`)
+ lines.push('')
+ lines.push(` ${chalk.cyan('Latest')} Most recent ping response time (ms) ${chalk.dim('Sort:')} ${chalk.yellow('L')}`)
+ lines.push(` ${chalk.dim('Shows how fast the server is responding right now — useful to catch live slowdowns.')}`)
+ lines.push('')
+ lines.push(` ${chalk.cyan('Avg Ping')} Average response time across all successful pings (ms) ${chalk.dim('Sort:')} ${chalk.yellow('A')}`)
+ lines.push(` ${chalk.dim('The long-term truth. Ignore lucky one-off pings, this tells you real everyday speed.')}`)
+ lines.push('')
+ lines.push(` ${chalk.cyan('Health')} Live status: ✅ UP / 🔥 429 / ⏳ TIMEOUT / ❌ ERR / 🔑 NO KEY ${chalk.dim('Sort:')} ${chalk.yellow('H')}`)
+ lines.push(` ${chalk.dim('Tells you instantly if a model is reachable or down — no guesswork needed.')}`)
+ lines.push('')
+ lines.push(` ${chalk.cyan('Verdict')} Overall assessment: Perfect / Normal / Spiky / Slow / Overloaded ${chalk.dim('Sort:')} ${chalk.yellow('V')}`)
+ lines.push(` ${chalk.dim('One-word summary so you don\'t have to cross-check speed, health, and stability yourself.')}`)
+ lines.push('')
+ lines.push(` ${chalk.cyan('Stability')} Composite 0–100 score: p95 + jitter + spike rate + uptime ${chalk.dim('Sort:')} ${chalk.yellow('B')}`)
+ lines.push(` ${chalk.dim('A fast model that randomly freezes is worse than a steady one. This catches that.')}`)
+ lines.push('')
+ lines.push(` ${chalk.cyan('Up%')} Uptime — ratio of successful pings to total pings ${chalk.dim('Sort:')} ${chalk.yellow('U')}`)
+ lines.push(` ${chalk.dim('If a model only works half the time, you\'ll waste time retrying. Higher = more reliable.')}`)
+
  lines.push('')
  lines.push(` ${chalk.bold('Main TUI')}`)
  lines.push(` ${chalk.bold('Navigation')}`)
  lines.push(` ${chalk.yellow('↑↓')} Navigate rows`)
  lines.push(` ${chalk.yellow('Enter')} Select model and launch`)
  lines.push('')
- lines.push(` ${chalk.bold('Sorting')}`)
- lines.push(` ${chalk.yellow('R')} Rank ${chalk.yellow('Y')} Tier ${chalk.yellow('O')} Origin ${chalk.yellow('M')} Model`)
- lines.push(` ${chalk.yellow('L')} Latest ping ${chalk.yellow('A')} Avg ping ${chalk.yellow('S')} SWE-bench score`)
- lines.push(` ${chalk.yellow('C')} Context window ${chalk.yellow('H')} Health ${chalk.yellow('V')} Verdict ${chalk.yellow('U')} Uptime`)
- lines.push('')
  lines.push(` ${chalk.bold('Filters')}`)
  lines.push(` ${chalk.yellow('T')} Cycle tier filter ${chalk.dim('(All → S+ → S → A+ → A → A- → B+ → B → C → All)')}`)
  lines.push(` ${chalk.yellow('N')} Cycle origin filter ${chalk.dim('(All → NIM → Groq → Cerebras → ... each provider → All)')}`)
@@ -2994,12 +3138,12 @@ async function main() {
  return
  }

- // 📖 Sorting keys: R=rank, Y=tier, O=origin, M=model, L=latest ping, A=avg ping, S=SWE-bench, C=context, H=health, V=verdict, U=uptime
+ // 📖 Sorting keys: R=rank, Y=tier, O=origin, M=model, L=latest ping, A=avg ping, S=SWE-bench, C=context, H=health, V=verdict, B=stability, U=uptime
  // 📖 T is reserved for tier filter cycling — tier sort moved to Y
  // 📖 N is now reserved for origin filter cycling
  const sortKeys = {
    'r': 'rank', 'y': 'tier', 'o': 'origin', 'm': 'model',
-   'l': 'ping', 'a': 'avg', 's': 'swe', 'c': 'ctx', 'h': 'condition', 'v': 'verdict', 'u': 'uptime'
+   'l': 'ping', 'a': 'avg', 's': 'swe', 'c': 'ctx', 'h': 'condition', 'v': 'verdict', 'b': 'stability', 'u': 'uptime'
  }

  if (sortKeys[key.name] && !key.ctrl) {
package/lib/utils.js CHANGED
@@ -27,14 +27,18 @@
  *
  * @functions
  * → getAvg(result) — Calculate average latency from successful pings only
- * → getVerdict(result) — Determine model health verdict based on avg latency and status
+ * → getVerdict(result) — Determine model health verdict based on avg latency and stability
  * → getUptime(result) — Calculate uptime percentage (successful / total pings)
+ * → getP95(result) — Calculate 95th percentile latency from successful pings
+ * → getJitter(result) — Calculate latency standard deviation (jitter)
+ * → getStabilityScore(result) — Composite 0–100 stability score (p95 + jitter + spikes + uptime)
  * → sortResults(results, sortColumn, sortDirection) — Sort model results by any column
  * → filterByTier(results, tierLetter) — Filter results by tier letter (S/A/B/C)
- * → findBestModel(results) — Pick the best model by status → avg → uptime priority
+ * → findBestModel(results) — Pick the best model by status → avg → stability → uptime priority
  * → parseArgs(argv) — Parse CLI arguments into structured flags and values
  *
- * @exports getAvg, getVerdict, getUptime, sortResults, filterByTier, findBestModel, parseArgs
+ * @exports getAvg, getVerdict, getUptime, getP95, getJitter, getStabilityScore
+ * @exports sortResults, filterByTier, findBestModel, parseArgs
  * @exports TIER_ORDER, VERDICT_ORDER, TIER_LETTER_MAP
  *
  * @see bin/free-coding-models.js — main CLI that imports these utils
@@ -54,7 +58,7 @@ export const TIER_ORDER = ['S+', 'S', 'A+', 'A', 'A-', 'B+', 'B', 'C']
  // 📖 Used by sortResults when sorting by the "verdict" column.
  // 📖 "Perfect" means < 400ms avg, "Pending" means no data yet.
  // 📖 The order matters — it determines sort rank in the TUI table.
- export const VERDICT_ORDER = ['Perfect', 'Normal', 'Slow', 'Very Slow', 'Overloaded', 'Unstable', 'Not Active', 'Pending']
+ export const VERDICT_ORDER = ['Perfect', 'Normal', 'Slow', 'Spiky', 'Very Slow', 'Overloaded', 'Unstable', 'Not Active', 'Pending']

  // 📖 Maps a CLI tier letter (--tier S/A/B/C) to the full tier strings it includes.
  // 📖 Example: --tier A matches A+, A, and A- models (all "A-family" tiers).
@@ -91,11 +95,17 @@ export const getAvg = (r) => {
  // 2. Timeout/down BUT was previously up → "Unstable" (it worked before, now it doesn't)
  // 3. Timeout/down and never worked → "Not Active" (model might be offline)
  // 4. No successful pings yet → "Pending" (still waiting for first response)
- // 5. Avg < 400ms "Perfect"
- // 6. Avg < 1000ms → "Normal"
- // 7. Avg < 3000ms → "Slow"
- // 8. Avg < 5000ms → "Very Slow"
- // 9. Avg >= 5000ms → "Unstable"
+ // 5. Stability-aware speed tiers (avg + p95/jitter penalty):
+ //    - Avg < 400ms + stable → "Perfect"
+ //    - Avg < 400ms but spiky p95 → "Spiky" (fast on average, but tail latency hurts)
+ //    - Avg < 1000ms → "Normal"
+ //    - Avg < 3000ms → "Slow"
+ //    - Avg < 5000ms → "Very Slow"
+ //    - Avg >= 5000ms → "Unstable"
+ //
+ // 📖 The "Spiky" verdict catches models that look fast on paper (low avg) but randomly
+ //    stall your IDE/agent with tail-latency spikes. A model with avg 250ms but p95 6000ms
+ //    gets downgraded from "Perfect" to "Spiky" — because consistency matters more than speed.
  //
  // 📖 The "wasUpBefore" check is key — it distinguishes between a model that's
  //    temporarily flaky vs one that was never reachable in the first place.
@@ -107,8 +117,20 @@ export const getVerdict = (r) => {
  if ((r.status === 'timeout' || r.status === 'down') && wasUpBefore) return 'Unstable'
  if (r.status === 'timeout' || r.status === 'down') return 'Not Active'
  if (avg === Infinity) return 'Pending'
- if (avg < 400) return 'Perfect'
- if (avg < 1000) return 'Normal'
+
+ // 📖 Stability-aware verdict: penalize models with good avg but terrible tail latency
+ const successfulPings = (r.pings || []).filter(p => p.code === '200')
+ const p95 = getP95(r)
+
+ if (avg < 400) {
+   // 📖 Only flag as "Spiky" when we have enough data (≥3 pings) to judge stability
+   if (successfulPings.length >= 3 && p95 > 3000) return 'Spiky'
+   return 'Perfect'
+ }
+ if (avg < 1000) {
+   if (successfulPings.length >= 3 && p95 > 5000) return 'Spiky'
+   return 'Normal'
+ }
  if (avg < 3000) return 'Slow'
  if (avg < 5000) return 'Very Slow'
  if (avg < 10000) return 'Unstable'
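The new branching can be exercised in isolation. The sketch below re-derives `getAvg`, `getP95`, and the speed-tier half of `getVerdict` from this hunk (the timeout/down/Pending branches are omitted; the `{ code, ms }` ping shape matches what the filter above selects on):

```javascript
// Standalone re-derivation of the stability-aware speed tiers (sketch, not the shipped module).
const ok = (r) => (r.pings || []).filter(p => p.code === '200')

const getAvg = (r) => {
  const s = ok(r)
  if (s.length === 0) return Infinity
  return s.reduce((sum, p) => sum + p.ms, 0) / s.length
}

const getP95 = (r) => {
  const s = ok(r)
  if (s.length === 0) return Infinity
  const sorted = s.map(p => p.ms).sort((a, b) => a - b)
  return sorted[Math.max(0, Math.ceil(sorted.length * 0.95) - 1)]
}

// Speed-tier portion of getVerdict; status handling is elided here.
const speedVerdict = (r) => {
  const avg = getAvg(r)
  const p95 = getP95(r)
  const n = ok(r).length
  if (avg < 400) return (n >= 3 && p95 > 3000) ? 'Spiky' : 'Perfect'
  if (avg < 1000) return (n >= 3 && p95 > 5000) ? 'Spiky' : 'Normal'
  if (avg < 3000) return 'Slow'
  if (avg < 5000) return 'Very Slow'
  return 'Unstable'
}

const ping = (ms) => ({ code: '200', ms })
// Nine quick pings plus one 3.4s stall: avg = 385ms (would look "Perfect") but p95 = 3400ms.
const spiky = { pings: [...Array(9).fill(50).map(ping), ping(3400)] }
// Ten consistent 300ms pings: same speed class, no tail spikes.
const steady = { pings: Array(10).fill(300).map(ping) }

console.log(speedVerdict(spiky))  // → 'Spiky'
console.log(speedVerdict(steady)) // → 'Perfect'
```

Note how the ≥3-ping guard keeps a single early outlier from flagging a model before there is enough data to judge it.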
@@ -125,21 +147,84 @@ export const getUptime = (r) => {
  return Math.round((successful / r.pings.length) * 100)
  }

+ // 📖 getP95: Calculate the 95th percentile latency from successful pings (HTTP 200).
+ // 📖 The p95 answers: "95% of requests are faster than this value."
+ // 📖 A low p95 means consistently fast responses — a high p95 signals tail-latency spikes.
+ // 📖 Returns Infinity when no successful pings exist.
+ //
+ // 📖 Algorithm: sort latencies ascending, pick the value at ceil(N * 0.95) - 1.
+ // 📖 Example: [100, 200, 300, 400, 5000] → p95 index = ceil(5 * 0.95) - 1 = 4 → 5000ms
+ export const getP95 = (r) => {
+   const successfulPings = (r.pings || []).filter(p => p.code === '200')
+   if (successfulPings.length === 0) return Infinity
+   const sorted = successfulPings.map(p => p.ms).sort((a, b) => a - b)
+   const idx = Math.ceil(sorted.length * 0.95) - 1
+   return sorted[Math.max(0, idx)]
+ }
+
+ // 📖 getJitter: Calculate latency standard deviation (σ) from successful pings.
+ // 📖 Low jitter = predictable response times. High jitter = erratic, spiky latency.
+ // 📖 Returns 0 when fewer than 2 successful pings (can't compute variance from 1 point).
+ // 📖 Uses population σ (divides by N, not N-1) since we have ALL the data, not a sample.
+ export const getJitter = (r) => {
+   const successfulPings = (r.pings || []).filter(p => p.code === '200')
+   if (successfulPings.length < 2) return 0
+   const mean = successfulPings.reduce((a, b) => a + b.ms, 0) / successfulPings.length
+   const variance = successfulPings.reduce((sum, p) => sum + (p.ms - mean) ** 2, 0) / successfulPings.length
+   return Math.round(Math.sqrt(variance))
+ }
+
+ // 📖 getStabilityScore: Composite 0–100 score that rewards consistency and reliability.
+ // 📖 Combines four signals into a single number:
+ //    - p95 latency (30%) — penalizes tail-latency spikes
+ //    - Jitter / σ (30%) — penalizes erratic response times
+ //    - Spike rate (20%) — fraction of pings above 3000ms threshold
+ //    - Uptime / reliability (20%) — fraction of successful pings
+ //
+ // 📖 Each component is normalized to 0–100, then weighted and combined.
+ // 📖 Returns -1 when no successful pings exist (not enough data yet).
+ //
+ // 📖 Example:
+ //    Model A: avg 250ms, p95 6000ms (tons of spikes) → score ~30
+ //    Model B: avg 400ms, p95 650ms (boringly consistent) → score ~85
+ //    In real usage, Model B FEELS faster because it doesn't randomly stall.
+ export const getStabilityScore = (r) => {
+   const successfulPings = (r.pings || []).filter(p => p.code === '200')
+   if (successfulPings.length === 0) return -1
+
+   const p95 = getP95(r)
+   const jitter = getJitter(r)
+   const uptime = getUptime(r)
+   const spikeCount = successfulPings.filter(p => p.ms > 3000).length
+   const spikeRate = spikeCount / successfulPings.length
+
+   // 📖 Normalize each component to 0–100 (higher = better)
+   const p95Score = Math.max(0, Math.min(100, 100 * (1 - p95 / 5000)))
+   const jitterScore = Math.max(0, Math.min(100, 100 * (1 - jitter / 2000)))
+   const spikeScore = Math.max(0, 100 * (1 - spikeRate))
+   const reliabilityScore = uptime
+
+   // 📖 Weighted composite: 30% p95, 30% jitter, 20% spikes, 20% reliability
+   const score = 0.3 * p95Score + 0.3 * jitterScore + 0.2 * spikeScore + 0.2 * reliabilityScore
+   return Math.round(score)
+ }
+
  // 📖 sortResults: Sort the results array by any column the user can click/press in the TUI.
  // 📖 Returns a NEW array — never mutates the original (important for React-style re-renders).
  //
  // 📖 Supported columns (matching the keyboard shortcuts in the TUI):
- //   - 'rank' (R key) — original index from sources.js
- //   - 'tier' (T key) — tier hierarchy (S+ first, C last)
- //   - 'origin' (O key) — provider name (all NIM for now, future-proofed)
- //   - 'model' (M key) — alphabetical by display label
- //   - 'ping' (L key) — last ping latency (only successful ones count)
- //   - 'avg' (A key) — average latency across all successful pings
- //   - 'swe' (S key) — SWE-bench score (higher is better)
- //   - 'ctx' (N key) — context window size (larger is better)
- //   - 'condition' (H key) — health status (alphabetical)
- //   - 'verdict' (V key) — verdict order (Perfect → Pending)
- //   - 'uptime' (U key) — uptime percentage
+ //   - 'rank' (R key) — original index from sources.js
+ //   - 'tier' (T key) — tier hierarchy (S+ first, C last)
+ //   - 'origin' (O key) — provider name (all NIM for now, future-proofed)
+ //   - 'model' (M key) — alphabetical by display label
+ //   - 'ping' (L key) — last ping latency (only successful ones count)
+ //   - 'avg' (A key) — average latency across all successful pings
+ //   - 'swe' (S key) — SWE-bench score (higher is better)
+ //   - 'ctx' (N key) — context window size (larger is better)
+ //   - 'condition' (H key) — health status (alphabetical)
+ //   - 'verdict' (V key) — verdict order (Perfect → Pending)
+ //   - 'uptime' (U key) — uptime percentage
+ //   - 'stability' (B key) — stability score (0–100, higher = more stable)
  //
  // 📖 sortDirection 'asc' = ascending (smallest first), 'desc' = descending (largest first)
  export const sortResults = (results, sortColumn, sortDirection) => {
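To see how the 30/30/20/20 weighting plays out on concrete numbers, the composite can be re-derived and run on two hypothetical ping histories. This is a sketch of the formula in the hunk above, with `getUptime` inlined as the success ratio:

```javascript
// Re-derivation of the composite stability score (sketch mirroring this diff's hunk).
const ok = (r) => (r.pings || []).filter(p => p.code === '200')

const getP95 = (r) => {
  const s = ok(r).map(p => p.ms).sort((a, b) => a - b)
  if (s.length === 0) return Infinity
  return s[Math.max(0, Math.ceil(s.length * 0.95) - 1)]
}

const getJitter = (r) => {
  const s = ok(r)
  if (s.length < 2) return 0
  const mean = s.reduce((a, p) => a + p.ms, 0) / s.length
  const variance = s.reduce((sum, p) => sum + (p.ms - mean) ** 2, 0) / s.length
  return Math.round(Math.sqrt(variance))
}

const getStabilityScore = (r) => {
  const s = ok(r)
  if (s.length === 0) return -1
  const p95 = getP95(r)
  const jitter = getJitter(r)
  const uptime = Math.round((s.length / r.pings.length) * 100) // getUptime inlined
  const spikeRate = s.filter(p => p.ms > 3000).length / s.length
  const p95Score = Math.max(0, Math.min(100, 100 * (1 - p95 / 5000)))
  const jitterScore = Math.max(0, Math.min(100, 100 * (1 - jitter / 2000)))
  const spikeScore = Math.max(0, 100 * (1 - spikeRate))
  return Math.round(0.3 * p95Score + 0.3 * jitterScore + 0.2 * spikeScore + 0.2 * uptime)
}

const ping = (ms) => ({ code: '200', ms })
// "Fast on paper": nine 100ms pings plus one 6s spike → p95 blown out, jitter σ = 1770ms.
const spiky = { pings: [...Array(9).fill(100).map(ping), ping(6000)] }
// "Boringly consistent": ten 500ms pings → slower average, zero jitter, no spikes.
const steady = { pings: Array(10).fill(500).map(ping) }

console.log(getStabilityScore(spiky))  // → 41
console.log(getStabilityScore(steady)) // → 97
```

The slower-but-steady history wins decisively, which is exactly the trade-off the README's "400ms avg beats a spiking 250ms avg" claim describes.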
@@ -219,6 +304,11 @@ export const sortResults = (results, sortColumn, sortDirection) => {
      case 'uptime':
        cmp = getUptime(a) - getUptime(b)
        break
+     case 'stability':
+       // 📖 Sort by stability score — higher = more stable = better
+       // 📖 Models with no data (-1) sort to the bottom
+       cmp = getStabilityScore(a) - getStabilityScore(b)
+       break
    }

    // 📖 Flip comparison for descending order
@@ -242,16 +332,19 @@ export function filterByTier(results, tierLetter) {
  // 📖 findBestModel: Pick the single best model from a results array.
  // 📖 Used by --fiable mode to output the most reliable model after 10s of analysis.
  //
- // 📖 Selection priority (tri-key sort):
+ // 📖 Selection priority (quad-key sort):
  //   1. Status: "up" models always beat non-up models
  //   2. Average latency: faster average wins (lower is better)
- //   3. Uptime %: higher uptime wins as tiebreaker
+ //   3. Stability score: higher stability wins (more consistent = better)
+ //   4. Uptime %: higher uptime wins as final tiebreaker
  //
  // 📖 Returns null if the array is empty.
  export function findBestModel(results) {
    const sorted = [...results].sort((a, b) => {
      const avgA = getAvg(a)
      const avgB = getAvg(b)
+     const stabilityA = getStabilityScore(a)
+     const stabilityB = getStabilityScore(b)
      const uptimeA = getUptime(a)
      const uptimeB = getUptime(b)

@@ -262,7 +355,10 @@ export function findBestModel(results) {
      // 📖 Priority 2: Lower average latency = faster = better
      if (avgA !== avgB) return avgA - avgB

-     // 📖 Priority 3: Higher uptime = more reliable = better (tiebreaker)
+     // 📖 Priority 3: Higher stability = more consistent = better
+     if (stabilityA !== stabilityB) return stabilityB - stabilityA
+
+     // 📖 Priority 4: Higher uptime = more reliable = better (final tiebreaker)
      return uptimeB - uptimeA
    })

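The effect of the new third sort key shows up when two reachable models tie on average latency. This sketch re-derives the comparator on precomputed stats: the priority-1 status check is an assumption written from the comment (the hunk only shows priorities 2 through 4), and the model objects are hypothetical.

```javascript
// Sketch of the quad-key comparator from findBestModel, run on precomputed stats.
// (avg/stability/uptime are hypothetical precomputed values, not live pings.)
const best = (models) => [...models].sort((a, b) => {
  // Priority 1: "up" models always beat non-up models (assumed form; not shown in the hunk)
  if (a.status === 'up' && b.status !== 'up') return -1
  if (b.status === 'up' && a.status !== 'up') return 1
  // Priority 2: lower average latency wins
  if (a.avg !== b.avg) return a.avg - b.avg
  // Priority 3 (new in 0.1.67): higher stability wins
  if (a.stability !== b.stability) return b.stability - a.stability
  // Priority 4: higher uptime as final tiebreaker
  return b.uptime - a.uptime
})[0]

const models = [
  { name: 'erratic', status: 'up', avg: 500, stability: 42, uptime: 100 },
  { name: 'steady',  status: 'up', avg: 500, stability: 95, uptime: 90 },
  { name: 'down',    status: 'timeout', avg: 120, stability: 99, uptime: 10 },
]

console.log(best(models).name) // → 'steady'
```

Under 0.1.66's tri-key sort the same tie would have gone to 'erratic' on raw uptime; the stability key now breaks it first.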
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "free-coding-models",
-   "version": "0.1.66",
+   "version": "0.1.67",
    "description": "Find the fastest coding LLM models in seconds — ping free models from multiple providers, pick the best one for OpenCode, Cursor, or any AI coding assistant.",
    "keywords": [
      "nvidia",