free-coding-models 0.1.66 → 0.1.67

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -36,7 +36,9 @@
36
36
  <a href="#-requirements">Requirements</a> •
37
37
  <a href="#-installation">Installation</a> •
38
38
  <a href="#-usage">Usage</a> •
39
- <a href="#-models">Models</a> •
39
+ <a href="#-tui-columns">Columns</a> •
40
+ <a href="#-stability-score">Stability</a> •
41
+ <a href="#-coding-models">Models</a> •
40
42
  <a href="#-opencode-integration">OpenCode</a> •
41
43
  <a href="#-openclaw-integration">OpenClaw</a> •
42
44
  <a href="#-how-it-works">How it works</a>
@@ -52,9 +54,10 @@
52
54
  - **🚀 Parallel pings** — All models tested simultaneously via native `fetch`
53
55
  - **📊 Real-time animation** — Watch latency appear live in alternate screen buffer
54
56
  - **🏆 Smart ranking** — Top 3 fastest models highlighted with medals 🥇🥈🥉
55
- - **⏱ Continuous monitoring** — Pings all models every 2 seconds forever, never stops
57
+ - **⏱ Continuous monitoring** — Pings all models every 3 seconds forever, never stops
56
58
  - **📈 Rolling averages** — Avg calculated from ALL successful pings since start
57
59
  - **📊 Uptime tracking** — Percentage of successful pings shown in real-time
60
+ - **📐 Stability score** — Composite 0–100 score measuring consistency (p95, jitter, spikes, uptime) — a model with 400ms avg and stable responses beats a 250ms avg model that randomly spikes to 6s
58
61
  - **🔄 Auto-retry** — Timeout models keep getting retried, nothing is ever "given up on"
59
62
  - **🎮 Interactive selection** — Navigate with arrow keys directly in the table, press Enter to act
60
63
  - **🔀 Startup mode menu** — Choose between OpenCode and OpenClaw before the TUI launches
@@ -177,7 +180,7 @@ Use `↑↓` arrows to select, `Enter` to confirm. Then the TUI launches with yo
177
180
 
178
181
  **How it works:**
179
182
  1. **Ping phase** — All enabled models are pinged in parallel (up to 134 across 17 providers)
180
- 2. **Continuous monitoring** — Models are re-pinged every 2 seconds forever
183
+ 2. **Continuous monitoring** — Models are re-pinged every 3 seconds forever
181
184
  3. **Real-time updates** — Watch "Latest", "Avg", and "Up%" columns update live
182
185
  4. **Select anytime** — Use ↑↓ arrows to navigate, press Enter on a model to act
183
186
  5. **Smart detection** — Automatically detects if NVIDIA NIM is configured in OpenCode or OpenClaw
@@ -414,6 +417,92 @@ Current tier filter is shown in the header badge (e.g., `[Tier S]`)
414
417
 
415
418
  ---
416
419
 
420
+ ## 📊 TUI Columns
421
+
422
+ The main table displays one row per model with the following columns:
423
+
424
+ | Column | Sort key | Description |
425
+ |--------|----------|-------------|
426
+ | **Rank** | `R` | Position based on current sort order (medals for top 3: 🥇🥈🥉) |
427
+ | **Tier** | `Y` | SWE-bench tier (S+, S, A+, A, A-, B+, B, C) |
428
+ | **SWE%** | `S` | SWE-bench Verified score — the industry-standard benchmark for real GitHub issue resolution |
429
+ | **CTX** | `C` | Context window size in thousands of tokens (e.g. `128k`) |
430
+ | **Model** | `M` | Model display name (favorites show ⭐ prefix) |
431
+ | **Origin** | `O` | Provider name (NIM, Groq, Cerebras, etc.) — press `N` to cycle the origin filter |
432
+ | **Latest Ping** | `L` | Most recent round-trip latency in milliseconds |
433
+ | **Avg Ping** | `A` | Rolling average of ALL successful pings since launch |
434
+ | **Health** | `H` | Current status: UP ✅, NO KEY 🔑, Timeout ⏳, Overloaded 🔥, Not Found 🚫 |
435
+ | **Verdict** | `V` | Health verdict based on avg latency + stability analysis (see below) |
436
+ | **Stability** | `B` | Composite 0–100 consistency score (see [Stability Score](#-stability-score)) |
437
+ | **Up%** | `U` | Uptime — percentage of successful pings out of total attempts |
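The sort keys above map onto internal column names. This diff only shows the `sortColumn === '…'` checks inside `renderTable`, so the dispatch can be sketched as a plain lookup table; note this is a hypothetical reconstruction, and the names `rank`, `tier`, `model`, and `latest` are assumptions not visible in the diff:

```javascript
// Hypothetical key-to-sortColumn map. The values 'swe', 'ctx', 'origin', 'avg',
// 'condition', 'verdict', 'stability', and 'uptime' match the sortColumn checks
// visible in renderTable; the rest are guesses for illustration.
const SORT_KEYS = {
  r: 'rank', y: 'tier', s: 'swe', c: 'ctx', m: 'model', o: 'origin',
  l: 'latest', a: 'avg', h: 'condition', v: 'verdict', b: 'stability', u: 'uptime',
}

function sortColumnForKey(key) {
  // N is the origin *filter* key, not a sort key, so it is absent here.
  return SORT_KEYS[key.toLowerCase()] ?? null
}
```
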
438
+
439
+ ### Verdict values
440
+
441
+ The Verdict column combines average latency with stability analysis:
442
+
443
+ | Verdict | Meaning |
444
+ |---------|---------|
445
+ | **Perfect** | Avg < 400ms with stable p95/jitter |
446
+ | **Normal** | Avg < 1000ms, consistent responses |
447
+ | **Slow** | Avg 1000–2000ms |
448
+ | **Spiky** | Good avg but erratic tail latency (p95 >> avg) |
449
+ | **Very Slow** | Avg 2000–5000ms |
450
+ | **Overloaded** | Server returned 429/503 (rate limited or capacity hit) |
451
+ | **Unstable** | Was previously up but now timing out, or avg > 5000ms |
452
+ | **Not Active** | No successful pings yet |
453
+ | **Pending** | First ping still in flight |
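The decision order in the table can be sketched as follows. This is only an illustration: the real `getVerdict()` lives in `lib/utils.js` and is not part of this diff, and the "Spiky" heuristic (p95 more than 3x the average) is an assumption, not the package's actual rule.

```javascript
// Sketch of the verdict table above; thresholds follow the README table,
// the p95 spike heuristic is assumed. 'Pending' (first ping still in flight)
// would need ping-history state that is omitted here.
function verdict({ avg, p95, httpCode, wasUpBefore }) {
  if (httpCode === '429' || httpCode === '503') return 'Overloaded'
  if (avg == null) return wasUpBefore ? 'Unstable' : 'Not Active'
  if (avg > 5000) return 'Unstable'
  if (avg < 1000 && p95 > 3 * avg) return 'Spiky' // good avg, erratic tail
  if (avg < 400) return 'Perfect'
  if (avg < 1000) return 'Normal'
  if (avg < 2000) return 'Slow'
  return 'Very Slow'
}
```
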
454
+
455
+ ---
456
+
457
+ ## 📐 Stability Score
458
+
459
+ The **Stability** column (sort with `B` key) shows a composite 0–100 score that answers: *"How consistent and predictable is this model?"*
460
+
461
+ Average latency alone is misleading — a model averaging 250ms that randomly spikes to 6 seconds *feels* slower in practice than a steady 400ms model. The stability score captures this.
462
+
463
+ ### Formula
464
+
465
+ Four signals are normalized to 0–100 each, then combined with weights:
466
+
467
+ ```
468
+ Stability = 0.30 × p95_score
469
+ + 0.30 × jitter_score
470
+ + 0.20 × spike_score
471
+ + 0.20 × reliability_score
472
+ ```
473
+
474
+ | Component | Weight | What it measures | How it's normalized |
475
+ |-----------|--------|-----------------|---------------------|
476
+ | **p95 latency** | 30% | Tail-latency spikes — the worst 5% of response times | `100 × (1 - p95 / 5000)`, clamped to 0–100 |
477
+ | **Jitter (σ)** | 30% | Erratic response times — standard deviation of ping times | `100 × (1 - jitter / 2000)`, clamped to 0–100 |
478
+ | **Spike rate** | 20% | Fraction of pings above 3000ms | `100 × (1 - spikes / total_pings)` |
479
+ | **Reliability** | 20% | Uptime — fraction of successful HTTP 200 pings | Direct uptime percentage (0–100) |
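As a rough sketch of the formula and normalization table above (the real `getStabilityScore` lives in `lib/utils.js` and is not shown in this diff; the `{ code, ms }` ping shape is taken from the rendering code, the rest is an illustration):

```javascript
// Sketch of the documented stability formula, not the package's implementation.
// `pings` entries are assumed to look like { code: '200', ms: 412 }.
function stabilityScore(pings) {
  const ok = pings.filter(p => p.code === '200').map(p => p.ms)
  if (ok.length === 0) return -1 // no data yet, rendered as a dim placeholder
  const sorted = [...ok].sort((a, b) => a - b)
  const p95 = sorted[Math.min(sorted.length - 1, Math.floor(sorted.length * 0.95))]
  const mean = ok.reduce((s, v) => s + v, 0) / ok.length
  const jitter = Math.sqrt(ok.reduce((s, v) => s + (v - mean) ** 2, 0) / ok.length)
  const spikes = ok.filter(ms => ms > 3000).length
  const clamp = (x) => Math.max(0, Math.min(100, x))
  const p95Score = clamp(100 * (1 - p95 / 5000))
  const jitterScore = clamp(100 * (1 - jitter / 2000))
  const spikeScore = 100 * (1 - spikes / pings.length)
  const reliability = (100 * ok.length) / pings.length
  return Math.round(
    0.30 * p95Score + 0.30 * jitterScore + 0.20 * spikeScore + 0.20 * reliability
  )
}
```

With this sketch, twenty steady 400ms pings score 98, while two 6-second spikes in an otherwise 250ms history drag the score down to 42.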
480
+
481
+ ### Color coding
482
+
483
+ | Score | Color | Interpretation |
484
+ |-------|-------|----------------|
485
+ | **80–100** | Green | Rock-solid — very consistent, safe to rely on |
486
+ | **60–79** | Cyan | Good — occasional variance but generally stable |
487
+ | **40–59** | Yellow | Shaky — noticeable inconsistency |
488
+ | **< 40** | Red | Unreliable — frequent spikes or failures |
489
+ | **—** | Dim | No data yet (no successful pings) |
490
+
491
+ ### Example
492
+
493
+ Two models with similar average latency, very different real-world experience:
494
+
495
+ ```
496
+ Model A: avg 250ms, p95 6000ms, jitter 1800ms → Stability ~30 (red)
497
+ Model B: avg 400ms, p95 650ms, jitter 120ms → Stability ~85 (green)
498
+ ```
499
+
500
+ Model B is the better choice despite its higher average — it won't randomly stall your coding workflow.
501
+
502
+ > 💡 **Tip:** Sort by Stability (`B` key) after a few minutes of monitoring to find the models that deliver the most predictable performance.
503
+
504
+ ---
505
+
417
506
  ## 🔌 OpenCode Integration
418
507
 
419
508
  **The easiest way** — let `free-coding-models` do everything:
@@ -589,19 +678,19 @@ This script:
589
678
  ## ⚙️ How it works
590
679
 
591
680
  ```
592
- ┌─────────────────────────────────────────────────────────────┐
593
- │ 1. Enter alternate screen buffer (like vim/htop/less)
594
- │ 2. Ping ALL models in parallel
595
- │ 3. Display real-time table with Latest/Avg/Up% columns
596
- │ 4. Re-ping ALL models every 2 seconds (forever)
597
- │ 5. Update rolling averages from ALL successful pings
598
- │ 6. User can navigate with ↑↓ and select with Enter
599
- │ 7. On Enter (OpenCode): set model, launch OpenCode
600
- │ 8. On Enter (OpenClaw): update ~/.openclaw/openclaw.json
601
- └─────────────────────────────────────────────────────────────┘
681
+ ┌──────────────────────────────────────────────────────────────────┐
682
+ │ 1. Enter alternate screen buffer (like vim/htop/less)
683
+ │ 2. Ping ALL models in parallel
684
+ │ 3. Display real-time table with Latest/Avg/Stability/Up%
685
+ │ 4. Re-ping ALL models every 3 seconds (forever)
686
+ │ 5. Update rolling averages + stability scores per model
687
+ │ 6. User can navigate with ↑↓ and select with Enter
688
+ │ 7. On Enter (OpenCode): set model, launch OpenCode
689
+ │ 8. On Enter (OpenClaw): update ~/.openclaw/openclaw.json
690
+ └──────────────────────────────────────────────────────────────────┘
602
691
  ```
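Steps 2 through 5 can be sketched as a parallel ping pass driven by an interval timer. This is a simplified illustration with an injectable fetch function so it can run offline: the real tool hits each provider's endpoint with API keys, a 15s timeout, and a 3s interval, and the `url`/`pings` fields here are assumptions.

```javascript
// Minimal sketch of the monitor loop; pingAll records one { code, ms } entry
// per model per pass, mirroring the ping history the table renders.
async function pingAll(models, fetchFn = fetch) {
  await Promise.all(models.map(async (m) => {
    const t0 = Date.now()
    try {
      const res = await fetchFn(m.url, { signal: AbortSignal.timeout(15_000) })
      m.pings.push({ code: String(res.status), ms: Date.now() - t0 })
    } catch {
      m.pings.push({ code: 'TIMEOUT', ms: Date.now() - t0 })
    }
  }))
}

// Continuous monitoring: re-ping every 3 seconds, then re-render the table.
// setInterval(() => pingAll(models).then(render), 3_000)
```
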
603
692
 
604
- **Result:** Continuous monitoring interface that stays open until you select a model or press Ctrl+C. Rolling averages give you accurate long-term latency data, uptime percentage tracks reliability, and you can configure your tool of choice with your chosen model in one keystroke.
693
+ **Result:** Continuous monitoring interface that stays open until you select a model or press Ctrl+C. Rolling averages give you accurate long-term latency data, the stability score reveals which models are truly consistent vs. deceptively spiky, and you can configure your tool of choice with one keystroke.
605
694
 
606
695
  ---
607
696
 
@@ -675,7 +764,7 @@ This script:
675
764
 
676
765
  **Configuration:**
677
766
  - **Ping timeout**: 15 seconds per attempt (slow models get more time)
678
- - **Ping interval**: 2 seconds between complete re-pings of all models (adjustable with W/X keys)
767
+ - **Ping interval**: 3 seconds between complete re-pings of all models (adjustable with W/X keys)
679
768
  - **Monitor mode**: Interface stays open forever, press Ctrl+C to exit
680
769
 
681
770
  **Flags:**
@@ -697,7 +786,7 @@ This script:
697
786
  **Keyboard shortcuts (main TUI):**
698
787
  - **↑↓** — Navigate models
699
788
  - **Enter** — Select model (launches OpenCode or sets OpenClaw default, depending on mode)
700
- - **R/Y/O/M/L/A/S/N/H/V/U** — Sort by Rank/Tier/Origin/Model/LatestPing/Avg/SWE/Ctx/Health/Verdict/Uptime
789
+ - **R/Y/O/M/L/A/S/C/H/V/B/U** — Sort by Rank/Tier/Origin/Model/LatestPing/Avg/SWE/Ctx/Health/Verdict/Stability/Uptime
701
790
  - **F** — Toggle favorite on selected model (⭐ in Model column, pinned at top)
702
791
  - **T** — Cycle tier filter (All → S+ → S → A+ → A → A- → B+ → B → C → All)
703
792
  - **Z** — Cycle mode (OpenCode CLI → OpenCode Desktop → OpenClaw)
@@ -772,5 +861,3 @@ We welcome contributions! Feel free to open issues, submit pull requests, or get
772
861
  For questions or issues, open a [GitHub issue](https://github.com/vava-nessa/free-coding-models/issues).
773
862
 
774
863
  💬 Let's talk about the project on Discord: https://discord.gg/5MbTnDC3Md
775
-
776
- > ⚠️ **free-coding-models is a BETA TUI** — it might crash or have problems. Use at your own risk and feel free to report issues!
@@ -23,7 +23,7 @@
23
23
  * - Settings screen (P key) to manage API keys, provider toggles, analytics, and manual updates
24
24
  * - Favorites system: toggle with F, pin rows to top, persist between sessions
25
25
  * - Uptime percentage tracking (successful pings / total pings)
26
- * - Sortable columns (R/Y/O/M/L/A/S/N/H/V/U keys)
26
+ * - Sortable columns (R/Y/O/M/L/A/S/C/H/V/B/U keys)
27
27
  * - Tier filtering via T key (cycles S+→S→A+→A→A-→B+→B→C→All)
28
28
  *
29
29
  * → Functions:
@@ -93,7 +93,7 @@ import { join, dirname } from 'path'
93
93
  import { createServer } from 'net'
94
94
  import { MODELS, sources } from '../sources.js'
95
95
  import { patchOpenClawModelsJson } from '../patch-openclaw-models.js'
96
- import { getAvg, getVerdict, getUptime, sortResults, filterByTier, findBestModel, parseArgs, TIER_ORDER, VERDICT_ORDER, TIER_LETTER_MAP } from '../lib/utils.js'
96
+ import { getAvg, getVerdict, getUptime, getP95, getJitter, getStabilityScore, sortResults, filterByTier, findBestModel, parseArgs, TIER_ORDER, VERDICT_ORDER, TIER_LETTER_MAP } from '../lib/utils.js'
97
97
  import { loadConfig, saveConfig, getApiKey, isProviderEnabled } from '../lib/config.js'
98
98
 
99
99
  const require = createRequire(import.meta.url)
@@ -717,7 +717,7 @@ const ALT_HOME = '\x1b[H'
717
717
  // 📖 This allows easy addition of new model sources beyond NVIDIA NIM
718
718
 
719
719
  const PING_TIMEOUT = 15_000 // 📖 15s per attempt before abort - slow models get more time
720
- const PING_INTERVAL = 2_000 // 📖 Ping all models every 2 seconds in continuous mode
720
+ const PING_INTERVAL = 3_000 // 📖 Ping all models every 3 seconds in continuous mode
721
721
 
722
722
  const FPS = 12
723
723
  const COL_MODEL = 22
@@ -767,6 +767,47 @@ function stripAnsi(input) {
767
767
  return String(input).replace(/\x1b\[[0-9;]*m/g, '').replace(/\x1b\][^\x1b]*\x1b\\/g, '')
768
768
  }
769
769
 
770
+ // 📖 Calculate display width of a string in terminal columns.
771
+ // 📖 Emojis and other wide characters occupy 2 columns, variation selectors (U+FE0F) are zero-width.
772
+ // 📖 This avoids pulling in a full `string-width` dependency for a lightweight CLI tool.
773
+ function displayWidth(str) {
774
+ const plain = stripAnsi(String(str))
775
+ let w = 0
776
+ for (const ch of plain) {
777
+ const cp = ch.codePointAt(0)
778
+ // Zero-width: variation selectors (FE00-FE0F), zero-width joiner/non-joiner, combining marks
779
+ if ((cp >= 0xFE00 && cp <= 0xFE0F) || cp === 0x200D || cp === 0x200C || cp === 0x20E3) continue
780
+ // Wide: CJK, emoji (most above U+1F000), fullwidth forms
781
+ if (
782
+ cp > 0x1F000 || // emoji & symbols
783
+ (cp >= 0x2600 && cp <= 0x27BF) || // misc symbols, dingbats
784
+ (cp >= 0x2300 && cp <= 0x23FF) || // misc technical (⏳, ⏰, etc.)
785
+ (cp >= 0x2700 && cp <= 0x27BF) || // dingbats
786
+ (cp >= 0xFE10 && cp <= 0xFE19) || // vertical forms
787
+ (cp >= 0xFF01 && cp <= 0xFF60) || // fullwidth ASCII
788
+ (cp >= 0xFFE0 && cp <= 0xFFE6) || // fullwidth signs
789
+ (cp >= 0x4E00 && cp <= 0x9FFF) || // CJK unified
790
+ (cp >= 0x3000 && cp <= 0x303F) || // CJK symbols
791
+ (cp >= 0x2B50 && cp <= 0x2B55) || // stars, circles
792
+ cp === 0x2705 || cp === 0x2714 || cp === 0x2716 || // check/cross marks
793
+ cp === 0x26A0 // ⚠ warning sign
794
+ ) {
795
+ w += 2
796
+ } else {
797
+ w += 1
798
+ }
799
+ }
800
+ return w
801
+ }
802
+
803
+ // 📖 Right-pad (padEnd equivalent) using display width instead of string length.
804
+ // 📖 Ensures columns with emoji text align correctly in the terminal.
805
+ function padEndDisplay(str, width) {
806
+ const dw = displayWidth(str)
807
+ const need = Math.max(0, width - dw)
808
+ return str + ' '.repeat(need)
809
+ }
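The need for display-width-aware padding is easy to demonstrate. Below is a deliberately simplified sketch (it treats any code point at or above U+2300 as wide, while the full `displayWidth` above handles variation selectors, CJK, and many more ranges):

```javascript
// Why padEnd alone misaligns: '⭐' is one JS character but two terminal columns.
const naiveA = '⭐ gpt'.padEnd(10) + '|'
const naiveB = '  gpt'.padEnd(10) + '|'
console.log(naiveA.length === naiveB.length) // true, yet the bars misalign on screen

// Simplified display-width-aware padding (assumption: code points >= U+2300 are wide).
function padDisplay(str, width) {
  let w = 0
  for (const ch of str) w += ch.codePointAt(0) >= 0x2300 ? 2 : 1
  return str + ' '.repeat(Math.max(0, width - w))
}
console.log(padDisplay('⭐ gpt', 10).length) // 9: one char shorter, same on-screen width
```
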
810
+
770
811
  // 📖 Tint overlay lines with a fixed dark panel width so the background is clearly visible.
771
812
  function tintOverlayLines(lines, bgColor) {
772
813
  return lines.map((line) => {
@@ -904,6 +945,7 @@ function renderTable(results, pendingPings, frame, cursor = null, sortColumn = '
904
945
  const W_AVG = 11
905
946
  const W_STATUS = 18
906
947
  const W_VERDICT = 14
948
+ const W_STAB = 11
907
949
  const W_UPTIME = 6
908
950
 
909
951
  // 📖 Sort models using the shared helper
@@ -933,6 +975,7 @@ function renderTable(results, pendingPings, frame, cursor = null, sortColumn = '
933
975
  const avgH = sortColumn === 'avg' ? dir + ' Avg Ping' : 'Avg Ping'
934
976
  const healthH = sortColumn === 'condition' ? dir + ' Health' : 'Health'
935
977
  const verdictH = sortColumn === 'verdict' ? dir + ' Verdict' : 'Verdict'
978
+ const stabH = sortColumn === 'stability' ? dir + ' Stability' : 'Stability'
936
979
  const uptimeH = sortColumn === 'uptime' ? dir + ' Up%' : 'Up%'
937
980
 
938
981
  // 📖 Helper to colorize first letter for keyboard shortcuts
@@ -948,10 +991,14 @@ function renderTable(results, pendingPings, frame, cursor = null, sortColumn = '
948
991
  // 📖 Now colorize after padding is calculated on plain text
949
992
  const rankH_c = colorFirst(rankH, W_RANK)
950
993
  const tierH_c = colorFirst('Tier', W_TIER)
951
- const originLabel = 'Origin(N)'
994
+ const originLabel = 'Origin'
952
995
  const originH_c = sortColumn === 'origin'
953
996
  ? chalk.bold.cyan(originLabel.padEnd(W_SOURCE))
954
- : (originFilterMode > 0 ? chalk.bold.rgb(100, 200, 255)(originLabel.padEnd(W_SOURCE)) : colorFirst(originLabel, W_SOURCE))
997
+ : (originFilterMode > 0 ? chalk.bold.rgb(100, 200, 255)(originLabel.padEnd(W_SOURCE)) : (() => {
998
+ // 📖 Custom colorization for Origin: highlight 'N' (the filter key) at the end
999
+ const padding = ' '.repeat(Math.max(0, W_SOURCE - originLabel.length))
1000
+ return chalk.dim('Origi') + chalk.yellow('N') + chalk.dim(padding)
1001
+ })())
955
1002
  const modelH_c = colorFirst(modelH, W_MODEL)
956
1003
  const sweH_c = sortColumn === 'swe' ? chalk.bold.cyan(sweH.padEnd(W_SWE)) : colorFirst(sweH, W_SWE)
957
1004
  const ctxH_c = sortColumn === 'ctx' ? chalk.bold.cyan(ctxH.padEnd(W_CTX)) : colorFirst(ctxH, W_CTX)
@@ -959,10 +1006,16 @@ function renderTable(results, pendingPings, frame, cursor = null, sortColumn = '
959
1006
  const avgH_c = sortColumn === 'avg' ? chalk.bold.cyan(avgH.padEnd(W_AVG)) : colorFirst('Avg Ping', W_AVG)
960
1007
  const healthH_c = sortColumn === 'condition' ? chalk.bold.cyan(healthH.padEnd(W_STATUS)) : colorFirst('Health', W_STATUS)
961
1008
  const verdictH_c = sortColumn === 'verdict' ? chalk.bold.cyan(verdictH.padEnd(W_VERDICT)) : colorFirst(verdictH, W_VERDICT)
962
- const uptimeH_c = sortColumn === 'uptime' ? chalk.bold.cyan(uptimeH.padStart(W_UPTIME)) : colorFirst(uptimeH, W_UPTIME, chalk.green)
1009
+ // 📖 Custom colorization for Stability: highlight 'B' (the sort key) since 'S' is taken by SWE
1010
+ const stabH_c = sortColumn === 'stability' ? chalk.bold.cyan(stabH.padEnd(W_STAB)) : (() => {
1011
+ const plain = 'Stability'
1012
+ const padding = ' '.repeat(Math.max(0, W_STAB - plain.length))
1013
+ return chalk.dim('Sta') + chalk.white.bold('B') + chalk.dim('ility' + padding)
1014
+ })()
1015
+ const uptimeH_c = sortColumn === 'uptime' ? chalk.bold.cyan(uptimeH.padEnd(W_UPTIME)) : colorFirst(uptimeH, W_UPTIME, chalk.green)
963
1016
 
964
- // 📖 Header with proper spacing (column order: Rank, Tier, SWE%, CTX, Model, Origin, Latest Ping, Avg Ping, Health, Verdict, Up%)
965
- lines.push(' ' + rankH_c + ' ' + tierH_c + ' ' + sweH_c + ' ' + ctxH_c + ' ' + modelH_c + ' ' + originH_c + ' ' + pingH_c + ' ' + avgH_c + ' ' + healthH_c + ' ' + verdictH_c + ' ' + uptimeH_c)
1017
+ // 📖 Header with proper spacing (column order: Rank, Tier, SWE%, CTX, Model, Origin, Latest Ping, Avg Ping, Health, Verdict, Stability, Up%)
1018
+ lines.push(' ' + rankH_c + ' ' + tierH_c + ' ' + sweH_c + ' ' + ctxH_c + ' ' + modelH_c + ' ' + originH_c + ' ' + pingH_c + ' ' + avgH_c + ' ' + healthH_c + ' ' + verdictH_c + ' ' + stabH_c + ' ' + uptimeH_c)
966
1019
 
967
1020
  // 📖 Separator line
968
1021
  lines.push(
@@ -977,6 +1030,7 @@ function renderTable(results, pendingPings, frame, cursor = null, sortColumn = '
977
1030
  chalk.dim('─'.repeat(W_AVG)) + ' ' +
978
1031
  chalk.dim('─'.repeat(W_STATUS)) + ' ' +
979
1032
  chalk.dim('─'.repeat(W_VERDICT)) + ' ' +
1033
+ chalk.dim('─'.repeat(W_STAB)) + ' ' +
980
1034
  chalk.dim('─'.repeat(W_UPTIME))
981
1035
  )
982
1036
 
@@ -999,16 +1053,32 @@ function renderTable(results, pendingPings, frame, cursor = null, sortColumn = '
999
1053
  // 📖 Show provider name from sources map (NIM / Groq / Cerebras)
1000
1054
  const providerName = sources[r.providerKey]?.name ?? r.providerKey ?? 'NIM'
1001
1055
  const source = chalk.green(providerName.padEnd(W_SOURCE))
1002
- // 📖 Favorites get a leading star in Model column.
1003
- const favoritePrefix = r.isFavorite ? '⭐ ' : ''
1004
- const nameWidth = Math.max(0, W_MODEL - favoritePrefix.length)
1056
+ // 📖 Favorites: always reserve 2 display columns at the start of Model column.
1057
+ // 📖 '⭐' (2 cols) for favorites, '  ' (2 spaces) for non-favorites — keeps alignment stable.
1058
+ const favoritePrefix = r.isFavorite ? '⭐' : '  '
1059
+ const prefixDisplayWidth = 2
1060
+ const nameWidth = Math.max(0, W_MODEL - prefixDisplayWidth)
1005
1061
  const name = favoritePrefix + r.label.slice(0, nameWidth).padEnd(nameWidth)
1006
1062
  const sweScore = r.sweScore ?? '—'
1007
- const sweCell = sweScore !== '—' && parseFloat(sweScore) >= 50
1008
- ? chalk.greenBright(sweScore.padEnd(W_SWE))
1009
- : sweScore !== '—' && parseFloat(sweScore) >= 30
1010
- ? chalk.yellow(sweScore.padEnd(W_SWE))
1011
- : chalk.dim(sweScore.padEnd(W_SWE))
1063
+ // 📖 SWE% colorized on the same gradient as Tier:
1064
+ // ≥70% bright neon green (S+), ≥60% green (S), ≥50% yellow-green (A+),
1065
+ // ≥40% yellow (A), ≥35% amber (A-), ≥30% orange-red (B+),
1066
+ // ≥20% red (B), <20% dark red (C), '—' dim
1067
+ let sweCell
1068
+ if (sweScore === '—') {
1069
+ sweCell = chalk.dim(sweScore.padEnd(W_SWE))
1070
+ } else {
1071
+ const sweVal = parseFloat(sweScore)
1072
+ const swePadded = sweScore.padEnd(W_SWE)
1073
+ if (sweVal >= 70) sweCell = chalk.bold.rgb(0, 255, 80)(swePadded)
1074
+ else if (sweVal >= 60) sweCell = chalk.bold.rgb(80, 220, 0)(swePadded)
1075
+ else if (sweVal >= 50) sweCell = chalk.bold.rgb(170, 210, 0)(swePadded)
1076
+ else if (sweVal >= 40) sweCell = chalk.rgb(240, 190, 0)(swePadded)
1077
+ else if (sweVal >= 35) sweCell = chalk.rgb(255, 130, 0)(swePadded)
1078
+ else if (sweVal >= 30) sweCell = chalk.rgb(255, 70, 0)(swePadded)
1079
+ else if (sweVal >= 20) sweCell = chalk.rgb(210, 20, 0)(swePadded)
1080
+ else sweCell = chalk.rgb(140, 0, 0)(swePadded)
1081
+ }
1012
1082
 
1013
1083
  // 📖 Context window column - colorized by size (larger = better)
1014
1084
  const ctxRaw = r.ctx ?? '—'
@@ -1023,7 +1093,7 @@ function renderTable(results, pendingPings, frame, cursor = null, sortColumn = '
1023
1093
  const latestPing = r.pings.length > 0 ? r.pings[r.pings.length - 1] : null
1024
1094
  let pingCell
1025
1095
  if (!latestPing) {
1026
- pingCell = chalk.dim(''.padEnd(W_PING))
1096
+ pingCell = chalk.dim('———'.padEnd(W_PING))
1027
1097
  } else if (latestPing.code === '200') {
1028
1098
  // 📖 Success - show response time
1029
1099
  const str = String(latestPing.ms).padEnd(W_PING)
@@ -1032,8 +1102,8 @@ function renderTable(results, pendingPings, frame, cursor = null, sortColumn = '
1032
1102
  // 📖 401 = no API key but server IS reachable — still show latency in dim
1033
1103
  pingCell = chalk.dim(String(latestPing.ms).padEnd(W_PING))
1034
1104
  } else {
1035
- // 📖 Error or timeout - show "" (error code is already in Status column)
1036
- pingCell = chalk.dim(''.padEnd(W_PING))
1105
+ // 📖 Error or timeout - show "———" (error code is already in Status column)
1106
+ pingCell = chalk.dim('———'.padEnd(W_PING))
1037
1107
  }
1038
1108
 
1039
1109
  // 📖 Avg ping (just number, no "ms")
@@ -1043,7 +1113,7 @@ function renderTable(results, pendingPings, frame, cursor = null, sortColumn = '
1043
1113
  const str = String(avg).padEnd(W_AVG)
1044
1114
  avgCell = avg < 500 ? chalk.greenBright(str) : avg < 1500 ? chalk.yellow(str) : chalk.red(str)
1045
1115
  } else {
1046
- avgCell = chalk.dim(''.padEnd(W_AVG))
1116
+ avgCell = chalk.dim('———'.padEnd(W_AVG))
1047
1117
  }
1048
1118
 
1049
1119
  // 📖 Status column - build plain text with emoji, pad, then colorize
@@ -1080,64 +1150,99 @@ function renderTable(results, pendingPings, frame, cursor = null, sortColumn = '
1080
1150
  statusText = '?'
1081
1151
  statusColor = (s) => chalk.dim(s)
1082
1152
  }
1083
- const status = statusColor(statusText.padEnd(W_STATUS))
1153
+ const status = statusColor(padEndDisplay(statusText, W_STATUS))
1084
1154
 
1085
- // 📖 Verdict column - build plain text with emoji, pad, then colorize
1086
- const wasUpBefore = r.pings.length > 0 && r.pings.some(p => p.code === '200')
1155
+ // 📖 Verdict column - use getVerdict() for stability-aware verdicts, then render with emoji
1156
+ const verdict = getVerdict(r)
1087
1157
  let verdictText, verdictColor
1088
- if (r.httpCode === '429') {
1089
- verdictText = '🔥 Overloaded'
1090
- verdictColor = (s) => chalk.yellow.bold(s)
1091
- } else if ((r.status === 'timeout' || r.status === 'down') && wasUpBefore) {
1092
- verdictText = '⚠️ Unstable'
1093
- verdictColor = (s) => chalk.magenta(s)
1094
- } else if (r.status === 'timeout' || r.status === 'down') {
1095
- verdictText = '👻 Not Active'
1096
- verdictColor = (s) => chalk.dim(s)
1097
- } else if (avg === Infinity) {
1098
- verdictText = '⏳ Pending'
1099
- verdictColor = (s) => chalk.dim(s)
1100
- } else if (avg < 400) {
1101
- verdictText = '🚀 Perfect'
1102
- verdictColor = (s) => chalk.greenBright(s)
1103
- } else if (avg < 1000) {
1104
- verdictText = '✅ Normal'
1105
- verdictColor = (s) => chalk.cyan(s)
1106
- } else if (avg < 3000) {
1107
- verdictText = '🐢 Slow'
1108
- verdictColor = (s) => chalk.yellow(s)
1109
- } else if (avg < 5000) {
1110
- verdictText = '🐌 Very Slow'
1111
- verdictColor = (s) => chalk.red(s)
1158
+ // 📖 Verdict colors follow the same green→red gradient as TIER_COLOR / SWE%
1159
+ switch (verdict) {
1160
+ case 'Perfect':
1161
+ verdictText = 'Perfect 🚀'
1162
+ verdictColor = (s) => chalk.bold.rgb(0, 255, 180)(s) // bright cyan-green — stands out from Normal
1163
+ break
1164
+ case 'Normal':
1165
+ verdictText = 'Normal ✅'
1166
+ verdictColor = (s) => chalk.bold.rgb(140, 200, 0)(s) // lime-yellow — clearly warmer than Perfect
1167
+ break
1168
+ case 'Spiky':
1169
+ verdictText = 'Spiky 📈'
1170
+ verdictColor = (s) => chalk.bold.rgb(170, 210, 0)(s) // A+ yellow-green
1171
+ break
1172
+ case 'Slow':
1173
+ verdictText = 'Slow 🐢'
1174
+ verdictColor = (s) => chalk.bold.rgb(255, 130, 0)(s) // A- amber
1175
+ break
1176
+ case 'Very Slow':
1177
+ verdictText = 'Very Slow 🐌'
1178
+ verdictColor = (s) => chalk.bold.rgb(255, 70, 0)(s) // B+ orange-red
1179
+ break
1180
+ case 'Overloaded':
1181
+ verdictText = 'Overloaded 🔥'
1182
+ verdictColor = (s) => chalk.bold.rgb(210, 20, 0)(s) // B red
1183
+ break
1184
+ case 'Unstable':
1185
+ verdictText = 'Unstable ⚠️'
1186
+ verdictColor = (s) => chalk.bold.rgb(175, 10, 0)(s) // between B and C
1187
+ break
1188
+ case 'Not Active':
1189
+ verdictText = 'Not Active 👻'
1190
+ verdictColor = (s) => chalk.dim(s)
1191
+ break
1192
+ case 'Pending':
1193
+ verdictText = 'Pending ⏳'
1194
+ verdictColor = (s) => chalk.dim(s)
1195
+ break
1196
+ default:
1197
+ verdictText = 'Unusable 💀'
1198
+ verdictColor = (s) => chalk.bold.rgb(140, 0, 0)(s) // C dark red
1199
+ break
1200
+ }
1201
+ // 📖 Use padEndDisplay to account for emoji display width (2 cols each) so all rows align
1202
+ const speedCell = verdictColor(padEndDisplay(verdictText, W_VERDICT))
1203
+
1204
+ // 📖 Stability column - composite score (0–100) from p95 + jitter + spikes + uptime
1205
+ // 📖 Left-aligned to sit flush under the column header
1206
+ const stabScore = getStabilityScore(r)
1207
+ let stabCell
1208
+ if (stabScore < 0) {
1209
+ stabCell = chalk.dim('———'.padEnd(W_STAB))
1210
+ } else if (stabScore >= 80) {
1211
+ stabCell = chalk.greenBright(String(stabScore).padEnd(W_STAB))
1212
+ } else if (stabScore >= 60) {
1213
+ stabCell = chalk.cyan(String(stabScore).padEnd(W_STAB))
1214
+ } else if (stabScore >= 40) {
1215
+ stabCell = chalk.yellow(String(stabScore).padEnd(W_STAB))
1112
1216
  } else {
1113
- verdictText = '💀 Unusable'
1114
- verdictColor = (s) => chalk.red.bold(s)
1217
+ stabCell = chalk.red(String(stabScore).padEnd(W_STAB))
1115
1218
  }
1116
- const speedCell = verdictColor(verdictText.padEnd(W_VERDICT))
1117
1219
 
1118
1220
  // 📖 Uptime column - percentage of successful pings
1221
+ // 📖 Left-aligned to sit flush under the column header
1119
1222
  const uptimePercent = getUptime(r)
1120
1223
  const uptimeStr = uptimePercent + '%'
1121
1224
  let uptimeCell
1122
1225
  if (uptimePercent >= 90) {
1123
- uptimeCell = chalk.greenBright(uptimeStr.padStart(W_UPTIME))
1226
+ uptimeCell = chalk.greenBright(uptimeStr.padEnd(W_UPTIME))
1124
1227
  } else if (uptimePercent >= 70) {
1125
- uptimeCell = chalk.yellow(uptimeStr.padStart(W_UPTIME))
1228
+ uptimeCell = chalk.yellow(uptimeStr.padEnd(W_UPTIME))
1126
1229
  } else if (uptimePercent >= 50) {
1127
- uptimeCell = chalk.rgb(255, 165, 0)(uptimeStr.padStart(W_UPTIME)) // orange
1230
+ uptimeCell = chalk.rgb(255, 165, 0)(uptimeStr.padEnd(W_UPTIME)) // orange
1128
1231
  } else {
1129
- uptimeCell = chalk.red(uptimeStr.padStart(W_UPTIME))
1232
+ uptimeCell = chalk.red(uptimeStr.padEnd(W_UPTIME))
1130
1233
  }
1131
1234
 
1132
- // 📖 Build row with double space between columns (order: Rank, Tier, SWE%, CTX, Model, Origin, Latest Ping, Avg Ping, Health, Verdict, Up%)
1133
- const row = ' ' + num + ' ' + tier + ' ' + sweCell + ' ' + ctxCell + ' ' + name + ' ' + source + ' ' + pingCell + ' ' + avgCell + ' ' + status + ' ' + speedCell + ' ' + uptimeCell
1235
+ // 📖 When cursor is on this row, render Model and Origin in bright white for readability
1236
+ const nameCell = isCursor ? chalk.white.bold(favoritePrefix + r.label.slice(0, nameWidth).padEnd(nameWidth)) : name
1237
+ const sourceCell = isCursor ? chalk.white.bold(providerName.padEnd(W_SOURCE)) : source
1134
1238
 
1135
- if (isCursor && r.isFavorite) {
1136
- lines.push(chalk.bgRgb(120, 60, 0)(row))
1137
- } else if (isCursor) {
1138
- lines.push(chalk.bgRgb(139, 0, 139)(row))
1239
+ // 📖 Build row with double space between columns (order: Rank, Tier, SWE%, CTX, Model, Origin, Latest Ping, Avg Ping, Health, Verdict, Stability, Up%)
1240
+ const row = ' ' + num + ' ' + tier + ' ' + sweCell + ' ' + ctxCell + ' ' + nameCell + ' ' + sourceCell + ' ' + pingCell + ' ' + avgCell + ' ' + status + ' ' + speedCell + ' ' + stabCell + ' ' + uptimeCell
1241
+
1242
+ if (isCursor) {
1243
+ lines.push(chalk.bgRgb(50, 0, 60)(row))
1139
1244
  } else if (r.isFavorite) {
1140
- lines.push(chalk.bgRgb(90, 45, 0)(row))
1245
+ lines.push(chalk.bgRgb(35, 20, 0)(row))
1141
1246
  } else {
1142
1247
  lines.push(row)
1143
1248
  }
@@ -1156,19 +1261,24 @@ function renderTable(results, pendingPings, frame, cursor = null, sortColumn = '
1156
1261
  : mode === 'opencode-desktop'
1157
1262
  ? chalk.rgb(0, 200, 255)('Enter→OpenDesktop')
1158
1263
  : chalk.rgb(0, 200, 255)('Enter→OpenCode')
1159
- lines.push(chalk.dim(` ↑↓ Navigate • `) + actionHint + chalk.dim(` • F Favorite • R/Y/O/M/L/A/S/C/H/V/U Sort • T Tier • N Origin • W↓/X↑ (${intervalSec}s) • Z Mode • `) + chalk.yellow('P') + chalk.dim(` Settings • `) + chalk.bgGreenBright.black.bold(' K Help ') + chalk.dim(` • Ctrl+C Exit`))
+ lines.push(chalk.dim(` ↑↓ Navigate • `) + actionHint + chalk.dim(` • F Favorite • R/Y/O/M/L/A/S/C/H/V/B/U Sort • T Tier • N Origin • W↓/X↑ (${intervalSec}s) • `) + chalk.rgb(255, 100, 50).bold('Z Mode') + chalk.dim(` • `) + chalk.yellow('P') + chalk.dim(` Settings • `) + chalk.rgb(0, 255, 80).bold('K Help'))
  lines.push('')
  lines.push(
    chalk.rgb(255, 150, 200)(' Made with 💖 & ☕ by \x1b]8;;https://github.com/vava-nessa\x1b\\vava-nessa\x1b]8;;\x1b\\') +
    chalk.dim(' • ') +
    '⭐ ' +
-   '\x1b]8;;https://github.com/vava-nessa/free-coding-models\x1b\\Star on GitHub\x1b]8;;\x1b\\' +
+   chalk.yellow('\x1b]8;;https://github.com/vava-nessa/free-coding-models\x1b\\Star on GitHub\x1b]8;;\x1b\\') +
    chalk.dim(' • ') +
    '🤝 ' +
-   '\x1b]8;;https://github.com/vava-nessa/free-coding-models/graphs/contributors\x1b\\Contributors\x1b]8;;\x1b\\'
+   chalk.rgb(255, 165, 0)('\x1b]8;;https://github.com/vava-nessa/free-coding-models/graphs/contributors\x1b\\Contributors\x1b]8;;\x1b\\') +
+   chalk.dim(' • ') +
+   '💬 ' +
+   chalk.rgb(200, 150, 255)('\x1b]8;;https://discord.gg/5MbTnDC3Md\x1b\\Discord\x1b]8;;\x1b\\') +
+   chalk.dim(' → ') +
+   chalk.rgb(200, 150, 255)('https://discord.gg/5MbTnDC3Md') +
+   chalk.dim(' • ') +
+   chalk.dim('Ctrl+C Exit')
  )
- // 📖 Discord invite + BETA warning — always visible at the bottom of the TUI
- lines.push(' 💬 ' + chalk.cyanBright('\x1b]8;;https://discord.gg/5MbTnDC3Md\x1b\\Join our Discord\x1b]8;;\x1b\\') + chalk.dim(' → ') + chalk.cyanBright('https://discord.gg/5MbTnDC3Md') + chalk.dim(' • ') + chalk.yellow('⚠ BETA TUI') + chalk.dim(' — might crash or have problems'))
  lines.push('')
  // 📖 Append \x1b[K (erase to EOL) to each line so leftover chars from previous
  // 📖 frames are cleared. Then pad with blank cleared lines to fill the terminal,
@@ -2684,17 +2794,51 @@ async function main() {
  const lines = []
  lines.push('')
  lines.push(` ${chalk.bold('❓ Keyboard Shortcuts')} ${chalk.dim('— ↑↓ / PgUp / PgDn / Home / End scroll • K or Esc close')}`)
+ lines.push('')
+ lines.push(` ${chalk.bold('Columns')}`)
+ lines.push('')
+ lines.push(` ${chalk.cyan('Rank')} SWE-bench rank (1 = best coding score) ${chalk.dim('Sort:')} ${chalk.yellow('R')}`)
+ lines.push(` ${chalk.dim('Quick glance at which model is objectively the best coder right now.')}`)
+ lines.push('')
+ lines.push(` ${chalk.cyan('Tier')} S+ / S / A+ / A / A- / B+ / B / C based on SWE-bench score ${chalk.dim('Sort:')} ${chalk.yellow('Y')}`)
+ lines.push(` ${chalk.dim('Skip the noise — S/S+ models solve real GitHub issues, C models are for light tasks.')}`)
+ lines.push('')
+ lines.push(` ${chalk.cyan('SWE%')} SWE-bench score — coding ability benchmark (color-coded) ${chalk.dim('Sort:')} ${chalk.yellow('S')}`)
+ lines.push(` ${chalk.dim('The raw number behind the tier. Higher = better at writing, fixing, and refactoring code.')}`)
+ lines.push('')
+ lines.push(` ${chalk.cyan('CTX')} Context window size (128k, 200k, 256k, 1m, etc.) ${chalk.dim('Sort:')} ${chalk.yellow('C')}`)
+ lines.push(` ${chalk.dim('Bigger context = the model can read more of your codebase at once without forgetting.')}`)
+ lines.push('')
+ lines.push(` ${chalk.cyan('Model')} Model name (⭐ = favorited, pinned at top) ${chalk.dim('Sort:')} ${chalk.yellow('M')} ${chalk.dim('Favorite:')} ${chalk.yellow('F')}`)
+ lines.push(` ${chalk.dim('Star the ones you like — they stay pinned at the top across restarts.')}`)
+ lines.push('')
+ lines.push(` ${chalk.cyan('Origin')} Provider source (NIM, Groq, Cerebras, etc.) ${chalk.dim('Sort:')} ${chalk.yellow('O')} ${chalk.dim('Filter:')} ${chalk.yellow('N')}`)
+ lines.push(` ${chalk.dim('Same model on different providers can have very different speed and uptime.')}`)
+ lines.push('')
+ lines.push(` ${chalk.cyan('Latest')} Most recent ping response time (ms) ${chalk.dim('Sort:')} ${chalk.yellow('L')}`)
+ lines.push(` ${chalk.dim('Shows how fast the server is responding right now — useful to catch live slowdowns.')}`)
+ lines.push('')
+ lines.push(` ${chalk.cyan('Avg Ping')} Average response time across all successful pings (ms) ${chalk.dim('Sort:')} ${chalk.yellow('A')}`)
+ lines.push(` ${chalk.dim('The long-term truth. Ignore lucky one-off pings, this tells you real everyday speed.')}`)
+ lines.push('')
+ lines.push(` ${chalk.cyan('Health')} Live status: ✅ UP / 🔥 429 / ⏳ TIMEOUT / ❌ ERR / 🔑 NO KEY ${chalk.dim('Sort:')} ${chalk.yellow('H')}`)
+ lines.push(` ${chalk.dim('Tells you instantly if a model is reachable or down — no guesswork needed.')}`)
+ lines.push('')
+ lines.push(` ${chalk.cyan('Verdict')} Overall assessment: Perfect / Normal / Spiky / Slow / Overloaded ${chalk.dim('Sort:')} ${chalk.yellow('V')}`)
+ lines.push(` ${chalk.dim('One-word summary so you don\'t have to cross-check speed, health, and stability yourself.')}`)
+ lines.push('')
+ lines.push(` ${chalk.cyan('Stability')} Composite 0–100 score: p95 + jitter + spike rate + uptime ${chalk.dim('Sort:')} ${chalk.yellow('B')}`)
+ lines.push(` ${chalk.dim('A fast model that randomly freezes is worse than a steady one. This catches that.')}`)
+ lines.push('')
+ lines.push(` ${chalk.cyan('Up%')} Uptime — ratio of successful pings to total pings ${chalk.dim('Sort:')} ${chalk.yellow('U')}`)
+ lines.push(` ${chalk.dim('If a model only works half the time, you\'ll waste time retrying. Higher = more reliable.')}`)
+
  lines.push('')
  lines.push(` ${chalk.bold('Main TUI')}`)
  lines.push(` ${chalk.bold('Navigation')}`)
  lines.push(` ${chalk.yellow('↑↓')} Navigate rows`)
  lines.push(` ${chalk.yellow('Enter')} Select model and launch`)
  lines.push('')
- lines.push(` ${chalk.bold('Sorting')}`)
- lines.push(` ${chalk.yellow('R')} Rank ${chalk.yellow('Y')} Tier ${chalk.yellow('O')} Origin ${chalk.yellow('M')} Model`)
- lines.push(` ${chalk.yellow('L')} Latest ping ${chalk.yellow('A')} Avg ping ${chalk.yellow('S')} SWE-bench score`)
- lines.push(` ${chalk.yellow('C')} Context window ${chalk.yellow('H')} Health ${chalk.yellow('V')} Verdict ${chalk.yellow('U')} Uptime`)
- lines.push('')
  lines.push(` ${chalk.bold('Filters')}`)
  lines.push(` ${chalk.yellow('T')} Cycle tier filter ${chalk.dim('(All → S+ → S → A+ → A → A- → B+ → B → C → All)')}`)
  lines.push(` ${chalk.yellow('N')} Cycle origin filter ${chalk.dim('(All → NIM → Groq → Cerebras → ... each provider → All)')}`)
@@ -2994,12 +3138,12 @@ async function main() {
  return
  }

- // 📖 Sorting keys: R=rank, Y=tier, O=origin, M=model, L=latest ping, A=avg ping, S=SWE-bench, C=context, H=health, V=verdict, U=uptime
+ // 📖 Sorting keys: R=rank, Y=tier, O=origin, M=model, L=latest ping, A=avg ping, S=SWE-bench, C=context, H=health, V=verdict, B=stability, U=uptime
  // 📖 T is reserved for tier filter cycling — tier sort moved to Y
  // 📖 N is now reserved for origin filter cycling
  const sortKeys = {
    'r': 'rank', 'y': 'tier', 'o': 'origin', 'm': 'model',
-   'l': 'ping', 'a': 'avg', 's': 'swe', 'c': 'ctx', 'h': 'condition', 'v': 'verdict', 'u': 'uptime'
+   'l': 'ping', 'a': 'avg', 's': 'swe', 'c': 'ctx', 'h': 'condition', 'v': 'verdict', 'b': 'stability', 'u': 'uptime'
  }

  if (sortKeys[key.name] && !key.ctrl) {
package/lib/utils.js CHANGED
@@ -27,14 +27,18 @@
  *
  * @functions
  * → getAvg(result) — Calculate average latency from successful pings only
- * → getVerdict(result) — Determine model health verdict based on avg latency and status
+ * → getVerdict(result) — Determine model health verdict based on avg latency and stability
  * → getUptime(result) — Calculate uptime percentage (successful / total pings)
+ * → getP95(result) — Calculate 95th percentile latency from successful pings
+ * → getJitter(result) — Calculate latency standard deviation (jitter)
+ * → getStabilityScore(result) — Composite 0–100 stability score (p95 + jitter + spikes + uptime)
  * → sortResults(results, sortColumn, sortDirection) — Sort model results by any column
  * → filterByTier(results, tierLetter) — Filter results by tier letter (S/A/B/C)
- * → findBestModel(results) — Pick the best model by status → avg → uptime priority
+ * → findBestModel(results) — Pick the best model by status → avg → stability → uptime priority
  * → parseArgs(argv) — Parse CLI arguments into structured flags and values
  *
- * @exports getAvg, getVerdict, getUptime, sortResults, filterByTier, findBestModel, parseArgs
+ * @exports getAvg, getVerdict, getUptime, getP95, getJitter, getStabilityScore
+ * @exports sortResults, filterByTier, findBestModel, parseArgs
  * @exports TIER_ORDER, VERDICT_ORDER, TIER_LETTER_MAP
  *
  * @see bin/free-coding-models.js — main CLI that imports these utils
@@ -54,7 +58,7 @@ export const TIER_ORDER = ['S+', 'S', 'A+', 'A', 'A-', 'B+', 'B', 'C']
  // 📖 Used by sortResults when sorting by the "verdict" column.
  // 📖 "Perfect" means < 400ms avg, "Pending" means no data yet.
  // 📖 The order matters — it determines sort rank in the TUI table.
- export const VERDICT_ORDER = ['Perfect', 'Normal', 'Slow', 'Very Slow', 'Overloaded', 'Unstable', 'Not Active', 'Pending']
+ export const VERDICT_ORDER = ['Perfect', 'Normal', 'Slow', 'Spiky', 'Very Slow', 'Overloaded', 'Unstable', 'Not Active', 'Pending']

  // 📖 Maps a CLI tier letter (--tier S/A/B/C) to the full tier strings it includes.
  // 📖 Example: --tier A matches A+, A, and A- models (all "A-family" tiers).
@@ -91,11 +95,17 @@ export const getAvg = (r) => {
  // 2. Timeout/down BUT was previously up → "Unstable" (it worked before, now it doesn't)
  // 3. Timeout/down and never worked → "Not Active" (model might be offline)
  // 4. No successful pings yet → "Pending" (still waiting for first response)
- // 5. Avg < 400ms "Perfect"
- // 6. Avg < 1000ms → "Normal"
- // 7. Avg < 3000ms → "Slow"
- // 8. Avg < 5000ms → "Very Slow"
- // 9. Avg >= 5000ms → "Unstable"
+ // 5. Stability-aware speed tiers (avg + p95/jitter penalty):
+ //    - Avg < 400ms + stable → "Perfect"
+ //    - Avg < 400ms but spiky p95 → "Spiky" (fast on average, but tail latency hurts)
+ //    - Avg < 1000ms → "Normal"
+ //    - Avg < 3000ms → "Slow"
+ //    - Avg < 5000ms → "Very Slow"
+ //    - Avg >= 5000ms → "Unstable"
+ //
+ // 📖 The "Spiky" verdict catches models that look fast on paper (low avg) but randomly
+ //    stall your IDE/agent with tail-latency spikes. A model with avg 250ms but p95 6000ms
+ //    gets downgraded from "Perfect" to "Spiky" — because consistency matters more than speed.
  //
  // 📖 The "wasUpBefore" check is key — it distinguishes between a model that's
  //    temporarily flaky vs one that was never reachable in the first place.
@@ -107,8 +117,20 @@ export const getVerdict = (r) => {
  if ((r.status === 'timeout' || r.status === 'down') && wasUpBefore) return 'Unstable'
  if (r.status === 'timeout' || r.status === 'down') return 'Not Active'
  if (avg === Infinity) return 'Pending'
- if (avg < 400) return 'Perfect'
- if (avg < 1000) return 'Normal'
+
+ // 📖 Stability-aware verdict: penalize models with good avg but terrible tail latency
+ const successfulPings = (r.pings || []).filter(p => p.code === '200')
+ const p95 = getP95(r)
+
+ if (avg < 400) {
+   // 📖 Only flag as "Spiky" when we have enough data (≥3 pings) to judge stability
+   if (successfulPings.length >= 3 && p95 > 3000) return 'Spiky'
+   return 'Perfect'
+ }
+ if (avg < 1000) {
+   if (successfulPings.length >= 3 && p95 > 5000) return 'Spiky'
+   return 'Normal'
+ }
  if (avg < 3000) return 'Slow'
  if (avg < 5000) return 'Very Slow'
  if (avg < 10000) return 'Unstable'
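The new branching can be exercised in isolation. The sketch below re-derives `getAvg`, `getP95`, and the speed-tier half of `getVerdict` from this hunk (the timeout/down/Pending branches are omitted; the `{ code, ms }` ping shape matches what the filter above selects on):

```javascript
// Standalone re-derivation of the stability-aware speed tiers (sketch, not the shipped module).
const ok = (r) => (r.pings || []).filter(p => p.code === '200')

const getAvg = (r) => {
  const s = ok(r)
  if (s.length === 0) return Infinity
  return s.reduce((sum, p) => sum + p.ms, 0) / s.length
}

const getP95 = (r) => {
  const s = ok(r)
  if (s.length === 0) return Infinity
  const sorted = s.map(p => p.ms).sort((a, b) => a - b)
  return sorted[Math.max(0, Math.ceil(sorted.length * 0.95) - 1)]
}

// Speed-tier portion of getVerdict; status handling is elided here.
const speedVerdict = (r) => {
  const avg = getAvg(r)
  const p95 = getP95(r)
  const n = ok(r).length
  if (avg < 400) return (n >= 3 && p95 > 3000) ? 'Spiky' : 'Perfect'
  if (avg < 1000) return (n >= 3 && p95 > 5000) ? 'Spiky' : 'Normal'
  if (avg < 3000) return 'Slow'
  if (avg < 5000) return 'Very Slow'
  return 'Unstable'
}

const ping = (ms) => ({ code: '200', ms })
// Nine quick pings plus one 3.4s stall: avg = 385ms (would look "Perfect") but p95 = 3400ms.
const spiky = { pings: [...Array(9).fill(50).map(ping), ping(3400)] }
// Ten consistent 300ms pings: same speed class, no tail spikes.
const steady = { pings: Array(10).fill(300).map(ping) }

console.log(speedVerdict(spiky))  // → 'Spiky'
console.log(speedVerdict(steady)) // → 'Perfect'
```

Note how the ≥3-ping guard keeps a single early outlier from flagging a model before there is enough data to judge it.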
@@ -125,21 +147,84 @@ export const getUptime = (r) => {
  return Math.round((successful / r.pings.length) * 100)
  }

+ // 📖 getP95: Calculate the 95th percentile latency from successful pings (HTTP 200).
+ // 📖 The p95 answers: "95% of requests are faster than this value."
+ // 📖 A low p95 means consistently fast responses — a high p95 signals tail-latency spikes.
+ // 📖 Returns Infinity when no successful pings exist.
+ //
+ // 📖 Algorithm: sort latencies ascending, pick the value at ceil(N * 0.95) - 1.
+ // 📖 Example: [100, 200, 300, 400, 5000] → p95 index = ceil(5 * 0.95) - 1 = 4 → 5000ms
+ export const getP95 = (r) => {
+   const successfulPings = (r.pings || []).filter(p => p.code === '200')
+   if (successfulPings.length === 0) return Infinity
+   const sorted = successfulPings.map(p => p.ms).sort((a, b) => a - b)
+   const idx = Math.ceil(sorted.length * 0.95) - 1
+   return sorted[Math.max(0, idx)]
+ }
+
+ // 📖 getJitter: Calculate latency standard deviation (σ) from successful pings.
+ // 📖 Low jitter = predictable response times. High jitter = erratic, spiky latency.
+ // 📖 Returns 0 when fewer than 2 successful pings (can't compute variance from 1 point).
+ // 📖 Uses population σ (divides by N, not N-1) since we have ALL the data, not a sample.
+ export const getJitter = (r) => {
+   const successfulPings = (r.pings || []).filter(p => p.code === '200')
+   if (successfulPings.length < 2) return 0
+   const mean = successfulPings.reduce((a, b) => a + b.ms, 0) / successfulPings.length
+   const variance = successfulPings.reduce((sum, p) => sum + (p.ms - mean) ** 2, 0) / successfulPings.length
+   return Math.round(Math.sqrt(variance))
+ }
+
+ // 📖 getStabilityScore: Composite 0–100 score that rewards consistency and reliability.
+ // 📖 Combines four signals into a single number:
+ //    - p95 latency (30%) — penalizes tail-latency spikes
+ //    - Jitter / σ (30%) — penalizes erratic response times
+ //    - Spike rate (20%) — fraction of pings above 3000ms threshold
+ //    - Uptime / reliability (20%) — fraction of successful pings
+ //
+ // 📖 Each component is normalized to 0–100, then weighted and combined.
+ // 📖 Returns -1 when no successful pings exist (not enough data yet).
+ //
+ // 📖 Example:
+ //    Model A: avg 250ms, p95 6000ms (tons of spikes) → score ~30
+ //    Model B: avg 400ms, p95 650ms (boringly consistent) → score ~85
+ //    In real usage, Model B FEELS faster because it doesn't randomly stall.
+ export const getStabilityScore = (r) => {
+   const successfulPings = (r.pings || []).filter(p => p.code === '200')
+   if (successfulPings.length === 0) return -1
+
+   const p95 = getP95(r)
+   const jitter = getJitter(r)
+   const uptime = getUptime(r)
+   const spikeCount = successfulPings.filter(p => p.ms > 3000).length
+   const spikeRate = spikeCount / successfulPings.length
+
+   // 📖 Normalize each component to 0–100 (higher = better)
+   const p95Score = Math.max(0, Math.min(100, 100 * (1 - p95 / 5000)))
+   const jitterScore = Math.max(0, Math.min(100, 100 * (1 - jitter / 2000)))
+   const spikeScore = Math.max(0, 100 * (1 - spikeRate))
+   const reliabilityScore = uptime
+
+   // 📖 Weighted composite: 30% p95, 30% jitter, 20% spikes, 20% reliability
+   const score = 0.3 * p95Score + 0.3 * jitterScore + 0.2 * spikeScore + 0.2 * reliabilityScore
+   return Math.round(score)
+ }
+
  // 📖 sortResults: Sort the results array by any column the user can click/press in the TUI.
  // 📖 Returns a NEW array — never mutates the original (important for React-style re-renders).
  //
  // 📖 Supported columns (matching the keyboard shortcuts in the TUI):
- //   - 'rank' (R key) — original index from sources.js
- //   - 'tier' (T key) — tier hierarchy (S+ first, C last)
- //   - 'origin' (O key) — provider name (all NIM for now, future-proofed)
- //   - 'model' (M key) — alphabetical by display label
- //   - 'ping' (L key) — last ping latency (only successful ones count)
- //   - 'avg' (A key) — average latency across all successful pings
- //   - 'swe' (S key) — SWE-bench score (higher is better)
- //   - 'ctx' (N key) — context window size (larger is better)
- //   - 'condition' (H key) — health status (alphabetical)
- //   - 'verdict' (V key) — verdict order (Perfect → Pending)
- //   - 'uptime' (U key) — uptime percentage
+ //   - 'rank' (R key) — original index from sources.js
+ //   - 'tier' (T key) — tier hierarchy (S+ first, C last)
+ //   - 'origin' (O key) — provider name (all NIM for now, future-proofed)
+ //   - 'model' (M key) — alphabetical by display label
+ //   - 'ping' (L key) — last ping latency (only successful ones count)
+ //   - 'avg' (A key) — average latency across all successful pings
+ //   - 'swe' (S key) — SWE-bench score (higher is better)
+ //   - 'ctx' (N key) — context window size (larger is better)
+ //   - 'condition' (H key) — health status (alphabetical)
+ //   - 'verdict' (V key) — verdict order (Perfect → Pending)
+ //   - 'uptime' (U key) — uptime percentage
+ //   - 'stability' (B key) — stability score (0–100, higher = more stable)
  //
  // 📖 sortDirection 'asc' = ascending (smallest first), 'desc' = descending (largest first)
  export const sortResults = (results, sortColumn, sortDirection) => {
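To see how the 30/30/20/20 weighting plays out on concrete numbers, the composite can be re-derived and run on two hypothetical ping histories. This is a sketch of the formula in the hunk above, with `getUptime` inlined as the success ratio:

```javascript
// Re-derivation of the composite stability score (sketch mirroring this diff's hunk).
const ok = (r) => (r.pings || []).filter(p => p.code === '200')

const getP95 = (r) => {
  const s = ok(r).map(p => p.ms).sort((a, b) => a - b)
  if (s.length === 0) return Infinity
  return s[Math.max(0, Math.ceil(s.length * 0.95) - 1)]
}

const getJitter = (r) => {
  const s = ok(r)
  if (s.length < 2) return 0
  const mean = s.reduce((a, p) => a + p.ms, 0) / s.length
  const variance = s.reduce((sum, p) => sum + (p.ms - mean) ** 2, 0) / s.length
  return Math.round(Math.sqrt(variance))
}

const getStabilityScore = (r) => {
  const s = ok(r)
  if (s.length === 0) return -1
  const p95 = getP95(r)
  const jitter = getJitter(r)
  const uptime = Math.round((s.length / r.pings.length) * 100) // getUptime inlined
  const spikeRate = s.filter(p => p.ms > 3000).length / s.length
  const p95Score = Math.max(0, Math.min(100, 100 * (1 - p95 / 5000)))
  const jitterScore = Math.max(0, Math.min(100, 100 * (1 - jitter / 2000)))
  const spikeScore = Math.max(0, 100 * (1 - spikeRate))
  return Math.round(0.3 * p95Score + 0.3 * jitterScore + 0.2 * spikeScore + 0.2 * uptime)
}

const ping = (ms) => ({ code: '200', ms })
// "Fast on paper": nine 100ms pings plus one 6s spike → p95 blown out, jitter σ = 1770ms.
const spiky = { pings: [...Array(9).fill(100).map(ping), ping(6000)] }
// "Boringly consistent": ten 500ms pings → slower average, zero jitter, no spikes.
const steady = { pings: Array(10).fill(500).map(ping) }

console.log(getStabilityScore(spiky))  // → 41
console.log(getStabilityScore(steady)) // → 97
```

The slower-but-steady history wins decisively, which is exactly the trade-off the README's "400ms avg beats a spiking 250ms avg" claim describes.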
@@ -219,6 +304,11 @@ export const sortResults = (results, sortColumn, sortDirection) => {
      case 'uptime':
        cmp = getUptime(a) - getUptime(b)
        break
+     case 'stability':
+       // 📖 Sort by stability score — higher = more stable = better
+       // 📖 Models with no data (-1) sort to the bottom
+       cmp = getStabilityScore(a) - getStabilityScore(b)
+       break
    }

    // 📖 Flip comparison for descending order
@@ -242,16 +332,19 @@ export function filterByTier(results, tierLetter) {
  // 📖 findBestModel: Pick the single best model from a results array.
  // 📖 Used by --fiable mode to output the most reliable model after 10s of analysis.
  //
- // 📖 Selection priority (tri-key sort):
+ // 📖 Selection priority (quad-key sort):
  //   1. Status: "up" models always beat non-up models
  //   2. Average latency: faster average wins (lower is better)
- //   3. Uptime %: higher uptime wins as tiebreaker
+ //   3. Stability score: higher stability wins (more consistent = better)
+ //   4. Uptime %: higher uptime wins as final tiebreaker
  //
  // 📖 Returns null if the array is empty.
  export function findBestModel(results) {
    const sorted = [...results].sort((a, b) => {
      const avgA = getAvg(a)
      const avgB = getAvg(b)
+     const stabilityA = getStabilityScore(a)
+     const stabilityB = getStabilityScore(b)
      const uptimeA = getUptime(a)
      const uptimeB = getUptime(b)

@@ -262,7 +355,10 @@ export function findBestModel(results) {
      // 📖 Priority 2: Lower average latency = faster = better
      if (avgA !== avgB) return avgA - avgB

-     // 📖 Priority 3: Higher uptime = more reliable = better (tiebreaker)
+     // 📖 Priority 3: Higher stability = more consistent = better
+     if (stabilityA !== stabilityB) return stabilityB - stabilityA
+
+     // 📖 Priority 4: Higher uptime = more reliable = better (final tiebreaker)
      return uptimeB - uptimeA
    })

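The effect of the new third sort key shows up when two reachable models tie on average latency. This sketch re-derives the comparator on precomputed stats: the priority-1 status check is an assumption written from the comment (the hunk only shows priorities 2 through 4), and the model objects are hypothetical.

```javascript
// Sketch of the quad-key comparator from findBestModel, run on precomputed stats.
// (avg/stability/uptime are hypothetical precomputed values, not live pings.)
const best = (models) => [...models].sort((a, b) => {
  // Priority 1: "up" models always beat non-up models (assumed form; not shown in the hunk)
  if (a.status === 'up' && b.status !== 'up') return -1
  if (b.status === 'up' && a.status !== 'up') return 1
  // Priority 2: lower average latency wins
  if (a.avg !== b.avg) return a.avg - b.avg
  // Priority 3 (new in 0.1.67): higher stability wins
  if (a.stability !== b.stability) return b.stability - a.stability
  // Priority 4: higher uptime as final tiebreaker
  return b.uptime - a.uptime
})[0]

const models = [
  { name: 'erratic', status: 'up', avg: 500, stability: 42, uptime: 100 },
  { name: 'steady',  status: 'up', avg: 500, stability: 95, uptime: 90 },
  { name: 'down',    status: 'timeout', avg: 120, stability: 99, uptime: 10 },
]

console.log(best(models).name) // → 'steady'
```

Under 0.1.66's tri-key sort the same tie would have gone to 'erratic' on raw uptime; the stability key now breaks it first.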
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "free-coding-models",
-   "version": "0.1.66",
+   "version": "0.1.67",
    "description": "Find the fastest coding LLM models in seconds — ping free models from multiple providers, pick the best one for OpenCode, Cursor, or any AI coding assistant.",
    "keywords": [
      "nvidia",