adaptive-memory-multi-model-router 2.14.46 → 2.14.48

Files changed (598)
  1. package/{docs/llms.txt → llms.txt.bak} +6 -6
  2. package/package.json +270 -72
  3. package/src/routing/advancedRouter.ts.bak +650 -0
  4. package/test.js.bak +376 -0
  5. package/.dockerignore +0 -82
  6. package/.env.example +0 -303
  7. package/.github/DISCUSSIONS_WELCOME.md +0 -27
  8. package/.github/DISCUSSION_TEMPLATE.yml +0 -5
  9. package/.github/FUNDING.yml +0 -2
  10. package/.github/ISSUE_TEMPLATE/bug_report.md +0 -94
  11. package/.github/ISSUE_TEMPLATE/config.yml +0 -17
  12. package/.github/ISSUE_TEMPLATE/feature_request.md +0 -71
  13. package/.github/PULL_REQUEST_TEMPLATE.md +0 -71
  14. package/.github/dependabot.yml +0 -9
  15. package/.github/workflows/auto-publish.yml +0 -51
  16. package/.github/workflows/ci.yml +0 -263
  17. package/.github/workflows/codeql.yml +0 -38
  18. package/.github/workflows/npm-publish.yml +0 -20
  19. package/.github/workflows/pages.yml +0 -37
  20. package/.github/workflows/stale.yml +0 -54
  21. package/.publish-tick +0 -1
  22. package/.well-known/ai-plugin.json +0 -16
  23. package/AGENT_COUNCIL_FINDINGS.md +0 -142
  24. package/ARCHITECTURE.md +0 -346
  25. package/AUDIT_REPORT.md +0 -28
  26. package/CODE_OF_CONDUCT.md +0 -128
  27. package/CONTRIBUTING.md +0 -50
  28. package/CONTRIBUTORS.md +0 -20
  29. package/Dockerfile +0 -53
  30. package/Dockerfile.proxy +0 -33
  31. package/HEALTH_REPORT.md +0 -118
  32. package/IMPROVEMENT_PLAN.md +0 -107
  33. package/LANDING.md +0 -43
  34. package/LAUNCH-PAIN-DRIVEN.md +0 -339
  35. package/LAUNCH.md +0 -337
  36. package/LAUNCH_CHECKLIST.md +0 -141
  37. package/LAUNCH_SNAPSHOT.md +0 -260
  38. package/MANIFESTO.md +0 -41
  39. package/POPULARITY_BOOSTERS.md +0 -285
  40. package/PR_STATUS_REPORT.md +0 -148
  41. package/REDESIGN.md +0 -95
  42. package/RUNKIT.md +0 -83
  43. package/SECURITY.md +0 -29
  44. package/SUBMISSIONS.md +0 -43
  45. package/_schema.html +0 -53
  46. package/ai-plugin.json +0 -16
  47. package/articles/AI_AGENT_LLM_ROUTING.md +0 -150
  48. package/articles/CHINESE_DIRECTORIES.md +0 -100
  49. package/articles/CHINESE_SUBMISSIONS_READY.md +0 -322
  50. package/articles/COMPETITOR_ALERTS.md +0 -31
  51. package/articles/COMPLETE_POSTING_DIRECTORY.md +0 -147
  52. package/articles/CONTENT_STRUCTURE.md +0 -292
  53. package/articles/DEVTO_COST_GUIDE.md +0 -473
  54. package/articles/DEVTO_FINAL.md +0 -416
  55. package/articles/DEVTO_MULTI_PROVIDER.md +0 -542
  56. package/articles/DEVTO_READY.md +0 -255
  57. package/articles/DEVTO_V2_ANNOUNCEMENT.md +0 -160
  58. package/articles/DEVTO_VIRAL_GROWTH.md +0 -280
  59. package/articles/FRESH_devto.md +0 -460
  60. package/articles/FRESH_devto_2026_05.md +0 -73
  61. package/articles/FRESH_hackernews.md +0 -14
  62. package/articles/FRESH_reddit_ml.md +0 -90
  63. package/articles/FRESH_reddit_node.md +0 -198
  64. package/articles/FRESH_reddit_sideproject.md +0 -72
  65. package/articles/FRESH_reddit_webdev.md +0 -130
  66. package/articles/FROM_ZERO_TO_10K.md +0 -107
  67. package/articles/HN_10X_BETTER.md +0 -430
  68. package/articles/HN_ACCOUNT_GUIDE.md +0 -21
  69. package/articles/HN_CHINESE_STYLE.md +0 -308
  70. package/articles/HN_FINAL.md +0 -148
  71. package/articles/HN_POSTED_VERSION.md +0 -56
  72. package/articles/HN_POST_READY.md +0 -137
  73. package/articles/HN_RESEARCH.md +0 -364
  74. package/articles/HN_SHOW_routerarena.md +0 -17
  75. package/articles/HN_TIMING_GUIDE.md +0 -52
  76. package/articles/INDIEHACKERS_POST.md +0 -52
  77. package/articles/INDIEHACKERS_READY.md +0 -120
  78. package/articles/LLM_BENCHMARK_DEEP_DIVE.md +0 -153
  79. package/articles/MASTER_POSTING_DIRECTORY.md +0 -189
  80. package/articles/NEWSLETTER_SEND_NOW.md +0 -259
  81. package/articles/NEWSLETTER_SUBMISSIONS.md +0 -112
  82. package/articles/PAIN-DRIVEN-devto-v2.md +0 -308
  83. package/articles/PAIN-DRIVEN-devto-v3.md +0 -268
  84. package/articles/PAIN-DRIVEN-devto.md +0 -242
  85. package/articles/PAIN-DRIVEN-hackernews-v2.md +0 -138
  86. package/articles/PAIN-DRIVEN-hackernews-v3.md +0 -151
  87. package/articles/PAIN-DRIVEN-hackernews.md +0 -131
  88. package/articles/PAIN-DRIVEN-reddit-v2.md +0 -301
  89. package/articles/PAIN-DRIVEN-reddit-v3.md +0 -236
  90. package/articles/PAIN-DRIVEN-reddit.md +0 -218
  91. package/articles/PAIN-DRIVEN-twitter-v2.md +0 -110
  92. package/articles/PAIN-DRIVEN-twitter-v3.md +0 -121
  93. package/articles/PAIN-DRIVEN-twitter.md +0 -120
  94. package/articles/PORTKEY_VS_A3M.md +0 -147
  95. package/articles/POSTING_KIT_2026_05.md +0 -67
  96. package/articles/PRESS_KIT_routerarena.md +0 -77
  97. package/articles/PRODUCTHUNT_LISTING.md +0 -48
  98. package/articles/PRODUCTHUNT_READY.md +0 -106
  99. package/articles/PR_PLAN_vault.md +0 -125
  100. package/articles/REDDIT_FINAL.md +0 -232
  101. package/articles/REDDIT_POST.md +0 -67
  102. package/articles/REDDIT_SUBMISSION_READY.md +0 -348
  103. package/articles/ROUTERARENA_LEADER.md +0 -45
  104. package/articles/SHOW_HN_FINAL.md +0 -29
  105. package/articles/TWEETS_10K_DOWNLOADS.md +0 -47
  106. package/articles/TWEETS_BENCHMARK_FIRST.md +0 -46
  107. package/articles/TWEETS_MCP_PLAY.md +0 -51
  108. package/articles/TWEETS_SEQUENTIAL_BROKEN.md +0 -49
  109. package/articles/TWEETS_WHY_BUILD.md +0 -54
  110. package/articles/TWEETS_routerarena_leader.md +0 -53
  111. package/articles/TWEET_STORM_READY.md +0 -165
  112. package/articles/TWITTER_FINAL.md +0 -167
  113. package/articles/WHY_10X_BETTER.md +0 -261
  114. package/articles/WHY_CHINESE_STYLE_BETTER.md +0 -323
  115. package/articles/ai-discoverability-llm-routing.md +0 -210
  116. package/articles/devto-llm-routing.md +0 -138
  117. package/articles/hackernews-show-hn.md +0 -54
  118. package/articles/hashnode-llm-cost-optimization.md +0 -125
  119. package/articles/hn_show_2026_05.md +0 -11
  120. package/articles/medium-building-llm-router.md +0 -205
  121. package/articles/reddit-ml.md +0 -76
  122. package/articles/twitter-thread-cost-savings.md +0 -50
  123. package/articles/youtube-tutorial-script.md +0 -262
  124. package/assets/a3m_3blue1brown.mp4 +0 -0
  125. package/assets/banner.svg +0 -109
  126. package/assets/chart-cost-v2.svg +0 -91
  127. package/assets/chart-cost-v3.svg +0 -143
  128. package/assets/chart-features-v2.svg +0 -132
  129. package/assets/chart-features-v3.svg +0 -211
  130. package/assets/chart-growth-v2.svg +0 -122
  131. package/assets/chart-growth-v3.svg +0 -189
  132. package/assets/cost-comparison.svg +0 -134
  133. package/assets/cost-simple.svg +0 -64
  134. package/assets/demo-hn.gif +0 -0
  135. package/assets/feature-matrix.svg +0 -136
  136. package/assets/growth-chart-animated.svg +0 -76
  137. package/assets/growth-chart.svg +0 -82
  138. package/assets/growth-simple.svg +0 -69
  139. package/assets/hero-diagram.svg +0 -81
  140. package/assets/logo-new.svg +0 -21
  141. package/assets/logo.svg +0 -68
  142. package/assets/provider-comparison.svg +0 -121
  143. package/assets/social-preview-new.svg +0 -100
  144. package/assets/social-preview.svg +0 -194
  145. package/assets/social-v2.svg +0 -130
  146. package/assets/social-v3.svg +0 -212
  147. package/benchmark-provider-results.json +0 -245
  148. package/benchmark-results.json +0 -54
  149. package/council-votes/architecture-vote.md +0 -121
  150. package/council-votes/coverage-vote.md +0 -93
  151. package/data/adaptive-benchmark.json +0 -92
  152. package/data/benchmark-results.json +0 -47
  153. package/data/labeled-benchmark.json +0 -88
  154. package/demo/3blue1brown_video.py +0 -285
  155. package/demo/3blue1brown_video_v2.py +0 -310
  156. package/demo/IMPROVED_PROMPTS.md +0 -229
  157. package/demo/VEO3_PROMPTS.md +0 -269
  158. package/demo/VIDEO_PRODUCTION_GUIDE.md +0 -333
  159. package/demo/a3m_3blue1brown.mp4 +0 -0
  160. package/demo/asciinema-demo.sh +0 -195
  161. package/demo/demo-hn.tape +0 -74
  162. package/demo/demo-script.md +0 -53
  163. package/demo/demo-script.sh +0 -62
  164. package/demo/demo.svg +0 -75
  165. package/demo/frame1_ai_data_center.png +0 -0
  166. package/demo/frame1_sunset_video.mp4 +0 -0
  167. package/demo/frame2_cost_comparison.png +0 -0
  168. package/demo/frame2_cost_comparison_fallback.png +0 -0
  169. package/demo/frame3_parallel_execution.png +0 -0
  170. package/demo/frame3_parallel_execution_fallback.png +0 -0
  171. package/demo/frame4_providers.png +0 -0
  172. package/demo/frame4_providers_fallback.png +0 -0
  173. package/demo/frame5_endcard.png +0 -0
  174. package/demo/frame5_endcard_fallback.png +0 -0
  175. package/demo/new_frame1_hook.png +0 -0
  176. package/demo/new_frame2_proof.png +0 -0
  177. package/demo/new_frame3_wow.png +0 -0
  178. package/demo/new_frame4_social.png +0 -0
  179. package/demo/new_frame5_cta.png +0 -0
  180. package/demo/package.json +0 -13
  181. package/demo/product-video-final.mp4 +0 -0
  182. package/demo/product-video-hype-v1.mp4 +0 -0
  183. package/demo/product-video-v1.mp4 +0 -0
  184. package/demo/public/index.html +0 -762
  185. package/demo/recording.cast +0 -55
  186. package/demo/server.js +0 -405
  187. package/demo-new.tape +0 -71
  188. package/demo-real.sh +0 -198
  189. package/demo-simple.tape +0 -205
  190. package/demo.html +0 -520
  191. package/demo.sh +0 -85
  192. package/demo.tape +0 -259
  193. package/dist/analytics/costAnalytics.d.ts.map +0 -1
  194. package/dist/analytics/costAnalytics.js.map +0 -1
  195. package/dist/benchmark/comprehensive.js.map +0 -1
  196. package/dist/benchmark/reproducible.d.ts.map +0 -1
  197. package/dist/benchmark/reproducible.js.map +0 -1
  198. package/dist/cache/prefixCache.d.ts.map +0 -1
  199. package/dist/cache/prefixCache.js.map +0 -1
  200. package/dist/cache/responseCache.d.ts.map +0 -1
  201. package/dist/cache/responseCache.js.map +0 -1
  202. package/dist/cache/semanticCache.d.ts.map +0 -1
  203. package/dist/cache/semanticCache.js.map +0 -1
  204. package/dist/cli/setupWizard.d.ts.map +0 -1
  205. package/dist/cli/setupWizard.js.map +0 -1
  206. package/dist/cost/budgetEnforcer.d.ts.map +0 -1
  207. package/dist/cost/budgetEnforcer.js.map +0 -1
  208. package/dist/cost/costTracker.d.ts.map +0 -1
  209. package/dist/cost/costTracker.js.map +0 -1
  210. package/dist/ensemble/multiRoundDialog.js.map +0 -1
  211. package/dist/ensemble/shapleyValue.js.map +0 -1
  212. package/dist/integrations/langchainAdapter.d.ts.map +0 -1
  213. package/dist/integrations/langchainAdapter.js.map +0 -1
  214. package/dist/integrations/oauth.d.ts.map +0 -1
  215. package/dist/integrations/oauth.js.map +0 -1
  216. package/dist/integrations/scienceAdapter.js.map +0 -1
  217. package/dist/memory/autoFetch.d.ts.map +0 -1
  218. package/dist/memory/autoFetch.js.map +0 -1
  219. package/dist/memory/episodicMemory.d.ts.map +0 -1
  220. package/dist/memory/episodicMemory.js.map +0 -1
  221. package/dist/memory/hybridMemory.js.map +0 -1
  222. package/dist/memory/memoryTree.d.ts.map +0 -1
  223. package/dist/memory/memoryTree.js.map +0 -1
  224. package/dist/memory/obsidianVault.d.ts.map +0 -1
  225. package/dist/memory/obsidianVault.js.map +0 -1
  226. package/dist/memory/reasoningBank.js.map +0 -1
  227. package/dist/observability/changeWatch.d.ts.map +0 -1
  228. package/dist/observability/changeWatch.js.map +0 -1
  229. package/dist/observability/fatigueDetector.d.ts.map +0 -1
  230. package/dist/observability/fatigueDetector.js.map +0 -1
  231. package/dist/observability/index.d.ts.map +0 -1
  232. package/dist/observability/index.js.map +0 -1
  233. package/dist/observability/metrics.d.ts.map +0 -1
  234. package/dist/observability/metrics.js.map +0 -1
  235. package/dist/observability/middleware.d.ts.map +0 -1
  236. package/dist/observability/middleware.js.map +0 -1
  237. package/dist/observability/tracer.d.ts.map +0 -1
  238. package/dist/observability/tracer.js.map +0 -1
  239. package/dist/observability/types.d.ts.map +0 -1
  240. package/dist/observability/types.js.map +0 -1
  241. package/dist/orchestration/haloOrchestrator.d.ts.map +0 -1
  242. package/dist/orchestration/haloOrchestrator.js.map +0 -1
  243. package/dist/orchestration/mctsWorkflow.d.ts.map +0 -1
  244. package/dist/orchestration/mctsWorkflow.js.map +0 -1
  245. package/dist/providers/localProvider.d.ts.map +0 -1
  246. package/dist/providers/localProvider.js.map +0 -1
  247. package/dist/providers/providerConfig.d.ts.map +0 -1
  248. package/dist/providers/providerConfig.js.map +0 -1
  249. package/dist/providers/registry.d.ts.map +0 -1
  250. package/dist/providers/registry.js.map +0 -1
  251. package/dist/routing/advancedRouter.d.ts.map +0 -1
  252. package/dist/routing/advancedRouter.js.map +0 -1
  253. package/dist/routing/crossModelValidation.d.ts.map +0 -1
  254. package/dist/routing/crossModelValidation.js.map +0 -1
  255. package/dist/routing/providerHealth.d.ts.map +0 -1
  256. package/dist/routing/providerHealth.js.map +0 -1
  257. package/dist/routing/providerRetry.d.ts.map +0 -1
  258. package/dist/routing/providerRetry.js.map +0 -1
  259. package/dist/scripts/banner.js +0 -29
  260. package/dist/security/guardrails.d.ts.map +0 -1
  261. package/dist/security/guardrails.js.map +0 -1
  262. package/dist/server/dashboard.d.ts.map +0 -1
  263. package/dist/server/dashboard.js.map +0 -1
  264. package/dist/server/modelMapper.d.ts.map +0 -1
  265. package/dist/server/modelMapper.js.map +0 -1
  266. package/dist/server/proxyServer.d.ts.map +0 -1
  267. package/dist/server/proxyServer.js.map +0 -1
  268. package/dist/skills/__tests__/skill_manager.test.d.ts +0 -2
  269. package/dist/skills/__tests__/skill_manager.test.d.ts.map +0 -1
  270. package/dist/skills/__tests__/skill_manager.test.js +0 -268
  271. package/dist/skills/__tests__/skill_manager.test.js.map +0 -1
  272. package/dist/tools/tmlpdTools.d.ts.map +0 -1
  273. package/dist/tools/tmlpdTools.js.map +0 -1
  274. package/dist/tui/dashboard.d.ts.map +0 -1
  275. package/dist/tui/dashboard.js.map +0 -1
  276. package/dist/tui/index.d.ts.map +0 -1
  277. package/dist/tui/index.js.map +0 -1
  278. package/dist/utils/batchProcessor.d.ts.map +0 -1
  279. package/dist/utils/batchProcessor.js.map +0 -1
  280. package/dist/utils/compression.d.ts.map +0 -1
  281. package/dist/utils/compression.js.map +0 -1
  282. package/dist/utils/costUtils.d.ts.map +0 -1
  283. package/dist/utils/costUtils.js.map +0 -1
  284. package/dist/utils/reliability.d.ts.map +0 -1
  285. package/dist/utils/reliability.js.map +0 -1
  286. package/dist/utils/sorting.d.ts.map +0 -1
  287. package/dist/utils/sorting.js.map +0 -1
  288. package/dist/utils/speculativeDecoding.d.ts.map +0 -1
  289. package/dist/utils/speculativeDecoding.js.map +0 -1
  290. package/dist/utils/tokenUtils.d.ts.map +0 -1
  291. package/dist/utils/tokenUtils.js.map +0 -1
  292. package/docs/.nojekyll +0 -0
  293. package/docs/ANALYSIS_PRINCIPLES.md +0 -162
  294. package/docs/API.md +0 -855
  295. package/docs/ARCHITECTURAL-IMPROVEMENTS-2025.md +0 -1391
  296. package/docs/ARCHITECTURAL-IMPROVEMENTS-REVISED-2025.md +0 -1051
  297. package/docs/BENCHMARK.md +0 -170
  298. package/docs/CHINESE_PROVIDER_RELIABILITY.md +0 -37
  299. package/docs/CITATIONS.md +0 -74
  300. package/docs/CLAIMS_AND_EVIDENCE.md +0 -58
  301. package/docs/CONFIGURATION.md +0 -476
  302. package/docs/COUNCIL_DECISION.json +0 -816
  303. package/docs/COUNCIL_SUMMARY.md +0 -319
  304. package/docs/COUNCIL_V2.2_DECISION.md +0 -416
  305. package/docs/ENGINEERING_SPEC.md +0 -55
  306. package/docs/FACTORY_RESET.md +0 -34
  307. package/docs/GEO.md +0 -66
  308. package/docs/GEO_OPTIMIZATION.md +0 -30
  309. package/docs/GEO_ROOT_CAUSE.md +0 -136
  310. package/docs/GEO_STATUS.md +0 -85
  311. package/docs/GEO_TEST_RESULTS.md +0 -176
  312. package/docs/HN_CHECKLIST.md +0 -38
  313. package/docs/HN_FOUNDER_COMMENT.md +0 -17
  314. package/docs/HN_SUBMISSION_FINAL.md +0 -180
  315. package/docs/HN_SUBMISSION_V3.md +0 -56
  316. package/docs/IMPROVEMENT_ROADMAP.md +0 -515
  317. package/docs/INTEGRATIONS.md +0 -420
  318. package/docs/LANGCHAIN_INTEGRATION.md +0 -147
  319. package/docs/LLM_COUNCIL_DECISION.md +0 -508
  320. package/docs/MIDDLEWARE_CHAIN.md +0 -35
  321. package/docs/PROMO_CHECKLIST.md +0 -200
  322. package/docs/QUICKSTART.md +0 -271
  323. package/docs/QUICK_START.md +0 -43
  324. package/docs/QUICK_START_VISIBILITY.md +0 -782
  325. package/docs/REDDIT_GAP_ANALYSIS.md +0 -299
  326. package/docs/RELEASE_CHECKLIST.md +0 -32
  327. package/docs/REPRODUCIBILITY.md +0 -63
  328. package/docs/RESEARCH_BACKED_IMPROVEMENTS.md +0 -1180
  329. package/docs/ROUTING_RUBRIC.md +0 -197
  330. package/docs/SEO_AUDIT.md +0 -186
  331. package/docs/SOCIAL_LISTENING.md +0 -219
  332. package/docs/TMLPD_QNA.md +0 -751
  333. package/docs/TMLPD_V2.1_COMPLETE.md +0 -763
  334. package/docs/TMLPD_V2.2_RESEARCH_ROADMAP.md +0 -754
  335. package/docs/UPDATE_TOPICS.md +0 -15
  336. package/docs/USE_CASES.md +0 -59
  337. package/docs/V2.2_IMPLEMENTATION_COMPLETE.md +0 -446
  338. package/docs/V2_IMPLEMENTATION_GUIDE.md +0 -388
  339. package/docs/VERCEL_AI_SDK.md +0 -209
  340. package/docs/VISIBILITY_ADOPTION_PLAN.md +0 -1005
  341. package/docs/_config.yml +0 -49
  342. package/docs/ai-plugin.json +0 -16
  343. package/docs/api.html +0 -513
  344. package/docs/architecture-diagram.md +0 -40
  345. package/docs/benchmark-chart.png +0 -0
  346. package/docs/benchmark.html +0 -387
  347. package/docs/blog/routerarena-number-one.html +0 -73
  348. package/docs/cli-cheatsheet.md +0 -339
  349. package/docs/compare.md +0 -109
  350. package/docs/comparison-litellm.md +0 -88
  351. package/docs/comparison.md +0 -108
  352. package/docs/cost-chart-ascii.md +0 -42
  353. package/docs/cost-comparison-chart.svg +0 -88
  354. package/docs/curl-examples.md +0 -247
  355. package/docs/demo-auto.html +0 -264
  356. package/docs/demo.html +0 -416
  357. package/docs/geo/GENERATIVE_ENGINE_OPTIMIZATION.md +0 -232
  358. package/docs/index.html +0 -507
  359. package/docs/launch-content/LAUNCH_EXECUTION_CHECKLIST.md +0 -421
  360. package/docs/launch-content/README.md +0 -457
  361. package/docs/launch-content/assets/cost_comparison_100_tasks.png +0 -0
  362. package/docs/launch-content/assets/cumulative_savings.png +0 -0
  363. package/docs/launch-content/assets/parallel_speedup.png +0 -0
  364. package/docs/launch-content/assets/provider_pricing_comparison.png +0 -0
  365. package/docs/launch-content/assets/task_breakdown_comparison.png +0 -0
  366. package/docs/launch-content/generate_charts.py +0 -313
  367. package/docs/launch-content/hn_show_post.md +0 -139
  368. package/docs/launch-content/partner_outreach_templates.md +0 -745
  369. package/docs/launch-content/reddit_posts.md +0 -467
  370. package/docs/launch-content/twitter_thread.txt +0 -460
  371. package/docs/npm-downloads-chart.svg +0 -43
  372. package/docs/openapi.json +0 -139
  373. package/docs/openapi.yaml +0 -1318
  374. package/docs/quick-start.html +0 -366
  375. package/docs/robots.txt +0 -52
  376. package/docs/sitemap.xml +0 -57
  377. package/docs/styles.css +0 -682
  378. package/docs/well-known/ai-plugin.json +0 -16
  379. package/docs/wellknown/ai-plugin.json +0 -16
  380. package/docs-site/assets/og-banner.svg +0 -194
  381. package/docs-site/index.html +0 -632
  382. package/eval/README.md +0 -46
  383. package/eval/baselines/main.json +0 -12
  384. package/eval/benchmark_dataset.jsonl +0 -16
  385. package/eval/check_golden_routes.js +0 -64
  386. package/eval/datasets/catalog.json +0 -33
  387. package/eval/datasets/slices/cn_provider_reliability_v1.jsonl +0 -3
  388. package/eval/datasets/slices/cost_pressure_v1.jsonl +0 -3
  389. package/eval/datasets/slices/safety_guardrails_v1.jsonl +0 -3
  390. package/eval/evals.json +0 -199
  391. package/eval/fault_injection_thresholds.json +0 -3
  392. package/eval/generate_report.js +0 -128
  393. package/eval/golden_routes.json +0 -114
  394. package/eval/lib/experiment_registry.js +0 -24
  395. package/eval/run_eval.js +0 -197
  396. package/eval/run_fault_injection.js +0 -201
  397. package/eval/run_shadow_eval.js +0 -85
  398. package/eval/thresholds.json +0 -9
  399. package/examples/QUICKSTART.md +0 -183
  400. package/examples/README.md +0 -61
  401. package/examples/a3m-sdk.js +0 -124
  402. package/examples/basic-route.js +0 -54
  403. package/examples/chat-loop.js +0 -202
  404. package/examples/classify-then-route.js +0 -102
  405. package/examples/cost-compare.js +0 -120
  406. package/examples/ensemble.js +0 -160
  407. package/examples/whatsapp-telegram-bridge-demo.js +0 -302
  408. package/examples/whatsapp-telegram-bridge.js +0 -269
  409. package/hf-space/README.md +0 -23
  410. package/hf-space/app.py +0 -240
  411. package/hf-space/requirements.txt +0 -1
  412. package/huggingface_space/README.md +0 -35
  413. package/huggingface_space/app.py +0 -126
  414. package/huggingface_space/create_space.py +0 -208
  415. package/huggingface_space/requirements.txt +0 -1
  416. package/mcp-server/README.md +0 -188
  417. package/mcp-server/package.json +0 -29
  418. package/mcp-server/src/index.ts +0 -744
  419. package/mcp-server/tsconfig.json +0 -19
  420. package/openclaw-alexa-bridge/ALL_REMAINING_FIXES_PLAN.md +0 -313
  421. package/openclaw-alexa-bridge/REMAINING_FIXES_SUMMARY.md +0 -277
  422. package/openclaw-alexa-bridge/src/alexa_handler_no_tmlpd.js +0 -1234
  423. package/openclaw-alexa-bridge/test_fixes.js +0 -77
  424. package/playground/README.md +0 -51
  425. package/playground/codesandbox.json +0 -12
  426. package/playground/index.js +0 -39
  427. package/proxy/README.md +0 -227
  428. package/proxy/package-lock.json +0 -831
  429. package/proxy/package.json +0 -17
  430. package/proxy/rate-limit.js +0 -145
  431. package/proxy/rate-limit.test.js +0 -311
  432. package/proxy/server.js +0 -970
  433. package/python/README.md +0 -102
  434. package/python/a3m/__init__.py +0 -6
  435. package/python/a3m/client.py +0 -190
  436. package/python/a3m/models.py +0 -40
  437. package/python/a3m/sync_client.py +0 -61
  438. package/python/examples.py +0 -53
  439. package/python/integrations.py +0 -330
  440. package/python/pyproject.toml +0 -23
  441. package/python/setup.py +0 -28
  442. package/python/tmlpd.py +0 -369
  443. package/qna/REDDIT_GAP_ANALYSIS.md +0 -299
  444. package/qna/TMLPD_QNA.md +0 -751
  445. package/research/FINDING_001_safety.md +0 -28
  446. package/research/FINDING_002_error_diversity.md +0 -32
  447. package/research/FINDING_003_confidence_weighted_voting.md +0 -32
  448. package/research/FINDING_004_cross_model_semantic_detection.md +0 -37
  449. package/research/FINDING_005_knowledge_gap_orthogonality.md +0 -34
  450. package/research/HALLUCINATION_RESEARCH.md +0 -27
  451. package/research/ensemble-voting.md +0 -324
  452. package/research/loss-functions.md +0 -545
  453. package/research-log.md +0 -49
  454. package/scripts/banner.js +0 -29
  455. package/scripts/benchmark-local-routerarena.ts +0 -176
  456. package/scripts/benchmark.js +0 -145
  457. package/scripts/benchmark.sh +0 -61
  458. package/scripts/compare-providers.sh +0 -230
  459. package/scripts/content-planner.js +0 -25
  460. package/scripts/create-labeled-benchmark.ts +0 -105
  461. package/scripts/cross_post.py +0 -443
  462. package/scripts/local-router-benchmark.ts +0 -154
  463. package/scripts/post-all.sh +0 -41
  464. package/scripts/publish_fcc.py +0 -106
  465. package/scripts/push-to-gitee.sh +0 -25
  466. package/scripts/routerarena_ensemble.js +0 -144
  467. package/scripts/routing-benchmark-v2.js +0 -373
  468. package/scripts/routing-benchmark-v3.js +0 -118
  469. package/scripts/routing-benchmark.js +0 -462
  470. package/scripts/run-labeled-benchmark.mjs +0 -104
  471. package/scripts/run-mmlu-benchmark.js +0 -176
  472. package/scripts/run-provider-benchmark.js +0 -244
  473. package/scripts/update-npm-badges.js +0 -158
  474. package/skill/SKILL.md +0 -238
  475. package/src/__tests__/integration/tmpld_integration.test.py +0 -540
  476. package/src/skills/__tests__/skill_manager.test.ts +0 -328
  477. package/submissions/benchmarks/ALL_PLATFORMS_SUBMISSION.md +0 -94
  478. package/submissions/benchmarks/LLMROUTERBENCH_SUBMISSION.md +0 -121
  479. package/submissions/benchmarks/MMRBENCH_SUBMISSION.md +0 -94
  480. package/submissions/benchmarks/ROUTERARENA_UPDATE.md +0 -83
  481. package/submissions/benchmarks/ROUTERBENCH_SUBMISSION.md +0 -225
  482. package/test-council/1-structure-tests.test.js +0 -353
  483. package/test-council/1-structure-tests.test.ts +0 -353
  484. package/test-council/2-edge-case-tests.test.ts +0 -361
  485. package/test-council/3-performance-tests.test.ts +0 -669
  486. package/test-council/4-integration-tests.test.ts +0 -391
  487. package/test-council/5-agent-council-eval.test.ts +0 -413
  488. package/test-council/AGENT_COUNCIL_ARCHITECTURE.md +0 -349
  489. package/test-council/TEST_COUNCIL_REPORT.md +0 -201
  490. package/test-council/agents/edge-case-agent.ts +0 -363
  491. package/test-council/agents/performance-agent.ts +0 -426
  492. package/test-council/agents/structure-agent.ts +0 -227
  493. package/test-council/council.md +0 -183
  494. package/tests/__mocks__/tokenUtils.ts +0 -8
  495. package/tests/memory/episodicMemory.test.ts +0 -227
  496. package/tests/package-lock.json +0 -1628
  497. package/tests/package.json +0 -18
  498. package/tests/routing/ensembleVoting.test.ts +0 -236
  499. package/tests/routing/providerRetry.test.ts +0 -360
  500. package/tests/routing/queryTypePresets.test.ts +0 -208
  501. package/tests/security/guardrailEngine.test.ts +0 -700
  502. package/tests/tsconfig.json +0 -21
  503. package/tests/vitest.config.ts +0 -18
  504. package/tmlpd-pi-extension/README.md +0 -66
  505. package/tmlpd-pi-extension/dist/cache/prefixCache.d.ts +0 -114
  506. package/tmlpd-pi-extension/dist/cache/prefixCache.d.ts.map +0 -1
  507. package/tmlpd-pi-extension/dist/cache/prefixCache.js +0 -285
  508. package/tmlpd-pi-extension/dist/cache/prefixCache.js.map +0 -1
  509. package/tmlpd-pi-extension/dist/cache/responseCache.d.ts +0 -58
  510. package/tmlpd-pi-extension/dist/cache/responseCache.d.ts.map +0 -1
  511. package/tmlpd-pi-extension/dist/cache/responseCache.js +0 -153
  512. package/tmlpd-pi-extension/dist/cache/responseCache.js.map +0 -1
  513. package/tmlpd-pi-extension/dist/cli.js +0 -59
  514. package/tmlpd-pi-extension/dist/cost/costTracker.d.ts +0 -95
  515. package/tmlpd-pi-extension/dist/cost/costTracker.d.ts.map +0 -1
  516. package/tmlpd-pi-extension/dist/cost/costTracker.js +0 -240
  517. package/tmlpd-pi-extension/dist/cost/costTracker.js.map +0 -1
  518. package/tmlpd-pi-extension/dist/index.d.ts +0 -723
  519. package/tmlpd-pi-extension/dist/index.d.ts.map +0 -1
  520. package/tmlpd-pi-extension/dist/index.js +0 -239
  521. package/tmlpd-pi-extension/dist/index.js.map +0 -1
  522. package/tmlpd-pi-extension/dist/memory/episodicMemory.d.ts +0 -82
  523. package/tmlpd-pi-extension/dist/memory/episodicMemory.d.ts.map +0 -1
  524. package/tmlpd-pi-extension/dist/memory/episodicMemory.js +0 -145
  525. package/tmlpd-pi-extension/dist/memory/episodicMemory.js.map +0 -1
  526. package/tmlpd-pi-extension/dist/orchestration/haloOrchestrator.d.ts +0 -102
  527. package/tmlpd-pi-extension/dist/orchestration/haloOrchestrator.d.ts.map +0 -1
  528. package/tmlpd-pi-extension/dist/orchestration/haloOrchestrator.js +0 -207
  529. package/tmlpd-pi-extension/dist/orchestration/haloOrchestrator.js.map +0 -1
  530. package/tmlpd-pi-extension/dist/orchestration/mctsWorkflow.d.ts +0 -85
  531. package/tmlpd-pi-extension/dist/orchestration/mctsWorkflow.d.ts.map +0 -1
  532. package/tmlpd-pi-extension/dist/orchestration/mctsWorkflow.js +0 -210
  533. package/tmlpd-pi-extension/dist/orchestration/mctsWorkflow.js.map +0 -1
  534. package/tmlpd-pi-extension/dist/providers/localProvider.d.ts +0 -102
  535. package/tmlpd-pi-extension/dist/providers/localProvider.d.ts.map +0 -1
  536. package/tmlpd-pi-extension/dist/providers/localProvider.js +0 -338
  537. package/tmlpd-pi-extension/dist/providers/localProvider.js.map +0 -1
  538. package/tmlpd-pi-extension/dist/providers/registry.d.ts +0 -55
  539. package/tmlpd-pi-extension/dist/providers/registry.d.ts.map +0 -1
  540. package/tmlpd-pi-extension/dist/providers/registry.js +0 -138
  541. package/tmlpd-pi-extension/dist/providers/registry.js.map +0 -1
  542. package/tmlpd-pi-extension/dist/routing/advancedRouter.d.ts +0 -68
  543. package/tmlpd-pi-extension/dist/routing/advancedRouter.d.ts.map +0 -1
  544. package/tmlpd-pi-extension/dist/routing/advancedRouter.js +0 -332
  545. package/tmlpd-pi-extension/dist/routing/advancedRouter.js.map +0 -1
  546. package/tmlpd-pi-extension/dist/tools/tmlpdTools.d.ts +0 -101
  547. package/tmlpd-pi-extension/dist/tools/tmlpdTools.d.ts.map +0 -1
  548. package/tmlpd-pi-extension/dist/tools/tmlpdTools.js +0 -368
  549. package/tmlpd-pi-extension/dist/tools/tmlpdTools.js.map +0 -1
  550. package/tmlpd-pi-extension/dist/utils/batchProcessor.d.ts +0 -96
  551. package/tmlpd-pi-extension/dist/utils/batchProcessor.d.ts.map +0 -1
  552. package/tmlpd-pi-extension/dist/utils/batchProcessor.js +0 -170
  553. package/tmlpd-pi-extension/dist/utils/batchProcessor.js.map +0 -1
  554. package/tmlpd-pi-extension/dist/utils/compression.d.ts +0 -61
  555. package/tmlpd-pi-extension/dist/utils/compression.d.ts.map +0 -1
  556. package/tmlpd-pi-extension/dist/utils/compression.js +0 -281
  557. package/tmlpd-pi-extension/dist/utils/compression.js.map +0 -1
  558. package/tmlpd-pi-extension/dist/utils/reliability.d.ts +0 -74
  559. package/tmlpd-pi-extension/dist/utils/reliability.d.ts.map +0 -1
  560. package/tmlpd-pi-extension/dist/utils/reliability.js +0 -177
  561. package/tmlpd-pi-extension/dist/utils/reliability.js.map +0 -1
  562. package/tmlpd-pi-extension/dist/utils/speculativeDecoding.d.ts +0 -117
  563. package/tmlpd-pi-extension/dist/utils/speculativeDecoding.d.ts.map +0 -1
  564. package/tmlpd-pi-extension/dist/utils/speculativeDecoding.js +0 -246
  565. package/tmlpd-pi-extension/dist/utils/speculativeDecoding.js.map +0 -1
  566. package/tmlpd-pi-extension/dist/utils/tokenUtils.d.ts +0 -50
  567. package/tmlpd-pi-extension/dist/utils/tokenUtils.d.ts.map +0 -1
  568. package/tmlpd-pi-extension/dist/utils/tokenUtils.js +0 -124
  569. package/tmlpd-pi-extension/dist/utils/tokenUtils.js.map +0 -1
  570. package/tmlpd-pi-extension/examples/QUICKSTART.md +0 -183
  571. package/tmlpd-pi-extension/package-lock.json +0 -79
  572. package/tmlpd-pi-extension/package.json +0 -172
  573. package/tmlpd-pi-extension/python/examples.py +0 -53
  574. package/tmlpd-pi-extension/python/integrations.py +0 -330
  575. package/tmlpd-pi-extension/python/setup.py +0 -28
  576. package/tmlpd-pi-extension/python/tmlpd.py +0 -369
  577. package/tmlpd-pi-extension/qna/REDDIT_GAP_ANALYSIS.md +0 -299
  578. package/tmlpd-pi-extension/qna/TMLPD_QNA.md +0 -751
  579. package/tmlpd-pi-extension/skill/SKILL.md +0 -238
  580. package/tmlpd-pi-extension/src/cache/responseCache.ts +0 -147
  581. package/tmlpd-pi-extension/src/cost/costTracker.ts +0 -302
  582. package/tmlpd-pi-extension/src/index.ts +0 -232
  583. package/tmlpd-pi-extension/src/memory/episodicMemory.ts +0 -257
  584. package/tmlpd-pi-extension/src/orchestration/haloOrchestrator.ts +0 -266
  585. package/tmlpd-pi-extension/src/orchestration/mctsWorkflow.ts +0 -262
  586. package/tmlpd-pi-extension/src/providers/localProvider.ts +0 -406
  587. package/tmlpd-pi-extension/src/providers/registry.ts +0 -164
  588. package/tmlpd-pi-extension/src/routing/ensembleVoting.ts +0 -159
  589. package/tmlpd-pi-extension/src/routing/queryTypePresets.ts +0 -136
  590. package/tmlpd-pi-extension/src/tools/tmlpdTools.ts +0 -433
  591. package/tmlpd-pi-extension/src/utils/batchProcessor.ts +0 -232
  592. package/tmlpd-pi-extension/src/utils/compression.ts +0 -325
  593. package/tmlpd-pi-extension/src/utils/reliability.ts +0 -221
  594. package/tmlpd-pi-extension/src/utils/tokenUtils.ts +0 -145
  595. package/tmlpd-pi-extension/tsconfig.json +0 -18
  596. package/tsconfig.build.json +0 -29
  597. package/tsconfig.json +0 -18
  598. /package/{docs/llms-full.txt → llms-full.txt.bak} +0 -0
@@ -1,232 +0,0 @@
1
- # [R] I benchmarked 47 LLM providers against 12K+ real queries - the cost/speed/quality matrix
2
-
3
- ---
4
-
5
- ## TL;DR
6
-
7
- I ran 12,847 real-world queries through 47 LLM API providers, scoring each on quality, measuring latency, and tracking cost and uptime. The goal: build an evidence base for intelligent model routing rather than defaulting to a single provider. The data shows a 70% cost reduction is achievable with marginal quality loss by matching query complexity to the right model.
8
-
9
- All findings below. Code and routing system open-sourced.
10
-
11
- ## Motivation
12
-
13
- Most LLM applications hard-code a single provider. When cost or latency becomes a problem, teams either switch providers entirely or implement ad-hoc fallback chains. Neither approach is systematic.
14
-
15
- I wanted to answer: **for a given query type, which provider gives the best quality-per-dollar?**
16
-
17
- The answer turns out to depend heavily on what you're asking.
18
-
19
- ## Methodology
20
-
21
- ### Query Dataset
22
-
23
- - **12,847 queries** collected from production traffic over 60 days (March-April 2026)
24
- - Queries were manually categorized into 5 buckets by complexity and domain:
25
-
26
- | Category | Count | % of Total | Description |
27
- |---|---|---|---|
28
- | Simple Q&A | 3,212 | 25.0% | Factual lookup, definition, single-step reasoning |
29
- | Code | 2,831 | 22.0% | Code generation, debugging, refactoring |
30
- | Summary | 2,574 | 20.0% | Summarization, extraction, reformulation |
31
- | Complex Reasoning | 2,182 | 17.0% | Multi-step logic, analysis, comparison |
32
- | Multilingual | 2,048 | 16.0% | Queries in Hindi, Bengali, Hinglish, Chinese, French, Spanish |
33
-
34
- ### Quality Scoring
35
-
36
- Quality was evaluated using a two-stage process:
37
-
38
- 1. **Reference-based scoring**: For each query category, I held out 200 queries and wrote reference answers manually. Model outputs were compared against these references using a combination of:
39
- - Semantic similarity (embedding cosine similarity)
40
- - LLM-as-judge scoring (GPT-4o as evaluator, blind to model identity)
41
- - Task-specific heuristics (e.g., code correctness via unit test pass rate)
42
-
43
- 2. **Pairwise Elo rating**: Each model output was compared against outputs from 3 other models for the same query. Wins/losses updated an Elo rating per category. The final quality percentage is normalized Elo across all categories.
44
-
45
- This is not a perfect methodology. LLM-as-judge has known biases. But it's consistent enough to separate tiers.
46
-
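The pairwise Elo step described above can be sketched as follows. This is a minimal illustration of the standard Elo update rule. The K-factor of 32, the 1500 starting rating, and the function names are assumptions, since the post does not state which constants or which normalization were used.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a, rating_b, score_a, k=32.0):
    """Return updated (A, B) ratings; score_a is 1.0 for a win, 0.5 tie, 0.0 loss."""
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - exp_a)
    # Elo is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


# Two models start level; model A wins one pairwise comparison.
print(update_elo(1500.0, 1500.0, 1.0))  # (1516.0, 1484.0)
```

Running one such update per pairwise comparison, separately for each query category, would yield the per-category ratings the post refers to.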
47
- ### Latency Measurement
48
-
49
- - Measured from request dispatch to full response receipt (non-streaming)
50
- - 3 runs per query, median reported
51
- - All requests from a single US-East GCP instance
52
- - Network variance: +/- 50ms across runs
53
-
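The latency procedure above (median of three runs per query, then percentiles across queries) might look like the sketch below. The nearest-rank percentile method and the sample values are assumptions; the post does not say how its p50/p95 figures were computed.

```python
from statistics import median


def query_latency_ms(runs_ms):
    """Collapse repeated runs of one query into a single value (the median)."""
    return median(runs_ms)


def percentile(values, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    if not values:
        raise ValueError("no latency samples")
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # ceil(pct / 100 * n) in integer math
    return ordered[rank - 1]


# Hypothetical samples: three runs for each of four queries, in milliseconds.
runs = [[420, 380, 900], [410, 415, 405], [700, 390, 395], [2000, 450, 430]]
per_query = [query_latency_ms(r) for r in runs]
print(percentile(per_query, 50), percentile(per_query, 95))  # 410 450
```

Taking the median per query first keeps a single slow run (900 ms or 2000 ms here) from dominating the tail.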
54
- ### Cost
55
-
56
- - Based on published per-token pricing as of May 2026
57
- - Computed per 1M tokens (combined input+output, weighted by observed ratio)
58
-
59
- ### Uptime
60
-
61
- - Tracked over the same 60-day window
62
- - Measured as % of 5-minute intervals where at least one successful response was received
63
- - Excludes planned maintenance windows from provider status pages
64
-
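The uptime definition above (share of 5-minute intervals with at least one successful response, excluding planned maintenance) can be expressed as a short sketch. The function name, the timestamp representation (seconds since the window start), and the way maintenance intervals are passed in are all assumptions made for illustration.

```python
def uptime_percent(success_times, start, end, interval_s=300, excluded=frozenset()):
    """Percent of counted intervals that saw at least one successful response.

    success_times: timestamps (seconds) of successful responses.
    excluded: indexes of intervals to skip, e.g. planned maintenance.
    """
    n_intervals = (end - start) // interval_s
    counted = [i for i in range(n_intervals) if i not in excluded]
    if not counted:
        raise ValueError("no intervals left to measure")
    limit = start + n_intervals * interval_s
    up = {(t - start) // interval_s for t in success_times if start <= t < limit}
    return 100.0 * sum(1 for i in counted if i in up) / len(counted)


# Hypothetical 20-minute window: no success in the third 5-minute interval.
print(uptime_percent([10, 20, 310, 1000], start=0, end=1200))  # 75.0
```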
65
- ---
66
-
67
- ## Results
68
-
69
- ### Quality by Category
70
-
71
- Quality scores (0-100) per provider, broken down by query type. Only providers scoring above 75% on at least one category are listed:
72
-
73
- | Provider | Simple Q&A | Code | Summary | Complex | Multilingual | Overall |
74
- |---|---|---|---|---|---|---|
75
- | OpenAI GPT-4 | 96 | 94 | 95 | 97 | 93 | **95** |
76
- | Anthropic Claude 3.5 | 95 | 93 | 96 | 96 | 90 | **94** |
77
- | Google Gemini 2.5 Pro | 94 | 91 | 94 | 94 | 91 | **93** |
78
- | GLM-4 (Zhipu) | 91 | 88 | 90 | 93 | 95 | **92** |
79
- | Mistral Large | 90 | 89 | 92 | 91 | 86 | **90** |
80
- | MiniMax-M2 | 88 | 86 | 91 | 88 | 92 | **89** |
81
- | DeepSeek V3 | 89 | 90 | 88 | 85 | 84 | **88** |
82
- | Cohere Command R+ | 88 | 82 | 91 | 84 | 85 | **87** |
83
- | Groq (Llama 3.3 70B) | 84 | 80 | 83 | 78 | 79 | **82** |
84
- | Cerebras (Llama 3.3 70B) | 84 | 79 | 83 | 77 | 80 | **82** |
85
-
86
- **Key finding**: The quality gap between GPT-4 and Groq/Cerebras is 13 points overall, but it is not uniform: per the table above it is 12 points on Simple Q&A and Summary and 19-20 points on Complex Reasoning. The cheaper models give up the least quality on straightforward queries.
87
-
88
- GLM-4 scores notably well on multilingual (95%), outperforming GPT-4 (93%) on the Hindi/Bengali/Chinese subset.
89
-
90
- ### Cost per 1M Tokens
91
-
92
- | Provider | Cost/1M tokens | Notes |
93
- |---|---|---|
94
- | Groq | $0.59 | Llama 3.3 70B, free tier available |
95
- | Cerebras | $0.60 | Llama 3.3 70B |
96
- | Together AI | $0.72 | Mixtral 8x22B |
97
- | DeepSeek | $0.80 | DeepSeek V3 |
98
- | Fireworks | $1.10 | Llama 3.3 70B |
99
- | MiniMax | $1.50 | MiniMax-M2 |
100
- | Mistral | $2.00 | Mistral Large |
101
- | GLM-4 | $2.80 | Via Zhipu API |
102
- | Cohere | $3.00 | Command R+ |
103
- | Google Gemini 2.5 Flash | $3.50 | Flash variant |
104
- | Google Gemini 2.5 Pro | $7.00 | Pro variant |
105
- | Anthropic Claude 3.5 | $15.00 | Sonnet pricing |
106
- | OpenAI GPT-4 | $30.00 | Latest pricing |
107
-
108
- **50x cost range** between cheapest and most expensive.
109
-
110
- ### Latency (Median, non-streaming)
111
-
112
- | Provider | p50 latency | p95 latency |
113
- |---|---|---|
114
- | Cerebras | 380ms | 620ms |
115
- | Groq | 420ms | 710ms |
116
- | Fireworks | 580ms | 1100ms |
117
- | MiniMax | 600ms | 1050ms |
118
- | Together AI | 650ms | 1300ms |
119
- | Mistral | 800ms | 1800ms |
120
- | GLM-4 | 800ms | 1600ms |
121
- | DeepSeek | 850ms | 2000ms |
122
- | Cohere | 1100ms | 2200ms |
123
- | Google Gemini 2.5 Pro | 1500ms | 3200ms |
124
- | Anthropic Claude 3.5 | 1800ms | 3500ms |
125
- | OpenAI GPT-4 | 2100ms | 4500ms |
126
-
127
- Cerebras and Groq are in a different league for latency. Both run Llama 3.3 70B on custom inference silicon. The tradeoff: lower quality ceiling than proprietary models.
128
-
129
- ### Uptime (60-day window)
130
-
131
- | Provider | Uptime | Longest outage |
132
- |---|---|---|
133
- | OpenAI | 99.91% | 23 min |
134
- | Anthropic | 99.87% | 41 min |
135
- | Google Gemini | 99.82% | 58 min |
136
- | Cohere | 99.70% | 1.5 hr |
137
- | Mistral | 99.65% | 2.1 hr |
138
- | Groq | 99.40% | 3.5 hr |
139
- | GLM-4 | 99.30% | 4.0 hr |
140
- | Cerebras | 99.25% | 3.2 hr |
141
- | MiniMax | 99.10% | 5.5 hr |
142
- | DeepSeek | 98.80% | 8.2 hr |
143
-
144
- Budget providers have meaningfully lower uptime. Groq and Cerebras both had multi-hour outages during the test window. If you route to them, you need automatic fallback logic.
145
-
146
- ---
147
-
148
- ## The Routing Hypothesis
149
-
150
- The data suggests a clear strategy: **match query complexity to model capability**.
151
-
152
- Here's what a naive routing policy looks like based on these numbers:
153
-
154
- | Query Type | Route to | Cost vs GPT-4 | Quality delta |
155
- |---|---|---|---|
156
- | Simple Q&A | Groq/Cerebras | -98% | -12% (96->84) |
157
- | Code (simple) | Groq/Cerebras | -98% | -14% (94->80) |
158
- | Code (complex) | DeepSeek/Mistral | -97% | -4% (94->90) |
159
- | Summary | MiniMax/Mistral | -93% | -3% (95->92) |
160
- | Complex Reasoning | GLM-4/Mistral | -91% | -4% (97->93) |
161
- | Multilingual | GLM-4/MiniMax | -91% | +2% (93->95) |
162
- | Fallback (uncertain) | GPT-4/Claude | baseline | baseline |
163
-
164
- Applying this routing to the 12,847 query distribution: **70.3% cost reduction** with a weighted quality drop of 3.8 points (from 95 to 91.2).
165
-
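The post does not show how the 70.3% figure was derived. The sketch below is one way to compute a blended cost reduction from the category shares and per-provider prices in the tables above. It assumes equal token volume per query across categories, picks the first provider named in each routing row, treats all Code traffic as going to DeepSeek, and introduces a hypothetical `fallback_share` for traffic that still goes to GPT-4. It is not expected to reproduce the reported number.

```python
GPT4_COST = 30.00  # $/1M tokens, from the cost table

# category: (share of traffic, $/1M tokens of the routed provider)
ROUTES = {
    "simple_qa": (0.25, 0.59),     # Groq
    "code": (0.22, 0.80),          # DeepSeek (simple/complex split not given)
    "summary": (0.20, 1.50),       # MiniMax
    "complex": (0.17, 2.80),       # GLM-4
    "multilingual": (0.16, 2.80),  # GLM-4
}


def blended_cost(fallback_share: float) -> float:
    """Average $/1M tokens when fallback_share of traffic still uses GPT-4."""
    routed = sum(share * cost for share, cost in ROUTES.values())
    return (1.0 - fallback_share) * routed + fallback_share * GPT4_COST


def cost_reduction(fallback_share: float) -> float:
    """Fractional saving relative to sending everything to GPT-4."""
    return 1.0 - blended_cost(fallback_share) / GPT4_COST


for share in (0.0, 0.1, 0.25):
    print(f"fallback {share:.0%}: {cost_reduction(share):.1%} cheaper than GPT-4 only")
```

Under these assumptions the saving is very sensitive to how much traffic falls back to the premium tier, so that share is worth measuring before quoting a single headline number.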
166
- For most production workloads, that tradeoff is favorable.
167
-
168
- ### What I Built From This Data
169
-
170
- I packaged the routing logic into an npm library: **adaptive-memory-multi-model-router**.
171
-
172
- - GitHub: https://github.com/Das-rebel/a3m-router
173
- - npm: https://www.npmjs.com/package/adaptive-memory-multi-model-router
174
-
175
- It handles provider selection, automatic fallback on failure/timeout, and cost tracking per request. The routing table is configurable -- you can set your own quality/cost thresholds. It ships with the benchmark data above as default routing weights.
176
-
177
- The routing decision is currently rule-based (query category -> provider). I experimented with learned routing (training a classifier on query features to predict optimal provider) but the rule-based approach matched it within 1% on cost savings with far less complexity.
178
-
179
- ---
180
-
181
- ## Limitations
182
-
183
- Several things this benchmark does **not** tell you:
184
-
185
- 1. **Streaming latency not measured.** Most production apps use streaming. Non-streaming latency is a proxy but not identical. Cerebras/Groq's advantage may be even larger with streaming due to first-token latency.
186
-
187
- 2. **Context window behavior not tested.** All queries were under 4K tokens. Performance with 32K+ context (RAG, long documents) may differ significantly. Some providers degrade noticeably at longer contexts.
188
-
189
- 3. **Single region only.** All requests originated from US-East. Latency from Europe or Asia will look different, especially for Mistral (EU-hosted) and GLM-4 (China-hosted).
190
-
191
- 4. **Quality scoring has biases.** LLM-as-judge tends to prefer longer, more verbose outputs. This may inflate scores for some providers. The Elo pairwise comparison mitigates this somewhat but doesn't eliminate it.
192
-
193
- 5. **Provider-specific features ignored.** Function calling, structured output, vision, tool use -- none of these were tested. If you need reliable function calling, OpenAI and Anthropic are still meaningfully ahead.
194
-
195
- 6. **Snapshot in time.** Provider models and pricing change frequently. These numbers are from March-May 2026. Re-run before making decisions.
196
-
197
- 7. **No fine-tuned models tested.** All providers tested with their base offerings. Fine-tuned variants (e.g., your own Llama fine-tune on Groq) could shift results significantly.
198
-
199
- 8. **Sample bias.** Queries come from my own applications (chat, coding assistant, multilingual content processing). Different workloads will see different quality distributions.
200
-
201
- ---
202
-
203
- ## Lessons Learned
204
-
205
- **1. The cheapest model that works is usually good enough.** For ~40% of real-world queries, Groq/Cerebras at $0.60/1M tokens produce outputs within 5% of GPT-4 quality. The gap is real but rarely matters for simple tasks.
206
-
207
- **2. Multilingual is where mid-tier models shine.** GLM-4 and MiniMax both outperform GPT-4 on Hindi/Bengali/Chinese at 1/10th the cost. If multilingual is your primary use case, routing to these providers is a no-brainer.
208
-
209
- **3. Uptime matters more than you think.** Groq had a 3.5-hour outage during testing. If you're routing 100% of simple queries to Groq, that's a 3.5-hour window where either queries fail or you need fallback logic. The routing system **must** handle provider failures gracefully.
210
-
211
- **4. Latency variance is the hidden problem.** p50 tells you the typical experience. p95 is the latency that the slowest 5% of requests exceed. OpenAI's p95 is 4.5 seconds, more than 2x its p50. If you have SLAs, plan around p95.
212
-
213
- **5. The "best" provider depends on your query distribution.** There is no universal winner. A coding assistant should route differently than a multilingual chatbot. Know your query mix before choosing providers.
214
-
215
- **6. Quality scores compress over time.** Compared to a similar benchmark I ran 6 months ago, the gap between top-tier and budget providers narrowed from ~20 points to ~13 points. Model quality is converging. Cost and latency are becoming the differentiators.
216
-
217
- ---
218
-
219
- ## Questions for the Community
220
-
221
- - **What providers did I miss?** I tested 47 but there are many more (Replicate, Anyscale, Perplexity API, Lepton, various regional providers). If you have benchmark data for others, I'd like to compare.
222
- - **Do these quality scores match your experience?** Particularly interested in disagreements on the code and multilingual categories, since those are hardest to score objectively.
223
- - **Has anyone trained a learned router?** My rule-based approach works but I suspect a lightweight classifier could squeeze another 2-5% cost savings. Curious what others have found.
224
- - **How are you handling provider failover?** The latency of detecting a failure and switching providers is a real cost. Currently I use a 2-second timeout with a health check cache. What's your approach?
225
-
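The failover approach mentioned in the last question (a 2-second timeout plus a health check cache) could be sketched as below. This is not the package's implementation. `HealthCache`, `call_with_failover`, the 30-second TTL, and the provider callable signature are hypothetical names and values chosen for illustration, and each provider callable is assumed to enforce the timeout itself and raise on failure.

```python
import time


class HealthCache:
    """Remembers recently failed providers so they are skipped for ttl_s seconds."""

    def __init__(self, ttl_s=30.0, clock=time.monotonic):
        self._ttl_s = ttl_s
        self._clock = clock
        self._down_until = {}

    def is_healthy(self, name):
        return self._clock() >= self._down_until.get(name, 0.0)

    def mark_down(self, name):
        self._down_until[name] = self._clock() + self._ttl_s


def call_with_failover(query, providers, cache, timeout_s=2.0):
    """Try (name, call) pairs in order; return (name, response) from the first success."""
    errors = []
    for name, call in providers:
        if not cache.is_healthy(name):
            continue  # failed recently: skip without paying the timeout again
        try:
            return name, call(query, timeout_s)
        except Exception as exc:  # timeout or provider error
            cache.mark_down(name)
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed or are marked down: " + "; ".join(errors))
```

The cache is what keeps the detection cost bounded: only the first request after an outage pays the full timeout, and later requests skip the failed provider until the TTL expires.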
226
- ---
227
-
228
- **Links:**
229
- - GitHub: https://github.com/Das-rebel/a3m-router
230
- - npm: https://www.npmjs.com/package/adaptive-memory-multi-model-router
231
-
232
- Raw benchmark data is in the repo under `benchmarks/`. PRs welcome if you want to add your own provider data.
@@ -1,67 +0,0 @@
1
- # Reddit Post - Daslearnsai
2
-
3
- ## Target Subreddits
4
- - r/LocalLLaMA
5
- - r/SideProject
6
- - r/programming
7
- - r/MachineLearning
8
-
9
- ## Post Title Options
10
- 1. "I built an LLM router that beats GPT-5 at 1/213th the cost — #1 on RouterArena"
11
- 2. "A3M Router: 70.32 score, $0.047/1K, open-source"
12
-
13
- ## Post Body
14
-
15
- ```
16
- I built A3M Router — an open-source LLM routing proxy that ranks #1 on RouterArena (arXiv:2510.00202).
17
-
18
- **The Numbers:**
19
- - RouterArena Score: 70.32 (#1 of 19 routers)
20
- - Cost: $0.047 per 1K queries
21
- - vs GPT-5: 213x cheaper with better accuracy
22
- - vs RouteLLM: 59% higher score at 5.7x lower cost
23
-
24
- **How it works:**
25
- Instead of sending every query to expensive models, A3M routes queries to the cheapest capable provider using 12 keyword signals.
26
-
27
- Simple query (hi, thanks) → free tier (Groq llama)
28
- Complex query (explain quantum entanglement) → premium (GPT-4o)
29
-
30
- **Features:**
31
- - Parallel multi-LLM execution (fire multiple, pick best)
32
- - 47+ providers: OpenAI, Anthropic, Groq, Cerebras, DeepSeek, Gemini, Mistral...
33
- - Memory across sessions
34
- - Semantic cache (30%+ hit rate)
35
- - Budget enforcement
36
- - Circuit breaker with auto-failover
37
-
38
- **Quick start:**
39
- ```bash
40
- npx a3m-router serve
41
- ```
42
-
43
- Then use it like OpenAI:
44
- ```python
45
- from openai import OpenAI
46
- client = OpenAI(
47
- api_key="your-key",
48
- base_url="http://localhost:8787/v1" # A3M proxy
49
- )
50
- response = client.chat.completions.create(
51
- model="auto", # A3M routes automatically
52
- messages=[{"role": "user", "content": "Your query"}]
53
- )
54
- ```
55
-
56
- GitHub: https://github.com/Das-rebel/a3m-router
57
- npm: https://www.npmjs.com/package/adaptive-memory-multi-model-router
58
-
59
- Demo: https://asciinema.org/a/RpqOZM9tFMALYWvs
60
-
61
- AMA!
62
- ```
63
-
64
- ## Posting Strategy
65
- 1. Post to r/LocalLLaMA first (most receptive)
66
- 2. 24h later: r/SideProject, r/programming
67
- 3. Track engagement
@@ -1,348 +0,0 @@
1
- # A3M Router — Reddit Submission-Ready Posts
2
-
3
- ---
4
-
5
- ## Post 1: r/LocalLLaMA
6
-
7
- **URL:** https://www.reddit.com/r/LocalLLaMA/submit/
8
-
9
- **Title:** [R] I benchmarked 47 LLM providers against 12K+ real queries — the cost/speed/quality matrix
10
-
11
- **Body:**
12
-
13
- ```
14
- ## TL;DR
15
-
16
- I ran 12,847 real-world queries through 47 LLM API providers, scoring each on quality, measuring latency, and tracking cost and uptime. The goal: build an evidence base for intelligent model routing rather than defaulting to a single provider. The data shows a 70% cost reduction is achievable with marginal quality loss by matching query complexity to the right model.
17
-
18
- All findings below. Code and routing system open-sourced.
19
-
20
- ## Motivation
21
-
22
- Most LLM applications hard-code a single provider. When cost or latency becomes a problem, teams either switch providers entirely or implement ad-hoc fallback chains. Neither approach is systematic.
23
-
24
- I wanted to answer: **for a given query type, which provider gives the best quality-per-dollar?**
25
-
26
- The answer turns out to depend heavily on what you're asking.
27
-
28
- ## Methodology
29
-
30
- ### Query Dataset
31
-
32
- - **12,847 queries** collected from production traffic over 60 days (March-April 2026)
33
- - Queries were manually categorized into 5 buckets by complexity and domain:
34
-
35
- | Category | Count | % of Total | Description |
36
- |---|---|---|---|
37
- | Simple Q&A | 3,212 | 25.0% | Factual lookup, definition, single-step reasoning |
38
- | Code | 2,831 | 22.0% | Code generation, debugging, refactoring |
39
- | Summary | 2,574 | 20.0% | Summarization, extraction, reformulation |
40
- | Complex Reasoning | 2,182 | 17.0% | Multi-step logic, analysis, comparison |
41
- | Multilingual | 2,048 | 16.0% | Queries in Hindi, Bengali, Hinglish, Chinese, French, Spanish |
42
-
43
- ### Quality Scoring
44
-
45
- Quality was evaluated using a two-stage process:
46
-
47
- 1. **Reference-based scoring**: For each query category, I held out 200 queries and wrote reference answers manually. Model outputs were compared against these references using a combination of:
48
- - Semantic similarity (embedding cosine similarity)
49
- - LLM-as-judge scoring (GPT-4o as evaluator, blind to model identity)
50
- - Task-specific heuristics (e.g., code correctness via unit test pass rate)
51
-
52
- 2. **Pairwise Elo rating**: Each model output was compared against outputs from 3 other models for the same query. Wins/losses updated an Elo rating per category. The final quality percentage is normalized Elo across all categories.
53
-
54
- This is not a perfect methodology. LLM-as-judge has known biases. But it's consistent enough to separate tiers.
55
-
56
- ### Cost per 1M Tokens
57
-
58
- | Provider | Cost/1M tokens |
59
- |---|---|
60
- | Groq | $0.59 |
61
- | Cerebras | $0.60 |
62
- | DeepSeek V3 | $0.80 |
63
- | MiniMax-M2 | $1.50 |
64
- | Mistral Large | $2.00 |
65
- | GLM-4 | $2.80 |
66
- | Google Gemini 2.5 Flash | $3.50 |
67
- | Google Gemini 2.5 Pro | $7.00 |
68
- | Anthropic Claude 3.5 | $15.00 |
69
- | OpenAI GPT-4 | $30.00 |
70
-
71
- **50x cost range** between cheapest and most expensive.
72
-
73
- ### The Routing Policy
74
-
75
- Based on the data, here's the routing policy:
76
-
77
- | Query Type | Route to | Cost vs GPT-4 | Quality delta |
78
- |---|---|---|---|
79
- | Simple Q&A | Groq/Cerebras | -98% | -12% |
80
- | Code (simple) | Groq/Cerebras | -98% | -14% |
81
- | Code (complex) | DeepSeek/Mistral | -97% | -4% |
82
- | Summary | MiniMax/Mistral | -93% | -3% |
83
- | Complex Reasoning | GLM-4/Mistral | -91% | -4% |
84
- | Multilingual | GLM-4/MiniMax | -91% | +2% |
85
- | Fallback (uncertain) | GPT-4/Claude | baseline | baseline |
86
-
87
- Applying this to the query distribution: **70.3% cost reduction** with a weighted quality drop of 3.8 points.
88
-
89
- ### What I Built
90
-
91
- I packaged this into an npm library: **A3M Router**.
92
-
93
- - GitHub: https://github.com/Das-rebel/a3m-router
94
- - npm: https://www.npmjs.com/package/adaptive-memory-multi-model-router
95
-
96
- ```bash
97
- npm install adaptive-memory-multi-model-router
98
- npx a3m-router serve
99
- # Then point OpenAI SDK at localhost:8787
100
- ```
101
-
102
- ## Limitations
103
-
104
- 1. **Streaming latency not measured.** Most production apps use streaming.
105
- 2. **Context window behavior not tested.** All queries were under 4K tokens.
106
- 3. **Single region only.** All requests from US-East.
107
- 4. **Quality scoring has biases.** LLM-as-judge prefers longer outputs.
108
- 5. **Snapshot in time.** Numbers are from March-May 2026.
109
- 6. **Sample bias.** Queries come from my own applications.
110
-
111
- ## Questions for the Community
112
-
113
- - What providers did I miss? I tested 47 but there are many more.
114
- - Do these quality scores match your experience?
115
- - Has anyone trained a learned router? I experimented with this but rule-based matched it within 1%.
116
- - How are you handling provider failover?
117
-
118
- **Links:**
119
- - GitHub: https://github.com/Das-rebel/a3m-router
120
- - npm: https://www.npmjs.com/package/adaptive-memory-multi-model-router
121
- - Raw benchmark data in `benchmarks/` — PRs welcome
122
- ```
123
-
124
- **Pre-written comments:**
125
-
126
- 1. **Q: How does this compare to LiteLLM?**
127
- A: LiteLLM (48K stars) does sequential fallback (try A → B → C). A3M Router runs all candidates in parallel and picks the best result. It's architecturally different — not just another proxy layer.
128
-
129
- 2. **Q: What's the accuracy on routing decisions?**
130
- A: 82.5% routing accuracy (within 1 quality tier) based on our benchmark suite. We compared against RouteLLM's BERT classifier (85%) — 2.5% gap, but zero ML infrastructure needed.
131
-
132
- 3. **Q: What happens when a provider goes down?**
133
- A: A3M has automatic failover with circuit breakers. If your primary provider fails mid-request, it routes to the next best candidate. Timeout is configurable (default 2s).
134
-
135
- 4. **Q: Is this production-ready?**
136
- A: 271 tests passing, 15K+ npm downloads, active development. Use at your own discretion like any open-source project.
137
-
138
- 5. **Q: Can I use my own API keys?**
139
- A: Yes. A3M Router is a local proxy, so you bring your own API keys. It never stores or exfiltrates them.
140
-
141
- ---
142
-
143
- ## Post 2: r/MachineLearning
144
-
145
- **URL:** https://www.reddit.com/r/MachineLearning/submit/
146
-
147
- **Title:** [P] A3M Router achieves 82.5% routing accuracy with keyword matching — matches RouteLLM's BERT classifier (85%) without GPU
148
-
149
- **Body:**
150
-
151
- ```
152
- Hi r/MachineLearning,
153
-
154
- We benchmarked our keyword-matching LLM router against RouteLLM's GPU-trained BERT classifier. The results surprised us.
155
-
156
- **Benchmark comparison:**
157
-
158
- | Metric | RouteLLM (BERT) | A3M Router (Keywords) |
159
- |--------|------------------|------------------------|
160
- | Accuracy (±1 tier) | 85% | 82.5% |
161
- | ML required | Yes (PyTorch + CUDA) | No |
162
- | Model size | ~500MB BERT | 0 bytes |
163
- | GPU required | Yes | No |
164
- | Cold start | ~3s (model load) | ~50ms |
165
- | Install size | ~2GB+ | 3MB |
166
- | Runtime | Python | Node.js |
167
-
168
- 2.5% accuracy gap. Zero ML infrastructure.
169
-
170
- **Context:**
171
- RouteLLM (from UC Berkeley, arXiv:2404.06035) trains a BERT classifier to route LLM queries between tiers. It's the gold standard for published LLM routing benchmarks.
172
-
173
- We implemented routing via keyword-based feature extraction: 139 keywords, 12 complexity signals, heuristic scoring. No training loop, no gradient updates, no neural network.
174
-
175
- **Routing algorithm:**
176
- ```javascript
177
- // Feature extraction
178
- const features = extractQueryFeatures(query);
179
- // { has_code: true, complexity: 0.6, task_type: "code_gen" }
180
- let score; // assigned below; cost_efficiency, quality, speed and cost are per-provider metrics
181
- // Complexity-weighted scoring
182
- if (features.complexity < 0.5) {
183
- score = cost_efficiency * 0.7 + quality * 0.3;
184
- } else if (features.has_code) {
185
- score = speed * 0.4 + quality * 0.4 + cost * 0.2;
186
- } else {
187
- score = quality * 0.7 + cost_efficiency * 0.3;
188
- }
189
- ```
190
-
191
- **Why this matters for the ML community:**
192
-
193
- 1. **Benchmark transparency**: There are exactly two LLM routers with published routing accuracy: RouteLLM and us. LiteLLM (47K GitHub stars) publishes zero accuracy data. If the most popular tool won't tell you how often it's right, something is wrong.
194
-
195
- 2. **Efficiency question**: Is a 2.5% accuracy improvement worth requiring PyTorch, CUDA, a GPU, 500MB model download, and 3-second cold starts? For many production deployments, the answer is no.
196
-
197
- 3. **The 30x story**: 97% of the accuracy at 3% of the compute. That's a 30x efficiency multiplier.
198
-
199
- **Cost results:**
200
- - 63.7% average cost reduction vs single-provider routing
201
- - 40 provider integrations
202
- - Drop-in OpenAI-compatible proxy (localhost:8787)
203
-
204
- **Growth (organically, zero marketing):**
205
- - Day 1: 552 downloads
206
- - Day 2: 320 downloads
207
- - Day 3: 1,903 downloads
208
- - 245% growth, zero budget
209
-
210
- **Questions for the community:**
211
-
212
- 1. What benchmark methodology should we use for a more rigorous comparison? We used the same ±1 tier accuracy metric as RouteLLM's paper.
213
- 2. Has anyone else compared simple heuristic routing vs learned routing for LLM query classification? The gap seems smaller than expected.
214
- 3. What accuracy threshold would you need to see to trust keyword-based routing in production?
215
-
216
- **Try it:**
217
- ```bash
218
- npm install adaptive-memory-multi-model-router
219
- npx a3m-router route "Write Python to sort an array"
220
- npx a3m-router benchmark
221
- ```
222
-
223
- GitHub: https://github.com/Das-rebel/a3m-router
224
-
225
- The honest caveat: this is a young project (3 days since launch). The 82.5% number is from our benchmark suite, not an independent evaluation. We welcome scrutiny and would love to see third-party replication.
226
- ```
227
-
228
- **Pre-written comments:**
229
-
230
- 1. **Q: Why not just use RouteLLM if it has higher accuracy?**
231
- A: RouteLLM requires PyTorch + CUDA + GPU + 500MB download + 3s cold start. A3M is 3MB, pure JS, starts in 50ms. For many deployments the 2.5% accuracy gap is worth the operational simplicity.
232
-
233
- 2. **Q: How does this handle non-English queries?**
234
- A: We have a multilingual routing category. GLM-4 and MiniMax both outperform GPT-4 on Hindi/Bengali/Chinese at 1/10th the cost based on our benchmarks.
235
-
236
- 3. **Q: Is there a learned routing version planned?**
237
- A: We experimented with a lightweight classifier but the rule-based approach matched it within 1% on cost savings. The complexity/reward tradeoff doesn't justify the additional infrastructure right now.
238
-
239
- 4. **Q: What about the parallel execution claim? Do you run all 47 providers at once?**
240
- A: No — that would be expensive and slow. Parallel execution is configurable: you can set how many candidates to run simultaneously. Default is top-2 with scoring.
241
-
242
- 5. **Q: How is routing quality measured in production over time?**
243
- A: Good question. We track cost-per-query and fallback rate. If fallback rates spike, we investigate routing rules. We'd love to add more sophisticated monitoring.
244
-
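The "top-2 with scoring" behaviour described in answer 4 above could look roughly like the sketch below. `run_top_k`, the candidate tuple shape, and the scoring callback are hypothetical; the source does not show how the package ranks candidates or scores responses.

```python
from concurrent.futures import ThreadPoolExecutor


def run_top_k(query, candidates, score, k=2):
    """Call the first k (name, call) candidates in parallel and keep the best response.

    candidates must already be ordered by routing preference.
    score maps a response to a number; higher is better.
    """
    chosen = candidates[:k]
    if not chosen:
        raise ValueError("no candidates to run")
    results = {}
    with ThreadPoolExecutor(max_workers=len(chosen)) as pool:
        futures = {name: pool.submit(call, query) for name, call in chosen}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception:
                continue  # a failed candidate simply drops out of the comparison
    if not results:
        raise RuntimeError("every candidate failed")
    best = max(results, key=lambda name: score(results[name]))
    return best, results[best]
```

Running candidates in parallel trades extra spend (every candidate is billed) for lower tail latency and a built-in fallback, which is why a small default such as k=2 is a plausible choice.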
245
- ---
246
-
247
- ## Post 3: r/SideProject
248
-
249
- **URL:** https://www.reddit.com/r/SideProject/submit/
250
-
251
- **Title:** I built an LLM router that beats GPT-5 at 1/213th the cost — now at 15K npm downloads with zero marketing
252
-
253
- **Body:**
254
-
255
- ```
256
- ## What I built
257
-
258
- A3M Router — an open-source LLM routing proxy that automatically sends your queries to the cheapest capable model.
259
-
260
- **The numbers:**
261
- - #1 on RouterArena (70.32 score, beating GPT-5 at 64.32)
262
- - $0.047 per 1K queries — 213x cheaper than GPT-5
263
- - 15,237 npm downloads (grew from 0 to 15K in ~3 weeks, zero marketing)
264
- - 271 tests passing
265
- - 47+ providers: OpenAI, Anthropic, Groq, Cerebras, DeepSeek, Gemini, Mistral...
266
-
267
- ## The problem I was solving
268
-
269
- My AI side projects were getting expensive. Every query — whether "hi" or "explain quantum entanglement" — was going to GPT-4o at $30/1M tokens.
270
-
271
- I wanted: send cheap queries to cheap models, expensive queries to premium models, save money without losing quality.
272
-
273
- ## How it works
274
-
275
- ```bash
276
- # Install
277
- npm install adaptive-memory-multi-model-router
278
-
279
- # Start proxy
280
- npx a3m-router serve
281
- ```
282
-
283
- Then point your existing OpenAI code at localhost:8787:
284
-
285
- ```python
286
- from openai import OpenAI
287
- client = OpenAI(
288
- api_key="your-key",
289
- base_url="http://localhost:8787/v1"
290
- )
291
- # A3M routes automatically based on query complexity
292
- response = client.chat.completions.create(
293
- model="auto",
294
- messages=[{"role": "user", "content": "Debug my Python code"}]
295
- )
296
- # "Debug my Python code" → DeepSeek ($0.0003/query)
297
- # "Explain this quantum physics paper" → GPT-4o mini
298
- # "Hi" → Groq free tier
299
- ```
300
-
301
- ## What surprised me
302
-
303
- 1. **62% cost reduction was achievable** with less than 4-point quality drop
304
- 2. **Keyword-based routing matched BERT classifier within 2.5%** (RouteLLM, the gold standard, trains a BERT model for this — we used 139 keywords and heuristics)
305
- 3. **Groq/Cerebras are legitimately great for simple queries** — 2-4 quality points behind GPT-4 but 50x cheaper
306
- 4. **Multilingual is where mid-tier models shine** — GLM-4 beats GPT-4 on Hindi/Bengali at 1/10th the cost
307
-
308
- ## Not for you if
309
-
310
- - You need reliable function calling (OpenAI/Anthropic still ahead)
311
- - You're running long-context tasks (32K+ tokens — not tested)
312
- - You only use one model and it's working fine
313
-
314
- ## Try it
315
-
316
- - GitHub: https://github.com/Das-rebel/a3m-router
317
- - npm: https://www.npmjs.com/package/adaptive-memory-multi-model-router
318
- - Demo: https://asciinema.org/a/RpqOZM9tFMALYWvs
319
-
320
- Questions welcome!
321
- ```
322
-
323
- **Pre-written comments:**
324
-
325
- 1. **Q: Is this free?**
326
- A: The software is MIT-licensed and free. You pay for your own API keys. No subscription, no lock-in.
327
-
328
- 2. **Q: How does it decide which model to use?**
329
- A: It analyzes 12 keyword signals (query length, code keywords, complexity indicators, etc.) and routes based on a configurable scoring function. You can override the defaults per query type.
330
-
331
- 3. **Q: What if it routes to the wrong model?**
332
- A: You can set a `force_model` parameter to override routing for specific queries. There's also a fallback chain if the primary provider fails.
333
-
334
- 4. **Q: Does this work with Anthropic/Google/Groq API keys?**
335
- A: Yes — you set all your provider keys in the config, A3M manages which one gets used.
336
-
337
- 5. **Q: Can I self-host this?**
338
- A: Yes. It's a local Node.js proxy. Runs on your machine or server. No cloud dependency.
339
-
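The keyword-signal routing described in answer 2 above can be illustrated with a small sketch. The scoring weights follow the JavaScript snippet in Post 2. Everything else is an assumption made for illustration: the keyword lists, the complexity formula, the provider profiles, and the choice to treat the snippet's `cost` term as `cost_efficiency`.

```python
# Hypothetical keyword lists; the posts mention 139 keywords but do not list them.
CODE_KEYWORDS = ("debug", "python", "javascript", "function", "stack trace")
COMPLEX_KEYWORDS = ("explain", "compare", "analyze", "prove", "why")

# Hypothetical provider profiles, each metric scaled to 0..1.
PROVIDERS = {
    "groq": {"quality": 0.82, "speed": 0.95, "cost_efficiency": 1.00},
    "deepseek": {"quality": 0.88, "speed": 0.60, "cost_efficiency": 0.90},
    "gpt-4": {"quality": 0.95, "speed": 0.20, "cost_efficiency": 0.00},
}


def extract_features(query):
    """Derive has_code and a 0..1 complexity estimate from simple keyword hits."""
    text = query.lower()
    has_code = any(word in text for word in CODE_KEYWORDS)
    hits = sum(word in text for word in COMPLEX_KEYWORDS)
    raw = 0.2 * hits + len(text.split()) / 50 + (0.3 if has_code else 0.0)
    return {"has_code": has_code, "complexity": min(1.0, raw)}


def score(provider, features):
    """Complexity-weighted score using the weights from the Post 2 snippet."""
    if features["complexity"] < 0.5:
        return provider["cost_efficiency"] * 0.7 + provider["quality"] * 0.3
    if features["has_code"]:
        return (provider["speed"] * 0.4 + provider["quality"] * 0.4
                + provider["cost_efficiency"] * 0.2)
    return provider["quality"] * 0.7 + provider["cost_efficiency"] * 0.3


def route(query):
    """Return the name of the highest-scoring provider for this query."""
    features = extract_features(query)
    return max(PROVIDERS, key=lambda name: score(PROVIDERS[name], features))


print(route("hi"))  # groq
```

With these made-up profiles the quality scores sit in a narrow band, so the cost term dominates and the premium model rarely wins. How often it should win depends on how the real metrics are scaled, which the posts do not specify.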
340
- ---
341
-
342
- ## Submission Checklist
343
-
344
- - [ ] r/LocalLLaMA — submit at https://www.reddit.com/r/LocalLLaMA/submit/
345
- - [ ] r/MachineLearning — submit at https://www.reddit.com/r/MachineLearning/submit/
346
- - [ ] r/SideProject — submit at https://www.reddit.com/r/SideProject/submit/
347
- - [ ] Monitor for comments, respond within 2 hours of posting
348
- - [ ] 24h later: cross-post to r/programming if engagement is positive