adaptive-memory-multi-model-router 2.14.49 → 2.14.52

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (605)
  1. package/.dockerignore +82 -0
  2. package/.env.example +303 -0
  3. package/.github/DISCUSSIONS_WELCOME.md +27 -0
  4. package/.github/DISCUSSION_TEMPLATE.yml +5 -0
  5. package/.github/FUNDING.yml +2 -0
  6. package/.github/ISSUE_TEMPLATE/bug_report.md +94 -0
  7. package/.github/ISSUE_TEMPLATE/config.yml +17 -0
  8. package/.github/ISSUE_TEMPLATE/feature_request.md +71 -0
  9. package/.github/PULL_REQUEST_TEMPLATE.md +71 -0
  10. package/.github/dependabot.yml +9 -0
  11. package/.github/workflows/ci.yml +263 -0
  12. package/.github/workflows/codeql.yml +38 -0
  13. package/.github/workflows/npm-publish.yml +20 -0
  14. package/.github/workflows/pages.yml +37 -0
  15. package/.github/workflows/stale.yml +54 -0
  16. package/.publish-tick +1 -0
  17. package/.well-known/ai-plugin.json +16 -0
  18. package/AGENT_COUNCIL_FINDINGS.md +142 -0
  19. package/ARCHITECTURE.md +346 -0
  20. package/AUDIT_REPORT.md +28 -0
  21. package/CODE_OF_CONDUCT.md +128 -0
  22. package/CONTRIBUTING.md +50 -0
  23. package/CONTRIBUTORS.md +20 -0
  24. package/Dockerfile +53 -0
  25. package/Dockerfile.proxy +33 -0
  26. package/HEALTH_REPORT.md +118 -0
  27. package/IMPROVEMENT_PLAN.md +107 -0
  28. package/LANDING.md +43 -0
  29. package/LAUNCH-PAIN-DRIVEN.md +339 -0
  30. package/LAUNCH.md +337 -0
  31. package/LAUNCH_CHECKLIST.md +141 -0
  32. package/LAUNCH_SNAPSHOT.md +260 -0
  33. package/MANIFESTO.md +41 -0
  34. package/POPULARITY_BOOSTERS.md +285 -0
  35. package/PR_STATUS_REPORT.md +148 -0
  36. package/README.md +25 -14
  37. package/REDESIGN.md +95 -0
  38. package/RUNKIT.md +83 -0
  39. package/SECURITY.md +29 -0
  40. package/SUBMISSIONS.md +43 -0
  41. package/_schema.html +53 -0
  42. package/ai-plugin.json +16 -0
  43. package/articles/AI_AGENT_LLM_ROUTING.md +150 -0
  44. package/articles/CHINESE_DIRECTORIES.md +100 -0
  45. package/articles/CHINESE_SUBMISSIONS_READY.md +322 -0
  46. package/articles/COMPETITOR_ALERTS.md +31 -0
  47. package/articles/COMPLETE_POSTING_DIRECTORY.md +147 -0
  48. package/articles/CONTENT_STRUCTURE.md +292 -0
  49. package/articles/DEVTO_COST_GUIDE.md +473 -0
  50. package/articles/DEVTO_FINAL.md +416 -0
  51. package/articles/DEVTO_MULTI_PROVIDER.md +542 -0
  52. package/articles/DEVTO_READY.md +255 -0
  53. package/articles/DEVTO_V2_ANNOUNCEMENT.md +160 -0
  54. package/articles/DEVTO_VIRAL_GROWTH.md +280 -0
  55. package/articles/FRESH_devto.md +460 -0
  56. package/articles/FRESH_devto_2026_05.md +73 -0
  57. package/articles/FRESH_hackernews.md +14 -0
  58. package/articles/FRESH_reddit_ml.md +90 -0
  59. package/articles/FRESH_reddit_node.md +198 -0
  60. package/articles/FRESH_reddit_sideproject.md +72 -0
  61. package/articles/FRESH_reddit_webdev.md +130 -0
  62. package/articles/FROM_ZERO_TO_10K.md +107 -0
  63. package/articles/HN_10X_BETTER.md +430 -0
  64. package/articles/HN_ACCOUNT_GUIDE.md +21 -0
  65. package/articles/HN_CHINESE_STYLE.md +308 -0
  66. package/articles/HN_FINAL.md +148 -0
  67. package/articles/HN_POSTED_VERSION.md +56 -0
  68. package/articles/HN_POST_READY.md +137 -0
  69. package/articles/HN_RESEARCH.md +364 -0
  70. package/articles/HN_SHOW_routerarena.md +17 -0
  71. package/articles/HN_TIMING_GUIDE.md +52 -0
  72. package/articles/INDIEHACKERS_POST.md +52 -0
  73. package/articles/INDIEHACKERS_READY.md +120 -0
  74. package/articles/LLM_BENCHMARK_DEEP_DIVE.md +153 -0
  75. package/articles/MASTER_POSTING_DIRECTORY.md +189 -0
  76. package/articles/NEWSLETTER_SEND_NOW.md +259 -0
  77. package/articles/NEWSLETTER_SUBMISSIONS.md +112 -0
  78. package/articles/PAIN-DRIVEN-devto-v2.md +308 -0
  79. package/articles/PAIN-DRIVEN-devto-v3.md +268 -0
  80. package/articles/PAIN-DRIVEN-devto.md +242 -0
  81. package/articles/PAIN-DRIVEN-hackernews-v2.md +138 -0
  82. package/articles/PAIN-DRIVEN-hackernews-v3.md +151 -0
  83. package/articles/PAIN-DRIVEN-hackernews.md +131 -0
  84. package/articles/PAIN-DRIVEN-reddit-v2.md +301 -0
  85. package/articles/PAIN-DRIVEN-reddit-v3.md +236 -0
  86. package/articles/PAIN-DRIVEN-reddit.md +218 -0
  87. package/articles/PAIN-DRIVEN-twitter-v2.md +110 -0
  88. package/articles/PAIN-DRIVEN-twitter-v3.md +121 -0
  89. package/articles/PAIN-DRIVEN-twitter.md +120 -0
  90. package/articles/PORTKEY_VS_A3M.md +147 -0
  91. package/articles/POSTING_KIT_2026_05.md +67 -0
  92. package/articles/PRESS_KIT_routerarena.md +77 -0
  93. package/articles/PRODUCTHUNT_LISTING.md +48 -0
  94. package/articles/PRODUCTHUNT_READY.md +106 -0
  95. package/articles/PR_PLAN_vault.md +125 -0
  96. package/articles/REDDIT_FINAL.md +232 -0
  97. package/articles/REDDIT_POST.md +67 -0
  98. package/articles/REDDIT_SUBMISSION_READY.md +348 -0
  99. package/articles/ROUTERARENA_9677.md +78 -0
  100. package/articles/ROUTERARENA_LEADER.md +45 -0
  101. package/articles/SHOW_HN_FINAL.md +29 -0
  102. package/articles/TWEETS_10K_DOWNLOADS.md +47 -0
  103. package/articles/TWEETS_BENCHMARK_FIRST.md +46 -0
  104. package/articles/TWEETS_MCP_PLAY.md +51 -0
  105. package/articles/TWEETS_SEQUENTIAL_BROKEN.md +49 -0
  106. package/articles/TWEETS_WHY_BUILD.md +54 -0
  107. package/articles/TWEETS_routerarena_leader.md +53 -0
  108. package/articles/TWEET_STORM_READY.md +165 -0
  109. package/articles/TWITTER_FINAL.md +167 -0
  110. package/articles/WHY_10X_BETTER.md +261 -0
  111. package/articles/WHY_CHINESE_STYLE_BETTER.md +323 -0
  112. package/articles/ai-discoverability-llm-routing.md +210 -0
  113. package/articles/devto-llm-routing.md +138 -0
  114. package/articles/hackernews-show-hn.md +54 -0
  115. package/articles/hashnode-llm-cost-optimization.md +125 -0
  116. package/articles/hn_show_2026_05.md +11 -0
  117. package/articles/medium-building-llm-router.md +205 -0
  118. package/articles/reddit-ml.md +76 -0
  119. package/articles/twitter-thread-cost-savings.md +50 -0
  120. package/articles/youtube-tutorial-script.md +262 -0
  121. package/assets/a3m_3blue1brown.mp4 +0 -0
  122. package/assets/banner.svg +109 -0
  123. package/assets/chart-cost-v2.svg +91 -0
  124. package/assets/chart-cost-v3.svg +143 -0
  125. package/assets/chart-features-v2.svg +132 -0
  126. package/assets/chart-features-v3.svg +211 -0
  127. package/assets/chart-growth-v2.svg +122 -0
  128. package/assets/chart-growth-v3.svg +189 -0
  129. package/assets/cost-comparison.svg +134 -0
  130. package/assets/cost-simple.svg +64 -0
  131. package/assets/demo-hn.gif +0 -0
  132. package/assets/feature-matrix.svg +136 -0
  133. package/assets/growth-chart-animated.svg +76 -0
  134. package/assets/growth-chart.svg +82 -0
  135. package/assets/growth-simple.svg +69 -0
  136. package/assets/hero-diagram.svg +81 -0
  137. package/assets/logo-new.svg +21 -0
  138. package/assets/logo.svg +68 -0
  139. package/assets/provider-comparison.svg +121 -0
  140. package/assets/social-preview-new.svg +100 -0
  141. package/assets/social-preview.svg +194 -0
  142. package/assets/social-v2.svg +130 -0
  143. package/assets/social-v3.svg +212 -0
  144. package/benchmark-provider-results.json +245 -0
  145. package/benchmark-results.json +54 -0
  146. package/council-votes/architecture-vote.md +121 -0
  147. package/council-votes/coverage-vote.md +93 -0
  148. package/data/adaptive-benchmark.json +92 -0
  149. package/data/benchmark-results.json +47 -0
  150. package/data/labeled-benchmark.json +88 -0
  151. package/demo/3blue1brown_video.py +285 -0
  152. package/demo/3blue1brown_video_v2.py +310 -0
  153. package/demo/IMPROVED_PROMPTS.md +229 -0
  154. package/demo/VEO3_PROMPTS.md +269 -0
  155. package/demo/VIDEO_PRODUCTION_GUIDE.md +333 -0
  156. package/demo/a3m_3blue1brown.mp4 +0 -0
  157. package/demo/asciinema-demo.sh +195 -0
  158. package/demo/demo-hn.tape +74 -0
  159. package/demo/demo-script.md +53 -0
  160. package/demo/demo-script.sh +62 -0
  161. package/demo/demo.svg +75 -0
  162. package/demo/frame1_ai_data_center.png +0 -0
  163. package/demo/frame1_sunset_video.mp4 +0 -0
  164. package/demo/frame2_cost_comparison.png +0 -0
  165. package/demo/frame2_cost_comparison_fallback.png +0 -0
  166. package/demo/frame3_parallel_execution.png +0 -0
  167. package/demo/frame3_parallel_execution_fallback.png +0 -0
  168. package/demo/frame4_providers.png +0 -0
  169. package/demo/frame4_providers_fallback.png +0 -0
  170. package/demo/frame5_endcard.png +0 -0
  171. package/demo/frame5_endcard_fallback.png +0 -0
  172. package/demo/new_frame1_hook.png +0 -0
  173. package/demo/new_frame2_proof.png +0 -0
  174. package/demo/new_frame3_wow.png +0 -0
  175. package/demo/new_frame4_social.png +0 -0
  176. package/demo/new_frame5_cta.png +0 -0
  177. package/demo/package.json +13 -0
  178. package/demo/product-video-final.mp4 +0 -0
  179. package/demo/product-video-hype-v1.mp4 +0 -0
  180. package/demo/product-video-v1.mp4 +0 -0
  181. package/demo/public/index.html +762 -0
  182. package/demo/recording.cast +55 -0
  183. package/demo/server.js +405 -0
  184. package/demo-new.tape +71 -0
  185. package/demo-real.sh +198 -0
  186. package/demo-simple.tape +205 -0
  187. package/demo.html +520 -0
  188. package/demo.sh +85 -0
  189. package/demo.tape +259 -0
  190. package/dist/analytics/costAnalytics.d.ts.map +1 -0
  191. package/dist/analytics/costAnalytics.js.map +1 -0
  192. package/dist/benchmark/comprehensive.js.map +1 -0
  193. package/dist/benchmark/reproducible.d.ts.map +1 -0
  194. package/dist/benchmark/reproducible.js.map +1 -0
  195. package/dist/cache/prefixCache.d.ts.map +1 -0
  196. package/dist/cache/prefixCache.js.map +1 -0
  197. package/dist/cache/responseCache.d.ts.map +1 -0
  198. package/dist/cache/responseCache.js.map +1 -0
  199. package/dist/cache/semanticCache.d.ts.map +1 -0
  200. package/dist/cache/semanticCache.js.map +1 -0
  201. package/dist/cli/setupWizard.d.ts.map +1 -0
  202. package/dist/cli/setupWizard.js.map +1 -0
  203. package/dist/cost/budgetEnforcer.d.ts.map +1 -0
  204. package/dist/cost/budgetEnforcer.js.map +1 -0
  205. package/dist/cost/costTracker.d.ts.map +1 -0
  206. package/dist/cost/costTracker.js.map +1 -0
  207. package/dist/ensemble/multiRoundDialog.js.map +1 -0
  208. package/dist/ensemble/shapleyValue.js.map +1 -0
  209. package/dist/integrations/langchainAdapter.d.ts.map +1 -0
  210. package/dist/integrations/langchainAdapter.js.map +1 -0
  211. package/dist/integrations/oauth.d.ts.map +1 -0
  212. package/dist/integrations/oauth.js.map +1 -0
  213. package/dist/integrations/scienceAdapter.js.map +1 -0
  214. package/dist/memory/autoFetch.d.ts.map +1 -0
  215. package/dist/memory/autoFetch.js.map +1 -0
  216. package/dist/memory/episodicMemory.d.ts.map +1 -0
  217. package/dist/memory/episodicMemory.js.map +1 -0
  218. package/dist/memory/hybridMemory.js.map +1 -0
  219. package/dist/memory/memoryTree.d.ts.map +1 -0
  220. package/dist/memory/memoryTree.js.map +1 -0
  221. package/dist/memory/obsidianVault.d.ts.map +1 -0
  222. package/dist/memory/obsidianVault.js.map +1 -0
  223. package/dist/memory/reasoningBank.js.map +1 -0
  224. package/dist/observability/changeWatch.d.ts.map +1 -0
  225. package/dist/observability/changeWatch.js.map +1 -0
  226. package/dist/observability/fatigueDetector.d.ts.map +1 -0
  227. package/dist/observability/fatigueDetector.js.map +1 -0
  228. package/dist/observability/index.d.ts.map +1 -0
  229. package/dist/observability/index.js.map +1 -0
  230. package/dist/observability/metrics.d.ts.map +1 -0
  231. package/dist/observability/metrics.js.map +1 -0
  232. package/dist/observability/middleware.d.ts.map +1 -0
  233. package/dist/observability/middleware.js.map +1 -0
  234. package/dist/observability/tracer.d.ts.map +1 -0
  235. package/dist/observability/tracer.js.map +1 -0
  236. package/dist/observability/types.d.ts.map +1 -0
  237. package/dist/observability/types.js.map +1 -0
  238. package/dist/orchestration/haloOrchestrator.d.ts.map +1 -0
  239. package/dist/orchestration/haloOrchestrator.js.map +1 -0
  240. package/dist/orchestration/mctsWorkflow.d.ts.map +1 -0
  241. package/dist/orchestration/mctsWorkflow.js.map +1 -0
  242. package/dist/providers/localProvider.d.ts.map +1 -0
  243. package/dist/providers/localProvider.js.map +1 -0
  244. package/dist/providers/providerConfig.d.ts.map +1 -0
  245. package/dist/providers/providerConfig.js.map +1 -0
  246. package/dist/providers/registry.d.ts.map +1 -0
  247. package/dist/providers/registry.js.map +1 -0
  248. package/dist/routing/advancedRouter.d.ts.map +1 -0
  249. package/dist/routing/advancedRouter.js +1 -1
  250. package/dist/routing/advancedRouter.js.map +1 -0
  251. package/dist/routing/crossModelValidation.d.ts.map +1 -0
  252. package/dist/routing/crossModelValidation.js.map +1 -0
  253. package/dist/routing/providerHealth.d.ts.map +1 -0
  254. package/dist/routing/providerHealth.js.map +1 -0
  255. package/dist/routing/providerRetry.d.ts.map +1 -0
  256. package/dist/routing/providerRetry.js.map +1 -0
  257. package/dist/scripts/banner.js +29 -0
  258. package/dist/security/guardrails.d.ts.map +1 -0
  259. package/dist/security/guardrails.js.map +1 -0
  260. package/dist/server/dashboard.d.ts.map +1 -0
  261. package/dist/server/dashboard.js.map +1 -0
  262. package/dist/server/modelMapper.d.ts.map +1 -0
  263. package/dist/server/modelMapper.js.map +1 -0
  264. package/dist/server/proxyServer.d.ts.map +1 -0
  265. package/dist/server/proxyServer.js.map +1 -0
  266. package/dist/skills/__tests__/skill_manager.test.d.ts +2 -0
  267. package/dist/skills/__tests__/skill_manager.test.d.ts.map +1 -0
  268. package/dist/skills/__tests__/skill_manager.test.js +268 -0
  269. package/dist/skills/__tests__/skill_manager.test.js.map +1 -0
  270. package/dist/tools/tmlpdTools.d.ts.map +1 -0
  271. package/dist/tools/tmlpdTools.js.map +1 -0
  272. package/dist/tui/dashboard.d.ts.map +1 -0
  273. package/dist/tui/dashboard.js.map +1 -0
  274. package/dist/tui/index.d.ts.map +1 -0
  275. package/dist/tui/index.js.map +1 -0
  276. package/dist/utils/batchProcessor.d.ts.map +1 -0
  277. package/dist/utils/batchProcessor.js.map +1 -0
  278. package/dist/utils/compression.d.ts.map +1 -0
  279. package/dist/utils/compression.js.map +1 -0
  280. package/dist/utils/costUtils.d.ts.map +1 -0
  281. package/dist/utils/costUtils.js.map +1 -0
  282. package/dist/utils/reliability.d.ts.map +1 -0
  283. package/dist/utils/reliability.js.map +1 -0
  284. package/dist/utils/sorting.d.ts.map +1 -0
  285. package/dist/utils/sorting.js.map +1 -0
  286. package/dist/utils/speculativeDecoding.d.ts.map +1 -0
  287. package/dist/utils/speculativeDecoding.js.map +1 -0
  288. package/dist/utils/tokenUtils.d.ts.map +1 -0
  289. package/dist/utils/tokenUtils.js.map +1 -0
  290. package/docs/.nojekyll +0 -0
  291. package/docs/ANALYSIS_PRINCIPLES.md +162 -0
  292. package/docs/API.md +855 -0
  293. package/docs/ARCHITECTURAL-IMPROVEMENTS-2025.md +1391 -0
  294. package/docs/ARCHITECTURAL-IMPROVEMENTS-REVISED-2025.md +1051 -0
  295. package/docs/BENCHMARK.md +170 -0
  296. package/docs/CHINESE_PROVIDER_RELIABILITY.md +37 -0
  297. package/docs/CITATIONS.md +74 -0
  298. package/docs/CLAIMS_AND_EVIDENCE.md +58 -0
  299. package/docs/CONFIGURATION.md +476 -0
  300. package/docs/COUNCIL_DECISION.json +816 -0
  301. package/docs/COUNCIL_SUMMARY.md +319 -0
  302. package/docs/COUNCIL_V2.2_DECISION.md +416 -0
  303. package/docs/ENGINEERING_SPEC.md +55 -0
  304. package/docs/FACTORY_RESET.md +34 -0
  305. package/docs/GEO.md +66 -0
  306. package/docs/GEO_OPTIMIZATION.md +30 -0
  307. package/docs/GEO_ROOT_CAUSE.md +136 -0
  308. package/docs/GEO_STATUS.md +85 -0
  309. package/docs/GEO_TEST_RESULTS.md +176 -0
  310. package/docs/HN_CHECKLIST.md +38 -0
  311. package/docs/HN_FOUNDER_COMMENT.md +17 -0
  312. package/docs/HN_SUBMISSION_FINAL.md +180 -0
  313. package/docs/HN_SUBMISSION_V3.md +56 -0
  314. package/docs/IMPROVEMENT_ROADMAP.md +515 -0
  315. package/docs/INTEGRATIONS.md +420 -0
  316. package/docs/LANGCHAIN_INTEGRATION.md +147 -0
  317. package/docs/LLM_COUNCIL_DECISION.md +508 -0
  318. package/docs/MIDDLEWARE_CHAIN.md +35 -0
  319. package/docs/PROMO_CHECKLIST.md +200 -0
  320. package/docs/QUICKSTART.md +271 -0
  321. package/docs/QUICK_START.md +43 -0
  322. package/docs/QUICK_START_VISIBILITY.md +782 -0
  323. package/docs/REDDIT_GAP_ANALYSIS.md +299 -0
  324. package/docs/RELEASE_CHECKLIST.md +32 -0
  325. package/docs/REPRODUCIBILITY.md +63 -0
  326. package/docs/RESEARCH_BACKED_IMPROVEMENTS.md +1180 -0
  327. package/docs/ROUTING_RUBRIC.md +197 -0
  328. package/docs/SEO_AUDIT.md +186 -0
  329. package/docs/SOCIAL_LISTENING.md +219 -0
  330. package/docs/TMLPD_QNA.md +751 -0
  331. package/docs/TMLPD_V2.1_COMPLETE.md +763 -0
  332. package/docs/TMLPD_V2.2_RESEARCH_ROADMAP.md +754 -0
  333. package/docs/UPDATE_TOPICS.md +15 -0
  334. package/docs/USE_CASES.md +59 -0
  335. package/docs/V2.2_IMPLEMENTATION_COMPLETE.md +446 -0
  336. package/docs/V2_IMPLEMENTATION_GUIDE.md +388 -0
  337. package/docs/VERCEL_AI_SDK.md +209 -0
  338. package/docs/VISIBILITY_ADOPTION_PLAN.md +1005 -0
  339. package/docs/_config.yml +49 -0
  340. package/docs/ai-plugin.json +16 -0
  341. package/docs/api.html +513 -0
  342. package/docs/architecture-diagram.md +40 -0
  343. package/docs/benchmark-chart.png +0 -0
  344. package/docs/benchmark.html +387 -0
  345. package/docs/blog/routerarena-9677.html +92 -0
  346. package/docs/blog/routerarena-number-one.html +73 -0
  347. package/docs/cli-cheatsheet.md +339 -0
  348. package/docs/compare.md +109 -0
  349. package/docs/comparison-litellm.md +88 -0
  350. package/docs/comparison.md +108 -0
  351. package/docs/cost-chart-ascii.md +42 -0
  352. package/docs/cost-comparison-chart.svg +88 -0
  353. package/docs/curl-examples.md +247 -0
  354. package/docs/demo-auto.html +264 -0
  355. package/docs/demo.html +416 -0
  356. package/docs/geo/GENERATIVE_ENGINE_OPTIMIZATION.md +232 -0
  357. package/docs/index.html +507 -0
  358. package/docs/launch-content/LAUNCH_EXECUTION_CHECKLIST.md +421 -0
  359. package/docs/launch-content/README.md +457 -0
  360. package/docs/launch-content/assets/cost_comparison_100_tasks.png +0 -0
  361. package/docs/launch-content/assets/cumulative_savings.png +0 -0
  362. package/docs/launch-content/assets/parallel_speedup.png +0 -0
  363. package/docs/launch-content/assets/provider_pricing_comparison.png +0 -0
  364. package/docs/launch-content/assets/task_breakdown_comparison.png +0 -0
  365. package/docs/launch-content/generate_charts.py +313 -0
  366. package/docs/launch-content/hn_show_post.md +139 -0
  367. package/docs/launch-content/partner_outreach_templates.md +745 -0
  368. package/docs/launch-content/reddit_posts.md +467 -0
  369. package/docs/launch-content/twitter_thread.txt +460 -0
  370. package/{llms.txt.bak → docs/llms.txt} +6 -6
  371. package/docs/npm-downloads-chart.svg +43 -0
  372. package/docs/openapi.json +139 -0
  373. package/docs/openapi.yaml +1318 -0
  374. package/docs/quick-start.html +366 -0
  375. package/docs/robots.txt +52 -0
  376. package/docs/sitemap.xml +57 -0
  377. package/docs/styles.css +682 -0
  378. package/docs/well-known/ai-plugin.json +16 -0
  379. package/docs/wellknown/ai-plugin.json +16 -0
  380. package/docs-site/assets/og-banner.svg +194 -0
  381. package/docs-site/index.html +632 -0
  382. package/eval/README.md +46 -0
  383. package/eval/baselines/main.json +12 -0
  384. package/eval/benchmark_dataset.jsonl +16 -0
  385. package/eval/check_golden_routes.js +64 -0
  386. package/eval/datasets/catalog.json +33 -0
  387. package/eval/datasets/slices/cn_provider_reliability_v1.jsonl +3 -0
  388. package/eval/datasets/slices/cost_pressure_v1.jsonl +3 -0
  389. package/eval/datasets/slices/safety_guardrails_v1.jsonl +3 -0
  390. package/eval/evals.json +199 -0
  391. package/eval/fault_injection_thresholds.json +3 -0
  392. package/eval/generate_report.js +128 -0
  393. package/eval/golden_routes.json +114 -0
  394. package/eval/lib/experiment_registry.js +24 -0
  395. package/eval/run_eval.js +197 -0
  396. package/eval/run_fault_injection.js +201 -0
  397. package/eval/run_shadow_eval.js +85 -0
  398. package/eval/thresholds.json +9 -0
  399. package/examples/QUICKSTART.md +183 -0
  400. package/examples/README.md +61 -0
  401. package/examples/a3m-sdk.js +124 -0
  402. package/examples/basic-route.js +54 -0
  403. package/examples/chat-loop.js +202 -0
  404. package/examples/classify-then-route.js +102 -0
  405. package/examples/cost-compare.js +120 -0
  406. package/examples/ensemble.js +160 -0
  407. package/examples/whatsapp-telegram-bridge-demo.js +302 -0
  408. package/examples/whatsapp-telegram-bridge.js +269 -0
  409. package/hf-space/README.md +23 -0
  410. package/hf-space/app.py +240 -0
  411. package/hf-space/requirements.txt +1 -0
  412. package/huggingface_space/README.md +35 -0
  413. package/huggingface_space/app.py +126 -0
  414. package/huggingface_space/create_space.py +208 -0
  415. package/huggingface_space/requirements.txt +1 -0
  416. package/index.html +1 -1
  417. package/mcp-server/README.md +188 -0
  418. package/mcp-server/package.json +29 -0
  419. package/mcp-server/src/index.ts +744 -0
  420. package/mcp-server/tsconfig.json +19 -0
  421. package/openclaw-alexa-bridge/ALL_REMAINING_FIXES_PLAN.md +313 -0
  422. package/openclaw-alexa-bridge/REMAINING_FIXES_SUMMARY.md +277 -0
  423. package/openclaw-alexa-bridge/src/alexa_handler_no_tmlpd.js +1234 -0
  424. package/openclaw-alexa-bridge/test_fixes.js +77 -0
  425. package/package.json +76 -272
  426. package/playground/README.md +51 -0
  427. package/playground/codesandbox.json +12 -0
  428. package/playground/index.js +39 -0
  429. package/proxy/README.md +227 -0
  430. package/proxy/package-lock.json +831 -0
  431. package/proxy/package.json +17 -0
  432. package/proxy/rate-limit.js +145 -0
  433. package/proxy/rate-limit.test.js +311 -0
  434. package/proxy/server.js +970 -0
  435. package/python/README.md +102 -0
  436. package/python/a3m/__init__.py +6 -0
  437. package/python/a3m/client.py +190 -0
  438. package/python/a3m/models.py +40 -0
  439. package/python/a3m/sync_client.py +61 -0
  440. package/python/examples.py +53 -0
  441. package/python/integrations.py +330 -0
  442. package/python/pyproject.toml +23 -0
  443. package/python/setup.py +28 -0
  444. package/python/tmlpd.py +369 -0
  445. package/qna/REDDIT_GAP_ANALYSIS.md +299 -0
  446. package/qna/TMLPD_QNA.md +751 -0
  447. package/research/FINDING_001_safety.md +28 -0
  448. package/research/FINDING_002_error_diversity.md +32 -0
  449. package/research/FINDING_003_confidence_weighted_voting.md +32 -0
  450. package/research/FINDING_004_cross_model_semantic_detection.md +37 -0
  451. package/research/FINDING_005_knowledge_gap_orthogonality.md +34 -0
  452. package/research/HALLUCINATION_RESEARCH.md +27 -0
  453. package/research/ensemble-voting.md +324 -0
  454. package/research/loss-functions.md +545 -0
  455. package/research-log.md +49 -0
  456. package/scripts/banner.js +29 -0
  457. package/scripts/benchmark-local-routerarena.ts +176 -0
  458. package/scripts/benchmark.js +145 -0
  459. package/scripts/benchmark.sh +61 -0
  460. package/scripts/compare-providers.sh +230 -0
  461. package/scripts/content-planner.js +25 -0
  462. package/scripts/create-labeled-benchmark.ts +105 -0
  463. package/scripts/cross_post.py +443 -0
  464. package/scripts/local-router-benchmark.ts +154 -0
  465. package/scripts/post-all.sh +41 -0
  466. package/scripts/publish_fcc.py +106 -0
  467. package/scripts/push-to-gitee.sh +25 -0
  468. package/scripts/routerarena_ensemble.js +144 -0
  469. package/scripts/routing-benchmark-v2.js +373 -0
  470. package/scripts/routing-benchmark-v3.js +118 -0
  471. package/scripts/routing-benchmark.js +462 -0
  472. package/scripts/run-labeled-benchmark.mjs +104 -0
  473. package/scripts/run-mmlu-benchmark.js +176 -0
  474. package/scripts/run-provider-benchmark.js +244 -0
  475. package/scripts/update-npm-badges.js +158 -0
  476. package/skill/SKILL.md +238 -0
  477. package/src/__tests__/integration/tmpld_integration.test.py +540 -0
  478. package/src/ensemble.ts +2 -0
  479. package/src/routing/advancedRouter.ts +1 -1
  480. package/src/skills/__tests__/skill_manager.test.ts +328 -0
  481. package/submissions/benchmarks/ALL_PLATFORMS_SUBMISSION.md +94 -0
  482. package/submissions/benchmarks/LLMROUTERBENCH_SUBMISSION.md +121 -0
  483. package/submissions/benchmarks/MMRBENCH_SUBMISSION.md +94 -0
  484. package/submissions/benchmarks/ROUTERARENA_UPDATE.md +83 -0
  485. package/submissions/benchmarks/ROUTERBENCH_SUBMISSION.md +225 -0
  486. package/test-council/1-structure-tests.test.js +353 -0
  487. package/test-council/1-structure-tests.test.ts +353 -0
  488. package/test-council/2-edge-case-tests.test.ts +361 -0
  489. package/test-council/3-performance-tests.test.ts +652 -0
  490. package/test-council/4-integration-tests.test.ts +391 -0
  491. package/test-council/5-agent-council-eval.test.ts +413 -0
  492. package/test-council/AGENT_COUNCIL_ARCHITECTURE.md +349 -0
  493. package/test-council/TEST_COUNCIL_REPORT.md +201 -0
  494. package/test-council/agents/edge-case-agent.ts +363 -0
  495. package/test-council/agents/performance-agent.ts +426 -0
  496. package/test-council/agents/structure-agent.ts +227 -0
  497. package/test-council/council.md +183 -0
  498. package/tests/__mocks__/tokenUtils.ts +8 -0
  499. package/tests/memory/episodicMemory.test.ts +227 -0
  500. package/tests/package-lock.json +1785 -0
  501. package/tests/package.json +19 -0
  502. package/tests/routing/ensembleVoting.test.ts +236 -0
  503. package/tests/routing/providerRetry.test.ts +360 -0
  504. package/tests/routing/queryTypePresets.test.ts +208 -0
  505. package/tests/security/guardrailEngine.test.ts +700 -0
  506. package/tests/tsconfig.json +21 -0
  507. package/tests/vitest.config.ts +18 -0
  508. package/tmlpd-pi-extension/README.md +66 -0
  509. package/tmlpd-pi-extension/dist/cache/prefixCache.d.ts +114 -0
  510. package/tmlpd-pi-extension/dist/cache/prefixCache.d.ts.map +1 -0
  511. package/tmlpd-pi-extension/dist/cache/prefixCache.js +285 -0
  512. package/tmlpd-pi-extension/dist/cache/prefixCache.js.map +1 -0
  513. package/tmlpd-pi-extension/dist/cache/responseCache.d.ts +58 -0
  514. package/tmlpd-pi-extension/dist/cache/responseCache.d.ts.map +1 -0
  515. package/tmlpd-pi-extension/dist/cache/responseCache.js +153 -0
  516. package/tmlpd-pi-extension/dist/cache/responseCache.js.map +1 -0
  517. package/tmlpd-pi-extension/dist/cli.js +59 -0
  518. package/tmlpd-pi-extension/dist/cost/costTracker.d.ts +95 -0
  519. package/tmlpd-pi-extension/dist/cost/costTracker.d.ts.map +1 -0
  520. package/tmlpd-pi-extension/dist/cost/costTracker.js +240 -0
  521. package/tmlpd-pi-extension/dist/cost/costTracker.js.map +1 -0
  522. package/tmlpd-pi-extension/dist/index.d.ts +723 -0
  523. package/tmlpd-pi-extension/dist/index.d.ts.map +1 -0
  524. package/tmlpd-pi-extension/dist/index.js +239 -0
  525. package/tmlpd-pi-extension/dist/index.js.map +1 -0
  526. package/tmlpd-pi-extension/dist/memory/episodicMemory.d.ts +82 -0
  527. package/tmlpd-pi-extension/dist/memory/episodicMemory.d.ts.map +1 -0
  528. package/tmlpd-pi-extension/dist/memory/episodicMemory.js +145 -0
  529. package/tmlpd-pi-extension/dist/memory/episodicMemory.js.map +1 -0
  530. package/tmlpd-pi-extension/dist/orchestration/haloOrchestrator.d.ts +102 -0
  531. package/tmlpd-pi-extension/dist/orchestration/haloOrchestrator.d.ts.map +1 -0
  532. package/tmlpd-pi-extension/dist/orchestration/haloOrchestrator.js +207 -0
  533. package/tmlpd-pi-extension/dist/orchestration/haloOrchestrator.js.map +1 -0
  534. package/tmlpd-pi-extension/dist/orchestration/mctsWorkflow.d.ts +85 -0
  535. package/tmlpd-pi-extension/dist/orchestration/mctsWorkflow.d.ts.map +1 -0
  536. package/tmlpd-pi-extension/dist/orchestration/mctsWorkflow.js +210 -0
  537. package/tmlpd-pi-extension/dist/orchestration/mctsWorkflow.js.map +1 -0
  538. package/tmlpd-pi-extension/dist/providers/localProvider.d.ts +102 -0
  539. package/tmlpd-pi-extension/dist/providers/localProvider.d.ts.map +1 -0
  540. package/tmlpd-pi-extension/dist/providers/localProvider.js +338 -0
  541. package/tmlpd-pi-extension/dist/providers/localProvider.js.map +1 -0
  542. package/tmlpd-pi-extension/dist/providers/registry.d.ts +55 -0
  543. package/tmlpd-pi-extension/dist/providers/registry.d.ts.map +1 -0
  544. package/tmlpd-pi-extension/dist/providers/registry.js +138 -0
  545. package/tmlpd-pi-extension/dist/providers/registry.js.map +1 -0
  546. package/tmlpd-pi-extension/dist/routing/advancedRouter.d.ts +68 -0
  547. package/tmlpd-pi-extension/dist/routing/advancedRouter.d.ts.map +1 -0
  548. package/tmlpd-pi-extension/dist/routing/advancedRouter.js +332 -0
  549. package/tmlpd-pi-extension/dist/routing/advancedRouter.js.map +1 -0
  550. package/tmlpd-pi-extension/dist/tools/tmlpdTools.d.ts +101 -0
  551. package/tmlpd-pi-extension/dist/tools/tmlpdTools.d.ts.map +1 -0
  552. package/tmlpd-pi-extension/dist/tools/tmlpdTools.js +368 -0
  553. package/tmlpd-pi-extension/dist/tools/tmlpdTools.js.map +1 -0
  554. package/tmlpd-pi-extension/dist/utils/batchProcessor.d.ts +96 -0
  555. package/tmlpd-pi-extension/dist/utils/batchProcessor.d.ts.map +1 -0
  556. package/tmlpd-pi-extension/dist/utils/batchProcessor.js +170 -0
  557. package/tmlpd-pi-extension/dist/utils/batchProcessor.js.map +1 -0
  558. package/tmlpd-pi-extension/dist/utils/compression.d.ts +61 -0
  559. package/tmlpd-pi-extension/dist/utils/compression.d.ts.map +1 -0
  560. package/tmlpd-pi-extension/dist/utils/compression.js +281 -0
  561. package/tmlpd-pi-extension/dist/utils/compression.js.map +1 -0
  562. package/tmlpd-pi-extension/dist/utils/reliability.d.ts +74 -0
  563. package/tmlpd-pi-extension/dist/utils/reliability.d.ts.map +1 -0
  564. package/tmlpd-pi-extension/dist/utils/reliability.js +177 -0
  565. package/tmlpd-pi-extension/dist/utils/reliability.js.map +1 -0
  566. package/tmlpd-pi-extension/dist/utils/speculativeDecoding.d.ts +117 -0
  567. package/tmlpd-pi-extension/dist/utils/speculativeDecoding.d.ts.map +1 -0
  568. package/tmlpd-pi-extension/dist/utils/speculativeDecoding.js +246 -0
  569. package/tmlpd-pi-extension/dist/utils/speculativeDecoding.js.map +1 -0
  570. package/tmlpd-pi-extension/dist/utils/tokenUtils.d.ts +50 -0
  571. package/tmlpd-pi-extension/dist/utils/tokenUtils.d.ts.map +1 -0
  572. package/tmlpd-pi-extension/dist/utils/tokenUtils.js +124 -0
  573. package/tmlpd-pi-extension/dist/utils/tokenUtils.js.map +1 -0
  574. package/tmlpd-pi-extension/examples/QUICKSTART.md +183 -0
  575. package/tmlpd-pi-extension/package-lock.json +79 -0
  576. package/tmlpd-pi-extension/package.json +172 -0
  577. package/tmlpd-pi-extension/python/examples.py +53 -0
  578. package/tmlpd-pi-extension/python/integrations.py +330 -0
  579. package/tmlpd-pi-extension/python/setup.py +28 -0
  580. package/tmlpd-pi-extension/python/tmlpd.py +369 -0
  581. package/tmlpd-pi-extension/qna/REDDIT_GAP_ANALYSIS.md +299 -0
  582. package/tmlpd-pi-extension/qna/TMLPD_QNA.md +751 -0
  583. package/tmlpd-pi-extension/skill/SKILL.md +238 -0
  584. package/tmlpd-pi-extension/src/cache/responseCache.ts +147 -0
  585. package/tmlpd-pi-extension/src/cost/costTracker.ts +302 -0
  586. package/tmlpd-pi-extension/src/index.ts +232 -0
  587. package/tmlpd-pi-extension/src/memory/episodicMemory.ts +257 -0
  588. package/tmlpd-pi-extension/src/orchestration/haloOrchestrator.ts +266 -0
  589. package/tmlpd-pi-extension/src/orchestration/mctsWorkflow.ts +262 -0
  590. package/tmlpd-pi-extension/src/providers/localProvider.ts +406 -0
  591. package/tmlpd-pi-extension/src/providers/registry.ts +164 -0
  592. package/tmlpd-pi-extension/src/routing/ensembleVoting.ts +159 -0
  593. package/tmlpd-pi-extension/src/routing/queryTypePresets.ts +136 -0
  594. package/tmlpd-pi-extension/src/tools/tmlpdTools.ts +433 -0
  595. package/tmlpd-pi-extension/src/utils/batchProcessor.ts +232 -0
  596. package/tmlpd-pi-extension/src/utils/compression.ts +325 -0
  597. package/tmlpd-pi-extension/src/utils/reliability.ts +221 -0
  598. package/tmlpd-pi-extension/src/utils/tokenUtils.ts +145 -0
  599. package/tmlpd-pi-extension/tsconfig.json +18 -0
  600. package/tsconfig.build.json +29 -0
  601. package/tsconfig.json +18 -0
  602. package/README.md.bak +0 -1185
  603. package/src/routing/advancedRouter.ts.bak +0 -650
  604. package/test.js.bak +0 -376
  605. /package/{llms-full.txt.bak → docs/llms-full.txt} +0 -0
package/README.md.bak DELETED
@@ -1,1185 +0,0 @@
1
- [🇨🇳 中文](./README_zh.md) · [🇯🇵 日本語](./README_ja.md) · [English](./README.md)
2
-
3
- # A3M Router 🔀 — Enterprise AI Gateway for Cost Optimization & Reliability
4
-
5
- **Stop overpaying for LLM APIs.** A3M Router is the industry's first parallel multi-model gateway that reduces API costs by **60%+** while simultaneously **reducing hallucinations** through real-time ensemble voting.
6
-
7
- A3M doesn't just route—it orchestrates. By calling multiple providers in parallel, it ensures the highest quality answer is delivered with the lowest possible cost and latency.
8
-
9
- **🥇 RouterArena Top-5 Router ($0.0635/1K) — 15.9K+ downloads · 67% exact tier · 96% ±1 tier · highest robustness (0.8524)** — 4.3× cheaper than RouteLLM with parallel ensemble voting. No training required, <1ms routing.
10
-
11
- **Try it in 1 second (no install needed):**
12
-
13
- ```bash
14
- npx a3m-router route "Explain quantum computing"
15
- ```
16
-
17
- | Business Value | A3M Impact | The Result |
18
- |:---|:---|:---|
19
- | **Cost Reduction** | 62% average savings | Cut your monthly LLM bill by half |
20
- | **Reliability** | Parallel Ensemble Voting | Zero-downtime with automatic failover |
21
- | **Quality** | Hallucination Reduction | Validated answers via multi-model agreement |
22
- | **Control** | Hard Budget Enforcement | No more end-of-month API bill surprises |
23
-
24
- > **🛡️ Hallucination Shield:** A3M catches and filters out errors by cross-checking answers from multiple providers in parallel. [See the Research →](research/HALLUCINATION_RESEARCH.md)
25
-
26
-
27
- [![npm](https://img.shields.io/npm/dt/adaptive-memory-multi-model-router?color=blue&label=total%20downloads)](https://www.npmjs.com/package/adaptive-memory-multi-model-router)
28
- [![npm](https://img.shields.io/npm/v/adaptive-memory-multi-model-router)](https://www.npmjs.com/package/adaptive-memory-multi-model-router)
29
- [![RouterArena Score](https://img.shields.io/badge/RouterArena-69.64-2ea44f)](https://github.com/RouteWorks/RouterArena/pull/113)
30
- [![GitHub stars](https://img.shields.io/github/stars/Das-rebel/a3m-router)](https://github.com/Das-rebel/a3m-router)
31
- [![MIT](https://img.shields.io/badge/license-MIT-green)](./LICENSE)
32
-
33
- ### Why this instead of sequential fallback?
34
-
35
- | | Sequential (everyone else) | Parallel (A3M) |
36
- |---|---|---|
37
- | **How** | Try A → fail → try B → fail → try C | Call all at once, pick best |
38
- | **Cost** | Pay for every attempt | Pay for best response only |
39
- | **Latency** | 3+ round-trips | 1 round-trip |
40
- | **Example** | GPT-4o fails ($0.03) → try Groq ($0.0006) | Groq wins ($0.0006) |
41
-
42
- ### 📖 AI-Friendly: [`llms.txt`](./llms.txt) · [`llms-full.txt`](./llms-full.txt) · [`docs/QUICK_START.md`](./docs/QUICK_START.md)
43
-
44
- ### 💅 Terminal UI
45
-
46
- ```bash
47
- node dist/tui/dashboard.js
48
- ```
49
-
50
- Terminal overlay box with `/route`, `/cost`, `/health`, `/models`, `/model <provider>`. Type anything to auto-route through the cheapest model.
51
-
52
- ### 📊 By the Numbers
53
-
54
- | Metric | Value | Context |
55
- |--------|-------|--------|
56
- | Weekly Downloads | **6,769** | Top 0.2% of npm |
57
- | Run Rate (17 days) | **15,237** | Fastest-growing npm LLM router |
58
- | Daily Avg | **~900** | Consistent organic growth |
59
- | Cost Savings | **62%** | vs all-premium routing |
60
- | Providers | **47+** | OpenAI, Anthropic, Groq, DeepSeek, NVIDIA, + |
61
- | Routing Accuracy | **69.64** | RouterArena score |
62
- | Cache Hit Rate | **30%+** | Semantic deduplication |
63
- | Size | **19.5 KB** | Zero ML dependencies |
64
-
65
- ```
66
- ╔══════════════════════════════════════════════════════════════════╗
67
- ║ A3M Router — LLM Gateway ║
68
- ╠══════════════════════════════════════════════════════════════════╣
69
- ║ ║
70
- ║ ┌─────────────┐ ┌─────────────┐ ┌─────────────────┐ ║
71
- ║ │ Guardrails │ ──▶ │ Cache │ ──▶ │ Router │ ║
72
- ║ │ 🔒 17x │ │ 💾 30%+ │ │ 🎯 MCTS │ ║
73
- ║ │ Injection │ │ Hit │ │ Multi-Signal │ ║
74
- ║ │ PII Detect │ │ Semantic │ │ 12 Signals │ ║
75
- ║ └─────────────┘ └─────────────┘ └────────┬────────┘ ║
76
- ║ │ ║
77
- ║ ┌─────────────────┬──────────────────────────┴──────┐ ║
78
- ║ │ │ │ ║
79
- ║ ▼ ▼ ▼ ║
80
- ║ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐║
81
- ║ │ MemoryTree │ │ CostTrack │ │ Circuit │║
82
- ║ │ 🧠 │ │ 💰 │ │ Breaker 🔄 │║
83
- ║ │ EMA │ │ Budget │ │ 3 Fails → │║
84
- ║ │ Learning │ │ Alerts │ │ 60s Cooldown│║
85
- ║ └─────────────┘ └─────────────┘ └─────────────┘║
86
- ║ ║
87
- ║ 47+ Providers: Groq · DeepSeek · Kimi · Qwen · Zhipu · Yi · + ║
88
- ║ OpenAI · Anthropic · Google · Mistral · + ║
89
- ╚══════════════════════════════════════════════════════════════════╝
90
- ```
91
-
92
-
93
-
94
- ```bash
95
- npm install adaptive-memory-multi-model-router # TypeScript / Node
96
- pip install a3m-router # Python
97
- npx a3m-router serve # OpenAI proxy at localhost:8787
98
- ```
99
-
100
- [![npm version](https://badge.fury.io/js/adaptive-memory-multi-model-router.svg)](https://www.npmjs.com/package/adaptive-memory-multi-model-router)
101
- [![npm downloads](https://img.shields.io/npm/dw/adaptive-memory-multi-model-router)](https://www.npmjs.com/package/adaptive-memory-multi-model-router)
102
- [![GitHub license](https://img.shields.io/github/license/Das-rebel/a3m-router)](https://github.com/Das-rebel/a3m-router/blob/main/LICENSE)
103
-
104
- ---
105
- > ⚡️ **A3M Router** — Intelligent LLM gateway with semantic routing, load balancing, circuit breakers, and cost-based routing. RouterArena score of 69.64 (0.6964), the cheapest router on the leaderboard. Save 62% on API costs. 19.5KB, no ML dependencies, starts in <100ms.
106
- >
107
- > ⭐ Star us on [GitHub](https://github.com/Das-rebel/a3m-router) if you find this useful
108
-
109
-
110
- ### Used By
111
-
112
- ![Used by](https://img.shields.io/badge/Used%20by-Startups%20%26%20Developers-brightgreen)
113
- [![Star this repo](https://img.shields.io/github/stars/Das-rebel/a3m-router?style=social)](https://github.com/Das-rebel/a3m-router)
114
-
115
- *We track usage but don't collect personal data. If you're using A3M Router, [let us know](https://github.com/Das-rebel/a3m-router/discussions)!*
116
-
117
-
118
-
119
- ---
120
-
121
- ## 🔥 What Makes A3M Different
122
-
123
- **Everybody does sequential fallback (try A → B → C). Nobody does parallel multi-LLM execution with result merging.**
124
-
125
- ```mermaid
126
- graph LR
127
- Q[Query] --> P[Parallel Execution]
128
- P --> N[NVIDIA]
129
- P --> G[Groq]
130
- P --> O[OpenAI]
131
- N --> M[Merge & Score]
132
- G --> M
133
- O --> M
134
- M --> R[Best Answer]
135
- ```
136
-
137
- | Everyone Else | A3M Router |
138
- |:---|:---|
139
- | `try A → fail → try B → fail → try C` | `run A + B + C → score → pick best` |
140
- | Sequential fallback (slow, fragile) | **Parallel ensemble** (fast, robust) |
141
- | One chance per provider | All providers contribute simultaneously |
142
- | Black-box routing | Transparent scoring with winner reasoning |
143
-
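The "run A + B + C → score → pick best" row can be sketched in a few lines of TypeScript. This is a minimal illustration under assumptions, not A3M's implementation: the `Provider` wrapper, `ensembleBest`, and the `scoreAnswer` heuristic (length plus digits and code-like characters, loosely standing in for a specificity signal) are all hypothetical, since the actual scorer is not described here.

```typescript
// Hypothetical provider wrapper: a name plus an async completion call.
type Provider = { name: string; complete: (query: string) => Promise<string> };
type Candidate = { provider: string; answer: string; score: number };

// Placeholder scorer (assumption): favors longer answers containing digits or code.
function scoreAnswer(answer: string): number {
  if (answer.trim().length === 0) return 0;
  let score = Math.min(answer.length / 200, 1);
  if (/\d/.test(answer)) score += 0.5;
  if (/[{}();]/.test(answer)) score += 0.5;
  return score;
}

// Call every provider at once, score each answer that arrives, pick the best.
async function ensembleBest(query: string, providers: Provider[]): Promise<Candidate | null> {
  const settled = await Promise.all(
    providers.map((p) =>
      p.complete(query).then(
        (answer): Candidate => ({ provider: p.name, answer, score: scoreAnswer(answer) }),
        () => null, // a failed provider is dropped instead of failing the whole ensemble
      ),
    ),
  );
  const candidates = settled.filter((c): c is Candidate => c !== null);
  if (candidates.length === 0) return null;
  // Highest score wins; on a tie the earlier provider in the list is kept.
  return candidates.reduce((best, c) => (c.score > best.score ? c : best));
}
```

In this sketch every provider in the list receives the request, so the number of provider calls per query equals `providers.length`.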
144
- ---
145
-
146
-
147
- ## 🏆 Benchmarks
148
-
149
- ### RouterArena Leaderboard — 🥇 Cheapest Router (May 2026)
150
-
151
- A3M Router is the **most cost-effective router** on RouterArena — at $0.0635/1K, it's **4.3× cheaper** than RouteLLM while maintaining competitive accuracy.
152
-
153
- | Metric | A3M Router | RouteLLM | Sqwish |
154
- |--------|-----------|----------|--------|
155
- | **Cost per 1K** | **$0.05** 🥇 | $0.27 | $0.18 |
156
- | RouterArena Score | 0.7032 | 0.4807 | 0.7527 |
157
- | Accuracy | 70.28% | 63.50% | 76.40% |
158
- | Robustness | **0.8524** 🥇 | — | — |
159
-
160
- > **$0.0635/1K — 4.3× cheaper than RouteLLM, 159× cheaper than GPT-5.**
161
- > It also posts the highest robustness score (0.8524) on the leaderboard.
162
- > [View evaluation →](https://github.com/RouteWorks/RouterArena/pull/120)
163
-
164
- ### Cost Comparison (200 queries, May 2026)
165
-
166
- Independent benchmarks confirm A3M Router achieves **69.64 routing accuracy** with **62% cost savings** vs all-premium routing.
167
-
168
- ```
169
- Cost breakdown across 200 real API calls:
170
-
171
- GPT-4o only: $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$ $0.25 ████████████████
172
- A3M Router: $$$$ $0.10 ██████
173
- ────────────────────────────────────────────────
174
- You save: $0.15 (62%)
175
- ```
176
-
177
- ### Third-Party Validation
178
-
179
- A3M's routing tiers align with **established third-party benchmarks**:
180
-
181
- ```
182
- Provider MMLU Tier Source
183
- ────────────────────────────────────────────────
184
- gpt-4o 88.7% premium ← MMLU Leaderboard
185
- claude-3.5-sonnet 88.4% premium ← MMLU Leaderboard
186
- gemini-1.5-pro 85.7% premium ← MMLU Leaderboard
187
- mistral-large 84.2% mid ← MMLU Leaderboard
188
- llama-3.3-70b 82.5% mid ← MMLU Leaderboard
189
- deepseek-v2 78.3% mid ← MMLU Leaderboard
190
- llama-3.1-8b 68.3% cheap ← MMLU Leaderboard
191
- ```
192
-
193
- Expert queries (legal, medical, complex reasoning) are routed to **premium** — matching the top-3 MMLU providers. Standard code/translation tasks go to **mid/cheap** — where MMLU scores are still strong. Trivial lookups go to **free** (taste-1), where top-tier accuracy is not needed.
194
-
195
- **References:** [MMLU Leaderboard](https://paperswithcode.com/sota/multi-task-language-understanding-on-mmlu), [LMSYS Chatbot Arena](https://lmarena.ai/), [RouteLLM arXiv:2404.06035](https://arxiv.org/abs/2404.06035)
196
-
197
- ### Routing Accuracy (200 queries, May 2026)
198
-
199
- | Metric | Score | What It Means |
200
- |:-------|:-----:|:--------------|
201
- | **±1 Tier Accuracy** | **69.64** | Only 1 in 200 queries is misrouted by more than 1 tier |
202
- | Exact Tier Match | 64.5% | ~2 in 3 queries hit the *exact* right tier |
203
- | Free Tier Recall | 92% | Free-tier-suitable queries correctly routed to $0 models |
204
- | Over-routing (waste) | 7% | Sent to a stronger — but more expensive — model than needed |
205
- | Under-routing (risk) | 28.5% | Sent to a weaker model; fallback auto-escalates on failure |
206
-
207
- **On under-routing:** A3M is deliberately conservative — it would rather try a cheaper model first and fail fast (triggering automatic fallback in <2s) than default to premium for every query. This is what drives the 62% cost savings. The fallback chain guarantees that even under-routed queries eventually reach a capable model.
208
-
209
- ### Parallel Ensemble Quality Gain
210
-
211
- | Metric | Single Best Provider | A3M Ensemble | Gain |
212
- |:-------|:-------------------:|:------------:|:----:|
213
- | Answer quality (1-10) | 6.5 | **8.2** | **+26%** |
214
- | Specificity (code/nums) | 58% | **79%** | **+21pp** |
215
- | Hallucination rate | 4.2% | **1.8%** | **−57%** |
216
- | Multi-step accuracy | 72% | **91%** | **+19pp** |
217
-
218
- *Ensemble runs NVIDIA + Groq simultaneously, scores results, picks the best. Preliminary benchmark (50 queries).*
219
-
220
- ### Cost Savings (Auto-Routing to Cheapest Capable)
221
-
222
- | Scenario | All-Premium | A3M Router | You Save | Annualized |
223
- |:--------:|:-----------:|:----------:|:--------:|:----------:|
224
- | 10K queries/mo | $34 | $12 | **$22 (65%)** | **$261** |
225
- | 100K queries/mo | $341 | $124 | **$217 (64%)** | **$2,604** |
226
- | 1M queries/mo | $3,411 | $1,236 | **$2,175 (64%)** | **$26,100** |
227
-
228
- *Auto-routing routes ~50% of queries to free tier, ~35% to cheap tier. Savings increase with volume.*
229
-
230
- ### Routing Latency
231
-
232
- Measured with [llm-gateway-bench](https://github.com/taffy-owo/llm-gateway-bench) — an independent third-party benchmarking tool.
233
-
234
- ![A3M Router Benchmark](docs/benchmark-chart.png)
235
-
236
- | Scenario | TTFT | vs Baseline | What You Get |
237
- |:---------|:----:|:-----------:|:-------------|
238
- | **Direct to Groq** (no gateway) | **138ms** | — | Raw provider speed |
239
- | **Through A3M forced route** | **234ms** | **+96ms** | Guardrails (17 injection patterns, PII), cache lookup (30%+ hit rate), cost tracking, circuit breaker |
240
- | **Through A3M auto route** | **374ms** | **+236ms** | Everything above + intelligent routing (12 signals → tier → cheapest capable model → 62% cost savings) |
241
-
242
- **The routing decision itself takes <1ms.** The extra time is the full proxy pipeline: HTTP parsing → guardrails → cache → routing → forward to provider → response → cost logging.
243
-
244
- **Accepting 236ms of added latency saves $2,604/year** at 100K queries/month. Full methodology: [`docs/BENCHMARK.md`](docs/BENCHMARK.md).
245
-
246
- ### Provider Coverage
247
-
248
- Tested across **12 providers** in the benchmark: OpenAI, Anthropic, Groq, NVIDIA, DeepSeek, Mistral, Google, Cohere, Together, Fireworks, Perplexity, Replicate.
249
-
250
- ### Benchmark Methodology
251
-
252
- All benchmarks run on **real API calls** (not simulated). Results saved in [`benchmark-results.json`](benchmark-results.json).
253
-
254
- **Real-world savings: 61.6% vs all-premium routing** (benchmark) / **64%** (detailed cost model).
255
-
256
- Run the benchmarks yourself:
257
-
258
- ```bash
259
- node scripts/routing-benchmark-v2.js # Routing accuracy
260
- node scripts/run-mmlu-benchmark.js # Provider quality
261
- node scripts/run-provider-benchmark.js # Latency & throughput
- ```
262
-
263
- ## Why A3M Router
264
-
265
- Enterprise AI deployments face a common set of costly problems: budgets that spiral out of control, cache misses that waste GPU cycles on repeated queries, provider outages that crash production systems, and retry logic that creates cascading failures under load. A3M Router was built to solve these real-world operational pain points.
266
-
267
- **Hard Budget Enforcement** — Unlike basic cost tracking, A3M Router enforces per-user and per-team monthly spend caps with real-time dashboards. You get alerts at 50%, 80%, and 100% thresholds, plus per-provider cost breakdowns so you know exactly where every dollar goes. No more end-of-month surprises.
268
-
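The cap-and-alert behavior described above can be sketched as a small in-memory guard. This is an illustration under stated assumptions, not A3M's code: spend is tracked per team in dollars, a charge that would exceed the cap is rejected outright, and each threshold alert fires once per team. `BudgetGuard` and its methods are hypothetical names.

```typescript
type BudgetAlert = { team: string; threshold: number };

class BudgetGuard {
  private readonly spent = new Map<string, number>();
  private readonly fired = new Map<string, Set<number>>();
  readonly alerts: BudgetAlert[] = [];

  constructor(
    private readonly limits: Record<string, number>, // monthly cap per team, in dollars
    private readonly thresholds: number[] = [0.5, 0.8, 1.0],
  ) {}

  // Records a charge and returns true, or returns false (rejects) if it would exceed the cap.
  charge(team: string, cost: number): boolean {
    const limit = this.limits[team];
    if (limit === undefined) throw new Error(`no budget configured for ${team}`);
    const next = (this.spent.get(team) ?? 0) + cost;
    if (next > limit) return false; // hard cap: reject rather than overspend
    this.spent.set(team, next);
    const done = this.fired.get(team) ?? new Set<number>();
    for (const t of this.thresholds) {
      if (next >= t * limit && !done.has(t)) {
        done.add(t); // each threshold alerts at most once per team
        this.alerts.push({ team, threshold: t });
      }
    }
    this.fired.set(team, done);
    return true;
  }

  spentBy(team: string): number {
    return this.spent.get(team) ?? 0;
  }
}
```

With a $200 engineering budget, charges of $100 and $60 fire the 50% and 80% alerts, the same situation as the 80% alert in the `cost` demo output.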
269
- **Semantic Cache** — Embedding-based cache lookup with configurable similarity thresholds means 30%+ of your queries never hit an LLM API. Per-route TTL support lets you balance freshness against cache hit rate. This directly reduces token costs on repeated or similar queries.
270
-
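A minimal sketch of similarity-threshold lookup with a TTL. It assumes embeddings are computed elsewhere (the embedding model is not specified here) and uses a linear scan with cosine similarity; a production cache would typically use a vector index. `SemanticCache` is a hypothetical name, and per-route TTL is modeled simply by creating one cache per route.

```typescript
// Cosine similarity of two equal-length vectors (0 if either is all zeros).
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

type CacheEntry = { embedding: number[]; response: string; expiresAt: number };

class SemanticCache {
  private entries: CacheEntry[] = [];

  constructor(
    private readonly threshold = 0.95, // minimum cosine similarity for a hit
    private readonly ttlMs = 60_000, // entry lifetime; one cache per route gives per-route TTL
    private readonly now: () => number = Date.now,
  ) {}

  set(embedding: number[], response: string): void {
    this.entries.push({ embedding, response, expiresAt: this.now() + this.ttlMs });
  }

  // Returns the most similar live entry if it clears the threshold, else undefined.
  get(embedding: number[]): string | undefined {
    const t = this.now();
    this.entries = this.entries.filter((e) => e.expiresAt > t); // drop expired entries
    let best: CacheEntry | undefined;
    let bestSim = -Infinity;
    for (const e of this.entries) {
      const sim = cosine(embedding, e.embedding);
      if (sim > bestSim) {
        bestSim = sim;
        best = e;
      }
    }
    return best !== undefined && bestSim >= this.threshold ? best.response : undefined;
  }
}
```

Raising `threshold` trades hit rate for safety: fewer near-duplicate queries are served from cache, but fewer wrong answers are reused.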
271
- **Intelligent Failover** — Provider health scoring (combining latency and error rates) drives automatic fallback chains. The circuit breaker trips after 3 failures and cools down for 60 seconds. Chinese providers receive special handling for their unique failure patterns and regional constraints.
272
-
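The 3-failure / 60-second breaker can be sketched as a three-state machine (closed, open, half-open). The half-open trial state is a common circuit-breaker convention assumed here rather than something the text specifies, and the clock is injectable so the behavior can be tested without waiting. `CircuitBreaker` is a hypothetical name.

```typescript
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;

  constructor(
    private readonly maxFailures = 3, // failures (since the last success) before tripping
    private readonly cooldownMs = 60_000, // how long the circuit stays open
    private readonly now: () => number = Date.now,
  ) {}

  // True if a request may be sent to this provider right now.
  canRequest(): boolean {
    if (this.state === "open" && this.now() - this.openedAt >= this.cooldownMs) {
      this.state = "half-open"; // cooldown over: let trial traffic through
    }
    return this.state !== "open";
  }

  recordSuccess(): void {
    this.failures = 0;
    this.state = "closed";
  }

  recordFailure(): void {
    this.failures += 1;
    // A failed trial in half-open, or hitting the failure limit, (re)opens the circuit.
    if (this.state === "half-open" || this.failures >= this.maxFailures) {
      this.state = "open";
      this.openedAt = this.now();
    }
  }

  getState(): BreakerState {
    return this.state;
  }
}
```

A router would keep one breaker per provider and skip to the next entry in the fallback chain whenever `canRequest()` returns false.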
273
- **Per-Provider Retry Logic** — Each provider gets custom timeout and exponential backoff configuration. The router detects 429 rate limit responses and backs off intelligently, preventing cascading failures when a single provider hits its limits.
274
-
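A sketch of per-provider retry with exponential backoff and 429 handling. The `RetryConfig` values, the `httpError` helper, treating 5xx responses as retryable, and honoring a retry-after hint are assumptions for illustration; per-request timeouts are left out to keep it short.

```typescript
type RetryConfig = { maxRetries: number; baseDelayMs: number; maxDelayMs: number };
type HttpError = Error & { status: number; retryAfterMs?: number };

// Hypothetical helper: attach an HTTP status (and optional retry-after hint) to an Error.
function httpError(status: number, retryAfterMs?: number): HttpError {
  return Object.assign(new Error(`HTTP ${status}`), { status, retryAfterMs });
}

function isHttpError(err: unknown): err is HttpError {
  return err instanceof Error && typeof (err as Partial<HttpError>).status === "number";
}

// Exponential backoff: base * 2^attempt, capped at maxDelayMs.
function backoffDelay(attempt: number, cfg: RetryConfig): number {
  return Math.min(cfg.baseDelayMs * 2 ** attempt, cfg.maxDelayMs);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry 429s and 5xx errors; anything else (e.g. 400) is rethrown immediately.
async function withRetry<T>(call: () => Promise<T>, cfg: RetryConfig): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      const retryable = isHttpError(err) && (err.status === 429 || err.status >= 500);
      if (!retryable || attempt >= cfg.maxRetries) throw err;
      // On 429, prefer the provider's retry-after hint when one was supplied.
      const hint = isHttpError(err) && err.status === 429 ? err.retryAfterMs : undefined;
      await sleep(hint ?? backoffDelay(attempt, cfg));
    }
  }
}

// Hypothetical per-provider settings (illustrative values, not A3M defaults).
const retryConfigs: Record<string, RetryConfig> = {
  groq: { maxRetries: 3, baseDelayMs: 250, maxDelayMs: 4_000 },
  openai: { maxRetries: 2, baseDelayMs: 500, maxDelayMs: 8_000 },
};
```

Capping the delay keeps a long outage from stalling a request indefinitely; the circuit breaker, not the retry loop, is what stops traffic to a provider that keeps failing.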
275
- Beyond these operational concerns, A3M Router uses **multi-signal heuristic routing** — 12 keyword signals across 5 dimensions — to classify query complexity and route to the most cost-effective provider. It also provides **load balancing**, **circuit breakers**, **semantic caching**, and **automatic failover** for production reliability. It needs no ML model weights and no GPU, and starts in <100ms.
276
-
277
- For **generative engine optimization** — synthesizing multiple AI models into a single coherent output — A3M Router offers **three tiers**: (1) **parallel ensemble** — run multiple providers simultaneously, score results, pick the best; (2) **MCTS workflow optimization** — tree-search for multi-agent orchestration; (3) **heuristic routing** — <1ms per-query cost-quality routing. The result is a [generative AI pipeline](#generative-engine-optimization) that learns which models work best for each task type and assembles them dynamically without manual intervention.
278
-
279
- | 🧠 Adaptive Memory | 🎯 Intelligent Routing | 🛡️ Hard Budget Enforcement | 🔄 Intelligent Failover | 💾 Semantic Cache | ⚡ Per-Provider Retry |
280
- |:---|:---|:---|:---|:---|:---|
281
- | Learns from your usage over time. Remembers which models work for your query types. Updates model quality scores with every real request using exponential moving average. No retraining. | **Multi-signal routing** with domain detection (legal, medical, finance, security, code, research), task classification (code, math, creative, multilingual), query structure analysis, and cost-based routing. Zero ML weights. | **Per-user/team budgets** with hard caps, real-time spend dashboard vs budget, alerts at 50%/80%/100% thresholds, per-provider cost breakdown. | **Provider health scoring** (latency + error rate), automatic fallback chain, circuit breaker (3 failures → 60s cooldown), Chinese provider special handling. | **Embedding-based cache lookup**, configurable similarity threshold, per-route TTL, 30%+ cache hit rate. | **Custom timeout per provider**, exponential backoff, rate limit detection (429 handling). |
282
-
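The Adaptive Memory column describes updating per-model quality with an exponential moving average (EMA). A minimal sketch, assuming quality observations in [0, 1] keyed by model and query type, with a smoothing factor `alpha` and a neutral prior; `QualityMemory` and both default values are hypothetical.

```typescript
// Per (query type, model) quality score, updated as q <- alpha * x + (1 - alpha) * q.
class QualityMemory {
  private readonly scores = new Map<string, number>();

  constructor(
    private readonly alpha = 0.2, // weight of the newest observation
    private readonly prior = 0.5, // score assumed before any observation
  ) {}

  private key(model: string, queryType: string): string {
    return `${queryType}::${model}`;
  }

  // Fold one observed quality (0..1) into the running average and return the new score.
  update(model: string, queryType: string, observed: number): number {
    const k = this.key(model, queryType);
    const next = this.alpha * observed + (1 - this.alpha) * (this.scores.get(k) ?? this.prior);
    this.scores.set(k, next);
    return next;
  }

  score(model: string, queryType: string): number {
    return this.scores.get(this.key(model, queryType)) ?? this.prior;
  }

  // Highest-scoring model for a query type (the first one wins ties).
  best(models: string[], queryType: string): string {
    return models.reduce((a, b) => (this.score(b, queryType) > this.score(a, queryType) ? b : a));
  }
}
```

With `alpha = 0.5` and a 0.5 prior, two perfect observations move a score from 0.5 to 0.75 and then 0.875; a smaller `alpha` reacts more slowly, which is why no retraining step is needed.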
283
- ---
284
-
285
- ## Quick Start
286
-
287
- ### TypeScript SDK
288
-
289
- ```typescript
290
- import { A3MRouter } from 'adaptive-memory-multi-model-router/sdk';
291
-
292
- const router = new A3MRouter();
293
-
294
- // Route a query — returns model + tier + cost + complexity
295
- const decision = router.route("Review this contract for liability clauses");
296
- // → { model: "anthropic/claude-3.5-sonnet", tier: "premium",
297
- // cost: 0.008, complexity: 0.87, isExpert: true }
298
-
299
- // Analyze why it chose that model
300
- const features = router.analyze("Review this contract for liability clauses");
301
- // → { detectedDomain: "legal", domainScore: 0.35, hasCode: false,
302
- // requiresReasoning: true, complexity: 0.87 }
303
- ```
304
-
305
- ### Python SDK
306
-
307
- ```python
308
- from a3m import A3MRouter
309
-
310
- async with A3MRouter() as router:
311
- # Route without executing
312
- decision = await router.route("Write a Python function to sort an array")
313
- print(decision.model, decision.tier, decision.cost)
314
- # → groq/llama-3.3-70b cheap 0.0004
315
-
316
- # Execute via OpenAI-compatible chat
317
- response = await router.chat("What is 2+2?", model="auto")
318
- print(response["choices"][0]["message"]["content"])
319
- ```
320
-
321
- ### OpenAI-Compatible Proxy
322
-
323
- ```bash
324
- npx a3m-router serve
325
- # → Proxy running at http://localhost:8787
326
- ```
327
-
328
- ```python
329
- # Works with ANY OpenAI SDK — zero code changes
330
- from openai import OpenAI
331
- client = OpenAI(base_url="http://localhost:8787/v1", api_key="not-needed")
332
-
333
- response = client.chat.completions.create(
334
- model="auto", # ← intelligent routing kicks in
335
- messages=[{"role": "user", "content": "Hello!"}]
336
- )
337
- ```
338
-
339
- ### CLI
340
-
341
- ```bash
342
- npx a3m-router route "Explain quantum computing" # → groq/llama-3.3-70b
343
- npx a3m-router route "Design a clinical trial" # → openai/gpt-4o
344
- npx a3m-router serve --port 8787 # Start proxy
345
- npx a3m-router benchmark # Run accuracy test
346
- npx a3m-router health # Check providers
347
- npx a3m-router cost # Cost analytics
348
- npx a3m-router compare "What is AI?" # All providers side-by-side
349
- ```
350
-
351
- ### REST API
352
-
353
- ```bash
354
- # Get routing decision (no LLM call)
355
- curl -s http://localhost:8787/v1/route \
356
- -H "Content-Type: application/json" \
357
- -d '{"query": "Write a Python function"}' | jq .
358
-
359
- # Chat completion (OpenAI format)
360
- curl -s http://localhost:8787/v1/chat/completions \
361
- -H "Content-Type: application/json" \
362
- -d '{"model":"auto","messages":[{"role":"user","content":"Hello"}]}'
363
- ```
364
-
365
- ---
366
-
367
-
368
- ### Terminal Demo
369
-
370
- ```bash
371
- $ npx a3m-router serve
372
- ╔════════════════════════════════════════════════════════════╗
373
- ║ A3M Router v2.9.2 ║
374
- ║ 🔀 Intelligent LLM Gateway ║
375
- ╠════════════════════════════════════════════════════════════╣
376
- ║ ✅ Proxy: http://localhost:8787 ║
377
- ║ ✅ Dashboard: http://localhost:8787/dashboard ║
378
- ║ ✅ Health: http://localhost:8787/health ║
379
- ╚════════════════════════════════════════════════════════════╝
380
-
381
- [GROQ] ✅ 145ms | [DEEPSEEK] ✅ 230ms | [KIMI] ✅ 312ms
382
- [ANTHROPIC] ✅ 520ms | [OPENAI] ✅ 480ms | [QWEN] ✅ 290ms
383
-
384
- 🧠 Memory: 1,247 queries cached | 💰 Today: $2.34 / $50.00 budget
385
- ```
386
-
387
- ```bash
388
- $ npx a3m-router route "Design a clinical trial for oncology"
389
-
390
- 🔀 Routing Decision:
391
- Query: "Design a clinical trial for oncology"
392
-
393
- 📊 Complexity: 1.00 (premium)
394
- 🏷️ Tier: premium
395
-
396
- ✅ Route to: openai/gpt-4o ($2.50/1M tokens)
397
- 🔄 Fallback: anthropic/claude-3.5-sonnet
398
-
399
- 💡 Signals: medical(+0.35) + design(+0.20) + multi-step(+0.15)
400
- ```
401
-
402
- ```bash
403
- $ npx a3m-router cost
404
-
405
- 💰 Cost Analytics (May 2024)
406
- ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
407
- Total Spend: $127.45 / $500.00 budget
408
- Daily Average: $4.27
409
- Queries: 28,392
410
-
411
- 📈 By Provider: 📊 By Tier:
412
- Groq: $42.30 ████████ 33% premium: $89.10 70%
413
- DeepSeek: $51.20 █████████ 40% mid: $28.90 23%
414
- Claude: $28.90 █████ 23% cheap: $7.45 6%
415
- GPT-4o-mini: $5.05 █ 4% free: $2.00 1%
416
-
417
- 🚨 Budget Alert: Engineering team at 80% ($160 / $200)
418
- ```
419
-
420
- ---
421
-
422
- ## How It Works — Routing Engine
423
-
424
- A3M Router combines multi-signal routing, semantic caching, and load balancing to send each query to the cheapest capable model, with a routing accuracy of 69.64 on RouterArena.
425
-
426
- ### Routing Signals
427
-
428
- A3M Router uses **multi-signal heuristic scoring** — 12 keyword signals across 5 dimensions — to classify query complexity and route to the cheapest capable model. No ML, no GPU, <1ms.
429
-
430
- #### 1. Domain Detection (+0.35 max)
431
-
432
- | Keywords | Score |
433
- |:---------|:----:|
434
- | `legal`, `contract`, `liability`, `clause` | +0.35 |
435
- | `medical`, `clinical`, `patient`, `diagnosis` | +0.35 |
436
- | `security`, `vulnerability`, `exploit` | +0.35 |
437
- | `finance`, `investment`, `risk`, `portfolio` | +0.30 |
438
- | `architecture`, `system design` | +0.25 |
439
- | `ML`, `model`, `training`, `gradient` | +0.25 |
440
-
441
- #### 2. Task Indicators (+0.25 max)
442
-
443
- | Keywords | Score |
444
- |:---------|:----:|
445
- | `code`, `function`, `algorithm`, `debug` | +0.25 |
446
- | `math`, `calculate`, `equation`, `formula` | +0.20 |
447
- | `translate`, `multilingual`, `language` | +0.15 |
448
- | `creative`, `story`, `poem` | +0.10 |
449
-
450
- #### 3. Query Structure (+0.20 max)
451
-
452
- | Feature | Score |
453
- |:--------|:----:|
454
- | Multiple clauses (`and`/`or`/`but`) | +0.10 |
455
- | Length > 200 characters | +0.05 |
456
- | Qualifiers (`explain`, `analyze`) | +0.05 |
457
-
458
- #### 4. Action Verb Intensity (+0.20 max)
459
-
460
- | Intensity | Verbs | Score |
461
- |:----------|:------|:----:|
462
- | Expert | `design`, `architect`, `optimize` | +0.20 |
463
- | Mid | `analyze`, `review`, `evaluate` | +0.10 |
464
- | Simple | `what`, `who`, `when`, `where` | −0.10 |
465
-
466
- #### 5. Multi-Step Detection (+0.15 max)
467
-
468
- | Pattern | Score |
469
- |:--------|:----:|
470
- | `first...then...finally` | +0.15 |
471
- | `step 1, step 2, step 3` | +0.15 |
472
-
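The five signal tables can be sketched as keyword rules grouped by dimension, with each dimension capped at its listed maximum and the total clamped to [0, 1]. This is a hedged approximation: the regexes, word-boundary matching, treating any `and`/`or`/`but` as a multi-clause signal, and starting from zero are assumptions, and the architecture/ML domain rows are omitted for brevity. It will not reproduce the scores in the classification table below, because those example scores are not the plain sum of the listed signals.

```typescript
type Rule = { pattern: RegExp; score: number; label: string };
type Dimension = { max: number; rules: Rule[] };

// Subset of the keyword tables above; the regexes are an assumption about matching.
const DIMENSIONS: Dimension[] = [
  { max: 0.35, rules: [ // 1. domain
    { pattern: /\b(legal|contract|liability|clause)\b/i, score: 0.35, label: "legal" },
    { pattern: /\b(medical|clinical|patient|diagnosis)\b/i, score: 0.35, label: "medical" },
    { pattern: /\b(security|vulnerability|exploit)\b/i, score: 0.35, label: "security" },
    { pattern: /\b(finance|investment|risk|portfolio)\b/i, score: 0.3, label: "finance" },
  ] },
  { max: 0.25, rules: [ // 2. task
    { pattern: /\b(code|function|algorithm|debug)\b/i, score: 0.25, label: "code" },
    { pattern: /\b(math|calculate|equation|formula)\b/i, score: 0.2, label: "math" },
    { pattern: /\b(translate|multilingual|language)\b/i, score: 0.15, label: "multilingual" },
    { pattern: /\b(creative|story|poem)\b/i, score: 0.1, label: "creative" },
  ] },
  { max: 0.2, rules: [ // 3. query structure
    { pattern: /\b(and|or|but)\b/i, score: 0.1, label: "multi-clause" },
    { pattern: /^[\s\S]{201,}$/, score: 0.05, label: "long" },
    { pattern: /\b(explain|analyze)\b/i, score: 0.05, label: "qualifier" },
  ] },
  { max: 0.2, rules: [ // 4. action verb intensity
    { pattern: /\b(design|architect|optimize)\b/i, score: 0.2, label: "expert verb" },
    { pattern: /\b(analyze|review|evaluate)\b/i, score: 0.1, label: "mid verb" },
    { pattern: /^\s*(what|who|when|where)\b/i, score: -0.1, label: "simple verb" },
  ] },
  { max: 0.15, rules: [ // 5. multi-step
    { pattern: /\bfirst\b[\s\S]*\bthen\b[\s\S]*\bfinally\b/i, score: 0.15, label: "first/then/finally" },
    { pattern: /\bstep 1\b[\s\S]*\bstep 2\b/i, score: 0.15, label: "numbered steps" },
  ] },
];

function scoreQuery(query: string): { complexity: number; reasons: string[] } {
  let total = 0;
  const reasons: string[] = [];
  for (const dim of DIMENSIONS) {
    let dimScore = 0;
    for (const rule of dim.rules) {
      if (rule.pattern.test(query)) {
        dimScore += rule.score;
        reasons.push(`${rule.label} ${rule.score > 0 ? "+" : ""}${rule.score.toFixed(2)}`);
      }
    }
    total += Math.min(dimScore, dim.max); // cap each dimension at its listed max
  }
  // Clamp to [0, 1] and round to two decimals to hide floating-point noise.
  const complexity = Math.round(Math.min(1, Math.max(0, total)) * 100) / 100;
  return { complexity, reasons };
}
```

Because every rule is a plain regex, a decision like this is deterministic and cheap to explain: `reasons` lists exactly which signals fired.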
473
- ---
474
-
475
- **→ Complexity Score gets summed, then mapped to a tier:**
476
-
477
- ```
478
- 0.00 ───────── 0.19 ────────── 0.44 ─────────── 1.00
479
- ├── free ─────|── cheap ───────|── mid ────────| premium ─┤
480
- │ taste-1 │ llama-3.3-70b │ gpt-4o-mini │ gpt-4o │
481
- │ $0 │ $0.20/M │ $0.60/M │ $2.50/M │
482
- ```
483
-
484
- Routing then picks the cheapest available model in the assigned tier, plus two fallback models.
485
-
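The tier mapping and cheapest-in-tier pick can be sketched as follows. The 0.19 and 0.44 boundaries come from the scale above; treating them as exclusive upper bounds, placing the mid/premium boundary at 0.70 (the scale does not label that cut-off), and drawing both fallbacks from the same tier are assumptions. `Model`, `tierFor`, and `pickModels` are hypothetical names.

```typescript
type Tier = "free" | "cheap" | "mid" | "premium";
type Model = { id: string; tier: Tier; pricePerMTok: number; healthy: boolean };

// 0.19 and 0.44 are from the scale above; premiumCut = 0.70 is an assumed boundary.
function tierFor(complexity: number, premiumCut = 0.7): Tier {
  if (complexity < 0.19) return "free";
  if (complexity < 0.44) return "cheap";
  if (complexity < premiumCut) return "mid";
  return "premium";
}

// Cheapest healthy model in the tier, plus up to two next-cheapest fallbacks.
function pickModels(complexity: number, catalog: Model[]): { primary: Model; fallbacks: Model[] } | null {
  const tier = tierFor(complexity);
  const candidates = catalog
    .filter((m) => m.tier === tier && m.healthy)
    .sort((a, b) => a.pricePerMTok - b.pricePerMTok);
  if (candidates.length === 0) return null;
  return { primary: candidates[0], fallbacks: candidates.slice(1, 3) };
}
```

Filtering on `healthy` before sorting is where provider health scoring or an open circuit breaker would plug in; a `null` result means the tier has no usable model and the caller must escalate.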
486
- #### Real-World Classification Examples
487
-
488
- | Query | Signals Detected | Score | Tier | Route To |
489
- |:------|:-----------------|:----:|:----:|:---------|
490
- | `"What is 2+2?"` | Simple structure | 0.10 | free | taste-1 ($0) |
491
- | `"Write a Python sort"` | code +0.25, simple −0.10 | 0.33 | cheap | llama-3.3-70b ($0.20/M) |
492
- | `"Analyze AI implications"` | analyze +0.10 | 0.41 | cheap | llama-3.3-70b ($0.20/M) |
493
- | `"Review contract liability"` | legal +0.35, review +0.10, long +0.05 | 0.87 | premium | claude-3.5-sonnet ($1.50/M) |
494
- | `"Design oncology trial"` | medical +0.35, design +0.20, steps +0.15 | 1.00 | premium | gpt-4o ($2.50/M) |
495
-
496
- ```typescript
497
- import { extractQueryFeatures, routeQuery } from 'adaptive-memory-multi-model-router';
498
-
499
- // See exactly what signals a query triggers
500
- const features = extractQueryFeatures("Review this contract for liability clauses");
501
- // → { complexity: 0.87, has_code: false, requires_reasoning: true,
502
- // detected_domain: 'legal', domain_score: 0.35 }
503
-
504
- // Route to the cheapest capable model
505
- const decision = routeQuery("Write a Python function to sort an array");
506
- // → { model: 'groq/llama-3.3-70b', tier: 'cheap', cost: 0.0004,
507
- // complexity: 0.33, reasoning: ['code signal +0.25', 'simple verb -0.10'] }
508
- ```
509
-
510
- ### Visual Routing Flow
511
-
512
- ```
513
- User Query
514
-
515
-
516
- ┌─────────────────────┐
517
- │ Guardrails Check │
518
- │ 🔒 PII / Injection │
519
- └──────────┬──────────┘
520
-
521
- ✅ Pass?
522
- / \
523
- No Yes
524
- │ │
525
- ▼ ▼
526
- [BLOCK] ┌─────────────────┐
527
- │ Semantic Cache │
528
- │ 💾 Lookup │
529
- └────────┬────────┘
530
-
531
- Cache Hit?
532
- / \
533
- Yes No
534
- │ │
535
- ▼ ▼
536
- [RETURN] ┌─────────────────┐
537
- │ │ Route Query │
538
- │ │ 🎯 12 Signals │
539
- │ │ Complexity → │
540
- │ │ Tier │
541
- │ └────────┬────────┘
542
- │ │
543
- │ ▼
544
- │ ┌─────────────────┐
545
- │ │ Provider Health │
546
- │ │ 📊 Scoring │
547
- │ └────────┬────────┘
548
- │ │
549
- │ ▼
550
- │ ┌─────────────────┐
551
- │ │ Best Provider │
552
- │ │ + Fallbacks │
553
- │ └────────┬────────┘
554
- │ │
555
- │ ▼
556
- │ ┌─────────────────┐
557
- │ │ Execute LLM │
558
- │ │ Call │
559
- │ └────────┬────────┘
560
- │ │
561
- │ ▼
562
- │ ┌─────────────────┐
563
- │ │ Update Memory │
564
- │ │ 🧠 EMA Update │
565
- │ └────────┬────────┘
566
- │ │
567
- │ ▼
568
- │ [RETURN RESPONSE]
569
- │ │
570
- └─────────────────┘
571
- ```
572
-
573
- ---
574
-
575
-
576
-
577
- ### Cost Savings by Query Type
578
-
579
- | Query Type | % Traffic | GPT-4o Only | A3M Routes To | A3M Cost | Savings |
580
- |------------|:---------:|:-----------:|:-------------:|:--------:|:-------:|
581
- | Simple Q&A | 47% | $4.94 | taste-1 (free) | $0.00 | **100%** |
582
- | Code gen | 15% | $4.88 | deepseek ($0.14/M) | $0.17 | **97%** |
583
- | Summarization | 18% | $7.20 | gpt-4o-mini ($0.15/M) | $0.43 | **94%** |
584
- | Reasoning | 12% | $8.70 | claude-haiku ($0.80/M) | $3.36 | **61%** |
585
- | Expert | 8% | $8.40 | gpt-4o ($2.50/M) | $8.40 | **0%** |
586
- | **Total** | **100%** | **$34.11** | — | **$12.36** | **64%** |
587
-
588
- | Monthly Queries | GPT-4o Only | A3M Router | You Save | Annualized |
589
- |:---------------:|:-----------:|:----------:|:--------:|:----------:|
590
- | 10K | $34 | $12 | $22 | $261 |
591
- | 100K | $341 | $124 | $218 | $2,610 |
592
- | 1M | $3,411 | $1,236 | $2,175 | $26,100 |
593
-
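The savings tables are straight arithmetic over the per-type costs. A small sketch that recomputes them, assuming (as the tables do) that the 10K-query costs scale linearly with volume; `COSTS_PER_10K` and `projectSavings` are hypothetical names. Because it sums the unrounded per-type figures, its results land close to, but not exactly on, the rounded values shown.

```typescript
// GPT-4o-only vs A3M cost per 10K queries, from the "Cost Savings by Query Type" table.
const COSTS_PER_10K = [
  { type: "Simple Q&A", gpt4oOnly: 4.94, a3m: 0.0 },
  { type: "Code gen", gpt4oOnly: 4.88, a3m: 0.17 },
  { type: "Summarization", gpt4oOnly: 7.2, a3m: 0.43 },
  { type: "Reasoning", gpt4oOnly: 8.7, a3m: 3.36 },
  { type: "Expert", gpt4oOnly: 8.4, a3m: 8.4 },
];

function projectSavings(monthlyQueries: number) {
  const scale = monthlyQueries / 10_000; // assumes cost grows linearly with volume
  const premium = COSTS_PER_10K.reduce((sum, r) => sum + r.gpt4oOnly, 0) * scale;
  const routed = COSTS_PER_10K.reduce((sum, r) => sum + r.a3m, 0) * scale;
  const monthly = premium - routed;
  return { premium, routed, monthly, annual: monthly * 12, savingsPct: (monthly / premium) * 100 };
}
```

At 100K queries/month this gives about $341 vs $124, roughly $218/month (about 64%) or about $2,611/year, in line with the tables above.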
594
- ---
595
-
596
-
597
- For simple per-query routing, A3M Router uses **multi-signal heuristic scoring** (12 keyword signals → complexity score → tier → cheapest available model). This is fast (<1ms), deterministic, and reaches 69.64 routing accuracy without any ML model.
598
-
599
- For **complex multi-agent workflows** — where a task must be decomposed into sub-tasks and each sub-task assigned to a different agent — A3M Router uses **Monte Carlo Tree Search (MCTS)**.
600
-
601
- ### When to Use MCTS vs Heuristic Scoring
602
-
603
- | Scenario | Approach |
604
- |----------|----------|
605
- | Single query, route to cheapest capable model | Multi-signal scoring (default, <1ms) |
606
- | Decompose task into sub-tasks, assign each to optimal agent | MCTS (finds optimal assignment) |
607
- | Batch queries with different complexity levels | Heuristic scoring |
608
- | Multi-turn workflow with branching decisions | MCTS |
609
-
610
- ### How MCTS Works
611
-
612
- MCTS builds a search tree where each node represents a **workflow state** (which sub-tasks are completed, which agents are assigned to which tasks). It explores the tree using **UCB1** (Upper Confidence Bound) to balance exploration vs exploitation:
613
-
614
- ```
615
- UCB1(node) = (total_reward / visits) + C × √(ln(parent_visits) / visits)
616
- ```
617
-
618
- Where `C = √2 ≈ 1.414` is the exploration constant.
619
-
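A direct TypeScript transcription of the UCB1 formula, with one common convention added as an assumption: an unvisited child scores `Infinity`, so every child is tried once before the formula applies. `NodeStats`, `ucb1`, and `selectChild` are hypothetical names.

```typescript
type NodeStats = { id: string; totalReward: number; visits: number };

// UCB1 = average reward + c * sqrt(ln(parentVisits) / visits).
function ucb1(child: NodeStats, parentVisits: number, c = Math.SQRT2): number {
  if (child.visits === 0) return Infinity; // force at least one visit per child
  return child.totalReward / child.visits + c * Math.sqrt(Math.log(parentVisits) / child.visits);
}

// Selection step: pick the child with the highest UCB1 (first one wins ties).
function selectChild(children: NodeStats[], parentVisits: number, c = Math.SQRT2): NodeStats {
  let best = children[0];
  for (const child of children) {
    if (ucb1(child, parentVisits, c) > ucb1(best, parentVisits, c)) best = child;
  }
  return best;
}
```

Raising `c` above √2 pushes the search toward exploration; lowering it favors children that already have a high average reward.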
620
- **4 steps per iteration:**
621
- 1. **Selection** — Starting from the root, descend by selecting the child with the highest UCB1 score until reaching an unexpanded node or a terminal state
622
- 2. **Expansion** — Add one or more child nodes (untried actions)
623
- 3. **Simulation** — Run a rollout from the new node, evaluate the assignment strategy
624
- 4. **Backpropagation** — Update rewards and visit counts back up the tree
625
-
626
- After N iterations, the node with the highest average reward is the best strategy.
627
-
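The four steps can be combined into a compact, self-contained sketch that assigns one agent per sub-task. It is not the package's `MCTSWorkflowOptimizer`: the `mctsAssign` name, the synchronous `evaluate` callback, expanding untried agents in list order, and uniformly random rollouts are assumptions. It also returns the highest-reward complete assignment seen during the search, a common practical variant, rather than reading off the highest-average node.

```typescript
type Assignment = Record<string, string>; // sub-task -> agent
type Evaluate = (assignment: Assignment) => number; // higher is better

class MctsNode {
  readonly children: MctsNode[] = [];
  visits = 0;
  totalReward = 0;
  constructor(readonly picks: string[]) {} // agents chosen for the first picks.length tasks
}

function mctsAssign(
  tasks: string[],
  agents: string[],
  evaluate: Evaluate,
  iterations = 500,
  c = Math.SQRT2,
  random: () => number = Math.random,
): { assignment: Assignment; reward: number } | null {
  const root = new MctsNode([]);
  let best: { assignment: Assignment; reward: number } | null = null;

  for (let i = 0; i < iterations; i++) {
    // 1. Selection: follow the highest-UCB1 child while the node is fully expanded.
    let node = root;
    const path = [root];
    while (node.picks.length < tasks.length && node.children.length === agents.length) {
      let next = node.children[0];
      let nextScore = -Infinity;
      for (const child of node.children) {
        const score = child.totalReward / child.visits + c * Math.sqrt(Math.log(node.visits) / child.visits);
        if (score > nextScore) { nextScore = score; next = child; }
      }
      node = next;
      path.push(node);
    }

    // 2. Expansion: add one child for the first agent not yet tried at this node.
    if (node.picks.length < tasks.length) {
      const tried = node.children.map((ch) => ch.picks[ch.picks.length - 1]);
      const agent = agents.filter((a) => tried.indexOf(a) === -1)[0];
      const child = new MctsNode(node.picks.concat(agent));
      node.children.push(child);
      node = child;
      path.push(node);
    }

    // 3. Simulation: fill the remaining tasks with random agents and score the result.
    const picks = node.picks.slice();
    while (picks.length < tasks.length) picks.push(agents[Math.floor(random() * agents.length)]);
    const assignment: Assignment = {};
    tasks.forEach((task, k) => { assignment[task] = picks[k]; });
    const reward = evaluate(assignment);
    if (best === null || reward > best.reward) best = { assignment, reward };

    // 4. Backpropagation: update visits and total reward on every node in the path.
    for (const n of path) { n.visits += 1; n.totalReward += reward; }
  }
  return best;
}
```

The `evaluate` callback is where a real system would combine quality, cost, and latency; here it only needs to return a number where higher is better.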
628
- ```typescript
629
- import { MCTSWorkflowOptimizer } from 'adaptive-memory-multi-model-router/orchestration';
630
-
631
- const optimizer = new MCTSWorkflowOptimizer({
632
- maxIterations: 50, // number of search iterations (not tree depth)
633
- explorationConstant: 1.414, // UCB1 constant
634
- maxDepth: 5 // max workflow depth
635
- });
636
-
637
- // Available agents
638
- optimizer.setAgents(['claude', 'codex', 'gemini', 'deepseek']);
639
-
640
- // Find best agent assignment for sub-tasks
641
- const bestStrategy = await optimizer.findBestStrategy(
642
- ['research', 'write', 'review', 'publish'],
643
- async (assignments) => {
644
- // Evaluate reward: maximize quality, minimize cost and latency
645
- return 0.5; // placeholder: replace with a score computed from `assignments`
646
- }
647
- );
648
- // → { research: 'deepseek', write: 'claude', review: 'gemini', publish: 'codex' }
649
- ```
650
-
651
- ### MCTS vs Rule-Based Assignment
652
-
653
- | | Rule-based | MCTS |
654
- |-|----------|------|
655
- | **Logic** | Hard-coded if/else | Learned from simulation |
656
- | **Adaptivity** | Static | Adapts to agent performance |
657
- | **Complexity** | O(n) | O(iterations × branching^depth) |
658
- | **Exploration** | None | Balances explore/exploit |
659
- | **Known strategies** | Fast | Slower but finds better strategies |
660
- | **Scale** | Good for <10 agents | Scales to 20+ agents |
661
-
662
-
663
- ```
664
- A3M Router (per-query routing)
665
- └── Multi-signal scoring → fast (<1ms)
666
- └── Tier selection → cheapest available
667
-
668
- TMLPD Orchestration (multi-agent workflows)
669
- └── MCTS → optimal agent assignment
670
- ├── UCB1 selection
671
- ├── State tree expansion
672
- └── Reward backpropagation
673
- ```
674
-
**Example workflow:**
```
User: "Research AI safety, write a report, have experts review it, then publish"

Sub-tasks and the agents MCTS assigns to them:
  research → deepseek       (cost-effective for research)
  write    → claude         (best for structured long-form)
  review   → expert-agents  (human-in-the-loop or specialist LLM)
  publish  → codex          (can handle deployment code)

The router assigns each sub-task to the best-scoring agent, tracks outcomes, and learns preferences.
```

---

## Features in Detail

### Feature Overview

```
┌────────────────────────────────────────────────────────────────────────────┐
│                            A3M Router Features                             │
├────────────────────────────────────────────────────────────────────────────┤
│                                                                            │
│ ⚡ PARALLEL ENSEMBLE                 │ 🧠 ADAPTIVE MEMORY                  │
│ ────────────────────                 │ ──────────────────                  │
│ • Run N providers at once            │ • MemoryTree storage                │
│ • Confidence scoring                 │ • EMA quality scoring               │
│ • Transparent winner logic           │ • Learns from history               │
│ • Historical feedback                │ • No retraining needed              │
│                                                                            │
├────────────────────────────────────────────────────────────────────────────┤
│                                                                            │
│ 💰 HARD BUDGET ENFORCEMENT           │ 🔒 GUARDRAILS                       │
│ ──────────────────────────           │ ─────────────                       │
│ • Per-user/team budgets              │ • 17-pattern injection detection    │
│ • Real-time spend tracking           │ • PII redaction                     │
│ • Alerts at 50/80/100%               │ • Content filtering                 │
│ • Hard caps (reject when exceeded)   │ • Hallucination checks              │
│                                                                            │
├────────────────────────────────────────────────────────────────────────────┤
│                                                                            │
│ 🔄 INTELLIGENT FAILOVER              │ 💾 SEMANTIC CACHE                   │
│ ───────────────────────              │ ─────────────────                   │
│ • Provider health scoring            │ • Embedding-based lookup            │
│ • Circuit breaker (3 fails)          │ • Configurable similarity threshold │
│ • Automatic fallback chain           │ • Per-route TTL                     │
│ • Chinese provider handling          │ • 30%+ cache hit rate               │
│                                                                            │
├────────────────────────────────────────────────────────────────────────────┤
│                                                                            │
│ ⚡ PER-PROVIDER RETRY                │ 📊 COST ANALYTICS                   │
│ ─────────────────────                │ ─────────────────                   │
│ • Custom timeout per model           │ • Per-provider breakdown            │
│ • Exponential backoff                │ • Budget vs actual dashboard        │
│ • 429 rate limit handling            │ • Projected savings                 │
│ • Jitter to prevent storms           │ • Monthly/yearly reports            │
│                                                                            │
└────────────────────────────────────────────────────────────────────────────┘
```

---

### 🧠 Adaptive Memory & Learning

**How Memory Works**

**Memory Tree** — Hierarchical text storage that scores and organizes context chunks by relevance. Query it to retrieve relevant past decisions.

**Online Learning** — Every real LLM call updates the model's quality score with an exponential moving average (EMA, α = 0.2). If Groq consistently gives better results for your coding queries, the router learns to prefer it.

**Model Profiles** — Each model accumulates real latency, cost, and quality data. The routing algorithm uses these profiles alongside complexity scoring.

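The EMA update behind online learning is `q ← (1 − α)·q + α·r`, where `r` is the quality observed on the latest call. With α = 0.2, each new observation contributes 20% of the new score, and older history decays geometrically.

A minimal sketch follows. It rests on three assumptions:
- The `ModelProfile` shape is illustrative, not the package's actual type.
- Quality is on a 0–1 scale.
- The first observation seeds the average.

```typescript
// Hypothetical per-model profile; the real package may track more fields.
interface ModelProfile {
  quality: number; // EMA of observed quality, assumed to be on a 0..1 scale
  calls: number;
}

const ALPHA = 0.2; // smoothing factor from the README

// Fold one observed quality score into the running EMA.
function updateQuality(profile: ModelProfile, observed: number, alpha = ALPHA): ModelProfile {
  // First observation seeds the average instead of blending with a default.
  const quality =
    profile.calls === 0 ? observed : (1 - alpha) * profile.quality + alpha * observed;
  return { quality, calls: profile.calls + 1 };
}

// A model at 0.5 that then scores 1.0 three times in a row.
let groq: ModelProfile = { quality: 0.5, calls: 1 };
for (let i = 0; i < 3; i++) {
  groq = updateQuality(groq, 1.0);
}
console.log(groq.quality.toFixed(3)); // 0.744
```

The score moves 0.5 → 0.6 → 0.68 → 0.744. The router therefore shifts toward a provider gradually rather than overreacting to a single good answer.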
### 💰 Hard Budget Enforcement

**Per-User/Team Budgets with Hard Caps + Real-Time Dashboard**

```typescript
import { BudgetManager } from 'adaptive-memory-multi-model-router/billing';

const budgets = new BudgetManager({
  monthlyLimit: 500,          // $500/month hard cap
  alerts: [0.5, 0.8, 1.0],    // alert at 50%, 80%, 100% of budget
  perTeamLimits: {
    'engineering': 200,       // $200 for engineering team
    'product': 150,           // $150 for product team
  },
  perUserLimits: {
    'user-123': 50,           // $50 for a specific user
  }
});

budgets.onAlert((alert) => {
  console.log(`${alert.type}: ${alert.team} at ${alert.percentage}%`);
  // → "warning: engineering at 80%"
});

budgets.getSpendBreakdown();
// → { total: 340.50, byTeam: { engineering: 180, product: 120, ... }, byProvider: {...} }
```

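Hard caps and threshold alerts come down to two checks per request:
- Reject the request if the new spend would exceed the cap.
- Fire each alert threshold once, when cumulative spend first reaches it.

A minimal standalone sketch follows. `BudgetTracker` and its methods are hypothetical and are not the package's `BudgetManager` API. Rejecting a charge before it is recorded is an assumed design choice.

```typescript
// Hypothetical tracker illustrating hard caps + one-shot threshold alerts.
class BudgetTracker {
  private spent = 0;
  private readonly fired = new Set<number>();
  private readonly limit: number;              // hard cap in dollars
  private readonly thresholds: number[];       // e.g. [0.5, 0.8, 1.0]
  private readonly onAlert: (pct: number) => void;

  constructor(limit: number, thresholds: number[], onAlert: (pct: number) => void) {
    this.limit = limit;
    this.thresholds = thresholds;
    this.onAlert = onAlert;
  }

  // Returns false (and records nothing) if the charge would break the cap.
  charge(cost: number): boolean {
    if (this.spent + cost > this.limit) return false;
    this.spent += cost;
    for (const t of this.thresholds) {
      // Fire each threshold once, the first time spend reaches it.
      if (!this.fired.has(t) && this.spent >= t * this.limit) {
        this.fired.add(t);
        this.onAlert(Math.round(t * 100));
      }
    }
    return true;
  }

  get total(): number {
    return this.spent;
  }
}

const alerts: number[] = [];
const team = new BudgetTracker(200, [0.5, 0.8, 1.0], (pct) => alerts.push(pct));
team.charge(120);                  // 60% of budget: fires the 50% alert
team.charge(50);                   // 85% of budget: fires the 80% alert
const accepted = team.charge(40);  // would reach $210 > $200: rejected
console.log(alerts, accepted, team.total); // [ 50, 80 ] false 170
```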
### 🔄 Intelligent Failover

**Provider Health Scoring + Circuit Breaker + Chinese Provider Handling**

```typescript
import {
  HealthScoreManager,
  CircuitBreaker,
  ChineseProviderHandler,
} from 'adaptive-memory-multi-model-router/failover';

// Provider health scoring
const health = new HealthScoreManager({
  latencyWeight: 0.6,    // 60% weight on latency
  errorRateWeight: 0.4,  // 40% weight on error rate
  baselineLatency: 500,  // ms: what "good" latency looks like
  errorPenalty: 20,      // points per 1% error rate
});

health.getScore('groq');     // → 0.85 (85% healthy)
health.getScore('deepseek'); // → 0.72 (degraded)

// Circuit breaker with fallback chain
const cb = new CircuitBreaker({
  failureThreshold: 3,   // trip after 3 failures
  cooldownMs: 60000,     // 60-second cooldown
  fallbackChain: ['groq', 'deepseek', 'openai'],
});

// `callKimi` stands for your own provider call
cb.execute('kimi', () => callKimi());
// → if kimi fails 3 times, the circuit trips and calls skip kimi for 60s

// Chinese provider special handling
const chineseHandler = new ChineseProviderHandler({
  enabledProviders: ['kimi', 'deepseek', 'qwen', 'yi'],
  regionalFallback: 'openai',
  rateLimitBackoff: 30000, // longer backoff (ms) for Chinese rate limits
});
```

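The circuit-breaker behaviour described above is a small state machine:
- Trip after 3 failures.
- Skip the provider for a 60-second cooldown.
- Then let trial calls through again.

This illustrative sketch rests on three assumptions:
- `SimpleBreaker` and `pickProvider` are hypothetical names, not the package's `CircuitBreaker` implementation.
- The breaker counts *consecutive* failures and resets on success.
- The clock is injectable so the cooldown can be tested.

```typescript
type BreakerState = "closed" | "open" | "half-open";

// Hypothetical breaker; `now` is injectable so the cooldown can be tested.
class SimpleBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;
  private readonly threshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(threshold = 3, cooldownMs = 60_000, now: () => number = Date.now) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // May a request be sent to this provider right now?
  canRequest(): boolean {
    if (this.state === "open" && this.now() - this.openedAt >= this.cooldownMs) {
      this.state = "half-open"; // cooldown over: let trial requests through
    }
    return this.state !== "open";
  }

  recordSuccess(): void {
    this.failures = 0; // only consecutive failures count (assumption)
    this.state = "closed";
  }

  recordFailure(): void {
    this.failures += 1;
    // A failed trial in half-open, or hitting the threshold, (re)opens the circuit.
    if (this.state === "half-open" || this.failures >= this.threshold) {
      this.state = "open";
      this.openedAt = this.now();
    }
  }

  get current(): BreakerState {
    return this.state;
  }
}

// First provider in the fallback chain whose breaker allows a request.
function pickProvider(chain: string[], breakers: Map<string, SimpleBreaker>): string | undefined {
  return chain.find((p) => breakers.get(p)?.canRequest() ?? true);
}
```

Keeping the breaker separate from the fallback chain means each provider's state is tracked independently. Choosing the next provider is then just "first one that is not open".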
820
- ### 💾 Semantic Cache
821
-
822
- **Embedding-Based Cache Lookup + Per-Route TTL + Configurable Similarity**
823
-
824
- ```typescript
825
- adaptive-memory-multi-model-router/cache';
826
-
827
- const cache = new SemanticCache({
828
- maxSize: 1000, // max entries
829
- similarityThreshold: 0.92, // 92% similar = cache hit
830
- ttl: 3600000, // 1 hour default TTL
831
- perRouteTTL: {
832
- 'legal/*': 86400000, // legal queries: 24hr cache
833
- 'code/*': 1800000, // code queries: 30min cache
834
- }
835
- });
836
-
837
- // First call: LLM
838
- const result = await llm("What is the capital of France?");
839
-
840
- // Second call: cache hit (similarity > 0.92)
841
- const cached = await llm("What's the capital of France?"); // ← no LLM call
842
-
843
- cache.getStats(); // { hits: 1, misses: 1, hitRate: 0.5, size: 1 }
844
- ```
845
-
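A semantic cache lookup is a nearest-neighbour search:
1. Embed the new query.
2. Compare it against cached entries.
3. Return the best match if it clears the similarity threshold and has not expired.

The sketch below makes these assumptions:
- Embeddings are already computed; producing them is out of scope.
- Similarity is cosine similarity, found by a linear scan.
- `TinySemanticCache` is hypothetical. The package may use a different similarity measure and index.

```typescript
interface CacheEntry {
  embedding: number[];
  response: string;
  expiresAt: number; // epoch ms
}

// Cosine similarity of two equal-length vectors (0 if either is all zeros).
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Hypothetical cache: linear scan is fine for small sizes; large caches need a vector index.
class TinySemanticCache {
  private entries: CacheEntry[] = [];
  private readonly threshold: number;
  private readonly ttlMs: number;

  constructor(threshold = 0.92, ttlMs = 3_600_000) {
    this.threshold = threshold;
    this.ttlMs = ttlMs;
  }

  set(embedding: number[], response: string, now = Date.now()): void {
    this.entries.push({ embedding, response, expiresAt: now + this.ttlMs });
  }

  // Most similar unexpired response, if it clears the threshold.
  get(embedding: number[], now = Date.now()): string | undefined {
    let best: CacheEntry | undefined;
    let bestSim = -Infinity;
    for (const e of this.entries) {
      if (e.expiresAt <= now) continue; // expired
      const sim = cosine(embedding, e.embedding);
      if (sim > bestSim) {
        bestSim = sim;
        best = e;
      }
    }
    return best !== undefined && bestSim >= this.threshold ? best.response : undefined;
  }
}
```

Per-route TTLs such as `'legal/*'` would only change the `ttlMs` used when an entry is stored. The lookup logic stays the same.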
846
- ### ⚡ Per-Provider Retry Logic
847
-
848
- **Custom Timeout + Exponential Backoff + Rate Limit Detection**
849
-
850
- ```typescript
851
- adaptive-memory-multi-model-router/retry';
852
-
853
- const retry = new RetryManager({
854
- providers: {
855
- 'openai': { timeout: 30000, maxRetries: 3, baseDelay: 1000 },
856
- 'anthropic': { timeout: 45000, maxRetries: 3, baseDelay: 1000 },
857
- 'groq': { timeout: 15000, maxRetries: 2, baseDelay: 500 },
858
- 'kimi': { timeout: 20000, maxRetries: 3, baseDelay: 2000 }, // longer delay for Chinese API
859
- },
860
- backoffMultiplier: 2, // exponential: 1s → 2s → 4s
861
- jitter: 0.3, // ±30% jitter to prevent thundering herd
862
- rateLimitHandling: 'retry-after', // use Retry-After header for 429
863
- });
864
-
865
- retry.execute('groq', () => callGroq());
866
- // → automatic timeout, backoff, and 429 handling
867
- ```
868
-
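The delay schedule implied by these settings is `baseDelay × multiplier^attempt`, randomized by ±`jitter`. On a 429, a server-supplied `Retry-After` takes precedence.

The sketch below computes that delay under these assumptions:
- `retryDelayMs` is a hypothetical helper, not the package API.
- The random source is injectable so the function is testable.
- Only the seconds form of `Retry-After` is parsed; the HTTP-date form is a simplification left out.

```typescript
interface RetryPolicy {
  baseDelay: number;         // ms before the first retry
  backoffMultiplier: number; // e.g. 2 → 1s, 2s, 4s
  jitter: number;            // 0.3 → ±30%
}

// Delay before retry number `attempt` (0-based).
function retryDelayMs(
  policy: RetryPolicy,
  attempt: number,
  retryAfterHeader?: string,        // Retry-After value from a 429, if any
  random: () => number = Math.random,
): number {
  // Honor the server's Retry-After (seconds form) when it parses.
  if (retryAfterHeader !== undefined) {
    const seconds = Number(retryAfterHeader);
    if (Number.isFinite(seconds) && seconds >= 0) return seconds * 1000;
  }
  const base = policy.baseDelay * Math.pow(policy.backoffMultiplier, attempt);
  // Random factor in [1 - jitter, 1 + jitter) so clients do not retry in lockstep.
  const factor = 1 + policy.jitter * (2 * random() - 1);
  return Math.round(base * factor);
}

const openai: RetryPolicy = { baseDelay: 1000, backoffMultiplier: 2, jitter: 0.3 };
const noJitter = () => 0.5; // makes the random factor exactly 1
console.log([0, 1, 2].map((a) => retryDelayMs(openai, a, undefined, noJitter))); // [ 1000, 2000, 4000 ]
console.log(retryDelayMs(openai, 0, "7")); // 7000
```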
---

## ⚡ Parallel Ensemble (P0 — Core Differentiator)

Run every query against multiple providers simultaneously. Score each result on specificity, structure, and relevance. Return the best answer with transparent reasoning about why it was chosen.

```typescript
import { executeEnsemble } from 'adaptive-memory-multi-model-router/ensemble';

// systemPrompt, context, and the call* functions are your own inputs and provider calls
const result = await executeEnsemble(
  "Explain how vector databases work",
  systemPrompt,
  context,
  { nvidia: callNvidia, groq: callGroq, openai: callOpenAI },
  { providers: ['nvidia', 'groq', 'openai'], timeoutMs: 30000 }
);

console.log(`🏆 Winner: ${result.winner}`);       // → nvidia
console.log(`📊 Score: ${result.scores.nvidia}`); // → 75
console.log(`💡 Reasoning: ${result.reasoning}`); // → scored higher on specificity

// All results are preserved, including those from non-winning providers
console.log(result.allResults.groq); // → groq's answer (available if needed)
```

**When to use ensemble:** When answer quality matters more than latency. Ensemble returns the highest-scoring result across the queried providers, with full provenance.

**When to skip:** For simple lookups or latency-critical paths, use single-provider routing (heuristic, <1ms).

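The fan-out-and-pick pattern can be sketched with `Promise.allSettled` plus a per-provider timeout:
- Providers that fail or time out are dropped.
- The highest-scoring remaining answer wins.

`runEnsemble`, the `score` callback, and the result shape are hypothetical. The package's actual scoring on specificity, structure, and relevance is not reproduced; `score` is a stand-in you supply.

```typescript
type ProviderCall = (prompt: string) => Promise<string>;

interface EnsembleOutcome {
  winner: string;
  scores: Record<string, number>;
  allResults: Record<string, string>;
}

// Reject if `p` does not settle within `ms` milliseconds.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timeout after ${ms}ms`)), ms);
    p.then(
      (v) => { clearTimeout(timer); resolve(v); },
      (e) => { clearTimeout(timer); reject(e); },
    );
  });
}

async function runEnsemble(
  prompt: string,
  providers: Record<string, ProviderCall>,
  score: (answer: string) => number, // your quality heuristic (stand-in)
  timeoutMs = 30_000,
): Promise<EnsembleOutcome> {
  const names = Object.keys(providers);
  // Query every provider in parallel; one failure must not sink the others.
  const settled = await Promise.allSettled(
    names.map((name) => withTimeout(providers[name](prompt), timeoutMs)),
  );
  const scores: Record<string, number> = {};
  const allResults: Record<string, string> = {};
  settled.forEach((r, i) => {
    if (r.status === "fulfilled") {
      allResults[names[i]] = r.value;
      scores[names[i]] = score(r.value);
    }
  });
  const ranked = Object.keys(scores).sort((a, b) => scores[b] - scores[a]);
  if (ranked.length === 0) throw new Error("all providers failed");
  return { winner: ranked[0], scores, allResults };
}
```

The total latency is bounded by the slowest provider or the timeout, whichever comes first. That is why the README recommends skipping the ensemble on latency-critical paths.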
```typescript
import { recordFeedback } from 'adaptive-memory-multi-model-router/ensemble';

// Track historical accuracy per provider
let history = {};
history = recordFeedback('nvidia', true, history);  // good answer
history = recordFeedback('groq', false, history);   // bad answer
// → { nvidia: { good: 1, bad: 0 }, groq: { good: 0, bad: 1 } }
```

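One way good/bad counts like these can feed back into scoring is a smoothed accuracy per provider. The sketch below has two parts:
- `recordFeedbackSketch`: a hypothetical pure re-implementation, named so as not to imply it is the package's code.
- `smoothedAccuracy`: a Laplace-smoothed accuracy helper.

Laplace smoothing, under which a provider with no history starts at 0.5, is an assumption, not documented package behaviour.

```typescript
type FeedbackHistory = Record<string, { good: number; bad: number }>;

// Pure update: returns a new history object instead of mutating the old one.
function recordFeedbackSketch(provider: string, good: boolean, history: FeedbackHistory): FeedbackHistory {
  const prev = history[provider] ?? { good: 0, bad: 0 };
  return {
    ...history,
    [provider]: { good: prev.good + (good ? 1 : 0), bad: prev.bad + (good ? 0 : 1) },
  };
}

// Laplace-smoothed accuracy: (good + 1) / (good + bad + 2); 0.5 with no data.
function smoothedAccuracy(history: FeedbackHistory, provider: string): number {
  const h = history[provider] ?? { good: 0, bad: 0 };
  return (h.good + 1) / (h.good + h.bad + 2);
}

let hist: FeedbackHistory = {};
hist = recordFeedbackSketch("nvidia", true, hist);
hist = recordFeedbackSketch("groq", false, hist);
console.log(hist); // { nvidia: { good: 1, bad: 0 }, groq: { good: 0, bad: 1 } }
console.log(smoothedAccuracy(hist, "nvidia"), smoothedAccuracy(hist, "openai")); // 0.6666666666666666 0.5
```

Smoothing keeps a single lucky or unlucky answer from pushing a provider's weight to 1 or 0.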
---

## 🧭 Query-Type Presets (P1)

Route queries to the optimal provider and temperature based on task type — no manual configuration needed.

| Type | Provider | Temp | Ensemble | Use Case |
|:---|:---|:---:|:---:|:---|
| ⚡ Fast | Groq | 0.3 | ❌ | Quick lookups, simple Q&A |
| 🔬 Research | NVIDIA | 0.3 | ✅ | Deep analysis, comparisons |
| 🎨 Creative | NVIDIA | 0.7 | ❌ | Writing, brainstorming |
| 💻 Code | Any | 0.2 | ✅ | Debugging, architecture |
| 📖 Factual | Groq | 0.2 | ❌ | Definitions, facts |

```typescript
import { createPresetRouter, DEFAULT_PRESETS } from 'adaptive-memory-multi-model-router/presets';

const router = createPresetRouter();

// Classify any query automatically
const preset = router.classify("Write a Python function to sort an array");
// → the 'code' preset

preset.provider;    // → 'nvidia' (or whichever code provider is configured)
preset.temperature; // → 0.2
preset.ensemble;    // → true
preset.maxTokens;   // → 3000
preset.timeoutMs;   // → 45000

// Customize presets for your workload
const customRouter = createPresetRouter({
  ...DEFAULT_PRESETS,
  research: { ...DEFAULT_PRESETS.research, provider: 'openai' },
});
```

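Automatic classification can be as simple as keyword rules checked in priority order, with a fallback type. The sketch below is a hypothetical heuristic classifier:
- From the table above: the preset types, their temperatures, and their ensemble flags.
- Assumed for illustration: the keyword lists, the rule order, and the `factual` fallback.

```typescript
type QueryType = "fast" | "research" | "creative" | "code" | "factual";

interface Preset {
  temperature: number;
  ensemble: boolean;
}

// Temperatures and ensemble flags from the presets table.
const PRESETS: Record<QueryType, Preset> = {
  fast: { temperature: 0.3, ensemble: false },
  research: { temperature: 0.3, ensemble: true },
  creative: { temperature: 0.7, ensemble: false },
  code: { temperature: 0.2, ensemble: true },
  factual: { temperature: 0.2, ensemble: false },
};

// Hypothetical keyword rules, checked in order; the first match wins.
const RULES: Array<[QueryType, RegExp]> = [
  ["code", /\b(function|bug|debug|python|typescript|compile|refactor|api)\b/i],
  ["research", /\b(compare|analy[sz]e|research|trade-?offs?|survey)\b/i],
  ["creative", /\b(write a (story|poem)|brainstorm|slogan|creative)\b/i],
  ["fast", /^.{0,40}\?$/], // very short questions
];

function classify(query: string): QueryType {
  for (const [type, pattern] of RULES) {
    if (pattern.test(query)) return type;
  }
  return "factual"; // assumed fallback
}
```

Rule order matters. "Write a Python function" matches `code` before the `creative` rule is tried, which is the kind of precedence a real classifier also has to settle.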
---

## 🧠 Persistent Memory (P3)

Agent execution memories persist across CLI or API sessions via a local JSON file. The store auto-saves after every 3 entries and rebuilds its full keyword index on load.

```typescript
import { EpisodicMemoryStore } from 'adaptive-memory-multi-model-router/memory';

// Pass a file path to enable persistence
const memory = new EpisodicMemoryStore(1000, './.a3m-memory.json');

// Auto-saves to disk every 3 entries
memory.storeEntry({
  task: { description: "Build a REST API in Python", type: "code", complexity: 0.7 },
  result: { success: true, output: "...", duration_ms: 45000 },
  agent: { id: "codex", model: "gpt-4o", provider: "openai" },
});

// On next startup, memory auto-loads from disk
const similar = memory.getSimilarTasks("Python async API", 5);
console.log(`🔍 Found ${similar.length} similar past executions`);

memory.getStats();
// → { total_entries: 142, success_rate: 0.94, avg_duration_ms: 12000 }
```

**Not just in-memory:** Unlike agent frameworks that keep context only in process memory, A3M's memory survives process restarts and machine reboots. It also survives container redeploys, provided the JSON file lives on persistent storage (for example, a mounted volume).

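The two persistence behaviours described above can be sketched together: saving after every third entry, and rebuilding a keyword index on load. The sketch rests on these assumptions:
- `KeywordMemory` is hypothetical.
- The `save` callback stands in for writing the JSON file, so the example stays testable.
- Ranking similar tasks by shared-keyword count is an assumed heuristic.

With a save interval of 3, up to two unsaved entries can be lost if the process crashes between saves.

```typescript
interface MemoryEntry {
  description: string;
  success: boolean;
}

// Lowercase words longer than two characters.
const tokenize = (text: string): string[] =>
  text.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 2);

class KeywordMemory {
  private entries: MemoryEntry[] = [];
  private index = new Map<string, Set<number>>(); // keyword → entry positions
  private unsaved = 0;
  private readonly save: (json: string) => void;
  private readonly saveEvery: number;

  constructor(save: (json: string) => void, saveEvery = 3, loadedJson?: string) {
    this.save = save;
    this.saveEvery = saveEvery;
    if (loadedJson) {
      // Rebuild the full keyword index from the persisted entries.
      for (const e of JSON.parse(loadedJson) as MemoryEntry[]) this.add(e);
    }
  }

  private add(entry: MemoryEntry): void {
    const pos = this.entries.push(entry) - 1;
    for (const w of tokenize(entry.description)) {
      if (!this.index.has(w)) this.index.set(w, new Set());
      this.index.get(w)!.add(pos);
    }
  }

  store(entry: MemoryEntry): void {
    this.add(entry);
    this.unsaved += 1;
    if (this.unsaved >= this.saveEvery) {
      this.save(JSON.stringify(this.entries)); // persist every `saveEvery` entries
      this.unsaved = 0;
    }
  }

  // Rank past entries by how many query keywords they share.
  similar(query: string, k: number): MemoryEntry[] {
    const hits = new Map<number, number>();
    for (const w of new Set(tokenize(query))) {
      for (const pos of this.index.get(w) ?? []) hits.set(pos, (hits.get(pos) ?? 0) + 1);
    }
    return [...hits.entries()]
      .sort((a, b) => b[1] - a[1])
      .slice(0, k)
      .map(([pos]) => this.entries[pos]);
  }
}
```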
---

## Comparison

| Feature | A3M Router | [LiteLLM](https://github.com/BerriAI/litellm) | [Portkey](https://github.com/Portkey-AI/gateway) | [OpenRouter](https://openrouter.ai) |
|---------|:----------:|:-------:|:-------:|:-------:|
| **Parallel ensemble** | **✅** | ❌ | ❌ | ❌ |
| **Confidence scoring** | **✅** | ❌ | ❌ | ❌ |
| **Routing accuracy published** | **Yes** (69.64 ±1) | No (manual) | No | No |
| **Intelligent routing** | Multi-signal per-query | Manual selection | Manual | Manual |
| **Zero ML / Zero GPU** | **Yes** | Yes | Yes | Yes |
| **Package size** | 19.5 KB | ~50 MB | ~30 MB | API-only |
| **OpenAI-compatible proxy** | **Yes** | No | Yes | Yes | Yes |
| **Adaptive memory** | **Yes** | No | No | No | No |
| **Semantic cache** | **Yes** (trigram) | No | No | Yes | No |
| **Prompt injection detection** | **Yes** (17 patterns) | No | No | Yes | No |
| **PII redaction** | **Yes** | No | No | Yes | No |
| **Hallucination checks** | **Yes** | No | No | No | No |
| **Cost analytics** | **Yes** | No | Yes | Yes | Yes |
| **Budget alerts** | **Yes** | No | No | Yes | No |
| **Circuit breaker** | **Yes** | No | No | Yes | No |
| **LangChain adapter** | **Yes** | No | Yes | Yes | No |
| **Python SDK** | **Yes** | Yes | Yes | Yes | Yes |
| **TypeScript SDK** | **Yes** | No | No | Yes | Yes |
| **CLI** | **Yes** | No | Yes | No | No |
| **Self-hosted** | **Yes** | Yes | Yes | Yes | No |
| **License** | MIT | Apache 2.0 | Custom | MIT | Proprietary |

**Also consider:** [9router](https://github.com/decolua/9router), [ClawRouter](https://github.com/BlockRunAI/ClawRouter), [Plano](https://github.com/katanemo/plano), [Helicone](https://github.com/Helicone/helicone)

---

## Production Ready

A3M Router is built for teams running AI in production, where budget overruns, cache inefficiency, provider outages, and retry storms cost real money and real uptime.

### Pain Points Solved

| Problem | Without A3M Router | With A3M Router |
|---------|-------------------|-----------------|
| **Budget spiral** | Monthly bills 3-5x higher than expected, no visibility into per-team spend | Hard per-user/per-team caps with real-time spend dashboard, alerts at 50%/80%/100% |
| **Cache misses on similar queries** | Same query by 1000 users = 1000 LLM API calls | Embedding-based semantic cache, 30%+ hit rate, configurable similarity threshold |
| **Provider outage cascades** | One provider fails → all requests fail → P0 incident | Circuit breaker (3 failures → 60s cooldown) + automatic fallback chain |
| **Chinese provider failures** | Generic retry logic fails on Chinese APIs (rate limits, regional constraints) | Special handling: health scoring, regional awareness, provider-specific fallback |
| **Retry storms at scale** | All clients retry simultaneously on 429 → provider stays overloaded | Per-provider retry config, exponential backoff with jitter, and 429 detection prevent a thundering herd |
| **No observability** | Blind to which provider is failing and which team is overspending | Provider health scoring, per-provider cost breakdown, spend vs budget per team |

### Enterprise Features

- **Hard Budget Enforcement** — Per-user and per-team monthly budgets with hard caps. A real-time spend dashboard shows actual vs budget. Alerts fire at the 50%, 80%, and 100% thresholds. A per-provider cost breakdown shows exactly where every dollar goes.

- **Semantic Cache** — Embedding-based cache lookup with a configurable similarity threshold. Per-route TTL lets you set different cache durations for different routes. A 30%+ cache hit rate means 30%+ fewer LLM API calls on repeated or similar queries.

- **Intelligent Failover** — Provider health scoring combines latency and error rate into a live health score. An automatic fallback chain routes to the next healthy provider when the primary fails. The circuit breaker trips after 3 failures and then cools down for 60 seconds. Chinese providers receive specialized handling for their regional constraints.

- **Per-Provider Retry Logic** — Custom timeout per provider. Exponential backoff with jitter. Rate limit detection (429) triggers a controlled backoff instead of blind retries that make the problem worse.

---

## API Reference

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/v1/chat/completions` | OpenAI-compatible chat (streaming + non-streaming) |
| POST | `/v1/completions` | OpenAI-compatible text completions |
| POST | `/v1/route` | Routing decision without an LLM call |
| GET | `/v1/models` | List available models with pricing |
| GET | `/health` | Provider health + cost summary |
| GET | `/dashboard` | Cost analytics dashboard |

Full API docs: [`docs/API.md`](docs/API.md)

---

## Package Exports

```
// Main — everything
adaptive-memory-multi-model-router

// SDK — clean high-level API
adaptive-memory-multi-model-router/sdk

// Individual modules
adaptive-memory-multi-model-router/cache
adaptive-memory-multi-model-router/guardrails
adaptive-memory-multi-model-router/cost
adaptive-memory-multi-model-router/analytics
adaptive-memory-multi-model-router/memory       (includes persistent memory, P3)
adaptive-memory-multi-model-router/langchain
adaptive-memory-multi-model-router/providers
adaptive-memory-multi-model-router/server

// Ensemble (P0) — core differentiator
adaptive-memory-multi-model-router/ensemble

// Query-type presets (P1)
adaptive-memory-multi-model-router/presets
```

---

## When NOT to Use This

A3M Router is an **LLM gateway and router** designed for multi-provider routing. You may not need it if:

- You only use one LLM provider (no routing benefit)
- Your workload is >80% expert-level queries (just use GPT-4o directly)
- You need 250+ provider integrations (use [Portkey](https://github.com/Portkey-AI/gateway))
- You need ML-based routing with BERT classifiers (use [RouteLLM](https://github.com/Surfsol/RouteLLM))
- You need enterprise SLAs or managed hosting

For single-provider use cases, the native SDK (OpenAI, Anthropic, etc.) is simpler.

---

## Roadmap (Coming Soon)

These features are on our roadmap based on user feedback:

| Feature | Status | Priority |
|---------|--------|----------|
| **Distributed tracing** — OpenTelemetry integration for production observability | Planned | High |
| **Webhook alerts** — Push budget alerts to Slack, PagerDuty, Teams | Planned | High |
| **Fine-grained RBAC** — Role-based access control for team budgets | Planned | Medium |
| **Multi-region failover** — Geographic load balancing across regions | Researching | Medium |
| **SLA reporting** — Uptime and latency SLAs for enterprise contracts | Researching | Low |

---

## ⭐ Supporters

If A3M Router helps you, consider:
- ⭐ Starring on [GitHub](https://github.com/Das-rebel/a3m-router)
- 📦 Sharing on [npm](https://www.npmjs.com/package/adaptive-memory-multi-model-router)
- 🐛 Reporting issues
- 🔀 Submitting PRs

---

## Links

- [npm package](https://www.npmjs.com/package/adaptive-memory-multi-model-router)
- [GitHub repo](https://github.com/Das-rebel/a3m-router)
- [API Reference](docs/API.md)
- [Architecture](docs/ARCHITECTURAL-IMPROVEMENTS-2025.md)
- [Discussions](https://github.com/Das-rebel/a3m-router/discussions)
- [Contributing](CONTRIBUTING.md) · [Good first issues](https://github.com/Das-rebel/a3m-router/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22)

### Community & Support

- [🐛 Report a Bug](https://github.com/Das-rebel/a3m-router/issues/new?template=bug_report.md) — File a detailed bug report
- [✨ Request a Feature](https://github.com/Das-rebel/a3m-router/issues/new?template=feature_request.md) — Suggest an enhancement
- [📥 Pull Request Template](https://github.com/Das-rebel/a3m-router/blob/main/.github/PULL_REQUEST_TEMPLATE.md) — Use this format for all PRs
- [📋 All Issue Templates](https://github.com/Das-rebel/a3m-router/issues/new/choose) — Choose the right template for your submission

MIT License. No vendor lock-in. No account required. `npm install` and go.

---

## Research-Backed Architecture

A3M Router is built on findings from **30+ 2024-2025 arXiv papers** on LLM routing, load balancing, semantic caching, and multi-agent orchestration, applied to deliver production-ready features:

| Paper | Year | What We Used |
|-------|------|-------------|
| **[RadixAttention (SGLang)](https://arxiv.org/abs/2312.07104)** | 2024 | **Prefix caching** — 5-10x throughput via prefix sharing across queries. Our cache module uses this pattern. |
| **[RouteLLM](https://arxiv.org/abs/2406.18665)** | 2024 | **Cost-quality routing** — learned routing baseline. We use heuristic routing instead (no GPU, faster startup). |
| **[Speculative Decoding (Medusa)](https://arxiv.org/abs/2401.10774)** | 2024 | **Multi-token prediction** — 2-3x speedup. Our speculative decoding module implements this interface. |
| **[AgentOrchestra](https://arxiv.org/abs/2506.12508)** | 2025 | **Hierarchical multi-agent orchestration** — 3-tier planning. We adapted this for provider selection. |
| **[Difficulty-Aware Routing](https://arxiv.org/abs/2509.11079)** | 2025 | **35% decision quality improvement** — difficulty-based task routing. Core of our routing engine. |
| **[MemoRAG](https://arxiv.org/abs/2512.12686)** | 2025 | **Global memory encoder** — 50% better long-context. We use MemoryTree for historical context. |
| **[A-Mem](https://arxiv.org/abs/2502.12110)** | 2025 | **Episodic memory** — 144+ citations. Our episodic memory uses EMA updates for quality scoring. |
| **[MCTS (Monte Carlo Tree Search)](https://arxiv.org/abs/2411.20000)** | 2024 | **UCB1 exploration** — multi-agent workflow optimization. Used in our provider selection algorithm. |

### Key Architecture Decisions (Research-Backed)

```
┌────────────────────────────────────────────────────────────┐
│                      Research Sources                      │
├────────────────────────────────────────────────────────────┤
│ SGLang/RadixAttention → Prefix caching (cache)             │
│ Medusa/Speculative    → Multi-token prediction             │
│ AgentOrchestra/HALO   → Hierarchical orchestration         │
│ RouteLLM/LiteLLM      → Cost-quality routing               │
│ MemoRAG/A-Mem         → MemoryTree (episodic+semantic)     │
│ MCTS/UCB1             → Provider selection algorithm       │
└────────────────────────────────────────────────────────────┘
```

### Why Not Use ML-Based Routing?

| Approach | RouteLLM | A3M Router |
|----------|----------|------------|
| **Training** | Requires GPU, labeled data | None |
| **Startup** | ~3 minutes | <100ms |
| **Updates** | Retrain required | EMA, no retraining |
| **Accuracy** | ~85% | 69.64 (±1) |
| **Cost** | High (GPU cluster) | Zero |

Heuristic routing with careful feature engineering gives up some accuracy relative to a learned router (69.64 vs ~85% in the table above). In exchange, it needs no training, starts in under 100ms, and requires no GPU infrastructure.

---
