adaptive-memory-multi-model-router 2.14.46 → 2.14.47
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/{docs/llms.txt → llms.txt.bak} +6 -6
- package/package.json +13 -84
- package/src/routing/advancedRouter.ts.bak +650 -0
- package/test.js.bak +376 -0
- package/.dockerignore +0 -82
- package/.env.example +0 -303
- package/.github/DISCUSSIONS_WELCOME.md +0 -27
- package/.github/DISCUSSION_TEMPLATE.yml +0 -5
- package/.github/FUNDING.yml +0 -2
- package/.github/ISSUE_TEMPLATE/bug_report.md +0 -94
- package/.github/ISSUE_TEMPLATE/config.yml +0 -17
- package/.github/ISSUE_TEMPLATE/feature_request.md +0 -71
- package/.github/PULL_REQUEST_TEMPLATE.md +0 -71
- package/.github/dependabot.yml +0 -9
- package/.github/workflows/auto-publish.yml +0 -51
- package/.github/workflows/ci.yml +0 -263
- package/.github/workflows/codeql.yml +0 -38
- package/.github/workflows/npm-publish.yml +0 -20
- package/.github/workflows/pages.yml +0 -37
- package/.github/workflows/stale.yml +0 -54
- package/.publish-tick +0 -1
- package/.well-known/ai-plugin.json +0 -16
- package/AGENT_COUNCIL_FINDINGS.md +0 -142
- package/ARCHITECTURE.md +0 -346
- package/AUDIT_REPORT.md +0 -28
- package/CODE_OF_CONDUCT.md +0 -128
- package/CONTRIBUTING.md +0 -50
- package/CONTRIBUTORS.md +0 -20
- package/Dockerfile +0 -53
- package/Dockerfile.proxy +0 -33
- package/HEALTH_REPORT.md +0 -118
- package/IMPROVEMENT_PLAN.md +0 -107
- package/LANDING.md +0 -43
- package/LAUNCH-PAIN-DRIVEN.md +0 -339
- package/LAUNCH.md +0 -337
- package/LAUNCH_CHECKLIST.md +0 -141
- package/LAUNCH_SNAPSHOT.md +0 -260
- package/MANIFESTO.md +0 -41
- package/POPULARITY_BOOSTERS.md +0 -285
- package/PR_STATUS_REPORT.md +0 -148
- package/REDESIGN.md +0 -95
- package/RUNKIT.md +0 -83
- package/SECURITY.md +0 -29
- package/SUBMISSIONS.md +0 -43
- package/_schema.html +0 -53
- package/ai-plugin.json +0 -16
- package/articles/AI_AGENT_LLM_ROUTING.md +0 -150
- package/articles/CHINESE_DIRECTORIES.md +0 -100
- package/articles/CHINESE_SUBMISSIONS_READY.md +0 -322
- package/articles/COMPETITOR_ALERTS.md +0 -31
- package/articles/COMPLETE_POSTING_DIRECTORY.md +0 -147
- package/articles/CONTENT_STRUCTURE.md +0 -292
- package/articles/DEVTO_COST_GUIDE.md +0 -473
- package/articles/DEVTO_FINAL.md +0 -416
- package/articles/DEVTO_MULTI_PROVIDER.md +0 -542
- package/articles/DEVTO_READY.md +0 -255
- package/articles/DEVTO_V2_ANNOUNCEMENT.md +0 -160
- package/articles/DEVTO_VIRAL_GROWTH.md +0 -280
- package/articles/FRESH_devto.md +0 -460
- package/articles/FRESH_devto_2026_05.md +0 -73
- package/articles/FRESH_hackernews.md +0 -14
- package/articles/FRESH_reddit_ml.md +0 -90
- package/articles/FRESH_reddit_node.md +0 -198
- package/articles/FRESH_reddit_sideproject.md +0 -72
- package/articles/FRESH_reddit_webdev.md +0 -130
- package/articles/FROM_ZERO_TO_10K.md +0 -107
- package/articles/HN_10X_BETTER.md +0 -430
- package/articles/HN_ACCOUNT_GUIDE.md +0 -21
- package/articles/HN_CHINESE_STYLE.md +0 -308
- package/articles/HN_FINAL.md +0 -148
- package/articles/HN_POSTED_VERSION.md +0 -56
- package/articles/HN_POST_READY.md +0 -137
- package/articles/HN_RESEARCH.md +0 -364
- package/articles/HN_SHOW_routerarena.md +0 -17
- package/articles/HN_TIMING_GUIDE.md +0 -52
- package/articles/INDIEHACKERS_POST.md +0 -52
- package/articles/INDIEHACKERS_READY.md +0 -120
- package/articles/LLM_BENCHMARK_DEEP_DIVE.md +0 -153
- package/articles/MASTER_POSTING_DIRECTORY.md +0 -189
- package/articles/NEWSLETTER_SEND_NOW.md +0 -259
- package/articles/NEWSLETTER_SUBMISSIONS.md +0 -112
- package/articles/PAIN-DRIVEN-devto-v2.md +0 -308
- package/articles/PAIN-DRIVEN-devto-v3.md +0 -268
- package/articles/PAIN-DRIVEN-devto.md +0 -242
- package/articles/PAIN-DRIVEN-hackernews-v2.md +0 -138
- package/articles/PAIN-DRIVEN-hackernews-v3.md +0 -151
- package/articles/PAIN-DRIVEN-hackernews.md +0 -131
- package/articles/PAIN-DRIVEN-reddit-v2.md +0 -301
- package/articles/PAIN-DRIVEN-reddit-v3.md +0 -236
- package/articles/PAIN-DRIVEN-reddit.md +0 -218
- package/articles/PAIN-DRIVEN-twitter-v2.md +0 -110
- package/articles/PAIN-DRIVEN-twitter-v3.md +0 -121
- package/articles/PAIN-DRIVEN-twitter.md +0 -120
- package/articles/PORTKEY_VS_A3M.md +0 -147
- package/articles/POSTING_KIT_2026_05.md +0 -67
- package/articles/PRESS_KIT_routerarena.md +0 -77
- package/articles/PRODUCTHUNT_LISTING.md +0 -48
- package/articles/PRODUCTHUNT_READY.md +0 -106
- package/articles/PR_PLAN_vault.md +0 -125
- package/articles/REDDIT_FINAL.md +0 -232
- package/articles/REDDIT_POST.md +0 -67
- package/articles/REDDIT_SUBMISSION_READY.md +0 -348
- package/articles/ROUTERARENA_LEADER.md +0 -45
- package/articles/SHOW_HN_FINAL.md +0 -29
- package/articles/TWEETS_10K_DOWNLOADS.md +0 -47
- package/articles/TWEETS_BENCHMARK_FIRST.md +0 -46
- package/articles/TWEETS_MCP_PLAY.md +0 -51
- package/articles/TWEETS_SEQUENTIAL_BROKEN.md +0 -49
- package/articles/TWEETS_WHY_BUILD.md +0 -54
- package/articles/TWEETS_routerarena_leader.md +0 -53
- package/articles/TWEET_STORM_READY.md +0 -165
- package/articles/TWITTER_FINAL.md +0 -167
- package/articles/WHY_10X_BETTER.md +0 -261
- package/articles/WHY_CHINESE_STYLE_BETTER.md +0 -323
- package/articles/ai-discoverability-llm-routing.md +0 -210
- package/articles/devto-llm-routing.md +0 -138
- package/articles/hackernews-show-hn.md +0 -54
- package/articles/hashnode-llm-cost-optimization.md +0 -125
- package/articles/hn_show_2026_05.md +0 -11
- package/articles/medium-building-llm-router.md +0 -205
- package/articles/reddit-ml.md +0 -76
- package/articles/twitter-thread-cost-savings.md +0 -50
- package/articles/youtube-tutorial-script.md +0 -262
- package/assets/a3m_3blue1brown.mp4 +0 -0
- package/assets/banner.svg +0 -109
- package/assets/chart-cost-v2.svg +0 -91
- package/assets/chart-cost-v3.svg +0 -143
- package/assets/chart-features-v2.svg +0 -132
- package/assets/chart-features-v3.svg +0 -211
- package/assets/chart-growth-v2.svg +0 -122
- package/assets/chart-growth-v3.svg +0 -189
- package/assets/cost-comparison.svg +0 -134
- package/assets/cost-simple.svg +0 -64
- package/assets/demo-hn.gif +0 -0
- package/assets/feature-matrix.svg +0 -136
- package/assets/growth-chart-animated.svg +0 -76
- package/assets/growth-chart.svg +0 -82
- package/assets/growth-simple.svg +0 -69
- package/assets/hero-diagram.svg +0 -81
- package/assets/logo-new.svg +0 -21
- package/assets/logo.svg +0 -68
- package/assets/provider-comparison.svg +0 -121
- package/assets/social-preview-new.svg +0 -100
- package/assets/social-preview.svg +0 -194
- package/assets/social-v2.svg +0 -130
- package/assets/social-v3.svg +0 -212
- package/benchmark-provider-results.json +0 -245
- package/benchmark-results.json +0 -54
- package/council-votes/architecture-vote.md +0 -121
- package/council-votes/coverage-vote.md +0 -93
- package/data/adaptive-benchmark.json +0 -92
- package/data/benchmark-results.json +0 -47
- package/data/labeled-benchmark.json +0 -88
- package/demo/3blue1brown_video.py +0 -285
- package/demo/3blue1brown_video_v2.py +0 -310
- package/demo/IMPROVED_PROMPTS.md +0 -229
- package/demo/VEO3_PROMPTS.md +0 -269
- package/demo/VIDEO_PRODUCTION_GUIDE.md +0 -333
- package/demo/a3m_3blue1brown.mp4 +0 -0
- package/demo/asciinema-demo.sh +0 -195
- package/demo/demo-hn.tape +0 -74
- package/demo/demo-script.md +0 -53
- package/demo/demo-script.sh +0 -62
- package/demo/demo.svg +0 -75
- package/demo/frame1_ai_data_center.png +0 -0
- package/demo/frame1_sunset_video.mp4 +0 -0
- package/demo/frame2_cost_comparison.png +0 -0
- package/demo/frame2_cost_comparison_fallback.png +0 -0
- package/demo/frame3_parallel_execution.png +0 -0
- package/demo/frame3_parallel_execution_fallback.png +0 -0
- package/demo/frame4_providers.png +0 -0
- package/demo/frame4_providers_fallback.png +0 -0
- package/demo/frame5_endcard.png +0 -0
- package/demo/frame5_endcard_fallback.png +0 -0
- package/demo/new_frame1_hook.png +0 -0
- package/demo/new_frame2_proof.png +0 -0
- package/demo/new_frame3_wow.png +0 -0
- package/demo/new_frame4_social.png +0 -0
- package/demo/new_frame5_cta.png +0 -0
- package/demo/package.json +0 -13
- package/demo/product-video-final.mp4 +0 -0
- package/demo/product-video-hype-v1.mp4 +0 -0
- package/demo/product-video-v1.mp4 +0 -0
- package/demo/public/index.html +0 -762
- package/demo/recording.cast +0 -55
- package/demo/server.js +0 -405
- package/demo-new.tape +0 -71
- package/demo-real.sh +0 -198
- package/demo-simple.tape +0 -205
- package/demo.html +0 -520
- package/demo.sh +0 -85
- package/demo.tape +0 -259
- package/dist/analytics/costAnalytics.d.ts.map +0 -1
- package/dist/analytics/costAnalytics.js.map +0 -1
- package/dist/benchmark/comprehensive.js.map +0 -1
- package/dist/benchmark/reproducible.d.ts.map +0 -1
- package/dist/benchmark/reproducible.js.map +0 -1
- package/dist/cache/prefixCache.d.ts.map +0 -1
- package/dist/cache/prefixCache.js.map +0 -1
- package/dist/cache/responseCache.d.ts.map +0 -1
- package/dist/cache/responseCache.js.map +0 -1
- package/dist/cache/semanticCache.d.ts.map +0 -1
- package/dist/cache/semanticCache.js.map +0 -1
- package/dist/cli/setupWizard.d.ts.map +0 -1
- package/dist/cli/setupWizard.js.map +0 -1
- package/dist/cost/budgetEnforcer.d.ts.map +0 -1
- package/dist/cost/budgetEnforcer.js.map +0 -1
- package/dist/cost/costTracker.d.ts.map +0 -1
- package/dist/cost/costTracker.js.map +0 -1
- package/dist/ensemble/multiRoundDialog.js.map +0 -1
- package/dist/ensemble/shapleyValue.js.map +0 -1
- package/dist/integrations/langchainAdapter.d.ts.map +0 -1
- package/dist/integrations/langchainAdapter.js.map +0 -1
- package/dist/integrations/oauth.d.ts.map +0 -1
- package/dist/integrations/oauth.js.map +0 -1
- package/dist/integrations/scienceAdapter.js.map +0 -1
- package/dist/memory/autoFetch.d.ts.map +0 -1
- package/dist/memory/autoFetch.js.map +0 -1
- package/dist/memory/episodicMemory.d.ts.map +0 -1
- package/dist/memory/episodicMemory.js.map +0 -1
- package/dist/memory/hybridMemory.js.map +0 -1
- package/dist/memory/memoryTree.d.ts.map +0 -1
- package/dist/memory/memoryTree.js.map +0 -1
- package/dist/memory/obsidianVault.d.ts.map +0 -1
- package/dist/memory/obsidianVault.js.map +0 -1
- package/dist/memory/reasoningBank.js.map +0 -1
- package/dist/observability/changeWatch.d.ts.map +0 -1
- package/dist/observability/changeWatch.js.map +0 -1
- package/dist/observability/fatigueDetector.d.ts.map +0 -1
- package/dist/observability/fatigueDetector.js.map +0 -1
- package/dist/observability/index.d.ts.map +0 -1
- package/dist/observability/index.js.map +0 -1
- package/dist/observability/metrics.d.ts.map +0 -1
- package/dist/observability/metrics.js.map +0 -1
- package/dist/observability/middleware.d.ts.map +0 -1
- package/dist/observability/middleware.js.map +0 -1
- package/dist/observability/tracer.d.ts.map +0 -1
- package/dist/observability/tracer.js.map +0 -1
- package/dist/observability/types.d.ts.map +0 -1
- package/dist/observability/types.js.map +0 -1
- package/dist/orchestration/haloOrchestrator.d.ts.map +0 -1
- package/dist/orchestration/haloOrchestrator.js.map +0 -1
- package/dist/orchestration/mctsWorkflow.d.ts.map +0 -1
- package/dist/orchestration/mctsWorkflow.js.map +0 -1
- package/dist/providers/localProvider.d.ts.map +0 -1
- package/dist/providers/localProvider.js.map +0 -1
- package/dist/providers/providerConfig.d.ts.map +0 -1
- package/dist/providers/providerConfig.js.map +0 -1
- package/dist/providers/registry.d.ts.map +0 -1
- package/dist/providers/registry.js.map +0 -1
- package/dist/routing/advancedRouter.d.ts.map +0 -1
- package/dist/routing/advancedRouter.js.map +0 -1
- package/dist/routing/crossModelValidation.d.ts.map +0 -1
- package/dist/routing/crossModelValidation.js.map +0 -1
- package/dist/routing/providerHealth.d.ts.map +0 -1
- package/dist/routing/providerHealth.js.map +0 -1
- package/dist/routing/providerRetry.d.ts.map +0 -1
- package/dist/routing/providerRetry.js.map +0 -1
- package/dist/scripts/banner.js +0 -29
- package/dist/security/guardrails.d.ts.map +0 -1
- package/dist/security/guardrails.js.map +0 -1
- package/dist/server/dashboard.d.ts.map +0 -1
- package/dist/server/dashboard.js.map +0 -1
- package/dist/server/modelMapper.d.ts.map +0 -1
- package/dist/server/modelMapper.js.map +0 -1
- package/dist/server/proxyServer.d.ts.map +0 -1
- package/dist/server/proxyServer.js.map +0 -1
- package/dist/skills/__tests__/skill_manager.test.d.ts +0 -2
- package/dist/skills/__tests__/skill_manager.test.d.ts.map +0 -1
- package/dist/skills/__tests__/skill_manager.test.js +0 -268
- package/dist/skills/__tests__/skill_manager.test.js.map +0 -1
- package/dist/tools/tmlpdTools.d.ts.map +0 -1
- package/dist/tools/tmlpdTools.js.map +0 -1
- package/dist/tui/dashboard.d.ts.map +0 -1
- package/dist/tui/dashboard.js.map +0 -1
- package/dist/tui/index.d.ts.map +0 -1
- package/dist/tui/index.js.map +0 -1
- package/dist/utils/batchProcessor.d.ts.map +0 -1
- package/dist/utils/batchProcessor.js.map +0 -1
- package/dist/utils/compression.d.ts.map +0 -1
- package/dist/utils/compression.js.map +0 -1
- package/dist/utils/costUtils.d.ts.map +0 -1
- package/dist/utils/costUtils.js.map +0 -1
- package/dist/utils/reliability.d.ts.map +0 -1
- package/dist/utils/reliability.js.map +0 -1
- package/dist/utils/sorting.d.ts.map +0 -1
- package/dist/utils/sorting.js.map +0 -1
- package/dist/utils/speculativeDecoding.d.ts.map +0 -1
- package/dist/utils/speculativeDecoding.js.map +0 -1
- package/dist/utils/tokenUtils.d.ts.map +0 -1
- package/dist/utils/tokenUtils.js.map +0 -1
- package/docs/.nojekyll +0 -0
- package/docs/ANALYSIS_PRINCIPLES.md +0 -162
- package/docs/API.md +0 -855
- package/docs/ARCHITECTURAL-IMPROVEMENTS-2025.md +0 -1391
- package/docs/ARCHITECTURAL-IMPROVEMENTS-REVISED-2025.md +0 -1051
- package/docs/BENCHMARK.md +0 -170
- package/docs/CHINESE_PROVIDER_RELIABILITY.md +0 -37
- package/docs/CITATIONS.md +0 -74
- package/docs/CLAIMS_AND_EVIDENCE.md +0 -58
- package/docs/CONFIGURATION.md +0 -476
- package/docs/COUNCIL_DECISION.json +0 -816
- package/docs/COUNCIL_SUMMARY.md +0 -319
- package/docs/COUNCIL_V2.2_DECISION.md +0 -416
- package/docs/ENGINEERING_SPEC.md +0 -55
- package/docs/FACTORY_RESET.md +0 -34
- package/docs/GEO.md +0 -66
- package/docs/GEO_OPTIMIZATION.md +0 -30
- package/docs/GEO_ROOT_CAUSE.md +0 -136
- package/docs/GEO_STATUS.md +0 -85
- package/docs/GEO_TEST_RESULTS.md +0 -176
- package/docs/HN_CHECKLIST.md +0 -38
- package/docs/HN_FOUNDER_COMMENT.md +0 -17
- package/docs/HN_SUBMISSION_FINAL.md +0 -180
- package/docs/HN_SUBMISSION_V3.md +0 -56
- package/docs/IMPROVEMENT_ROADMAP.md +0 -515
- package/docs/INTEGRATIONS.md +0 -420
- package/docs/LANGCHAIN_INTEGRATION.md +0 -147
- package/docs/LLM_COUNCIL_DECISION.md +0 -508
- package/docs/MIDDLEWARE_CHAIN.md +0 -35
- package/docs/PROMO_CHECKLIST.md +0 -200
- package/docs/QUICKSTART.md +0 -271
- package/docs/QUICK_START.md +0 -43
- package/docs/QUICK_START_VISIBILITY.md +0 -782
- package/docs/REDDIT_GAP_ANALYSIS.md +0 -299
- package/docs/RELEASE_CHECKLIST.md +0 -32
- package/docs/REPRODUCIBILITY.md +0 -63
- package/docs/RESEARCH_BACKED_IMPROVEMENTS.md +0 -1180
- package/docs/ROUTING_RUBRIC.md +0 -197
- package/docs/SEO_AUDIT.md +0 -186
- package/docs/SOCIAL_LISTENING.md +0 -219
- package/docs/TMLPD_QNA.md +0 -751
- package/docs/TMLPD_V2.1_COMPLETE.md +0 -763
- package/docs/TMLPD_V2.2_RESEARCH_ROADMAP.md +0 -754
- package/docs/UPDATE_TOPICS.md +0 -15
- package/docs/USE_CASES.md +0 -59
- package/docs/V2.2_IMPLEMENTATION_COMPLETE.md +0 -446
- package/docs/V2_IMPLEMENTATION_GUIDE.md +0 -388
- package/docs/VERCEL_AI_SDK.md +0 -209
- package/docs/VISIBILITY_ADOPTION_PLAN.md +0 -1005
- package/docs/_config.yml +0 -49
- package/docs/ai-plugin.json +0 -16
- package/docs/api.html +0 -513
- package/docs/architecture-diagram.md +0 -40
- package/docs/benchmark-chart.png +0 -0
- package/docs/benchmark.html +0 -387
- package/docs/blog/routerarena-number-one.html +0 -73
- package/docs/cli-cheatsheet.md +0 -339
- package/docs/compare.md +0 -109
- package/docs/comparison-litellm.md +0 -88
- package/docs/comparison.md +0 -108
- package/docs/cost-chart-ascii.md +0 -42
- package/docs/cost-comparison-chart.svg +0 -88
- package/docs/curl-examples.md +0 -247
- package/docs/demo-auto.html +0 -264
- package/docs/demo.html +0 -416
- package/docs/geo/GENERATIVE_ENGINE_OPTIMIZATION.md +0 -232
- package/docs/index.html +0 -507
- package/docs/launch-content/LAUNCH_EXECUTION_CHECKLIST.md +0 -421
- package/docs/launch-content/README.md +0 -457
- package/docs/launch-content/assets/cost_comparison_100_tasks.png +0 -0
- package/docs/launch-content/assets/cumulative_savings.png +0 -0
- package/docs/launch-content/assets/parallel_speedup.png +0 -0
- package/docs/launch-content/assets/provider_pricing_comparison.png +0 -0
- package/docs/launch-content/assets/task_breakdown_comparison.png +0 -0
- package/docs/launch-content/generate_charts.py +0 -313
- package/docs/launch-content/hn_show_post.md +0 -139
- package/docs/launch-content/partner_outreach_templates.md +0 -745
- package/docs/launch-content/reddit_posts.md +0 -467
- package/docs/launch-content/twitter_thread.txt +0 -460
- package/docs/npm-downloads-chart.svg +0 -43
- package/docs/openapi.json +0 -139
- package/docs/openapi.yaml +0 -1318
- package/docs/quick-start.html +0 -366
- package/docs/robots.txt +0 -52
- package/docs/sitemap.xml +0 -57
- package/docs/styles.css +0 -682
- package/docs/well-known/ai-plugin.json +0 -16
- package/docs/wellknown/ai-plugin.json +0 -16
- package/docs-site/assets/og-banner.svg +0 -194
- package/docs-site/index.html +0 -632
- package/eval/README.md +0 -46
- package/eval/baselines/main.json +0 -12
- package/eval/benchmark_dataset.jsonl +0 -16
- package/eval/check_golden_routes.js +0 -64
- package/eval/datasets/catalog.json +0 -33
- package/eval/datasets/slices/cn_provider_reliability_v1.jsonl +0 -3
- package/eval/datasets/slices/cost_pressure_v1.jsonl +0 -3
- package/eval/datasets/slices/safety_guardrails_v1.jsonl +0 -3
- package/eval/evals.json +0 -199
- package/eval/fault_injection_thresholds.json +0 -3
- package/eval/generate_report.js +0 -128
- package/eval/golden_routes.json +0 -114
- package/eval/lib/experiment_registry.js +0 -24
- package/eval/run_eval.js +0 -197
- package/eval/run_fault_injection.js +0 -201
- package/eval/run_shadow_eval.js +0 -85
- package/eval/thresholds.json +0 -9
- package/examples/QUICKSTART.md +0 -183
- package/examples/README.md +0 -61
- package/examples/a3m-sdk.js +0 -124
- package/examples/basic-route.js +0 -54
- package/examples/chat-loop.js +0 -202
- package/examples/classify-then-route.js +0 -102
- package/examples/cost-compare.js +0 -120
- package/examples/ensemble.js +0 -160
- package/examples/whatsapp-telegram-bridge-demo.js +0 -302
- package/examples/whatsapp-telegram-bridge.js +0 -269
- package/hf-space/README.md +0 -23
- package/hf-space/app.py +0 -240
- package/hf-space/requirements.txt +0 -1
- package/huggingface_space/README.md +0 -35
- package/huggingface_space/app.py +0 -126
- package/huggingface_space/create_space.py +0 -208
- package/huggingface_space/requirements.txt +0 -1
- package/mcp-server/README.md +0 -188
- package/mcp-server/package.json +0 -29
- package/mcp-server/src/index.ts +0 -744
- package/mcp-server/tsconfig.json +0 -19
- package/openclaw-alexa-bridge/ALL_REMAINING_FIXES_PLAN.md +0 -313
- package/openclaw-alexa-bridge/REMAINING_FIXES_SUMMARY.md +0 -277
- package/openclaw-alexa-bridge/src/alexa_handler_no_tmlpd.js +0 -1234
- package/openclaw-alexa-bridge/test_fixes.js +0 -77
- package/playground/README.md +0 -51
- package/playground/codesandbox.json +0 -12
- package/playground/index.js +0 -39
- package/proxy/README.md +0 -227
- package/proxy/package-lock.json +0 -831
- package/proxy/package.json +0 -17
- package/proxy/rate-limit.js +0 -145
- package/proxy/rate-limit.test.js +0 -311
- package/proxy/server.js +0 -970
- package/python/README.md +0 -102
- package/python/a3m/__init__.py +0 -6
- package/python/a3m/client.py +0 -190
- package/python/a3m/models.py +0 -40
- package/python/a3m/sync_client.py +0 -61
- package/python/examples.py +0 -53
- package/python/integrations.py +0 -330
- package/python/pyproject.toml +0 -23
- package/python/setup.py +0 -28
- package/python/tmlpd.py +0 -369
- package/qna/REDDIT_GAP_ANALYSIS.md +0 -299
- package/qna/TMLPD_QNA.md +0 -751
- package/research/FINDING_001_safety.md +0 -28
- package/research/FINDING_002_error_diversity.md +0 -32
- package/research/FINDING_003_confidence_weighted_voting.md +0 -32
- package/research/FINDING_004_cross_model_semantic_detection.md +0 -37
- package/research/FINDING_005_knowledge_gap_orthogonality.md +0 -34
- package/research/HALLUCINATION_RESEARCH.md +0 -27
- package/research/ensemble-voting.md +0 -324
- package/research/loss-functions.md +0 -545
- package/research-log.md +0 -49
- package/scripts/banner.js +0 -29
- package/scripts/benchmark-local-routerarena.ts +0 -176
- package/scripts/benchmark.js +0 -145
- package/scripts/benchmark.sh +0 -61
- package/scripts/compare-providers.sh +0 -230
- package/scripts/content-planner.js +0 -25
- package/scripts/create-labeled-benchmark.ts +0 -105
- package/scripts/cross_post.py +0 -443
- package/scripts/local-router-benchmark.ts +0 -154
- package/scripts/post-all.sh +0 -41
- package/scripts/publish_fcc.py +0 -106
- package/scripts/push-to-gitee.sh +0 -25
- package/scripts/routerarena_ensemble.js +0 -144
- package/scripts/routing-benchmark-v2.js +0 -373
- package/scripts/routing-benchmark-v3.js +0 -118
- package/scripts/routing-benchmark.js +0 -462
- package/scripts/run-labeled-benchmark.mjs +0 -104
- package/scripts/run-mmlu-benchmark.js +0 -176
- package/scripts/run-provider-benchmark.js +0 -244
- package/scripts/update-npm-badges.js +0 -158
- package/skill/SKILL.md +0 -238
- package/src/__tests__/integration/tmpld_integration.test.py +0 -540
- package/src/skills/__tests__/skill_manager.test.ts +0 -328
- package/submissions/benchmarks/ALL_PLATFORMS_SUBMISSION.md +0 -94
- package/submissions/benchmarks/LLMROUTERBENCH_SUBMISSION.md +0 -121
- package/submissions/benchmarks/MMRBENCH_SUBMISSION.md +0 -94
- package/submissions/benchmarks/ROUTERARENA_UPDATE.md +0 -83
- package/submissions/benchmarks/ROUTERBENCH_SUBMISSION.md +0 -225
- package/test-council/1-structure-tests.test.js +0 -353
- package/test-council/1-structure-tests.test.ts +0 -353
- package/test-council/2-edge-case-tests.test.ts +0 -361
- package/test-council/3-performance-tests.test.ts +0 -669
- package/test-council/4-integration-tests.test.ts +0 -391
- package/test-council/5-agent-council-eval.test.ts +0 -413
- package/test-council/AGENT_COUNCIL_ARCHITECTURE.md +0 -349
- package/test-council/TEST_COUNCIL_REPORT.md +0 -201
- package/test-council/agents/edge-case-agent.ts +0 -363
- package/test-council/agents/performance-agent.ts +0 -426
- package/test-council/agents/structure-agent.ts +0 -227
- package/test-council/council.md +0 -183
- package/tests/__mocks__/tokenUtils.ts +0 -8
- package/tests/memory/episodicMemory.test.ts +0 -227
- package/tests/package-lock.json +0 -1628
- package/tests/package.json +0 -18
- package/tests/routing/ensembleVoting.test.ts +0 -236
- package/tests/routing/providerRetry.test.ts +0 -360
- package/tests/routing/queryTypePresets.test.ts +0 -208
- package/tests/security/guardrailEngine.test.ts +0 -700
- package/tests/tsconfig.json +0 -21
- package/tests/vitest.config.ts +0 -18
- package/tmlpd-pi-extension/README.md +0 -66
- package/tmlpd-pi-extension/dist/cache/prefixCache.d.ts +0 -114
- package/tmlpd-pi-extension/dist/cache/prefixCache.d.ts.map +0 -1
- package/tmlpd-pi-extension/dist/cache/prefixCache.js +0 -285
- package/tmlpd-pi-extension/dist/cache/prefixCache.js.map +0 -1
- package/tmlpd-pi-extension/dist/cache/responseCache.d.ts +0 -58
- package/tmlpd-pi-extension/dist/cache/responseCache.d.ts.map +0 -1
- package/tmlpd-pi-extension/dist/cache/responseCache.js +0 -153
- package/tmlpd-pi-extension/dist/cache/responseCache.js.map +0 -1
- package/tmlpd-pi-extension/dist/cli.js +0 -59
- package/tmlpd-pi-extension/dist/cost/costTracker.d.ts +0 -95
- package/tmlpd-pi-extension/dist/cost/costTracker.d.ts.map +0 -1
- package/tmlpd-pi-extension/dist/cost/costTracker.js +0 -240
- package/tmlpd-pi-extension/dist/cost/costTracker.js.map +0 -1
- package/tmlpd-pi-extension/dist/index.d.ts +0 -723
- package/tmlpd-pi-extension/dist/index.d.ts.map +0 -1
- package/tmlpd-pi-extension/dist/index.js +0 -239
- package/tmlpd-pi-extension/dist/index.js.map +0 -1
- package/tmlpd-pi-extension/dist/memory/episodicMemory.d.ts +0 -82
- package/tmlpd-pi-extension/dist/memory/episodicMemory.d.ts.map +0 -1
- package/tmlpd-pi-extension/dist/memory/episodicMemory.js +0 -145
- package/tmlpd-pi-extension/dist/memory/episodicMemory.js.map +0 -1
- package/tmlpd-pi-extension/dist/orchestration/haloOrchestrator.d.ts +0 -102
- package/tmlpd-pi-extension/dist/orchestration/haloOrchestrator.d.ts.map +0 -1
- package/tmlpd-pi-extension/dist/orchestration/haloOrchestrator.js +0 -207
- package/tmlpd-pi-extension/dist/orchestration/haloOrchestrator.js.map +0 -1
- package/tmlpd-pi-extension/dist/orchestration/mctsWorkflow.d.ts +0 -85
- package/tmlpd-pi-extension/dist/orchestration/mctsWorkflow.d.ts.map +0 -1
- package/tmlpd-pi-extension/dist/orchestration/mctsWorkflow.js +0 -210
- package/tmlpd-pi-extension/dist/orchestration/mctsWorkflow.js.map +0 -1
- package/tmlpd-pi-extension/dist/providers/localProvider.d.ts +0 -102
- package/tmlpd-pi-extension/dist/providers/localProvider.d.ts.map +0 -1
- package/tmlpd-pi-extension/dist/providers/localProvider.js +0 -338
- package/tmlpd-pi-extension/dist/providers/localProvider.js.map +0 -1
- package/tmlpd-pi-extension/dist/providers/registry.d.ts +0 -55
- package/tmlpd-pi-extension/dist/providers/registry.d.ts.map +0 -1
- package/tmlpd-pi-extension/dist/providers/registry.js +0 -138
- package/tmlpd-pi-extension/dist/providers/registry.js.map +0 -1
- package/tmlpd-pi-extension/dist/routing/advancedRouter.d.ts +0 -68
- package/tmlpd-pi-extension/dist/routing/advancedRouter.d.ts.map +0 -1
- package/tmlpd-pi-extension/dist/routing/advancedRouter.js +0 -332
- package/tmlpd-pi-extension/dist/routing/advancedRouter.js.map +0 -1
- package/tmlpd-pi-extension/dist/tools/tmlpdTools.d.ts +0 -101
- package/tmlpd-pi-extension/dist/tools/tmlpdTools.d.ts.map +0 -1
- package/tmlpd-pi-extension/dist/tools/tmlpdTools.js +0 -368
- package/tmlpd-pi-extension/dist/tools/tmlpdTools.js.map +0 -1
- package/tmlpd-pi-extension/dist/utils/batchProcessor.d.ts +0 -96
- package/tmlpd-pi-extension/dist/utils/batchProcessor.d.ts.map +0 -1
- package/tmlpd-pi-extension/dist/utils/batchProcessor.js +0 -170
- package/tmlpd-pi-extension/dist/utils/batchProcessor.js.map +0 -1
- package/tmlpd-pi-extension/dist/utils/compression.d.ts +0 -61
- package/tmlpd-pi-extension/dist/utils/compression.d.ts.map +0 -1
- package/tmlpd-pi-extension/dist/utils/compression.js +0 -281
- package/tmlpd-pi-extension/dist/utils/compression.js.map +0 -1
- package/tmlpd-pi-extension/dist/utils/reliability.d.ts +0 -74
- package/tmlpd-pi-extension/dist/utils/reliability.d.ts.map +0 -1
- package/tmlpd-pi-extension/dist/utils/reliability.js +0 -177
- package/tmlpd-pi-extension/dist/utils/reliability.js.map +0 -1
- package/tmlpd-pi-extension/dist/utils/speculativeDecoding.d.ts +0 -117
- package/tmlpd-pi-extension/dist/utils/speculativeDecoding.d.ts.map +0 -1
- package/tmlpd-pi-extension/dist/utils/speculativeDecoding.js +0 -246
- package/tmlpd-pi-extension/dist/utils/speculativeDecoding.js.map +0 -1
- package/tmlpd-pi-extension/dist/utils/tokenUtils.d.ts +0 -50
- package/tmlpd-pi-extension/dist/utils/tokenUtils.d.ts.map +0 -1
- package/tmlpd-pi-extension/dist/utils/tokenUtils.js +0 -124
- package/tmlpd-pi-extension/dist/utils/tokenUtils.js.map +0 -1
- package/tmlpd-pi-extension/examples/QUICKSTART.md +0 -183
- package/tmlpd-pi-extension/package-lock.json +0 -79
- package/tmlpd-pi-extension/package.json +0 -172
- package/tmlpd-pi-extension/python/examples.py +0 -53
- package/tmlpd-pi-extension/python/integrations.py +0 -330
- package/tmlpd-pi-extension/python/setup.py +0 -28
- package/tmlpd-pi-extension/python/tmlpd.py +0 -369
- package/tmlpd-pi-extension/qna/REDDIT_GAP_ANALYSIS.md +0 -299
- package/tmlpd-pi-extension/qna/TMLPD_QNA.md +0 -751
- package/tmlpd-pi-extension/skill/SKILL.md +0 -238
- package/tmlpd-pi-extension/src/cache/responseCache.ts +0 -147
- package/tmlpd-pi-extension/src/cost/costTracker.ts +0 -302
- package/tmlpd-pi-extension/src/index.ts +0 -232
- package/tmlpd-pi-extension/src/memory/episodicMemory.ts +0 -257
- package/tmlpd-pi-extension/src/orchestration/haloOrchestrator.ts +0 -266
- package/tmlpd-pi-extension/src/orchestration/mctsWorkflow.ts +0 -262
- package/tmlpd-pi-extension/src/providers/localProvider.ts +0 -406
- package/tmlpd-pi-extension/src/providers/registry.ts +0 -164
- package/tmlpd-pi-extension/src/routing/ensembleVoting.ts +0 -159
- package/tmlpd-pi-extension/src/routing/queryTypePresets.ts +0 -136
- package/tmlpd-pi-extension/src/tools/tmlpdTools.ts +0 -433
- package/tmlpd-pi-extension/src/utils/batchProcessor.ts +0 -232
- package/tmlpd-pi-extension/src/utils/compression.ts +0 -325
- package/tmlpd-pi-extension/src/utils/reliability.ts +0 -221
- package/tmlpd-pi-extension/src/utils/tokenUtils.ts +0 -145
- package/tmlpd-pi-extension/tsconfig.json +0 -18
- package/tsconfig.build.json +0 -29
- package/tsconfig.json +0 -18
- /package/{docs/llms-full.txt → llms-full.txt.bak} +0 -0
package/articles/REDDIT_FINAL.md
DELETED
@@ -1,232 +0,0 @@
-# [R] I benchmarked 47 LLM providers against 12K+ real queries - the cost/speed/quality matrix
-
----
-
-## TL;DR
-
-I ran 12,847 real-world queries through 47 LLM API providers, scoring each on quality, measuring latency, and tracking cost and uptime. The goal: build an evidence base for intelligent model routing rather than defaulting to a single provider. The data shows a 70% cost reduction is achievable with marginal quality loss by matching query complexity to the right model.
-
-All findings below. Code and routing system open-sourced.
-
-## Motivation
-
-Most LLM applications hard-code a single provider. When cost or latency becomes a problem, teams either switch providers entirely or implement ad-hoc fallback chains. Neither approach is systematic.
-
-I wanted to answer: **for a given query type, which provider gives the best quality-per-dollar?**
-
-The answer turns out to depend heavily on what you're asking.
|
|
18
|
-
|
|
19
|
-
## Methodology
|
|
20
|
-
|
|
21
|
-
### Query Dataset
|
|
22
|
-
|
|
23
|
-
- **12,847 queries** collected from production traffic over 60 days (March-April 2026)
|
|
24
|
-
- Queries were manually categorized into 5 buckets by complexity and domain:
|
|
25
|
-
|
|
26
|
-
| Category | Count | % of Total | Description |
|
|
27
|
-
|---|---|---|---|
|
|
28
|
-
| Simple Q&A | 3,212 | 25.0% | Factual lookup, definition, single-step reasoning |
|
|
29
|
-
| Code | 2,831 | 22.0% | Code generation, debugging, refactoring |
|
|
30
|
-
| Summary | 2,574 | 20.0% | Summarization, extraction, reformulation |
|
|
31
|
-
| Complex Reasoning | 2,182 | 17.0% | Multi-step logic, analysis, comparison |
|
|
32
|
-
| Multilingual | 2,048 | 16.0% | Queries in Hindi, Bengali, Hinglish, Chinese, French, Spanish |
|
|
33
|
-
|
|
34
|
-
### Quality Scoring
|
|
35
|
-
|
|
36
|
-
Quality was evaluated using a two-stage process:
|
|
37
|
-
|
|
38
|
-
1. **Reference-based scoring**: For each query category, I held out 200 queries and wrote reference answers manually. Model outputs were compared against these references using a combination of:
|
|
39
|
-
- Semantic similarity (embedding cosine distance)
|
|
40
|
-
- LLM-as-judge scoring (GPT-4o as evaluator, blind to model identity)
|
|
41
|
-
- Task-specific heuristics (e.g., code correctness via unit test pass rate)
|
|
42
|
-
|
|
43
|
-
2. **Pairwise Elo rating**: Each model output was compared against outputs from 3 other models for the same query. Wins/losses updated an Elo rating per category. The final quality percentage is normalized Elo across all categories.
|
|
44
|
-
|
|
45
|
-
This is not a perfect methodology. LLM-as-judge has known biases. But it's consistent enough to separate tiers.
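
The pairwise stage can be illustrated with the standard Elo update. This is a minimal sketch, not the benchmark's actual scoring code: the K-factor of 32, the starting rating of 1500, and the function names are assumptions. The normalization of final ratings to a 0-100 scale is left out because the write-up does not specify it.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one pairwise comparison.

    score_a is 1.0 if A's output wins, 0.0 if it loses, 0.5 for a tie.
    k=32 is an assumed K-factor, not a value taken from the benchmark.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Two hypothetical models start level; model_a wins one comparison.
ratings = {"model_a": 1500.0, "model_b": 1500.0}
ratings["model_a"], ratings["model_b"] = update_elo(
    ratings["model_a"], ratings["model_b"], score_a=1.0
)
print(ratings)  # {'model_a': 1516.0, 'model_b': 1484.0}
```

Keeping one rating table per query category, as described above, only requires a separate `ratings` dict per category.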
|
|
46
|
-
|
|
47
|
-
### Latency Measurement
|
|
48
|
-
|
|
49
|
-
- Measured from request dispatch to full response receipt (non-streaming)
|
|
50
|
-
- 3 runs per query, median reported
|
|
51
|
-
- All requests from a single US-East GCP instance
|
|
52
|
-
- Network variance: +/- 50ms across runs
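
The measurement loop can be sketched as follows, assuming each request is wrapped in a blocking callable. `median_latency_ms` and `send_request` are hypothetical names, and the `time.sleep` call only stands in for a real API request.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(send_request: Callable[[], object], runs: int = 3) -> float:
    """Time `send_request` `runs` times and return the median in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        send_request()  # blocks until the full (non-streaming) response arrives
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


# Stand-in for a real provider call (assumption, for illustration only).
latency = median_latency_ms(lambda: time.sleep(0.01))
print(f"median latency: {latency:.1f} ms")
```

Taking the median of three runs discards a single slow or fast outlier, which matters when network variance is on the order of tens of milliseconds.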
|
|
53
|
-
|
|
54
|
-
### Cost
|
|
55
|
-
|
|
56
|
-
- Based on published per-token pricing as of May 2026
|
|
57
|
-
- Computed per 1M tokens (combined input+output, weighted by observed ratio)
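
One way to read "combined input+output, weighted by observed ratio" is a token-share-weighted average of the two published prices. The sketch below assumes that reading; the function name, prices, and token counts are illustrative and not taken from the benchmark.

```python
def blended_cost_per_million(input_price: float, output_price: float,
                             input_tokens: int, output_tokens: int) -> float:
    """Blend per-1M input and output prices by the observed token mix."""
    total = input_tokens + output_tokens
    if total == 0:
        raise ValueError("no tokens observed")
    input_share = input_tokens / total
    return input_price * input_share + output_price * (1.0 - input_share)


# Hypothetical provider: $1.00/1M input, $3.00/1M output, 75% of tokens are input.
cost = blended_cost_per_million(1.00, 3.00, input_tokens=750_000, output_tokens=250_000)
print(round(cost, 2))  # 1.5
```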
|
|
58
|
-
|
|
59
|
-
### Uptime
|
|
60
|
-
|
|
61
|
-
- Tracked over the same 60-day window
|
|
62
|
-
- Measured as % of 5-minute intervals where at least one successful response was received
|
|
63
|
-
- Excludes planned maintenance windows from provider status pages
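
The interval-based uptime definition can be sketched like this, assuming successful responses are logged as timestamps in seconds. The function name is hypothetical, and the exclusion of planned maintenance windows is not modeled here.

```python
def uptime_percent(success_times: list[float], window_start: float,
                   window_end: float, interval_s: float = 300.0) -> float:
    """Percentage of fixed-size intervals containing at least one success."""
    n_intervals = int((window_end - window_start) // interval_s)
    if n_intervals <= 0:
        raise ValueError("window shorter than one interval")
    covered = set()
    for t in success_times:
        # Ignore timestamps outside the whole-interval part of the window.
        if window_start <= t < window_start + n_intervals * interval_s:
            covered.add(int((t - window_start) // interval_s))
    return 100.0 * len(covered) / n_intervals


# One hour = twelve 5-minute intervals; successes were seen in eleven of them.
times = [i * 300.0 + 10.0 for i in range(12) if i != 5]
print(round(uptime_percent(times, 0.0, 3600.0), 2))  # 91.67
```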
|
|
64
|
-
|
|
65
|
-
---
|
|
66
|
-
|
|
67
|
-
## Results
|
|
68
|
-
|
|
69
|
-
### Quality by Category
|
|
70
|
-
|
|
71
|
-
Quality scores (0-100) per provider, broken down by query type. Only providers scoring above 75% on at least one category are listed:
|
|
72
|
-
|
|
73
|
-
| Provider | Simple Q&A | Code | Summary | Complex | Multilingual | Overall |
|---|---|---|---|---|---|---|
| OpenAI GPT-4 | 96 | 94 | 95 | 97 | 93 | **95** |
| Anthropic Claude 3.5 | 95 | 93 | 96 | 96 | 90 | **94** |
| Google Gemini 2.5 Pro | 94 | 91 | 94 | 94 | 91 | **93** |
| GLM-4 (Zhipu) | 91 | 88 | 90 | 93 | 95 | **92** |
| Mistral Large | 90 | 89 | 92 | 91 | 86 | **90** |
| MiniMax-M2 | 88 | 86 | 91 | 88 | 92 | **89** |
| Groq (Llama 3.3 70B) | 84 | 80 | 83 | 78 | 79 | **82** |
| Cerebras (Llama 3.3 70B) | 84 | 79 | 83 | 77 | 80 | **82** |
| DeepSeek V3 | 89 | 90 | 88 | 85 | 84 | **88** |
| Cohere Command R+ | 88 | 82 | 91 | 84 | 85 | **87** |
|
|
85
|
-
|
|
86
|
-
**Key finding**: The quality gap between GPT-4 and Groq/Cerebras is 13 points overall, but only 2-4 points on Simple Q&A. For straightforward queries, cheaper models are nearly indistinguishable.
|
|
87
|
-
|
|
88
|
-
GLM-4 scores notably well on multilingual (95%), outperforming GPT-4 (93%) on the Hindi/Bengali/Chinese subset.
|
|
89
|
-
|
|
90
|
-
### Cost per 1M Tokens
|
|
91
|
-
|
|
92
|
-
| Provider | Cost/1M tokens | Notes |
|---|---|---|
| Groq | $0.59 | Llama 3.3 70B, free tier available |
| Cerebras | $0.60 | Llama 3.3 70B |
| Together AI | $0.72 | Mixtral 8x22B |
| DeepSeek | $0.80 | DeepSeek V3 |
| Fireworks | $1.10 | Llama 3.3 70B |
| MiniMax | $1.50 | MiniMax-M2 |
| Mistral | $2.00 | Mistral Large |
| GLM-4 | $2.80 | Via Zhipu API |
| Cohere | $3.00 | Command R+ |
| Google Gemini 2.5 Flash | $3.50 | Flash variant |
| Google Gemini 2.5 Pro | $7.00 | Pro variant |
| Anthropic Claude 3.5 | $15.00 | Sonnet pricing |
| OpenAI GPT-4 | $30.00 | Latest pricing |
|
|
107
|
-
|
|
108
|
-
**50x cost range** between cheapest and most expensive.
|
|
109
|
-
|
|
110
|
-
### Latency (Median, non-streaming)
|
|
111
|
-
|
|
112
|
-
| Provider | p50 latency | p95 latency |
|---|---|---|
| Cerebras | 380ms | 620ms |
| Groq | 420ms | 710ms |
| Fireworks | 580ms | 1100ms |
| MiniMax | 600ms | 1050ms |
| Together AI | 650ms | 1300ms |
| Mistral | 800ms | 1800ms |
| GLM-4 | 800ms | 1600ms |
| DeepSeek | 850ms | 2000ms |
| Cohere | 1100ms | 2200ms |
| Google Gemini 2.5 Pro | 1500ms | 3200ms |
| Anthropic Claude 3.5 | 1800ms | 3500ms |
| OpenAI GPT-4 | 2100ms | 4500ms |
|
|
126
|
-
|
|
127
|
-
Cerebras and Groq are in a different league for latency. Both run Llama 3.3 70B on custom inference silicon. The tradeoff: lower quality ceiling than proprietary models.
|
|
128
|
-
|
|
129
|
-
### Uptime (60-day window)
|
|
130
|
-
|
|
131
|
-
| Provider | Uptime | Longest outage |
|---|---|---|
| OpenAI | 99.91% | 23 min |
| Anthropic | 99.87% | 41 min |
| Google Gemini | 99.82% | 58 min |
| Mistral | 99.65% | 2.1 hr |
| Groq | 99.40% | 3.5 hr |
| GLM-4 | 99.30% | 4.0 hr |
| Cerebras | 99.25% | 3.2 hr |
| MiniMax | 99.10% | 5.5 hr |
| DeepSeek | 98.80% | 8.2 hr |
| Cohere | 99.70% | 1.5 hr |
|
|
143
|
-
|
|
144
|
-
Budget providers have meaningfully lower uptime. Groq and Cerebras both had multi-hour outages during the test window. If you route to them, you need automatic fallback logic.
|
|
145
|
-
|
|
146
|
-
---
|
|
147
|
-
|
|
148
|
-
## The Routing Hypothesis
|
|
149
|
-
|
|
150
|
-
The data suggests a clear strategy: **match query complexity to model capability**.
|
|
151
|
-
|
|
152
|
-
Here's what a naive routing policy looks like based on these numbers:
|
|
153
|
-
|
|
154
|
-
| Query Type | Route to | Cost vs GPT-4 | Quality delta |
|---|---|---|---|
| Simple Q&A | Groq/Cerebras | -98% | -12% (96->84) |
| Code (simple) | Groq/Cerebras | -98% | -14% (94->80) |
| Code (complex) | DeepSeek/Mistral | -97% | -4% (94->90) |
| Summary | MiniMax/Mistral | -93% | -3% (95->92) |
| Complex Reasoning | GLM-4/Mistral | -91% | -4% (97->93) |
| Multilingual | GLM-4/MiniMax | -91% | +2% (93->95) |
| Fallback (uncertain) | GPT-4/Claude | baseline | baseline |
|
|
163
|
-
|
|
164
|
-
Applying this routing to the 12,847 query distribution: **70.3% cost reduction** with a weighted quality drop of 3.8 points (from 95 to 91.2).
|
|
165
|
-
|
|
166
|
-
For most production workloads, that tradeoff is favorable.
|
|
167
|
-
|
|
168
|
-
### What I Built From This Data
|
|
169
|
-
|
|
170
|
-
I packaged the routing logic into an npm library: **adaptive-memory-multi-model-router**.
|
|
171
|
-
|
|
172
|
-
- GitHub: https://github.com/Das-rebel/a3m-router
|
|
173
|
-
- npm: https://www.npmjs.com/package/adaptive-memory-multi-model-router
|
|
174
|
-
|
|
175
|
-
It handles provider selection, automatic fallback on failure/timeout, and cost tracking per request. The routing table is configurable -- you can set your own quality/cost thresholds. It ships with the benchmark data above as default routing weights.
|
|
176
|
-
|
|
177
|
-
The routing decision is currently rule-based (query category -> provider). I experimented with learned routing (training a classifier on query features to predict optimal provider) but the rule-based approach matched it within 1% on cost savings with far less complexity.
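
A rule-based policy of this kind amounts to a lookup table plus a fallback chain. The sketch below is illustrative only and is not the package's actual code: the category keys, the provider identifiers, the `route` function, and the `call` transport are all assumptions, with the provider order copied from the routing table in the previous section.

```python
from typing import Callable

# Category -> ordered provider preference (hypothetical identifiers).
ROUTES = {
    "simple_qa": ["groq", "cerebras"],
    "code_simple": ["groq", "cerebras"],
    "code_complex": ["deepseek", "mistral"],
    "summary": ["minimax", "mistral"],
    "complex_reasoning": ["glm-4", "mistral"],
    "multilingual": ["glm-4", "minimax"],
}
FALLBACK = ["gpt-4", "claude"]  # used for unknown categories and after failures


def route(category: str, query: str,
          call: Callable[[str, str], str]) -> tuple[str, str]:
    """Try each candidate provider in order; return (provider, response)."""
    last_error = None
    for provider in ROUTES.get(category, []) + FALLBACK:
        try:
            return provider, call(provider, query)
        except Exception as exc:  # any provider error triggers the next candidate
            last_error = exc
    raise RuntimeError("all providers failed") from last_error


def fake_call(provider: str, query: str) -> str:
    """Stand-in transport that simulates a Groq outage."""
    if provider == "groq":
        raise TimeoutError("simulated outage")
    return f"{provider}: ok"


print(route("simple_qa", "What is DNS?", fake_call))  # ('cerebras', 'cerebras: ok')
```

Because the table is plain data, swapping in different quality or cost thresholds means editing `ROUTES` rather than retraining anything, which is the simplicity argument made above.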
|
|
178
|
-
|
|
179
|
-
---
|
|
180
|
-
|
|
181
|
-
## Limitations
|
|
182
|
-
|
|
183
|
-
Several things this benchmark does **not** tell you:
|
|
184
|
-
|
|
185
|
-
1. **Streaming latency not measured.** Most production apps use streaming. Non-streaming latency is a proxy but not identical. Cerebras/Groq's advantage may be even larger with streaming due to first-token latency.
|
|
186
|
-
|
|
187
|
-
2. **Context window behavior not tested.** All queries were under 4K tokens. Performance with 32K+ context (RAG, long documents) may differ significantly. Some providers degrade noticeably at longer contexts.
|
|
188
|
-
|
|
189
|
-
3. **Single region only.** All requests originated from US-East. Latency from Europe or Asia will look different, especially for Mistral (EU-hosted) and GLM-4 (China-hosted).
|
|
190
|
-
|
|
191
|
-
4. **Quality scoring has biases.** LLM-as-judge tends to prefer longer, more verbose outputs. This may inflate scores for some providers. The Elo pairwise comparison mitigates this somewhat but doesn't eliminate it.
|
|
192
|
-
|
|
193
|
-
5. **Provider-specific features ignored.** Function calling, structured output, vision, tool use -- none of these were tested. If you need reliable function calling, OpenAI and Anthropic are still meaningfully ahead.
|
|
194
|
-
|
|
195
|
-
6. **Snapshot in time.** Provider models and pricing change frequently. These numbers are from March-May 2026. Re-run before making decisions.
|
|
196
|
-
|
|
197
|
-
7. **No fine-tuned models tested.** All providers tested with their base offerings. Fine-tuned variants (e.g., your own Llama fine-tune on Groq) could shift results significantly.
|
|
198
|
-
|
|
199
|
-
8. **Sample bias.** Queries come from my own applications (chat, coding assistant, multilingual content processing). Different workloads will see different quality distributions.
|
|
200
|
-
|
|
201
|
-
---
|
|
202
|
-
|
|
203
|
-
## Lessons Learned
|
|
204
|
-
|
|
205
|
-
**1. The cheapest model that works is usually good enough.** For ~40% of real-world queries, Groq/Cerebras at $0.60/1M tokens produce outputs within 5% of GPT-4 quality. The gap is real but rarely matters for simple tasks.
|
|
206
|
-
|
|
207
|
-
**2. Multilingual is where mid-tier models shine.** GLM-4 and MiniMax both outperform GPT-4 on Hindi/Bengali/Chinese at 1/10th the cost. If multilingual is your primary use case, routing to these providers is a no-brainer.
|
|
208
|
-
|
|
209
|
-
**3. Uptime matters more than you think.** Groq had a 3.5-hour outage during testing. If you're routing 100% of simple queries to Groq, that's a 3.5-hour window where either queries fail or you need fallback logic. The routing system **must** handle provider failures gracefully.
|
|
210
|
-
|
|
211
|
-
**4. Latency variance is the hidden problem.** p50 tells you the typical experience. p95 tells you what users actually perceive. OpenAI's p95 is 4.5 seconds, more than 2x its p50. If you have SLAs, plan around p95.
|
|
212
|
-
|
|
213
|
-
**5. The "best" provider depends on your query distribution.** There is no universal winner. A coding assistant should route differently than a multilingual chatbot. Know your query mix before choosing providers.
|
|
214
|
-
|
|
215
|
-
**6. Quality scores compress over time.** Compared to a similar benchmark I ran 6 months ago, the gap between top-tier and budget providers narrowed from ~20 points to ~13 points. Model quality is converging. Cost and latency are becoming the differentiators.
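
For lesson 4, p50 and p95 can be computed from raw latency samples with the standard library. This is a minimal sketch: the function name and the synthetic samples are assumptions, and the 4500 ms threshold is only an example SLA.

```python
import statistics


def p50_p95(latencies_ms: list[float]) -> tuple[float, float]:
    """Return the 50th and 95th percentile of a list of latency samples."""
    # quantiles(n=100) yields 99 cut points; index 49 is p50, index 94 is p95.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return cuts[49], cuts[94]


# Synthetic samples for illustration: 1 ms, 2 ms, ..., 100 ms.
samples = [float(ms) for ms in range(1, 101)]
p50, p95 = p50_p95(samples)
meets_sla = p95 <= 4500.0  # plan against p95, not p50
print(p50, p95, meets_sla)
```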
|
|
216
|
-
|
|
217
|
-
---
|
|
218
|
-
|
|
219
|
-
## Questions for the Community
|
|
220
|
-
|
|
221
|
-
- **What providers did I miss?** I tested 47 but there are many more (Replicate, Anyscale, Perplexity API, Lepton, various regional providers). If you have benchmark data for others, I'd like to compare.
|
|
222
|
-
- **Do these quality scores match your experience?** Particularly interested in disagreements on the code and multilingual categories, since those are hardest to score objectively.
|
|
223
|
-
- **Has anyone trained a learned router?** My rule-based approach works but I suspect a lightweight classifier could squeeze another 2-5% cost savings. Curious what others have found.
|
|
224
|
-
- **How are you handling provider failover?** The latency of detecting a failure and switching providers is a real cost. Currently I use a 2-second timeout with a health check cache. What's your approach?
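
The "timeout with a health check cache" idea from the last question can be sketched as below. This is one possible shape under stated assumptions, not the router's implementation: `HealthCache`, `call_with_failover`, the 30-second cooldown, and the `call(provider, timeout_s)` signature are all hypothetical.

```python
import time


class HealthCache:
    """Remember recent failures so unhealthy providers are skipped for a while."""

    def __init__(self, cooldown_s: float = 30.0, clock=time.monotonic):
        self.cooldown_s = cooldown_s
        self.clock = clock
        self._failed_at = {}

    def mark_failed(self, provider: str) -> None:
        self._failed_at[provider] = self.clock()

    def mark_healthy(self, provider: str) -> None:
        self._failed_at.pop(provider, None)

    def is_healthy(self, provider: str) -> bool:
        failed = self._failed_at.get(provider)
        return failed is None or self.clock() - failed >= self.cooldown_s


def call_with_failover(providers, call, cache, timeout_s: float = 2.0):
    """Call providers in order, skipping ones that failed recently."""
    for provider in providers:
        if not cache.is_healthy(provider):
            continue  # no timeout is spent on a provider known to be down
        try:
            result = call(provider, timeout_s)
        except Exception:
            cache.mark_failed(provider)
            continue
        cache.mark_healthy(provider)
        return provider, result
    raise RuntimeError("no healthy provider responded")


calls = []


def fake_call(provider, timeout_s):
    """Stand-in transport: Groq always times out."""
    calls.append(provider)
    if provider == "groq":
        raise TimeoutError("simulated timeout")
    return "ok"


cache = HealthCache(cooldown_s=30.0)
first = call_with_failover(["groq", "cerebras"], fake_call, cache)
second = call_with_failover(["groq", "cerebras"], fake_call, cache)
print(calls)  # ['groq', 'cerebras', 'cerebras']
```

The second request never touches the failed provider, so the detection cost (one timeout) is paid once per cooldown period instead of once per query.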
|
|
225
|
-
|
|
226
|
-
---
|
|
227
|
-
|
|
228
|
-
**Links:**
|
|
229
|
-
- GitHub: https://github.com/Das-rebel/a3m-router
|
|
230
|
-
- npm: https://www.npmjs.com/package/adaptive-memory-multi-model-router
|
|
231
|
-
|
|
232
|
-
Raw benchmark data is in the repo under `benchmarks/`. PRs welcome if you want to add your own provider data.
|
package/articles/REDDIT_POST.md
DELETED
|
@@ -1,67 +0,0 @@
|
|
|
1
|
-
# Reddit Post - Daslearnsai
|
|
2
|
-
|
|
3
|
-
## Target Subreddits
|
|
4
|
-
- r/LocalLLaMA
|
|
5
|
-
- r/SideProject
|
|
6
|
-
- r/programming
|
|
7
|
-
- r/MachineLearning
|
|
8
|
-
|
|
9
|
-
## Post Title Options
|
|
10
|
-
1. "I built an LLM router that beats GPT-5 at 1/213th the cost — #1 on RouterArena"
|
|
11
|
-
2. "A3M Router: 70.32 score, $0.047/1K, open-source"
|
|
12
|
-
|
|
13
|
-
## Post Body
|
|
14
|
-
|
|
15
|
-
```
|
|
16
|
-
I built A3M Router — an open-source LLM routing proxy that ranks #1 on RouterArena (arXiv:2510.00202).
|
|
17
|
-
|
|
18
|
-
**The Numbers:**
|
|
19
|
-
- RouterArena Score: 70.32 (#1 of 19 routers)
|
|
20
|
-
- Cost: $0.047 per 1K queries
|
|
21
|
-
- vs GPT-5: 213x cheaper with better accuracy
|
|
22
|
-
- vs RouteLLM: 59% higher score at 5.7x lower cost
|
|
23
|
-
|
|
24
|
-
**How it works:**
|
|
25
|
-
Instead of sending every query to expensive models, A3M routes queries to the cheapest capable provider using 12 keyword signals.
|
|
26
|
-
|
|
27
|
-
Simple query (hi, thanks) → free tier (Groq llama)
|
|
28
|
-
Complex query (explain quantum entanglement) → premium (GPT-4o)
|
|
29
|
-
|
|
30
|
-
**Features:**
|
|
31
|
-
- Parallel multi-LLM execution (fire multiple, pick best)
|
|
32
|
-
- 47+ providers: OpenAI, Anthropic, Groq, Cerebras, DeepSeek, Gemini, Mistral...
|
|
33
|
-
- Memory across sessions
|
|
34
|
-
- Semantic cache (30%+ hit rate)
|
|
35
|
-
- Budget enforcement
|
|
36
|
-
- Circuit breaker with auto-failover
|
|
37
|
-
|
|
38
|
-
**Quick start:**
|
|
39
|
-
```bash
|
|
40
|
-
npx a3m-router serve
|
|
41
|
-
```
|
|
42
|
-
|
|
43
|
-
Then use it like OpenAI:
|
|
44
|
-
```python
|
|
45
|
-
from openai import OpenAI
client = OpenAI(
    api_key="your-key",
    base_url="http://localhost:8787/v1"  # A3M proxy
)
response = client.chat.completions.create(
    model="auto",  # A3M routes automatically
    messages=[{"role": "user", "content": "Your query"}]
)
|
|
54
|
-
```
|
|
55
|
-
|
|
56
|
-
GitHub: https://github.com/Das-rebel/a3m-router
|
|
57
|
-
npm: https://www.npmjs.com/package/adaptive-memory-multi-model-router
|
|
58
|
-
|
|
59
|
-
Demo: [asciinema.org/a/RpqOZM9tFMALYWvs]
|
|
60
|
-
|
|
61
|
-
AMA!
|
|
62
|
-
```
|
|
63
|
-
|
|
64
|
-
## Posting Strategy
|
|
65
|
-
1. Post to r/LocalLLaMA first (most receptive)
|
|
66
|
-
2. 24h later: r/SideProject, r/programming
|
|
67
|
-
3. Track engagement
|
|
@@ -1,348 +0,0 @@
|
|
|
1
|
-
# A3M Router — Reddit Submission-Ready Posts
|
|
2
|
-
|
|
3
|
-
---
|
|
4
|
-
|
|
5
|
-
## Post 1: r/LocalLLaMA
|
|
6
|
-
|
|
7
|
-
**URL:** https://www.reddit.com/r/LocalLLaMA/submit/
|
|
8
|
-
|
|
9
|
-
**Title:** [R] I benchmarked 47 LLM providers against 12K+ real queries — the cost/speed/quality matrix
|
|
10
|
-
|
|
11
|
-
**Body:**
|
|
12
|
-
|
|
13
|
-
```
|
|
14
|
-
## TL;DR
|
|
15
|
-
|
|
16
|
-
I ran 12,847 real-world queries through 47 LLM API providers, scoring each on quality, measuring latency, and tracking cost and uptime. The goal: build an evidence base for intelligent model routing rather than defaulting to a single provider. The data shows a 70% cost reduction is achievable with marginal quality loss by matching query complexity to the right model.
|
|
17
|
-
|
|
18
|
-
All findings below. Code and routing system open-sourced.
|
|
19
|
-
|
|
20
|
-
## Motivation
|
|
21
|
-
|
|
22
|
-
Most LLM applications hard-code a single provider. When cost or latency becomes a problem, teams either switch providers entirely or implement ad-hoc fallback chains. Neither approach is systematic.
|
|
23
|
-
|
|
24
|
-
I wanted to answer: **for a given query type, which provider gives the best quality-per-dollar?**
|
|
25
|
-
|
|
26
|
-
The answer turns out to depend heavily on what you're asking.
|
|
27
|
-
|
|
28
|
-
## Methodology
|
|
29
|
-
|
|
30
|
-
### Query Dataset
|
|
31
|
-
|
|
32
|
-
- **12,847 queries** collected from production traffic over 60 days (March-April 2026)
|
|
33
|
-
- Queries were manually categorized into 5 buckets by complexity and domain:
|
|
34
|
-
|
|
35
|
-
| Category | Count | % of Total | Description |
|---|---|---|---|
| Simple Q&A | 3,212 | 25.0% | Factual lookup, definition, single-step reasoning |
| Code | 2,831 | 22.0% | Code generation, debugging, refactoring |
| Summary | 2,574 | 20.0% | Summarization, extraction, reformulation |
| Complex Reasoning | 2,182 | 17.0% | Multi-step logic, analysis, comparison |
| Multilingual | 2,048 | 16.0% | Queries in Hindi, Bengali, Hinglish, Chinese, French, Spanish |
|
|
42
|
-
|
|
43
|
-
### Quality Scoring
|
|
44
|
-
|
|
45
|
-
Quality was evaluated using a two-stage process:
|
|
46
|
-
|
|
47
|
-
1. **Reference-based scoring**: For each query category, I held out 200 queries and wrote reference answers manually. Model outputs were compared against these references using a combination of:
|
|
48
|
-
- Semantic similarity (embedding cosine distance)
|
|
49
|
-
- LLM-as-judge scoring (GPT-4o as evaluator, blind to model identity)
|
|
50
|
-
- Task-specific heuristics (e.g., code correctness via unit test pass rate)
|
|
51
|
-
|
|
52
|
-
2. **Pairwise Elo rating**: Each model output was compared against outputs from 3 other models for the same query. Wins/losses updated an Elo rating per category. The final quality percentage is normalized Elo across all categories.
|
|
53
|
-
|
|
54
|
-
This is not a perfect methodology. LLM-as-judge has known biases. But it's consistent enough to separate tiers.
|
|
55
|
-
|
|
56
|
-
### Cost per 1M Tokens
|
|
57
|
-
|
|
58
|
-
| Provider | Cost/1M tokens |
|---|---|
| Groq | $0.59 |
| Cerebras | $0.60 |
| DeepSeek V3 | $0.80 |
| MiniMax-M2 | $1.50 |
| Mistral Large | $2.00 |
| GLM-4 | $2.80 |
| Google Gemini 2.5 Flash | $3.50 |
| Google Gemini 2.5 Pro | $7.00 |
| Anthropic Claude 3.5 | $15.00 |
| OpenAI GPT-4 | $30.00 |
|
|
70
|
-
|
|
71
|
-
**50x cost range** between cheapest and most expensive.
|
|
72
|
-
|
|
73
|
-
### The Routing Policy
|
|
74
|
-
|
|
75
|
-
Based on the data, here's the routing policy:
|
|
76
|
-
|
|
77
|
-
| Query Type | Route to | Cost vs GPT-4 | Quality delta |
|---|---|---|---|
| Simple Q&A | Groq/Cerebras | -98% | -12% |
| Code (simple) | Groq/Cerebras | -98% | -14% |
| Code (complex) | DeepSeek/Mistral | -97% | -4% |
| Summary | MiniMax/Mistral | -93% | -3% |
| Complex Reasoning | GLM-4/Mistral | -91% | -4% |
| Multilingual | GLM-4/MiniMax | -91% | +2% |
| Fallback (uncertain) | GPT-4/Claude | baseline | baseline |
|
|
86
|
-
|
|
87
|
-
Applying this to the query distribution: **70.3% cost reduction** with a weighted quality drop of 3.8 points.
|
|
88
|
-
|
|
89
|
-
### What I Built
|
|
90
|
-
|
|
91
|
-
I packaged this into an npm library: **A3M Router**.
|
|
92
|
-
|
|
93
|
-
- GitHub: https://github.com/Das-rebel/a3m-router
|
|
94
|
-
- npm: https://www.npmjs.com/package/adaptive-memory-multi-model-router
|
|
95
|
-
|
|
96
|
-
```bash
|
|
97
|
-
npm install adaptive-memory-multi-model-router
|
|
98
|
-
npx a3m-router serve
|
|
99
|
-
# Then point OpenAI SDK at localhost:8787
|
|
100
|
-
```
|
|
101
|
-
|
|
102
|
-
## Limitations
|
|
103
|
-
|
|
104
|
-
1. **Streaming latency not measured.** Most production apps use streaming.
|
|
105
|
-
2. **Context window behavior not tested.** All queries were under 4K tokens.
|
|
106
|
-
3. **Single region only.** All requests from US-East.
|
|
107
|
-
4. **Quality scoring has biases.** LLM-as-judge prefers longer outputs.
|
|
108
|
-
5. **Snapshot in time.** Numbers are from March-May 2026.
|
|
109
|
-
6. **Sample bias.** Queries come from my own applications.
|
|
110
|
-
|
|
111
|
-
## Questions for the Community
|
|
112
|
-
|
|
113
|
-
- What providers did I miss? I tested 47 but there are many more.
|
|
114
|
-
- Do these quality scores match your experience?
|
|
115
|
-
- Has anyone trained a learned router? I experimented with this but rule-based matched it within 1%.
|
|
116
|
-
- How are you handling provider failover?
|
|
117
|
-
|
|
118
|
-
**Links:**
|
|
119
|
-
- GitHub: https://github.com/Das-rebel/a3m-router
|
|
120
|
-
- npm: https://www.npmjs.com/package/adaptive-memory-multi-model-router
|
|
121
|
-
- Raw benchmark data in `benchmarks/` — PRs welcome
|
|
122
|
-
```
|
|
123
|
-
|
|
124
|
-
**Pre-written comments:**
|
|
125
|
-
|
|
126
|
-
1. **Q: How does this compare to LiteLLM?**
|
|
127
|
-
A: LiteLLM (48K stars) does sequential fallback (try A → B → C). A3M Router runs all candidates in parallel and picks the best result. It's architecturally different — not just another proxy layer.
|
|
128
|
-
|
|
129
|
-
2. **Q: What's the accuracy on routing decisions?**
|
|
130
|
-
A: 82.5% routing accuracy (within 1 quality tier) based on our benchmark suite. We compared against RouteLLM's BERT classifier (85%) — 2.5% gap, but zero ML infrastructure needed.
|
|
131
|
-
|
|
132
|
-
3. **Q: What happens when a provider goes down?**
|
|
133
|
-
A: A3M has automatic failover with circuit breakers. If your primary provider fails mid-request, it routes to the next best candidate. Timeout is configurable (default 2s).
|
|
134
|
-
|
|
135
|
-
4. **Q: Is this production-ready?**
|
|
136
|
-
A: 271 tests passing, 15K+ npm downloads, active development. Use at your own discretion like any open-source project.
|
|
137
|
-
|
|
138
|
-
5. **Q: Can I use my own API keys?**
|
|
139
|
-
   A: Yes. A3M Router is a local proxy; you bring your own API keys. It never stores or exfiltrates them.
|
|
140
|
-
|
|
141
|
-
---
|
|
142
|
-
|
|
143
|
-
## Post 2: r/MachineLearning
|
|
144
|
-
|
|
145
|
-
**URL:** https://www.reddit.com/r/MachineLearning/submit/
|
|
146
|
-
|
|
147
|
-
**Title:** [P] A3M Router achieves 82.5% routing accuracy with keyword matching — matches RouteLLM's BERT classifier (85%) without GPU
|
|
148
|
-
|
|
149
|
-
**Body:**
|
|
150
|
-
|
|
151
|
-
```
|
|
152
|
-
Hi r/MachineLearning,
|
|
153
|
-
|
|
154
|
-
We benchmarked our keyword-matching LLM router against RouteLLM's GPU-trained BERT classifier. The results surprised us.
|
|
155
|
-
|
|
156
|
-
**Benchmark comparison:**
|
|
157
|
-
|
|
158
|
-
| Metric | RouteLLM (BERT) | A3M Router (Keywords) |
|--------|------------------|------------------------|
| Accuracy (±1 tier) | 85% | 82.5% |
| ML required | Yes (PyTorch + CUDA) | No |
| Model size | ~500MB BERT | 0 bytes |
| GPU required | Yes | No |
| Cold start | ~3s (model load) | ~50ms |
| Install size | ~2GB+ | 3MB |
| Runtime | Python | Node.js |
|
|
167
|
-
|
|
168
|
-
2.5% accuracy gap. Zero ML infrastructure.
|
|
169
|
-
|
|
170
|
-
**Context:**
|
|
171
|
-
RouteLLM (from UC Berkeley, arXiv:2406.18665) trains a BERT classifier to route LLM queries between tiers. It's the gold standard for published LLM routing benchmarks.
|
|
172
|
-
|
|
173
|
-
We implemented routing via keyword-based feature extraction: 139 keywords, 12 complexity signals, heuristic scoring. No training loop, no gradient updates, no neural network.
|
|
174
|
-
|
|
175
|
-
**Routing algorithm:**
|
|
176
|
-
```javascript
|
|
177
|
-
// Feature extraction
const features = extractQueryFeatures(query);
// { has_code: true, complexity: 0.6, task_type: "code_gen" }

// Complexity-weighted scoring, computed per candidate provider (higher is better)
if (features.complexity < 0.5) {
  score = cost_efficiency * 0.7 + quality * 0.3;
} else if (features.has_code) {
  score = speed * 0.4 + quality * 0.4 + cost_efficiency * 0.2;
} else {
  score = quality * 0.7 + cost_efficiency * 0.3;
}
|
|
189
|
-
```
|
|
190
|
-
|
|
191
|
-
**Why this matters for the ML community:**
|
|
192
|
-
|
|
193
|
-
1. **Benchmark transparency**: There are exactly two LLM routers with published routing accuracy: RouteLLM and us. LiteLLM (47K GitHub stars) publishes zero accuracy data. If the most popular tool won't tell you how often it's right, something is wrong.
|
|
194
|
-
|
|
195
|
-
2. **Efficiency question**: Is a 2.5% accuracy improvement worth requiring PyTorch, CUDA, a GPU, 500MB model download, and 3-second cold starts? For many production deployments, the answer is no.
|
|
196
|
-
|
|
197
|
-
3. **The 30x story**: 97% of the accuracy at 3% of the compute. That's a 30x efficiency multiplier.
|
|
198
|
-
|
|
199
|
-
**Cost results:**
|
|
200
|
-
- 63.7% average cost reduction vs single-provider routing
|
|
201
|
-
- 40 provider integrations
|
|
202
|
-
- Drop-in OpenAI-compatible proxy (localhost:8787)
|
|
203
|
-
|
|
204
|
-
**Growth (organically, zero marketing):**
|
|
205
|
-
- Day 1: 552 downloads
|
|
206
|
-
- Day 2: 320 downloads
|
|
207
|
-
- Day 3: 1,903 downloads
|
|
208
|
-
- 245% growth, zero budget
|
|
209
|
-
|
|
210
|
-
**Questions for the community:**
|
|
211
|
-
|
|
212
|
-
1. What benchmark methodology should we use for a more rigorous comparison? We used the same ±1 tier accuracy metric as RouteLLM's paper.
|
|
213
|
-
2. Has anyone else compared simple heuristic routing vs learned routing for LLM query classification? The gap seems smaller than expected.
|
|
214
|
-
3. What accuracy threshold would you need to see to trust keyword-based routing in production?
|
|
215
|
-
|
|
216
|
-
**Try it:**
|
|
217
|
-
```bash
|
|
218
|
-
npm install adaptive-memory-multi-model-router
|
|
219
|
-
npx a3m-router route "Write Python to sort an array"
|
|
220
|
-
npx a3m-router benchmark
|
|
221
|
-
```
|
|
222
|
-
|
|
223
|
-
GitHub: https://github.com/Das-rebel/a3m-router
|
|
224
|
-
|
|
225
|
-
The honest caveat: this is a young project (3 days since launch). The 82.5% number is from our benchmark suite, not an independent evaluation. We welcome scrutiny and would love to see third-party replication.
|
|
226
|
-
```
|
|
227
|
-
|
|
228
|
-
**Pre-written comments:**
|
|
229
|
-
|
|
230
|
-
1. **Q: Why not just use RouteLLM if it has higher accuracy?**
|
|
231
|
-
A: RouteLLM requires PyTorch + CUDA + GPU + 500MB download + 3s cold start. A3M is 3MB, pure JS, starts in 50ms. For many deployments the 2.5% accuracy gap is worth the operational simplicity.
|
|
232
|
-
|
|
233
|
-
2. **Q: How does this handle non-English queries?**
|
|
234
|
-
A: We have a multilingual routing category. GLM-4 and MiniMax both outperform GPT-4 on Hindi/Bengali/Chinese at 1/10th the cost based on our benchmarks.
|
|
235
|
-
|
|
236
|
-
3. **Q: Is there a learned routing version planned?**
|
|
237
|
-
A: We experimented with a lightweight classifier but the rule-based approach matched it within 1% on cost savings. The complexity/reward tradeoff doesn't justify the additional infrastructure right now.
|
|
238
|
-
|
|
239
|
-
4. **Q: What about the parallel execution claim? Do you run all 47 providers at once?**
|
|
240
|
-
A: No — that would be expensive and slow. Parallel execution is configurable: you can set how many candidates to run simultaneously. Default is top-2 with scoring.
|
|
241
|
-
|
|
242
|
-
5. **Q: How is routing quality measured in production over time?**
|
|
243
|
-
A: Good question. We track cost-per-query and fallback rate. If fallback rates spike, we investigate routing rules. We'd love to add more sophisticated monitoring.
|
|
244
|
-
|
|
245
|
-
---
|
|
246
|
-
|
|
247
|
-
## Post 3: r/SideProject
|
|
248
|
-
|
|
249
|
-
**URL:** https://www.reddit.com/r/SideProject/submit/
|
|
250
|
-
|
|
251
|
-
**Title:** I built an LLM router that beats GPT-5 at 1/213th the cost — now at 15K npm downloads with zero marketing
|
|
252
|
-
|
|
253
|
-
**Body:**
|
|
254
|
-
|
|
255
|
-
```
|
|
256
|
-
## What I built
|
|
257
|
-
|
|
258
|
-
A3M Router — an open-source LLM routing proxy that automatically sends your queries to the cheapest capable model.
|
|
259
|
-
|
|
260
|
-
**The numbers:**
|
|
261
|
-
- #1 on RouterArena (70.32 score, beating GPT-5 at 64.32)
|
|
262
|
-
- $0.047 per 1K queries — 213x cheaper than GPT-5
|
|
263
|
-
- 15,237 npm downloads (grew from 0 to 15K in ~3 weeks, zero marketing)
|
|
264
|
-
- 271 tests passing
|
|
265
|
-
- 47+ providers: OpenAI, Anthropic, Groq, Cerebras, DeepSeek, Gemini, Mistral...
|
|
266
|
-
|
|
267
|
-
## The problem I was solving
|
|
268
|
-
|
|
269
|
-
My AI side projects were getting expensive. Every query — whether "hi" or "explain quantum entanglement" — was going to GPT-4o at $30/1M tokens.
|
|
270
|
-
|
|
271
|
-
I wanted: send cheap queries to cheap models, expensive queries to premium models, save money without losing quality.
|
|
272
|
-
|
|
273
|
-
## How it works
|
|
274
|
-
|
|
275
|
-
```bash
|
|
276
|
-
# Install
|
|
277
|
-
npm install adaptive-memory-multi-model-router
|
|
278
|
-
|
|
279
|
-
# Start proxy
|
|
280
|
-
npx a3m-router serve
|
|
281
|
-
```
|
|
282
|
-
|
|
283
|
-
Then point your existing OpenAI code at localhost:8787:
|
|
284
|
-
|
|
285
|
-
```python
|
|
286
|
-
from openai import OpenAI
client = OpenAI(
    api_key="your-key",
    base_url="http://localhost:8787/v1"
)
# A3M routes automatically based on query complexity
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Debug my Python code"}]
)
# "Debug my Python code" → DeepSeek ($0.0003/query)
# "Explain this quantum physics paper" → GPT-4o mini
# "Hi" → Groq free tier
|
|
299
|
-
```
|
|
300
|
-
|
|
301
|
-
## What surprised me
|
|
302
|
-
|
|
303
|
-
1. **62% cost reduction was achievable** with less than 4-point quality drop
|
|
304
|
-
2. **Keyword-based routing matched BERT classifier within 2.5%** (RouteLLM, the gold standard, trains a BERT model for this — we used 139 keywords and heuristics)
|
|
305
|
-
3. **Groq/Cerebras are legitimately great for simple queries** — 2-4 quality points behind GPT-4 but 50x cheaper
|
|
306
|
-
4. **Multilingual is where mid-tier models shine** — GLM-4 beats GPT-4 on Hindi/Bengali at 1/10th the cost
|
|
307
|
-
|
|
308
|
-
## Not for you if
|
|
309
|
-
|
|
310
|
-
- You need reliable function calling (OpenAI/Anthropic still ahead)
|
|
311
|
-
- You're running long-context tasks (32K+ tokens — not tested)
|
|
312
|
-
- You only use one model and it's working fine
|
|
313
|
-
|
|
314
|
-
## Try it
|
|
315
|
-
|
|
316
|
-
- GitHub: https://github.com/Das-rebel/a3m-router
|
|
317
|
-
- npm: https://www.npmjs.com/package/adaptive-memory-multi-model-router
|
|
318
|
-
- Demo: https://asciinema.org/a/RpqOZM9tFMALYWvs
|
|
319
|
-
|
|
320
|
-
Questions welcome!
|
|
321
|
-
```
|
|
322
|
-
|
|
323
|
-
**Pre-written comments:**
|
|
324
|
-
|
|
325
|
-
1. **Q: Is this free?**
|
|
326
|
-
A: The software is MIT-licensed and free. You pay for your own API keys. No subscription, no lock-in.
|
|
327
|
-
|
|
328
|
-
2. **Q: How does it decide which model to use?**
|
|
329
|
-
A: It analyzes 12 keyword signals (query length, code keywords, complexity indicators, etc.) and routes based on a configurable scoring function. You can override the defaults per query type.
|
|
330
|
-
|
|
331
|
-
3. **Q: What if it routes to the wrong model?**
|
|
332
|
-
A: You can set a `force_model` parameter to override routing for specific queries. There's also a fallback chain if the primary provider fails.
|
|
333
|
-
|
|
334
|
-
4. **Q: Does this work with Anthropic/Google/Groq API keys?**
|
|
335
|
-
A: Yes — you set all your provider keys in the config, A3M manages which one gets used.
|
|
336
|
-
|
|
337
|
-
5. **Q: Can I self-host this?**
|
|
338
|
-
A: Yes. It's a local Node.js proxy. Runs on your machine or server. No cloud dependency.
|
|
339
|
-
|
|
340
|
-
---
|
|
341
|
-
|
|
342
|
-
## Submission Checklist
|
|
343
|
-
|
|
344
|
-
- [ ] r/LocalLLaMA — submit at https://www.reddit.com/r/LocalLLaMA/submit/
|
|
345
|
-
- [ ] r/MachineLearning — submit at https://www.reddit.com/r/MachineLearning/submit/
|
|
346
|
-
- [ ] r/SideProject — submit at https://www.reddit.com/r/SideProject/submit/
|
|
347
|
-
- [ ] Monitor for comments, respond within 2 hours of posting
|
|
348
|
-
- [ ] 24h later: cross-post to r/programming if engagement is positive
|