adaptive-memory-multi-model-router 2.14.49 → 2.14.51
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.dockerignore +82 -0
- package/.env.example +303 -0
- package/.github/DISCUSSIONS_WELCOME.md +27 -0
- package/.github/DISCUSSION_TEMPLATE.yml +5 -0
- package/.github/FUNDING.yml +2 -0
- package/.github/ISSUE_TEMPLATE/bug_report.md +94 -0
- package/.github/ISSUE_TEMPLATE/config.yml +17 -0
- package/.github/ISSUE_TEMPLATE/feature_request.md +71 -0
- package/.github/PULL_REQUEST_TEMPLATE.md +71 -0
- package/.github/dependabot.yml +9 -0
- package/.github/workflows/auto-publish.yml +51 -0
- package/.github/workflows/ci.yml +263 -0
- package/.github/workflows/codeql.yml +38 -0
- package/.github/workflows/npm-publish.yml +20 -0
- package/.github/workflows/pages.yml +37 -0
- package/.github/workflows/stale.yml +54 -0
- package/.publish-tick +1 -0
- package/.well-known/ai-plugin.json +16 -0
- package/AGENT_COUNCIL_FINDINGS.md +142 -0
- package/ARCHITECTURE.md +346 -0
- package/AUDIT_REPORT.md +28 -0
- package/CODE_OF_CONDUCT.md +128 -0
- package/CONTRIBUTING.md +50 -0
- package/CONTRIBUTORS.md +20 -0
- package/Dockerfile +53 -0
- package/Dockerfile.proxy +33 -0
- package/HEALTH_REPORT.md +118 -0
- package/IMPROVEMENT_PLAN.md +107 -0
- package/LANDING.md +43 -0
- package/LAUNCH-PAIN-DRIVEN.md +339 -0
- package/LAUNCH.md +337 -0
- package/LAUNCH_CHECKLIST.md +141 -0
- package/LAUNCH_SNAPSHOT.md +260 -0
- package/MANIFESTO.md +41 -0
- package/POPULARITY_BOOSTERS.md +285 -0
- package/PR_STATUS_REPORT.md +148 -0
- package/README.md +10 -0
- package/REDESIGN.md +95 -0
- package/RUNKIT.md +83 -0
- package/SECURITY.md +29 -0
- package/SUBMISSIONS.md +43 -0
- package/_schema.html +53 -0
- package/ai-plugin.json +16 -0
- package/articles/AI_AGENT_LLM_ROUTING.md +150 -0
- package/articles/CHINESE_DIRECTORIES.md +100 -0
- package/articles/CHINESE_SUBMISSIONS_READY.md +322 -0
- package/articles/COMPETITOR_ALERTS.md +31 -0
- package/articles/COMPLETE_POSTING_DIRECTORY.md +147 -0
- package/articles/CONTENT_STRUCTURE.md +292 -0
- package/articles/DEVTO_COST_GUIDE.md +473 -0
- package/articles/DEVTO_FINAL.md +416 -0
- package/articles/DEVTO_MULTI_PROVIDER.md +542 -0
- package/articles/DEVTO_READY.md +255 -0
- package/articles/DEVTO_V2_ANNOUNCEMENT.md +160 -0
- package/articles/DEVTO_VIRAL_GROWTH.md +280 -0
- package/articles/FRESH_devto.md +460 -0
- package/articles/FRESH_devto_2026_05.md +73 -0
- package/articles/FRESH_hackernews.md +14 -0
- package/articles/FRESH_reddit_ml.md +90 -0
- package/articles/FRESH_reddit_node.md +198 -0
- package/articles/FRESH_reddit_sideproject.md +72 -0
- package/articles/FRESH_reddit_webdev.md +130 -0
- package/articles/FROM_ZERO_TO_10K.md +107 -0
- package/articles/HN_10X_BETTER.md +430 -0
- package/articles/HN_ACCOUNT_GUIDE.md +21 -0
- package/articles/HN_CHINESE_STYLE.md +308 -0
- package/articles/HN_FINAL.md +148 -0
- package/articles/HN_POSTED_VERSION.md +56 -0
- package/articles/HN_POST_READY.md +137 -0
- package/articles/HN_RESEARCH.md +364 -0
- package/articles/HN_SHOW_routerarena.md +17 -0
- package/articles/HN_TIMING_GUIDE.md +52 -0
- package/articles/INDIEHACKERS_POST.md +52 -0
- package/articles/INDIEHACKERS_READY.md +120 -0
- package/articles/LLM_BENCHMARK_DEEP_DIVE.md +153 -0
- package/articles/MASTER_POSTING_DIRECTORY.md +189 -0
- package/articles/NEWSLETTER_SEND_NOW.md +259 -0
- package/articles/NEWSLETTER_SUBMISSIONS.md +112 -0
- package/articles/PAIN-DRIVEN-devto-v2.md +308 -0
- package/articles/PAIN-DRIVEN-devto-v3.md +268 -0
- package/articles/PAIN-DRIVEN-devto.md +242 -0
- package/articles/PAIN-DRIVEN-hackernews-v2.md +138 -0
- package/articles/PAIN-DRIVEN-hackernews-v3.md +151 -0
- package/articles/PAIN-DRIVEN-hackernews.md +131 -0
- package/articles/PAIN-DRIVEN-reddit-v2.md +301 -0
- package/articles/PAIN-DRIVEN-reddit-v3.md +236 -0
- package/articles/PAIN-DRIVEN-reddit.md +218 -0
- package/articles/PAIN-DRIVEN-twitter-v2.md +110 -0
- package/articles/PAIN-DRIVEN-twitter-v3.md +121 -0
- package/articles/PAIN-DRIVEN-twitter.md +120 -0
- package/articles/PORTKEY_VS_A3M.md +147 -0
- package/articles/POSTING_KIT_2026_05.md +67 -0
- package/articles/PRESS_KIT_routerarena.md +77 -0
- package/articles/PRODUCTHUNT_LISTING.md +48 -0
- package/articles/PRODUCTHUNT_READY.md +106 -0
- package/articles/PR_PLAN_vault.md +125 -0
- package/articles/REDDIT_FINAL.md +232 -0
- package/articles/REDDIT_POST.md +67 -0
- package/articles/REDDIT_SUBMISSION_READY.md +348 -0
- package/articles/ROUTERARENA_LEADER.md +45 -0
- package/articles/SHOW_HN_FINAL.md +29 -0
- package/articles/TWEETS_10K_DOWNLOADS.md +47 -0
- package/articles/TWEETS_BENCHMARK_FIRST.md +46 -0
- package/articles/TWEETS_MCP_PLAY.md +51 -0
- package/articles/TWEETS_SEQUENTIAL_BROKEN.md +49 -0
- package/articles/TWEETS_WHY_BUILD.md +54 -0
- package/articles/TWEETS_routerarena_leader.md +53 -0
- package/articles/TWEET_STORM_READY.md +165 -0
- package/articles/TWITTER_FINAL.md +167 -0
- package/articles/WHY_10X_BETTER.md +261 -0
- package/articles/WHY_CHINESE_STYLE_BETTER.md +323 -0
- package/articles/ai-discoverability-llm-routing.md +210 -0
- package/articles/devto-llm-routing.md +138 -0
- package/articles/hackernews-show-hn.md +54 -0
- package/articles/hashnode-llm-cost-optimization.md +125 -0
- package/articles/hn_show_2026_05.md +11 -0
- package/articles/medium-building-llm-router.md +205 -0
- package/articles/reddit-ml.md +76 -0
- package/articles/twitter-thread-cost-savings.md +50 -0
- package/articles/youtube-tutorial-script.md +262 -0
- package/assets/a3m_3blue1brown.mp4 +0 -0
- package/assets/banner.svg +109 -0
- package/assets/chart-cost-v2.svg +91 -0
- package/assets/chart-cost-v3.svg +143 -0
- package/assets/chart-features-v2.svg +132 -0
- package/assets/chart-features-v3.svg +211 -0
- package/assets/chart-growth-v2.svg +122 -0
- package/assets/chart-growth-v3.svg +189 -0
- package/assets/cost-comparison.svg +134 -0
- package/assets/cost-simple.svg +64 -0
- package/assets/demo-hn.gif +0 -0
- package/assets/feature-matrix.svg +136 -0
- package/assets/growth-chart-animated.svg +76 -0
- package/assets/growth-chart.svg +82 -0
- package/assets/growth-simple.svg +69 -0
- package/assets/hero-diagram.svg +81 -0
- package/assets/logo-new.svg +21 -0
- package/assets/logo.svg +68 -0
- package/assets/provider-comparison.svg +121 -0
- package/assets/social-preview-new.svg +100 -0
- package/assets/social-preview.svg +194 -0
- package/assets/social-v2.svg +130 -0
- package/assets/social-v3.svg +212 -0
- package/benchmark-provider-results.json +245 -0
- package/benchmark-results.json +54 -0
- package/council-votes/architecture-vote.md +121 -0
- package/council-votes/coverage-vote.md +93 -0
- package/data/adaptive-benchmark.json +92 -0
- package/data/benchmark-results.json +47 -0
- package/data/labeled-benchmark.json +88 -0
- package/demo/3blue1brown_video.py +285 -0
- package/demo/3blue1brown_video_v2.py +310 -0
- package/demo/IMPROVED_PROMPTS.md +229 -0
- package/demo/VEO3_PROMPTS.md +269 -0
- package/demo/VIDEO_PRODUCTION_GUIDE.md +333 -0
- package/demo/a3m_3blue1brown.mp4 +0 -0
- package/demo/asciinema-demo.sh +195 -0
- package/demo/demo-hn.tape +74 -0
- package/demo/demo-script.md +53 -0
- package/demo/demo-script.sh +62 -0
- package/demo/demo.svg +75 -0
- package/demo/frame1_ai_data_center.png +0 -0
- package/demo/frame1_sunset_video.mp4 +0 -0
- package/demo/frame2_cost_comparison.png +0 -0
- package/demo/frame2_cost_comparison_fallback.png +0 -0
- package/demo/frame3_parallel_execution.png +0 -0
- package/demo/frame3_parallel_execution_fallback.png +0 -0
- package/demo/frame4_providers.png +0 -0
- package/demo/frame4_providers_fallback.png +0 -0
- package/demo/frame5_endcard.png +0 -0
- package/demo/frame5_endcard_fallback.png +0 -0
- package/demo/new_frame1_hook.png +0 -0
- package/demo/new_frame2_proof.png +0 -0
- package/demo/new_frame3_wow.png +0 -0
- package/demo/new_frame4_social.png +0 -0
- package/demo/new_frame5_cta.png +0 -0
- package/demo/package.json +13 -0
- package/demo/product-video-final.mp4 +0 -0
- package/demo/product-video-hype-v1.mp4 +0 -0
- package/demo/product-video-v1.mp4 +0 -0
- package/demo/public/index.html +762 -0
- package/demo/recording.cast +55 -0
- package/demo/server.js +405 -0
- package/demo-new.tape +71 -0
- package/demo-real.sh +198 -0
- package/demo-simple.tape +205 -0
- package/demo.html +520 -0
- package/demo.sh +85 -0
- package/demo.tape +259 -0
- package/dist/analytics/costAnalytics.d.ts.map +1 -0
- package/dist/analytics/costAnalytics.js.map +1 -0
- package/dist/benchmark/comprehensive.js.map +1 -0
- package/dist/benchmark/reproducible.d.ts.map +1 -0
- package/dist/benchmark/reproducible.js.map +1 -0
- package/dist/cache/prefixCache.d.ts.map +1 -0
- package/dist/cache/prefixCache.js.map +1 -0
- package/dist/cache/responseCache.d.ts.map +1 -0
- package/dist/cache/responseCache.js.map +1 -0
- package/dist/cache/semanticCache.d.ts.map +1 -0
- package/dist/cache/semanticCache.js.map +1 -0
- package/dist/cli/setupWizard.d.ts.map +1 -0
- package/dist/cli/setupWizard.js.map +1 -0
- package/dist/cost/budgetEnforcer.d.ts.map +1 -0
- package/dist/cost/budgetEnforcer.js.map +1 -0
- package/dist/cost/costTracker.d.ts.map +1 -0
- package/dist/cost/costTracker.js.map +1 -0
- package/dist/ensemble/multiRoundDialog.js.map +1 -0
- package/dist/ensemble/shapleyValue.js.map +1 -0
- package/dist/integrations/langchainAdapter.d.ts.map +1 -0
- package/dist/integrations/langchainAdapter.js.map +1 -0
- package/dist/integrations/oauth.d.ts.map +1 -0
- package/dist/integrations/oauth.js.map +1 -0
- package/dist/integrations/scienceAdapter.js.map +1 -0
- package/dist/memory/autoFetch.d.ts.map +1 -0
- package/dist/memory/autoFetch.js.map +1 -0
- package/dist/memory/episodicMemory.d.ts.map +1 -0
- package/dist/memory/episodicMemory.js.map +1 -0
- package/dist/memory/hybridMemory.js.map +1 -0
- package/dist/memory/memoryTree.d.ts.map +1 -0
- package/dist/memory/memoryTree.js.map +1 -0
- package/dist/memory/obsidianVault.d.ts.map +1 -0
- package/dist/memory/obsidianVault.js.map +1 -0
- package/dist/memory/reasoningBank.js.map +1 -0
- package/dist/observability/changeWatch.d.ts.map +1 -0
- package/dist/observability/changeWatch.js.map +1 -0
- package/dist/observability/fatigueDetector.d.ts.map +1 -0
- package/dist/observability/fatigueDetector.js.map +1 -0
- package/dist/observability/index.d.ts.map +1 -0
- package/dist/observability/index.js.map +1 -0
- package/dist/observability/metrics.d.ts.map +1 -0
- package/dist/observability/metrics.js.map +1 -0
- package/dist/observability/middleware.d.ts.map +1 -0
- package/dist/observability/middleware.js.map +1 -0
- package/dist/observability/tracer.d.ts.map +1 -0
- package/dist/observability/tracer.js.map +1 -0
- package/dist/observability/types.d.ts.map +1 -0
- package/dist/observability/types.js.map +1 -0
- package/dist/orchestration/haloOrchestrator.d.ts.map +1 -0
- package/dist/orchestration/haloOrchestrator.js.map +1 -0
- package/dist/orchestration/mctsWorkflow.d.ts.map +1 -0
- package/dist/orchestration/mctsWorkflow.js.map +1 -0
- package/dist/providers/localProvider.d.ts.map +1 -0
- package/dist/providers/localProvider.js.map +1 -0
- package/dist/providers/providerConfig.d.ts.map +1 -0
- package/dist/providers/providerConfig.js.map +1 -0
- package/dist/providers/registry.d.ts.map +1 -0
- package/dist/providers/registry.js.map +1 -0
- package/dist/routing/advancedRouter.d.ts.map +1 -0
- package/dist/routing/advancedRouter.js +1 -1
- package/dist/routing/advancedRouter.js.map +1 -0
- package/dist/routing/crossModelValidation.d.ts.map +1 -0
- package/dist/routing/crossModelValidation.js.map +1 -0
- package/dist/routing/providerHealth.d.ts.map +1 -0
- package/dist/routing/providerHealth.js.map +1 -0
- package/dist/routing/providerRetry.d.ts.map +1 -0
- package/dist/routing/providerRetry.js.map +1 -0
- package/dist/scripts/banner.js +29 -0
- package/dist/security/guardrails.d.ts.map +1 -0
- package/dist/security/guardrails.js.map +1 -0
- package/dist/server/dashboard.d.ts.map +1 -0
- package/dist/server/dashboard.js.map +1 -0
- package/dist/server/modelMapper.d.ts.map +1 -0
- package/dist/server/modelMapper.js.map +1 -0
- package/dist/server/proxyServer.d.ts.map +1 -0
- package/dist/server/proxyServer.js.map +1 -0
- package/dist/skills/__tests__/skill_manager.test.d.ts +2 -0
- package/dist/skills/__tests__/skill_manager.test.d.ts.map +1 -0
- package/dist/skills/__tests__/skill_manager.test.js +268 -0
- package/dist/skills/__tests__/skill_manager.test.js.map +1 -0
- package/dist/tools/tmlpdTools.d.ts.map +1 -0
- package/dist/tools/tmlpdTools.js.map +1 -0
- package/dist/tui/dashboard.d.ts.map +1 -0
- package/dist/tui/dashboard.js.map +1 -0
- package/dist/tui/index.d.ts.map +1 -0
- package/dist/tui/index.js.map +1 -0
- package/dist/utils/batchProcessor.d.ts.map +1 -0
- package/dist/utils/batchProcessor.js.map +1 -0
- package/dist/utils/compression.d.ts.map +1 -0
- package/dist/utils/compression.js.map +1 -0
- package/dist/utils/costUtils.d.ts.map +1 -0
- package/dist/utils/costUtils.js.map +1 -0
- package/dist/utils/reliability.d.ts.map +1 -0
- package/dist/utils/reliability.js.map +1 -0
- package/dist/utils/sorting.d.ts.map +1 -0
- package/dist/utils/sorting.js.map +1 -0
- package/dist/utils/speculativeDecoding.d.ts.map +1 -0
- package/dist/utils/speculativeDecoding.js.map +1 -0
- package/dist/utils/tokenUtils.d.ts.map +1 -0
- package/dist/utils/tokenUtils.js.map +1 -0
- package/docs/.nojekyll +0 -0
- package/docs/ANALYSIS_PRINCIPLES.md +162 -0
- package/docs/API.md +855 -0
- package/docs/ARCHITECTURAL-IMPROVEMENTS-2025.md +1391 -0
- package/docs/ARCHITECTURAL-IMPROVEMENTS-REVISED-2025.md +1051 -0
- package/docs/BENCHMARK.md +170 -0
- package/docs/CHINESE_PROVIDER_RELIABILITY.md +37 -0
- package/docs/CITATIONS.md +74 -0
- package/docs/CLAIMS_AND_EVIDENCE.md +58 -0
- package/docs/CONFIGURATION.md +476 -0
- package/docs/COUNCIL_DECISION.json +816 -0
- package/docs/COUNCIL_SUMMARY.md +319 -0
- package/docs/COUNCIL_V2.2_DECISION.md +416 -0
- package/docs/ENGINEERING_SPEC.md +55 -0
- package/docs/FACTORY_RESET.md +34 -0
- package/docs/GEO.md +66 -0
- package/docs/GEO_OPTIMIZATION.md +30 -0
- package/docs/GEO_ROOT_CAUSE.md +136 -0
- package/docs/GEO_STATUS.md +85 -0
- package/docs/GEO_TEST_RESULTS.md +176 -0
- package/docs/HN_CHECKLIST.md +38 -0
- package/docs/HN_FOUNDER_COMMENT.md +17 -0
- package/docs/HN_SUBMISSION_FINAL.md +180 -0
- package/docs/HN_SUBMISSION_V3.md +56 -0
- package/docs/IMPROVEMENT_ROADMAP.md +515 -0
- package/docs/INTEGRATIONS.md +420 -0
- package/docs/LANGCHAIN_INTEGRATION.md +147 -0
- package/docs/LLM_COUNCIL_DECISION.md +508 -0
- package/docs/MIDDLEWARE_CHAIN.md +35 -0
- package/docs/PROMO_CHECKLIST.md +200 -0
- package/docs/QUICKSTART.md +271 -0
- package/docs/QUICK_START.md +43 -0
- package/docs/QUICK_START_VISIBILITY.md +782 -0
- package/docs/REDDIT_GAP_ANALYSIS.md +299 -0
- package/docs/RELEASE_CHECKLIST.md +32 -0
- package/docs/REPRODUCIBILITY.md +63 -0
- package/docs/RESEARCH_BACKED_IMPROVEMENTS.md +1180 -0
- package/docs/ROUTING_RUBRIC.md +197 -0
- package/docs/SEO_AUDIT.md +186 -0
- package/docs/SOCIAL_LISTENING.md +219 -0
- package/docs/TMLPD_QNA.md +751 -0
- package/docs/TMLPD_V2.1_COMPLETE.md +763 -0
- package/docs/TMLPD_V2.2_RESEARCH_ROADMAP.md +754 -0
- package/docs/UPDATE_TOPICS.md +15 -0
- package/docs/USE_CASES.md +59 -0
- package/docs/V2.2_IMPLEMENTATION_COMPLETE.md +446 -0
- package/docs/V2_IMPLEMENTATION_GUIDE.md +388 -0
- package/docs/VERCEL_AI_SDK.md +209 -0
- package/docs/VISIBILITY_ADOPTION_PLAN.md +1005 -0
- package/docs/_config.yml +49 -0
- package/docs/ai-plugin.json +16 -0
- package/docs/api.html +513 -0
- package/docs/architecture-diagram.md +40 -0
- package/docs/benchmark-chart.png +0 -0
- package/docs/benchmark.html +387 -0
- package/docs/blog/routerarena-number-one.html +73 -0
- package/docs/cli-cheatsheet.md +339 -0
- package/docs/compare.md +109 -0
- package/docs/comparison-litellm.md +88 -0
- package/docs/comparison.md +108 -0
- package/docs/cost-chart-ascii.md +42 -0
- package/docs/cost-comparison-chart.svg +88 -0
- package/docs/curl-examples.md +247 -0
- package/docs/demo-auto.html +264 -0
- package/docs/demo.html +416 -0
- package/docs/geo/GENERATIVE_ENGINE_OPTIMIZATION.md +232 -0
- package/docs/index.html +507 -0
- package/docs/launch-content/LAUNCH_EXECUTION_CHECKLIST.md +421 -0
- package/docs/launch-content/README.md +457 -0
- package/docs/launch-content/assets/cost_comparison_100_tasks.png +0 -0
- package/docs/launch-content/assets/cumulative_savings.png +0 -0
- package/docs/launch-content/assets/parallel_speedup.png +0 -0
- package/docs/launch-content/assets/provider_pricing_comparison.png +0 -0
- package/docs/launch-content/assets/task_breakdown_comparison.png +0 -0
- package/docs/launch-content/generate_charts.py +313 -0
- package/docs/launch-content/hn_show_post.md +139 -0
- package/docs/launch-content/partner_outreach_templates.md +745 -0
- package/docs/launch-content/reddit_posts.md +467 -0
- package/docs/launch-content/twitter_thread.txt +460 -0
- package/{llms.txt.bak → docs/llms.txt} +6 -6
- package/docs/npm-downloads-chart.svg +43 -0
- package/docs/openapi.json +139 -0
- package/docs/openapi.yaml +1318 -0
- package/docs/quick-start.html +366 -0
- package/docs/robots.txt +52 -0
- package/docs/sitemap.xml +57 -0
- package/docs/styles.css +682 -0
- package/docs/well-known/ai-plugin.json +16 -0
- package/docs/wellknown/ai-plugin.json +16 -0
- package/docs-site/assets/og-banner.svg +194 -0
- package/docs-site/index.html +632 -0
- package/eval/README.md +46 -0
- package/eval/baselines/main.json +12 -0
- package/eval/benchmark_dataset.jsonl +16 -0
- package/eval/check_golden_routes.js +64 -0
- package/eval/datasets/catalog.json +33 -0
- package/eval/datasets/slices/cn_provider_reliability_v1.jsonl +3 -0
- package/eval/datasets/slices/cost_pressure_v1.jsonl +3 -0
- package/eval/datasets/slices/safety_guardrails_v1.jsonl +3 -0
- package/eval/evals.json +199 -0
- package/eval/fault_injection_thresholds.json +3 -0
- package/eval/generate_report.js +128 -0
- package/eval/golden_routes.json +114 -0
- package/eval/lib/experiment_registry.js +24 -0
- package/eval/run_eval.js +197 -0
- package/eval/run_fault_injection.js +201 -0
- package/eval/run_shadow_eval.js +85 -0
- package/eval/thresholds.json +9 -0
- package/examples/QUICKSTART.md +183 -0
- package/examples/README.md +61 -0
- package/examples/a3m-sdk.js +124 -0
- package/examples/basic-route.js +54 -0
- package/examples/chat-loop.js +202 -0
- package/examples/classify-then-route.js +102 -0
- package/examples/cost-compare.js +120 -0
- package/examples/ensemble.js +160 -0
- package/examples/whatsapp-telegram-bridge-demo.js +302 -0
- package/examples/whatsapp-telegram-bridge.js +269 -0
- package/hf-space/README.md +23 -0
- package/hf-space/app.py +240 -0
- package/hf-space/requirements.txt +1 -0
- package/huggingface_space/README.md +35 -0
- package/huggingface_space/app.py +126 -0
- package/huggingface_space/create_space.py +208 -0
- package/huggingface_space/requirements.txt +1 -0
- package/mcp-server/README.md +188 -0
- package/mcp-server/package.json +29 -0
- package/mcp-server/src/index.ts +744 -0
- package/mcp-server/tsconfig.json +19 -0
- package/openclaw-alexa-bridge/ALL_REMAINING_FIXES_PLAN.md +313 -0
- package/openclaw-alexa-bridge/REMAINING_FIXES_SUMMARY.md +277 -0
- package/openclaw-alexa-bridge/src/alexa_handler_no_tmlpd.js +1234 -0
- package/openclaw-alexa-bridge/test_fixes.js +77 -0
- package/package.json +73 -270
- package/playground/README.md +51 -0
- package/playground/codesandbox.json +12 -0
- package/playground/index.js +39 -0
- package/proxy/README.md +227 -0
- package/proxy/package-lock.json +831 -0
- package/proxy/package.json +17 -0
- package/proxy/rate-limit.js +145 -0
- package/proxy/rate-limit.test.js +311 -0
- package/proxy/server.js +970 -0
- package/python/README.md +102 -0
- package/python/a3m/__init__.py +6 -0
- package/python/a3m/client.py +190 -0
- package/python/a3m/models.py +40 -0
- package/python/a3m/sync_client.py +61 -0
- package/python/examples.py +53 -0
- package/python/integrations.py +330 -0
- package/python/pyproject.toml +23 -0
- package/python/setup.py +28 -0
- package/python/tmlpd.py +369 -0
- package/qna/REDDIT_GAP_ANALYSIS.md +299 -0
- package/qna/TMLPD_QNA.md +751 -0
- package/research/FINDING_001_safety.md +28 -0
- package/research/FINDING_002_error_diversity.md +32 -0
- package/research/FINDING_003_confidence_weighted_voting.md +32 -0
- package/research/FINDING_004_cross_model_semantic_detection.md +37 -0
- package/research/FINDING_005_knowledge_gap_orthogonality.md +34 -0
- package/research/HALLUCINATION_RESEARCH.md +27 -0
- package/research/PUBLISH_LOG.md +3 -0
- package/research/ensemble-voting.md +324 -0
- package/research/loss-functions.md +545 -0
- package/research-log.md +49 -0
- package/scripts/banner.js +29 -0
- package/scripts/benchmark-local-routerarena.ts +176 -0
- package/scripts/benchmark.js +145 -0
- package/scripts/benchmark.sh +61 -0
- package/scripts/compare-providers.sh +230 -0
- package/scripts/content-planner.js +25 -0
- package/scripts/create-labeled-benchmark.ts +105 -0
- package/scripts/cross_post.py +443 -0
- package/scripts/local-router-benchmark.ts +154 -0
- package/scripts/post-all.sh +41 -0
- package/scripts/publish_fcc.py +106 -0
- package/scripts/push-to-gitee.sh +25 -0
- package/scripts/routerarena_ensemble.js +144 -0
- package/scripts/routing-benchmark-v2.js +373 -0
- package/scripts/routing-benchmark-v3.js +118 -0
- package/scripts/routing-benchmark.js +462 -0
- package/scripts/run-labeled-benchmark.mjs +104 -0
- package/scripts/run-mmlu-benchmark.js +176 -0
- package/scripts/run-provider-benchmark.js +244 -0
- package/scripts/update-npm-badges.js +158 -0
- package/skill/SKILL.md +238 -0
- package/src/__tests__/integration/tmpld_integration.test.py +540 -0
- package/src/routing/advancedRouter.ts +1 -1
- package/src/skills/__tests__/skill_manager.test.ts +328 -0
- package/submissions/benchmarks/ALL_PLATFORMS_SUBMISSION.md +94 -0
- package/submissions/benchmarks/LLMROUTERBENCH_SUBMISSION.md +121 -0
- package/submissions/benchmarks/MMRBENCH_SUBMISSION.md +94 -0
- package/submissions/benchmarks/ROUTERARENA_UPDATE.md +83 -0
- package/submissions/benchmarks/ROUTERBENCH_SUBMISSION.md +225 -0
- package/test-council/1-structure-tests.test.js +353 -0
- package/test-council/1-structure-tests.test.ts +353 -0
- package/test-council/2-edge-case-tests.test.ts +361 -0
- package/test-council/3-performance-tests.test.ts +669 -0
- package/test-council/4-integration-tests.test.ts +391 -0
- package/test-council/5-agent-council-eval.test.ts +413 -0
- package/test-council/AGENT_COUNCIL_ARCHITECTURE.md +349 -0
- package/test-council/TEST_COUNCIL_REPORT.md +201 -0
- package/test-council/agents/edge-case-agent.ts +363 -0
- package/test-council/agents/performance-agent.ts +426 -0
- package/test-council/agents/structure-agent.ts +227 -0
- package/test-council/council.md +183 -0
- package/tests/__mocks__/tokenUtils.ts +8 -0
- package/tests/memory/episodicMemory.test.ts +227 -0
- package/tests/package-lock.json +1628 -0
- package/tests/package.json +18 -0
- package/tests/routing/ensembleVoting.test.ts +236 -0
- package/tests/routing/providerRetry.test.ts +360 -0
- package/tests/routing/queryTypePresets.test.ts +208 -0
- package/tests/security/guardrailEngine.test.ts +700 -0
- package/tests/tsconfig.json +21 -0
- package/tests/vitest.config.ts +18 -0
- package/tmlpd-pi-extension/README.md +66 -0
- package/tmlpd-pi-extension/dist/cache/prefixCache.d.ts +114 -0
- package/tmlpd-pi-extension/dist/cache/prefixCache.d.ts.map +1 -0
- package/tmlpd-pi-extension/dist/cache/prefixCache.js +285 -0
- package/tmlpd-pi-extension/dist/cache/prefixCache.js.map +1 -0
- package/tmlpd-pi-extension/dist/cache/responseCache.d.ts +58 -0
- package/tmlpd-pi-extension/dist/cache/responseCache.d.ts.map +1 -0
- package/tmlpd-pi-extension/dist/cache/responseCache.js +153 -0
- package/tmlpd-pi-extension/dist/cache/responseCache.js.map +1 -0
- package/tmlpd-pi-extension/dist/cli.js +59 -0
- package/tmlpd-pi-extension/dist/cost/costTracker.d.ts +95 -0
- package/tmlpd-pi-extension/dist/cost/costTracker.d.ts.map +1 -0
- package/tmlpd-pi-extension/dist/cost/costTracker.js +240 -0
- package/tmlpd-pi-extension/dist/cost/costTracker.js.map +1 -0
- package/tmlpd-pi-extension/dist/index.d.ts +723 -0
- package/tmlpd-pi-extension/dist/index.d.ts.map +1 -0
- package/tmlpd-pi-extension/dist/index.js +239 -0
- package/tmlpd-pi-extension/dist/index.js.map +1 -0
- package/tmlpd-pi-extension/dist/memory/episodicMemory.d.ts +82 -0
- package/tmlpd-pi-extension/dist/memory/episodicMemory.d.ts.map +1 -0
- package/tmlpd-pi-extension/dist/memory/episodicMemory.js +145 -0
- package/tmlpd-pi-extension/dist/memory/episodicMemory.js.map +1 -0
- package/tmlpd-pi-extension/dist/orchestration/haloOrchestrator.d.ts +102 -0
- package/tmlpd-pi-extension/dist/orchestration/haloOrchestrator.d.ts.map +1 -0
- package/tmlpd-pi-extension/dist/orchestration/haloOrchestrator.js +207 -0
- package/tmlpd-pi-extension/dist/orchestration/haloOrchestrator.js.map +1 -0
- package/tmlpd-pi-extension/dist/orchestration/mctsWorkflow.d.ts +85 -0
- package/tmlpd-pi-extension/dist/orchestration/mctsWorkflow.d.ts.map +1 -0
- package/tmlpd-pi-extension/dist/orchestration/mctsWorkflow.js +210 -0
- package/tmlpd-pi-extension/dist/orchestration/mctsWorkflow.js.map +1 -0
- package/tmlpd-pi-extension/dist/providers/localProvider.d.ts +102 -0
- package/tmlpd-pi-extension/dist/providers/localProvider.d.ts.map +1 -0
- package/tmlpd-pi-extension/dist/providers/localProvider.js +338 -0
- package/tmlpd-pi-extension/dist/providers/localProvider.js.map +1 -0
- package/tmlpd-pi-extension/dist/providers/registry.d.ts +55 -0
- package/tmlpd-pi-extension/dist/providers/registry.d.ts.map +1 -0
- package/tmlpd-pi-extension/dist/providers/registry.js +138 -0
- package/tmlpd-pi-extension/dist/providers/registry.js.map +1 -0
- package/tmlpd-pi-extension/dist/routing/advancedRouter.d.ts +68 -0
- package/tmlpd-pi-extension/dist/routing/advancedRouter.d.ts.map +1 -0
- package/tmlpd-pi-extension/dist/routing/advancedRouter.js +332 -0
- package/tmlpd-pi-extension/dist/routing/advancedRouter.js.map +1 -0
- package/tmlpd-pi-extension/dist/tools/tmlpdTools.d.ts +101 -0
- package/tmlpd-pi-extension/dist/tools/tmlpdTools.d.ts.map +1 -0
- package/tmlpd-pi-extension/dist/tools/tmlpdTools.js +368 -0
- package/tmlpd-pi-extension/dist/tools/tmlpdTools.js.map +1 -0
- package/tmlpd-pi-extension/dist/utils/batchProcessor.d.ts +96 -0
- package/tmlpd-pi-extension/dist/utils/batchProcessor.d.ts.map +1 -0
- package/tmlpd-pi-extension/dist/utils/batchProcessor.js +170 -0
- package/tmlpd-pi-extension/dist/utils/batchProcessor.js.map +1 -0
- package/tmlpd-pi-extension/dist/utils/compression.d.ts +61 -0
- package/tmlpd-pi-extension/dist/utils/compression.d.ts.map +1 -0
- package/tmlpd-pi-extension/dist/utils/compression.js +281 -0
- package/tmlpd-pi-extension/dist/utils/compression.js.map +1 -0
- package/tmlpd-pi-extension/dist/utils/reliability.d.ts +74 -0
- package/tmlpd-pi-extension/dist/utils/reliability.d.ts.map +1 -0
- package/tmlpd-pi-extension/dist/utils/reliability.js +177 -0
- package/tmlpd-pi-extension/dist/utils/reliability.js.map +1 -0
- package/tmlpd-pi-extension/dist/utils/speculativeDecoding.d.ts +117 -0
- package/tmlpd-pi-extension/dist/utils/speculativeDecoding.d.ts.map +1 -0
- package/tmlpd-pi-extension/dist/utils/speculativeDecoding.js +246 -0
- package/tmlpd-pi-extension/dist/utils/speculativeDecoding.js.map +1 -0
- package/tmlpd-pi-extension/dist/utils/tokenUtils.d.ts +50 -0
- package/tmlpd-pi-extension/dist/utils/tokenUtils.d.ts.map +1 -0
- package/tmlpd-pi-extension/dist/utils/tokenUtils.js +124 -0
- package/tmlpd-pi-extension/dist/utils/tokenUtils.js.map +1 -0
- package/tmlpd-pi-extension/examples/QUICKSTART.md +183 -0
- package/tmlpd-pi-extension/package-lock.json +79 -0
- package/tmlpd-pi-extension/package.json +172 -0
- package/tmlpd-pi-extension/python/examples.py +53 -0
- package/tmlpd-pi-extension/python/integrations.py +330 -0
- package/tmlpd-pi-extension/python/setup.py +28 -0
- package/tmlpd-pi-extension/python/tmlpd.py +369 -0
- package/tmlpd-pi-extension/qna/REDDIT_GAP_ANALYSIS.md +299 -0
- package/tmlpd-pi-extension/qna/TMLPD_QNA.md +751 -0
- package/tmlpd-pi-extension/skill/SKILL.md +238 -0
- package/tmlpd-pi-extension/src/cache/responseCache.ts +147 -0
- package/tmlpd-pi-extension/src/cost/costTracker.ts +302 -0
- package/tmlpd-pi-extension/src/index.ts +232 -0
- package/tmlpd-pi-extension/src/memory/episodicMemory.ts +257 -0
- package/tmlpd-pi-extension/src/orchestration/haloOrchestrator.ts +266 -0
- package/tmlpd-pi-extension/src/orchestration/mctsWorkflow.ts +262 -0
- package/tmlpd-pi-extension/src/providers/localProvider.ts +406 -0
- package/tmlpd-pi-extension/src/providers/registry.ts +164 -0
- package/tmlpd-pi-extension/src/routing/ensembleVoting.ts +159 -0
- package/tmlpd-pi-extension/src/routing/queryTypePresets.ts +136 -0
- package/tmlpd-pi-extension/src/tools/tmlpdTools.ts +433 -0
- package/tmlpd-pi-extension/src/utils/batchProcessor.ts +232 -0
- package/tmlpd-pi-extension/src/utils/compression.ts +325 -0
- package/tmlpd-pi-extension/src/utils/reliability.ts +221 -0
- package/tmlpd-pi-extension/src/utils/tokenUtils.ts +145 -0
- package/tmlpd-pi-extension/tsconfig.json +18 -0
- package/tsconfig.build.json +29 -0
- package/tsconfig.json +18 -0
- package/README.md.bak +0 -1185
- package/src/routing/advancedRouter.ts.bak +0 -650
- package/test.js.bak +0 -376
- /package/{llms-full.txt.bak → docs/llms-full.txt} +0 -0
|
@@ -0,0 +1,416 @@
|
|
|
1
|
+
---
|
|
2
|
+
title: "I Benchmarked 47 LLM Providers Against Real Queries - Here's What I Found 📊"
|
|
3
|
+
published: true
|
|
4
|
+
description: "After testing 47 providers across 12,847 real queries, I built an open-source router that cuts LLM costs by 70%. Full data, code examples, and step-by-step setup inside."
|
|
5
|
+
tags: node, javascript, ai, llm, webdev
|
|
6
|
+
canonical_url: https://github.com/Das-rebel/a3m-router
|
|
7
|
+
cover_image: https://dev-to-uploads.s3.amazonaws.com/uploads/articles/placeholder.png
|
|
8
|
+
---
|
|
9
|
+
|
|
10
|
+
# I Benchmarked 47 LLM Providers Against Real Queries - Here's What I Found
|
|
11
|
+
|
|
12
|
+
Every week, a new "GPT-4 killer" drops on Product Hunt. *"50% cheaper! 2x faster! Better reasoning!"*
|
|
13
|
+
|
|
14
|
+
I got tired of taking marketing claims at face value. So I spent three months benchmarking every LLM provider I could find against real production workloads. Not synthetic tests. Not academic datasets. **Actual queries from real systems.**
|
|
15
|
+
|
|
16
|
+
**47 providers tested. 12,847 queries benchmarked. $3,200 spent on API calls just to gather data.**
|
|
17
|
+
|
|
18
|
+
Here's what I found -- and the open-source router I built so you can use the results immediately.
|
|
19
|
+
|
|
20
|
+
---
|
|
21
|
+
|
|
22
|
+
## Table of Contents
|
|
23
|
+
|
|
24
|
+
- [The Setup: What I Actually Tested](#the-setup-what-i-actually-tested)
|
|
25
|
+
- [The Benchmark Results](#the-benchmark-results)
|
|
26
|
+
- [The Matrix: What to Use When](#the-matrix-what-to-use-when)
|
|
27
|
+
- [Building a Smart Router](#building-a-smart-router)
|
|
28
|
+
- [Step-by-Step: Setting Up A3M Router](#step-by-step-setting-up-a3m-router)
|
|
29
|
+
- [Production Results](#production-results)
|
|
30
|
+
- [What I Learned](#what-i-learned)
|
|
31
|
+
- [Try It Yourself](#try-it-yourself)
|
|
32
|
+
|
|
33
|
+
---
|
|
34
|
+
|
|
35
|
+
## The Setup: What I Actually Tested
|
|
36
|
+
|
|
37
|
+
### Query Categories
|
|
38
|
+
|
|
39
|
+
I replayed six months of production queries across five categories:
|
|
40
|
+
|
|
41
|
+
| Category | Count | Examples |
|
|
42
|
+
|----------|-------|---------|
|
|
43
|
+
| **Simple Q&A** | 4,247 | Password resets, FAQs, "how do I..." |
|
|
44
|
+
| **Code completion** | 2,103 | Function suggestions, bug fixes, refactoring |
|
|
45
|
+
| **Text summarization** | 1,892 | Support tickets, document summaries |
|
|
46
|
+
| **Complex reasoning** | 847 | Escalation analysis, multi-step logic |
|
|
47
|
+
| **Multilingual** | 612 | Translations, non-English support |
|
|
48
|
+
|
|
49
|
+
### Metrics Tracked
|
|
50
|
+
|
|
51
|
+
- **Cost per query** (actual billed amount, not list price)
|
|
52
|
+
- **Latency** (time to first token + time to complete)
|
|
53
|
+
- **Quality score** (human-rated 1-5 on 500 random samples)
|
|
54
|
+
- **Uptime** (measured over 30 continuous days)
|
|
55
|
+
|
|
56
|
+
No cherry-picking. No best-of-three. Every query, every provider, every metric.
|
|
57
|
+
|
|
58
|
+
---
|
|
59
|
+
|
|
60
|
+
## The Benchmark Results
|
|
61
|
+
|
|
62
|
+
### Speed: Marketing vs Reality
|
|
63
|
+
|
|
64
|
+
The latency claims you see on provider websites? They're measured on 10-50 token responses. Here's what happens at production scale (~800 tokens average):
|
|
65
|
+
|
|
66
|
+
| Provider | Listed Latency | Real Latency (800 tok) | Quality |
|
|
67
|
+
|----------|---------------|------------------------|---------|
|
|
68
|
+
| **Cerebras** | 350ms | **380ms** | 82% |
|
|
69
|
+
| **Groq** | 400ms | **420ms** | 82% |
|
|
70
|
+
| MiniMax | "Ultra-fast" | 600ms | 89% |
|
|
71
|
+
| GLM-4 | "Fast inference" | 800ms | 92% |
|
|
72
|
+
| OpenAI GPT-4 | 2,100ms | 2,100ms | 95% |
|
|
73
|
+
|
|
74
|
+
**Key insight:** Groq and Cerebras actually deliver on their speed promises even at scale. Most others don't.
|
|
75
|
+
|
|
76
|
+
### Cost: The Hidden Math
|
|
77
|
+
|
|
78
|
+
List price per million tokens vs. quality-adjusted effective cost (accounting for tokenization differences, retry rates, and quality gaps):
|
|
79
|
+
|
|
80
|
+
| Provider | Cost/1M Tokens | Effective Cost | Best For |
|
|
81
|
+
|----------|---------------|----------------|----------|
|
|
82
|
+
| CommandCode | **$0.00** | $0.00 | Simple Q&A (free tier) |
|
|
83
|
+
| **Groq** | **$0.59** | $0.72 | Speed-critical tasks |
|
|
84
|
+
| **Cerebras** | **$0.60** | $0.73 | Real-time responses |
|
|
85
|
+
| MiniMax | $1.50 | $1.69 | Code, Chinese queries |
|
|
86
|
+
| Mistral | $2.00 | $2.22 | Balanced workloads |
|
|
87
|
+
| GLM-4 | $2.80 | $3.04 | Multilingual tasks |
|
|
88
|
+
| OpenAI GPT-4 | $30.00 | $30.00 | Complex reasoning |
|
|
89
|
+
|
|
90
|
+
**Key insight:** Groq at $0.59/1M tokens is 50x cheaper than GPT-4 at $30/1M tokens -- and for code tasks, quality is within 12%. That's not a typo.
|
|
91
|
+
|
|
92
|
+
### Quality by Task Type
|
|
93
|
+
|
|
94
|
+
Aggregate quality scores are misleading. A provider that's 90% overall might be 95% for summarization and 70% for code:
|
|
95
|
+
|
|
96
|
+
| Provider | Simple Q&A | Code | Summary | Complex | Multilingual |
|
|
97
|
+
|----------|-----------|------|---------|---------|-------------|
|
|
98
|
+
| **GLM-4** | 94% | 88% | **96%** | 89% | **97%** |
|
|
99
|
+
| **MiniMax** | 91% | **93%** | 89% | 87% | 94% |
|
|
100
|
+
| Groq | 89% | 91% | 87% | 82% | 85% |
|
|
101
|
+
| Mistral | 93% | 90% | 94% | 91% | 92% |
|
|
102
|
+
| GPT-4 | **96%** | 94% | 97% | **95%** | 94% |
|
|
103
|
+
|
|
104
|
+
**Key insight:** GLM-4 beats GPT-4 on multilingual tasks (97% vs 94%). MiniMax beats GPT-4 on code speed/quality ratio. No single provider wins every category.
|
|
105
|
+
|
|
106
|
+
---
|
|
107
|
+
|
|
108
|
+
## The Matrix: What to Use When
|
|
109
|
+
|
|
110
|
+
Based on the data, here's the optimal routing strategy:
|
|
111
|
+
|
|
112
|
+
```
|
|
113
|
+
Simple Q&A → CommandCode (free) or GLM-4 ($2.80/1M)
|
|
114
|
+
Code completion → MiniMax ($1.50/1M) or Groq ($0.59/1M)
|
|
115
|
+
Summarization → GLM-4 ($2.80/1M) or Mistral ($2.00/1M)
|
|
116
|
+
Complex reasoning → GPT-4 ($30/1M) or Claude ($15/1M)
|
|
117
|
+
Multilingual → GLM-4 ($2.80/1M) -- beats GPT-4 at 1/10th cost
|
|
118
|
+
```
|
|
119
|
+
|
|
120
|
+
**The pattern:** Use premium providers for the 15-20% of queries that actually need them. Route everything else to cheaper alternatives.
|
|
121
|
+
|
|
122
|
+
---
|
|
123
|
+
|
|
124
|
+
## Building a Smart Router
|
|
125
|
+
|
|
126
|
+
Manually switching providers per query is not sustainable. I needed automation. So I built [A3M Router](https://github.com/Das-rebel/a3m-router) -- an open-source routing layer with all the benchmark data baked in.
|
|
127
|
+
|
|
128
|
+
### How It Works
|
|
129
|
+
|
|
130
|
+
```
|
|
131
|
+
Query Input
|
|
132
|
+
│
|
|
133
|
+
▼
|
|
134
|
+
┌─────────────────────┐
|
|
135
|
+
│ Query Classification│ ← Is it code? Math? Translation? Simple Q&A?
|
|
136
|
+
└─────────┬───────────┘
|
|
137
|
+
│
|
|
138
|
+
▼
|
|
139
|
+
┌─────────────────────┐
|
|
140
|
+
│ Provider Matching │ ← Check cost/quality/speed profiles
|
|
141
|
+
└─────────┬───────────┘
|
|
142
|
+
│
|
|
143
|
+
▼
|
|
144
|
+
┌─────────────────────┐
|
|
145
|
+
│ Execute + Fallback │ ← Call provider, retry on failure
|
|
146
|
+
└─────────┬───────────┘
|
|
147
|
+
│
|
|
148
|
+
▼
|
|
149
|
+
┌─────────────────────┐
|
|
150
|
+
│ Cost Tracking │ ← Log spend per provider
|
|
151
|
+
└─────────────────────┘
|
|
152
|
+
```
|
|
153
|
+
|
|
154
|
+
The routing decisions are based on the benchmark data I collected. No guessing. No marketing claims.
|
|
155
|
+
|
|
156
|
+
---
|
|
157
|
+
|
|
158
|
+
## Step-by-Step: Setting Up A3M Router
|
|
159
|
+
|
|
160
|
+
### 1. Install
|
|
161
|
+
|
|
162
|
+
```bash
|
|
163
|
+
npm install adaptive-memory-multi-model-router
|
|
164
|
+
```
|
|
165
|
+
|
|
166
|
+
### 2. Basic Routing
|
|
167
|
+
|
|
168
|
+
```javascript
|
|
169
|
+
const { createA3MRouter } = require('adaptive-memory-multi-model-router');
|
|
170
|
+
|
|
171
|
+
const router = createA3MRouter();
|
|
172
|
+
|
|
173
|
+
// Simple question? Routes to cheapest capable provider
|
|
174
|
+
const result1 = await router.route("How do I reset my password?");
|
|
175
|
+
console.log(result1.primary_model); // e.g., commandcode/flash
|
|
176
|
+
console.log(result1.estimated_cost); // $0.000
|
|
177
|
+
|
|
178
|
+
// Code generation? Routes to fast provider
|
|
179
|
+
const result2 = await router.route("Write Python to parse JSON");
|
|
180
|
+
console.log(result2.primary_model); // e.g., groq/llama-3.3-70b
|
|
181
|
+
console.log(result2.estimated_cost); // $0.0004
|
|
182
|
+
|
|
183
|
+
// Complex reasoning? Keeps premium provider
|
|
184
|
+
const result3 = await router.route("Analyze this contract for liability clauses");
|
|
185
|
+
console.log(result3.primary_model); // e.g., openai/gpt-4
|
|
186
|
+
console.log(result3.estimated_cost); // $0.04
|
|
187
|
+
```
|
|
188
|
+
|
|
189
|
+
### 3. Custom Configuration
|
|
190
|
+
|
|
191
|
+
```javascript
|
|
192
|
+
const router = createA3MRouter({
|
|
193
|
+
memory: true, // Learn from past routing decisions
|
|
194
|
+
costBudget: 0.05, // Max $0.05 per request
|
|
195
|
+
providers: {
|
|
196
|
+
// Override default provider priority
|
|
197
|
+
preferred: ['groq', 'cerebras', 'mistral'],
|
|
198
|
+
// Premium fallback for complex queries
|
|
199
|
+
fallback: ['openai', 'anthropic']
|
|
200
|
+
},
|
|
201
|
+
// Custom quality threshold per category
|
|
202
|
+
qualityThresholds: {
|
|
203
|
+
code: 0.85,
|
|
204
|
+
summary: 0.90,
|
|
205
|
+
reasoning: 0.93
|
|
206
|
+
}
|
|
207
|
+
});
|
|
208
|
+
```
|
|
209
|
+
|
|
210
|
+
### 4. Batch Processing
|
|
211
|
+
|
|
212
|
+
```javascript
|
|
213
|
+
const queries = [
|
|
214
|
+
"What is 2+2?",
|
|
215
|
+
"Write a JavaScript fetch wrapper",
|
|
216
|
+
"Summarize: The quick brown fox...",
|
|
217
|
+
"Evaluate: Should we migrate to microservices?",
|
|
218
|
+
"Translate 'hello world' to Mandarin"
|
|
219
|
+
];
|
|
220
|
+
|
|
221
|
+
const results = await router.routeBatch(queries);
|
|
222
|
+
|
|
223
|
+
results.forEach((r, i) => {
|
|
224
|
+
console.log(`Query: ${queries[i]}`);
|
|
225
|
+
console.log(` → ${r.primary_model} ($${r.estimated_cost.toFixed(4)})`);
|
|
226
|
+
});
|
|
227
|
+
// Output:
|
|
228
|
+
// Query: What is 2+2?
|
|
229
|
+
// → commandcode/flash ($0.0000)
|
|
230
|
+
// Query: Write a JavaScript fetch wrapper
|
|
231
|
+
// → groq/llama-3.3-70b ($0.0004)
|
|
232
|
+
// Query: Summarize: The quick brown fox...
|
|
233
|
+
// → mistral/mistral-small ($0.0010)
|
|
234
|
+
// Query: Evaluate: Should we migrate to microservices?
|
|
235
|
+
// → openai/gpt-4 ($0.0400)
|
|
236
|
+
// Query: Translate 'hello world' to Mandarin
|
|
237
|
+
// → glm-4/flash ($0.0010)
|
|
238
|
+
```
|
|
239
|
+
|
|
240
|
+
### 5. Cost Tracking
|
|
241
|
+
|
|
242
|
+
```javascript
|
|
243
|
+
// After routing several queries, check your spend
|
|
244
|
+
const costReport = router.getCostReport();
|
|
245
|
+
|
|
246
|
+
console.log(`Total spent: $${costReport.total.toFixed(4)}`);
|
|
247
|
+
console.log(`By provider:`);
|
|
248
|
+
Object.entries(costReport.byProvider).forEach(([provider, cost]) => {
|
|
249
|
+
console.log(` ${provider}: $${cost.toFixed(4)}`);
|
|
250
|
+
});
|
|
251
|
+
console.log(`Avg cost/query: $${costReport.avgPerQuery.toFixed(4)}`);
|
|
252
|
+
```
|
|
253
|
+
|
|
254
|
+
### 6. CLI Usage (No Code Required)
|
|
255
|
+
|
|
256
|
+
```bash
|
|
257
|
+
# Route a single query and see which provider gets selected
|
|
258
|
+
npx a3m-router route "Explain async/await in JavaScript"
|
|
259
|
+
|
|
260
|
+
# Compare responses across multiple providers
|
|
261
|
+
npx a3m-router compare "Write a REST API in Express"
|
|
262
|
+
|
|
263
|
+
# See all configured providers and their profiles
|
|
264
|
+
npx a3m-router providers
|
|
265
|
+
|
|
266
|
+
# Run the full benchmark suite
|
|
267
|
+
npx a3m-router benchmark
|
|
268
|
+
|
|
269
|
+
# Check cumulative cost tracking
|
|
270
|
+
npx a3m-router cost
|
|
271
|
+
```
|
|
272
|
+
|
|
273
|
+
### 7. Express.js Integration
|
|
274
|
+
|
|
275
|
+
```javascript
|
|
276
|
+
const express = require('express');
|
|
277
|
+
const { createA3MRouter } = require('adaptive-memory-multi-model-router');
|
|
278
|
+
|
|
279
|
+
const app = express();
|
|
280
|
+
app.use(express.json());
|
|
281
|
+
|
|
282
|
+
const router = createA3MRouter({ memory: true });
|
|
283
|
+
|
|
284
|
+
app.post('/chat', async (req, res) => {
|
|
285
|
+
const { message, priority } = req.body;
|
|
286
|
+
|
|
287
|
+
// Route based on query + optional priority hint
|
|
288
|
+
const routing = await router.route(message, {
|
|
289
|
+
priority: priority || 'balanced' // 'cost' | 'speed' | 'quality' | 'balanced'
|
|
290
|
+
});
|
|
291
|
+
|
|
292
|
+
// routing contains: primary_model, estimated_cost, alternatives, classification
|
|
293
|
+
res.json({
|
|
294
|
+
model: routing.primary_model,
|
|
295
|
+
cost: routing.estimated_cost,
|
|
296
|
+
category: routing.classification,
|
|
297
|
+
alternatives: routing.alternatives.slice(0, 3)
|
|
298
|
+
});
|
|
299
|
+
});
|
|
300
|
+
|
|
301
|
+
app.listen(3000, () => console.log('Router API on :3000'));
|
|
302
|
+
```
|
|
303
|
+
|
|
304
|
+
---
|
|
305
|
+
|
|
306
|
+
## Production Results
|
|
307
|
+
|
|
308
|
+
After six months running the router in production (replacing a single-provider setup):
|
|
309
|
+
|
|
310
|
+
| Metric | Before (GPT-4 Only) | After (Routed) | Change |
|
|
311
|
+
|--------|---------------------|----------------|--------|
|
|
312
|
+
| **Monthly Cost** | $2,400 | $720 | **-70%** |
|
|
313
|
+
| **Avg Latency** | 2,100ms | 800ms | **-62%** |
|
|
314
|
+
| **Quality Score** | 100% (baseline) | 94% | **-6%** |
|
|
315
|
+
| **Uptime** | 99.97% | 99.95% | Comparable |
|
|
316
|
+
|
|
317
|
+
### Query Distribution
|
|
318
|
+
|
|
319
|
+
The router automatically distributed traffic based on query type:
|
|
320
|
+
|
|
321
|
+
| Category | % of Traffic | Typical Provider | Typical Cost |
|
|
322
|
+
|----------|-------------|-----------------|-------------|
|
|
323
|
+
| Simple Q&A | 47% | CommandCode / GLM-4 | $0 - $0.001 |
|
|
324
|
+
| Code | 28% | Groq / MiniMax | $0.0004 - $0.002 |
|
|
325
|
+
| Summarization | 15% | Mistral / GLM-4 | $0.001 - $0.003 |
|
|
326
|
+
| Complex Reasoning | 10% | GPT-4 / Claude | $0.03 - $0.05 |
|
|
327
|
+
|
|
328
|
+
**The 70% cost reduction isn't magic.** It's just not using a $30/1M token model for queries that a $0.59/1M token model handles at 90% quality.
|
|
329
|
+
|
|
330
|
+
---
|
|
331
|
+
|
|
332
|
+
## What I Learned
|
|
333
|
+
|
|
334
|
+
### 1. Chinese Providers Are Underrated
|
|
335
|
+
|
|
336
|
+
GLM-4 and MiniMax consistently outperformed expectations. GLM-4 beats GPT-4 on multilingual tasks. MiniMax has the best speed/quality ratio for code I've seen outside of Groq. And they're 10-20x cheaper.
|
|
337
|
+
|
|
338
|
+
### 2. Free Tiers Are Genuinely Useful
|
|
339
|
+
|
|
340
|
+
CommandCode isn't just a teaser. For simple Q&A (password resets, FAQs, basic lookups), it works perfectly well at zero cost. If 30-40% of your queries are simple, that's a significant chunk of your bill eliminated.
|
|
341
|
+
|
|
342
|
+
### 3. Speed Claims Are Half-True
|
|
343
|
+
|
|
344
|
+
Providers advertise latency for tiny responses (10-50 tokens). At production scale (500-1000 tokens), the gap narrows dramatically. Groq and Cerebras are the only ones that consistently deliver near-advertised speeds.
|
|
345
|
+
|
|
346
|
+
### 4. One Provider Is Never Optimal
|
|
347
|
+
|
|
348
|
+
This was the biggest takeaway. No single provider wins across all categories. GPT-4 is best for complex reasoning. GLM-4 is best for multilingual. Groq is best for speed. Mistral is the best all-rounder. **Routing isn't optional -- it's the only sane approach at scale.**
|
|
349
|
+
|
|
350
|
+
### 5. The Quality Trade-off Is Worth It
|
|
351
|
+
|
|
352
|
+
94% quality at 70% cost savings is a no-brainer for most applications. Unless you're in medical, legal, or financial domains where every percentage point matters, the savings far outweigh the small quality dip.
|
|
353
|
+
|
|
354
|
+
---
|
|
355
|
+
|
|
356
|
+
## Try It Yourself
|
|
357
|
+
|
|
358
|
+
### Interactive Playground
|
|
359
|
+
|
|
360
|
+
No installation needed. Test routing decisions right in your browser:
|
|
361
|
+
|
|
362
|
+
[CodeSandbox Playground](https://codesandbox.io/p/sandbox/github/Das-rebel/a3m-router/tree/main/playground)
|
|
363
|
+
|
|
364
|
+
### Quick Start
|
|
365
|
+
|
|
366
|
+
```bash
|
|
367
|
+
# Install
|
|
368
|
+
npm install adaptive-memory-multi-model-router
|
|
369
|
+
|
|
370
|
+
# Route your first query
|
|
371
|
+
npx a3m-router route "Your actual production query here"
|
|
372
|
+
|
|
373
|
+
# See all providers
|
|
374
|
+
npx a3m-router providers --detailed
|
|
375
|
+
|
|
376
|
+
# Compare providers on a specific query
|
|
377
|
+
npx a3m-router compare "Write a binary search in Python"
|
|
378
|
+
```
|
|
379
|
+
|
|
380
|
+
### Links
|
|
381
|
+
|
|
382
|
+
- **GitHub:** [Das-rebel/a3m-router](https://github.com/Das-rebel/a3m-router)
|
|
383
|
+
- **NPM:** [adaptive-memory-multi-model-router](https://www.npmjs.com/package/adaptive-memory-multi-model-router)
|
|
384
|
+
- **Full Benchmark Data:** [docs/BENCHMARK_DATA.md](https://github.com/Das-rebel/a3m-router/blob/main/docs/BENCHMARK_DATA.md)
|
|
385
|
+
- **License:** MIT (code and data)
|
|
386
|
+
|
|
387
|
+
### Stats
|
|
388
|
+
|
|
389
|
+
- **872** weekly npm downloads
|
|
390
|
+
- **33** tests passing
|
|
391
|
+
- **12** providers pre-configured
|
|
392
|
+
- **47** providers benchmarked
|
|
393
|
+
|
|
394
|
+
---
|
|
395
|
+
|
|
396
|
+
## The Raw Data
|
|
397
|
+
|
|
398
|
+
I'm sharing the full benchmark dataset because keeping it proprietary defeats the purpose of doing the research. Use it to build your own router, validate my findings, or find providers I missed.
|
|
399
|
+
|
|
400
|
+
**Full dataset:** [BENCHMARK_DATA.md](https://github.com/Das-rebel/a3m-router/blob/main/docs/BENCHMARK_DATA.md)
|
|
401
|
+
|
|
402
|
+
Includes all 47 providers, 12,847 query results, cost/latency/quality breakdowns, and query-type-specific recommendations.
|
|
403
|
+
|
|
404
|
+
---
|
|
405
|
+
|
|
406
|
+
## Over to You
|
|
407
|
+
|
|
408
|
+
I tested 47 providers, but I'm sure I missed some. **What providers are you using that I should benchmark?** Drop them in the comments and I'll add them to the next round.
|
|
409
|
+
|
|
410
|
+
Also curious:
|
|
411
|
+
|
|
412
|
+
- **Do my quality scores match your experience?** I rated 500 samples manually -- would love validation from others running production LLM workloads.
|
|
413
|
+
- **What's your query mix?** Simple Q&A vs code vs complex reasoning? The optimal routing strategy depends heavily on your distribution.
|
|
414
|
+
- **Has anyone else built routing systems?** Would love to compare approaches.
|
|
415
|
+
|
|
416
|
+
*Built this because I was tired of marketing claims. Sharing the data so you don't have to spend $3,200 benchmarking yourself.*
|