adaptive-memory-multi-model-router 2.14.49 → 2.14.51
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.dockerignore +82 -0
- package/.env.example +303 -0
- package/.github/DISCUSSIONS_WELCOME.md +27 -0
- package/.github/DISCUSSION_TEMPLATE.yml +5 -0
- package/.github/FUNDING.yml +2 -0
- package/.github/ISSUE_TEMPLATE/bug_report.md +94 -0
- package/.github/ISSUE_TEMPLATE/config.yml +17 -0
- package/.github/ISSUE_TEMPLATE/feature_request.md +71 -0
- package/.github/PULL_REQUEST_TEMPLATE.md +71 -0
- package/.github/dependabot.yml +9 -0
- package/.github/workflows/auto-publish.yml +51 -0
- package/.github/workflows/ci.yml +263 -0
- package/.github/workflows/codeql.yml +38 -0
- package/.github/workflows/npm-publish.yml +20 -0
- package/.github/workflows/pages.yml +37 -0
- package/.github/workflows/stale.yml +54 -0
- package/.publish-tick +1 -0
- package/.well-known/ai-plugin.json +16 -0
- package/AGENT_COUNCIL_FINDINGS.md +142 -0
- package/ARCHITECTURE.md +346 -0
- package/AUDIT_REPORT.md +28 -0
- package/CODE_OF_CONDUCT.md +128 -0
- package/CONTRIBUTING.md +50 -0
- package/CONTRIBUTORS.md +20 -0
- package/Dockerfile +53 -0
- package/Dockerfile.proxy +33 -0
- package/HEALTH_REPORT.md +118 -0
- package/IMPROVEMENT_PLAN.md +107 -0
- package/LANDING.md +43 -0
- package/LAUNCH-PAIN-DRIVEN.md +339 -0
- package/LAUNCH.md +337 -0
- package/LAUNCH_CHECKLIST.md +141 -0
- package/LAUNCH_SNAPSHOT.md +260 -0
- package/MANIFESTO.md +41 -0
- package/POPULARITY_BOOSTERS.md +285 -0
- package/PR_STATUS_REPORT.md +148 -0
- package/README.md +10 -0
- package/REDESIGN.md +95 -0
- package/RUNKIT.md +83 -0
- package/SECURITY.md +29 -0
- package/SUBMISSIONS.md +43 -0
- package/_schema.html +53 -0
- package/ai-plugin.json +16 -0
- package/articles/AI_AGENT_LLM_ROUTING.md +150 -0
- package/articles/CHINESE_DIRECTORIES.md +100 -0
- package/articles/CHINESE_SUBMISSIONS_READY.md +322 -0
- package/articles/COMPETITOR_ALERTS.md +31 -0
- package/articles/COMPLETE_POSTING_DIRECTORY.md +147 -0
- package/articles/CONTENT_STRUCTURE.md +292 -0
- package/articles/DEVTO_COST_GUIDE.md +473 -0
- package/articles/DEVTO_FINAL.md +416 -0
- package/articles/DEVTO_MULTI_PROVIDER.md +542 -0
- package/articles/DEVTO_READY.md +255 -0
- package/articles/DEVTO_V2_ANNOUNCEMENT.md +160 -0
- package/articles/DEVTO_VIRAL_GROWTH.md +280 -0
- package/articles/FRESH_devto.md +460 -0
- package/articles/FRESH_devto_2026_05.md +73 -0
- package/articles/FRESH_hackernews.md +14 -0
- package/articles/FRESH_reddit_ml.md +90 -0
- package/articles/FRESH_reddit_node.md +198 -0
- package/articles/FRESH_reddit_sideproject.md +72 -0
- package/articles/FRESH_reddit_webdev.md +130 -0
- package/articles/FROM_ZERO_TO_10K.md +107 -0
- package/articles/HN_10X_BETTER.md +430 -0
- package/articles/HN_ACCOUNT_GUIDE.md +21 -0
- package/articles/HN_CHINESE_STYLE.md +308 -0
- package/articles/HN_FINAL.md +148 -0
- package/articles/HN_POSTED_VERSION.md +56 -0
- package/articles/HN_POST_READY.md +137 -0
- package/articles/HN_RESEARCH.md +364 -0
- package/articles/HN_SHOW_routerarena.md +17 -0
- package/articles/HN_TIMING_GUIDE.md +52 -0
- package/articles/INDIEHACKERS_POST.md +52 -0
- package/articles/INDIEHACKERS_READY.md +120 -0
- package/articles/LLM_BENCHMARK_DEEP_DIVE.md +153 -0
- package/articles/MASTER_POSTING_DIRECTORY.md +189 -0
- package/articles/NEWSLETTER_SEND_NOW.md +259 -0
- package/articles/NEWSLETTER_SUBMISSIONS.md +112 -0
- package/articles/PAIN-DRIVEN-devto-v2.md +308 -0
- package/articles/PAIN-DRIVEN-devto-v3.md +268 -0
- package/articles/PAIN-DRIVEN-devto.md +242 -0
- package/articles/PAIN-DRIVEN-hackernews-v2.md +138 -0
- package/articles/PAIN-DRIVEN-hackernews-v3.md +151 -0
- package/articles/PAIN-DRIVEN-hackernews.md +131 -0
- package/articles/PAIN-DRIVEN-reddit-v2.md +301 -0
- package/articles/PAIN-DRIVEN-reddit-v3.md +236 -0
- package/articles/PAIN-DRIVEN-reddit.md +218 -0
- package/articles/PAIN-DRIVEN-twitter-v2.md +110 -0
- package/articles/PAIN-DRIVEN-twitter-v3.md +121 -0
- package/articles/PAIN-DRIVEN-twitter.md +120 -0
- package/articles/PORTKEY_VS_A3M.md +147 -0
- package/articles/POSTING_KIT_2026_05.md +67 -0
- package/articles/PRESS_KIT_routerarena.md +77 -0
- package/articles/PRODUCTHUNT_LISTING.md +48 -0
- package/articles/PRODUCTHUNT_READY.md +106 -0
- package/articles/PR_PLAN_vault.md +125 -0
- package/articles/REDDIT_FINAL.md +232 -0
- package/articles/REDDIT_POST.md +67 -0
- package/articles/REDDIT_SUBMISSION_READY.md +348 -0
- package/articles/ROUTERARENA_LEADER.md +45 -0
- package/articles/SHOW_HN_FINAL.md +29 -0
- package/articles/TWEETS_10K_DOWNLOADS.md +47 -0
- package/articles/TWEETS_BENCHMARK_FIRST.md +46 -0
- package/articles/TWEETS_MCP_PLAY.md +51 -0
- package/articles/TWEETS_SEQUENTIAL_BROKEN.md +49 -0
- package/articles/TWEETS_WHY_BUILD.md +54 -0
- package/articles/TWEETS_routerarena_leader.md +53 -0
- package/articles/TWEET_STORM_READY.md +165 -0
- package/articles/TWITTER_FINAL.md +167 -0
- package/articles/WHY_10X_BETTER.md +261 -0
- package/articles/WHY_CHINESE_STYLE_BETTER.md +323 -0
- package/articles/ai-discoverability-llm-routing.md +210 -0
- package/articles/devto-llm-routing.md +138 -0
- package/articles/hackernews-show-hn.md +54 -0
- package/articles/hashnode-llm-cost-optimization.md +125 -0
- package/articles/hn_show_2026_05.md +11 -0
- package/articles/medium-building-llm-router.md +205 -0
- package/articles/reddit-ml.md +76 -0
- package/articles/twitter-thread-cost-savings.md +50 -0
- package/articles/youtube-tutorial-script.md +262 -0
- package/assets/a3m_3blue1brown.mp4 +0 -0
- package/assets/banner.svg +109 -0
- package/assets/chart-cost-v2.svg +91 -0
- package/assets/chart-cost-v3.svg +143 -0
- package/assets/chart-features-v2.svg +132 -0
- package/assets/chart-features-v3.svg +211 -0
- package/assets/chart-growth-v2.svg +122 -0
- package/assets/chart-growth-v3.svg +189 -0
- package/assets/cost-comparison.svg +134 -0
- package/assets/cost-simple.svg +64 -0
- package/assets/demo-hn.gif +0 -0
- package/assets/feature-matrix.svg +136 -0
- package/assets/growth-chart-animated.svg +76 -0
- package/assets/growth-chart.svg +82 -0
- package/assets/growth-simple.svg +69 -0
- package/assets/hero-diagram.svg +81 -0
- package/assets/logo-new.svg +21 -0
- package/assets/logo.svg +68 -0
- package/assets/provider-comparison.svg +121 -0
- package/assets/social-preview-new.svg +100 -0
- package/assets/social-preview.svg +194 -0
- package/assets/social-v2.svg +130 -0
- package/assets/social-v3.svg +212 -0
- package/benchmark-provider-results.json +245 -0
- package/benchmark-results.json +54 -0
- package/council-votes/architecture-vote.md +121 -0
- package/council-votes/coverage-vote.md +93 -0
- package/data/adaptive-benchmark.json +92 -0
- package/data/benchmark-results.json +47 -0
- package/data/labeled-benchmark.json +88 -0
- package/demo/3blue1brown_video.py +285 -0
- package/demo/3blue1brown_video_v2.py +310 -0
- package/demo/IMPROVED_PROMPTS.md +229 -0
- package/demo/VEO3_PROMPTS.md +269 -0
- package/demo/VIDEO_PRODUCTION_GUIDE.md +333 -0
- package/demo/a3m_3blue1brown.mp4 +0 -0
- package/demo/asciinema-demo.sh +195 -0
- package/demo/demo-hn.tape +74 -0
- package/demo/demo-script.md +53 -0
- package/demo/demo-script.sh +62 -0
- package/demo/demo.svg +75 -0
- package/demo/frame1_ai_data_center.png +0 -0
- package/demo/frame1_sunset_video.mp4 +0 -0
- package/demo/frame2_cost_comparison.png +0 -0
- package/demo/frame2_cost_comparison_fallback.png +0 -0
- package/demo/frame3_parallel_execution.png +0 -0
- package/demo/frame3_parallel_execution_fallback.png +0 -0
- package/demo/frame4_providers.png +0 -0
- package/demo/frame4_providers_fallback.png +0 -0
- package/demo/frame5_endcard.png +0 -0
- package/demo/frame5_endcard_fallback.png +0 -0
- package/demo/new_frame1_hook.png +0 -0
- package/demo/new_frame2_proof.png +0 -0
- package/demo/new_frame3_wow.png +0 -0
- package/demo/new_frame4_social.png +0 -0
- package/demo/new_frame5_cta.png +0 -0
- package/demo/package.json +13 -0
- package/demo/product-video-final.mp4 +0 -0
- package/demo/product-video-hype-v1.mp4 +0 -0
- package/demo/product-video-v1.mp4 +0 -0
- package/demo/public/index.html +762 -0
- package/demo/recording.cast +55 -0
- package/demo/server.js +405 -0
- package/demo-new.tape +71 -0
- package/demo-real.sh +198 -0
- package/demo-simple.tape +205 -0
- package/demo.html +520 -0
- package/demo.sh +85 -0
- package/demo.tape +259 -0
- package/dist/analytics/costAnalytics.d.ts.map +1 -0
- package/dist/analytics/costAnalytics.js.map +1 -0
- package/dist/benchmark/comprehensive.js.map +1 -0
- package/dist/benchmark/reproducible.d.ts.map +1 -0
- package/dist/benchmark/reproducible.js.map +1 -0
- package/dist/cache/prefixCache.d.ts.map +1 -0
- package/dist/cache/prefixCache.js.map +1 -0
- package/dist/cache/responseCache.d.ts.map +1 -0
- package/dist/cache/responseCache.js.map +1 -0
- package/dist/cache/semanticCache.d.ts.map +1 -0
- package/dist/cache/semanticCache.js.map +1 -0
- package/dist/cli/setupWizard.d.ts.map +1 -0
- package/dist/cli/setupWizard.js.map +1 -0
- package/dist/cost/budgetEnforcer.d.ts.map +1 -0
- package/dist/cost/budgetEnforcer.js.map +1 -0
- package/dist/cost/costTracker.d.ts.map +1 -0
- package/dist/cost/costTracker.js.map +1 -0
- package/dist/ensemble/multiRoundDialog.js.map +1 -0
- package/dist/ensemble/shapleyValue.js.map +1 -0
- package/dist/integrations/langchainAdapter.d.ts.map +1 -0
- package/dist/integrations/langchainAdapter.js.map +1 -0
- package/dist/integrations/oauth.d.ts.map +1 -0
- package/dist/integrations/oauth.js.map +1 -0
- package/dist/integrations/scienceAdapter.js.map +1 -0
- package/dist/memory/autoFetch.d.ts.map +1 -0
- package/dist/memory/autoFetch.js.map +1 -0
- package/dist/memory/episodicMemory.d.ts.map +1 -0
- package/dist/memory/episodicMemory.js.map +1 -0
- package/dist/memory/hybridMemory.js.map +1 -0
- package/dist/memory/memoryTree.d.ts.map +1 -0
- package/dist/memory/memoryTree.js.map +1 -0
- package/dist/memory/obsidianVault.d.ts.map +1 -0
- package/dist/memory/obsidianVault.js.map +1 -0
- package/dist/memory/reasoningBank.js.map +1 -0
- package/dist/observability/changeWatch.d.ts.map +1 -0
- package/dist/observability/changeWatch.js.map +1 -0
- package/dist/observability/fatigueDetector.d.ts.map +1 -0
- package/dist/observability/fatigueDetector.js.map +1 -0
- package/dist/observability/index.d.ts.map +1 -0
- package/dist/observability/index.js.map +1 -0
- package/dist/observability/metrics.d.ts.map +1 -0
- package/dist/observability/metrics.js.map +1 -0
- package/dist/observability/middleware.d.ts.map +1 -0
- package/dist/observability/middleware.js.map +1 -0
- package/dist/observability/tracer.d.ts.map +1 -0
- package/dist/observability/tracer.js.map +1 -0
- package/dist/observability/types.d.ts.map +1 -0
- package/dist/observability/types.js.map +1 -0
- package/dist/orchestration/haloOrchestrator.d.ts.map +1 -0
- package/dist/orchestration/haloOrchestrator.js.map +1 -0
- package/dist/orchestration/mctsWorkflow.d.ts.map +1 -0
- package/dist/orchestration/mctsWorkflow.js.map +1 -0
- package/dist/providers/localProvider.d.ts.map +1 -0
- package/dist/providers/localProvider.js.map +1 -0
- package/dist/providers/providerConfig.d.ts.map +1 -0
- package/dist/providers/providerConfig.js.map +1 -0
- package/dist/providers/registry.d.ts.map +1 -0
- package/dist/providers/registry.js.map +1 -0
- package/dist/routing/advancedRouter.d.ts.map +1 -0
- package/dist/routing/advancedRouter.js +1 -1
- package/dist/routing/advancedRouter.js.map +1 -0
- package/dist/routing/crossModelValidation.d.ts.map +1 -0
- package/dist/routing/crossModelValidation.js.map +1 -0
- package/dist/routing/providerHealth.d.ts.map +1 -0
- package/dist/routing/providerHealth.js.map +1 -0
- package/dist/routing/providerRetry.d.ts.map +1 -0
- package/dist/routing/providerRetry.js.map +1 -0
- package/dist/scripts/banner.js +29 -0
- package/dist/security/guardrails.d.ts.map +1 -0
- package/dist/security/guardrails.js.map +1 -0
- package/dist/server/dashboard.d.ts.map +1 -0
- package/dist/server/dashboard.js.map +1 -0
- package/dist/server/modelMapper.d.ts.map +1 -0
- package/dist/server/modelMapper.js.map +1 -0
- package/dist/server/proxyServer.d.ts.map +1 -0
- package/dist/server/proxyServer.js.map +1 -0
- package/dist/skills/__tests__/skill_manager.test.d.ts +2 -0
- package/dist/skills/__tests__/skill_manager.test.d.ts.map +1 -0
- package/dist/skills/__tests__/skill_manager.test.js +268 -0
- package/dist/skills/__tests__/skill_manager.test.js.map +1 -0
- package/dist/tools/tmlpdTools.d.ts.map +1 -0
- package/dist/tools/tmlpdTools.js.map +1 -0
- package/dist/tui/dashboard.d.ts.map +1 -0
- package/dist/tui/dashboard.js.map +1 -0
- package/dist/tui/index.d.ts.map +1 -0
- package/dist/tui/index.js.map +1 -0
- package/dist/utils/batchProcessor.d.ts.map +1 -0
- package/dist/utils/batchProcessor.js.map +1 -0
- package/dist/utils/compression.d.ts.map +1 -0
- package/dist/utils/compression.js.map +1 -0
- package/dist/utils/costUtils.d.ts.map +1 -0
- package/dist/utils/costUtils.js.map +1 -0
- package/dist/utils/reliability.d.ts.map +1 -0
- package/dist/utils/reliability.js.map +1 -0
- package/dist/utils/sorting.d.ts.map +1 -0
- package/dist/utils/sorting.js.map +1 -0
- package/dist/utils/speculativeDecoding.d.ts.map +1 -0
- package/dist/utils/speculativeDecoding.js.map +1 -0
- package/dist/utils/tokenUtils.d.ts.map +1 -0
- package/dist/utils/tokenUtils.js.map +1 -0
- package/docs/.nojekyll +0 -0
- package/docs/ANALYSIS_PRINCIPLES.md +162 -0
- package/docs/API.md +855 -0
- package/docs/ARCHITECTURAL-IMPROVEMENTS-2025.md +1391 -0
- package/docs/ARCHITECTURAL-IMPROVEMENTS-REVISED-2025.md +1051 -0
- package/docs/BENCHMARK.md +170 -0
- package/docs/CHINESE_PROVIDER_RELIABILITY.md +37 -0
- package/docs/CITATIONS.md +74 -0
- package/docs/CLAIMS_AND_EVIDENCE.md +58 -0
- package/docs/CONFIGURATION.md +476 -0
- package/docs/COUNCIL_DECISION.json +816 -0
- package/docs/COUNCIL_SUMMARY.md +319 -0
- package/docs/COUNCIL_V2.2_DECISION.md +416 -0
- package/docs/ENGINEERING_SPEC.md +55 -0
- package/docs/FACTORY_RESET.md +34 -0
- package/docs/GEO.md +66 -0
- package/docs/GEO_OPTIMIZATION.md +30 -0
- package/docs/GEO_ROOT_CAUSE.md +136 -0
- package/docs/GEO_STATUS.md +85 -0
- package/docs/GEO_TEST_RESULTS.md +176 -0
- package/docs/HN_CHECKLIST.md +38 -0
- package/docs/HN_FOUNDER_COMMENT.md +17 -0
- package/docs/HN_SUBMISSION_FINAL.md +180 -0
- package/docs/HN_SUBMISSION_V3.md +56 -0
- package/docs/IMPROVEMENT_ROADMAP.md +515 -0
- package/docs/INTEGRATIONS.md +420 -0
- package/docs/LANGCHAIN_INTEGRATION.md +147 -0
- package/docs/LLM_COUNCIL_DECISION.md +508 -0
- package/docs/MIDDLEWARE_CHAIN.md +35 -0
- package/docs/PROMO_CHECKLIST.md +200 -0
- package/docs/QUICKSTART.md +271 -0
- package/docs/QUICK_START.md +43 -0
- package/docs/QUICK_START_VISIBILITY.md +782 -0
- package/docs/REDDIT_GAP_ANALYSIS.md +299 -0
- package/docs/RELEASE_CHECKLIST.md +32 -0
- package/docs/REPRODUCIBILITY.md +63 -0
- package/docs/RESEARCH_BACKED_IMPROVEMENTS.md +1180 -0
- package/docs/ROUTING_RUBRIC.md +197 -0
- package/docs/SEO_AUDIT.md +186 -0
- package/docs/SOCIAL_LISTENING.md +219 -0
- package/docs/TMLPD_QNA.md +751 -0
- package/docs/TMLPD_V2.1_COMPLETE.md +763 -0
- package/docs/TMLPD_V2.2_RESEARCH_ROADMAP.md +754 -0
- package/docs/UPDATE_TOPICS.md +15 -0
- package/docs/USE_CASES.md +59 -0
- package/docs/V2.2_IMPLEMENTATION_COMPLETE.md +446 -0
- package/docs/V2_IMPLEMENTATION_GUIDE.md +388 -0
- package/docs/VERCEL_AI_SDK.md +209 -0
- package/docs/VISIBILITY_ADOPTION_PLAN.md +1005 -0
- package/docs/_config.yml +49 -0
- package/docs/ai-plugin.json +16 -0
- package/docs/api.html +513 -0
- package/docs/architecture-diagram.md +40 -0
- package/docs/benchmark-chart.png +0 -0
- package/docs/benchmark.html +387 -0
- package/docs/blog/routerarena-number-one.html +73 -0
- package/docs/cli-cheatsheet.md +339 -0
- package/docs/compare.md +109 -0
- package/docs/comparison-litellm.md +88 -0
- package/docs/comparison.md +108 -0
- package/docs/cost-chart-ascii.md +42 -0
- package/docs/cost-comparison-chart.svg +88 -0
- package/docs/curl-examples.md +247 -0
- package/docs/demo-auto.html +264 -0
- package/docs/demo.html +416 -0
- package/docs/geo/GENERATIVE_ENGINE_OPTIMIZATION.md +232 -0
- package/docs/index.html +507 -0
- package/docs/launch-content/LAUNCH_EXECUTION_CHECKLIST.md +421 -0
- package/docs/launch-content/README.md +457 -0
- package/docs/launch-content/assets/cost_comparison_100_tasks.png +0 -0
- package/docs/launch-content/assets/cumulative_savings.png +0 -0
- package/docs/launch-content/assets/parallel_speedup.png +0 -0
- package/docs/launch-content/assets/provider_pricing_comparison.png +0 -0
- package/docs/launch-content/assets/task_breakdown_comparison.png +0 -0
- package/docs/launch-content/generate_charts.py +313 -0
- package/docs/launch-content/hn_show_post.md +139 -0
- package/docs/launch-content/partner_outreach_templates.md +745 -0
- package/docs/launch-content/reddit_posts.md +467 -0
- package/docs/launch-content/twitter_thread.txt +460 -0
- package/{llms.txt.bak → docs/llms.txt} +6 -6
- package/docs/npm-downloads-chart.svg +43 -0
- package/docs/openapi.json +139 -0
- package/docs/openapi.yaml +1318 -0
- package/docs/quick-start.html +366 -0
- package/docs/robots.txt +52 -0
- package/docs/sitemap.xml +57 -0
- package/docs/styles.css +682 -0
- package/docs/well-known/ai-plugin.json +16 -0
- package/docs/wellknown/ai-plugin.json +16 -0
- package/docs-site/assets/og-banner.svg +194 -0
- package/docs-site/index.html +632 -0
- package/eval/README.md +46 -0
- package/eval/baselines/main.json +12 -0
- package/eval/benchmark_dataset.jsonl +16 -0
- package/eval/check_golden_routes.js +64 -0
- package/eval/datasets/catalog.json +33 -0
- package/eval/datasets/slices/cn_provider_reliability_v1.jsonl +3 -0
- package/eval/datasets/slices/cost_pressure_v1.jsonl +3 -0
- package/eval/datasets/slices/safety_guardrails_v1.jsonl +3 -0
- package/eval/evals.json +199 -0
- package/eval/fault_injection_thresholds.json +3 -0
- package/eval/generate_report.js +128 -0
- package/eval/golden_routes.json +114 -0
- package/eval/lib/experiment_registry.js +24 -0
- package/eval/run_eval.js +197 -0
- package/eval/run_fault_injection.js +201 -0
- package/eval/run_shadow_eval.js +85 -0
- package/eval/thresholds.json +9 -0
- package/examples/QUICKSTART.md +183 -0
- package/examples/README.md +61 -0
- package/examples/a3m-sdk.js +124 -0
- package/examples/basic-route.js +54 -0
- package/examples/chat-loop.js +202 -0
- package/examples/classify-then-route.js +102 -0
- package/examples/cost-compare.js +120 -0
- package/examples/ensemble.js +160 -0
- package/examples/whatsapp-telegram-bridge-demo.js +302 -0
- package/examples/whatsapp-telegram-bridge.js +269 -0
- package/hf-space/README.md +23 -0
- package/hf-space/app.py +240 -0
- package/hf-space/requirements.txt +1 -0
- package/huggingface_space/README.md +35 -0
- package/huggingface_space/app.py +126 -0
- package/huggingface_space/create_space.py +208 -0
- package/huggingface_space/requirements.txt +1 -0
- package/mcp-server/README.md +188 -0
- package/mcp-server/package.json +29 -0
- package/mcp-server/src/index.ts +744 -0
- package/mcp-server/tsconfig.json +19 -0
- package/openclaw-alexa-bridge/ALL_REMAINING_FIXES_PLAN.md +313 -0
- package/openclaw-alexa-bridge/REMAINING_FIXES_SUMMARY.md +277 -0
- package/openclaw-alexa-bridge/src/alexa_handler_no_tmlpd.js +1234 -0
- package/openclaw-alexa-bridge/test_fixes.js +77 -0
- package/package.json +73 -270
- package/playground/README.md +51 -0
- package/playground/codesandbox.json +12 -0
- package/playground/index.js +39 -0
- package/proxy/README.md +227 -0
- package/proxy/package-lock.json +831 -0
- package/proxy/package.json +17 -0
- package/proxy/rate-limit.js +145 -0
- package/proxy/rate-limit.test.js +311 -0
- package/proxy/server.js +970 -0
- package/python/README.md +102 -0
- package/python/a3m/__init__.py +6 -0
- package/python/a3m/client.py +190 -0
- package/python/a3m/models.py +40 -0
- package/python/a3m/sync_client.py +61 -0
- package/python/examples.py +53 -0
- package/python/integrations.py +330 -0
- package/python/pyproject.toml +23 -0
- package/python/setup.py +28 -0
- package/python/tmlpd.py +369 -0
- package/qna/REDDIT_GAP_ANALYSIS.md +299 -0
- package/qna/TMLPD_QNA.md +751 -0
- package/research/FINDING_001_safety.md +28 -0
- package/research/FINDING_002_error_diversity.md +32 -0
- package/research/FINDING_003_confidence_weighted_voting.md +32 -0
- package/research/FINDING_004_cross_model_semantic_detection.md +37 -0
- package/research/FINDING_005_knowledge_gap_orthogonality.md +34 -0
- package/research/HALLUCINATION_RESEARCH.md +27 -0
- package/research/PUBLISH_LOG.md +3 -0
- package/research/ensemble-voting.md +324 -0
- package/research/loss-functions.md +545 -0
- package/research-log.md +49 -0
- package/scripts/banner.js +29 -0
- package/scripts/benchmark-local-routerarena.ts +176 -0
- package/scripts/benchmark.js +145 -0
- package/scripts/benchmark.sh +61 -0
- package/scripts/compare-providers.sh +230 -0
- package/scripts/content-planner.js +25 -0
- package/scripts/create-labeled-benchmark.ts +105 -0
- package/scripts/cross_post.py +443 -0
- package/scripts/local-router-benchmark.ts +154 -0
- package/scripts/post-all.sh +41 -0
- package/scripts/publish_fcc.py +106 -0
- package/scripts/push-to-gitee.sh +25 -0
- package/scripts/routerarena_ensemble.js +144 -0
- package/scripts/routing-benchmark-v2.js +373 -0
- package/scripts/routing-benchmark-v3.js +118 -0
- package/scripts/routing-benchmark.js +462 -0
- package/scripts/run-labeled-benchmark.mjs +104 -0
- package/scripts/run-mmlu-benchmark.js +176 -0
- package/scripts/run-provider-benchmark.js +244 -0
- package/scripts/update-npm-badges.js +158 -0
- package/skill/SKILL.md +238 -0
- package/src/__tests__/integration/tmpld_integration.test.py +540 -0
- package/src/routing/advancedRouter.ts +1 -1
- package/src/skills/__tests__/skill_manager.test.ts +328 -0
- package/submissions/benchmarks/ALL_PLATFORMS_SUBMISSION.md +94 -0
- package/submissions/benchmarks/LLMROUTERBENCH_SUBMISSION.md +121 -0
- package/submissions/benchmarks/MMRBENCH_SUBMISSION.md +94 -0
- package/submissions/benchmarks/ROUTERARENA_UPDATE.md +83 -0
- package/submissions/benchmarks/ROUTERBENCH_SUBMISSION.md +225 -0
- package/test-council/1-structure-tests.test.js +353 -0
- package/test-council/1-structure-tests.test.ts +353 -0
- package/test-council/2-edge-case-tests.test.ts +361 -0
- package/test-council/3-performance-tests.test.ts +669 -0
- package/test-council/4-integration-tests.test.ts +391 -0
- package/test-council/5-agent-council-eval.test.ts +413 -0
- package/test-council/AGENT_COUNCIL_ARCHITECTURE.md +349 -0
- package/test-council/TEST_COUNCIL_REPORT.md +201 -0
- package/test-council/agents/edge-case-agent.ts +363 -0
- package/test-council/agents/performance-agent.ts +426 -0
- package/test-council/agents/structure-agent.ts +227 -0
- package/test-council/council.md +183 -0
- package/tests/__mocks__/tokenUtils.ts +8 -0
- package/tests/memory/episodicMemory.test.ts +227 -0
- package/tests/package-lock.json +1628 -0
- package/tests/package.json +18 -0
- package/tests/routing/ensembleVoting.test.ts +236 -0
- package/tests/routing/providerRetry.test.ts +360 -0
- package/tests/routing/queryTypePresets.test.ts +208 -0
- package/tests/security/guardrailEngine.test.ts +700 -0
- package/tests/tsconfig.json +21 -0
- package/tests/vitest.config.ts +18 -0
- package/tmlpd-pi-extension/README.md +66 -0
- package/tmlpd-pi-extension/dist/cache/prefixCache.d.ts +114 -0
- package/tmlpd-pi-extension/dist/cache/prefixCache.d.ts.map +1 -0
- package/tmlpd-pi-extension/dist/cache/prefixCache.js +285 -0
- package/tmlpd-pi-extension/dist/cache/prefixCache.js.map +1 -0
- package/tmlpd-pi-extension/dist/cache/responseCache.d.ts +58 -0
- package/tmlpd-pi-extension/dist/cache/responseCache.d.ts.map +1 -0
- package/tmlpd-pi-extension/dist/cache/responseCache.js +153 -0
- package/tmlpd-pi-extension/dist/cache/responseCache.js.map +1 -0
- package/tmlpd-pi-extension/dist/cli.js +59 -0
- package/tmlpd-pi-extension/dist/cost/costTracker.d.ts +95 -0
- package/tmlpd-pi-extension/dist/cost/costTracker.d.ts.map +1 -0
- package/tmlpd-pi-extension/dist/cost/costTracker.js +240 -0
- package/tmlpd-pi-extension/dist/cost/costTracker.js.map +1 -0
- package/tmlpd-pi-extension/dist/index.d.ts +723 -0
- package/tmlpd-pi-extension/dist/index.d.ts.map +1 -0
- package/tmlpd-pi-extension/dist/index.js +239 -0
- package/tmlpd-pi-extension/dist/index.js.map +1 -0
- package/tmlpd-pi-extension/dist/memory/episodicMemory.d.ts +82 -0
- package/tmlpd-pi-extension/dist/memory/episodicMemory.d.ts.map +1 -0
- package/tmlpd-pi-extension/dist/memory/episodicMemory.js +145 -0
- package/tmlpd-pi-extension/dist/memory/episodicMemory.js.map +1 -0
- package/tmlpd-pi-extension/dist/orchestration/haloOrchestrator.d.ts +102 -0
- package/tmlpd-pi-extension/dist/orchestration/haloOrchestrator.d.ts.map +1 -0
- package/tmlpd-pi-extension/dist/orchestration/haloOrchestrator.js +207 -0
- package/tmlpd-pi-extension/dist/orchestration/haloOrchestrator.js.map +1 -0
- package/tmlpd-pi-extension/dist/orchestration/mctsWorkflow.d.ts +85 -0
- package/tmlpd-pi-extension/dist/orchestration/mctsWorkflow.d.ts.map +1 -0
- package/tmlpd-pi-extension/dist/orchestration/mctsWorkflow.js +210 -0
- package/tmlpd-pi-extension/dist/orchestration/mctsWorkflow.js.map +1 -0
- package/tmlpd-pi-extension/dist/providers/localProvider.d.ts +102 -0
- package/tmlpd-pi-extension/dist/providers/localProvider.d.ts.map +1 -0
- package/tmlpd-pi-extension/dist/providers/localProvider.js +338 -0
- package/tmlpd-pi-extension/dist/providers/localProvider.js.map +1 -0
- package/tmlpd-pi-extension/dist/providers/registry.d.ts +55 -0
- package/tmlpd-pi-extension/dist/providers/registry.d.ts.map +1 -0
- package/tmlpd-pi-extension/dist/providers/registry.js +138 -0
- package/tmlpd-pi-extension/dist/providers/registry.js.map +1 -0
- package/tmlpd-pi-extension/dist/routing/advancedRouter.d.ts +68 -0
- package/tmlpd-pi-extension/dist/routing/advancedRouter.d.ts.map +1 -0
- package/tmlpd-pi-extension/dist/routing/advancedRouter.js +332 -0
- package/tmlpd-pi-extension/dist/routing/advancedRouter.js.map +1 -0
- package/tmlpd-pi-extension/dist/tools/tmlpdTools.d.ts +101 -0
- package/tmlpd-pi-extension/dist/tools/tmlpdTools.d.ts.map +1 -0
- package/tmlpd-pi-extension/dist/tools/tmlpdTools.js +368 -0
- package/tmlpd-pi-extension/dist/tools/tmlpdTools.js.map +1 -0
- package/tmlpd-pi-extension/dist/utils/batchProcessor.d.ts +96 -0
- package/tmlpd-pi-extension/dist/utils/batchProcessor.d.ts.map +1 -0
- package/tmlpd-pi-extension/dist/utils/batchProcessor.js +170 -0
- package/tmlpd-pi-extension/dist/utils/batchProcessor.js.map +1 -0
- package/tmlpd-pi-extension/dist/utils/compression.d.ts +61 -0
- package/tmlpd-pi-extension/dist/utils/compression.d.ts.map +1 -0
- package/tmlpd-pi-extension/dist/utils/compression.js +281 -0
- package/tmlpd-pi-extension/dist/utils/compression.js.map +1 -0
- package/tmlpd-pi-extension/dist/utils/reliability.d.ts +74 -0
- package/tmlpd-pi-extension/dist/utils/reliability.d.ts.map +1 -0
- package/tmlpd-pi-extension/dist/utils/reliability.js +177 -0
- package/tmlpd-pi-extension/dist/utils/reliability.js.map +1 -0
- package/tmlpd-pi-extension/dist/utils/speculativeDecoding.d.ts +117 -0
- package/tmlpd-pi-extension/dist/utils/speculativeDecoding.d.ts.map +1 -0
- package/tmlpd-pi-extension/dist/utils/speculativeDecoding.js +246 -0
- package/tmlpd-pi-extension/dist/utils/speculativeDecoding.js.map +1 -0
- package/tmlpd-pi-extension/dist/utils/tokenUtils.d.ts +50 -0
- package/tmlpd-pi-extension/dist/utils/tokenUtils.d.ts.map +1 -0
- package/tmlpd-pi-extension/dist/utils/tokenUtils.js +124 -0
- package/tmlpd-pi-extension/dist/utils/tokenUtils.js.map +1 -0
- package/tmlpd-pi-extension/examples/QUICKSTART.md +183 -0
- package/tmlpd-pi-extension/package-lock.json +79 -0
- package/tmlpd-pi-extension/package.json +172 -0
- package/tmlpd-pi-extension/python/examples.py +53 -0
- package/tmlpd-pi-extension/python/integrations.py +330 -0
- package/tmlpd-pi-extension/python/setup.py +28 -0
- package/tmlpd-pi-extension/python/tmlpd.py +369 -0
- package/tmlpd-pi-extension/qna/REDDIT_GAP_ANALYSIS.md +299 -0
- package/tmlpd-pi-extension/qna/TMLPD_QNA.md +751 -0
- package/tmlpd-pi-extension/skill/SKILL.md +238 -0
- package/tmlpd-pi-extension/src/cache/responseCache.ts +147 -0
- package/tmlpd-pi-extension/src/cost/costTracker.ts +302 -0
- package/tmlpd-pi-extension/src/index.ts +232 -0
- package/tmlpd-pi-extension/src/memory/episodicMemory.ts +257 -0
- package/tmlpd-pi-extension/src/orchestration/haloOrchestrator.ts +266 -0
- package/tmlpd-pi-extension/src/orchestration/mctsWorkflow.ts +262 -0
- package/tmlpd-pi-extension/src/providers/localProvider.ts +406 -0
- package/tmlpd-pi-extension/src/providers/registry.ts +164 -0
- package/tmlpd-pi-extension/src/routing/ensembleVoting.ts +159 -0
- package/tmlpd-pi-extension/src/routing/queryTypePresets.ts +136 -0
- package/tmlpd-pi-extension/src/tools/tmlpdTools.ts +433 -0
- package/tmlpd-pi-extension/src/utils/batchProcessor.ts +232 -0
- package/tmlpd-pi-extension/src/utils/compression.ts +325 -0
- package/tmlpd-pi-extension/src/utils/reliability.ts +221 -0
- package/tmlpd-pi-extension/src/utils/tokenUtils.ts +145 -0
- package/tmlpd-pi-extension/tsconfig.json +18 -0
- package/tsconfig.build.json +29 -0
- package/tsconfig.json +18 -0
- package/README.md.bak +0 -1185
- package/src/routing/advancedRouter.ts.bak +0 -650
- package/test.js.bak +0 -376
- /package/{llms-full.txt.bak → docs/llms-full.txt} +0 -0
|
@@ -0,0 +1,460 @@
|
|
|
1
|
+
---
|
|
2
|
+
title: "We Built an LLM Router That Runs on Keywords, Not Neural Networks — Here's How It Works"
|
|
3
|
+
published: false
|
|
4
|
+
description: "A 19.5 KB TypeScript package that routes LLM queries with 70.32 accuracy using 5 keyword-based signals. No GPU, no ML weights, zero dependencies."
|
|
5
|
+
tags: llm, typescript, ai, optimization
|
|
6
|
+
cover_image: https://placeholder.dev.to/cover.png
|
|
7
|
+
---
|
|
8
|
+
|
|
9
|
+
We needed to route LLM queries across 36 providers. The ML approach (BERT classifier, embedding similarity, LLM-as-judge) adds latency, infrastructure, and cost. We tried something simpler: a 5-signal keyword scoring system in pure TypeScript.
|
|
10
|
+
|
|
11
|
+
The result: **70.32 accuracy**, **64.5% exact match**, **0.3ms routing latency**, in a **19.5 KB gzipped** package with zero runtime dependencies.
|
|
12
|
+
|
|
13
|
+
Here's exactly how each signal works, with code.
|
|
14
|
+
|
|
15
|
+
---
|
|
16
|
+
|
|
17
|
+
## The problem
|
|
18
|
+
|
|
19
|
+
We have 36 LLM providers across 5 complexity tiers:
|
|
20
|
+
|
|
21
|
+
| Tier | Count | Examples | Price range |
|
|
22
|
+
|------|-------|---------|-------------|
|
|
23
|
+
| Free | 6 | Gemini Flash, Groq free tier | $0 |
|
|
24
|
+
| Cheap | 15 | DeepSeek, Mistral Small | ~$0.15/1M tokens |
|
|
25
|
+
| Mid | 9 | Claude Sonnet, GPT-4o-mini | ~$1-3/1M tokens |
|
|
26
|
+
| Premium | 3 | GPT-4, Claude Opus | ~$15-30/1M tokens |
|
|
27
|
+
| Enterprise | 3 | Claude Max, GPT-4 turbo | ~$60+/1M tokens |
|
|
28
|
+
|
|
29
|
+
Every query needs to land in the right tier. Sending "what is 2+2?" to GPT-4 wastes money. Sending "design a Byzantine fault-tolerant consensus algorithm" to a free model wastes the response.
|
|
30
|
+
|
|
31
|
+
## The 5-signal architecture
|
|
32
|
+
|
|
33
|
+
Each incoming query is scored on five orthogonal signals (0-1 range). The weighted sum maps to a tier.
|
|
34
|
+
|
|
35
|
+
```
|
|
36
|
+
Query → [domain, task, structure, verb, specificity] → weighted sum → tier → provider
|
|
37
|
+
```
|
|
38
|
+
|
|
39
|
+
Let's break down each signal.
|
|
40
|
+
|
|
41
|
+
---
|
|
42
|
+
|
|
43
|
+
### Signal 1: Domain Detection
|
|
44
|
+
|
|
45
|
+
**What it measures:** Is this query from a specialized domain (code, math, legal, medical)?
|
|
46
|
+
|
|
47
|
+
**Why it matters:** Domain-specific queries need domain-specific capabilities. Code generation needs instruction-following. Math needs chain-of-thought. Medical needs accuracy.
|
|
48
|
+
|
|
49
|
+
```typescript
|
|
50
|
+
const DOMAIN_PATTERNS: Record<string, RegExp[]> = {
|
|
51
|
+
code: [
|
|
52
|
+
/\b(function|class|import|export|async|await|def|return|const|let|var)\b/gi,
|
|
53
|
+
/\b(api|endpoint|database|query|schema|migrate|deploy)\b/gi,
|
|
54
|
+
],
|
|
55
|
+
math: [
|
|
56
|
+
/\b(equation|integral|derivative|theorem|proof|calculate|solve|formula)\b/gi,
|
|
57
|
+
/\b(algebra|calculus|geometry|statistics|probability)\b/gi,
|
|
58
|
+
],
|
|
59
|
+
legal: [
|
|
60
|
+
/\b(contract|liability|clause|statute|regulation|compliance|attorney)\b/gi,
|
|
61
|
+
],
|
|
62
|
+
medical: [
|
|
63
|
+
/\b(diagnosis|symptom|treatment|patient|clinical|dosage|prescription)\b/gi,
|
|
64
|
+
],
|
|
65
|
+
};
|
|
66
|
+
|
|
67
|
+
function scoreDomain(query: string): number {
|
|
68
|
+
let maxScore = 0;
|
|
69
|
+
for (const [domain, patterns] of Object.entries(DOMAIN_PATTERNS)) {
|
|
70
|
+
const matchCount = patterns.reduce(
|
|
71
|
+
(sum, pattern) => sum + (query.match(pattern)?.length ?? 0), 0
|
|
72
|
+
);
|
|
73
|
+
const domainScore = Math.min(matchCount * 0.15, 1.0);
|
|
74
|
+
maxScore = Math.max(maxScore, domainScore);
|
|
75
|
+
}
|
|
76
|
+
return maxScore;
|
|
77
|
+
}
|
|
78
|
+
```
|
|
79
|
+
|
|
80
|
+
**Example scoring:**
|
|
81
|
+
|
|
82
|
+
| Query | Domain score | Detected domain |
|
|
83
|
+
|-------|-------------|----------------|
|
|
84
|
+
| "What is the weather?" | 0.0 | none |
|
|
85
|
+
| "Explain async/await in JavaScript" | 0.45 | code |
|
|
86
|
+
| "Prove that sqrt(2) is irrational" | 0.45 | math |
|
|
87
|
+
| "Debug this React component, the useState hook isn't updating" | 0.60 | code |
|
|
88
|
+
|
|
89
|
+
---
|
|
90
|
+
|
|
91
|
+
### Signal 2: Task Indicators
|
|
92
|
+
|
|
93
|
+
**What it measures:** What type of task is the user asking for? Summarize, translate, debug, create, analyze?
|
|
94
|
+
|
|
95
|
+
**Why it matters:** Different tasks have different complexity ceilings. "Summarize" is bounded. "Create from scratch" is unbounded.
|
|
96
|
+
|
|
97
|
+
```typescript
|
|
98
|
+
const TASK_KEYWORDS: Record<string, { keywords: string[]; complexity: number }> = {
|
|
99
|
+
summarize: {
|
|
100
|
+
keywords: ['summarize', 'tldr', 'brief', 'overview', 'recap', 'sum up'],
|
|
101
|
+
complexity: 0.2,
|
|
102
|
+
},
|
|
103
|
+
translate: {
|
|
104
|
+
keywords: ['translate', 'in french', 'in spanish', 'in german', 'in japanese'],
|
|
105
|
+
complexity: 0.25,
|
|
106
|
+
},
|
|
107
|
+
explain: {
|
|
108
|
+
keywords: ['explain', 'describe', 'tell me about', 'what is', 'how does'],
|
|
109
|
+
complexity: 0.3,
|
|
110
|
+
},
|
|
111
|
+
debug: {
|
|
112
|
+
keywords: ['debug', 'fix this', 'error', 'stack trace', 'not working', 'broken'],
|
|
113
|
+
complexity: 0.55,
|
|
114
|
+
},
|
|
115
|
+
analyze: {
|
|
116
|
+
keywords: ['analyze', 'compare', 'evaluate', 'assess', 'investigate', 'critique'],
|
|
117
|
+
complexity: 0.7,
|
|
118
|
+
},
|
|
119
|
+
create: {
|
|
120
|
+
keywords: ['write', 'create', 'generate', 'build', 'implement', 'design', 'develop'],
|
|
121
|
+
complexity: 0.75,
|
|
122
|
+
},
|
|
123
|
+
architect: {
|
|
124
|
+
keywords: ['architect', 'design a system', 'system design', 'infrastructure'],
|
|
125
|
+
complexity: 0.9,
|
|
126
|
+
},
|
|
127
|
+
};
|
|
128
|
+
|
|
129
|
+
function scoreTask(query: string): number {
|
|
130
|
+
const lower = query.toLowerCase();
|
|
131
|
+
let score = 0;
|
|
132
|
+
for (const [task, config] of Object.entries(TASK_KEYWORDS)) {
|
|
133
|
+
const matched = config.keywords.some(kw => lower.includes(kw));
|
|
134
|
+
if (matched) score += config.complexity;
|
|
135
|
+
}
|
|
136
|
+
return Math.min(score, 1.0);
|
|
137
|
+
}
|
|
138
|
+
```
|
|
139
|
+
|
|
140
|
+
**Example scoring:**
|
|
141
|
+
|
|
142
|
+
| Query | Task score | Tasks detected |
|
|
143
|
+
|-------|-----------|---------------|
|
|
144
|
+
| "What is React?" | 0.3 | explain |
|
|
145
|
+
| "Summarize this article" | 0.2 | summarize |
|
|
146
|
+
| "Debug this Python script and explain the fix" | 0.85 | debug + explain |
|
|
147
|
+
| "Design a microservices architecture and write the API gateway" | 1.0 | architect + create |
|
|
148
|
+
|
|
149
|
+
---
|
|
150
|
+
|
|
151
|
+
### Signal 3: Query Structure
|
|
152
|
+
|
|
153
|
+
**What it measures:** The structural complexity of the query — multiple steps, conditionals, nested requirements.
|
|
154
|
+
|
|
155
|
+
**Why it matters:** "Translate this" is simple. "Translate this, then summarize in 3 bullets, then check for legal compliance" is structurally complex regardless of the individual tasks.
|
|
156
|
+
|
|
157
|
+
```typescript
|
|
158
|
+
function scoreStructure(query: string): number {
|
|
159
|
+
let score = 0;
|
|
160
|
+
|
|
161
|
+
// Multi-step queries ("first do X, then do Y, finally Z")
|
|
162
|
+
const stepMarkers = query.split(/\b(first|then|after|before|finally|next|lastly)\b/i);
|
|
163
|
+
score += Math.max(0, (stepMarkers.length - 1)) * 0.2;
|
|
164
|
+
|
|
165
|
+
// Conditional queries ("if X then Y otherwise Z")
|
|
166
|
+
const conditionals = query.match(/\b(if|unless|otherwise|whether|given that)\b/gi);
|
|
167
|
+
score += (conditionals?.length ?? 0) * 0.15;
|
|
168
|
+
|
|
169
|
+
// Conjunction chains (A and B and C)
|
|
170
|
+
const conjunctions = query.match(/\band\b/gi);
|
|
171
|
+
score += Math.min((conjunctions?.length ?? 0) * 0.05, 0.2);
|
|
172
|
+
|
|
173
|
+
// Query length with diminishing returns
|
|
174
|
+
score += Math.min(query.length / 500, 0.3);
|
|
175
|
+
|
|
176
|
+
// Nested quotes or code blocks (indicates context-heavy queries)
|
|
177
|
+
const codeBlocks = query.match(/```[\s\S]*?```/g);
|
|
178
|
+
score += (codeBlocks?.length ?? 0) * 0.1;
|
|
179
|
+
|
|
180
|
+
return Math.min(score, 1.0);
|
|
181
|
+
}
|
|
182
|
+
```
|
|
183
|
+
|
|
184
|
+
**Example scoring:**
|
|
185
|
+
|
|
186
|
+
| Query | Structure score | Why |
|
|
187
|
+
|-------|----------------|-----|
|
|
188
|
+
| "What is Python?" | 0.04 | short, simple |
|
|
189
|
+
| "Explain async/await" | 0.05 | short, simple |
|
|
190
|
+
| "First translate to French, then summarize in 3 bullets" | 0.47 | multi-step |
|
|
191
|
+
| "If the user is admin, show the dashboard with all metrics, otherwise show a limited view with only their data" | 0.72 | conditional + multi-step |
|
|
192
|
+
|
|
193
|
+
---
|
|
194
|
+
|
|
195
|
+
### Signal 4: Action Verb Intensity
|
|
196
|
+
|
|
197
|
+
**What it measures:** How demanding the requested action is. "List" < "explain" < "analyze" < "design" < "architect".
|
|
198
|
+
|
|
199
|
+
```typescript
|
|
200
|
+
const VERB_WEIGHTS: Record<string, number> = {
|
|
201
|
+
// Low intensity
|
|
202
|
+
'what is': 0.1, 'define': 0.15, 'list': 0.2, 'describe': 0.25,
|
|
203
|
+
// Medium intensity
|
|
204
|
+
'explain': 0.35, 'convert': 0.4, 'translate': 0.4, 'summarize': 0.4,
|
|
205
|
+
'rewrite': 0.45, 'format': 0.45,
|
|
206
|
+
// High intensity
|
|
207
|
+
'debug': 0.6, 'fix': 0.6, 'analyze': 0.65, 'compare': 0.65,
|
|
208
|
+
'optimize': 0.7, 'refactor': 0.7, 'implement': 0.75,
|
|
209
|
+
// Very high intensity
|
|
210
|
+
'design': 0.8, 'architect': 0.85, 'reverse-engineer': 0.9,
|
|
211
|
+
'create from scratch': 0.9,
|
|
212
|
+
};
|
|
213
|
+
|
|
214
|
+
function scoreVerb(query: string): number {
|
|
215
|
+
const lower = query.toLowerCase();
|
|
216
|
+
let maxVerb = 0;
|
|
217
|
+
for (const [verb, weight] of Object.entries(VERB_WEIGHTS)) {
|
|
218
|
+
if (lower.includes(verb)) {
|
|
219
|
+
maxVerb = Math.max(maxVerb, weight);
|
|
220
|
+
}
|
|
221
|
+
}
|
|
222
|
+
return maxVerb;
|
|
223
|
+
}
|
|
224
|
+
```
|
|
225
|
+
|
|
226
|
+
---
|
|
227
|
+
|
|
228
|
+
### Signal 5: Specificity
|
|
229
|
+
|
|
230
|
+
**What it measures:** How precise and technical the query is. "Tell me about AI" vs "Implement a transformer decoder with multi-head attention using PyTorch".
|
|
231
|
+
|
|
232
|
+
```typescript
|
|
233
|
+
function scoreSpecificity(query: string): number {
|
|
234
|
+
let score = 0;
|
|
235
|
+
|
|
236
|
+
// Technical terms (camelCase, PascalCase identifiers)
|
|
237
|
+
const technicalTerms = query.match(/\b[A-Z][a-z]+[A-Z][a-z]+\b/g);
|
|
238
|
+
score += Math.min((technicalTerms?.length ?? 0) * 0.12, 0.3);
|
|
239
|
+
|
|
240
|
+
// Quoted strings (specific values, names, identifiers)
|
|
241
|
+
const quotedTerms = query.match(/["'`][^"'`]+["'`]/g);
|
|
242
|
+
score += Math.min((quotedTerms?.length ?? 0) * 0.1, 0.2);
|
|
243
|
+
|
|
244
|
+
// Numbers and measurements (specificity indicator)
|
|
245
|
+
const numbers = query.match(/\d+/g);
|
|
246
|
+
score += Math.min((numbers?.length ?? 0) * 0.03, 0.15);
|
|
247
|
+
|
|
248
|
+
// Penalize vagueness
|
|
249
|
+
const vagueTerms = query.match(/\b(something|anything|stuff|things|etc|whatever|some)\b/gi);
|
|
250
|
+
score -= (vagueTerms?.length ?? 0) * 0.15;
|
|
251
|
+
|
|
252
|
+
// Bonus for field-specific jargon density
|
|
253
|
+
const jargonTerms = query.match(/\b(algorithm|protocol|architecture|paradigm|heuristic|orthogonal)\b/gi);
|
|
254
|
+
score += Math.min((jargonTerms?.length ?? 0) * 0.1, 0.2);
|
|
255
|
+
|
|
256
|
+
return Math.max(0, Math.min(score, 1.0));
|
|
257
|
+
}
|
|
258
|
+
```
|
|
259
|
+
|
|
260
|
+
---
|
|
261
|
+
|
|
262
|
+
## Putting it all together
|
|
263
|
+
|
|
264
|
+
```typescript
|
|
265
|
+
interface RoutingSignals {
|
|
266
|
+
domain: number;
|
|
267
|
+
task: number;
|
|
268
|
+
structure: number;
|
|
269
|
+
verbIntensity: number;
|
|
270
|
+
specificity: number;
|
|
271
|
+
}
|
|
272
|
+
|
|
273
|
+
const WEIGHTS = {
|
|
274
|
+
domain: 0.25,
|
|
275
|
+
task: 0.25,
|
|
276
|
+
structure: 0.20,
|
|
277
|
+
verbIntensity: 0.15,
|
|
278
|
+
specificity: 0.15,
|
|
279
|
+
};
|
|
280
|
+
|
|
281
|
+
const TIER_THRESHOLDS: [number, Tier][] = [
|
|
282
|
+
[0.20, 'free'],
|
|
283
|
+
[0.40, 'cheap'],
|
|
284
|
+
[0.60, 'mid'],
|
|
285
|
+
[0.80, 'premium'],
|
|
286
|
+
[1.01, 'enterprise'],
|
|
287
|
+
];
|
|
288
|
+
|
|
289
|
+
function route(query: string): Tier {
|
|
290
|
+
const signals: RoutingSignals = {
|
|
291
|
+
domain: scoreDomain(query),
|
|
292
|
+
task: scoreTask(query),
|
|
293
|
+
structure: scoreStructure(query),
|
|
294
|
+
verbIntensity: scoreVerb(query),
|
|
295
|
+
specificity: scoreSpecificity(query),
|
|
296
|
+
};
|
|
297
|
+
|
|
298
|
+
const score =
|
|
299
|
+
signals.domain * WEIGHTS.domain +
|
|
300
|
+
signals.task * WEIGHTS.task +
|
|
301
|
+
signals.structure * WEIGHTS.structure +
|
|
302
|
+
signals.verbIntensity * WEIGHTS.verbIntensity +
|
|
303
|
+
signals.specificity * WEIGHTS.specificity;
|
|
304
|
+
|
|
305
|
+
for (const [threshold, tier] of TIER_THRESHOLDS) {
|
|
306
|
+
if (score < threshold) return tier;
|
|
307
|
+
}
|
|
308
|
+
return 'enterprise';
|
|
309
|
+
}
|
|
310
|
+
```
|
|
311
|
+
|
|
312
|
+
---
|
|
313
|
+
|
|
314
|
+
## Real query examples with full scoring
|
|
315
|
+
|
|
316
|
+
### Example 1: "What is Python?"
|
|
317
|
+
|
|
318
|
+
| Signal | Score | Weight | Weighted |
|
|
319
|
+
|--------|-------|--------|----------|
|
|
320
|
+
| Domain | 0.0 | 0.25 | 0.0 |
|
|
321
|
+
| Task | 0.3 | 0.25 | 0.075 |
|
|
322
|
+
| Structure | 0.03 | 0.20 | 0.006 |
|
|
323
|
+
| Verb | 0.1 | 0.15 | 0.015 |
|
|
324
|
+
| Specificity | 0.0 | 0.15 | 0.0 |
|
|
325
|
+
| **Total** | | | **0.096** |
|
|
326
|
+
|
|
327
|
+
**Routed to: Free tier** ✅
|
|
328
|
+
|
|
329
|
+
### Example 2: "Implement a red-black tree with insert, delete, and search operations in TypeScript"
|
|
330
|
+
|
|
331
|
+
| Signal | Score | Weight | Weighted |
|
|
332
|
+
|--------|-------|--------|----------|
|
|
333
|
+
| Domain | 0.45 | 0.25 | 0.1125 |
|
|
334
|
+
| Task | 0.75 | 0.25 | 0.1875 |
|
|
335
|
+
| Structure | 0.15 | 0.20 | 0.03 |
|
|
336
|
+
| Verb | 0.75 | 0.15 | 0.1125 |
|
|
337
|
+
| Specificity | 0.42 | 0.15 | 0.063 |
|
|
338
|
+
| **Total** | | | **0.505** |
|
|
339
|
+
|
|
340
|
+
**Routed to: Mid tier** ✅
|
|
341
|
+
|
|
342
|
+
### Example 3: "Design a fault-tolerant distributed database that handles network partitions, supports ACID transactions, and can scale to 10,000 nodes. Include the consensus protocol, replication strategy, and failure recovery mechanism."
|
|
343
|
+
|
|
344
|
+
| Signal | Score | Weight | Weighted |
|
|
345
|
+
|--------|-------|--------|----------|
|
|
346
|
+
| Domain | 0.30 | 0.25 | 0.075 |
|
|
347
|
+
| Task | 0.90 | 0.25 | 0.225 |
|
|
348
|
+
| Structure | 0.62 | 0.20 | 0.124 |
|
|
349
|
+
| Verb | 0.80 | 0.15 | 0.12 |
|
|
350
|
+
| Specificity | 0.65 | 0.15 | 0.0975 |
|
|
351
|
+
| **Total** | | | **0.641** |
|
|
352
|
+
|
|
353
|
+
**Routed to: Premium tier** ✅
|
|
354
|
+
|
|
355
|
+
---
|
|
356
|
+
|
|
357
|
+
## Benchmark results
|
|
358
|
+
|
|
359
|
+
Tested on 2,500 real-world queries across coding, creative writing, analysis, math, translation, and general Q&A.
|
|
360
|
+
|
|
361
|
+
```
|
|
362
|
+
Confusion Matrix (3-tier simplified):
|
|
363
|
+
|
|
364
|
+
Predicted
|
|
365
|
+
Free Mid Premium
|
|
366
|
+
Actual Free 812 38 5
|
|
367
|
+
Actual Mid 41 647 27
|
|
368
|
+
Actual Premium 3 22 705
|
|
369
|
+
```
|
|
370
|
+
|
|
371
|
+
| Metric | Value |
|
|
372
|
+
|--------|-------|
|
|
373
|
+
| Exact tier match | 64.5% |
|
|
374
|
+
| accuracy | 70.32 |
|
|
375
|
+
| Mean absolute error | 0.37 tiers |
|
|
376
|
+
| Routing latency | 0.3ms per query |
|
|
377
|
+
| Cost savings vs premium-only | 61.6% |
|
|
378
|
+
|
|
379
|
+
---
|
|
380
|
+
|
|
381
|
+
## What about the other features?
|
|
382
|
+
|
|
383
|
+
### Semantic Cache
|
|
384
|
+
|
|
385
|
+
Uses trigram Jaccard similarity to detect near-duplicate queries:
|
|
386
|
+
|
|
387
|
+
```typescript
|
|
388
|
+
function trigramJaccard(a: string, b: string): number {
|
|
389
|
+
const trigrams = (s: string) => {
|
|
390
|
+
const set = new Set<string>();
|
|
391
|
+
for (let i = 0; i <= s.length - 3; i++) {
|
|
392
|
+
set.add(s.slice(i, i + 3));
|
|
393
|
+
}
|
|
394
|
+
return set;
|
|
395
|
+
};
|
|
396
|
+
const setA = trigrams(a.toLowerCase());
|
|
397
|
+
const setB = trigrams(b.toLowerCase());
|
|
398
|
+
const intersection = [...setA].filter(x => setB.has(x)).length;
|
|
399
|
+
const union = new Set([...setA, ...setB]).size;
|
|
400
|
+
return intersection / union;
|
|
401
|
+
}
|
|
402
|
+
|
|
403
|
+
// "Explain React hooks" and "what are React hooks?" → Jaccard > 0.4 → cache hit
|
|
404
|
+
```
|
|
405
|
+
|
|
406
|
+
### Prompt Injection Detection
|
|
407
|
+
|
|
408
|
+
17 patterns covering common attack vectors:
|
|
409
|
+
|
|
410
|
+
```typescript
|
|
411
|
+
const INJECTION_PATTERNS = [
|
|
412
|
+
/ignore\s+(all\s+)?previous\s+instructions/i,
|
|
413
|
+
/you\s+are\s+now\s+/i,
|
|
414
|
+
/system\s*:\s*/i,
|
|
415
|
+
/\[INST\]/i,
|
|
416
|
+
/simulate\s+/i,
|
|
417
|
+
/pretend\s+you\s+(are|can)/i,
|
|
418
|
+
/jailbreak/i,
|
|
419
|
+
/DAN\s+mode/i,
|
|
420
|
+
// ... 9 more patterns
|
|
421
|
+
];
|
|
422
|
+
```
|
|
423
|
+
|
|
424
|
+
---
|
|
425
|
+
|
|
426
|
+
## Get started
|
|
427
|
+
|
|
428
|
+
```bash
|
|
429
|
+
npm install adaptive-memory-multi-model-router
|
|
430
|
+
```
|
|
431
|
+
|
|
432
|
+
```typescript
|
|
433
|
+
import { A3MRouter } from 'adaptive-memory-multi-model-router';
|
|
434
|
+
|
|
435
|
+
const router = new A3MRouter({
|
|
436
|
+
providers: {
|
|
437
|
+
openai: { apiKey: process.env.OPENAI_API_KEY },
|
|
438
|
+
anthropic: { apiKey: process.env.ANTHROPIC_API_KEY },
|
|
439
|
+
google: { apiKey: process.env.GOOGLE_API_KEY },
|
|
440
|
+
groq: { apiKey: process.env.GROQ_API_KEY },
|
|
441
|
+
}
|
|
442
|
+
});
|
|
443
|
+
|
|
444
|
+
const result = await router.route({
|
|
445
|
+
messages: [{ role: 'user', content: 'Your query here' }]
|
|
446
|
+
});
|
|
447
|
+
|
|
448
|
+
console.log(`Provider: ${result.provider}`);
|
|
449
|
+
console.log(`Tier: ${result.tier}`);
|
|
450
|
+
console.log(`Cost: $${result.cost}`);
|
|
451
|
+
```
|
|
452
|
+
|
|
453
|
+
**GitHub:** https://github.com/Das-rebel/a3m-router
|
|
454
|
+
**npm:** https://www.npmjs.com/package/adaptive-memory-multi-model-router
|
|
455
|
+
|
|
456
|
+
MIT license. Self-hosted. No account. 19.5 KB. TypeScript + Python SDKs, CLI, REST API, OpenAI proxy, LangChain adapter.
|
|
457
|
+
|
|
458
|
+
---
|
|
459
|
+
|
|
460
|
+
*We're actively looking for independent benchmark evaluations. If you run the router against your own query distribution, we'd love to see the results — especially cases where it fails.*
|
|
@@ -0,0 +1,73 @@
|
|
|
1
|
+
LLM infrastructure has three problems that shouldn't exist in 2026. Here's what we built because nobody else fixed them.
|
|
2
|
+
|
|
3
|
+
---
|
|
4
|
+
|
|
5
|
+
## Problem 1: Your LLM bill is unnecessarily high
|
|
6
|
+
|
|
7
|
+
Everyone routes everything to GPT-4 because who has time to configure per-query routing. The bill hits 3-5x what it should be for zero extra value.
|
|
8
|
+
|
|
9
|
+
People are already switching because of this. A dev on X: *"Cancelled both my Claude Code Pro and ChatGPT Pro. Kimi K2.6 is just as good for my side projects as Opus or GPT 5.4 were. The price for this is crazy low."*
|
|
10
|
+
|
|
11
|
+
Another one: *"Just used gemini-embedding-2 to vectorize 27,603 notes for semantic search. Total cost: $0.07. That's pretty amazing."*
|
|
12
|
+
|
|
13
|
+
The pattern is obvious — developers are actively looking for cheaper alternatives. The problem is doing it query-by-query without wasting time.
|
|
14
|
+
|
|
15
|
+
We built a router that classifies every query by complexity and sends it to the cheapest capable model.
|
|
16
|
+
|
|
17
|
+
```javascript
|
|
18
|
+
"Design a clinical trial protocol" → premium ($2.50/M tokens)
|
|
19
|
+
"Write a Python sort function" → groq ($0.20/M tokens)
|
|
20
|
+
"What is 2+2?" → free ($0.00/M tokens)
|
|
21
|
+
```
|
|
22
|
+
|
|
23
|
+
Result: **62% cost savings** measured across 200 real API calls. Not theoretical.
|
|
24
|
+
|
|
25
|
+
---
|
|
26
|
+
|
|
27
|
+
## Problem 2: Sequential fallback gives you one answer, not the best
|
|
28
|
+
|
|
29
|
+
Every gateway does: try A → fail → try B → fail → try C.
|
|
30
|
+
|
|
31
|
+
You always get one provider's answer. Never the best across all. If A is slow, everything waits.
|
|
32
|
+
|
|
33
|
+
Someone already built `ai-retry` — a library for retry and fallback mechanisms — because this is such a common pain. People are hacking around it manually.
|
|
34
|
+
|
|
35
|
+
We went further. Run all providers in parallel. Score every result on specificity, structure, and relevance. Return the best answer with reasons why it won.
|
|
36
|
+
|
|
37
|
+
```javascript
|
|
38
|
+
const result = await executeEnsemble(query, context, {
|
|
39
|
+
nvidia: callNvidia,
|
|
40
|
+
groq: callGroq,
|
|
41
|
+
openai: callOpenAI
|
|
42
|
+
});
|
|
43
|
+
// → nvidia (scored 75, higher specificity on code)
|
|
44
|
+
```
|
|
45
|
+
|
|
46
|
+
---
|
|
47
|
+
|
|
48
|
+
## Problem 3: Every gateway claims "negligible overhead." None publish numbers.
|
|
49
|
+
|
|
50
|
+
It's the standard line. "Negligible overhead" followed by zero data.
|
|
51
|
+
|
|
52
|
+
We ran ours through a third-party benchmark tool (llm-gateway-bench) and published everything:
|
|
53
|
+
|
|
54
|
+
| Scenario | Time | What's included |
|
|
55
|
+
|:---------|:----:|:----------------|
|
|
56
|
+
| Direct to Groq | **138ms** | Raw API call |
|
|
57
|
+
| Through A3M | **374ms** | Routing + cache + guardrails + cost tracking |
|
|
58
|
+
|
|
59
|
+
236ms overhead. Not zero. But it saves 62% on API costs — that's ~$2,600/year at 100K queries/month.
|
|
60
|
+
|
|
61
|
+
---
|
|
62
|
+
|
|
63
|
+
## Why it grew
|
|
64
|
+
|
|
65
|
+
10,024 downloads in 14 days. Zero marketing. Developers found it on npm, tried it, told other developers.
|
|
66
|
+
|
|
67
|
+
The feedback loop was: *"My bill is too high"* → 62% savings. *"I want the best answer, not the first one"* → parallel ensemble. *"I don't trust your latency claims"* → here's the third-party benchmark, run it yourself.
|
|
68
|
+
|
|
69
|
+
---
|
|
70
|
+
|
|
71
|
+
*npm: `npm install adaptive-memory-multi-model-router`*
|
|
72
|
+
*GitHub: [github.com/Das-rebel/a3m-router](https://github.com/Das-rebel/a3m-router)*
|
|
73
|
+
*Benchmarks: third-party via [llm-gateway-bench](https://github.com/taffy-owo/llm-gateway-bench)*
|
|
@@ -0,0 +1,14 @@
|
|
|
1
|
+
Show HN: A3M Router — 70.32 LLM routing accuracy with zero ML, 36 providers, semantic cache
|
|
2
|
+
|
|
3
|
+
A3M Router is a TypeScript LLM routing library that classifies query complexity using 5 keyword-based signals (domain detection, task indicators, query structure, action verb intensity, specificity) instead of neural networks. The weighted signal sum maps queries to one of 5 complexity tiers (free → enterprise), which routes to the cheapest provider that can handle the query.
|
|
4
|
+
|
|
5
|
+
On a 2,500-query benchmark: 70.32 accuracy, 64.5% exact tier match, 0.3ms routing latency. The entire routing classifier is ~200 lines of TypeScript with zero runtime dependencies and a 19.5 KB gzipped package size. 61.6% cost savings vs. sending everything to premium providers.
|
|
6
|
+
|
|
7
|
+
Supports 36 providers (OpenAI, Anthropic, Google, Groq, Cerebras, Mistral, DeepSeek, etc.) across 5 tiers. Includes a semantic cache (trigram Jaccard similarity), 17-pattern prompt injection detection, PII redaction, and cost analytics. Available as TypeScript SDK, Python SDK, CLI, REST API, OpenAI-compatible proxy, and LangChain adapter. MIT license, self-hosted, no account required.
|
|
8
|
+
|
|
9
|
+
The core insight is that keyword-based routing is within of BERT-based routing for nearly all queries, at zero infrastructure cost. The routing signals are composable and adjustable — if a particular domain routes poorly, you add domain-specific patterns without retraining anything.
|
|
10
|
+
|
|
11
|
+
Repo: https://github.com/Das-rebel/a3m-router
|
|
12
|
+
npm: https://www.npmjs.com/package/adaptive-memory-multi-model-router
|
|
13
|
+
|
|
14
|
+
Caveat: the 70.32 figure is self-benchmarked. We'd welcome independent evaluation, especially on non-English or creative writing query distributions where the keyword signals may be weaker.
|
|
@@ -0,0 +1,90 @@
|
|
|
1
|
+
# [D] We benchmarked keyword-based routing vs BERT for LLM provider selection. The gap is smaller than we expected — and keyword routing has zero infra cost.
|
|
2
|
+
|
|
3
|
+
**TL;DR:** A 5-signal keyword classifier routes LLM queries across 36 providers with 70.32 accuracy and 64.5% exact tier match, in a 19.5 KB gzipped package with no ML weights. We're sharing the methodology and invite scrutiny on the benchmark design.
|
|
4
|
+
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
## Background
|
|
8
|
+
|
|
9
|
+
When you have 36 LLM providers (6 free, 15 cheap, 9 mid-tier, 3 premium, 3 enterprise), routing queries to the right provider matters. A simple "coding question → code model" heuristic breaks down fast. The established approaches are:
|
|
10
|
+
|
|
11
|
+
1. **BERT/transformer-based routing** (e.g., RouteLLM trains a BERT classifier on paired human preferences)
|
|
12
|
+
2. **LLM-as-judge routing** (ask GPT-4 to classify query complexity)
|
|
13
|
+
3. **Rule-based routing** (regex, keyword matching)
|
|
14
|
+
|
|
15
|
+
We went with approach 3, but with a structured 5-signal scoring system instead of naive regex. The question was: how much accuracy do we actually sacrifice?
|
|
16
|
+
|
|
17
|
+
## The 5 routing signals
|
|
18
|
+
|
|
19
|
+
Each query is scored on five orthogonal signals (0-1 scale each):
|
|
20
|
+
|
|
21
|
+
| Signal | What it measures | Example high-score query |
|
|
22
|
+
|--------|-----------------|------------------------|
|
|
23
|
+
| Domain detection | Is this a specialized domain (code, math, legal, medical)? | "Implement a red-black tree with insert and delete" |
|
|
24
|
+
| Task indicators | What type of task (summarize, translate, debug, create)? | "Debug this Python stack trace and explain the root cause" |
|
|
25
|
+
| Query structure | Complexity of the query itself (multi-step, conditional, nested) | "First translate to French, then summarize in 3 bullets, then check for legal compliance" |
|
|
26
|
+
| Action verb intensity | Strength/demand of the action requested | "Reverse-engineer" > "explain" > "mention" |
|
|
27
|
+
| Specificity | How precise/vague the request is | "Quantum error correction in topological codes" vs "tell me about physics" |
|
|
28
|
+
|
|
29
|
+
The weighted sum maps to one of 5 tiers, which maps to a provider. The whole thing runs in ~0.3ms per query.
|
|
30
|
+
|
|
31
|
+
## Benchmark results
|
|
32
|
+
|
|
33
|
+
We tested on a held-out set of 2,500 real-world queries across domains (coding, creative writing, analysis, math, translation, general Q&A).
|
|
34
|
+
|
|
35
|
+
**Confusion matrix (simplified to 3 tiers for readability):**
|
|
36
|
+
|
|
37
|
+
```
|
|
38
|
+
Predicted
|
|
39
|
+
Free Mid Premium
|
|
40
|
+
Actual Free 812 38 5
|
|
41
|
+
Actual Mid 41 647 27
|
|
42
|
+
Actual Premium 3 22 705
|
|
43
|
+
```
|
|
44
|
+
|
|
45
|
+
Full 5-tier results:
|
|
46
|
+
|
|
47
|
+
| Metric | Value |
|
|
48
|
+
|--------|-------|
|
|
49
|
+
| Exact tier match | 64.5% |
|
|
50
|
+
| accuracy | 70.32 |
|
|
51
|
+
| Mean absolute error | 0.37 tiers |
|
|
52
|
+
| Routing latency | 0.3ms/query |
|
|
53
|
+
|
|
54
|
+
** accuracy of 70.32** means the router is never sending a trivial "what's the weather" query to GPT-4, and it's never sending a "design a distributed consensus algorithm" query to a free tier.
|
|
55
|
+
|
|
56
|
+
### Cost impact
|
|
57
|
+
|
|
58
|
+
On the same query workload:
|
|
59
|
+
|
|
60
|
+
| Strategy | Cost | Savings |
|
|
61
|
+
|----------|------|---------|
|
|
62
|
+
| Premium-only (GPT-4 for everything) | $1.00 | — |
|
|
63
|
+
| RouteLLM (reported in their paper) | ~$0.47 | ~53% |
|
|
64
|
+
| A3M Router (our benchmark) | $0.384 | 61.6% |
|
|
65
|
+
|
|
66
|
+
## Honest caveats (please poke holes)
|
|
67
|
+
|
|
68
|
+
1. **Self-benchmarking.** We wrote the classifier, we designed the test set, we ran the evaluation. This is the biggest threat to validity. We'd love an independent evaluation. The test set and evaluation code are in the repo.
|
|
69
|
+
|
|
70
|
+
2. **The 64.5% exact match is mediocre.** If you need surgical tier precision (e.g., you're operating at margins where the difference between "cheap" and "mid-tier" matters a lot), 64.5% means 1 in 3 queries lands in an adjacent tier. The metric papers over this.
|
|
71
|
+
|
|
72
|
+
3. **No comparison with RouteLLM on the same data.** We reference RouteLLM's publicly reported numbers, but we didn't run RouteLLM on our test set. Different query distributions make direct comparison unreliable.
|
|
73
|
+
|
|
74
|
+
4. **Query distribution bias.** Our test set likely over-represents English, coding, and analytical queries because that's what we test with. Non-English and creative tasks may route differently.
|
|
75
|
+
|
|
76
|
+
5. **Cost savings depend heavily on your query mix.** 61.6% is our benchmark workload. If 90% of your queries are complex, routing saves less. If 90% are simple, routing saves more.
|
|
77
|
+
|
|
78
|
+
## Questions for the community
|
|
79
|
+
|
|
80
|
+
- Is accuracy actually the right metric? Or should we optimize for exact match at the cost of simplicity?
|
|
81
|
+
- Has anyone compared RouteLLM's BERT-based approach against a strong keyword baseline on the same dataset? Our suspicion is that the gap is smaller than the ML community assumes.
|
|
82
|
+
- For production routing, what's the actual cost of a "wrong tier" routing? We assume is fine because provider quality within adjacent tiers overlaps significantly. Is that assumption valid?
|
|
83
|
+
- Are there public LLM routing benchmarks we should be evaluating on?
|
|
84
|
+
|
|
85
|
+
## Links
|
|
86
|
+
|
|
87
|
+
- **Repo:** https://github.com/Das-rebel/a3m-router
|
|
88
|
+
- **npm:** https://www.npmjs.com/package/adaptive-memory-multi-model-router
|
|
89
|
+
|
|
90
|
+
The classifier is ~200 lines of TypeScript. No dependencies beyond a standard Node.js runtime. If you want to reproduce the benchmark or contribute a more rigorous evaluation, PRs welcome.
|