agentic-flow 1.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude/agents/MIGRATION_SUMMARY.md +222 -0
- package/.claude/agents/README.md +89 -0
- package/.claude/agents/analysis/code-analyzer.md +209 -0
- package/.claude/agents/analysis/code-review/analyze-code-quality.md +180 -0
- package/.claude/agents/architecture/system-design/arch-system-design.md +156 -0
- package/.claude/agents/base-template-generator.md +42 -0
- package/.claude/agents/consensus/README.md +253 -0
- package/.claude/agents/consensus/byzantine-coordinator.md +63 -0
- package/.claude/agents/consensus/crdt-synchronizer.md +997 -0
- package/.claude/agents/consensus/gossip-coordinator.md +63 -0
- package/.claude/agents/consensus/performance-benchmarker.md +851 -0
- package/.claude/agents/consensus/quorum-manager.md +823 -0
- package/.claude/agents/consensus/raft-manager.md +63 -0
- package/.claude/agents/consensus/security-manager.md +622 -0
- package/.claude/agents/core/coder.md +211 -0
- package/.claude/agents/core/planner.md +116 -0
- package/.claude/agents/core/researcher.md +136 -0
- package/.claude/agents/core/reviewer.md +272 -0
- package/.claude/agents/core/tester.md +266 -0
- package/.claude/agents/data/ml/data-ml-model.md +193 -0
- package/.claude/agents/development/backend/dev-backend-api.md +142 -0
- package/.claude/agents/devops/ci-cd/ops-cicd-github.md +164 -0
- package/.claude/agents/documentation/api-docs/docs-api-openapi.md +174 -0
- package/.claude/agents/flow-nexus/app-store.md +88 -0
- package/.claude/agents/flow-nexus/authentication.md +69 -0
- package/.claude/agents/flow-nexus/challenges.md +81 -0
- package/.claude/agents/flow-nexus/neural-network.md +88 -0
- package/.claude/agents/flow-nexus/payments.md +83 -0
- package/.claude/agents/flow-nexus/sandbox.md +76 -0
- package/.claude/agents/flow-nexus/swarm.md +76 -0
- package/.claude/agents/flow-nexus/user-tools.md +96 -0
- package/.claude/agents/flow-nexus/workflow.md +84 -0
- package/.claude/agents/github/code-review-swarm.md +538 -0
- package/.claude/agents/github/github-modes.md +173 -0
- package/.claude/agents/github/issue-tracker.md +319 -0
- package/.claude/agents/github/multi-repo-swarm.md +553 -0
- package/.claude/agents/github/pr-manager.md +191 -0
- package/.claude/agents/github/project-board-sync.md +509 -0
- package/.claude/agents/github/release-manager.md +367 -0
- package/.claude/agents/github/release-swarm.md +583 -0
- package/.claude/agents/github/repo-architect.md +398 -0
- package/.claude/agents/github/swarm-issue.md +573 -0
- package/.claude/agents/github/swarm-pr.md +428 -0
- package/.claude/agents/github/sync-coordinator.md +452 -0
- package/.claude/agents/github/workflow-automation.md +635 -0
- package/.claude/agents/goal/agent.md +816 -0
- package/.claude/agents/goal/goal-planner.md +73 -0
- package/.claude/agents/optimization/README.md +250 -0
- package/.claude/agents/optimization/benchmark-suite.md +665 -0
- package/.claude/agents/optimization/load-balancer.md +431 -0
- package/.claude/agents/optimization/performance-monitor.md +672 -0
- package/.claude/agents/optimization/resource-allocator.md +674 -0
- package/.claude/agents/optimization/topology-optimizer.md +808 -0
- package/.claude/agents/payments/agentic-payments.md +126 -0
- package/.claude/agents/sparc/architecture.md +472 -0
- package/.claude/agents/sparc/pseudocode.md +318 -0
- package/.claude/agents/sparc/refinement.md +525 -0
- package/.claude/agents/sparc/specification.md +276 -0
- package/.claude/agents/specialized/mobile/spec-mobile-react-native.md +226 -0
- package/.claude/agents/sublinear/consensus-coordinator.md +338 -0
- package/.claude/agents/sublinear/matrix-optimizer.md +185 -0
- package/.claude/agents/sublinear/pagerank-analyzer.md +299 -0
- package/.claude/agents/sublinear/performance-optimizer.md +368 -0
- package/.claude/agents/sublinear/trading-predictor.md +246 -0
- package/.claude/agents/swarm/README.md +190 -0
- package/.claude/agents/swarm/adaptive-coordinator.md +396 -0
- package/.claude/agents/swarm/hierarchical-coordinator.md +256 -0
- package/.claude/agents/swarm/mesh-coordinator.md +392 -0
- package/.claude/agents/templates/automation-smart-agent.md +205 -0
- package/.claude/agents/templates/coordinator-swarm-init.md +90 -0
- package/.claude/agents/templates/github-pr-manager.md +177 -0
- package/.claude/agents/templates/implementer-sparc-coder.md +259 -0
- package/.claude/agents/templates/memory-coordinator.md +187 -0
- package/.claude/agents/templates/migration-plan.md +746 -0
- package/.claude/agents/templates/orchestrator-task.md +139 -0
- package/.claude/agents/templates/performance-analyzer.md +199 -0
- package/.claude/agents/templates/sparc-coordinator.md +183 -0
- package/.claude/agents/test-neural.md +14 -0
- package/.claude/agents/testing/unit/tdd-london-swarm.md +244 -0
- package/.claude/agents/testing/validation/production-validator.md +395 -0
- package/.claude/commands/agents/README.md +10 -0
- package/.claude/commands/agents/agent-capabilities.md +21 -0
- package/.claude/commands/agents/agent-coordination.md +28 -0
- package/.claude/commands/agents/agent-spawning.md +28 -0
- package/.claude/commands/agents/agent-types.md +26 -0
- package/.claude/commands/analysis/COMMAND_COMPLIANCE_REPORT.md +54 -0
- package/.claude/commands/analysis/README.md +9 -0
- package/.claude/commands/analysis/bottleneck-detect.md +162 -0
- package/.claude/commands/analysis/performance-bottlenecks.md +59 -0
- package/.claude/commands/analysis/performance-report.md +25 -0
- package/.claude/commands/analysis/token-efficiency.md +45 -0
- package/.claude/commands/analysis/token-usage.md +25 -0
- package/.claude/commands/automation/README.md +9 -0
- package/.claude/commands/automation/auto-agent.md +122 -0
- package/.claude/commands/automation/self-healing.md +106 -0
- package/.claude/commands/automation/session-memory.md +90 -0
- package/.claude/commands/automation/smart-agents.md +73 -0
- package/.claude/commands/automation/smart-spawn.md +25 -0
- package/.claude/commands/automation/workflow-select.md +25 -0
- package/.claude/commands/claude-flow-help.md +103 -0
- package/.claude/commands/claude-flow-memory.md +107 -0
- package/.claude/commands/claude-flow-swarm.md +205 -0
- package/.claude/commands/coordination/README.md +9 -0
- package/.claude/commands/coordination/agent-spawn.md +25 -0
- package/.claude/commands/coordination/init.md +44 -0
- package/.claude/commands/coordination/orchestrate.md +43 -0
- package/.claude/commands/coordination/spawn.md +45 -0
- package/.claude/commands/coordination/swarm-init.md +85 -0
- package/.claude/commands/coordination/task-orchestrate.md +25 -0
- package/.claude/commands/flow-nexus/app-store.md +124 -0
- package/.claude/commands/flow-nexus/challenges.md +120 -0
- package/.claude/commands/flow-nexus/login-registration.md +65 -0
- package/.claude/commands/flow-nexus/neural-network.md +134 -0
- package/.claude/commands/flow-nexus/payments.md +116 -0
- package/.claude/commands/flow-nexus/sandbox.md +83 -0
- package/.claude/commands/flow-nexus/swarm.md +87 -0
- package/.claude/commands/flow-nexus/user-tools.md +152 -0
- package/.claude/commands/flow-nexus/workflow.md +115 -0
- package/.claude/commands/github/README.md +11 -0
- package/.claude/commands/github/code-review-swarm.md +514 -0
- package/.claude/commands/github/code-review.md +25 -0
- package/.claude/commands/github/github-modes.md +147 -0
- package/.claude/commands/github/github-swarm.md +121 -0
- package/.claude/commands/github/issue-tracker.md +292 -0
- package/.claude/commands/github/issue-triage.md +25 -0
- package/.claude/commands/github/multi-repo-swarm.md +519 -0
- package/.claude/commands/github/pr-enhance.md +26 -0
- package/.claude/commands/github/pr-manager.md +170 -0
- package/.claude/commands/github/project-board-sync.md +471 -0
- package/.claude/commands/github/release-manager.md +338 -0
- package/.claude/commands/github/release-swarm.md +544 -0
- package/.claude/commands/github/repo-analyze.md +25 -0
- package/.claude/commands/github/repo-architect.md +367 -0
- package/.claude/commands/github/swarm-issue.md +482 -0
- package/.claude/commands/github/swarm-pr.md +285 -0
- package/.claude/commands/github/sync-coordinator.md +301 -0
- package/.claude/commands/github/workflow-automation.md +442 -0
- package/.claude/commands/hive-mind/README.md +17 -0
- package/.claude/commands/hive-mind/hive-mind-consensus.md +8 -0
- package/.claude/commands/hive-mind/hive-mind-init.md +18 -0
- package/.claude/commands/hive-mind/hive-mind-memory.md +8 -0
- package/.claude/commands/hive-mind/hive-mind-metrics.md +8 -0
- package/.claude/commands/hive-mind/hive-mind-resume.md +8 -0
- package/.claude/commands/hive-mind/hive-mind-sessions.md +8 -0
- package/.claude/commands/hive-mind/hive-mind-spawn.md +21 -0
- package/.claude/commands/hive-mind/hive-mind-status.md +8 -0
- package/.claude/commands/hive-mind/hive-mind-stop.md +8 -0
- package/.claude/commands/hive-mind/hive-mind-wizard.md +8 -0
- package/.claude/commands/hive-mind/hive-mind.md +27 -0
- package/.claude/commands/hooks/README.md +11 -0
- package/.claude/commands/hooks/overview.md +58 -0
- package/.claude/commands/hooks/post-edit.md +117 -0
- package/.claude/commands/hooks/post-task.md +112 -0
- package/.claude/commands/hooks/pre-edit.md +113 -0
- package/.claude/commands/hooks/pre-task.md +111 -0
- package/.claude/commands/hooks/session-end.md +118 -0
- package/.claude/commands/hooks/setup.md +103 -0
- package/.claude/commands/memory/README.md +9 -0
- package/.claude/commands/memory/memory-persist.md +25 -0
- package/.claude/commands/memory/memory-search.md +25 -0
- package/.claude/commands/memory/memory-usage.md +25 -0
- package/.claude/commands/memory/neural.md +47 -0
- package/.claude/commands/memory/usage.md +46 -0
- package/.claude/commands/monitoring/README.md +9 -0
- package/.claude/commands/monitoring/agent-metrics.md +25 -0
- package/.claude/commands/monitoring/agents.md +44 -0
- package/.claude/commands/monitoring/real-time-view.md +25 -0
- package/.claude/commands/monitoring/status.md +46 -0
- package/.claude/commands/monitoring/swarm-monitor.md +25 -0
- package/.claude/commands/optimization/README.md +9 -0
- package/.claude/commands/optimization/auto-topology.md +62 -0
- package/.claude/commands/optimization/cache-manage.md +25 -0
- package/.claude/commands/optimization/parallel-execute.md +25 -0
- package/.claude/commands/optimization/parallel-execution.md +50 -0
- package/.claude/commands/optimization/topology-optimize.md +25 -0
- package/.claude/commands/pair/README.md +261 -0
- package/.claude/commands/pair/commands.md +546 -0
- package/.claude/commands/pair/config.md +510 -0
- package/.claude/commands/pair/examples.md +512 -0
- package/.claude/commands/pair/modes.md +348 -0
- package/.claude/commands/pair/session.md +407 -0
- package/.claude/commands/pair/start.md +209 -0
- package/.claude/commands/sparc/analyzer.md +52 -0
- package/.claude/commands/sparc/architect.md +53 -0
- package/.claude/commands/sparc/ask.md +97 -0
- package/.claude/commands/sparc/batch-executor.md +54 -0
- package/.claude/commands/sparc/code.md +89 -0
- package/.claude/commands/sparc/coder.md +54 -0
- package/.claude/commands/sparc/debug.md +83 -0
- package/.claude/commands/sparc/debugger.md +54 -0
- package/.claude/commands/sparc/designer.md +53 -0
- package/.claude/commands/sparc/devops.md +109 -0
- package/.claude/commands/sparc/docs-writer.md +80 -0
- package/.claude/commands/sparc/documenter.md +54 -0
- package/.claude/commands/sparc/innovator.md +54 -0
- package/.claude/commands/sparc/integration.md +83 -0
- package/.claude/commands/sparc/mcp.md +117 -0
- package/.claude/commands/sparc/memory-manager.md +54 -0
- package/.claude/commands/sparc/optimizer.md +54 -0
- package/.claude/commands/sparc/orchestrator.md +132 -0
- package/.claude/commands/sparc/post-deployment-monitoring-mode.md +83 -0
- package/.claude/commands/sparc/refinement-optimization-mode.md +83 -0
- package/.claude/commands/sparc/researcher.md +54 -0
- package/.claude/commands/sparc/reviewer.md +54 -0
- package/.claude/commands/sparc/security-review.md +80 -0
- package/.claude/commands/sparc/sparc-modes.md +174 -0
- package/.claude/commands/sparc/sparc.md +111 -0
- package/.claude/commands/sparc/spec-pseudocode.md +80 -0
- package/.claude/commands/sparc/supabase-admin.md +348 -0
- package/.claude/commands/sparc/swarm-coordinator.md +54 -0
- package/.claude/commands/sparc/tdd.md +54 -0
- package/.claude/commands/sparc/tester.md +54 -0
- package/.claude/commands/sparc/tutorial.md +79 -0
- package/.claude/commands/sparc/workflow-manager.md +54 -0
- package/.claude/commands/sparc.md +166 -0
- package/.claude/commands/stream-chain/pipeline.md +121 -0
- package/.claude/commands/stream-chain/run.md +70 -0
- package/.claude/commands/swarm/README.md +15 -0
- package/.claude/commands/swarm/analysis.md +95 -0
- package/.claude/commands/swarm/development.md +96 -0
- package/.claude/commands/swarm/examples.md +168 -0
- package/.claude/commands/swarm/maintenance.md +102 -0
- package/.claude/commands/swarm/optimization.md +117 -0
- package/.claude/commands/swarm/research.md +136 -0
- package/.claude/commands/swarm/swarm-analysis.md +8 -0
- package/.claude/commands/swarm/swarm-background.md +8 -0
- package/.claude/commands/swarm/swarm-init.md +19 -0
- package/.claude/commands/swarm/swarm-modes.md +8 -0
- package/.claude/commands/swarm/swarm-monitor.md +8 -0
- package/.claude/commands/swarm/swarm-spawn.md +19 -0
- package/.claude/commands/swarm/swarm-status.md +8 -0
- package/.claude/commands/swarm/swarm-strategies.md +8 -0
- package/.claude/commands/swarm/swarm.md +27 -0
- package/.claude/commands/swarm/testing.md +131 -0
- package/.claude/commands/training/README.md +9 -0
- package/.claude/commands/training/model-update.md +25 -0
- package/.claude/commands/training/neural-patterns.md +74 -0
- package/.claude/commands/training/neural-train.md +25 -0
- package/.claude/commands/training/pattern-learn.md +25 -0
- package/.claude/commands/training/specialization.md +63 -0
- package/.claude/commands/truth/start.md +143 -0
- package/.claude/commands/verify/check.md +50 -0
- package/.claude/commands/verify/start.md +128 -0
- package/.claude/commands/workflows/README.md +9 -0
- package/.claude/commands/workflows/development.md +78 -0
- package/.claude/commands/workflows/research.md +63 -0
- package/.claude/commands/workflows/workflow-create.md +25 -0
- package/.claude/commands/workflows/workflow-execute.md +25 -0
- package/.claude/commands/workflows/workflow-export.md +25 -0
- package/.claude/helpers/checkpoint-manager.sh +251 -0
- package/.claude/helpers/github-safe.js +106 -0
- package/.claude/helpers/github-setup.sh +28 -0
- package/.claude/helpers/quick-start.sh +19 -0
- package/.claude/helpers/setup-mcp.sh +18 -0
- package/.claude/helpers/standard-checkpoint-hooks.sh +179 -0
- package/.claude/mcp.json +13 -0
- package/.claude/settings-backup.json +130 -0
- package/.claude/settings-optimized.json +116 -0
- package/.claude/settings-simple.json +78 -0
- package/.claude/settings.json +114 -0
- package/.claude/settings.local.json +14 -0
- package/README.md +1280 -0
- package/dist/agents/claudeAgent.js +73 -0
- package/dist/agents/claudeFlowAgent.js +115 -0
- package/dist/agents/codeReviewAgent.js +34 -0
- package/dist/agents/dataAgent.js +34 -0
- package/dist/agents/directApiAgent.js +260 -0
- package/dist/agents/webResearchAgent.js +35 -0
- package/dist/cli/mcp.js +135 -0
- package/dist/cli-proxy.js +246 -0
- package/dist/cli.js +158 -0
- package/dist/config/claudeFlow.js +67 -0
- package/dist/config/tools.js +33 -0
- package/dist/coordination/parallelSwarm.js +226 -0
- package/dist/examples/multi-agent-orchestration.js +45 -0
- package/dist/examples/parallel-swarm-deployment.js +171 -0
- package/dist/examples/use-goal-planner.js +52 -0
- package/dist/health.js +46 -0
- package/dist/index-with-proxy.js +101 -0
- package/dist/index.js +167 -0
- package/dist/mcp/claudeFlowSdkServer.js +202 -0
- package/dist/mcp/fastmcp/servers/claude-flow-sdk.js +198 -0
- package/dist/mcp/fastmcp/servers/http-streaming-updated.js +421 -0
- package/dist/mcp/fastmcp/servers/poc-stdio.js +82 -0
- package/dist/mcp/fastmcp/servers/stdio-full.js +421 -0
- package/dist/mcp/fastmcp/tools/agent/add-agent.js +107 -0
- package/dist/mcp/fastmcp/tools/agent/add-command.js +117 -0
- package/dist/mcp/fastmcp/tools/agent/execute.js +56 -0
- package/dist/mcp/fastmcp/tools/agent/list.js +82 -0
- package/dist/mcp/fastmcp/tools/agent/parallel.js +63 -0
- package/dist/mcp/fastmcp/tools/memory/retrieve.js +38 -0
- package/dist/mcp/fastmcp/tools/memory/search.js +41 -0
- package/dist/mcp/fastmcp/tools/memory/store.js +56 -0
- package/dist/mcp/fastmcp/tools/swarm/init.js +41 -0
- package/dist/mcp/fastmcp/tools/swarm/orchestrate.js +47 -0
- package/dist/mcp/fastmcp/tools/swarm/spawn.js +40 -0
- package/dist/mcp/fastmcp/types/index.js +2 -0
- package/dist/proxy/anthropic-to-openrouter.js +246 -0
- package/dist/router/providers/anthropic.js +89 -0
- package/dist/router/providers/onnx-local-optimized.js +167 -0
- package/dist/router/providers/onnx-local.js +294 -0
- package/dist/router/providers/onnx-phi4.js +190 -0
- package/dist/router/providers/onnx.js +242 -0
- package/dist/router/providers/openrouter.js +242 -0
- package/dist/router/router.js +283 -0
- package/dist/router/test-integration.js +140 -0
- package/dist/router/test-onnx-benchmark.js +145 -0
- package/dist/router/test-onnx-integration.js +128 -0
- package/dist/router/test-onnx-local.js +37 -0
- package/dist/router/test-onnx.js +148 -0
- package/dist/router/test-openrouter.js +121 -0
- package/dist/router/test-phi4.js +137 -0
- package/dist/router/types.js +2 -0
- package/dist/utils/agentLoader.js +106 -0
- package/dist/utils/cli.js +128 -0
- package/dist/utils/logger.js +41 -0
- package/dist/utils/mcpCommands.js +214 -0
- package/dist/utils/model-downloader.js +182 -0
- package/dist/utils/retry.js +54 -0
- package/docs/.claude-flow/metrics/agent-metrics.json +1 -0
- package/docs/.claude-flow/metrics/performance.json +9 -0
- package/docs/.claude-flow/metrics/task-metrics.json +10 -0
- package/docs/CHANGELOG.md +155 -0
- package/docs/CLAUDE.md +352 -0
- package/docs/COMPLETE_VALIDATION_SUMMARY.md +405 -0
- package/docs/INDEX.md +183 -0
- package/docs/LICENSE +21 -0
- package/docs/ONNX_CLI_USAGE.md +344 -0
- package/docs/ONNX_ENV_VARS.md +564 -0
- package/docs/ONNX_INTEGRATION.md +422 -0
- package/docs/ONNX_OPTIMIZATION_GUIDE.md +665 -0
- package/docs/ONNX_OPTIMIZATION_SUMMARY.md +374 -0
- package/docs/ONNX_VS_CLAUDE_QUALITY.md +442 -0
- package/docs/OPENROUTER_DEPLOYMENT.md +495 -0
- package/docs/architecture/EXECUTIVE_SUMMARY.md +310 -0
- package/docs/architecture/IMPROVEMENT_PLAN.md +11 -0
- package/docs/architecture/INTEGRATION-STATUS.md +290 -0
- package/docs/architecture/MULTI_MODEL_ROUTER_PLAN.md +620 -0
- package/docs/architecture/QUICK_WINS.md +333 -0
- package/docs/architecture/README.md +15 -0
- package/docs/architecture/RESEARCH_SUMMARY.md +652 -0
- package/docs/archived/FASTMCP_COMPLETE.md +428 -0
- package/docs/archived/FASTMCP_INTEGRATION_STATUS.md +288 -0
- package/docs/archived/FLOW-NEXUS-COMPLETE.md +269 -0
- package/docs/archived/INTEGRATION_CONFIRMED.md +351 -0
- package/docs/archived/ONNX_FINAL_REPORT.md +312 -0
- package/docs/archived/ONNX_IMPLEMENTATION_COMPLETE.md +215 -0
- package/docs/archived/ONNX_IMPLEMENTATION_SUMMARY.md +197 -0
- package/docs/archived/ONNX_SUCCESS_REPORT.md +271 -0
- package/docs/archived/OPENROUTER_PROXY_COMPLETE.md +494 -0
- package/docs/archived/PACKAGE-COMPLETE.md +138 -0
- package/docs/archived/README.md +27 -0
- package/docs/archived/RESEARCH_COMPLETE.txt +335 -0
- package/docs/archived/SDK-SETUP-COMPLETE.md +252 -0
- package/docs/guides/ALTERNATIVE_LLM_MODELS.md +524 -0
- package/docs/guides/DOCKER_AGENT_USAGE.md +352 -0
- package/docs/guides/IMPLEMENTATION_EXAMPLES.md +960 -0
- package/docs/guides/NPM-PUBLISH.md +218 -0
- package/docs/guides/README.md +17 -0
- package/docs/guides/agent-sdk.md +234 -0
- package/docs/integrations/CLAUDE_AGENTS_INTEGRATION.md +356 -0
- package/docs/integrations/CLAUDE_FLOW_INTEGRATION.md +535 -0
- package/docs/integrations/FASTMCP_CLI_INTEGRATION.md +503 -0
- package/docs/integrations/FLOW-NEXUS-INTEGRATION.md +319 -0
- package/docs/integrations/README.md +18 -0
- package/docs/integrations/fastmcp-implementation-plan.md +2516 -0
- package/docs/integrations/fastmcp-poc-integration.md +198 -0
- package/docs/router/ONNX_PHI4_RESEARCH.md +220 -0
- package/docs/router/ONNX_RUNTIME_INTEGRATION_PLAN.md +866 -0
- package/docs/router/PHI4_HYPEROPTIMIZATION_PLAN.md +2488 -0
- package/docs/router/README.md +552 -0
- package/docs/router/ROUTER_CONFIG_REFERENCE.md +577 -0
- package/docs/router/ROUTER_USER_GUIDE.md +865 -0
- package/docs/validation/DOCKER_MCP_VALIDATION.md +358 -0
- package/docs/validation/DOCKER_OPENROUTER_VALIDATION.md +443 -0
- package/docs/validation/FINAL_SYSTEM_VALIDATION.md +458 -0
- package/docs/validation/FINAL_VALIDATION_SUMMARY.md +409 -0
- package/docs/validation/MCP_CLI_TOOLS_VALIDATION.md +266 -0
- package/docs/validation/MODEL_VALIDATION_REPORT.md +386 -0
- package/docs/validation/OPENROUTER_VALIDATION_COMPLETE.md +382 -0
- package/docs/validation/README.md +20 -0
- package/docs/validation/ROUTER_VALIDATION.md +311 -0
- package/package.json +140 -0
@@ -0,0 +1,866 @@
# ONNX Runtime Integration Plan for Agentic-Flow

## Executive Summary

Integrate ONNX Runtime to enable high-performance local model inference on both CPU and GPU, providing 2-100x speedup over standard inference and enabling privacy-focused, cost-free local execution of AI models including Microsoft Phi-3.

## 🎯 Objectives

1. **Performance**: Achieve 2-100x inference speedup using ONNX Runtime optimizations
2. **Hardware Flexibility**: Support both CPU and GPU execution with automatic provider selection
3. **Cost Reduction**: Enable 100% cost-free inference for local model execution
4. **Privacy**: Provide fully local inference option for sensitive data
5. **Model Support**: Support Microsoft Phi-3 and other ONNX-compatible models

## 📊 Expected Performance Gains

### CPU Optimization
- **WebAssembly + SIMD**: 3.4x performance improvement
- **ONNX Runtime CPU**: 2x average performance gain vs PyTorch/TensorFlow
- **Graph Optimizations**: 47% → 0.5% CPU usage (~99% reduction)
- **Inference Speed**: ~20 tokens/second (Phi-3-medium on Intel i9-10920X)

### GPU Acceleration
- **CUDA Execution Provider**: 10-100x speedup on NVIDIA GPUs
- **TensorRT**: Additional 2-5x optimization on top of CUDA
- **DirectML (Windows)**: Native GPU acceleration on Windows
- **WebGPU**: Browser/Electron GPU acceleration

## 🏗️ Architecture

### Component Overview

```
┌─────────────────────────────────────────────────────┐
│           Agentic-Flow Multi-Model Router           │
│                                                     │
│  ┌────────────┐  ┌──────────────┐  ┌─────────────┐  │
│  │ Anthropic  │  │  OpenRouter  │  │ ONNX Runtime│  │
│  │  Provider  │  │   Provider   │  │  Provider   │  │
│  └────────────┘  └──────────────┘  └─────────────┘  │
│                                           │         │
│                                           ▼         │
│                           ┌────────────────────────┐│
│                           │   Execution Provider   ││
│                           │        Selector        ││
│                           └────────────────────────┘│
│                                      │              │
│        ┌────────────┬────────────┬───┴────────┐     │
│        │            │            │            │     │
│        ▼            ▼            ▼            ▼     │
│   ┌────────┐   ┌─────────┐   ┌───────┐   ┌─────────┐│
│   │  CPU   │   │  CUDA   │   │ WebGPU│   │DirectML ││
│   │ (WASM) │   │(NVIDIA) │   │       │   │(Windows)││
│   └────────┘   └─────────┘   └───────┘   └─────────┘│
└─────────────────────────────────────────────────────┘
                           │
                           ▼
               ┌───────────────────────┐
               │   ONNX Model Store    │
               │                       │
               │ • Phi-3-mini (4K)     │
               │ • Phi-3-medium (128K) │
               │ • Llama-3 ONNX        │
               │ • Custom models       │
               └───────────────────────┘
```

### Data Flow

```
User Request
     │
     ▼
Router (model selection)
     │
     ▼
ONNX Provider
     │
     ├─→ Load ONNX Model (if not cached)
     │
     ├─→ Select Execution Provider
     │   │
     │   ├─→ Probe GPU availability
     │   ├─→ Check CPU capabilities (SIMD, threads)
     │   └─→ Prioritize: CUDA > WebGPU > DirectML > CPU
     │
     ├─→ Create Inference Session
     │   │
     │   ├─→ Apply graph optimizations
     │   ├─→ Configure threading
     │   └─→ Enable SIMD if available
     │
     ├─→ Run Inference
     │   │
     │   ├─→ Tokenize input
     │   ├─→ Execute model
     │   └─→ Decode output
     │
     └─→ Return Response
         │
         ├─→ Update metrics (latency, tokens)
         └─→ Cache model for reuse
```
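
The "Load ONNX Model (if not cached)" and "Cache model for reuse" steps in this flow could be sketched roughly as below. This is a minimal illustration under stated assumptions, not the package's actual implementation: `SessionCache`, `Loader`, and the hit/miss counters are hypothetical names, and the loader is injected so that a real one (presumably wrapping `ort.InferenceSession.create`) can be swapped in.

```typescript
// Hypothetical session cache; names are illustrative and not part of agentic-flow.
type Loader<S> = (modelId: string) => Promise<S>;

interface CacheMetrics {
  hits: number;
  misses: number;
}

class SessionCache<S> {
  private readonly sessions = new Map<string, Promise<S>>();
  private readonly load: Loader<S>;
  readonly metrics: CacheMetrics = { hits: 0, misses: 0 };

  constructor(load: Loader<S>) {
    this.load = load;
  }

  // Returns the cached (possibly still loading) session, or starts one load.
  get(modelId: string): Promise<S> {
    const cached = this.sessions.get(modelId);
    if (cached) {
      this.metrics.hits += 1;
      return cached;
    }
    this.metrics.misses += 1;
    const pending = this.load(modelId).catch((err) => {
      // Do not keep a failed load in the cache; the next call retries.
      this.sessions.delete(modelId);
      throw err;
    });
    this.sessions.set(modelId, pending);
    return pending;
  }
}

// Usage with a fake loader standing in for real model loading.
let loadCount = 0;
const demoCache = new SessionCache(async (id: string) => {
  loadCount += 1;
  return { id };
});
const firstRequest = demoCache.get('phi-3-mini');
const secondRequest = demoCache.get('phi-3-mini'); // reuses the pending load
```

Caching the promise rather than the resolved session means two concurrent requests for the same model share a single load instead of loading it twice.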

## 📦 NPM Packages Required

### Core Dependencies

```json
{
  "dependencies": {
    "onnxruntime-node": "^1.22.0",
    "@xenova/transformers": "^2.6.0",
    "sharp": "^0.32.0"
  },
  "devDependencies": {
    "@types/node": "^20.0.0"
  },
  "optionalDependencies": {
    "onnxruntime-node-gpu": "^1.22.0"
  }
}
```

### Package Descriptions

1. **onnxruntime-node** (Required)
   - Core ONNX Runtime for Node.js
   - CPU execution provider
   - Supports Node.js v16.x+ (recommend v20.x+)
   - Size: ~20MB

2. **onnxruntime-node-gpu** (Optional)
   - GPU acceleration via CUDA
   - Requires NVIDIA GPU + CUDA 11.8 or 12.x
   - Size: ~500MB (includes CUDA libraries)

3. **@xenova/transformers** (Helper)
   - Transformers.js for tokenization
   - Pre/post-processing utilities
   - Model download management

4. **sharp** (Optional)
   - Image processing for vision models
   - Only needed for multimodal support

## 🔧 Implementation Phases

### Phase 1: Core ONNX Provider (Week 1)

**Objective**: Basic ONNX Runtime integration with CPU inference

**Tasks**:
1. Create ONNX provider class (`src/router/providers/onnx.ts`)
2. Implement model loading and caching
3. Add CPU execution provider support
4. Integrate tokenization with Transformers.js
5. Add basic error handling

**Deliverables**:
- `ONNXProvider` class implementing `LLMProvider` interface
- Model download and caching system
- CPU inference working with Phi-3-mini

**Code Structure**:
```typescript
// src/router/providers/onnx.ts
import * as ort from 'onnxruntime-node';
import { AutoTokenizer } from '@xenova/transformers';
// Router types; assumed to be exported from src/router/types.ts
import type { LLMProvider, ProviderConfig, ChatParams, ChatResponse } from '../types';

export class ONNXProvider implements LLMProvider {
  name = 'onnx';
  type = 'onnx' as const;
  supportsStreaming = false; // Phase 3
  supportsTools = false;     // Phase 3
  supportsMCP = false;

  private session: ort.InferenceSession | null = null;
  private tokenizer: any = null;
  private modelPath: string;

  constructor(config: ProviderConfig) {
    this.modelPath = config.models?.default || 'microsoft/Phi-3-mini-4k-instruct-onnx-cpu';
  }

  async chat(params: ChatParams): Promise<ChatResponse> {
    // Initialize session if needed
    if (!this.session) {
      await this.initializeSession();
    }
    if (!this.session) {
      throw new Error('ONNX inference session failed to initialize');
    }

    // Tokenize input
    const inputs = await this.tokenize(params.messages);

    // Run inference
    const outputs = await this.session.run(inputs);

    // Decode output
    const response = await this.decode(outputs);

    return this.formatResponse(response, params.model);
  }

  private async initializeSession(): Promise<void> {
    // Download model if needed
    const modelPath = await this.downloadModel();

    // Create inference session with optimizations
    this.session = await ort.InferenceSession.create(modelPath, {
      executionProviders: ['cpu'],
      graphOptimizationLevel: 'all',
      enableCpuMemArena: true,
      enableMemPattern: true,
      executionMode: 'parallel',
      intraOpNumThreads: 4,
      interOpNumThreads: 2
    });

    // Initialize tokenizer
    this.tokenizer = await AutoTokenizer.from_pretrained(this.modelPath);
  }

  // tokenize(), decode(), formatResponse() and downloadModel() are helper
  // methods omitted from this outline.
}
```

**Testing**:
- Load Phi-3-mini ONNX model
- Run simple inference test
- Measure baseline CPU performance
- Verify memory usage < 2GB
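
The baseline-performance and memory checks in this list could be scripted along the following lines. This is only a sketch under stated assumptions: `Probe`, `summarize`, and `measureBaseline` are hypothetical helpers, the 2 GB budget is taken from the list above, and memory is sampled only before and after the run, which is a rough stand-in for a true peak measurement.

```typescript
// Hypothetical helpers for the Phase 1 baseline checks; all names are illustrative.
interface Probe {
  now: () => number;      // clock in milliseconds
  rssBytes: () => number; // resident memory of the process in bytes
}

interface BaselineReport {
  tokensPerSecond: number;
  rssGB: number;
  withinMemoryBudget: boolean;
}

const BYTES_PER_GB = 1024 ** 3;

// Pure summary step, kept separate so it can be checked without loading a model.
function summarize(
  tokens: number,
  elapsedMs: number,
  rssBytes: number,
  budgetGB = 2
): BaselineReport {
  const rssGB = rssBytes / BYTES_PER_GB;
  return {
    tokensPerSecond: elapsedMs > 0 ? tokens / (elapsedMs / 1000) : 0,
    rssGB,
    withinMemoryBudget: rssGB < budgetGB,
  };
}

// Runs `generate` once, timing it and sampling memory before and after the call.
async function measureBaseline(
  generate: () => Promise<number>, // resolves to the number of tokens produced
  probe: Probe
): Promise<BaselineReport> {
  const startMs = probe.now();
  const rssBefore = probe.rssBytes();
  const tokens = await generate();
  const rssAfter = probe.rssBytes();
  return summarize(tokens, probe.now() - startMs, Math.max(rssBefore, rssAfter));
}
```

In Node.js the probe would typically be wired as `{ now: () => performance.now(), rssBytes: () => process.memoryUsage().rss }`; injecting it keeps the summary logic testable on its own.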

### Phase 2: GPU Acceleration (Week 2)

**Objective**: Add GPU support with automatic provider selection

**Tasks**:
1. Implement execution provider detection
2. Add CUDA execution provider
3. Add DirectML execution provider (Windows)
4. Add WebGPU support (Electron/browser)
5. Implement automatic provider fallback

**Deliverables**:
- GPU detection and capability probing
- Multi-provider support with prioritization
- Automatic fallback chain

**Code Structure**:
|
|
248
|
+
```typescript
|
|
249
|
+
// src/router/providers/onnx.ts (additions)
|
|
250
|
+
export class ONNXProvider implements LLMProvider {
|
|
251
|
+
private async detectExecutionProviders(): Promise<string[]> {
|
|
252
|
+
const providers: string[] = [];
|
|
253
|
+
|
|
254
|
+
// Try CUDA first (NVIDIA GPU)
|
|
255
|
+
if (await this.isCUDAAvailable()) {
|
|
256
|
+
providers.push('cuda');
|
|
257
|
+
console.log('✅ CUDA execution provider available');
|
|
258
|
+
}
|
|
259
|
+
|
|
260
|
+
// Try DirectML (Windows GPU)
|
|
261
|
+
if (process.platform === 'win32' && await this.isDirectMLAvailable()) {
|
|
262
|
+
providers.push('dml');
|
|
263
|
+
console.log('✅ DirectML execution provider available');
|
|
264
|
+
}
|
|
265
|
+
|
|
266
|
+
// Try WebGPU (browser/Electron)
|
|
267
|
+
if (await this.isWebGPUAvailable()) {
|
|
268
|
+
providers.push('webgpu');
|
|
269
|
+
console.log('✅ WebGPU execution provider available');
|
|
270
|
+
}
|
|
271
|
+
|
|
272
|
+
// Always fallback to CPU
|
|
273
|
+
providers.push('cpu');
|
|
274
|
+
console.log('✅ CPU execution provider available');
|
|
275
|
+
|
|
276
|
+
return providers;
|
|
277
|
+
}
|
|
278
|
+
|
|
279
|
+
private async initializeSession(): Promise<void> {
|
|
280
|
+
const modelPath = await this.downloadModel();
|
|
281
|
+
const providers = await this.detectExecutionProviders();
|
|
282
|
+
|
|
283
|
+
this.session = await ort.InferenceSession.create(modelPath, {
|
|
284
|
+
executionProviders: providers,
|
|
285
|
+
graphOptimizationLevel: 'all',
|
|
286
|
+
enableCpuMemArena: true,
|
|
287
|
+
enableMemPattern: true
|
|
288
|
+
});
|
|
289
|
+
|
|
290
|
+
// Log the highest-priority provider that was requested (the onnxruntime JS session object does not report which provider was actually selected)
|
|
291
|
+
const preferredProvider = providers[0];
|
|
292
|
+
console.log(`🚀 Preferred execution provider: ${preferredProvider}`);
|
|
293
|
+
}
|
|
294
|
+
|
|
295
|
+
private async isCUDAAvailable(): Promise<boolean> {
|
|
296
|
+
try {
|
|
297
|
+
// Check if CUDA libraries are available
|
|
298
|
+
const testSession = await ort.InferenceSession.create(
|
|
299
|
+
'path/to/test.onnx',
|
|
300
|
+
{ executionProviders: ['cuda'] }
|
|
301
|
+
);
|
|
302
|
+
await testSession.release(); // free the probe session before reporting success
return true;
|
|
303
|
+
} catch {
|
|
304
|
+
return false;
|
|
305
|
+
}
|
|
306
|
+
}
|
|
307
|
+
}
|
|
308
|
+
```
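
The automatic fallback chain listed in the deliverables can be sketched independently of ONNX Runtime. This is a minimal illustration rather than the plan's implementation: `ProviderProbe` and `buildProviderChain` are hypothetical names, and each probe stands in for a check such as `isCUDAAvailable()`. The sketch assumes that a probe which throws should be treated as unavailable and that `cpu` always closes the chain.

```typescript
// Hypothetical types and names, for illustration only.
type ProviderProbe = {
  name: string;
  isAvailable: () => Promise<boolean>;
};

// Runs the probes in priority order and returns the usable providers,
// always ending with 'cpu' so the chain is never empty.
async function buildProviderChain(probes: ProviderProbe[]): Promise<string[]> {
  const chain: string[] = [];
  for (const probe of probes) {
    try {
      if (await probe.isAvailable()) {
        chain.push(probe.name);
      }
    } catch {
      // A probe that throws (missing driver, missing library) counts as unavailable.
    }
  }
  if (chain.indexOf('cpu') === -1) {
    chain.push('cpu');
  }
  return chain;
}
```

Passing the probes in priority order (CUDA, DirectML, WebGPU) gives the same ordering as `detectExecutionProviders()` above.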
|
|
309
|
+
|
|
310
|
+
**Testing**:
|
|
311
|
+
- Test on CPU-only machine
|
|
312
|
+
- Test on NVIDIA GPU machine
|
|
313
|
+
- Test on Windows with DirectML
|
|
314
|
+
- Verify automatic fallback works
|
|
315
|
+
- Benchmark performance improvements
|
|
316
|
+
|
|
317
|
+
**Expected Results**:
|
|
318
|
+
- CUDA: 10-100x faster than CPU
|
|
319
|
+
- DirectML: 5-20x faster than CPU
|
|
320
|
+
- Automatic selection working
|
|
321
|
+
|
|
322
|
+
### Phase 3: Optimization & Streaming (Week 3)
|
|
323
|
+
|
|
324
|
+
**Objective**: Performance optimizations and streaming support
|
|
325
|
+
|
|
326
|
+
**Tasks**:
|
|
327
|
+
1. Implement streaming inference
|
|
328
|
+
2. Add WebAssembly SIMD optimization
|
|
329
|
+
3. Implement model quantization support
|
|
330
|
+
4. Add KV cache for faster generation
|
|
331
|
+
5. Optimize memory usage
|
|
332
|
+
|
|
333
|
+
**Deliverables**:
|
|
334
|
+
- Streaming token generation
|
|
335
|
+
- SIMD-optimized CPU inference
|
|
336
|
+
- INT8/INT4 quantized model support
|
|
337
|
+
- Reduced memory footprint
|
|
338
|
+
|
|
339
|
+
**Code Structure**:
|
|
340
|
+
```typescript
|
|
341
|
+
// src/router/providers/onnx.ts (streaming)
|
|
342
|
+
export class ONNXProvider implements LLMProvider {
|
|
343
|
+
supportsStreaming = true;
|
|
344
|
+
|
|
345
|
+
async *stream(params: ChatParams): AsyncGenerator<StreamChunk> {
|
|
346
|
+
if (!this.session) {
|
|
347
|
+
await this.initializeSession();
|
|
348
|
+
}
|
|
349
|
+
|
|
350
|
+
const inputs = await this.tokenize(params.messages);
|
|
351
|
+
const maxTokens = params.maxTokens ?? 512;
|
|
352
|
+
|
|
353
|
+
const generatedTokens: number[] = [];
|
|
354
|
+
|
|
355
|
+
for (let i = 0; i < maxTokens; i++) {
|
|
356
|
+
// Run inference for next token
|
|
357
|
+
const outputs = await this.session.run({
|
|
358
|
+
...inputs,
|
|
359
|
+
past_key_values: this.kvCache // Use KV cache for speed
|
|
360
|
+
});
|
|
361
|
+
|
|
362
|
+
// Extract next token
|
|
363
|
+
const nextToken = this.sampleToken(outputs.logits);
|
|
364
|
+
generatedTokens.push(nextToken);
|
|
365
|
+
|
|
366
|
+
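// Update KV cache (with the cache in place, the next step should feed only nextToken as input; that input update is omitted from this sketch)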
|
|
367
|
+
this.updateKVCache(outputs.present_key_values);
|
|
368
|
+
|
|
369
|
+
// Decode and yield
|
|
370
|
+
const text = await this.tokenizer.decode([nextToken]);
|
|
371
|
+
|
|
372
|
+
yield {
|
|
373
|
+
type: 'content_block_delta',
|
|
374
|
+
delta: {
|
|
375
|
+
type: 'text_delta',
|
|
376
|
+
text
|
|
377
|
+
}
|
|
378
|
+
};
|
|
379
|
+
|
|
380
|
+
// Stop on EOS token
|
|
381
|
+
if (nextToken === this.tokenizer.eos_token_id) {
|
|
382
|
+
yield { type: 'message_stop' };
|
|
383
|
+
break;
|
|
384
|
+
}
|
|
385
|
+
}
|
|
386
|
+
}
|
|
387
|
+
}
|
|
388
|
+
```
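
`sampleToken` is called in the streaming loop but is not defined in this plan. One possible shape is sketched below, assuming the logits for the last position have already been copied into a plain `number[]` (a real implementation would read them from the output tensor). The `temperature` and `random` parameters are assumptions added for illustration: temperature 0 selects the highest logit, and any positive temperature samples from the softmax distribution.

```typescript
// Hypothetical helper; the plan does not specify how sampleToken works.
function softmax(logits: number[], temperature: number): number[] {
  // Subtract the maximum logit first for numerical stability.
  let max = -Infinity;
  for (const l of logits) {
    if (l > max) max = l;
  }
  const exps = logits.map((l) => Math.exp((l - max) / temperature));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

function sampleToken(
  logits: number[],
  temperature = 0,
  random: () => number = Math.random
): number {
  if (temperature <= 0) {
    // Greedy decoding: return the index of the largest logit.
    let best = 0;
    for (let i = 1; i < logits.length; i++) {
      if (logits[i] > logits[best]) best = i;
    }
    return best;
  }
  // Inverse-CDF sampling over the softmax probabilities.
  const probs = softmax(logits, temperature);
  const r = random();
  let cumulative = 0;
  for (let i = 0; i < probs.length; i++) {
    cumulative += probs[i];
    if (r < cumulative) return i;
  }
  return probs.length - 1; // guards against floating-point rounding
}
```

Injecting `random` keeps the sampler deterministic under test, which helps when comparing streamed output across execution providers.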
|
|
389
|
+
|
|
390
|
+
**Optimizations**:
|
|
391
|
+
```typescript
|
|
392
|
+
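// WASM + SIMD configuration (sketch; onnxruntime-web exposes SIMD and thread count through ort.env.wasm.simd and ort.env.wasm.numThreads)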
|
|
393
|
+
const sessionOptions: ort.InferenceSession.SessionOptions = {
|
|
394
|
+
executionProviders: [{
|
|
395
|
+
name: 'wasm',
|
|
396
|
+
options: {
|
|
397
|
+
simd: true,
|
|
398
|
+
threads: navigator.hardwareConcurrency || 4
|
|
399
|
+
}
|
|
400
|
+
}],
|
|
401
|
+
graphOptimizationLevel: 'all',
|
|
402
|
+
enableCpuMemArena: true,
|
|
403
|
+
enableMemPattern: true,
|
|
404
|
+
executionMode: 'parallel'
|
|
405
|
+
};
|
|
406
|
+
```
|
|
407
|
+
|
|
408
|
+
**Testing**:
|
|
409
|
+
- Measure streaming latency
|
|
410
|
+
- Verify SIMD activation
|
|
411
|
+
- Test quantized models (INT8, INT4)
|
|
412
|
+
- Benchmark KV cache improvements
|
|
413
|
+
- Memory profiling
|
|
414
|
+
|
|
415
|
+
**Expected Results**:
|
|
416
|
+
- Streaming: <100ms time to first token
|
|
417
|
+
- SIMD: 3.4x CPU performance improvement
|
|
418
|
+
- Quantization: 2-4x faster inference, 50% less memory
|
|
419
|
+
- KV cache: 2-3x faster multi-turn conversations
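
The streaming latency target can be checked with a small harness around any async stream of chunks. The sketch below is a hypothetical helper (`measureStream` and `StreamStats` are not part of the plan); it assumes chunks have the `content_block_delta` shape used by `stream()` above and takes an injectable clock so it can be tested without real delays.

```typescript
// Chunk shape mirrors the objects yielded by stream() in this plan.
type StreamChunk = { type: string; delta?: { type: string; text: string } };

interface StreamStats {
  chunks: number;
  timeToFirstTokenMs: number | null; // null if no text chunk arrived
  totalMs: number;
  text: string;
}

// Hypothetical helper: consumes a stream and records simple timing data.
async function measureStream(
  stream: AsyncIterable<StreamChunk>,
  now: () => number = Date.now
): Promise<StreamStats> {
  const start = now();
  let timeToFirstTokenMs: number | null = null;
  let chunks = 0;
  let text = '';
  for await (const chunk of stream) {
    if (chunk.type !== 'content_block_delta' || !chunk.delta) continue;
    if (timeToFirstTokenMs === null) {
      timeToFirstTokenMs = now() - start;
    }
    chunks++;
    text += chunk.delta.text;
  }
  return { chunks, timeToFirstTokenMs, totalMs: now() - start, text };
}
```

Tokens per second then follows as `chunks / (totalMs / 1000)`, under the assumption that each chunk carries exactly one token.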
|
|
420
|
+
|
|
421
|
+
### Phase 4: Model Management (Week 4)
|
|
422
|
+
|
|
423
|
+
**Objective**: Model download, caching, and selection
|
|
424
|
+
|
|
425
|
+
**Tasks**:
|
|
426
|
+
1. Implement Hugging Face model downloader
|
|
427
|
+
2. Add local model caching
|
|
428
|
+
3. Create model registry
|
|
429
|
+
4. Add model version management
|
|
430
|
+
5. Implement automatic model selection based on hardware
|
|
431
|
+
|
|
432
|
+
**Deliverables**:
|
|
433
|
+
- Automatic model download from Hugging Face
|
|
434
|
+
- Local model cache (~/.agentic-flow/onnx-models/)
|
|
435
|
+
- Model registry with hardware requirements
|
|
436
|
+
- Smart model selection
|
|
437
|
+
|
|
438
|
+
**Code Structure**:
|
|
439
|
+
```typescript
|
|
440
|
+
// src/router/onnx/model-manager.ts
|
|
441
|
+
export class ONNXModelManager {
|
|
442
|
+
private cacheDir = join(homedir(), '.agentic-flow', 'onnx-models');
|
|
443
|
+
|
|
444
|
+
private models = {
|
|
445
|
+
'phi-3-mini-4k-cpu': {
|
|
446
|
+
huggingface: 'microsoft/Phi-3-mini-4k-instruct-onnx-cpu',
|
|
447
|
+
files: ['cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4'],
|
|
448
|
+
size: '2.4GB',
|
|
449
|
+
requirements: { ram: '4GB', gpu: false }
|
|
450
|
+
},
|
|
451
|
+
'phi-3-mini-4k-gpu': {
|
|
452
|
+
huggingface: 'microsoft/Phi-3-mini-4k-instruct-onnx-cuda',
|
|
453
|
+
files: ['cuda/cuda-fp16'],
|
|
454
|
+
size: '4.8GB',
|
|
455
|
+
requirements: { ram: '8GB', gpu: 'CUDA' }
|
|
456
|
+
},
|
|
457
|
+
'phi-3-medium-128k-cpu': {
|
|
458
|
+
huggingface: 'microsoft/Phi-3-medium-128k-instruct-onnx-cpu',
|
|
459
|
+
files: ['cpu_and_mobile/cpu-int4-rtn-block-32'],
|
|
460
|
+
size: '8.2GB',
|
|
461
|
+
requirements: { ram: '16GB', gpu: false }
|
|
462
|
+
}
|
|
463
|
+
};
|
|
464
|
+
|
|
465
|
+
async downloadModel(modelId: string): Promise<string> {
|
|
466
|
+
const modelInfo = this.models[modelId];
|
|
467
|
+
if (!modelInfo) {
|
|
468
|
+
throw new Error(`Unknown model: ${modelId}`);
|
|
469
|
+
}
|
|
470
|
+
|
|
471
|
+
const modelPath = join(this.cacheDir, modelId);
|
|
472
|
+
|
|
473
|
+
// Check if already downloaded
|
|
474
|
+
if (existsSync(join(modelPath, 'model.onnx'))) {
|
|
475
|
+
console.log(`✅ Model ${modelId} already cached`);
|
|
476
|
+
return modelPath;
|
|
477
|
+
}
|
|
478
|
+
|
|
479
|
+
console.log(`📥 Downloading ${modelId} (${modelInfo.size})...`);
|
|
480
|
+
|
|
481
|
+
// Download from Hugging Face
|
|
482
|
+
await this.downloadFromHuggingFace(
|
|
483
|
+
modelInfo.huggingface,
|
|
484
|
+
modelInfo.files,
|
|
485
|
+
modelPath
|
|
486
|
+
);
|
|
487
|
+
|
|
488
|
+
console.log(`✅ Model ${modelId} downloaded to ${modelPath}`);
|
|
489
|
+
return modelPath;
|
|
490
|
+
}
|
|
491
|
+
|
|
492
|
+
selectModelForHardware(): string {
|
|
493
|
+
// Detect hardware capabilities
|
|
494
|
+
const hasGPU = this.detectGPU();
|
|
495
|
+
const ram = this.getAvailableRAM();
|
|
496
|
+
|
|
497
|
+
if (hasGPU && ram >= 8) {
|
|
498
|
+
return 'phi-3-mini-4k-gpu';
|
|
499
|
+
} else if (ram >= 16) {
|
|
500
|
+
return 'phi-3-medium-128k-cpu';
|
|
501
|
+
} else {
|
|
502
|
+
return 'phi-3-mini-4k-cpu';
|
|
503
|
+
}
|
|
504
|
+
}
|
|
505
|
+
}
|
|
506
|
+
```
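
The selection logic in `selectModelForHardware()` is easier to test when hardware detection is separated from the decision. The sketch below is one way to do that, under stated assumptions: the `Hardware` type and its field names are hypothetical, RAM is passed in bytes (for example from Node's `os.totalmem()`), and the 8 and 16 thresholds from the plan are interpreted as GiB.

```typescript
type ModelId =
  | 'phi-3-mini-4k-cpu'
  | 'phi-3-mini-4k-gpu'
  | 'phi-3-medium-128k-cpu';

// Hypothetical description of the machine; detection itself is out of scope here.
interface Hardware {
  hasCudaGpu: boolean;
  ramBytes: number;
}

const GIB = 1024 * 1024 * 1024;

// Pure decision function mirroring the thresholds used in the plan.
function selectModelForHardware(hw: Hardware): ModelId {
  const ramGiB = hw.ramBytes / GIB;
  if (hw.hasCudaGpu && ramGiB >= 8) {
    return 'phi-3-mini-4k-gpu';
  }
  if (ramGiB >= 16) {
    return 'phi-3-medium-128k-cpu';
  }
  return 'phi-3-mini-4k-cpu';
}
```

A machine sold as "16GB" often reports slightly less usable memory than 16 GiB, so a production version would probably want some slack in these comparisons.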
|
|
507
|
+
|
|
508
|
+
**Testing**:
|
|
509
|
+
- Test model download from Hugging Face
|
|
510
|
+
- Verify caching works correctly
|
|
511
|
+
- Test automatic model selection
|
|
512
|
+
- Test with multiple models
|
|
513
|
+
- Verify disk space management
|
|
514
|
+
|
|
515
|
+
### Phase 5: Integration & CLI (Week 5)
|
|
516
|
+
|
|
517
|
+
**Objective**: Integrate ONNX provider into router and add CLI commands
|
|
518
|
+
|
|
519
|
+
**Tasks**:
|
|
520
|
+
1. Add ONNX provider to router initialization
|
|
521
|
+
2. Add CLI commands for ONNX management
|
|
522
|
+
3. Implement cost tracking (always $0 for local)
|
|
523
|
+
4. Add performance benchmarking
|
|
524
|
+
5. Update routing rules for ONNX
|
|
525
|
+
|
|
526
|
+
**Deliverables**:
|
|
527
|
+
- ONNX provider in multi-model router
|
|
528
|
+
- CLI commands for model management
|
|
529
|
+
- Benchmark utilities
|
|
530
|
+
- Updated documentation
|
|
531
|
+
|
|
532
|
+
**CLI Commands**:
|
|
533
|
+
```bash
|
|
534
|
+
# List available ONNX models
|
|
535
|
+
npx agentic-flow onnx models
|
|
536
|
+
|
|
537
|
+
# Download a model
|
|
538
|
+
npx agentic-flow onnx download phi-3-mini-4k-cpu
|
|
539
|
+
|
|
540
|
+
# List downloaded models
|
|
541
|
+
npx agentic-flow onnx list
|
|
542
|
+
|
|
543
|
+
# Test ONNX inference
|
|
544
|
+
npx agentic-flow onnx test --model phi-3-mini-4k-cpu
|
|
545
|
+
|
|
546
|
+
# Benchmark performance
|
|
547
|
+
npx agentic-flow onnx benchmark --model phi-3-mini-4k-cpu
|
|
548
|
+
|
|
549
|
+
# Check hardware capabilities
|
|
550
|
+
npx agentic-flow onnx info
|
|
551
|
+
|
|
552
|
+
# Use ONNX provider for inference
|
|
553
|
+
npx agentic-flow --provider onnx --model phi-3-mini-4k-cpu --task "Hello world"
|
|
554
|
+
|
|
555
|
+
# Use ONNX with GPU
|
|
556
|
+
npx agentic-flow --provider onnx --model phi-3-mini-4k-gpu --execution-provider cuda --task "Complex task"
|
|
557
|
+
```
|
|
558
|
+
|
|
559
|
+
**Router Integration**:
|
|
560
|
+
```typescript
|
|
561
|
+
// src/router/router.ts
|
|
562
|
+
private initializeProviders(): void {
|
|
563
|
+
// ... existing providers ...
|
|
564
|
+
|
|
565
|
+
// Initialize ONNX
|
|
566
|
+
if (this.config.providers.onnx) {
|
|
567
|
+
try {
|
|
568
|
+
const provider = new ONNXProvider(this.config.providers.onnx);
|
|
569
|
+
this.providers.set('onnx', provider);
|
|
570
|
+
console.log('✅ ONNX provider initialized');
|
|
571
|
+
} catch (error) {
|
|
572
|
+
console.error('❌ Failed to initialize ONNX:', error);
|
|
573
|
+
}
|
|
574
|
+
}
|
|
575
|
+
}
|
|
576
|
+
```
|
|
577
|
+
|
|
578
|
+
**Configuration**:
|
|
579
|
+
```json
|
|
580
|
+
{
|
|
581
|
+
"providers": {
|
|
582
|
+
"onnx": {
|
|
583
|
+
"models": {
|
|
584
|
+
"default": "phi-3-mini-4k-cpu",
|
|
585
|
+
"fast": "phi-3-mini-4k-cpu",
|
|
586
|
+
"advanced": "phi-3-medium-128k-cpu",
|
|
587
|
+
"gpu": "phi-3-mini-4k-gpu"
|
|
588
|
+
},
|
|
589
|
+
"executionProviders": ["cuda", "dml", "cpu"],
|
|
590
|
+
"graphOptimizationLevel": "all",
|
|
591
|
+
"enableMemoryOptimization": true,
|
|
592
|
+
"threads": 4,
|
|
593
|
+
"timeout": 60000
|
|
594
|
+
}
|
|
595
|
+
},
|
|
596
|
+
"routing": {
|
|
597
|
+
"rules": [
|
|
598
|
+
{
|
|
599
|
+
"condition": {
|
|
600
|
+
"privacy": "high",
|
|
601
|
+
"localOnly": true
|
|
602
|
+
},
|
|
603
|
+
"action": {
|
|
604
|
+
"provider": "onnx",
|
|
605
|
+
"model": "phi-3-mini-4k-cpu"
|
|
606
|
+
},
|
|
607
|
+
"reason": "Privacy-sensitive tasks use local ONNX models"
|
|
608
|
+
},
|
|
609
|
+
{
|
|
610
|
+
"condition": {
|
|
611
|
+
"agentType": ["researcher"],
|
|
612
|
+
"complexity": "low"
|
|
613
|
+
},
|
|
614
|
+
"action": {
|
|
615
|
+
"provider": "onnx",
|
|
616
|
+
"model": "phi-3-mini-4k-cpu"
|
|
617
|
+
},
|
|
618
|
+
"reason": "Simple tasks use free local inference"
|
|
619
|
+
}
|
|
620
|
+
]
|
|
621
|
+
}
|
|
622
|
+
}
|
|
623
|
+
```
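
How the router evaluates these rules is not specified in this plan. The sketch below shows one plausible interpretation, with every name (`RoutingRule`, `TaskAttributes`, `resolveRoute`) hypothetical: a rule matches when every key in its `condition` matches the task, an array value means "any of these", and the first matching rule wins.

```typescript
type ConditionValue = string | boolean | string[];

interface RoutingRule {
  condition: Record<string, ConditionValue>;
  action: { provider: string; model?: string };
  reason?: string;
}

type TaskAttributes = Record<string, string | boolean | undefined>;

// Assumed semantics: all condition keys must match; arrays match by membership.
function matches(
  condition: Record<string, ConditionValue>,
  task: TaskAttributes
): boolean {
  return Object.keys(condition).every((key) => {
    const expected = condition[key];
    const actual = task[key];
    if (Array.isArray(expected)) {
      return typeof actual === 'string' && expected.indexOf(actual) !== -1;
    }
    return actual === expected;
  });
}

// Assumed semantics: rules are checked in order and the first match wins.
function resolveRoute(
  rules: RoutingRule[],
  task: TaskAttributes
): RoutingRule['action'] | undefined {
  for (const rule of rules) {
    if (matches(rule.condition, task)) return rule.action;
  }
  return undefined;
}

// The two rules from the configuration above.
const exampleRules: RoutingRule[] = [
  {
    condition: { privacy: 'high', localOnly: true },
    action: { provider: 'onnx', model: 'phi-3-mini-4k-cpu' },
  },
  {
    condition: { agentType: ['researcher'], complexity: 'low' },
    action: { provider: 'onnx', model: 'phi-3-mini-4k-cpu' },
  },
];
```

A task that matches no rule returns `undefined` here, leaving the caller to fall back to the default provider.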
|
|
624
|
+
|
|
625
|
+
### Phase 6: Advanced Features (Week 6)
|
|
626
|
+
|
|
627
|
+
**Objective**: Vision support, tool calling, and production optimizations
|
|
628
|
+
|
|
629
|
+
**Tasks**:
|
|
630
|
+
1. Add multimodal support (vision)
|
|
631
|
+
2. Implement tool calling for ONNX models
|
|
632
|
+
3. Add model fine-tuning support
|
|
633
|
+
4. Implement distributed inference
|
|
634
|
+
5. Production hardening
|
|
635
|
+
|
|
636
|
+
**Features**:
|
|
637
|
+
- Phi-3-vision for image understanding
|
|
638
|
+
- Custom tool calling layer
|
|
639
|
+
- Model adaptation for specific tasks
|
|
640
|
+
- Multi-GPU support
|
|
641
|
+
- Load balancing across models
|
|
642
|
+
|
|
643
|
+
## 📋 Configuration Examples
|
|
644
|
+
|
|
645
|
+
### Basic CPU Configuration
|
|
646
|
+
|
|
647
|
+
```json
|
|
648
|
+
{
|
|
649
|
+
"version": "1.0",
|
|
650
|
+
"defaultProvider": "onnx",
|
|
651
|
+
"providers": {
|
|
652
|
+
"onnx": {
|
|
653
|
+
"models": {
|
|
654
|
+
"default": "phi-3-mini-4k-cpu"
|
|
655
|
+
},
|
|
656
|
+
"executionProviders": ["cpu"],
|
|
657
|
+
"threads": 4
|
|
658
|
+
}
|
|
659
|
+
}
|
|
660
|
+
}
|
|
661
|
+
```
|
|
662
|
+
|
|
663
|
+
### GPU Optimized Configuration
|
|
664
|
+
|
|
665
|
+
```json
|
|
666
|
+
{
|
|
667
|
+
"version": "1.0",
|
|
668
|
+
"defaultProvider": "onnx",
|
|
669
|
+
"providers": {
|
|
670
|
+
"onnx": {
|
|
671
|
+
"models": {
|
|
672
|
+
"default": "phi-3-mini-4k-gpu"
|
|
673
|
+
},
|
|
674
|
+
"executionProviders": ["cuda", "cpu"],
|
|
675
|
+
"cudaOptions": {
|
|
676
|
+
"deviceId": 0,
|
|
677
|
+
"gpuMemLimit": 4294967296,
|
|
678
|
+
"arenaExtendStrategy": "kSameAsRequested"
|
|
679
|
+
}
|
|
680
|
+
}
|
|
681
|
+
}
|
|
682
|
+
}
|
|
683
|
+
```
|
|
684
|
+
|
|
685
|
+
### Hybrid Cloud + Local Configuration
|
|
686
|
+
|
|
687
|
+
```json
|
|
688
|
+
{
|
|
689
|
+
"version": "1.0",
|
|
690
|
+
"defaultProvider": "anthropic",
|
|
691
|
+
"fallbackChain": ["anthropic", "openrouter", "onnx"],
|
|
692
|
+
"providers": {
|
|
693
|
+
"anthropic": { ... },
|
|
694
|
+
"openrouter": { ... },
|
|
695
|
+
"onnx": {
|
|
696
|
+
"models": {
|
|
697
|
+
"default": "phi-3-mini-4k-cpu"
|
|
698
|
+
}
|
|
699
|
+
}
|
|
700
|
+
},
|
|
701
|
+
"routing": {
|
|
702
|
+
"mode": "rule-based",
|
|
703
|
+
"rules": [
|
|
704
|
+
{
|
|
705
|
+
"condition": { "privacy": "high" },
|
|
706
|
+
"action": { "provider": "onnx" }
|
|
707
|
+
},
|
|
708
|
+
{
|
|
709
|
+
"condition": { "complexity": "low" },
|
|
710
|
+
"action": { "provider": "onnx" }
|
|
711
|
+
},
|
|
712
|
+
{
|
|
713
|
+
"condition": { "complexity": "high" },
|
|
714
|
+
"action": { "provider": "anthropic" }
|
|
715
|
+
}
|
|
716
|
+
]
|
|
717
|
+
}
|
|
718
|
+
}
|
|
719
|
+
```
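
The `fallbackChain` setting implies trying each provider in order until one succeeds. A minimal sketch of that loop follows; `callWithFallback` and `ProviderCall` are hypothetical names, and the sketch assumes any thrown error should trigger the next provider (a real router would likely distinguish retryable from non-retryable failures).

```typescript
// Hypothetical signature: performs one request against the named provider.
type ProviderCall<T> = (provider: string) => Promise<T>;

// Tries each provider in order and returns the first successful result.
async function callWithFallback<T>(
  chain: string[],
  call: ProviderCall<T>
): Promise<{ provider: string; result: T }> {
  const errors: string[] = [];
  for (const provider of chain) {
    try {
      const result = await call(provider);
      return { provider, result };
    } catch (err) {
      // Record the failure and move on to the next provider in the chain.
      const message = err instanceof Error ? err.message : String(err);
      errors.push(`${provider}: ${message}`);
    }
  }
  throw new Error(`All providers failed (${errors.join('; ')})`);
}
```

With the chain `["anthropic", "openrouter", "onnx"]`, the local ONNX provider acts as the last resort when both cloud providers are unreachable.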
|
|
720
|
+
|
|
721
|
+
## 🎯 Success Metrics
|
|
722
|
+
|
|
723
|
+
### Performance Targets
|
|
724
|
+
|
|
725
|
+
| Metric | Target | Measurement |
|
|
726
|
+
|--------|--------|-------------|
|
|
727
|
+
| CPU Inference Speed | 15-20 tokens/sec | Phi-3-mini on i9-10920X |
|
|
728
|
+
| GPU Inference Speed | 100+ tokens/sec | Phi-3-mini on RTX 3090 |
|
|
729
|
+
| Time to First Token | <500ms | Streaming mode |
|
|
730
|
+
| Memory Usage (CPU) | <4GB | Phi-3-mini INT4 |
|
|
731
|
+
| Model Load Time | <10s | First request only |
|
|
732
|
+
|
|
733
|
+
### Cost Savings
|
|
734
|
+
|
|
735
|
+
| Scenario | Cloud Cost | ONNX Cost | Savings |
|
|
736
|
+
|----------|-----------|-----------|---------|
|
|
737
|
+
| 1M tokens (research) | $3.00 | $0.00 | 100% |
|
|
738
|
+
| 1M tokens (coding) | $15.00 | $0.00 | 100% |
|
|
739
|
+
| Monthly development | $100-500 | $0.00 | 100% |
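
The arithmetic behind the table is a simple per-million-token price. The sketch below reproduces it with hypothetical helper names; the prices are the table's own figures, and the $0.00 local cost assumes hardware and electricity are not counted.

```typescript
// Cost of a cloud request volume at a given price per one million tokens.
function cloudCostUsd(tokens: number, pricePerMillionUsd: number): number {
  return (tokens / 1e6) * pricePerMillionUsd;
}

// Savings of the local option relative to the cloud cost, as a percentage.
function savingsPercent(cloudUsd: number, localUsd: number): number {
  if (cloudUsd === 0) return 0;
  return ((cloudUsd - localUsd) / cloudUsd) * 100;
}

// 1M research tokens at the table's $3.00 per million, against $0.00 locally.
const researchCloud = cloudCostUsd(1_000_000, 3.0);
const researchSavings = savingsPercent(researchCloud, 0);
```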
|
|
740
|
+
|
|
741
|
+
### Quality Targets
|
|
742
|
+
|
|
743
|
+
| Metric | Target |
|
|
744
|
+
|--------|--------|
|
|
745
|
+
| Accuracy vs Cloud | >95% for simple tasks |
|
|
746
|
+
| Success Rate | >99% (no network failures) |
|
|
747
|
+
| Latency Variance | <10% (consistent) |
|
|
748
|
+
|
|
749
|
+
## 🔒 Security & Privacy
|
|
750
|
+
|
|
751
|
+
### Benefits
|
|
752
|
+
- ✅ No data sent to external services
|
|
753
|
+
- ✅ No API keys required for local models
|
|
754
|
+
- ✅ Fully offline operation possible
|
|
755
|
+
- ✅ Supports HIPAA/GDPR compliance efforts (prompts and outputs stay on the local machine)
|
|
756
|
+
- ✅ No usage tracking or telemetry
|
|
757
|
+
|
|
758
|
+
### Considerations
|
|
759
|
+
- Models downloaded from Hugging Face (verify checksums)
|
|
760
|
+
- Model license compliance (MIT for Phi-3)
|
|
761
|
+
- Disk space for model storage (2-10GB per model)
|
|
762
|
+
|
|
763
|
+
## 🐛 Known Limitations
|
|
764
|
+
|
|
765
|
+
1. **Model Size**: ONNX models are 2-10GB, requiring significant disk space
|
|
766
|
+
2. **Initial Download**: First-time model download can take 5-30 minutes, depending on bandwidth
|
|
767
|
+
3. **Hardware Requirements**: GPU models require an NVIDIA GPU with CUDA
|
|
768
|
+
4. **Tool Calling**: Limited compared to Claude/GPT (requires custom implementation)
|
|
769
|
+
5. **Streaming**: Initial implementation may have higher latency than cloud
|
|
770
|
+
6. **Context Length**: The Phi-3-mini-4k models are limited to 4K tokens (vs 200K for Claude)
|
|
771
|
+
|
|
772
|
+
## 📊 Benchmarking Plan
|
|
773
|
+
|
|
774
|
+
### Test Suite
|
|
775
|
+
|
|
776
|
+
```bash
|
|
777
|
+
# CPU Benchmark
|
|
778
|
+
npx agentic-flow onnx benchmark \
|
|
779
|
+
--model phi-3-mini-4k-cpu \
|
|
780
|
+
--provider cpu \
|
|
781
|
+
--iterations 100
|
|
782
|
+
|
|
783
|
+
# GPU Benchmark
|
|
784
|
+
npx agentic-flow onnx benchmark \
|
|
785
|
+
--model phi-3-mini-4k-gpu \
|
|
786
|
+
--provider cuda \
|
|
787
|
+
--iterations 100
|
|
788
|
+
|
|
789
|
+
# Comparison Benchmark
|
|
790
|
+
npx agentic-flow router benchmark \
|
|
791
|
+
--providers "onnx,anthropic,openrouter" \
|
|
792
|
+
--task "Write a hello world function" \
|
|
793
|
+
--iterations 50
|
|
794
|
+
```
|
|
795
|
+
|
|
796
|
+
### Benchmark Metrics
|
|
797
|
+
|
|
798
|
+
1. **Latency**
|
|
799
|
+
- Time to first token
|
|
800
|
+
- Tokens per second
|
|
801
|
+
- End-to-end request time
|
|
802
|
+
|
|
803
|
+
2. **Throughput**
|
|
804
|
+
- Concurrent requests
|
|
805
|
+
- Batch processing
|
|
806
|
+
|
|
807
|
+
3. **Resource Usage**
|
|
808
|
+
- CPU utilization
|
|
809
|
+
- Memory consumption
|
|
810
|
+
- GPU memory
|
|
811
|
+
- Disk I/O
|
|
812
|
+
|
|
813
|
+
4. **Quality**
|
|
814
|
+
- Response accuracy
|
|
815
|
+
- Instruction following
|
|
816
|
+
- Consistency
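
The latency metrics above can be summarized from raw per-request samples. The sketch below uses hypothetical names (`summarizeLatencies`, `LatencySummary`), nearest-rank percentiles, and the coefficient of variation as one possible reading of "latency variance"; none of these choices come from the plan itself.

```typescript
interface LatencySummary {
  mean: number;
  p50: number;
  p95: number;
  coefficientOfVariation: number; // standard deviation divided by mean
}

// Nearest-rank percentile over an ascending-sorted array.
function percentile(sortedAsc: number[], p: number): number {
  const rank = Math.ceil((p / 100) * sortedAsc.length);
  const index = Math.min(sortedAsc.length - 1, Math.max(0, rank - 1));
  return sortedAsc[index];
}

function summarizeLatencies(samplesMs: number[]): LatencySummary {
  if (samplesMs.length === 0) {
    throw new Error('at least one sample is required');
  }
  const sorted = samplesMs.slice().sort((a, b) => a - b);
  const mean = sorted.reduce((a, b) => a + b, 0) / sorted.length;
  // Population variance: average squared distance from the mean.
  const variance =
    sorted.reduce((acc, x) => acc + (x - mean) * (x - mean), 0) / sorted.length;
  return {
    mean,
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    coefficientOfVariation: mean === 0 ? 0 : Math.sqrt(variance) / mean,
  };
}
```

Reporting p95 alongside the mean makes slow outliers visible, such as the first request that also pays the model load time.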
|
|
817
|
+
|
|
818
|
+
## 🚀 Deployment Strategy
|
|
819
|
+
|
|
820
|
+
### Development Phase
|
|
821
|
+
- Use ONNX CPU for testing
|
|
822
|
+
- Small models (Phi-3-mini)
|
|
823
|
+
- Local development only
|
|
824
|
+
|
|
825
|
+
### Staging Phase
|
|
826
|
+
- Test GPU acceleration
|
|
827
|
+
- Larger models (Phi-3-medium)
|
|
828
|
+
- Performance benchmarking
|
|
829
|
+
|
|
830
|
+
### Production Phase
|
|
831
|
+
- Hybrid cloud + local routing
|
|
832
|
+
- GPU for high-throughput workloads
|
|
833
|
+
- Fallback to cloud for complex tasks
|
|
834
|
+
|
|
835
|
+
## 📚 Resources
|
|
836
|
+
|
|
837
|
+
### Documentation
|
|
838
|
+
- ONNX Runtime: https://onnxruntime.ai/
|
|
839
|
+
- Phi-3 Models: https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx-cpu
|
|
840
|
+
- Transformers.js: https://huggingface.co/docs/transformers.js
|
|
841
|
+
|
|
842
|
+
### Examples
|
|
843
|
+
- Phi-3 Chat: https://github.com/microsoft/onnxruntime-inference-examples/tree/main/js/chat
|
|
844
|
+
- WebGPU RAG: https://github.com/microsoft/Phi-3CookBook/tree/main/code/08.RAG/rag_webgpu_chat
|
|
845
|
+
|
|
846
|
+
### Performance Guides
|
|
847
|
+
- CPU Optimization: https://onnxruntime.ai/docs/performance/model-optimizations/graph-optimizations.html
|
|
848
|
+
- GPU Providers: https://onnxruntime.ai/docs/execution-providers/
|
|
849
|
+
|
|
850
|
+
## ✅ Next Steps
|
|
851
|
+
|
|
852
|
+
1. **Immediate**: Review and approve this implementation plan
|
|
853
|
+
2. **Week 1**: Begin Phase 1 (Core ONNX Provider)
|
|
854
|
+
3. **Week 2**: Implement Phase 2 (GPU Acceleration)
|
|
855
|
+
4. **Week 3**: Complete Phase 3 (Optimization)
|
|
856
|
+
5. **Week 4**: Execute Phase 4 (Model Management)
|
|
857
|
+
6. **Week 5**: Finish Phase 5 (Integration)
|
|
858
|
+
7. **Week 6**: Deploy Phase 6 (Advanced Features)
|
|
859
|
+
|
|
860
|
+
---
|
|
861
|
+
|
|
862
|
+
**Status**: Ready for Implementation
|
|
863
|
+
**Estimated Timeline**: 6 weeks
|
|
864
|
+
**Estimated Effort**: 120-150 hours
|
|
865
|
+
**Risk Level**: Low (proven technology, clear path)
|
|
866
|
+
**ROI**: High (100% cost savings for local inference, 2-100x performance improvement)
|