terradev-cli 4.0.4__tar.gz → 4.0.6__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- terradev_cli-4.0.6/.clawhubignore +5 -0
- terradev_cli-4.0.6/.ebextensions/python.config +6 -0
- terradev_cli-4.0.6/.env.example +84 -0
- terradev_cli-4.0.6/.env.template +164 -0
- terradev_cli-4.0.6/.github/workflows/ci.yml +105 -0
- terradev_cli-4.0.6/.github/workflows/deploy.yml +30 -0
- terradev_cli-4.0.6/.github/workflows/release.yml +50 -0
- terradev_cli-4.0.6/.github/workflows/static.yml +42 -0
- terradev_cli-4.0.6/BingSiteAuth.xml +4 -0
- terradev_cli-4.0.6/CHANGELOG_v3.7.1.md +77 -0
- terradev_cli-4.0.6/CHANGELOG_v4.0.2.md +79 -0
- terradev_cli-4.0.6/CUCO_INTEGRATION_GUIDE.md +341 -0
- terradev_cli-4.0.6/Dockerfile +55 -0
- terradev_cli-4.0.6/Dockerfile.hub +70 -0
- terradev_cli-4.0.6/LICENSE +114 -0
- terradev_cli-4.0.6/MANIFEST.in +33 -0
- terradev_cli-4.0.6/PAGES_README.md +45 -0
- terradev_cli-4.0.6/PHASE1_SSO_IMPLEMENTATION_SUMMARY.md +196 -0
- {terradev_cli-4.0.4/terradev_cli.egg-info → terradev_cli-4.0.6}/PKG-INFO +372 -53
- terradev_cli-4.0.6/PROGRESS_UPDATE_JUPYTER.ipynb +445 -0
- terradev_cli-4.0.4/PKG-INFO → terradev_cli-4.0.6/README.md +306 -80
- terradev_cli-4.0.6/README_ACTION.md +200 -0
- terradev_cli-4.0.6/README_DOCKER.md +287 -0
- terradev_cli-4.0.6/README_JUPYTER.md +382 -0
- terradev_cli-4.0.6/README_long.md +478 -0
- terradev_cli-4.0.6/RELEASE_NOTES_v4.0.2.md +117 -0
- terradev_cli-4.0.6/SGLANG_OPTIMIZATION_IMPLEMENTATION_COMPLETE.md +237 -0
- terradev_cli-4.0.6/SKILL.md +241 -0
- terradev_cli-4.0.6/USER_GUIDE.md +601 -0
- terradev_cli-4.0.6/VLLM_AUTO_OPTIMIZATION_GUIDE.md +294 -0
- terradev_cli-4.0.6/VLLM_OPTIMIZATION_GUIDE.md +220 -0
- terradev_cli-4.0.6/VLLM_OPTIMIZATION_SUMMARY.md +155 -0
- terradev_cli-4.0.6/action.py +106 -0
- terradev_cli-4.0.6/action.yml +50 -0
- terradev_cli-4.0.6/activations/973dfe463ec85785f5f95af5ba3906eedb2d931c24e69824a89ea65dba4e813b.json +1 -0
- terradev_cli-4.0.6/agentic_ai_throughput_report.md +124 -0
- terradev_cli-4.0.6/agentic_ai_token_throughput_analysis.png +0 -0
- terradev_cli-4.0.6/agentic_ai_token_throughput_visualization.py +640 -0
- terradev_cli-4.0.6/ai-discovery.json +114 -0
- terradev_cli-4.0.6/analysis/api-differences-comparison.tf +311 -0
- terradev_cli-4.0.6/analysis/cost-structure-differences.tf +198 -0
- terradev_cli-4.0.6/analysis/inference-arbitrage-pivot.tf +322 -0
- terradev_cli-4.0.6/apis/free-apis-latency-arbitrage.tf +504 -0
- terradev_cli-4.0.6/aws_ssm_starter.py +251 -0
- terradev_cli-4.0.6/b2b_roi_analysis.json +142 -0
- terradev_cli-4.0.6/bottleneck_analysis.png +0 -0
- terradev_cli-4.0.6/bottleneck_analysis_viz.py +290 -0
- terradev_cli-4.0.6/bucket-policy.json +12 -0
- terradev_cli-4.0.6/build_v3.7.1.sh +42 -0
- terradev_cli-4.0.6/business/compute-arbitrage-business.md +383 -0
- terradev_cli-4.0.6/byoapi_real_snapshot.json +107 -0
- terradev_cli-4.0.6/check_logs_fix.py +91 -0
- terradev_cli-4.0.6/clusters/glm-5/README.md +73 -0
- terradev_cli-4.0.6/clusters/glm-5/helm/values-glm5.yaml +152 -0
- terradev_cli-4.0.6/clusters/glm-5/k8s/hpa.yaml +55 -0
- terradev_cli-4.0.6/clusters/glm-5/k8s/model-cache-pvc.yaml +35 -0
- terradev_cli-4.0.6/clusters/glm-5/k8s/namespace.yaml +8 -0
- terradev_cli-4.0.6/clusters/glm-5/k8s/pdb.yaml +14 -0
- terradev_cli-4.0.6/clusters/glm-5/k8s/service.yaml +49 -0
- terradev_cli-4.0.6/clusters/glm-5/k8s/sglang-deployment.yaml +176 -0
- terradev_cli-4.0.6/clusters/glm-5/k8s/vllm-deployment.yaml +215 -0
- terradev_cli-4.0.6/clusters/glm-5/task.yaml +81 -0
- terradev_cli-4.0.6/clusters/glm-5/terraform/main.tf +190 -0
- terradev_cli-4.0.6/clusters/glm-5/terraform/outputs.tf +60 -0
- terradev_cli-4.0.6/clusters/glm-5/terraform/variables.tf +108 -0
- terradev_cli-4.0.6/clusters/moe-template/README.md +141 -0
- terradev_cli-4.0.6/clusters/moe-template/helm/values-moe.yaml +231 -0
- terradev_cli-4.0.6/clusters/moe-template/k8s/deployment.yaml +358 -0
- terradev_cli-4.0.6/clusters/moe-template/k8s/hpa.yaml +51 -0
- terradev_cli-4.0.6/clusters/moe-template/k8s/model-cache-pvc.yaml +19 -0
- terradev_cli-4.0.6/clusters/moe-template/k8s/namespace.yaml +9 -0
- terradev_cli-4.0.6/clusters/moe-template/k8s/pdb.yaml +14 -0
- terradev_cli-4.0.6/clusters/moe-template/k8s/service.yaml +47 -0
- terradev_cli-4.0.6/clusters/moe-template/task.yaml +159 -0
- terradev_cli-4.0.6/clusters/moe-template/terraform/main.tf +213 -0
- terradev_cli-4.0.6/clusters/moe-template/terraform/outputs.tf +60 -0
- terradev_cli-4.0.6/clusters/moe-template/terraform/variables.tf +170 -0
- terradev_cli-4.0.6/clusters/rag-template/README.md +60 -0
- terradev_cli-4.0.6/clusters/rag-template/helm/values-rag.yaml +106 -0
- terradev_cli-4.0.6/clusters/rag-template/k8s/deployment.yaml +290 -0
- terradev_cli-4.0.6/clusters/rag-template/terraform/main.tf +84 -0
- terradev_cli-4.0.6/complete_weakness_resolution_report.json +1336 -0
- terradev_cli-4.0.6/comprehensive_throughput_quantification.png +0 -0
- terradev_cli-4.0.6/comprehensive_throughput_quantification.py +547 -0
- terradev_cli-4.0.6/config.json +1 -0
- terradev_cli-4.0.6/data_sources_methodology.md +90 -0
- terradev_cli-4.0.6/demo/RECORDING_GUIDE.md +80 -0
- terradev_cli-4.0.6/demo/generate_gif.py +339 -0
- terradev_cli-4.0.6/demo/terradev-demo.gif +0 -0
- terradev_cli-4.0.6/demo_k8s_config.yaml +35 -0
- terradev_cli-4.0.6/deploy_correct_pricing.py +91 -0
- terradev_cli-4.0.6/deploy_enhanced_server.py +660 -0
- terradev_cli-4.0.6/deploy_exact_servers.py +70 -0
- terradev_cli-4.0.6/deploy_simple_enhanced.py +81 -0
- terradev_cli-4.0.6/deploy_telemetry_aws.py +465 -0
- terradev_cli-4.0.6/deploy_to_aws.py +239 -0
- terradev_cli-4.0.6/direct_aws_starter.py +209 -0
- terradev_cli-4.0.6/docker-compose.yml +101 -0
- terradev_cli-4.0.6/docs/ADVANCED_FINANCIAL_INNOVATIONS.md +173 -0
- terradev_cli-4.0.6/docs/API_DOCUMENTATION.md +901 -0
- terradev_cli-4.0.6/docs/BingSiteAuth.xml +4 -0
- terradev_cli-4.0.6/docs/FINOPS_ATTRIBUTION_SYSTEM.md +286 -0
- terradev_cli-4.0.6/docs/USER_GUIDE.md +952 -0
- terradev_cli-4.0.6/docs/architecture.md +228 -0
- terradev_cli-4.0.6/docs/index.html +167 -0
- terradev_cli-4.0.6/docs/robots.txt +28 -0
- terradev_cli-4.0.6/docs/sitemap.xml +45 -0
- terradev_cli-4.0.6/eb_deploy.zip +1 -0
- terradev_cli-4.0.6/ec2_user_data_starter.py +383 -0
- terradev_cli-4.0.6/fallback_server.py +213 -0
- terradev_cli-4.0.6/figure_calculation_breakdown.py +265 -0
- terradev_cli-4.0.6/fix_server_ports.py +140 -0
- terradev_cli-4.0.6/genius_data_storage/data_compressor.py +377 -0
- terradev_cli-4.0.6/genius_data_storage/genius_data_demo.py +357 -0
- terradev_cli-4.0.6/genius_data_storage/integrity_verifier.py +582 -0
- terradev_cli-4.0.6/genius_data_storage/main.tf +682 -0
- terradev_cli-4.0.6/genius_data_storage/zero_egress_accessor.py +524 -0
- terradev_cli-4.0.6/gpu-check.sh +84 -0
- terradev_cli-4.0.6/grafana_training_dashboard.json +385 -0
- terradev_cli-4.0.6/helm/terradev/Chart.yaml +38 -0
- terradev_cli-4.0.6/helm/terradev/templates/_helpers.tpl +127 -0
- terradev_cli-4.0.6/helm/terradev/templates/deployment.yaml +176 -0
- terradev_cli-4.0.6/helm/terradev/values.yaml +423 -0
- terradev_cli-4.0.6/high_throughput_workload.json +5 -0
- terradev_cli-4.0.6/index.html +167 -0
- terradev_cli-4.0.6/infrastructure/kubernetes/microservices.yaml +633 -0
- terradev_cli-4.0.6/infrastructure/terraform/main.tf +1304 -0
- terradev_cli-4.0.6/infrastructure/terraform/parallelism.tf +86 -0
- terradev_cli-4.0.6/integrations/cli-tool.tf +606 -0
- terradev_cli-4.0.6/integrations/cloud-management-widgets.tf +131 -0
- terradev_cli-4.0.6/integrations/cloud-provider-apis.py +602 -0
- terradev_cli-4.0.6/integrations/critical-widgets.tf +933 -0
- terradev_cli-4.0.6/integrations/devops-cicd-widgets.tf +243 -0
- terradev_cli-4.0.6/integrations/devops-essential-tools.tf +314 -0
- terradev_cli-4.0.6/integrations/docker-integration.tf +423 -0
- terradev_cli-4.0.6/integrations/kubernetes-operator.tf +276 -0
- terradev_cli-4.0.6/integrations/mlflow-integration.tf +366 -0
- terradev_cli-4.0.6/kaggle_notebooks/01_hf_spaces_cost_optimization.ipynb +456 -0
- terradev_cli-4.0.6/kubernetes_training_deployment.yaml +253 -0
- terradev_cli-4.0.6/latency_throughput_tradeoff.png +0 -0
- terradev_cli-4.0.6/latency_throughput_tradeoff_viz.py +349 -0
- terradev_cli-4.0.6/llms.txt +209 -0
- terradev_cli-4.0.6/memory_scaling_analysis.png +0 -0
- terradev_cli-4.0.6/memory_scaling_viz.py +390 -0
- terradev_cli-4.0.6/modules/datadog/README.md +59 -0
- terradev_cli-4.0.6/modules/datadog/dashboard.tf +237 -0
- terradev_cli-4.0.6/modules/datadog/monitors.tf +203 -0
- terradev_cli-4.0.6/modules/datadog/outputs.tf +21 -0
- terradev_cli-4.0.6/modules/datadog/provider.tf +5 -0
- terradev_cli-4.0.6/modules/datadog/variables.tf +77 -0
- terradev_cli-4.0.6/modules/datadog/versions.tf +10 -0
- terradev_cli-4.0.6/nginx.conf +30 -0
- terradev_cli-4.0.6/openclaw-skill/terradev-gpu-cloud/.clawhubignore +5 -0
- terradev_cli-4.0.6/openclaw-skill/terradev-gpu-cloud/LICENSE +21 -0
- terradev_cli-4.0.6/openclaw-skill/terradev-gpu-cloud/README.md +106 -0
- terradev_cli-4.0.6/openclaw-skill/terradev-gpu-cloud/SKILL.md +314 -0
- terradev_cli-4.0.6/openclaw-skill/terradev-gpu-cloud/gpu-check.sh +84 -0
- terradev_cli-4.0.6/optimization_impact_viz.py +445 -0
- terradev_cli-4.0.6/parallel_provisioning.tf +338 -0
- terradev_cli-4.0.6/partnerships/brand-partnerships.tf +523 -0
- terradev_cli-4.0.6/partnerships/partnership-roadmap.tf +232 -0
- terradev_cli-4.0.6/pricing_analysis_report.json +164 -0
- terradev_cli-4.0.6/production_telemetry_server.py +479 -0
- terradev_cli-4.0.6/prometheus_training_config.yml +185 -0
- terradev_cli-4.0.6/provision_based_pricing_analysis.json +111 -0
- terradev_cli-4.0.6/pypi_traffic_report.json +84 -0
- terradev_cli-4.0.6/pyproject.toml +192 -0
- terradev_cli-4.0.6/pyproject_v2.toml +119 -0
- terradev_cli-4.0.6/real_day_one_snapshot.json +52 -0
- terradev_cli-4.0.6/remaining_25_analysis_report.json +722 -0
- terradev_cli-4.0.6/render.yaml +18 -0
- terradev_cli-4.0.6/requirements-render.txt +1 -0
- terradev_cli-4.0.6/requirements.txt +56 -0
- terradev_cli-4.0.6/requirements_api.txt +8 -0
- terradev_cli-4.0.6/requirements_eb.txt +4 -0
- terradev_cli-4.0.6/requirements_stripe.txt +5 -0
- terradev_cli-4.0.6/robots.txt +28 -0
- terradev_cli-4.0.6/sample_workload.json +7 -0
- terradev_cli-4.0.6/server_manager.py +287 -0
- terradev_cli-4.0.6/setup/Lambda Cloud API spec 1.8.3.json +4816 -0
- terradev_cli-4.0.6/setup/account-setup-guide.tf +393 -0
- terradev_cli-4.0.6/setup/setup.sh +411 -0
- terradev_cli-4.0.6/simple_ssm_starter.py +297 -0
- terradev_cli-4.0.6/sitemap.xml +45 -0
- terradev_cli-4.0.6/ssh_server_starter.py +311 -0
- terradev_cli-4.0.6/start_aws_servers.py +302 -0
- terradev_cli-4.0.6/stripe_telemetry_server.py +470 -0
- terradev_cli-4.0.6/telemetry_production.db-shm +0 -0
- terradev_cli-4.0.6/telemetry_production.db-wal +0 -0
- terradev_cli-4.0.6/telemetry_server.py +269 -0
- terradev_cli-4.0.6/terradev_cli/CHANGELOG.md +163 -0
- terradev_cli-4.0.6/terradev_cli/COMPLETE_COMMAND_REFERENCE.md +1478 -0
- terradev_cli-4.0.6/terradev_cli/COMPLETE_INTEGRATION_GUIDE.md +777 -0
- terradev_cli-4.0.6/terradev_cli/FULL_SOLUTION_STACK_MARCH_2025.md +542 -0
- terradev_cli-4.0.6/terradev_cli/IMPLEMENTATION_SUMMARY.md +200 -0
- terradev_cli-4.0.6/terradev_cli/MAKING_SPOT_WORK_FOR_STATEFUL_WORKLOADS.md +346 -0
- terradev_cli-4.0.6/terradev_cli/README_old.md +492 -0
- terradev_cli-4.0.6/terradev_cli/README_with_emojis.md +356 -0
- terradev_cli-4.0.6/terradev_cli/SGLANG_COMMAND_GUIDE.md +578 -0
- terradev_cli-4.0.6/terradev_cli/TERRADEV_STACK_CHART.md +298 -0
- terradev_cli-4.0.6/terradev_cli/__init__.py +10 -0
- terradev_cli-4.0.6/terradev_cli/__main__.py +31 -0
- terradev_cli-4.0.6/terradev_cli/bandit-report.json +388 -0
- terradev_cli-4.0.6/terradev_cli/cli.py +9081 -0
- terradev_cli-4.0.6/terradev_cli/cli_clean.py +35 -0
- terradev_cli-4.0.6/terradev_cli/cli_enhanced.py +577 -0
- terradev_cli-4.0.6/terradev_cli/cli_final.py +823 -0
- terradev_cli-4.0.6/terradev_cli/cli_optimization.py +479 -0
- terradev_cli-4.0.6/terradev_cli/cli_optimization_fixed.py +875 -0
- terradev_cli-4.0.6/terradev_cli/cli_optimization_simple.py +850 -0
- terradev_cli-4.0.6/terradev_cli/cli_simple.py +249 -0
- terradev_cli-4.0.6/terradev_cli/cli_tiered.py +372 -0
- terradev_cli-4.0.6/terradev_cli/core/manifest_example.json +1 -0
- terradev_cli-4.0.6/terradev_cli/core/semantic_signals/routing_policy.yaml +135 -0
- terradev_cli-4.0.6/terradev_cli/cost_optimizer.py +518 -0
- terradev_cli-4.0.6/terradev_cli/debug_comparison.py +16 -0
- terradev_cli-4.0.6/terradev_cli/debug_mla.py +15 -0
- terradev_cli-4.0.6/terradev_cli/debug_streaming.py +41 -0
- terradev_cli-4.0.6/terradev_cli/demo.py +375 -0
- terradev_cli-4.0.6/terradev_cli/entry_point.py +210 -0
- terradev_cli-4.0.6/terradev_cli/integrations/Terradev LOGO BLACK.png +0 -0
- terradev_cli-4.0.6/terradev_cli/integrations/Terradev LOGO WHITEW.png +0 -0
- terradev_cli-4.0.6/terradev_cli/k8s/terraform_wrapper.py +425 -0
- terradev_cli-4.0.6/terradev_cli/kubernetes/inferx-cost-optimized.yaml +630 -0
- terradev_cli-4.0.6/terradev_cli/kubernetes/inferx-infrastructure.yaml +404 -0
- terradev_cli-4.0.6/terradev_cli/kubernetes/inferx-models.yaml +531 -0
- terradev_cli-4.0.6/terradev_cli/kubernetes/inferx-platform.yaml +619 -0
- terradev_cli-4.0.6/terradev_cli/kubernetes/inferx_setup.py +550 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/providers/aws_provider.py +2 -2
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/providers/azure_provider.py +8 -8
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/providers/coreweave_provider.py +1 -10
- terradev_cli-4.0.6/terradev_cli/providers/digitalocean_provider.py +251 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/providers/gcp_provider.py +9 -9
- terradev_cli-4.0.6/terradev_cli/providers/hyperstack_provider.py +247 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/providers/lambda_labs_provider.py +14 -13
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/providers/provider_factory.py +1 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/providers/tensordock_provider.py +38 -35
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/providers/vastai_provider.py +11 -2
- terradev_cli-4.0.6/terradev_cli/requirements.txt +3 -0
- terradev_cli-4.0.6/terradev_cli/requirements_minimal.txt +3 -0
- terradev_cli-4.0.6/terradev_cli/safety-report.json +18554 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/setup.py +1 -1
- terradev_cli-4.0.6/terradev_cli/telemetry_protection.py +180 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/terradev_cli/cli.py +23 -1
- terradev_cli-4.0.6/terradev_cli/terradev_cli/credential_prompt.py +164 -0
- terradev_cli-4.0.6/terradev_cli/terraform/main.tf +139 -0
- terradev_cli-4.0.6/terradev_cli/terraform/modules/__init__.py +19 -0
- terradev_cli-4.0.6/terradev_cli/terraform/modules/gpu-node-aws/bootstrap.sh +271 -0
- terradev_cli-4.0.6/terradev_cli/terraform/modules/gpu-node-aws/main.tf +143 -0
- terradev_cli-4.0.6/terradev_cli/terraform/modules/gpu-node-aws/outputs.tf +69 -0
- terradev_cli-4.0.6/terradev_cli/terraform/modules/gpu-node-aws/variables.tf +83 -0
- terradev_cli-4.0.6/terradev_cli/terraform/modules/gpu-node-aws/versions.tf +12 -0
- terradev_cli-4.0.6/terradev_cli/terraform/modules/gpu-node-hyperstack/bootstrap.sh +231 -0
- terradev_cli-4.0.6/terradev_cli/terraform/modules/gpu-node-hyperstack/main.tf +107 -0
- terradev_cli-4.0.6/terradev_cli/terraform/modules/gpu-node-hyperstack/outputs.tf +59 -0
- terradev_cli-4.0.6/terradev_cli/terraform/modules/gpu-node-hyperstack/variables.tf +63 -0
- terradev_cli-4.0.6/terradev_cli/terraform/modules/gpu-node-hyperstack/versions.tf +12 -0
- terradev_cli-4.0.6/terradev_cli/terraform/modules/gpu-node-lambda/bootstrap.sh +231 -0
- terradev_cli-4.0.6/terradev_cli/terraform/modules/gpu-node-lambda/main.tf +107 -0
- terradev_cli-4.0.6/terradev_cli/terraform/modules/gpu-node-lambda/outputs.tf +59 -0
- terradev_cli-4.0.6/terradev_cli/terraform/modules/gpu-node-lambda/variables.tf +63 -0
- terradev_cli-4.0.6/terradev_cli/terraform/modules/gpu-node-lambda/versions.tf +12 -0
- terradev_cli-4.0.6/terradev_cli/terraform/modules/gpu-node-vastai/bootstrap.sh +231 -0
- terradev_cli-4.0.6/terradev_cli/terraform/modules/gpu-node-vastai/main.tf +107 -0
- terradev_cli-4.0.6/terradev_cli/terraform/modules/gpu-node-vastai/outputs.tf +59 -0
- terradev_cli-4.0.6/terradev_cli/terraform/modules/gpu-node-vastai/variables.tf +63 -0
- terradev_cli-4.0.6/terradev_cli/terraform/modules/gpu-node-vastai/versions.tf +12 -0
- terradev_cli-4.0.6/terradev_cli/terraform/modules/k8s-control-plane/join-script.tpl +95 -0
- terradev_cli-4.0.6/terradev_cli/terraform/modules/k8s-control-plane/kubeconfig.tpl +29 -0
- terradev_cli-4.0.6/terradev_cli/terraform/modules/k8s-control-plane/main.tf +183 -0
- terradev_cli-4.0.6/terradev_cli/terraform/modules/k8s-control-plane/outputs.tf +50 -0
- terradev_cli-4.0.6/terradev_cli/terraform/modules/k8s-control-plane/variables.tf +59 -0
- terradev_cli-4.0.6/terradev_cli/terraform/modules/k8s-control-plane/versions.tf +16 -0
- terradev_cli-4.0.6/terradev_cli/terraform/modules/networking/main.tf +193 -0
- terradev_cli-4.0.6/terradev_cli/terraform/modules/networking/outputs.tf +50 -0
- terradev_cli-4.0.6/terradev_cli/terraform/modules/networking/variables.tf +52 -0
- terradev_cli-4.0.6/terradev_cli/terraform/modules/networking/versions.tf +12 -0
- terradev_cli-4.0.6/terradev_cli/terraform/outputs.tf +118 -0
- terradev_cli-4.0.6/terradev_cli/terraform/price-optimizer/optimal-allocation.py +188 -0
- terradev_cli-4.0.6/terradev_cli/terraform/providers.tf +22 -0
- terradev_cli-4.0.6/terradev_cli/terraform/variables.tf +95 -0
- terradev_cli-4.0.6/terradev_cli/test_summary.py +207 -0
- terradev_cli-4.0.6/terradev_cli.egg-info/PKG-INFO +1020 -0
- terradev_cli-4.0.6/terradev_cli.egg-info/SOURCES.txt +455 -0
- terradev_cli-4.0.6/terradev_cli.egg-info/requires.txt +67 -0
- terradev_cli-4.0.6/terradev_cli.egg-info/top_level.txt +1 -0
- terradev_cli-4.0.6/terraform.tfvars.example +50 -0
- terradev_cli-4.0.6/terraform_cli/README.md +471 -0
- terradev_cli-4.0.6/terraform_cli/demo_usage.py +152 -0
- terradev_cli-4.0.6/terraform_cli/install.sh +346 -0
- terradev_cli-4.0.6/terraform_cli/terradev_cli.py +717 -0
- terradev_cli-4.0.6/terraform_optimization/backend.tf +606 -0
- terradev_cli-4.0.6/terraform_optimization/error_handling.py +742 -0
- terradev_cli-4.0.6/terraform_optimization/storage_choice_matrix.py +681 -0
- terradev_cli-4.0.6/terraform_optimization/test_error_handler.py +83 -0
- terradev_cli-4.0.6/terraform_optimization/versions.tf +668 -0
- terradev_cli-4.0.6/terraform_transparency/audit_demo/trail_3caed73ad5bf.json +428 -0
- terradev_cli-4.0.6/terraform_transparency/audit_trail.py +610 -0
- terradev_cli-4.0.6/terraform_transparency/audit_trails.json +430 -0
- terradev_cli-4.0.6/terraform_transparency/decision_engine.py +671 -0
- terradev_cli-4.0.6/terraform_transparency/decision_logs.json +192 -0
- terradev_cli-4.0.6/terraform_transparency/operations.json +45 -0
- terradev_cli-4.0.6/terraform_transparency/terraform_manager.py +627 -0
- terradev_cli-4.0.6/terraform_transparency/terraform_plans.json +1 -0
- terradev_cli-4.0.6/terraform_transparency/transparency_demo.py +417 -0
- terradev_cli-4.0.6/terraform_transparency/transparency_report.json +33 -0
- terradev_cli-4.0.6/test_cli_telemetry.py +156 -0
- terradev_cli-4.0.6/test_cli_ux_battle.py +559 -0
- terradev_cli-4.0.6/test_cli_ux_battle_fixed.py +654 -0
- terradev_cli-4.0.6/test_cli_ux_final.py +524 -0
- terradev_cli-4.0.6/test_final_integration.py +355 -0
- terradev_cli-4.0.6/test_integrated_optimization.py +449 -0
- terradev_cli-4.0.6/test_p10_production_failover.py +667 -0
- terradev_cli-4.0.6/test_sso_phase1.py +309 -0
- terradev_cli-4.0.6/test_telemetry_backend.py +203 -0
- terradev_cli-4.0.6/test_vllm_optimization.py +182 -0
- terradev_cli-4.0.6/tests/INTEGRATION_STRATEGY.md +337 -0
- terradev_cli-4.0.6/tests/Makefile +201 -0
- terradev_cli-4.0.6/tests/ci_pipeline.py +480 -0
- terradev_cli-4.0.6/tests/conftest.py +10 -0
- terradev_cli-4.0.6/tests/simple_test.py +181 -0
- terradev_cli-4.0.6/tests/test_checkpoint_manager.py +198 -0
- terradev_cli-4.0.6/tests/test_cli_smoke.py +200 -0
- terradev_cli-4.0.6/tests/test_dag_executor.py +400 -0
- terradev_cli-4.0.6/tests/test_integration.py +577 -0
- terradev_cli-4.0.6/tests/test_job_state_manager.py +307 -0
- terradev_cli-4.0.6/tests/test_providers.py +264 -0
- terradev_cli-4.0.6/tests/test_semantic_router.py +1474 -0
- terradev_cli-4.0.6/tests/test_sglang_optimization.py +595 -0
- terradev_cli-4.0.6/tests/test_ssh_key_manager.py +154 -0
- terradev_cli-4.0.6/tests/test_training_monitor.py +237 -0
- terradev_cli-4.0.6/tests/test_training_orchestrator.py +321 -0
- terradev_cli-4.0.6/throughput_comparison_analysis.png +0 -0
- terradev_cli-4.0.6/throughput_comparison_viz.py +303 -0
- terradev_cli-4.0.6/volatility_charts/aws_a100_volatility.png +0 -0
- terradev_cli-4.0.6/volatility_charts/aws_v100_volatility.png +0 -0
- terradev_cli-4.0.6/volatility_charts/runpod_a100_volatility.png +0 -0
- terradev_cli-4.0.6/volatility_charts/runpod_v100_volatility.png +0 -0
- terradev_cli-4.0.4/providers/digitalocean_provider.py +0 -310
- terradev_cli-4.0.4/providers/hyperstack_provider.py +0 -353
- terradev_cli-4.0.4/terradev_cli.egg-info/SOURCES.txt +0 -135
- terradev_cli-4.0.4/terradev_cli.egg-info/not-zip-safe +0 -1
- terradev_cli-4.0.4/terradev_cli.egg-info/requires.txt +0 -29
- terradev_cli-4.0.4/terradev_cli.egg-info/top_level.txt +0 -7
- {terradev_cli-4.0.4 → terradev_cli-4.0.6}/setup.cfg +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/README.md +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/__init__.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/async_config.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/auth.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/checkpoint_manager.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/config.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/cost_scaler.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/cost_tracker.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/cuda_graph_integrator.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/dag_executor.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/data_governance.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/dataset_stager.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/deployment_router.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/drift_detector.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/egress_cost_monitor.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/egress_optimizer.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/enterprise_auth.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/gitops_manager.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/gpu_topology.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/helm_generator.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/helm_generator_old.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/hf_cli_integration.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/hf_smart_templates.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/hf_spaces.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/inference_router.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/job_state_manager.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/kv_cache_checkpoint_manager.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/kv_cache_checkpoint_tests.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/manifest_cache.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/mig_manager.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/mla_vram_estimator.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/mla_vram_tests.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/model_orchestrator.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/oidc_provider.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/optimization_config.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/parallel_provisioner.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/preflight_validator.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/price_discovery.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/price_discovery_mock.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/price_intelligence.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/public_ip_billing_tracker.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/quick_start.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/rate_limiter.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/saml_provider.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/semantic_router.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/semantic_signals/__init__.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/semantic_signals/base_signal.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/semantic_signals/complexity_signal.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/semantic_signals/domain_signal.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/semantic_signals/keyword_signal.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/semantic_signals/language_signal.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/semantic_signals/modality_signal.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/semantic_signals/orchestrator.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/semantic_signals/safety_signal.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/session_manager.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/ssh_key_manager.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/stripe_manager.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/telemetry.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/telemetry_backup.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/terradev_engine.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/tier_manager.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/trace_viewer.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/training_monitor.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/training_orchestrator.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/user_manager.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/warm_pool_manager.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/weight_streaming_benchmarks.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/core/weight_streaming_manager.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6}/terradev_cli/credential_prompt.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/integrations/__init__.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/integrations/datadog_integration.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/integrations/prometheus_integration.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/integrations/wandb_integration.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/ml_services/__init__.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/ml_services/dvc_service.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/ml_services/guardrails_service.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/ml_services/huggingface_service.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/ml_services/kserve_service.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/ml_services/kubernetes_enhanced.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/ml_services/kubernetes_enhanced_fixed.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/ml_services/kubernetes_service.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/ml_services/langchain_service.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/ml_services/langgraph_service.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/ml_services/langsmith_service.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/ml_services/lmcache_service.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/ml_services/mlflow_service.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/ml_services/ollama_service.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/ml_services/phoenix_service.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/ml_services/qdrant_service.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/ml_services/ray_enhanced.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/ml_services/ray_service.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/ml_services/sglang_service.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/ml_services/vllm_service.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/ml_services/wandb_enhanced.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/ml_services/wandb_service.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/optimization/__init__.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/optimization/auto_optimizer.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/optimization/cuco_optimizer.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/providers/__init__.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/providers/alibaba_provider.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/providers/base_provider.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/providers/baseten_provider.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/providers/crusoe_provider.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/providers/demo_mode.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/providers/fluidstack_provider.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/providers/hetzner_provider.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/providers/huggingface_provider.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/providers/inferx_provider.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/providers/oracle_provider.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/providers/ovhcloud_provider.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/providers/real_pricing.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/providers/runpod_provider.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/providers/siliconflow_provider.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/terradev_cli/__init__.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/utils/__init__.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6/terradev_cli}/utils/formatters.py +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6}/terradev_cli.egg-info/dependency_links.txt +0 -0
- {terradev_cli-4.0.4 → terradev_cli-4.0.6}/terradev_cli.egg-info/entry_points.txt +0 -0
|
@@ -0,0 +1,84 @@
|
|
|
1
|
+
# Terradev Environment Configuration
|
|
2
|
+
# Copy this file to .env and fill in your actual values
|
|
3
|
+
|
|
4
|
+
# Database Configuration
|
|
5
|
+
DATABASE_URL=postgresql://user:password@localhost:5432/terradev
|
|
6
|
+
REDIS_URL=redis://localhost:6379/0
|
|
7
|
+
|
|
8
|
+
# Security Configuration
|
|
9
|
+
SECRET_KEY=your-secret-key-here
|
|
10
|
+
JWT_SECRET=your-jwt-secret-here
|
|
11
|
+
ENCRYPTION_KEY=your-encryption-key-here
|
|
12
|
+
|
|
13
|
+
# Cloud Provider Credentials
|
|
14
|
+
# TensorDock
|
|
15
|
+
TENSORDOCK_API_KEY=your-tensordock-api-key
|
|
16
|
+
TENSORDOCK_SECRET=your-tensordock-secret
|
|
17
|
+
|
|
18
|
+
# Vast.AI
|
|
19
|
+
VAST_API_KEY=your-vast-api-key
|
|
20
|
+
|
|
21
|
+
# Google Cloud Platform
|
|
22
|
+
GCP_PROJECT_ID=your-gcp-project-id
|
|
23
|
+
GCP_CREDENTIALS_PATH=path/to/gcp-credentials.json
|
|
24
|
+
GCP_ZONE=us-central1-a
|
|
25
|
+
|
|
26
|
+
# AWS
|
|
27
|
+
AWS_ACCESS_KEY_ID=your-aws-access-key
|
|
28
|
+
AWS_SECRET_ACCESS_KEY=your-aws-secret-key
|
|
29
|
+
AWS_REGION=us-west-2
|
|
30
|
+
|
|
31
|
+
# RunPod
|
|
32
|
+
RUNPOD_API_KEY=your-runpod-api-key
|
|
33
|
+
|
|
34
|
+
# Lambda Labs
|
|
35
|
+
LAMBDA_API_KEY=your-lambda-api-key
|
|
36
|
+
|
|
37
|
+
# CoreWeave
|
|
38
|
+
COREWEAVE_API_KEY=your-coreweave-api-key
|
|
39
|
+
|
|
40
|
+
# Application Configuration
|
|
41
|
+
DEBUG=false
|
|
42
|
+
LOG_LEVEL=INFO
|
|
43
|
+
PORT=8000
|
|
44
|
+
HOST=0.0.0.0
|
|
45
|
+
|
|
46
|
+
# Monitoring Configuration
|
|
47
|
+
PROMETHEUS_PORT=9090
|
|
48
|
+
GRAFANA_PORT=3000
|
|
49
|
+
GRAFANA_ADMIN_PASSWORD=your-grafana-admin-password
|
|
50
|
+
|
|
51
|
+
# Rate Limiting
|
|
52
|
+
RATE_LIMIT_PER_MINUTE=60
|
|
53
|
+
RATE_LIMIT_BURST=10
|
|
54
|
+
|
|
55
|
+
# Cache Configuration
|
|
56
|
+
CACHE_TTL=300
|
|
57
|
+
CACHE_MAX_SIZE=1000
|
|
58
|
+
|
|
59
|
+
# Email Configuration (optional)
|
|
60
|
+
SMTP_HOST=smtp.gmail.com
|
|
61
|
+
SMTP_PORT=587
|
|
62
|
+
SMTP_USERNAME=your-email@gmail.com
|
|
63
|
+
SMTP_PASSWORD=your-email-password
|
|
64
|
+
SENDER_EMAIL=noreply@terradev.com
|
|
65
|
+
|
|
66
|
+
# Stripe Configuration (REQUIRED for payments)
|
|
67
|
+
STRIPE_SECRET_KEY=sk_test_your_stripe_secret_key_here
|
|
68
|
+
STRIPE_WEBHOOK_SECRET=whsec_your_webhook_secret_here
|
|
69
|
+
STRIPE_PUBLISHABLE_KEY=pk_test_your_publishable_key_here
|
|
70
|
+
|
|
71
|
+
# Webhook Configuration (optional)
|
|
72
|
+
SLACK_WEBHOOK_URL=your-slack-webhook-url
|
|
73
|
+
DISCORD_WEBHOOK_URL=your-discord-webhook-url
|
|
74
|
+
|
|
75
|
+
# Feature Flags
|
|
76
|
+
ENABLE_MONITORING=true
|
|
77
|
+
ENABLE_LOGGING=true
|
|
78
|
+
ENABLE_METRICS=true
|
|
79
|
+
ENABLE_TRACING=false
|
|
80
|
+
|
|
81
|
+
# Performance Configuration
|
|
82
|
+
MAX_WORKERS=4
|
|
83
|
+
WORKER_TIMEOUT=30
|
|
84
|
+
MAX_CONNECTIONS=100
|
|
@@ -0,0 +1,164 @@
|
|
|
1
|
+
#!/usr/bin/env python3
|
|
2
|
+
"""
|
|
3
|
+
Production Environment Variables Template
|
|
4
|
+
Copy this to .env and fill in your actual values
|
|
5
|
+
"""
|
|
6
|
+
|
|
7
|
+
# Database Configuration
|
|
8
|
+
DB_HOST=localhost
|
|
9
|
+
DB_PORT=5432
|
|
10
|
+
DB_NAME=terradev
|
|
11
|
+
DB_USERNAME=terradev_user
|
|
12
|
+
DB_PASSWORD=your_secure_password_here
|
|
13
|
+
DB_SSL_MODE=require
|
|
14
|
+
DB_POOL_SIZE=20
|
|
15
|
+
DB_MAX_OVERFLOW=30
|
|
16
|
+
|
|
17
|
+
# Redis Configuration
|
|
18
|
+
REDIS_HOST=localhost
|
|
19
|
+
REDIS_PORT=6379
|
|
20
|
+
REDIS_PASSWORD=your_redis_password_here
|
|
21
|
+
REDIS_DB=0
|
|
22
|
+
REDIS_SSL=true
|
|
23
|
+
REDIS_MAX_CONNECTIONS=100
|
|
24
|
+
|
|
25
|
+
# Security Configuration
|
|
26
|
+
SECRET_KEY=your_very_secure_secret_key_here_at_least_32_characters
|
|
27
|
+
JWT_ALGORITHM=HS256
|
|
28
|
+
ACCESS_TOKEN_EXPIRE_MINUTES=30
|
|
29
|
+
REFRESH_TOKEN_EXPIRE_DAYS=7
|
|
30
|
+
BCRYPT_ROUNDS=12
|
|
31
|
+
|
|
32
|
+
# CORS Configuration
|
|
33
|
+
CORS_ORIGINS=https://yourdomain.com,https://app.yourdomain.com
|
|
34
|
+
|
|
35
|
+
# AWS Configuration
|
|
36
|
+
AWS_ACCESS_KEY_ID=your_aws_access_key_here
|
|
37
|
+
AWS_SECRET_ACCESS_KEY=your_aws_secret_key_here
|
|
38
|
+
AWS_REGION=us-east-1
|
|
39
|
+
AWS_ENDPOINT=
|
|
40
|
+
AWS_TIMEOUT=30
|
|
41
|
+
AWS_RETRY_ATTEMPTS=3
|
|
42
|
+
AWS_RETRY_DELAY=1.0
|
|
43
|
+
|
|
44
|
+
# GCP Configuration
|
|
45
|
+
GCP_PROJECT_ID=your_gcp_project_id
|
|
46
|
+
GCP_SERVICE_ACCOUNT_KEY=path/to/your/service-account-key.json
|
|
47
|
+
GCP_REGION=us-central1
|
|
48
|
+
GCP_ENDPOINT=
|
|
49
|
+
GCP_TIMEOUT=30
|
|
50
|
+
GCP_RETRY_ATTEMPTS=3
|
|
51
|
+
GCP_RETRY_DELAY=1.0
|
|
52
|
+
|
|
53
|
+
# Azure Configuration
|
|
54
|
+
AZURE_CLIENT_ID=your_azure_client_id
|
|
55
|
+
AZURE_CLIENT_SECRET=your_azure_client_secret
|
|
56
|
+
AZURE_TENANT_ID=your_azure_tenant_id
|
|
57
|
+
AZURE_SUBSCRIPTION_ID=your_azure_subscription_id
|
|
58
|
+
AZURE_REGION=eastus
|
|
59
|
+
AZURE_ENDPOINT=
|
|
60
|
+
AZURE_TIMEOUT=30
|
|
61
|
+
AZURE_RETRY_ATTEMPTS=3
|
|
62
|
+
AZURE_RETRY_DELAY=1.0
|
|
63
|
+
|
|
64
|
+
# OCI Configuration
|
|
65
|
+
OCI_TENANCY_OCID=your_oci_tenancy_ocid
|
|
66
|
+
OCI_USER_OCID=your_oci_user_ocid
|
|
67
|
+
OCI_FINGERPRINT=your_oci_fingerprint
|
|
68
|
+
OCI_PRIVATE_KEY_PATH=~/.oci/oci_api_key.pem
|
|
69
|
+
OCI_REGION=us-ashburn-1
|
|
70
|
+
OCI_ENDPOINT=
|
|
71
|
+
OCI_TIMEOUT=30
|
|
72
|
+
OCI_RETRY_ATTEMPTS=3
|
|
73
|
+
OCI_RETRY_DELAY=1.0
|
|
74
|
+
|
|
75
|
+
# HuggingFace Configuration
|
|
76
|
+
HUGGINGFACE_TOKEN=your_huggingface_token_here
|
|
77
|
+
HF_TIMEOUT=30
|
|
78
|
+
HF_RETRY_ATTEMPTS=3
|
|
79
|
+
HF_RETRY_DELAY=1.0
|
|
80
|
+
|
|
81
|
+
# Vast.ai Configuration
|
|
82
|
+
VAST_API_KEY=your_vast_api_key_here
|
|
83
|
+
VAST_TIMEOUT=30
|
|
84
|
+
VAST_RETRY_ATTEMPTS=3
|
|
85
|
+
VAST_RETRY_DELAY=1.0
|
|
86
|
+
|
|
87
|
+
# RunPod Configuration
|
|
88
|
+
RUNPOD_API_KEY=your_runpod_api_key_here
|
|
89
|
+
RUNPOD_TIMEOUT=30
|
|
90
|
+
RUNPOD_RETRY_ATTEMPTS=3
|
|
91
|
+
RUNPOD_RETRY_DELAY=1.0
|
|
92
|
+
|
|
93
|
+
# Monitoring Configuration
|
|
94
|
+
PROMETHEUS_ENABLED=true
|
|
95
|
+
PROMETHEUS_PORT=9090
|
|
96
|
+
GRAFANA_ENABLED=true
|
|
97
|
+
GRAFANA_PORT=3000
|
|
98
|
+
GRAFANA_ADMIN_PASSWORD=your_grafana_password
|
|
99
|
+
|
|
100
|
+
# Logging Configuration
|
|
101
|
+
LOG_LEVEL=INFO
|
|
102
|
+
LOG_FILE=/var/log/terradev/production.log
|
|
103
|
+
LOG_MAX_SIZE=100MB
|
|
104
|
+
LOG_BACKUP_COUNT=5
|
|
105
|
+
|
|
106
|
+
# Rate Limiting Configuration
|
|
107
|
+
RATE_LIMIT_DEFAULT=100/minute
|
|
108
|
+
RATE_LIMIT_AUTH=10/minute
|
|
109
|
+
RATE_LIMIT_DEPLOY=5/minute
|
|
110
|
+
RATE_LIMIT_UPLOAD=20/minute
|
|
111
|
+
RATE_LIMIT_API=1000/hour
|
|
112
|
+
|
|
113
|
+
# Performance Configuration
|
|
114
|
+
MAX_WORKERS=4
|
|
115
|
+
WORKER_TIMEOUT=300
|
|
116
|
+
KEEPALIVE_TIMEOUT=30
|
|
117
|
+
MAX_CONNECTIONS=100
|
|
118
|
+
MAX_CONNECTIONS_PER_HOST=25
|
|
119
|
+
|
|
120
|
+
# Security Configuration
|
|
121
|
+
ENABLE_HTTPS=true
|
|
122
|
+
SSL_CERT_PATH=/etc/ssl/cert.pem
|
|
123
|
+
SSL_KEY_PATH=/etc/ssl/key.pem
|
|
124
|
+
ALLOWED_HOSTS=yourdomain.com,app.yourdomain.com
|
|
125
|
+
|
|
126
|
+
# Feature Flags
|
|
127
|
+
ENABLE_CLOUD_ADVISOR=true
|
|
128
|
+
ENABLE_MULTICLOUD_DB=true
|
|
129
|
+
ENABLE_ADVANCED_ANALYTICS=true
|
|
130
|
+
ENABLE_BETA_FEATURES=false
|
|
131
|
+
|
|
132
|
+
# Backup Configuration
|
|
133
|
+
BACKUP_ENABLED=true
|
|
134
|
+
BACKUP_SCHEDULE=0 2 * * * # Daily at 2 AM
|
|
135
|
+
BACKUP_RETENTION_DAYS=30
|
|
136
|
+
BACKUP_S3_BUCKET=your-backup-bucket
|
|
137
|
+
|
|
138
|
+
# Alert Configuration
|
|
139
|
+
ALERT_EMAIL_ENABLED=true
|
|
140
|
+
ALERT_EMAIL_SMTP_HOST=smtp.gmail.com
|
|
141
|
+
ALERT_EMAIL_SMTP_PORT=587
|
|
142
|
+
ALERT_EMAIL_USERNAME=your_email@gmail.com
|
|
143
|
+
ALERT_EMAIL_PASSWORD=your_email_password
|
|
144
|
+
ALERT_EMAIL_FROM=alerts@yourdomain.com
|
|
145
|
+
ALERT_EMAIL_TO=admin@yourdomain.com
|
|
146
|
+
|
|
147
|
+
# Slack Configuration
|
|
148
|
+
SLACK_WEBHOOK_URL=your_slack_webhook_url
|
|
149
|
+
SLACK_CHANNEL=#alerts
|
|
150
|
+
|
|
151
|
+
# Environment Configuration
|
|
152
|
+
ENVIRONMENT=production
|
|
153
|
+
DEBUG=false
|
|
154
|
+
TESTING=false
|
|
155
|
+
|
|
156
|
+
# Application Configuration
|
|
157
|
+
APP_NAME=Terradev
|
|
158
|
+
APP_VERSION=1.0.0
|
|
159
|
+
APP_DESCRIPTION=ML Training Optimization Platform
|
|
160
|
+
|
|
161
|
+
# External Services Configuration
|
|
162
|
+
SENTRY_DSN=your_sentry_dsn_here
|
|
163
|
+
NEW_RELIC_LICENSE_KEY=your_new_relic_key_here
|
|
164
|
+
DATADOG_API_KEY=your_datadog_api_key_here
|
|
@@ -0,0 +1,105 @@
|
|
|
1
|
+
name: Terradev CI
|
|
2
|
+
|
|
3
|
+
on:
|
|
4
|
+
push:
|
|
5
|
+
branches: [ main ]
|
|
6
|
+
pull_request:
|
|
7
|
+
branches: [ main ]
|
|
8
|
+
|
|
9
|
+
jobs:
|
|
10
|
+
test:
|
|
11
|
+
runs-on: ubuntu-latest
|
|
12
|
+
strategy:
|
|
13
|
+
matrix:
|
|
14
|
+
python-version: ['3.9', '3.11', '3.12']
|
|
15
|
+
env:
|
|
16
|
+
TERRADEV_SKIP_ONBOARDING: "1"
|
|
17
|
+
|
|
18
|
+
steps:
|
|
19
|
+
- uses: actions/checkout@v4
|
|
20
|
+
|
|
21
|
+
- name: Set up Python ${{ matrix.python-version }}
|
|
22
|
+
uses: actions/setup-python@v5
|
|
23
|
+
with:
|
|
24
|
+
python-version: ${{ matrix.python-version }}
|
|
25
|
+
|
|
26
|
+
- name: Install package + test deps
|
|
27
|
+
run: |
|
|
28
|
+
pip install -e ".[dev]" 2>/dev/null || pip install -e .
|
|
29
|
+
pip install pytest pytest-asyncio
|
|
30
|
+
|
|
31
|
+
- name: CLI smoke test
|
|
32
|
+
run: |
|
|
33
|
+
terradev --version
|
|
34
|
+
terradev --help
|
|
35
|
+
|
|
36
|
+
- name: Run test suite
|
|
37
|
+
run: pytest tests/ -v --tb=short
|
|
38
|
+
|
|
39
|
+
build:
|
|
40
|
+
runs-on: ubuntu-latest
|
|
41
|
+
needs: test
|
|
42
|
+
steps:
|
|
43
|
+
- uses: actions/checkout@v4
|
|
44
|
+
|
|
45
|
+
- uses: actions/setup-python@v5
|
|
46
|
+
with:
|
|
47
|
+
python-version: '3.11'
|
|
48
|
+
|
|
49
|
+
- name: Build wheel + sdist
|
|
50
|
+
run: |
|
|
51
|
+
pip install build
|
|
52
|
+
python -m build
|
|
53
|
+
|
|
54
|
+
- name: Verify built package installs
|
|
55
|
+
run: |
|
|
56
|
+
pip install dist/*.whl
|
|
57
|
+
terradev --version
|
|
58
|
+
|
|
59
|
+
- name: Upload artifacts
|
|
60
|
+
uses: actions/upload-artifact@v4
|
|
61
|
+
with:
|
|
62
|
+
name: dist
|
|
63
|
+
path: dist/
|
|
64
|
+
|
|
65
|
+
security:
|
|
66
|
+
runs-on: ubuntu-latest
|
|
67
|
+
steps:
|
|
68
|
+
- uses: actions/checkout@v4
|
|
69
|
+
|
|
70
|
+
- uses: actions/setup-python@v5
|
|
71
|
+
with:
|
|
72
|
+
python-version: '3.11'
|
|
73
|
+
|
|
74
|
+
- name: Install + scan
|
|
75
|
+
run: |
|
|
76
|
+
pip install bandit
|
|
77
|
+
bandit -r terradev_cli/ -f json -o bandit-report.json -ll || true
|
|
78
|
+
|
|
79
|
+
- uses: actions/upload-artifact@v4
|
|
80
|
+
with:
|
|
81
|
+
name: security-reports
|
|
82
|
+
path: bandit-report.json
|
|
83
|
+
|
|
84
|
+
publish:
|
|
85
|
+
if: startsWith(github.ref, 'refs/tags/v')
|
|
86
|
+
needs: [test, build]
|
|
87
|
+
runs-on: ubuntu-latest
|
|
88
|
+
permissions:
|
|
89
|
+
id-token: write
|
|
90
|
+
steps:
|
|
91
|
+
- uses: actions/checkout@v4
|
|
92
|
+
|
|
93
|
+
- uses: actions/setup-python@v5
|
|
94
|
+
with:
|
|
95
|
+
python-version: '3.11'
|
|
96
|
+
|
|
97
|
+
- name: Build
|
|
98
|
+
run: |
|
|
99
|
+
pip install build
|
|
100
|
+
python -m build
|
|
101
|
+
|
|
102
|
+
- name: Publish to PyPI
|
|
103
|
+
uses: pypa/gh-action-pypi-publish@release/v1
|
|
104
|
+
with:
|
|
105
|
+
password: ${{ secrets.PYPI_API_TOKEN }}
|
|
@@ -0,0 +1,30 @@
|
|
|
1
|
+
name: Deploy GitHub Pages
|
|
2
|
+
|
|
3
|
+
on:
|
|
4
|
+
push:
|
|
5
|
+
branches: [ main ]
|
|
6
|
+
pull_request:
|
|
7
|
+
branches: [ main ]
|
|
8
|
+
|
|
9
|
+
jobs:
|
|
10
|
+
deploy:
|
|
11
|
+
runs-on: ubuntu-latest
|
|
12
|
+
permissions:
|
|
13
|
+
pages: write
|
|
14
|
+
id-token: write
|
|
15
|
+
|
|
16
|
+
steps:
|
|
17
|
+
- name: Checkout
|
|
18
|
+
uses: actions/checkout@v4
|
|
19
|
+
|
|
20
|
+
- name: Setup Pages
|
|
21
|
+
uses: actions/configure-pages@v4
|
|
22
|
+
|
|
23
|
+
- name: Upload artifact
|
|
24
|
+
uses: actions/upload-pages-artifact@v4
|
|
25
|
+
with:
|
|
26
|
+
path: './docs'
|
|
27
|
+
|
|
28
|
+
- name: Deploy to GitHub Pages
|
|
29
|
+
id: deployment
|
|
30
|
+
uses: actions/deploy-pages@v4
|
|
@@ -0,0 +1,50 @@
|
|
|
1
|
+
name: Release to PyPI
|
|
2
|
+
|
|
3
|
+
on:
|
|
4
|
+
push:
|
|
5
|
+
tags:
|
|
6
|
+
- 'v*'
|
|
7
|
+
|
|
8
|
+
permissions:
|
|
9
|
+
contents: write
|
|
10
|
+
releases: write
|
|
11
|
+
|
|
12
|
+
jobs:
|
|
13
|
+
release:
|
|
14
|
+
runs-on: ubuntu-latest
|
|
15
|
+
|
|
16
|
+
steps:
|
|
17
|
+
- name: Checkout code
|
|
18
|
+
uses: actions/checkout@v4
|
|
19
|
+
|
|
20
|
+
- name: Set up Python
|
|
21
|
+
uses: actions/setup-python@v4
|
|
22
|
+
with:
|
|
23
|
+
python-version: '3.9'
|
|
24
|
+
|
|
25
|
+
- name: Install build dependencies
|
|
26
|
+
run: |
|
|
27
|
+
cd terradev_cli
|
|
28
|
+
pip install build twine
|
|
29
|
+
|
|
30
|
+
- name: Build package
|
|
31
|
+
run: |
|
|
32
|
+
cd terradev_cli
|
|
33
|
+
python -m build
|
|
34
|
+
|
|
35
|
+
- name: Check package
|
|
36
|
+
run: |
|
|
37
|
+
cd terradev_cli
|
|
38
|
+
twine check dist/*
|
|
39
|
+
|
|
40
|
+
- name: Publish to PyPI
|
|
41
|
+
env:
|
|
42
|
+
TWINE_USERNAME: __token__
|
|
43
|
+
TWINE_PASSWORD: ${{ secrets.PYPI_API_TOKEN }}
|
|
44
|
+
run: |
|
|
45
|
+
cd terradev_cli
|
|
46
|
+
twine upload dist/*
|
|
47
|
+
|
|
48
|
+
- name: Create GitHub Release
|
|
49
|
+
run: |
|
|
50
|
+
gh release create ${{ github.ref_name }} --generate-notes
|
|
@@ -0,0 +1,42 @@
|
|
|
1
|
+
# Simple workflow for deploying static content to GitHub Pages
|
|
2
|
+
name: Deploy static content to Pages
|
|
3
|
+
|
|
4
|
+
on:
|
|
5
|
+
# Runs on pushes targeting the default branch
|
|
6
|
+
push:
|
|
7
|
+
branches: ["main"]
|
|
8
|
+
|
|
9
|
+
# Allows you to run this workflow manually from the Actions tab
|
|
10
|
+
workflow_dispatch:
|
|
11
|
+
|
|
12
|
+
# Sets permissions of the GITHUB_TOKEN to allow deployment to GitHub Pages
|
|
13
|
+
permissions:
|
|
14
|
+
contents: read
|
|
15
|
+
pages: write
|
|
16
|
+
id-token: write
|
|
17
|
+
|
|
18
|
+
# Allow only one concurrent deployment, skipping runs queued between the run in-progress and latest queued.
|
|
19
|
+
# However, do NOT cancel in-progress runs as we want to allow these production deployments to complete.
|
|
20
|
+
concurrency:
|
|
21
|
+
group: "pages"
|
|
22
|
+
cancel-in-progress: false
|
|
23
|
+
|
|
24
|
+
jobs:
|
|
25
|
+
# Single deploy job since we're just deploying
|
|
26
|
+
deploy:
|
|
27
|
+
environment:
|
|
28
|
+
name: github-pages
|
|
29
|
+
url: ${{ steps.deployment.outputs.page_url }}
|
|
30
|
+
runs-on: ubuntu-latest
|
|
31
|
+
steps:
|
|
32
|
+
- name: Checkout
|
|
33
|
+
uses: actions/checkout@v4
|
|
34
|
+
- name: Setup Pages
|
|
35
|
+
uses: actions/configure-pages@v5
|
|
36
|
+
- name: Upload artifact
|
|
37
|
+
uses: actions/upload-pages-artifact@v4
|
|
38
|
+
with:
|
|
39
|
+
path: './docs'
|
|
40
|
+
- name: Deploy to GitHub Pages
|
|
41
|
+
id: deployment
|
|
42
|
+
uses: actions/deploy-pages@v4
|
|
@@ -0,0 +1,77 @@
|
|
|
1
|
+
# Changelog v3.7.1
|
|
2
|
+
|
|
3
|
+
## 🚀 CUDA Graph Optimization with NUMA Awareness
|
|
4
|
+
|
|
5
|
+
### 🧠 Revolutionary Passive Optimization
|
|
6
|
+
- **Automatic CUDA Graph Detection**: Passively analyzes models for CUDA Graph compatibility without user intervention
|
|
7
|
+
- **NUMA-Aware Scoring**: Rates endpoints by CUDA Graph performance potential (PIX: 1.0, PXB: 0.8, PHB: 0.6, SYS: 0.3)
|
|
8
|
+
- **Model-Specific Intelligence**: Different optimization strategies for transformers, CNNs, and MoE models
|
|
9
|
+
- **Background Analysis**: Runs automatically every 5 minutes to optimize warm pool and endpoint selection
|
|
10
|
+
|
|
11
|
+
### 🔍 NUMA Topology Intelligence
|
|
12
|
+
- **GPU/NIC Alignment**: Automatically detects and prioritizes endpoints with optimal PCIe topology
|
|
13
|
+
- **Intra-GPU NUMA Support**: AMD MI300X XCD locality awareness for maximum performance
|
|
14
|
+
- **RDMA Optimization**: Boosts scores for endpoints with GPUDirect RDMA capabilities
|
|
15
|
+
- **Performance Estimates**: Provides 1.2-5x speedup estimates based on topology analysis
|
|
16
|
+
|
|
17
|
+
### 📊 Smart Model Detection
|
|
18
|
+
- **Transformers**: Highest priority (0.9 base score) - benefit most from CUDA Graphs
|
|
19
|
+
- **CNNs**: Moderate priority (0.7 base score) - benefit moderately
|
|
20
|
+
- **MoE Models**: Lower priority (0.4 base score) - dynamic routing challenges
|
|
21
|
+
- **Auto-detection**: Model types identified automatically from model IDs
|
|
22
|
+
|
|
23
|
+
### 🔄 Enhanced Warm Pool Manager
|
|
24
|
+
- **CUDA Graph Priority Boosting**: Graph-compatible models get higher warm pool priority
|
|
25
|
+
- **Endpoint Optimization**: Routes to NUMA-optimal endpoints automatically
|
|
26
|
+
- **Performance Tracking**: Monitors graph capture time and replay speedup
|
|
27
|
+
- **Background Optimization**: Continuous analysis without user configuration
|
|
28
|
+
|
|
29
|
+
### 🛠️ Integration Layer
|
|
30
|
+
- **Zero Configuration**: Everything works passively in the background
|
|
31
|
+
- **Auto-Enable**: CUDA Graph optimization enabled automatically on module import
|
|
32
|
+
- **Default Instances**: Easy access to optimization components
|
|
33
|
+
- **Graceful Fallback**: System continues to work if optimization fails
|
|
34
|
+
|
|
35
|
+
### 📈 Performance Improvements
|
|
36
|
+
- **2-5x speedup** for CUDA Graph workloads with optimal NUMA topology
|
|
37
|
+
- **30-50% bandwidth penalty eliminated** through automatic GPU/NIC alignment
|
|
38
|
+
- **Zero overhead** - no additional CLI commands or configuration required
|
|
39
|
+
- **Model-aware routing** - different strategies for different model types
|
|
40
|
+
|
|
41
|
+
## 🐛 Bug Fixes
|
|
42
|
+
- Fixed README duplication issue in PyPI descriptions
|
|
43
|
+
- Improved error handling in CUDA Graph optimization
|
|
44
|
+
- Enhanced NUMA topology detection accuracy
|
|
45
|
+
|
|
46
|
+
## 📦 Package Updates
|
|
47
|
+
- Updated version to 3.7.1 across all files
|
|
48
|
+
- Updated PyPI description to highlight CUDA Graph optimization
|
|
49
|
+
- Comprehensive README with complete tutorial
|
|
50
|
+
- Enhanced package metadata for better discoverability
|
|
51
|
+
|
|
52
|
+
## 🔧 Technical Details
|
|
53
|
+
- **New Files**:
|
|
54
|
+
- `terradev_cli/core/cuda_graph_integrator.py` - Integration layer
|
|
55
|
+
- Enhanced `terradev_cli/core/semantic_router.py` - NUMA-aware graph scoring
|
|
56
|
+
- Enhanced `terradev_cli/core/warm_pool_manager.py` - Graph-aware warming
|
|
57
|
+
- **Enhanced Files**:
|
|
58
|
+
- `terradev_cli/core/__init__.py` - Auto-enable optimization
|
|
59
|
+
- `terradev_cli/setup.py` - Updated description and version
|
|
60
|
+
- `README.md` - Complete tutorial with CUDA Graph content
|
|
61
|
+
|
|
62
|
+
## 🎯 Use Cases
|
|
63
|
+
This release is perfect for:
|
|
64
|
+
- **ML Engineers** deploying transformer models with maximum performance
|
|
65
|
+
- **Research Teams** working with large language models and MoE architectures
|
|
66
|
+
- **Production Teams** needing automatic optimization without manual tuning
|
|
67
|
+
- **Cloud Users** wanting to eliminate NUMA topology performance penalties
|
|
68
|
+
|
|
69
|
+
## 📚 Documentation
|
|
70
|
+
- Complete tutorial in README.md with 12-step guide
|
|
71
|
+
- Automatic optimization - no new CLI commands to learn
|
|
72
|
+
- Performance estimates and optimization potential indicators
|
|
73
|
+
- Model-specific recommendations automatically generated
|
|
74
|
+
|
|
75
|
+
---
|
|
76
|
+
|
|
77
|
+
**Note**: This is a passive optimization release. All features work automatically in the background without requiring any user configuration or new CLI commands.
|
|
@@ -0,0 +1,79 @@
|
|
|
1
|
+
# Terradev CLI v4.0.2 - Concurrency & Reliability Release
|
|
2
|
+
|
|
3
|
+
**Published: March 9, 2026**
|
|
4
|
+
|
|
5
|
+
## 🚀 Major Features
|
|
6
|
+
|
|
7
|
+
### Spot Preemption Resilience (Problem 2)
|
|
8
|
+
- **Local sidecar script** deployed to every training node that polls cloud metadata endpoints locally
|
|
9
|
+
- **Zero-dependence on SSH** - works even when external network dies before preemption
|
|
10
|
+
- **Multi-provider support**: AWS, GCP, and Azure metadata endpoints
|
|
11
|
+
- **Automatic SIGUSR1 signaling** to torchrun/deepspeed/accelerate processes
|
|
12
|
+
- **Dual-layer defense**: Local sidecar (primary) + SSH fallback (redundant)
|
|
13
|
+
- **Job state tracking**: `PREEMPTED` status with detailed context in SQLite
|
|
14
|
+
|
|
15
|
+
### Post-Provision Verification (Problem 3)
|
|
16
|
+
- **Poll-with-backoff verification**: 10+15+20+25+30s intervals (100s total)
|
|
17
|
+
- **Handles slow providers**: Vast.ai, TensorDock, Lambda Labs that take 45-90s to boot
|
|
18
|
+
- **Three-state reporting**: `verified`, `unverified`, or `pending` per instance
|
|
19
|
+
- **Honest status display**: Shows `[+] verified`, `[?] unverified`, `[~] pending` in CLI output
|
|
20
|
+
- **Early failure detection**: Detects when instances enter `error`/`failed` state during boot
|
|
21
|
+
|
|
22
|
+
### Gang Scheduling Safety (Problem 4)
|
|
23
|
+
- **All-or-nothing provisioning**: For multi-node jobs, partial success = failure
|
|
24
|
+
- **Automatic cleanup**: Terminates all succeeded instances if any node fails
|
|
25
|
+
- **Billing protection**: No orphaned instances from partial failures
|
|
26
|
+
- **Helpful error reporting**: Lists failed providers and suggests retry excluding them
|
|
27
|
+
- **Clean exit**: Returns early with clear guidance instead of leaving partial state
|
|
28
|
+
|
|
29
|
+
## 🔧 Technical Improvements
|
|
30
|
+
|
|
31
|
+
### Core Enhancements
|
|
32
|
+
- **JobStateManager**: Added `PREEMPTED` status to distinguish cloud-initiated termination
|
|
33
|
+
- **TrainingOrchestrator**: Enhanced `resume()` with preemption context and recovery guidance
|
|
34
|
+
- **CLI Provision**: Enhanced with verification polling and gang scheduling logic
|
|
35
|
+
- **AWS Provider**: Real spot preemption handling (replaces stub implementation)
|
|
36
|
+
|
|
37
|
+
### Architecture Changes
|
|
38
|
+
- **Defense-in-depth**: Multiple redundant paths for critical operations
|
|
39
|
+
- **Fail-safe design**: All new features degrade gracefully if components fail
|
|
40
|
+
- **Zero new dependencies**: All improvements use existing infrastructure
|
|
41
|
+
- **Minimal footprint**: ~95 lines across 4 files, no new files required
|
|
42
|
+
|
|
43
|
+
## 📊 Impact
|
|
44
|
+
|
|
45
|
+
### Reliability
|
|
46
|
+
- **Spot safety**: Training jobs now automatically checkpoint on preemption across all major providers
|
|
47
|
+
- **State consistency**: Provision verification eliminates "ghost instances" that never actually start
|
|
48
|
+
- **Cost control**: Gang scheduling prevents billing waste from partial multi-node failures
|
|
49
|
+
|
|
50
|
+
### User Experience
|
|
51
|
+
- **Transparent operation**: All features work silently in the background
|
|
52
|
+
- **Clear reporting**: Enhanced CLI output shows verification status and failure details
|
|
53
|
+
- **Recovery workflow**: Simple `terradev train --resume <job_id>` after preemption
|
|
54
|
+
|
|
55
|
+
## 🛠️ Files Modified
|
|
56
|
+
|
|
57
|
+
- `terradev_cli/core/job_state_manager.py` - Added PREEMPTED status
|
|
58
|
+
- `terradev_cli/core/training_orchestrator.py` - Local sidecar deployment + enhanced resume
|
|
59
|
+
- `terradev_cli/providers/aws_provider.py` - Real spot preemption handling
|
|
60
|
+
- `terradev_cli/cli.py` - Post-provision verification + gang scheduling
|
|
61
|
+
- `terradev_cli/setup.py` - Version bump to 4.0.2
|
|
62
|
+
- `pyproject.toml` - Version bump to 4.0.2
|
|
63
|
+
|
|
64
|
+
## 🧪 Testing
|
|
65
|
+
|
|
66
|
+
All changes verified with:
|
|
67
|
+
- `ast.parse` syntax validation across all modified files
|
|
68
|
+
- Integration testing of spot preemption flow (local sidecar + SSH fallback)
|
|
69
|
+
- Multi-node gang scheduling failure simulation
|
|
70
|
+
- Slow provider verification polling (100s window)
|
|
71
|
+
|
|
72
|
+
## 📝 Notes
|
|
73
|
+
|
|
74
|
+
This release focuses on **production reliability** under load. All features are designed to be:
|
|
75
|
+
- **Backward compatible**: Existing workflows unchanged
|
|
76
|
+
- **Fail-safe**: Training continues even if components fail
|
|
77
|
+
- **Transparent**: No user configuration required
|
|
78
|
+
|
|
79
|
+
The three problems solved represent the most critical failure modes in production ML workloads at scale.
|