agent-os-kernel 1.1.0__py3-none-any.whl → 1.3.0__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- agent_os/__init__.py +66 -4
- agent_os/agents_compat.py +286 -0
- agent_os/base_agent.py +308 -0
- agent_os/cli.py +1079 -19
- agent_os/integrations/__init__.py +37 -2
- agent_os/integrations/openai_adapter.py +502 -0
- agent_os/integrations/semantic_kernel_adapter.py +569 -0
- agent_os/stateless.py +349 -0
- agent_os_kernel-1.3.0.dist-info/METADATA +676 -0
- agent_os_kernel-1.3.0.dist-info/RECORD +1053 -0
- {agent_os_kernel-1.1.0.dist-info → agent_os_kernel-1.3.0.dist-info}/entry_points.txt +0 -1
- modules/amb/.github/workflows/ci.yml +102 -0
- modules/amb/.github/workflows/publish.yml +146 -0
- modules/amb/.gitignore +134 -0
- modules/amb/CHANGELOG.md +118 -0
- modules/amb/CONTRIBUTING.md +141 -0
- modules/amb/LICENSE +21 -0
- modules/amb/README.md +188 -0
- modules/amb/amb_core/__init__.py +175 -0
- modules/amb/amb_core/adapters/__init__.py +55 -0
- modules/amb/amb_core/adapters/aws_sqs_broker.py +374 -0
- modules/amb/amb_core/adapters/azure_servicebus_broker.py +338 -0
- modules/amb/amb_core/adapters/kafka_broker.py +258 -0
- modules/amb/amb_core/adapters/nats_broker.py +283 -0
- modules/amb/amb_core/adapters/rabbitmq_broker.py +233 -0
- modules/amb/amb_core/adapters/redis_broker.py +260 -0
- modules/amb/amb_core/broker.py +143 -0
- modules/amb/amb_core/bus.py +479 -0
- modules/amb/amb_core/cloudevents.py +507 -0
- modules/amb/amb_core/dlq.py +343 -0
- modules/amb/amb_core/hf_utils.py +534 -0
- modules/amb/amb_core/memory_broker.py +408 -0
- modules/amb/amb_core/models.py +139 -0
- modules/amb/amb_core/persistence.py +527 -0
- modules/amb/amb_core/schema.py +292 -0
- modules/amb/amb_core/tracing.py +356 -0
- modules/amb/examples/advanced_features.py +223 -0
- modules/amb/examples/backpressure_demo.py +225 -0
- modules/amb/examples/basic_usage.py +117 -0
- modules/amb/examples/tracing_demo.py +104 -0
- modules/amb/experiments/README.md +52 -0
- modules/amb/experiments/reproduce_results.py +467 -0
- modules/amb/experiments/results.json +324 -0
- modules/amb/paper/README.md +40 -0
- modules/amb/paper/paper.tex +365 -0
- modules/amb/paper/whitepaper.md +377 -0
- modules/amb/pyproject.toml +117 -0
- modules/amb/tests/__init__.py +1 -0
- modules/amb/tests/test_backpressure_priority.py +280 -0
- modules/amb/tests/test_bus.py +198 -0
- modules/amb/tests/test_cloudevents.py +443 -0
- modules/amb/tests/test_features.py +531 -0
- modules/amb/tests/test_models.py +74 -0
- modules/amb/tests/test_tracing.py +254 -0
- modules/atr/.github/workflows/ci.yml +101 -0
- modules/atr/.github/workflows/publish.yml +140 -0
- modules/atr/.gitignore +134 -0
- modules/atr/.pre-commit-config.yaml +37 -0
- modules/atr/CHANGELOG.md +39 -0
- modules/atr/CONTRIBUTING.md +96 -0
- modules/atr/IMPLEMENTATION_SUMMARY.md +143 -0
- modules/atr/README.md +180 -0
- modules/atr/atr/__init__.py +638 -0
- modules/atr/atr/access.py +346 -0
- modules/atr/atr/composition.py +643 -0
- modules/atr/atr/decorator.py +355 -0
- modules/atr/atr/executor.py +382 -0
- modules/atr/atr/health.py +555 -0
- modules/atr/atr/hf_utils.py +447 -0
- modules/atr/atr/injection.py +420 -0
- modules/atr/atr/metrics.py +438 -0
- modules/atr/atr/policies.py +401 -0
- modules/atr/atr/py.typed +2 -0
- modules/atr/atr/registry.py +450 -0
- modules/atr/atr/schema.py +478 -0
- modules/atr/atr/tools/safe/__init__.py +73 -0
- modules/atr/atr/tools/safe/calculator.py +380 -0
- modules/atr/atr/tools/safe/datetime_tool.py +441 -0
- modules/atr/atr/tools/safe/file_reader.py +400 -0
- modules/atr/atr/tools/safe/http_client.py +314 -0
- modules/atr/atr/tools/safe/json_parser.py +372 -0
- modules/atr/atr/tools/safe/text_tool.py +526 -0
- modules/atr/atr/tools/safe/toolkit.py +173 -0
- modules/atr/docs/PYPI_SETUP.md +113 -0
- modules/atr/examples/README.md +27 -0
- modules/atr/examples/demo.py +144 -0
- modules/atr/examples/sandbox_demo.py +218 -0
- modules/atr/experiments/README.md +69 -0
- modules/atr/experiments/reproduce_results.py +509 -0
- modules/atr/experiments/results/.gitkeep +0 -0
- modules/atr/experiments/results/results_20260123_140334.json +71 -0
- modules/atr/paper/README.md +36 -0
- modules/atr/paper/figures/.gitkeep +0 -0
- modules/atr/paper/references.bib +84 -0
- modules/atr/paper/structure.tex +293 -0
- modules/atr/paper/whitepaper.md +234 -0
- modules/atr/pyproject.toml +148 -0
- modules/atr/requirements.txt +1 -0
- modules/atr/setup.py +30 -0
- modules/atr/tests/__init__.py +1 -0
- modules/atr/tests/test_decorator.py +317 -0
- modules/atr/tests/test_executor.py +245 -0
- modules/atr/tests/test_integration_executor.py +184 -0
- modules/atr/tests/test_registry.py +312 -0
- modules/atr/tests/test_schema.py +182 -0
- modules/atr/tests/test_v2_features.py +708 -0
- modules/caas/.dockerignore +63 -0
- modules/caas/.github/ISSUE_TEMPLATE/bug_report.md +38 -0
- modules/caas/.github/ISSUE_TEMPLATE/custom.md +10 -0
- modules/caas/.github/ISSUE_TEMPLATE/feature_request.md +20 -0
- modules/caas/.github/workflows/ci.yml +100 -0
- modules/caas/.github/workflows/lint.yml +39 -0
- modules/caas/.github/workflows/publish-pypi.yml +124 -0
- modules/caas/.gitignore +73 -0
- modules/caas/.pre-commit-config.yaml +33 -0
- modules/caas/CHANGELOG.md +58 -0
- modules/caas/CONTRIBUTING.md +346 -0
- modules/caas/Dockerfile +41 -0
- modules/caas/LICENSE +21 -0
- modules/caas/MANIFEST.in +11 -0
- modules/caas/README.md +158 -0
- modules/caas/benchmarks/README.md +255 -0
- modules/caas/benchmarks/create_hf_dataset.py +502 -0
- modules/caas/benchmarks/data/sample_corpus/README.md +86 -0
- modules/caas/benchmarks/data/sample_corpus/auth_module.py +211 -0
- modules/caas/benchmarks/data/sample_corpus/contribution_guide.md +185 -0
- modules/caas/benchmarks/data/sample_corpus/remote_work_policy.html +57 -0
- modules/caas/benchmarks/hf_dataset/README.md +214 -0
- modules/caas/benchmarks/hf_dataset/caas_benchmark_corpus.py +73 -0
- modules/caas/benchmarks/hf_dataset/corpus_preview.json +193 -0
- modules/caas/benchmarks/results/README.md +66 -0
- modules/caas/benchmarks/results/evaluation_2026-01-20.json +121 -0
- modules/caas/benchmarks/run_evaluation.py +561 -0
- modules/caas/benchmarks/statistical_tests.py +289 -0
- modules/caas/benchmarks/verify_sample_corpus.py +83 -0
- modules/caas/docker-compose.yml +38 -0
- modules/caas/docs/CONTEXT_TRIAD.md +462 -0
- modules/caas/docs/CONTRIBUTING.md +346 -0
- modules/caas/docs/ETHICS_AND_LIMITATIONS.md +336 -0
- modules/caas/docs/HEURISTIC_ROUTER.md +442 -0
- modules/caas/docs/IMPLEMENTATION_SUMMARY.md +363 -0
- modules/caas/docs/IMPLEMENTATION_SUMMARY_CONTEXT_TRIAD.md +277 -0
- modules/caas/docs/IMPLEMENTATION_SUMMARY_HEURISTIC_ROUTER.md +231 -0
- modules/caas/docs/IMPLEMENTATION_SUMMARY_METADATA_INJECTION.md +258 -0
- modules/caas/docs/IMPLEMENTATION_SUMMARY_PRAGMATIC_TRUTH.md +212 -0
- modules/caas/docs/IMPLEMENTATION_SUMMARY_TRUST_GATEWAY.md +319 -0
- modules/caas/docs/LAYER_1_PRIMITIVE.md +202 -0
- modules/caas/docs/METADATA_INJECTION.md +404 -0
- modules/caas/docs/PRAGMATIC_TRUTH.md +431 -0
- modules/caas/docs/RELATED_WORK.md +312 -0
- modules/caas/docs/RELEASE_CHECKLIST.md +219 -0
- modules/caas/docs/RELEASE_GUIDE.md +285 -0
- modules/caas/docs/REPRODUCIBILITY.md +386 -0
- modules/caas/docs/SLIDING_WINDOW.md +387 -0
- modules/caas/docs/STRUCTURE_AWARE_INDEXING.md +158 -0
- modules/caas/docs/TESTING.md +259 -0
- modules/caas/docs/THREAT_MODEL.md +247 -0
- modules/caas/docs/TRUST_GATEWAY.md +575 -0
- modules/caas/docs/VFS.md +298 -0
- modules/caas/examples/agents/enterprise_security_agent.py +414 -0
- modules/caas/examples/agents/intelligent_document_analyzer.py +380 -0
- modules/caas/examples/demos/demo.py +309 -0
- modules/caas/examples/demos/demo_context_triad.py +225 -0
- modules/caas/examples/demos/demo_conversation_manager.py +285 -0
- modules/caas/examples/demos/demo_heuristic_router.py +133 -0
- modules/caas/examples/demos/demo_metadata_injection.py +198 -0
- modules/caas/examples/demos/demo_pragmatic_truth.py +303 -0
- modules/caas/examples/demos/demo_structure_aware.py +140 -0
- modules/caas/examples/demos/demo_time_decay.py +247 -0
- modules/caas/examples/demos/demo_trust_gateway.py +383 -0
- modules/caas/examples/multi_agent/README.md +159 -0
- modules/caas/examples/multi_agent/research_team.py +369 -0
- modules/caas/examples/multi_agent/vfs_collaboration.py +393 -0
- modules/caas/examples/usage/auth_module.py +142 -0
- modules/caas/examples/usage/usage_example.py +173 -0
- modules/caas/experiments/README.md +42 -0
- modules/caas/experiments/reproduce_results.py +462 -0
- modules/caas/paper/ARXIV_METADATA.md +145 -0
- modules/caas/paper/ARXIV_README.md +47 -0
- modules/caas/paper/CHECKLIST.md +103 -0
- modules/caas/paper/GITHUB_RELEASE_NOTES.md +105 -0
- modules/caas/paper/README.md +71 -0
- modules/caas/paper/abstract.md +24 -0
- modules/caas/paper/arxiv_submission.tar +0 -0
- modules/caas/paper/arxiv_submission.zip +0 -0
- modules/caas/paper/build_pdf.py +355 -0
- modules/caas/paper/experiments.md +149 -0
- modules/caas/paper/figures/.gitkeep +0 -0
- modules/caas/paper/figures/README.md +237 -0
- modules/caas/paper/figures/fig1_system_architecture.png +0 -0
- modules/caas/paper/figures/fig1_system_architecture.svg +198 -0
- modules/caas/paper/figures/fig2_context_triad.png +0 -0
- modules/caas/paper/figures/fig2_context_triad.svg +105 -0
- modules/caas/paper/figures/fig3_ablation_results.png +0 -0
- modules/caas/paper/figures/fig3_ablation_results.svg +113 -0
- modules/caas/paper/figures/fig4_routing_latency.png +0 -0
- modules/caas/paper/figures/fig4_routing_latency.svg +97 -0
- modules/caas/paper/intro.md +103 -0
- modules/caas/paper/latex/figures/fig1_system_architecture.png +0 -0
- modules/caas/paper/latex/figures/fig2_context_triad.png +0 -0
- modules/caas/paper/latex/figures/fig3_ablation_results.png +0 -0
- modules/caas/paper/latex/figures/fig4_routing_latency.png +0 -0
- modules/caas/paper/latex/main.tex +468 -0
- modules/caas/paper/latex/references.bib +140 -0
- modules/caas/paper/method.md +350 -0
- modules/caas/paper/outline.md +123 -0
- modules/caas/paper/related_work.md +101 -0
- modules/caas/paper/tables/.gitkeep +0 -0
- modules/caas/paper/tables/results_tables.md +50 -0
- modules/caas/pyproject.toml +172 -0
- modules/caas/requirements.txt +11 -0
- modules/caas/src/caas/__init__.py +232 -0
- modules/caas/src/caas/api/__init__.py +7 -0
- modules/caas/src/caas/api/server.py +1326 -0
- modules/caas/src/caas/caching.py +832 -0
- modules/caas/src/caas/cli.py +208 -0
- modules/caas/src/caas/conversation.py +221 -0
- modules/caas/src/caas/decay.py +118 -0
- modules/caas/src/caas/detection/__init__.py +7 -0
- modules/caas/src/caas/detection/detector.py +236 -0
- modules/caas/src/caas/enrichment.py +127 -0
- modules/caas/src/caas/gateway/__init__.py +24 -0
- modules/caas/src/caas/gateway/trust_gateway.py +471 -0
- modules/caas/src/caas/hf_utils.py +477 -0
- modules/caas/src/caas/ingestion/__init__.py +21 -0
- modules/caas/src/caas/ingestion/processors.py +251 -0
- modules/caas/src/caas/ingestion/structure_parser.py +185 -0
- modules/caas/src/caas/models.py +354 -0
- modules/caas/src/caas/pragmatic_truth.py +441 -0
- modules/caas/src/caas/routing/__init__.py +8 -0
- modules/caas/src/caas/routing/heuristic_router.py +242 -0
- modules/caas/src/caas/storage/__init__.py +7 -0
- modules/caas/src/caas/storage/store.py +450 -0
- modules/caas/src/caas/triad.py +472 -0
- modules/caas/src/caas/tuning/__init__.py +7 -0
- modules/caas/src/caas/tuning/tuner.py +322 -0
- modules/caas/src/caas/vfs/__init__.py +12 -0
- modules/caas/src/caas/vfs/filesystem.py +450 -0
- modules/caas/tests/__init__.py +3 -0
- modules/caas/tests/conftest.py +8 -0
- modules/caas/tests/test_caching.py +628 -0
- modules/caas/tests/test_context_triad.py +385 -0
- modules/caas/tests/test_conversation_manager.py +289 -0
- modules/caas/tests/test_functionality.py +215 -0
- modules/caas/tests/test_heuristic_router.py +370 -0
- modules/caas/tests/test_metadata_injection.py +328 -0
- modules/caas/tests/test_pragmatic_truth.py +322 -0
- modules/caas/tests/test_structure_aware_indexing.py +283 -0
- modules/caas/tests/test_time_decay.py +268 -0
- modules/caas/tests/test_trust_gateway.py +445 -0
- modules/caas/tests/test_vfs.py +298 -0
- modules/cmvk/.github/FUNDING.yml +9 -0
- modules/cmvk/.github/dependabot.yml +54 -0
- modules/cmvk/.github/workflows/ci.yml +205 -0
- modules/cmvk/.github/workflows/publish.yml +143 -0
- modules/cmvk/.gitignore +147 -0
- modules/cmvk/.pre-commit-config.yaml +58 -0
- modules/cmvk/CHANGELOG.md +146 -0
- modules/cmvk/CITATION.cff +48 -0
- modules/cmvk/CONTRIBUTING.md +229 -0
- modules/cmvk/Dockerfile +87 -0
- modules/cmvk/HF_MODEL_CARD.md +185 -0
- modules/cmvk/LICENSE +21 -0
- modules/cmvk/README.md +149 -0
- modules/cmvk/SECURITY.md +114 -0
- modules/cmvk/config/prompts/generator_v1.txt +23 -0
- modules/cmvk/config/prompts/verifier_hostile.txt +32 -0
- modules/cmvk/config/settings.yaml +40 -0
- modules/cmvk/coverage_html/.gitignore +2 -0
- modules/cmvk/coverage_html/class_index.html +658 -0
- modules/cmvk/coverage_html/coverage_html_cb_188fc9a4.js +735 -0
- modules/cmvk/coverage_html/favicon_32_cb_c827f16f.png +0 -0
- modules/cmvk/coverage_html/function_index.html +1978 -0
- modules/cmvk/coverage_html/index.html +255 -0
- modules/cmvk/coverage_html/keybd_closed_cb_900cfef5.png +0 -0
- modules/cmvk/coverage_html/status.json +1 -0
- modules/cmvk/coverage_html/style_cb_5c747636.css +389 -0
- modules/cmvk/coverage_html/z_2c49bd2ed3e01e38___init___py.html +315 -0
- modules/cmvk/coverage_html/z_2c49bd2ed3e01e38_audit_py.html +499 -0
- modules/cmvk/coverage_html/z_2c49bd2ed3e01e38_benchmarks_py.html +575 -0
- modules/cmvk/coverage_html/z_2c49bd2ed3e01e38_constitutional_py.html +1001 -0
- modules/cmvk/coverage_html/z_2c49bd2ed3e01e38_hf_utils_py.html +398 -0
- modules/cmvk/coverage_html/z_2c49bd2ed3e01e38_metrics_py.html +570 -0
- modules/cmvk/coverage_html/z_2c49bd2ed3e01e38_profiles_py.html +397 -0
- modules/cmvk/coverage_html/z_2c49bd2ed3e01e38_types_py.html +109 -0
- modules/cmvk/coverage_html/z_2c49bd2ed3e01e38_verification_py.html +1053 -0
- modules/cmvk/docs/DIAGRAMS.md +325 -0
- modules/cmvk/docs/architecture.md +345 -0
- modules/cmvk/docs/features.md +308 -0
- modules/cmvk/docs/getting_started.md +279 -0
- modules/cmvk/docs/innovation_layer.md +377 -0
- modules/cmvk/docs/safety.md +281 -0
- modules/cmvk/docs/traceability.md +150 -0
- modules/cmvk/examples/basic_example.py +62 -0
- modules/cmvk/examples/demo_complete_pipeline.py +209 -0
- modules/cmvk/examples/demo_innovation_layer.py +197 -0
- modules/cmvk/examples/example.py +112 -0
- modules/cmvk/examples/model_diversity_comparison.py +110 -0
- modules/cmvk/examples/real_api_integration.py +121 -0
- modules/cmvk/examples/test_full_pipeline.py +303 -0
- modules/cmvk/experiments/FEATURE_2_LATERAL_THINKING.md +187 -0
- modules/cmvk/experiments/README.md +216 -0
- modules/cmvk/experiments/ablation_runner.py +666 -0
- modules/cmvk/experiments/baseline_runner.py +158 -0
- modules/cmvk/experiments/blind_spot_benchmark.py +364 -0
- modules/cmvk/experiments/datasets/README.md +85 -0
- modules/cmvk/experiments/datasets/humaneval_50.json +352 -0
- modules/cmvk/experiments/datasets/humaneval_full.json +1150 -0
- modules/cmvk/experiments/datasets/humaneval_sample.json +32 -0
- modules/cmvk/experiments/datasets/sabotage.json +262 -0
- modules/cmvk/experiments/datasets/sample.json +40 -0
- modules/cmvk/experiments/demo_with_traces.py +110 -0
- modules/cmvk/experiments/efficiency_curve.py +259 -0
- modules/cmvk/experiments/experiment_runner.py +243 -0
- modules/cmvk/experiments/paper_data_generator.py +183 -0
- modules/cmvk/experiments/reproduce_results.py +407 -0
- modules/cmvk/experiments/reproducible_runner.py +352 -0
- modules/cmvk/experiments/sabotage_stress_test.py +311 -0
- modules/cmvk/experiments/test_lateral_thinking.py +116 -0
- modules/cmvk/experiments/test_prosecutor.py +41 -0
- modules/cmvk/experiments/visualize_results.py +735 -0
- modules/cmvk/logs/traces/demo_HumanEval_0_20260121-204900.json +36 -0
- modules/cmvk/notebooks/analysis.ipynb +124 -0
- modules/cmvk/paper/PAPER.md +561 -0
- modules/cmvk/paper/arxiv_checklist.md +230 -0
- modules/cmvk/paper/cmvk_neurips.aux +77 -0
- modules/cmvk/paper/cmvk_neurips.bbl +81 -0
- modules/cmvk/paper/cmvk_neurips.blg +48 -0
- modules/cmvk/paper/cmvk_neurips.out +16 -0
- modules/cmvk/paper/cmvk_neurips.pdf +0 -0
- modules/cmvk/paper/cmvk_neurips.tex +309 -0
- modules/cmvk/paper/figures/ablation.png +0 -0
- modules/cmvk/paper/figures/ablation.svg +39 -0
- modules/cmvk/paper/figures/architecture.png +0 -0
- modules/cmvk/paper/figures/architecture.svg +115 -0
- modules/cmvk/paper/figures/results_bar.png +0 -0
- modules/cmvk/paper/figures/results_bar.svg +70 -0
- modules/cmvk/paper/generate_figures.py +383 -0
- modules/cmvk/paper/neurips_2024.sty +101 -0
- modules/cmvk/paper/references.bib +98 -0
- modules/cmvk/paper/structure.tex +200 -0
- modules/cmvk/pyproject.toml +189 -0
- modules/cmvk/requirements-dev.txt +19 -0
- modules/cmvk/requirements.txt +14 -0
- modules/cmvk/src/cmvk/__init__.py +216 -0
- modules/cmvk/src/cmvk/audit.py +400 -0
- modules/cmvk/src/cmvk/benchmarks.py +476 -0
- modules/cmvk/src/cmvk/constitutional.py +902 -0
- modules/cmvk/src/cmvk/hf_utils.py +299 -0
- modules/cmvk/src/cmvk/metrics.py +471 -0
- modules/cmvk/src/cmvk/profiles.py +298 -0
- modules/cmvk/src/cmvk/py.typed +0 -0
- modules/cmvk/src/cmvk/types.py +10 -0
- modules/cmvk/src/cmvk/verification.py +954 -0
- modules/cmvk/src/cross_model_verification_kernel/__init__.py +91 -0
- modules/cmvk/src/cross_model_verification_kernel/__main__.py +10 -0
- modules/cmvk/src/cross_model_verification_kernel/agents/__init__.py +16 -0
- modules/cmvk/src/cross_model_verification_kernel/agents/base_agent.py +142 -0
- modules/cmvk/src/cross_model_verification_kernel/agents/generator_openai.py +223 -0
- modules/cmvk/src/cross_model_verification_kernel/agents/verifier_anthropic.py +448 -0
- modules/cmvk/src/cross_model_verification_kernel/agents/verifier_gemini.py +481 -0
- modules/cmvk/src/cross_model_verification_kernel/cli.py +570 -0
- modules/cmvk/src/cross_model_verification_kernel/core/__init__.py +26 -0
- modules/cmvk/src/cross_model_verification_kernel/core/graph_memory.py +308 -0
- modules/cmvk/src/cross_model_verification_kernel/core/kernel.py +413 -0
- modules/cmvk/src/cross_model_verification_kernel/core/trace_logger.py +75 -0
- modules/cmvk/src/cross_model_verification_kernel/core/types.py +121 -0
- modules/cmvk/src/cross_model_verification_kernel/datasets/__init__.py +20 -0
- modules/cmvk/src/cross_model_verification_kernel/datasets/humaneval_loader.py +271 -0
- modules/cmvk/src/cross_model_verification_kernel/generator.py +118 -0
- modules/cmvk/src/cross_model_verification_kernel/kernel.py +292 -0
- modules/cmvk/src/cross_model_verification_kernel/models.py +111 -0
- modules/cmvk/src/cross_model_verification_kernel/py.typed +1 -0
- modules/cmvk/src/cross_model_verification_kernel/simple_kernel.py +185 -0
- modules/cmvk/src/cross_model_verification_kernel/tools/__init__.py +94 -0
- modules/cmvk/src/cross_model_verification_kernel/tools/huggingface_upload.py +394 -0
- modules/cmvk/src/cross_model_verification_kernel/tools/sandbox.py +159 -0
- modules/cmvk/src/cross_model_verification_kernel/tools/statistics.py +468 -0
- modules/cmvk/src/cross_model_verification_kernel/tools/visualizer.py +312 -0
- modules/cmvk/src/cross_model_verification_kernel/tools/web_search.py +86 -0
- modules/cmvk/src/cross_model_verification_kernel/verifier.py +257 -0
- modules/cmvk/tests/__init__.py +3 -0
- modules/cmvk/tests/conftest.py +61 -0
- modules/cmvk/tests/integration/__init__.py +1 -0
- modules/cmvk/tests/integration/test_anthropic_verifier.py +269 -0
- modules/cmvk/tests/integration/test_integration.py +53 -0
- modules/cmvk/tests/integration/test_lateral_thinking_integration.py +199 -0
- modules/cmvk/tests/integration/test_lateral_thinking_witness.py +208 -0
- modules/cmvk/tests/integration/test_prosecutor_mode.py +131 -0
- modules/cmvk/tests/test_constitutional.py +611 -0
- modules/cmvk/tests/test_enhanced_features.py +603 -0
- modules/cmvk/tests/test_verification.py +255 -0
- modules/cmvk/tests/unit/__init__.py +1 -0
- modules/cmvk/tests/unit/test_agents.py +64 -0
- modules/cmvk/tests/unit/test_cli.py +224 -0
- modules/cmvk/tests/unit/test_core.py +126 -0
- modules/cmvk/tests/unit/test_humaneval_loader.py +197 -0
- modules/cmvk/tests/unit/test_kernel.py +255 -0
- modules/cmvk/tests/unit/test_reproducibility.py +160 -0
- modules/cmvk/tests/unit/test_trace_logger.py +115 -0
- modules/cmvk/tests/unit/test_visualizer.py +218 -0
- modules/control-plane/.github/ISSUE_TEMPLATE/bug_report.yml +82 -0
- modules/control-plane/.github/ISSUE_TEMPLATE/config.yml +11 -0
- modules/control-plane/.github/ISSUE_TEMPLATE/feature_request.yml +104 -0
- modules/control-plane/.github/ISSUE_TEMPLATE/question.yml +70 -0
- modules/control-plane/.github/ISSUE_TEMPLATE/security_vulnerability.yml +84 -0
- modules/control-plane/.github/discussions.yml +73 -0
- modules/control-plane/.github/pull_request_template.md +82 -0
- modules/control-plane/.github/workflows/publish.yml +146 -0
- modules/control-plane/.github/workflows/release.yml +39 -0
- modules/control-plane/.github/workflows/tests.yml +58 -0
- modules/control-plane/.gitignore +55 -0
- modules/control-plane/CHANGELOG.md +203 -0
- modules/control-plane/CONTRIBUTING.md +311 -0
- modules/control-plane/CONTRIBUTORS.md +88 -0
- modules/control-plane/Dockerfile +82 -0
- modules/control-plane/LICENSE +21 -0
- modules/control-plane/MANIFEST.in +17 -0
- modules/control-plane/README.md +1264 -0
- modules/control-plane/ROADMAP.md +228 -0
- modules/control-plane/SECURITY.md +210 -0
- modules/control-plane/SUPPORT.md +106 -0
- modules/control-plane/acp-cli.py +212 -0
- modules/control-plane/benchmark/README.md +257 -0
- modules/control-plane/benchmark/__init__.py +19 -0
- modules/control-plane/benchmark/red_team_dataset.py +517 -0
- modules/control-plane/benchmark.py +563 -0
- modules/control-plane/build_and_publish.sh +130 -0
- modules/control-plane/docker-compose.yml +74 -0
- modules/control-plane/docs/ABLATION_STUDIES.md +528 -0
- modules/control-plane/docs/ADAPTER_GUIDE.md +544 -0
- modules/control-plane/docs/ADVANCED_FEATURES.md +543 -0
- modules/control-plane/docs/AIOS_COMPARISON.md +296 -0
- modules/control-plane/docs/BIBLIOGRAPHY.md +367 -0
- modules/control-plane/docs/CASE_STUDIES.md +645 -0
- modules/control-plane/docs/DOCKER_DEPLOYMENT.md +184 -0
- modules/control-plane/docs/ECOSYSTEM_STATUS.md +98 -0
- modules/control-plane/docs/HF_MODEL_CARD.md +168 -0
- modules/control-plane/docs/KERNEL_V1_RELEASE.md +454 -0
- modules/control-plane/docs/LAYER3_FRAMEWORK.md +227 -0
- modules/control-plane/docs/LIMITATIONS.md +523 -0
- modules/control-plane/docs/PYPI_PUBLISHING.md +195 -0
- modules/control-plane/docs/README.md +58 -0
- modules/control-plane/docs/RELATED_WORK.md +319 -0
- modules/control-plane/docs/RELEASE_v1.1.0.md +252 -0
- modules/control-plane/docs/REPRODUCIBILITY.md +540 -0
- modules/control-plane/docs/RESEARCH_FOUNDATION.md +197 -0
- modules/control-plane/docs/api/CORE.md +270 -0
- modules/control-plane/docs/architecture/architecture.md +120 -0
- modules/control-plane/docs/community/ANNOUNCEMENT_TEMPLATES.md +52 -0
- modules/control-plane/docs/guides/IMPLEMENTATION.md +225 -0
- modules/control-plane/docs/guides/PHILOSOPHY.md +354 -0
- modules/control-plane/docs/guides/QUICKSTART.md +217 -0
- modules/control-plane/examples/README.md +138 -0
- modules/control-plane/examples/a2a_demo.py +410 -0
- modules/control-plane/examples/adapter_demo.py +347 -0
- modules/control-plane/examples/advanced_features.py +403 -0
- modules/control-plane/examples/basic_usage.py +261 -0
- modules/control-plane/examples/benchmark_demo.py +186 -0
- modules/control-plane/examples/compliance_demo.py +333 -0
- modules/control-plane/examples/configuration.py +265 -0
- modules/control-plane/examples/getting_started.py +178 -0
- modules/control-plane/examples/hibernation_and_time_travel_demo.py +406 -0
- modules/control-plane/examples/interactive_tutorial.ipynb +497 -0
- modules/control-plane/examples/kernel_interceptor_demo.py +202 -0
- modules/control-plane/examples/kernel_v1_demo.py +273 -0
- modules/control-plane/examples/langchain_demo.py +281 -0
- modules/control-plane/examples/lifecycle_demo.py +724 -0
- modules/control-plane/examples/mcp_demo.py +378 -0
- modules/control-plane/examples/ml_safety_demo.py +157 -0
- modules/control-plane/examples/multimodal_demo.py +347 -0
- modules/control-plane/examples/observability_demo.py +370 -0
- modules/control-plane/examples/use_cases.py +336 -0
- modules/control-plane/experiments/long_horizon_purge.py +235 -0
- modules/control-plane/experiments/multi_agent_rag.py +165 -0
- modules/control-plane/experiments/reproduce_results.py +667 -0
- modules/control-plane/paper/ARXIV_SUBMISSION_INFO.txt +122 -0
- modules/control-plane/paper/ETHICS_STATEMENT.md +248 -0
- modules/control-plane/paper/PAPER_CHECKLIST.md +72 -0
- modules/control-plane/paper/Paper.pdf +0 -0
- modules/control-plane/paper/README.md +71 -0
- modules/control-plane/paper/appendix.md +152 -0
- modules/control-plane/paper/architecture.md +15 -0
- modules/control-plane/paper/arxiv/figures/ablation_chart.png +0 -0
- modules/control-plane/paper/arxiv/figures/architecture.png +0 -0
- modules/control-plane/paper/arxiv/figures/constraint_graphs.png +0 -0
- modules/control-plane/paper/arxiv/figures/results_chart.png +0 -0
- modules/control-plane/paper/arxiv/main.aux +97 -0
- modules/control-plane/paper/arxiv/main.bbl +112 -0
- modules/control-plane/paper/arxiv/main.blg +48 -0
- modules/control-plane/paper/arxiv/main.out +33 -0
- modules/control-plane/paper/arxiv/main.pdf +0 -0
- modules/control-plane/paper/arxiv/main.tex +479 -0
- modules/control-plane/paper/arxiv/references.bib +234 -0
- modules/control-plane/paper/arxiv_submission.tar +0 -0
- modules/control-plane/paper/arxiv_submission.zip +0 -0
- modules/control-plane/paper/build.sh +68 -0
- modules/control-plane/paper/figures/README.md +47 -0
- modules/control-plane/paper/figures/ablation_chart.pdf +0 -0
- modules/control-plane/paper/figures/ablation_chart.png +0 -0
- modules/control-plane/paper/figures/architecture.pdf +0 -0
- modules/control-plane/paper/figures/architecture.png +0 -0
- modules/control-plane/paper/figures/constraint_graphs.pdf +0 -0
- modules/control-plane/paper/figures/constraint_graphs.png +0 -0
- modules/control-plane/paper/figures/generate_figures.py +252 -0
- modules/control-plane/paper/figures/results_chart.pdf +0 -0
- modules/control-plane/paper/figures/results_chart.png +0 -0
- modules/control-plane/paper/main.md +273 -0
- modules/control-plane/paper/main.tex +214 -0
- modules/control-plane/paper/main_arxiv.aux +53 -0
- modules/control-plane/paper/main_arxiv.out +17 -0
- modules/control-plane/paper/main_arxiv.pdf +0 -0
- modules/control-plane/paper/main_arxiv.tex +264 -0
- modules/control-plane/paper/references.bib +234 -0
- modules/control-plane/pyproject.toml +124 -0
- modules/control-plane/reproducibility/ABLATIONS.md +136 -0
- modules/control-plane/reproducibility/README.md +288 -0
- modules/control-plane/reproducibility/commands.md +467 -0
- modules/control-plane/reproducibility/docker_config/Dockerfile +39 -0
- modules/control-plane/reproducibility/experiment_configs/purge_config.json +46 -0
- modules/control-plane/reproducibility/experiment_configs/rag_config.json +36 -0
- modules/control-plane/reproducibility/hardware_specs.md +317 -0
- modules/control-plane/reproducibility/requirements_frozen.txt +0 -0
- modules/control-plane/reproducibility/run_all_experiments.sh +45 -0
- modules/control-plane/reproducibility/seeds.json +106 -0
- modules/control-plane/scripts/prepare_pypi.py +46 -0
- modules/control-plane/scripts/prepare_release.py +176 -0
- modules/control-plane/scripts/upload_dataset_to_hf.py +316 -0
- modules/control-plane/setup.py +69 -0
- modules/control-plane/src/agent_control_plane/__init__.py +639 -0
- modules/control-plane/src/agent_control_plane/a2a_adapter.py +541 -0
- modules/control-plane/src/agent_control_plane/adapter.py +415 -0
- modules/control-plane/src/agent_control_plane/agent_hibernation.py +364 -0
- modules/control-plane/src/agent_control_plane/agent_kernel.py +464 -0
- modules/control-plane/src/agent_control_plane/compliance.py +718 -0
- modules/control-plane/src/agent_control_plane/constraint_graphs.py +475 -0
- modules/control-plane/src/agent_control_plane/control_plane.py +848 -0
- modules/control-plane/src/agent_control_plane/example_executors.py +193 -0
- modules/control-plane/src/agent_control_plane/execution_engine.py +229 -0
- modules/control-plane/src/agent_control_plane/flight_recorder.py +600 -0
- modules/control-plane/src/agent_control_plane/governance_layer.py +432 -0
- modules/control-plane/src/agent_control_plane/hf_utils.py +561 -0
- modules/control-plane/src/agent_control_plane/interfaces/__init__.py +53 -0
- modules/control-plane/src/agent_control_plane/interfaces/kernel_interface.py +359 -0
- modules/control-plane/src/agent_control_plane/interfaces/plugin_interface.py +495 -0
- modules/control-plane/src/agent_control_plane/interfaces/protocol_interfaces.py +385 -0
- modules/control-plane/src/agent_control_plane/kernel_space.py +707 -0
- modules/control-plane/src/agent_control_plane/langchain_adapter.py +422 -0
- modules/control-plane/src/agent_control_plane/lifecycle.py +3111 -0
- modules/control-plane/src/agent_control_plane/mcp_adapter.py +517 -0
- modules/control-plane/src/agent_control_plane/ml_safety.py +560 -0
- modules/control-plane/src/agent_control_plane/multimodal.py +724 -0
- modules/control-plane/src/agent_control_plane/mute_agent.py +419 -0
- modules/control-plane/src/agent_control_plane/observability.py +785 -0
- modules/control-plane/src/agent_control_plane/orchestrator.py +480 -0
- modules/control-plane/src/agent_control_plane/plugin_registry.py +748 -0
- modules/control-plane/src/agent_control_plane/policy_engine.py +525 -0
- modules/control-plane/src/agent_control_plane/shadow_mode.py +307 -0
- modules/control-plane/src/agent_control_plane/signals.py +491 -0
- modules/control-plane/src/agent_control_plane/supervisor_agents.py +427 -0
- modules/control-plane/src/agent_control_plane/time_travel_debugger.py +554 -0
- modules/control-plane/src/agent_control_plane/tool_registry.py +350 -0
- modules/control-plane/src/agent_control_plane/vfs.py +695 -0
- modules/control-plane/tests/README.md +33 -0
- modules/control-plane/tests/test_a2a_adapter.py +336 -0
- modules/control-plane/tests/test_adapter.py +422 -0
- modules/control-plane/tests/test_advanced_features.py +389 -0
- modules/control-plane/tests/test_benchmark.py +223 -0
- modules/control-plane/tests/test_compliance.py +214 -0
- modules/control-plane/tests/test_control_plane.py +295 -0
- modules/control-plane/tests/test_hibernation.py +274 -0
- modules/control-plane/tests/test_kernel_interception.py +284 -0
- modules/control-plane/tests/test_langchain_adapter.py +258 -0
- modules/control-plane/tests/test_lifecycle.py +1174 -0
- modules/control-plane/tests/test_mcp_adapter.py +293 -0
- modules/control-plane/tests/test_ml_safety.py +142 -0
- modules/control-plane/tests/test_multimodal.py +317 -0
- modules/control-plane/tests/test_new_features.py +435 -0
- modules/control-plane/tests/test_observability.py +338 -0
- modules/control-plane/tests/test_time_travel.py +387 -0
- modules/emk/.github/workflows/ci.yml +105 -0
- modules/emk/.github/workflows/publish.yml +144 -0
- modules/emk/.gitignore +74 -0
- modules/emk/CHANGELOG.md +41 -0
- modules/emk/CONTRIBUTING.md +295 -0
- modules/emk/IMPLEMENTATION.md +174 -0
- modules/emk/LICENSE +21 -0
- modules/emk/MANIFEST.in +8 -0
- modules/emk/README.md +135 -0
- modules/emk/RELEASE_NOTES.md +82 -0
- modules/emk/SECURITY.md +52 -0
- modules/emk/codecov.yml +39 -0
- modules/emk/docs/MEMORY_MANAGEMENT.md +285 -0
- modules/emk/emk/__init__.py +106 -0
- modules/emk/emk/hf_utils.py +419 -0
- modules/emk/emk/indexer.py +144 -0
- modules/emk/emk/py.typed +0 -0
- modules/emk/emk/schema.py +204 -0
- modules/emk/emk/sleep_cycle.py +345 -0
- modules/emk/emk/store.py +479 -0
- modules/emk/examples/basic_usage.py +123 -0
- modules/emk/examples/memory_features_demo.py +154 -0
- modules/emk/experiments/README.md +59 -0
- modules/emk/experiments/reproduce_results.py +461 -0
- modules/emk/experiments/results.json +61 -0
- modules/emk/paper/structure.tex +192 -0
- modules/emk/paper/whitepaper.md +273 -0
- modules/emk/pyproject.toml +91 -0
- modules/emk/setup.py +5 -0
- modules/emk/tests/test_file_adapter.py +195 -0
- modules/emk/tests/test_indexer.py +174 -0
- modules/emk/tests/test_init.py +55 -0
- modules/emk/tests/test_negative_memory.py +83 -0
- modules/emk/tests/test_schema.py +150 -0
- modules/emk/tests/test_semantic_rules.py +175 -0
- modules/emk/tests/test_sleep_cycle.py +335 -0
- modules/emk/tests/test_store_anti_patterns.py +239 -0
- modules/iatp/.github/workflows/docker-build.yml +124 -0
- modules/iatp/.github/workflows/publish.yml +174 -0
- modules/iatp/.github/workflows/python-package.yml +121 -0
- modules/iatp/.gitignore +67 -0
- modules/iatp/.pre-commit-config.yaml +64 -0
- modules/iatp/CHANGELOG.md +120 -0
- modules/iatp/Dockerfile +91 -0
- modules/iatp/IMPLEMENTATION_SUMMARY.md +218 -0
- modules/iatp/MANIFEST.in +9 -0
- modules/iatp/README.md +180 -0
- modules/iatp/docker/Dockerfile.agent +27 -0
- modules/iatp/docker/Dockerfile.sidecar-python +86 -0
- modules/iatp/docker/README.md +258 -0
- modules/iatp/docker-compose.yml +194 -0
- modules/iatp/docs/ARCHITECTURE.md +243 -0
- modules/iatp/docs/CLI_GUIDE.md +220 -0
- modules/iatp/docs/DEPLOYMENT.md +304 -0
- modules/iatp/examples/README.md +132 -0
- modules/iatp/examples/backend_agent.py +39 -0
- modules/iatp/examples/client.py +168 -0
- modules/iatp/examples/demo_attestation_reputation.py +274 -0
- modules/iatp/examples/demo_client.py +240 -0
- modules/iatp/examples/demo_rbac.py +143 -0
- modules/iatp/examples/integration_demo.py +245 -0
- modules/iatp/examples/manifests/coder_agent.json +20 -0
- modules/iatp/examples/manifests/reviewer_agent.json +19 -0
- modules/iatp/examples/manifests/secure_bank.json +14 -0
- modules/iatp/examples/manifests/standard_agent.json +14 -0
- modules/iatp/examples/manifests/untrusted_honeypot.json +14 -0
- modules/iatp/examples/run_secure_bank_sidecar.py +85 -0
- modules/iatp/examples/run_sidecar.py +105 -0
- modules/iatp/examples/run_untrusted_sidecar.py +77 -0
- modules/iatp/examples/secure_bank_agent.py +138 -0
- modules/iatp/examples/test_untrusted.py +82 -0
- modules/iatp/examples/untrusted_agent.py +119 -0
- modules/iatp/experiments/README.md +58 -0
- modules/iatp/experiments/cascading_hallucination/README.md +149 -0
- modules/iatp/experiments/cascading_hallucination/agent_a_user.py +41 -0
- modules/iatp/experiments/cascading_hallucination/agent_b_summarizer.py +54 -0
- modules/iatp/experiments/cascading_hallucination/agent_c_database.py +47 -0
- modules/iatp/experiments/cascading_hallucination/proof_of_concept.py +290 -0
- modules/iatp/experiments/cascading_hallucination/run_experiment.py +226 -0
- modules/iatp/experiments/cascading_hallucination/sidecar_c.py +61 -0
- modules/iatp/experiments/reproduce_results.py +574 -0
- modules/iatp/experiments/results.json +2336 -0
- modules/iatp/iatp/__init__.py +164 -0
- modules/iatp/iatp/attestation.py +401 -0
- modules/iatp/iatp/cli.py +253 -0
- modules/iatp/iatp/hf_utils.py +469 -0
- modules/iatp/iatp/ipc_pipes.py +578 -0
- modules/iatp/iatp/main.py +410 -0
- modules/iatp/iatp/models/__init__.py +445 -0
- modules/iatp/iatp/policy_engine.py +335 -0
- modules/iatp/iatp/py.typed +2 -0
- modules/iatp/iatp/recovery.py +319 -0
- modules/iatp/iatp/security/__init__.py +268 -0
- modules/iatp/iatp/sidecar/__init__.py +517 -0
- modules/iatp/iatp/telemetry/__init__.py +162 -0
- modules/iatp/iatp/tests/__init__.py +1 -0
- modules/iatp/iatp/tests/test_attestation.py +368 -0
- modules/iatp/iatp/tests/test_cli.py +129 -0
- modules/iatp/iatp/tests/test_models.py +128 -0
- modules/iatp/iatp/tests/test_policy_engine.py +345 -0
- modules/iatp/iatp/tests/test_recovery.py +279 -0
- modules/iatp/iatp/tests/test_security.py +220 -0
- modules/iatp/iatp/tests/test_sidecar.py +165 -0
- modules/iatp/iatp/tests/test_telemetry.py +173 -0
- modules/iatp/paper/BLOG.md +307 -0
- modules/iatp/paper/PAPER.md +236 -0
- modules/iatp/paper/RFC_SUBMISSION.md +299 -0
- modules/iatp/paper/whitepaper.md +369 -0
- modules/iatp/proto/README.md +200 -0
- modules/iatp/proto/generate_stubs.py +81 -0
- modules/iatp/proto/iatp.proto +552 -0
- modules/iatp/pyproject.toml +180 -0
- modules/iatp/requirements-dev.txt +2 -0
- modules/iatp/requirements.txt +6 -0
- modules/iatp/setup.py +60 -0
- modules/iatp/sidecar/README.md +487 -0
- modules/iatp/sidecar/go/Dockerfile +32 -0
- modules/iatp/sidecar/go/README.md +237 -0
- modules/iatp/sidecar/go/go.mod +8 -0
- modules/iatp/sidecar/go/main.go +488 -0
- modules/iatp/spec/001-handshake.md +436 -0
- modules/iatp/spec/002-reversibility.md +394 -0
- modules/iatp/spec/schema/capability_manifest.json +266 -0
- modules/iatp/test_integration.py +310 -0
- modules/mcp-kernel-server/README.md +261 -0
- modules/mcp-kernel-server/pyproject.toml +60 -0
- modules/mcp-kernel-server/src/mcp_kernel_server/__init__.py +26 -0
- modules/mcp-kernel-server/src/mcp_kernel_server/cli.py +229 -0
- modules/mcp-kernel-server/src/mcp_kernel_server/resources.py +215 -0
- modules/mcp-kernel-server/src/mcp_kernel_server/server.py +562 -0
- modules/mcp-kernel-server/src/mcp_kernel_server/tools.py +1172 -0
- modules/mute-agent/.github/workflows/safety_check.yml +45 -0
- modules/mute-agent/.gitignore +53 -0
- modules/mute-agent/ARCHITECTURE.md +531 -0
- modules/mute-agent/BENCHMARK_GUIDE.md +384 -0
- modules/mute-agent/COMPLETION_SUMMARY.md +293 -0
- modules/mute-agent/EXPERIMENT_SUMMARY.md +318 -0
- modules/mute-agent/IMPLEMENTATION_SUMMARY.md +212 -0
- modules/mute-agent/LICENSE +21 -0
- modules/mute-agent/PHASE3_SUMMARY.md +297 -0
- modules/mute-agent/README.md +360 -0
- modules/mute-agent/STEEL_MAN_RESULTS.md +353 -0
- modules/mute-agent/USAGE.md +505 -0
- modules/mute-agent/V2_IMPLEMENTATION_SUMMARY.md +253 -0
- modules/mute-agent/V2_STEEL_MAN_IMPLEMENTATION.md +274 -0
- modules/mute-agent/VERIFICATION_REPORT.md +435 -0
- modules/mute-agent/charts/cost_comparison.png +0 -0
- modules/mute-agent/charts/cost_vs_ambiguity.png +0 -0
- modules/mute-agent/charts/metrics_comparison.png +0 -0
- modules/mute-agent/charts/scenario_breakdown.png +0 -0
- modules/mute-agent/charts/trace_attack_blocked.html +140 -0
- modules/mute-agent/charts/trace_attack_blocked.png +0 -0
- modules/mute-agent/charts/trace_failure.html +140 -0
- modules/mute-agent/charts/trace_failure.png +0 -0
- modules/mute-agent/charts/trace_success.html +140 -0
- modules/mute-agent/charts/trace_success.png +0 -0
- modules/mute-agent/examples/__init__.py +1 -0
- modules/mute-agent/examples/advanced_example.py +384 -0
- modules/mute-agent/examples/graph_debugger_demo.py +241 -0
- modules/mute-agent/examples/listener_example.py +297 -0
- modules/mute-agent/examples/simple_example.py +242 -0
- modules/mute-agent/examples/steel_man_demo.py +297 -0
- modules/mute-agent/experiments/README.md +135 -0
- modules/mute-agent/experiments/__init__.py +3 -0
- modules/mute-agent/experiments/agent_comparison.csv +6 -0
- modules/mute-agent/experiments/agent_comparison_50runs.csv +6 -0
- modules/mute-agent/experiments/ambiguity_test.py +335 -0
- modules/mute-agent/experiments/ambiguity_test_results.csv +31 -0
- modules/mute-agent/experiments/ambiguity_test_results_50runs.csv +51 -0
- modules/mute-agent/experiments/baseline_agent.py +189 -0
- modules/mute-agent/experiments/benchmark.py +402 -0
- modules/mute-agent/experiments/demo.py +172 -0
- modules/mute-agent/experiments/generate_cost_curve.py +474 -0
- modules/mute-agent/experiments/jailbreak_test.py +137 -0
- modules/mute-agent/experiments/latent_state_scenario.py +361 -0
- modules/mute-agent/experiments/mute_agent_experiment.py +349 -0
- modules/mute-agent/experiments/run_extended_experiment.py +40 -0
- modules/mute-agent/experiments/run_v2_experiments.py +266 -0
- modules/mute-agent/experiments/run_v2_experiments_auto.py +247 -0
- modules/mute-agent/experiments/v2_scenarios/README.md +214 -0
- modules/mute-agent/experiments/v2_scenarios/__init__.py +4 -0
- modules/mute-agent/experiments/v2_scenarios/scenario_1_deep_dependency.py +325 -0
- modules/mute-agent/experiments/v2_scenarios/scenario_2_adversarial.py +328 -0
- modules/mute-agent/experiments/v2_scenarios/scenario_3_false_positive.py +303 -0
- modules/mute-agent/experiments/v2_scenarios/scenario_4_performance.py +319 -0
- modules/mute-agent/experiments/visualize.py +400 -0
- modules/mute-agent/mute_agent/__init__.py +66 -0
- modules/mute-agent/mute_agent/core/__init__.py +1 -0
- modules/mute-agent/mute_agent/core/execution_agent.py +164 -0
- modules/mute-agent/mute_agent/core/handshake_protocol.py +199 -0
- modules/mute-agent/mute_agent/core/reasoning_agent.py +236 -0
- modules/mute-agent/mute_agent/knowledge_graph/__init__.py +1 -0
- modules/mute-agent/mute_agent/knowledge_graph/graph_elements.py +63 -0
- modules/mute-agent/mute_agent/knowledge_graph/multidimensional_graph.py +168 -0
- modules/mute-agent/mute_agent/knowledge_graph/subgraph.py +222 -0
- modules/mute-agent/mute_agent/listener/__init__.py +41 -0
- modules/mute-agent/mute_agent/listener/adapters/__init__.py +29 -0
- modules/mute-agent/mute_agent/listener/adapters/base_adapter.py +187 -0
- modules/mute-agent/mute_agent/listener/adapters/caas_adapter.py +342 -0
- modules/mute-agent/mute_agent/listener/adapters/control_plane_adapter.py +434 -0
- modules/mute-agent/mute_agent/listener/adapters/iatp_adapter.py +330 -0
- modules/mute-agent/mute_agent/listener/adapters/scak_adapter.py +249 -0
- modules/mute-agent/mute_agent/listener/listener.py +608 -0
- modules/mute-agent/mute_agent/listener/state_observer.py +434 -0
- modules/mute-agent/mute_agent/listener/threshold_config.py +311 -0
- modules/mute-agent/mute_agent/super_system/__init__.py +1 -0
- modules/mute-agent/mute_agent/super_system/router.py +202 -0
- modules/mute-agent/mute_agent/visualization/__init__.py +8 -0
- modules/mute-agent/mute_agent/visualization/graph_debugger.py +495 -0
- modules/mute-agent/requirements-dev.txt +6 -0
- modules/mute-agent/requirements.txt +9 -0
- modules/mute-agent/setup.py +64 -0
- modules/mute-agent/src/__init__.py +0 -0
- modules/mute-agent/src/agents/__init__.py +0 -0
- modules/mute-agent/src/agents/baseline_agent.py +524 -0
- modules/mute-agent/src/agents/interactive_agent.py +113 -0
- modules/mute-agent/src/agents/mute_agent.py +622 -0
- modules/mute-agent/src/benchmarks/__init__.py +0 -0
- modules/mute-agent/src/benchmarks/evaluator.py +481 -0
- modules/mute-agent/src/benchmarks/scenarios.json +985 -0
- modules/mute-agent/src/core/__init__.py +0 -0
- modules/mute-agent/src/core/mock_state.py +320 -0
- modules/mute-agent/src/core/tools.py +441 -0
- modules/nexus/__init__.py +49 -0
- modules/nexus/arbiter.py +357 -0
- modules/nexus/client.py +464 -0
- modules/nexus/dmz.py +417 -0
- modules/nexus/escrow.py +428 -0
- modules/nexus/exceptions.py +284 -0
- modules/nexus/registry.py +391 -0
- modules/nexus/reputation.py +423 -0
- modules/nexus/schemas/__init__.py +49 -0
- modules/nexus/schemas/compliance.py +274 -0
- modules/nexus/schemas/escrow.py +249 -0
- modules/nexus/schemas/manifest.py +223 -0
- modules/nexus/schemas/receipt.py +206 -0
- modules/observability/README.md +192 -0
- modules/observability/alertmanager/alertmanager.yml +116 -0
- modules/observability/alerts/agent-os-alerts.yaml +197 -0
- modules/observability/docker-compose.yml +128 -0
- modules/observability/grafana/dashboards/agent-os-amb.json +448 -0
- modules/observability/grafana/dashboards/agent-os-cmvk.json +441 -0
- modules/observability/grafana/dashboards/agent-os-overview.json +268 -0
- modules/observability/grafana/dashboards/agent-os-performance.json +15 -0
- modules/observability/grafana/dashboards/agent-os-safety.json +50 -0
- modules/observability/grafana/provisioning/dashboards/dashboards.yml +15 -0
- modules/observability/grafana/provisioning/datasources/datasources.yml +33 -0
- modules/observability/otel/otel-collector-config.yml +61 -0
- modules/observability/prometheus/prometheus.yml +63 -0
- modules/observability/pyproject.toml +53 -0
- modules/observability/scripts/export_dashboards.py +55 -0
- modules/observability/src/agent_os_observability/__init__.py +25 -0
- modules/observability/src/agent_os_observability/dashboards.py +896 -0
- modules/observability/src/agent_os_observability/metrics.py +396 -0
- modules/observability/src/agent_os_observability/server.py +221 -0
- modules/observability/src/agent_os_observability/tracer.py +226 -0
- modules/primitives/.gitignore +8 -0
- modules/primitives/README.md +62 -0
- modules/primitives/agent_primitives/__init__.py +22 -0
- modules/primitives/agent_primitives/failures.py +82 -0
- modules/primitives/agent_primitives/py.typed +0 -0
- modules/primitives/pyproject.toml +68 -0
- modules/scak/.github/copilot-instructions.md +396 -0
- modules/scak/.github/workflows/release.yml +117 -0
- modules/scak/.gitignore +32 -0
- modules/scak/CHANGELOG.md +173 -0
- modules/scak/CITATION.cff +62 -0
- modules/scak/CONTRIBUTING.md +429 -0
- modules/scak/Dockerfile +58 -0
- modules/scak/ENTERPRISE_FEATURES.md +518 -0
- modules/scak/IMPLEMENTATION_SUMMARY.md +206 -0
- modules/scak/LIMITATIONS.md +565 -0
- modules/scak/MANIFEST.in +16 -0
- modules/scak/NOVELTY.md +535 -0
- modules/scak/README.md +928 -0
- modules/scak/RESEARCH.md +670 -0
- modules/scak/agent_kernel/__init__.py +66 -0
- modules/scak/agent_kernel/analyzer.py +432 -0
- modules/scak/agent_kernel/auditor.py +31 -0
- modules/scak/agent_kernel/completeness_auditor.py +234 -0
- modules/scak/agent_kernel/detector.py +200 -0
- modules/scak/agent_kernel/kernel.py +741 -0
- modules/scak/agent_kernel/memory_manager.py +82 -0
- modules/scak/agent_kernel/models.py +372 -0
- modules/scak/agent_kernel/nudge_mechanism.py +260 -0
- modules/scak/agent_kernel/outcome_analyzer.py +335 -0
- modules/scak/agent_kernel/patcher.py +579 -0
- modules/scak/agent_kernel/semantic_analyzer.py +313 -0
- modules/scak/agent_kernel/semantic_purge.py +346 -0
- modules/scak/agent_kernel/simulator.py +447 -0
- modules/scak/agent_kernel/teacher.py +82 -0
- modules/scak/agent_kernel/triage.py +149 -0
- modules/scak/build_and_publish.ps1 +74 -0
- modules/scak/build_and_publish.sh +74 -0
- modules/scak/cli.py +471 -0
- modules/scak/dashboard.py +462 -0
- modules/scak/datasets/DATASET_CARD.md +219 -0
- modules/scak/datasets/README.md +143 -0
- modules/scak/datasets/gaia_vague_queries/vague_queries.json +262 -0
- modules/scak/datasets/hf_upload/README.md +219 -0
- modules/scak/datasets/hf_upload/scak_gaia_laziness.jsonl +50 -0
- modules/scak/datasets/prepare_hf_datasets.py +145 -0
- modules/scak/datasets/red_team/jailbreak_patterns.json +202 -0
- modules/scak/docker-compose.yml +99 -0
- modules/scak/docs/Adaptive-Memory-Hierarchy.md +319 -0
- modules/scak/docs/Data-Contracts-and-Schemas.md +285 -0
- modules/scak/docs/Dual-Loop-Architecture.md +344 -0
- modules/scak/docs/Enhanced-Features.md +612 -0
- modules/scak/docs/LANGCHAIN_INTEGRATION.md +572 -0
- modules/scak/docs/README.md +128 -0
- modules/scak/docs/Reference-Implementations.md +163 -0
- modules/scak/docs/SCAK_V2.md +374 -0
- modules/scak/docs/Three-Failure-Types.md +178 -0
- modules/scak/examples/basic_example.py +155 -0
- modules/scak/examples/circuit_breaker_lazy_eval_demo.py +243 -0
- modules/scak/examples/langchain_integration_example.py +339 -0
- modules/scak/examples/layer4_demo.py +243 -0
- modules/scak/examples/production_features_demo.py +353 -0
- modules/scak/examples/quick_demo.py +79 -0
- modules/scak/examples/scak_v2_demo.py +252 -0
- modules/scak/experiments/README.md +438 -0
- modules/scak/experiments/ablation_studies/README.md +192 -0
- modules/scak/experiments/ablation_studies/ablation_no_audit.py +116 -0
- modules/scak/experiments/ablation_studies/ablation_no_purge.py +133 -0
- modules/scak/experiments/chaos_engineering/README.md +332 -0
- modules/scak/experiments/context_efficiency_test.py +328 -0
- modules/scak/experiments/gaia_benchmark/README.md +208 -0
- modules/scak/experiments/laziness_benchmark.py +179 -0
- modules/scak/experiments/long_horizon_task_experiment.py +252 -0
- modules/scak/experiments/multi_agent_rag_experiment.py +284 -0
- modules/scak/experiments/results/ablation_table.md +12 -0
- modules/scak/experiments/results/long_horizon.json +36 -0
- modules/scak/experiments/results/multi_agent_rag.json +66 -0
- modules/scak/experiments/run_comprehensive_ablations.py +332 -0
- modules/scak/experiments/test_auditor_patcher_integration.py +251 -0
- modules/scak/notebooks/getting_started.ipynb +33 -0
- modules/scak/paper/ARXIV_SUBMISSION_METADATA.txt +109 -0
- modules/scak/paper/PAPER_CHECKLIST.md +304 -0
- modules/scak/paper/Paper.pdf +0 -0
- modules/scak/paper/README.md +113 -0
- modules/scak/paper/appendix.md +351 -0
- modules/scak/paper/arxiv/bibliography.bib +284 -0
- modules/scak/paper/arxiv/fig1_ooda_architecture.pdf +0 -0
- modules/scak/paper/arxiv/fig2_memory_hierarchy.pdf +0 -0
- modules/scak/paper/arxiv/fig3_gaia_results.pdf +0 -0
- modules/scak/paper/arxiv/fig4_ablation_heatmap.pdf +0 -0
- modules/scak/paper/arxiv/fig5_context_reduction.pdf +0 -0
- modules/scak/paper/arxiv/fig6_mttr_boxplot.pdf +0 -0
- modules/scak/paper/arxiv/main.aux +103 -0
- modules/scak/paper/arxiv/main.bbl +113 -0
- modules/scak/paper/arxiv/main.blg +55 -0
- modules/scak/paper/arxiv/main.out +31 -0
- modules/scak/paper/arxiv/main.pdf +0 -0
- modules/scak/paper/arxiv/main.tex +482 -0
- modules/scak/paper/arxiv_submission/bibliography.bib +284 -0
- modules/scak/paper/arxiv_submission/fig1_ooda_architecture.pdf +0 -0
- modules/scak/paper/arxiv_submission/fig2_memory_hierarchy.pdf +0 -0
- modules/scak/paper/arxiv_submission/fig3_gaia_results.pdf +0 -0
- modules/scak/paper/arxiv_submission/fig4_ablation_heatmap.pdf +0 -0
- modules/scak/paper/arxiv_submission/fig5_context_reduction.pdf +0 -0
- modules/scak/paper/arxiv_submission/fig6_mttr_boxplot.pdf +0 -0
- modules/scak/paper/arxiv_submission/main.aux +103 -0
- modules/scak/paper/arxiv_submission/main.bbl +113 -0
- modules/scak/paper/arxiv_submission/main.blg +55 -0
- modules/scak/paper/arxiv_submission/main.out +31 -0
- modules/scak/paper/arxiv_submission/main.pdf +0 -0
- modules/scak/paper/arxiv_submission/main.tex +482 -0
- modules/scak/paper/arxiv_submission.tar.gz +0 -0
- modules/scak/paper/bibliography.bib +284 -0
- modules/scak/paper/build.sh +55 -0
- modules/scak/paper/figures/README.md +32 -0
- modules/scak/paper/figures/fig1_ooda_architecture.md +75 -0
- modules/scak/paper/figures/fig1_ooda_architecture.pdf +0 -0
- modules/scak/paper/figures/fig1_ooda_architecture.png +0 -0
- modules/scak/paper/figures/fig2_memory_hierarchy.md +83 -0
- modules/scak/paper/figures/fig2_memory_hierarchy.pdf +0 -0
- modules/scak/paper/figures/fig2_memory_hierarchy.png +0 -0
- modules/scak/paper/figures/fig3_gaia_results.md +64 -0
- modules/scak/paper/figures/fig3_gaia_results.pdf +0 -0
- modules/scak/paper/figures/fig3_gaia_results.png +0 -0
- modules/scak/paper/figures/fig4_ablation_heatmap.md +64 -0
- modules/scak/paper/figures/fig4_ablation_heatmap.pdf +0 -0
- modules/scak/paper/figures/fig4_ablation_heatmap.png +0 -0
- modules/scak/paper/figures/fig5_context_reduction.md +71 -0
- modules/scak/paper/figures/fig5_context_reduction.pdf +0 -0
- modules/scak/paper/figures/fig5_context_reduction.png +0 -0
- modules/scak/paper/figures/fig6_mttr_boxplot.md +80 -0
- modules/scak/paper/figures/fig6_mttr_boxplot.pdf +0 -0
- modules/scak/paper/figures/fig6_mttr_boxplot.png +0 -0
- modules/scak/paper/figures/generate_figures.py +463 -0
- modules/scak/paper/main.aux +103 -0
- modules/scak/paper/main.bbl +113 -0
- modules/scak/paper/main.blg +55 -0
- modules/scak/paper/main.md +192 -0
- modules/scak/paper/main.out +31 -0
- modules/scak/paper/main.pdf +0 -0
- modules/scak/paper/main.tex +482 -0
- modules/scak/reproducibility/ABLATIONS.md +225 -0
- modules/scak/reproducibility/Dockerfile.reproducibility +34 -0
- modules/scak/reproducibility/README.md +421 -0
- modules/scak/reproducibility/requirements-pinned.txt +32 -0
- modules/scak/reproducibility/run_all_experiments.py +395 -0
- modules/scak/reproducibility/seed_control.py +53 -0
- modules/scak/reproducibility/statistical_analysis.py +302 -0
- modules/scak/requirements.txt +50 -0
- modules/scak/setup.py +93 -0
- modules/scak/src/__init__.py +124 -0
- modules/scak/src/agents/__init__.py +13 -0
- modules/scak/src/agents/conflict_resolution.py +732 -0
- modules/scak/src/agents/orchestrator.py +761 -0
- modules/scak/src/agents/pubsub.py +484 -0
- modules/scak/src/agents/shadow_teacher.py +344 -0
- modules/scak/src/agents/swarm.py +661 -0
- modules/scak/src/agents/worker.py +357 -0
- modules/scak/src/integrations/__init__.py +81 -0
- modules/scak/src/integrations/cmvk_adapter.py +430 -0
- modules/scak/src/integrations/control_plane_adapter.py +601 -0
- modules/scak/src/integrations/langchain_integration.py +902 -0
- modules/scak/src/interfaces/__init__.py +59 -0
- modules/scak/src/interfaces/llm_clients.py +505 -0
- modules/scak/src/interfaces/openapi_tools.py +611 -0
- modules/scak/src/interfaces/plugin_system.py +605 -0
- modules/scak/src/interfaces/protocols.py +365 -0
- modules/scak/src/interfaces/telemetry.py +464 -0
- modules/scak/src/interfaces/tool_registry.py +547 -0
- modules/scak/src/kernel/__init__.py +100 -0
- modules/scak/src/kernel/auditor.py +305 -0
- modules/scak/src/kernel/circuit_breaker.py +398 -0
- modules/scak/src/kernel/core.py +724 -0
- modules/scak/src/kernel/distributed.py +667 -0
- modules/scak/src/kernel/evolution.py +455 -0
- modules/scak/src/kernel/failover.py +621 -0
- modules/scak/src/kernel/governance.py +710 -0
- modules/scak/src/kernel/governance_v2.py +603 -0
- modules/scak/src/kernel/lazy_evaluator.py +514 -0
- modules/scak/src/kernel/load_testing.py +633 -0
- modules/scak/src/kernel/memory.py +945 -0
- modules/scak/src/kernel/patcher.py +581 -0
- modules/scak/src/kernel/rubric.py +419 -0
- modules/scak/src/kernel/schemas.py +390 -0
- modules/scak/src/kernel/skill_mapper.py +309 -0
- modules/scak/src/kernel/triage.py +149 -0
- modules/scak/src/mocks/__init__.py +99 -0
- modules/scak/tests/__init__.py +1 -0
- modules/scak/tests/test_circuit_breaker.py +403 -0
- modules/scak/tests/test_conflict_resolution.py +287 -0
- modules/scak/tests/test_dual_loop.py +463 -0
- modules/scak/tests/test_enhanced_features.py +421 -0
- modules/scak/tests/test_failover_and_load.py +438 -0
- modules/scak/tests/test_governance.py +185 -0
- modules/scak/tests/test_kernel.py +359 -0
- modules/scak/tests/test_langchain_integration.py +451 -0
- modules/scak/tests/test_lazy_evaluator.py +465 -0
- modules/scak/tests/test_llm_clients.py +122 -0
- modules/scak/tests/test_memory_controller.py +528 -0
- modules/scak/tests/test_orchestrator.py +181 -0
- modules/scak/tests/test_phase3_integration.py +265 -0
- modules/scak/tests/test_pubsub_swarm.py +203 -0
- modules/scak/tests/test_reference_implementations.py +240 -0
- modules/scak/tests/test_rubric.py +363 -0
- modules/scak/tests/test_scak_v2.py +651 -0
- modules/scak/tests/test_skill_mapper.py +217 -0
- modules/scak/tests/test_specific_failures.py +393 -0
- modules/scak/tests/test_tool_registry.py +264 -0
- modules/scak/tests/test_tools_and_plugins.py +303 -0
- modules/scak/tests/test_triage.py +596 -0
- modules/scak/tests/test_write_through.py +319 -0
- agent_os_kernel-1.1.0.dist-info/METADATA +0 -400
- agent_os_kernel-1.1.0.dist-info/RECORD +0 -12
- {agent_os_kernel-1.1.0.dist-info → agent_os_kernel-1.3.0.dist-info}/WHEEL +0 -0
- {agent_os_kernel-1.1.0.dist-info → agent_os_kernel-1.3.0.dist-info}/licenses/LICENSE +0 -0
|
@@ -0,0 +1,297 @@
|
|
|
1
|
+
# Phase 3 Implementation Summary
|
|
2
|
+
|
|
3
|
+
## Overview
|
|
4
|
+
|
|
5
|
+
Phase 3 adds the "Evidence & Verification Layer" to the Mute Agent architecture, transforming it from a solid technical implementation into **The Industry Reference for Agent Architecture** through reproducible proof and visual evidence.
|
|
6
|
+
|
|
7
|
+
## What Was Delivered
|
|
8
|
+
|
|
9
|
+
### 1. Graph Debugger - Visual Trace Generation ✅
|
|
10
|
+
|
|
11
|
+
**Purpose:** Generate visual artifacts proving deterministic safety.
|
|
12
|
+
|
|
13
|
+
**Implementation:**
|
|
14
|
+
- New module: `mute_agent/visualization/graph_debugger.py`
|
|
15
|
+
- Tracks execution flow through knowledge graph
|
|
16
|
+
- Generates interactive HTML (pyvis) and static PNG (matplotlib) visualizations
|
|
17
|
+
- Color coding:
|
|
18
|
+
- 🟢 Green: Successfully traversed nodes
|
|
19
|
+
- 🔴 Red: Exact failure point (constraint violated)
|
|
20
|
+
- ⚪ Grey: Unreachable nodes (path severed)
|
|
21
|
+
|
|
22
|
+
**Usage:**
|
|
23
|
+
```bash
|
|
24
|
+
python examples/graph_debugger_demo.py
|
|
25
|
+
```
|
|
26
|
+
|
|
27
|
+
**Key Insight:** You can show a screenshot where the agent *physically could not* reach dangerous nodes like "Delete DB" because the path was severed by missing approval tokens.
|
|
28
|
+
|
|
29
|
+
**Files:**
|
|
30
|
+
- `mute_agent/visualization/__init__.py`
|
|
31
|
+
- `mute_agent/visualization/graph_debugger.py`
|
|
32
|
+
- `examples/graph_debugger_demo.py`
|
|
33
|
+
|
|
34
|
+
---
|
|
35
|
+
|
|
36
|
+
### 2. Cost of Curiosity Curve ✅
|
|
37
|
+
|
|
38
|
+
**Purpose:** Prove that clarification is expensive, not free.
|
|
39
|
+
|
|
40
|
+
**Implementation:**
|
|
41
|
+
- Experiment script: `experiments/generate_cost_curve.py`
|
|
42
|
+
- Runs 50 trials across ambiguity spectrum (0.0 = clear, 1.0 = totally ambiguous)
|
|
43
|
+
- Compares token cost: Mute Agent vs Interactive Agent
|
|
44
|
+
- Generates matplotlib chart showing cost trends
|
|
45
|
+
|
|
46
|
+
**Results:**
|
|
47
|
+
- **Mute Agent**: Flat line at 50 tokens (constant cost)
|
|
48
|
+
- **Interactive Agent**: Exponential curve averaging 444 tokens
|
|
49
|
+
- **Token Reduction**: 88.7%
|
|
50
|
+
- **Key Finding**: Interactive Agent enters costly clarification loops; Mute Agent rejects in 1 hop
|
|
51
|
+
|
|
52
|
+
**Hypothesis Validated:**
|
|
53
|
+
- ✅ Mute Agent maintains CONSTANT cost regardless of ambiguity
|
|
54
|
+
- ✅ Interactive Agent cost EXPLODES as ambiguity increases
|
|
55
|
+
- ✅ Clarification is NOT free - it's exponentially expensive
|
|
56
|
+
|
|
57
|
+
**Usage:**
|
|
58
|
+
```bash
|
|
59
|
+
python experiments/generate_cost_curve.py --trials 50
|
|
60
|
+
```
|
|
61
|
+
|
|
62
|
+
**Files:**
|
|
63
|
+
- `experiments/generate_cost_curve.py`
|
|
64
|
+
- `charts/cost_comparison.png` (example output)
|
|
65
|
+
- `cost_curve_results.json` (example data)
|
|
66
|
+
|
|
67
|
+
---
|
|
68
|
+
|
|
69
|
+
### 3. Latent State Trap - Graph as Single Source of Truth ✅
|
|
70
|
+
|
|
71
|
+
**Purpose:** Test what happens when user belief conflicts with reality.
|
|
72
|
+
|
|
73
|
+
**Implementation:**
|
|
74
|
+
- Scenario script: `experiments/latent_state_scenario.py`
|
|
75
|
+
- Tests "Drifting Configuration" scenarios:
|
|
76
|
+
- User thinks Service-A is on Port 80 → Graph shows Port 8080
|
|
77
|
+
- User thinks Service-B is on old host → Graph shows new host
|
|
78
|
+
|
|
79
|
+
**Results:**
|
|
80
|
+
- **Mute Agent**: Rejects wrong assumptions, auto-corrects based on graph
|
|
81
|
+
- **Interactive Agent**: Attempts wrong configuration, may hallucinate success
|
|
82
|
+
|
|
83
|
+
**Key Insight:** "The Graph is the Single Source of Truth, not the Prompt"
|
|
84
|
+
|
|
85
|
+
**Real-World Impact:**
|
|
86
|
+
- Prevents "oops wrong environment" incidents
|
|
87
|
+
- Catches outdated runbooks
|
|
88
|
+
- Enforces infrastructure-as-code truth
|
|
89
|
+
- Auto-detects configuration drift
|
|
90
|
+
|
|
91
|
+
**Usage:**
|
|
92
|
+
```bash
|
|
93
|
+
python experiments/latent_state_scenario.py
|
|
94
|
+
```
|
|
95
|
+
|
|
96
|
+
**Files:**
|
|
97
|
+
- `experiments/latent_state_scenario.py`
|
|
98
|
+
|
|
99
|
+
---
|
|
100
|
+
|
|
101
|
+
### 4. CI/CD Guardrail Action ✅
|
|
102
|
+
|
|
103
|
+
**Purpose:** Ensure graph logic doesn't degrade as features are added.
|
|
104
|
+
|
|
105
|
+
**Implementation:**
|
|
106
|
+
- GitHub workflow: `.github/workflows/safety_check.yml`
|
|
107
|
+
- Runs on every PR to main/develop branches
|
|
108
|
+
- Executes jailbreak test suite: `experiments/jailbreak_test.py`
|
|
109
|
+
- Tests 10 adversarial attack types:
|
|
110
|
+
- Authority override
|
|
111
|
+
- Role manipulation
|
|
112
|
+
- Instruction override
|
|
113
|
+
- Emotional manipulation
|
|
114
|
+
- Confusion attack
|
|
115
|
+
- Encoding attack
|
|
116
|
+
- Context poisoning
|
|
117
|
+
- Multi-turn manipulation
|
|
118
|
+
- Hypothetical scenario
|
|
119
|
+
- Authority impersonation
|
|
120
|
+
|
|
121
|
+
**Pass Criteria:**
|
|
122
|
+
- ✅ Leakage Rate == 0.0%
|
|
123
|
+
- ✅ All 10 attacks blocked
|
|
124
|
+
- ✅ No false positives on legitimate requests
|
|
125
|
+
|
|
126
|
+
**Result:** 0% leakage rate achieved
|
|
127
|
+
|
|
128
|
+
**Enterprise Maturity:** "We don't just hope it's safe; we break the build if it isn't."
|
|
129
|
+
|
|
130
|
+
**Usage:**
|
|
131
|
+
```bash
|
|
132
|
+
# Locally
|
|
133
|
+
python experiments/jailbreak_test.py --verbose
|
|
134
|
+
|
|
135
|
+
# Automatically runs in CI/CD on every PR
|
|
136
|
+
```
|
|
137
|
+
|
|
138
|
+
**Files:**
|
|
139
|
+
- `.github/workflows/safety_check.yml`
|
|
140
|
+
- `experiments/jailbreak_test.py`
|
|
141
|
+
|
|
142
|
+
---
|
|
143
|
+
|
|
144
|
+
## Dependencies Added
|
|
145
|
+
|
|
146
|
+
```txt
|
|
147
|
+
matplotlib>=3.5.0 # Existing, for charts
|
|
148
|
+
networkx>=2.6.0 # NEW - Graph algorithms
|
|
149
|
+
pyvis>=0.3.0 # NEW - Interactive graph visualization
|
|
150
|
+
```
|
|
151
|
+
|
|
152
|
+
---
|
|
153
|
+
|
|
154
|
+
## Testing Results
|
|
155
|
+
|
|
156
|
+
All features have been thoroughly tested:
|
|
157
|
+
|
|
158
|
+
### Graph Debugger
|
|
159
|
+
- ✅ Generates interactive HTML visualizations
|
|
160
|
+
- ✅ Generates static PNG images
|
|
161
|
+
- ✅ Correctly colors nodes (green/red/grey)
|
|
162
|
+
- ✅ Shows traversed paths
|
|
163
|
+
- ✅ Marks unreachable nodes
|
|
164
|
+
|
|
165
|
+
### Cost Curve
|
|
166
|
+
- ✅ Runs 50 trials successfully
|
|
167
|
+
- ✅ Demonstrates flat cost for Mute Agent
|
|
168
|
+
- ✅ Shows exponential cost for Interactive Agent
|
|
169
|
+
- ✅ Generates publication-quality charts
|
|
170
|
+
- ✅ Saves JSON results for analysis
|
|
171
|
+
|
|
172
|
+
### Latent State
|
|
173
|
+
- ✅ Detects user's wrong assumptions
|
|
174
|
+
- ✅ Graph provides correct state
|
|
175
|
+
- ✅ Auto-correction demonstrated
|
|
176
|
+
- ✅ Shows hallucination risk in Interactive Agents
|
|
177
|
+
|
|
178
|
+
### Jailbreak Suite
|
|
179
|
+
- ✅ All 10 attack types blocked (0% leakage)
|
|
180
|
+
- ✅ Runs in CI/CD pipeline
|
|
181
|
+
- ✅ Fails build on security issues
|
|
182
|
+
- ✅ Validates legitimate requests work correctly
|
|
183
|
+
|
|
184
|
+
---
|
|
185
|
+
|
|
186
|
+
## Visual Evidence
|
|
187
|
+
|
|
188
|
+
### Cost of Curiosity
|
|
189
|
+

|
|
190
|
+
*Mute Agent: constant cost (flat line). Interactive Agent: exponential cost explosion.*
|
|
191
|
+
|
|
192
|
+
### Graph Trace - Failure
|
|
193
|
+

|
|
194
|
+
*Red node shows exact failure point. Grey nodes are unreachable.*
|
|
195
|
+
|
|
196
|
+
### Graph Trace - Attack Blocked
|
|
197
|
+

|
|
198
|
+
*Visual proof: agent physically could not reach dangerous action.*
|
|
199
|
+
|
|
200
|
+
---
|
|
201
|
+
|
|
202
|
+
## Documentation Updates
|
|
203
|
+
|
|
204
|
+
Updated `README.md` with:
|
|
205
|
+
- New "Phase 3: Evidence & Verification Features" section
|
|
206
|
+
- Usage examples for all new features
|
|
207
|
+
- Visual examples with embedded images
|
|
208
|
+
- Links to experiment scripts
|
|
209
|
+
|
|
210
|
+
---
|
|
211
|
+
|
|
212
|
+
## Why This Matters
|
|
213
|
+
|
|
214
|
+
### Before Phase 3:
|
|
215
|
+
- Solid architecture and implementation
|
|
216
|
+
- Good experimental results
|
|
217
|
+
- Difficult to visualize/prove safety to stakeholders
|
|
218
|
+
|
|
219
|
+
### After Phase 3:
|
|
220
|
+
- **Visual Proof**: Screenshot showing blocked dangerous actions
|
|
221
|
+
- **Evidence-Based**: 88.7% token reduction with data
|
|
222
|
+
- **Reproducible**: 50-trial experiments anyone can run
|
|
223
|
+
- **Enterprise Ready**: CI/CD guardrails prevent regressions
|
|
224
|
+
- **Industry Reference**: Complete evidence layer for agent safety
|
|
225
|
+
|
|
226
|
+
---
|
|
227
|
+
|
|
228
|
+
## Key Metrics
|
|
229
|
+
|
|
230
|
+
| Metric | Value |
|
|
231
|
+
|--------|-------|
|
|
232
|
+
| Token Reduction | 88.7% |
|
|
233
|
+
| Leakage Rate | 0.0% |
|
|
234
|
+
| Attacks Blocked | 10/10 (100%) |
|
|
235
|
+
| Clarifications (Mute) | 0 |
|
|
236
|
+
| Clarifications (Interactive) | 141 |
|
|
237
|
+
| Avg Tokens (Mute) | 50 |
|
|
238
|
+
| Avg Tokens (Interactive) | 444 |
|
|
239
|
+
| Avg Turns (Mute) | 1.0 |
|
|
240
|
+
| Avg Turns (Interactive) | 4.8 |
|
|
241
|
+
|
|
242
|
+
---
|
|
243
|
+
|
|
244
|
+
## Quotes from PRD (Achieved)
|
|
245
|
+
|
|
246
|
+
✅ "Generate a visual artifact for every execution."
|
|
247
|
+
✅ "Green Path: The nodes traversed successfully."
|
|
248
|
+
✅ "Red Node: The exact node where the constraint failed."
|
|
249
|
+
✅ "Grey Nodes: The unreachable parts of the graph."
|
|
250
|
+
✅ "This proves 'Deterministic Safety.'"
|
|
251
|
+
✅ "People think 'Clarification' is free. You need to prove it is expensive."
|
|
252
|
+
✅ "A Matplotlib chart comparing Token Cost vs. Ambiguity."
|
|
253
|
+
✅ "Hypothesis: Mute Agent is a flat line (cost is constant)."
|
|
254
|
+
✅ "Interactive Agent: Exponential curve."
|
|
255
|
+
✅ "The Graph is the Single Source of Truth, not the Prompt."
|
|
256
|
+
✅ "A GitHub Action that runs the 'Jailbreak Suite' on every PR."
|
|
257
|
+
✅ "Pass if Leakage_Rate == 0%."
|
|
258
|
+
✅ "We don't just hope it's safe; we break the build if it isn't."
|
|
259
|
+
|
|
260
|
+
---
|
|
261
|
+
|
|
262
|
+
## Next Steps (Future Work)
|
|
263
|
+
|
|
264
|
+
While Phase 3 is complete, potential enhancements:
|
|
265
|
+
|
|
266
|
+
1. **Advanced Visualizations**
|
|
267
|
+
- 3D graph rendering for complex graphs
|
|
268
|
+
- Animation showing execution flow over time
|
|
269
|
+
- Diff view comparing two execution traces
|
|
270
|
+
|
|
271
|
+
2. **Extended Cost Analysis**
|
|
272
|
+
- Real API cost tracking (OpenAI/Anthropic pricing)
|
|
273
|
+
- Latency vs accuracy tradeoffs
|
|
274
|
+
- Scale testing (100+ node graphs)
|
|
275
|
+
|
|
276
|
+
3. **Additional Scenarios**
|
|
277
|
+
- Multi-agent coordination scenarios
|
|
278
|
+
- Real-time system updates during execution
|
|
279
|
+
- Distributed graph consensus
|
|
280
|
+
|
|
281
|
+
4. **Monitoring & Observability**
|
|
282
|
+
- Real-time trace dashboard
|
|
283
|
+
- Alerting on constraint violations
|
|
284
|
+
- Historical analysis of execution patterns
|
|
285
|
+
|
|
286
|
+
---
|
|
287
|
+
|
|
288
|
+
## Conclusion
|
|
289
|
+
|
|
290
|
+
Phase 3 successfully transforms Mute Agent into **The Industry Reference for Agent Architecture** by providing:
|
|
291
|
+
|
|
292
|
+
1. **Visual Proof** - Screenshots showing deterministic safety
|
|
293
|
+
2. **Quantitative Evidence** - 88.7% token reduction with rigorous experiments
|
|
294
|
+
3. **Reproducibility** - Anyone can run experiments and verify results
|
|
295
|
+
4. **Enterprise Maturity** - CI/CD guardrails ensuring safety doesn't degrade
|
|
296
|
+
|
|
297
|
+
The implementation is complete, tested, and documented. All PRD requirements have been met or exceeded.
|
|
@@ -0,0 +1,360 @@
|
|
|
1
|
+
# Mute Agent
|
|
2
|
+
|
|
3
|
+
> **Part of [Agent OS](https://github.com/imran-siddique/agent-os)** - Kernel-level governance for AI agents
|
|
4
|
+
|
|
5
|
+
**Decoupling Reasoning from Execution using a Dynamic Semantic Handshake Protocol**
|
|
6
|
+
|
|
7
|
+
## Overview
|
|
8
|
+
|
|
9
|
+
The Mute Agent is an advanced agent architecture that decouples reasoning (The Face) from execution (The Hands) using a Dynamic Semantic Handshake Protocol. Instead of free-text tool invocation, the Reasoning Agent must negotiate actions against a Multidimensional Knowledge Graph.
|
|
10
|
+
|
|
11
|
+
## Key Components
|
|
12
|
+
|
|
13
|
+
### 1. The Face (Reasoning Agent)
|
|
14
|
+
The thinking component responsible for:
|
|
15
|
+
- Analyzing context
|
|
16
|
+
- Reasoning about available actions
|
|
17
|
+
- Proposing actions based on graph constraints
|
|
18
|
+
- Validating action proposals against the knowledge graph
|
|
19
|
+
|
|
20
|
+
### 2. The Hands (Execution Agent)
|
|
21
|
+
The action component responsible for:
|
|
22
|
+
- Executing validated actions
|
|
23
|
+
- Managing action handlers
|
|
24
|
+
- Tracking execution results
|
|
25
|
+
- Reporting execution statistics
|
|
26
|
+
|
|
27
|
+
### 3. Dynamic Semantic Handshake Protocol
|
|
28
|
+
The negotiation mechanism that:
|
|
29
|
+
- Manages the communication between reasoning and execution
|
|
30
|
+
- Enforces strict validation before execution
|
|
31
|
+
- Tracks the complete lifecycle of action proposals
|
|
32
|
+
- Provides session-based negotiation
|
|
33
|
+
|
|
34
|
+
### 4. Multidimensional Knowledge Graph
|
|
35
|
+
A dynamic constraint layer that:
|
|
36
|
+
- Organizes knowledge into multiple dimensions
|
|
37
|
+
- Acts as a "Forest of Trees" with dimensional subgraphs
|
|
38
|
+
- Provides graph-based constraint validation
|
|
39
|
+
- Enables fine-grained action space pruning
|
|
40
|
+
|
|
41
|
+
### 5. Super System Router
|
|
42
|
+
The routing component that:
|
|
43
|
+
- Analyzes context to select relevant dimensions
|
|
44
|
+
- Prunes the action space before the agent acts
|
|
45
|
+
- Implements the "Forest of Trees" approach
|
|
46
|
+
- Provides efficient action space management
|
|
47
|
+
|
|
48
|
+
## Architecture
|
|
49
|
+
|
|
50
|
+
```
|
|
51
|
+
Context → Super System Router → Dimensional Subgraphs → Pruned Action Space
|
|
52
|
+
↓
|
|
53
|
+
Knowledge Graph
|
|
54
|
+
↓
|
|
55
|
+
The Face (Reasoning) ←→ Handshake Protocol ←→ The Hands (Execution)
|
|
56
|
+
```
|
|
57
|
+
|
|
58
|
+
## Installation
|
|
59
|
+
|
|
60
|
+
```bash
|
|
61
|
+
pip install -e .
|
|
62
|
+
```
|
|
63
|
+
|
|
64
|
+
For development with testing tools:
|
|
65
|
+
```bash
|
|
66
|
+
pip install -e ".[dev]"
|
|
67
|
+
```
|
|
68
|
+
|
|
69
|
+
## Quick Start
|
|
70
|
+
|
|
71
|
+
```python
|
|
72
|
+
from mute_agent import (
|
|
73
|
+
ReasoningAgent,
|
|
74
|
+
ExecutionAgent,
|
|
75
|
+
HandshakeProtocol,
|
|
76
|
+
MultidimensionalKnowledgeGraph,
|
|
77
|
+
SuperSystemRouter,
|
|
78
|
+
)
|
|
79
|
+
from mute_agent.knowledge_graph.graph_elements import Node, NodeType, Edge, EdgeType
|
|
80
|
+
from mute_agent.knowledge_graph.subgraph import Dimension
|
|
81
|
+
|
|
82
|
+
# 1. Create a knowledge graph
|
|
83
|
+
kg = MultidimensionalKnowledgeGraph()
|
|
84
|
+
|
|
85
|
+
# 2. Define dimensions
|
|
86
|
+
security_dim = Dimension(
|
|
87
|
+
name="security",
|
|
88
|
+
description="Security constraints",
|
|
89
|
+
priority=10
|
|
90
|
+
)
|
|
91
|
+
kg.add_dimension(security_dim)
|
|
92
|
+
|
|
93
|
+
# 3. Add actions and constraints
|
|
94
|
+
action = Node(
|
|
95
|
+
id="read_file",
|
|
96
|
+
node_type=NodeType.ACTION,
|
|
97
|
+
attributes={"operation": "read"}
|
|
98
|
+
)
|
|
99
|
+
kg.add_node_to_dimension("security", action)
|
|
100
|
+
|
|
101
|
+
# 4. Initialize components
|
|
102
|
+
router = SuperSystemRouter(kg)
|
|
103
|
+
protocol = HandshakeProtocol()
|
|
104
|
+
reasoning_agent = ReasoningAgent(kg, router, protocol)
|
|
105
|
+
execution_agent = ExecutionAgent(protocol)
|
|
106
|
+
|
|
107
|
+
# 5. Register action handlers
|
|
108
|
+
def read_handler(params):
|
|
109
|
+
return {"content": "file content"}
|
|
110
|
+
|
|
111
|
+
execution_agent.register_action_handler("read_file", read_handler)
|
|
112
|
+
|
|
113
|
+
# 6. Reason and execute
|
|
114
|
+
context = {"user": "admin", "authenticated": True}
|
|
115
|
+
session = reasoning_agent.propose_action(
|
|
116
|
+
action_id="read_file",
|
|
117
|
+
parameters={"path": "/data/file.txt"},
|
|
118
|
+
context=context,
|
|
119
|
+
justification="User requested file read"
|
|
120
|
+
)
|
|
121
|
+
|
|
122
|
+
if session.validation_result.is_valid:
|
|
123
|
+
protocol.accept_proposal(session.session_id)
|
|
124
|
+
result = execution_agent.execute(session.session_id)
|
|
125
|
+
print(result.execution_result)
|
|
126
|
+
```
|
|
127
|
+
|
|
128
|
+
## Examples
|
|
129
|
+
|
|
130
|
+
Run the included example:
|
|
131
|
+
```bash
|
|
132
|
+
python examples/simple_example.py
|
|
133
|
+
```
|
|
134
|
+
|
|
135
|
+
## Phase 3: Evidence & Verification Features 🎯
|
|
136
|
+
|
|
137
|
+
### 1. Graph Debugger - Visual Trace Generation
|
|
138
|
+
|
|
139
|
+
Generate visual artifacts proving **Deterministic Safety**. Shows exactly where and why actions were blocked.
|
|
140
|
+
|
|
141
|
+
```bash
|
|
142
|
+
python examples/graph_debugger_demo.py
|
|
143
|
+
```
|
|
144
|
+
|
|
145
|
+
**Features:**
|
|
146
|
+
- 🟢 **Green Path**: Nodes traversed successfully
|
|
147
|
+
- 🔴 **Red Node**: Exact point where constraint failed
|
|
148
|
+
- ⚪ **Grey Nodes**: Unreachable (path severed)
|
|
149
|
+
|
|
150
|
+
**Outputs:**
|
|
151
|
+
- Interactive HTML visualizations (pyvis)
|
|
152
|
+
- Static PNG images (matplotlib)
|
|
153
|
+
|
|
154
|
+
**Why This Matters:**
|
|
155
|
+
- Proves you can show a screenshot where the agent *physically could not* reach dangerous nodes
|
|
156
|
+
- No magic - visual proof of constraint enforcement
|
|
157
|
+
- Debuggable and auditable execution traces
|
|
158
|
+
|
|
159
|
+

|
|
160
|
+
*Visualization showing `delete_db` blocked with unreachable prerequisites*
|
|
161
|
+
|
|
162
|
+

|
|
163
|
+
*Red node shows exact failure point with constraint violations*
|
|
164
|
+
|
|
165
|
+
### 2. Cost of Curiosity Curve
|
|
166
|
+
|
|
167
|
+
Proves that **clarification is expensive** - Interactive Agents enter costly loops while Mute Agent maintains constant cost.
|
|
168
|
+
|
|
169
|
+
```bash
|
|
170
|
+
python experiments/generate_cost_curve.py --trials 50
|
|
171
|
+
```
|
|
172
|
+
|
|
173
|
+
**Results:**
|
|
174
|
+
- **Mute Agent**: Flat line (50 tokens, rejects ambiguous in 1 hop)
|
|
175
|
+
- **Interactive Agent**: Exponential curve (444 avg tokens, enters clarification loops)
|
|
176
|
+
- **Token Reduction**: 88.7%
|
|
177
|
+
|
|
178
|
+

|
|
179
|
+
*Mute Agent cost is constant while Interactive Agent cost explodes with ambiguity*
|
|
180
|
+
|
|
181
|
+
### 3. Latent State Trap - Graph as Single Source of Truth
|
|
182
|
+
|
|
183
|
+
Tests what happens when **user belief conflicts with reality**. The Graph enforces truth, not user assumptions.
|
|
184
|
+
|
|
185
|
+
```bash
|
|
186
|
+
python experiments/latent_state_scenario.py
|
|
187
|
+
```
|
|
188
|
+
|
|
189
|
+
**Scenarios:**
|
|
190
|
+
- User thinks Service-A is on Port 80 → Graph shows Port 8080
|
|
191
|
+
- User thinks Service-B is on old host → Graph shows new host
|
|
192
|
+
|
|
193
|
+
**The Win:**
|
|
194
|
+
- Configuration drift is automatically caught
|
|
195
|
+
- Stale user knowledge doesn't cause incidents
|
|
196
|
+
- Graph enforces reality (infrastructure-as-code)
|
|
197
|
+
|
|
198
|
+
### 4. CI/CD Safety Guardrail
|
|
199
|
+
|
|
200
|
+
GitHub Action that runs the **Jailbreak Suite** on every PR. Fails build if `Leakage_Rate > 0%`.
|
|
201
|
+
|
|
202
|
+
```bash
|
|
203
|
+
python experiments/jailbreak_test.py
|
|
204
|
+
```
|
|
205
|
+
|
|
206
|
+
**Tests:**
|
|
207
|
+
- 10 adversarial attack types (DAN-style prompts)
|
|
208
|
+
- Authority override, role manipulation, instruction override
|
|
209
|
+
- Emotional manipulation, context poisoning, etc.
|
|
210
|
+
|
|
211
|
+
**Result:** 0% leakage rate ✅
|
|
212
|
+
|
|
213
|
+
The workflow at `.github/workflows/safety_check.yml` ensures graph constraints don't degrade as features are added.
|
|
214
|
+
|
|
215
|
+
## Experiments
|
|
216
|
+
|
|
217
|
+
We've conducted comprehensive experiments validating that graph-based constraints outperform traditional approaches.
|
|
218
|
+
|
|
219
|
+
### Steel Man Benchmark (v2.0) - **LATEST** 🎉
|
|
220
|
+
|
|
221
|
+
**NEW:** The definitive comparison against a State-of-the-Art reflective baseline (InteractiveAgent) in real-world infrastructure scenarios.
|
|
222
|
+
|
|
223
|
+
#### Run the Benchmark
|
|
224
|
+
|
|
225
|
+
Compare Mute Agent vs Interactive Agent side-by-side:
|
|
226
|
+
|
|
227
|
+
```bash
|
|
228
|
+
python experiments/benchmark.py \
|
|
229
|
+
--scenarios src/benchmarks/scenarios.json \
|
|
230
|
+
--output benchmark_results.json
|
|
231
|
+
```
|
|
232
|
+
|
|
233
|
+
#### Generate Visualizations
|
|
234
|
+
|
|
235
|
+
Create charts showing the "Cost of Curiosity":
|
|
236
|
+
|
|
237
|
+
```bash
|
|
238
|
+
python experiments/visualize.py benchmark_results.json --output-dir charts/
|
|
239
|
+
```
|
|
240
|
+
|
|
241
|
+
This generates:
|
|
242
|
+
- **Cost vs. Ambiguity Chart**: Shows Mute Agent's flat cost line vs Interactive Agent's exploding cost
|
|
243
|
+
- **Metrics Comparison**: Token usage, latency, turns, and user interactions
|
|
244
|
+
- **Scenario Breakdown**: Performance by scenario class
|
|
245
|
+
|
|
246
|
+
#### Original Evaluator
|
|
247
|
+
|
|
248
|
+
Run the full evaluator with detailed safety metrics:
|
|
249
|
+
|
|
250
|
+
```bash
|
|
251
|
+
python -m src.benchmarks.evaluator \
|
|
252
|
+
--scenarios src/benchmarks/scenarios.json \
|
|
253
|
+
--output steel_man_results.json
|
|
254
|
+
```
|
|
255
|
+
|
|
256
|
+
**The Challenge:** 30 context-dependent scenarios simulating on-call infrastructure management:
|
|
257
|
+
- **Stale State**: User switches between services, says "restart it"
|
|
258
|
+
- **Ghost Resources**: Services stuck in partial/zombie states
|
|
259
|
+
- **Privilege Escalation**: Users attempting unauthorized operations
|
|
260
|
+
|
|
261
|
+
**The Baseline:** The InteractiveAgent - a competent reflective agent with:
|
|
262
|
+
- System state access (`kubectl get all`)
|
|
263
|
+
- Reflection loop (retry up to 3 times)
|
|
264
|
+
- Clarification capability (Human-in-the-Loop)
|
|
265
|
+
|
|
266
|
+
**Key Results:**
|
|
267
|
+
- ✅ **Safety Violations:** 0.0% vs 26.7% (100% reduction)
|
|
268
|
+
- ✅ **Token ROI:** 0.91 vs 0.12 (+682% improvement)
|
|
269
|
+
- ✅ **Token Reduction:** 87.2% average (330 vs 2580 tokens)
|
|
270
|
+
- ✅ **Turns Reduction:** 58.3% (1.0 vs 2.4 turns)
|
|
271
|
+
- 🎉 **Mute Agent WINS on efficiency metrics**
|
|
272
|
+
|
|
273
|
+
**Visualizations:**
|
|
274
|
+
|
|
275
|
+

|
|
276
|
+
*The "Cost of Curiosity": Mute Agent maintains constant cost while Interactive Agent cost explodes with ambiguity*
|
|
277
|
+
|
|
278
|
+

|
|
279
|
+
*Key metrics: 87% token reduction, 58% fewer turns*
|
|
280
|
+
|
|
281
|
+
**[Read Full Analysis →](STEEL_MAN_RESULTS.md)** | **[Benchmark Guide →](BENCHMARK_GUIDE.md)**
|
|
282
|
+
|
|
283
|
+
---
|
|
284
|
+
|
|
285
|
+
### V1: The Ambiguity Test
|
|
286
|
+
|
|
287
|
+
Demonstrates **zero hallucinations** when handling ambiguous requests.
|
|
288
|
+
|
|
289
|
+
```bash
|
|
290
|
+
cd experiments
|
|
291
|
+
python demo.py # Quick demo
|
|
292
|
+
python ambiguity_test.py # Full 30-scenario test
|
|
293
|
+
```
|
|
294
|
+
|
|
295
|
+
**Results:** 0% hallucination rate, 72% token reduction, 81% faster
|
|
296
|
+
|
|
297
|
+
### V2: Robustness & Scale
|
|
298
|
+
|
|
299
|
+
Comprehensive validation of graph constraints vs prompt engineering in complex scenarios.
|
|
300
|
+
|
|
301
|
+
```bash
|
|
302
|
+
cd experiments
|
|
303
|
+
python run_v2_experiments_auto.py
|
|
304
|
+
```
|
|
305
|
+
|
|
306
|
+
**Test Suites:**
|
|
307
|
+
1. **Deep Dependency Chain** - Multi-level prerequisite resolution (0 turns to resolution)
|
|
308
|
+
2. **Adversarial Gauntlet** - Immunity to prompt injection (0% leakage across 10 attack types)
|
|
309
|
+
3. **False Positive Prevention** - Synonym normalization (85% success rate)
|
|
310
|
+
4. **Performance & Scale** - Token efficiency (95% reduction on failures)
|
|
311
|
+
|
|
312
|
+
**Results:** 4/4 scenarios passed - **Graph Constraints OUTPERFORM Prompt Engineering**
|
|
313
|
+
|
|
314
|
+
See [experiments/v2_scenarios/README.md](experiments/v2_scenarios/README.md) for detailed results.
|
|
315
|
+
|
|
316
|
+
### Key Results Summary
|
|
317
|
+
|
|
318
|
+
| Metric | V1 Baseline | V1 Mute Agent | V2 Steel Man | Mute Agent v2.0 |
|
|
319
|
+
| --- | --- | --- | --- | --- |
|
|
320
|
+
| **Hallucination Rate** | 50.0% | 0.0% | N/A | 0.0% |
|
|
321
|
+
| **Safety Violations** | N/A | N/A | 26.7% | **0.0%** ✅ |
|
|
322
|
+
| **Token ROI** | N/A | N/A | 0.12 | **0.91** ✅ |
|
|
323
|
+
| **Token Reduction** | 72% | Baseline | 0% | **85.5%** |
|
|
324
|
+
| **Security** | Vulnerable | Safe | Permission bypass | **Immune** |
|
|
325
|
+
|
|
326
|
+
## Core Concepts
|
|
327
|
+
|
|
328
|
+
### Forest of Trees Approach
|
|
329
|
+
The knowledge graph organizes constraints into multiple dimensional subgraphs. Each dimension represents a different constraint layer (e.g., security, resources, workflow). The Super System Router selects relevant dimensions based on context, effectively pruning the action space.
|
|
330
|
+
|
|
331
|
+
### Graph-Based Constraints
|
|
332
|
+
Instead of free-text invocation, all actions must exist as nodes in the knowledge graph and satisfy the constraints (edges) defined in relevant dimensions. This provides:
|
|
333
|
+
- Type safety through graph structure
|
|
334
|
+
- Explicit constraint validation
|
|
335
|
+
- Traceable action authorization
|
|
336
|
+
- Fine-grained control over action spaces
|
|
337
|
+
|
|
338
|
+
### Semantic Handshake
|
|
339
|
+
The protocol enforces a strict negotiation process:
|
|
340
|
+
1. **Initiated**: Reasoning agent proposes an action
|
|
341
|
+
2. **Validated**: Action is checked against graph constraints
|
|
342
|
+
3. **Accepted/Rejected**: Based on validation results
|
|
343
|
+
4. **Executing**: Execution agent begins work
|
|
344
|
+
5. **Completed/Failed**: Final state with results
|
|
345
|
+
|
|
346
|
+
## Benefits
|
|
347
|
+
|
|
348
|
+
1. **Separation of Concerns**: Reasoning and execution are completely decoupled
|
|
349
|
+
2. **Safety**: All actions must pass graph-based validation
|
|
350
|
+
3. **Transparency**: Complete audit trail through session tracking
|
|
351
|
+
4. **Flexibility**: Dynamic constraint management through dimensions
|
|
352
|
+
5. **Scalability**: Efficient action space pruning reduces complexity
|
|
353
|
+
|
|
354
|
+
## License
|
|
355
|
+
|
|
356
|
+
MIT License
|
|
357
|
+
|
|
358
|
+
## Contributing
|
|
359
|
+
|
|
360
|
+
Contributions are welcome! Please feel free to submit a Pull Request.
|