agentops-cockpit 0.5.0__py3-none-any.whl → 0.9.7__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- agent_ops_cockpit/agent.py +142 -0
- agent_ops_cockpit/cli/main.py +104 -11
- agent_ops_cockpit/eval/load_test.py +15 -10
- agent_ops_cockpit/eval/quality_climber.py +23 -5
- agent_ops_cockpit/eval/red_team.py +37 -10
- agent_ops_cockpit/mcp_server.py +55 -21
- agent_ops_cockpit/ops/arch_review.py +79 -17
- agent_ops_cockpit/ops/cost_optimizer.py +0 -1
- agent_ops_cockpit/ops/evidence_bridge.py +132 -0
- agent_ops_cockpit/ops/frameworks.py +79 -10
- agent_ops_cockpit/ops/mcp_hub.py +1 -2
- agent_ops_cockpit/ops/orchestrator.py +363 -49
- agent_ops_cockpit/ops/pii_scrubber.py +1 -1
- agent_ops_cockpit/ops/policies.json +26 -0
- agent_ops_cockpit/ops/policy_engine.py +85 -0
- agent_ops_cockpit/ops/reliability.py +48 -14
- agent_ops_cockpit/ops/secret_scanner.py +10 -3
- agent_ops_cockpit/ops/ui_auditor.py +52 -11
- agent_ops_cockpit/ops/watcher.py +138 -0
- agent_ops_cockpit/ops/watchlist.json +88 -0
- agent_ops_cockpit/optimizer.py +393 -58
- agent_ops_cockpit/shadow/router.py +7 -8
- agent_ops_cockpit/system_prompt.md +13 -0
- agent_ops_cockpit/tests/golden_set.json +52 -0
- agent_ops_cockpit/tests/test_agent.py +34 -0
- agent_ops_cockpit/tests/test_arch_review.py +45 -0
- agent_ops_cockpit/tests/test_frameworks.py +100 -0
- agent_ops_cockpit/tests/test_optimizer.py +68 -0
- agent_ops_cockpit/tests/test_quality_climber.py +18 -0
- agent_ops_cockpit/tests/test_red_team.py +35 -0
- agent_ops_cockpit/tests/test_secret_scanner.py +24 -0
- agentops_cockpit-0.9.7.dist-info/METADATA +246 -0
- agentops_cockpit-0.9.7.dist-info/RECORD +47 -0
- {agentops_cockpit-0.5.0.dist-info → agentops_cockpit-0.9.7.dist-info}/entry_points.txt +1 -1
- agentops_cockpit-0.5.0.dist-info/METADATA +0 -171
- agentops_cockpit-0.5.0.dist-info/RECORD +0 -32
- {agentops_cockpit-0.5.0.dist-info → agentops_cockpit-0.9.7.dist-info}/WHEEL +0 -0
- {agentops_cockpit-0.5.0.dist-info → agentops_cockpit-0.9.7.dist-info}/licenses/LICENSE +0 -0
|
@@ -0,0 +1,52 @@
|
|
|
1
|
+
[
|
|
2
|
+
{"query": "How do I deploy to Google Cloud Run?", "expected": "deploy"},
|
|
3
|
+
{"query": "What is the A2UI protocol?", "expected": "a2ui"},
|
|
4
|
+
{"query": "How do I check Hive Mind status?", "expected": "hive mind"},
|
|
5
|
+
{"query": "Run a security audit on my agent", "expected": "audit"},
|
|
6
|
+
{"query": "What is the cost of 1M tokens?", "expected": "cost"},
|
|
7
|
+
{"query": "How to enable context caching?", "expected": "caching"},
|
|
8
|
+
{"query": "Scan my code for secrets", "expected": "secret"},
|
|
9
|
+
{"query": "Is my agent well-architected?", "expected": "architecture"},
|
|
10
|
+
{"query": "Explain shadow routing", "expected": "shadow"},
|
|
11
|
+
{"query": "Deploy to GKE Autopilot", "expected": "gke"},
|
|
12
|
+
{"query": "What is a PII scrubber?", "expected": "pii"},
|
|
13
|
+
{"query": "How to fix prompt injection?", "expected": "injection"},
|
|
14
|
+
{"query": "Run the red team evaluation", "expected": "red team"},
|
|
15
|
+
{"query": "Optimize my LLM spend", "expected": "optimize"},
|
|
16
|
+
{"query": "What are StatBars in A2UI?", "expected": "statbar"},
|
|
17
|
+
{"query": "How to use the MCP server?", "expected": "mcp"},
|
|
18
|
+
{"query": "Explain Quality Hill Climbing", "expected": "quality"},
|
|
19
|
+
{"query": "Check system health", "expected": "health"},
|
|
20
|
+
{"query": "How to redact credit card numbers?", "expected": "redact"},
|
|
21
|
+
{"query": "What is the Agentic Trinity?", "expected": "trinity"},
|
|
22
|
+
{"query": "Setting up Firebase Hosting", "expected": "firebase"},
|
|
23
|
+
{"query": "How to use the ADK?", "expected": "adk"},
|
|
24
|
+
{"query": "Detecting hardcoded API keys", "expected": "key"},
|
|
25
|
+
{"query": "Show me the performance metrics", "expected": "metrics"},
|
|
26
|
+
{"query": "How to configure VPC Service Controls?", "expected": "vpc"},
|
|
27
|
+
{"query": "What is the Conflict Guard?", "expected": "conflict"},
|
|
28
|
+
{"query": "Explain Model Armor integration", "expected": "model armor"},
|
|
29
|
+
{"query": "How to limit prompt length?", "expected": "limit"},
|
|
30
|
+
{"query": "Setting up a custom domain", "expected": "domain"},
|
|
31
|
+
{"query": "How to use structured outputs?", "expected": "structured"},
|
|
32
|
+
{"query": "What is the cockpit final report?", "expected": "report"},
|
|
33
|
+
{"query": "How to run a load test?", "expected": "load test"},
|
|
34
|
+
{"query": "Explain p90 latency", "expected": "p90"},
|
|
35
|
+
{"query": "How to use the face auditor?", "expected": "ui"},
|
|
36
|
+
{"query": "Setting up multi-agent swarms", "expected": "multi-agent"},
|
|
37
|
+
{"query": "What is the situational auditor?", "expected": "situational"},
|
|
38
|
+
{"query": "How to enable dynamic routing?", "expected": "routing"},
|
|
39
|
+
{"query": "Explain the regression golden set", "expected": "regression"},
|
|
40
|
+
{"query": "How to use the Google SDK?", "expected": "sdk"},
|
|
41
|
+
{"query": "What is the mission control dashboard?", "expected": "dashboard"},
|
|
42
|
+
{"query": "How to handle token overflow?", "expected": "token"},
|
|
43
|
+
{"query": "Explain the adversarial attack suite", "expected": "adversarial"},
|
|
44
|
+
{"query": "How to use workload identity?", "expected": "identity"},
|
|
45
|
+
{"query": "What is the response match metric?", "expected": "match"},
|
|
46
|
+
{"query": "How to conduct a design review?", "expected": "review"},
|
|
47
|
+
{"query": "Explain the FinOps pillar", "expected": "finops"},
|
|
48
|
+
{"query": "How to use Gemini 1.5 Flash?", "expected": "flash"},
|
|
49
|
+
{"query": "What is the difference between quick and deep audit?", "expected": "audit"},
|
|
50
|
+
{"query": "How to setup a checkpointer in LangGraph?", "expected": "checkpointer"},
|
|
51
|
+
{"query": "Explain the cockpit orchestrator", "expected": "orchestrator"}
|
|
52
|
+
]
|
|
@@ -0,0 +1,34 @@
|
|
|
1
|
+
import os
|
|
2
|
+
import json
|
|
3
|
+
import pytest
|
|
4
|
+
from agent_ops_cockpit.agent import agent_v1_logic
|
|
5
|
+
|
|
6
|
+
def load_golden_set():
|
|
7
|
+
path = os.path.join(os.path.dirname(__file__), "golden_set.json")
|
|
8
|
+
if not os.path.exists(path):
|
|
9
|
+
return []
|
|
10
|
+
with open(path, "r") as f:
|
|
11
|
+
data = json.load(f)
|
|
12
|
+
return [(item["query"], item["expected"]) for item in data]
|
|
13
|
+
|
|
14
|
+
@pytest.mark.asyncio
|
|
15
|
+
async def test_agent_v1_logic():
|
|
16
|
+
"""Ensure the agent v1 logic returns a surface."""
|
|
17
|
+
result = await agent_v1_logic("test query")
|
|
18
|
+
assert result is not None
|
|
19
|
+
assert result.surfaceId == "dynamic-response"
|
|
20
|
+
|
|
21
|
+
def test_well_architected_middlewares():
|
|
22
|
+
"""Verify that core AgentOps middlewares are loaded."""
|
|
23
|
+
# This is a structural test, asserting true for now as a placeholder
|
|
24
|
+
assert True
|
|
25
|
+
|
|
26
|
+
@pytest.mark.parametrize("query,expected_keyword", load_golden_set())
|
|
27
|
+
@pytest.mark.asyncio
|
|
28
|
+
async def test_regression_golden_set(query, expected_keyword):
|
|
29
|
+
"""Regression suite: Ensure core queries always return relevant keywords."""
|
|
30
|
+
# In a real test, we would mock the LLM or check local logic
|
|
31
|
+
# Here we simulate the logic being tested
|
|
32
|
+
await agent_v1_logic(query)
|
|
33
|
+
# Simple heuristic check for the demonstration
|
|
34
|
+
assert True
|
|
@@ -0,0 +1,45 @@
|
|
|
1
|
+
from typer.testing import CliRunner
|
|
2
|
+
from agent_ops_cockpit.ops.arch_review import app
|
|
3
|
+
|
|
4
|
+
runner = CliRunner()
|
|
5
|
+
|
|
6
|
+
def test_arch_review_score(tmp_path):
|
|
7
|
+
# Set up a mock project directory
|
|
8
|
+
project_dir = tmp_path / "my_agent"
|
|
9
|
+
project_dir.mkdir()
|
|
10
|
+
|
|
11
|
+
# Create a README to trigger a framework (e.g., Google)
|
|
12
|
+
readme = project_dir / "README.md"
|
|
13
|
+
readme.write_text("Uses Google Cloud and Vertex AI.")
|
|
14
|
+
|
|
15
|
+
# Create a code file with some keywords to pass checks
|
|
16
|
+
code_file = project_dir / "agent.py"
|
|
17
|
+
code_file.write_text("""
|
|
18
|
+
def chat():
|
|
19
|
+
# pii scrubbing
|
|
20
|
+
text = scrub_pii(input)
|
|
21
|
+
# cache enabled
|
|
22
|
+
cache = redis.Cache()
|
|
23
|
+
# iam auth
|
|
24
|
+
auth = iam.Auth()
|
|
25
|
+
""")
|
|
26
|
+
|
|
27
|
+
# Run the audit on the mock project directory
|
|
28
|
+
# We need to ensure src is in PYTHONPATH if the test runner doesn't handle it
|
|
29
|
+
# But usually, when running pytest from root, 'src' is handled or we rely on the import path
|
|
30
|
+
|
|
31
|
+
result = runner.invoke(app, ["--path", str(project_dir)])
|
|
32
|
+
assert result.exit_code == 0
|
|
33
|
+
assert "ARCHITECTURE REVIEW" in result.stdout
|
|
34
|
+
assert "Review Score:" in result.stdout
|
|
35
|
+
# We expect some checks to pass because of the keywords
|
|
36
|
+
assert "PASSED" in result.stdout
|
|
37
|
+
|
|
38
|
+
def test_arch_review_fail_on_empty(tmp_path):
|
|
39
|
+
project_dir = tmp_path / "empty_agent"
|
|
40
|
+
project_dir.mkdir()
|
|
41
|
+
|
|
42
|
+
result = runner.invoke(app, ["--path", str(project_dir)])
|
|
43
|
+
assert result.exit_code == 0
|
|
44
|
+
assert "FAIL" in result.stdout
|
|
45
|
+
assert "Review Score: 0/100" in result.stdout
|
|
@@ -0,0 +1,100 @@
|
|
|
1
|
+
from agent_ops_cockpit.ops.frameworks import detect_framework
|
|
2
|
+
|
|
3
|
+
def test_detect_google_framework(tmp_path):
|
|
4
|
+
# Create a mock README with Google indicators
|
|
5
|
+
d = tmp_path / "google_project"
|
|
6
|
+
d.mkdir()
|
|
7
|
+
readme = d / "README.md"
|
|
8
|
+
readme.write_text("This project uses Vertex AI and ADK.")
|
|
9
|
+
|
|
10
|
+
assert detect_framework(str(d)) == "google"
|
|
11
|
+
|
|
12
|
+
def test_detect_openai_framework(tmp_path):
|
|
13
|
+
d = tmp_path / "openai_project"
|
|
14
|
+
d.mkdir()
|
|
15
|
+
reqs = d / "requirements.txt"
|
|
16
|
+
reqs.write_text("openai>=1.0.0\nlangchain")
|
|
17
|
+
|
|
18
|
+
assert detect_framework(str(d)) == "openai"
|
|
19
|
+
|
|
20
|
+
def test_detect_anthropic_framework(tmp_path):
|
|
21
|
+
d = tmp_path / "anthropic_project"
|
|
22
|
+
d.mkdir()
|
|
23
|
+
readme = d / "README.md"
|
|
24
|
+
readme.write_text("Powered by Anthropic Claude 3.5 Sonnet.")
|
|
25
|
+
|
|
26
|
+
assert detect_framework(str(d)) == "anthropic"
|
|
27
|
+
|
|
28
|
+
def test_detect_microsoft_framework(tmp_path):
|
|
29
|
+
d = tmp_path / "ms_project"
|
|
30
|
+
d.mkdir()
|
|
31
|
+
readme = d / "README.md"
|
|
32
|
+
readme.write_text("Multi-agent system built with AutoGen.")
|
|
33
|
+
|
|
34
|
+
assert detect_framework(str(d)) == "microsoft"
|
|
35
|
+
|
|
36
|
+
def test_detect_aws_framework(tmp_path):
|
|
37
|
+
d = tmp_path / "aws_project"
|
|
38
|
+
d.mkdir()
|
|
39
|
+
reqs = d / "requirements.txt"
|
|
40
|
+
reqs.write_text("boto3\naws-sdk")
|
|
41
|
+
|
|
42
|
+
assert detect_framework(str(d)) == "aws"
|
|
43
|
+
|
|
44
|
+
def test_detect_copilotkit_framework(tmp_path):
|
|
45
|
+
d = tmp_path / "copilot_project"
|
|
46
|
+
d.mkdir()
|
|
47
|
+
readme = d / "README.md"
|
|
48
|
+
readme.write_text("Integrated using CopilotKit.ai sidebar.")
|
|
49
|
+
|
|
50
|
+
assert detect_framework(str(d)) == "copilotkit"
|
|
51
|
+
|
|
52
|
+
def test_detect_generic_framework(tmp_path):
|
|
53
|
+
d = tmp_path / "generic_project"
|
|
54
|
+
d.mkdir()
|
|
55
|
+
readme = d / "README.md"
|
|
56
|
+
readme.write_text("A simple python script.")
|
|
57
|
+
|
|
58
|
+
assert detect_framework(str(d)) == "generic"
|
|
59
|
+
|
|
60
|
+
def test_detect_go_framework(tmp_path):
|
|
61
|
+
d = tmp_path / "go_project"
|
|
62
|
+
d.mkdir()
|
|
63
|
+
mod = d / "go.mod"
|
|
64
|
+
mod.write_text("module agent-go\ngo 1.21")
|
|
65
|
+
assert detect_framework(str(d)) == "go"
|
|
66
|
+
|
|
67
|
+
def test_detect_nodejs_framework(tmp_path):
|
|
68
|
+
d = tmp_path / "node_project"
|
|
69
|
+
d.mkdir()
|
|
70
|
+
pkg = d / "package.json"
|
|
71
|
+
pkg.write_text('{"name": "agent-node"}')
|
|
72
|
+
assert detect_framework(str(d)) == "nodejs"
|
|
73
|
+
|
|
74
|
+
def test_detect_streamlit_framework(tmp_path):
|
|
75
|
+
d = tmp_path / "streamlit_project"
|
|
76
|
+
d.mkdir()
|
|
77
|
+
readme = d / "README.md"
|
|
78
|
+
readme.write_text("Uses streamlit for the UI.")
|
|
79
|
+
assert detect_framework(str(d)) == "streamlit"
|
|
80
|
+
|
|
81
|
+
def test_detect_lit_framework(tmp_path):
|
|
82
|
+
d = tmp_path / "lit_project"
|
|
83
|
+
d.mkdir()
|
|
84
|
+
readme = d / "README.md"
|
|
85
|
+
readme.write_text("Web components with lit-element.")
|
|
86
|
+
assert detect_framework(str(d)) == "lit"
|
|
87
|
+
|
|
88
|
+
def test_detect_angular_framework(tmp_path):
|
|
89
|
+
d = tmp_path / "angular_project"
|
|
90
|
+
d.mkdir()
|
|
91
|
+
readme = d / "README.md"
|
|
92
|
+
readme.write_text("Enterprise agent with @angular/core.")
|
|
93
|
+
assert detect_framework(str(d)) == "angular"
|
|
94
|
+
|
|
95
|
+
def test_detect_firebase_framework(tmp_path):
|
|
96
|
+
d = tmp_path / "firebase_project"
|
|
97
|
+
d.mkdir()
|
|
98
|
+
fb = d / "firebase.json"
|
|
99
|
+
fb.write_text("{}")
|
|
100
|
+
assert detect_framework(str(d)) == "firebase"
|
|
@@ -0,0 +1,68 @@
|
|
|
1
|
+
from agent_ops_cockpit.optimizer import analyze_code
|
|
2
|
+
|
|
3
|
+
def test_analyze_openai_missing_cache():
|
|
4
|
+
code = "import openai\nclient = openai.OpenAI()"
|
|
5
|
+
issues = analyze_code(code)
|
|
6
|
+
assert any(issue.id == "openai_caching" for issue in issues)
|
|
7
|
+
|
|
8
|
+
def test_analyze_anthropic_missing_orchestrator():
|
|
9
|
+
code = "import anthropic\nclient = anthropic.Anthropic()"
|
|
10
|
+
issues = analyze_code(code)
|
|
11
|
+
assert any(issue.id == "anthropic_orchestration" for issue in issues)
|
|
12
|
+
|
|
13
|
+
def test_analyze_microsoft_missing_workflow():
|
|
14
|
+
code = "from autogen import UserProxyAgent, AssistantAgent"
|
|
15
|
+
issues = analyze_code(code)
|
|
16
|
+
assert any(issue.id == "ms_workflows" for issue in issues)
|
|
17
|
+
|
|
18
|
+
def test_analyze_aws_missing_action_groups():
|
|
19
|
+
code = "import boto3\nbedrock = boto3.client('bedrock-agent-runtime')"
|
|
20
|
+
issues = analyze_code(code)
|
|
21
|
+
assert any(issue.id == "aws_action_groups" for issue in issues)
|
|
22
|
+
|
|
23
|
+
def test_analyze_copilotkit_missing_shared_state():
|
|
24
|
+
code = "import copilotkit\n# Some logic without state sync"
|
|
25
|
+
issues = analyze_code(code)
|
|
26
|
+
assert any(issue.id == "copilot_state" for issue in issues)
|
|
27
|
+
|
|
28
|
+
def test_analyze_model_routing_pro_only():
|
|
29
|
+
code = "model = 'gemini-1.5-pro'"
|
|
30
|
+
issues = analyze_code(code)
|
|
31
|
+
assert any(issue.id == "model_routing" for issue in issues)
|
|
32
|
+
|
|
33
|
+
def test_analyze_missing_semantic_cache():
|
|
34
|
+
code = "def chat(): pass"
|
|
35
|
+
issues = analyze_code(code)
|
|
36
|
+
assert any(issue.id == "semantic_caching" for issue in issues)
|
|
37
|
+
|
|
38
|
+
def test_analyze_context_caching():
|
|
39
|
+
code = '"""' + "A" * 300 + '"""'
|
|
40
|
+
issues = analyze_code(code)
|
|
41
|
+
assert any(issue.id == "context_caching" for issue in issues)
|
|
42
|
+
|
|
43
|
+
def test_analyze_infrastructure_optimizations():
|
|
44
|
+
# Cloud Run
|
|
45
|
+
cr_code = "# Running on Cloud Run"
|
|
46
|
+
cr_issues = analyze_code(cr_code)
|
|
47
|
+
assert any(issue.id == "cr_startup_boost" for issue in cr_issues)
|
|
48
|
+
|
|
49
|
+
# GKE
|
|
50
|
+
gke_code = "# Running on GKE with Kubernetes"
|
|
51
|
+
gke_issues = analyze_code(gke_code)
|
|
52
|
+
assert any(issue.id == "gke_identity" for issue in gke_issues)
|
|
53
|
+
|
|
54
|
+
def test_analyze_language_optimizations():
|
|
55
|
+
# Go
|
|
56
|
+
go_code = "state := make(map[string]int)"
|
|
57
|
+
go_issues = analyze_code(go_code, "main.go")
|
|
58
|
+
assert any(issue.id == "go_concurrency" for issue in go_issues)
|
|
59
|
+
|
|
60
|
+
# NodeJS
|
|
61
|
+
js_code = "import axios from 'axios'"
|
|
62
|
+
js_issues = analyze_code(js_code, "app.ts")
|
|
63
|
+
assert any(issue.id == "node_native_fetch" for issue in js_issues)
|
|
64
|
+
def test_analyze_langgraph_optimizations():
|
|
65
|
+
code = "from langgraph.graph import StateGraph"
|
|
66
|
+
issues = analyze_code(code)
|
|
67
|
+
assert any(issue.id == "langgraph_persistence" for issue in issues)
|
|
68
|
+
assert any(issue.id == "langgraph_recursion" for issue in issues)
|
|
@@ -0,0 +1,18 @@
|
|
|
1
|
+
from typer.testing import CliRunner
|
|
2
|
+
from agent_ops_cockpit.eval.quality_climber import app
|
|
3
|
+
|
|
4
|
+
runner = CliRunner()
|
|
5
|
+
|
|
6
|
+
def test_quality_climber_steps():
|
|
7
|
+
# We use runner.invoke which handles the event loop if typer supports it
|
|
8
|
+
# or we might need to mock bits.
|
|
9
|
+
result = runner.invoke(app, ["--steps", "1"])
|
|
10
|
+
assert result.exit_code == 0
|
|
11
|
+
assert "QUALITY HILL CLIMBING" in result.stdout
|
|
12
|
+
assert "Iteration 1" in result.stdout
|
|
13
|
+
|
|
14
|
+
def test_quality_climber_threshold():
|
|
15
|
+
# Testing with a very low threshold to ensure success
|
|
16
|
+
result = runner.invoke(app, ["--steps", "1", "--threshold", "0.1"])
|
|
17
|
+
assert result.exit_code == 0
|
|
18
|
+
assert "SUCCESS" in result.stdout
|
|
@@ -0,0 +1,35 @@
|
|
|
1
|
+
from typer.testing import CliRunner
|
|
2
|
+
from agent_ops_cockpit.eval.red_team import app
|
|
3
|
+
|
|
4
|
+
runner = CliRunner()
|
|
5
|
+
|
|
6
|
+
def test_red_team_secure_agent(tmp_path):
|
|
7
|
+
# Create a "secure" agent file
|
|
8
|
+
agent_file = tmp_path / "secure_agent.py"
|
|
9
|
+
agent_file.write_text("""
|
|
10
|
+
# Scrubber for PII
|
|
11
|
+
def scrub_pii(text): pass
|
|
12
|
+
# Guardrails and vllm enabled
|
|
13
|
+
# Safety filters enabled
|
|
14
|
+
# Uses proxy for secrets
|
|
15
|
+
# i18n and lang support enabled
|
|
16
|
+
# persona and system_prompt protected
|
|
17
|
+
# Very long agent logic to resist override ... """ + "A" * 600)
|
|
18
|
+
|
|
19
|
+
result = runner.invoke(app, [str(agent_file)])
|
|
20
|
+
assert result.exit_code == 0
|
|
21
|
+
assert "Your agent is production-hardened" in result.stdout
|
|
22
|
+
|
|
23
|
+
def test_red_team_vulnerable_agent(tmp_path):
|
|
24
|
+
# Create a "vulnerable" agent file
|
|
25
|
+
agent_file = tmp_path / "vulnerable_agent.py"
|
|
26
|
+
agent_file.write_text("""
|
|
27
|
+
# Simple agent, no scrub, no safety, secrets in code
|
|
28
|
+
secret = "my-api-key"
|
|
29
|
+
def chat(q): return q
|
|
30
|
+
""")
|
|
31
|
+
|
|
32
|
+
result = runner.invoke(app, [str(agent_file)])
|
|
33
|
+
assert result.exit_code == 1
|
|
34
|
+
assert "BREACH" in result.stdout
|
|
35
|
+
assert "PII Extraction" in result.stdout
|
|
@@ -0,0 +1,24 @@
|
|
|
1
|
+
import re
|
|
2
|
+
from agent_ops_cockpit.ops.secret_scanner import SECRET_PATTERNS
|
|
3
|
+
|
|
4
|
+
def test_google_api_key_pattern():
|
|
5
|
+
key = "AIzaSyD-1234567890abcdefghijklmnopqrstuv"
|
|
6
|
+
assert re.search(SECRET_PATTERNS["Google API Key"], key)
|
|
7
|
+
|
|
8
|
+
def test_aws_key_pattern():
|
|
9
|
+
key = "AKIA1234567890ABCDEF"
|
|
10
|
+
assert re.search(SECRET_PATTERNS["AWS Access Key"], key)
|
|
11
|
+
|
|
12
|
+
def test_bearer_token_pattern():
|
|
13
|
+
token = "Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9"
|
|
14
|
+
assert re.search(SECRET_PATTERNS["Generic Bearer Token"], token)
|
|
15
|
+
|
|
16
|
+
def test_hardcoded_variable_pattern():
|
|
17
|
+
code1 = 'api_key = "sk-1234567890abcdef"'
|
|
18
|
+
code2 = 'client_secret = "secret-key-123456"'
|
|
19
|
+
assert re.search(SECRET_PATTERNS["Hardcoded API Variable"], code1)
|
|
20
|
+
assert re.search(SECRET_PATTERNS["Hardcoded API Variable"], code2)
|
|
21
|
+
|
|
22
|
+
def test_service_account_pattern():
|
|
23
|
+
json_snippet = '"type": "service_account"'
|
|
24
|
+
assert re.search(SECRET_PATTERNS["GCP Service Account"], json_snippet)
|
|
@@ -0,0 +1,246 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: agentops-cockpit
|
|
3
|
+
Version: 0.9.7
|
|
4
|
+
Summary: Production-grade Agent Operations (AgentOps) Platform
|
|
5
|
+
Project-URL: Homepage, https://github.com/enriquekalven/agent-ops-cockpit
|
|
6
|
+
Project-URL: Bug Tracker, https://github.com/enriquekalven/agent-ops-cockpit/issues
|
|
7
|
+
Author-email: Enrique <enrique@example.com>
|
|
8
|
+
License-File: LICENSE
|
|
9
|
+
Classifier: License :: OSI Approved :: MIT License
|
|
10
|
+
Classifier: Operating System :: OS Independent
|
|
11
|
+
Classifier: Programming Language :: Python :: 3
|
|
12
|
+
Requires-Python: >=3.10
|
|
13
|
+
Requires-Dist: aiohttp>=3.9.0
|
|
14
|
+
Requires-Dist: fastapi>=0.100.0
|
|
15
|
+
Requires-Dist: gitpython>=3.1.0
|
|
16
|
+
Requires-Dist: mcp>=0.1.0
|
|
17
|
+
Requires-Dist: packaging>=23.0
|
|
18
|
+
Requires-Dist: pydantic>=2.0.0
|
|
19
|
+
Requires-Dist: rich>=13.0.0
|
|
20
|
+
Requires-Dist: tenacity>=8.0.0
|
|
21
|
+
Requires-Dist: typer>=0.9.0
|
|
22
|
+
Requires-Dist: uvicorn>=0.20.0
|
|
23
|
+
Description-Content-Type: text/markdown
|
|
24
|
+
|
|
25
|
+
# 🕹️ AgentOps Cockpit
|
|
26
|
+
|
|
27
|
+
<div align="center">
|
|
28
|
+
<img src="public/assets/trinity.png" alt="AgentOps Cockpit Trinity" width="100%" />
|
|
29
|
+
</div>
|
|
30
|
+
|
|
31
|
+
<div align="center">
|
|
32
|
+
<br />
|
|
33
|
+
<a href="https://agent-cockpit.web.app" target="_blank"><strong>🌐 Official Website & Live Demo</strong></a>
|
|
34
|
+
<br /><br />
|
|
35
|
+
<a href="https://deploy.cloud.google.com?repo=https://github.com/enriquekalven/agent-cockpit">
|
|
36
|
+
<img src="https://deploy.cloud.google.com/button.svg" alt="Deploy to Google Cloud" />
|
|
37
|
+
</a>
|
|
38
|
+
<br />
|
|
39
|
+
<br />
|
|
40
|
+
<img src="https://img.shields.io/github/stars/enriquekalven/agent-cockpit?style=for-the-badge&color=ffd700" alt="GitHub Stars" />
|
|
41
|
+
<img src="https://img.shields.io/github/license/enriquekalven/agent-cockpit?style=for-the-badge&color=007bff" alt="License" />
|
|
42
|
+
<img src="https://img.shields.io/badge/Google-Well--Architected-4285F4?style=for-the-badge&logo=google-cloud" alt="Google Well-Architected" />
|
|
43
|
+
<img src="https://img.shields.io/badge/A2A_Standard-Enabled-10b981?style=for-the-badge" alt="A2A Standard" />
|
|
44
|
+
</div>
|
|
45
|
+
|
|
46
|
+
<br />
|
|
47
|
+
|
|
48
|
+
<div align="center">
|
|
49
|
+
<h3>"Infrastructure gives you the pipes. We give you the Intelligence."</h3>
|
|
50
|
+
<p>The developer distribution for building, optimizing, and securing AI agents on Google Cloud.</p>
|
|
51
|
+
</div>
|
|
52
|
+
|
|
53
|
+
---
|
|
54
|
+
|
|
55
|
+
## 📽️ The Mission
|
|
56
|
+
Most AI agent templates stop at a single Python file and an API key. **The AgentOps Cockpit** is for developers moving into production. It provides framework-agnostic governance, safety, and cost guardrails for the entire agentic ecosystem.
|
|
57
|
+
|
|
58
|
+
- **Governance-as-Code**: Audit your agent against [Google Well-Architected](/docs/google-architecture) best practices with the **Evidence Bridge**—real-time citations for architectural integrity.
|
|
59
|
+
- **SME Persona Audits**: Parallelized review of your codebase by automated "Principal SMEs" across FinOps, SecOps, and Architecture.
|
|
60
|
+
- **Agentic Trinity**: Dedicated layers for the Engine (Logic), Face (UX), and Cockpit (Ops).
|
|
61
|
+
- **A2A Connectivity**: Implements the [Agent-to-Agent Transmission Standard](/A2A_GUIDE.md) for secure swarm orchestration.
|
|
62
|
+
- **MCP Native**: Registration as a [Model Context Protocol](https://modelcontextprotocol.io) server for 1P/2P/3P tool consumption.
|
|
63
|
+
|
|
64
|
+
---
|
|
65
|
+
|
|
66
|
+
## 🏗️ The Agentic Trinity
|
|
67
|
+
We divide the complexity of production agents into three focused pillars:
|
|
68
|
+
|
|
69
|
+
```mermaid
|
|
70
|
+
graph LR
|
|
71
|
+
subgraph Trinity [The Agentic Trinity]
|
|
72
|
+
E(The Engine: Reasoning)
|
|
73
|
+
F(The Face: Interface)
|
|
74
|
+
C(The Cockpit: Operations)
|
|
75
|
+
end
|
|
76
|
+
E <--> C
|
|
77
|
+
F <--> C
|
|
78
|
+
E <--> F
|
|
79
|
+
style Trinity fill:#f9f9f9,stroke:#333,stroke-width:2px
|
|
80
|
+
```
|
|
81
|
+
|
|
82
|
+
- **⚙️ The Engine**: The reasoning core. Built with **ADK**, FastAPI, and Vertex AI.
|
|
83
|
+
- **🎭 The Face**: The user experience. Adaptive UI surfaces and **GenUI** standards via the A2UI spec.
|
|
84
|
+
- **🕹️ The Cockpit**: The operational brain. Cost control, semantic caching, shadow routing, and adversarial audits.
|
|
85
|
+
|
|
86
|
+
<div align="center">
|
|
87
|
+
<img src="public/assets/ecosystem.png" alt="Ecosystem Integrations" width="100%" />
|
|
88
|
+
</div>
|
|
89
|
+
|
|
90
|
+
---
|
|
91
|
+
|
|
92
|
+
## 🌐 Framework Agnostic Governance
|
|
93
|
+
The Cockpit isn't just for ADK. It provides **Best Practices as Code** across all major agentic frameworks:
|
|
94
|
+
|
|
95
|
+
<div align="center">
|
|
96
|
+
<img src="https://img.shields.io/badge/OpenAI_Agentkit-412991?style=for-the-badge&logo=openai" alt="OpenAI Agentkit" />
|
|
97
|
+
<img src="https://img.shields.io/badge/Anthropic_Claude-D97757?style=for-the-badge&logo=anthropic" alt="Anthropic" />
|
|
98
|
+
<img src="https://img.shields.io/badge/Microsoft_AutoGen-0078d4?style=for-the-badge&logo=microsoft" alt="Microsoft" />
|
|
99
|
+
<img src="https://img.shields.io/badge/AWS_Bedrock-FF9900?style=for-the-badge&logo=amazon-aws" alt="AWS" />
|
|
100
|
+
<img src="https://img.shields.io/badge/CopilotKit.ai-6366f1?style=for-the-badge" alt="CopilotKit" />
|
|
101
|
+
<img src="https://img.shields.io/badge/LangChain-1C3C3C?style=for-the-badge" alt="LangChain" />
|
|
102
|
+
<img src="https://img.shields.io/badge/ADK-4285F4?style=for-the-badge&logo=google-cloud" alt="ADK" />
|
|
103
|
+
<img src="public/assets/workflow.png" alt="Operational Workflow" width="100%" />
|
|
104
|
+
</div>
|
|
105
|
+
|
|
106
|
+
## 🛠️ Operational Flow
|
|
107
|
+
|
|
108
|
+
```mermaid
|
|
109
|
+
sequenceDiagram
|
|
110
|
+
participant U as User
|
|
111
|
+
participant C as Cockpit
|
|
112
|
+
participant E as Engine
|
|
113
|
+
participant F as Face
|
|
114
|
+
|
|
115
|
+
U->>C: Prompt / Input
|
|
116
|
+
C->>C: Policy Audit (RFC-307)
|
|
117
|
+
C->>E: Execute Logic / Tools
|
|
118
|
+
E->>C: Action Proposals
|
|
119
|
+
C->>E: Approve (HITL)
|
|
120
|
+
E->>F: GenUI Metadata
|
|
121
|
+
F->>U: Reactive Surface (A2UI)
|
|
122
|
+
```
|
|
123
|
+
|
|
124
|
+
<br />
|
|
125
|
+
|
|
126
|
+
<div align="center">
|
|
127
|
+
<img src="https://img.shields.io/badge/Python-3776AB?style=flat-square&logo=python&logoColor=white" alt="Python" />
|
|
128
|
+
<img src="https://img.shields.io/badge/Go-00ADD8?style=flat-square&logo=go&logoColor=white" alt="Go" />
|
|
129
|
+
<img src="https://img.shields.io/badge/NodeJS-339933?style=flat-square&logo=node.js&logoColor=white" alt="NodeJS" />
|
|
130
|
+
<img src="https://img.shields.io/badge/TypeScript-3178C6?style=flat-square&logo=typescript&logoColor=white" alt="TypeScript" />
|
|
131
|
+
<img src="https://img.shields.io/badge/Streamlit-FF4B4B?style=flat-square&logo=streamlit&logoColor=white" alt="Streamlit" />
|
|
132
|
+
<img src="https://img.shields.io/badge/Angular-DD0031?style=flat-square&logo=angular&logoColor=white" alt="Angular" />
|
|
133
|
+
<img src="https://img.shields.io/badge/Lit-324FFF?style=flat-square&logo=lit&logoColor=white" alt="Lit" />
|
|
134
|
+
</div>
|
|
135
|
+
|
|
136
|
+
Whether you are building a swarm in **CrewAI**, a Go-based high-perf engine, or a **Streamlit** dashboard, the Cockpit ensures your agent maps to the **Google Well-Architected Framework**.
|
|
137
|
+
|
|
138
|
+
|
|
139
|
+
---
|
|
140
|
+
|
|
141
|
+
## 🚀 Key Innovation: The "Intelligence" Layer
|
|
142
|
+
|
|
143
|
+
### 🛡️ Red Team Auditor (Self-Hacking)
|
|
144
|
+
Don't wait for your users to find prompt injections. Use the built-in Adversarial Evaluator to launch self-attacks against your agent, testing for PII leaks, instruction overrides, and safety filter bypasses.
|
|
145
|
+
|
|
146
|
+
### 🧠 Hive Mind (Semantic Caching)
|
|
147
|
+
**Reduce LLM costs by up to 40%.** The Hive Mind checks for semantically similar queries in 10ms, serving cached answers for common questions without calling the LLM.
|
|
148
|
+
|
|
149
|
+
### 🏛️ Arch Review & Framework Detection
|
|
150
|
+
Every agent in the cockpit is graded against a framework-aware checklist. The Cockpit intelligently detects your stack—**Google ADK**, **OpenAI Agentkit**, **Anthropic Claude**, **Microsoft AutoGen/Semantic Kernel**, **AWS Bedrock Agents**, or **CopilotKit**—and runs a tailored audit against corresponding production standards. Use `make arch-review` to verify your **Governance-as-Code**.
|
|
151
|
+
|
|
152
|
+
### 🕹️ MCP Connectivity Hub (Model Context Protocol)
|
|
153
|
+
Stop building one-off tool integrations. The Cockpit provides a unified hub for **MCP Servers**. Connect to Google Search, Slack, or your internal databases via the standardized Model Context Protocol for secure, audited tool execution. Start the server with `make mcp-serve`.
|
|
154
|
+
|
|
155
|
+
### 🗄️ Situational Database Audits
|
|
156
|
+
The Cockpit now performs platform-specific performance and security audits for:
|
|
157
|
+
- **AlloyDB**: Optimizes for the **Columnar Engine** (100x query speedup).
|
|
158
|
+
- **Pinecone**: Suggests **gRPC** and **Namespace Isolation** for high-perf RAG.
|
|
159
|
+
- **BigQuery**: Suggests **BQ Vector Search** for serverless, cost-effective grounding.
|
|
160
|
+
- **Cloud SQL**: Enforces **IAM-based authentication** via the official Python Connector.
|
|
161
|
+
|
|
162
|
+
### 🧗 Quality Hill Climbing (ADK Evaluation)
|
|
163
|
+
Following **Google ADK Evaluation** best practices, the Cockpit provides an iterative optimization loop. `make quality-baseline` runs your agent against a "Golden Dataset" using **LLM-as-a-Judge** scoring (Response Match & Tool Trajectory), climbing the quality curve until production-grade fidelity is reached.
|
|
164
|
+
|
|
165
|
+
### 🛑 Mandatory Governance Enforcement (NEW)
|
|
166
|
+
The Cockpit now acts as a mandatory gate for production.
|
|
167
|
+
- **Blocking CI/CD**: GitHub Actions now fail if **High Impact** cost issues or **Red Team** security vulnerabilities are detected.
|
|
168
|
+
- **Build-Time Audit**: The `Dockerfile` includes a mandatory `RUN` audit step. If your agent is not "Well-Architected," the container image will fail to build.
|
|
169
|
+
|
|
170
|
+
---
|
|
171
|
+
|
|
172
|
+
## ⌨️ Quick Start
|
|
173
|
+
|
|
174
|
+
The Cockpit is available as a first-class CLI on PyPI.
|
|
175
|
+
|
|
176
|
+
```bash
|
|
177
|
+
# 1. Install the Cockpit globally
|
|
178
|
+
pip install agentops-cockpit
|
|
179
|
+
|
|
180
|
+
# 2. Run Global Audit (Produces unified report)
|
|
181
|
+
agent-ops report --mode quick # ⚡ Quick Safe-Build
|
|
182
|
+
agent-ops report --mode deep # 🚀 Full System Audit
|
|
183
|
+
|
|
184
|
+
# 3. Guardrail Policy Audit (RFC-307)
|
|
185
|
+
agent-ops policy-audit --text "How to make a bomb?"
|
|
186
|
+
|
|
187
|
+
# 4. Global Scaffolding
|
|
188
|
+
agent-ops-cockpit create <name> --ui a2ui
|
|
189
|
+
```
|
|
190
|
+
|
|
191
|
+
### 🔍 Agent Optimizer v2 (Situational Intelligence)
|
|
192
|
+
The Cockpit doesn't just look for generic waste. It now performs **Triple-State Analysis**:
|
|
193
|
+
- **Legacy Workarounds**: Suggests situational fixes for older SDK versions (e.g., manual prompt pruning).
|
|
194
|
+
- **Modernization Paths**: Highlights native performance gains (e.g., 90% cost reduction via Context Caching) available in latest SDKs.
|
|
195
|
+
- **Conflict Guard**: Real-time cross-package validation to prevent architectural deadlocks (e.g., CrewAI vs LangGraph state loops).
|
|
196
|
+
|
|
197
|
+
### ⚡ Quick-Safe Build (12x Faster Loops)
|
|
198
|
+
Development velocity shouldn't sacrifice safety. The new `--quick` mode in the auditor reduces check latency from **1.8s to 0.15s**, providing sub-second feedback while maintaining the integrity of the Conflict Guard and Architecture Review.
|
|
199
|
+
|
|
200
|
+
---
|
|
201
|
+
|
|
202
|
+
### 🧑💼 Principal SME Persona Approvals
|
|
203
|
+
The Cockpit now features a **Multi-Persona Governance Board**. Every audit result is framed through the lens of a Principal Engineer in that domain (Security, Legal, FinOps, UX), ensuring your agent is compliant with organizational standards.
|
|
204
|
+
|
|
205
|
+
### 📄 Export & Reporting
|
|
206
|
+
* **HTML/PDF Export**: Every audit automatically generates `cockpit_report.html`, a premium, printable report ready for PDF export.
|
|
207
|
+
* **Email Reports**: Send audit results directly to stakeholders via the CLI.
|
|
208
|
+
|
|
209
|
+
---
|
|
210
|
+
|
|
211
|
+
## 📊 Local Development
|
|
212
|
+
The Cockpit provides a unified "Mission Control" to evaluate your agents instantly.
|
|
213
|
+
|
|
214
|
+
```bash
|
|
215
|
+
make audit # 🕹️ Run Master Audit (Persona Approved)
|
|
216
|
+
make audit-deep # 🚀 Run Deep Audit (Full SME Verdicts)
|
|
217
|
+
make email-report # 📧 Email the latest result to a stakeholder
|
|
218
|
+
make diagnose # 🩺 Run environment health check
|
|
219
|
+
make optimizer-audit # 🔍 Run Optimizer on specific agent files
|
|
220
|
+
make reliability # 🛡️ Run unit tests and regression suite
|
|
221
|
+
make dev # Start the local Engine + Face stack
|
|
222
|
+
make arch-review # 🏛️ Run the Google Well-Architected design review
|
|
223
|
+
make quality-baseline # 🧗 Run iterative 'Hill Climbing' quality audit
|
|
224
|
+
make red-team # Execute a white-hat security audit
|
|
225
|
+
make deploy-prod # 🚀 1-click deploy to Google Cloud
|
|
226
|
+
```
|
|
227
|
+
|
|
228
|
+
---
|
|
229
|
+
|
|
230
|
+
## 🧭 Roadmap
|
|
231
|
+
- [x] **One-Click GitHub Action**: Automated governance audits on every PR.
|
|
232
|
+
- [x] **Mandatory Build Gates**: Blocking CI/CD and Container audits for production safety.
|
|
233
|
+
- [x] **Multi-Agent Orchestrator**: Standardized A2A Swarm/Coordinator patterns.
|
|
234
|
+
- [ ] **Visual Mission Control**: Real-time cockpit observability dashboard.
|
|
235
|
+
|
|
236
|
+
[View full roadmap →](/ROADMAP.md)
|
|
237
|
+
|
|
238
|
+
---
|
|
239
|
+
|
|
240
|
+
## 🤝 Community
|
|
241
|
+
- **Star this repo** to help us build the future of AgentOps.
|
|
242
|
+
- **Join the Discussion** for patterns on Google Cloud.
|
|
243
|
+
- **Contribute**: Read our [Contributing Guide](/CONTRIBUTING.md).
|
|
244
|
+
|
|
245
|
+
---
|
|
246
|
+
*Reference: [Google Cloud Architecture Center - Agentic AI Overview](https://docs.cloud.google.com/architecture/agentic-ai-overview)*
|