testmcpy 0.2.0__tar.gz → 0.2.2__tar.gz
- testmcpy-0.2.2/NOTICE +18 -0
- testmcpy-0.2.2/PKG-INFO +474 -0
- testmcpy-0.2.2/README.md +419 -0
- testmcpy-0.2.2/pyproject.toml +154 -0
- {testmcpy-0.2.0 → testmcpy-0.2.2}/testmcpy/cli.py +481 -290
- {testmcpy-0.2.0 → testmcpy-0.2.2}/testmcpy/config.py +119 -61
- testmcpy-0.2.2/testmcpy/evals/__init__.py +1 -0
- testmcpy-0.2.2/testmcpy/evals/base_evaluators.py +909 -0
- testmcpy-0.2.2/testmcpy/mcp_profiles.py +243 -0
- testmcpy-0.2.2/testmcpy/research/claude_sdk_detailed_exploration.py +139 -0
- testmcpy-0.2.2/testmcpy/research/claude_sdk_poc.py +338 -0
- testmcpy-0.2.2/testmcpy/research/claude_sdk_working_poc.py +259 -0
- testmcpy-0.2.2/testmcpy/research/test_ollama_tools.py +369 -0
- {testmcpy-0.2.0 → testmcpy-0.2.2}/testmcpy/server/api.py +167 -103
- {testmcpy-0.2.0 → testmcpy-0.2.2}/testmcpy/server/websocket.py +18 -29
- testmcpy-0.2.2/testmcpy/src/__init__.py +1 -0
- {testmcpy-0.2.0 → testmcpy-0.2.2}/testmcpy/src/llm_integration.py +204 -264
- {testmcpy-0.2.0 → testmcpy-0.2.2}/testmcpy/src/mcp_client.py +64 -54
- {testmcpy-0.2.0 → testmcpy-0.2.2}/testmcpy/src/test_runner.py +115 -95
- {testmcpy-0.2.0 → testmcpy-0.2.2}/testmcpy/ui/package-lock.json +282 -134
- {testmcpy-0.2.0 → testmcpy-0.2.2}/testmcpy/ui/package.json +3 -2
- {testmcpy-0.2.0 → testmcpy-0.2.2}/testmcpy/ui/src/App.jsx +13 -30
- testmcpy-0.2.2/testmcpy/ui/src/components/ParameterCard.jsx +203 -0
- testmcpy-0.2.2/testmcpy/ui/src/components/TestResultPanel.jsx +153 -0
- testmcpy-0.2.2/testmcpy/ui/src/components/TestStatusIndicator.jsx +42 -0
- testmcpy-0.2.2/testmcpy/ui/src/components/TypeBadge.jsx +63 -0
- {testmcpy-0.2.0 → testmcpy-0.2.2}/testmcpy/ui/src/pages/ChatInterface.jsx +170 -35
- {testmcpy-0.2.0 → testmcpy-0.2.2}/testmcpy/ui/src/pages/MCPExplorer.jsx +77 -50
- {testmcpy-0.2.0 → testmcpy-0.2.2}/testmcpy/ui/src/pages/TestManager.jsx +124 -75
- {testmcpy-0.2.0 → testmcpy-0.2.2}/testmcpy/ui/tailwind.config.js +5 -0
- testmcpy-0.2.2/testmcpy.egg-info/PKG-INFO +474 -0
- {testmcpy-0.2.0 → testmcpy-0.2.2}/testmcpy.egg-info/SOURCES.txt +10 -3
- {testmcpy-0.2.0 → testmcpy-0.2.2}/testmcpy.egg-info/requires.txt +6 -2
- {testmcpy-0.2.0 → testmcpy-0.2.2}/tests/test_url_protection.py +51 -101
- testmcpy-0.2.0/PKG-INFO +0 -403
- testmcpy-0.2.0/README.md +0 -351
- testmcpy-0.2.0/pyproject.toml +0 -81
- testmcpy-0.2.0/testmcpy/evals/__init__.py +0 -1
- testmcpy-0.2.0/testmcpy/evals/base_evaluators.py +0 -539
- testmcpy-0.2.0/testmcpy/src/__init__.py +0 -1
- testmcpy-0.2.0/testmcpy/ui/dist/assets/index-DJJ3xyEQ.css +0 -1
- testmcpy-0.2.0/testmcpy/ui/dist/assets/index-DvrU_H8q.js +0 -258
- testmcpy-0.2.0/testmcpy/ui/dist/index.html +0 -14
- testmcpy-0.2.0/testmcpy.egg-info/PKG-INFO +0 -403
- {testmcpy-0.2.0 → testmcpy-0.2.2}/LICENSE +0 -0
- {testmcpy-0.2.0 → testmcpy-0.2.2}/MANIFEST.in +0 -0
- {testmcpy-0.2.0 → testmcpy-0.2.2}/setup.cfg +0 -0
- {testmcpy-0.2.0 → testmcpy-0.2.2}/testmcpy/__init__.py +0 -0
- {testmcpy-0.2.0 → testmcpy-0.2.2}/testmcpy/server/__init__.py +0 -0
- {testmcpy-0.2.0 → testmcpy-0.2.2}/testmcpy/ui/README.md +0 -0
- {testmcpy-0.2.0 → testmcpy-0.2.2}/testmcpy/ui/index.html +0 -0
- {testmcpy-0.2.0 → testmcpy-0.2.2}/testmcpy/ui/postcss.config.js +0 -0
- {testmcpy-0.2.0 → testmcpy-0.2.2}/testmcpy/ui/src/index.css +0 -0
- {testmcpy-0.2.0 → testmcpy-0.2.2}/testmcpy/ui/src/main.jsx +0 -0
- {testmcpy-0.2.0 → testmcpy-0.2.2}/testmcpy/ui/src/pages/Configuration.jsx +0 -0
- {testmcpy-0.2.0 → testmcpy-0.2.2}/testmcpy/ui/vite.config.js +0 -0
- {testmcpy-0.2.0 → testmcpy-0.2.2}/testmcpy.egg-info/dependency_links.txt +0 -0
- {testmcpy-0.2.0 → testmcpy-0.2.2}/testmcpy.egg-info/entry_points.txt +0 -0
- {testmcpy-0.2.0 → testmcpy-0.2.2}/testmcpy.egg-info/top_level.txt +0 -0
testmcpy-0.2.2/NOTICE
ADDED
|
@@ -0,0 +1,18 @@
|
|
|
1
|
+
testmcpy
|
|
2
|
+
Copyright 2024 Preset, Inc.
|
|
3
|
+
|
|
4
|
+
Licensed under the Apache License, Version 2.0 (the "License");
|
|
5
|
+
you may not use this file except in compliance with the License.
|
|
6
|
+
You may obtain a copy of the License at
|
|
7
|
+
|
|
8
|
+
http://www.apache.org/licenses/LICENSE-2.0
|
|
9
|
+
|
|
10
|
+
Unless required by applicable law or agreed to in writing, software
|
|
11
|
+
distributed under the License is distributed on an "AS IS" BASIS,
|
|
12
|
+
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
|
13
|
+
See the License for the specific language governing permissions and
|
|
14
|
+
limitations under the License.
|
|
15
|
+
|
|
16
|
+
---
|
|
17
|
+
|
|
18
|
+
This project includes software developed by Preset, Inc.
|
testmcpy-0.2.2/PKG-INFO
ADDED
|
@@ -0,0 +1,474 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: testmcpy
|
|
3
|
+
Version: 0.2.2
|
|
4
|
+
Summary: A comprehensive testing framework for validating LLM tool calling capabilities with MCP services
|
|
5
|
+
Author: Amin Ghadersohi
|
|
6
|
+
License-Expression: Apache-2.0
|
|
7
|
+
Project-URL: Homepage, https://github.com/preset-io/testmcpy
|
|
8
|
+
Project-URL: Repository, https://github.com/preset-io/testmcpy
|
|
9
|
+
Project-URL: Issues, https://github.com/preset-io/testmcpy/issues
|
|
10
|
+
Classifier: Development Status :: 4 - Beta
|
|
11
|
+
Classifier: Intended Audience :: Developers
|
|
12
|
+
Classifier: Programming Language :: Python :: 3
|
|
13
|
+
Classifier: Programming Language :: Python :: 3.10
|
|
14
|
+
Classifier: Programming Language :: Python :: 3.11
|
|
15
|
+
Classifier: Programming Language :: Python :: 3.12
|
|
16
|
+
Requires-Python: <3.13,>=3.10
|
|
17
|
+
Description-Content-Type: text/markdown
|
|
18
|
+
License-File: LICENSE
|
|
19
|
+
License-File: NOTICE
|
|
20
|
+
Requires-Dist: typer<1.0.0,>=0.9.0
|
|
21
|
+
Requires-Dist: rich<14.0.0,>=13.0.0
|
|
22
|
+
Requires-Dist: pyyaml<7.0,>=6.0
|
|
23
|
+
Requires-Dist: requests<3.0.0,>=2.28.0
|
|
24
|
+
Requires-Dist: aiohttp<4.0.0,>=3.8.0
|
|
25
|
+
Requires-Dist: ollama>=0.1.0
|
|
26
|
+
Requires-Dist: anthropic<1.0.0,>=0.39.0
|
|
27
|
+
Requires-Dist: fastmcp<3.0.0,>=2.0.0
|
|
28
|
+
Requires-Dist: httpx<1.0.0,>=0.27.0
|
|
29
|
+
Requires-Dist: python-dotenv<2.0.0,>=1.0.0
|
|
30
|
+
Requires-Dist: click<9.0.0,>=8.0.0
|
|
31
|
+
Requires-Dist: shellingham<2.0.0,>=1.3.0
|
|
32
|
+
Provides-Extra: dev
|
|
33
|
+
Requires-Dist: ruff>=0.8.0; extra == "dev"
|
|
34
|
+
Requires-Dist: mypy>=1.13.0; extra == "dev"
|
|
35
|
+
Requires-Dist: pytest>=7.0.0; extra == "dev"
|
|
36
|
+
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
|
|
37
|
+
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
|
|
38
|
+
Requires-Dist: pre-commit>=3.0.0; extra == "dev"
|
|
39
|
+
Requires-Dist: build>=1.0.0; extra == "dev"
|
|
40
|
+
Requires-Dist: twine>=5.0.0; extra == "dev"
|
|
41
|
+
Requires-Dist: types-pyyaml>=6.0.0; extra == "dev"
|
|
42
|
+
Requires-Dist: types-requests>=2.28.0; extra == "dev"
|
|
43
|
+
Provides-Extra: server
|
|
44
|
+
Requires-Dist: fastapi<1.0.0,>=0.104.0; extra == "server"
|
|
45
|
+
Requires-Dist: uvicorn<1.0.0,>=0.24.0; extra == "server"
|
|
46
|
+
Requires-Dist: websockets<13.0,>=12.0; extra == "server"
|
|
47
|
+
Provides-Extra: sdk
|
|
48
|
+
Requires-Dist: claude-agent-sdk>=0.1.0; extra == "sdk"
|
|
49
|
+
Provides-Extra: all
|
|
50
|
+
Requires-Dist: fastapi<1.0.0,>=0.104.0; extra == "all"
|
|
51
|
+
Requires-Dist: uvicorn<1.0.0,>=0.24.0; extra == "all"
|
|
52
|
+
Requires-Dist: websockets<13.0,>=12.0; extra == "all"
|
|
53
|
+
Requires-Dist: claude-agent-sdk>=0.1.0; extra == "all"
|
|
54
|
+
Dynamic: license-file
|
|
55
|
+
|
|
56
|
+
# testmcpy
|
|
57
|
+
|
|
58
|
+
**Test and benchmark LLMs with MCP tools in minutes.**
|
|
59
|
+
|
|
60
|
+
A testing framework for validating how LLMs call tools via Model Context Protocol (MCP) - compare Claude, GPT-4, Llama, and other models' accuracy, cost, and performance.
|
|
61
|
+
|
|
62
|
+
[](https://www.python.org/downloads/)
|
|
63
|
+
[](LICENSE)
|
|
64
|
+
[](https://pypi.org/project/testmcpy/)
|
|
65
|
+
|
|
66
|
+
[Screenshot: CLI test runner with colorful progress bars and results]
|
|
67
|
+
|
|
68
|
+
[Screenshot: Web UI showing tool explorer and interactive chat]
|
|
69
|
+
|
|
70
|
+
[GIF: Running a test suite from command line with real-time progress]
|
|
71
|
+
|
|
72
|
+
---
|
|
73
|
+
|
|
74
|
+
**[Documentation](docs/)** • **[Examples](examples/)** • **[Contributing](CONTRIBUTING.md)** • **[Discussions](https://github.com/preset-io/testmcpy/discussions)**
|
|
75
|
+
|
|
76
|
+
---
|
|
77
|
+
|
|
78
|
+
## Why testmcpy?
|
|
79
|
+
|
|
80
|
+
- **Validate tool calling**: Ensure LLMs call the right tools with correct parameters
|
|
81
|
+
- **Compare models**: Find the best price/performance balance for your use case
|
|
82
|
+
- **Prevent regressions**: Catch breaking changes in your MCP service with CI/CD
|
|
83
|
+
- **Optimize costs**: Track token usage and identify the most cost-effective models
|
|
84
|
+
|
|
85
|
+
## Quick Start
|
|
86
|
+
|
|
87
|
+
```bash
|
|
88
|
+
# Install testmcpy
|
|
89
|
+
pip install testmcpy
|
|
90
|
+
|
|
91
|
+
# Run interactive setup
|
|
92
|
+
testmcpy setup
|
|
93
|
+
|
|
94
|
+
# Start testing
|
|
95
|
+
testmcpy chat # Interactive chat with MCP tools
|
|
96
|
+
testmcpy research # Test LLM tool-calling capabilities
|
|
97
|
+
testmcpy run tests/ # Run your test suite
|
|
98
|
+
```
|
|
99
|
+
|
|
100
|
+
That's it! No complex configuration needed to get started.
|
|
101
|
+
|
|
102
|
+
## Key Features
|
|
103
|
+
|
|
104
|
+
### Multi-Provider Support
|
|
105
|
+
Test with **Claude**, **GPT-4**, **Llama**, and other models. Works with both paid APIs and free local models via Ollama.
|
|
106
|
+
|
|
107
|
+
[Screenshot: Model selector showing Claude, GPT-4, and Ollama options]
|
|
108
|
+
|
|
109
|
+
### Built-in Evaluators
|
|
110
|
+
Comprehensive validation out of the box:
|
|
111
|
+
- **Tool Selection**: Did the LLM call the right tool?
|
|
112
|
+
- **Parameter Validation**: Were correct parameters passed?
|
|
113
|
+
- **Execution Success**: Did the tool call complete without errors?
|
|
114
|
+
- **Performance**: Response time and token usage tracking
|
|
115
|
+
- **Cost Analysis**: Monitor API costs across test runs
|
|
116
|
+
|
|
117
|
+
[Screenshot: Test results showing pass/fail for different evaluators]
|
|
118
|
+
|
|
119
|
+
### Beautiful CLI & Web UI
|
|
120
|
+
- **Rich terminal UI**: Progress bars, colored output, formatted tables
|
|
121
|
+
- **Optional web interface**: Visual tool explorer and interactive chat
|
|
122
|
+
- **Real-time feedback**: Watch tests execute with live updates
|
|
123
|
+
|
|
124
|
+
[Screenshot: Split view of CLI and Web UI running the same test]
|
|
125
|
+
|
|
126
|
+
### YAML Test Definitions
|
|
127
|
+
Define test suites as code for repeatable, version-controlled testing:
|
|
128
|
+
|
|
129
|
+
```yaml
|
|
130
|
+
version: "1.0"
|
|
131
|
+
name: "Chart Operations Test Suite"
|
|
132
|
+
|
|
133
|
+
tests:
|
|
134
|
+
  - name: "test_create_chart"
|
|
135
|
+
    prompt: "Create a bar chart showing sales by region"
|
|
136
|
+
    evaluators:
|
|
137
|
+
      - name: "was_mcp_tool_called"
|
|
138
|
+
        args:
|
|
139
|
+
          tool_name: "create_chart"
|
|
140
|
+
      - name: "execution_successful"
|
|
141
|
+
```
|
|
142
|
+
|
|
143
|
+
## Use Cases
|
|
144
|
+
|
|
145
|
+
Perfect for:
|
|
146
|
+
|
|
147
|
+
- **LLM Benchmarking**: Compare tool-calling accuracy across Claude, GPT-4, and Llama
|
|
148
|
+
- **MCP Service Testing**: Validate your MCP integrations work correctly
|
|
149
|
+
- **Regression Prevention**: Catch breaking changes in CI/CD pipelines
|
|
150
|
+
- **Model Selection**: Make data-driven decisions about which LLM to use
|
|
151
|
+
- **Cost Optimization**: Find the best price/performance balance for your workload
|
|
152
|
+
- **Parameter Validation**: Ensure LLMs pass correct parameters to your tools
|
|
153
|
+
|
|
154
|
+
## Architecture
|
|
155
|
+
|
|
156
|
+
testmcpy connects your LLM provider to your MCP service and validates the interactions:
|
|
157
|
+
|
|
158
|
+
```mermaid
|
|
159
|
+
graph TB
|
|
160
|
+
subgraph "CLI Interface"
|
|
161
|
+
CLI[testmcpy CLI]
|
|
162
|
+
WebUI[Web UI - Optional]
|
|
163
|
+
end
|
|
164
|
+
|
|
165
|
+
subgraph "Core Framework"
|
|
166
|
+
TestRunner[Test Runner]
|
|
167
|
+
Evaluators[Evaluators]
|
|
168
|
+
Config[Configuration Manager]
|
|
169
|
+
end
|
|
170
|
+
|
|
171
|
+
subgraph "LLM Providers"
|
|
172
|
+
Anthropic[Anthropic API]
|
|
173
|
+
OpenAI[OpenAI API]
|
|
174
|
+
Ollama[Ollama Local]
|
|
175
|
+
end
|
|
176
|
+
|
|
177
|
+
subgraph "MCP Integration"
|
|
178
|
+
MCPClient[MCP Client]
|
|
179
|
+
MCPService[MCP Service<br/>HTTP/SSE]
|
|
180
|
+
end
|
|
181
|
+
|
|
182
|
+
CLI --> TestRunner
|
|
183
|
+
WebUI --> TestRunner
|
|
184
|
+
TestRunner --> Config
|
|
185
|
+
TestRunner --> Evaluators
|
|
186
|
+
TestRunner --> Anthropic
|
|
187
|
+
TestRunner --> OpenAI
|
|
188
|
+
TestRunner --> Ollama
|
|
189
|
+
Anthropic --> MCPClient
|
|
190
|
+
OpenAI --> MCPClient
|
|
191
|
+
Ollama --> MCPClient
|
|
192
|
+
MCPClient --> MCPService
|
|
193
|
+
|
|
194
|
+
style CLI fill:#4A90E2
|
|
195
|
+
style WebUI fill:#4A90E2
|
|
196
|
+
style TestRunner fill:#50E3C2
|
|
197
|
+
style MCPClient fill:#F5A623
|
|
198
|
+
style MCPService fill:#BD10E0
|
|
199
|
+
```
|
|
200
|
+
|
|
201
|
+
**How it works:**
|
|
202
|
+
1. Define test cases in YAML with prompts and expected behavior
|
|
203
|
+
2. testmcpy sends prompts to your chosen LLM (Claude, GPT-4, Llama, etc.)
|
|
204
|
+
3. LLM calls tools via MCP protocol to your service
|
|
205
|
+
4. Evaluators validate tool selection, parameters, execution, and performance
|
|
206
|
+
5. Get detailed pass/fail results with metrics and cost analysis
|
|
207
|
+
|
|
208
|
+
## Installation
|
|
209
|
+
|
|
210
|
+
```bash
|
|
211
|
+
# Install base package
|
|
212
|
+
pip install testmcpy
|
|
213
|
+
|
|
214
|
+
# With web UI support
|
|
215
|
+
pip install 'testmcpy[server]'
|
|
216
|
+
|
|
217
|
+
# All optional features
|
|
218
|
+
pip install 'testmcpy[all]'
|
|
219
|
+
```
|
|
220
|
+
|
|
221
|
+
**Requirements:** Python 3.10-3.12 (3.13+ not yet supported)
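The `Requires-Python` field in the metadata at the top of this file declares `<3.13,>=3.10`. A minimal sketch of an equivalent runtime check follows; `is_supported` and the two constants are illustrative names and are not part of testmcpy.

```python
import sys

# Bounds copied from the Requires-Python field: >=3.10 and <3.13.
MIN_SUPPORTED = (3, 10)
FIRST_UNSUPPORTED = (3, 13)


def is_supported(version=None):
    """Return True if the (major, minor) pair falls in the declared range.

    `version` is any tuple starting with (major, minor); when omitted,
    the running interpreter's version is used.
    """
    major, minor = (version or sys.version_info)[:2]
    return MIN_SUPPORTED <= (major, minor) < FIRST_UNSUPPORTED
```

Calling `is_supported()` with no argument checks the interpreter that is currently running.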
|
|
222
|
+
|
|
223
|
+
## Getting Started
|
|
224
|
+
|
|
225
|
+
### 1. Configuration
|
|
226
|
+
|
|
227
|
+
Run the interactive setup wizard:
|
|
228
|
+
|
|
229
|
+
```bash
|
|
230
|
+
testmcpy setup
|
|
231
|
+
```
|
|
232
|
+
|
|
233
|
+
Or manually create `~/.testmcpy`:
|
|
234
|
+
|
|
235
|
+
```bash
|
|
236
|
+
# MCP Service
|
|
237
|
+
MCP_URL=http://localhost:5008/mcp/
|
|
238
|
+
MCP_AUTH_TOKEN=your_bearer_token
|
|
239
|
+
|
|
240
|
+
# LLM Provider (choose one)
|
|
241
|
+
DEFAULT_PROVIDER=anthropic
|
|
242
|
+
DEFAULT_MODEL=claude-haiku-4-5
|
|
243
|
+
ANTHROPIC_API_KEY=sk-ant-...
|
|
244
|
+
```
|
|
245
|
+
|
|
246
|
+
**Configuration priority:** CLI options > `.env` > `~/.testmcpy` > Environment variables > Defaults
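The priority chain above can be sketched with the standard library's `collections.ChainMap`, which returns the value from the first mapping that defines a key. This is an illustration of the stated ordering only, not testmcpy's actual loader; `resolve_config` and its parameter names are hypothetical.

```python
from collections import ChainMap


def resolve_config(cli, dotenv, user_file, environ, defaults):
    """Merge config layers; earlier arguments win over later ones.

    Order follows the stated priority:
    CLI options > .env > ~/.testmcpy > environment variables > defaults.
    """
    layers = [cli, dotenv, user_file, environ, defaults]
    # Drop unset (None) values so they do not shadow lower-priority layers.
    cleaned = [
        {key: value for key, value in layer.items() if value is not None}
        for layer in layers
    ]
    return dict(ChainMap(*cleaned))


# Example layers (values are made up for illustration).
config = resolve_config(
    cli={"DEFAULT_MODEL": "claude-sonnet-4-5"},
    dotenv={"MCP_URL": "http://localhost:5008/mcp/"},
    user_file={"DEFAULT_MODEL": "claude-haiku-4-5", "DEFAULT_PROVIDER": "anthropic"},
    environ={"DEFAULT_PROVIDER": "ollama"},
    defaults={"DEFAULT_PROVIDER": "anthropic", "MCP_URL": None},
)
```

In this example the CLI model overrides the one in the user file, and the user file's provider overrides the environment variable, matching the ordering given above.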
|
|
247
|
+
|
|
248
|
+
### 2. Test Your MCP Service
|
|
249
|
+
|
|
250
|
+
```bash
|
|
251
|
+
# List available MCP tools
|
|
252
|
+
testmcpy tools
|
|
253
|
+
|
|
254
|
+
# Interactive chat to explore your tools
|
|
255
|
+
testmcpy chat
|
|
256
|
+
|
|
257
|
+
# Run automated research on tool-calling capabilities
|
|
258
|
+
testmcpy research --model claude-haiku-4-5
|
|
259
|
+
```
|
|
260
|
+
|
|
261
|
+
### 3. Create Test Suites
|
|
262
|
+
|
|
263
|
+
Define tests in YAML (`tests/my_tests.yaml`):
|
|
264
|
+
|
|
265
|
+
```yaml
|
|
266
|
+
version: "1.0"
|
|
267
|
+
name: "My MCP Service Tests"
|
|
268
|
+
|
|
269
|
+
tests:
|
|
270
|
+
  - name: "test_tool_selection"
|
|
271
|
+
    prompt: "Create a bar chart showing sales by region"
|
|
272
|
+
    evaluators:
|
|
273
|
+
      - name: "was_mcp_tool_called"
|
|
274
|
+
        args:
|
|
275
|
+
          tool_name: "create_chart"
|
|
276
|
+
      - name: "execution_successful"
|
|
277
|
+
      - name: "within_time_limit"
|
|
278
|
+
        args:
|
|
279
|
+
          max_seconds: 30
|
|
280
|
+
```
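As a rough illustration of the suite layout, the sketch below checks a parsed suite for the keys that appear in the example. The mapping is what a YAML loader such as `yaml.safe_load` would return for that file; the set of required keys is inferred from the example, not taken from testmcpy's schema, and `validate_suite` is a hypothetical helper.

```python
def validate_suite(suite):
    """Return a list of problems found in a parsed test-suite mapping."""
    problems = []
    # Top-level keys seen in the example suite (assumed, not a documented schema).
    for key in ("version", "name", "tests"):
        if key not in suite:
            problems.append(f"missing top-level key: {key}")
    for index, test in enumerate(suite.get("tests", [])):
        label = test.get("name", f"tests[{index}]")
        if not test.get("prompt"):
            problems.append(f"{label}: missing prompt")
        for evaluator in test.get("evaluators", []):
            if "name" not in evaluator:
                problems.append(f"{label}: evaluator without a name")
    return problems


# The example suite, as a loader would represent it.
suite = {
    "version": "1.0",
    "name": "My MCP Service Tests",
    "tests": [
        {
            "name": "test_tool_selection",
            "prompt": "Create a bar chart showing sales by region",
            "evaluators": [
                {"name": "was_mcp_tool_called", "args": {"tool_name": "create_chart"}},
                {"name": "execution_successful"},
                {"name": "within_time_limit", "args": {"max_seconds": 30}},
            ],
        }
    ],
}
```

A check like this could run before any model is called, so that a malformed suite fails fast without spending tokens.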
|
|
281
|
+
|
|
282
|
+
Run your tests:
|
|
283
|
+
|
|
284
|
+
```bash
|
|
285
|
+
testmcpy run tests/ --model claude-haiku-4-5
|
|
286
|
+
```
|
|
287
|
+
|
|
288
|
+
## Documentation
|
|
289
|
+
|
|
290
|
+
### Core Guides
|
|
291
|
+
- **[Evaluator Reference](docs/EVALUATOR_REFERENCE.md)** - All available evaluators and usage examples
|
|
292
|
+
- **[Client Usage Guide](docs/CLIENT_USAGE_GUIDE.md)** - Complete guide for testing your MCP service
|
|
293
|
+
- **[MCP Profiles](docs/MCP_PROFILES.md)** - Managing multiple MCP service configurations
|
|
294
|
+
|
|
295
|
+
### Examples
|
|
296
|
+
- **[Basic Tests](examples/)** - Simple test cases to get started
|
|
297
|
+
- **[CI/CD Integration](examples/ci-cd/)** - GitHub Actions and GitLab CI configurations
|
|
298
|
+
- **[Custom Evaluators](examples/)** - Building your own validation logic
|
|
299
|
+
|
|
300
|
+
### Commands Reference
|
|
301
|
+
|
|
302
|
+
| Command | Description |
|
|
303
|
+
|---------|-------------|
|
|
304
|
+
| `testmcpy setup` | Interactive configuration wizard |
|
|
305
|
+
| `testmcpy tools` | List available MCP tools |
|
|
306
|
+
| `testmcpy research` | Test LLM tool-calling capabilities |
|
|
307
|
+
| `testmcpy run <path>` | Execute test suite |
|
|
308
|
+
| `testmcpy chat` | Interactive chat with MCP tools |
|
|
309
|
+
| `testmcpy serve` | Start web UI server |
|
|
310
|
+
| `testmcpy report` | Compare test results across models |
|
|
311
|
+
| `testmcpy config-cmd` | View current configuration |
|
|
312
|
+
| `testmcpy doctor` | Diagnose installation issues |
|
|
313
|
+
|
|
314
|
+
## LLM Providers
|
|
315
|
+
|
|
316
|
+
### Anthropic (Recommended)
|
|
317
|
+
Best tool-calling accuracy, native MCP support:
|
|
318
|
+
|
|
319
|
+
```bash
|
|
320
|
+
ANTHROPIC_API_KEY=sk-ant-your-key
|
|
321
|
+
DEFAULT_MODEL=claude-haiku-4-5 # Fast & cost-effective
|
|
322
|
+
```
|
|
323
|
+
|
|
324
|
+
**Available models:** `claude-haiku-4-5`, `claude-sonnet-4-5`, `claude-opus-4-1`
|
|
325
|
+
|
|
326
|
+
### Ollama (Free, Local)
|
|
327
|
+
Perfect for development without API costs:
|
|
328
|
+
|
|
329
|
+
```bash
|
|
330
|
+
# Install Ollama
|
|
331
|
+
brew install ollama # macOS
|
|
332
|
+
# or: curl -fsSL https://ollama.com/install.sh | sh
|
|
333
|
+
|
|
334
|
+
# Start Ollama and pull a model
|
|
335
|
+
ollama serve
|
|
336
|
+
ollama pull llama3.1:8b
|
|
337
|
+
|
|
338
|
+
# Configure testmcpy
|
|
339
|
+
DEFAULT_PROVIDER=ollama
|
|
340
|
+
DEFAULT_MODEL=llama3.1:8b
|
|
341
|
+
```
|
|
342
|
+
|
|
343
|
+
### OpenAI
|
|
344
|
+
```bash
|
|
345
|
+
OPENAI_API_KEY=sk-your-key
|
|
346
|
+
DEFAULT_MODEL=gpt-4-turbo
|
|
347
|
+
```
|
|
348
|
+
|
|
349
|
+
## Built-in Evaluators
|
|
350
|
+
|
|
351
|
+
testmcpy includes comprehensive evaluators for validating LLM behavior:
|
|
352
|
+
|
|
353
|
+
### Tool Calling
|
|
354
|
+
- `was_mcp_tool_called` - Verify specific tool was invoked
|
|
355
|
+
- `tool_call_count` - Validate number of tool calls
|
|
356
|
+
- `tool_called_with_parameter` - Check specific parameter was passed
|
|
357
|
+
- `tool_called_with_parameters` - Validate multiple parameters
|
|
358
|
+
- `parameter_value_in_range` - Ensure numeric parameters are valid
|
|
359
|
+
|
|
360
|
+
### Execution
|
|
361
|
+
- `execution_successful` - Check for errors or failures
|
|
362
|
+
- `within_time_limit` - Performance validation
|
|
363
|
+
- `final_answer_contains` - Validate response content
|
|
364
|
+
|
|
365
|
+
### Cost & Performance
|
|
366
|
+
- `token_usage_reasonable` - Cost efficiency validation
|
|
367
|
+
- Performance metrics automatically tracked
|
|
368
|
+
|
|
369
|
+
**Extensible:** Easily add custom evaluators for your domain-specific needs.
|
|
370
|
+
|
|
371
|
+
See **[Evaluator Reference](docs/EVALUATOR_REFERENCE.md)** for complete documentation.
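To give a feel for what evaluators of this kind check, here is a minimal, self-contained sketch. It does not use testmcpy's evaluator API, which is not shown in this file; `EvalResult`, the function signatures, and the tool-call record shape (`name` plus `arguments`) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    passed: bool
    reason: str


def was_tool_called(tool_calls, tool_name):
    """Pass if any recorded call targets `tool_name`."""
    names = [call["name"] for call in tool_calls]
    if tool_name in names:
        return EvalResult(True, f"{tool_name} was called")
    return EvalResult(False, f"{tool_name} not in {names}")


def parameter_in_range(tool_calls, tool_name, parameter, min_value, max_value):
    """Pass if every call to `tool_name` passes `parameter` within the bounds."""
    values = [
        call["arguments"].get(parameter)
        for call in tool_calls
        if call["name"] == tool_name
    ]
    if not values:
        return EvalResult(False, f"{tool_name} was never called")
    # A missing parameter (None) counts as out of range.
    bad = [v for v in values if v is None or not (min_value <= v <= max_value)]
    if bad:
        return EvalResult(False, f"out-of-range values: {bad}")
    return EvalResult(True, "all values in range")


# Hypothetical record of what a model did during one test.
calls = [{"name": "create_chart", "arguments": {"chart_type": "bar", "limit": 10}}]
```

Returning a reason string alongside the pass/fail flag keeps failures readable in a report, which is the main design choice in this sketch.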
|
|
372
|
+
|
|
373
|
+
## For MCP Service Developers
|
|
374
|
+
|
|
375
|
+
Integrate testmcpy into your MCP service for automated testing:
|
|
376
|
+
|
|
377
|
+
```bash
|
|
378
|
+
# Install testmcpy in your project
|
|
379
|
+
pip install 'testmcpy[all]'
|
|
380
|
+
|
|
381
|
+
# Create tests for your MCP tools
|
|
382
|
+
cat > tests/my_service_tests.yaml <<EOF
|
|
383
|
+
version: "1.0"
|
|
384
|
+
name: "My MCP Service Tests"
|
|
385
|
+
tests:
|
|
386
|
+
  - name: "test_tool_selection"
|
|
387
|
+
    prompt: "List all items"
|
|
388
|
+
    evaluators:
|
|
389
|
+
      - name: "was_mcp_tool_called"
|
|
390
|
+
        args:
|
|
391
|
+
          tool_name: "list_items"
|
|
392
|
+
      - name: "execution_successful"
|
|
393
|
+
EOF
|
|
394
|
+
|
|
395
|
+
# Run tests in CI/CD
|
|
396
|
+
testmcpy run tests/ --model claude-haiku-4-5
|
|
397
|
+
```
|
|
398
|
+
|
|
399
|
+
**[Client Usage Guide](docs/CLIENT_USAGE_GUIDE.md)** - Complete integration guide for your MCP service
|
|
400
|
+
|
|
401
|
+
**[CI/CD Examples](examples/ci-cd/)** - GitHub Actions and GitLab CI configurations
|
|
402
|
+
|
|
403
|
+
## Web Interface
|
|
404
|
+
|
|
405
|
+
Optional React-based UI for visual testing:
|
|
406
|
+
|
|
407
|
+
[Screenshot: Web UI dashboard with tool explorer]
|
|
408
|
+
|
|
409
|
+
```bash
|
|
410
|
+
# Install with UI support
|
|
411
|
+
pip install 'testmcpy[server]'
|
|
412
|
+
|
|
413
|
+
# Start server
|
|
414
|
+
testmcpy serve
|
|
415
|
+
```
|
|
416
|
+
|
|
417
|
+
Features:
|
|
418
|
+
- Visual MCP tool explorer
|
|
419
|
+
- Interactive chat interface
|
|
420
|
+
- Test management and execution
|
|
421
|
+
- Real-time results display
|
|
422
|
+
|
|
423
|
+
Access at `http://localhost:8000`
|
|
424
|
+
|
|
425
|
+
## Examples
|
|
426
|
+
|
|
427
|
+
Check out the `examples/` directory for:
|
|
428
|
+
|
|
429
|
+
- **Basic test suites** - Simple examples to get started
|
|
430
|
+
- **CI/CD integration** - GitHub Actions and GitLab CI workflows
|
|
431
|
+
- **Custom evaluators** - Building domain-specific validation
|
|
432
|
+
- **Multi-model comparison** - Benchmarking different LLMs
|
|
433
|
+
|
|
434
|
+
## Contributing
|
|
435
|
+
|
|
436
|
+
We welcome contributions! Whether it's bug reports, feature requests, documentation improvements, or code contributions.
|
|
437
|
+
|
|
438
|
+
**[Read the Contributing Guide](CONTRIBUTING.md)** to get started.
|
|
439
|
+
|
|
440
|
+
Quick guidelines:
|
|
441
|
+
- Follow Black code formatting (100 char line length)
|
|
442
|
+
- Add tests for new features
|
|
443
|
+
- Ensure multi-provider compatibility (test with Ollama, Claude, GPT)
|
|
444
|
+
- Document your changes
|
|
445
|
+
- Be respectful and collaborative
|
|
446
|
+
|
|
447
|
+
## Contributors
|
|
448
|
+
|
|
449
|
+
Built with contributions from:
|
|
450
|
+
|
|
451
|
+
<!-- Add contributor images here when ready -->
|
|
452
|
+
|
|
453
|
+
Want to see your name here? Check out our [Contributing Guide](CONTRIBUTING.md)!
|
|
454
|
+
|
|
455
|
+
## Community & Support
|
|
456
|
+
|
|
457
|
+
- **Issues**: [Report bugs or request features](https://github.com/preset-io/testmcpy/issues)
|
|
458
|
+
- **Discussions**: [Ask questions and share ideas](https://github.com/preset-io/testmcpy/discussions)
|
|
459
|
+
- **Documentation**: Browse the [docs/](docs/) directory
|
|
460
|
+
- **Examples**: Explore [examples/](examples/) for sample code
|
|
461
|
+
|
|
462
|
+
## License
|
|
463
|
+
|
|
464
|
+
Apache License 2.0 - See [LICENSE](LICENSE) for details.
|
|
465
|
+
|
|
466
|
+
By contributing, you agree that your contributions will be licensed under Apache 2.0.
|
|
467
|
+
|
|
468
|
+
---
|
|
469
|
+
|
|
470
|
+
## Acknowledgments
|
|
471
|
+
|
|
472
|
+
Built by the team at [Preset](https://preset.io) to enable better LLM testing and integration with Apache Superset and beyond.
|
|
473
|
+
|
|
474
|
+
Special thanks to the MCP community and all our contributors!
|