testmcpy 0.3.0__tar.gz → 0.3.2__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (181)
  1. testmcpy-0.3.2/PKG-INFO +552 -0
  2. testmcpy-0.3.2/README.md +482 -0
  3. {testmcpy-0.3.0 → testmcpy-0.3.2}/pyproject.toml +5 -1
  4. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/__init__.py +1 -1
  5. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/cli/commands/run.py +82 -16
  6. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/evals/auth_evaluators.py +207 -0
  7. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/evals/base_evaluators.py +102 -0
  8. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/migrate_json.py +1 -0
  9. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/models.py +1 -0
  10. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/server/routers/results.py +1 -0
  11. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/server/websocket.py +6 -1
  12. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/src/comparison_runner.py +144 -0
  13. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/src/llm_integration.py +254 -1
  14. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/src/model_registry.py +269 -11
  15. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/src/test_runner.py +247 -3
  16. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/storage.py +4 -0
  17. testmcpy-0.3.2/testmcpy/ui/src/components/StreamingLogViewer.jsx +684 -0
  18. testmcpy-0.3.2/testmcpy/ui/src/components/ToolCallTimeline.jsx +275 -0
  19. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/pages/ChatInterface.jsx +11 -178
  20. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/pages/MCPExplorer.jsx +36 -15
  21. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/pages/TestManager.jsx +122 -38
  22. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/tailwind.config.js +10 -0
  23. testmcpy-0.3.2/testmcpy.egg-info/PKG-INFO +552 -0
  24. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy.egg-info/SOURCES.txt +2 -3
  25. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy.egg-info/requires.txt +4 -0
  26. testmcpy-0.3.0/PKG-INFO +0 -636
  27. testmcpy-0.3.0/README.md +0 -569
  28. testmcpy-0.3.0/testmcpy/ui/dist/assets/index-C8j69QMM.js +0 -287
  29. testmcpy-0.3.0/testmcpy/ui/dist/assets/index-DFiQIkV-.css +0 -1
  30. testmcpy-0.3.0/testmcpy/ui/dist/index.html +0 -22
  31. testmcpy-0.3.0/testmcpy.egg-info/PKG-INFO +0 -636
  32. {testmcpy-0.3.0 → testmcpy-0.3.2}/LICENSE +0 -0
  33. {testmcpy-0.3.0 → testmcpy-0.3.2}/MANIFEST.in +0 -0
  34. {testmcpy-0.3.0 → testmcpy-0.3.2}/NOTICE +0 -0
  35. {testmcpy-0.3.0 → testmcpy-0.3.2}/setup.cfg +0 -0
  36. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/agent/__init__.py +0 -0
  37. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/agent/hooks.py +0 -0
  38. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/agent/models.py +0 -0
  39. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/agent/orchestrator.py +0 -0
  40. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/agent/prompts.py +0 -0
  41. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/agent/tools.py +0 -0
  42. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/auth_debugger.py +0 -0
  43. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/auth_flow_recorder.py +0 -0
  44. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/cli/__init__.py +0 -0
  45. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/cli/app.py +0 -0
  46. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/cli/commands/__init__.py +0 -0
  47. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/cli/commands/agent.py +0 -0
  48. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/cli/commands/baseline.py +0 -0
  49. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/cli/commands/export_db.py +0 -0
  50. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/cli/commands/mcp.py +0 -0
  51. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/cli/commands/metamorphic.py +0 -0
  52. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/cli/commands/multi_env.py +0 -0
  53. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/cli/commands/mutate.py +0 -0
  54. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/cli/commands/push.py +0 -0
  55. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/cli/commands/server.py +0 -0
  56. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/cli/commands/tools.py +0 -0
  57. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/cli/commands/tui.py +0 -0
  58. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/cli/commands/wizard.py +0 -0
  59. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/config.py +0 -0
  60. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/core/__init__.py +0 -0
  61. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/core/chat_session.py +0 -0
  62. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/core/docs_optimizer.py +0 -0
  63. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/core/mcp_manager.py +0 -0
  64. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/core/tool_comparison.py +0 -0
  65. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/core/tool_discovery.py +0 -0
  66. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/db.py +0 -0
  67. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/error_handlers.py +0 -0
  68. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/evals/__init__.py +0 -0
  69. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/evals/evaluator_packs.py +0 -0
  70. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/formatters/__init__.py +0 -0
  71. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/formatters/base.py +0 -0
  72. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/formatters/curl.py +0 -0
  73. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/formatters/graphql.py +0 -0
  74. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/formatters/javascript_client.py +0 -0
  75. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/formatters/json_yaml.py +0 -0
  76. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/formatters/protobuf.py +0 -0
  77. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/formatters/python.py +0 -0
  78. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/formatters/python_client.py +0 -0
  79. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/formatters/thrift.py +0 -0
  80. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/formatters/typescript.py +0 -0
  81. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/formatters/typescript_client.py +0 -0
  82. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/llm_profiles.py +0 -0
  83. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/mcp_profiles.py +0 -0
  84. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/research/claude_sdk_detailed_exploration.py +0 -0
  85. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/research/claude_sdk_poc.py +0 -0
  86. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/research/claude_sdk_working_poc.py +0 -0
  87. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/research/test_ollama_tools.py +0 -0
  88. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/server/__init__.py +0 -0
  89. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/server/api.py +0 -0
  90. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/server/api.py.bak +0 -0
  91. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/server/auth_middleware.py +0 -0
  92. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/server/helpers/__init__.py +0 -0
  93. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/server/helpers/mcp_config.py +0 -0
  94. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/server/models.py +0 -0
  95. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/server/routers/__init__.py +0 -0
  96. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/server/routers/agent.py +0 -0
  97. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/server/routers/auth.py +0 -0
  98. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/server/routers/compare.py +0 -0
  99. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/server/routers/compatibility.py +0 -0
  100. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/server/routers/generation_logs.py +0 -0
  101. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/server/routers/health.py +0 -0
  102. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/server/routers/llm.py +0 -0
  103. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/server/routers/mcp_profiles.py +0 -0
  104. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/server/routers/metrics.py +0 -0
  105. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/server/routers/search.py +0 -0
  106. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/server/routers/security.py +0 -0
  107. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/server/routers/smoke_reports.py +0 -0
  108. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/server/routers/test_profiles.py +0 -0
  109. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/server/routers/tests.py +0 -0
  110. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/server/routers/tools.py +0 -0
  111. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/server/state.py +0 -0
  112. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/smoke_test.py +0 -0
  113. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/src/__init__.py +0 -0
  114. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/src/baseline.py +0 -0
  115. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/src/ci_gate.py +0 -0
  116. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/src/coverage_analyzer.py +0 -0
  117. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/src/html_report.py +0 -0
  118. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/src/mcp_client.py +0 -0
  119. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/src/metamorphic.py +0 -0
  120. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/src/models.py +0 -0
  121. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/src/multi_env.py +0 -0
  122. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/src/oauth_flows.py +0 -0
  123. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/src/prompt_mutation.py +0 -0
  124. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/src/report_generator.py +0 -0
  125. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/src/runner_tools.py +0 -0
  126. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/src/schema_diff.py +0 -0
  127. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/src/token_manager.py +0 -0
  128. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/test_profiles.py +0 -0
  129. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/README.md +0 -0
  130. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/index.html +0 -0
  131. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/package-lock.json +0 -0
  132. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/package.json +0 -0
  133. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/postcss.config.js +0 -0
  134. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/App.jsx +0 -0
  135. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/components/CommandPalette.jsx +0 -0
  136. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/components/CompareToolsTab.jsx +0 -0
  137. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/components/ErrorAlert.jsx +0 -0
  138. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/components/ErrorBoundary.jsx +0 -0
  139. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/components/LLMProfileSelector.jsx +0 -0
  140. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/components/LoadingSpinner.jsx +0 -0
  141. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/components/MCPProfileSelector.jsx +0 -0
  142. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/components/NotificationProvider.jsx +0 -0
  143. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/components/OptimizeDocsModal.jsx +0 -0
  144. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/components/OutputDiff.jsx +0 -0
  145. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/components/ParameterCard.jsx +0 -0
  146. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/components/SchemaCodeViewer.jsx +0 -0
  147. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/components/SkeletonLoader.jsx +0 -0
  148. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/components/TestGenerationModal.jsx +0 -0
  149. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/components/TestProfileSelector.jsx +0 -0
  150. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/components/TestResultPanel.jsx +0 -0
  151. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/components/TestStatusIndicator.jsx +0 -0
  152. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/components/ToolComparison.jsx +0 -0
  153. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/components/ToolDebugModal.jsx +0 -0
  154. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/components/TraceView.jsx +0 -0
  155. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/components/TypeBadge.jsx +0 -0
  156. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/components/Wizard.jsx +0 -0
  157. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/contexts/TestRunContext.jsx +0 -0
  158. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/contexts/ThemeContext.jsx +0 -0
  159. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/hooks/useEditorTheme.js +0 -0
  160. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/hooks/useKeyboardShortcuts.js +0 -0
  161. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/hooks/useSafeFetch.js +0 -0
  162. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/index.css +0 -0
  163. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/main.jsx +0 -0
  164. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/pages/AuthDebugger.jsx +0 -0
  165. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/pages/CompatibilityMatrix.jsx +0 -0
  166. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/pages/Configuration.jsx +0 -0
  167. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/pages/GenerationHistory.jsx +0 -0
  168. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/pages/LLMProfiles.jsx +0 -0
  169. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/pages/MCPHealth.jsx +0 -0
  170. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/pages/MCPProfiles.jsx +0 -0
  171. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/pages/MetricsDashboard.jsx +0 -0
  172. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/pages/ProfilesManager.jsx +0 -0
  173. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/pages/Reports.jsx +0 -0
  174. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/pages/RunComparison.jsx +0 -0
  175. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/pages/SecurityDashboard.jsx +0 -0
  176. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/utils/__tests__/formatConverters.test.js +0 -0
  177. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/src/utils/formatConverters.js +0 -0
  178. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy/ui/vite.config.js +0 -0
  179. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy.egg-info/dependency_links.txt +0 -0
  180. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy.egg-info/entry_points.txt +0 -0
  181. {testmcpy-0.3.0 → testmcpy-0.3.2}/testmcpy.egg-info/top_level.txt +0 -0
@@ -0,0 +1,552 @@
1
+ Metadata-Version: 2.4
2
+ Name: testmcpy
3
+ Version: 0.3.2
4
+ Summary: A comprehensive testing framework for validating LLM tool calling capabilities with MCP services
5
+ Author: Amin Ghadersohi
6
+ License-Expression: Apache-2.0
7
+ Project-URL: Homepage, https://github.com/preset-io/testmcpy
8
+ Project-URL: Repository, https://github.com/preset-io/testmcpy
9
+ Project-URL: Issues, https://github.com/preset-io/testmcpy/issues
10
+ Classifier: Development Status :: 4 - Beta
11
+ Classifier: Intended Audience :: Developers
12
+ Classifier: Programming Language :: Python :: 3
13
+ Classifier: Programming Language :: Python :: 3.10
14
+ Classifier: Programming Language :: Python :: 3.11
15
+ Classifier: Programming Language :: Python :: 3.12
16
+ Requires-Python: <3.13,>=3.10
17
+ Description-Content-Type: text/markdown
18
+ License-File: LICENSE
19
+ License-File: NOTICE
20
+ Requires-Dist: typer<1.0.0,>=0.9.0
21
+ Requires-Dist: rich<15.0.0,>=13.0.0
22
+ Requires-Dist: pyyaml<7.0,>=6.0
23
+ Requires-Dist: requests<3.0.0,>=2.28.0
24
+ Requires-Dist: aiohttp<4.0.0,>=3.8.0
25
+ Requires-Dist: ollama>=0.1.0
26
+ Requires-Dist: anthropic<1.0.0,>=0.39.0
27
+ Requires-Dist: fastmcp<3.0.0,>=2.0.0
28
+ Requires-Dist: httpx<1.0.0,>=0.27.0
29
+ Requires-Dist: python-dotenv<2.0.0,>=1.0.0
30
+ Requires-Dist: click<9.0.0,>=8.0.0
31
+ Requires-Dist: shellingham<2.0.0,>=1.3.0
32
+ Requires-Dist: textual<1.0.0,>=0.47.0
33
+ Requires-Dist: sqlalchemy<3.0.0,>=2.0.0
34
+ Requires-Dist: alembic<2.0.0,>=1.13.0
35
+ Provides-Extra: dev
36
+ Requires-Dist: ruff>=0.8.0; extra == "dev"
37
+ Requires-Dist: mypy>=1.13.0; extra == "dev"
38
+ Requires-Dist: pytest>=7.0.0; extra == "dev"
39
+ Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
40
+ Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
41
+ Requires-Dist: pre-commit>=3.0.0; extra == "dev"
42
+ Requires-Dist: build>=1.0.0; extra == "dev"
43
+ Requires-Dist: twine>=5.0.0; extra == "dev"
44
+ Requires-Dist: types-pyyaml>=6.0.0; extra == "dev"
45
+ Requires-Dist: types-requests>=2.28.0; extra == "dev"
46
+ Requires-Dist: textual-dev>=1.0.0; extra == "dev"
47
+ Provides-Extra: server
48
+ Requires-Dist: fastapi<1.0.0,>=0.104.0; extra == "server"
49
+ Requires-Dist: uvicorn[standard]<1.0.0,>=0.24.0; extra == "server"
50
+ Requires-Dist: websockets<15.0,>=14.0; extra == "server"
51
+ Provides-Extra: sdk
52
+ Requires-Dist: claude-agent-sdk>=0.1.0; extra == "sdk"
53
+ Provides-Extra: tui
54
+ Requires-Dist: textual>=0.85.0; extra == "tui"
55
+ Provides-Extra: e2e
56
+ Requires-Dist: playwright>=1.40.0; extra == "e2e"
57
+ Requires-Dist: pytest-playwright>=0.4.0; extra == "e2e"
58
+ Provides-Extra: export
59
+ Requires-Dist: pandas<3.0.0,>=2.0.0; extra == "export"
60
+ Provides-Extra: bedrock
61
+ Requires-Dist: boto3>=1.28.0; extra == "bedrock"
62
+ Provides-Extra: all
63
+ Requires-Dist: fastapi<1.0.0,>=0.104.0; extra == "all"
64
+ Requires-Dist: uvicorn[standard]<1.0.0,>=0.24.0; extra == "all"
65
+ Requires-Dist: websockets<15.0,>=14.0; extra == "all"
66
+ Requires-Dist: claude-agent-sdk>=0.1.0; extra == "all"
67
+ Requires-Dist: textual>=0.85.0; extra == "all"
68
+ Requires-Dist: boto3>=1.28.0; extra == "all"
69
+ Dynamic: license-file
70
+
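Annotation (not part of the diffed file): the header block above uses the RFC 822-style core metadata layout, so it can be read with the standard library's email parser. The sketch below is only an illustration on a shortened copy of these headers; the `SAMPLE` string and the `core`/`extras` names are invented for the example, and the marker handling is deliberately simplistic (it only understands `extra == "..."`).

```python
from email.parser import Parser

# Shortened copy of the metadata headers shown above (illustrative subset).
SAMPLE = """\
Metadata-Version: 2.4
Name: testmcpy
Version: 0.3.2
Requires-Dist: typer<1.0.0,>=0.9.0
Requires-Dist: fastapi<1.0.0,>=0.104.0; extra == "server"
Requires-Dist: boto3>=1.28.0; extra == "bedrock"
Provides-Extra: server
Provides-Extra: bedrock
"""

meta = Parser().parsestr(SAMPLE)

core = []    # requirements installed unconditionally
extras = {}  # extra name -> requirements pulled in by that extra

for req in meta.get_all("Requires-Dist", []):
    spec, _, marker = req.partition(";")
    if "extra ==" in marker:
        # Naive marker handling: take the quoted name after `extra ==`.
        extra = marker.split("==", 1)[1].strip().strip('"')
        extras.setdefault(extra, []).append(spec.strip())
    else:
        core.append(spec.strip())

print(meta["Name"], meta["Version"])
print("core:", core)
print("extras:", extras)
```

Grouping requirements this way makes it easier to see which dependencies an extra such as `bedrock` adds on top of the base install.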
71
+ <p align="center">
72
+ <img src="https://raw.githubusercontent.com/preset-io/testmcpy/main/docs/logos/logo.svg" alt="testmcpy logo" width="600">
73
+ </p>
74
+
75
+ <p align="center">
76
+ <strong>Test and benchmark LLMs with MCP tools in minutes.</strong>
77
+ </p>
78
+
79
+ <p align="center">
80
+ A testing framework for validating how LLMs call tools via Model Context Protocol (MCP) — compare Claude, GPT-4, Llama, and other models' accuracy, cost, and performance.
81
+ </p>
82
+
83
+ <p align="center">
84
+ <a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.10+-blue.svg" alt="Python 3.10+"></a>
85
+ <a href="LICENSE"><img src="https://img.shields.io/badge/License-Apache%202.0-blue.svg" alt="License"></a>
86
+ <a href="https://pypi.org/project/testmcpy/"><img src="https://img.shields.io/badge/pypi-testmcpy-blue" alt="PyPI"></a>
87
+ </p>
88
+
89
+ ![MCP Explorer](https://raw.githubusercontent.com/preset-io/testmcpy/main/context/images/web-ui-explorer.png)
90
+
91
+ ---
92
+
93
+ **[Documentation](context/)** | **[Examples](examples/)** | **[Contributing](CONTRIBUTING.md)** | **[Discussions](https://github.com/preset-io/testmcpy/discussions)**
94
+
95
+ ---
96
+
97
+ ## Why testmcpy?
98
+
99
+ - **Validate tool calling**: Ensure LLMs call the right tools with correct parameters
100
+ - **Compare models**: Find the best price/performance balance for your use case
101
+ - **Prevent regressions**: Catch breaking changes in your MCP service with CI/CD
102
+ - **Optimize costs**: Track token usage and identify the most cost-effective models
103
+
104
+ ## Quick Start
105
+
106
+ ```bash
107
+ # Install testmcpy
108
+ pip install testmcpy
109
+
110
+ # Run interactive setup
111
+ testmcpy setup
112
+
113
+ # Start testing
114
+ testmcpy chat # Interactive chat with MCP tools
115
+ testmcpy research # Test LLM tool-calling capabilities
116
+ testmcpy run tests/ # Run your test suite
117
+ ```
118
+
119
+ That's it! No complex configuration needed to get started.
120
+
121
+ ## Key Features
122
+
123
+ ### Multi-Provider LLM Support
124
+
125
+ Test with **Claude**, **GPT-4**, **Llama**, and other models. Works with both paid APIs and free local models via Ollama. Includes a Claude SDK provider for subprocess-based MCP support.
126
+
127
+ | Provider | Models | Features |
128
+ |----------|--------|----------|
129
+ | Anthropic | claude-opus-4, claude-sonnet-4-5, claude-haiku-4-5 | Native MCP, extended thinking, vision, token caching |
130
+ | OpenAI | gpt-4, gpt-4-turbo, gpt-4o | Function calling, vision, cost tracking |
131
+ | Ollama | Llama, Mistral, etc. (local) | Free, local execution, no API costs |
132
+ | Claude SDK | claude-cli, claude-code | Subprocess-based, full MCP support |
133
+
134
+ ![LLM Profiles](https://raw.githubusercontent.com/preset-io/testmcpy/main/context/images/model-selector.png)
135
+
136
+ ### Built-in Evaluators
137
+
138
+ Comprehensive validation out of the box. Each evaluator returns a score from 0.0 to 1.0 with pass/fail status and detailed reasoning.
139
+
140
+ **Tool Calling:**
141
+ - `was_mcp_tool_called` — Verify specific tool was invoked (supports prefix/gateway matching)
142
+ - `tool_call_count` — Validate number of tool calls
143
+ - `tool_called_with_parameter` — Check specific parameter was passed (fuzzy matching)
144
+ - `tool_called_with_parameters` — Validate multiple parameters at once
145
+ - `parameter_value_in_range` — Ensure numeric parameters are within bounds
146
+
147
+ **Execution & Performance:**
148
+ - `execution_successful` — Check for errors or failures in tool results
149
+ - `within_time_limit` — Performance validation against max_seconds
150
+ - `final_answer_contains` — Validate response content
151
+ - `token_usage_reasonable` — Cost efficiency validation
152
+ - `response_time_acceptable` — Latency threshold checking
153
+ - `auth_successful` — Authentication flow validation
154
+
155
+ **Extensible:** Extend `BaseEvaluator` and implement `evaluate(context) -> EvalResult` to create custom evaluators for your domain.
156
+
157
+ ![Test Results](https://raw.githubusercontent.com/preset-io/testmcpy/main/context/images/test-results.png)
158
+
159
+ ### YAML Test Definitions
160
+
161
+ Define test suites as code for repeatable, version-controlled testing:
162
+
163
+ ```yaml
164
+ version: "1.0"
165
+ name: "Chart Operations Test Suite"
166
+
167
+ config:
168
+   timeout: 30
169
+   model: "claude-sonnet-4-5"
170
+   provider: "anthropic"
171
+
172
+ tests:
173
+   - name: "test_create_chart"
174
+     prompt: "Create a bar chart showing sales by region"
175
+     evaluators:
176
+       - name: "was_mcp_tool_called"
177
+         args:
178
+           tool_name: "create_chart"
179
+       - name: "execution_successful"
180
+
181
+   # Multi-turn test
182
+   - name: "test_multi_turn"
183
+     steps:
184
+       - prompt: "List all dashboards"
185
+         evaluators:
186
+           - name: "was_mcp_tool_called"
187
+             args:
188
+               tool_name: "list_dashboards"
189
+       - prompt: "Show me the first one"
190
+         evaluators:
191
+           - name: "final_answer_contains"
192
+             args:
193
+               content: "dashboard"
194
+
195
+   # Load testing
196
+   - name: "test_load"
197
+     prompt: "List dashboards"
198
+     load_test:
199
+       concurrent: 5
200
+       duration: 60
201
+ ```
202
+
203
+ ### Interactive TUI Dashboard
204
+
205
+ Beautiful terminal interface for MCP testing — no browser required:
206
+
207
+ ```bash
208
+ testmcpy dash # Launch interactive dashboard
209
+ testmcpy dash --auto-refresh # Live connection monitoring
210
+ testmcpy dash --profile prod # Use specific MCP profile
211
+ ```
212
+
213
+ **TUI Features:**
214
+ - Real-time MCP connection status
215
+ - Interactive tool exploration
216
+ - Live test execution with progress
217
+ - Configuration editor
218
+ - Global search across tools, tests, and settings
219
+ - Help system with keyboard shortcuts (press `?`)
220
+ - Multiple themes (default, light, high contrast)
221
+
222
+ ### CLI & Web UI
223
+
224
+ - **Rich terminal UI**: Progress bars, colored output, formatted tables
225
+ - **Optional web interface**: Visual tool explorer, interactive chat, analytics dashboards
226
+ - **Real-time feedback**: Watch tests execute with live updates via WebSocket
227
+
228
+ ![Chat Interface](https://raw.githubusercontent.com/preset-io/testmcpy/main/context/images/cli-interface.png)
229
+
230
+ ## Architecture
231
+
232
+ testmcpy connects your LLM provider to your MCP service and validates the interactions:
233
+
234
+ ```mermaid
235
+ graph TB
236
+ subgraph UI["User Interface Layer"]
237
+ CLI["CLI Commands<br>(Typer)"]
238
+ WebUI["Web UI<br>(React + Vite + Tailwind)"]
239
+ TUI["Terminal Dashboard<br>(Textual)"]
240
+ end
241
+
242
+ subgraph Core["Core Framework"]
243
+ Runner["Test Runner"]
244
+ LLM["LLM Integration"]
245
+ Evals["Evaluators"]
246
+ end
247
+
248
+ subgraph MCP_Layer["MCP Integration Layer"]
249
+ Client["MCP Client<br>(FastMCP)"]
250
+ Auth["Auth Manager"]
251
+ Discovery["Tool Discovery"]
252
+ end
253
+
254
+ subgraph External["External Services"]
255
+ LLM_APIs["LLM APIs<br>(Anthropic, OpenAI, Ollama)"]
256
+ MCP_Services["MCP Services<br>(HTTP/SSE)"]
257
+ Storage["Storage<br>(SQLite + JSON)"]
258
+ end
259
+
260
+ UI --> Core
261
+ Core --> MCP_Layer
262
+ MCP_Layer --> External
263
+ Core --> External
264
+ ```
265
+
266
+ **How it works:**
267
+ 1. Define test cases in YAML with prompts and expected behavior
268
+ 2. testmcpy sends prompts to your chosen LLM (Claude, GPT-4, Llama, etc.)
269
+ 3. LLM calls tools via MCP protocol to your service
270
+ 4. Evaluators validate tool selection, parameters, execution, and performance
271
+ 5. Get detailed pass/fail results with metrics and cost analysis
272
+
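Annotation (not part of the diffed file): the five steps above can be sketched end to end with stand-ins. Everything here is hypothetical: `fake_llm` replaces a real model, the `ToolCall` and `EvalResult` dataclasses are local to the example rather than testmcpy's own classes, and `was_tool_called` only mimics the idea behind the `was_mcp_tool_called` evaluator.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """A tool invocation the model decided to make (example-local type)."""
    name: str
    arguments: dict = field(default_factory=dict)


@dataclass
class EvalResult:
    """Pass/fail plus a 0.0-1.0 score and a reason (example-local type)."""
    passed: bool
    score: float
    reason: str


def fake_llm(prompt: str) -> list[ToolCall]:
    """Stand-in for steps 2-3: pick a tool from keywords in the prompt."""
    if "chart" in prompt.lower():
        return [ToolCall("create_chart", {"chart_type": "bar"})]
    return []


def was_tool_called(tool_calls: list[ToolCall], tool_name: str) -> EvalResult:
    """Stand-in for step 4: pass if any call used the expected tool."""
    hit = any(call.name == tool_name for call in tool_calls)
    return EvalResult(hit, 1.0 if hit else 0.0, f"{tool_name} called: {hit}")


def run_test(test: dict) -> list[EvalResult]:
    """Steps 1-5 in miniature: prompt in, evaluator results out."""
    tool_calls = fake_llm(test["prompt"])
    return [was_tool_called(tool_calls, name) for name in test["expected_tools"]]


test_case = {
    "prompt": "Create a bar chart showing sales by region",
    "expected_tools": ["create_chart"],
}
results = run_test(test_case)
for result in results:
    print(result)
```

In a real run the model call, the MCP round trip, and the evaluators come from the framework; the sketch only shows how the pieces relate.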
273
+ ## Installation
274
+
275
+ ```bash
276
+ # Install base package
277
+ pip install testmcpy
278
+
279
+ # With web UI support
280
+ pip install 'testmcpy[server]'
281
+
282
+ # All optional features
283
+ pip install 'testmcpy[all]'
284
+ ```
285
+
286
+ **Requirements:** Python 3.10-3.12
287
+
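Annotation (not part of the diffed file): the stated requirement matches the `Requires-Python: <3.13,>=3.10` header earlier in this file. A minimal sketch of the same bound as a runtime check, with `python_supported` being a name invented for this example:

```python
import sys


def python_supported(version: tuple[int, int]) -> bool:
    """True when (major, minor) satisfies >=3.10 and <3.13."""
    return (3, 10) <= version < (3, 13)


current = sys.version_info[:2]
print(f"Python {current[0]}.{current[1]} supported: {python_supported(current)}")
```

Tuple comparison keeps the check readable and avoids string comparisons, which would wrongly order "3.9" after "3.10".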
288
+ ## Getting Started
289
+
290
+ ### 1. Configuration
291
+
292
+ Run the interactive setup wizard:
293
+
294
+ ```bash
295
+ testmcpy setup
296
+ ```
297
+
298
+ This creates two config files:
299
+
300
+ **`.llm_providers.yaml`** — LLM configuration:
301
+
302
+ ```yaml
303
+ default: prod
304
+
305
+ profiles:
306
+   prod:
307
+     name: "Production"
308
+     providers:
309
+       - name: "Claude Sonnet"
310
+         provider: "anthropic"
311
+         model: "claude-sonnet-4-5"
312
+         api_key: "your-anthropic-api-key"
313
+         timeout: 60
314
+         default: true
315
+ ```
316
+
317
+ **`.mcp_services.yaml`** — MCP server profiles:
318
+
319
+ ```yaml
320
+ default: prod
321
+
322
+ profiles:
323
+   prod:
324
+     name: "Production"
325
+     mcps:
326
+       - name: "My MCP Service"
327
+         mcp_url: "https://your-service.example.com/mcp"
328
+         auth:
329
+           auth_type: "jwt" # or "bearer", "oauth", "none"
330
+           api_url: "https://auth.example.com/v1/auth/"
331
+           api_token: "your-api-token"
332
+           api_secret: "your-api-secret"
333
+         timeout: 30
334
+         rate_limit_rpm: 60
335
+         default: true
336
+ ```
337
+
338
+ **Configuration priority:** CLI options > Profile files > `.env` > User config (`~/.testmcpy`) > Environment variables > Built-in defaults
339
+
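Annotation (not part of the diffed file): a priority chain like the one above is a first-match lookup across layers. The sketch below models it with `collections.ChainMap`; this is an illustration of the ordering only, not how testmcpy resolves settings internally, and every key and value in the layers is invented.

```python
from collections import ChainMap

# One dict per layer, listed from highest to lowest priority.
cli_options = {"model": "claude-haiku-4-5"}
profile_files = {"model": "claude-sonnet-4-5", "timeout": 60}
dotenv_file = {}
user_config = {"provider": "anthropic"}
environment = {"timeout": 30}
defaults = {"model": "example-model", "provider": "ollama", "timeout": 10, "verbose": False}

# ChainMap searches the mappings left to right and returns the first hit.
settings = ChainMap(
    cli_options, profile_files, dotenv_file, user_config, environment, defaults
)

print(settings["model"])     # taken from cli_options
print(settings["timeout"])   # taken from profile_files, ahead of environment
print(settings["provider"])  # taken from user_config
print(settings["verbose"])   # falls through to defaults
```

A lookup stops at the first layer that defines the key, so a CLI flag overrides a profile value without modifying the profile.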
340
+ The setup command is **idempotent** — safe to run multiple times. Use `--force` to overwrite existing files.
341
+
342
+ ### 2. Explore Your MCP Service
343
+
344
+ ```bash
345
+ # List available MCP tools
346
+ testmcpy tools
347
+
348
+ # Interactive chat to explore your tools
349
+ testmcpy chat
350
+
351
+ # Run automated research on tool-calling capabilities
352
+ testmcpy research --model claude-haiku-4-5
353
+ ```
354
+
355
+ ### 3. Create and Run Test Suites
356
+
357
+ ```yaml
358
+ # tests/my_tests.yaml
359
+ version: "1.0"
360
+ name: "My MCP Service Tests"
361
+
362
+ tests:
363
+   - name: "test_tool_selection"
364
+     prompt: "Create a bar chart showing sales by region"
365
+     evaluators:
366
+       - name: "was_mcp_tool_called"
367
+         args:
368
+           tool_name: "create_chart"
369
+       - name: "execution_successful"
370
+       - name: "within_time_limit"
371
+         args:
372
+           max_seconds: 30
373
+ ```
374
+
375
+ ```bash
376
+ testmcpy run tests/ --model claude-haiku-4-5
377
+ ```
378
+
379
+ ## Commands Reference
380
+
381
+ | Command | Description |
382
+ |---------|-------------|
383
+ | **Setup** | |
384
+ | `testmcpy setup` | Interactive configuration wizard |
385
+ | `testmcpy doctor` | Diagnose installation issues |
386
+ | **Discovery** | |
387
+ | `testmcpy tools` | List available MCP tools |
388
+ | `testmcpy profiles` | List MCP profiles (table) |
389
+ | `testmcpy status` | Show MCP connection status |
390
+ | `testmcpy explore-cli` | Browse tools (non-interactive) |
391
+ | **Testing** | |
392
+ | `testmcpy run <path>` | Execute test suite |
393
+ | `testmcpy research` | Test LLM tool-calling capabilities |
394
+ | `testmcpy chat` | Interactive chat with MCP tools |
395
+ | `testmcpy compare` | Multi-model comparison |
396
+ | **Advanced** | |
397
+ | `testmcpy baseline` | Save and compare against baselines |
398
+ | `testmcpy mutate` | Prompt mutation testing |
399
+ | `testmcpy metamorphic` | Metamorphic testing |
400
+ | **UI** | |
401
+ | `testmcpy serve` | Start web UI server (port 8000) |
402
+ | `testmcpy dash` | Launch terminal UI dashboard |
403
+ | `testmcpy config-cmd` | View current configuration |
404
+
405
+ **Common options:** `--profile`, `--llm-profile`, `--model`, `--provider`, `--timeout`, `--verbose`, `--output`
406
+
407
+ ## Web Interface
408
+
409
+ Optional React-based UI with 15+ pages for visual testing and analytics:
410
+
411
+ ![Test Manager](https://raw.githubusercontent.com/preset-io/testmcpy/main/context/images/web-ui-dashboard.png)
412
+
413
+ ```bash
414
+ # Install with UI support
415
+ pip install 'testmcpy[server]'
416
+
417
+ # Start server
418
+ testmcpy serve
419
+ ```
420
+
421
+ | Route | Page | Description |
422
+ |-------|------|-------------|
423
+ | `/` | MCP Explorer | Tool discovery, smoke tests, schema viewing |
424
+ | `/tests` | Test Manager | YAML test browser, execution, results |
425
+ | `/reports` | Reports | All test results, evaluations, cost analysis |
426
+ | `/chat` | Chat Interface | Multi-turn conversation with MCP tools |
427
+ | `/metrics` | Metrics Dashboard | Performance and cost analytics |
428
+ | `/compare` | Run Comparison | Side-by-side model comparison |
429
+ | `/compatibility` | Compatibility Matrix | Tool/model compatibility view |
430
+ | `/mcp-health` | MCP Health | Server health monitoring |
431
+ | `/security` | Security Dashboard | Security analysis |
432
+ | `/generation-history` | Generation History | AI test generation logs |
433
+ | `/auth-debugger` | Auth Debugger | Auth flow debugging |
434
+ | `/config` | Configuration | Settings and environment |
435
+ | `/mcp-profiles` | MCP Profiles | MCP server configuration |
436
+ | `/llm-profiles` | LLM Profiles | LLM provider configuration |
437
+
438
+ Access at `http://localhost:8000`
439
+
440
+ ## LLM Providers
441
+
442
+ ### Anthropic (Recommended)
443
+
444
+ Best tool-calling accuracy, native MCP support:
445
+
446
+ ```yaml
447
+ # .llm_providers.yaml
448
+ prod:
449
+   name: "Production"
450
+   providers:
451
+     - name: "Claude Sonnet"
452
+       provider: "anthropic"
453
+       model: "claude-sonnet-4-5"
454
+       api_key_env: "ANTHROPIC_API_KEY"
455
+       default: true
456
+ ```
457
+
458
+ ### Ollama (Free, Local)
459
+
460
+ Perfect for development without API costs:
461
+
462
+ ```bash
463
+ brew install ollama # macOS
464
+ ollama serve
465
+ ollama pull llama3.1:8b
466
+ ```
467
+
468
+ ```yaml
469
+ local:
470
+   name: "Local Only"
471
+   providers:
472
+     - name: "Ollama Llama"
473
+       provider: "ollama"
474
+       model: "llama3.1:8b"
475
+       base_url: "http://localhost:11434"
476
+       default: true
477
+ ```
478
+
479
+ ### OpenAI
480
+
481
+ ```yaml
482
+ openai:
483
+   name: "OpenAI"
484
+   providers:
485
+     - name: "GPT-4"
486
+       provider: "openai"
487
+       model: "gpt-4-turbo"
488
+       api_key_env: "OPENAI_API_KEY"
489
+       default: true
490
+ ```
491
+
492
+ ## CI/CD Integration
493
+
494
+ testmcpy ships as a GitHub Action:
495
+
496
+ ```yaml
497
+ - uses: preset-io/testmcpy@v1
498
+   with:
499
+     test-path: tests/
500
+     model: claude-haiku-4-5
501
+     mcp-profile: prod
502
+ ```
503
+
504
+ ## Custom Evaluators
505
+
506
+ Extend testmcpy with domain-specific validation:
507
+
508
+ ```python
509
+ from testmcpy.evals.base_evaluators import BaseEvaluator, EvalResult
510
+
511
+ class MyEvaluator(BaseEvaluator):
512
+     def evaluate(self, context: dict) -> EvalResult:
513
+         response = context.get("response", "")
514
+         passed = "expected" in response
515
+         return EvalResult(
516
+             passed=passed,
517
+             score=1.0 if passed else 0.0,
518
+             reason=f"Check passed: {passed}",
519
+         )
520
+ ```
521
+
522
+ See **[Evaluator Reference](context/concepts/evaluators.md)** for complete documentation.
523
+
524
+ ## Examples
525
+
526
+ Check out the `examples/` directory for:
527
+
528
+ - **Basic test suites** — Simple examples to get started
529
+ - **CI/CD integration** — GitHub Actions and GitLab CI workflows
530
+ - **Custom evaluators** — Building domain-specific validation
531
+ - **Multi-model comparison** — Benchmarking different LLMs
532
+
533
+ ## Contributing
534
+
535
+ We welcome contributions! Whether it's bug reports, feature requests, documentation improvements, or code contributions.
536
+
537
+ **[Read the Contributing Guide](CONTRIBUTING.md)** to get started.
538
+
539
+ ## Community & Support
540
+
541
+ - **Issues**: [Report bugs or request features](https://github.com/preset-io/testmcpy/issues)
542
+ - **Discussions**: [Ask questions and share ideas](https://github.com/preset-io/testmcpy/discussions)
543
+ - **Documentation**: Browse the [context/](context/) directory
544
+ - **Examples**: Explore [examples/](examples/) for sample code
545
+
546
+ ## License
547
+
548
+ Apache License 2.0 — See [LICENSE](LICENSE) for details.
549
+
550
+ ---
551
+
552
+ **Built by [@aminghadersohi](https://github.com/aminghadersohi)** ([Preset](https://preset.io), [Apache Superset](https://github.com/apache/superset)).