@langwatch/mcp-server 0.0.5 → 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (55)
  1. package/.env.example +2 -0
  2. package/.eslintrc.cjs +0 -1
  3. package/CHANGELOG.md +29 -0
  4. package/CONTRIBUTING.md +96 -0
  5. package/README.md +13 -6
  6. package/dist/index.js +7957 -1017
  7. package/dist/index.js.map +1 -1
  8. package/package.json +22 -9
  9. package/pnpm-workspace.yaml +2 -0
  10. package/pyproject.toml +17 -0
  11. package/src/index.ts +54 -11
  12. package/src/langwatch-api.ts +95 -85
  13. package/tests/evaluations.ipynb +649 -0
  14. package/tests/fixtures/azure/azure_openai_stream_bot_expected.py +102 -0
  15. package/tests/fixtures/azure/azure_openai_stream_bot_input.py +78 -0
  16. package/tests/fixtures/dspy/dspy_bot_expected.py +61 -0
  17. package/tests/fixtures/dspy/dspy_bot_input.py +53 -0
  18. package/tests/fixtures/fastapi/fastapi_app_expected.py +68 -0
  19. package/tests/fixtures/fastapi/fastapi_app_input.py +60 -0
  20. package/tests/fixtures/fastapi/prompt_management_fastapi_expected.py +114 -0
  21. package/tests/fixtures/fastapi/prompt_management_fastapi_input.py +88 -0
  22. package/tests/fixtures/haystack/haystack_bot_expected.py +141 -0
  23. package/tests/fixtures/haystack/haystack_bot_input.py +69 -0
  24. package/tests/fixtures/langchain/langchain_bot_expected.py +53 -0
  25. package/tests/fixtures/langchain/langchain_bot_input.py +45 -0
  26. package/tests/fixtures/langchain/langchain_bot_with_memory_expected.py +69 -0
  27. package/tests/fixtures/langchain/langchain_bot_with_memory_input.py +61 -0
  28. package/tests/fixtures/langchain/langchain_rag_bot_expected.py +97 -0
  29. package/tests/fixtures/langchain/langchain_rag_bot_input.py +77 -0
  30. package/tests/fixtures/langchain/langchain_rag_bot_vertex_ai_expected.py +116 -0
  31. package/tests/fixtures/langchain/langchain_rag_bot_vertex_ai_input.py +81 -0
  32. package/tests/fixtures/langchain/langgraph_rag_bot_with_threads_expected.py +331 -0
  33. package/tests/fixtures/langchain/langgraph_rag_bot_with_threads_input.py +106 -0
  34. package/tests/fixtures/litellm/litellm_bot_expected.py +40 -0
  35. package/tests/fixtures/litellm/litellm_bot_input.py +35 -0
  36. package/tests/fixtures/openai/openai_bot_expected.py +43 -0
  37. package/tests/fixtures/openai/openai_bot_function_call_expected.py +91 -0
  38. package/tests/fixtures/openai/openai_bot_function_call_input.py +82 -0
  39. package/tests/fixtures/openai/openai_bot_input.py +36 -0
  40. package/tests/fixtures/openai/openai_bot_rag_expected.py +73 -0
  41. package/tests/fixtures/openai/openai_bot_rag_input.py +51 -0
  42. package/tests/fixtures/opentelemetry/openinference_dspy_bot_expected.py +63 -0
  43. package/tests/fixtures/opentelemetry/openinference_dspy_bot_input.py +58 -0
  44. package/tests/fixtures/opentelemetry/openinference_langchain_bot_expected.py +53 -0
  45. package/tests/fixtures/opentelemetry/openinference_langchain_bot_input.py +52 -0
  46. package/tests/fixtures/opentelemetry/openinference_openai_bot_expected.py +49 -0
  47. package/tests/fixtures/opentelemetry/openinference_openai_bot_input.py +41 -0
  48. package/tests/fixtures/opentelemetry/openllmetry_openai_bot_expected.py +44 -0
  49. package/tests/fixtures/opentelemetry/openllmetry_openai_bot_input.py +40 -0
  50. package/tests/fixtures/strands/strands_bot_expected.py +84 -0
  51. package/tests/fixtures/strands/strands_bot_input.py +52 -0
  52. package/tests/scenario-openai.test.ts +158 -0
  53. package/tsconfig.json +0 -1
  54. package/uv.lock +2607 -0
  55. package/vitest.config.js +7 -0
package/.env.example ADDED
@@ -0,0 +1,2 @@
+ LANGWATCH_API_KEY=
+ ANTHROPIC_API_KEY=
package/.eslintrc.cjs CHANGED
@@ -1,4 +1,3 @@
- /** @type {import("eslint").Linter.Config} */
  const config = {
    parser: "@typescript-eslint/parser",
    parserOptions: {
package/CHANGELOG.md ADDED
@@ -0,0 +1,29 @@
+ # Changelog
+
+ ## [0.1.0](https://github.com/langwatch/langwatch/compare/mcp-server@v0.0.5...mcp-server@v0.1.0) (2025-09-19)
+
+
+ ### Features
+
+ * added auto setup functionality for langwatch mcp ([#617](https://github.com/langwatch/langwatch/issues/617)) ([8c95b07](https://github.com/langwatch/langwatch/commit/8c95b07598a74285940b0c9267368543a9ced5e0))
+ * ci/cd steps for all packages and deployables, including improvements to caching and bundle sizes ([#351](https://github.com/langwatch/langwatch/issues/351)) ([e67a169](https://github.com/langwatch/langwatch/commit/e67a1694fec2f96479266454403928e9dc68a20f))
+
+
+ ### Bug Fixes
+
+ * add missing dotenv dependency for running tests ([fb706ce](https://github.com/langwatch/langwatch/commit/fb706ceef9a298d070b264ad8b6da7c2df5e2a5d))
+ * judge agent for mcp-server test ([cd8e378](https://github.com/langwatch/langwatch/commit/cd8e3783ec02f02174ecb5fd86fa86c3f11e1734))
+ * mcp-server ci ([0ab6e51](https://github.com/langwatch/langwatch/commit/0ab6e513129d9b1fbdb7a696ce1d99ed6093dea3))
+ * run claude-code on the CI ([d760307](https://github.com/langwatch/langwatch/commit/d760307807c72a2a0e995a4f0a42845c2cc5114a))
+
+
+ ### Documentation
+
+ * add detailed markdown documentation for LangWatch eval notebook ([#618](https://github.com/langwatch/langwatch/issues/618)) ([525b62a](https://github.com/langwatch/langwatch/commit/525b62ad6ea01f122297b1a3fd1eb7e842479f19))
+ * added mcp-server contributing guide ([19d1431](https://github.com/langwatch/langwatch/commit/19d14313824663842e5bba3a98986b9b80382300))
+ * improve notebook descriptions ([fa1f267](https://github.com/langwatch/langwatch/commit/fa1f26705bfff3143dbd6d16edfdae86bd5ce6bd))
+
+
+ ### Code Refactoring
+
+ * split tool call fix helper ([c95028f](https://github.com/langwatch/langwatch/commit/c95028fba882357b33ca975e9d08ceabfe5cfc1c))
package/CONTRIBUTING.md ADDED
@@ -0,0 +1,96 @@
+ # Contributing to LangWatch MCP Server
+
+ Thank you for your interest in contributing to the LangWatch MCP Server! This guide will help you get set up for development and understand our testing approach.
+
+ ## Development Setup
+
+ ### Prerequisites
+
+ - Node.js and pnpm
+ - Python with uv package manager
+ - Git
+
+ ### Getting Started
+
+ 1. **Clone the repository and navigate to the MCP server directory:**
+ ```bash
+ git clone https://github.com/langwatch/langwatch.git
+ cd langwatch/mcp-server
+ ```
+
+ 2. **Install dependencies and build the MCP server:**
+ ```bash
+ pnpm install
+ pnpm run build
+ ```
+
+ 3. **Configure environment variables:**
+ ```bash
+ cp .env.example .env
+ ```
+
+ Fill in the following required variables in your `.env` file:
+ - `LANGWATCH_API_KEY` - Your LangWatch project API key
+ - `ANTHROPIC_API_KEY` - Your Anthropic API key for Claude Code integration
+
+ 4. **Install Python dependencies (for evaluation notebooks):**
+ ```bash
+ uv sync
+ ```
+
+ ## Testing Approach
+
+ This project follows the **[Agent Testing Pyramid](https://scenario.langwatch.ai/best-practices/the-agent-testing-pyramid/)** methodology, which provides a structured approach to testing AI agents across three layers:
+
+ ### 1. Unit Tests (Foundation)
+ Traditional software tests for deterministic components like API connections, data pipelines, and error handling.
+
+ ### 2. Evals & Optimization (Middle Layer)
+ Component-level evaluation and optimization of probabilistic AI components, including prompt effectiveness and retrieval accuracy.
+
+ ### 3. Simulations (Peak)
+ End-to-end testing that validates the complete agent behavior in realistic scenarios.
+
+ ## Running Tests
+
+ ### Quick Evaluations (Jupyter Notebook)
+
+ For rapid iteration and component testing:
+
+ ```bash
+ # Open the evaluation notebook in VS Code/Cursor
+ code tests/evaluations.ipynb
+ ```
+
+ The notebook contains lightweight tests that directly test the MCP server with a "mocked" coding agent on single files. These are useful for:
+ - Quick validation of MCP tool functionality
+ - Testing individual instrumentation patterns
+ - Rapid prototyping of new features
+
+ ### End-to-End Simulations
+
+ For comprehensive system validation:
+
+ ```bash
+ pnpm test
+ ```
+
+ This runs full simulation tests using the Scenario framework, which:
+ - Launches actual Claude Code sessions
+ - Uses the MCP server in a real development environment
+ - Tests complete workflows on entire codebases
+ - Validates that the agent can successfully instrument various AI frameworks (OpenAI, LangChain, DSPy, etc.)
+
+ When tests run successfully, you'll see:
+ - LangWatch Scenario interface opening
+ - Terminal output showing Claude Code using MCP tools
+ - Validation of code instrumentation at the end of each scenario
+
+ ## Questions?
+
+ If you encounter any issues or have questions about the setup, please:
+ - Check existing GitHub issues
+ - Create a new issue with detailed reproduction steps
+ - Join our Discord community for real-time support
+
+ Happy contributing! 🚀
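Condensed, the development loop the new CONTRIBUTING.md describes comes down to a handful of commands. A minimal sketch, using exactly the commands from the guide above; the filename filter in the last comment is an assumption that the package's `test` script wraps vitest:

```bash
# One-time setup, as described in the new CONTRIBUTING.md
git clone https://github.com/langwatch/langwatch.git
cd langwatch/mcp-server
pnpm install && pnpm run build
cp .env.example .env   # then fill in LANGWATCH_API_KEY and ANTHROPIC_API_KEY
uv sync                # Python dependencies for the evaluation notebooks

# End-to-end simulations (Scenario framework driving real Claude Code sessions)
pnpm test
# Assumption: if the test script wraps vitest, a filename filter narrows the run,
# e.g. `pnpm test scenario-openai`
```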
package/README.md CHANGED
@@ -1,8 +1,8 @@
  # LangWatch 🏰 MCP Server

- The LangWatch MCP Server is a tool designed to aid finding, searching, and looking up LLM traces from the LangWatch platform via the [Model Context Protocol](https://modelcontextprotocol.io/introduction).
+ The LangWatch MCP Server is a tool designed to automatically instrument your AI code with LangWatch monitoring via the [Model Context Protocol](https://modelcontextprotocol.io/introduction).

- This server facilitates LLM development by allowing the agent to search for traces, understand all the steps in between a problematic output and try to fix the issue.
+ This server facilitates LLM development by helping AI coding assistants automatically add LangWatch instrumentation to your codebase, then use those traces to analyze and debug the very AI agents they're building.

  ## Setup in your Codebase

@@ -15,8 +15,9 @@ Check out [LangWatch integration guide](https://docs.langwatch.ai/integration/ov
  3. Set the "name" as "LangWatch"
  4. Set the "type" to `command`
  5. Set the "command" to `npx -y @langwatch/mcp-server --apiKey=sk-lw-...`
+
  - `--apiKey`: Your LangWatch API key. This is mandatory and must be provided.
- - `--endpoint`: *Optional* The endpoint for the LangWatch API. Defaults to `https://app.langwatch.ai` if not specified.
+ - `--endpoint`: _Optional_ The endpoint for the LangWatch API. Defaults to `https://app.langwatch.ai` if not specified.

  > [!TIP]
  > To aid in securing your keys, the MCP will first look at the global system environment variables `LANGWATCH_API_KEY` and `LANGWATCH_ENDPOINT` to check if they have values as well as looking at arguments passed into the server on start.
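In practice, the tip in the hunk above means the API key never has to appear in the editor's MCP command at all. A minimal sketch of the environment-variable route, with a placeholder key and the default endpoint shown explicitly:

```bash
# These are checked by the server before the --apiKey / --endpoint arguments.
export LANGWATCH_API_KEY=sk-lw-...
export LANGWATCH_ENDPOINT=https://app.langwatch.ai   # optional; this is the default
npx -y @langwatch/mcp-server                          # no --apiKey flag required
```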
@@ -31,6 +32,12 @@ Check out [LangWatch integration guide](https://docs.langwatch.ai/integration/ov

  The MCP Server provides the following tools:

+ ### `fetch_langwatch_docs`
+
+ - **Description:** Fetches the LangWatch docs for understanding how to implement LangWatch in your codebase.
+ - **Parameters:**
+ - `url`: (Optional) The full url of the specific doc page. If not provided, the docs index will be fetched.
+
  ### `get_latest_traces`

  - **Description:** Retrieves the latest LLM traces.
@@ -49,12 +56,13 @@ The MCP Server provides the following tools:
  To use these tools within Cursor, follow these steps:

  1. **Open the Cursor Chat view:**
- - `Cmd + I`
+
+ - `Cmd + I`

  2. **Ensure the MCP server is running:**

  3. **Interact with your Agent:**
- - Ask a question like the following to test the tools are accessible: *Note: When the tool is detected, you'll need to run `Run tool` in the chat view for it to be called.
+ - Ask a question like the following to test the tools are accessible: \*Note: When the tool is detected, you'll need to run `Run tool` in the chat view for it to be called.

  > "I just ran into an issue while debugging, can you check the latest traces and fix it?"

@@ -64,7 +72,6 @@ To use these tools within Cursor, follow these steps:
  <img alt="LangWatch Logo" src="../assets/mcp-server/cursor-example.light.webp" width="900">
  </picture>

-
  ## 🛟 Support

  If you have questions or need help, join our community: