coala-cli 0.2.0__tar.gz

@@ -0,0 +1,157 @@
+ Metadata-Version: 2.1
+ Name: coala-cli
+ Version: 0.2.0
+ Summary: Convert any CMD tool into a LLM agent
+ License: MIT
+ Author: Qiang
+ Requires-Python: >=3.12,<3.14
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.12
+ Requires-Dist: cwltool (>=3.1.20240909164951,<4.0.0)
+ Requires-Dist: fastapi (>=0.114.0)
+ Requires-Dist: mcp (>=1.9.0,<2.0.0)
+ Requires-Dist: pydantic (>=2.9.2,<3.0.0)
+ Requires-Dist: requests (>=2.32.3,<3.0.0)
+ Requires-Dist: uvicorn (>=0.30.6,<0.31.0)
+ Description-Content-Type: text/markdown
+
+ # coala-cli
+
+ ## Overview
+
+ Coala is a Python package that provides a standards-based framework for turning command-line tools into reproducible, agent-accessible toolsets that support natural-language interaction.
+
+ ## How the Framework Works
+
+ Coala integrates the [Common Workflow Language (CWL)](https://www.commonwl.org/specification/) with the [Model Context Protocol (MCP)](https://modelcontextprotocol.io/docs/getting-started/intro) to standardize tool execution. This lets Large Language Model (LLM) agents discover and run tools through structured interfaces, while each tool runs in the containerized environment that its CWL description specifies, which keeps analyses reproducible.
+
+ ### Core Components
+ - **Client Layer:** Any MCP-compliant client application (e.g., Claude Desktop, Cursor, or custom interfaces) that uses LLMs (such as Gemini, GPT-5, or Claude) to enable natural language interaction.
+ - **Bridge Layer:** A local, generic MCP server that acts as a schema translator. Unlike MCP servers that need a hand-written Python wrapper for each tool, the bridge layer parses CWL definitions automatically and exposes the CWL-described command-line tools as callable MCP tools.
+ - **Execution Layer:** A standard CWL runner that executes the underlying binaries within containerized environments (Docker). This keeps analyses reproducible and isolated from the host system's dependencies.
+
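As a rough illustration of the schema translation the bridge layer performs, the sketch below maps CWL input type declarations onto Python types. The function name `cwl_type_to_python` and the lookup table are hypothetical and not part of coala's API; treating `long` as `int` and `Directory` as a path string are assumptions based on the CWL primitive types.

```python
from typing import Any, List, Optional

# CWL type name -> Python type (File and Directory are passed as path strings).
_PRIMITIVES = {
    "string": str, "int": int, "long": int,
    "float": float, "double": float, "boolean": bool,
    "File": str, "Directory": str,
}

def cwl_type_to_python(cwl_type: Any) -> Any:
    """Translate a CWL type declaration into a Python type annotation."""
    # A list such as ["null", "File"] is a union; "null" marks the input optional.
    if isinstance(cwl_type, list):
        non_null = [t for t in cwl_type if t != "null"]
        inner = cwl_type_to_python(non_null[0]) if non_null else str
        return Optional[inner] if "null" in cwl_type else inner
    # Expanded array form: {"type": "array", "items": "float"}.
    if isinstance(cwl_type, dict) and cwl_type.get("type") == "array":
        return List[cwl_type_to_python(cwl_type.get("items", "string"))]
    name = str(cwl_type)
    # Shorthand forms: "float[]" is an array, "int?" is optional.
    if name.endswith("[]"):
        return List[cwl_type_to_python(name[:-2])]
    if name.endswith("?"):
        return Optional[cwl_type_to_python(name[:-1])]
    # Loaded schemas may use qualified names such as "org.w3id.cwl.cwl.File".
    return _PRIMITIVES.get(name.rsplit(".", 1)[-1], str)
```

Unknown types fall back to `str` here, so a client can still pass a value through even when the sketch does not recognize the declaration.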
+ ### Quick Start
+
+ 1. **Initialize:** Create a local MCP server instance using `mcp_api()`.
+ 2. **Register:** Load your domain-specific tools described in CWL via `add_tool()` (supports local files or repositories).
+ 3. **Serve:** Start the MCP server using `mcp.serve()`.
+
+ ### The Workflow
+
+ - **Interact:** The user sends a natural language query to the MCP Client (e.g., Claude Desktop).
+ - **Discover & Select:** The Client retrieves the tool list from the MCP server. The LLM selects the appropriate tool and sends a structured request for the analysis.
+ - **Execute:** Coala translates this selection into a CWL job and executes it within a container (Docker), ensuring reproducibility.
+ - **Respond:** The execution logs and results are returned to the LLM, which interprets them and presents the final answer to the user.
+
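To make the Execute step more concrete, here is a minimal sketch of turning the arguments of a tool call into a CWL input object. `build_cwl_job`, its parameters, and the sample argument names are hypothetical and used only for illustration; coala's internal translation may differ. The one convention relied on comes from the CWL specification: a file input is an object with `class: File` and a `path` or `location`.

```python
import json
import os
from typing import Any, Dict, Iterable

def build_cwl_job(arguments: Dict[str, Any], file_inputs: Iterable[str]) -> Dict[str, Any]:
    """Convert tool-call arguments into a CWL input (job) object."""
    file_names = set(file_inputs)
    job: Dict[str, Any] = {}
    for name, value in arguments.items():
        if value is None:
            continue  # unset optional inputs are left out of the job
        if name in file_names:
            # CWL represents files as objects rather than bare strings.
            job[name] = {"class": "File", "path": os.path.abspath(value)}
        else:
            job[name] = value
    return job

job = build_cwl_job(
    {"input_file": "clinvar.vcf.gz", "region": "chr17", "threads": None},
    file_inputs=["input_file"],
)
print(json.dumps(job, indent=2))
```

A dictionary like this could be written to a JSON file and handed to a CWL runner together with the tool description, for example `cwltool tool.cwl job.json`.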
+ ## Get Started
+
+ ### Requirements
+
+ * Python 3.12 or 3.13 (the package declares `>=3.12,<3.14`)
+ * FastAPI
+ * Requests
+ * Pydantic
+ * Uvicorn
+ * cwltool
+ * mcp (Model Context Protocol SDK)
+
+ ### Installation
+
+ To install coala-cli, run the following command:
+ ```bash
+ pip install coala-cli
+ ```
+
+ ### Use Cases
+
+
+ <!-- This text is a hidden note and will not be displayed in the rendered README.
+
+ ### MCP server
+
+ The framework allows you to set up an MCP server with predefined tools for specific domains. For example, to create a bioinformatics-focused MCP server, you can use the following setup (as shown in [`examples/bioinfo_question.py`](examples/bioinfo_question.py)):
+
+ ```python
+ from cmdagent.mcp_api import mcp_api
+
+ mcp = mcp_api(host='0.0.0.0', port=8000)
+ mcp.add_tool('examples/ncbi_datasets_gene.cwl', 'ncbi_datasets_gene')
+ mcp.add_tool('examples/bcftools_view.cwl', 'bcftools_view', read_outs=False)
+ mcp.serve()
+ ```
+
+ This creates an MCP server that exposes two bioinformatics tools:
+ - `ncbi_datasets_gene`: Retrieves gene metadata from NCBI datasets
+ - `bcftools_view`: Subsets and filters VCF/BCF files
+
+ Once the server is running, you can configure your MCP client (e.g., in Cursor) to connect to it:
+
+ ```json
+ {
+   "mcpServers": {
+     "cmdagent": {
+       "url": "http://localhost:8000/mcp",
+       "transport": "streamable-http"
+     }
+   }
+ }
+ ```
+
+ With this setup, you can ask the LLM natural language questions like:
+ - "Give me a summary about gene BRCA1"
+ - "Subset variants in the gene BRCA1 from the https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz"
+
+ The LLM discovers the available tools, reads their parameter descriptions, invokes the appropriate tool with matching arguments, and presents the results in a user-friendly format.
+
+ * Start MCP server
+ ```
+ python examples/bioinfo_question.py
+ ```
+
+ * Call by MCP client from Cursor
+ [![Demo md5](tests/cmdagent.gif)](https://www.youtube.com/watch?v=QqevFmQbTDU)
+
+
+ ### Function call
+ * Creating an API
+
+ To create an API, import `tool_api` from `cmdagent.remote_api` and pass in the path to a CWL file and the name of the tool:
+ ```python
+ from cmdagent.remote_api import tool_api
+
+ api = tool_api(cwl_file='tests/dockstore-tool-md5sum.cwl', tool_name='md5sum')
+ api.serve()
+ ```
+ The `api.serve()` method starts a RESTful API as a service, allowing you to run the tool remotely from the cloud or locally.
+
+ * Creating a Tool Agent
+
+ To create a tool agent, import the `tool_agent` class from `cmdagent.agent` and pass in the API instance:
+ ```python
+ from cmdagent.agent import tool_agent
+
+ ta = tool_agent(api)
+ md5 = ta.create_tool()
+ md5(input_file="tests/dockstore-tool-md5sum.cwl")
+ ```
+ The function `md5` is created automatically from the `api`.
+
+ * Function call with Gemini
+
+ To integrate the tool agent with Gemini, configure `google.generativeai` and create a `GenerativeModel` instance that receives the generated function as a tool:
+ ```python
+ import google.generativeai as genai
+
+ genai.configure(api_key="******")
+ model = genai.GenerativeModel(model_name='gemini-1.5-flash', tools=[md5])
+
+ chat = model.start_chat(enable_automatic_function_calling=True)
+ response = chat.send_message("what is md5 of tests/dockstore-tool-md5sum.cwl?")
+ response.text
+ ```
+ ```
+ 'The md5sum of tests/dockstore-tool-md5sum.cwl is ad59d9e9ed6344f5c20ee7e0143c6c12. \n'
+ ```
+ -->
@@ -0,0 +1,139 @@
+ # coala-cli
+
+ ## Overview
+
+ Coala is a Python package that provides a standards-based framework for turning command-line tools into reproducible, agent-accessible toolsets that support natural-language interaction.
+
+ ## How the Framework Works
+
+ Coala integrates the [Common Workflow Language (CWL)](https://www.commonwl.org/specification/) with the [Model Context Protocol (MCP)](https://modelcontextprotocol.io/docs/getting-started/intro) to standardize tool execution. This lets Large Language Model (LLM) agents discover and run tools through structured interfaces, while each tool runs in the containerized environment that its CWL description specifies, which keeps analyses reproducible.
+
+ ### Core Components
+ - **Client Layer:** Any MCP-compliant client application (e.g., Claude Desktop, Cursor, or custom interfaces) that uses LLMs (such as Gemini, GPT-5, or Claude) to enable natural language interaction.
+ - **Bridge Layer:** A local, generic MCP server that acts as a schema translator. Unlike MCP servers that need a hand-written Python wrapper for each tool, the bridge layer parses CWL definitions automatically and exposes the CWL-described command-line tools as callable MCP tools.
+ - **Execution Layer:** A standard CWL runner that executes the underlying binaries within containerized environments (Docker). This keeps analyses reproducible and isolated from the host system's dependencies.
+
+ ### Quick Start
+
+ 1. **Initialize:** Create a local MCP server instance using `mcp_api()`.
+ 2. **Register:** Load your domain-specific tools described in CWL via `add_tool()` (supports local files or repositories).
+ 3. **Serve:** Start the MCP server using `mcp.serve()`.
+
+ ### The Workflow
+
+ - **Interact:** The user sends a natural language query to the MCP Client (e.g., Claude Desktop).
+ - **Discover & Select:** The Client retrieves the tool list from the MCP server. The LLM selects the appropriate tool and sends a structured request for the analysis.
+ - **Execute:** Coala translates this selection into a CWL job and executes it within a container (Docker), ensuring reproducibility.
+ - **Respond:** The execution logs and results are returned to the LLM, which interprets them and presents the final answer to the user.
+
+ ## Get Started
+
+ ### Requirements
+
+ * Python 3.12 or 3.13 (the package declares `>=3.12,<3.14`)
+ * FastAPI
+ * Requests
+ * Pydantic
+ * Uvicorn
+ * cwltool
+ * mcp (Model Context Protocol SDK)
+
+ ### Installation
+
+ To install coala-cli, run the following command:
+ ```bash
+ pip install coala-cli
+ ```
+
+ ### Use Cases
+
+
+ <!-- This text is a hidden note and will not be displayed in the rendered README.
+
+ ### MCP server
+
+ The framework allows you to set up an MCP server with predefined tools for specific domains. For example, to create a bioinformatics-focused MCP server, you can use the following setup (as shown in [`examples/bioinfo_question.py`](examples/bioinfo_question.py)):
+
+ ```python
+ from cmdagent.mcp_api import mcp_api
+
+ mcp = mcp_api(host='0.0.0.0', port=8000)
+ mcp.add_tool('examples/ncbi_datasets_gene.cwl', 'ncbi_datasets_gene')
+ mcp.add_tool('examples/bcftools_view.cwl', 'bcftools_view', read_outs=False)
+ mcp.serve()
+ ```
+
+ This creates an MCP server that exposes two bioinformatics tools:
+ - `ncbi_datasets_gene`: Retrieves gene metadata from NCBI datasets
+ - `bcftools_view`: Subsets and filters VCF/BCF files
+
+ Once the server is running, you can configure your MCP client (e.g., in Cursor) to connect to it:
+
+ ```json
+ {
+   "mcpServers": {
+     "cmdagent": {
+       "url": "http://localhost:8000/mcp",
+       "transport": "streamable-http"
+     }
+   }
+ }
+ ```
+
+ With this setup, you can ask the LLM natural language questions like:
+ - "Give me a summary about gene BRCA1"
+ - "Subset variants in the gene BRCA1 from the https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz"
+
+ The LLM discovers the available tools, reads their parameter descriptions, invokes the appropriate tool with matching arguments, and presents the results in a user-friendly format.
+
+ * Start MCP server
+ ```
+ python examples/bioinfo_question.py
+ ```
+
+ * Call by MCP client from Cursor
+ [![Demo md5](tests/cmdagent.gif)](https://www.youtube.com/watch?v=QqevFmQbTDU)
+
+
+ ### Function call
+ * Creating an API
+
+ To create an API, import `tool_api` from `cmdagent.remote_api` and pass in the path to a CWL file and the name of the tool:
+ ```python
+ from cmdagent.remote_api import tool_api
+
+ api = tool_api(cwl_file='tests/dockstore-tool-md5sum.cwl', tool_name='md5sum')
+ api.serve()
+ ```
+ The `api.serve()` method starts a RESTful API as a service, allowing you to run the tool remotely from the cloud or locally.
+
+ * Creating a Tool Agent
+
+ To create a tool agent, import the `tool_agent` class from `cmdagent.agent` and pass in the API instance:
+ ```python
+ from cmdagent.agent import tool_agent
+
+ ta = tool_agent(api)
+ md5 = ta.create_tool()
+ md5(input_file="tests/dockstore-tool-md5sum.cwl")
+ ```
+ The function `md5` is created automatically from the `api`.
+
+ * Function call with Gemini
+
+ To integrate the tool agent with Gemini, configure `google.generativeai` and create a `GenerativeModel` instance that receives the generated function as a tool:
+ ```python
+ import google.generativeai as genai
+
+ genai.configure(api_key="******")
+ model = genai.GenerativeModel(model_name='gemini-1.5-flash', tools=[md5])
+
+ chat = model.start_chat(enable_automatic_function_calling=True)
+ response = chat.send_message("what is md5 of tests/dockstore-tool-md5sum.cwl?")
+ response.text
+ ```
+ ```
+ 'The md5sum of tests/dockstore-tool-md5sum.cwl is ad59d9e9ed6344f5c20ee7e0143c6c12. \n'
+ ```
+ -->
File without changes
@@ -0,0 +1,102 @@
+
+ import requests
+ from types import FunctionType
+
+ class tool_agent():
+     def __init__(self, api):
+
+         # self.parameter_names = parameter_names
+         self.api = api.server
+         self.tool = api.tool
+         self.url = api.url
+         self.Base = api.Base
+         self.tool_name = api.tool_name
+         self.run = self._create_function()
+         self.parameter_names = [it['name'] for it in api.tool.t.inputs_record_schema['fields']]
+
+     #params = {'input_file': file_path}
+     def upload_file(self, file_path):
+         url_upload = f"http://{self.api.config.host}:{self.api.config.port}/uploadFile/"
+         with open(file_path, 'rb') as fh:  # close the handle once the upload finishes
+             response = requests.post(url_upload, files={'file': fh})
+
+         if response.status_code == 200:
+             return response.json()
+         else:
+             raise Exception(f"Error uploading file: {response.text}")
+
+     def pre_inputs(self, inputs, kwargs):
+         params = kwargs.copy()
+         for ip in inputs:
+             if 'File' in ip['type']:
+                 # upload to server
+                 r_path = self.upload_file(kwargs[ip['name']])
+                 params[ip['name']] = 'file://' + r_path['filepath']
+
+                 # params[ip['name']] = {
+                 #     "class": "File",
+                 #     "location": r_path['filepath']
+                 # }
+         return params
+
+
+     def _create_function(self):
+         def gen_function(**kwargs):
+             # Ensure all required parameters are passed
+             for param in self.parameter_names:
+                 if param not in kwargs:
+                     raise ValueError(f"Missing required parameter: {param}")
+
+             print(", ".join(f"{param}={kwargs[param]}" for param in self.parameter_names))
+
+             inputs = self.tool.t.inputs_record_schema['fields']
+
+             params = self.pre_inputs(inputs, kwargs)
+             print(params)
+             response = requests.post(self.url, json=[params])
+
+             if response.status_code == 200:
+                 return response.json()
+             else:
+                 raise Exception(f"Error running tool '{self.tool_name}': {response.text}")
+
+         gen_function.__name__ = self.tool_name
+         ann = {}
+         for k, v in self.Base.model_fields.items():
+             ann[k] = v.annotation
+         ann['return'] = str
+         gen_function.__annotations__ = ann
+         gen_function.__doc__ = self.tool.t.tool['doc']
+
+         return gen_function
+
+     def create_tool(self):
+         tool_name = self.tool_name
+         param_names = self.parameter_names
+         fun = self.run
+
+         # Start building the function definition as a string
+         function_code = "def {}({}):\n".format(tool_name, ", ".join(param_names))
+         function_code += "    kwargs = {" + ", ".join([f"'{name}': {name}" for name in param_names]) + "}\n"
+         function_code += f"    return {fun.__name__}(**kwargs)\n"
+         # Define a local namespace to execute the function
+         local_namespace = {}
+
+         # Execute the function code in the local namespace
+         exec(function_code, {}, local_namespace)
+
+         generated_function = local_namespace[tool_name]
+
+         # Use FunctionType to create the function, passing globals with the external function
+         dynamic_func = FunctionType(
+             generated_function.__code__,
+             {fun.__name__: fun},  # pass the predefined function to globals
+             generated_function.__name__
+         )
+
+         # Return the generated function
+         dynamic_func.__annotations__ = fun.__annotations__
+         dynamic_func.__doc__ = fun.__doc__
+
+         return dynamic_func
+
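`create_tool()` builds its wrapper by generating source text and calling `exec`. One possible alternative, sketched here under the assumption that callers only need correct introspection metadata, is to attach an explicit `inspect.Signature` to a closure. `make_tool_function` and the `fake_run` stand-in are hypothetical and not part of the package.

```python
import inspect
from typing import Any, Callable, Dict, List

def make_tool_function(name: str, param_names: List[str],
                       run: Callable[..., Any],
                       annotations: Dict[str, Any], doc: str) -> Callable[..., Any]:
    """Build a wrapper with named parameters around `run` without exec()."""
    signature = inspect.Signature([
        inspect.Parameter(p, inspect.Parameter.POSITIONAL_OR_KEYWORD,
                          annotation=annotations.get(p, inspect.Parameter.empty))
        for p in param_names
    ])

    def wrapper(*args: Any, **kwargs: Any) -> Any:
        # bind() raises TypeError for missing or unexpected arguments.
        bound = signature.bind(*args, **kwargs)
        return run(**bound.arguments)

    wrapper.__name__ = name
    wrapper.__qualname__ = name
    wrapper.__doc__ = doc
    wrapper.__annotations__ = dict(annotations)
    wrapper.__signature__ = signature  # reported by inspect.signature()
    return wrapper

# Stand-in for the HTTP call made by the generated function (assumption).
def fake_run(**kwargs: Any) -> Dict[str, Any]:
    return {"received": kwargs}

md5sum = make_tool_function("md5sum", ["input_file"], fake_run,
                            {"input_file": str, "return": str},
                            "Compute an MD5 checksum.")
```

Whether a given LLM SDK reads `__signature__` or inspects the code object directly is not stated in the source, so this would need checking before it could replace the `exec`-based version.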
@@ -0,0 +1,442 @@
+ # Suppress Pydantic warning about Field() with Optional/Union types
+ # This must be done before any Pydantic imports to ensure the filter is active
+ import warnings
+ warnings.filterwarnings('ignore', message='.*default.*Field.*')
+ warnings.filterwarnings('ignore', message='.*UnsupportedFieldAttributeWarning.*')
+
+ from fastapi import FastAPI, UploadFile, File
+ from pydantic import create_model, Field
+ from pydantic.warnings import UnsupportedFieldAttributeWarning
+ import logging
+ import uvicorn
+ from tempfile import NamedTemporaryFile, mkdtemp
+ from cwltool import factory
+ from cwltool.context import RuntimeContext
+ from threading import Thread
+ import time
+ from typing import Optional, List, Annotated
+ from mcp.server.fastmcp import FastMCP
+ from coala.tool_logic import run_tool, configure_container_runner  # <-- import shared logic
+ import threading
+ import sys
+ import os
+
+ # Additional filter for the specific warning category (applied after import)
+ warnings.filterwarnings('ignore', category=UnsupportedFieldAttributeWarning)
+
+ logger = logging.getLogger(__name__)
+ logger.setLevel(logging.INFO)
+ # Use stderr for logging to avoid interfering with stdio transport
+ logger.addHandler(logging.StreamHandler(sys.stderr))
+
+
+ class mcp_api():
+     def __init__(self, host='0.0.0.0', port=8000, container_runner=None):
+         """
+         Initializes an MCP server that can host multiple CWL tools.
+
+         Parameters:
+             host (str): The host IP address. Defaults to '0.0.0.0'.
+             port (int): The port number. Defaults to 8000.
+             container_runner (str, optional): Container runtime to use for all tools.
+                 Valid values: 'docker', 'podman', 'singularity', 'udocker', etc.
+                 Defaults to None (uses tool's default, typically 'docker').
+
+         Notes:
+             Output-reading behavior is controlled per tool via `add_tool(..., read_outs=False)`.
+         """
+         self.host = host
+         self.port = port
+         self.container_runner = container_runner
+         self.server = None
+         self.url = None
+         self.mcp = FastMCP(host=host, port=port)
+         self.tools = {}  # tool_name -> tool info
+         self.system_prompt = """
+ At the end of your response, append a summary listing the tool name and version from the tool's description using this exact format:
+ ```
+ Tool Invocation Summary:
+ tool_name: <TOOL_NAME>
+ tool_version: <TOOL_VERSION>
+ ```
+ """
+
+
+         # @self.mcp.tool()
+         # async def uploadFile(file: UploadFile = File(description="The file to be uploaded to the server")) -> dict:
+         #     """
+         #     Upload a file to the server.
+         #     """
+         #     with NamedTemporaryFile(delete=False) as tmp:
+         #         contents = file.file.read()
+         #         tmp.write(contents)
+         #     return {"filename": file.filename, "filepath": tmp.name}
+
+     def _build_field_description(self, field_name, input_field, model_field):
+         """
+         Build field description with type hints.
+         """
+         doc = input_field.get('doc', '')
+         type_val = input_field.get('type', '')
+         type_list = type_val if isinstance(type_val, list) else [type_val]
+         type_str = ' '.join(str(t) for t in type_list)
+
+         type_hint = ""
+         if 'File' in type_str:
+             type_hint = "file path"
+         elif 'string' in type_str:
+             type_hint = "str"
+         elif 'double' in type_str or 'float' in type_str:
+             type_hint = "float"
+         elif 'int' in type_str:
+             type_hint = "int"
+         elif 'boolean' in type_str:
+             type_hint = "bool"
+
+         annotation = model_field.annotation.__name__ if hasattr(model_field.annotation, '__name__') else str(model_field.annotation)
+
+         if type_hint:
+             return f"{field_name}: {doc}, {annotation}, {type_hint}"
+         else:
+             return f"{field_name}: {doc}, {annotation}"
+
+     def _build_output_description(self, output_field):
+         """
+         Build output field description with type hints.
+         """
+         field_name = output_field.get('name', '')
+         doc = output_field.get('doc', '')
+         type_val = output_field.get('type', '')
+         type_list = type_val if isinstance(type_val, list) else [type_val]
+         type_str = ' '.join(str(t) for t in type_list)
+
+         type_hint = ""
+         if 'File' in type_str:
+             type_hint = "file path"
+         elif 'string' in type_str:
+             type_hint = "str"
+         elif 'double' in type_str or 'float' in type_str:
+             type_hint = "float"
+         elif 'int' in type_str:
+             type_hint = "int"
+         elif 'boolean' in type_str:
+             type_hint = "bool"
+
+         if type_hint:
+             return f"{field_name}: {doc}, {type_hint}"
+         else:
+             return f"{field_name}: {doc}"
+
+     def _transform_input_value(self, field_name, value, input_type):
+         """
+         Transform input values based on their expected type.
+
+         - For File types: If value is just a filename, try to resolve to full path
+         - For string types: If value is a full path, extract just the filename
+         - For array types: Transform each element in the array
+
+         Parameters:
+             field_name: Name of the input field
+             value: The input value to transform
+             input_type: The CWL type definition for this input
+
+         Returns:
+             Transformed value
+         """
+         if value is None:
+             return value
+
+         # Check if it's an array type
+         is_array = False
+         base_type = input_type
+
+         if isinstance(input_type, list):
+             # Filter out 'null' to get actual types
+             non_null_types = [t for t in input_type if t != 'null']
+             if non_null_types:
+                 base_type = non_null_types[0]
+
+         # Check for array notation (e.g., 'float[]' or {'type': 'array', 'items': 'float'})
+         if isinstance(base_type, dict) and base_type.get('type') == 'array':
+             is_array = True
+             base_type = base_type.get('items', 'string')
+         elif isinstance(base_type, str) and '[]' in base_type:
+             is_array = True
+             base_type = base_type.replace('[]', '')
+
+         # If value is a list and type is array, transform each element
+         if is_array and isinstance(value, list):
+             return [self._transform_input_value(f"{field_name}[{i}]", item, base_type)
+                     for i, item in enumerate(value)]
+
+         # Convert type to string for checking
+         type_str = str(base_type) if not isinstance(base_type, dict) else base_type.get('type', '')
+
+         # Check if it's a File type
+         if 'File' in type_str and isinstance(value, str):
+             # If it's already a file:// URI, return as is
+             if value.startswith('file://'):
+                 return value
+
+             # If it's already an absolute path that exists, return as is
+             if os.path.isabs(value) and os.path.isfile(value):
+                 return value
+
+             # Try to resolve filename to full path
+             # Check if it's a file in current directory
+             if os.path.isfile(value):
+                 return os.path.abspath(value)
+
+             # Check in current working directory
+             cwd_path = os.path.join(os.getcwd(), value)
+             if os.path.isfile(cwd_path):
+                 return os.path.abspath(cwd_path)
+
+             # If not found, return as is (let run_tool handle it)
+             return value
+
+         # Check if it's a string type
+         elif 'string' in type_str and isinstance(value, str):
+             # If it looks like a full path, check if directory exists
+             if os.path.sep in value or (os.path.altsep and os.path.altsep in value):
+                 # Get the directory part of the path
+                 dir_path = os.path.dirname(value)
+                 # Only extract filename if the directory exists
+                 if dir_path and os.path.isdir(dir_path):
+                     # Extract filename from path
+                     filename = os.path.basename(value)
+                     logger.info(f"Transformed string input '{field_name}': '{value}' -> '{filename}'")
+                     return filename
+                 # If directory doesn't exist, keep the full path as is
+                 return value
+
+         return value
+
+     def add_tool(self, cwl_file, tool_name=None, read_outs=False):
+         """
+         Adds a CWL tool to the MCP server.
+
+         Parameters:
+             cwl_file: Path to the CWL tool file
+             tool_name: Optional tool name. If not provided, will use:
+                 1. The 'id' field from the CWL tool
+                 2. If 'id' is not defined, the basename of cwl_file (without .cwl extension)
+             read_outs: Whether to read output files
+
+         Raises:
+             FileNotFoundError: If the CWL file does not exist
+             Exception: If there's an error loading the CWL tool
+         """
+         # Check if file exists
+         if not os.path.exists(cwl_file):
+             raise FileNotFoundError(f"CWL file not found: {cwl_file}")
+
+         if not os.path.isfile(cwl_file):
+             raise ValueError(f"Path is not a file: {cwl_file}")
+
+         runtime_context = RuntimeContext()
+         runtime_context.outdir = mkdtemp()
+         # Configure container runner if specified
+         if self.container_runner:
+             configure_container_runner(runtime_context, self.container_runner)
+         fac = factory.Factory(runtime_context=runtime_context)
+
+         try:
+             tool = fac.make(cwl_file)
+         except Exception as e:
+             raise Exception(f"Failed to load CWL tool from {cwl_file}: {str(e)}") from e
+
+         # Determine tool_name if not provided
+         if tool_name is None:
+             # Try to get 'id' from CWL tool
+             tool_id = tool.t.tool.get('id') if hasattr(tool.t, 'tool') and tool.t.tool else None
+             # Only use id if it contains a '#' fragment (e.g., "file://path#ToolName")
+             # If id is just a file:// path without fragment, treat it as undefined
+             if tool_id and '#' in tool_id:
+                 tool_name = tool_id.split('#')[-1]
+             # If 'id' is not defined or doesn't have a fragment, use basename of cwl_file without .cwl extension
+             if not tool_name:
+                 tool_name = os.path.basename(cwl_file).replace('.cwl', '')
+
+         inputs = tool.t.inputs_record_schema['fields']
+         outputs = tool.t.outputs_record_schema['fields']
+
+         # Create a mapping from field name to input field definition
+         inputs_by_name = {it['name']: it for it in inputs}
+
+         # map types
+         it_map = {}
+         for it in inputs:
+             # it['type'] can be a list like ['null', 'org.w3id.cwl.cwl.File'] or ['null', 'float[]']
+             # or a dict like {'type': 'array', 'items': 'float'}
+             # or a string like 'float[]'
+             raw_type = it['type']
+             type_list = raw_type if isinstance(raw_type, list) else [raw_type]
+
+             # Check for 'null' in type list (optional field)
+             is_optional = 'null' in type_list
+             # Filter out 'null' to get the actual type(s)
+             non_null_types = [t for t in type_list if t != 'null']
+
+             # Check if it's a dict-based array type (e.g., {'type': 'array', 'items': 'float'})
+             is_array = False
+             base_type_str = None
+             if isinstance(raw_type, dict) and raw_type.get('type') == 'array':
+                 is_array = True
+                 items_type = raw_type.get('items', 'string')
+                 base_type_str = str(items_type) if not isinstance(items_type, dict) else items_type.get('type', 'string')
+             elif non_null_types:
+                 # Check for array notation in string (e.g., 'float[]')
+                 # Look through non-null types for array notation
+                 for t in non_null_types:
+                     t_str = str(t)
+                     if '[]' in t_str:
+                         is_array = True
+                         base_type_str = t_str.replace('[]', '')
+                         break
+
+                 if not is_array and non_null_types:
+                     # Not an array, use the first non-null type
+                     base_type_str = str(non_null_types[0])
+             else:
+                 # Fallback to string if no types found
+                 base_type_str = 'string'
+
+             # Get field description from CWL input
+             field_doc = it.get('doc', '')
+
+             # Determine base Python type
+             if 'File' in base_type_str:
+                 base_py_type = str
+             elif 'string' in base_type_str:
+                 base_py_type = str
+             elif 'double' in base_type_str or 'float' in base_type_str:
+                 base_py_type = float
+             elif 'int' in base_type_str:
+                 base_py_type = int
+             elif 'boolean' in base_type_str:
+                 base_py_type = bool
+             else:
+                 base_py_type = str
+
+             # Wrap in List if it's an array
+             if is_array:
+                 py_type = List[base_py_type]
+             else:
+                 py_type = base_py_type
+
+             # Create Field with description
+             # For optional fields, use (Optional[type], None) - Field() can't be used with Union types
+             # For required fields, use Field directly
+             if is_optional:
+                 # Use Optional type with None as default
+                 # Note: We can't use Field() with Optional/Union types, so description will be set via field_doc in fields_desc
+                 it_map[it['name']] = (Optional[py_type], None)
+             else:
+                 it_map[it['name']] = (py_type, Field(description=field_doc))
+
+         Base = create_model(f'Base_{tool_name}', **it_map)
+
+         fields_desc = "\n\n".join(
+             self._build_field_description(k, inputs_by_name[k], v)
+             for k, v in Base.model_fields.items()
+         )
+
+         outputs_desc = "\n\n".join(
+             self._build_output_description(out)
+             for out in outputs
+         )
+
+         # Extract Docker image information
+         docker_info = ""
+         docker_version = ""
+         # Check requirements first
+         if hasattr(tool.t, 'requirements') and tool.t.requirements:
+             for req in tool.t.requirements:
+                 if isinstance(req, dict) and req.get('class') == 'DockerRequirement':
+                     docker_pull = req.get('dockerPull', '')
+                     if docker_pull:
+                         docker_info = f"\n\ntool_version: {docker_pull}"
+                         docker_version = docker_pull
+                     break
+         # If not found in requirements, check hints
+         if not docker_info and hasattr(tool.t, 'hints') and tool.t.hints:
+             for hint in tool.t.hints:
+                 if isinstance(hint, dict) and hint.get('class') == 'DockerRequirement':
+                     docker_pull = hint.get('dockerPull', '')
+                     if docker_pull:
+                         docker_info = f"\n\ntool_version: {docker_pull}"
+                         docker_version = docker_pull
+                     break
+
+         tool_desc = f"{tool_name}: {tool.t.tool.get('label', '')}\n\n {tool.t.tool.get('doc', '')}{docker_info}\n\nReturns:\n\n{outputs_desc}"
+
+         @self.mcp.tool(name=tool_name, description=f"{tool_desc}\n\nInput data for '{tool_name}'. Fields: \n\n{fields_desc}")
+         def mcp_tool(data: List[Base]) -> dict:
+             """MCP tool wrapper for CWL tool execution."""
+             # Store fields_desc as function attribute for programmatic access
+             mcp_tool.fields_desc = fields_desc
+             # Assign interpolated docstring with field descriptions
+             mcp_tool.__doc__ = f"""
+             MCP tool wrapper for CWL tool execution.
+
+             Input fields:
+             {fields_desc}
+             """
+             logger.info(data)
+             params = data[0].model_dump()
+
+             # Transform input values based on their types
+             # Create a mapping from field name to input type
+             inputs_by_name = {it['name']: it for it in inputs}
+             for field_name, value in params.items():
+                 if field_name in inputs_by_name:
+                     input_field = inputs_by_name[field_name]
+                     input_type = input_field.get('type', 'string')
+                     transformed_value = self._transform_input_value(field_name, value, input_type)
+                     if transformed_value != value:
+                         logger.info(f"Transformed input '{field_name}': '{value}' -> '{transformed_value}'")
+                         params[field_name] = transformed_value
+
+             outs = run_tool(tool, params, outputs, read_outs, container_runner=self.container_runner)
+             outs['tool_name'] = tool_name
+             outs['tool_version'] = docker_version
+             outs['system_prompt'] = self.system_prompt
+             logger.info(outs)
+             return outs
+
+         # Store tool info if needed
+         self.tools[tool_name] = {
+             'cwl_file': cwl_file,
+             'tool': tool,
+             'Base': Base,
+             'inputs': inputs,
+             'outputs': outputs
+         }
+
417
+ def serve(self, transport=None):
418
+ """
419
+ Starts the MCP server.
420
+
421
+ Parameters:
422
+ transport (str, optional): Transport type ('stdio' or 'streamable-http').
423
+ If None, auto-detects based on stdin availability.
424
+ """
425
+ # Auto-detect transport: if stdin is not a TTY, use stdio transport
426
+ if transport is None:
427
+ if not sys.stdin.isatty():
428
+ transport = 'stdio'
429
+ else:
430
+ transport = 'streamable-http'
431
+
432
+ if transport == 'streamable-http':
433
+ # Print to stderr to avoid interfering with stdio transport
434
+ print(f"Starting MCP server at http://{self.host}:{self.port}/", file=sys.stderr, flush=True)
435
+ else:
436
+ # For stdio transport, don't print startup messages to stdout
437
+ logger.info("Starting MCP server with stdio transport")
438
+
439
+ self.mcp.run(transport=transport)
440
+ # thread = threading.Thread(target=self.mcp.run, kwargs={'transport': 'sse'}, daemon=True)
441
+ # thread.start()
442
+ # self.server_thread = thread
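
The transport auto-detection in `serve` can be isolated into a small pure function, which makes the rule easy to check without starting a server. The following is a minimal sketch and not part of the package: `pick_transport` and its `stdin_is_tty` parameter are hypothetical names introduced here for illustration only.

```python
import sys
from typing import Optional


def pick_transport(transport: Optional[str] = None,
                   stdin_is_tty: Optional[bool] = None) -> str:
    """Return the transport name, mirroring the auto-detection rule in serve()."""
    # An explicit choice always wins.
    if transport is not None:
        return transport
    # Hypothetical override so the rule can be exercised without a real terminal.
    if stdin_is_tty is None:
        stdin_is_tty = sys.stdin.isatty()
    # Piped stdin suggests an MCP client is driving the process over stdio.
    return 'streamable-http' if stdin_is_tty else 'stdio'
```

Under this sketch, a process launched by an MCP client with piped stdin would select `stdio`, while an interactive shell session would select `streamable-http`.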
@@ -0,0 +1,123 @@
1
+ from fastapi import FastAPI, UploadFile, Body
2
+ from pydantic import create_model
3
+ import logging
4
+ import uvicorn
5
+ from tempfile import NamedTemporaryFile, mkdtemp
6
+ from cwltool import factory
7
+ from cwltool.context import RuntimeContext
8
+ from threading import Thread
9
+ import time
10
+ from typing import Optional, List
11
+ from coala.tool_logic import run_tool # <-- import shared logic
12
+
13
+
14
+ logger = logging.getLogger(__name__)
15
+ logger.setLevel(logging.INFO)
16
+ logger.addHandler(logging.StreamHandler())
17
+
18
+
19
+ class tool_api():
20
+ def __init__(self, cwl_file, tool_name='tool', host='0.0.0.0', port=8000, read_outs=False):
21
+ """
22
+ Initializes a tool_api object, which is used to create a FastAPI server for a given CWL file.
23
+
24
+ Parameters:
25
+ cwl_file (str): The path to the CWL file.
26
+ tool_name (str): The name of the tool. Defaults to 'tool'.
27
+ host (str): The host IP address. Defaults to '0.0.0.0'.
28
+ port (int): The port number. Defaults to 8000.
29
+ read_outs (bool): Whether to read the outputs. Defaults to False.
30
+
31
+ Returns:
32
+ None
33
+ """
34
+ self.cwl_file = cwl_file
35
+ self.tool_name = tool_name
36
+ self.host = host
37
+ self.port = port
38
+ self.read_outs = read_outs
39
+ self.server = None
40
+ self.url = None
41
+ # cwl
42
+ runtime_context = RuntimeContext()
43
+ runtime_context.outdir = mkdtemp()
44
+ fac = factory.Factory(runtime_context=runtime_context)
45
+ self.tool = fac.make(cwl_file)
46
+
47
+ self.inputs = self.tool.t.inputs_record_schema['fields']
48
+ self.outputs = self.tool.t.outputs_record_schema['fields']
49
+
50
+ # map types
51
+ it_map = {}
52
+ for it in self.inputs:
53
+ # it['type'] can be a list like ['null', 'org.w3id.cwl.cwl.File']
54
+ type_list = it['type'] if isinstance(it['type'], list) else [it['type']]
55
+ type_str = ' '.join(str(t) for t in type_list) # Join for checking substrings
56
+
57
+ if 'File' in type_str:
58
+ it_map[it['name']] = (str, None)
59
+ elif 'string' in type_str:
60
+ it_map[it['name']] = (str, None)
61
+ elif 'double' in type_str:
62
+ it_map[it['name']] = (float, None)
63
+ elif 'int' in type_str:
64
+ it_map[it['name']] = (int, None)
65
+ elif 'boolean' in type_str:
66
+ it_map[it['name']] = (bool, None)
67
+ else:
68
+ it_map[it['name']] = (str, None)
69
+
70
+ if 'null' in type_list:
71
+ base_type, default = it_map[it['name']]  # avoid shadowing the built-in `type`
72
+ it_map[it['name']] = (Optional[base_type], default)
73
+
74
+ self.Base = create_model('Base', **it_map)
75
+
76
+ # define tool
77
+ # fastapi
78
+ self.app = FastAPI()
79
+
80
+ @self.app.post('/uploadFile/')
81
+ async def uploadFile(file: UploadFile):
82
+ with NamedTemporaryFile(delete=False) as tmp:
83
+ contents = await file.read()  # async read; avoids blocking the event loop in this async handler
84
+ tmp.write(contents)
85
+ return {"filename": file.filename, "filepath": tmp.name}
86
+
87
+ @self.app.post(f"/{self.tool_name}/")
88
+ def tool(data: List[self.Base] = Body(...)):
89
+ logger.info(data)
90
+ params = data[0].model_dump()
91
+ outs = run_tool(self.tool, params, self.outputs, self.read_outs)
92
+ logger.info(outs)
93
+ return outs
94
+
95
+ def serve(self):
96
+ """
97
+ Starts a FastAPI server to serve the specified tool.
98
+
99
+ This function initializes a FastAPI server and sets up the necessary routes for the specified tool. The server listens for HTTP requests on the specified host and port.
100
+ """
101
+ config = uvicorn.Config(app=self.app, host=self.host, port=self.port)
102
+ self.server = uvicorn.Server(config=config)
103
+ thread = Thread(target=self.server.run)
104
+ thread.start() # non-blocking call
105
+
106
+ while not self.server.started:
107
+ time.sleep(0.1)
108
+ else:
109
+ print(f"HTTP server is now running on http://{self.host}:{self.port}")
110
+ self.url = f"http://{self.host}:{self.port}/{self.tool_name}/"
111
+
112
+
113
+ def stop(self):
114
+ """
115
+ Stops the server by setting the should_exit flag to True.
116
+ """
117
+ self.server.should_exit = True
118
+
119
+
120
+
121
+ # api = tool_api(cwl_file='test_data/dockstore-tool-md5sum.cwl')
122
+ # api.serve()
123
+ # api.stop()
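
The type-mapping loop in `tool_api.__init__` can be read as an ordered substring lookup followed by an optional wrap. A minimal sketch of that rule is given below, using only the standard library; `cwl_type_to_python` and `_CWL_TO_PY` are hypothetical names that do not exist in the package, and the pydantic `create_model` step is deliberately left out.

```python
from typing import Optional

# Checked in order, matching the substring tests in tool_api.__init__.
_CWL_TO_PY = [
    ('File', str),
    ('string', str),
    ('double', float),
    ('int', int),
    ('boolean', bool),
]


def cwl_type_to_python(cwl_type):
    """Map a CWL type (a string or a list such as ['null', 'File']) to a Python type."""
    type_list = cwl_type if isinstance(cwl_type, list) else [cwl_type]
    type_str = ' '.join(str(t) for t in type_list)
    py_type = str  # fallback for unrecognised types
    for needle, candidate in _CWL_TO_PY:
        if needle in type_str:
            py_type = candidate
            break
    # A 'null' member marks the input as optional.
    if 'null' in type_list:
        return Optional[py_type]
    return py_type
```

One consequence of the substring rule, as sketched here, is that CWL `float` and `long` match none of the checks and fall through to `str`.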
@@ -0,0 +1,109 @@
1
+ # coala/tool_logic.py
2
+ import os.path
3
+ import gzip
4
+ from cwltool.context import RuntimeContext
5
+
6
+ def configure_container_runner(runtime_context: RuntimeContext, container_runner: str) -> None:
7
+ """
8
+ Configure the runtime context with the specified container runner.
9
+
10
+ Parameters:
11
+ runtime_context: The RuntimeContext to configure
12
+ container_runner: Container runtime to use ('docker', 'podman', 'singularity', 'udocker', etc.)
13
+ """
14
+ runtime_context.default_container = container_runner
15
+ # Set boolean flags for specific container runners
16
+ runtime_context.singularity = (container_runner == 'singularity')
17
+ runtime_context.podman = (container_runner == 'podman')
18
+
19
+ def _read_file_content(filepath):
20
+ """Read file content, handling gzipped files."""
21
+ try:
22
+ if filepath.endswith('.gz'):
23
+ with gzip.open(filepath, 'rt', encoding='utf-8') as f:
24
+ return f.read().replace('\n', '')
25
+ else:
26
+ with open(filepath, 'r', encoding='utf-8') as f:
27
+ return f.read().replace('\n', '')
28
+ except (UnicodeDecodeError, OSError):
29
+ # If reading fails (binary file, etc.), return the filepath instead
30
+ return filepath
31
+
32
+ def run_tool(tool, params, outputs, read_outs=False, container_runner=None):
33
+ """
34
+ Execute a CWL tool with the given parameters.
35
+
36
+ Parameters:
37
+ tool: The CWL tool object (created via factory.Factory().make())
38
+ params: Dictionary of input parameters
39
+ outputs: List of output field definitions
40
+ read_outs: Whether to read output file contents (default: False)
41
+ container_runner: Container runtime to use (default: None, uses tool's default)
42
+ Valid values: 'docker', 'podman', 'singularity', 'udocker', etc.
43
+
44
+ Returns:
45
+ Dictionary mapping output field names to their values
46
+ """
47
+ # Prepare params for CWL tool
48
+ inputs = tool.t.inputs_record_schema['fields']
49
+ in_dict = {}
50
+ for i in inputs:
51
+ in_dict[i['name']] = i['type']
52
+
53
+ for k, v in params.items():
54
+ if k in in_dict:
55
+ type_val = in_dict[k]
56
+ # Handle both list and string types (e.g., ['null', 'File'] or 'File?')
57
+ # Convert each item to str to handle CommentedMap from ruamel.yaml (enum types)
58
+ type_str = ' '.join(str(t) for t in type_val) if isinstance(type_val, list) else str(type_val)
59
+ if 'File' in type_str and v is not None:
60
+ if isinstance(v, dict) and 'location' in v:
61
+ location = v['location']
62
+ elif isinstance(v, str) and v.startswith('file://'):
63
+ location = v
64
+ elif isinstance(v, str) and os.path.isfile(v):
65
+ location = f"file://{v}"
66
+ else:
67
+ continue # Do nothing if v is not a file
68
+
69
+ params[k] = {
70
+ "class": "File",
71
+ "location": location
72
+ }
73
+
74
+ # Modify the tool's runtime context if container runner is specified
75
+ if container_runner:
76
+ # Try to get the original runtime context from the tool
77
+ original_runtime_context = None
78
+ if hasattr(tool, 'runtime_context'):
79
+ original_runtime_context = tool.runtime_context
80
+ elif hasattr(tool, 't') and hasattr(tool.t, 'runtime_context'):
81
+ original_runtime_context = tool.t.runtime_context
82
+
83
+ # If we found the runtime context, modify it in place
84
+ if original_runtime_context:
85
+ configure_container_runner(original_runtime_context, container_runner)
86
+
87
+ # Execute tool (no need to pass runtime_context if we modified it in place)
88
+ res = tool(**params)
89
+ outs = {}
90
+ for ot in outputs:
91
+ out_content = res[ot['name']]
92
+ # Handle both list and string types (e.g., ['null', 'File'] or 'File?')
93
+ # Convert each item to str to handle CommentedMap from ruamel.yaml (enum types)
94
+ type_val = ot['type']
95
+ type_str = ' '.join(str(t) for t in type_val) if isinstance(type_val, list) else str(type_val)
96
+ if read_outs and 'File' in type_str:
97
+ # Handle both single File and File[] (array) outputs
98
+ file_result = res[ot['name']]
99
+ if isinstance(file_result, list):
100
+ # File[] - read first file
101
+ if len(file_result) > 0:
102
+ out_file = file_result[0]['location'].replace('file://', '')
103
+ out_content = _read_file_content(out_file)
104
+ else:
105
+ # Single File
106
+ out_file = file_result['location'].replace('file://', '')
107
+ out_content = _read_file_content(out_file)
108
+ outs[ot['name']] = out_content
109
+ return outs
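
The File-parameter handling at the top of `run_tool` accepts three input shapes: a dict with a `location` key, a `file://` URI string, or a path to an existing file. A minimal sketch of that normalization as a standalone helper follows; `to_cwl_file` is a hypothetical name used for illustration, and it returns `None` where the original loop would `continue` and leave the value unchanged.

```python
import os.path


def to_cwl_file(value):
    """Return a CWL File object for value, or None if it does not look like a file."""
    if isinstance(value, dict) and 'location' in value:
        location = value['location']
    elif isinstance(value, str) and value.startswith('file://'):
        location = value
    elif isinstance(value, str) and os.path.isfile(value):
        # Plain paths are only accepted when the file exists on disk.
        location = f"file://{value}"
    else:
        return None
    return {"class": "File", "location": location}
```

A caller could apply this to each File-typed parameter and replace the value only when the result is not `None`.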
@@ -0,0 +1,27 @@
1
+ [tool.poetry]
2
+ name = "coala-cli"
3
+ version = "0.2.0"
4
+ description = "Convert any CMD tool into a LLM agent"
5
+ authors = ["Qiang"]
6
+ license = "MIT"
7
+ readme = "README.md"
8
+ packages = [{ include = "coala" }]
9
+
10
+ [tool.poetry.dependencies]
11
+ python = ">=3.12,<3.14"
12
+ fastapi = ">=0.114.0"
13
+ requests = "^2.32.3"
14
+ pydantic = "^2.9.2"
15
+ uvicorn = "^0.30.6"
16
+ cwltool = "^3.1.20240909164951"
17
+ mcp = "^1.9.0"
18
+
19
+
20
+ [build-system]
21
+ requires = ["poetry-core"]
22
+ build-backend = "poetry.core.masonry.api"
23
+
24
+ [tool.pytest.ini_options]
25
+ filterwarnings = [
26
+ "ignore::pydantic.warnings.UnsupportedFieldAttributeWarning",
27
+ ]
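
The caret constraints in `[tool.poetry.dependencies]` correspond to explicit version ranges: the lower bound is the stated version and the upper bound bumps the first non-zero component. The sketch below illustrates this for the plain numeric versions used here; `expand_caret` is a hypothetical helper, not a Poetry API, and it does not attempt to cover pre-release tags or other constraint syntax.

```python
def expand_caret(spec: str) -> str:
    """Expand a caret constraint such as '^2.32.3' into an explicit range."""
    version = spec.lstrip('^')
    parts = [int(p) for p in version.split('.')]
    # Find the first non-zero component; fall back to the last one if all are zero.
    for i, part in enumerate(parts):
        if part != 0:
            break
    else:
        i = len(parts) - 1
    # Bump that component and zero out everything after it.
    upper = parts[:i] + [parts[i] + 1] + [0] * (len(parts) - i - 1)
    return f">={version},<{'.'.join(map(str, upper))}"
```

For example, `expand_caret('^0.30.6')` yields `>=0.30.6,<0.31.0`, which agrees with the `uvicorn` range listed in the package's `Requires-Dist` metadata.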