@toolbeltai/skills 0.2.0 → 0.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Binary file
Binary file
Binary file
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@toolbeltai/skills",
- "version": "0.2.0",
+ "version": "0.2.1",
  "description": "Official Toolbelt skills — named /toolbelt-* flows (start, analyze, find, entities, geo, stream, invite) that teach any MCP-capable agent to orchestrate Toolbelt's MCP tools end-to-end.",
  "license": "MIT",
  "homepage": "https://toolbelt.ai",
@@ -18,7 +18,7 @@ compatibility: >
  before invocation.
  metadata:
  author: toolbeltai
- version: "1.0"
+ version: "2.0"
  homepage: "https://toolbelt.ai/docs/sql"
  ---

@@ -42,15 +42,23 @@ Extract these from the args string or conversation context before starting:
  | Parameter | Required | Description |
  |---|---|---|
  | `namespace_id` | No | UUID of target namespace. Auto-select if omitted and only one exists; fail if ambiguous. |
- | `csv_content` | No | Raw CSV text to upload. Uses the embedded sample dataset if omitted. |
- | `asset_name` | No | Name for the uploaded table asset. Defaults to `sales-data`. |
- | `question` | No | Natural language question to ask about the data. Defaults to `What is the total sales amount by region?` |
+ | `csv_inputs` | No | Array of `{ name, content }` objects for multi-table analysis (e.g. orders + customers). Preferred when two or more related CSVs need JOINs. |
+ | `csv_content` | No | Single CSV text (shorthand for `csv_inputs: [{ name: asset_name, content: ... }]`). |
+ | `asset_name` | No | Name for the single uploaded table (when `csv_content` is used). Defaults to `sales-data`. |
+ | `question` | No | Natural language question. Defaults vary by input shape (see below). |
+
+ If neither `csv_inputs` nor `csv_content` is provided, use the single built-in
+ sample dataset below with `asset_name = "sales-data"`.
+
+ The resolved list of uploads is called **`uploads`** for the rest of this
+ skill — each element has `{ name, content }`.

  ---

  ## Default Sample Data

- If no `csv_content` is provided, use this sales dataset verbatim:
+ If no inputs are provided, use this single sales dataset verbatim as
+ `uploads = [{ name: "sales-data", content: <csv below> }]`:

  ```
  order_id,date,region,product,category,quantity,unit_price,amount,rep
@@ -76,7 +84,11 @@ order_id,date,region,product,category,quantity,unit_price,amount,rep
  1020,2024-03-28,Midwest,Widget Pro,Hardware,14,49.99,699.86,Carol Singh
  ```

- Default `question`: `What is the total sales amount by region?`
+ Default `question` for the single-table case:
+ `What is the total sales amount by region?`
+
+ For multi-table cases (`csv_inputs.length >= 2`) with no `question` provided,
+ ask the user to clarify — don't guess a default cross-table question.

  ---

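The resolution rule these hunks add (csv_inputs wins, then csv_content plus asset_name, then the built-in sample) can be sketched in Python. This is purely illustrative: `resolve_uploads` and `SAMPLE_CSV` are not part of the package, and the sample content is abbreviated.

```python
# Illustrative sketch of the skill's `uploads` resolution rule.
# SAMPLE_CSV stands in for the embedded sample dataset (abbreviated here).
SAMPLE_CSV = "order_id,date,region,product,category,quantity,unit_price,amount,rep\n"

def resolve_uploads(csv_inputs=None, csv_content=None, asset_name="sales-data"):
    """Return the resolved list of {name, content} uploads."""
    if csv_inputs:
        # Multi-table path: use the provided array as-is.
        return [{"name": u["name"], "content": u["content"]} for u in csv_inputs]
    if csv_content is not None:
        # Single-table shorthand: one upload named after asset_name.
        return [{"name": asset_name, "content": csv_content}]
    # Neither provided: fall back to the built-in sample dataset.
    return [{"name": "sales-data", "content": SAMPLE_CSV}]
```

The same function also yields `upload_count` (its length), which Phase 5 later branches on.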
@@ -114,41 +126,45 @@ Store the resolved `namespace_id` — pass it to every subsequent tool call.

  ---

- ## Phase 2: Upload CSV Data
+ ## Phase 2: Upload Each CSV

- Resolve `csv_content` (use parameter value or default sample above).
- Resolve `asset_name` (use parameter value or default `sales-data`).
+ Resolve `uploads` per the Invocation Parameters section.

- Call `toolbelt_save`:
+ **For each** `{ name, content }` in `uploads`, call `toolbelt_save`:

  ```json
  {
  "asset_type": "document",
  "namespace_id": "<namespace_id>",
- "name": "<asset_name>",
- "file_name": "<asset_name>.csv",
- "content": "<csv_content>",
+ "name": "<name>",
+ "file_name": "<name>.csv",
+ "content": "<content>",
  "content_encoding": "text",
  "data_format": "csv"
  }
  ```

- Record the returned `asset_id`.
+ Collect the returned `asset_id` for each upload into an array `asset_ids`.
+ Track the upload count as `upload_count`.
+
+ If any `toolbelt_save` fails, emit structured failure naming which upload
+ failed and halt — partial multi-table ingests can't be joined.

  ---

- ## Phase 3: Poll for Ingestion
+ ## Phase 3: Poll for All Ingestions

  Call `toolbelt_jobs` with `{ "namespace_id": "<namespace_id>" }` every 10 seconds.

- Wait for the `ingest` job for this asset to reach `completed`.
+ Wait until **every** `ingest` job for the `asset_ids` from Phase 2 reaches `completed`.

- Typical duration: 15–60 seconds. Maximum wait: 3 minutes.
+ Typical duration: 15–60 seconds per file. Maximum wait: 3 minutes total.

- If the job reaches `failed` or the timeout elapses, emit structured failure and halt:
+ If any job reaches `failed` or the timeout elapses, emit structured failure:
  ```
- FAILURE: CSV ingestion did not complete.
+ FAILURE: Ingestion did not complete for <asset_name>.
  Job status: <last observed status>
+ Completed so far: <N of M>
  ```

  ---
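The Phase 3 contract in this hunk (poll every 10 s, wait for every job, 3 minute cap, structured failure naming progress) can be sketched as a loop. Assumptions are flagged in the comments: `call_tool` is a hypothetical stand-in for the agent's MCP invocation, and the job-record fields (`type`, `asset_id`, `status`) are guessed shapes, not a documented Toolbelt response schema.

```python
import time

def wait_for_ingestion(call_tool, namespace_id, asset_ids, interval=10, timeout=180):
    """Poll toolbelt_jobs until every ingest job for asset_ids completes.

    call_tool(name, args) is a hypothetical MCP helper; the job dict keys
    below are assumed, not taken from Toolbelt documentation.
    Raises RuntimeError on a failed job or timeout, mirroring the skill's
    FAILURE contract (including the "Completed so far: <N of M>" line).
    """
    deadline = time.monotonic() + timeout
    pending = set(asset_ids)
    while pending:
        jobs = call_tool("toolbelt_jobs", {"namespace_id": namespace_id})
        for job in jobs:
            if job.get("type") != "ingest" or job.get("asset_id") not in pending:
                continue
            if job.get("status") == "completed":
                pending.discard(job["asset_id"])
            elif job.get("status") == "failed":
                raise RuntimeError(f"Ingestion did not complete for {job['asset_id']}")
        if pending and time.monotonic() > deadline:
            done = len(asset_ids) - len(pending)
            raise RuntimeError(
                f"Ingestion did not complete. Completed so far: {done} of {len(asset_ids)}")
        if pending:
            time.sleep(interval)
    return set(asset_ids)
```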
@@ -157,19 +173,25 @@ Job status: <last observed status>

  Call `toolbelt_context` with `{ "namespace_id": "<namespace_id>" }`.

- Locate the table corresponding to the uploaded asset (match by `asset_name` or
- the table name returned from the save call). Record:
- - `table_name`: the SQL table name for this asset
- - `column_names`: list of columns in the table
- - `row_count`: number of rows if provided in context
+ For each uploaded `asset_name` (from Phase 2), locate the corresponding
+ table in the returned context and record:
+ - `table_name`: the SQL table name
+ - `column_names`: columns in that table
+ - `row_count`: row count if provided
+
+ Store all of them in a `tables` array so Phase 5 can reason about JOINs.

  ---

- ## Phase 5: Ask a Natural Language Question
+ ## Phase 5: Answer the Question
+
+ Resolve `question` per the Invocation Parameters section.
+
+ **Decide on approach using `upload_count`:**

- Resolve `question` (use parameter value or default).
+ ### If `upload_count == 1` (single-table analytics)

- Call `toolbelt_search`:
+ Prefer `toolbelt_search` (it routes through our hybrid + NL→SQL layer):

  ```json
  {
@@ -179,23 +201,49 @@ Call `toolbelt_search`:
  }
  ```

- Parse the response to extract:
- - `answer`: the synthesized natural language answer
- - `sql_generated`: the SQL query that was generated and executed
- - `row_count`: number of rows returned by the SQL query
- - `sources`: any cited source tables or assets
+ If `toolbelt_search` does not return SQL, fall back to `toolbelt_sql`
+ with a query you write from the single table's schema:
+
+ ```json
+ {
+ "namespace_id": "<namespace_id>",
+ "query": "SELECT <...> FROM <table_name> ..."
+ }
+ ```
+
+ ### If `upload_count >= 2` (multi-table JOIN)

- If `toolbelt_search` does not return SQL, try `toolbelt_sql` directly with a
- query you write from the schema context:
+ 1. From the `tables` array collected in Phase 4, **identify candidate join
+ keys**: columns that appear in two or more tables with compatible types
+ and matching name patterns (e.g. `customer_id` in `orders` + `customers`).
+ 2. Write a `SELECT` that JOINs on the identified key(s) and answers the
+ user's question. Prefer explicit `JOIN ... ON ...` (never implicit join).
+ 3. Call `toolbelt_sql` directly with the JOIN query:

  ```json
  {
  "namespace_id": "<namespace_id>",
- "query": "SELECT region, SUM(amount) AS total_amount FROM <table_name> GROUP BY region ORDER BY total_amount DESC"
+ "query": "SELECT a.<col>, SUM(b.<col>) FROM <t1> a JOIN <t2> b ON a.<key> = b.<key> GROUP BY a.<col>"
  }
  ```

- Record whichever path succeeded as `query_method` (`"search"` or `"direct_sql"`).
+ If no join key is identifiable, emit structured failure explaining which
+ tables were uploaded and asking the caller to supply a join key:
+ ```
+ FAILURE: Uploaded tables have no obvious join key.
+ Tables and columns: [<summary>]
+ Re-invoke with a question that specifies which column(s) relate the tables.
+ ```
+
+ ### For either case, parse the result:
+
+ - `answer`: synthesized natural-language answer (if using `toolbelt_search`) or a one-sentence summary you compose from the rows (if using `toolbelt_sql` directly)
+ - `sql_generated`: the SQL that ran
+ - `row_count`: rows returned
+ - `sources`: cited tables/assets
+
+ Record the path used as `query_method` — `"search"` (NL→SQL via hybrid),
+ `"direct_sql_single"` (one-table direct), or `"direct_sql_join"` (multi-table JOIN).

  ---

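The join-key heuristic in step 1 of the multi-table branch (columns shared by name across two or more tables) can be sketched as a small function. The `tables` shape mirrors what Phase 4 records; the function itself is illustrative, not shipped code, and type compatibility still needs a separate check.

```python
def candidate_join_keys(tables):
    """Name-match heuristic for candidate join keys.

    tables: Phase 4 records, e.g. [{"table_name": ..., "column_names": [...]}].
    Returns {column: [table_names containing it]} for every column that
    appears in at least two tables. Type compatibility is NOT checked here.
    """
    seen = {}
    for t in tables:
        for col in t["column_names"]:
            seen.setdefault(col, []).append(t["table_name"])
    return {col: names for col, names in seen.items() if len(names) >= 2}
```

An empty result corresponds to the "no obvious join key" failure path above.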
@@ -206,19 +254,22 @@ After all phases complete, emit a single structured result:

  ```
  RESULT:
  namespace_id: <uuid>
- asset_name: <name of uploaded table>
- table_name: <SQL table name>
- row_count_ingested: <rows in the table>
+ uploaded_tables:
+   - asset_name: <name>
+     table_name: <sql table name>
+     column_names: [<list>]
+     row_count_ingested: <rows>
+ upload_count: <N>
  phases_run: [0, 1, 2, 3, 4, 5]

  question: "<question asked>"
- query_method: "<'search' or 'direct_sql'>"
+ query_method: "<search | direct_sql_single | direct_sql_join>"
  sql_generated: |
-   <SQL query that was generated or executed>
- row_count: <number of rows returned by the query>
+   <SQL query executed>
+ row_count: <rows returned>
  answer: |
-   <synthesized answer>
- sources: [<cited tables or assets>]
+   <natural-language answer>
+ sources: [<tables cited>]
  ```

  ---
@@ -229,8 +280,28 @@ RESULT:
  |---|---|
  | 0. Verify connection | `toolbelt_list_namespaces` |
  | 1. Resolve namespace | (from Phase 0 result) |
- | 2. Upload CSV document | `toolbelt_save` |
+ | 2. Upload each CSV | `toolbelt_save` (once per input) |
  | 3. Poll for ingestion | `toolbelt_jobs` |
  | 4. Get schema context | `toolbelt_context` |
- | 5. Ask question | `toolbelt_search`, `toolbelt_sql` (fallback) |
+ | 5. Answer | `toolbelt_search` (single-table NL path), `toolbelt_sql` (direct; required for JOINs) |
  | 6. Emit result | (structured output) |
+
+ ---
+
+ ## Multi-Table Example
+
+ Invocation:
+ ```
+ /toolbelt-analyze csv_inputs=[
+   { name: "orders", content: "order_id,customer_id,amount,..." },
+   { name: "customers", content: "customer_id,region,tier,..." }
+ ] question="Total amount by customer region"
+ ```
+
+ Expected phases:
+ - Phase 2: `toolbelt_save` for `orders`, then `toolbelt_save` for `customers`
+ - Phase 3: poll both `ingest` jobs to completion
+ - Phase 4: context returns two tables; record both schemas
+ - Phase 5: identify `customer_id` as the join key, run
+   `SELECT c.region, SUM(o.amount) FROM orders o JOIN customers c ON o.customer_id = c.customer_id GROUP BY c.region`
+ - Phase 6: structured result with `query_method = "direct_sql_join"` and both tables in `uploaded_tables`
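The example's expected JOIN can be sanity-checked with an in-memory SQLite stand-in for the two ingested tables. This is illustrative only: the diff does not specify Toolbelt's SQL engine, and the row data below is invented for the check.

```python
import sqlite3

# Tiny invented stand-ins for the `orders` and `customers` uploads.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id, customer_id, amount)")
conn.execute("CREATE TABLE customers (customer_id, region, tier)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "c1", 100.0), (2, "c2", 50.0), (3, "c1", 25.0)])
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [("c1", "West", "gold"), ("c2", "East", "silver")])

# The query Phase 5 is expected to produce for this invocation
# (ORDER BY added only to make the check deterministic).
rows = conn.execute(
    "SELECT c.region, SUM(o.amount) FROM orders o "
    "JOIN customers c ON o.customer_id = c.customer_id "
    "GROUP BY c.region ORDER BY c.region"
).fetchall()
# rows → [("East", 50.0), ("West", 125.0)]
```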