@toolbeltai/skills 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +21 -0
- package/README.md +64 -0
- package/assets/demos/fake-curl.sh +3 -0
- package/assets/demos/geo-analyst-demo.sh +85 -0
- package/assets/demos/run-toolbelt-demo.sh +66 -0
- package/assets/geo-analyst-demo.gif +0 -0
- package/assets/run-toolbelt-demo.gif +0 -0
- package/assets/signup-demo.gif +0 -0
- package/bin/install.js +110 -0
- package/data-blend/SKILL.md +283 -0
- package/geo-analyst/README.md +7 -0
- package/geo-analyst/SKILL.md +241 -0
- package/knowledge-graph/SKILL.md +354 -0
- package/multi-agent-workspace/SKILL.md +217 -0
- package/package.json +51 -0
- package/run-toolbelt/README.md +122 -0
- package/run-toolbelt/SKILL.md +238 -0
- package/sql-analyst/SKILL.md +232 -0
- package/streaming-analyst/SKILL.md +351 -0
- package/vector-search/SKILL.md +259 -0
package/sql-analyst/SKILL.md
@@ -0,0 +1,232 @@
---
name: sql-analyst
description: >
  Text-to-SQL analyst powered by Toolbelt MCP. Toolbelt is a multi-modal data
  platform combining SQL analytics, vector search, and real-time streaming.
  Uploads a CSV, auto-generates schema context, then answers natural language
  questions by generating and executing SQL. Use when an AI agent needs to analyze
  tabular data without writing queries manually — upload, ask, get answers.
license: MIT
compatibility: >
  Requires a Toolbelt account (provision free at https://toolbelt.ai) and an
  MCP-compatible AI agent (Claude Code, Claude Desktop, or any client that
  supports MCP server connections). MCP connection must be pre-established
  before invocation.
metadata:
  author: toolbeltai
  version: "1.0"
openclaw:
  emoji: "📊"
  homepage: "https://toolbelt.ai/docs/sql"
  skillKey: "sql-analyst"
---

Upload tabular data and answer natural language questions about it using
Toolbelt MCP tools. Work through each phase in order without prompting for
user input. On unrecoverable error, emit a structured failure and halt.

## When Not To Use

- For unstructured text or documents — use `knowledge-graph` to extract entities and relationships.
- For real-time or streaming data — use `streaming-analyst` instead.
- For spatial data with lat/lon coordinates — use `geo-analyst` instead.

## Invocation Parameters

Extract these from the args string or conversation context before starting:

| Parameter | Required | Description |
|---|---|---|
| `namespace_id` | No | UUID of target namespace. Auto-select if omitted and only one exists; fail if ambiguous. |
| `csv_content` | No | Raw CSV text to upload. Uses the embedded sample dataset if omitted. |
| `asset_name` | No | Name for the uploaded table asset. Defaults to `sales-data`. |
| `question` | No | Natural language question to ask about the data. Defaults to `What is the total sales amount by region?` |

---

## Default Sample Data

If no `csv_content` is provided, use this sales dataset verbatim:

```
order_id,date,region,product,category,quantity,unit_price,amount,rep
1001,2024-01-05,Northeast,Widget Pro,Hardware,12,49.99,599.88,Alice Chen
1002,2024-01-08,Southeast,Gadget Basic,Software,5,29.99,149.95,Bob Martinez
1003,2024-01-12,Midwest,Widget Pro,Hardware,8,49.99,399.92,Carol Singh
1004,2024-01-15,West,Service Plan,Services,3,199.00,597.00,David Park
1005,2024-01-19,Northeast,Gadget Basic,Software,20,29.99,599.80,Alice Chen
1006,2024-01-22,West,Widget Pro,Hardware,6,49.99,299.94,Emma Lopez
1007,2024-02-03,Southeast,Service Plan,Services,2,199.00,398.00,Bob Martinez
1008,2024-02-07,Midwest,Gadget Plus,Software,15,79.99,1199.85,Frank Kim
1009,2024-02-11,Northeast,Widget Pro,Hardware,10,49.99,499.90,Alice Chen
1010,2024-02-14,West,Gadget Basic,Software,8,29.99,239.92,David Park
1011,2024-02-18,Southeast,Gadget Plus,Software,4,79.99,319.96,Carol Singh
1012,2024-02-21,Midwest,Service Plan,Services,1,199.00,199.00,Frank Kim
1013,2024-03-02,Northeast,Service Plan,Services,5,199.00,995.00,Alice Chen
1014,2024-03-06,West,Gadget Plus,Software,9,79.99,719.91,Emma Lopez
1015,2024-03-10,Southeast,Widget Pro,Hardware,7,49.99,349.93,Bob Martinez
1016,2024-03-14,Midwest,Gadget Basic,Software,11,29.99,329.89,Carol Singh
1017,2024-03-18,Northeast,Gadget Plus,Software,6,79.99,479.94,David Park
1018,2024-03-22,West,Service Plan,Services,4,199.00,796.00,Emma Lopez
1019,2024-03-25,Southeast,Widget Pro,Hardware,3,49.99,149.97,Frank Kim
1020,2024-03-28,Midwest,Widget Pro,Hardware,14,49.99,699.86,Carol Singh
```

Default `question`: `What is the total sales amount by region?`
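As a sanity check on whatever SQL gets generated, the expected answer to the default question can be computed directly from the sample rows. A minimal sketch in plain Python (not part of the skill; no Toolbelt calls):

```python
# Compute the default question's expected answer locally from the sample CSV,
# so the SQL result can be validated against known totals.
import csv
import io
from collections import defaultdict

SAMPLE_CSV = """order_id,date,region,product,category,quantity,unit_price,amount,rep
1001,2024-01-05,Northeast,Widget Pro,Hardware,12,49.99,599.88,Alice Chen
1002,2024-01-08,Southeast,Gadget Basic,Software,5,29.99,149.95,Bob Martinez
1003,2024-01-12,Midwest,Widget Pro,Hardware,8,49.99,399.92,Carol Singh
1004,2024-01-15,West,Service Plan,Services,3,199.00,597.00,David Park
1005,2024-01-19,Northeast,Gadget Basic,Software,20,29.99,599.80,Alice Chen
1006,2024-01-22,West,Widget Pro,Hardware,6,49.99,299.94,Emma Lopez
1007,2024-02-03,Southeast,Service Plan,Services,2,199.00,398.00,Bob Martinez
1008,2024-02-07,Midwest,Gadget Plus,Software,15,79.99,1199.85,Frank Kim
1009,2024-02-11,Northeast,Widget Pro,Hardware,10,49.99,499.90,Alice Chen
1010,2024-02-14,West,Gadget Basic,Software,8,29.99,239.92,David Park
1011,2024-02-18,Southeast,Gadget Plus,Software,4,79.99,319.96,Carol Singh
1012,2024-02-21,Midwest,Service Plan,Services,1,199.00,199.00,Frank Kim
1013,2024-03-02,Northeast,Service Plan,Services,5,199.00,995.00,Alice Chen
1014,2024-03-06,West,Gadget Plus,Software,9,79.99,719.91,Emma Lopez
1015,2024-03-10,Southeast,Widget Pro,Hardware,7,49.99,349.93,Bob Martinez
1016,2024-03-14,Midwest,Gadget Basic,Software,11,29.99,329.89,Carol Singh
1017,2024-03-18,Northeast,Gadget Plus,Software,6,79.99,479.94,David Park
1018,2024-03-22,West,Service Plan,Services,4,199.00,796.00,Emma Lopez
1019,2024-03-25,Southeast,Widget Pro,Hardware,3,49.99,149.97,Frank Kim
1020,2024-03-28,Midwest,Widget Pro,Hardware,14,49.99,699.86,Carol Singh
"""

totals = defaultdict(float)
for row in csv.DictReader(io.StringIO(SAMPLE_CSV)):
    totals[row["region"]] += float(row["amount"])

# Descending by total, matching the fallback query's ORDER BY.
for region, total in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(region, round(total, 2))
```

The sketch prints, in descending order: Northeast 3174.52, Midwest 2828.52, West 2652.77, Southeast 1367.81 — the figures any generated SQL should reproduce.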
---

## Phase 0: Verify Connection

Call `get_semantic_names` (no arguments) immediately.

- **If it succeeds:** proceed to Phase 1 using the returned namespaces.
- **If it fails:** emit structured failure and halt.

```
FAILURE: Toolbelt MCP connection is not established.
The MCP server must be connected before invoking this skill.
See: https://toolbelt.ai/docs/mcp for setup instructions.
```

---

## Phase 1: Resolve Namespace

Use the namespaces returned from Phase 0.

Resolution order:
1. If `namespace_id` was provided as a parameter, use it directly.
2. If only one namespace exists, use it.
3. If multiple exist and no `namespace_id` was specified, emit structured failure and halt.

```
FAILURE: Multiple namespaces found and none specified.
Available: [<list namespace display names and IDs>]
Re-invoke with namespace_id=<uuid>.
```

Store the resolved `namespace_id` — pass it to every subsequent tool call.
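The resolution order is a small decision procedure. A sketch, assuming each namespace arrives as a dict with `id` and `name` keys (the actual shape of the `get_semantic_names` response is not specified here):

```python
def resolve_namespace(namespaces, namespace_id=None):
    """Apply the resolution order: explicit parameter, then sole namespace, else fail."""
    if namespace_id is not None:
        return namespace_id                 # 1. explicit parameter wins
    if len(namespaces) == 1:
        return namespaces[0]["id"]          # 2. only one namespace exists
    listing = ", ".join(f'{ns["name"]} ({ns["id"]})' for ns in namespaces)
    raise RuntimeError(                     # 3. ambiguous: structured failure
        "FAILURE: Multiple namespaces found and none specified.\n"
        f"Available: [{listing}]\n"
        "Re-invoke with namespace_id=<uuid>."
    )
```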
---

## Phase 2: Upload CSV Data

Resolve `csv_content` (use parameter value or default sample above).
Resolve `asset_name` (use parameter value or default `sales-data`).

Call `toolbelt_save`:

```json
{
  "asset_type": "document",
  "namespace_id": "<namespace_id>",
  "name": "<asset_name>",
  "file_name": "<asset_name>.csv",
  "content": "<csv_content>",
  "content_encoding": "text",
  "data_format": "csv"
}
```

Record the returned `asset_id`.

---

## Phase 3: Poll for Ingestion

Call `toolbelt_jobs` with `{ "namespace_id": "<namespace_id>" }` every 10 seconds.

Wait for the `ingest` job for this asset to reach `completed`.

Typical duration: 15–60 seconds. Maximum wait: 3 minutes.

If the job reaches `failed` or the timeout elapses, emit structured failure and halt:
```
FAILURE: CSV ingestion did not complete.
Job status: <last observed status>
```
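The 10-second poll with a 3-minute cap can be sketched as follows. Note the assumptions: `fetch_jobs` is a hypothetical stand-in for the `toolbelt_jobs` call, and the job-record fields (`asset_id`, `type`, `status`) are guesses at the response shape, not a documented Toolbelt API:

```python
import time

POLL_INTERVAL_S = 10
MAX_WAIT_S = 180  # 3 minutes

def wait_for_ingest(fetch_jobs, asset_id):
    """Poll until the asset's ingest job completes; raise on failure or timeout."""
    deadline = time.monotonic() + MAX_WAIT_S
    status = "unknown"
    while time.monotonic() < deadline:
        jobs = fetch_jobs()  # stand-in for toolbelt_jobs({"namespace_id": ...})
        job = next((j for j in jobs
                    if j.get("asset_id") == asset_id and j.get("type") == "ingest"),
                   None)
        if job is not None:
            status = job["status"]
            if status == "completed":
                return
            if status == "failed":
                break
        time.sleep(POLL_INTERVAL_S)
    raise RuntimeError(
        f"FAILURE: CSV ingestion did not complete.\nJob status: {status}"
    )
```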
---

## Phase 4: Get Schema Context

Call `toolbelt_context` with `{ "namespace_id": "<namespace_id>" }`.

Locate the table corresponding to the uploaded asset (match by `asset_name` or
the table name returned from the save call). Record:
- `table_name`: the SQL table name for this asset
- `column_names`: list of columns in the table
- `row_count`: number of rows if provided in context

---

## Phase 5: Ask a Natural Language Question

Resolve `question` (use parameter value or default).

Call `toolbelt_search`:

```json
{
  "question": "<question>",
  "namespace_id": "<namespace_id>",
  "synthesize": true
}
```

Parse the response to extract:
- `answer`: the synthesized natural language answer
- `sql_generated`: the SQL query that was generated and executed
- `row_count`: number of rows returned by the SQL query
- `sources`: any cited source tables or assets

If `toolbelt_search` does not return SQL, try `toolbelt_sql` directly with a
query you write from the schema context:

```json
{
  "namespace_id": "<namespace_id>",
  "query": "SELECT region, SUM(amount) AS total_amount FROM <table_name> GROUP BY region ORDER BY total_amount DESC"
}
```

Record whichever path succeeded as `query_method` (`"search"` or `"direct_sql"`).

---

## Phase 6: Structured Output

After all phases complete, emit a single structured result:

```
RESULT:
  namespace_id: <uuid>
  asset_name: <name of uploaded table>
  table_name: <SQL table name>
  row_count_ingested: <rows in the table>
  phases_run: [0, 1, 2, 3, 4, 5]

  question: "<question asked>"
  query_method: "<'search' or 'direct_sql'>"
  sql_generated: |
    <SQL query that was generated or executed>
  row_count: <number of rows returned by the query>
  answer: |
    <synthesized answer>
  sources: [<cited tables or assets>]
```

---

## Tool Reference

| Phase | Tool(s) |
|---|---|
| 0. Verify connection | `get_semantic_names` |
| 1. Resolve namespace | (from Phase 0 result) |
| 2. Upload CSV document | `toolbelt_save` |
| 3. Poll for ingestion | `toolbelt_jobs` |
| 4. Get schema context | `toolbelt_context` |
| 5. Ask question | `toolbelt_search`, `toolbelt_sql` (fallback) |
| 6. Emit result | (structured output) |
package/streaming-analyst/SKILL.md
@@ -0,0 +1,351 @@
---
name: streaming-analyst
description: >
  Real-time streaming analytics powered by Toolbelt MCP. Toolbelt is a
  multi-modal data platform combining SQL analytics, vector search, and real-time
  streaming. Connects a Kafka topic, runs windowed aggregation queries, and detects
  anomalies using standard deviation. Falls back to simulated stream data when no
  Kafka broker is provided. Use when an AI agent needs to analyze streaming data —
  IoT sensors, event logs, security events, or fleet telemetry — without writing
  infrastructure code.
license: MIT
compatibility: >
  Requires a Toolbelt account (provision free at https://toolbelt.ai) and an
  MCP-compatible AI agent (Claude Code, Claude Desktop, or any client that
  supports MCP server connections). MCP connection must be pre-established
  before invocation. Kafka parameters are optional — omit them to run with
  simulated stream data.
metadata:
  author: toolbeltai
  version: "1.0"
openclaw:
  emoji: "📡"
  homepage: "https://toolbelt.ai/docs/streaming"
  skillKey: "streaming-analyst"
---

Connect a Kafka topic (or simulate one) and run real-time aggregation and
anomaly detection using Toolbelt MCP tools. Work through each phase in order
without prompting for user input. On unrecoverable error, emit a structured
failure and halt.

## When Not To Use

- For static batch tabular data — use `sql-analyst` instead.
- When real-time monitoring, windowed aggregation, or anomaly detection is not the goal.

## Invocation Parameters

Extract these from the args string or conversation context before starting:

| Parameter | Required | Description |
|---|---|---|
| `namespace_id` | No | UUID of target namespace. Auto-select if omitted and only one exists; fail if ambiguous. |
| `kafka_broker` | No | Kafka broker URL (e.g. `kafka-broker:9092`). Omit to use simulated stream data. |
| `kafka_topic` | No | Kafka topic name. Required if `kafka_broker` is provided. |
| `kafka_schema` | No | SQL column schema (e.g. `sensor_id VARCHAR(64), ts TIMESTAMP, value DOUBLE`). Defaults to the IoT schema below. |
| `kafka_group_id` | No | Kafka consumer group ID. Omitted if not provided. |
| `anomaly_threshold` | No | Standard deviation multiplier for anomaly detection. Defaults to `2.0`. |

---

## Default Simulated Stream Data

When `kafka_broker` is not provided, upload this IoT sensor reading dataset as
a relational asset to simulate a stream snapshot. It includes planted anomalies
(readings more than 2 standard deviations from the mean) for detection validation.

```
sensor_id,ts,value,unit
sensor-01,2024-03-01 00:00:00,72.3,celsius
sensor-02,2024-03-01 00:00:00,71.8,celsius
sensor-03,2024-03-01 00:00:00,70.5,celsius
sensor-01,2024-03-01 00:01:00,72.6,celsius
sensor-02,2024-03-01 00:01:00,72.1,celsius
sensor-03,2024-03-01 00:01:00,71.0,celsius
sensor-01,2024-03-01 00:02:00,73.0,celsius
sensor-02,2024-03-01 00:02:00,71.5,celsius
sensor-03,2024-03-01 00:02:00,70.8,celsius
sensor-01,2024-03-01 00:03:00,72.8,celsius
sensor-02,2024-03-01 00:03:00,98.7,celsius
sensor-03,2024-03-01 00:03:00,71.2,celsius
sensor-01,2024-03-01 00:04:00,73.1,celsius
sensor-02,2024-03-01 00:04:00,72.4,celsius
sensor-03,2024-03-01 00:04:00,45.1,celsius
sensor-01,2024-03-01 00:05:00,72.5,celsius
sensor-02,2024-03-01 00:05:00,72.0,celsius
sensor-03,2024-03-01 00:05:00,71.5,celsius
sensor-01,2024-03-01 00:06:00,72.9,celsius
sensor-02,2024-03-01 00:06:00,71.7,celsius
sensor-03,2024-03-01 00:06:00,70.9,celsius
sensor-01,2024-03-01 00:07:00,126.4,celsius
sensor-02,2024-03-01 00:07:00,72.3,celsius
sensor-03,2024-03-01 00:07:00,71.8,celsius
sensor-01,2024-03-01 00:08:00,73.2,celsius
sensor-02,2024-03-01 00:08:00,71.9,celsius
sensor-03,2024-03-01 00:08:00,71.3,celsius
```

The three anomalies in this dataset:
- `sensor-02` at `00:03:00` — value `98.7` (spike high)
- `sensor-03` at `00:04:00` — value `45.1` (drop low)
- `sensor-01` at `00:07:00` — value `126.4` (extreme spike)

Default `kafka_schema` (used if a broker is provided without a schema):
```
sensor_id VARCHAR(64), ts TIMESTAMP, value DOUBLE, unit VARCHAR(32)
```
---

## Phase 0: Verify Connection

Call `get_semantic_names` (no arguments) immediately.

- **If it succeeds:** proceed to Phase 1 using the returned namespaces.
- **If it fails:** emit structured failure and halt.

```
FAILURE: Toolbelt MCP connection is not established.
The MCP server must be connected before invoking this skill.
See: https://toolbelt.ai/docs/mcp for setup instructions.
```

---

## Phase 1: Resolve Namespace

Use the namespaces returned from Phase 0.

Resolution order:
1. If `namespace_id` was provided as a parameter, use it directly.
2. If only one namespace exists, use it.
3. If multiple exist and no `namespace_id` was specified, emit structured failure and halt.

```
FAILURE: Multiple namespaces found and none specified.
Available: [<list namespace display names and IDs>]
Re-invoke with namespace_id=<uuid>.
```

Store the resolved `namespace_id` — pass it to every subsequent tool call.

---

## Phase 2: Connect Stream Source

**If `kafka_broker` is provided:**

Call `toolbelt_connect`:
```json
{
  "source_type": "kafka",
  "namespace_id": "<namespace_id>",
  "location": "KAFKA://<kafka_broker>",
  "external_table_name": "<kafka_topic>",
  "asset_name": "<kafka_topic>",
  "kafka_column_definitions": "<kafka_schema or default schema>",
  "kafka_subscribe": true,
  "extra_options": { "kafka.group.id": "<kafka_group_id>" }
}
```

Omit `extra_options` if `kafka_group_id` was not provided.

Store the resulting table name as `stream_table`. Record `source_mode: "kafka"`.

**If `kafka_broker` is not provided (simulated stream):**

Upload the default sample data above as a document using `toolbelt_save`:

```json
{
  "asset_type": "document",
  "namespace_id": "<namespace_id>",
  "name": "stream-readings",
  "file_name": "stream-readings.csv",
  "content": "<default sample data above>",
  "content_encoding": "text",
  "data_format": "csv"
}
```

Record `source_mode: "simulated"`. Poll `toolbelt_jobs` every 10 seconds until
the `ingest` job reaches `completed`. Maximum wait: 3 minutes.

If the job reaches `failed` or the timeout elapses, emit structured failure and halt:
```
FAILURE: Stream data ingestion did not complete.
Job status: <last observed status>
```

Call `toolbelt_context` to get the table name. Store as `stream_table`.
---

## Phase 3: Confirm Data Arrival

Call `toolbelt_execute` to verify the stream table is queryable and report
the initial row count:

```sql
SELECT COUNT(*) AS row_count FROM <stream_table>
```

If this fails, emit structured failure and halt:
```
FAILURE: Stream table is not queryable.
Table: <stream_table>
Error: <error message>
```

Record `initial_row_count`. For simulated streams this is the full dataset.
For live Kafka streams, wait 30 seconds and poll again to observe growth:

```sql
SELECT COUNT(*) AS row_count FROM <stream_table>
```

Record `final_row_count`. If `final_row_count > initial_row_count`, set
`data_is_growing: true`. For simulated mode, set `data_is_growing: "simulated"`.

---

## Phase 4: Aggregation Queries

Run the following aggregation queries via `toolbelt_execute`. Substitute
`<stream_table>` with the resolved table name throughout.

### Query 1 — Per-sensor stats

Compute mean, min, max, and reading count per sensor:

```sql
SELECT
  sensor_id,
  COUNT(*) AS reading_count,
  ROUND(AVG(value), 2) AS avg_value,
  ROUND(MIN(value), 2) AS min_value,
  ROUND(MAX(value), 2) AS max_value
FROM <stream_table>
GROUP BY sensor_id
ORDER BY sensor_id ASC
```

Record `per_sensor_rows` (number of rows returned).
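Against the simulated dataset, Query 1 can be cross-checked offline. A minimal sketch in plain Python (no Toolbelt calls; readings transcribed from the sample CSV above):

```python
# Per-sensor readings in timestamp order, transcribed from the simulated dataset.
READINGS = {
    "sensor-01": [72.3, 72.6, 73.0, 72.8, 73.1, 72.5, 72.9, 126.4, 73.2],
    "sensor-02": [71.8, 72.1, 71.5, 98.7, 72.4, 72.0, 71.7, 72.3, 71.9],
    "sensor-03": [70.5, 71.0, 70.8, 71.2, 45.1, 71.5, 70.9, 71.8, 71.3],
}

# (reading_count, avg_value, min_value, max_value) per sensor, as in Query 1.
stats = {
    sensor_id: (len(vals), round(sum(vals) / len(vals), 2), min(vals), max(vals))
    for sensor_id, vals in READINGS.items()
}

for sensor_id in sorted(stats):
    print(sensor_id, *stats[sensor_id])
```

The planted anomalies pull the means noticeably: sensor-01 averages about 78.76 despite most readings sitting near 73, which is exactly why Phase 5 normalizes by per-sensor standard deviation instead of using raw values.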
### Query 2 — 1-minute window aggregation

Aggregate readings into 1-minute windows using timestamp truncation:

```sql
SELECT
  sensor_id,
  DATETIME(ts, 'start of minute') AS window_start,
  COUNT(*) AS readings_in_window,
  ROUND(AVG(value), 2) AS avg_value
FROM <stream_table>
GROUP BY sensor_id, DATETIME(ts, 'start of minute')
ORDER BY window_start ASC, sensor_id ASC
LIMIT 20
```

If the database does not support `DATETIME(ts, 'start of minute')`, fall back to:
```sql
SELECT
  sensor_id,
  ts,
  value
FROM <stream_table>
ORDER BY ts ASC, sensor_id ASC
LIMIT 20
```

Record `window_rows` (number of rows returned).
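The truncation in Query 2 can be made concrete with a short sketch in plain Python, assuming timestamps in the `YYYY-MM-DD HH:MM:SS` form used by the sample data:

```python
from collections import defaultdict
from datetime import datetime

def window_start(ts: str) -> str:
    """Truncate a 'YYYY-MM-DD HH:MM:SS' timestamp to the start of its minute."""
    dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(second=0, microsecond=0)
    return dt.strftime("%Y-%m-%d %H:%M:%S")

def one_minute_windows(rows):
    """rows: iterable of (sensor_id, ts, value) -> {(sensor_id, window): avg_value}."""
    buckets = defaultdict(list)
    for sensor_id, ts, value in rows:
        buckets[(sensor_id, window_start(ts))].append(value)
    return {key: round(sum(vals) / len(vals), 2) for key, vals in buckets.items()}
```

In the sample snapshot every timestamp already falls on a minute boundary, so each window holds one reading; on a live stream the buckets accumulate every reading that arrives within the same minute.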
---

## Phase 5: Anomaly Detection

Detect readings that deviate more than `anomaly_threshold` standard deviations
from the per-sensor mean. Run via `toolbelt_execute`:

```sql
SELECT
  r.sensor_id,
  r.ts,
  r.value,
  ROUND(stats.avg_val, 2) AS sensor_mean,
  ROUND(stats.std_val, 2) AS sensor_stddev,
  ROUND(ABS(r.value - stats.avg_val) / NULLIF(stats.std_val, 0), 2) AS z_score
FROM <stream_table> r
JOIN (
  SELECT
    sensor_id,
    AVG(value) AS avg_val,
    STDDEV(value) AS std_val
  FROM <stream_table>
  GROUP BY sensor_id
) stats ON r.sensor_id = stats.sensor_id
WHERE ABS(r.value - stats.avg_val) > (<anomaly_threshold> * stats.std_val)
ORDER BY z_score DESC
```

Substitute `<anomaly_threshold>` with the resolved value (default `2.0`).

If `STDDEV` is not supported, substitute the population standard deviation
expression: `SQRT(AVG(value * value) - AVG(value) * AVG(value))`.

Record:
- `anomaly_count`: number of rows returned
- `anomalies`: list of `{ sensor_id, ts, value, z_score }` for each row
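The same z-score rule can be sketched offline in plain Python, using the population standard deviation (equivalent to the `SQRT(AVG(value * value) - AVG(value) * AVG(value))` fallback). Against the simulated dataset with the default threshold of 2.0, it flags exactly the three planted anomalies, each with a z-score of roughly 2.83:

```python
import math
from collections import defaultdict

def detect_anomalies(rows, threshold=2.0):
    """rows: (sensor_id, ts, value) triples -> readings beyond threshold * stddev."""
    by_sensor = defaultdict(list)
    for sensor_id, _, value in rows:
        by_sensor[sensor_id].append(value)

    stats = {}
    for sensor_id, vals in by_sensor.items():
        mean = sum(vals) / len(vals)
        # Population stddev, matching SQRT(AVG(v*v) - AVG(v)*AVG(v)).
        std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
        stats[sensor_id] = (mean, std)

    anomalies = []
    for sensor_id, ts, value in rows:
        mean, std = stats[sensor_id]
        if std > 0 and abs(value - mean) > threshold * std:
            anomalies.append((sensor_id, ts, value, round(abs(value - mean) / std, 2)))
    return sorted(anomalies, key=lambda a: -a[3])
```

Note the self-masking effect: an extreme reading inflates its own sensor's standard deviation, which is why even the 126.4 spike scores only ~2.83 rather than higher; thresholds well above 2.0 would miss it on this small a sample.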
---

## Phase 6: Structured Output

Emit a single structured result after all phases complete:

```
RESULT:
  namespace_id: <uuid>
  stream_table: <table name>
  source_mode: <"kafka" or "simulated">
  phases_run: [0, 1, 2, 3, 4, 5]

  data_arrival:
    initial_row_count: <count>
    final_row_count: <count>
    data_is_growing: <true / false / "simulated">

  aggregation:
    per_sensor_rows: <count>
    window_rows: <count>

  anomaly_detection:
    threshold_stddev: <anomaly_threshold>
    anomaly_count: <count>
    anomalies:
      - sensor_id: <id>
        ts: <timestamp>
        value: <value>
        z_score: <z_score>
      ...
```

If any Phase 4–5 query fails, include `query_error: "<error>"` under that
section and continue. Only halt on Phase 0–2 failures.

---

## Tool Reference

| Phase | Tool(s) |
|---|---|
| 0. Verify connection | `get_semantic_names` |
| 1. Resolve namespace | (from Phase 0 result) |
| 2. Connect stream | `toolbelt_connect` (Kafka) or `toolbelt_save` + `toolbelt_jobs` + `toolbelt_context` (simulated) |
| 3. Confirm data arrival | `toolbelt_execute` × 1–2 |
| 4. Aggregation queries | `toolbelt_execute` × 2 |
| 5. Anomaly detection | `toolbelt_execute` × 1 |
| 6. Emit result | (structured output) |