@toolbeltai/skills 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +21 -0
- package/README.md +64 -0
- package/assets/demos/fake-curl.sh +3 -0
- package/assets/demos/geo-analyst-demo.sh +85 -0
- package/assets/demos/run-toolbelt-demo.sh +66 -0
- package/assets/geo-analyst-demo.gif +0 -0
- package/assets/run-toolbelt-demo.gif +0 -0
- package/assets/signup-demo.gif +0 -0
- package/bin/install.js +110 -0
- package/data-blend/SKILL.md +283 -0
- package/geo-analyst/README.md +7 -0
- package/geo-analyst/SKILL.md +241 -0
- package/knowledge-graph/SKILL.md +354 -0
- package/multi-agent-workspace/SKILL.md +217 -0
- package/package.json +51 -0
- package/run-toolbelt/README.md +122 -0
- package/run-toolbelt/SKILL.md +238 -0
- package/sql-analyst/SKILL.md +232 -0
- package/streaming-analyst/SKILL.md +351 -0
- package/vector-search/SKILL.md +259 -0
package/geo-analyst/SKILL.md
@@ -0,0 +1,241 @@
---
name: geo-analyst
description: >
  GPU-accelerated geospatial analytics powered by Toolbelt MCP. Toolbelt is a
  multi-modal data platform combining SQL analytics, vector search, and real-time
  streaming. Uploads lat/lon sensor data, runs geospatial SQL queries (distance,
  point-in-polygon, track creation), and emits structured results. Use when an AI
  agent needs to analyze spatial data, compute distances, test point containment,
  or build movement tracks without writing infrastructure code.
license: MIT
compatibility: >
  Requires a Toolbelt account (provision free at https://toolbelt.ai) and an
  MCP-compatible AI agent (Claude Code, Claude Desktop, or any client that
  supports MCP server connections). MCP connection must be pre-established
  before invocation.
metadata:
  author: toolbeltai
  version: "1.0"
  openclaw:
    emoji: "🌍"
    homepage: "https://toolbelt.ai/docs/geospatial"
    skillKey: "geo-analyst"
---

Execute GPU-accelerated geospatial analytics end-to-end using Toolbelt MCP tools.
Work through each phase in order. Extract all required inputs from task parameters
or invocation context — do not prompt for user input. Progress through phases
without confirmation. On unrecoverable error, emit a structured failure and halt.

## When Not To Use

- For tabular data without lat/lon coordinates — use `sql-analyst` instead.
- For unstructured text or documents — use `knowledge-graph` instead.

## Invocation Parameters

Extract these from the args string or conversation context before starting:

| Parameter | Required | Description |
|---|---|---|
| `namespace_id` | No | UUID of target namespace. Auto-select if omitted and only one exists; fail if ambiguous. |
| `csv_content` | No | Raw CSV text with id, name, lat, lon columns. Uses default Tampa Bay sample if omitted. |
| `asset_name` | No | Name for the uploaded sensor table. Defaults to `sensor-locations`. |
| `zone_wkt` | No | WKT polygon for point-in-polygon query. Uses default Tampa downtown zone if omitted. |

---

## Default Sample Data

If no `csv_content` is provided, use this Tampa Bay area sensor dataset:

```
id,name,lat,lon
1,Sensor A,27.9506,-82.4572
2,Sensor B,27.9659,-82.4398
3,Sensor C,27.9881,-82.5014
4,Sensor D,27.9344,-82.5181
5,Sensor E,28.0080,-82.4271
6,Sensor F,27.9196,-82.3943
7,Sensor G,27.8772,-82.5236
8,Sensor H,28.0346,-82.4850
9,Sensor I,27.9712,-82.5489
10,Sensor J,27.9050,-82.4127
```

Default `zone_wkt` (downtown Tampa bounding polygon):

```
POLYGON((-82.4650 27.9400, -82.4350 27.9400, -82.4350 27.9700, -82.4650 27.9700, -82.4650 27.9400))
```

---

## Phase 0: Verify Connection

Call `get_semantic_names` (no arguments) immediately.

- **If it succeeds:** proceed to Phase 1 using the returned namespaces.
- **If it fails:** emit structured failure and halt.

```
FAILURE: Toolbelt MCP connection is not established.
The MCP server must be connected before invoking this skill.
See: https://toolbelt.ai/docs/mcp for setup instructions.
```

---

## Phase 1: Resolve Namespace

Use the namespaces returned from Phase 0.

Resolution order:
1. If `namespace_id` was provided as a parameter, use it directly.
2. If only one namespace exists, use it.
3. If multiple exist and no `namespace_id` was specified, emit structured failure and halt.

```
FAILURE: Multiple namespaces found and none specified.
Available: [<list namespace display names and IDs>]
Re-invoke with namespace_id=<uuid>.
```

Store the resolved `namespace_id` — pass it to every subsequent tool call.

---

## Phase 2: Upload Sensor Data

Upload the CSV as a document using `toolbelt_save`:

```json
{
  "asset_type": "document",
  "namespace_id": "<namespace_id>",
  "name": "<asset_name or 'sensor-locations'>",
  "file_name": "sensor-locations.csv",
  "content": "<csv_content or default sample data above>",
  "content_encoding": "text",
  "data_format": "csv"
}
```

### Poll for ingestion

Call `toolbelt_jobs` with `{ "namespace_id": "<namespace_id>" }` every 10 seconds.

Wait for the `ingest` job to reach `completed`.

Typical duration: 15–60 seconds. Maximum wait: 3 minutes.

If the job reaches `failed` or the timeout elapses, emit structured failure and halt:
```
FAILURE: Sensor data ingestion did not complete.
Job status: <last observed status>
```

After completion, call `toolbelt_context` to retrieve the table name for the
uploaded asset. Store it as `sensor_table` for use in Phase 3.

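The polling loop above can be sketched as follows. Here `fetch_jobs` is a hypothetical stand-in for calling `toolbelt_jobs` with the resolved `namespace_id`, and the list-of-dicts job shape is an assumption, not a documented response format:

```python
import time

def wait_for_ingest(fetch_jobs, interval=10, timeout=180, sleep=time.sleep):
    """Poll until the `ingest` job completes; raise on failure or timeout.

    fetch_jobs is a hypothetical wrapper around `toolbelt_jobs`; it is assumed
    to return a list of {"name": ..., "status": ...} records.
    """
    waited = 0
    status = None
    while waited <= timeout:
        jobs = fetch_jobs()
        status = next((j["status"] for j in jobs if j["name"] == "ingest"), None)
        if status == "completed":
            return status
        if status == "failed":
            raise RuntimeError(f"FAILURE: Sensor data ingestion did not complete. Job status: {status}")
        sleep(interval)
        waited += interval
    raise TimeoutError(f"FAILURE: Sensor data ingestion did not complete. Job status: {status}")
```

The injected `sleep` keeps the loop testable; in the skill the agent simply re-calls the tool on the same cadence.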

---

## Phase 3: Run Geospatial Queries

Run all three queries using `toolbelt_execute`. Pass `namespace_id` and `query`
for each call. Collect results.

**Note:** `ST_DISTANCE`, `ST_CONTAINS`, and `ST_MAKELINE` are Kinetica-native geospatial
functions available through Toolbelt's GPU-accelerated query engine. They are not standard
SQL and will not work against other databases.

### Query 1 — Pairwise Distance

`ST_DISTANCE(lat1, lon1, lat2, lon2)` → meters between two WGS-84 points.

```sql
SELECT
  a.name AS sensor_a,
  b.name AS sensor_b,
  ROUND(ST_DISTANCE(a.lat, a.lon, b.lat, b.lon)) AS distance_m
FROM <sensor_table> a
JOIN <sensor_table> b ON a.id < b.id
ORDER BY distance_m ASC
LIMIT 10
```

Record: `distance_query_rows` (number of rows returned), `closest_pair` (sensor_a and sensor_b from the first row), `min_distance_m`.

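The exact geodesic model behind Kinetica's `ST_DISTANCE` is not specified here, but a spherical haversine gives a close approximation for sanity-checking the returned `distance_m` values. A minimal sketch (the helper is illustrative, not part of the skill):

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Approximate great-circle distance in meters between two WGS-84 points."""
    r = 6371008.8  # mean Earth radius, meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    half_dlat = (p2 - p1) / 2
    half_dlon = math.radians(lon2 - lon1) / 2
    a = math.sin(half_dlat) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(half_dlon) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Sensor A to Sensor B from the sample data: roughly 2.4 km
d = haversine_m(27.9506, -82.4572, 27.9659, -82.4398)
```

A `min_distance_m` wildly different from the haversine figure for the same pair suggests the query substituted the wrong table or columns.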

### Query 2 — Point-in-Polygon

`ST_CONTAINS(wkt_polygon, lat, lon)` → 1 if point is inside the polygon.

```sql
SELECT
  id,
  name,
  lat,
  lon,
  ST_CONTAINS('<zone_wkt>', lat, lon) AS in_zone
FROM <sensor_table>
WHERE ST_CONTAINS('<zone_wkt>', lat, lon) = 1
```

Substitute `<zone_wkt>` with the provided or default WKT string.

Record: `in_zone_count` (number of sensors inside the polygon), `in_zone_sensors` (list of names).

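Because the WKT is inlined as a SQL string literal, any single quotes in a user-supplied `zone_wkt` must be doubled before substitution. A minimal sketch of that rendering step (the helper name is illustrative, not part of the skill):

```python
def build_in_zone_query(sensor_table, zone_wkt):
    """Render Query 2 with <sensor_table> and <zone_wkt> substituted.

    Doubling single quotes keeps the SQL string literal valid; the default
    downtown-Tampa polygon contains none, so it passes through unchanged.
    """
    wkt = zone_wkt.replace("'", "''")
    return (
        "SELECT id, name, lat, lon, "
        f"ST_CONTAINS('{wkt}', lat, lon) AS in_zone "
        f"FROM {sensor_table} "
        f"WHERE ST_CONTAINS('{wkt}', lat, lon) = 1"
    )
```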

### Query 3 — Track Line

`ST_MAKELINE(lat, lon ORDER BY id)` → linestring connecting all points in sequence.

```sql
SELECT
  ST_ASTEXT(ST_MAKELINE(lat, lon ORDER BY id ASC)) AS track_wkt,
  COUNT(*) AS point_count
FROM <sensor_table>
```

Record: `track_point_count`, `track_wkt_excerpt` (first 120 chars of the WKT).

---

## Phase 4: Structured Output

After all queries complete, emit a single structured result:

```
RESULT:
  namespace_id: <uuid>
  sensor_table: <table name>
  phases_run: [0, 1, 2, 3]
  row_count: <total rows in sensor table>

  distance_query:
    rows_returned: <distance_query_rows>
    closest_pair: "<sensor_a> → <sensor_b>"
    min_distance_m: <min_distance_m>

  point_in_polygon:
    zone_wkt: "<zone_wkt used>"
    in_zone_count: <count>
    in_zone_sensors: [<names>]

  track:
    point_count: <track_point_count>
    track_wkt_excerpt: "<first 120 chars>"
```

If any query fails, include `query_error: "<error>"` under that query's section
and continue with remaining queries. Only halt on Phase 0–2 failures.

---

## Tool Reference

| Phase | Tool(s) |
|---|---|
| 0. Verify connection | `get_semantic_names` |
| 1. Resolve namespace | (from Phase 0 result) |
| 2. Upload sensor data | `toolbelt_save`, `toolbelt_jobs`, `toolbelt_context` |
| 3. Run geospatial queries | `toolbelt_execute` × 3 |
| 4. Emit result | (structured output) |
package/knowledge-graph/SKILL.md
@@ -0,0 +1,354 @@
---
name: knowledge-graph
description: >
  Automatically extract entities and relationships from any document and explore
  the results as a knowledge graph — no schema or ontology required. Toolbelt is
  a multi-modal data platform combining SQL analytics, vector search, and
  real-time streaming. Upload a document, then surface entities, relationship
  types, and hidden connections via graph queries. Use when an AI agent needs to
  turn unstructured text into structured, navigable knowledge without writing any
  extraction code.
license: MIT
compatibility: >
  Requires a Toolbelt account (provision free at https://toolbelt.ai) and an
  MCP-compatible AI agent (Claude Code, Claude Desktop, or any client that
  supports MCP server connections). MCP connection must be pre-established
  before invocation.
metadata:
  author: toolbeltai
  version: "1.0"
  openclaw:
    emoji: "🕸️"
    homepage: "https://toolbelt.ai/docs/knowledge-graph"
    skillKey: "knowledge-graph"
---

Extract a knowledge graph from a document and explore it autonomously using
Toolbelt MCP tools. Work through each phase in order without prompting for
user input. On unrecoverable error, emit a structured failure and halt.

## When Not To Use

- For structured tabular data (CSV, SQL tables) — use `sql-analyst` instead.
- When entity and relationship extraction is not needed — use `sql-analyst` or `streaming-analyst` for the appropriate data type.

## Invocation Parameters

Extract these from the args string or conversation context before starting:

| Parameter | Required | Description |
|---|---|---|
| `namespace_id` | No | UUID of target namespace. Auto-select if omitted and only one exists; fail if ambiguous. |
| `document_content` | No | Raw text to upload. Uses the embedded sample document if omitted. |
| `document_name` | No | Name for the document asset. Defaults to `kg-sample-doc`. |
| `cypher_query` | No | Custom Cypher query to run in Phase 6. Uses default discovery query if omitted. |

---

## Default Sample Document

If no `document_content` is provided, use the following text verbatim:

```
NovaTech Industries: Company Overview

NovaTech Industries was founded in 2018 by Dr. Elena Vasquez and Marcus Okafor
in Austin, Texas. The company specializes in next-generation industrial automation
hardware and AI-driven process control systems.

Dr. Elena Vasquez, Chief Executive Officer, previously led R&D at Siemens Energy
before co-founding NovaTech. Marcus Okafor, Chief Technology Officer, holds three
patents in embedded sensor design and was a principal engineer at Honeywell
Automation prior to joining the venture.

NovaTech's flagship product, the Sentinel-X200, is an industrial sensor array
that monitors temperature, vibration, and chemical composition simultaneously.
The Sentinel-X200 is manufactured at NovaTech's production facility in San
Antonio, Texas, and has been deployed at over 140 client sites worldwide. Key
clients include Pacific Rim Petrochemicals in Singapore, Northern Grid Energy
in Oslo, Norway, and Delta Fabrication Group in Detroit, Michigan.

In 2021, NovaTech acquired Axon Micro Systems, a startup based in Boulder,
Colorado, founded by Dr. Priya Nair. Axon Micro specialized in MEMS-based
pressure sensors. Following the acquisition, Dr. Nair joined NovaTech as VP
of Sensor Innovation and relocated to the Austin headquarters.

NovaTech's second product line, the Argus Platform, provides real-time analytics
for industrial IoT deployments. The Argus Platform integrates directly with the
Sentinel-X200 and supports connectivity to Siemens SCADA systems, Rockwell
Automation ControlLogix PLCs, and ABB Ability industrial cloud. The Argus Platform
is sold as a subscription service and is managed from NovaTech's cloud operations
center in Phoenix, Arizona.

The company raised a $45 million Series B in 2022, led by Meridian Growth Capital
and co-invested by Cascade Ventures. The funding round was used to expand the
San Antonio manufacturing facility and open a European sales office in Frankfurt,
Germany, led by regional director Thomas Brandt.

NovaTech employs 320 people across its Austin headquarters, San Antonio plant,
Phoenix operations center, and Frankfurt office. The company reported $62 million
in revenue for fiscal year 2023, with 40% of revenue from international markets.

Key partnerships include a joint development agreement with MIT's Advanced
Materials Lab, overseen by Professor Alan Chen, to develop next-generation
ceramic sensor substrates. NovaTech also holds a preferred supplier agreement
with Broadcom for embedded processing chips used in the Sentinel-X200.

In 2024, NovaTech announced the Sentinel-X300, the next-generation successor to
the X200, featuring AI-based anomaly detection co-developed with researchers at
UT Austin. The X300 is scheduled for commercial release in Q3 2025 and will be
manufactured at a new facility in Round Rock, Texas.
```

---

## Phase 0: Verify Connection

Call `get_semantic_names` (no arguments) immediately.

- **If it succeeds:** proceed to Phase 1 using the returned namespaces.
- **If it fails:** emit structured failure and halt.

```
FAILURE: Toolbelt MCP connection is not established.
The MCP server must be connected before invoking this skill.
See: https://toolbelt.ai/docs/mcp for setup instructions.
```

---

## Phase 1: Resolve Namespace

Use the namespaces returned from Phase 0.

Resolution order:
1. If `namespace_id` was provided as a parameter, use it directly.
2. If only one namespace exists, use it.
3. If multiple exist and no `namespace_id` was specified, emit structured failure and halt.

```
FAILURE: Multiple namespaces found and none specified.
Available: [<list namespace display names and IDs>]
Re-invoke with namespace_id=<uuid>.
```

Store the resolved `namespace_id` — pass it to every subsequent tool call.
Never use the display name as the ID.

---

## Phase 2: Upload Document

Resolve `document_content` (use parameter value or default sample above).
Resolve `document_name` (use parameter value or default `kg-sample-doc`).

Call `toolbelt_save`:

```json
{
  "asset_type": "document",
  "namespace_id": "<namespace_id>",
  "name": "<document_name>",
  "file_name": "document.txt",
  "content": "<document_content>",
  "content_encoding": "text"
}
```

Record the returned `asset_id` for reference.

---

## Phase 3: Poll for Entity Extraction

Call `toolbelt_jobs` with `{ "namespace_id": "<namespace_id>" }` every 10 seconds.

Both job stages must reach `completed`:
- `ingest` — document parsed and stored
- `semantic` — embeddings generated and GLiNER entity extraction complete

Typical duration: 30–120 seconds. Maximum wait: 5 minutes.

If either job reaches `failed` or the timeout elapses, emit structured failure and halt:
```
FAILURE: Entity extraction did not complete.
Job status: <last observed status for ingest and semantic jobs>
```

Do not proceed to Phase 4 until **both** jobs are `completed`. The knowledge
graph is only available after the `semantic` job finishes — this is when GLiNER
runs and populates entity nodes and relationship edges.

---

## Phase 4: Check Graph Readiness

Call `toolbelt_jobs` with `{ "namespace_id": "<namespace_id>" }` and inspect the
job list for a `kg-rebuild` (or `graph_build` / `graph`) job entry.

**If a `kg-rebuild` job is found and `completed`:** set `graph_ready = true`.

**If a `kg-rebuild` job is found and `running`:** poll every 10 seconds, up to
2 minutes. If it completes in that window, set `graph_ready = true`. If it
times out, set `graph_ready = false` and emit the warning below.

**If no `kg-rebuild` job is found, or it is in `failed` / `pending` state:**
set `graph_ready = false` and emit this non-fatal warning — do NOT halt:

```
WARNING: kg-rebuild job has not run or did not complete for this namespace.
The Kinetica knowledge graph is unavailable.
Entity and relationship data will be surfaced from the vector/embedding store instead.
```

Proceed to Phase 5 regardless of `graph_ready`.

---

## Phase 5: Surface Entities

### Path A — Graph describe (when `graph_ready = true`)

Call `toolbelt_graph` with `operation: "describe"`:

```json
{
  "operation": "describe",
  "namespace_id": "<namespace_id>"
}
```

Parse the response to extract:
- `graph_name`: the name of the knowledge graph (required for Phase 6 Cypher)
- `entity_count`: total number of entity nodes extracted
- `relationship_count`: total number of relationship edges extracted
- `entity_types`: list of distinct entity type labels (e.g. PERSON, ORG, LOCATION, PRODUCT)
- `sample_entities`: up to 5 example entity names from the response

Store `graph_name` — it is required as a parameter and as the prefix in every Cypher query.

If `toolbelt_graph describe` fails or returns no graphs despite `graph_ready = true`,
reset `graph_ready = false` and continue with Path B.

### Path B — Vector store entity search (when `graph_ready = false`)

Run the following four `toolbelt_search` calls against the namespace to surface
entity mentions extracted during the `semantic` job:

```json
{ "namespace_id": "<namespace_id>", "query": "people and persons mentioned" }
{ "namespace_id": "<namespace_id>", "query": "organizations and companies" }
{ "namespace_id": "<namespace_id>", "query": "locations cities and places" }
{ "namespace_id": "<namespace_id>", "query": "products technologies and systems" }
```

From the combined results, collect:
- `entity_types`: the categories queried that returned results (`PERSON`, `ORG`, `LOCATION`, `PRODUCT`)
- `sample_entities`: up to 5 representative names surfaced across the results
- `entity_count`: approximate — set to the total number of distinct names found
- `relationship_count`: set to `null` (not available via this path)
- `graph_name`: set to `null`

Note in the RESULT that counts are approximate and sourced from vector search.
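The Path B bookkeeping above can be sketched as follows. The mapping from type label to surfaced names is an assumed intermediate shape, produced by whatever parsing of the four `toolbelt_search` responses the agent performs:

```python
def collect_entities(search_results):
    """Merge the four entity-type searches into the Path B summary fields.

    search_results maps a type label (PERSON, ORG, ...) to the list of entity
    names surfaced by that `toolbelt_search` call (assumed shape).
    """
    # Only categories that actually returned results count as entity types.
    entity_types = [t for t, names in search_results.items() if names]
    # Dedup case-insensitively so the entity_count is not inflated by repeats.
    seen, distinct = set(), []
    for names in search_results.values():
        for name in names:
            if name.lower() not in seen:
                seen.add(name.lower())
                distinct.append(name)
    return {
        "entity_types": entity_types,
        "sample_entities": distinct[:5],
        "entity_count": len(distinct),   # approximate by design
        "relationship_count": None,      # not available via this path
        "graph_name": None,
    }
```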

---

## Phase 6: Explore Connections

### Path A — Cypher query (when `graph_ready = true` and `graph_name` is set)

**Important:** Kinetica Cypher requires the query to begin with
`GRAPH <graph_name> MATCH ...`. The `graph_name` must come from the Phase 5
`describe` response.

Attempt:

```json
{
  "operation": "query",
  "namespace_id": "<namespace_id>",
  "graph_name": "<graph_name>",
  "query": "GRAPH <graph_name> MATCH (a)-[r]->(b) RETURN a.name AS source, label(a) AS source_type, type(r) AS relationship, b.name AS target, label(b) AS target_type ORDER BY source LIMIT 25"
}
```

If the Cypher query succeeds, record:
- `relationship_rows`: rows returned (first 10)
- `result_count`: total rows
- `query_method`: `"cypher"`

If Path A fails (HTTP 5xx or zero rows), extract relationship edges from the
Phase 5 `describe` response (`relationships` or `edges` arrays) and record:
- `relationship_rows`: tuples extracted from `describe` (first 10)
- `result_count`: total relationships from Phase 5 `relationship_count`
- `query_method`: `"describe_fallback"`
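The try-Cypher-then-fall-back selection can be sketched as follows. Here `run_cypher` is a hypothetical wrapper around `toolbelt_graph` with `operation: "query"`, and the `describe`-response field names mirror the ones listed above, but the exact response shape is an assumption:

```python
def explore_connections(run_cypher, describe_response, graph_name):
    """Try the Cypher path; fall back to edges from the Phase 5 describe.

    run_cypher is a hypothetical stand-in for the toolbelt_graph query call:
    it is assumed to return a list of rows, or raise on an HTTP error.
    """
    query = (f"GRAPH {graph_name} MATCH (a)-[r]->(b) "
             "RETURN a.name AS source, type(r) AS relationship, "
             "b.name AS target LIMIT 25")
    try:
        rows = run_cypher(query)
    except Exception:
        rows = []  # treat HTTP errors the same as zero rows
    if rows:
        return {"relationship_rows": rows[:10],
                "result_count": len(rows),
                "query_method": "cypher"}
    # Fallback: reuse whatever edge data the describe response carried.
    edges = describe_response.get("relationships") or describe_response.get("edges") or []
    return {"relationship_rows": edges[:10],
            "result_count": describe_response.get("relationship_count", len(edges)),
            "query_method": "describe_fallback"}
```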

### Path B — Vector relationship search (when `graph_ready = false`)

Run the following `toolbelt_search` calls to surface relationship-rich passages:

```json
{ "namespace_id": "<namespace_id>", "query": "founded by acquired partnership agreement" }
{ "namespace_id": "<namespace_id>", "query": "works for reports to leads manages" }
{ "namespace_id": "<namespace_id>", "query": "located in based in headquartered" }
```

From the combined results, extract the most informative relationship snippets.
Present them as plain-language connection summaries (not structured tuples) and record:
- `relationship_rows`: up to 10 plain-language relationship summaries
- `result_count`: number of summaries
- `query_method`: `"vector_search"`

---

## Phase 7: Structured Output

After all phases complete, emit a single structured result:

```
RESULT:
  namespace_id: <uuid>
  document_name: <document name uploaded>
  phases_run: [0, 1, 2, 3, 4, 5, 6]
  graph_ready: <true | false>

  knowledge_graph:
    entity_count: <integer, or "~N (approximate)" if vector path>
    relationship_count: <integer, or null if vector path>
    entity_types: [<list of type labels>]
    sample_entities: [<up to 5 entity names>]

  connections:
    query_method: "<cypher | describe_fallback | vector_search>"
    query_used: "<query executed or 'toolbelt_search (relationship queries)' if vector path>"
    result_count: <integer>
    rows:
      - source: "<name>" relationship: "<type>" target: "<name>"  # graph paths
      - "<plain-language relationship summary>"                   # vector path
      ... (up to 10 rows)
```

If `graph_ready` was `false`, include a note:
```
note: "kg-rebuild job was not available; entity and relationship data sourced from vector/embedding store. Re-run after kg-rebuild completes for full graph traversal."
```

If Phase 5 or 6 produced partial data (e.g., counts present but no rows), include
what is available and note the gap. Only halt on Phase 0–3 failures.

---

## Tool Reference

| Phase | Tool(s) |
|---|---|
| 0. Verify connection | `get_semantic_names` |
| 1. Resolve namespace | (from Phase 0 result) |
| 2. Upload document | `toolbelt_save` |
| 3. Poll for extraction | `toolbelt_jobs` (ingest + semantic) |
| 4. Check graph readiness | `toolbelt_jobs` (kg-rebuild; sets `graph_ready` flag) |
| 5A. Describe entities (graph path) | `toolbelt_graph` (operation: describe) |
| 5B. Surface entities (vector path) | `toolbelt_search` (4 entity-type queries) |
| 6A. Explore connections (graph path) | `toolbelt_graph` (operation: query), fallback to describe |
| 6B. Explore connections (vector path) | `toolbelt_search` (3 relationship queries) |
| 7. Emit result | (structured output) |