@toolbeltai/skills 0.1.0

@@ -0,0 +1,259 @@
+ ---
+ name: vector-search
+ description: >
+   Find semantically relevant passages from documents without keyword matching.
+   Toolbelt is a multi-modal data platform combining SQL analytics, vector search,
+   and real-time streaming. Upload a document, then retrieve the passages most
+   semantically similar to a natural language query. Use when an AI agent needs to
+   ground answers in source documents (RAG), find related content without exact
+   keyword matches, or rank passages by meaning rather than text overlap.
+ license: MIT
+ compatibility: >
+   Requires a Toolbelt account (provision free at https://toolbelt.ai) and an
+   MCP-compatible AI agent (Claude Code, Claude Desktop, or any client that
+   supports MCP server connections). MCP connection must be pre-established
+   before invocation.
+ metadata:
+   author: toolbeltai
+   version: "1.0"
+   openclaw:
+     emoji: "🔍"
+     homepage: "https://toolbelt.ai/docs/vectors"
+     skillKey: "vector-search"
+ ---
+
+ Upload a document and retrieve semantically similar passages using Toolbelt MCP
+ tools. Work through each phase in order without prompting for user input. On an
+ unrecoverable error, emit a structured failure and halt.
+
+ ## When Not To Use
+
+ - For structured tabular data (CSV, SQL tables), use `sql-analyst` instead.
+ - For aggregate queries, counts, or filtering by exact values, use `sql-analyst`; vector search ranks by meaning, not by exact criteria.
+ - For entity and relationship extraction, use `knowledge-graph` instead.
+ - When you need a synthesized answer that may draw on SQL tables, use `sql-analyst` with `toolbelt_search` (hybrid routing) instead.
+
+ ## How This Differs From `toolbelt_search`
+
+ `toolbelt_vectors` is **pure semantic similarity search**: it returns document
+ passages ranked by embedding distance. `toolbelt_search` uses **hybrid
+ routing** and may execute SQL, vector search, or both depending on the question.
+ Use this skill when you specifically want passage retrieval from documents.
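
To make "ranked by embedding distance" concrete, here is a minimal sketch that orders passages by cosine distance to a query vector. The embedding model and distance metric Toolbelt actually uses are not stated in this file; cosine distance, the toy two-dimensional vectors, and the function names are all assumptions for illustration.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank_passages(query_vec, passages):
    """Sort (text, vector) pairs from nearest to farthest from the query.

    Hypothetical helper: a real service would compute the vectors with an
    embedding model and use an index rather than a full sort.
    """
    return sorted(passages, key=lambda p: cosine_distance(query_vec, p[1]))


toy_passages = [
    ("emissions policy", [0.0, 1.0]),
    ("coral bleaching", [1.0, 0.1]),
]
ranked = rank_passages([1.0, 0.0], toy_passages)
```

No keyword overlap is involved: the ordering depends only on how close the vectors are.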
+
+ ## Invocation Parameters
+
+ Extract these from the args string or conversation context before starting:
+
+ | Parameter | Required | Description |
+ |---|---|---|
+ | `namespace_id` | No | UUID of the target namespace. Auto-selected if omitted and only one namespace exists; the skill fails if several exist. |
+ | `document_content` | No | Raw text to upload. Uses the embedded sample document if omitted. |
+ | `document_name` | No | Name for the document asset. Defaults to `vector-search-sample`. |
+ | `question` | No | Natural language query to search for. Defaults to `What are the effects on coastal ecosystems?`. |
+ | `skip_upload` | No | Set to `true` to skip Phases 2–3 and search existing namespace content. |
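
A minimal sketch of how the parameters above might be pulled out of an args string and merged with their defaults. The exact args format is not specified here; the `key=value` token form is an assumption taken from the `namespace_id=<uuid>` and `skip_upload=true` hints in the failure messages, and `parse_args` is a hypothetical helper.

```python
import shlex

# Defaults taken from the parameter table; None means "not supplied".
DEFAULTS = {
    "namespace_id": None,
    "document_content": None,  # None -> fall back to the embedded sample document
    "document_name": "vector-search-sample",
    "question": "What are the effects on coastal ecosystems?",
    "skip_upload": False,
}


def parse_args(args: str) -> dict:
    """Parse space-separated key=value tokens (assumed format) into parameters."""
    params = dict(DEFAULTS)
    for token in shlex.split(args):
        key, sep, value = token.partition("=")
        if not sep or key not in DEFAULTS:
            continue  # ignore free text and unknown keys
        if key == "skip_upload":
            params[key] = value.strip().lower() == "true"
        else:
            params[key] = value
    return params


example = parse_args('namespace_id=abc skip_upload=true question="What about glaciers?"')
```

Quoted values survive because `shlex.split` keeps `question="..."` together as one token.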
+
+ ---
+
+ ## Default Sample Document
+
+ If no `document_content` is provided, use the following text verbatim:
+
+ ```
+ Global Climate Trends: 2024 Summary Report
+
+ Section 1: Surface Temperature Changes
+ Average global surface temperatures rose 1.4°C above pre-industrial levels in
+ 2023, continuing a decades-long trend. The ten hottest years on record have all
+ occurred since 2010. Heat waves in Europe and North America broke records in
+ duration and intensity. Urban heat islands amplified these effects in densely
+ populated areas, with some cities recording nighttime lows 5°C above surrounding
+ rural areas.
+
+ Section 2: Sea Level and Ocean Systems
+ Global mean sea level rose 4.2mm in 2023, driven by thermal expansion and
+ accelerating ice sheet melt in Greenland and West Antarctica. Ocean acidity
+ increased 0.1 pH units since 1990, threatening calcifying marine organisms
+ including coral and shellfish. The Atlantic Meridional Overturning Circulation
+ showed continued weakening, with potential implications for European climate
+ stability and North Atlantic fisheries.
+
+ Section 3: Biodiversity and Ecosystem Impacts
+ Species range shifts accelerated as organisms tracked suitable climate envelopes
+ poleward and to higher elevations. Coral bleaching events affected over 60% of
+ the Great Barrier Reef for the fourth consecutive year. Migratory bird species
+ showed timing mismatches with peak insect abundance, reducing breeding success.
+ Boreal forest die-offs from drought stress and bark beetle outbreaks expanded
+ across Canada and Siberia, releasing stored carbon and reducing canopy cover.
+
+ Section 4: Freshwater Availability
+ Glacial retreat reduced dry-season freshwater availability for approximately 2
+ billion people dependent on glacial meltwater. Extended droughts in the American
+ Southwest and Mediterranean region drove groundwater depletion and crop failures.
+ Conversely, increased atmospheric moisture intensified precipitation events,
+ causing flooding in traditionally dry regions of sub-Saharan Africa and South Asia.
+
+ Section 5: Policy and Emissions Trajectories
+ Global CO2 emissions reached 37.4 billion metric tons in 2023, a record high
+ despite rapid renewable energy deployment. Solar and wind capacity additions
+ outpaced projections, but total energy demand growth offset efficiency gains.
+ Carbon capture projects remained far below the scale required by IPCC scenarios.
+ National commitments under the Paris Agreement, if fully implemented, are
+ projected to limit warming to 2.5°C — above the 1.5°C target.
+ ```
+
+ Default `question`: `What are the effects on coastal ecosystems?`
+
+ ---
+
+ ## Phase 0: Verify Connection
+
+ Call `get_semantic_names` (no arguments) immediately.
+
+ - **If it succeeds:** proceed to Phase 1 using the returned namespaces.
+ - **If it fails:** emit structured failure and halt.
+
+ ```
+ FAILURE: Toolbelt MCP connection is not established.
+ The MCP server must be connected before invoking this skill.
+ See: https://toolbelt.ai/docs/mcp for setup instructions.
+ ```
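
The connection check can be sketched as a single guarded call. `call_tool` is a hypothetical stand-in for whatever function the agent runtime exposes to invoke an MCP tool by name; it is not part of Toolbelt, and the assumption that a broken connection surfaces as an exception is also mine.

```python
CONNECTION_FAILURE = (
    "FAILURE: Toolbelt MCP connection is not established.\n"
    "The MCP server must be connected before invoking this skill.\n"
    "See: https://toolbelt.ai/docs/mcp for setup instructions."
)


def verify_connection(call_tool):
    """Return (True, namespaces) on success or (False, failure_text) on error.

    call_tool(name, arguments) is an assumed callable supplied by the caller.
    """
    try:
        return True, call_tool("get_semantic_names", {})
    except Exception:
        # Any error is treated as "not connected" in this sketch.
        return False, CONNECTION_FAILURE


def _broken_call(name, arguments):
    raise ConnectionError("server unreachable")


ok_case = verify_connection(lambda name, arguments: [{"id": "ns-1"}])
failed_case = verify_connection(_broken_call)
```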
+
+ ---
+
+ ## Phase 1: Resolve Namespace
+
+ Use the namespaces returned from Phase 0.
+
+ Resolution order:
+ 1. If `namespace_id` was provided as a parameter, use it directly.
+ 2. If only one namespace exists, use it.
+ 3. If multiple exist and no `namespace_id` was specified, emit structured failure and halt.
+
+ ```
+ FAILURE: Multiple namespaces found and none specified.
+ Available: [<list namespace display names and IDs>]
+ Re-invoke with namespace_id=<uuid>.
+ ```
+
+ Store the resolved `namespace_id` — pass it to every subsequent tool call.
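
The resolution order above can be sketched as one function. The shape of the `get_semantic_names` response is not documented here, so the `id` and `name` keys are assumptions, as is the handling of an empty namespace list, which this file does not cover.

```python
class SkillFailure(Exception):
    """Hypothetical exception carrying a structured failure message."""


def resolve_namespace(namespaces, namespace_id=None):
    """Apply the three resolution rules in order and return a namespace ID."""
    if namespace_id:
        return namespace_id  # rule 1: an explicit parameter wins
    if len(namespaces) == 1:
        return namespaces[0]["id"]  # rule 2: a single namespace is unambiguous
    if not namespaces:
        # Not covered by the skill text; this message is an assumption.
        raise SkillFailure("FAILURE: No namespaces found.")
    listing = ", ".join(f'{ns["name"]} ({ns["id"]})' for ns in namespaces)
    raise SkillFailure(  # rule 3: ambiguous, so fail with the available choices
        "FAILURE: Multiple namespaces found and none specified.\n"
        f"Available: [{listing}]\n"
        "Re-invoke with namespace_id=<uuid>."
    )


two = [{"id": "ns-1", "name": "Research"}, {"id": "ns-2", "name": "Archive"}]
explicit = resolve_namespace(two, namespace_id="ns-2")
single = resolve_namespace(two[:1])
try:
    resolve_namespace(two)
    ambiguous_message = None
except SkillFailure as exc:
    ambiguous_message = str(exc)
```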
+
+ ---
+
+ ## Phase 2: Upload Document
+
+ Skip this phase if `skip_upload` is `true`.
+
+ Resolve `document_content` (use parameter value or default sample above).
+ Resolve `document_name` (use parameter value or default `vector-search-sample`).
+
+ Call `toolbelt_save`:
+
+ ```json
+ {
+   "asset_type": "document",
+   "namespace_id": "<namespace_id>",
+   "name": "<document_name>",
+   "file_name": "document.txt",
+   "content": "<document_content>",
+   "content_encoding": "text"
+ }
+ ```
+
+ Record the returned `asset_id`.
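
A minimal sketch of assembling the `toolbelt_save` arguments shown above. The field names come from the JSON example; `build_save_payload` is a hypothetical helper, and serializing with `json.dumps` is shown only to illustrate that multi-line document text needs proper JSON escaping rather than string pasting.

```python
import json


def build_save_payload(namespace_id, document_content,
                       document_name="vector-search-sample"):
    """Return the argument object for a toolbelt_save call."""
    return {
        "asset_type": "document",
        "namespace_id": namespace_id,
        "name": document_name,
        "file_name": "document.txt",
        "content": document_content,
        "content_encoding": "text",
    }


text = 'Section 1: "Surface Temperature"\nAverage temperatures rose 1.4°C.'
payload = build_save_payload("ns-1", text)
serialized = json.dumps(payload)  # newlines and quotes are escaped here
```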
+
+ ---
+
+ ## Phase 3: Poll for Semantic Indexing
+
+ Skip this phase if `skip_upload` is `true`.
+
+ Call `toolbelt_jobs` with `{ "namespace_id": "<namespace_id>" }` every 10 seconds.
+
+ **Both** job stages must reach `completed` before proceeding:
+ - `ingest`: document parsed and stored
+ - `semantic`: embeddings generated and vector index populated
+
+ Vector search requires the `semantic` job to complete. Searching before it
+ finishes will return zero results.
+
+ Typical duration: 30–120 seconds. Maximum wait: 5 minutes.
+
+ If either job reaches `failed` or the timeout elapses, emit structured failure and halt:
+ ```
+ FAILURE: Semantic indexing did not complete.
+ Job status: <last observed status for ingest and semantic jobs>
+ ```
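
The polling rules above (10-second interval, both stages `completed`, stop on `failed` or after 5 minutes) can be sketched as a loop. The response shape of `toolbelt_jobs` is not documented here; a list of objects with `stage` and `status` keys is an assumption, and `fetch_jobs` is a hypothetical callable wrapping the tool call.

```python
import time

REQUIRED_STAGES = ("ingest", "semantic")


def wait_for_indexing(fetch_jobs, interval=10.0, timeout=300.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll until both stages complete; return (succeeded, last_statuses)."""
    deadline = clock() + timeout
    while True:
        # Assumed shape: [{"stage": "ingest", "status": "completed"}, ...]
        statuses = {job["stage"]: job["status"] for job in fetch_jobs()}
        observed = [statuses.get(stage) for stage in REQUIRED_STAGES]
        if all(status == "completed" for status in observed):
            return True, statuses
        if any(status == "failed" for status in observed) or clock() >= deadline:
            return False, statuses
        sleep(interval)


# Usage with canned responses and a no-op sleep, so nothing actually waits.
responses = iter([
    [{"stage": "ingest", "status": "running"}],
    [{"stage": "ingest", "status": "completed"},
     {"stage": "semantic", "status": "completed"}],
])
done = wait_for_indexing(lambda: next(responses), sleep=lambda seconds: None)
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real delays.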
+
+ ---
+
+ ## Phase 4: Run Vector Search
+
+ Resolve `question` (use parameter value or default).
+
+ Call `toolbelt_vectors`:
+
+ ```json
+ {
+   "namespace_id": "<namespace_id>",
+   "question": "<question>"
+ }
+ ```
+
+ Parse the response:
+ - `results`: array of passage objects. Each passage may contain `text`, `content`, or `excerpt`.
+ - `result_count`: total number of results returned.
+ - `top_result`: the first (highest-ranked) passage; extract up to 300 characters.
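
Because a passage may carry its text under `text`, `content`, or `excerpt`, the parsing step needs a fallback. The sketch below tries the keys in that order, which is an assumed preference, and falls back to counting `results` when `result_count` is absent; both helper names are hypothetical.

```python
def passage_text(passage):
    """Return the first non-empty text field of a passage, or an empty string."""
    for key in ("text", "content", "excerpt"):  # assumed preference order
        value = passage.get(key)
        if value:
            return value
    return ""


def summarize_response(response, top_chars=300):
    """Extract result_count and a truncated top_result from a search response."""
    results = response.get("results") or []
    top = passage_text(results[0])[:top_chars] if results else ""
    return {
        "result_count": response.get("result_count", len(results)),
        "top_result": top,
    }


sample_response = {
    "results": [
        {"content": "Ocean acidity increased 0.1 pH units since 1990. " * 10},
        {"excerpt": "Global mean sea level rose 4.2mm in 2023."},
    ]
}
summary = summarize_response(sample_response)
```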
+
+ If the call returns zero results and `skip_upload` was `true`, the namespace may
+ not contain any documents with semantic indexes. Emit structured failure and halt:
+ ```
+ FAILURE: toolbelt_vectors returned zero results.
+ The namespace may not contain any semantically indexed documents.
+ Re-invoke without skip_upload=true to upload and index a document first.
+ ```
+
+ If the call returns zero results after a fresh upload (Phases 2–3 completed),
+ emit structured failure and halt:
+ ```
+ FAILURE: toolbelt_vectors returned zero results after indexing completed.
+ namespace_id: <uuid>
+ asset_id: <asset_id>
+ ```
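
The two zero-result cases differ only in whether an upload preceded the search, so the choice of message can be sketched as one branch. `zero_result_failure` is a hypothetical helper; the message text is copied from the blocks above.

```python
def zero_result_failure(skip_upload, namespace_id, asset_id=None):
    """Return the structured failure text for an empty search result."""
    if skip_upload:
        # Nothing was uploaded in this run, so the namespace itself is suspect.
        return (
            "FAILURE: toolbelt_vectors returned zero results.\n"
            "The namespace may not contain any semantically indexed documents.\n"
            "Re-invoke without skip_upload=true to upload and index a document first."
        )
    # A fresh upload was indexed, so report the IDs needed to investigate.
    return (
        "FAILURE: toolbelt_vectors returned zero results after indexing completed.\n"
        f"namespace_id: {namespace_id}\n"
        f"asset_id: {asset_id}"
    )


existing_content_case = zero_result_failure(True, "ns-1")
fresh_upload_case = zero_result_failure(False, "ns-1", asset_id="asset-9")
```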
+
+ ---
+
+ ## Phase 5: Structured Output
+
+ After all phases complete, emit a single structured result:
+
+ ```
+ RESULT:
+   namespace_id: <uuid>
+   document_name: <name of uploaded document, or "existing namespace content" if skip_upload>
+   question: "<question asked>"
+   phases_run: [0, 1, 2, 3, 4] # or [0, 1, 4] if skip_upload
+
+   vector_search:
+     result_count: <integer>
+     top_result: |
+       <first ~300 chars of the highest-ranked passage>
+     all_results:
+       - rank: 1
+         excerpt: "<first ~150 chars>"
+       - rank: 2
+         excerpt: "<first ~150 chars>"
+       ... (up to 5 results)
+ ```
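
A minimal sketch of assembling the result above as a plain dictionary before rendering it. It assumes the passage texts have already been extracted into a list of strings ordered by rank; `build_result` is a hypothetical helper, and the 300- and 150-character limits and the five-result cap come from the template.

```python
def build_result(namespace_id, question, passages,
                 document_name=None, skip_upload=False):
    """Build the RESULT structure from ranked passage strings."""
    return {
        "namespace_id": namespace_id,
        "document_name": "existing namespace content" if skip_upload else document_name,
        "question": question,
        "phases_run": [0, 1, 4] if skip_upload else [0, 1, 2, 3, 4],
        "vector_search": {
            "result_count": len(passages),
            "top_result": passages[0][:300] if passages else "",
            "all_results": [
                {"rank": rank, "excerpt": text[:150]}
                for rank, text in enumerate(passages[:5], start=1)
            ],
        },
    }


ranked_texts = [f"passage {i} " + "x" * 400 for i in range(1, 8)]
result = build_result("ns-1", "What are the effects on coastal ecosystems?",
                      ranked_texts, document_name="vector-search-sample")
```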
+
+ ---
+
+ ## Tool Reference
+
+ | Phase | Tool(s) |
+ |---|---|
+ | 0. Verify connection | `get_semantic_names` |
+ | 1. Resolve namespace | (from Phase 0 result) |
+ | 2. Upload document | `toolbelt_save` |
+ | 3. Poll for indexing | `toolbelt_jobs` |
+ | 4. Run vector search | `toolbelt_vectors` |
+ | 5. Emit result | (structured output) |