@tyvm/knowhow 0.0.109 → 0.0.110
- package/autodoc/README.md +324 -0
- package/autodoc/chat-guide.md +268 -365
- package/autodoc/cli-reference.md +399 -473
- package/autodoc/config-reference.md +431 -330
- package/autodoc/embeddings-guide.md +223 -322
- package/autodoc/generate-guide.md +261 -301
- package/autodoc/language-plugin-guide.md +221 -247
- package/autodoc/modules-guide.md +242 -215
- package/autodoc/plugins-guide.md +470 -469
- package/autodoc/quickstart-guide.md +67 -70
- package/autodoc/skills-guide.md +455 -339
- package/autodoc/worker-guide.md +301 -308
- package/package.json +1 -1
- package/scripts/build-for-node.sh +10 -24
- package/src/agents/tools/list.ts +2 -2
- package/src/ai.ts +81 -37
- package/src/chat/CliChatService.ts +1 -1
- package/src/chat/modules/AgentModule.ts +7 -2
- package/src/chat/modules/SessionsModule.ts +40 -1
- package/src/chat/modules/SystemModule.ts +2 -2
- package/src/clients/anthropic.ts +1 -1
- package/src/clients/index.ts +25 -6
- package/src/clients/openai.ts +8 -5
- package/src/clients/types.ts +29 -6
- package/src/clients/withRetry.ts +89 -0
- package/src/commands/agent.ts +30 -0
- package/src/commands/modules.ts +417 -47
- package/src/config.ts +1 -1
- package/src/fileSync.ts +20 -12
- package/src/hashes.ts +43 -22
- package/src/index.ts +4 -2
- package/src/processors/Base64ImageDetector.ts +73 -0
- package/src/services/MediaProcessorService.ts +79 -10
- package/src/services/modules/index.ts +47 -18
- package/tests/processors/Base64ImageDetector.test.ts +160 -0
- package/tests/unit/clients/AIClient.test.ts +446 -0
- package/tests/unit/clients/withRetry.test.ts +319 -0
- package/tests/unit/commands/github-credentials.test.ts +1 -2
- package/ts_build/package.json +1 -1
- package/ts_build/src/agents/tools/list.js +2 -2
- package/ts_build/src/agents/tools/list.js.map +1 -1
- package/ts_build/src/ai.d.ts +3 -3
- package/ts_build/src/ai.js +51 -23
- package/ts_build/src/ai.js.map +1 -1
- package/ts_build/src/chat/CliChatService.js +1 -1
- package/ts_build/src/chat/CliChatService.js.map +1 -1
- package/ts_build/src/chat/modules/AgentModule.js +5 -2
- package/ts_build/src/chat/modules/AgentModule.js.map +1 -1
- package/ts_build/src/chat/modules/SessionsModule.js +30 -1
- package/ts_build/src/chat/modules/SessionsModule.js.map +1 -1
- package/ts_build/src/chat/modules/SystemModule.js +2 -2
- package/ts_build/src/chat/modules/SystemModule.js.map +1 -1
- package/ts_build/src/clients/anthropic.js +1 -1
- package/ts_build/src/clients/anthropic.js.map +1 -1
- package/ts_build/src/clients/index.js +7 -6
- package/ts_build/src/clients/index.js.map +1 -1
- package/ts_build/src/clients/openai.js +4 -4
- package/ts_build/src/clients/openai.js.map +1 -1
- package/ts_build/src/clients/types.d.ts +12 -6
- package/ts_build/src/clients/withRetry.d.ts +2 -0
- package/ts_build/src/clients/withRetry.js +60 -0
- package/ts_build/src/clients/withRetry.js.map +1 -0
- package/ts_build/src/commands/agent.js +25 -0
- package/ts_build/src/commands/agent.js.map +1 -1
- package/ts_build/src/commands/modules.js +359 -32
- package/ts_build/src/commands/modules.js.map +1 -1
- package/ts_build/src/config.js +1 -1
- package/ts_build/src/config.js.map +1 -1
- package/ts_build/src/fileSync.d.ts +2 -2
- package/ts_build/src/fileSync.js +13 -11
- package/ts_build/src/fileSync.js.map +1 -1
- package/ts_build/src/hashes.d.ts +2 -2
- package/ts_build/src/hashes.js +40 -16
- package/ts_build/src/hashes.js.map +1 -1
- package/ts_build/src/index.js +1 -1
- package/ts_build/src/index.js.map +1 -1
- package/ts_build/src/processors/Base64ImageDetector.d.ts +3 -0
- package/ts_build/src/processors/Base64ImageDetector.js +42 -0
- package/ts_build/src/processors/Base64ImageDetector.js.map +1 -1
- package/ts_build/src/services/MediaProcessorService.d.ts +5 -4
- package/ts_build/src/services/MediaProcessorService.js +53 -8
- package/ts_build/src/services/MediaProcessorService.js.map +1 -1
- package/ts_build/src/services/modules/index.js +35 -12
- package/ts_build/src/services/modules/index.js.map +1 -1
- package/ts_build/tests/processors/Base64ImageDetector.test.js +111 -0
- package/ts_build/tests/processors/Base64ImageDetector.test.js.map +1 -1
- package/ts_build/tests/unit/clients/AIClient.test.d.ts +1 -0
- package/ts_build/tests/unit/clients/AIClient.test.js +339 -0
- package/ts_build/tests/unit/clients/AIClient.test.js.map +1 -0
- package/ts_build/tests/unit/clients/withRetry.test.d.ts +1 -0
- package/ts_build/tests/unit/clients/withRetry.test.js +225 -0
- package/ts_build/tests/unit/clients/withRetry.test.js.map +1 -0
- package/ts_build/tests/unit/commands/github-credentials.test.js +1 -2
- package/ts_build/tests/unit/commands/github-credentials.test.js.map +1 -1
|
@@ -1,213 +1,159 @@
|
|
|
1
1
|
# Embeddings Guide (Knowhow CLI)
|
|
2
2
|
|
|
3
|
-
Embeddings are the backbone of Knowhow’s semantic search. Instead of
|
|
3
|
+
Embeddings are the backbone of Knowhow’s semantic search. Instead of matching exact keywords, Knowhow turns **text chunks** into **vectors** (arrays of numbers) and later searches by **meaning** using vector similarity.
|
|
4
4
|
|
|
5
|
-
This guide explains how to generate, configure, store, and use embeddings
|
|
5
|
+
This guide explains how to generate, configure, store, and use embeddings with the Knowhow CLI.
|
|
6
6
|
|
|
7
7
|
---
|
|
8
8
|
|
|
9
9
|
## 1) What embeddings are
|
|
10
10
|
|
|
11
|
-
|
|
11
|
+
**Embeddings** are **vector representations** of text.
|
|
12
12
|
|
|
13
|
-
|
|
13
|
+
In Knowhow:
|
|
14
14
|
|
|
15
|
-
-
|
|
16
|
-
-
|
|
17
|
-
-
|
|
18
|
-
-
|
|
15
|
+
- Your inputs are converted into text (for files: `convertToText(filePath)`).
|
|
16
|
+
- The text is split into **chunks** (optional `chunkSize`).
|
|
17
|
+
- Each chunk is embedded by an embedding model, producing a `vector: number[]`.
|
|
18
|
+
- The result is stored in a local JSON file with entries shaped like:
|
|
19
19
|
|
|
20
20
|
```json
|
|
21
21
|
{
|
|
22
|
-
"id": "chunk-id",
|
|
23
|
-
"text": "chunk content (
|
|
22
|
+
"id": "some-chunk-id",
|
|
23
|
+
"text": "chunk content (or summarized content)",
|
|
24
24
|
"vector": [0.0123, -0.0045, ...],
|
|
25
25
|
"metadata": {
|
|
26
|
-
"
|
|
26
|
+
"filepath": "...",
|
|
27
|
+
"date": "2026-05-23T..."
|
|
27
28
|
}
|
|
28
29
|
}
|
|
29
30
|
```
|
|
30
31
|
|
|
31
|
-
### Chunk IDs
|
|
32
|
-
- If `chunkSize` is set, Knowhow typically uses:
|
|
33
|
-
- `id-index` (e.g. `path/to/file.ts-3`)
|
|
34
|
-
- If `chunkSize` is not set, it may keep the original `id`.
|
|
32
|
+
### Chunk IDs and pruning behavior
|
|
35
33
|
|
|
36
|
-
Knowhow
|
|
34
|
+
Knowhow assigns chunk IDs like:
|
|
37
35
|
|
|
38
|
-
|
|
39
|
-
|
|
40
|
-
## 2) `knowhow embed` (generate embeddings)
|
|
36
|
+
- If `chunkSize` is set: `chunkId = "${id}-${chunkIndex}"`
|
|
37
|
+
- If the chunk ID already ends with a numeric suffix, it won’t be re-suffixed.
|
|
41
38
|
|
|
42
|
-
|
|
39
|
+
It also **prunes old chunks**: any existing chunk under the same base `id` that is not part of the newly generated chunk set is removed from the embeddings JSON.
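The ID and pruning rules described above can be sketched roughly as follows. This is an illustration only, not Knowhow's implementation: `splitIntoChunks`, `makeChunkId`, and `pruneChunks` are hypothetical names, fixed-width character chunking is assumed, and "numeric suffix" is assumed to mean a trailing `-<digits>`.

```typescript
// Illustrative sketch; all function names here are hypothetical.
interface EmbeddingEntry {
  id: string;
  text: string;
  vector: number[];
}

// Assumed chunking strategy: fixed-width slices of `chunkSize` characters.
function splitIntoChunks(text: string, chunkSize: number): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += chunkSize) {
    chunks.push(text.slice(i, i + chunkSize));
  }
  return chunks;
}

// Build "<id>-<chunkIndex>" when chunkSize is set, unless the id
// already ends with a numeric suffix (assumed to look like "-<digits>").
function makeChunkId(id: string, chunkIndex: number, chunkSize?: number): string {
  if (!chunkSize) return id;
  return /-\d+$/.test(id) ? id : `${id}-${chunkIndex}`;
}

// Remove stored chunks that belong to `baseId` but are missing from the
// freshly generated set; entries for other base ids are left untouched.
function pruneChunks(
  stored: EmbeddingEntry[],
  baseId: string,
  freshIds: Set<string>,
): EmbeddingEntry[] {
  const prefix = `${baseId}-`;
  return stored.filter((entry) => {
    const belongsToBase =
      entry.id === baseId ||
      (entry.id.startsWith(prefix) && /^\d+$/.test(entry.id.slice(prefix.length)));
    return !belongsToBase || freshIds.has(entry.id);
  });
}
```

With this shape, shrinking a file from six chunks to two would leave only the `-0` and `-1` entries for that file in the embeddings JSON, while entries for other files stay as they were.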
|
|
43
40
|
|
|
44
|
-
|
|
45
|
-
knowhow embed
|
|
46
|
-
```
|
|
41
|
+
---
|
|
47
42
|
|
|
48
|
-
|
|
43
|
+
## 2) `knowhow embed` — generate embeddings
|
|
49
44
|
|
|
50
|
-
|
|
51
|
-
2. For each entry in `embedSources`, embed content into the configured `.json` output file(s)
|
|
52
|
-
3. Save updated embeddings locally under paths like:
|
|
53
|
-
- `.knowhow/embeddings/docs.json`
|
|
54
|
-
- `.knowhow/embeddings/code.json`
|
|
45
|
+
The CLI’s embedding generation is driven by your `embedSources` configuration.
|
|
55
46
|
|
|
56
|
-
|
|
47
|
+
`knowhow embed`:
|
|
57
48
|
|
|
58
|
-
|
|
49
|
+
1. Loads `.knowhow/knowhow.json`
|
|
50
|
+
2. Reads `config.embeddingModel` (fallback: OpenAI Ada v2)
|
|
51
|
+
3. For each entry in `config.embedSources`, runs embedding generation and writes the result to `embedSources[].output`
|
|
59
52
|
|
|
60
|
-
|
|
53
|
+
### Example config (local embeddings for docs + code)
|
|
61
54
|
|
|
62
|
-
|
|
63
|
-
|
|
64
|
-
```json
|
|
55
|
+
```jsonc
|
|
65
56
|
{
|
|
66
|
-
"
|
|
57
|
+
"embeddingModel": "openai.EmbeddingAda2",
|
|
58
|
+
"embedSources": [
|
|
59
|
+
{
|
|
60
|
+
"input": ".knowhow/docs/**/*.mdx",
|
|
61
|
+
"output": ".knowhow/embeddings/docs.json",
|
|
62
|
+
"prompt": "BasicEmbeddingExplainer",
|
|
63
|
+
"chunkSize": 2000
|
|
64
|
+
},
|
|
65
|
+
{
|
|
66
|
+
"input": "src/**/*.ts",
|
|
67
|
+
"output": ".knowhow/embeddings/code.json",
|
|
68
|
+
"chunkSize": 2000
|
|
69
|
+
}
|
|
70
|
+
]
|
|
67
71
|
}
|
|
68
72
|
```
|
|
69
73
|
|
|
70
|
-
|
|
71
|
-
|
|
72
|
-
### `input` (required)
|
|
73
|
-
**Glob pattern** or a direct input string (depending on `kind`).
|
|
74
|
-
|
|
75
|
-
- If `kind` is `"file"` (default), Knowhow globs the filesystem:
|
|
76
|
-
- `input: ".knowhow/docs/**/*.mdx"`
|
|
77
|
-
- For non-`file` kinds, Knowhow may treat `input` as a single input value.
|
|
78
|
-
|
|
79
|
-
### `output` (required)
|
|
80
|
-
Path where the generated embeddings JSON file is saved.
|
|
81
|
-
|
|
82
|
-
Example:
|
|
83
|
-
- `.knowhow/embeddings/docs.json`
|
|
74
|
+
Then run:
|
|
84
75
|
|
|
85
|
-
|
|
86
|
-
|
|
87
|
-
- Each element includes `id`, `text`, `vector`, `metadata`
|
|
88
|
-
|
|
89
|
-
### `chunkSize` (optional)
|
|
90
|
-
How many **characters per chunk**.
|
|
91
|
-
|
|
92
|
-
- Default in the template config is `2000`
|
|
93
|
-
- If provided, chunk IDs include `-index`
|
|
94
|
-
|
|
95
|
-
### `minLength` (optional)
|
|
96
|
-
Skip chunks shorter than this number of characters.
|
|
97
|
-
|
|
98
|
-
Implementation detail:
|
|
99
|
-
```ts
|
|
100
|
-
const tooShort = minLength && textOfChunk.length < minLength;
|
|
76
|
+
```bash
|
|
77
|
+
knowhow embed
|
|
101
78
|
```
|
|
102
79
|
|
|
103
|
-
|
|
104
|
-
If set, Knowhow transforms each chunk *before* embedding by summarizing it with a prompt.
|
|
105
|
-
|
|
106
|
-
- The prompt is loaded via `summarizeTexts([textOfChunk], prompt)`
|
|
107
|
-
- Metadata stores extra text when a prompt is used:
|
|
108
|
-
- `metadata.text` is set to the original chunking output (see code path)
|
|
80
|
+
---
|
|
109
81
|
|
|
110
|
-
|
|
111
|
-
- compress long chunks
|
|
112
|
-
- standardize content for better retrieval
|
|
113
|
-
- emphasize relevant information
|
|
82
|
+
## 3) `embedSources` config
|
|
114
83
|
|
|
115
|
-
|
|
116
|
-
Controls how the input is interpreted.
|
|
84
|
+
Each `embedSources[]` entry controls **what** to embed, **how** to chunk/transform, and **where** to store the resulting embedding JSON.
|
|
117
85
|
|
|
118
|
-
|
|
119
|
-
- Supported patterns:
|
|
120
|
-
- `"file"`: embed content from files on disk (converted to text)
|
|
121
|
-
- `"text"`: embed a provided text string
|
|
122
|
-
- Other kinds: typically handled by embeddings plugins (see “Special input kinds” below)
|
|
86
|
+
### Supported fields (from code)
|
|
123
87
|
|
|
124
|
-
|
|
88
|
+
| Field | Type | What it does |
|
|
89
|
+
|---|---:|---|
|
|
90
|
+
| `input` | string | Glob pattern (or special kind input) describing what to embed |
|
|
91
|
+
| `output` | string | Path to the `.json` embeddings file |
|
|
92
|
+
| `chunkSize` | number | Split text into chunks of this many characters (template default is commonly `2000`) |
|
|
93
|
+
| `minLength` | number | Skip chunks shorter than this many characters |
|
|
94
|
+
| `prompt` | string | Optional prompt name/string used to *summarize/transform* each chunk before embedding |
|
|
95
|
+
| `kind` | string | Embedding strategy kind (default: `"file"`). Can also be a plugin kind like `asana`, `github`, `url`, etc. |
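To make the table concrete, the fields could be modelled roughly like this. It is a sketch under assumptions: `EmbedSourceSketch` and `selectChunks` are hypothetical names rather than Knowhow's own types, and fixed-width character chunking is assumed. The `minLength` check mirrors the "skip chunks shorter than this many characters" rule.

```typescript
// Hypothetical shape mirroring the table above; not copied from Knowhow's types.
interface EmbedSourceSketch {
  input: string;
  output: string;
  chunkSize?: number;
  minLength?: number;
  prompt?: string;
  kind?: string; // treated as "file" when omitted
}

// Split text into chunks of `chunkSize` characters (when set), then
// drop any chunk shorter than `minLength` characters (when set).
function selectChunks(text: string, source: EmbedSourceSketch): string[] {
  const size = source.chunkSize;
  const chunks: string[] = [];
  if (size !== undefined && size > 0) {
    for (let i = 0; i < text.length; i += size) {
      chunks.push(text.slice(i, i + size));
    }
  } else {
    chunks.push(text); // no chunkSize: the whole text is a single item
  }
  const minLength = source.minLength;
  return chunks.filter((chunk) => minLength === undefined || chunk.length >= minLength);
}
```

For example, a 12-character text with `chunkSize: 5` and `minLength: 3` yields two chunks here, because the trailing 2-character remainder is skipped.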
|
|
125
96
|
|
|
126
|
-
###
|
|
97
|
+
### Scenario: embed file globs (`kind` defaults to `"file"`)
|
|
127
98
|
|
|
128
|
-
```
|
|
99
|
+
```jsonc
|
|
129
100
|
{
|
|
130
|
-
"embeddingModel": "text-embedding-ada-002",
|
|
131
101
|
"embedSources": [
|
|
132
102
|
{
|
|
133
|
-
"
|
|
134
|
-
"input": ".knowhow/docs/**/*.mdx",
|
|
103
|
+
"input": "docs/**/*.md",
|
|
135
104
|
"output": ".knowhow/embeddings/docs.json",
|
|
136
|
-
"
|
|
137
|
-
"
|
|
105
|
+
"chunkSize": 2000,
|
|
106
|
+
"minLength": 50
|
|
138
107
|
}
|
|
139
108
|
]
|
|
140
109
|
}
|
|
141
110
|
```
|
|
142
111
|
|
|
143
|
-
|
|
112
|
+
### Scenario: embed text files by using `kind: "text"`
|
|
144
113
|
|
|
145
|
-
|
|
114
|
+
If `kind` is `"text"`, Knowhow treats the source input as raw text and embeds it as a single item (no file globbing).
|
|
146
115
|
|
|
147
|
-
```
|
|
116
|
+
```jsonc
|
|
148
117
|
{
|
|
149
118
|
"embedSources": [
|
|
150
119
|
{
|
|
151
|
-
"kind": "
|
|
152
|
-
"input": "
|
|
153
|
-
"output": ".knowhow/embeddings/
|
|
154
|
-
"chunkSize": 2000,
|
|
155
|
-
"minLength": 200
|
|
120
|
+
"kind": "text",
|
|
121
|
+
"input": "This is the content I want embedded",
|
|
122
|
+
"output": ".knowhow/embeddings/notes.json"
|
|
156
123
|
}
|
|
157
124
|
]
|
|
158
125
|
}
|
|
159
126
|
```
|
|
160
127
|
|
|
161
|
-
|
|
128
|
+
### Scenario: transform before embedding with `prompt`
|
|
162
129
|
|
|
163
|
-
|
|
130
|
+
When `prompt` is provided, Knowhow calls a summarization step before generating vectors.
|
|
164
131
|
|
|
165
|
-
```
|
|
132
|
+
```jsonc
|
|
166
133
|
{
|
|
167
134
|
"embedSources": [
|
|
168
135
|
{
|
|
169
|
-
"
|
|
170
|
-
"
|
|
171
|
-
"
|
|
172
|
-
"chunkSize":
|
|
173
|
-
"minLength": 1
|
|
136
|
+
"input": "src/**/*.ts",
|
|
137
|
+
"output": ".knowhow/embeddings/code.json",
|
|
138
|
+
"prompt": "BasicEmbeddingExplainer",
|
|
139
|
+
"chunkSize": 2000
|
|
174
140
|
}
|
|
175
141
|
]
|
|
176
142
|
}
|
|
177
143
|
```
|
|
178
144
|
|
|
179
|
-
>
|
|
180
|
-
|
|
181
|
-
---
|
|
182
|
-
|
|
183
|
-
## 4) Embedding models (`embeddingModel`)
|
|
145
|
+
> Tip: The prompt is loaded via `loadPrompt(promptName)` which supports either a prompt name (from `.knowhow/prompts/*.mdx`) or a direct prompt string.
|
|
184
146
|
|
|
185
|
-
|
|
147
|
+
### Scenario: skip very small chunks with `minLength`
|
|
186
148
|
|
|
187
|
-
|
|
188
|
-
- `text-embedding-ada-002`
|
|
189
|
-
|
|
190
|
-
Supported models in the codebase:
|
|
191
|
-
|
|
192
|
-
### OpenAI embedding models
|
|
193
|
-
- `text-embedding-ada-002` (`EmbeddingAda2`)
|
|
194
|
-
- `text-embedding-3-small` (`EmbeddingSmall3`)
|
|
195
|
-
- `text-embedding-3-large` (`EmbeddingLarge3`)
|
|
196
|
-
|
|
197
|
-
### Google embedding models
|
|
198
|
-
- `gemini-embedding-exp` (`Gemini_Embedding`)
|
|
199
|
-
- `gemini-embedding-001` (`Gemini_Embedding_001`)
|
|
200
|
-
|
|
201
|
-
Example config:
|
|
202
|
-
|
|
203
|
-
```json
|
|
149
|
+
```jsonc
|
|
204
150
|
{
|
|
205
|
-
"embeddingModel": "text-embedding-3-small",
|
|
206
151
|
"embedSources": [
|
|
207
152
|
{
|
|
208
153
|
"input": "docs/**/*.md",
|
|
209
154
|
"output": ".knowhow/embeddings/docs.json",
|
|
210
|
-
"chunkSize": 2000
|
|
155
|
+
"chunkSize": 2000,
|
|
156
|
+
"minLength": 120
|
|
211
157
|
}
|
|
212
158
|
]
|
|
213
159
|
}
|
|
@@ -215,119 +161,108 @@ Example config:
|
|
|
215
161
|
|
|
216
162
|
---
|
|
217
163
|
|
|
218
|
-
##
|
|
219
|
-
|
|
220
|
-
Knowhow can store the generated embeddings JSON remotely using `remote` and `remoteType` in each `embedSources` entry.
|
|
221
|
-
|
|
222
|
-
### A) Upload to S3: `remoteType: "s3"`
|
|
223
|
-
**Config**
|
|
224
|
-
- `remote`: S3 bucket name
|
|
225
|
-
- `output`: local path to the `.json` embeddings file to upload
|
|
226
|
-
|
|
227
|
-
Upload behavior (from `knowhow upload`):
|
|
228
|
-
- uploads `source.output` to `${bucket}/${embeddingName}.json` (where `embeddingName` is derived from the local filename)
|
|
164
|
+
## 4) Embedding models (`embeddingModel`)
|
|
229
165
|
|
|
230
|
-
|
|
166
|
+
Your embedding model is configured via:
|
|
231
167
|
|
|
232
|
-
```
|
|
168
|
+
```jsonc
|
|
233
169
|
{
|
|
234
|
-
"
|
|
235
|
-
{
|
|
236
|
-
"input": ".knowhow/docs/**/*.mdx",
|
|
237
|
-
"output": ".knowhow/embeddings/docs.json",
|
|
238
|
-
"remoteType": "s3",
|
|
239
|
-
"remote": "my-knowhow-embeddings",
|
|
240
|
-
"chunkSize": 2000
|
|
241
|
-
}
|
|
242
|
-
]
|
|
170
|
+
"embeddingModel": "openai.EmbeddingAda2"
|
|
243
171
|
}
|
|
244
172
|
```
|
|
245
173
|
|
|
246
|
-
|
|
247
|
-
|
|
174
|
+
Knowhow passes `embeddingModel` directly to the embedding provider client when creating embeddings.
|
|
175
|
+
|
|
176
|
+
### Supported models
|
|
177
|
+
|
|
178
|
+
The code exports embedding model sets under:
|
|
179
|
+
|
|
180
|
+
- `EmbeddingModels.openai.*`
|
|
181
|
+
- `EmbeddingModels.google.*`
|
|
182
|
+
|
|
183
|
+
So supported values are those available in `EmbeddingModels.openai` and `EmbeddingModels.google` in your installed Knowhow version.
|
|
184
|
+
|
|
185
|
+
> If you also use remote uploads to Knowhow Cloud, be aware that **vectors generated with different models are not comparable**. Knowhow warns on upload when local `embeddingModel` differs from the backend’s stored model.
|
|
248
186
|
|
|
249
187
|
---
|
|
250
188
|
|
|
251
|
-
|
|
252
|
-
The downloader supports `remoteType: "github"` (implemented in `knowhow download`).
|
|
189
|
+
## 5) Remote storage options
|
|
253
190
|
|
|
254
|
-
|
|
255
|
-
- downloads `".knowhow/embeddings/<fileName>.json"` from a configured GitHub remote into `destinationPath`
|
|
191
|
+
Embeddings can be uploaded to remote storage using `knowhow upload`.
|
|
256
192
|
|
|
257
|
-
|
|
193
|
+
Your `embedSources[]` entry must specify:
|
|
258
194
|
|
|
259
|
-
|
|
195
|
+
- `remote`: destination identifier (varies by remote type)
|
|
196
|
+
- `remoteType`: which backend to use
|
|
197
|
+
- optionally `remoteId` (required for `remoteType: "knowhow"`)
|
|
198
|
+
|
|
199
|
+
### A) Upload to S3 (`remoteType: "s3"`)
|
|
200
|
+
|
|
201
|
+
```jsonc
|
|
260
202
|
{
|
|
261
203
|
"embedSources": [
|
|
262
204
|
{
|
|
263
205
|
"input": "src/**/*.ts",
|
|
264
206
|
"output": ".knowhow/embeddings/code.json",
|
|
265
|
-
"
|
|
266
|
-
|
|
207
|
+
"chunkSize": 2000,
|
|
208
|
+
|
|
209
|
+
"remoteType": "s3",
|
|
210
|
+
"remote": "my-embeddings-bucket"
|
|
267
211
|
}
|
|
268
212
|
]
|
|
269
213
|
}
|
|
270
214
|
```
|
|
271
215
|
|
|
272
|
-
|
|
273
|
-
|
|
274
|
-
---
|
|
216
|
+
Run:
|
|
275
217
|
|
|
276
|
-
|
|
277
|
-
|
|
218
|
+
```bash
|
|
219
|
+
knowhow embed
|
|
220
|
+
knowhow upload
|
|
221
|
+
```
|
|
278
222
|
|
|
279
|
-
|
|
280
|
-
-
|
|
281
|
-
|
|
282
|
-
- then syncs embedding metadata back to the backend DB
|
|
223
|
+
How it maps:
|
|
224
|
+
- Local `output` JSON is uploaded as something like:
|
|
225
|
+
`bucketName/embeddingName.json`
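As a rough illustration of that mapping, the remote object path can be derived from the local `output` path. This sketch assumes `embeddingName` is simply the local filename without its directory or `.json` extension; `embeddingNameFor` and `remoteObjectPath` are hypothetical helpers, not part of Knowhow.

```typescript
// Assumption: embeddingName is the local filename minus directory and ".json".
function embeddingNameFor(outputPath: string): string {
  const fileName = outputPath.split(/[\\/]/).pop() ?? outputPath;
  return fileName.replace(/\.json$/i, "");
}

// Compose "<bucket>/<embeddingName>.json" for a given embedSources entry.
function remoteObjectPath(bucket: string, outputPath: string): string {
  return `${bucket}/${embeddingNameFor(outputPath)}.json`;
}
```

Under these assumptions, the config above would map `.knowhow/embeddings/code.json` to `my-embeddings-bucket/code.json`.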
|
|
283
226
|
|
|
284
|
-
|
|
227
|
+
### B) Upload to GitHub via git LFS (`remoteType: "github"`)
|
|
285
228
|
|
|
286
|
-
```
|
|
229
|
+
```jsonc
|
|
287
230
|
{
|
|
288
231
|
"embedSources": [
|
|
289
232
|
{
|
|
290
233
|
"input": ".knowhow/docs/**/*.mdx",
|
|
291
234
|
"output": ".knowhow/embeddings/docs.json",
|
|
292
|
-
"
|
|
293
|
-
|
|
235
|
+
"chunkSize": 2000,
|
|
236
|
+
|
|
237
|
+
"remoteType": "github",
|
|
238
|
+
"remote": "org-or-user/repo-name"
|
|
294
239
|
}
|
|
295
240
|
]
|
|
296
241
|
}
|
|
297
242
|
```
|
|
298
243
|
|
|
299
|
-
|
|
300
|
-
|
|
301
|
-
## 6) `knowhow upload` (upload embeddings to remote)
|
|
302
|
-
|
|
303
|
-
Command:
|
|
244
|
+
Run:
|
|
304
245
|
|
|
305
246
|
```bash
|
|
247
|
+
knowhow embed
|
|
306
248
|
knowhow upload
|
|
307
249
|
```
|
|
308
250
|
|
|
309
|
-
|
|
251
|
+
> The exact LFS paths/commit behavior is implemented by the Embeddings service resolver for `github`.
|
|
310
252
|
|
|
311
|
-
|
|
312
|
-
- if `remoteType === "s3"` → uploads via S3Service
|
|
313
|
-
- if `remoteType === "knowhow"` → uploads via Knowhow presigned URLs and syncs metadata
|
|
253
|
+
### C) Upload to Knowhow Cloud KB (`remoteType: "knowhow"`)
|
|
314
254
|
|
|
315
|
-
|
|
316
|
-
|
|
317
|
-
```json
|
|
255
|
+
```jsonc
|
|
318
256
|
{
|
|
319
257
|
"embedSources": [
|
|
320
258
|
{
|
|
321
259
|
"input": ".knowhow/docs/**/*.mdx",
|
|
322
260
|
"output": ".knowhow/embeddings/docs.json",
|
|
323
|
-
"
|
|
324
|
-
|
|
325
|
-
},
|
|
326
|
-
{
|
|
327
|
-
"input": "src/**/*.ts",
|
|
328
|
-
"output": ".knowhow/embeddings/code.json",
|
|
261
|
+
"chunkSize": 2000,
|
|
262
|
+
|
|
329
263
|
"remoteType": "knowhow",
|
|
330
|
-
"
|
|
264
|
+
"remote": "unused-or-label",
|
|
265
|
+
"remoteId": "KB_ID_FROM_KNOWHOW_DASHBOARD"
|
|
331
266
|
}
|
|
332
267
|
]
|
|
333
268
|
}
|
|
@@ -336,147 +271,123 @@ Example: upload both docs and code embeddings
|
|
|
336
271
|
Run:
|
|
337
272
|
|
|
338
273
|
```bash
|
|
274
|
+
knowhow embed
|
|
339
275
|
knowhow upload
|
|
340
276
|
```
|
|
341
277
|
|
|
342
278
|
---
|
|
343
279
|
|
|
344
|
-
##
|
|
280
|
+
## 6) `knowhow upload` — upload embeddings to remote
|
|
345
281
|
|
|
346
|
-
|
|
282
|
+
`knowhow upload` iterates `config.embedSources` and uploads the JSON file at each `source.output`.
|
|
347
283
|
|
|
348
|
-
|
|
349
|
-
knowhow download
|
|
350
|
-
```
|
|
284
|
+
### Behavior by remote type
|
|
351
285
|
|
|
352
|
-
|
|
286
|
+
- If `remoteType` is known (resolver exists) and `remoteType !== "knowhow"`: uploads using the embeddings resolver.
|
|
287
|
+
- If `remoteType === "knowhow"`:
|
|
288
|
+
1. Requires `remoteId`
|
|
289
|
+
2. Fetches (and warns about) model mismatches
|
|
290
|
+
3. Requests a **presigned upload URL**
|
|
291
|
+
4. Uploads the local embedding file via S3 under the hood
|
|
292
|
+
5. Syncs metadata back to the KB (glob, chunk size, etc.)
|
|
353
293
|
|
|
354
|
-
|
|
355
|
-
- `s3`
|
|
356
|
-
- `github`
|
|
357
|
-
- `knowhow` (requires `remoteId`)
|
|
294
|
+
---
|
|
358
295
|
|
|
359
|
-
|
|
296
|
+
## 7) `knowhow download` — download embeddings from remote
|
|
360
297
|
|
|
361
|
-
|
|
362
|
-
{
|
|
363
|
-
"embedSources": [
|
|
364
|
-
{
|
|
365
|
-
"output": ".knowhow/embeddings/docs.json",
|
|
366
|
-
"remoteType": "s3",
|
|
367
|
-
"remote": "my-knowhow-embeddings"
|
|
368
|
-
}
|
|
369
|
-
]
|
|
370
|
-
}
|
|
371
|
-
```
|
|
298
|
+
`knowhow download` downloads each embeddings file defined in `embedSources` where `remoteType` is set.
|
|
372
299
|
|
|
373
|
-
|
|
300
|
+
### Example: S3 download
|
|
374
301
|
|
|
375
302
|
```bash
|
|
376
303
|
knowhow download
|
|
377
304
|
```
|
|
378
305
|
|
|
379
|
-
|
|
380
|
-
|
|
381
|
-
|
|
306
|
+
Where it goes:
|
|
307
|
+
- For non-knowhow resolvers: it uses the embeddings resolver’s download logic.
|
|
308
|
+
- For `remoteType: "knowhow"`: it requests a presigned download URL from the Knowhow API, then saves to your configured `source.output`.
|
|
382
309
|
|
|
383
|
-
|
|
310
|
+
---
|
|
384
311
|
|
|
385
|
-
|
|
386
|
-
You need the `KB ID` (stored as `remoteId`) from **knowhow.tyvm.ai**.
|
|
312
|
+
## 8) Uploading to knowhow.tyvm.ai (Knowhow Cloud KB)
|
|
387
313
|
|
|
388
|
-
### Step
|
|
389
|
-
Set:
|
|
314
|
+
### Step 1: Get your KB ID
|
|
390
315
|
|
|
391
|
-
|
|
392
|
-
- `remoteId: "<your KB ID>"`
|
|
316
|
+
From the Knowhow web app / dashboard, locate the KB (knowledge base) you want embeddings uploaded into and copy its **KB ID**.
|
|
393
317
|
|
|
394
|
-
|
|
318
|
+
### Step 2: Configure your `embedSources` entry
|
|
395
319
|
|
|
396
|
-
```
|
|
320
|
+
```jsonc
|
|
397
321
|
{
|
|
398
322
|
"embedSources": [
|
|
399
323
|
{
|
|
400
324
|
"input": ".knowhow/docs/**/*.mdx",
|
|
401
325
|
"output": ".knowhow/embeddings/docs.json",
|
|
326
|
+
"chunkSize": 2000,
|
|
327
|
+
|
|
402
328
|
"remoteType": "knowhow",
|
|
403
|
-
"remoteId": "
|
|
404
|
-
"chunkSize": 2000
|
|
329
|
+
"remoteId": "your-kb-id"
|
|
405
330
|
}
|
|
406
331
|
]
|
|
407
332
|
}
|
|
408
333
|
```
|
|
409
334
|
|
|
410
335
|
### Step 3: Generate + upload
|
|
411
|
-
1) Generate embeddings:
|
|
412
336
|
|
|
413
337
|
```bash
|
|
414
338
|
knowhow embed
|
|
415
|
-
```
|
|
416
|
-
|
|
417
|
-
2) Upload them:
|
|
418
|
-
|
|
419
|
-
```bash
|
|
420
339
|
knowhow upload
|
|
421
340
|
```
|
|
422
341
|
|
|
423
|
-
Knowhow Cloud upload
|
|
342
|
+
Knowhow Cloud upload flow:
|
|
343
|
+
- uses the KB ID (`remoteId`) to request a presigned upload URL
|
|
344
|
+
- uploads your embeddings JSON
|
|
345
|
+
- syncs embed configuration metadata back to the backend
|
|
424
346
|
|
|
425
347
|
---
|
|
426
348
|
|
|
427
349
|
## 9) Using embeddings in chat
|
|
428
350
|
|
|
429
|
-
|
|
351
|
+
When the **embeddings plugin** is enabled (it is included in the default plugin list), Knowhow can:
|
|
352
|
+
|
|
353
|
+
1. Embed the user query using your configured embedding model
|
|
354
|
+
2. Compare the query vector against stored vectors using cosine similarity
|
|
355
|
+
3. Retrieve the most relevant chunks
|
|
356
|
+
4. Inject relevant context into the agent/chat prompt automatically
|
|
430
357
|
|
|
431
|
-
|
|
432
|
-
-
|
|
433
|
-
- automatically selects the most relevant chunks
|
|
434
|
-
- injects them into the model context as supporting material
|
|
358
|
+
In other words, chat becomes semantic:
|
|
359
|
+
- “Where is the auth code?” matches the meaning even if your code uses different keywords.
|
|
435
360
|
|
|
436
|
-
|
|
361
|
+
> The similarity computation is done by `cosineSimilarity(embedding.vector, queryVector)` and results are sorted descending.
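A minimal sketch of that ranking step is shown below. The name `cosineSimilarity` matches the function referenced in the note, but the body is a generic textbook implementation rather than Knowhow's source, and `rankBySimilarity`, `StoredEmbedding`, and `topK` are hypothetical.

```typescript
interface StoredEmbedding {
  id: string;
  text: string;
  vector: number[];
}

// Standard cosine similarity: dot(a, b) / (|a| * |b|).
// Returns 0 when either vector has zero length (a choice made for this sketch).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Score every stored chunk against the query vector, sort descending,
// and keep the `topK` best matches.
function rankBySimilarity(entries: StoredEmbedding[], queryVector: number[], topK: number) {
  return entries
    .map((embedding) => ({
      ...embedding,
      score: cosineSimilarity(embedding.vector, queryVector),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

Because cosine similarity only compares directions, the query and the stored chunks need to come from the same embedding model for the scores to be meaningful.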
|
|
437
362
|
|
|
438
363
|
---
|
|
439
364
|
|
|
440
|
-
## 10) Special input kinds (
|
|
365
|
+
## 10) Special input kinds (plugins)
|
|
441
366
|
|
|
442
|
-
|
|
367
|
+
In embedding generation, Knowhow checks:
|
|
443
368
|
|
|
444
|
-
-
|
|
445
|
-
|
|
446
|
-
return Plugins.embed(kind, input);
|
|
447
|
-
```
|
|
369
|
+
- If `Plugins.isPlugin(kind)` → it delegates to `Plugins.embed(kind, input)`
|
|
370
|
+
- Otherwise it falls back to built-in kinds (`file`, `text`)
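The dispatch described in these two bullets might look roughly like the sketch below. Only the method names `isPlugin` and `embed` come from the text above. The registry class, the synchronous signatures, the `loadFiles` callback, the `id` chosen for `text` inputs, and the error for unknown kinds are all assumptions made to keep the example self-contained.

```typescript
interface Embeddable {
  id: string;
  text: string;
}

// Synchronous for brevity; a real plugin would likely be async.
type EmbedPlugin = (input: string) => Embeddable[];

// Hypothetical stand-in for the plugin service.
class PluginRegistrySketch {
  private readonly plugins = new Map<string, EmbedPlugin>();

  register(kind: string, plugin: EmbedPlugin): void {
    this.plugins.set(kind, plugin);
  }

  isPlugin(kind: string): boolean {
    return this.plugins.has(kind);
  }

  embed(kind: string, input: string): Embeddable[] {
    const plugin = this.plugins.get(kind);
    if (!plugin) throw new Error(`no plugin registered for kind "${kind}"`);
    return plugin(input);
  }
}

// Plugin kinds win; otherwise fall back to the built-in "text" and "file" kinds.
function resolveEmbeddables(
  registry: PluginRegistrySketch,
  source: { kind?: string; input: string },
  loadFiles: (glob: string) => Embeddable[], // hypothetical file-glob loader
): Embeddable[] {
  const kind = source.kind ?? "file";
  if (registry.isPlugin(kind)) return registry.embed(kind, source.input);
  if (kind === "text") return [{ id: source.input, text: source.input }];
  if (kind === "file") return loadFiles(source.input);
  // Sketch choice: reject unknown kinds instead of guessing a fallback.
  throw new Error(`unsupported kind "${kind}"`);
}
```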
|
|
448
371
|
|
|
449
|
-
|
|
372
|
+
That means “special input kinds” work as long as the corresponding plugin is installed/enabled.
|
|
450
373
|
|
|
451
|
-
|
|
452
|
-
|
|
374
|
+
Examples include (as referenced by your default enabled plugin list / mentions):
|
|
375
|
+
- `asana`
|
|
376
|
+
- `github`
|
|
377
|
+
- `download`
|
|
378
|
+
- `url`
|
|
379
|
+
- `jira`
|
|
380
|
+
- `linear`
|
|
381
|
+
- etc.
|
|
453
382
|
|
|
454
|
-
|
|
455
|
-
- `input`: plugin-specific selector or identifier
|
|
456
|
-
- `output`: local embeddings JSON file
|
|
457
|
-
- optional: `prompt`, `chunkSize`, `minLength`
|
|
383
|
+
### A) Embed a URL/web page (`kind: "url"`)
|
|
458
384
|
|
|
459
|
-
|
|
460
|
-
```json
|
|
461
|
-
{
|
|
462
|
-
"embedSources": [
|
|
463
|
-
{
|
|
464
|
-
"kind": "asana",
|
|
465
|
-
"input": "workspace-or-project-id-or-filter",
|
|
466
|
-
"output": ".knowhow/embeddings/asana.json",
|
|
467
|
-
"chunkSize": 2000
|
|
468
|
-
}
|
|
469
|
-
]
|
|
470
|
-
}
|
|
471
|
-
```
|
|
472
|
-
|
|
473
|
-
#### Example: embed web pages (URL plugin)
|
|
474
|
-
```json
|
|
385
|
+
```jsonc
|
|
475
386
|
{
|
|
476
387
|
"embedSources": [
|
|
477
388
|
{
|
|
478
389
|
"kind": "url",
|
|
479
|
-
"input": "https://example.com/docs
|
|
390
|
+
"input": "https://example.com/docs",
|
|
480
391
|
"output": ".knowhow/embeddings/web.json",
|
|
481
392
|
"chunkSize": 2000
|
|
482
393
|
}
|
|
@@ -484,28 +395,29 @@ Use:
|
|
|
484
395
|
}
|
|
485
396
|
```
|
|
486
397
|
|
|
487
|
-
|
|
488
|
-
|
|
398
|
+
### B) Embed Asana tasks (`kind: "asana"`)
|
|
399
|
+
|
|
400
|
+
```jsonc
|
|
489
401
|
{
|
|
490
402
|
"embedSources": [
|
|
491
403
|
{
|
|
492
|
-
"kind": "
|
|
493
|
-
"input": "
|
|
494
|
-
"output": ".knowhow/embeddings/
|
|
495
|
-
"chunkSize": 2000
|
|
496
|
-
"prompt": "BasicEmbeddingExplainer"
|
|
404
|
+
"kind": "asana",
|
|
405
|
+
"input": "project:MY_PROJECT_ID or task:123456",
|
|
406
|
+
"output": ".knowhow/embeddings/asana.json",
|
|
407
|
+
"chunkSize": 2000
|
|
497
408
|
}
|
|
498
409
|
]
|
|
499
410
|
}
|
|
500
411
|
```
|
|
501
412
|
|
|
502
|
-
|
|
503
|
-
|
|
413
|
+
### C) Embed GitHub content (`kind: "github"`)
|
|
414
|
+
|
|
415
|
+
```jsonc
|
|
504
416
|
{
|
|
505
417
|
"embedSources": [
|
|
506
418
|
{
|
|
507
419
|
"kind": "github",
|
|
508
|
-
"input": "
|
|
420
|
+
"input": "org/repo",
|
|
509
421
|
"output": ".knowhow/embeddings/github.json",
|
|
510
422
|
"chunkSize": 2000
|
|
511
423
|
}
|
|
@@ -513,54 +425,43 @@ Use:
|
|
|
513
425
|
}
|
|
514
426
|
```
|
|
515
427
|
|
|
516
|
-
|
|
517
|
-
|
|
518
|
-
---
|
|
428
|
+
### D) Embed YouTube videos (plugin kind)
|
|
519
429
|
|
|
520
|
-
|
|
430
|
+
If you have a YouTube embedding plugin installed, you can use it similarly:
|
|
521
431
|
|
|
522
|
-
|
|
523
|
-
```json
|
|
432
|
+
```jsonc
|
|
524
433
|
{
|
|
525
|
-
"embeddingModel": "text-embedding-3-small",
|
|
526
434
|
"embedSources": [
|
|
527
435
|
{
|
|
528
|
-
"
|
|
529
|
-
"
|
|
530
|
-
"
|
|
531
|
-
"chunkSize": 2000
|
|
532
|
-
"remoteType": "s3",
|
|
533
|
-
"remote": "my-knowhow-embeddings"
|
|
436
|
+
"kind": "youtube",
|
|
437
|
+
"input": "https://www.youtube.com/watch?v=VIDEO_ID",
|
|
438
|
+
"output": ".knowhow/embeddings/youtube.json",
|
|
439
|
+
"chunkSize": 2000
|
|
534
440
|
}
|
|
535
441
|
]
|
|
536
442
|
}
|
|
537
443
|
```
|
|
538
444
|
|
|
539
|
-
|
|
540
|
-
knowhow embed
|
|
541
|
-
knowhow upload
|
|
542
|
-
```
|
|
445
|
+
> The exact `input` format is plugin-specific—use the plugin’s documentation/examples for how it expects URLs, IDs, or project selectors.
|
|
543
446
|
|
|
544
|
-
|
|
545
|
-
```json
|
|
546
|
-
{
|
|
547
|
-
"embedSources": [
|
|
548
|
-
{
|
|
549
|
-
"input": "src/**/*.ts",
|
|
550
|
-
"output": ".knowhow/embeddings/code.json",
|
|
551
|
-
"chunkSize": 2000,
|
|
552
|
-
"remoteType": "knowhow",
|
|
553
|
-
"remoteId": "kb_1234567890abcdef"
|
|
554
|
-
}
|
|
555
|
-
]
|
|
556
|
-
}
|
|
557
|
-
```
|
|
447
|
+
---
|
|
558
448
|
|
|
559
|
-
|
|
560
|
-
|
|
561
|
-
|
|
562
|
-
|
|
449
|
+
# Recommended workflow
|
|
450
|
+
|
|
451
|
+
1. **Configure** `embedSources` locally
|
|
452
|
+
2. Run:
|
|
453
|
+
```bash
|
|
454
|
+
knowhow embed
|
|
455
|
+
```
|
|
456
|
+
3. If desired, store remotely:
|
|
457
|
+
```bash
|
|
458
|
+
knowhow upload
|
|
459
|
+
```
|
|
460
|
+
4. For other environments/machines:
|
|
461
|
+
```bash
|
|
462
|
+
knowhow download
|
|
463
|
+
```
|
|
563
464
|
|
|
564
465
|
---
|
|
565
466
|
|
|
566
|
-
If you
|
|
467
|
+
If you paste your current `.knowhow/knowhow.json` (especially `embedSources`), I can suggest an optimal setup (chunk sizing, minLength, prompt strategy, and the best remoteType for your workflow).
|