@tyvm/knowhow 0.0.109 → 0.0.110

Files changed (94)
  1. package/autodoc/README.md +324 -0
  2. package/autodoc/chat-guide.md +268 -365
  3. package/autodoc/cli-reference.md +399 -473
  4. package/autodoc/config-reference.md +431 -330
  5. package/autodoc/embeddings-guide.md +223 -322
  6. package/autodoc/generate-guide.md +261 -301
  7. package/autodoc/language-plugin-guide.md +221 -247
  8. package/autodoc/modules-guide.md +242 -215
  9. package/autodoc/plugins-guide.md +470 -469
  10. package/autodoc/quickstart-guide.md +67 -70
  11. package/autodoc/skills-guide.md +455 -339
  12. package/autodoc/worker-guide.md +301 -308
  13. package/package.json +1 -1
  14. package/scripts/build-for-node.sh +10 -24
  15. package/src/agents/tools/list.ts +2 -2
  16. package/src/ai.ts +81 -37
  17. package/src/chat/CliChatService.ts +1 -1
  18. package/src/chat/modules/AgentModule.ts +7 -2
  19. package/src/chat/modules/SessionsModule.ts +40 -1
  20. package/src/chat/modules/SystemModule.ts +2 -2
  21. package/src/clients/anthropic.ts +1 -1
  22. package/src/clients/index.ts +25 -6
  23. package/src/clients/openai.ts +8 -5
  24. package/src/clients/types.ts +29 -6
  25. package/src/clients/withRetry.ts +89 -0
  26. package/src/commands/agent.ts +30 -0
  27. package/src/commands/modules.ts +417 -47
  28. package/src/config.ts +1 -1
  29. package/src/fileSync.ts +20 -12
  30. package/src/hashes.ts +43 -22
  31. package/src/index.ts +4 -2
  32. package/src/processors/Base64ImageDetector.ts +73 -0
  33. package/src/services/MediaProcessorService.ts +79 -10
  34. package/src/services/modules/index.ts +47 -18
  35. package/tests/processors/Base64ImageDetector.test.ts +160 -0
  36. package/tests/unit/clients/AIClient.test.ts +446 -0
  37. package/tests/unit/clients/withRetry.test.ts +319 -0
  38. package/tests/unit/commands/github-credentials.test.ts +1 -2
  39. package/ts_build/package.json +1 -1
  40. package/ts_build/src/agents/tools/list.js +2 -2
  41. package/ts_build/src/agents/tools/list.js.map +1 -1
  42. package/ts_build/src/ai.d.ts +3 -3
  43. package/ts_build/src/ai.js +51 -23
  44. package/ts_build/src/ai.js.map +1 -1
  45. package/ts_build/src/chat/CliChatService.js +1 -1
  46. package/ts_build/src/chat/CliChatService.js.map +1 -1
  47. package/ts_build/src/chat/modules/AgentModule.js +5 -2
  48. package/ts_build/src/chat/modules/AgentModule.js.map +1 -1
  49. package/ts_build/src/chat/modules/SessionsModule.js +30 -1
  50. package/ts_build/src/chat/modules/SessionsModule.js.map +1 -1
  51. package/ts_build/src/chat/modules/SystemModule.js +2 -2
  52. package/ts_build/src/chat/modules/SystemModule.js.map +1 -1
  53. package/ts_build/src/clients/anthropic.js +1 -1
  54. package/ts_build/src/clients/anthropic.js.map +1 -1
  55. package/ts_build/src/clients/index.js +7 -6
  56. package/ts_build/src/clients/index.js.map +1 -1
  57. package/ts_build/src/clients/openai.js +4 -4
  58. package/ts_build/src/clients/openai.js.map +1 -1
  59. package/ts_build/src/clients/types.d.ts +12 -6
  60. package/ts_build/src/clients/withRetry.d.ts +2 -0
  61. package/ts_build/src/clients/withRetry.js +60 -0
  62. package/ts_build/src/clients/withRetry.js.map +1 -0
  63. package/ts_build/src/commands/agent.js +25 -0
  64. package/ts_build/src/commands/agent.js.map +1 -1
  65. package/ts_build/src/commands/modules.js +359 -32
  66. package/ts_build/src/commands/modules.js.map +1 -1
  67. package/ts_build/src/config.js +1 -1
  68. package/ts_build/src/config.js.map +1 -1
  69. package/ts_build/src/fileSync.d.ts +2 -2
  70. package/ts_build/src/fileSync.js +13 -11
  71. package/ts_build/src/fileSync.js.map +1 -1
  72. package/ts_build/src/hashes.d.ts +2 -2
  73. package/ts_build/src/hashes.js +40 -16
  74. package/ts_build/src/hashes.js.map +1 -1
  75. package/ts_build/src/index.js +1 -1
  76. package/ts_build/src/index.js.map +1 -1
  77. package/ts_build/src/processors/Base64ImageDetector.d.ts +3 -0
  78. package/ts_build/src/processors/Base64ImageDetector.js +42 -0
  79. package/ts_build/src/processors/Base64ImageDetector.js.map +1 -1
  80. package/ts_build/src/services/MediaProcessorService.d.ts +5 -4
  81. package/ts_build/src/services/MediaProcessorService.js +53 -8
  82. package/ts_build/src/services/MediaProcessorService.js.map +1 -1
  83. package/ts_build/src/services/modules/index.js +35 -12
  84. package/ts_build/src/services/modules/index.js.map +1 -1
  85. package/ts_build/tests/processors/Base64ImageDetector.test.js +111 -0
  86. package/ts_build/tests/processors/Base64ImageDetector.test.js.map +1 -1
  87. package/ts_build/tests/unit/clients/AIClient.test.d.ts +1 -0
  88. package/ts_build/tests/unit/clients/AIClient.test.js +339 -0
  89. package/ts_build/tests/unit/clients/AIClient.test.js.map +1 -0
  90. package/ts_build/tests/unit/clients/withRetry.test.d.ts +1 -0
  91. package/ts_build/tests/unit/clients/withRetry.test.js +225 -0
  92. package/ts_build/tests/unit/clients/withRetry.test.js.map +1 -0
  93. package/ts_build/tests/unit/commands/github-credentials.test.js +1 -2
  94. package/ts_build/tests/unit/commands/github-credentials.test.js.map +1 -1
@@ -1,213 +1,159 @@
1
1
  # Embeddings Guide (Knowhow CLI)
2
2
 
3
- Embeddings are the backbone of Knowhow’s semantic search. Instead of searching for exact words, Knowhow converts *text chunks* into **vectors** (arrays of numbers) that represent meaning. Later, when you ask a question, Knowhow embeds the question and finds the most similar vectors across your docs/code/other sources.
3
+ Embeddings are the backbone of Knowhow’s semantic search. Instead of matching exact keywords, Knowhow turns **text chunks** into **vectors** (arrays of numbers) and later searches by **meaning** using vector similarity.
4
4
 
5
- This guide explains how to generate, configure, store, and use embeddings in Knowhow.
5
+ This guide explains how to generate, configure, store, and use embeddings with the Knowhow CLI.
6
6
 
7
7
  ---
8
8
 
9
9
  ## 1) What embeddings are
10
10
 
11
- An **embedding** is a numeric vector representing a piece of text.
11
+ **Embeddings** are **vector representations** of text.
12
12
 
13
- When Knowhow runs embedding generation:
13
+ In Knowhow:
14
14
 
15
- - It **chunks** your content into pieces (default ~2000 characters per chunk).
16
- - For each chunk, it may optionally **summarize/transform** the text with a prompt.
17
- - It then calls the configured embedding model to create a vector.
18
- - It saves an entry shaped like:
15
+ - Your inputs are converted into text (for files: `convertToText(filePath)`).
16
+ - The text is split into **chunks** (optional `chunkSize`).
17
+ - Each chunk is embedded by an embedding model, producing a `vector: number[]`.
18
+ - The result is stored in a local JSON file with entries shaped like:
19
19
 
20
20
  ```json
21
21
  {
22
- "id": "chunk-id",
23
- "text": "chunk content (possibly summarized)",
22
+ "id": "some-chunk-id",
23
+ "text": "chunk content (or summarized content)",
24
24
  "vector": [0.0123, -0.0045, ...],
25
25
  "metadata": {
26
- "...": "source-specific metadata"
26
+ "filepath": "...",
27
+ "date": "2026-05-23T..."
27
28
  }
28
29
  }
29
30
  ```
30
31
 
31
- ### Chunk IDs (how Knowhow identifies chunks)
32
- - If `chunkSize` is set, Knowhow typically uses:
33
- - `id-index` (e.g. `path/to/file.ts-3`)
34
- - If `chunkSize` is not set, it may keep the original `id`.
32
+ ### Chunk IDs and pruning behavior
35
33
 
36
- Knowhow also prunes old chunk embeddings that no longer match the current input (to keep embeddings in sync).
34
+ Knowhow assigns chunk IDs like:
37
35
 
38
- ---
39
-
40
- ## 2) `knowhow embed` (generate embeddings)
36
+ - If `chunkSize` is set: `chunkId = "${id}-${chunkIndex}"`
37
+ - If the chunk ID already ends with a numeric suffix, it won’t be re-suffixed.
41
38
 
42
- Run the embedding generation step:
39
+ It also **prunes old chunks**: any existing chunk under the same base `id` that is not part of the newly generated chunk set is removed from the embeddings JSON.
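The ID and pruning rules above can be sketched as follows. This is a minimal illustration, not Knowhow's actual implementation: `EmbeddingEntry`, `chunkIds`, and `pruneStale` are hypothetical names, and zero-based chunk indexes are an assumption.

```typescript
// Hypothetical entry shape, modeled on the JSON example earlier in this guide.
interface EmbeddingEntry {
  id: string;
  text: string;
  vector: number[];
  metadata: Record<string, unknown>;
}

// Build "${id}-${chunkIndex}" IDs (zero-based indexes are an assumption).
function chunkIds(id: string, chunkCount: number): string[] {
  return Array.from({ length: chunkCount }, (_, index) => `${id}-${index}`);
}

// True when `entryId` is the base id itself or the base id plus a numeric suffix.
function belongsToBase(entryId: string, baseId: string): boolean {
  if (entryId === baseId) return true;
  if (!entryId.startsWith(`${baseId}-`)) return false;
  return /^\d+$/.test(entryId.slice(baseId.length + 1));
}

// Remove stored chunks under `baseId` that are not in the freshly generated set.
// Entries belonging to other base ids are left untouched.
function pruneStale(
  stored: EmbeddingEntry[],
  baseId: string,
  freshIds: string[]
): EmbeddingEntry[] {
  const keep = new Set(freshIds);
  return stored.filter(
    (entry) => !belongsToBase(entry.id, baseId) || keep.has(entry.id)
  );
}
```

If a file shrinks from four chunks to two, `pruneStale` drops the two trailing entries for that file and keeps every entry that belongs to a different base `id`.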
43
40
 
44
- ```bash
45
- knowhow embed
46
- ```
41
+ ---
47
42
 
48
- Knowhow will:
43
+ ## 2) `knowhow embed` — generate embeddings
49
44
 
50
- 1. Load `.knowhow/knowhow.json`
51
- 2. For each entry in `embedSources`, embed content into the configured `.json` output file(s)
52
- 3. Save updated embeddings locally under paths like:
53
- - `.knowhow/embeddings/docs.json`
54
- - `.knowhow/embeddings/code.json`
45
+ The CLI’s embedding generation is driven by your `embedSources` configuration.
55
46
 
56
- > In code, the embedding step iterates `config.embedSources` and calls `embedSource(...)` for each configured source.
47
+ `knowhow embed`:
57
48
 
58
- ---
49
+ 1. Loads `.knowhow/knowhow.json`
50
+ 2. Reads `config.embeddingModel` (fallback: OpenAI Ada v2)
51
+ 3. For each entry in `config.embedSources`, runs embedding generation and writes the result to `embedSources[].output`
59
52
 
60
- ## 3) `embedSources` config (what to embed)
53
+ ### Example config (local embeddings for docs + code)
61
54
 
62
- In `.knowhow/knowhow.json`, embeddings are configured under:
63
-
64
- ```json
55
+ ```jsonc
65
56
  {
66
- "embedSources": [ ... ]
57
+ "embeddingModel": "openai.EmbeddingAda2",
58
+ "embedSources": [
59
+ {
60
+ "input": ".knowhow/docs/**/*.mdx",
61
+ "output": ".knowhow/embeddings/docs.json",
62
+ "prompt": "BasicEmbeddingExplainer",
63
+ "chunkSize": 2000
64
+ },
65
+ {
66
+ "input": "src/**/*.ts",
67
+ "output": ".knowhow/embeddings/code.json",
68
+ "chunkSize": 2000
69
+ }
70
+ ]
67
71
  }
68
72
  ```
69
73
 
70
- Each entry supports these fields (from the config types and embedding logic):
71
-
72
- ### `input` (required)
73
- **Glob pattern** or a direct input string (depending on `kind`).
74
-
75
- - If `kind` is `"file"` (default), Knowhow globs the filesystem:
76
- - `input: ".knowhow/docs/**/*.mdx"`
77
- - For non-`file` kinds, Knowhow may treat `input` as a single input value.
78
-
79
- ### `output` (required)
80
- Path where the generated embeddings JSON file is saved.
81
-
82
- Example:
83
- - `.knowhow/embeddings/docs.json`
74
+ Then run:
84
75
 
85
- Knowhow writes the file as a JSON array:
86
- - Sorted by `id`
87
- - Each element includes `id`, `text`, `vector`, `metadata`
88
-
89
- ### `chunkSize` (optional)
90
- How many **characters per chunk**.
91
-
92
- - Default in the template config is `2000`
93
- - If provided, chunk IDs include `-index`
94
-
95
- ### `minLength` (optional)
96
- Skip chunks shorter than this number of characters.
97
-
98
- Implementation detail:
99
- ```ts
100
- const tooShort = minLength && textOfChunk.length < minLength;
76
+ ```bash
77
+ knowhow embed
101
78
  ```
102
79
 
103
- ### `prompt` (optional)
104
- If set, Knowhow transforms each chunk *before* embedding by summarizing it with a prompt.
105
-
106
- - The prompt is loaded via `summarizeTexts([textOfChunk], prompt)`
107
- - Metadata stores extra text when a prompt is used:
108
- - `metadata.text` is set to the original chunking output (see code path)
80
+ ---
109
81
 
110
- This is especially useful to:
111
- - compress long chunks
112
- - standardize content for better retrieval
113
- - emphasize relevant information
82
+ ## 3) `embedSources` config
114
83
 
115
- ### `kind` (optional)
116
- Controls how the input is interpreted.
84
+ Each `embedSources[]` entry controls **what** to embed, **how** to chunk/transform, and **where** to store the resulting embedding JSON.
117
85
 
118
- - If omitted: defaults to `"file"`
119
- - Supported patterns:
120
- - `"file"`: embed content from files on disk (converted to text)
121
- - `"text"`: embed a provided text string
122
- - Other kinds: typically handled by embeddings plugins (see “Special input kinds” below)
86
+ ### Supported fields (from code)
123
87
 
124
- ---
88
+ | Field | Type | What it does |
89
+ |---|---:|---|
90
+ | `input` | string | Glob pattern (or special kind input) describing what to embed |
91
+ | `output` | string | Path to the `.json` embeddings file |
92
+ | `chunkSize` | number | Split text into chunks of this many characters (template default is commonly `2000`) |
93
+ | `minLength` | number | Skip chunks shorter than this many characters |
94
+ | `prompt` | string | Optional prompt name/string used to *summarize/transform* each chunk before embedding |
95
+ | `kind` | string | Embedding strategy kind (default: `"file"`). Can also be a plugin kind like `asana`, `github`, `url`, etc. |
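As a rough illustration of how these fields interact, the sketch below types an entry and applies `chunkSize` and `minLength` to a piece of text. The interface mirrors the table; treating every field except `input` and `output` as optional is an assumption, and `isTooShort` and `selectChunks` are hypothetical helpers rather than Knowhow functions.

```typescript
// Field names follow the table above. Optionality is an assumption.
interface EmbedSource {
  input: string;
  output: string;
  chunkSize?: number;
  minLength?: number;
  prompt?: string;
  kind?: string;
}

// Hypothetical helper: true when a chunk is shorter than `minLength`.
function isTooShort(chunk: string, minLength?: number): boolean {
  return minLength !== undefined && chunk.length < minLength;
}

// Hypothetical helper: split text into `chunkSize`-character chunks
// (one chunk if `chunkSize` is unset), then drop chunks below `minLength`.
function selectChunks(text: string, source: EmbedSource): string[] {
  const size =
    source.chunkSize && source.chunkSize > 0 ? source.chunkSize : text.length;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size) {
    chunks.push(text.slice(start, start + size));
  }
  return chunks.filter((chunk) => !isTooShort(chunk, source.minLength));
}
```

With `chunkSize: 2000` and `minLength: 50`, a 2,010-character file would yield one chunk under this sketch, because the 10-character remainder falls below the threshold.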
125
96
 
126
- ### `embedSources` example: embed docs (MDX) with chunking + prompt
97
+ ### Scenario: embed file globs (`kind` defaults to `"file"`)
127
98
 
128
- ```json
99
+ ```jsonc
129
100
  {
130
- "embeddingModel": "text-embedding-ada-002",
131
101
  "embedSources": [
132
102
  {
133
- "kind": "file",
134
- "input": ".knowhow/docs/**/*.mdx",
103
+ "input": "docs/**/*.md",
135
104
  "output": ".knowhow/embeddings/docs.json",
136
- "prompt": "BasicEmbeddingExplainer",
137
- "chunkSize": 2000
105
+ "chunkSize": 2000,
106
+ "minLength": 50
138
107
  }
139
108
  ]
140
109
  }
141
110
  ```
142
111
 
143
- ---
112
+ ### Scenario: embed raw text by using `kind: "text"`
144
113
 
145
- ### `embedSources` example: embed TypeScript source files
114
+ If `kind` is `"text"`, Knowhow treats the source input as raw text and embeds it as a single item (no file globbing).
146
115
 
147
- ```json
116
+ ```jsonc
148
117
  {
149
118
  "embedSources": [
150
119
  {
151
- "kind": "file",
152
- "input": "src/**/*.ts",
153
- "output": ".knowhow/embeddings/code.json",
154
- "chunkSize": 2000,
155
- "minLength": 200
120
+ "kind": "text",
121
+ "input": "This is the content I want embedded",
122
+ "output": ".knowhow/embeddings/notes.json"
156
123
  }
157
124
  ]
158
125
  }
159
126
  ```
160
127
 
161
- ---
128
+ ### Scenario: transform before embedding with `prompt`
162
129
 
163
- ### `embedSources` example: embed a literal text string
130
+ When `prompt` is provided, Knowhow calls a summarization step before generating vectors.
164
131
 
165
- ```json
132
+ ```jsonc
166
133
  {
167
134
  "embedSources": [
168
135
  {
169
- "kind": "text",
170
- "input": "This is a short paragraph I want searchable.",
171
- "output": ".knowhow/embeddings/notes.json",
172
- "chunkSize": 0,
173
- "minLength": 1
136
+ "input": "src/**/*.ts",
137
+ "output": ".knowhow/embeddings/code.json",
138
+ "prompt": "BasicEmbeddingExplainer",
139
+ "chunkSize": 2000
174
140
  }
175
141
  ]
176
142
  }
177
143
  ```
178
144
 
179
- > With `kind: "text"`, Knowhow treats `input` as the content to embed (it hashes it to generate an ID).
180
-
181
- ---
182
-
183
- ## 4) Embedding models (`embeddingModel`)
145
+ > Tip: The prompt is loaded via `loadPrompt(promptName)` which supports either a prompt name (from `.knowhow/prompts/*.mdx`) or a direct prompt string.
184
146
 
185
- Knowhow uses `embeddingModel` from config to request vectors from the embedding provider.
147
+ ### Scenario: skip very small chunks with `minLength`
186
148
 
187
- Default (from the template config):
188
- - `text-embedding-ada-002`
189
-
190
- Supported models in the codebase:
191
-
192
- ### OpenAI embedding models
193
- - `text-embedding-ada-002` (`EmbeddingAda2`)
194
- - `text-embedding-3-small` (`EmbeddingSmall3`)
195
- - `text-embedding-3-large` (`EmbeddingLarge3`)
196
-
197
- ### Google embedding models
198
- - `gemini-embedding-exp` (`Gemini_Embedding`)
199
- - `gemini-embedding-001` (`Gemini_Embedding_001`)
200
-
201
- Example config:
202
-
203
- ```json
149
+ ```jsonc
204
150
  {
205
- "embeddingModel": "text-embedding-3-small",
206
151
  "embedSources": [
207
152
  {
208
153
  "input": "docs/**/*.md",
209
154
  "output": ".knowhow/embeddings/docs.json",
210
- "chunkSize": 2000
155
+ "chunkSize": 2000,
156
+ "minLength": 120
211
157
  }
212
158
  ]
213
159
  }
@@ -215,119 +161,108 @@ Example config:
215
161
 
216
162
  ---
217
163
 
218
- ## 5) Remote storage options (upload/download embeddings)
219
-
220
- Knowhow can store the generated embeddings JSON remotely using `remote` and `remoteType` in each `embedSources` entry.
221
-
222
- ### A) Upload to S3: `remoteType: "s3"`
223
- **Config**
224
- - `remote`: S3 bucket name
225
- - `output`: local path to the `.json` embeddings file to upload
226
-
227
- Upload behavior (from `knowhow upload`):
228
- - uploads `source.output` to `${bucket}/${embeddingName}.json` (where `embeddingName` is derived from the local filename)
164
+ ## 4) Embedding models (`embeddingModel`)
229
165
 
230
- Example:
166
+ Your embedding model is configured via:
231
167
 
232
- ```json
168
+ ```jsonc
233
169
  {
234
- "embedSources": [
235
- {
236
- "input": ".knowhow/docs/**/*.mdx",
237
- "output": ".knowhow/embeddings/docs.json",
238
- "remoteType": "s3",
239
- "remote": "my-knowhow-embeddings",
240
- "chunkSize": 2000
241
- }
242
- ]
170
+ "embeddingModel": "openai.EmbeddingAda2"
243
171
  }
244
172
  ```
245
173
 
246
- Download behavior (from `knowhow download`):
247
- - downloads `${name}.json` from the bucket into `source.output`
174
+ Knowhow passes `embeddingModel` directly to the embedding provider client when creating embeddings.
175
+
176
+ ### Supported models
177
+
178
+ The code exports embedding model sets under:
179
+
180
+ - `EmbeddingModels.openai.*`
181
+ - `EmbeddingModels.google.*`
182
+
183
+ So supported values are those available in `EmbeddingModels.openai` and `EmbeddingModels.google` in your installed Knowhow version.
184
+
185
+ > If you also use remote uploads to Knowhow Cloud, be aware that **vectors generated with different models are not comparable**. Knowhow warns on upload when local `embeddingModel` differs from the backend’s stored model.
248
186
 
249
187
  ---
250
188
 
251
- ### B) Upload via GitHub (git LFS): `remoteType: "github"`
252
- The downloader supports `remoteType: "github"` (implemented in `knowhow download`).
189
+ ## 5) Remote storage options
253
190
 
254
- From `knowhow download`:
255
- - downloads `".knowhow/embeddings/<fileName>.json"` from a configured GitHub remote into `destinationPath`
191
+ Embeddings can be uploaded to remote storage using `knowhow upload`.
256
192
 
257
- Example:
193
+ Your `embedSources[]` entry must specify:
258
194
 
259
- ```json
195
+ - `remote`: destination identifier (varies by remote type)
196
+ - `remoteType`: which backend to use
197
+ - optionally `remoteId` (required for `remoteType: "knowhow"`)
198
+
199
+ ### A) Upload to S3 (`remoteType: "s3"`)
200
+
201
+ ```jsonc
260
202
  {
261
203
  "embedSources": [
262
204
  {
263
205
  "input": "src/**/*.ts",
264
206
  "output": ".knowhow/embeddings/code.json",
265
- "remoteType": "github",
266
- "remote": "github-owner/github-repo"
207
+ "chunkSize": 2000,
208
+
209
+ "remoteType": "s3",
210
+ "remote": "my-embeddings-bucket"
267
211
  }
268
212
  ]
269
213
  }
270
214
  ```
271
215
 
272
- > Note: in the provided `knowhow upload` implementation, S3 and Knowhow-cloud uploads are explicit; GitHub/LFS upload behavior may be handled by other integrations in your setup.
273
-
274
- ---
216
+ Run:
275
217
 
276
- ### C) Upload to Knowhow Cloud KB: `remoteType: "knowhow"`
277
- This is the integration path for storing embeddings into Knowhow’s hosted knowledge base.
218
+ ```bash
219
+ knowhow embed
220
+ knowhow upload
221
+ ```
278
222
 
279
- From `knowhow upload`:
280
- - requires `remoteId`
281
- - uses a presigned upload URL
282
- - then syncs embedding metadata back to the backend DB
223
+ How it maps:
224
+ - The local `output` JSON is uploaded to the bucket under a name derived from the local filename, in the form:
225
+ `bucketName/embeddingName.json`
283
226
 
284
- Example:
227
+ ### B) Upload to GitHub via git LFS (`remoteType: "github"`)
285
228
 
286
- ```json
229
+ ```jsonc
287
230
  {
288
231
  "embedSources": [
289
232
  {
290
233
  "input": ".knowhow/docs/**/*.mdx",
291
234
  "output": ".knowhow/embeddings/docs.json",
292
- "remoteType": "knowhow",
293
- "remoteId": "kb_1234567890abcdef"
235
+ "chunkSize": 2000,
236
+
237
+ "remoteType": "github",
238
+ "remote": "org-or-user/repo-name"
294
239
  }
295
240
  ]
296
241
  }
297
242
  ```
298
243
 
299
- ---
300
-
301
- ## 6) `knowhow upload` (upload embeddings to remote)
302
-
303
- Command:
244
+ Run:
304
245
 
305
246
  ```bash
247
+ knowhow embed
306
248
  knowhow upload
307
249
  ```
308
250
 
309
- For each `embedSources` entry:
251
+ > The exact LFS paths/commit behavior is implemented by the Embeddings service resolver for `github`.
310
252
 
311
- - if `remoteType` is missing it skips that source
312
- - if `remoteType === "s3"` → uploads via S3Service
313
- - if `remoteType === "knowhow"` → uploads via Knowhow presigned URLs and syncs metadata
253
+ ### C) Upload to Knowhow Cloud KB (`remoteType: "knowhow"`)
314
254
 
315
- Example: upload both docs and code embeddings
316
-
317
- ```json
255
+ ```jsonc
318
256
  {
319
257
  "embedSources": [
320
258
  {
321
259
  "input": ".knowhow/docs/**/*.mdx",
322
260
  "output": ".knowhow/embeddings/docs.json",
323
- "remoteType": "s3",
324
- "remote": "my-knowhow-embeddings"
325
- },
326
- {
327
- "input": "src/**/*.ts",
328
- "output": ".knowhow/embeddings/code.json",
261
+ "chunkSize": 2000,
262
+
329
263
  "remoteType": "knowhow",
330
- "remoteId": "kb_1234567890abcdef"
264
+ "remote": "unused-or-label",
265
+ "remoteId": "KB_ID_FROM_KNOWHOW_DASHBOARD"
331
266
  }
332
267
  ]
333
268
  }
@@ -336,147 +271,123 @@ Example: upload both docs and code embeddings
336
271
  Run:
337
272
 
338
273
  ```bash
274
+ knowhow embed
339
275
  knowhow upload
340
276
  ```
341
277
 
342
278
  ---
343
279
 
344
- ## 7) `knowhow download` (download embeddings from remote)
280
+ ## 6) `knowhow upload` upload embeddings to remote
345
281
 
346
- Command:
282
+ `knowhow upload` iterates `config.embedSources` and uploads the JSON file at each `source.output`.
347
283
 
348
- ```bash
349
- knowhow download
350
- ```
284
+ ### Behavior by remote type
351
285
 
352
- It will read each configured `embedSources[].remoteType` and download the corresponding embeddings JSON into `embedSources[].output`.
286
+ - If `remoteType` is known (resolver exists) and `remoteType !== "knowhow"`: uploads using the embeddings resolver.
287
+ - If `remoteType === "knowhow"`:
288
+ 1. Requires `remoteId`
289
+ 2. Fetches the backend's stored embedding model and warns if it differs from the local `embeddingModel`
290
+ 3. Requests a **presigned upload URL**
291
+ 4. Uploads the local embedding file via S3 under the hood
292
+ 5. Syncs metadata back to the KB (glob, chunk size, etc.)
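The branching described above can be sketched as a small planning function. This is an assumption-laden outline, not the CLI's code: `RemoteSettings`, `UploadPlan`, and `planUpload` are hypothetical names, and skipping sources with no `remoteType` or no matching resolver is assumed behavior.

```typescript
// Hypothetical subset of an embedSources[] entry relevant to uploads.
interface RemoteSettings {
  output: string;
  remoteType?: string;
  remote?: string;
  remoteId?: string;
}

type UploadPlan =
  | { action: "skip"; reason: string }
  | { action: "resolver"; remoteType: string }
  | { action: "knowhow"; remoteId: string };

// Decide how a single source would be uploaded.
// `knownResolvers` stands in for whatever resolver registry the CLI uses.
function planUpload(
  source: RemoteSettings,
  knownResolvers: ReadonlySet<string>
): UploadPlan {
  if (!source.remoteType) {
    return { action: "skip", reason: "no remoteType configured" };
  }
  if (source.remoteType === "knowhow") {
    // The Knowhow Cloud path requires a KB ID.
    if (!source.remoteId) {
      throw new Error(`remoteId is required to upload ${source.output}`);
    }
    return { action: "knowhow", remoteId: source.remoteId };
  }
  if (knownResolvers.has(source.remoteType)) {
    return { action: "resolver", remoteType: source.remoteType };
  }
  return { action: "skip", reason: `no resolver for ${source.remoteType}` };
}
```

Separating the decision from the side effects (presigned URL request, file upload, metadata sync) keeps the branching easy to test without network access.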
353
293
 
354
- Supported remote types in the provided code:
355
- - `s3`
356
- - `github`
357
- - `knowhow` (requires `remoteId`)
294
+ ---
358
295
 
359
- Example:
296
+ ## 7) `knowhow download` — download embeddings from remote
360
297
 
361
- ```json
362
- {
363
- "embedSources": [
364
- {
365
- "output": ".knowhow/embeddings/docs.json",
366
- "remoteType": "s3",
367
- "remote": "my-knowhow-embeddings"
368
- }
369
- ]
370
- }
371
- ```
298
+ `knowhow download` downloads each embeddings file defined in `embedSources` where `remoteType` is set.
372
299
 
373
- Run:
300
+ ### Example: S3 download
374
301
 
375
302
  ```bash
376
303
  knowhow download
377
304
  ```
378
305
 
379
- ---
380
-
381
- ## 8) Uploading to knowhow.tyvm.ai (Cloud KB)
306
+ Where it goes:
307
+ - For non-knowhow resolvers: it uses the embeddings resolver’s download logic.
308
+ - For `remoteType: "knowhow"`: it requests a presigned download URL from the Knowhow API, then saves to your configured `source.output`.
382
309
 
383
- To upload embeddings to the Knowhow cloud knowledge base:
310
+ ---
384
311
 
385
- ### Step 1: Get a KB ID
386
- You need the `KB ID` (stored as `remoteId`) from **knowhow.tyvm.ai**.
312
+ ## 8) Uploading to knowhow.tyvm.ai (Knowhow Cloud KB)
387
313
 
388
- ### Step 2: Configure your local `embedSources`
389
- Set:
314
+ ### Step 1: Get your KB ID
390
315
 
391
- - `remoteType: "knowhow"`
392
- - `remoteId: "<your KB ID>"`
316
+ From the Knowhow web app / dashboard, locate the KB (knowledge base) you want embeddings uploaded into and copy its **KB ID**.
393
317
 
394
- Example:
318
+ ### Step 2: Configure your `embedSources` entry
395
319
 
396
- ```json
320
+ ```jsonc
397
321
  {
398
322
  "embedSources": [
399
323
  {
400
324
  "input": ".knowhow/docs/**/*.mdx",
401
325
  "output": ".knowhow/embeddings/docs.json",
326
+ "chunkSize": 2000,
327
+
402
328
  "remoteType": "knowhow",
403
- "remoteId": "kb_1234567890abcdef",
404
- "chunkSize": 2000
329
+ "remoteId": "your-kb-id"
405
330
  }
406
331
  ]
407
332
  }
408
333
  ```
409
334
 
410
335
  ### Step 3: Generate + upload
411
- 1) Generate embeddings:
412
336
 
413
337
  ```bash
414
338
  knowhow embed
415
- ```
416
-
417
- 2) Upload them:
418
-
419
- ```bash
420
339
  knowhow upload
421
340
  ```
422
341
 
423
- Knowhow Cloud upload also syncs metadata back (glob, output path, chunk size, remoteType).
342
+ Knowhow Cloud upload flow:
343
+ - uses the KB ID (`remoteId`) to request a presigned upload URL
344
+ - uploads your embeddings JSON
345
+ - syncs embed configuration metadata back to the backend
424
346
 
425
347
  ---
426
348
 
427
349
  ## 9) Using embeddings in chat
428
350
 
429
- Knowhow’s chat tooling includes an **embeddings plugin** (enabled by default in the template config). The plugin:
351
+ When the **embeddings plugin** is enabled (it is included in the default plugin list), Knowhow can:
352
+
353
+ 1. Embed the user query using your configured embedding model
354
+ 2. Compare the query vector against stored vectors using cosine similarity
355
+ 3. Retrieve the most relevant chunks
356
+ 4. Inject relevant context into the agent/chat prompt automatically
430
357
 
431
- - embeds the user query
432
- - computes similarity between the query vector and stored embedding vectors
433
- - automatically selects the most relevant chunks
434
- - injects them into the model context as supporting material
358
+ In other words, chat becomes semantic:
359
+ - “Where is the auth code?” matches the meaning even if your code uses different keywords.
435
360
 
436
- So, once your embeddings are generated and (optionally) uploaded/downloaded, you typically don’t manually reference the embedding files—**semantic retrieval happens automatically** by the chat/embeddings integration.
361
+ > The similarity computation is done by `cosineSimilarity(embedding.vector, queryVector)` and results are sorted descending.
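For readers unfamiliar with the ranking step, here is a minimal sketch of cosine similarity and descending sort. The function name `cosineSimilarity` matches the one quoted above, but this body is a standard textbook implementation and not necessarily Knowhow's. `StoredEmbedding` and `topMatches` are hypothetical.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for zero-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical stored entry (a subset of the JSON shape shown earlier).
interface StoredEmbedding {
  id: string;
  text: string;
  vector: number[];
}

// Score every stored chunk against the query and return the best `limit` matches.
function topMatches(
  queryVector: number[],
  embeddings: StoredEmbedding[],
  limit: number
): Array<{ id: string; text: string; score: number }> {
  return embeddings
    .map((e) => ({
      id: e.id,
      text: e.text,
      score: cosineSimilarity(e.vector, queryVector),
    }))
    .sort((x, y) => y.score - x.score) // descending by similarity
    .slice(0, limit);
}
```

The length check also illustrates why vectors from different embedding models should not be mixed: models may produce different dimensions, and even equal-length vectors from different models are not comparable.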
437
362
 
438
363
  ---
439
364
 
440
- ## 10) Special input kinds (YouTube, Asana, web pages, etc.)
365
+ ## 10) Special input kinds (plugins)
441
366
 
442
- `embedSources[].kind` can be more than `"file"`/`"text"`. The embedding pipeline checks:
367
+ In embedding generation, Knowhow checks:
443
368
 
444
- - if `Plugins.isPlugin(kind)` is true → it delegates embedding to that plugin:
445
- ```ts
446
- return Plugins.embed(kind, input);
447
- ```
369
+ - If `Plugins.isPlugin(kind)` → it delegates to `Plugins.embed(kind, input)`
370
+ - Otherwise it falls back to built-in kinds (`file`, `text`)
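The dispatch order above can be pictured with the following sketch. It is illustrative only: the `plugins` map stands in for `Plugins.isPlugin` / `Plugins.embed`, the functions are synchronous for brevity, and the placeholder ID for `text` and the error on an unknown kind are assumptions.

```typescript
// Hypothetical unit of content to embed.
type Embeddable = { id: string; text: string };
type EmbedPlugin = (input: string) => Embeddable[];

// Stand-in for the plugin registry; real plugins are likely asynchronous.
const plugins = new Map<string, EmbedPlugin>();

// Resolve a source into embeddable items: plugin kinds first, then built-ins.
// `readFiles` is a hypothetical callback that expands a glob into file contents.
function resolveEmbeddables(
  kind: string | undefined,
  input: string,
  readFiles: (glob: string) => Embeddable[]
): Embeddable[] {
  const effectiveKind = kind ?? "file"; // `kind` defaults to "file"
  const plugin = plugins.get(effectiveKind);
  if (plugin) return plugin(input); // delegate to the plugin
  if (effectiveKind === "text") {
    // Raw text is embedded as a single item. The ID scheme here is a placeholder.
    return [{ id: `text:${input.length}`, text: input }];
  }
  if (effectiveKind === "file") return readFiles(input);
  throw new Error(`Unknown embed kind: ${effectiveKind}`); // assumed behavior
}
```

Under this ordering, a plugin registered under a given name takes precedence over a built-in kind with the same name.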
448
371
 
449
- Your default config template enables many plugins (including `asana`, `github`, `download`, `url`, etc.), which commonly correspond to special `kind` values.
372
+ That means “special input kinds” work as long as the corresponding plugin is installed/enabled.
450
373
 
451
- ### Pattern for plugin-based kinds
452
- Use:
374
+ Examples include the following kinds, drawn from the default enabled plugin list:
375
+ - `asana`
376
+ - `github`
377
+ - `download`
378
+ - `url`
379
+ - `jira`
380
+ - `linear`
381
+ - etc.
453
382
 
454
- - `kind`: plugin name
455
- - `input`: plugin-specific selector or identifier
456
- - `output`: local embeddings JSON file
457
- - optional: `prompt`, `chunkSize`, `minLength`
383
+ ### A) Embed a URL/web page (`kind: "url"`)
458
384
 
459
- #### Example: embed Asana tasks (plugin-based kind)
460
- ```json
461
- {
462
- "embedSources": [
463
- {
464
- "kind": "asana",
465
- "input": "workspace-or-project-id-or-filter",
466
- "output": ".knowhow/embeddings/asana.json",
467
- "chunkSize": 2000
468
- }
469
- ]
470
- }
471
- ```
472
-
473
- #### Example: embed web pages (URL plugin)
474
- ```json
385
+ ```jsonc
475
386
  {
476
387
  "embedSources": [
477
388
  {
478
389
  "kind": "url",
479
- "input": "https://example.com/docs/index.html",
390
+ "input": "https://example.com/docs",
480
391
  "output": ".knowhow/embeddings/web.json",
481
392
  "chunkSize": 2000
482
393
  }
@@ -484,28 +395,29 @@ Use:
484
395
  }
485
396
  ```
486
397
 
487
- #### Example: embed YouTube videos
488
- ```json
398
+ ### B) Embed Asana tasks (`kind: "asana"`)
399
+
400
+ ```jsonc
489
401
  {
490
402
  "embedSources": [
491
403
  {
492
- "kind": "youtube",
493
- "input": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
494
- "output": ".knowhow/embeddings/youtube.json",
495
- "chunkSize": 2000,
496
- "prompt": "BasicEmbeddingExplainer"
404
+ "kind": "asana",
405
+ "input": "project:MY_PROJECT_ID or task:123456",
406
+ "output": ".knowhow/embeddings/asana.json",
407
+ "chunkSize": 2000
497
408
  }
498
409
  ]
499
410
  }
500
411
  ```
501
412
 
502
- #### Example: embed GitHub content (plugin-based kind)
503
- ```json
413
+ ### C) Embed GitHub content (`kind: "github"`)
414
+
415
+ ```jsonc
504
416
  {
505
417
  "embedSources": [
506
418
  {
507
419
  "kind": "github",
508
- "input": "owner/repo",
420
+ "input": "org/repo",
509
421
  "output": ".knowhow/embeddings/github.json",
510
422
  "chunkSize": 2000
511
423
  }
@@ -513,54 +425,43 @@ Use:
513
425
  }
514
426
  ```
515
427
 
516
- > If a `kind` doesn’t correspond to an enabled embedding plugin, Knowhow may not know how to fetch/convert that input. Ensure the plugin is installed/enabled in your Knowhow setup.
517
-
518
- ---
428
+ ### D) Embed YouTube videos (plugin kind)
519
429
 
520
- ## Practical recipes
430
+ If you have a YouTube embedding plugin installed, you can use it similarly:
521
431
 
522
- ### 1) Embed docs + upload to S3
523
- ```json
432
+ ```jsonc
524
433
  {
525
- "embeddingModel": "text-embedding-3-small",
526
434
  "embedSources": [
527
435
  {
528
- "input": ".knowhow/docs/**/*.mdx",
529
- "output": ".knowhow/embeddings/docs.json",
530
- "prompt": "BasicEmbeddingExplainer",
531
- "chunkSize": 2000,
532
- "remoteType": "s3",
533
- "remote": "my-knowhow-embeddings"
436
+ "kind": "youtube",
437
+ "input": "https://www.youtube.com/watch?v=VIDEO_ID",
438
+ "output": ".knowhow/embeddings/youtube.json",
439
+ "chunkSize": 2000
534
440
  }
535
441
  ]
536
442
  }
537
443
  ```
538
444
 
539
- ```bash
540
- knowhow embed
541
- knowhow upload
542
- ```
445
+ > The exact `input` format is plugin-specific—use the plugin’s documentation/examples for how it expects URLs, IDs, or project selectors.
543
446
 
544
- ### 2) Embed code + upload to Knowhow cloud KB
545
- ```json
546
- {
547
- "embedSources": [
548
- {
549
- "input": "src/**/*.ts",
550
- "output": ".knowhow/embeddings/code.json",
551
- "chunkSize": 2000,
552
- "remoteType": "knowhow",
553
- "remoteId": "kb_1234567890abcdef"
554
- }
555
- ]
556
- }
557
- ```
447
+ ---
558
448
 
559
- ```bash
560
- knowhow embed
561
- knowhow upload
562
- ```
449
+ # Recommended workflow
450
+
451
+ 1. **Configure** `embedSources` locally
452
+ 2. Run:
453
+ ```bash
454
+ knowhow embed
455
+ ```
456
+ 3. If desired, store remotely:
457
+ ```bash
458
+ knowhow upload
459
+ ```
460
+ 4. For other environments/machines:
461
+ ```bash
462
+ knowhow download
463
+ ```
563
464
 
564
465
  ---
565
466
 
566
- If you share your current `.knowhow/knowhow.json`, I can tailor an embeddings configuration (chunking, prompts, and remote storage) to your exact project structure and retrieval goals.
467
+ If you paste your current `.knowhow/knowhow.json` (especially `embedSources`), I can suggest an optimal setup (chunk sizing, minLength, prompt strategy, and the best remoteType for your workflow).