@tyvm/knowhow 0.0.26 → 0.0.28
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CONFIG.md +38 -0
- package/PROCESSING.md +429 -0
- package/WORKER.md +425 -0
- package/package.json +1 -1
- package/src/chat.ts +5 -5
- package/src/clients/anthropic.ts +0 -5
- package/src/clients/gemini.ts +0 -4
- package/src/clients/openai.ts +0 -3
- package/src/clients/xai.ts +0 -2
- package/src/embeddings.ts +25 -10
- package/ts_build/src/chat.js +1 -1
- package/ts_build/src/chat.js.map +1 -1
- package/ts_build/src/clients/anthropic.js +0 -5
- package/ts_build/src/clients/anthropic.js.map +1 -1
- package/ts_build/src/clients/gemini.js +0 -1
- package/ts_build/src/clients/gemini.js.map +1 -1
- package/ts_build/src/clients/openai.js +0 -3
- package/ts_build/src/clients/openai.js.map +1 -1
- package/ts_build/src/clients/xai.js +0 -2
- package/ts_build/src/clients/xai.js.map +1 -1
- package/ts_build/src/embeddings.js +13 -9
- package/ts_build/src/embeddings.js.map +1 -1
package/CONFIG.md
CHANGED
@@ -88,6 +88,44 @@ Here is an overview of examples from various `knowhow.json` configuration files
   ],
   ```
 
+## Expose Tools to Knowhow Site via Workers
+After you've connected via `knowhow login`, you can connect tools from your local machine.
+Run `knowhow worker` to generate a block like the one below, then edit it to include the tools you want to expose.
+Then run `knowhow worker` again to connect.
+```json
+"worker": {
+  "allowedTools": [
+    "embeddingSearch",
+    "finalAnswer",
+    "callPlugin",
+    "readFile",
+    "readBlocks",
+    "patchFile",
+    "lintFile",
+    "textSearch",
+    "fileSearch",
+    "writeFileChunk",
+    "createAiCompletion",
+    "listAllModels",
+    "listAllProviders",
+    "getPullRequest",
+    "getPullRequestBuildStatuses",
+    "getRunLogs",
+    "getPullRequestBuildFailureLogs",
+    "addLanguageTerm",
+    "getAllLanguageTerms",
+    "lookupLanguageTerm",
+    "mcp_0_puppeteer_navigate",
+    "mcp_0_puppeteer_screenshot",
+    "mcp_0_puppeteer_click",
+    "mcp_0_puppeteer_fill",
+    "mcp_0_puppeteer_select",
+    "mcp_0_puppeteer_hover",
+    "mcp_0_puppeteer_evaluate"
+  ]
+},
+```
+
 ## Custom Models Via LMS Studio
 ```json
 "modelProviders": [
package/PROCESSING.md
ADDED
@@ -0,0 +1,429 @@

# Media Processing with Knowhow

This guide covers how to process audio, video, and PDF files using Knowhow's configuration system. Knowhow automatically converts media files to text for AI processing, supports pipeline chaining where outputs become inputs, and allows assignment of processing tasks to specific agents.

---

## Supported File Types

Knowhow's `convertToText()` function automatically handles these file types:

**Audio Files:**
- `.mp3` - MP3 audio files
- `.wav` - WAV audio files
- `.m4a` - M4A audio files
- `.mpga` - MPEG audio files

**Video Files:**
- `.mp4` - MP4 video files
- `.webm` - WebM video files
- `.mov` - QuickTime video files
- `.mpeg` - MPEG video files

**Document Files:**
- `.pdf` - PDF documents

All other file types are processed as plain text files.

---

## Audio Processing

### Basic Audio Transcription

```json
{
  "sources": [
    {
      "input": "./recordings/**/*.mp3",
      "output": "./transcripts/",
      "prompt": "BasicTranscript"
    }
  ]
}
```

Audio files are automatically:
1. Split into 30-second chunks by default (configurable via the `chunkTime` parameter)
2. Transcribed using speech-to-text via `Downloader.transcribeChunks()`
3. Saved as `transcript.json` in a folder named after the audio file
4. Formatted with timestamps: `[0:30s] transcript text [30:60s] more text`

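The timestamped transcript format above (`[0:30s] text [30:60s] more`) is easy to post-process. Below is a minimal sketch of splitting it back into segments; the `parseTranscript` helper is hypothetical and not part of Knowhow's API:

```typescript
// Hypothetical helper: split "[0:30s] hello [30:60s] world" into segments.
interface Segment {
  start: number; // seconds
  end: number;   // seconds
  text: string;
}

function parseTranscript(transcript: string): Segment[] {
  const segments: Segment[] = [];
  // Matches a "[start:end s]" marker followed by everything up to the next marker.
  const re = /\[(\d+):(\d+)s\]\s*([^\[]*)/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(transcript)) !== null) {
    segments.push({
      start: Number(m[1]),
      end: Number(m[2]),
      text: m[3].trim(),
    });
  }
  return segments;
}
```

For example, `parseTranscript("[0:30s] hello [30:60s] world")` yields two segments with second ranges 0-30 and 30-60.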
### Audio with Agent Assignment

```json
{
  "sources": [
    {
      "input": "./meetings/**/*.wav",
      "output": "./meeting-notes/",
      "prompt": "MeetingNotes",
      "agent": "patcher"
    }
  ]
}
```

### Meeting Recording Pipeline

```json
{
  "sources": [
    {
      "input": "./meetings/**/*.m4a",
      "output": "./meetings/transcripts/",
      "prompt": "MeetingTranscriber"
    },
    {
      "input": "./meetings/transcripts/**/*.mdx",
      "output": "./meetings/summaries/",
      "prompt": "MeetingSummarizer"
    },
    {
      "input": "./meetings/summaries/**/*.mdx",
      "output": "./meetings/action-items.txt",
      "prompt": "ActionItemExtractor"
    }
  ]
}
```

This pipeline demonstrates chaining:
1. Audio files → transcripts (multi-output to directory)
2. Transcripts → summaries (multi-output to directory)
3. Summaries → single action items file (single output)

---

## Video Processing

### Basic Video Processing

```json
{
  "sources": [
    {
      "input": "./videos/**/*.mp4",
      "output": "./video-analysis/",
      "prompt": "VideoAnalyzer"
    }
  ]
}
```

Video files are processed by:
1. Extracting and transcribing audio (same process as audio files)
2. Extracting keyframes at regular intervals using `Downloader.extractKeyframes()`
3. Analyzing the visual content of each keyframe
4. Combining transcript and visual analysis with timestamps

The actual output format from `convertVideoToText()` includes:
```
Chunk: (1/10):
Start Timestamp: [0s]
Visual: description of keyframe
Audio: transcribed audio
End Timestamp: [30s]
```

### Video Content Organization

```json
{
  "sources": [
    {
      "input": "./raw-videos/**/*.webm",
      "output": "./organized-videos/",
      "prompt": "VideoOrganizer",
      "agent": "patcher"
    }
  ]
}
```

This example shows assigning video processing to the "patcher" agent.

---

## PDF Processing

### Document Analysis

```json
{
  "sources": [
    {
      "input": "./documents/**/*.pdf",
      "output": "./document-summaries/",
      "prompt": "DocumentSummarizer"
    }
  ]
}
```

PDF files are processed by:
1. Reading the file using `fs.readFileSync()`
2. Extracting text content using the `pdf-parse` library
3. Returning the extracted text via `data.text`
4. Applying the specified prompt for analysis

### Multi-Document Research Pipeline

```json
{
  "sources": [
    {
      "input": "./research-papers/**/*.pdf",
      "output": "./paper-summaries/",
      "prompt": "AcademicSummarizer"
    },
    {
      "input": "./paper-summaries/**/*.mdx",
      "output": "./research-synthesis.md",
      "prompt": "ResearchSynthesizer"
    }
  ]
}
```

---

## Agent Assignment

### Assigning Processing to Specific Agents

```json
{
  "sources": [
    {
      "input": "./meetings/**/*.mov",
      "output": "./meeting-notes/",
      "prompt": "MeetingNotes",
      "agent": "patcher"
    }
  ]
}
```

The `agent` parameter assigns processing to a specific agent defined in your configuration. The agent receives the file content and prompt, then processes it according to its instructions and capabilities.

### Multi-Agent Pipeline

```json
{
  "sources": [
    {
      "input": "./interviews/**/*.wav",
      "output": "./interview-transcripts/",
      "prompt": "InterviewTranscriber",
      "agent": "transcriber"
    },
    {
      "input": "./interview-transcripts/**/*.mdx",
      "output": "./interview-insights/",
      "prompt": "InsightExtractor",
      "agent": "analyst"
    }
  ]
}
```

This assigns transcription to a "transcriber" agent and analysis to an "analyst" agent, each specialized for their respective tasks.

---

## Embedding Generation from Processed Media

### Audio Embeddings

```json
{
  "embedSources": [
    {
      "input": "./podcasts/**/*.mp3",
      "output": ".knowhow/embeddings/podcasts.json",
      "chunkSize": 2000,
      "prompt": "PodcastEmbeddingExplainer"
    }
  ]
}
```

Audio files are automatically transcribed using `convertAudioToText()`, then chunked and embedded for semantic search.

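The `chunkSize` value above is a character budget per embedded chunk. As a rough illustration of what fixed-size chunking looks like (a sketch only; Knowhow's actual splitting logic may choose different boundaries):

```typescript
// Illustrative fixed-size chunker: split text into pieces of at most
// `chunkSize` characters before each piece is embedded.
function chunkText(text: string, chunkSize: number): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += chunkSize) {
    chunks.push(text.slice(i, i + chunkSize));
  }
  return chunks;
}
```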
### Video Embeddings

```json
{
  "embedSources": [
    {
      "input": "./tutorials/**/*.mp4",
      "output": ".knowhow/embeddings/tutorials.json",
      "chunkSize": 1500
    }
  ]
}
```

Video files are processed using `convertVideoToText()` to extract both visual and audio content, then embedded for search.

### PDF Document Embeddings

```json
{
  "embedSources": [
    {
      "input": "./documentation/**/*.pdf",
      "output": ".knowhow/embeddings/docs.json",
      "chunkSize": 2000,
      "prompt": "DocumentChunker"
    }
  ]
}
```

---

## Pipeline Chaining Examples

### Complete Media Processing Workflow

```json
{
  "sources": [
    {
      "input": "./raw-content/**/*.{mp4,mp3,pdf}",
      "output": "./processed-content/",
      "prompt": "ContentProcessor"
    },
    {
      "input": "./processed-content/**/*.mdx",
      "output": "./content-categories/",
      "prompt": "ContentCategorizer"
    },
    {
      "input": "./content-categories/**/*.mdx",
      "output": "./final-report.md",
      "prompt": "ReportGenerator"
    }
  ]
}
```

This three-stage pipeline:
1. Processes mixed media files (video, audio, PDF) into text summaries
2. Categorizes the processed content
3. Generates a final consolidated report

### Meeting-to-Tasks Pipeline

```json
{
  "sources": [
    {
      "input": "./weekly-meetings/**/*.mov",
      "output": "./meeting-transcripts/",
      "prompt": "MeetingTranscriber"
    },
    {
      "input": "./meeting-transcripts/**/*.mdx",
      "output": "./extracted-tasks/",
      "prompt": "TaskExtractor"
    },
    {
      "input": "./extracted-tasks/**/*.mdx",
      "output": "./project-tasks.json",
      "prompt": "TaskConsolidator",
      "agent": "patcher"
    }
  ]
}
```

---

## Processing Workflow Details

### Transcript Caching and Reuse

The system implements intelligent caching for audio/video processing:

**Audio Processing (`processAudio()`):**
- Transcripts saved as `transcript.json` in `{dir}/(unknown)/transcript.json`
- Uses the `reusePreviousTranscript` parameter (default: true)
- Checks if a transcript exists with `fileExists()` before reprocessing
- Enables fast re-processing with different prompts

**Video Processing (`processVideo()`):**
- Audio transcripts cached the same way as audio files
- Video analysis cached as `video.json` in `{dir}/(unknown)/video.json`
- Keyframes extracted using `Downloader.extractKeyframes()`

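The reuse behavior described above amounts to a check like the following sketch, which uses `fs.existsSync` as a stand-in for the internal `fileExists()` helper; the function name and layout are illustrative:

```typescript
import * as fs from "fs";
import * as path from "path";

// Sketch of transcript reuse: return a cached transcript.json if present,
// otherwise signal that the audio must be (re)transcribed.
function loadCachedTranscript(
  transcriptDir: string,
  reusePreviousTranscript = true // mirrors the documented default
): string | null {
  const transcriptPath = path.join(transcriptDir, "transcript.json");
  if (reusePreviousTranscript && fs.existsSync(transcriptPath)) {
    return fs.readFileSync(transcriptPath, "utf8");
  }
  return null; // caller transcribes the audio and writes transcript.json
}
```

Because the cache check happens before any transcription work, re-running a source with a different `prompt` reuses the saved transcript instead of reprocessing the audio.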
### Chunking Behavior

**Audio Files:**
- Default chunk time: 30 seconds (configurable via the `chunkTime` parameter)
- Uses `Downloader.chunk()` to split audio files
- Each chunk transcribed separately, then combined with timestamps

**Video Files:**
- Same 30-second default chunking for the audio track
- Keyframes extracted at chunk intervals
- Visual and audio analysis combined per chunk

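The chunk boundaries described above can be computed directly. A small sketch (a hypothetical helper, not Knowhow's internal code) that yields the second windows for a recording:

```typescript
// Compute [start, end] second windows for a recording of `duration` seconds,
// using the documented 30-second default chunk time. The final window is
// clamped to the recording's actual duration.
function chunkWindows(duration: number, chunkTime = 30): Array<[number, number]> {
  const windows: Array<[number, number]> = [];
  for (let start = 0; start < duration; start += chunkTime) {
    windows.push([start, Math.min(start + chunkTime, duration)]);
  }
  return windows;
}
```

For example, `chunkWindows(75)` yields `[[0, 30], [30, 60], [60, 75]]`, matching the `[0:30s]`, `[30:60s]`, ... timestamp labels in transcripts.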
### Output Structure

**Multi-Output (directory ending with `/`):**
- Creates one output file per input file
- Preserves directory structure relative to the input pattern
- Uses `outputExt` (default: "mdx") for file extensions
- Uses `outputName` or the original filename

**Single Output (specific filename):**
- Combines all input files into one output
- Useful for reports, summaries, and consolidated documents

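The multi-output naming rules above can be sketched as a path computation. The helper below is hypothetical (only the `outputExt` default of "mdx" comes from the text); it shows how one output path per input file falls out of those rules:

```typescript
import * as path from "path";

// Sketch of multi-output path resolution: one output file per input file,
// preserving structure relative to the input root and swapping the extension.
function resolveOutputPath(
  inputFile: string,   // e.g. "./meetings/team/standup.wav"
  inputRoot: string,   // e.g. "./meetings"
  outputDir: string,   // e.g. "./meeting-notes/" (trailing "/" => multi-output)
  outputExt = "mdx"    // documented default extension
): string {
  const relative = path.relative(inputRoot, inputFile);
  const base = relative.slice(0, -path.extname(relative).length);
  return path.join(outputDir, `${base}.${outputExt}`);
}
```

So `./meetings/team/standup.wav` would map to `./meeting-notes/team/standup.mdx`, keeping the `team/` subdirectory intact.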
---

## Troubleshooting

### Common Issues

**Audio/Video Processing Fails:**
- Ensure ffmpeg is installed and accessible (required by Downloader)
- Check file permissions and disk space
- Verify audio/video files aren't corrupted
- Check that `Downloader.chunk()` and `Downloader.transcribeChunks()` are working

**PDF Processing Fails:**
- Some PDFs may have restrictions or encryption
- Scanned PDFs without OCR won't extract text properly with `pdf-parse`
- Large PDFs may cause memory issues when loaded with `fs.readFileSync()`
- Ensure the PDF file isn't corrupted

**Pipeline Chaining Issues:**
- Ensure the output directory of one stage matches the input pattern of the next stage
- Check that intermediate files are being created successfully
- Verify file extensions match between stages (default: .mdx)

**Agent Assignment Problems:**
- Ensure the specified agent exists in your configuration
- Check that the agent has appropriate permissions and tools
- Verify the agent can handle the specific prompt and content type

### Performance Optimization

**Large File Processing:**
- Audio/video files are automatically chunked (30s default) for efficient processing
- Consider adjusting the `chunkTime` parameter for very long recordings
- Transcript caching avoids reprocessing unchanged files

**Batch Processing:**
- Process files in smaller batches if memory becomes an issue
- Use specific file patterns rather than overly broad glob patterns
- Consider processing different file types in separate pipeline stages

**File System Considerations:**
- Transcripts create subdirectories: `(unknown)/transcript.json`
- Video processing creates: `(unknown)/video.json`
- Ensure sufficient disk space for intermediate files