@tyvm/knowhow 0.0.68 → 0.0.70

Files changed (215)
  1. package/docs/shell-commands.md +174 -0
  2. package/package.json +2 -2
  3. package/src/agents/base/base.ts +1 -3
  4. package/src/agents/developer/developer.ts +21 -16
  5. package/src/agents/tools/agentCall.ts +4 -2
  6. package/src/agents/tools/fileSearch.ts +5 -1
  7. package/src/agents/tools/list.ts +41 -37
  8. package/src/agents/tools/startAgentTask.ts +131 -22
  9. package/src/chat/CliChatService.ts +57 -11
  10. package/src/chat/modules/AgentModule.ts +72 -12
  11. package/src/chat/modules/CustomCommandsModule.ts +79 -0
  12. package/src/chat/modules/InternalChatModule.ts +11 -1
  13. package/src/chat/modules/ShellCommandModule.ts +96 -0
  14. package/src/chat/modules/index.ts +1 -0
  15. package/src/chat/types.ts +14 -2
  16. package/src/chat.ts +16 -13
  17. package/src/cli.ts +16 -6
  18. package/src/clients/anthropic.ts +88 -91
  19. package/src/clients/gemini.ts +495 -94
  20. package/src/clients/index.ts +125 -0
  21. package/src/clients/knowhow.ts +81 -0
  22. package/src/clients/openai.ts +256 -145
  23. package/src/clients/pricing/anthropic.ts +90 -0
  24. package/src/clients/pricing/google.ts +65 -0
  25. package/src/clients/pricing/index.ts +4 -0
  26. package/src/clients/pricing/openai.ts +134 -0
  27. package/src/clients/pricing/xai.ts +62 -0
  28. package/src/clients/types.ts +170 -1
  29. package/src/clients/xai.ts +275 -46
  30. package/src/config.ts +61 -15
  31. package/src/embeddings.ts +9 -1
  32. package/src/microphone.ts +15 -16
  33. package/src/migrations.ts +151 -0
  34. package/src/plugins/AgentsMdPlugin.ts +118 -0
  35. package/src/plugins/PluginBase.ts +8 -0
  36. package/src/plugins/downloader/downloader.ts +5 -6
  37. package/src/plugins/embedding.ts +10 -8
  38. package/src/plugins/exec.ts +70 -0
  39. package/src/plugins/github.ts +120 -74
  40. package/src/plugins/language.ts +11 -13
  41. package/src/plugins/plugins.ts +25 -4
  42. package/src/plugins/tmux.ts +132 -0
  43. package/src/plugins/types.ts +1 -0
  44. package/src/plugins/vim.ts +14 -1
  45. package/src/server/index.ts +2 -0
  46. package/src/services/AgentSyncFs.ts +417 -0
  47. package/src/services/{AgentSynchronization.ts → AgentSyncKnowhowWeb.ts} +2 -2
  48. package/src/services/EventService.ts +0 -1
  49. package/src/services/KnowhowClient.ts +106 -0
  50. package/src/services/index.ts +4 -2
  51. package/src/types.ts +57 -4
  52. package/src/worker.ts +25 -2
  53. package/tests/manual/modalities/README.md +157 -0
  54. package/tests/manual/modalities/google.modalities.test.ts +335 -0
  55. package/tests/manual/modalities/openai.modalities.test.ts +329 -0
  56. package/tests/manual/modalities/streaming.test.ts +260 -0
  57. package/tests/manual/modalities/xai.modalities.test.ts +307 -0
  58. package/tests/plugins/language/languagePlugin-content-triggers.test.ts +5 -5
  59. package/tests/plugins/language/languagePlugin-integration.test.ts +1 -1
  60. package/tests/plugins/language/languagePlugin.test.ts +17 -8
  61. package/ts_build/package.json +2 -2
  62. package/ts_build/src/agents/base/base.js +1 -1
  63. package/ts_build/src/agents/base/base.js.map +1 -1
  64. package/ts_build/src/agents/developer/developer.js +21 -15
  65. package/ts_build/src/agents/developer/developer.js.map +1 -1
  66. package/ts_build/src/agents/tools/agentCall.js +4 -2
  67. package/ts_build/src/agents/tools/agentCall.js.map +1 -1
  68. package/ts_build/src/agents/tools/executeScript/index.d.ts +1 -1
  69. package/ts_build/src/agents/tools/fileSearch.js +2 -1
  70. package/ts_build/src/agents/tools/fileSearch.js.map +1 -1
  71. package/ts_build/src/agents/tools/github/index.d.ts +1 -1
  72. package/ts_build/src/agents/tools/list.js +41 -37
  73. package/ts_build/src/agents/tools/list.js.map +1 -1
  74. package/ts_build/src/agents/tools/startAgentTask.d.ts +2 -1
  75. package/ts_build/src/agents/tools/startAgentTask.js +118 -17
  76. package/ts_build/src/agents/tools/startAgentTask.js.map +1 -1
  77. package/ts_build/src/chat/CliChatService.d.ts +4 -0
  78. package/ts_build/src/chat/CliChatService.js +39 -5
  79. package/ts_build/src/chat/CliChatService.js.map +1 -1
  80. package/ts_build/src/chat/modules/AgentModule.d.ts +4 -1
  81. package/ts_build/src/chat/modules/AgentModule.js +49 -11
  82. package/ts_build/src/chat/modules/AgentModule.js.map +1 -1
  83. package/ts_build/src/chat/modules/CustomCommandsModule.d.ts +9 -0
  84. package/ts_build/src/chat/modules/CustomCommandsModule.js +58 -0
  85. package/ts_build/src/chat/modules/CustomCommandsModule.js.map +1 -0
  86. package/ts_build/src/chat/modules/InternalChatModule.d.ts +2 -0
  87. package/ts_build/src/chat/modules/InternalChatModule.js +10 -0
  88. package/ts_build/src/chat/modules/InternalChatModule.js.map +1 -1
  89. package/ts_build/src/chat/modules/ShellCommandModule.d.ts +8 -0
  90. package/ts_build/src/chat/modules/ShellCommandModule.js +83 -0
  91. package/ts_build/src/chat/modules/ShellCommandModule.js.map +1 -0
  92. package/ts_build/src/chat/modules/index.d.ts +1 -0
  93. package/ts_build/src/chat/modules/index.js +3 -1
  94. package/ts_build/src/chat/modules/index.js.map +1 -1
  95. package/ts_build/src/chat/types.d.ts +11 -1
  96. package/ts_build/src/chat.js +16 -13
  97. package/ts_build/src/chat.js.map +1 -1
  98. package/ts_build/src/cli.js +10 -3
  99. package/ts_build/src/cli.js.map +1 -1
  100. package/ts_build/src/clients/anthropic.d.ts +6 -1
  101. package/ts_build/src/clients/anthropic.js +47 -92
  102. package/ts_build/src/clients/anthropic.js.map +1 -1
  103. package/ts_build/src/clients/gemini.d.ts +81 -2
  104. package/ts_build/src/clients/gemini.js +362 -79
  105. package/ts_build/src/clients/gemini.js.map +1 -1
  106. package/ts_build/src/clients/index.d.ts +9 -1
  107. package/ts_build/src/clients/index.js +65 -0
  108. package/ts_build/src/clients/index.js.map +1 -1
  109. package/ts_build/src/clients/knowhow.d.ts +9 -1
  110. package/ts_build/src/clients/knowhow.js +43 -0
  111. package/ts_build/src/clients/knowhow.js.map +1 -1
  112. package/ts_build/src/clients/openai.d.ts +9 -1
  113. package/ts_build/src/clients/openai.js +201 -133
  114. package/ts_build/src/clients/openai.js.map +1 -1
  115. package/ts_build/src/clients/pricing/anthropic.d.ts +17 -0
  116. package/ts_build/src/clients/pricing/anthropic.js +93 -0
  117. package/ts_build/src/clients/pricing/anthropic.js.map +1 -0
  118. package/ts_build/src/clients/pricing/google.d.ts +73 -0
  119. package/ts_build/src/clients/pricing/google.js +68 -0
  120. package/ts_build/src/clients/pricing/google.js.map +1 -0
  121. package/ts_build/src/clients/pricing/index.d.ts +4 -0
  122. package/ts_build/src/clients/pricing/index.js +14 -0
  123. package/ts_build/src/clients/pricing/index.js.map +1 -0
  124. package/ts_build/src/clients/pricing/openai.d.ts +7 -0
  125. package/ts_build/src/clients/pricing/openai.js +137 -0
  126. package/ts_build/src/clients/pricing/openai.js.map +1 -0
  127. package/ts_build/src/clients/pricing/xai.d.ts +26 -0
  128. package/ts_build/src/clients/pricing/xai.js +59 -0
  129. package/ts_build/src/clients/pricing/xai.js.map +1 -0
  130. package/ts_build/src/clients/types.d.ts +135 -0
  131. package/ts_build/src/clients/xai.d.ts +9 -1
  132. package/ts_build/src/clients/xai.js +178 -46
  133. package/ts_build/src/clients/xai.js.map +1 -1
  134. package/ts_build/src/config.d.ts +1 -0
  135. package/ts_build/src/config.js +45 -16
  136. package/ts_build/src/config.js.map +1 -1
  137. package/ts_build/src/embeddings.js +8 -1
  138. package/ts_build/src/embeddings.js.map +1 -1
  139. package/ts_build/src/microphone.js +7 -9
  140. package/ts_build/src/microphone.js.map +1 -1
  141. package/ts_build/src/migrations.d.ts +17 -0
  142. package/ts_build/src/migrations.js +86 -0
  143. package/ts_build/src/migrations.js.map +1 -0
  144. package/ts_build/src/plugins/AgentsMdPlugin.d.ts +13 -0
  145. package/ts_build/src/plugins/AgentsMdPlugin.js +118 -0
  146. package/ts_build/src/plugins/AgentsMdPlugin.js.map +1 -0
  147. package/ts_build/src/plugins/PluginBase.d.ts +1 -0
  148. package/ts_build/src/plugins/PluginBase.js +3 -0
  149. package/ts_build/src/plugins/PluginBase.js.map +1 -1
  150. package/ts_build/src/plugins/downloader/downloader.js +5 -5
  151. package/ts_build/src/plugins/downloader/downloader.js.map +1 -1
  152. package/ts_build/src/plugins/embedding.js +9 -8
  153. package/ts_build/src/plugins/embedding.js.map +1 -1
  154. package/ts_build/src/plugins/exec.d.ts +10 -0
  155. package/ts_build/src/plugins/exec.js +56 -0
  156. package/ts_build/src/plugins/exec.js.map +1 -0
  157. package/ts_build/src/plugins/github.js +93 -51
  158. package/ts_build/src/plugins/github.js.map +1 -1
  159. package/ts_build/src/plugins/language.js +14 -11
  160. package/ts_build/src/plugins/language.js.map +1 -1
  161. package/ts_build/src/plugins/plugins.d.ts +1 -0
  162. package/ts_build/src/plugins/plugins.js +19 -1
  163. package/ts_build/src/plugins/plugins.js.map +1 -1
  164. package/ts_build/src/plugins/tmux.d.ts +14 -0
  165. package/ts_build/src/plugins/tmux.js +108 -0
  166. package/ts_build/src/plugins/tmux.js.map +1 -0
  167. package/ts_build/src/plugins/types.d.ts +1 -0
  168. package/ts_build/src/plugins/vim.js +11 -1
  169. package/ts_build/src/plugins/vim.js.map +1 -1
  170. package/ts_build/src/server/index.js.map +1 -1
  171. package/ts_build/src/services/AgentSyncFs.d.ts +34 -0
  172. package/ts_build/src/services/AgentSyncFs.js +325 -0
  173. package/ts_build/src/services/AgentSyncFs.js.map +1 -0
  174. package/ts_build/src/services/AgentSyncKnowhowWeb.d.ts +29 -0
  175. package/ts_build/src/services/AgentSyncKnowhowWeb.js +178 -0
  176. package/ts_build/src/services/AgentSyncKnowhowWeb.js.map +1 -0
  177. package/ts_build/src/services/AgentSynchronization.d.ts +1 -1
  178. package/ts_build/src/services/AgentSynchronization.js +3 -3
  179. package/ts_build/src/services/AgentSynchronization.js.map +1 -1
  180. package/ts_build/src/services/EventService.js.map +1 -1
  181. package/ts_build/src/services/KnowhowClient.d.ts +9 -1
  182. package/ts_build/src/services/KnowhowClient.js +58 -0
  183. package/ts_build/src/services/KnowhowClient.js.map +1 -1
  184. package/ts_build/src/services/index.d.ts +2 -1
  185. package/ts_build/src/services/index.js +2 -1
  186. package/ts_build/src/services/index.js.map +1 -1
  187. package/ts_build/src/types.d.ts +26 -1
  188. package/ts_build/src/types.js +45 -4
  189. package/ts_build/src/types.js.map +1 -1
  190. package/ts_build/src/utils/PersistentInputManager.d.ts +28 -0
  191. package/ts_build/src/utils/PersistentInputManager.js +293 -0
  192. package/ts_build/src/utils/PersistentInputManager.js.map +1 -0
  193. package/ts_build/src/worker.js +11 -2
  194. package/ts_build/src/worker.js.map +1 -1
  195. package/ts_build/tests/manual/modalities/google.modalities.test.d.ts +1 -0
  196. package/ts_build/tests/manual/modalities/google.modalities.test.js +252 -0
  197. package/ts_build/tests/manual/modalities/google.modalities.test.js.map +1 -0
  198. package/ts_build/tests/manual/modalities/openai.modalities.test.d.ts +1 -0
  199. package/ts_build/tests/manual/modalities/openai.modalities.test.js +252 -0
  200. package/ts_build/tests/manual/modalities/openai.modalities.test.js.map +1 -0
  201. package/ts_build/tests/manual/modalities/streaming.test.d.ts +1 -0
  202. package/ts_build/tests/manual/modalities/streaming.test.js +206 -0
  203. package/ts_build/tests/manual/modalities/streaming.test.js.map +1 -0
  204. package/ts_build/tests/manual/modalities/xai.modalities.test.d.ts +1 -0
  205. package/ts_build/tests/manual/modalities/xai.modalities.test.js +226 -0
  206. package/ts_build/tests/manual/modalities/xai.modalities.test.js.map +1 -0
  207. package/ts_build/tests/manual/persistent-input-test.d.ts +1 -0
  208. package/ts_build/tests/manual/persistent-input-test.js +35 -0
  209. package/ts_build/tests/manual/persistent-input-test.js.map +1 -0
  210. package/ts_build/tests/plugins/language/languagePlugin-content-triggers.test.js +5 -5
  211. package/ts_build/tests/plugins/language/languagePlugin-content-triggers.test.js.map +1 -1
  212. package/ts_build/tests/plugins/language/languagePlugin-integration.test.js +1 -1
  213. package/ts_build/tests/plugins/language/languagePlugin-integration.test.js.map +1 -1
  214. package/ts_build/tests/plugins/language/languagePlugin.test.js +17 -7
  215. package/ts_build/tests/plugins/language/languagePlugin.test.js.map +1 -1
package/src/services/KnowhowClient.ts CHANGED
@@ -7,6 +7,20 @@ import {
   CompletionResponse,
   EmbeddingOptions,
   EmbeddingResponse,
+  AudioTranscriptionOptions,
+  AudioTranscriptionResponse,
+  AudioGenerationOptions,
+  AudioGenerationResponse,
+  ImageGenerationOptions,
+  ImageGenerationResponse,
+  VideoGenerationOptions,
+  VideoGenerationResponse,
+  VideoStatusOptions,
+  VideoStatusResponse,
+  FileUploadOptions,
+  FileUploadResponse,
+  FileDownloadOptions,
+  FileDownloadResponse,
 } from "../clients";
 import { Config } from "../types";
 
@@ -202,6 +216,98 @@ export class KnowhowSimpleClient {
     });
   }
 
+  async createAudioTranscription(options: AudioTranscriptionOptions) {
+    await this.checkJwt();
+    const formData = new FormData();
+    // options.file can be a Buffer, ReadStream, Blob, or File
+    if (Buffer.isBuffer(options.file)) {
+      formData.append("file", new Blob([options.file]), options["fileName"] || "audio.mp3");
+    } else {
+      formData.append("file", options.file);
+    }
+    if (options.model) formData.append("model", options.model);
+    if (options.language) formData.append("language", options.language);
+    if (options.prompt) formData.append("prompt", options.prompt);
+    if (options.response_format) formData.append("response_format", options.response_format);
+    if (options.temperature != null) formData.append("temperature", String(options.temperature));
+
+    return axios.post<AudioTranscriptionResponse>(
+      `${this.baseUrl}/api/proxy/v1/audio/transcriptions`,
+      formData,
+      { headers: { ...this.headers } }
+    );
+  }
+
+  async createAudioGeneration(options: AudioGenerationOptions) {
+    await this.checkJwt();
+    return axios.post<AudioGenerationResponse>(
+      `${this.baseUrl}/api/proxy/v1/audio/generations`,
+      options,
+      { headers: this.headers }
+    );
+  }
+
+  async createImageGeneration(options: ImageGenerationOptions) {
+    await this.checkJwt();
+    return axios.post<ImageGenerationResponse>(
+      `${this.baseUrl}/api/proxy/v1/images/generations`,
+      options,
+      { headers: this.headers }
+    );
+  }
+
+  async createVideoGeneration(options: VideoGenerationOptions) {
+    await this.checkJwt();
+    return axios.post<VideoGenerationResponse>(
+      `${this.baseUrl}/api/proxy/v1/videos/generations`,
+      options,
+      { headers: this.headers }
+    );
+  }
+
+  async getVideoStatus(options: VideoStatusOptions) {
+    await this.checkJwt();
+    const { jobId, ...rest } = options;
+    return axios.get<VideoStatusResponse>(
+      `${this.baseUrl}/api/proxy/v1/videos/${jobId}/status`,
+      { headers: this.headers, params: rest }
+    );
+  }
+
+  async downloadVideo(options: FileDownloadOptions) {
+    await this.checkJwt();
+    const { fileId } = options;
+    return axios.get<ArrayBuffer>(
+      `${this.baseUrl}/api/proxy/v1/videos/${fileId}/content`,
+      { headers: this.headers, responseType: "arraybuffer" }
+    );
+  }
+
+  async uploadFile(options: FileUploadOptions) {
+    await this.checkJwt();
+    // Send as JSON with base64-encoded data
+    const body = {
+      data: options.data.toString("base64"),
+      mimeType: options.mimeType,
+      fileName: options.fileName,
+      displayName: options.displayName,
+    };
+    return axios.post<FileUploadResponse>(
+      `${this.baseUrl}/api/proxy/v1/files`,
+      body,
+      { headers: this.headers }
+    );
+  }
+
+  async downloadFile(options: FileDownloadOptions) {
+    await this.checkJwt();
+    const { fileId } = options;
+    return axios.get<ArrayBuffer>(
+      `${this.baseUrl}/api/proxy/v1/files/${fileId}/content`,
+      { headers: this.headers, responseType: "arraybuffer" }
+    );
+  }
+
   async createChatTask(request: CreateMessageTaskRequest) {
     await this.checkJwt();
     return axios.post<CreateMessageTaskResponse>(
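The new `createVideoGeneration` and `getVideoStatus` methods suggest a submit-then-poll workflow. The following is a minimal sketch of such a loop, not code from the package. `pollUntilDone`, the `JobStatus` shape, and its `status` values are hypothetical, because this diff does not show the fields of `VideoStatusResponse`. The `getStatus` callback stands in for a call such as `client.getVideoStatus({ jobId })`.

```typescript
// Hypothetical status shape; the real VideoStatusResponse fields are not shown in this diff.
type JobStatus = { status: "queued" | "in_progress" | "completed" | "failed" };

// Call getStatus repeatedly until the job reaches a terminal state
// or the attempt budget is exhausted.
async function pollUntilDone(
  getStatus: () => Promise<JobStatus>,
  maxAttempts = 60,
  delayMs = 5000
): Promise<JobStatus> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const current = await getStatus();
    if (current.status === "completed" || current.status === "failed") {
      return current;
    }
    // Wait between attempts, but not after the final one.
    if (attempt < maxAttempts) {
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error(`Job did not finish after ${maxAttempts} attempts`);
}
```

The defaults (60 attempts, 5 seconds apart) give a budget of roughly five minutes, which is an assumption chosen to match the upper bound the modalities README mentions for Veo 2.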
package/src/services/index.ts CHANGED
@@ -10,7 +10,8 @@ import { S3Service } from "./S3";
 import { ToolsService } from "./Tools";
 import { PluginService } from "../plugins/plugins";
 import { DockerService } from "./DockerService";
-import { AgentSynchronization } from "./AgentSynchronization";
+import { AgentSyncKnowhowWeb } from "./AgentSyncKnowhowWeb";
+import { AgentSyncFs } from "./AgentSyncFs";
 import { SessionManager } from "./SessionManager";
 import { TaskRegistry } from "./TaskRegistry";
 
@@ -24,7 +25,8 @@ export * from "./LazyToolsService";
 export * as MCP from "./Mcp";
 export * from "./EmbeddingService";
 export * from "./DockerService";
-export * from "./AgentSynchronization";
+export * from "./AgentSyncKnowhowWeb";
+export * from "./AgentSyncFs";
 export * from "./SessionManager";
 export * from "./TaskRegistry";
 export { Clients } from "../clients";
package/src/types.ts CHANGED
@@ -46,7 +46,8 @@ export type Config = {
   embedSources: EmbedSource[];
   embeddingModel: string;
 
-  plugins: string[];
+  plugins: { enabled: string[]; disabled: string[] };
+
   modules: string[];
 
   agents: Assistant[];
@@ -140,6 +141,7 @@ export type Language = {
     events: string[];
     sources: IDatasource[];
     context?: string;
+    handled?: boolean;
   };
 };
 
@@ -177,6 +179,8 @@ export const Models = {
     Grok3MiniFastBeta: "grok-3-mini-fast-beta",
     Grok21212: "grok-2-1212",
     Grok2Vision1212: "grok-2-vision-1212",
+    GrokImagineImage: "grok-imagine-image",
+    GrokImagineVideo: "grok-imagine-video",
   },
   openai: {
     GPT_5_2: "gpt-5.2",
@@ -203,6 +207,14 @@ export const Models = {
     o1_Mini: "o1-mini-2024-09-12",
     GPT_4o_Mini_Search: "gpt-4o-mini-search-preview-2025-03-11",
     GPT_4o_Search: "gpt-4o-search-preview-2025-03-11",
+
+    TTS_1: "tts-1",
+    Whisper_1: "whisper-1",
+    DALL_E_3: "dall-e-3",
+    DALL_E_2: "dall-e-2",
+    Sora: "sora",
+    Sora_2: "sora-2",
+    Sora_2_Pro: "sora-2-pro",
     // Computer_Use: "computer-use-preview-2025-03-11",
     // Codex_Mini: "codex-mini-latest",
   },
@@ -212,14 +224,17 @@ export const Models = {
     Gemini_25_Pro_Preview: "gemini-2.5-pro-preview-05-06",
     Gemini_20_Flash: "gemini-2.0-flash",
     Gemini_20_Flash_Preview_Image_Generation:
-      "gemini-2.0-flash-preview-image-generation",
+      "gemini-2.0-flash-exp-image-generation",
     Gemini_20_Flash_Lite: "gemini-2.0-flash-lite",
     Gemini_15_Flash: "gemini-1.5-flash",
     Gemini_15_Flash_8B: "gemini-1.5-flash-8b",
     Gemini_15_Pro: "gemini-1.5-pro",
-    Imagen_3: "imagen-3.0-generate-002",
+    Imagen_3: "imagen-4.0-generate-001",
     Veo_2: "veo-2.0-generate-001",
+    Veo_3_1: "veo-3.1-generate-preview",
     Gemini_20_Flash_Live: "gemini-2.0-flash-live-001",
+    Gemini_25_Flash_TTS: "gemini-2.5-flash-preview-tts",
+    Gemini_20_Flash_TTS: "gemini-2.0-flash-preview-tts",
   },
 };
 
@@ -234,6 +249,20 @@ export const EmbeddingModels = {
   },
 };
 
+export function getEnabledPlugins(
+  plugins: Config["plugins"] | undefined
+): string[] {
+  if (!plugins) return [];
+  return plugins.enabled ?? [];
+}
+
+export function getDisabledPlugins(
+  plugins: Config["plugins"] | undefined
+): string[] {
+  if (!plugins) return [];
+  return plugins.disabled ?? [];
+}
+
 export const Providers = Object.keys(Models).reduce((obj, key) => {
   obj[key] = key;
   return obj;
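Because `Config["plugins"]` changes from `string[]` to `{ enabled, disabled }`, a config written for 0.0.68 no longer matches the type. The release also adds `src/migrations.ts`, but its contents are not part of this excerpt, so the sketch below is only an assumption about what a conversion could look like. `normalizePlugins`, `PluginsConfig`, and `LegacyPlugins` are hypothetical names.

```typescript
type PluginsConfig = { enabled: string[]; disabled: string[] };

// Hypothetical: the 0.0.68 shape, a flat list of enabled plugin names.
type LegacyPlugins = string[];

// Accept either shape (or nothing) and always return the new object form.
function normalizePlugins(
  plugins: PluginsConfig | LegacyPlugins | undefined
): PluginsConfig {
  if (!plugins) return { enabled: [], disabled: [] };
  if (Array.isArray(plugins)) {
    // Treat every legacy entry as enabled; nothing was explicitly disabled.
    return { enabled: [...plugins], disabled: [] };
  }
  return {
    enabled: plugins.enabled ?? [],
    disabled: plugins.disabled ?? [],
  };
}

const migrated = normalizePlugins(["github", "vim"]);
console.log(migrated.enabled.length, migrated.disabled.length);
```

With a value normalized this way, `getEnabledPlugins` and `getDisabledPlugins` from the hunk above would never see the legacy array form.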
@@ -275,6 +304,30 @@ export const GoogleImageModels = [
   Models.google.Imagen_3,
 ];
 
-export const GoogleVideoModels = [Models.google.Veo_2];
+export const OpenAiImageModels = [
+  Models.openai.DALL_E_3,
+  Models.openai.DALL_E_2,
+];
+
+export const OpenAiVideoModels = [
+  Models.openai.Sora,
+  Models.openai.Sora_2,
+  Models.openai.Sora_2_Pro,
+];
+
+export const OpenAiTTSModels = [Models.openai.TTS_1];
+
+export const OpenAiTranscriptionModels = [Models.openai.Whisper_1];
+
+export const XaiImageModels = [Models.xai.GrokImagineImage];
+
+export const XaiVideoModels = [Models.xai.GrokImagineVideo];
+
+export const GoogleTTSModels = [
+  Models.google.Gemini_25_Flash_TTS,
+  Models.google.Gemini_20_Flash_TTS,
+];
+
+export const GoogleVideoModels = [Models.google.Veo_2, Models.google.Veo_3_1];
 
 export const GoogleEmbeddingModels = [EmbeddingModels.google.Gemini_Embedding];
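The per-provider model lists added in `types.ts` lend themselves to a lookup from model name to modality. The following is a minimal sketch under the assumption that the lists are used this way; how the package actually consumes them is not visible in this excerpt. `modalityFor` and the `Modality` type are hypothetical, and the arrays copy only string values that appear in this diff instead of importing from the package.

```typescript
type Modality = "image" | "video" | "tts" | "transcription";

// Model identifiers copied from the Models entries in this diff.
const modelLists: Record<Modality, string[]> = {
  image: ["dall-e-3", "dall-e-2", "grok-imagine-image"],
  video: [
    "sora",
    "sora-2",
    "sora-2-pro",
    "grok-imagine-video",
    "veo-2.0-generate-001",
    "veo-3.1-generate-preview",
  ],
  tts: ["tts-1", "gemini-2.5-flash-preview-tts", "gemini-2.0-flash-preview-tts"],
  transcription: ["whisper-1"],
};

// Return the modality whose list contains the model, or undefined if none does.
function modalityFor(model: string): Modality | undefined {
  const modalities = Object.keys(modelLists) as Modality[];
  for (const modality of modalities) {
    if (modelLists[modality].indexOf(model) !== -1) {
      return modality;
    }
  }
  return undefined;
}
```

A caller could use such a lookup to choose between, say, an image-generation request and a video-generation request before dispatching to a provider client.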
package/src/worker.ts CHANGED
@@ -306,14 +306,37 @@ export async function worker(options?: {
     // Get the allowedPorts configuration
     const allowedPorts = config.worker?.tunnel?.allowedPorts || [];
 
+    // Create URL rewriter callback that returns the hostname (without protocol)
+    // The tunnel package will add the protocol based on the useHttps config
+    // This receives port and metadata from the tunnel request
+    const urlRewriter = (port: number, metadata?: any) => {
+      const workerId = metadata?.workerId;
+      const secret = metadata?.secret;
+
+      // Build the hostname/domain (without protocol) based on metadata
+      // The tunnel handler will add the protocol using the useHttps config
+      // Examples:
+      // - secret-p3000.worker.example.com
+      // - workerId-p3000.worker.example.com
+      const subdomain = secret
+        ? `${secret}-p${port}`
+        : `${workerId}-p${port}`;
+
+      // Return just the hostname - the tunnel package should add the protocol
+      // based on the useHttps configuration passed below
+      const replacementUrl = `${subdomain}.${tunnelDomain}`;
+      return replacementUrl;
+    };
+
     // Initialize tunnel handler with the tunnel-specific WebSocket
+    // Pass useHttps flag so the tunnel package can add the correct protocol
     tunnelHandler = createTunnelHandler(tunnelConnection!, {
       allowedPorts,
       maxConcurrentStreams:
         config.worker?.tunnel?.maxConcurrentStreams || 50,
+      tunnelUseHttps: tunnelUseHttps,
       localHost: tunnelLocalHost,
-      tunnelDomain,
-      tunnelUseHttps,
+      urlRewriter,
       enableUrlRewriting:
         config.worker?.tunnel?.enableUrlRewriting !== false,
       portMapping,
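The rewriter in this hunk returns a bare hostname and leaves the scheme to the tunnel package. The sketch below restates that split as two pure functions so it can be checked in isolation. `buildTunnelHost` and `withProtocol` are hypothetical helpers, not the tunnel package's API, and the error on missing metadata is an addition of this sketch: the code in the diff would produce a host starting with `undefined-p` when neither `secret` nor `workerId` is present.

```typescript
type TunnelMetadata = { workerId?: string; secret?: string };

// Build "<secret|workerId>-p<port>.<domain>" without a protocol,
// preferring the secret when both identifiers are available.
function buildTunnelHost(
  port: number,
  tunnelDomain: string,
  metadata?: TunnelMetadata
): string {
  const id = metadata?.secret || metadata?.workerId;
  if (!id) {
    // Assumption of this sketch: fail loudly instead of emitting "undefined-p3000...".
    throw new Error("tunnel metadata has neither secret nor workerId");
  }
  return `${id}-p${port}.${tunnelDomain}`;
}

// Hypothetical stand-in for the step the tunnel package performs with useHttps.
function withProtocol(host: string, useHttps: boolean): string {
  return `${useHttps ? "https" : "http"}://${host}`;
}

const host = buildTunnelHost(3000, "worker.example.com", { workerId: "w1" });
console.log(withProtocol(host, true));
```

Keeping the scheme out of the rewriter means one callback serves both HTTP and HTTPS deployments, with the choice made in a single place.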
package/tests/manual/modalities/README.md CHANGED
@@ -0,0 +1,157 @@
+# Modalities Manual Tests
+
+Manual integration tests for audio, image, vision, and video generation across all supported AI providers.
+
+## Overview
+
+These tests exercise every output/input modality supported by each provider:
+
+| Modality | OpenAI | Google | XAI |
+|----------------------|--------------|--------------------|--------------|
+| Audio Generation | TTS-1 / TTS-1-HD | Gemini 2.0 Flash TTS | ❌ Not supported |
+| Audio Transcription | Whisper-1 | ❌ Not supported | ❌ Not supported |
+| Image Generation | DALL-E 3 | Imagen 3 / Gemini 2.0 Flash | Aurora |
+| Vision (image input) | GPT-4o | Gemini 2.0 Flash | Grok-2-Vision |
+| Video Generation | Sora (stub) | Veo 2 | ❌ Not supported |
+
+---
+
+## Prerequisites
+
+Set the appropriate API keys in your environment before running tests:
+
+```bash
+export OPENAI_KEY="sk-..."
+export GEMINI_API_KEY="AIza..."
+export XAI_API_KEY="xai-..."
+```
+
+---
+
+## Running the Tests
+
+### Run all modality tests
+
+```bash
+npx jest tests/manual/modalities --testTimeout=300000 --runInBand
+```
+
+> **`--runInBand`** is recommended so tests within a file run sequentially (some tests depend on outputs from prior tests in the same file).
+
+### Run a single provider
+
+```bash
+# OpenAI only
+npx jest tests/manual/modalities/openai.modalities.test.ts --testTimeout=120000 --runInBand
+
+# Google only
+npx jest tests/manual/modalities/google.modalities.test.ts --testTimeout=300000 --runInBand
+
+# XAI only
+npx jest tests/manual/modalities/xai.modalities.test.ts --testTimeout=120000 --runInBand
+```
+
+### Run a single test by name
+
+```bash
+npx jest tests/manual/modalities/openai.modalities.test.ts \
+  --testNamePattern="DALL-E 3" \
+  --testTimeout=60000
+```
+
+---
+
+## Output Files
+
+All generated artifacts are saved under `tests/manual/modalities/outputs/<provider>/` so they can be reviewed after the tests run.
+
+### OpenAI (`outputs/openai/`)
+
+| File | Test | Description |
+|------|------|-------------|
+| `tts-output.mp3` | Test 1 | TTS-1 generated speech audio |
+| `transcription.json` | Test 2 | Whisper transcription of the TTS audio |
+| `dalle3-output.png` | Test 3 | DALL-E 3 generated image |
+| `dalle3-output-url.txt` | Test 3 | URL fallback if b64_json not returned |
+| `vision-description.txt` | Test 4 | GPT-4o description of the DALL-E image |
+
+### Google (`outputs/google/`)
+
+| File | Test | Description |
+|------|------|-------------|
+| `tts-output.wav` | Test 1 | Gemini 2.0 Flash TTS audio |
+| `gemini-flash-image.png` | Test 2 | Gemini 2.0 Flash inline image |
+| `gemini-flash-image-url.txt` | Test 2 | URL fallback |
+| `imagen3-output.png` | Test 3 | Imagen 3 generated image |
+| `imagen3-output-url.txt` | Test 3 | URL fallback |
+| `vision-description.txt` | Test 4 | Gemini description of the Imagen 3 image |
+| `veo2-output.mp4` | Test 5 | Veo 2 generated video |
+| `veo2-output-url.txt` | Test 5 | URL fallback |
+
+### XAI (`outputs/xai/`)
+
+| File | Test | Description |
+|------|------|-------------|
+| `aurora-output.png` | Test 1 | Aurora generated image |
+| `aurora-output-url.txt` | Test 1 | URL fallback |
+| `vision-description.txt` | Test 2 | Grok-2-Vision description of Aurora image |
+
+---
+
+## Test Dependency Order
+
+Several tests depend on output from a previous test **within the same file**. Always run tests sequentially (`--runInBand`):
+
+- **OpenAI**: Test 2 (Whisper) needs `tts-output.mp3` from Test 1
+- **OpenAI**: Test 4 (Vision) needs `dalle3-output.png` from Test 3
+- **Google**: Test 4 (Vision) needs `imagen3-output.png` from Test 3
+- **XAI**: Test 2 (Vision) needs `aurora-output.png` from Test 1
+
+If a dependency file is missing, the dependent test will be skipped with a clear log message.
+
+---
+
+## Provider Notes
+
+### OpenAI
+
+- **TTS models**: `tts-1` (faster, lower quality) and `tts-1-hd` (higher quality). Voices: `alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`.
+- **Whisper**: Supports `mp3`, `mp4`, `mpeg`, `mpga`, `m4a`, `wav`, `webm`. Response formats: `json`, `text`, `srt`, `verbose_json`, `vtt`.
+- **DALL-E 3**: Sizes `1024x1024`, `1792x1024`, `1024x1792`. Quality: `standard` or `hd`.
+- **Sora**: Not yet implemented in the client. See Test 5 for the intended API.
+
+### Google
+
+- **Gemini TTS**: Uses `gemini-2.0-flash-preview-tts` model. Supports multi-speaker synthesis.
+- **Gemini 2.0 Flash image**: Native inline image generation using the `gemini-2.0-flash-preview-image-generation` model.
+- **Imagen 3**: High-quality image generation via the Vertex-style Gemini API.
+- **Veo 2**: Video generation via `veo-2.0-generate-001`. Generation is asynchronous and polls for completion; allow up to 5 minutes.
+
+### XAI
+
+- **Aurora**: XAI's image generation model. Uses the OpenAI-compatible `/images/generations` endpoint.
+- **Grok-2-Vision**: Vision model for image understanding (`grok-2-vision-1212`).
+- **Audio**: XAI does not support audio generation or transcription. Tests 3 and 4 verify these throw errors.
+- **Video**: XAI has no public video generation API yet. Test 5 is a documented placeholder.
+
+---
+
+## Adding New Tests
+
+1. Create a new file: `tests/manual/modalities/<provider>.modalities.test.ts`
+2. Save all outputs to `path.join(__dirname, "outputs", "<provider>", "<filename>")`
+3. Guard each test with an API key check and skip gracefully if not set
+4. Add `--runInBand` if tests depend on each other's outputs
+5. Update this README with the new provider and output files
+
+---
+
+## Troubleshooting
+
+**Tests time out**: Increase `--testTimeout`. Video generation (Veo 2) can take 2–5 minutes.
+
+**"API key not set" skip**: Export the relevant environment variable before running.
+
+**Dependency file missing**: Run tests in order with `--runInBand` rather than in parallel.
+
+**TypeScript errors**: Run `npm run compile` from `packages/knowhow/` to check for type issues.
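The README asks each test to skip gracefully when its API key or its dependency file is missing. The following is a minimal sketch of that guard as a pure function; it is not taken from the test files, whose contents are not shown in this excerpt. `skipReason` is a hypothetical helper, and both the environment and the file check are injected so the logic can be exercised without Jest or the filesystem.

```typescript
type Env = Record<string, string | undefined>;

// Return a human-readable reason to skip, or null when the test can run.
function skipReason(
  env: Env,
  apiKeyVar: string,
  dependencyFile?: string,
  fileExists: (path: string) => boolean = () => true
): string | null {
  if (!env[apiKeyVar]) {
    return `${apiKeyVar} not set`;
  }
  if (dependencyFile && !fileExists(dependencyFile)) {
    return `dependency file missing: ${dependencyFile}`;
  }
  return null;
}

// Example: the Whisper test needs both the key and the TTS output from Test 1.
const reason = skipReason(
  { OPENAI_KEY: "sk-example" },
  "OPENAI_KEY",
  "outputs/openai/tts-output.mp3",
  () => false
);
console.log(reason);
```

Inside a real test body, `env` could be `process.env` and `fileExists` could be `fs.existsSync`; when a reason comes back, the test would log it and return early.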