@tyvm/knowhow 0.0.69 → 0.0.71

Files changed (214)
  1. package/docs/shell-commands.md +174 -0
  2. package/package.json +1 -1
  3. package/src/agents/base/base.ts +4 -5
  4. package/src/agents/developer/developer.ts +21 -13
  5. package/src/agents/tools/agentCall.ts +4 -2
  6. package/src/agents/tools/fileSearch.ts +5 -1
  7. package/src/agents/tools/startAgentTask.ts +131 -22
  8. package/src/agents/tools/stringReplace.ts +42 -12
  9. package/src/chat/CliChatService.ts +57 -11
  10. package/src/chat/modules/AgentModule.ts +72 -12
  11. package/src/chat/modules/CustomCommandsModule.ts +79 -0
  12. package/src/chat/modules/InternalChatModule.ts +11 -1
  13. package/src/chat/modules/ShellCommandModule.ts +96 -0
  14. package/src/chat/modules/index.ts +1 -0
  15. package/src/chat/types.ts +14 -2
  16. package/src/chat.ts +16 -13
  17. package/src/cli.ts +16 -6
  18. package/src/clients/anthropic.ts +82 -112
  19. package/src/clients/gemini.ts +445 -87
  20. package/src/clients/index.ts +125 -0
  21. package/src/clients/knowhow.ts +81 -0
  22. package/src/clients/openai.ts +256 -145
  23. package/src/clients/pricing/anthropic.ts +90 -0
  24. package/src/clients/pricing/google.ts +65 -0
  25. package/src/clients/pricing/index.ts +4 -0
  26. package/src/clients/pricing/openai.ts +134 -0
  27. package/src/clients/pricing/xai.ts +62 -0
  28. package/src/clients/types.ts +170 -1
  29. package/src/clients/xai.ts +275 -46
  30. package/src/config.ts +61 -15
  31. package/src/embeddings.ts +9 -1
  32. package/src/microphone.ts +15 -16
  33. package/src/migrations.ts +151 -0
  34. package/src/plugins/AgentsMdPlugin.ts +118 -0
  35. package/src/plugins/PluginBase.ts +8 -0
  36. package/src/plugins/downloader/downloader.ts +5 -6
  37. package/src/plugins/embedding.ts +10 -8
  38. package/src/plugins/exec.ts +70 -0
  39. package/src/plugins/github.ts +120 -74
  40. package/src/plugins/language.ts +11 -13
  41. package/src/plugins/plugins.ts +25 -4
  42. package/src/plugins/tmux.ts +132 -0
  43. package/src/plugins/types.ts +1 -0
  44. package/src/plugins/vim.ts +14 -1
  45. package/src/services/AgentSyncFs.ts +417 -0
  46. package/src/services/{AgentSynchronization.ts → AgentSyncKnowhowWeb.ts} +2 -2
  47. package/src/services/EventService.ts +0 -1
  48. package/src/services/KnowhowClient.ts +106 -0
  49. package/src/services/index.ts +4 -2
  50. package/src/types.ts +57 -4
  51. package/src/worker.ts +11 -6
  52. package/tests/manual/modalities/README.md +157 -0
  53. package/tests/manual/modalities/google.modalities.test.ts +335 -0
  54. package/tests/manual/modalities/openai.modalities.test.ts +329 -0
  55. package/tests/manual/modalities/streaming.test.ts +260 -0
  56. package/tests/manual/modalities/xai.modalities.test.ts +307 -0
  57. package/tests/plugins/language/languagePlugin-content-triggers.test.ts +5 -5
  58. package/tests/plugins/language/languagePlugin-integration.test.ts +1 -1
  59. package/tests/plugins/language/languagePlugin.test.ts +17 -8
  60. package/ts_build/package.json +1 -1
  61. package/ts_build/src/agents/base/base.d.ts +3 -3
  62. package/ts_build/src/agents/base/base.js +1 -1
  63. package/ts_build/src/agents/base/base.js.map +1 -1
  64. package/ts_build/src/agents/developer/developer.js +21 -12
  65. package/ts_build/src/agents/developer/developer.js.map +1 -1
  66. package/ts_build/src/agents/tools/agentCall.js +4 -2
  67. package/ts_build/src/agents/tools/agentCall.js.map +1 -1
  68. package/ts_build/src/agents/tools/executeScript/index.d.ts +1 -1
  69. package/ts_build/src/agents/tools/fileSearch.js +2 -1
  70. package/ts_build/src/agents/tools/fileSearch.js.map +1 -1
  71. package/ts_build/src/agents/tools/github/index.d.ts +1 -1
  72. package/ts_build/src/agents/tools/startAgentTask.d.ts +2 -1
  73. package/ts_build/src/agents/tools/startAgentTask.js +118 -17
  74. package/ts_build/src/agents/tools/startAgentTask.js.map +1 -1
  75. package/ts_build/src/agents/tools/stringReplace.js +29 -12
  76. package/ts_build/src/agents/tools/stringReplace.js.map +1 -1
  77. package/ts_build/src/chat/CliChatService.d.ts +4 -0
  78. package/ts_build/src/chat/CliChatService.js +39 -5
  79. package/ts_build/src/chat/CliChatService.js.map +1 -1
  80. package/ts_build/src/chat/modules/AgentModule.d.ts +4 -1
  81. package/ts_build/src/chat/modules/AgentModule.js +49 -11
  82. package/ts_build/src/chat/modules/AgentModule.js.map +1 -1
  83. package/ts_build/src/chat/modules/CustomCommandsModule.d.ts +9 -0
  84. package/ts_build/src/chat/modules/CustomCommandsModule.js +58 -0
  85. package/ts_build/src/chat/modules/CustomCommandsModule.js.map +1 -0
  86. package/ts_build/src/chat/modules/InternalChatModule.d.ts +2 -0
  87. package/ts_build/src/chat/modules/InternalChatModule.js +10 -0
  88. package/ts_build/src/chat/modules/InternalChatModule.js.map +1 -1
  89. package/ts_build/src/chat/modules/ShellCommandModule.d.ts +8 -0
  90. package/ts_build/src/chat/modules/ShellCommandModule.js +83 -0
  91. package/ts_build/src/chat/modules/ShellCommandModule.js.map +1 -0
  92. package/ts_build/src/chat/modules/index.d.ts +1 -0
  93. package/ts_build/src/chat/modules/index.js +3 -1
  94. package/ts_build/src/chat/modules/index.js.map +1 -1
  95. package/ts_build/src/chat/types.d.ts +11 -1
  96. package/ts_build/src/chat.js +16 -13
  97. package/ts_build/src/chat.js.map +1 -1
  98. package/ts_build/src/cli.js +10 -3
  99. package/ts_build/src/cli.js.map +1 -1
  100. package/ts_build/src/clients/anthropic.d.ts +5 -1
  101. package/ts_build/src/clients/anthropic.js +61 -112
  102. package/ts_build/src/clients/anthropic.js.map +1 -1
  103. package/ts_build/src/clients/gemini.d.ts +80 -2
  104. package/ts_build/src/clients/gemini.js +336 -74
  105. package/ts_build/src/clients/gemini.js.map +1 -1
  106. package/ts_build/src/clients/index.d.ts +9 -1
  107. package/ts_build/src/clients/index.js +65 -0
  108. package/ts_build/src/clients/index.js.map +1 -1
  109. package/ts_build/src/clients/knowhow.d.ts +9 -1
  110. package/ts_build/src/clients/knowhow.js +43 -0
  111. package/ts_build/src/clients/knowhow.js.map +1 -1
  112. package/ts_build/src/clients/openai.d.ts +9 -1
  113. package/ts_build/src/clients/openai.js +201 -133
  114. package/ts_build/src/clients/openai.js.map +1 -1
  115. package/ts_build/src/clients/pricing/anthropic.d.ts +17 -0
  116. package/ts_build/src/clients/pricing/anthropic.js +93 -0
  117. package/ts_build/src/clients/pricing/anthropic.js.map +1 -0
  118. package/ts_build/src/clients/pricing/google.d.ts +73 -0
  119. package/ts_build/src/clients/pricing/google.js +68 -0
  120. package/ts_build/src/clients/pricing/google.js.map +1 -0
  121. package/ts_build/src/clients/pricing/index.d.ts +4 -0
  122. package/ts_build/src/clients/pricing/index.js +14 -0
  123. package/ts_build/src/clients/pricing/index.js.map +1 -0
  124. package/ts_build/src/clients/pricing/openai.d.ts +7 -0
  125. package/ts_build/src/clients/pricing/openai.js +137 -0
  126. package/ts_build/src/clients/pricing/openai.js.map +1 -0
  127. package/ts_build/src/clients/pricing/xai.d.ts +26 -0
  128. package/ts_build/src/clients/pricing/xai.js +59 -0
  129. package/ts_build/src/clients/pricing/xai.js.map +1 -0
  130. package/ts_build/src/clients/types.d.ts +135 -0
  131. package/ts_build/src/clients/xai.d.ts +9 -1
  132. package/ts_build/src/clients/xai.js +178 -46
  133. package/ts_build/src/clients/xai.js.map +1 -1
  134. package/ts_build/src/config.d.ts +1 -0
  135. package/ts_build/src/config.js +45 -16
  136. package/ts_build/src/config.js.map +1 -1
  137. package/ts_build/src/embeddings.js +8 -1
  138. package/ts_build/src/embeddings.js.map +1 -1
  139. package/ts_build/src/microphone.js +7 -9
  140. package/ts_build/src/microphone.js.map +1 -1
  141. package/ts_build/src/migrations.d.ts +17 -0
  142. package/ts_build/src/migrations.js +86 -0
  143. package/ts_build/src/migrations.js.map +1 -0
  144. package/ts_build/src/plugins/AgentsMdPlugin.d.ts +13 -0
  145. package/ts_build/src/plugins/AgentsMdPlugin.js +118 -0
  146. package/ts_build/src/plugins/AgentsMdPlugin.js.map +1 -0
  147. package/ts_build/src/plugins/PluginBase.d.ts +1 -0
  148. package/ts_build/src/plugins/PluginBase.js +3 -0
  149. package/ts_build/src/plugins/PluginBase.js.map +1 -1
  150. package/ts_build/src/plugins/downloader/downloader.js +5 -5
  151. package/ts_build/src/plugins/downloader/downloader.js.map +1 -1
  152. package/ts_build/src/plugins/embedding.js +9 -8
  153. package/ts_build/src/plugins/embedding.js.map +1 -1
  154. package/ts_build/src/plugins/exec.d.ts +10 -0
  155. package/ts_build/src/plugins/exec.js +56 -0
  156. package/ts_build/src/plugins/exec.js.map +1 -0
  157. package/ts_build/src/plugins/github.js +93 -51
  158. package/ts_build/src/plugins/github.js.map +1 -1
  159. package/ts_build/src/plugins/language.js +14 -11
  160. package/ts_build/src/plugins/language.js.map +1 -1
  161. package/ts_build/src/plugins/plugins.d.ts +1 -0
  162. package/ts_build/src/plugins/plugins.js +19 -1
  163. package/ts_build/src/plugins/plugins.js.map +1 -1
  164. package/ts_build/src/plugins/tmux.d.ts +14 -0
  165. package/ts_build/src/plugins/tmux.js +108 -0
  166. package/ts_build/src/plugins/tmux.js.map +1 -0
  167. package/ts_build/src/plugins/types.d.ts +1 -0
  168. package/ts_build/src/plugins/vim.js +11 -1
  169. package/ts_build/src/plugins/vim.js.map +1 -1
  170. package/ts_build/src/services/AgentSyncFs.d.ts +34 -0
  171. package/ts_build/src/services/AgentSyncFs.js +325 -0
  172. package/ts_build/src/services/AgentSyncFs.js.map +1 -0
  173. package/ts_build/src/services/AgentSyncKnowhowWeb.d.ts +29 -0
  174. package/ts_build/src/services/AgentSyncKnowhowWeb.js +178 -0
  175. package/ts_build/src/services/AgentSyncKnowhowWeb.js.map +1 -0
  176. package/ts_build/src/services/AgentSynchronization.d.ts +1 -1
  177. package/ts_build/src/services/AgentSynchronization.js +3 -3
  178. package/ts_build/src/services/AgentSynchronization.js.map +1 -1
  179. package/ts_build/src/services/EventService.js.map +1 -1
  180. package/ts_build/src/services/KnowhowClient.d.ts +9 -1
  181. package/ts_build/src/services/KnowhowClient.js +58 -0
  182. package/ts_build/src/services/KnowhowClient.js.map +1 -1
  183. package/ts_build/src/services/index.d.ts +2 -1
  184. package/ts_build/src/services/index.js +2 -1
  185. package/ts_build/src/services/index.js.map +1 -1
  186. package/ts_build/src/types.d.ts +26 -1
  187. package/ts_build/src/types.js +45 -4
  188. package/ts_build/src/types.js.map +1 -1
  189. package/ts_build/src/utils/PersistentInputManager.d.ts +28 -0
  190. package/ts_build/src/utils/PersistentInputManager.js +293 -0
  191. package/ts_build/src/utils/PersistentInputManager.js.map +1 -0
  192. package/ts_build/src/worker.js +2 -2
  193. package/ts_build/src/worker.js.map +1 -1
  194. package/ts_build/tests/manual/modalities/google.modalities.test.d.ts +1 -0
  195. package/ts_build/tests/manual/modalities/google.modalities.test.js +252 -0
  196. package/ts_build/tests/manual/modalities/google.modalities.test.js.map +1 -0
  197. package/ts_build/tests/manual/modalities/openai.modalities.test.d.ts +1 -0
  198. package/ts_build/tests/manual/modalities/openai.modalities.test.js +252 -0
  199. package/ts_build/tests/manual/modalities/openai.modalities.test.js.map +1 -0
  200. package/ts_build/tests/manual/modalities/streaming.test.d.ts +1 -0
  201. package/ts_build/tests/manual/modalities/streaming.test.js +206 -0
  202. package/ts_build/tests/manual/modalities/streaming.test.js.map +1 -0
  203. package/ts_build/tests/manual/modalities/xai.modalities.test.d.ts +1 -0
  204. package/ts_build/tests/manual/modalities/xai.modalities.test.js +226 -0
  205. package/ts_build/tests/manual/modalities/xai.modalities.test.js.map +1 -0
  206. package/ts_build/tests/manual/persistent-input-test.d.ts +1 -0
  207. package/ts_build/tests/manual/persistent-input-test.js +35 -0
  208. package/ts_build/tests/manual/persistent-input-test.js.map +1 -0
  209. package/ts_build/tests/plugins/language/languagePlugin-content-triggers.test.js +5 -5
  210. package/ts_build/tests/plugins/language/languagePlugin-content-triggers.test.js.map +1 -1
  211. package/ts_build/tests/plugins/language/languagePlugin-integration.test.js +1 -1
  212. package/ts_build/tests/plugins/language/languagePlugin-integration.test.js.map +1 -1
  213. package/ts_build/tests/plugins/language/languagePlugin.test.js +17 -7
  214. package/ts_build/tests/plugins/language/languagePlugin.test.js.map +1 -1
package/src/services/KnowhowClient.ts CHANGED
@@ -7,6 +7,20 @@ import {
   CompletionResponse,
   EmbeddingOptions,
   EmbeddingResponse,
+  AudioTranscriptionOptions,
+  AudioTranscriptionResponse,
+  AudioGenerationOptions,
+  AudioGenerationResponse,
+  ImageGenerationOptions,
+  ImageGenerationResponse,
+  VideoGenerationOptions,
+  VideoGenerationResponse,
+  VideoStatusOptions,
+  VideoStatusResponse,
+  FileUploadOptions,
+  FileUploadResponse,
+  FileDownloadOptions,
+  FileDownloadResponse,
 } from "../clients";
 import { Config } from "../types";

@@ -202,6 +216,98 @@ export class KnowhowSimpleClient {
     });
   }

+  async createAudioTranscription(options: AudioTranscriptionOptions) {
+    await this.checkJwt();
+    const formData = new FormData();
+    // options.file can be a Buffer, ReadStream, Blob, or File
+    if (Buffer.isBuffer(options.file)) {
+      formData.append("file", new Blob([options.file]), options["fileName"] || "audio.mp3");
+    } else {
+      formData.append("file", options.file);
+    }
+    if (options.model) formData.append("model", options.model);
+    if (options.language) formData.append("language", options.language);
+    if (options.prompt) formData.append("prompt", options.prompt);
+    if (options.response_format) formData.append("response_format", options.response_format);
+    if (options.temperature != null) formData.append("temperature", String(options.temperature));
+
+    return axios.post<AudioTranscriptionResponse>(
+      `${this.baseUrl}/api/proxy/v1/audio/transcriptions`,
+      formData,
+      { headers: { ...this.headers } }
+    );
+  }
+
+  async createAudioGeneration(options: AudioGenerationOptions) {
+    await this.checkJwt();
+    return axios.post<AudioGenerationResponse>(
+      `${this.baseUrl}/api/proxy/v1/audio/generations`,
+      options,
+      { headers: this.headers }
+    );
+  }
+
+  async createImageGeneration(options: ImageGenerationOptions) {
+    await this.checkJwt();
+    return axios.post<ImageGenerationResponse>(
+      `${this.baseUrl}/api/proxy/v1/images/generations`,
+      options,
+      { headers: this.headers }
+    );
+  }
+
+  async createVideoGeneration(options: VideoGenerationOptions) {
+    await this.checkJwt();
+    return axios.post<VideoGenerationResponse>(
+      `${this.baseUrl}/api/proxy/v1/videos/generations`,
+      options,
+      { headers: this.headers }
+    );
+  }
+
+  async getVideoStatus(options: VideoStatusOptions) {
+    await this.checkJwt();
+    const { jobId, ...rest } = options;
+    return axios.get<VideoStatusResponse>(
+      `${this.baseUrl}/api/proxy/v1/videos/${jobId}/status`,
+      { headers: this.headers, params: rest }
+    );
+  }
+
+  async downloadVideo(options: FileDownloadOptions) {
+    await this.checkJwt();
+    const { fileId } = options;
+    return axios.get<ArrayBuffer>(
+      `${this.baseUrl}/api/proxy/v1/videos/${fileId}/content`,
+      { headers: this.headers, responseType: "arraybuffer" }
+    );
+  }
+
+  async uploadFile(options: FileUploadOptions) {
+    await this.checkJwt();
+    // Send as JSON with base64-encoded data
+    const body = {
+      data: options.data.toString("base64"),
+      mimeType: options.mimeType,
+      fileName: options.fileName,
+      displayName: options.displayName,
+    };
+    return axios.post<FileUploadResponse>(
+      `${this.baseUrl}/api/proxy/v1/files`,
+      body,
+      { headers: this.headers }
+    );
+  }
+
+  async downloadFile(options: FileDownloadOptions) {
+    await this.checkJwt();
+    const { fileId } = options;
+    return axios.get<ArrayBuffer>(
+      `${this.baseUrl}/api/proxy/v1/files/${fileId}/content`,
+      { headers: this.headers, responseType: "arraybuffer" }
+    );
+  }
+
   async createChatTask(request: CreateMessageTaskRequest) {
     await this.checkJwt();
     return axios.post<CreateMessageTaskResponse>(
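Note on `getVideoStatus` in the hunk above: it pulls `jobId` out of the options for the URL path and forwards every remaining field as a query parameter. The sketch below isolates that split so it can be checked without a network call. `buildVideoStatusRequest`, `VideoStatusOptionsSketch`, and the `knowhow.example.com` base URL are hypothetical; only the `jobId` field and the URL pattern come from the diff, and the real `VideoStatusOptions` type lives in `src/clients/types.ts`, which is not shown here.

```typescript
// Assumed stand-in for VideoStatusOptions: only `jobId` is confirmed by the diff.
interface VideoStatusOptionsSketch {
  jobId: string;
  [key: string]: string | undefined;
}

// Hypothetical helper mirroring the destructuring done in getVideoStatus.
function buildVideoStatusRequest(
  baseUrl: string,
  options: VideoStatusOptionsSketch
) {
  // jobId goes into the path; everything else becomes query parameters.
  const { jobId, ...rest } = options;
  return {
    url: `${baseUrl}/api/proxy/v1/videos/${jobId}/status`,
    params: rest,
  };
}

const request = buildVideoStatusRequest("https://knowhow.example.com", {
  jobId: "job-123",
  model: "veo-2.0-generate-001",
});

console.log(request.url);
console.log(request.params);
```

One consequence of this shape is that any extra field a caller places on the options object is sent to the proxy as a query parameter, so callers may want to pass only the fields the endpoint expects.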
package/src/services/index.ts CHANGED
@@ -10,7 +10,8 @@ import { S3Service } from "./S3";
 import { ToolsService } from "./Tools";
 import { PluginService } from "../plugins/plugins";
 import { DockerService } from "./DockerService";
-import { AgentSynchronization } from "./AgentSynchronization";
+import { AgentSyncKnowhowWeb } from "./AgentSyncKnowhowWeb";
+import { AgentSyncFs } from "./AgentSyncFs";
 import { SessionManager } from "./SessionManager";
 import { TaskRegistry } from "./TaskRegistry";

@@ -24,7 +25,8 @@ export * from "./LazyToolsService";
 export * as MCP from "./Mcp";
 export * from "./EmbeddingService";
 export * from "./DockerService";
-export * from "./AgentSynchronization";
+export * from "./AgentSyncKnowhowWeb";
+export * from "./AgentSyncFs";
 export * from "./SessionManager";
 export * from "./TaskRegistry";
 export { Clients } from "../clients";
package/src/types.ts CHANGED
@@ -46,7 +46,8 @@ export type Config = {
   embedSources: EmbedSource[];
   embeddingModel: string;

-  plugins: string[];
+  plugins: { enabled: string[]; disabled: string[] };
+
   modules: string[];

   agents: Assistant[];
@@ -140,6 +141,7 @@ export type Language = {
     events: string[];
     sources: IDatasource[];
     context?: string;
+    handled?: boolean;
   };
 };

@@ -177,6 +179,8 @@ export const Models = {
     Grok3MiniFastBeta: "grok-3-mini-fast-beta",
     Grok21212: "grok-2-1212",
     Grok2Vision1212: "grok-2-vision-1212",
+    GrokImagineImage: "grok-imagine-image",
+    GrokImagineVideo: "grok-imagine-video",
   },
   openai: {
     GPT_5_2: "gpt-5.2",
@@ -203,6 +207,14 @@ export const Models = {
     o1_Mini: "o1-mini-2024-09-12",
     GPT_4o_Mini_Search: "gpt-4o-mini-search-preview-2025-03-11",
     GPT_4o_Search: "gpt-4o-search-preview-2025-03-11",
+
+    TTS_1: "tts-1",
+    Whisper_1: "whisper-1",
+    DALL_E_3: "dall-e-3",
+    DALL_E_2: "dall-e-2",
+    Sora: "sora",
+    Sora_2: "sora-2",
+    Sora_2_Pro: "sora-2-pro",
     // Computer_Use: "computer-use-preview-2025-03-11",
     // Codex_Mini: "codex-mini-latest",
   },
@@ -212,14 +224,17 @@ export const Models = {
     Gemini_25_Pro_Preview: "gemini-2.5-pro-preview-05-06",
     Gemini_20_Flash: "gemini-2.0-flash",
     Gemini_20_Flash_Preview_Image_Generation:
-      "gemini-2.0-flash-preview-image-generation",
+      "gemini-2.0-flash-exp-image-generation",
     Gemini_20_Flash_Lite: "gemini-2.0-flash-lite",
     Gemini_15_Flash: "gemini-1.5-flash",
     Gemini_15_Flash_8B: "gemini-1.5-flash-8b",
     Gemini_15_Pro: "gemini-1.5-pro",
-    Imagen_3: "imagen-3.0-generate-002",
+    Imagen_3: "imagen-4.0-generate-001",
     Veo_2: "veo-2.0-generate-001",
+    Veo_3_1: "veo-3.1-generate-preview",
     Gemini_20_Flash_Live: "gemini-2.0-flash-live-001",
+    Gemini_25_Flash_TTS: "gemini-2.5-flash-preview-tts",
+    Gemini_20_Flash_TTS: "gemini-2.0-flash-preview-tts",
   },
 };

@@ -234,6 +249,20 @@ export const EmbeddingModels = {
   },
 };

+export function getEnabledPlugins(
+  plugins: Config["plugins"] | undefined
+): string[] {
+  if (!plugins) return [];
+  return plugins.enabled ?? [];
+}
+
+export function getDisabledPlugins(
+  plugins: Config["plugins"] | undefined
+): string[] {
+  if (!plugins) return [];
+  return plugins.disabled ?? [];
+}
+
 export const Providers = Object.keys(Models).reduce((obj, key) => {
   obj[key] = key;
   return obj;
@@ -275,6 +304,30 @@ export const GoogleImageModels = [
   Models.google.Imagen_3,
 ];

-export const GoogleVideoModels = [Models.google.Veo_2];
+export const OpenAiImageModels = [
+  Models.openai.DALL_E_3,
+  Models.openai.DALL_E_2,
+];
+
+export const OpenAiVideoModels = [
+  Models.openai.Sora,
+  Models.openai.Sora_2,
+  Models.openai.Sora_2_Pro,
+];
+
+export const OpenAiTTSModels = [Models.openai.TTS_1];
+
+export const OpenAiTranscriptionModels = [Models.openai.Whisper_1];
+
+export const XaiImageModels = [Models.xai.GrokImagineImage];
+
+export const XaiVideoModels = [Models.xai.GrokImagineVideo];
+
+export const GoogleTTSModels = [
+  Models.google.Gemini_25_Flash_TTS,
+  Models.google.Gemini_20_Flash_TTS,
+];
+
+export const GoogleVideoModels = [Models.google.Veo_2, Models.google.Veo_3_1];

 export const GoogleEmbeddingModels = [EmbeddingModels.google.Gemini_Embedding];
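Note on the `Config["plugins"]` change above: the field moves from a flat `string[]` to `{ enabled, disabled }`, so a config file written for 0.0.69 no longer matches the type. The file list includes `src/migrations.ts`, but its contents are not part of this excerpt, so the following is only a sketch of how both shapes could be accepted. `normalizePlugins` and the `PluginsConfig` alias are hypothetical, the plugin names are illustrative, and `getEnabledPlugins` is copied from the diff with its parameter type swapped for the local alias.

```typescript
type PluginsConfig = { enabled: string[]; disabled: string[] };

// From the diff, with Config["plugins"] replaced by the local alias.
function getEnabledPlugins(plugins: PluginsConfig | undefined): string[] {
  if (!plugins) return [];
  return plugins.enabled ?? [];
}

// Hypothetical helper (not in the package): accept the legacy string[] shape
// or the new object shape and always return the new shape.
function normalizePlugins(
  plugins: string[] | Partial<PluginsConfig> | undefined
): PluginsConfig {
  if (!plugins) return { enabled: [], disabled: [] };
  if (Array.isArray(plugins)) {
    // Assumption: every entry in a legacy list counts as enabled.
    return { enabled: [...plugins], disabled: [] };
  }
  return {
    enabled: plugins.enabled ?? [],
    disabled: plugins.disabled ?? [],
  };
}

const legacy = normalizePlugins(["github", "vim"]);
const current = normalizePlugins({ enabled: ["tmux"], disabled: ["vim"] });

console.log(getEnabledPlugins(legacy));
console.log(getEnabledPlugins(current));
```

Treating every legacy entry as enabled is the least surprising default, but whether the package's own migration does the same cannot be confirmed from this diff.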
package/src/worker.ts CHANGED
@@ -306,30 +306,35 @@ export async function worker(options?: {
     // Get the allowedPorts configuration
     const allowedPorts = config.worker?.tunnel?.allowedPorts || [];

-    // Create URL rewriter callback that can customize URL replacement logic
+    // Create URL rewriter callback that returns the hostname (without protocol)
+    // The tunnel package will add the protocol based on the useHttps config
     // This receives port and metadata from the tunnel request
     const urlRewriter = (port: number, metadata?: any) => {
       const workerId = metadata?.workerId;
       const secret = metadata?.secret;

-      // Build the replacement URL based on metadata
+      // Build the hostname/domain (without protocol) based on metadata
+      // The tunnel handler will add the protocol using the useHttps config
       // Examples:
-      // - https://workerId-p.tunnelDomain
-      // - https://secret.workerId-p.tunnelDomain
+      // - secret-p3000.worker.example.com
+      // - workerId-p3000.worker.example.com
       const subdomain = secret
-        ? `${secret}.${workerId}-p${port}`
+        ? `${secret}-p${port}`
         : `${workerId}-p${port}`;

-      const protocol = tunnelUseHttps ? "https" : "http";
+      // Return just the hostname - the tunnel package should add the protocol
+      // based on the useHttps configuration passed below
       const replacementUrl = `${subdomain}.${tunnelDomain}`;
       return replacementUrl;
     };

     // Initialize tunnel handler with the tunnel-specific WebSocket
+    // Pass useHttps flag so the tunnel package can add the correct protocol
     tunnelHandler = createTunnelHandler(tunnelConnection!, {
       allowedPorts,
       maxConcurrentStreams:
         config.worker?.tunnel?.maxConcurrentStreams || 50,
+      tunnelUseHttps: tunnelUseHttps,
       localHost: tunnelLocalHost,
       urlRewriter,
       enableUrlRewriting:
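Note on the `urlRewriter` change above: the callback now returns a bare hostname, and when a secret is present the subdomain is `secret-p<port>` with no `workerId` component. The protocol is expected to be added by the tunnel handler, based on the `tunnelUseHttps` flag. A minimal sketch of that division of labour follows. `makeUrlRewriter`, `withProtocol`, and the sample values are hypothetical, and how the tunnel package actually prepends the protocol is not shown in this diff.

```typescript
type TunnelMetadata = { workerId?: string; secret?: string };

// Mirrors the rewriter in worker.ts: returns a hostname only, no protocol.
function makeUrlRewriter(tunnelDomain: string) {
  return (port: number, metadata?: TunnelMetadata): string => {
    const workerId = metadata?.workerId;
    const secret = metadata?.secret;
    // With a secret, the workerId no longer appears in the subdomain.
    const subdomain = secret ? `${secret}-p${port}` : `${workerId}-p${port}`;
    return `${subdomain}.${tunnelDomain}`;
  };
}

// Assumption: stands in for what the tunnel handler does with the flag.
function withProtocol(hostname: string, useHttps: boolean): string {
  return `${useHttps ? "https" : "http"}://${hostname}`;
}

const rewrite = makeUrlRewriter("worker.example.com");
const withSecret = rewrite(3000, { workerId: "w1", secret: "s3cret" });
const withoutSecret = rewrite(3000, { workerId: "w1" });

console.log(withProtocol(withSecret, true));
console.log(withProtocol(withoutSecret, false));
```

Keeping the protocol decision in one place avoids the situation in the removed code, where `protocol` was computed in the rewriter but never used in the returned string.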
package/tests/manual/modalities/README.md CHANGED
@@ -0,0 +1,157 @@
+# Modalities Manual Tests
+
+Manual integration tests for audio, image, vision, and video generation across all supported AI providers.
+
+## Overview
+
+These tests exercise every output/input modality supported by each provider:
+
+| Modality | OpenAI | Google | XAI |
+|----------------------|--------------|--------------------|--------------|
+| Audio Generation | TTS-1 / TTS-1-HD | Gemini 2.0 Flash TTS | ❌ Not supported |
+| Audio Transcription | Whisper-1 | ❌ Not supported | ❌ Not supported |
+| Image Generation | DALL-E 3 | Imagen 3 / Gemini 2.0 Flash | Aurora |
+| Vision (image input) | GPT-4o | Gemini 2.0 Flash | Grok-2-Vision |
+| Video Generation | Sora (stub) | Veo 2 | ❌ Not supported |
+
+---
+
+## Prerequisites
+
+Set the appropriate API keys in your environment before running tests:
+
+```bash
+export OPENAI_KEY="sk-..."
+export GEMINI_API_KEY="AIza..."
+export XAI_API_KEY="xai-..."
+```
+
+---
+
+## Running the Tests
+
+### Run all modality tests
+
+```bash
+npx jest tests/manual/modalities --testTimeout=300000 --runInBand
+```
+
+> **`--runInBand`** is recommended so tests within a file run sequentially (some tests depend on outputs from prior tests in the same file).
+
+### Run a single provider
+
+```bash
+# OpenAI only
+npx jest tests/manual/modalities/openai.modalities.test.ts --testTimeout=120000 --runInBand
+
+# Google only
+npx jest tests/manual/modalities/google.modalities.test.ts --testTimeout=300000 --runInBand
+
+# XAI only
+npx jest tests/manual/modalities/xai.modalities.test.ts --testTimeout=120000 --runInBand
+```
+
+### Run a single test by name
+
+```bash
+npx jest tests/manual/modalities/openai.modalities.test.ts \
+  --testNamePattern="DALL-E 3" \
+  --testTimeout=60000
+```
+
+---
+
+## Output Files
+
+All generated artifacts are saved under `tests/manual/modalities/outputs/<provider>/` so they can be reviewed after the tests run.
+
+### OpenAI (`outputs/openai/`)
+
+| File | Test | Description |
+|------|------|-------------|
+| `tts-output.mp3` | Test 1 | TTS-1 generated speech audio |
+| `transcription.json` | Test 2 | Whisper transcription of the TTS audio |
+| `dalle3-output.png` | Test 3 | DALL-E 3 generated image |
+| `dalle3-output-url.txt` | Test 3 | URL fallback if b64_json not returned |
+| `vision-description.txt` | Test 4 | GPT-4o description of the DALL-E image |
+
+### Google (`outputs/google/`)
+
+| File | Test | Description |
+|------|------|-------------|
+| `tts-output.wav` | Test 1 | Gemini 2.0 Flash TTS audio |
+| `gemini-flash-image.png` | Test 2 | Gemini 2.0 Flash inline image |
+| `gemini-flash-image-url.txt` | Test 2 | URL fallback |
+| `imagen3-output.png` | Test 3 | Imagen 3 generated image |
+| `imagen3-output-url.txt` | Test 3 | URL fallback |
+| `vision-description.txt` | Test 4 | Gemini description of the Imagen 3 image |
+| `veo2-output.mp4` | Test 5 | Veo 2 generated video |
+| `veo2-output-url.txt` | Test 5 | URL fallback |
+
+### XAI (`outputs/xai/`)
+
+| File | Test | Description |
+|------|------|-------------|
+| `aurora-output.png` | Test 1 | Aurora generated image |
+| `aurora-output-url.txt` | Test 1 | URL fallback |
+| `vision-description.txt` | Test 2 | Grok-2-Vision description of Aurora image |
+
+---
+
+## Test Dependency Order
+
+Several tests depend on output from a previous test **within the same file**. Always run tests sequentially (`--runInBand`):
+
+- **OpenAI**: Test 2 (Whisper) needs `tts-output.mp3` from Test 1
+- **OpenAI**: Test 4 (Vision) needs `dalle3-output.png` from Test 3
+- **Google**: Test 4 (Vision) needs `imagen3-output.png` from Test 3
+- **XAI**: Test 2 (Vision) needs `aurora-output.png` from Test 1
+
+If a dependency file is missing, the dependent test will be skipped with a clear log message.
+
+---
+
+## Provider Notes
+
+### OpenAI
+
+- **TTS models**: `tts-1` (faster, lower quality) and `tts-1-hd` (higher quality). Voices: `alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`.
+- **Whisper**: Supports `mp3`, `mp4`, `mpeg`, `mpga`, `m4a`, `wav`, `webm`. Response formats: `json`, `text`, `srt`, `verbose_json`, `vtt`.
+- **DALL-E 3**: Sizes `1024x1024`, `1792x1024`, `1024x1792`. Quality: `standard` or `hd`.
+- **Sora**: Not yet implemented in the client. See Test 5 for the intended API.
+
+### Google
+
+- **Gemini TTS**: Uses `gemini-2.0-flash-preview-tts` model. Supports multi-speaker synthesis.
+- **Gemini 2.0 Flash image**: Native inline image generation using the `gemini-2.0-flash-preview-image-generation` model.
+- **Imagen 3**: High-quality image generation via the Vertex-style Gemini API.
+- **Veo 2**: Video generation via `veo-2.0-generate-001`. Generation is asynchronous and polls for completion — allow up to 5 minutes.
+
+### XAI
+
+- **Aurora**: XAI's image generation model. Uses the OpenAI-compatible `/images/generations` endpoint.
+- **Grok-2-Vision**: Vision model for image understanding (`grok-2-vision-1212`).
+- **Audio**: XAI does not support audio generation or transcription. Tests 3 and 4 verify these throw errors.
+- **Video**: XAI has no public video generation API yet. Test 5 is a documented placeholder.
+
+---
+
+## Adding New Tests
+
+1. Create a new file: `tests/manual/modalities/<provider>.modalities.test.ts`
+2. Save all outputs to `path.join(__dirname, "outputs", "<provider>", "<filename>")`
+3. Guard each test with an API key check and skip gracefully if not set
+4. Add `--runInBand` if tests depend on each other's outputs
+5. Update this README with the new provider and output files
+
+---
+
+## Troubleshooting
+
+**Tests time out**: Increase `--testTimeout`. Video generation (Veo 2) can take 2–5 minutes.
+
+**"API key not set" skip**: Export the relevant environment variable before running.
+
+**Dependency file missing**: Run tests in order with `--runInBand` rather than in parallel.
+
+**TypeScript errors**: Run `npm run compile` from `packages/knowhow/` to check for type issues.
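Note on the README's skip rules: tests are meant to skip when the provider's API key is unset or when an output file from an earlier test is missing. The test files themselves are not shown in this excerpt, so the following is only a sketch of that decision as a pure function. `skipReason`, `TestGuard`, and the sample paths are hypothetical. The environment and file check are injected so the logic runs without Jest; in a real test file they would presumably be `process.env` and `fs.existsSync`.

```typescript
type Env = Record<string, string | undefined>;

interface TestGuard {
  envVar: string; // e.g. "OPENAI_KEY", as listed under Prerequisites
  dependsOn?: string; // output file produced by an earlier test, if any
}

// Hypothetical helper: returns a human-readable skip reason,
// or null when the test can run.
function skipReason(
  guard: TestGuard,
  env: Env,
  fileExists: (path: string) => boolean
): string | null {
  if (!env[guard.envVar]) return `${guard.envVar} not set`;
  if (guard.dependsOn && !fileExists(guard.dependsOn)) {
    return `missing dependency file: ${guard.dependsOn}`;
  }
  return null;
}

// Example: the Whisper test (Test 2) needs the TTS output from Test 1.
const whisperGuard: TestGuard = {
  envVar: "OPENAI_KEY",
  dependsOn: "outputs/openai/tts-output.mp3",
};
const noFiles = (_path: string) => false;
const allFiles = (_path: string) => true;

console.log(skipReason(whisperGuard, {}, allFiles));
console.log(skipReason(whisperGuard, { OPENAI_KEY: "sk-test" }, noFiles));
console.log(skipReason(whisperGuard, { OPENAI_KEY: "sk-test" }, allFiles));
```

Checking the API key before the dependency file means a missing key is reported first, which matches the order of the Troubleshooting entries in the README.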