@aj-archipelago/cortex 1.3.7 → 1.3.9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,5 +1,5 @@
  # Cortex
- Cortex simplifies and accelerates the process of creating applications that harness the power of modern AI models like chatGPT and GPT-4 by providing a structured interface (GraphQL or REST) to a powerful prompt execution environment. This enables complex augmented prompting and abstracts away most of the complexity of managing model connections like chunking input, rate limiting, formatting output, caching, and handling errors.
+ Cortex simplifies and accelerates the process of creating applications that harness the power of modern AI models like GPT-4o (chatGPT), o1, Gemini, the Claude series, Flux, Grok, and more by providing a structured interface (GraphQL or REST) to a powerful prompt execution environment. This enables complex augmented prompting and abstracts away most of the complexity of managing model connections like chunking input, rate limiting, formatting output, caching, and handling errors.
  ## Why build Cortex?
  Modern AI models are transformational, but a number of complexities emerge when developers start using them to deliver application-ready functions. Most models require precisely formatted, carefully engineered and sequenced prompts to produce consistent results, and the responses are typically largely unstructured text without validation or formatting. Additionally, these models are evolving rapidly, are typically costly and slow to query, and impose hard request size and rate restrictions that need to be carefully navigated for optimum throughput. Cortex offers a solution to these problems and provides a simple and extensible package for interacting with NL AI models.
 
@@ -7,8 +7,7 @@ Modern AI models are transformational, but a number of complexities emerge when
  Just about anything! It's kind of an LLM swiss army knife. Here are some ideas:
  * Create custom chat agents with memory and personalization and then expose them through a bunch of different UIs (custom chat portals, Slack, Microsoft Teams, etc. - anything that can be extended and speak to a REST or GraphQL endpoint)
  * Spin up LLM-powered automatons with their prompting logic and AI API handling logic all centrally encapsulated.
- * Make LLM chains and agents from LangChain.js available via scalable REST or GraphQL endpoints.
- * Put a REST or GraphQL front end on your locally-run models (e.g. llama.cpp) and use them in concert with other tools.
+ * Put a REST or GraphQL front end on any model, including your locally-run models (e.g. llama.cpp), and use them in concert with other tools.
  * Create modular custom coding assistants (code generation, code reviews, test writing, AI pair programming) and easily integrate them with your existing editing tools.
  * Create powerful AI editing tools (copy editing, paraphrasing, summarization, etc.) for your company and then integrate them with your existing workflow tools without having to build all the LLM-handling logic into those tools.
  * Create cached endpoints for functions with repeated calls so the results return instantly and you don't run up LLM token charges.
@@ -17,7 +16,47 @@ Just about anything! It's kind of an LLM swiss army knife. Here are some ideas:
  ## Features
 
  * Simple architecture to build custom functional endpoints (called `pathways`) that implement common NL AI tasks. Default pathways include chat, summarization, translation, paraphrasing, completion, spelling and grammar correction, entity extraction, sentiment analysis, and bias analysis.
- * Allows for building multi-model, multi-tool, multi-vendor, and model-agnostic pathways (choose the right model or combination of models and tools for the job, implement redundancy) with built-in support for OpenAI GPT-3, GPT-3.5 (chatGPT), and GPT-4 models - both from OpenAI directly and through Azure OpenAI, PaLM Text and PaLM Chat from Google, OpenAI Whisper, Azure Translator, LangChain.js and more.
+ * Extensive model support with built-in integrations for:
+   - OpenAI models:
+     - GPT-4 Omni (GPT-4o)
+     - GPT-4 Omni Mini (GPT-4o-mini)
+     - o1 advanced reasoning models (including o1-mini and o1-preview)
+     - Most of the earlier GPT models (GPT-4, GPT-3.5 Turbo, etc.)
+   - Google models:
+     - Gemini 1.5 Pro
+     - Gemini 2.0 Flash (experimental, via the 1.5 Vision API)
+     - Gemini 1.5 Flash
+     - Earlier Google models (Gemini 1.0 series, PaLM)
+   - Anthropic models:
+     - Claude 3.5 Sonnet v2 (latest)
+     - Claude 3.5 Sonnet
+     - Claude 3.5 Haiku
+     - Claude 3 series
+   - Azure OpenAI support
+   - Custom model implementations
+ * Advanced voice and audio capabilities:
+   - Real-time voice streaming and processing
+   - Audio visualization
+   - Whisper integration for transcription with customizable parameters
+   - Support for word timestamps and highlighting
+ * Enhanced memory management:
+   - Structured memory organization (self, directives, user, topics)
+   - Context-aware memory search
+   - Memory migration and categorization
+   - Persistent conversation context
+ * Multimodal content support:
+   - Text and image processing
+   - Vision model integrations
+   - Content safety checks
+ * Built-in support for:
+   - Long-running, asynchronous operations with progress updates
+   - Streaming responses
+   - Context persistence and memory management
+   - Automatic traffic management and content optimization
+   - Input/output validation and formatting
+   - Request caching
+   - Rate limiting and request parallelization
+ * Allows for building multi-model, multi-tool, multi-vendor, and model-agnostic pathways (choose the right model or combination of models and tools for the job, implement redundancy) with built-in support for foundation models from OpenAI (hosted by OpenAI or on Azure), Google (Gemini), Anthropic (Claude), xAI (Grok), Black Forest Labs (Flux), and more.
  * Easy, templatized prompt definition with flexible support for most prompt engineering techniques and strategies ranging from simple single prompts to complex custom prompt chains with context continuity.
  * Built-in support for long-running, asynchronous operations with progress updates or streaming responses
  * Integrated context persistence: have your pathways "remember" whatever you want and use it on the next request to the model
@@ -187,6 +226,74 @@ export default {
  ```
  By simply specifying a `format` property and a `list` property, this pathway invokes a built-in parser that will take the result of the prompt and try to parse it into an array of 5 objects. The `list` property can be set with or without a `format` property. If there is no `format`, the list will simply try to parse the string into a list of strings. All of this default behavior is implemented in `parser.js`, and you can override it to do whatever you want by providing your own `parser` function in your pathway, as sketched below.
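+ 
+ As an illustration, here is a minimal sketch of a custom `parser` (the `(result)` signature is an assumption for this example; the exact arguments Cortex passes to your parser may differ):
+ 
+ ```js
+ export default {
+     prompt: `{{{text}}}\n\nList the five most important entities in the above text, one per line:\n\n`,
+     // Hypothetical parser: split the raw model output into a trimmed
+     // array of strings, dropping any empty lines.
+     parser: (result) => {
+         return result
+             .split('\n')
+             .map((line) => line.trim())
+             .filter((line) => line.length > 0);
+     },
+ };
+ ```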
 
+ ### Custom Execution with executePathway
+ 
+ The `executePathway` property is the preferred method for customizing pathway behavior while maintaining Cortex's built-in safeguards and optimizations. Unlike a custom resolver, `executePathway` preserves important system features like input chunking, caching, and error handling.
+ 
+ ```js
+ export default {
+     prompt: `{{{text}}}\n\nWrite a summary of the above text in {{language}}:\n\n`,
+     inputParameters: {
+         language: 'English',
+         minLength: 100,
+         maxLength: 500
+     },
+     executePathway: async ({args, resolver, runAllPrompts}) => {
+         try {
+             // Pre-process arguments and set defaults
+             if (!args.language) {
+                 args.language = 'English';
+             }
+ 
+             // Pre-execution validation
+             if (args.minLength >= args.maxLength) {
+                 throw new Error('minLength must be less than maxLength');
+             }
+ 
+             // Execute the prompt
+             const result = await runAllPrompts();
+ 
+             // Post-execution processing
+             if (result.length < args.minLength) {
+                 // Ask the model to expand the summary
+                 args.text = result;
+                 args.prompt = `${result}\n\nPlease expand this summary with more detail to at least ${args.minLength} characters:\n\n`;
+                 return await runAllPrompts();
+             }
+ 
+             if (result.length > args.maxLength) {
+                 // Condense the summary
+                 args.text = result;
+                 args.prompt = `${result}\n\nPlease condense this summary to no more than ${args.maxLength} characters while keeping the key points:\n\n`;
+                 return await runAllPrompts();
+             }
+ 
+             return result;
+         } catch (e) {
+             resolver.logError(e);
+             throw e;
+         }
+     }
+ };
+ ```
+ 
+ Key benefits of using `executePathway`:
+ - Maintains Cortex's input processing (chunking, validation)
+ - Preserves caching and rate limiting
+ - Keeps error handling and logging consistent
+ - Enables pre- and post-processing of prompts and results
+ - Supports validation and conditional execution
+ - Allows multiple prompt runs with modified parameters
+ 
+ The `executePathway` function receives:
+ - `args`: The processed input parameters
+ - `resolver`: The pathway resolver, with access to:
+   - `pathway`: Current pathway configuration
+   - `config`: Global Cortex configuration
+   - `tool`: Tool-specific data
+   - Helper methods like `logError` and `logWarning`
+ - `runAllPrompts`: Function to execute the defined prompts with current args
+ 
  ### Custom Resolver
 
  The `resolver` property defines the function that processes the input and returns the result. The resolver function is an asynchronous function that takes four parameters: `parent`, `args`, `contextValue`, and `info`. The `parent` parameter is the parent object of the resolver function. The `args` parameter is an object that contains the input parameters and any other parameters that are passed to the resolver. The `contextValue` parameter is an object that contains the context and configuration of the pathway. The `info` parameter is an object that contains information about the GraphQL query that triggered the resolver.
@@ -269,56 +376,39 @@ export default {
  }
  };
  ```
- ### LangChain.js Support
- The ability to define a custom resolver function in Cortex pathways gives Cortex the flexibility to be able to cleanly incorporate alternate pipelines and technology stacks into the execution of a pathway. LangChain JS (https://github.com/hwchase17/langchainjs) is a very popular and well supported mechanism for wiring together models, tools, and logic to achieve some amazing results. We have developed specific functionality to support LangChain in the Cortex prompt execution framework and will continue to build features to fully integrate it with Cortex prompt execution contexts.
- 
- Below is an example pathway integrating with one of the example agents from the LangChain docs. You can see the seamless integration of Cortex's configuration and graphQL / REST interface logic.
- ```js
- // lc_test.js
- // LangChain Cortex integration test
- 
- // Import required modules
- import { OpenAI } from "langchain/llms";
- import { initializeAgentExecutor } from "langchain/agents";
- import { SerpAPI, Calculator } from "langchain/tools";
- 
- export default {
- 
-     // Implement custom logic and interaction with Cortex
-     // in custom resolver.
- 
-     resolver: async (parent, args, contextValue, info) => {
- 
-         const { config } = contextValue;
-         const openAIApiKey = config.get('openaiApiKey');
-         const serpApiKey = config.get('serpApiKey');
- 
-         const model = new OpenAI({ openAIApiKey: openAIApiKey, temperature: 0 });
-         const tools = [new SerpAPI( serpApiKey ), new Calculator()];
- 
-         const executor = await initializeAgentExecutor(
-             tools,
-             model,
-             "zero-shot-react-description"
-         );
- 
-         console.log(`====================`);
-         console.log("Loaded langchain agent.");
-         const input = args.text;
-         console.log(`Executing with input "${input}"...`);
-         const result = await executor.call({ input });
-         console.log(`Got output ${result.output}`);
-         console.log(`====================`);
- 
-         return result?.output;
-     },
- };
- ```
 
  ### Building and Loading Pathways
 
  Pathways are loaded from modules in the `pathways` directory. The pathways are built and loaded to the `config` object using the `buildPathways` function. The `buildPathways` function loads the base pathway, the core pathways, and any custom pathways. It then creates a new object that contains all the pathways and adds it to the `pathways` property of the config object. The order of loading means that custom pathways will always override any core pathways that Cortex provides. While pathways are designed to be self-contained, you can override some pathway properties - including whether they're even available at all - in the `pathways` section of the config file, as in the sketch below.
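+ 
+ For example, a minimal sketch of such an override in a JSON config file (the pathway name `summary` is illustrative; the overridable property names are listed in the next section):
+ 
+ ```json
+ {
+     "pathways": {
+         "summary": {
+             "temperature": 0.3,
+             "timeout": 60
+         }
+     }
+ }
+ ```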
 
+ ### Pathway Properties
+ 
+ Each pathway can define the following properties (with defaults from basePathway.js):
+ 
+ - `prompt`: The template string or array of prompts to execute. Default: `{{text}}`
+ - `defaultInputParameters`: Default parameters that all pathways inherit:
+   - `text`: The input text (default: empty string)
+   - `async`: Enable async mode (default: false)
+   - `contextId`: Identify request context (default: empty string)
+   - `stream`: Enable streaming mode (default: false)
+ - `inputParameters`: Additional parameters specific to the pathway. Default: `{}`
+ - `typeDef`: GraphQL type definitions for the pathway
+ - `rootResolver`: Root resolver for GraphQL queries
+ - `resolver`: Resolver for the pathway's specific functionality
+ - `inputFormat`: Format of the input ('text' or 'html'). Affects input chunking behavior. Default: 'text'
+ - `useInputChunking`: Enable splitting input into multiple chunks to meet context window size. Default: true
+ - `useParallelChunkProcessing`: Enable parallel processing of chunks. Default: false
+ - `joinChunksWith`: String to join result chunks with when chunking is enabled. Default: '\n\n'
+ - `useInputSummarization`: Summarize input instead of chunking. Default: false
+ - `truncateFromFront`: Truncate from the front of the input instead of the back. Default: false
+ - `timeout`: Cancel the pathway after this many seconds. Default: 120
+ - `enableDuplicateRequests`: Send duplicate requests if not completed after a timeout. Default: false
+ - `duplicateRequestAfter`: Seconds to wait before sending a backup request. Default: 10
+ - `executePathway`: Optional function to override default execution. Signature: `({args, resolver, runAllPrompts}) => result`
+ - `temperature`: Model temperature setting (0.0 to 1.0). Default: 0.9
+ - `json`: Require valid JSON response from model. Default: false
+ - `manageTokenLength`: Manage input token length for model. Default: true
+ 
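+ Putting a few of these together, a minimal sketch of a pathway module (the property values are illustrative):
+ 
+ ```js
+ export default {
+     // Prompt template; {{{text}}} is replaced with the input text
+     prompt: `{{{text}}}\n\nTranslate the above text to {{language}}:\n\n`,
+     inputParameters: {
+         language: 'Spanish'
+     },
+     inputFormat: 'text',
+     useInputChunking: true,
+     joinChunksWith: '\n\n',
+     timeout: 60,
+     temperature: 0.2
+ };
+ ```
+ 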
  ## Core (Default) Pathways
 
  Below are the default pathways provided with Cortex. These can be used as is, overridden, or disabled via configuration. For documentation on each one, including input and output parameters, please look at them in the GraphQL Playground.
@@ -413,45 +503,144 @@ import cortex from '@aj-archipelago/cortex';
  ```
 
  ## Configuration
- Configuration of Cortex is done via a [convict](https://github.com/mozilla/node-convict/tree/master) object called `config`. The `config` object is built by combining the default values and any values specified in a configuration file or environment variables. The environment variables take precedence over the values in the configuration file. Below are the configurable properties and their defaults:
+ Configuration of Cortex is done via a [convict](https://github.com/mozilla/node-convict/tree/master) object called `config`. The `config` object is built by combining the default values with any values specified in a configuration file or environment variables. The environment variables take precedence over the values in the configuration file.
+ 
+ ### Model Configuration
 
- - `basePathwayPath`: The path to the base pathway (the prototype pathway) for Cortex. Default properties for the pathway are set from their values in this basePathway. Default is path.join(__dirname, 'pathways', 'basePathway.js').
+ 
+ Models are configured in the `models` section of the config. Each model can have one of the following types:
+ - `OPENAI-CHAT`: For OpenAI chat models (legacy GPT-3.5)
+ - `OPENAI-VISION`: For multimodal models (GPT-4o, GPT-4o-mini) supporting text, images, and other content types
+ - `OPENAI-REASONING`: For o1 reasoning models with vision capabilities
+ - `OPENAI-COMPLETION`: For OpenAI completion models
+ - `OPENAI-WHISPER`: For Whisper transcription
+ - `GEMINI-1.5-CHAT`: For Gemini 1.5 Pro chat models
+ - `GEMINI-1.5-VISION`: For Gemini vision models (including the experimental 2.0 Flash)
+ - `CLAUDE-3-VERTEX`: For Claude 3 and 3.5 models (Haiku, Opus, Sonnet)
+ - `PALM-CHAT`: For PaLM chat models
+ - `AZURE-TRANSLATE`: For Azure translation services
+ 
+ Each model configuration can include:
+ 
+ ```json
+ {
+     "type": "MODEL_TYPE",
+     "url": "API_ENDPOINT",
+     "endpoints": [
+         {
+             "name": "ENDPOINT_NAME",
+             "url": "ENDPOINT_URL",
+             "headers": {
+                 "api-key": "{{API_KEY}}",
+                 "Content-Type": "application/json"
+             },
+             "requestsPerSecond": 10
+         }
+     ],
+     "maxTokenLength": 32768,
+     "maxReturnTokens": 8192,
+     "maxImageSize": 5242880,
+     "supportsStreaming": true,
+     "supportsVision": true,
+     "geminiSafetySettings": [
+         {
+             "category": "HARM_CATEGORY",
+             "threshold": "BLOCK_ONLY_HIGH"
+         }
+     ]
+ }
+ ```
+ 
+ ### Other Configuration Properties
+ 
+ The following properties can be configured through environment variables or the configuration file:
+ 
+ - `basePathwayPath`: The path to the base pathway (the prototype pathway) for Cortex. Default is path.join(__dirname, 'pathways', 'basePathway.js').
  - `corePathwaysPath`: The path to the core pathways for Cortex. Default is path.join(__dirname, 'pathways').
- - `cortexConfigFile`: The path to a JSON configuration file for the project. Default is null. The value can be set using the `CORTEX_CONFIG_FILE` environment variable.
- - `defaultModelName`: The default model name for the project. Default is null. The value can be set using the `DEFAULT_MODEL_NAME` environment variable.
- - `enableCache`: A boolean flag indicating whether to enable Axios-level request caching. Default is true. The value can be set using the `CORTEX_ENABLE_CACHE` environment variable.
- - `enableGraphqlCache`: A boolean flag indicating whether to enable GraphQL query caching. Default is false. The value can be set using the `CORTEX_ENABLE_GRAPHQL_CACHE` environment variable.
- - `enableRestEndpoints`: A boolean flag indicating whether create REST endpoints for pathways as well as GraphQL queries. Default is false. The value can be set using the `CORTEX_ENABLE_REST` environment variable.
- - `cortexApiKeys`: A string containing one or more comma separated API keys that the client must pass to Cortex for authorization. Default is null in which case Cortex is unprotected. The value can be set using the `CORTEX_API_KEY` environment variable
- - `models`: An object containing the different models used by the project. The value can be set using the `CORTEX_MODELS` environment variable. Cortex is model and vendor agnostic - you can use this config to set up models of any type from any vendor.
- - `openaiApiKey`: The API key used for accessing the OpenAI API. This is sensitive information and has no default value. The value can be set using the `OPENAI_API_KEY` environment variable.
- - `openaiApiUrl`: The URL used for accessing the OpenAI API. Default is https://api.openai.com/v1/completions. The value can be set using the `OPENAI_API_URL` environment variable.
- - `openaiDefaultModel`: The default model name used for the OpenAI API. Default is text-davinci-003. The value can be set using the `OPENAI_DEFAULT_MODEL` environment variable.
- - `pathways`: An object containing pathways for the project. The default is an empty object that is filled in during the `buildPathways` step.
- - `pathwaysPath`: The path to custom pathways for the project. Default is null.
- - `PORT`: The port number for the Cortex server. Default is 4000. The value can be set using the `CORTEX_PORT` environment variable.
- - `storageConnectionString`: The connection string used for accessing storage. This is sensitive information and has no default value. The value can be set using the `STORAGE_CONNECTION_STRING` environment variable.
- 
- The `buildPathways` function takes the config object and builds the `pathways` and `pathwayManager` objects by loading the core pathways and any custom pathways specified in the `pathwaysPath` property of the config object. The function returns the `pathways` and `pathwayManager` objects.
- 
- The `buildModels` function takes the `config` object and builds the `models` object by compiling handlebars templates for each model specified in the `models` property of the config object. The function returns the `models` object.
- 
- The `config` object can be used to access configuration values throughout the project. For example, to get the port number for the server, use
+ - `cortexApiKeys`: A string containing one or more comma separated API keys that the client must pass to Cortex for authorization. Default is null.
+ - `cortexConfigFile`: The path to a JSON configuration file for the project. Default is null.
+ - `cortexId`: Identifier for the Cortex instance. Default is 'local'.
+ - `defaultModelName`: The default model name for the project. Default is null.
+ - `enableCache`: Enable Axios-level request caching. Default is true.
+ - `enableDuplicateRequests`: Enable sending duplicate requests if not completed after timeout. Default is true.
+ - `enableGraphqlCache`: Enable GraphQL query caching. Default is false.
+ - `enableRestEndpoints`: Create REST endpoints for pathways as well as GraphQL queries. Default is false.
+ - `gcpServiceAccountKey`: GCP service account key for authentication. Default is null.
+ - `models`: Object containing the different models used by the project.
+ - `pathways`: Object containing pathways for the project.
+ - `pathwaysPath`: Path to custom pathways. Default is './pathways'.
+ - `PORT`: Port number for the Cortex server. Default is 4000.
+ - `redisEncryptionKey`: Key for Redis data encryption. Default is null.
+ - `replicateApiKey`: API key for Replicate services. Default is null.
+ - `runwareAiApiKey`: API key for Runware AI services. Default is null.
+ - `storageConnectionString`: Connection string for storage access. Default is an empty string.
+ - `subscriptionKeepAlive`: Keep-alive time for subscriptions in seconds. Default is 0.
+ 
+ API-specific configuration:
+ - `azureVideoTranslationApiUrl`: URL for the Azure video translation API. Default is 'http://127.0.0.1:5005'.
+ - `dalleImageApiUrl`: URL for the DALL-E image API. Default is 'null'.
+ - `neuralSpaceApiKey`: API key for NeuralSpace services. Default is null.
+ - `whisperMediaApiUrl`: URL for the Whisper media API. Default is 'null'.
+ - `whisperTSApiUrl`: URL for the Whisper TS API. Default is null.
+ 
+ Dynamic Pathways configuration can be set using:
+ - `DYNAMIC_PATHWAYS_CONFIG_FILE`: Path to a JSON configuration file
+ - `DYNAMIC_PATHWAYS_CONFIG_JSON`: JSON configuration as a string
+ 
+ The configuration supports environment variable overrides, with environment variables taking precedence over the configuration file values. Access configuration values using:
  ```js
- config.get('PORT')
+ config.get('propertyName')
  ```
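+ 
+ For example, a sketch of overriding configuration through environment variables before starting the server (the variable names come from the property descriptions in this README; the start command depends on your project):
+ 
+ ```sh
+ # Environment variables take precedence over the configuration file
+ export CORTEX_PORT=4000
+ export CORTEX_ENABLE_REST=true
+ export DYNAMIC_PATHWAYS_CONFIG_FILE=./dynamic-pathways.json
+ node index.js   # or however your Cortex server is started
+ ```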
 
  ## Helper Apps
- The Cortex project includes a set of utility applications, which are located in the `helper-apps`` directory. Each of these applications comes with a Dockerfile. This Dockerfile can be used to create a Docker image of the application, which in turn allows the application to be run in a standalone manner using Docker.
- 
- - cortex-file-handler
- Extends Cortex with several file processing units. Handles file operations (download, split, upload) with local file system or Azure Storage. It can process different file types including documents, files ( .pdf, .docx, .xlsx, .csv .txt, .json, .md, .xml, .js, .html, .css) and additionally YouTube URLs. It also manages deletion requests and cleanup operations, and provides progress reporting for requests.
+ The Cortex project includes a set of utility applications, which are located in the `helper-apps` directory. Each of these applications comes with a Dockerfile that can be used to create a Docker image of the application, which in turn allows the application to be run standalone using Docker.
+ 
+ ### cortex-realtime-voice-server
+ A real-time voice processing server that enables voice interactions with Cortex. Key features include:
+ - Real-time audio streaming and processing
+ - WebSocket-based communication for low-latency interactions
+ - Audio visualization capabilities
+ - Support for multiple audio formats
+ - Integration with various chat models for voice-to-text-to-voice interactions
+ - Configurable audio parameters and processing options
+ 
+ ### cortex-whisper-wrapper
+ A custom API wrapper for OpenAI's Whisper package, designed as a FastAPI server for transcribing audio files. Features include:
+ - Support for multiple audio file formats
+ - Customizable transcription parameters:
+   - `word_timestamps`: Enable word-level timing information
+   - `highlight_words`: Enable word highlighting in output
+   - `max_line_count`: Control maximum lines in output
+   - `max_line_width`: Control line width in characters
+   - `max_words_per_line`: Control words per line
+ - SRT file generation for subtitles
+ - Progress reporting for long-running transcriptions
+ - Support for multiple languages
+ - Integration with Azure Blob Storage for file handling
+ 
+ ### cortex-file-handler
+ Extends Cortex with several file processing capabilities:
+ - File operations (download, split, upload) with local file system or Azure Storage
+ - Support for various file types:
+   - Documents (.pdf, .docx)
+   - Spreadsheets (.xlsx, .csv)
+   - Text files (.txt, .json, .md, .xml)
+   - Web files (.js, .html, .css)
+ - YouTube URL processing
+ - Progress reporting for file operations
+ - Cleanup and deletion management
+ 
+ Each helper app can be deployed independently using Docker:
+ ```sh
+ # Build the Docker image
+ docker build --platform=linux/amd64 -t [app-name] .
 
- - cortex-whisper-wrapper
- The cortex-whisper-wrapper is a custom API wrapper for the Whisper package from OpenAI. Designed as a FastAPI server, it aids in transcribing audio files using the Whisper library.
- The server provides an HTTP endpoint ("/") that accepts POST requests with a JSON payload containing a "fileurl" parameter specifying the URL of the audio file to transcribe. Upon receiving a request, the server calls the transcribe function to perform the transcription using the Whisper model, saves the transcription as an SRT file, and returns the SRT content as the response.
- It helps Cortex to make use of Whisper OS parameters which currently are not available in OpenAI API. Parameters supported are: 'word_timestamps', 'highlight_words', 'max_line_count', 'max_line_width', 'max_words_per_line'. These parameters customizes transcription output, for more info on the parameters see open source Whisper package https://github.com/openai/whisper
+ # Tag the image for your registry
+ docker tag [app-name] [registry-url]/cortex/[app-name]
 
+ # Push to the registry (login may be required)
+ docker push [registry-url]/cortex/[app-name]
+ ```
 
  ## Troubleshooting
  If you encounter any issues while using Cortex, there are a few things you can do. First, check the Cortex documentation for any common errors and their solutions. If that does not help, you can also open an issue on the Cortex GitHub repository.
@@ -569,3 +758,312 @@ To ensure the security of dynamic pathways:
  Each instance of Cortex maintains its own local cache of pathways. On every dynamic pathway request, it checks if the local cache is up to date by comparing the last modified timestamp of the storage with the last update time of the local cache. If the local cache is out of date, it reloads the pathways from storage.
 
  This approach ensures that all instances of Cortex will eventually have access to the most up-to-date dynamic pathways without requiring immediate synchronization.
+ 
+ ## Entity System
+ 
+ Cortex includes a powerful Entity System that allows you to build autonomous agents with memory, tool routing, and multi-modal interaction capabilities. These entities can be accessed synchronously or asynchronously through text or voice interfaces.
+ 
+ ### Overview
+ 
+ The Entity System is built around two core pathways:
+ - `sys_entity_start.js`: The entry point for entity interactions, handling initial routing and tool selection
+ - `sys_entity_continue.js`: Manages callback execution in synchronous mode
+ 
+ ### Key Features
+ 
+ - **Memory Management**: Entities maintain contextual memory that can be self-modified
+ - **Tool Routing**: Automatic detection and routing to specialized tools:
+   - Code execution
+   - Image generation and vision processing
+   - Video and audio processing
+   - Document handling
+   - Expert reasoning
+   - Search capabilities
+   - Memory operations
+ - **Multi-Modal Support**: Handle text, voice, images, and other content types
+ - **Flexible Response Modes**:
+   - Synchronous: Complete interactions with callbacks
+   - Asynchronous: Fire-and-forget operations with queue support
+   - Streaming: Real-time response streaming
+ - **Voice Integration**: Built-in voice response capabilities with an acknowledgment system
+ 
+ ### Basic Usage
+ 
+ Using Apollo Client (or any GraphQL client):
+ 
+ ```js
+ import { ApolloClient, InMemoryCache, gql } from '@apollo/client';
+ 
+ const client = new ApolloClient({
+     uri: 'http://your-cortex-server:4000/graphql',
+     cache: new InMemoryCache()
+ });
+ 
+ // Define your queries
+ const START_ENTITY = gql`
+     query StartEntity(
+         $chatHistory: [ChatMessageInput!]!
+         $aiName: String
+         $contextId: String
+         $aiMemorySelfModify: Boolean
+         $aiStyle: String
+         $voiceResponse: Boolean
+         $stream: Boolean
+     ) {
+         entityStart(
+             chatHistory: $chatHistory
+             aiName: $aiName
+             contextId: $contextId
+             aiMemorySelfModify: $aiMemorySelfModify
+             aiStyle: $aiStyle
+             voiceResponse: $voiceResponse
+             stream: $stream
+         ) {
+             result
+             tool
+         }
+     }
+ `;
+ 
+ const CONTINUE_ENTITY = gql`
+     query ContinueEntity(
+         $chatHistory: [ChatMessageInput!]!
+         $contextId: String!
+         $generatorPathway: String!
+     ) {
+         entityContinue(
+             chatHistory: $chatHistory
+             contextId: $contextId
+             generatorPathway: $generatorPathway
+         ) {
+             result
+         }
+     }
+ `;
+ 
+ // Example usage
+ async function interactWithEntity() {
+     // Start an entity interaction
+     const startResponse = await client.query({
+         query: START_ENTITY,
+         variables: {
+             chatHistory: [
+                 { role: 'user', content: 'Create a Python script that calculates prime numbers' }
+             ],
+             aiName: "Jarvis",
+             contextId: "session-123",
+             aiMemorySelfModify: true,
+             aiStyle: "OpenAI",
+             voiceResponse: false,
+             stream: false
+         }
+     });
+ 
+     // Handle tool routing response
+     const tool = JSON.parse(startResponse.data.entityStart.tool);
+ 
+     if (tool.toolCallbackName) {
+         // Continue with specific tool if needed
+         const continueResponse = await client.query({
+             query: CONTINUE_ENTITY,
+             variables: {
+                 chatHistory: [
+                     { role: 'user', content: 'Create a Python script that calculates prime numbers' },
+                     { role: 'assistant', content: startResponse.data.entityStart.result }
+                 ],
+                 contextId: "session-123",
+                 generatorPathway: tool.toolCallbackName
+             }
+         });
+ 
+         return continueResponse.data.entityContinue.result;
+     }
+ 
+     return startResponse.data.entityStart.result;
+ }
+ 
+ // For streaming responses
+ const STREAM_ENTITY = gql`
+     subscription StreamEntity(
+         $chatHistory: [ChatMessageInput!]!
+         $contextId: String!
+         $aiName: String
+     ) {
+         entityStream(
+             chatHistory: $chatHistory
+             contextId: $contextId
+             aiName: $aiName
+         ) {
+             content
+             done
+         }
+     }
+ `;
+ 
+ // Example streaming usage
+ client.subscribe({
+     query: STREAM_ENTITY,
+     variables: {
+         chatHistory: [
+             { role: 'user', content: 'Explain quantum computing' }
+         ],
+         contextId: "session-123",
+         aiName: "Jarvis"
+     }
+ }).subscribe({
+     next(response) {
+         if (response.data.entityStream.content) {
+             console.log(response.data.entityStream.content);
+         }
+         if (response.data.entityStream.done) {
+             console.log('Stream completed');
+         }
+     },
+     error(err) {
+         console.error('Error:', err);
+     }
+ });
+ ```
+ 
+ This example demonstrates:
+ - Setting up a GraphQL client
+ - Starting an entity interaction
+ - Handling tool routing responses
+ - Continuing with specific tools when needed
+ - Using streaming for real-time responses
+ 
+ ### Configuration Options
+ 
+ - `aiName`: Custom name for the entity
+ - `aiStyle`: Choose between "OpenAI" or "Anthropic" response styles
+ - `aiMemorySelfModify`: Enable/disable autonomous memory management
+ - `voiceResponse`: Enable voice responses with acknowledgments
+ - `stream`: Enable response streaming
+ - `dataSources`: Array of data sources to use ["mydata", "aja", "aje", "wires", "bing"]
+ - `privateData`: Flag for handling private data
+ - `language`: Preferred language for responses
+ 
+ ### Tool Integration
+ 
+ The Entity System automatically routes requests to appropriate tools based on content analysis:
+ 
+ 1. **Code Execution**:
+    - Detects coding tasks
+    - Routes to async execution queue
+    - Returns progress updates
+ 
+ 2. **Content Generation**:
+    - Image generation
+    - Expert writing
+    - Reasoning tasks
+    - Document processing
+ 
+ 3. **Search and Memory**:
+    - Integrated search capabilities
+    - Memory context retrieval
+    - Document analysis
+ 
+ 4. **Multi-Modal Processing**:
+    - Vision analysis
+    - Video processing
+    - Audio handling
+    - PDF processing
+ 
+ ### Memory System
+ 
+ Entities maintain a sophisticated memory system that:
+ - Preserves context between interactions
+ - Self-modifies based on interactions
+ - Categorizes information
+ - Provides relevant context for future interactions
+ 
+ ### Best Practices
+ 
+ 1. **Context Management**:
+    - Use a consistent `contextId` for related interactions
+    - Limit chat history to recent messages for efficiency
+ 
+ 2. **Tool Selection**:
+    - Let the entity auto-route to appropriate tools
+    - Override routing with a specific `generatorPathway` when needed
+ 
+ 3. **Memory Usage**:
+    - Enable `aiMemorySelfModify` for autonomous memory management
+    - Use memory context for more coherent interactions
+ 
+ 4. **Response Handling**:
+    - Use streaming for real-time interactions
+    - Enable voice responses for voice interfaces
+    - Handle async operations with appropriate timeouts
+ 
+ ## Redis Integration
+ 
+ Cortex uses Redis as both a storage system and a communication backplane:
+ 
+ ### Memory and Context Storage
+ 
+ - **Entity Memory**: Stores and searches entity memory contexts using `contextId` as the key
+ - **Context Persistence**: Saves pathway context between executions
+ 
+ ### Inter-Service Communication
+ 
+ - **Distributed Deployment**: Enables communication between multiple Cortex instances
+ - **Helper App Integration**: Facilitates communication with auxiliary services:
+   - File Handler: Progress updates and file operation status
+   - Autogen: Message queuing and async task management
+   - Voice Server: Real-time streaming coordination
+   - Whisper Wrapper: Transcription job management
+ - **Pub/Sub Messaging**: Supports real-time event distribution across services (see the sketch below)
+ - **Queue Management**: Handles asynchronous task distribution and processing
+ 
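+ For illustration, a minimal sketch of the pub/sub pattern on this kind of Redis backplane using [ioredis](https://github.com/redis/ioredis) (the channel name and payload shape are hypothetical, not Cortex's internal protocol):
+ 
+ ```js
+ import Redis from 'ioredis';
+ 
+ // Publishers and subscribers need separate connections
+ const publisher = new Redis(process.env.REDIS_URL);
+ const subscriber = new Redis(process.env.REDIS_URL);
+ 
+ // One service listens for progress events on a channel
+ await subscriber.subscribe('file-handler-progress');
+ subscriber.on('message', (channel, message) => {
+     const event = JSON.parse(message);
+     console.log(`[${channel}]`, event.requestId, event.progress);
+ });
+ 
+ // Another service publishes a progress update
+ await publisher.publish('file-handler-progress',
+     JSON.stringify({ requestId: 'req-123', progress: 0.5 }));
+ ```
+ 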
+ ### Caching
+ 
+ - **Request Caching**: When `enableCache` is true, caches model responses to avoid duplicate API calls
+ - **GraphQL Caching**: When `enableGraphqlCache` is true, caches GraphQL query results
+ - **Cache Encryption**: Uses `redisEncryptionKey` to encrypt sensitive cached data
+ 
+ ### Configuration
+ 
+ Redis connection can be configured through environment variables:
+ 
+ ```sh
+ # Required
+ REDIS_URL=redis://your-redis-host:6379
+ 
+ # Optional
+ REDIS_ENCRYPTION_KEY=your-encryption-key   # For encrypted caching
+ REDIS_PASSWORD=your-redis-password         # If authentication is required
+ REDIS_TLS=true                             # For TLS/SSL connections
+ REDIS_CONNECTION_STRING=                   # Full connection string (alternative to URL)
+ ```
+ 
+ ### Cache Management
+ 
+ Cortex implements intelligent cache management:
+ - Automatic cache invalidation based on TTL
+ - Model-specific cache keys for optimized hit rates
+ - Cache size management to prevent memory overflow
+ - Support for cache clearing through API endpoints
+ 
+ ### Best Practices
+ 
+ 1. **Memory Storage**:
+    - Use consistent `contextId` values for related operations
+    - Implement regular memory cleanup for unused contexts
+    - Monitor memory usage to prevent Redis memory overflow
+ 
+ 2. **Caching**:
+    - Enable caching for frequently repeated queries
+    - Use encryption for sensitive data
+    - Monitor cache hit rates for optimization
+ 
+ 3. **High Availability**:
+    - Configure Redis persistence for data durability
+    - Use Redis clustering for scalability
+    - Implement failover mechanisms for reliability
+ 
+ 4. **Communication**:
+    - Use appropriate channels for different types of messages
+    - Implement retry logic for critical operations
+    - Monitor queue lengths and processing times
+    - Set up proper error handling for pub/sub operations