@elizaos/plugin-local-ai 1.0.0-beta.48 → 1.0.0-beta.49

package/README.md CHANGED
@@ -12,46 +12,62 @@ Add the plugin to your character configuration:
 
  ## Configuration
 
- The plugin requires these environment variables (can be set in .env file or character settings):
-
- ```json
- "settings": {
-   "USE_LOCAL_AI": true,
-   "USE_STUDIOLM_TEXT_MODELS": false,
-
-   "STUDIOLM_SERVER_URL": "http://localhost:1234",
-   "STUDIOLM_SMALL_MODEL": "lmstudio-community/deepseek-r1-distill-qwen-1.5b",
-   "STUDIOLM_MEDIUM_MODEL": "deepseek-r1-distill-qwen-7b",
-   "STUDIOLM_EMBEDDING_MODEL": false
- }
- ```
+ The plugin is configured using environment variables (typically set in a `.env` file or via your deployment settings):
 
  Or in `.env` file:
 
  ```env
- # Local AI Configuration
- USE_LOCAL_AI=true
- USE_STUDIOLM_TEXT_MODELS=false
-
- # StudioLM Configuration
- STUDIOLM_SERVER_URL=http://localhost:1234
- STUDIOLM_SMALL_MODEL=lmstudio-community/deepseek-r1-distill-qwen-1.5b
- STUDIOLM_MEDIUM_MODEL=deepseek-r1-distill-qwen-7b
- STUDIOLM_EMBEDDING_MODEL=false
- ```
+ # Optional: Specify a custom directory for models (GGUF files)
+ # MODELS_DIR=/path/to/your/models
 
- ### Configuration Options
+ # Optional: Specify a custom directory for caching other components (tokenizers, etc.)
+ # CACHE_DIR=/path/to/your/cache
 
- #### Text Model Source (Choose One)
+ # Optional: Specify filenames for the text generation and embedding models within the models directory
+ # LOCAL_SMALL_MODEL=my-custom-small-model.gguf
+ # LOCAL_LARGE_MODEL=my-custom-large-model.gguf
+ # LOCAL_EMBEDDING_MODEL=my-custom-embedding-model.gguf
 
- - `USE_STUDIOLM_TEXT_MODELS`: Enable StudioLM text models
+ # Optional: Fallback dimension size for embeddings if generation fails. Defaults to the model's default (e.g., 384).
+ # LOCAL_EMBEDDING_DIMENSIONS=384
+ ```
 
- #### StudioLM Settings
+ ### Configuration Options
 
- - `STUDIOLM_SERVER_URL`: StudioLM API endpoint (default: http://localhost:1234)
- - `STUDIOLM_SMALL_MODEL`: Model for lighter tasks
- - `STUDIOLM_MEDIUM_MODEL`: Model for standard tasks
- - `STUDIOLM_EMBEDDING_MODEL`: Model for embeddings (or false to disable)
+ - `MODELS_DIR` (Optional): Specifies a custom directory for storing model files (GGUF format). If not set, defaults to `~/.eliza/models`.
+ - `CACHE_DIR` (Optional): Specifies a custom directory for caching other components like tokenizers. If not set, defaults to `~/.eliza/cache`.
+ - `LOCAL_SMALL_MODEL` (Optional): Specifies the filename for the small text generation model (e.g., `DeepHermes-3-Llama-3-3B-Preview-q4.gguf`) located in the models directory.
+ - `LOCAL_LARGE_MODEL` (Optional): Specifies the filename for the large text generation model (e.g., `DeepHermes-3-Llama-3-8B-q4.gguf`) located in the models directory.
+ - `LOCAL_EMBEDDING_MODEL` (Optional): Specifies the filename for the text embedding model (e.g., `bge-small-en-v1.5.Q4_K_M.gguf`) located in the models directory.
+ - `LOCAL_EMBEDDING_DIMENSIONS` (Optional): Defines the expected dimension size for text embeddings. This is primarily used as a fallback dimension if the embedding model fails to generate an embedding. If not set, it defaults to the embedding model's native dimension size (e.g., 384 for `bge-small-en-v1.5.Q4_K_M.gguf`).
+
+ ## Prerequisites
+
+ ### FFmpeg for Audio Transcription
+
+ The audio transcription feature (`ModelType.TRANSCRIPTION`) relies on **FFmpeg** to process audio files. If FFmpeg is not installed or not found in your system's PATH, transcription will fail.
+
+ **Installation:**
+
+ - **macOS (using Homebrew):**
+   ```bash
+   brew install ffmpeg
+   ```
+ - **Linux (Debian/Ubuntu):**
+   ```bash
+   sudo apt-get update && sudo apt-get install ffmpeg
+   ```
+ - **Linux (Fedora):**
+   ```bash
+   sudo dnf install ffmpeg
+   ```
+ - **Windows (using Chocolatey):**
+   ```bash
+   choco install ffmpeg
+   ```
+   Alternatively, download FFmpeg from the [official FFmpeg website](https://ffmpeg.org/download.html) and add the `bin` directory (containing `ffmpeg.exe`) to your system's PATH environment variable.
+
+ After installation, ensure that the `ffmpeg` command is accessible from your terminal. You may need to restart your terminal or your application for the changes to take effect.
 
  ## Features
 
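The configuration hunk above replaces the StudioLM settings with file-based model options. As a rough illustration of how those variables fit together, here is a minimal TypeScript sketch, assuming a Node.js runtime. `resolveLocalAiConfig` and `LocalAiConfig` are hypothetical names, not exports of `@elizaos/plugin-local-ai`; only the variable names and the `~/.eliza/models` and `~/.eliza/cache` defaults come from the README, and the validation of `LOCAL_EMBEDDING_DIMENSIONS` is an assumption of this sketch.

```typescript
import * as os from 'node:os';
import * as path from 'node:path';

// Hypothetical shape of the resolved settings (illustrative, not the plugin's internal type).
interface LocalAiConfig {
  modelsDir: string;
  cacheDir: string;
  smallModelPath?: string;
  largeModelPath?: string;
  embeddingModelPath?: string;
  // Undefined means "defer to the embedding model's native dimension size".
  embeddingDimensions?: number;
}

// Hypothetical resolver for the environment variables documented in the README.
function resolveLocalAiConfig(
  env: Record<string, string | undefined> = process.env,
  homeDir: string = os.homedir(),
): LocalAiConfig {
  // Directory defaults taken from the README: ~/.eliza/models and ~/.eliza/cache.
  const modelsDir = env.MODELS_DIR || path.join(homeDir, '.eliza', 'models');
  const cacheDir = env.CACHE_DIR || path.join(homeDir, '.eliza', 'cache');

  // Model filenames are interpreted relative to the models directory.
  const inModelsDir = (file?: string): string | undefined =>
    file ? path.join(modelsDir, file) : undefined;

  // Assumption: ignore LOCAL_EMBEDDING_DIMENSIONS unless it is a positive integer.
  const dims = Number(env.LOCAL_EMBEDDING_DIMENSIONS);
  const embeddingDimensions = Number.isInteger(dims) && dims > 0 ? dims : undefined;

  return {
    modelsDir,
    cacheDir,
    smallModelPath: inModelsDir(env.LOCAL_SMALL_MODEL),
    largeModelPath: inModelsDir(env.LOCAL_LARGE_MODEL),
    embeddingModelPath: inModelsDir(env.LOCAL_EMBEDDING_MODEL),
    embeddingDimensions,
  };
}

// Example: inspect what the current environment resolves to.
console.log(resolveLocalAiConfig());
```

Leaving `embeddingDimensions` undefined here stands in for the README's statement that the value otherwise comes from the embedding model itself; the plugin's actual resolution logic may differ.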
@@ -59,55 +75,61 @@ The plugin provides these model classes:
 
  - `TEXT_SMALL`: Fast, efficient text generation using smaller models
  - `TEXT_LARGE`: More capable text generation using larger models
+ - `TEXT_EMBEDDING`: Generates text embeddings locally.
  - `IMAGE_DESCRIPTION`: Local image analysis using Florence-2 vision model
  - `TEXT_TO_SPEECH`: Local text-to-speech synthesis
  - `TRANSCRIPTION`: Local audio transcription using Whisper
 
- ### Image Analysis
+ ### Text Generation
 
  ```typescript
- const { title, description } = await runtime.useModel(
-   ModelType.IMAGE_DESCRIPTION,
-   'https://example.com/image.jpg'
- );
+ // Using small model
+ const smallResponse = await runtime.useModel(ModelType.TEXT_SMALL, {
+   prompt: 'Generate a short response',
+   stopSequences: [],
+ });
+
+ // Using large model
+ const largeResponse = await runtime.useModel(ModelType.TEXT_LARGE, {
+   prompt: 'Generate a detailed response',
+   stopSequences: [],
+ });
  ```
 
  ### Text-to-Speech
 
+ This plugin uses the [`transformers.js`](https://huggingface.co/docs/transformers.js) library for Text-to-Speech synthesis, running directly in the Node.js environment without external Python dependencies for this feature.
+
  ```typescript
  const audioStream = await runtime.useModel(ModelType.TEXT_TO_SPEECH, 'Text to convert to speech');
  ```
 
- ### Audio Transcription
+ **Current Implementation Details:**
 
- ```typescript
- const transcription = await runtime.useModel(ModelType.TRANSCRIPTION, audioBuffer);
- ```
+ - **Model:** By default, it uses the [`Xenova/speecht5_tts`](https://huggingface.co/Xenova/speecht5_tts) model (ONNX format), which is optimized for `transformers.js`.
+ - **Engine:** `@huggingface/transformers` library.
+ - **Speaker:** It uses a default speaker embedding for `SpeechT5`. The specific voice cannot be configured through environment variables currently.
+ - **Caching:** The ONNX model files and the default speaker embedding will be automatically downloaded and cached by `transformers.js` (typically in `~/.cache/huggingface/hub` or as configured by `transformers.js` environment variables) on first use.
 
- ### Text Generation
+ ### Text Embedding
 
  ```typescript
- // Using small model
- const smallResponse = await runtime.useModel(ModelType.TEXT_SMALL, {
-   context: 'Generate a short response',
-   stopSequences: [],
- });
-
- // Using large model
- const largeResponse = await runtime.useModel(ModelType.TEXT_LARGE, {
-   context: 'Generate a detailed response',
-   stopSequences: [],
+ const embedding = await runtime.useModel(ModelType.TEXT_EMBEDDING, {
+   text: 'Text to get embedding for',
  });
  ```
 
- ## Model Sources
+ ### Image Analysis
 
- ### 1. StudioLM (LM Studio)
+ ```typescript
+ const { title, description } = await runtime.useModel(
+   ModelType.IMAGE_DESCRIPTION,
+   'https://example.com/image.jpg'
+ );
+ ```
 
- - Local inference server for running various open models
- - Supports chat completion API similar to OpenAI
- - Configure with `USE_STUDIOLM_TEXT_MODELS=true`
- - Supports both small and medium-sized models
- - Optional embedding model support
+ ### Audio Transcription
 
- Note: The plugin validates that only one text model source is enabled at a time to prevent conflicts.
+ ```typescript
+ const transcription = await runtime.useModel(ModelType.TRANSCRIPTION, audioBuffer);
+ ```
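The Prerequisites section added in this diff notes that `ModelType.TRANSCRIPTION` fails when FFmpeg is not on PATH, so a caller may want to check for it before transcribing. Below is a minimal TypeScript sketch assuming Node.js; `isFfmpegAvailable` is a hypothetical helper rather than part of the plugin, and probing with `ffmpeg -version` is just one assumed way to detect the binary.

```typescript
import { spawnSync } from 'node:child_process';

// Hypothetical helper: probes for an FFmpeg binary by running `<command> -version`.
// The plugin's own detection logic (if any) may work differently.
function isFfmpegAvailable(command: string = 'ffmpeg'): boolean {
  const result = spawnSync(command, ['-version'], { stdio: 'ignore' });
  // `error` is set (for example ENOENT) when the executable cannot be launched at all;
  // otherwise a zero exit status indicates a working installation.
  return result.error === undefined && result.status === 0;
}

if (!isFfmpegAvailable()) {
  console.warn('FFmpeg was not found on PATH; ModelType.TRANSCRIPTION calls are expected to fail.');
}
```

Running the check once at startup keeps the failure mode explicit instead of surfacing it on the first transcription request.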