npm - @nocobase/plugin-ai - Versions diffs - 2.1.0-alpha.34 → 2.1.0-alpha.35 - Mend

@nocobase/plugin-ai 2.1.0-alpha.34 → 2.1.0-alpha.35

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (103)

package/dist/ai/docs/nocobase/ai-employees/scenarios/localization-hy-mt.md ADDED

@@ -0,0 +1,241 @@
+---
+pkg: '@nocobase/plugin-ai'
+title: 'Use Lina and local HY-MT to translate localization entries'
+description: 'Deploy the HY-MT1.5 GGUF translation model with llama-server and configure it for Lina to batch translate NocoBase localization entries.'
+keywords: 'Lina,localization,HY-MT,GGUF,llama-server,OpenAI compatible,AI translation,NocoBase'
+---
+# Use Lina and local HY-MT1.5-1.8B to translate localization entries
+This guide describes a localization translation practice: deploy a translation-specific small model locally, expose it as an OpenAI-compatible service, and configure it for Lina to translate localization entries in batches.
+This approach is suitable for translating many system entries, plugin text, menus, collection titles, and field labels. Compared with online models, local models are not affected by external API RPM, TPM, or concurrency limits, and concurrency can be tuned according to machine and model capability.
+## Overview
+This guide uses:
+- Model: `tencent/HY-MT1.5-1.8B-GGUF`
+- Inference service: `llama-server`
+- Integration: OpenAI-compatible API
+- AI Employee: Lina
+- Entry point: Localization Management page
+:::info{title=Note}
+HY-MT1.5-1.8B is a translation-specific small model. It is more suitable for short entries, UI text, and batch translation. General chat models are not recommended as the first choice for localization tasks.
+:::
+## Prerequisites
+Before starting, prepare:
+- The **Localization Management** plugin is enabled.
+- Target language is enabled.
+- Localization entries have been synchronized.
+- The local machine or server can run [`llama-server`](https://github.com/ggml-org/llama.cpp).
+- The NocoBase service can access the HTTP address of `llama-server`.
+## Deploy HY-MT GGUF
+### Install llama.cpp
+On macOS, you can install it with Homebrew:
+```bash
+brew install llama.cpp
+```
+You can also use a prebuilt llama.cpp binary or build it from source. The final requirement is that `llama-server` is available.
+### Start an OpenAI-compatible service
+Start the service with the GGUF model from Hugging Face:
+```bash
+llama-server \
+  -hf tencent/HY-MT1.5-1.8B-GGUF:Q4_K_M \
+  --host 0.0.0.0 \
+  --port 8000 \
+  -c 2048 \
+  -np 4
+```
+| Parameter | Description |
+| --- | --- |
+| `-hf` | Load the model from Hugging Face. |
+| `--host` | Listening address. Use `127.0.0.1` for local testing or `0.0.0.0` for container or remote access. |
+| `--port` | HTTP service port. |
+| `-c` | Context length. Localization entries are usually short, so `2048` is usually enough. |
+| `-np` | Number of parallel slots. Adjust according to machine performance. |
+:::info{title=Tip}
+If server resources are limited, start with `-np 1` or `-np 2`, then increase gradually after verifying stability.
+:::
+## Test the Model Service
+After `llama-server` starts, check service health:
+```bash
+curl http://127.0.0.1:8000/health
+```
+Then test translation through the OpenAI-compatible API:
+```bash
+curl http://127.0.0.1:8000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "tencent/HY-MT1.5-1.8B-GGUF:Q4_K_M",
+    "messages": [
+      {
+        "role": "user",
+        "content": "Translate the following text into Chinese. Output only the translated result without any additional explanation:\n\nSave"
+      }
+    ]
+  }'
+```
+If you start from a local model file, change `model` to the actual model name returned or configured by the service.
+:::warning{title=Note}
+If a request does not respond for a long time, the model may be too slow, concurrency may be too high, or context may be too large. Lower `-np` and NocoBase translation concurrency first, then observe response time.
+:::
+## Configure an LLM Service in NocoBase
+Go to `System Settings -> AI Employees -> LLM service` and add an LLM service.
+Example configuration:
+| Setting | Example |
+| --- | --- |
+| Provider | OpenAI (completions) |
+| Title | HY-MT Local |
+| Base URL | `http://127.0.0.1:8000/v1` |
+| API Key | If llama-server has no authentication, use a placeholder such as `dummy`. |
+| Enabled Models | Select `tencent/HY-MT1.5-1.8B-GGUF:Q4_K_M`, or enter the actual model name. |
+After configuration, use `Test flight` to verify the model.
+:::info{title=Tip}
+If NocoBase runs in Docker, `127.0.0.1` points to the container itself and may not access the host service. Use the host IP, container network address, or `host.docker.internal`.
+:::
+## Configure Lina's Dedicated Model
+Go to `System Settings -> AI Employees -> AI employees`, open Lina, and switch to `Model settings`.
+1. Enable `Enable dedicated model configuration`.
+2. Select the HY-MT local model in `Models`.
+3. Save the configuration.
+After this, Lina uses this model for localization translation tasks, preventing users or tasks from switching to general chat models.
+For details, see [Configure AI Employee Models](/ai-employees/features/model-settings).
+## Configure Translation Concurrency
+Localization translation task concurrency is controlled by `AI_LOCALIZATION_CONCURRENCY`:
+```bash
+AI_LOCALIZATION_CONCURRENCY=10
+```
+Rules:
+- Default: `10`
+- Minimum: `1`
+- Maximum: `20`
+- Values outside the range use the default
+The best concurrency depends on CPU, GPU, memory, model quantization, and `llama-server -np`. If the default concurrency causes issues:
+1. Start with `AI_LOCALIZATION_CONCURRENCY=1` and verify single-entry translation.
+2. Set both `llama-server -np` and `AI_LOCALIZATION_CONCURRENCY` to `2` or `4`.
+3. Observe response time, CPU/GPU usage, and task progress.
+4. Increase concurrency gradually only if stable.
+:::warning{title=Note}
+Do not set concurrency too high at the beginning. If concurrency exceeds actual model capacity, tasks may become slower because of queuing, timeouts, or service stalls.
+:::
+## Execute Localization Translation
+Go to `System Management -> Localization Management`.
+1. Switch to the target language.
+2. Click `Synchronize` to ensure entries are synchronized.
+3. Click Lina's avatar.
+4. Choose a task scope:
+   - `Incremental translation`: translate entries without translations.
+   - `Selected translation`: translate selected entries in the table.
+   - `Full translation`: translate all entries in the current language.
+5. Check entry count, provider, and model in the confirmation dialog.
+6. Confirm to create the async task.
+7. Wait for completion, review translations, and publish.
+Start with `Selected translation` for a few entries to verify output style and speed before running incremental or full translation.
+## How Lina Builds Translation Requests
+Lina builds requests from entries and reference translations. For short entries, existing references are used to improve consistency:
+- Built-in entries prefer Chinese translations as references.
+- Non-built-in entries prefer the system default language as references.
+- If an English reference exists, English is used as source text.
+- Translation results are written to the target language but are not published automatically.
+Prompt semantics are similar to:
+```text
+Refer to the following translation:
+{source_term} is translated as {target_term}
+Translate the following text into {target_language}. Output only the translated result without any additional explanation:
+{source_text}
+```
+## Troubleshooting
+### No progress after creating a task
+Check whether `llama-server` received requests. View service logs or call `/v1/chat/completions` with `curl`.
+If the model receives requests but does not return, reduce:
+- `AI_LOCALIZATION_CONCURRENCY`
+- `llama-server -np`
+- `llama-server -c`
+### The model returns explanations instead of translations
+Local translation models are usually more stable than general chat models. If explanations still appear, test the same prompt with `curl` first to verify the model's output style.
+You can also translate shorter entries first or reduce sampling parameters such as temperature.
+### NocoBase cannot connect to the model service
+Check:
+- Whether Base URL includes `/v1`.
+- Whether the NocoBase runtime environment can access the address.
+- Whether firewall or container networking blocks the port.
+- Whether `llama-server` is still running.
+## Review Before Publishing
+After AI translation finishes, review before publishing:
+- Filter by module and check short entries such as menus, buttons, field names, and statuses.
+- Check variables, placeholders, HTML tags, and formatting symbols.
+- Check key business terminology consistency.
+- If built-in entry translations are overwritten, resynchronize in Localization Management and select `Reset system built-in entry translations` to restore defaults. To contribute default translations for the system and official plugins, see [Translation Contribution](/get-started/translations).
+- Publish in a test environment first, then sync to production.
+## References
+- [tencent/HY-MT1.5-1.8B-GGUF](https://huggingface.co/tencent/HY-MT1.5-1.8B-GGUF)
+- [llama-server documentation](https://www.mintlify.com/ggml-org/llama.cpp/inference/server)
+- [Lina: Localization Engineer](/ai-employees/built-in/lina)
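Note on `localization-hy-mt.md`: the "Configure Translation Concurrency" section in the added file states that `AI_LOCALIZATION_CONCURRENCY` defaults to `10`, accepts values from `1` to `20`, and that values outside the range use the default. The sketch below illustrates that rule only; it is not the plugin's implementation. The function name `resolve_localization_concurrency` is hypothetical, and treating missing or non-integer values like out-of-range values is an assumption that the diff does not confirm.

```python
import os

# Values taken from the "Rules" list in the added guide.
DEFAULT_CONCURRENCY = 10
MIN_CONCURRENCY = 1
MAX_CONCURRENCY = 20


def resolve_localization_concurrency(raw):
    """Return the effective concurrency for a raw environment value.

    Hypothetical helper: it mirrors the documented rule that values
    outside 1..20 use the default instead of being clamped.
    """
    if raw is None:
        # Variable not set: use the documented default.
        return DEFAULT_CONCURRENCY
    try:
        value = int(str(raw).strip())
    except ValueError:
        # Assumption: unparsable input is handled like an out-of-range value.
        return DEFAULT_CONCURRENCY
    if MIN_CONCURRENCY <= value <= MAX_CONCURRENCY:
        return value
    return DEFAULT_CONCURRENCY


if __name__ == "__main__":
    raw_value = os.environ.get("AI_LOCALIZATION_CONCURRENCY")
    print(resolve_localization_concurrency(raw_value))
```

Under this reading, a setting such as `AI_LOCALIZATION_CONCURRENCY=50` would run with `10` rather than `20`, which is worth keeping in mind when matching the value to `llama-server -np`.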

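Note on `localization-hy-mt.md`: the "How Lina Builds Translation Requests" section in the added file gives a prompt template that it describes as "similar to" what Lina sends. The sketch below fills in that template as a rough illustration. The function name `build_translation_prompt`, the one-line-per-reference layout, and omitting the reference header when no reference exists are all assumptions; the diff does not show how the plugin assembles the request.

```python
def build_translation_prompt(source_text, target_language, references=()):
    """Fill in the prompt template quoted in the guide.

    Hypothetical helper. `references` is an iterable of
    (source_term, target_term) pairs taken from existing translations.
    """
    lines = []
    if references:
        # Assumption: the reference block is skipped when there is nothing to cite.
        lines.append("Refer to the following translation:")
        for source_term, target_term in references:
            lines.append(f"{source_term} is translated as {target_term}")
    lines.append(
        f"Translate the following text into {target_language}. "
        "Output only the translated result without any additional explanation:"
    )
    lines.append(source_text)
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_translation_prompt("Save", "Chinese", [("Submit", "提交")]))
```

The returned string could be sent as the `content` of the single `user` message in the `curl` test from the guide, which makes it easy to compare the model's output style with and without a reference translation.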
package/dist/ai/docs/nocobase/ai-employees/workflow/nodes/employee/configuration.md CHANGED

@@ -18,7 +18,7 @@ You can refer to the following documents:
 - [Workflow](/workflow)
 - [Configure LLM Service](/ai-employees/features/llm-service)
-- [Built-in AI Employees](/ai-employees/features/built-in-employee)
+- [Built-in AI Employees](/ai-employees/built-in/)
 - [New AI Employee](/ai-employees/features/new-ai-employees)
 ### Task

package/dist/ai/docs/nocobase/cluster-mode/index.md CHANGED

@@ -11,6 +11,10 @@ keywords: "cluster mode,multi-instance,load balancing,shared storage,Redis,Kuber
 Starting from v1.6.0, NocoBase supports running applications in cluster mode. When an application runs in cluster mode, it can improve its performance in handling concurrent access by using multiple instances and a multi-core mode.
+Based on cluster mode, you can achieve application-level high availability: traffic is distributed by a load balancer across multiple NocoBase instances within the same cluster, so if a single instance fails, restarts, or is being released, other instances can continue serving traffic. In practice, a single cluster should usually be deployed within the same low-latency network environment.
+It is important to note that NocoBase cluster mode addresses horizontal scaling and high availability of application instances at the application layer. If you need warm standby or disaster recovery across availability zones or regions, you would typically deploy multiple independent clusters, and the operations team would be responsible for the data replication and switchover strategy for the database, shared storage, and other infrastructure.
 ## System Architecture
@@ -33,4 +37,4 @@ This document only introduces the basic concepts and components of NocoBase's cl
   - [Operations](./operations)
 - Advanced
   - [Service Splitting](./services-splitting)
-- [Development Reference](./development)
+- [Development Reference](./development)

package/dist/ai/docs/nocobase/cluster-mode/preparations.md CHANGED

@@ -25,12 +25,14 @@ First, please ensure you have obtained licenses for the above plugins (you can p
 ## System Components
-Other system components, besides the application instance itself, can be selected by operations personnel based on the team's operational needs.
+In addition to the application instances themselves, cluster deployment also requires system components such as the database, middleware, shared storage, and load balancing. Different teams can choose the specific implementation of these components based on their own operating model.
 ### Database
 Since the current cluster mode only targets application instances, the database temporarily supports only a single node. If you have a database architecture like master-slave, you need to implement it yourself through middleware and ensure it is transparent to the NocoBase application.
+If you need warm standby or disaster recovery across availability zones or regions, the database synchronization and switchover strategy must be designed and implemented by your operations team.
 ### Middleware
 NocoBase's cluster mode relies on some middleware to achieve inter-cluster communication and coordination, including:
@@ -49,10 +51,37 @@ When all middleware components use Redis, you can start a single Redis service w
 ### Shared Storage
-NocoBase needs to use the storage directory to store system-related files. In multi-node mode, you should mount a cloud disk (or NFS) to support shared access across multiple nodes. Otherwise, local storage will not be automatically synchronized, and it will not function properly.
+NocoBase needs to use the `storage` directory to store system-related files, and shared storage is also a required component of cluster deployment. In multi-node mode, you can choose different implementations based on your infrastructure environment, such as cloud disks, NFS, or EFS, to support shared access across multiple nodes. Otherwise, system files will not be synchronized automatically and the application will not work properly.
 When deploying with Kubernetes, please refer to the [Kubernetes Deployment: Shared Storage](./kubernetes#shared-storage) section.
+#### What is typically stored in the `storage` directory
+The contents of the `storage` directory vary depending on the enabled plugins and the deployment method. Based on the current implementation, common contents include:
+| Path | Purpose | Usage recommendation |
+| --- | --- | --- |
+| `storage/uploads` | Uploaded files when using local storage mode | In production clusters, prefer object storage such as S3 / OSS / COS |
+| `storage/plugins` | Local plugin packages installed, uploaded, or discovered at runtime | If you rely on local plugins, this directory must be shared; if plugins are built into the image, this dependency can be reduced |
+| `storage/apps/<app>/jwt_secret.dat` | Default token secret generated automatically when `APP_KEY` is not explicitly configured | Do not rely on this file in production; explicitly configure `APP_KEY` instead |
+| `storage/apps/<app>/aes_key.dat` | Default AES key generated automatically when `APP_AES_SECRET_KEY` is not explicitly configured | Do not rely on this file in production; explicitly configure `APP_AES_SECRET_KEY` instead |
+| `storage/environment-variables/<app>/aes_key.dat` | AES key file used in environment-variable plugin scenarios | A read-only mounted key file is recommended |
+| `storage/logs` | Default log directory and some migration logs | It is recommended to integrate with an external logging platform in the future |
+| `storage/tmp` | Temporary files for import, export, migration, etc. | It can be temporary, but if it needs to be reused across nodes, it must be shared, or the operation should be fixed to a single management node |
+| `storage/backups`, `storage/duplicator`, `storage/migration-manager` | Artifacts related to backup, restore, and migration | These should be treated as operations directories, stored persistently, and not modified concurrently across multiple nodes |
+The table above is not exhaustive, but it illustrates an important point: `storage` mixes business files, secret files, plugin directories, logs, and operations-related temporary artifacts. Therefore, in cluster deployment, the baseline is usually to persist and share the entire `/app/nocobase/storage`.
+#### Storage recommendations
+Cluster consistency in NocoBase mainly relies on the database, Redis, message queues, and distributed locks, rather than treating shared file systems as a high-concurrency coordination medium.
+Therefore, the following is recommended:
+- For high-frequency business files such as attachments, prefer object storage. In production clusters, long-term reliance on local storage is not recommended.
+- Shared storage should mainly be used to host the `storage` directory, rather than as a high-throughput file storage service.
+- Operations such as plugin installation, plugin upgrade, backup, restore, and migration should be performed only after scaling the cluster down to a single node, and the cluster can be scaled out again after completion.
 ### Load Balancing
 Cluster mode requires a load balancer to distribute requests, as well as for health checks and failover of application instances. This part should be selected and configured according to the team's operational needs.
@@ -61,7 +90,6 @@ Taking a self-hosted Nginx as an example, add the following content to the confi
 ```
 upstream myapp {
-    # ip_hash; # Can be used for session persistence. When enabled, requests from the same client are always sent to the same backend server.
     server 172.31.0.1:13000; # Internal node 1
     server 172.31.0.2:13000; # Internal node 2
     server 172.31.0.3:13000; # Internal node 3
@@ -82,10 +110,37 @@ This means that requests are reverse-proxied and distributed to different server
 For load balancing middleware provided by other cloud service providers, please refer to the configuration documentation provided by the specific provider.
+For high-availability deployments, the following is recommended:
+- Run at least 2 application instances within the same cluster, and let the load balancer handle instance failover.
+- The health check of the load balancer should reflect actual application availability, not just whether the port is open.
+- If you need warm standby across availability zones or regions, you would typically deploy multiple independent clusters, and the operations team would be responsible for synchronizing and switching the database, shared storage, and other infrastructure.
 ## Environment Variable Configuration
 All nodes in the cluster should use the same environment variable configuration. In addition to NocoBase's basic [environment variables](../api/app/env), the following middleware-related environment variables also need to be configured.
+### Key Secrets
+In addition to the middleware environment variables, all nodes in the cluster should also explicitly configure the same key secrets:
+```ini
+APP_KEY=
+APP_AES_SECRET_KEY=
+# Or use a read-only mounted key file
+# APP_AES_SECRET_KEY_PATH=
+```
+- `APP_KEY` is used for token / JWT signing. If it is not explicitly configured, the application falls back to the default secret file under `storage`.
+- `APP_AES_SECRET_KEY` is used to decrypt sensitive fields in the database. If it is not explicitly configured, the application also falls back to the default secret file under `storage`.
+- In ephemeral containers or multi-node deployments, relying on automatically generated local secret files can cause tokens to become invalid after restart, or historical encrypted data to become undecryptable.
+:::info{title=Tip}
+`APP_AES_SECRET_KEY` must be a 32-byte AES-256 key, represented by 64 hexadecimal characters.
+In cloud environments, it is recommended to manage these values centrally through services such as Secrets Manager, SSM Parameter Store, Kubernetes Secret, or a read-only mounted key file.
+:::
 ### Multi-core Mode
 When the application runs on a multi-core node, you can enable the node's multi-core mode: