@lobehub/chat 1.134.2 → 1.134.4
- package/.github/workflows/docker-database.yml +2 -0
- package/.github/workflows/docker-pglite.yml +2 -0
- package/.github/workflows/docker.yml +2 -0
- package/.github/workflows/release.yml +6 -0
- package/.github/workflows/sync-database-schema.yml +2 -0
- package/CHANGELOG.md +50 -0
- package/changelog/v1.json +18 -0
- package/package.json +3 -2
- package/packages/database/src/models/__tests__/aiModel.test.ts +9 -0
- package/packages/database/src/models/aiModel.ts +23 -11
- package/packages/model-bank/src/types/aiModel.ts +1 -0
- package/packages/model-runtime/src/providers/google/index.ts +4 -3
- package/packages/prompts/CLAUDE.md +329 -0
- package/packages/prompts/README.md +224 -0
- package/packages/prompts/package.json +14 -1
- package/packages/prompts/promptfoo/emoji-picker/eval.yaml +170 -0
- package/packages/prompts/promptfoo/emoji-picker/prompt.ts +16 -0
- package/packages/prompts/promptfoo/knowledge-qa/eval.yaml +89 -0
- package/packages/prompts/promptfoo/knowledge-qa/prompt.ts +26 -0
- package/packages/prompts/promptfoo/language-detection/eval.yaml +65 -0
- package/packages/prompts/promptfoo/language-detection/prompt.ts +16 -0
- package/packages/prompts/promptfoo/summary-title/eval.yaml +85 -0
- package/packages/prompts/promptfoo/summary-title/prompt.ts +18 -0
- package/packages/prompts/promptfoo/translate/eval.yaml +79 -0
- package/packages/prompts/promptfoo/translate/prompt.ts +18 -0
- package/packages/prompts/promptfooconfig.yaml +35 -0
- package/packages/prompts/src/chains/__tests__/__snapshots__/answerWithContext.test.ts.snap +164 -0
- package/packages/prompts/src/chains/__tests__/__snapshots__/pickEmoji.test.ts.snap +58 -0
- package/packages/prompts/src/chains/__tests__/__snapshots__/summaryTitle.test.ts.snap +26 -0
- package/packages/prompts/src/chains/__tests__/__snapshots__/translate.test.ts.snap +22 -0
- package/packages/prompts/src/chains/__tests__/answerWithContext.test.ts +18 -63
- package/packages/prompts/src/chains/__tests__/pickEmoji.test.ts +2 -37
- package/packages/prompts/src/chains/__tests__/summaryTitle.test.ts +2 -16
- package/packages/prompts/src/chains/__tests__/translate.test.ts +1 -12
- package/packages/prompts/src/chains/answerWithContext.ts +45 -21
- package/packages/prompts/src/chains/pickEmoji.ts +20 -6
- package/packages/prompts/src/chains/summaryTitle.ts +20 -15
- package/packages/prompts/src/chains/translate.ts +8 -2
- package/src/app/[variants]/(main)/settings/provider/features/ModelList/SortModelModal/index.tsx +2 -1
- package/src/server/routers/lambda/aiModel.ts +4 -1
package/CHANGELOG.md
CHANGED
@@ -2,6 +2,56 @@
 
 # Changelog
 
+### [Version 1.134.4](https://github.com/lobehub/lobe-chat/compare/v1.134.3...v1.134.4)
+
+<sup>Released on **2025-10-05**</sup>
+
+#### 💄 Styles
+
+- **misc**: Add promptfoo to improve prompts quality.
+
+<br/>
+
+<details>
+<summary><kbd>Improvements and Fixes</kbd></summary>
+
+#### Styles
+
+- **misc**: Add promptfoo to improve prompts quality, closes [#9568](https://github.com/lobehub/lobe-chat/issues/9568) ([33874c2](https://github.com/lobehub/lobe-chat/commit/33874c2))
+
+</details>
+
+<div align="right">
+
+[](#readme-top)
+
+</div>
+
+### [Version 1.134.3](https://github.com/lobehub/lobe-chat/compare/v1.134.2...v1.134.3)
+
+<sup>Released on **2025-10-05**</sup>
+
+#### 🐛 Bug Fixes
+
+- **misc**: Type not preserved when model is sorted.
+
+<br/>
+
+<details>
+<summary><kbd>Improvements and Fixes</kbd></summary>
+
+#### What's fixed
+
+- **misc**: Type not preserved when model is sorted, closes [#9561](https://github.com/lobehub/lobe-chat/issues/9561) ([5fe2518](https://github.com/lobehub/lobe-chat/commit/5fe2518))
+
+</details>
+
+<div align="right">
+
+[](#readme-top)
+
+</div>
+
 ### [Version 1.134.2](https://github.com/lobehub/lobe-chat/compare/v1.134.1...v1.134.2)
 
 <sup>Released on **2025-10-05**</sup>
package/changelog/v1.json
CHANGED
@@ -1,4 +1,22 @@
 [
+  {
+    "children": {
+      "improvements": [
+        "Add promptfoo to improve prompts quality."
+      ]
+    },
+    "date": "2025-10-05",
+    "version": "1.134.4"
+  },
+  {
+    "children": {
+      "fixes": [
+        "Type not preserved when model is sorted."
+      ]
+    },
+    "date": "2025-10-05",
+    "version": "1.134.3"
+  },
   {
     "children": {
       "improvements": [
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@lobehub/chat",
-  "version": "1.134.
+  "version": "1.134.4",
   "description": "Lobe Chat - an open-source, high-performance chatbot framework that supports speech synthesis, multimodal, and extensible Function Call plugin system. Supports one-click free deployment of your private ChatGPT/LLM web application.",
   "keywords": [
     "framework",
@@ -379,7 +379,8 @@
   },
   "pnpm": {
     "onlyBuiltDependencies": [
-      "@vercel/speed-insights"
+      "@vercel/speed-insights",
+      "better-sqlite3"
     ],
     "overrides": {
       "mdast-util-gfm-autolink-literal": "2.0.0"
@@ -315,5 +315,14 @@ describe('AiModelModel', () => {
       expect(models[0].id).toBe('model2');
       expect(models[1].id).toBe('model1');
     });
+
+    it('should preserve model type when inserting order records', async () => {
+      const sortMap = [{ id: 'image-model', sort: 0, type: 'image' as const }];
+
+      await aiProviderModel.updateModelsOrder('openai', sortMap);
+
+      const model = await aiProviderModel.findById('image-model');
+      expect(model?.type).toBe('image');
+    });
   });
 });
@@ -223,20 +223,32 @@ export class AiModelModel {
 
   updateModelsOrder = async (providerId: string, sortMap: AiModelSortMap[]) => {
     await this.db.transaction(async (tx) => {
-      const updates = sortMap.map(({ id, sort }) => {
+      const updates = sortMap.map(({ id, sort, type }) => {
+        const now = new Date();
+        const insertValues: typeof aiModels.$inferInsert = {
+          enabled: true,
+          id,
+          providerId,
+          sort,
+          // source: isBuiltin ? 'builtin' : 'custom',
+          updatedAt: now,
+          userId: this.userId,
+        };
+
+        if (type) insertValues.type = type;
+
+        const updateValues: Partial<typeof aiModels.$inferInsert> = {
+          sort,
+          updatedAt: now,
+        };
+
+        if (type) updateValues.type = type;
+
         return tx
           .insert(aiModels)
-          .values(
-            enabled: true,
-            id,
-            providerId,
-            sort,
-            // source: isBuiltin ? 'builtin' : 'custom',
-            updatedAt: new Date(),
-            userId: this.userId,
-          })
+          .values(insertValues)
           .onConflictDoUpdate({
-            set:
+            set: updateValues,
             target: [aiModels.id, aiModels.userId, aiModels.providerId],
           });
       });
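
The change to `updateModelsOrder` amounts to copying `type` into both the insert payload and the conflict-update payload, and only when the caller supplies one. A minimal standalone sketch of that idea follows, with no database involved. `SortEntry`, `ModelRow`, and `buildUpsertValues` are hypothetical names for illustration, not identifiers from the package, and the row shape is an assumption standing in for the Drizzle-inferred insert type.

```typescript
// Hypothetical shapes for illustration; the real code uses
// `typeof aiModels.$inferInsert` inferred by Drizzle instead.
interface SortEntry {
  id: string;
  sort: number;
  type?: string;
}

interface ModelRow {
  enabled: boolean;
  id: string;
  providerId: string;
  sort: number;
  type?: string;
  updatedAt: Date;
  userId: string;
}

// Build the two payloads of an "insert, or update on conflict" for one entry.
const buildUpsertValues = (
  entry: SortEntry,
  providerId: string,
  userId: string,
  now: Date = new Date(),
): { insertValues: ModelRow; updateValues: Partial<ModelRow> } => {
  const insertValues: ModelRow = {
    enabled: true,
    id: entry.id,
    providerId,
    sort: entry.sort,
    updatedAt: now,
    userId,
  };
  const updateValues: Partial<ModelRow> = { sort: entry.sort, updatedAt: now };

  // Set `type` only when it was provided, so an entry without a type
  // does not add a `type` key to either payload.
  if (entry.type) {
    insertValues.type = entry.type;
    updateValues.type = entry.type;
  }

  return { insertValues, updateValues };
};

const withType = buildUpsertValues({ id: 'image-model', sort: 0, type: 'image' }, 'openai', 'user-1');
const withoutType = buildUpsertValues({ id: 'chat-model', sort: 1 }, 'openai', 'user-1');

console.log(withType.insertValues.type); // image
console.log('type' in withoutType.updateValues); // false
```

Keeping the payload construction separate from the query call makes this behaviour checkable without a database, which is what the new test in `aiModel.test.ts` verifies end to end.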
@@ -75,9 +75,10 @@ const getThinkingModelCategory = (model?: string): ThinkingModelCategory => {
   const normalized = model.toLowerCase();
 
   if (normalized.includes('robotics-er-1.5-preview')) return 'robotics';
-  if (normalized.includes('-2.5-flash-lite')
-
-  if (normalized.includes('-2.5-
+  if (normalized.includes('-2.5-flash-lite') || normalized.includes('flash-lite-latest'))
+    return 'flashLite';
+  if (normalized.includes('-2.5-flash') || normalized.includes('flash-latest')) return 'flash';
+  if (normalized.includes('-2.5-pro') || normalized.includes('pro-latest')) return 'pro';
 
   return 'other';
 };
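
The order of the substring checks in this hunk matters: `'-2.5-flash-lite'` also contains `'-2.5-flash'`, so the lite check has to run first. The sketch below restates the added lines as a self-contained function to show how the new `*-latest` aliases are classified. The `ThinkingModelCategory` union and the early return for a missing model are assumptions inferred from the visible return values, and the sample model ids are illustrative only.

```typescript
// Assumed union, inferred from the values returned in the hunk above.
type ThinkingModelCategory = 'robotics' | 'flashLite' | 'flash' | 'pro' | 'other';

const getThinkingModelCategory = (model?: string): ThinkingModelCategory => {
  // Assumption: a missing or empty model id is treated as 'other'.
  if (!model) return 'other';

  const normalized = model.toLowerCase();

  if (normalized.includes('robotics-er-1.5-preview')) return 'robotics';
  // Must run before the 'flash' check, because '-2.5-flash-lite' contains '-2.5-flash'.
  if (normalized.includes('-2.5-flash-lite') || normalized.includes('flash-lite-latest'))
    return 'flashLite';
  if (normalized.includes('-2.5-flash') || normalized.includes('flash-latest')) return 'flash';
  if (normalized.includes('-2.5-pro') || normalized.includes('pro-latest')) return 'pro';

  return 'other';
};

// Illustrative ids only.
const samples = [
  'gemini-2.5-flash-lite',
  'gemini-flash-lite-latest',
  'gemini-flash-latest',
  'gemini-pro-latest',
  'gemini-2.0-flash',
];
for (const id of samples) console.log(id, '->', getThinkingModelCategory(id));
```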
@@ -0,0 +1,329 @@
+# Prompt Engineering Guide for @lobechat/prompts
+
+This document provides guidelines and best practices for optimizing LobeChat prompts with Claude Code.
+
+## Prompt Optimization Workflow
+
+### 1. Run Tests and Identify Problems
+
+```bash
+# Run the tests for a specific prompt
+pnpm promptfoo eval -c promptfoo/<prompt-name>/eval.yaml
+
+# View the details of failed tests
+pnpm promptfoo eval -c promptfoo/<prompt-name>/eval.yaml 2>&1 | grep -A 20 "FAIL"
+```
+
+**What to look for:**
+
+- Failure rate and failure patterns
+- Behavioral differences between models
+- Specific failure reasons (from the llm-rubric evaluation)
+
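
To compare failure rates between models, the per-test results can be grouped by provider. The following is a minimal sketch under stated assumptions: `EvalResult` is a hypothetical, simplified record, not promptfoo's actual output schema, and `failureRateByProvider` is an illustrative helper name.

```typescript
// Hypothetical, simplified result record; promptfoo's real output has more fields.
interface EvalResult {
  provider: string;
  pass: boolean;
}

// Returns, for each provider, the fraction of its tests that failed.
const failureRateByProvider = (results: EvalResult[]): Map<string, number> => {
  const totals = new Map<string, { failed: number; total: number }>();

  results.forEach(({ provider, pass }) => {
    const entry = totals.get(provider) ?? { failed: 0, total: 0 };
    entry.total += 1;
    if (!pass) entry.failed += 1;
    totals.set(provider, entry);
  });

  const rates = new Map<string, number>();
  totals.forEach(({ failed, total }, provider) => rates.set(provider, failed / total));
  return rates;
};

const rates = failureRateByProvider([
  { provider: 'model-a', pass: true },
  { provider: 'model-a', pass: false },
  { provider: 'model-b', pass: true },
  { provider: 'model-b', pass: true },
]);

console.log(rates.get('model-a'), rates.get('model-b')); // 0.5 0
```

A provider whose rate stands well apart from the others points to a model-specific behavior rather than a problem with the prompt as a whole.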
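+### 2. Analyze Failure Causes
+
+**Common problem patterns:**
+
+- **Output format issues**: the model adds unwanted explanations or context
+- **Language confusion**: the wrong language is used in multilingual scenarios
+- **Over- or under-translation**: technical terms are translated or preserved inappropriately
+- **Context understanding**: the model misjudges when to use or ignore the context
+- **Consistency issues**: behavior differs across models
+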
+### 3. Update the Prompt
+
+**Optimization strategies:**
+
+#### Use English Prompts
+
+```typescript
+// ❌ Bad - Chinese prompts are easily confused in multilingual scenarios
+content: '你是一名翻译助手,请将内容翻译为...';
+
+// ✅ Good - English prompts are more general-purpose
+content: 'You are a translation assistant. Translate the content to...';
+```
+
+#### Specify Output Requirements Explicitly
+
+```typescript
+// ❌ Bad - vague instruction
+content: 'Please translate the text';
+
+// ✅ Good - specific rules
+content: `Translate the text.
+
+Rules:
+- Output ONLY the translated text, no explanations
+- Preserve technical terms exactly as they appear
+- No additional commentary`;
+```
+
+#### Use Examples to Guide Behavior
+
+```typescript
+// ✅ Provide concrete examples
+content: `Select an emoji for the content.
+
+Examples:
+- "I got a promotion" → 🎉
+- "Code wizard" → 🧙‍♂️
+- "Business plan" → 🚀`;
+```
+
+#### Use MUST/SHOULD/MAY to Express Priority
+
+```typescript
+// ✅ Clear priorities
+content: `Answer based on context.
+
+Rules:
+- MUST use context information as foundation
+- SHOULD supplement with general knowledge
+- MAY provide additional examples`;
+```
+
+### 4. Iterate and Verify
+
+Re-run the tests after every change:
+
+```bash
+pnpm promptfoo eval -c promptfoo/<prompt-name>/eval.yaml
+```
+
+**Goals:**
+
+- Each optimization round should raise the pass rate by 5-10%
+- Reaching 100% usually takes 3-5 iterations
+- Pay attention to consistency across models
+
+## Prompt Pattern Library
+
+### Translation
+
+```typescript
+export const chainTranslate = (content: string, targetLang: string) => ({
+  messages: [
+    {
+      content: `You are a professional translator. Translate to ${targetLang}.
+
+Rules:
+- Output ONLY the translated text, no explanations
+- Preserve technical terms, code identifiers, API keys exactly
+- Maintain original formatting
+- Use natural, idiomatic expressions`,
+      role: 'system',
+    },
+    {
+      content,
+      role: 'user',
+    },
+  ],
+});
+```
+
+**Key points:**
+
+- Use an English system prompt
+- State "output only the translated text" explicitly
+- List the kinds of content that must be preserved
+
+### Knowledge Base Q&A
+
+```typescript
+export const chainAnswerWithContext = ({ context, question }) => {
+  const hasContext = context.filter((c) => c.trim()).length > 0;
+
+  return {
+    messages: [
+      {
+        content: hasContext
+          ? `Answer based on provided context.
+
+Rules:
+- If context is COMPLETELY DIFFERENT topic: state this and do NOT answer
+- If context is related (even if limited):
+  * MUST use context as foundation
+  * SHOULD supplement with general knowledge
+  * For "how to" questions, provide actionable steps
+  * Example: Context about "Docker containerization" + "How to deploy?"
+    → Explain deployment steps using your knowledge`
+          : `Answer using your knowledge.`,
+        role: 'user',
+      },
+    ],
+  };
+};
+```
+
+**Key points:**
+
+- Distinguish "no context" from "irrelevant context"
+- State clearly when general knowledge may be added
+- Give concrete examples of the expected behavior
+
+### Emoji Picker
+
+```typescript
+export const chainPickEmoji = (content: string) => ({
+  messages: [
+    {
+      content: `You are an emoji expert.
+
+Rules:
+- Output ONLY a single emoji (1-2 characters)
+- Focus on CONTENT meaning, not language
+- Prioritize topic-specific emojis over generic emotions
+- For work/projects, use work-related emojis not cultural symbols`,
+      role: 'system',
+    },
+    { content: 'I got a promotion', role: 'user' },
+    { content: '🎉', role: 'assistant' },
+    { content, role: 'user' },
+  ],
+});
+```
+
+**Key points:**
+
+- Use examples to guide behavior
+- Make priorities explicit (topic > emotion)
+- Avoid confusion with cultural symbols
+
+### Summary Title
+
+```typescript
+export const chainSummaryTitle = (messages, locale) => ({
+  messages: [
+    {
+      content: `Generate a concise title.
+
+Rules:
+- Maximum 10 words
+- Maximum 50 characters
+- No punctuation marks
+- Use language: ${locale}
+- Keep it short and to the point`,
+      role: 'system',
+    },
+    {
+      content: messages.map((m) => `${m.role}: ${m.content}`).join('\n'),
+      role: 'user',
+    },
+  ],
+});
+```
+
+**Key points:**
+
+- Limit both word count and character count
+- Specify the output language explicitly
+- Keep the rules short and clear
+
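
The title rules (at most 10 words, at most 50 characters, no punctuation) can also be checked mechanically on a model's output. The sketch below is one way to do that. `checkTitle` is a hypothetical helper and not part of `@lobechat/prompts`; the punctuation set is a small ASCII subset chosen for illustration, and whitespace-based word counting is an assumption that does not suit languages written without spaces.

```typescript
// Hypothetical helper, not part of @lobechat/prompts.
// Returns a list of rule violations; an empty list means the title passes.
const checkTitle = (title: string): string[] => {
  const problems: string[] = [];

  // Whitespace-separated words; unspaced scripts such as Chinese count as one word.
  const words = title.trim().split(/\s+/).filter(Boolean);
  if (words.length > 10) problems.push('more than 10 words');

  // `length` counts UTF-16 code units, which is an approximation of "characters".
  if (title.length > 50) problems.push('more than 50 characters');

  // Small ASCII punctuation subset, for illustration only.
  if (/[.,;:!?"'()]/.test(title)) problems.push('contains punctuation');

  return problems;
};

console.log(checkTitle('Deploying Docker containers')); // []
console.log(checkTitle('Hello, world!')); // [ 'contains punctuation' ]
```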
+## Testing Strategy
+
+### Multilingual Testing
+
+Each prompt should be tested in at least 3-5 languages:
+
+```yaml
+tests:
+  # English
+  - vars:
+      content: 'Hello, how are you?'
+  # Chinese
+  - vars:
+      content: '你好,你好吗?'
+  # Spanish
+  - vars:
+      content: 'Hola, ¿cómo estás?'
+```
+
+### Edge Cases
+
+```yaml
+tests:
+  # Empty input
+  - vars:
+      content: ''
+  # Technical terms
+  - vars:
+      content: 'API_KEY_12345'
+  # Mixed languages
+  - vars:
+      content: '使用 React 开发'
+  # Irrelevant context
+  - vars:
+      context: 'Machine learning...'
+      query: 'Explain blockchain'
+```
+
+### Assertion Types
+
+```yaml
+assert:
+  # LLM-as-judge
+  - type: llm-rubric
+    provider: openai:gpt-5-mini
+    value: 'Should translate accurately without extra commentary'
+
+  # Inclusion check
+  - type: contains-any
+    value: ['React', 'JavaScript']
+
+  # Exclusion check
+  - type: not-contains
+    value: 'explanation'
+```
+
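
The two string assertions above can be read as plain substring checks on the model output. The sketch below shows that reading. It assumes case-sensitive matching and is not promptfoo's implementation; `containsAny` and `notContains` are illustrative names. The `llm-rubric` assertion has no local equivalent because it asks a grading model to judge the output.

```typescript
// Sketch of the checks only; promptfoo's own implementation may differ in details.

// Passes when the output contains at least one of the listed values.
const containsAny = (text: string, values: string[]): boolean =>
  values.some((value) => text.includes(value));

// Passes when the output does not contain the given value.
const notContains = (text: string, value: string): boolean => !text.includes(value);

const sampleOutput = 'Build the UI with React';

console.log(containsAny(sampleOutput, ['React', 'JavaScript'])); // true
console.log(notContains(sampleOutput, 'explanation')); // true
```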
+## FAQ
+
+### Q: How do I handle behavioral differences between models?
+
+A: Use more explicit instructions and examples. If a particular model keeps failing, consider:
+
+1. Adding examples specific to that model
+2. Using stronger instructions (MUST rather than SHOULD)
+3. Addressing that scenario explicitly in the prompt
+
+### Q: When should I use Chinese vs. English prompts?
+
+A:
+
+- **English**: multilingual scenarios, technical content, cross-model consistency
+- **Chinese**: purely Chinese input and output, Chinese-specific language understanding tasks
+
+### Q: How do I reach a 100% pass rate?
+
+A: Follow the iteration loop:
+
+1. Run tests → 2. Analyze failures → 3. Update the prompt → 4. Re-test
+
+- This usually takes 3-5 rounds
+- Focus on the edge cases in the last 5%
+- Consider adjusting the test assertions if they are too strict
+
+### Q: When should I change the test instead of the prompt?
+
+A: When:
+
+- The test expectation is unreasonable (e.g. it asks the model to do something it cannot do)
+- The assertion is too strict (e.g. an exact match on specific words)
+- Multiple models answer in different but reasonable ways
+
+## Best Practices Summary
+
+1. **Use English system prompts** for better cross-language consistency
+2. **Specify the output format**: "Output ONLY...", "No explanations"
+3. **Use examples** to guide model behavior
+4. **Layer the rules**: MUST > SHOULD > MAY
+5. **Be specific**: enumerate concrete cases rather than describing them abstractly
+6. **Iterate and verify**: take small, quick steps and fix one problem at a time
+7. **Test across models**: test at least 3 different models
+8. **Version control**: record the reason for and the result of each optimization
+
+## References
+
+- [promptfoo documentation](https://promptfoo.dev)
+- [OpenAI Prompt Engineering Guide](https://platform.openai.com/docs/guides/prompt-engineering)
+- [Anthropic Prompt Engineering](https://docs.anthropic.com/claude/docs/prompt-engineering)