@lobehub/chat 1.2.12 → 1.2.14

Files changed (33)
  1. package/CHANGELOG.md +50 -0
  2. package/docs/usage/tools-calling/anthropic.mdx +185 -1
  3. package/docs/usage/tools-calling/anthropic.zh-CN.mdx +14 -19
  4. package/docs/usage/tools-calling/google.mdx +116 -1
  5. package/docs/usage/tools-calling/google.zh-CN.mdx +104 -3
  6. package/docs/usage/tools-calling/moonshot.mdx +1 -0
  7. package/docs/usage/tools-calling/moonshot.zh-CN.mdx +24 -0
  8. package/docs/usage/tools-calling/openai.mdx +139 -1
  9. package/docs/usage/tools-calling/openai.zh-CN.mdx +0 -694
  10. package/docs/usage/tools-calling.zh-CN.mdx +15 -14
  11. package/locales/ar/setting.json +1 -1
  12. package/locales/bg-BG/setting.json +1 -1
  13. package/locales/de-DE/setting.json +1 -1
  14. package/locales/en-US/setting.json +1 -1
  15. package/locales/es-ES/setting.json +1 -1
  16. package/locales/fr-FR/setting.json +1 -1
  17. package/locales/it-IT/setting.json +1 -1
  18. package/locales/ja-JP/setting.json +1 -1
  19. package/locales/ko-KR/setting.json +1 -1
  20. package/locales/nl-NL/setting.json +1 -1
  21. package/locales/pl-PL/setting.json +1 -1
  22. package/locales/pt-BR/setting.json +1 -1
  23. package/locales/ru-RU/setting.json +1 -1
  24. package/locales/tr-TR/setting.json +1 -1
  25. package/locales/vi-VN/setting.json +1 -1
  26. package/locales/zh-CN/setting.json +1 -1
  27. package/locales/zh-TW/setting.json +1 -1
  28. package/package.json +30 -30
  29. package/src/features/AgentSetting/AgentModal/index.tsx +4 -1
  30. package/src/libs/agent-runtime/google/index.test.ts +2 -2
  31. package/src/locales/default/setting.ts +1 -1
  32. package/src/services/__tests__/chat.test.ts +136 -1
  33. package/src/services/chat.ts +44 -2
package/CHANGELOG.md CHANGED
@@ -2,6 +2,56 @@
2
2
 
3
3
  # Changelog
4
4
 
5
+ ### [Version 1.2.14](https://github.com/lobehub/lobe-chat/compare/v1.2.13...v1.2.14)
6
+
7
+ <sup>Released on **2024-07-08**</sup>
8
+
9
+ #### 💄 Styles
10
+
11
+ - **misc**: Provider changes with model in model settings.
12
+
13
+ <br/>
14
+
15
+ <details>
16
+ <summary><kbd>Improvements and Fixes</kbd></summary>
17
+
18
+ #### Styles
19
+
20
+ - **misc**: Provider changes with model in model settings, closes [#3146](https://github.com/lobehub/lobe-chat/issues/3146) ([e53bb5a](https://github.com/lobehub/lobe-chat/commit/e53bb5a))
21
+
22
+ </details>
23
+
24
+ <div align="right">
25
+
26
+ [![](https://img.shields.io/badge/-BACK_TO_TOP-151515?style=flat-square)](#readme-top)
27
+
28
+ </div>
29
+
30
+ ### [Version 1.2.13](https://github.com/lobehub/lobe-chat/compare/v1.2.12...v1.2.13)
31
+
32
+ <sup>Released on **2024-07-07**</sup>
33
+
34
+ #### 🐛 Bug Fixes
35
+
36
+ - **misc**: Fix tool message order.
37
+
38
+ <br/>
39
+
40
+ <details>
41
+ <summary><kbd>Improvements and Fixes</kbd></summary>
42
+
43
+ #### What's fixed
44
+
45
+ - **misc**: Fix tool message order, closes [#3155](https://github.com/lobehub/lobe-chat/issues/3155) ([6171b2a](https://github.com/lobehub/lobe-chat/commit/6171b2a))
46
+
47
+ </details>
48
+
49
+ <div align="right">
50
+
51
+ [![](https://img.shields.io/badge/-BACK_TO_TOP-151515?style=flat-square)](#readme-top)
52
+
53
+ </div>
54
+
5
55
  ### [Version 1.2.12](https://github.com/lobehub/lobe-chat/compare/v1.2.11...v1.2.12)
6
56
 
7
57
  <sup>Released on **2024-07-07**</sup>
package/docs/usage/tools-calling/anthropic.mdx CHANGED
@@ -1 +1,185 @@
1
- TODO
1
+ ---
2
+ title: Anthropic Claude Series Tools Calling Evaluation
3
+ description: >-
4
+ Use LobeChat to test the Tools Calling (Function Calling) capabilities of the Anthropic Claude series models (Claude 3.5 Sonnet / Claude 3 Opus /
5
+ Claude 3 Haiku) and present the evaluation results
6
+ tags:
7
+ - Tools Calling
8
+ - Benchmark
9
+ - Function Calling Evaluation
10
+ - Tool Use
11
+ - Plugins
12
+ ---
13
+
14
+ # Anthropic Claude Series Tools Calling
15
+
16
+ Overview of Anthropic Claude Series model Tools Calling capabilities:
17
+
18
+ | Model | Support Tools Calling | Stream | Parallel | Simple Instruction Score | Complex Instruction |
19
+ | --- | --- | --- | --- | --- | --- |
20
+ | Claude 3.5 Sonnet | ✅ | ✅ | ✅ | 🌟🌟🌟 | 🌟🌟 |
21
+ | Claude 3 Opus | ✅ | ✅ | ❌ | 🌟 | ⛔️ |
22
+ | Claude 3 Sonnet | ✅ | ✅ | ❌ | 🌟🌟 | ⛔️ |
23
+ | Claude 3 Haiku | ✅ | ✅ | ❌ | 🌟🌟 | ⛔️ |
24
+
25
+ ## Claude 3.5 Sonnet
26
+
27
+ ### Simple Instruction Call: Weather Query
28
+
29
+ Test Instruction: Instruction ①
30
+
31
+ <Video src="https://github.com/lobehub/lobe-chat/assets/28616219/42a6980c-ea2a-44fd-b61f-a7989827f5a5" />
32
+
33
+ <Image
34
+ alt="Claude 3.5 Sonnet Tools Calling for Simple Instruction"
35
+ src="https://github.com/lobehub/lobe-chat/assets/28616219/71146b75-2c73-48c3-9688-1d8814d2a791"
36
+ />
37
+
38
+ <details>
39
+ <summary>Tools Calling Raw Output:</summary>
40
+
41
+ ```yml
42
+
43
+ ```
44
+
45
+ </details>
46
+
47
+ ### Complex Instruction Call: Image Generation
48
+
49
+ Test Instruction: Instruction ②
50
+
51
+ <Video src="https://github.com/lobehub/lobe-chat/assets/28616219/a9a40899-d5f3-4ef2-aa08-922751b05ca6" />
52
+
53
+ From the above video:
54
+
55
+ 1. Claude 3.5 Sonnet supports both streaming and parallel Tools Calling;
56
+ 2. During streaming Tools Calling, the model stalls while generating long strings (in the raw Tools Calling output, about 6s elapse between `[chunk 40]` and `[chunk 41]`), so there is a relatively long wait at the start of a tool call.
57
+
58
+ <Image
59
+ alt="Claude 3.5 Sonnet Tools Calling for Complex Instruction"
60
+ src="https://github.com/lobehub/lobe-chat/assets/28616219/23e2d7e5-a6f3-4f4c-9c6a-5651f35a5910"
61
+ />
62
+
63
+ <details>
64
+ <summary>Tools Calling Raw Output:</summary>
65
+
66
+ ```yml
67
+
68
+ ```
69
+
70
+ </details>
71
+
72
+ ## Claude 3 Opus
73
+
74
+ ### Simple Instruction Call: Weather Query
75
+
76
+ Test Instruction: Instruction ①
77
+
78
+ <Video src="https://github.com/lobehub/lobe-chat/assets/28616219/0e120fa2-8410-4552-a947-5ab7a91d994d" />
79
+
80
+ From the above video:
81
+
82
+ 1. Claude 3 Opus outputs a `<thinking>` tag at the beginning of Tools Calling, which is not very helpful for users and consumes more tokens;
83
+ 2. Opus triggers Tools Calling twice, indicating that it does not support Parallel Tools Calling;
84
+ 3. The raw output of Tools Calling shows that Opus also supports Stream Tools Calling.
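One way to read the observation above: parallel tool calling means a single assistant turn carries more than one tool call, whereas Opus issues one call per turn and needs two turns. The check can be sketched as follows, assuming a hypothetical `AssistantTurn` shape (this is not LobeChat's actual message type, and the tool name is only illustrative):

```typescript
// Hypothetical shape: one entry per assistant turn, listing the tool names it called.
interface AssistantTurn {
  toolCalls: string[];
}

// A conversation shows parallel tool calling if any single turn
// contains more than one tool call.
function usesParallelCalls(turns: AssistantTurn[]): boolean {
  return turns.some((turn) => turn.toolCalls.length > 1);
}

// Opus-like behaviour: two separate turns, one call each.
const sequential: AssistantTurn[] = [
  { toolCalls: ["fetchCurrentWeather"] },
  { toolCalls: ["fetchCurrentWeather"] },
];

// Claude 3.5 Sonnet-like behaviour: both calls in a single turn.
const parallel: AssistantTurn[] = [
  { toolCalls: ["fetchCurrentWeather", "fetchCurrentWeather"] },
];

console.log(usesParallelCalls(sequential)); // false
console.log(usesParallelCalls(parallel)); // true
```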
85
+
86
+ <Image
87
+ alt="Claude 3 Opus Tools Calling for Simple Instruction"
88
+ src="https://github.com/lobehub/lobe-chat/assets/28616219/fa2f89bc-b9d5-43e3-a15e-1e79174d002c"
89
+ />
90
+
91
+ <details>
92
+ <summary>Tools Calling Raw Output:</summary>
93
+
94
+ </details>
95
+
96
+ ### Complex Instruction Call: Image Generation
97
+
98
+ Test Instruction: Instruction ②
99
+
100
+ <Video src="https://github.com/lobehub/lobe-chat/assets/28616219/b2dc8cd9-2582-43fe-9121-29c20a1cdc7b" />
101
+
102
+ From the above video:
103
+
104
+ 1. As in the simple task, Opus always outputs a `<thinking>` tag when calling tools, which significantly hurts the user experience;
105
+ 2. Opus outputs the `prompts` field as a string instead of an array, which causes an error and prevents the plugin from being called correctly.
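The failure described above is a type mismatch in the tool arguments. A minimal sketch of the kind of validation that would surface it, assuming the arguments arrive as a JSON string and that the expected shape is `{ prompts: string[] }` (the `ImageArgs` type and `parseImageArgs` helper are hypothetical, not taken from LobeChat or the plugin):

```typescript
// Hypothetical argument shape, inferred from the error described above.
interface ImageArgs {
  prompts: string[];
}

// Parses raw tool-call arguments and rejects anything where
// `prompts` is not an array of strings.
function parseImageArgs(raw: string): ImageArgs {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null) {
    throw new TypeError("tool arguments must be a JSON object");
  }
  const prompts = (parsed as { prompts?: unknown }).prompts;
  if (!Array.isArray(prompts) || !prompts.every((p) => typeof p === "string")) {
    throw new TypeError(`"prompts" must be an array of strings, got ${typeof prompts}`);
  }
  return { prompts };
}

// Well-formed arguments pass through unchanged.
console.log(parseImageArgs('{"prompts":["a cat","a dog"]}').prompts.length); // 2

// The shape reported for Claude 3 Opus (a plain string) is rejected.
try {
  parseImageArgs('{"prompts":"a cat"}');
} catch (error) {
  console.log((error as Error).message);
}
```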
106
+
107
+ <Image
108
+ alt="Claude 3 Opus Tools Calling for Complex Instruction"
109
+ src="https://github.com/lobehub/lobe-chat/assets/28616219/1eee785d-932f-4320-845e-eed0bee4b1ae"
110
+ />
111
+
112
+ <details>
113
+ <summary>Tools Calling Raw Output:</summary>
114
+
115
+ </details>
116
+
117
+ ## Claude 3 Sonnet
118
+
119
+ ### Simple Instruction Call: Weather Query
120
+
121
+ Test Instruction: Instruction ①
122
+
123
+ <Video src="https://github.com/lobehub/lobe-chat/assets/28616219/600becd5-7f12-4a9a-86c7-e5cca0db6b1b" />
124
+
125
+ From the above video, it can be seen that Claude 3 Sonnet triggers Tools Calling twice, indicating that it does not support Parallel Tools Calling.
126
+
127
+ <Image
128
+ alt="Claude 3 Sonnet Tools Calling for Simple Instruction"
129
+ src="https://github.com/lobehub/lobe-chat/assets/28616219/e82f5c69-7607-488f-8c10-0482fb380c6c"
130
+ />
131
+
132
+ <details>
133
+ <summary>Tools Calling Raw Output:</summary>
134
+
135
+ </details>
136
+
137
+ ### Complex Instruction Call: Image Generation
138
+
139
+ Test Instruction: Instruction ②
140
+
141
+ <Video src="https://github.com/lobehub/lobe-chat/assets/28616219/c150aa5f-36bc-40f2-a779-9c4fdcf2cd4c" />
142
+
143
+ From the above video, Claude 3 Sonnet fails the complex instruction call: `prompts` is expected to be an array but is generated as a string.
144
+
145
+ <Image
146
+ alt="Claude 3 Sonnet Tools Calling for Complex Instruction"
147
+ src="https://github.com/lobehub/lobe-chat/assets/28616219/b7d84e26-920d-4a82-8798-1b1060ebb341"
148
+ />
149
+
150
+ <details>
151
+ <summary>Tools Calling Raw Output:</summary>
152
+
153
+ </details>
154
+
155
+ ## Claude 3 Haiku
156
+
157
+ <Video src="https://github.com/lobehub/lobe-chat/assets/28616219/02b3e872-735a-4928-8245-a90786acea8b" />
158
+
159
+ From the above video:
160
+
161
+ 1. Claude 3 Haiku triggers Tools Calling twice, indicating that it also does not support Parallel Tools Calling;
162
+ 2. Haiku does not reply with an acknowledgement (such as "OK") first; it calls the tool directly.
163
+
164
+ <Image
165
+ alt="Claude 3 Haiku Tools Calling for Simple Instruction"
166
+ src="https://github.com/lobehub/lobe-chat/assets/28616219/9081b586-cf43-440f-8ef8-1de5d8658694"
167
+ />
168
+
169
+ ### Complex Instruction Call: Image Generation
170
+
171
+ Test Instruction: Instruction ②
172
+
173
+ <Video src="https://github.com/lobehub/lobe-chat/assets/28616219/d1e3f804-0b89-4b90-9d78-69aee0db1c4d" />
174
+
175
+ From the above video, Claude 3 Haiku also fails the complex instruction call, with the same error: `prompts` is generated as a string instead of an array.
176
+
177
+ <Image
178
+ alt="Claude 3 Haiku Tools Calling for Complex Instruction"
179
+ src="https://github.com/lobehub/lobe-chat/assets/28616219/cde80220-4615-43bb-934f-35fe0de88754"
180
+ />
181
+
182
+ <details>
183
+ <summary>Tools Calling Raw Output:</summary>
184
+
185
+ </details>
package/docs/usage/tools-calling/anthropic.zh-CN.mdx CHANGED
@@ -13,6 +13,15 @@ tags:
13
13
 
14
14
  # Anthropic Claude 系列 Tools Calling
15
15
 
16
+ Anthropic Claude 系列模型 Tools Calling 能力一览:
17
+
18
+ | 模型 | 支持 Tools Calling | 流式 (Stream) | 并发(Parallel) | 简单指令得分 | 复杂指令 |
19
+ | --- | --- | --- | --- | --- | --- |
20
+ | Claude 3.5 Sonnet | ✅ | ✅ | ✅ | 🌟🌟🌟 | 🌟🌟 |
21
+ | Claude 3 Opus | ✅ | ✅ | ❌ | 🌟 | ⛔️ |
22
+ | Claude 3 Sonnet | ✅ | ✅ | ❌ | 🌟🌟 | ⛔️ |
23
+ | Claude 3 Haiku | ✅ | ✅ | ❌ | 🌟🌟 | ⛔️ |
24
+
16
25
  ## Claude 3.5 Sonnet
17
26
 
18
27
  ### 简单调用指令:天气查询
@@ -42,6 +51,7 @@ tags:
42
51
  <Video src="https://github.com/lobehub/lobe-chat/assets/28616219/a9a40899-d5f3-4ef2-aa08-922751b05ca6" />
43
52
 
44
53
  从上述视频中可以看到:
54
+
45
55
  1. Sonnet 3.5 支持流式 Tools Calling 和 Parallel Tools Calling;
46
56
  2. 在流式 Tools Calling 时,表现出来的特征是在创建长句会等待住(详见 Tools Calling 原始输出 `[chunk 40]` 和 `[chunk 41]` 中间的耗时达到 6s)。所以相对来说会在 Tools Calling 的起始阶段有一个较长的等待时间。
47
57
 
@@ -65,11 +75,11 @@ tags:
65
75
 
66
76
  测试指令:指令 ①
67
77
 
68
-
69
78
  <Video src="https://github.com/lobehub/lobe-chat/assets/28616219/0e120fa2-8410-4552-a947-5ab7a91d994d" />
70
79
 
71
80
  从上述视频中看到:
72
- 1. Claude 3 Opus 在调用 Tools 的起点会输出一段 <thinking> 标签的内容,这段内容对于用户来说几乎没有什么帮助,反而带来了较多的 Token 消耗;
81
+
82
+ 1. Claude 3 Opus 在调用 Tools 的起点会输出一段 `<thinking>` 标签的内容,这段内容对于用户来说几乎没有什么帮助,反而带来了较多的 Token 消耗;
73
83
  2. Opus 会触发两次 Tools Calling,说明它并不支持 Parallel Tools Calling;
74
84
  3. 从 Tools Calling 的原始输出来看, Opus 也是支持流式 Tools Calling 的
75
85
 
@@ -78,15 +88,11 @@ tags:
78
88
  src="https://github.com/lobehub/lobe-chat/assets/28616219/fa2f89bc-b9d5-43e3-a15e-1e79174d002c"
79
89
  />
80
90
 
81
-
82
-
83
91
  <details>
84
92
  <summary>Tools Calling 原始输出:</summary>
85
93
 
86
-
87
94
  </details>
88
95
 
89
-
90
96
  ### 复杂调用指令:文生图
91
97
 
92
98
  测试指令:指令 ②
@@ -94,7 +100,8 @@ tags:
94
100
  <Video src="https://github.com/lobehub/lobe-chat/assets/28616219/b2dc8cd9-2582-43fe-9121-29c20a1cdc7b" />
95
101
 
96
102
  从上述视频中看到:
97
- 1. 结合简单任务, Opus 的工具调用一定会输出 <thinking> 标签,这其实对体验影响非常大
103
+
104
+ 1. 结合简单任务, Opus 的工具调用一定会输出 `<thinking>` 标签,这其实对体验影响非常大
98
105
  2. Opus 输出的 prompts 字段是字符串,而不是数组,导致报错,无法正常调用插件。
99
106
 
100
107
  <Image
@@ -105,7 +112,6 @@ tags:
105
112
  <details>
106
113
  <summary>Tools Calling 原始输出:</summary>
107
114
 
108
-
109
115
  </details>
110
116
 
111
117
  ## Claude 3 Sonnet
@@ -123,18 +129,15 @@ tags:
123
129
  src="https://github.com/lobehub/lobe-chat/assets/28616219/e82f5c69-7607-488f-8c10-0482fb380c6c"
124
130
  />
125
131
 
126
-
127
132
  <details>
128
133
  <summary>Tools Calling 原始输出:</summary>
129
134
 
130
-
131
135
  </details>
132
136
 
133
137
  ### 复杂调用指令:文生图
134
138
 
135
139
  测试指令:指令 ②
136
140
 
137
-
138
141
  <Video src="https://github.com/lobehub/lobe-chat/assets/28616219/c150aa5f-36bc-40f2-a779-9c4fdcf2cd4c" />
139
142
 
140
143
  从上述视频中可以看到, Sonnet 3 在复杂指令调用下就失败了。报错原因是 prompts 原本预期为一个数组,但是生成的却是一个字符串。
@@ -147,13 +150,10 @@ tags:
147
150
  <details>
148
151
  <summary>Tools Calling 原始输出:</summary>
149
152
 
150
-
151
-
152
153
  </details>
153
154
 
154
155
  ## Claude 3 Haiku
155
156
 
156
-
157
157
  <Video src="https://github.com/lobehub/lobe-chat/assets/28616219/02b3e872-735a-4928-8245-a90786acea8b" />
158
158
 
159
159
  从上述视频中可以看出:
@@ -161,18 +161,15 @@ tags:
161
161
  1. Claude 3 Haiku 会调用两次 Tools Calling,说明它也不支持 Parallel Tools Calling;
162
162
  2. Haiku 并没有回答好的,也是直接调用的工具;
163
163
 
164
-
165
164
  <Image
166
165
  alt="Claude 3 Haiku 简单指令的 Tools Calling"
167
166
  src="https://github.com/lobehub/lobe-chat/assets/28616219/9081b586-cf43-440f-8ef8-1de5d8658694"
168
167
  />
169
168
 
170
-
171
169
  ### 复杂调用指令:文生图
172
170
 
173
171
  测试指令:指令 ②
174
172
 
175
-
176
173
  <Video src="https://github.com/lobehub/lobe-chat/assets/28616219/d1e3f804-0b89-4b90-9d78-69aee0db1c4d" />
177
174
 
178
175
  从上述视频中可以看到, Haiku 3 在复杂指令调用下也是失败的。报错原因同样是 prompts 生成了字符串而不是数组。
@@ -185,6 +182,4 @@ tags:
185
182
  <details>
186
183
  <summary>Tools Calling 原始输出:</summary>
187
184
 
188
-
189
-
190
185
  </details>
package/docs/usage/tools-calling/google.mdx CHANGED
@@ -1 +1,116 @@
1
- TODO
1
+ ---
2
+ title: Google Gemini Series Tool Calling Evaluation
3
+ description: >-
4
+ Use LobeChat to test the Tools Calling (Function Calling) capabilities of the Google Gemini series models (Gemini 1.5 Pro / Gemini 1.5 Flash)
5
+ and present the evaluation results
6
+ tags:
7
+ - Tools Calling
8
+ - Benchmark
9
+ - Function Calling Evaluation
10
+ - Tool Use
11
+ - Plugins
12
+ ---
13
+
14
+ # Google Gemini Series Tool Calling
15
+
16
+ Overview of Google Gemini series model Tools Calling capabilities:
17
+
18
+ | Model | Tools Calling Support | Streaming | Parallel | Simple Instruction Score | Complex Instruction |
19
+ | --- | --- | --- | --- | --- | --- |
20
+ | Gemini 1.5 Pro | ✅ | ❌ | ✅ | ⛔ | ⛔ |
21
+ | Gemini 1.5 Flash | ❌ | ❌ | ❌ | ⛔ | ⛔ |
22
+
23
+ <Callout type={'important'}>
24
+ Based on our actual tests, we strongly recommend not enabling plugins for Gemini because as of
25
+ July 7, 2024, its Tools Calling capability is extremely poor.
26
+ </Callout>
27
+
28
+ ## Gemini 1.5 Pro
29
+
30
+ ### Simple Instruction Call: Weather Query
31
+
32
+ Test Instruction: Instruction ①
33
+
34
+ <Video src="https://github.com/lobehub/lobe-chat/assets/28616219/a5a35431-2a15-4e79-97d5-502637f829bc" />
35
+
36
+ In the JSON output from Gemini, the function name is wrong, so LobeChat cannot tell which plugin was called: the weather plugin was passed in as `realtime-weather____fetchCurrentWeather`, but Gemini returned `weather____fetchCurrentWeather`.
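To illustrate why a truncated name cannot be resolved, here is a minimal sketch. It assumes tool names follow a `<plugin identifier>____<api name>` pattern, which is inferred only from the two names quoted above; `resolveToolCall` and `ResolvedTool` are hypothetical and do not describe LobeChat's actual implementation:

```typescript
// Assumption: a four-underscore separator joins plugin identifier and API name.
const SEPARATOR = "____";

interface ResolvedTool {
  identifier: string; // plugin id, e.g. "realtime-weather"
  apiName: string; // API within the plugin, e.g. "fetchCurrentWeather"
}

// Resolves a returned function name against the names that were sent in the
// request. Only exact matches are accepted, so a shortened name resolves to nothing.
function resolveToolCall(
  name: string,
  registered: ReadonlySet<string>,
): ResolvedTool | undefined {
  if (!registered.has(name)) return undefined;
  const index = name.indexOf(SEPARATOR);
  if (index === -1) return undefined;
  return {
    identifier: name.slice(0, index),
    apiName: name.slice(index + SEPARATOR.length),
  };
}

const registered = new Set(["realtime-weather____fetchCurrentWeather"]);

// The name that was sent to the model resolves as expected.
console.log(resolveToolCall("realtime-weather____fetchCurrentWeather", registered));

// The name Gemini returned is not in the registered set.
console.log(resolveToolCall("weather____fetchCurrentWeather", registered)); // undefined
```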
37
+
38
+ <Image
39
+ alt="Tools Calling for Simple Instruction in Gemini 1.5 Pro"
40
+ src="https://github.com/lobehub/lobe-chat/assets/28616219/1e077799-c25e-43c7-8492-c5c0bb9aed9b"
41
+ />
42
+
43
+ <details>
44
+ <summary>Original Tools Calling Output:</summary>
45
+
46
+ ```yml
47
+ [stream start] 2024-7-7 17:53:25.647
48
+ [chunk 0] 2024-7-7 17:53:25.654
49
+ {"candidates":[{"content":{"parts":[{"text":"好的"}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":95,"candidatesTokenCount":1,"totalTokenCount":96}}
50
+
51
+ [chunk 1] 2024-7-7 17:53:26.288
52
+ {"candidates":[{"content":{"parts":[{"text":"\n\n"}],"role":"model"},"finishReason":"STOP","index":0,"safetyRatings":[{"category":"HARM_CATEGORY_SEXUALLY_EXPLICIT","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_HATE_SPEECH","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_HARASSMENT","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_DANGEROUS_CONTENT","probability":"NEGLIGIBLE"}]}],"usageMetadata":{"promptTokenCount":95,"candidatesTokenCount":1,"totalTokenCount":96}}
53
+
54
+ [chunk 2] 2024-7-7 17:53:26.336
55
+ {"candidates":[{"content":{"parts":[{"functionCall":{"name":"weather____fetchCurrentWeather","args":{"city":"Hangzhou"}}},{"functionCall":{"name":"weather____fetchCurrentWeather","args":{"city":"Beijing"}}}],"role":"model"},"finishReason":"STOP","index":0,"safetyRatings":[{"category":"HARM_CATEGORY_SEXUALLY_EXPLICIT","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_HATE_SPEECH","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_HARASSMENT","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_DANGEROUS_CONTENT","probability":"NEGLIGIBLE"}]}],"usageMetadata":{"promptTokenCount":95,"candidatesTokenCount":79,"totalTokenCount":174}}
56
+
57
+ [stream finished] total chunks: 3
58
+ ```
59
+
60
+ </details>
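The timestamps in a raw log like the one above can be turned into per-chunk gaps, which is how a stall between two chunks would show up. A rough sketch follows; the `chunkGaps` helper and its regular expression are illustrative assumptions based only on the log format shown here:

```typescript
// Matches headers such as "[chunk 1] 2024-7-7 17:53:26.288".
const CHUNK_HEADER = /^\[chunk (\d+)\] (\d+)-(\d+)-(\d+) (\d+):(\d+):(\d+)\.(\d+)$/;

// Returns the gap in milliseconds between each pair of consecutive chunks.
function chunkGaps(log: string): number[] {
  const times: number[] = [];
  for (const line of log.split("\n")) {
    const match = CHUNK_HEADER.exec(line.trim());
    if (!match) continue; // skip payload lines and stream start/finish markers
    const [year, month, day, hour, minute, second, ms] = match.slice(2).map(Number);
    // Date.UTC keeps the arithmetic independent of the local time zone.
    times.push(Date.UTC(year, month - 1, day, hour, minute, second, ms));
  }
  return times.slice(1).map((time, i) => time - times[i]);
}

// Chunk headers copied from the Gemini 1.5 Pro log above (payloads omitted).
const sampleLog = [
  "[stream start] 2024-7-7 17:53:25.647",
  "[chunk 0] 2024-7-7 17:53:25.654",
  "[chunk 1] 2024-7-7 17:53:26.288",
  "[chunk 2] 2024-7-7 17:53:26.336",
  "[stream finished] total chunks: 3",
].join("\n");

console.log(chunkGaps(sampleLog)); // [ 634, 48 ]
```

For this log the gaps are 634 ms and 48 ms, so nothing here resembles the multi-second stall reported for long arguments in other models.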
61
+
62
+ ### Complex Instruction Call: Image Generation
63
+
64
+ Test Instruction: Instruction ②
65
+
66
+ <Image
67
+ alt="Tools Calling for Complex Instruction in Gemini 1.5 Pro"
68
+ src="https://github.com/lobehub/lobe-chat/assets/28616219/a2454a60-3271-4786-861f-d49ceac1316e"
69
+ />
70
+
71
+ When testing the complex instruction, the Google API returns an error immediately:
72
+
73
+ ```json
74
+ {
75
+ "message": "[400 Bad Request] Invalid JSON payload received. Unknown name \"maxItems\" at 'tools[0].function_declarations[0].parameters.properties[0].value': Cannot find field.\nInvalid JSON payload received. Unknown name \"minItems\" at 'tools[0].function_declarations[0].parameters.properties[0].value': Cannot find field.\nInvalid JSON payload received. Unknown name \"default\" at 'tools[0].function_declarations[0].parameters.properties[1].value': Cannot find field.\nInvalid JSON payload received. Unknown name \"default\" at 'tools[0].function_declarations[0].parameters.properties[3].value': Cannot find field.\nInvalid JSON payload received. Unknown name \"default\" at 'tools[0].function_declarations[0].parameters.properties[4].value': Cannot find field. [{\"@type\":\"type.googleapis.com/google.rpc.BadRequest\",\"fieldViolations\":[{\"field\":\"tools[0].function_declarations[0].parameters.properties[0].value\",\"description\":\"Invalid JSON payload received. Unknown name \\\"maxItems\\\" at 'tools[0].function_declarations[0].parameters.properties[0].value': Cannot find field.\"},{\"field\":\"tools[0].function_declarations[0].parameters.properties[0].value\",\"description\":\"Invalid JSON payload received. Unknown name \\\"minItems\\\" at 'tools[0].function_declarations[0].parameters.properties[0].value': Cannot find field.\"},{\"field\":\"tools[0].function_declarations[0].parameters.properties[1].value\",\"description\":\"Invalid JSON payload received. Unknown name \\\"default\\\" at 'tools[0].function_declarations[0].parameters.properties[1].value': Cannot find field.\"},{\"field\":\"tools[0].function_declarations[0].parameters.properties[3].value\",\"description\":\"Invalid JSON payload received. Unknown name \\\"default\\\" at 'tools[0].function_declarations[0].parameters.properties[3].value': Cannot find field.\"},{\"field\":\"tools[0].function_declarations[0].parameters.properties[4].value\",\"description\":\"Invalid JSON payload received. Unknown name \\\"default\\\" at 'tools[0].function_declarations[0].parameters.properties[4].value': Cannot find field.\"}]}]"
76
+ }
77
+ ```
78
+
79
+ The error above shows that the API rejects schemas containing `maxItems` (along with `minItems` and `default`), so Gemini 1.5 Pro is effectively unable to use the DallE plugin.
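One conceivable workaround, shown purely as a sketch, is to drop the rejected keywords from a tool's parameter schema before sending it. The key list is taken from the error message above; `stripUnsupported` is a hypothetical helper, the example schema and its values are made up for illustration, and nothing here is confirmed as what LobeChat or the Google SDK actually does:

```typescript
// Assumption: exactly the keywords named in the error message are rejected.
const UNSUPPORTED_KEYS = new Set(["maxItems", "minItems", "default"]);

type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Recursively copies a schema, omitting unsupported keywords.
// Caveat: this naive version would also drop a parameter that happens to be
// *named* "default", "minItems", or "maxItems".
function stripUnsupported(schema: Json): Json {
  if (Array.isArray(schema)) return schema.map((item) => stripUnsupported(item));
  if (schema === null || typeof schema !== "object") return schema;
  const result: { [key: string]: Json } = {};
  for (const [key, value] of Object.entries(schema)) {
    if (UNSUPPORTED_KEYS.has(key)) continue;
    result[key] = stripUnsupported(value);
  }
  return result;
}

// Made-up schema loosely resembling an image-generation tool.
const exampleSchema: Json = {
  type: "object",
  properties: {
    prompts: { type: "array", items: { type: "string" }, minItems: 1, maxItems: 10 },
    size: { type: "string", default: "1024x1024" },
  },
};

console.log(JSON.stringify(stripUnsupported(exampleSchema)));
```

Removing constraints changes what the model is told about valid arguments, so any such stripping would need to be paired with validation of the returned arguments.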
80
+
81
+ Related issues:
82
+
83
+ - [Support for minItems and maxItems for FunctionDeclarationSchemaType.ARRAY?](https://github.com/google-gemini/generative-ai-js/issues/200)
84
+ - [Gemini Models unusable when dalle plugin is enabled](https://github.com/lobehub/lobe-chat/issues/2537)
85
+
86
+ Based on the two tests above, Gemini nominally supports Tool Calling, but it is almost unusable in everyday use. In my view this amounts to false advertising.
87
+
88
+ ## Gemini 1.5 Flash
89
+
90
+ ### Simple Instruction Call: Weather Query
91
+
92
+ Test Instruction: Instruction ①
93
+
94
+ <Video src="https://github.com/lobehub/lobe-chat/assets/28616219/6cab77e8-d761-4a91-8325-a61748cebac1" />
95
+
96
+ Gemini 1.5 Flash is even more baffling: it announces the call and then simply stops. The raw output below shows that Gemini 1.5 Flash never emits any Tool Calling data, so it can be considered completely unusable.
97
+
98
+ ```yml
99
+ [stream start] 2024-7-7 19:4:50.936
100
+ [chunk 0] 2024-7-7 19:4:50.943
101
+ {"candidates":[{"content":{"parts":[{"text":"Okay"}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":96,"candidatesTokenCount":1,"totalTokenCount":97}}
102
+
103
+ [chunk 1] 2024-7-7 19:4:52.209
104
+ {"candidates":[{"content":{"parts":[{"text":", please wait, I am checking the weather information for Hangzhou and Beijing."}],"role":"model"},"finishReason":"STOP","index":0,"safetyRatings":[{"category":"HARM_CATEGORY_SEXUALLY_EXPLICIT","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_HATE_SPEECH","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_HARASSMENT","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_DANGEROUS_CONTENT","probability":"NEGLIGIBLE"}]}],"usageMetadata":{"promptTokenCount":96,"candidatesTokenCount":16,"totalTokenCount":112}}
105
+
106
+ [chunk 2] 2024-7-7 19:4:53.288
107
+ {"candidates":[{"content":{"parts":[{"text":"\n"}],"role":"model"},"finishReason":"STOP","index":0,"safetyRatings":[{"category":"HARM_CATEGORY_SEXUALLY_EXPLICIT","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_HATE_SPEECH","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_HARASSMENT","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_DANGEROUS_CONTENT","probability":"NEGLIGIBLE"}]}],"usageMetadata":{"promptTokenCount":96,"candidatesTokenCount":16,"totalTokenCount":112}}
108
+
109
+ [stream finished] total chunks: 3
110
+ ```
111
+
112
+ ### Complex Instruction Call: Image Generation
113
+
114
+ Test Instruction: Instruction ②
115
+
116
+ As with the complex instruction for Gemini 1.5 Pro, this instruction throws an error immediately, so it is not covered in further detail.
package/docs/usage/tools-calling/google.zh-CN.mdx CHANGED
@@ -1,13 +1,114 @@
1
1
  ---
2
- title: Google Gemini 系列 Tool Calling
2
+ title: Google Gemini 系列 Tool Calling 评测
3
+ description: 使用 LobeChat 测试 Google Gemini 系列模型(Gemini 1.5 Pro / Gemini 1.5 Flash)的工具调用(Function Calling)能力,并展现评测结果
4
+ tags:
5
+ - Tools Calling
6
+ - Benchmark
7
+ - Function Calling 评测
8
+ - 工具调用
9
+ - 插件
3
10
  ---
4
11
 
5
12
  # Google Gemini 系列 Tool Calling
6
13
 
14
+ Google Gemini 系列模型 Tools Calling 能力一览:
15
+
16
+ | 模型 | 支持 Tools Calling | 流式 (Stream) | 并发(Parallel) | 简单指令得分 | 复杂指令 |
17
+ | --- | --- | --- | --- | --- | --- |
18
+ | Gemini 1.5 Pro | ✅ | ❌ | ✅ | ⛔ | ⛔ |
19
+ | Gemini 1.5 Flash | ❌ | ❌ | ❌ | ⛔ | ⛔ |
20
+
21
+ <Callout type={'important'}>
22
+ 根据我们的实际测试,强烈建议不要给 Gemini 开启插件,因为目前(截止2024.07.07)它的 Tools Calling
23
+ 能力实在太烂了。
24
+ </Callout>
25
+
7
26
  ## Gemini 1.5 Pro
8
27
 
9
- TODO
28
+ ### 简单调用指令:天气查询
29
+
30
+ 测试指令:指令 ①
31
+
32
+ <Video src="https://github.com/lobehub/lobe-chat/assets/28616219/a5a35431-2a15-4e79-97d5-502637f829bc" />
33
+
34
+ Gemini 输出的 json 中,name 是错误的,因此 LobeChat 无法识别到它调用了什么插件。(入参中,天气插件的 name 为 `realtime-weather____fetchCurrentWeather`,而 Gemini 返回的是 `weather____fetchCurrentWeather`)。
35
+
36
+ <Image
37
+ alt="Gemini 1.5 Pro 简单指令的 Tools Calling"
38
+ src="https://github.com/lobehub/lobe-chat/assets/28616219/1e077799-c25e-43c7-8492-c5c0bb9aed9b"
39
+ />
40
+
41
+ <details>
42
+ <summary>Tools Calling 原始输出:</summary>
43
+
44
+ ```yml
45
+ [stream start] 2024-7-7 17:53:25.647
46
+ [chunk 0] 2024-7-7 17:53:25.654
47
+ {"candidates":[{"content":{"parts":[{"text":"好的"}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":95,"candidatesTokenCount":1,"totalTokenCount":96}}
48
+
49
+ [chunk 1] 2024-7-7 17:53:26.288
50
+ {"candidates":[{"content":{"parts":[{"text":"\n\n"}],"role":"model"},"finishReason":"STOP","index":0,"safetyRatings":[{"category":"HARM_CATEGORY_SEXUALLY_EXPLICIT","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_HATE_SPEECH","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_HARASSMENT","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_DANGEROUS_CONTENT","probability":"NEGLIGIBLE"}]}],"usageMetadata":{"promptTokenCount":95,"candidatesTokenCount":1,"totalTokenCount":96}}
51
+
52
+ [chunk 2] 2024-7-7 17:53:26.336
53
+ {"candidates":[{"content":{"parts":[{"functionCall":{"name":"weather____fetchCurrentWeather","args":{"city":"杭州"}}},{"functionCall":{"name":"weather____fetchCurrentWeather","args":{"city":"北京"}}}],"role":"model"},"finishReason":"STOP","index":0,"safetyRatings":[{"category":"HARM_CATEGORY_SEXUALLY_EXPLICIT","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_HATE_SPEECH","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_HARASSMENT","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_DANGEROUS_CONTENT","probability":"NEGLIGIBLE"}]}],"usageMetadata":{"promptTokenCount":95,"candidatesTokenCount":79,"totalTokenCount":174}}
54
+
55
+ [stream finished] total chunks: 3
56
+ ```
57
+
58
+ </details>
59
+
60
+ ### 复杂调用指令:文生图
61
+
62
+ 测试指令:指令 ②
63
+
64
+ <Image
65
+ alt="Gemini 1.5 Pro 复杂指令的 Tools Calling"
66
+ src="https://github.com/lobehub/lobe-chat/assets/28616219/a2454a60-3271-4786-861f-d49ceac1316e"
67
+ />
68
+
69
+ 在测试复杂指令集时,Google 直接抛错:
70
+
71
+ ```json
72
+ {
73
+ "message": "[400 Bad Request] Invalid JSON payload received. Unknown name \"maxItems\" at 'tools[0].function_declarations[0].parameters.properties[0].value': Cannot find field.\nInvalid JSON payload received. Unknown name \"minItems\" at 'tools[0].function_declarations[0].parameters.properties[0].value': Cannot find field.\nInvalid JSON payload received. Unknown name \"default\" at 'tools[0].function_declarations[0].parameters.properties[1].value': Cannot find field.\nInvalid JSON payload received. Unknown name \"default\" at 'tools[0].function_declarations[0].parameters.properties[3].value': Cannot find field.\nInvalid JSON payload received. Unknown name \"default\" at 'tools[0].function_declarations[0].parameters.properties[4].value': Cannot find field. [{\"@type\":\"type.googleapis.com/google.rpc.BadRequest\",\"fieldViolations\":[{\"field\":\"tools[0].function_declarations[0].parameters.properties[0].value\",\"description\":\"Invalid JSON payload received. Unknown name \\\"maxItems\\\" at 'tools[0].function_declarations[0].parameters.properties[0].value': Cannot find field.\"},{\"field\":\"tools[0].function_declarations[0].parameters.properties[0].value\",\"description\":\"Invalid JSON payload received. Unknown name \\\"minItems\\\" at 'tools[0].function_declarations[0].parameters.properties[0].value': Cannot find field.\"},{\"field\":\"tools[0].function_declarations[0].parameters.properties[1].value\",\"description\":\"Invalid JSON payload received. Unknown name \\\"default\\\" at 'tools[0].function_declarations[0].parameters.properties[1].value': Cannot find field.\"},{\"field\":\"tools[0].function_declarations[0].parameters.properties[3].value\",\"description\":\"Invalid JSON payload received. Unknown name \\\"default\\\" at 'tools[0].function_declarations[0].parameters.properties[3].value': Cannot find field.\"},{\"field\":\"tools[0].function_declarations[0].parameters.properties[4].value\",\"description\":\"Invalid JSON payload received. Unknown name \\\"default\\\" at 'tools[0].function_declarations[0].parameters.properties[4].value': Cannot find field.\"}]}]"
74
+ }
75
+ ```
76
+
77
+ 上述抛错中提到并不支持包含 `maxItems` 的 schema,因此 Gemini 1.5 Pro 相当于无法使用 DallE 插件。
78
+
79
+ 相关 issue:
80
+
81
+ - [Support for minItems and maxItems for FunctionDeclarationSchemaType.ARRAY?](https://github.com/google-gemini/generative-ai-js/issues/200)
82
+ - [Gemini Models unusable when dalle plugin is enabled](https://github.com/lobehub/lobe-chat/issues/2537)
83
+
84
+ 综合以上两个测试来看,Google 的 Tool Calling 能力似乎是支持了,但是几乎没法在日常中使用,我个人认为已经等于虚假宣传了。
10
85
 
11
86
  ## Gemini 1.5 Flash
12
87
 
13
- TODO
88
+ ### 简单调用指令:天气查询
89
+
90
+ 测试指令:指令 ①
91
+
92
+ <Video src="https://github.com/lobehub/lobe-chat/assets/28616219/6cab77e8-d761-4a91-8325-a61748cebac1" />
93
+
94
+ 而 Gemini 1.5 flash 更为抽象,说完调用就结束了。结合以下原始输出可以看到,Gemini 1.5 Flash 并没有输出 Tool Calling 的数据,因此可以说是完全不可用。
95
+
96
+ ```yml
97
+ [stream start] 2024-7-7 19:4:50.936
98
+ [chunk 0] 2024-7-7 19:4:50.943
99
+ {"candidates":[{"content":{"parts":[{"text":"好的"}],"role":"model"},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":96,"candidatesTokenCount":1,"totalTokenCount":97}}
100
+
101
+ [chunk 1] 2024-7-7 19:4:52.209
102
+ {"candidates":[{"content":{"parts":[{"text":",请稍等,我正在查询杭州和北京的天气信息。 "}],"role":"model"},"finishReason":"STOP","index":0,"safetyRatings":[{"category":"HARM_CATEGORY_SEXUALLY_EXPLICIT","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_HATE_SPEECH","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_HARASSMENT","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_DANGEROUS_CONTENT","probability":"NEGLIGIBLE"}]}],"usageMetadata":{"promptTokenCount":96,"candidatesTokenCount":16,"totalTokenCount":112}}
103
+
104
+ [chunk 2] 2024-7-7 19:4:53.288
105
+ {"candidates":[{"content":{"parts":[{"text":"\n"}],"role":"model"},"finishReason":"STOP","index":0,"safetyRatings":[{"category":"HARM_CATEGORY_SEXUALLY_EXPLICIT","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_HATE_SPEECH","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_HARASSMENT","probability":"NEGLIGIBLE"},{"category":"HARM_CATEGORY_DANGEROUS_CONTENT","probability":"NEGLIGIBLE"}]}],"usageMetadata":{"promptTokenCount":96,"candidatesTokenCount":16,"totalTokenCount":112}}
106
+
107
+ [stream finished] total chunks: 3
108
+ ```
109
+
110
+ ### 复杂调用指令:文生图
111
+
112
+ 测试指令:指令 ②
113
+
114
+ 该指令和 Gemini 1.5 Pro 的复杂指令一样,直接抛错,因此不再详细展开。
package/docs/usage/tools-calling/moonshot.mdx ADDED
@@ -0,0 +1 @@
1
+ TODO
package/docs/usage/tools-calling/moonshot.zh-CN.mdx ADDED
@@ -0,0 +1,24 @@
1
+ ---
2
+ title: Moonshot 系列 Tools Calling 评测
3
+ description: 使用 LobeChat 测试 Moonshot 系列模型(Moonshot-1) 的工具调用(Function Calling)能力,并展现评测结果
4
+ tags:
5
+ - Tools Calling
6
+ - Benchmark
7
+ - Function Calling
8
+ - 工具调用
9
+ - 插件
10
+ ---
11
+
12
+ # Moonshot 系列工具调用(Tools Calling)
13
+
14
+ ### 简单调用指令:天气查询
15
+
16
+ 测试指令:指令 ①
17
+
18
+ TODO
19
+
20
+ ### 复杂调用指令:文生图
21
+
22
+ 测试指令:指令 ②
23
+
24
+ TODO