@botbotgo/better-call 0.1.6 → 0.1.7
- package/README.md +35 -9
- package/package.json +1 -1
package/README.md
CHANGED
@@ -121,15 +121,41 @@ Measured with real Ollama `/api/chat` calls over all supported BFCL v4 single-tu
 
 Latest completed remote run artifact: `benchmarks/bfcl-real-remote-completed-summary.json`.
 
-
-
-
-
-
-
-
-
-
+Performance after wrapping the same model outputs with BetterCall:
+
+```text
+granite4.1:3b
+Raw 73.4% | #############################...........
+BetterCall 83.8% | ##################################......
+qwen2.5:7b-instruct
+Raw 72.2% | #############################...........
+BetterCall 78.2% | ###############################.........
+qwen3:0.6b
+Raw 55.5% | ######################..................
+BetterCall 63.6% | #########################...............
+qwen3.5:0.8b
+Raw 54.6% | ######################..................
+BetterCall 56.9% | #######################.................
+qwen3.5:2b
+Raw 53.9% | ######################..................
+BetterCall 54.9% | ######################..................
+lfm2.5-thinking:latest
+Raw 50.8% | ####################....................
+BetterCall 54.8% | ######################..................
+gemma4:e2b
+Raw 24.3% | ##########..............................
+BetterCall 24.7% | ##########..............................
+```
+
+| Rank | Model | Completed cases | Raw model | BetterCall | Lift | Request errors |
+| ---: | --- | ---: | ---: | ---: | ---: | ---: |
+| 1 | `granite4.1:3b` | 3,625 | 73.4% | 83.8% | +10.4pp | 25 |
+| 2 | `qwen2.5:7b-instruct` | 3,625 | 72.2% | 78.2% | +5.9pp | 80 |
+| 3 | `qwen3:0.6b` | 3,625 | 55.5% | 63.6% | +8.2pp | 217 |
+| 4 | `qwen3.5:0.8b` | 3,625 | 54.6% | 56.9% | +2.3pp | 901 |
+| 5 | `qwen3.5:2b` | 3,625 | 53.9% | 54.9% | +1.0pp | 1,308 |
+| 6 | `lfm2.5-thinking:latest` | 3,625 | 50.8% | 54.8% | +4.0pp | 1,142 |
+| 7 | `gemma4:e2b` | 3,625 | 24.3% | 24.7% | +0.4pp | 2,641 |
 
 Latest completed model category detail: `qwen2.5:7b-instruct`.
 
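Note on reading the added chart and table: the bars look consistent with a 40-character bar whose filled width is the score rounded to the nearest cell, and the Lift column is a difference in percentage points. The sketch below reproduces both under those assumptions. The names `renderBar`, `liftPp`, `formatRow`, and `BAR_WIDTH` are hypothetical and are not taken from `@botbotgo/better-call`; how the README's chart was actually generated is not shown in this diff.

```typescript
// Hypothetical helpers for illustration; not part of @botbotgo/better-call.
const BAR_WIDTH = 40; // assumption: inferred from the 40-character bars in the README

/** Repeat a single character `count` times (avoids relying on String.prototype.repeat). */
function repeatChar(ch: string, count: number): string {
  return new Array(count + 1).join(ch);
}

/** Render a fixed-width text bar: '#' for the filled share, '.' for the remainder. */
function renderBar(percent: number, width: number = BAR_WIDTH): string {
  const clamped = Math.min(100, Math.max(0, percent));
  const filled = Math.round((clamped / 100) * width);
  return repeatChar("#", filled) + repeatChar(".", width - filled);
}

/** Difference between two percentages in percentage points, rounded to one decimal. */
function liftPp(rawPercent: number, wrappedPercent: number): number {
  return Math.round((wrappedPercent - rawPercent) * 10) / 10;
}

/** Format one chart row in the same shape as the README lines. */
function formatRow(label: string, percent: number): string {
  return `${label} ${percent.toFixed(1)}% | ${renderBar(percent)}`;
}

// Usage with the granite4.1:3b figures from the table.
const exampleRows: string[] = [
  formatRow("Raw", 73.4),
  formatRow("BetterCall", 83.8),
];
const exampleLift: number = liftPp(73.4, 83.8);
```

Recomputing lift from the rounded percentages can differ from the table by 0.1pp: `liftPp(72.2, 78.2)` gives 6 where the table lists +5.9pp, and `liftPp(55.5, 63.6)` gives 8.1 where the table lists +8.2pp. A plausible reason is that the Lift column was computed from unrounded scores, but the diff does not confirm this.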