vg-coder-cli 2.0.54 → 2.0.56

package/INTEGRATION.md CHANGED
@@ -187,7 +187,7 @@ Server can actively list / close / open AI Studio tabs within each p
187
187
  |---|---|---|---|
188
188
  | `GET` | `/api/launcher/tabs` | `?label=<email>` (optional) | List tabs in a profile (or in all profiles if `label` is omitted) |
189
189
  | `POST` | `/api/launcher/close-tab` | `{ workerLabel?, tabId? }` | Close a specific tab, or all AI Studio tabs if `tabId` is omitted |
190
- | `POST` | `/api/launcher/open-tab` | `{ workerLabel?, model?, url?, active? }` | Open a new tab. `model` defaults to `gemini-3-flash-preview`. From v2.0.52 the response includes `requested_model` / `actual_model` / `fallback_occurred` to detect AI Studio silent fallback (account lacks access to the preview model) |
190
+ | `POST` | `/api/launcher/open-tab` | `{ workerLabel?, model?, url?, active? }` | Open a new tab. `model` defaults to `gemini-3-flash-preview`. From v2.0.52 the response includes `requested_model` / `actual_model` / `fallback_occurred` (URL-based, **not reliable** with newer AI Studio versions; see the note) |
191
191
 
192
192
  ```bash
193
193
  # List tabs across all profiles
@@ -215,10 +215,17 @@ curl -X POST -d '{"workerLabel":"alice@gmail.com","model":"gemini-3-flash-previe
215
215
  }
216
216
  ```
217
217
 
218
- When `fallback_occurred: true`, AI Studio has redirected to a different model because the account
219
- lacks access. Clients should check this field to fail fast or retry with a different
220
- model. Before v2.0.52 the response only returned the requested URL, so silent quality degradation
221
- could not be detected.
218
+ **Important note** (verified 2026-05-10): newer AI Studio versions do **NOT**
219
+ redirect the URL on a silent fallback: the URL keeps `?model=gemini-3-pro-preview` while
220
+ the UI loads a different model. The `actual_model` in the open-tab response therefore only
221
+ reflects the URL param, **not the model AI Studio actually uses**.
222
+
223
+ → **The source of truth is `task.result.actualModel`** (see [GET task](#get-apitasksid-poll-task));
224
+ this field is scraped from the `<ms-model-selector>` DOM after the task finishes, so it is accurate.
225
+
226
+ Recommended client pattern: ignore `actual_model` in the open-tab response and check
227
+ `task.result.actualModel === requested_model` after each task completes. If they differ,
228
+ AI Studio has silently fallen back; retry with a different model or fail fast.
222
229
 
223
230
  ### Modal auto-handling
224
231
 
@@ -277,6 +284,33 @@ curl -F prompt="Use alice's quota only" \
277
284
  http://127.0.0.1:6868/api/tasks
278
285
  ```
279
286
 
287
+ ## Verify the actual model with `actualModel` (v2.0.53+)
288
+
289
+ AI Studio can silently fall back to another model when the account lacks access to the
290
+ requested model (e.g. the request asks for `gemini-3-pro-preview` but the account only has Flash, so AI Studio
291
+ loads Flash without returning an error). The URL param and the `open-tab` response are **not reliable**:
292
+ newer AI Studio versions keep `?model=...` unchanged in the URL even when the UI loads
293
+ a different model.
294
+
295
+ `task.result.actualModel` (scraped from the `<ms-model-selector>` DOM after the task
296
+ is done) is the source of truth. Recommended pattern:
297
+
298
+ ```bash
299
+ TID=$(curl -s -F prompt="Analyze the PDF" -F files=@doc.pdf "$BASE/api/tasks" | jq -r .taskId)
300
+ # poll until done...
301
+ RESULT=$(curl -s "$BASE/api/tasks/$TID")
302
+ ACTUAL=$(echo "$RESULT" | jq -r '.result.actualModel // empty')  # empty when the field is null
303
+ EXPECTED="gemini-3-pro-preview"
304
+
305
+ if [ -n "$ACTUAL" ] && [ "$ACTUAL" != "$EXPECTED" ]; then
306
+   echo "WARN: requested $EXPECTED but ran on $ACTUAL; quality may differ"
307
+   # decision: retry with another account / fail fast / accept the fallback, depending on the use case
308
+ fi
309
+ ```
310
+
311
+ The field may be `null` if the DOM has not rendered when the task finishes (rare: workers usually
312
+ wait for the assistant turn to render before emitting complete, which is enough time for the selector to load).
313
+
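Because `actualModel` can be `null`, a client check has three outcomes rather than two. The sketch below is one way to make that explicit; `classifyModelCheck` and its verdict strings are hypothetical names for illustration, not part of the vg-coder-cli API.

```javascript
// Hypothetical helper (not part of vg-coder-cli): classify a finished task's
// model check into 'match', 'fallback', or 'unknown'.
function classifyModelCheck(task, expectedModel) {
  // actualModel may be null/undefined when the selector DOM was not rendered.
  const actual = task?.result?.actualModel ?? null;
  if (actual === null) return { verdict: 'unknown', actual };
  if (actual === expectedModel) return { verdict: 'match', actual };
  return { verdict: 'fallback', actual };
}
```

A caller might treat `'unknown'` as a soft warning and only fail fast on `'fallback'`; which policy fits depends on the use case.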
280
314
  ## Examples
281
315
 
282
316
  ### Submit + poll
@@ -295,6 +329,7 @@ while true; do
295
329
  sleep 2
296
330
  done
297
331
 
332
+ echo "$RESP" | jq -r '"model=\(.result.actualModel) duration=\(.timing.durationMs)ms"'
298
333
  echo "$RESP" | jq -r .result.markdown
299
334
  ```
300
335
 
@@ -336,6 +371,27 @@ async function pollVgTask(taskId, { intervalMs = 2000, timeoutMs = 5 * 60_000 }
336
371
  }
337
372
  throw new Error('timeout');
338
373
  }
374
+
375
+ // Submit + verify model. Throws if actualModel differs from the expected model (production
376
+ // extraction wants to fail fast when AI Studio falls back to a lower-tier model).
377
+ async function runWithModelGuard({ prompt, files, expectedModel }) {
378
+ // Pin the tab to the expected model before submitting the task
379
+ await fetch('http://127.0.0.1:6868/api/launcher/close-tab', { method: 'POST' });
380
+ await fetch('http://127.0.0.1:6868/api/launcher/open-tab', {
381
+ method: 'POST',
382
+ headers: { 'Content-Type': 'application/json' },
383
+ body: JSON.stringify({ model: expectedModel }),
384
+ });
385
+
386
+ const { taskId } = await submitVgTask({ prompt, files });
387
+ const task = await pollVgTask(taskId);
388
+
389
+ if (task.status !== 'done') throw new Error(`task ${task.status}: ${task.error?.message}`);
390
+ if (task.result?.actualModel && task.result.actualModel !== expectedModel) {
391
+ throw new Error(`model_fallback: requested ${expectedModel}, ran on ${task.result.actualModel}`);
392
+ }
393
+ return task.result.markdown;
394
+ }
339
395
  ```
340
396
 
341
397
  ## Debug API
package/bugs/bug1.md ADDED
@@ -0,0 +1,370 @@
1
+ # Bug 1: `open-tab` API silently downgrades model — no validation, no error
2
+
3
+ **Reporter**: medgraph integration (chrome-mcp-vgcoder consumer)
4
+ **Date filed**: 2026-05-10
5
+ **Severity**: 🟠 MAJOR — silent quality degradation, hard for caller to detect
6
+ **Affects**: `vetgo-server2.duckdns.org` (server2, account `phathuy.vetgo@gmail.com`), production deployment
7
+ **Service URL**: `https://vetgo.webmcp.vn/vg/api/launcher/open-tab`
8
+
9
+ ---
10
+
11
+ ## Status: 🟢 RESOLVED (2026-05-10, v2.0.53+)
12
+
13
+ **Root cause confirmed**: AI Studio silently falls back to another model when the account lacks
14
+ access. Newer AI Studio versions (verified 2026-05-10) do NOT redirect the URL: the URL
15
+ param keeps the requested value, and only the `<ms-model-selector>` DOM reflects the actual model.
16
+
17
+ **Fix shipped**:
18
+ - v2.0.52: the `open-tab` response adds `requested_model` / `actual_model` /
19
+ `fallback_occurred` (URL-based; kept for backward compatibility but NOT
20
+ reliable with newer AI Studio versions).
21
+ - v2.0.53: **`task.result.actualModel`**: the worker scrapes the DOM
22
+ `<ms-model-selector> [data-test-id="model-name"]` after the task is done. This
23
+ is the **source of truth** for clients. Verified to work with:
24
+ - Request `gemini-99-fake` → `actualModel: "gemini-3.1-pro-preview"` (a clear AI Studio fallback).
25
+
26
+ **Recommended client pattern**: ignore `actual_model` in the open-tab response and check
27
+ `task.result.actualModel === requestedModel` after the task is done; if they differ, a fallback
28
+ occurred, so retry or fail fast depending on context.
29
+
30
+ **Code refs**:
31
+ - `vetgo-auto/chrome/src/launcher.ts`: open-tab handler (URL-based detection,
32
+ legacy)
33
+ - `src/server/views/js/features/task-worker.js:201-235`: `readActualModel()`
34
+ scrapes the DOM, retrying without/with the toggle panel
35
+ - `src/server/task-queue.js:417-422`: persists `actualModel` into `task.result`
36
+ - `INTEGRATION.md`: client docs with the recommended pattern
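For readers without access to the referenced sources, the DOM scrape can be pictured roughly as follows. This is only a sketch built from the selector quoted above; `readModelName` is a hypothetical name, and the real `readActualModel()` (including its toggle-panel retry) may work differently.

```javascript
// Hypothetical sketch, not the shipped readActualModel(): read the model name
// from any object exposing querySelector (e.g. `document` in the AI Studio tab).
const MODEL_NAME_SELECTOR = 'ms-model-selector [data-test-id="model-name"]';

function readModelName(root, selector = MODEL_NAME_SELECTOR) {
  const el = root.querySelector(selector);
  // textContent can be missing or whitespace-only while the UI is still rendering.
  const text = el?.textContent?.trim();
  return text ? text : null;
}
```

Returning `null` instead of an empty string keeps the "not rendered yet" case distinguishable from a real model name.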
37
+
38
+ ---
39
+
40
+ ## Summary
41
+
42
+ The `POST /api/launcher/open-tab` endpoint accepts any string in the
43
+ `model` field, returns a success response with that model name in the
44
+ URL, but **the actual Chromium tab silently loads a different model**
45
+ (typically `gemini-3-flash-preview`) when the requested model is not
46
+ available to the worker's logged-in Google account.
47
+
48
+ The API caller has **no way to detect** the downgrade — both the API
49
+ response and the URL contain the requested model name. Only by visually
50
+ inspecting the AI Studio sidebar via noVNC can you see which model is
51
+ actually loaded.
52
+
53
+ This causes downstream tasks to run on a different model than the
54
+ caller assumes, which silently degrades extraction quality without any
55
+ warning.
56
+
57
+ ---
58
+
59
+ ## How to reproduce
60
+
61
+ ### Step 1 — Request `gemini-3-pro-preview` (or any model not available to the account)
62
+
63
+ ```bash
64
+ BASE=https://vetgo.webmcp.vn/vg
65
+
66
+ # Close existing tab first
67
+ curl -X POST -H 'Content-Type: application/json' -d '{}' $BASE/api/launcher/close-tab
68
+
69
+ # Open with pro-preview model
70
+ curl -X POST -H 'Content-Type: application/json' \
71
+ -d '{"model":"gemini-3-pro-preview"}' \
72
+ $BASE/api/launcher/open-tab
73
+ ```
74
+
75
+ ### Step 2 — Observe API response (looks successful)
76
+
77
+ ```json
78
+ {
79
+ "ok": true,
80
+ "tabId": 477055248,
81
+ "windowId": 477055217,
82
+ "url": "https://aistudio.google.com/prompts/new_chat?model=gemini-3-pro-preview"
83
+ }
84
+ ```
85
+
86
+ HTTP status: `200 OK`. URL embeds the requested model. Caller assumes success.
87
+
88
+ ### Step 3 — Inspect actual tab via noVNC at `https://vetgo.webmcp.vn/vnc.html`
89
+
90
+ Look at AI Studio sidebar (right pane). Observe:
91
+
92
+ - **Sidebar title**: "Gemini 3 Flash Preview" (NOT "Gemini 3 Pro Preview")
93
+ - **Model identifier** under title: `gemini-3-flash-preview`
94
+ - **Browser URL bar**: shows `?model=gemini-3-flash-preview` (changed from request)
95
+
96
+ Screenshot evidence (medgraph user observed 2026-05-10):
97
+ ```
98
+ URL: aistudio.google.com/prompts/new_chat?model=gemini-3-flash-preview
99
+ Sidebar: "Gemini 3 Flash Preview"
100
+ "gemini-3-flash-preview"
101
+ "Our most intelligent model built for speed,
102
+ combining frontier intelligence with superior
103
+ search and grounding."
104
+ ```
105
+
106
+ ### Step 4 — Confirm task uses Flash, not Pro
107
+
108
+ Submit a multimodal task and measure duration:
109
+
110
+ ```bash
111
+ curl -F prompt="Describe this PDF chapter structure as JSON" \
112
+ -F files=@/tmp/test_chapter.pdf \
113
+ $BASE/api/tasks
114
+ ```
115
+
116
+ Observed: 38-page PDF processed in **58 seconds**.
117
+
118
+ Expected on `gemini-3-pro-preview`: ~3–5 minutes for 38-page PDF (per
119
+ Google's published Pro model latency benchmarks).
120
+
121
+ → 58s ≪ Pro baseline strongly indicates Flash, not Pro.
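The duration argument can be turned into a rough sanity check. The sketch below assumes the lower bound of the "~3–5 minutes" Pro estimate quoted above (180 s) as a threshold; `looksFasterThanPro` is a hypothetical name, and a fast run is only circumstantial evidence, not proof of which model ran.

```javascript
// Hypothetical heuristic: flag runs that finish faster than the assumed
// lower bound for Pro. The default (3 minutes) comes from the estimate in
// this report and is an assumption, not a measured constant.
function looksFasterThanPro(durationMs, proMinMs = 3 * 60_000) {
  return durationMs < proMinMs;
}

// The 58-second run from this report falls well below the assumed Pro floor.
const suspicious = looksFasterThanPro(58_000);
```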
122
+
123
+ ---
124
+
125
+ ## Expected behavior
126
+
127
+ `/api/launcher/open-tab` SHOULD **either**:
128
+
129
+ **Option A (preferred — fail-fast)**:
130
+ - Validate the requested `model` against a whitelist of models actually available to the worker account
131
+ - If model not available, return `400 Bad Request` with body
132
+ ```json
133
+ {
134
+ "ok": false,
135
+ "error": "model_not_available",
136
+ "requested": "gemini-3-pro-preview",
137
+ "available": ["gemini-3-flash-preview", "gemini-2.5-flash", "gemini-2.5-pro"],
138
+ "message": "Model 'gemini-3-pro-preview' not accessible to account phathuy.vetgo@gmail.com. Choose from available list."
139
+ }
140
+ ```
141
+
142
+ **Option B (acceptable — report fallback)**:
143
+ - Allow tab to open with whatever AI Studio gives back
144
+ - Detect the actual loaded model from the resulting tab URL or DOM
145
+ - Return the **actually loaded** model in the response
146
+ ```json
147
+ {
148
+ "ok": true,
149
+ "tabId": 477055248,
150
+ "requested_model": "gemini-3-pro-preview",
151
+ "actual_model": "gemini-3-flash-preview",
152
+ "fallback_occurred": true,
153
+ "url": "https://aistudio.google.com/prompts/new_chat?model=gemini-3-flash-preview"
154
+ }
155
+ ```
156
+
157
+ Either way, the caller MUST be informed when the loaded model differs
158
+ from the requested model.
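As an illustration of Option A, the validation step could look like the sketch below. `validateRequestedModel` is a hypothetical helper, and how the per-account `available` list is obtained is left open; only the error body shape follows the example above.

```javascript
// Hypothetical Option A sketch: decide the HTTP status and body for an
// open-tab request, given the models known to be available to the account.
function validateRequestedModel(requested, available) {
  if (available.includes(requested)) {
    return { status: 200, body: { ok: true } };
  }
  return {
    status: 400,
    body: {
      ok: false,
      error: 'model_not_available',
      requested,
      available,
      message: `Model '${requested}' not accessible. Choose from available list.`,
    },
  };
}
```

Keeping the decision in a pure function makes it easy to unit-test separately from the route handler.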
159
+
160
+ ---
161
+
162
+ ## Actual behavior (broken)
163
+
164
+ API response:
165
+ ```json
166
+ {
167
+ "ok": true,
168
+ "tabId": 477055248,
169
+ "windowId": 477055217,
170
+ "url": "https://aistudio.google.com/prompts/new_chat?model=gemini-3-pro-preview"
171
+ }
172
+ ```
173
+
174
+ But Chromium tab loads `?model=gemini-3-flash-preview` (visible in noVNC
175
+ URL bar + sidebar). The `url` field in the API response is **the
176
+ requested URL, not the loaded URL**.
177
+
178
+ ---
179
+
180
+ ## Root cause hypotheses (for fixer to investigate)
181
+
182
+ 1. **AI Studio frontend silently falls back** when account lacks
183
+ access to a preview model. The Chromium tab navigates from
184
+ `?model=gemini-3-pro-preview` → `?model=gemini-3-flash-preview`
185
+ without any error toast.
186
+
187
+ 2. **`open-tab` handler returns the requested URL immediately** without
188
+ waiting for navigation to settle. It does NOT poll the tab's actual
189
+ final URL after AI Studio's redirect.
190
+
191
+ 3. **No model whitelist validation** in `open-tab` — any string is
192
+ accepted, even nonsense like `gemini-99-fake`. Tested with multiple
193
+ strings including `gemini-3-pro` (note: without `-preview` suffix);
194
+ all return `ok: true` regardless of validity. Each request returns
195
+ the requested URL verbatim, regardless of whether AI Studio actually
196
+ honors that model.
197
+
198
+ ---
199
+
200
+ ## Test cases for fixer
201
+
202
+ | Input model | Account permission | Expected result |
203
+ |-------------|-------------------|-----------------|
204
+ | `gemini-3-flash-preview` | ✅ has access | open success, actual_model = requested |
205
+ | `gemini-2.5-flash` | ✅ has access | open success, actual_model = requested |
206
+ | `gemini-3-pro-preview` | ❌ no access | Option A: 400 error / Option B: ok with `actual_model: gemini-3-flash-preview, fallback_occurred: true` |
207
+ | `gemini-3-pro` | ❌ doesn't exist | Option A: 400 invalid model / Option B: report actual fallback |
208
+ | `gemini-99-fake` | ❌ doesn't exist | Option A: 400 invalid model |
209
+ | (omitted `model` field) | — | Use default model, log which default chosen |
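The Option B rows of the table can be exercised with a small table-driven check. This is a sketch under assumptions: `buildOpenTabResponse` is a hypothetical helper, and it derives `actual_model` from the loaded URL, which the Status section above reports is no longer reliable with newer AI Studio versions.

```javascript
// Hypothetical Option B sketch: build the response from the URL the tab ended on.
function buildOpenTabResponse({ tabId, requestedModel, loadedUrl }) {
  const actualModel = new URL(loadedUrl).searchParams.get('model');
  return {
    ok: true,
    tabId,
    requested_model: requestedModel,
    actual_model: actualModel,
    fallback_occurred: Boolean(actualModel) && actualModel !== requestedModel,
    url: loadedUrl,
  };
}

// Table-driven cases mirroring two rows of the table above.
const base = 'https://aistudio.google.com/prompts/new_chat?model=';
const cases = [
  { requested: 'gemini-3-flash-preview', loaded: 'gemini-3-flash-preview', fallback: false },
  { requested: 'gemini-3-pro-preview', loaded: 'gemini-3-flash-preview', fallback: true },
];
const results = cases.map((c) =>
  buildOpenTabResponse({ tabId: 1, requestedModel: c.requested, loadedUrl: base + c.loaded }),
);
```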
210
+
211
+ ---
212
+
213
+ ## Workaround (current — manual)
214
+
215
+ Until fixed, callers must:
216
+
217
+ 1. Open tab via API
218
+ 2. Open `https://vetgo.webmcp.vn/vnc.html` separately
219
+ 3. Visually inspect AI Studio sidebar to verify model
220
+ 4. If wrong model loaded, manually click model dropdown in AI Studio UI
221
+ to select correct one
222
+
223
+ This defeats the purpose of programmatic tab control. The bug is
224
+ particularly insidious because:
225
+
226
+ - **Output quality silently degrades** (Flash vs Pro for the same task)
227
+ - **No error logged anywhere** — task succeeds, looks fine
228
+ - **Token count similar** so caller can't detect via metrics
229
+ - **Only manual visual inspection** reveals the issue
230
+
231
+ In the medgraph case, this caused a recon task to run on Flash when
232
+ caller assumed Pro, leading to ~5x faster completion time but with
233
+ unknown quality trade-off. For production extraction of clinical
234
+ veterinary protocols (high-liability content), this silent downgrade
235
+ is unacceptable.
236
+
237
+ ---
238
+
239
+ ## Evidence collected (2026-05-10 03:29 UTC)
240
+
241
+ ### API responses
242
+
243
+ ```bash
244
+ # Test 1: Request pro-preview
245
+ $ curl -X POST -d '{"model":"gemini-3-pro-preview"}' \
246
+ $BASE/api/launcher/open-tab
247
+ {"ok":true,"tabId":477055264,"windowId":477055217,
248
+ "url":"https://aistudio.google.com/prompts/new_chat?model=gemini-3-pro-preview"}
249
+
250
+ # Test 2: Request 2.5-pro (might also fallback if account lacks access)
251
+ $ curl -X POST -d '{"model":"gemini-2.5-pro"}' \
252
+ $BASE/api/launcher/open-tab
253
+ {"ok":true,"tabId":477055260,"windowId":477055217,
254
+ "url":"https://aistudio.google.com/prompts/new_chat?model=gemini-2.5-pro"}
255
+
256
+ # Test 3: Request gemini-3-pro (no -preview suffix, possibly invalid)
257
+ $ curl -X POST -d '{"model":"gemini-3-pro"}' \
258
+ $BASE/api/launcher/open-tab
259
+ {"ok":true,"tabId":477055262,"windowId":477055217,
260
+ "url":"https://aistudio.google.com/prompts/new_chat?model=gemini-3-pro"}
261
+ ```
262
+
263
+ All three return `ok: true` regardless of whether the account actually
264
+ has access to the model. The API is unable to distinguish a successful
265
+ load from a silent fallback.
266
+
267
+ ### Visual evidence from noVNC (after Test 1)
268
+
269
+ Browser URL bar reads: `aistudio.google.com/prompts/new_chat?model=gemini-3-flash-preview`
270
+
271
+ AI Studio Run-settings sidebar shows:
272
+ ```
273
+ Gemini 3 Flash Preview
274
+ gemini-3-flash-preview
275
+ Our most intelligent model built for speed,
276
+ combining frontier intelligence with superior
277
+ search and grounding.
278
+ ```
279
+
280
+ NOT `gemini-3-pro-preview` as requested.
281
+
282
+ ### Task duration evidence
283
+
284
+ Submitted PDF (38 pages, 2.2MB, structured-extraction prompt):
285
+ - Task ID: `t_1778383267520_b0c68b`
286
+ - `durationMs`: `58047` ms (≈ 58 seconds)
287
+ - Worker: `phathuy.vetgo@gmail.com`
288
+
289
+ Reference benchmarks (Google AI Studio published latencies, Apr 2026):
290
+ - `gemini-3-flash-preview` typical 30-60s for 30+ page PDF + JSON output
291
+ - `gemini-3-pro-preview` typical 3-9 minutes for same workload
292
+
293
+ Observed 58s strongly aligns with Flash, not Pro.
294
+
295
+ ---
296
+
297
+ ## Fix priority justification
298
+
299
+ This bug breaks the **only programmatic mechanism for model selection**
300
+ in chrome-mcp-vgcoder. README.md (lines 64-69) documents `open-tab` as
301
+ the way to "lock model" — but in practice, this lock is non-functional
302
+ when the account lacks the requested model.
303
+
304
+ For automated pipelines (e.g. medgraph Layer 2 extraction), this means:
305
+
306
+ - Cannot guarantee task quality without manual noVNC verification
307
+ - Cannot run unattended batch jobs with confidence
308
+ - Cannot programmatically detect/recover from model unavailability
309
+ - Account quota issues are masked as "task succeeded" when actually
310
+ running on a degraded model
311
+
312
+ Recommend Option A (fail-fast validation) for production deployments
313
+ to surface account permission issues immediately, with Option B as a
314
+ fallback for graceful degradation in dev/testing.
315
+
316
+ ---
317
+
318
+ ## Suggested implementation pointers
319
+
320
+ For the AI agent fixing this bug:
321
+
322
+ 1. **Locate `open-tab` handler** — likely in `vg-coder-cli` source under
323
+ `src/server/launcher/` or similar. Search for handler accepting POST
324
+ on path matching `/api/launcher/open-tab`.
325
+
326
+ 2. **Add post-navigation poll**: after Chromium navigates to the
327
+ requested URL, wait for AI Studio frontend to settle (debounce ~2-3s
328
+ or watch for DOM "model selector loaded" event), then read the
329
+ actual `model` query param from `tab.url` (the URL after any
330
+ redirects).
331
+
332
+ 3. **Compare requested vs actual**:
333
+ ```js
334
+ const actualModel = new URL(tab.url).searchParams.get('model');
335
+ const requestedModel = req.body.model;
336
+ const fallbackOccurred = actualModel && actualModel !== requestedModel;
337
+ ```
338
+
339
+ 4. **For Option A**: also maintain a per-account model whitelist
340
+ (probably needs to scrape AI Studio model dropdown DOM once at
341
+ worker boot and cache).
342
+
343
+ 5. **Test on multiple accounts**: server2 (`phathuy.vetgo`) and
344
+ server3 (`udymec`) may have different model access — the fix must
345
+ work for both.
346
+
347
+ 6. **Update README.md**: document the new response shape and any
348
+ account-tier limitations on which models are available.
349
+
350
+ 7. **Add integration test** in `vg-coder-cli/tests/` that asserts
351
+ `actual_model === requested_model` after `open-tab`, and fails
352
+ loudly when fallback occurs without warning.
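The post-navigation poll from pointer 2 can be sketched generically as "wait until a value stops changing". `waitForStableValue` and its option names are hypothetical; `read` stands in for whatever returns the tab's current URL (or a DOM-derived model name), and the default timings are placeholders rather than tuned values.

```javascript
// Hypothetical sketch: resolve once `read()` has returned the same value for
// `stableMs`, or reject after `timeoutMs`.
async function waitForStableValue(read, { stableMs = 2000, timeoutMs = 15000, pollMs = 250 } = {}) {
  const deadline = Date.now() + timeoutMs;
  let last = await read();
  let since = Date.now(); // when `last` was first observed
  while (Date.now() < deadline) {
    if (Date.now() - since >= stableMs) return last;
    await new Promise((resolve) => setTimeout(resolve, pollMs));
    const next = await read();
    if (next !== last) {
      // Value changed (e.g. a redirect happened): restart the stability window.
      last = next;
      since = Date.now();
    }
  }
  throw new Error('timeout waiting for value to settle');
}
```

In a handler this might be called with something like `() => getTabUrl(tabId)`, where `getTabUrl` is again an assumed helper, before comparing requested and actual models.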
353
+
354
+ ---
355
+
356
+ ## Related code areas (for fixer to grep)
357
+
358
+ - Handler: search for `app.post('/api/launcher/open-tab'` or similar route registration
359
+ - Tab navigation: search for `chrome.tabs.update` or `chrome.tabs.create` with `url:` containing `aistudio.google.com`
360
+ - Worker registration: `meta.domain === 'aistudio.google.com'` likely involves model param parsing already
361
+ - Reference: `chrome-mcp-vgcoder/README.md` lines 64-69 document the contract this bug violates
362
+
363
+ ---
364
+
365
+ ## Out of scope for this bug
366
+
367
+ - Fixing AI Studio's silent fallback behavior itself (that's Google's UI)
368
+ - Adding model selection per-task (current architecture is per-worker; this bug only addresses
369
+ per-worker model selection accuracy)
370
+ - Quota management / fallback strategy when preview models hit limits