npm - vg-coder-cli - Versions diffs - 2.0.54 → 2.0.56 - Mend

vg-coder-cli 2.0.54 → 2.0.56

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (7)

package/INTEGRATION.md +61 -5
package/bugs/bug1.md +370 -0
package/dist/vg-coder-bundle.js +1 -1
package/package.json +1 -1
package/src/server/api-server.js +13 -0
package/src/server/task-queue.js +8 -2
package/src/server/views/js/features/task-worker.js +18 -7

package/INTEGRATION.md CHANGED

@@ -187,7 +187,7 @@ The server can actively list / close / open AI Studio tabs in each p
 |---|---|---|---|
 | `GET` | `/api/launcher/tabs` | `?label=<email>` (optional) | List tabs in the profile (or in all profiles if `label` is omitted) |
 | `POST` | `/api/launcher/close-tab` | `{ workerLabel?, tabId? }` | Close a specific tab, or all AI Studio tabs if `tabId` is omitted |
-| `POST` | `/api/launcher/open-tab` | `{ workerLabel?, model?, url?, active? }` | Open a new tab. `model` defaults to `gemini-3-flash-preview`. Since v2.0.52 the response includes `requested_model` / `actual_model` / `fallback_occurred` to detect AI Studio's silent fallback (when the account lacks access to the preview model) |
+| `POST` | `/api/launcher/open-tab` | `{ workerLabel?, model?, url?, active? }` | Open a new tab. `model` defaults to `gemini-3-flash-preview`. Since v2.0.52 the response includes `requested_model` / `actual_model` / `fallback_occurred` (URL-based and **not reliable** with newer AI Studio versions; see the note) |
 ```bash
 # List tabs in all profiles
@@ -215,10 +215,17 @@ curl -X POST -d '{"workerLabel":"alice@gmail.com","model":"gemini-3-flash-previe
 }
 ```
-When `fallback_occurred: true`, AI Studio has redirected to a different model
-because the account has no access. Clients should check this field to fail fast
-or retry with another model. Before v2.0.52, the response returned only the
-requested URL, so silent quality degradation could not be detected.
+**Important note** (verified 2026-05-10): newer AI Studio versions do **NOT**
+redirect the URL on a silent fallback; the URL keeps `?model=gemini-3-pro-preview`
+while the UI loads a different model. As a result, `actual_model` in the open-tab
+response only reflects the URL param, **not the model AI Studio actually uses**.
+→ **The source of truth is `task.result.actualModel`** (see [GET task](#get-apitasksid-poll-task)):
+this field is scraped from the `<ms-model-selector>` DOM after the task finishes, so it is accurate.
+Recommended client pattern: ignore `actual_model` in the open-tab response and
+check `task.result.actualModel === requested_model` after every completed task. If
+they differ, AI Studio has silently fallen back: retry with another model or fail fast.
 ### Modal auto-handling
@@ -277,6 +284,33 @@ curl -F prompt="Use alice's quota only" \
      http://127.0.0.1:6868/api/tasks
 ```
+## Verify the actual model with `actualModel` (v2.0.53+)
+AI Studio may silently fall back to another model when the account lacks access to
+the requested model (e.g. requesting `gemini-3-pro-preview` on an account that only
+has Flash: AI Studio loads Flash and returns no error). The URL param and the
+`open-tab` response are **not reliable**: newer AI Studio versions keep `?model=...`
+in the URL unchanged even when the UI loads a different model.
+`task.result.actualModel` (scraped from the `<ms-model-selector>` DOM after the task
+is done) is the source of truth. Recommended pattern:
+```bash
+TID=$(curl -s -F prompt="Analyze the PDF" -F files=@doc.pdf $BASE/api/tasks | jq -r .taskId)
+# poll until done...
+RESULT=$(curl -s $BASE/api/tasks/$TID)
+ACTUAL=$(echo "$RESULT" | jq -r .result.actualModel)
+EXPECTED="gemini-3-pro-preview"
+if [ "$ACTUAL" != "$EXPECTED" ]; then
+  echo "WARN: requested $EXPECTED but ran on $ACTUAL — quality may differ"
+  # decision: retry with another account / fail fast / accept the fallback, depending on the use case
+fi
+```
+The field may be `null` if the DOM has not rendered by the time the task finishes (rare:
+workers usually wait for the assistant turn to render before emitting complete, which gives the selector enough time to load).
 ## Examples
 ### Submit + poll
@@ -295,6 +329,7 @@ while true; do
   sleep 2
 done
+echo "$RESP" | jq -r '"model=\(.result.actualModel) duration=\(.timing.durationMs)ms"'
 echo "$RESP" | jq -r .result.markdown
 ```
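The bash loop above prints `actualModel`, and INTEGRATION.md notes that this field can be `null` when the selector has not rendered yet. The following is a minimal sketch that keeps "could not verify" separate from "fell back", so the caller can choose its own policy for the `null` case. It assumes the polled task shape used throughout this file (`status`, `result.actualModel`). `classifyModelResult` is a hypothetical helper, not part of vg-coder-cli.

```javascript
// Hypothetical helper (not part of vg-coder-cli): compare the model a caller
// requested with task.result.actualModel, which the worker scrapes from the
// <ms-model-selector> DOM. Assumes the polled task object looks like
// { status, result: { actualModel, markdown } }.
function classifyModelResult(requestedModel, task) {
  if (!task || task.status !== 'done') {
    // Any status other than 'done': there is no finished result to verify.
    return { status: 'not_done', requested: requestedModel, actual: null };
  }
  const actual = task.result?.actualModel ?? null;
  if (actual === null) {
    // Selector had not rendered when the task finished; model is unconfirmed.
    return { status: 'unknown', requested: requestedModel, actual };
  }
  return {
    status: actual === requestedModel ? 'match' : 'fallback',
    requested: requestedModel,
    actual,
  };
}

// Example: a Pro request that actually ran on Flash.
const verdict = classifyModelResult('gemini-3-pro-preview', {
  status: 'done',
  result: { actualModel: 'gemini-3-flash-preview', markdown: '...' },
});
console.log(verdict.status); // fallback
```

Treating `unknown` as a failure is the stricter choice for production extraction. By contrast, `runWithModelGuard` in this file lets a `null` `actualModel` pass without throwing.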
@@ -336,6 +371,27 @@ async function pollVgTask(taskId, { intervalMs = 2000, timeoutMs = 5 * 60_000 }
   }
   throw new Error('timeout');
 }
+// Submit + verify the model. Throws if actualModel differs from expected (production
+// extraction wants to fail fast when AI Studio falls back to a lower-tier model).
+async function runWithModelGuard({ prompt, files, expectedModel }) {
+  // Pin the tab to the expected model before submitting the task
+  await fetch('http://127.0.0.1:6868/api/launcher/close-tab', { method: 'POST' });
+  await fetch('http://127.0.0.1:6868/api/launcher/open-tab', {
+    method: 'POST',
+    headers: { 'Content-Type': 'application/json' },
+    body: JSON.stringify({ model: expectedModel }),
+  });
+  const { taskId } = await submitVgTask({ prompt, files });
+  const task = await pollVgTask(taskId);
+  if (task.status !== 'done') throw new Error(`task ${task.status}: ${task.error?.message}`);
+  if (task.result?.actualModel && task.result.actualModel !== expectedModel) {
+    throw new Error(`model_fallback: requested ${expectedModel}, ran on ${task.result.actualModel}`);
+  }
+  return task.result.markdown;
+}
 ```
 ## Debug API
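The recommended pattern leaves the recovery policy to the client: retry with another model, or fail fast. The following is a minimal sketch of the retry branch. It assumes a caller-supplied `runOnModel(model)` that performs the open-tab, submit and poll steps and resolves to the finished task; for example, this could be a variant of `runWithModelGuard` that returns the task instead of throwing. `runWithModelPreference` and `runOnModel` are hypothetical names. An attempt is accepted only when `actualModel` equals the model requested for that attempt.

```javascript
// Hypothetical sketch (not vg-coder-cli API): try candidate models in priority
// order and keep the first task that verifiably ran on the model it requested.
// `runOnModel(model)` is an assumed caller-supplied async function that resolves
// to a finished task object { status, result: { actualModel, markdown } }.
async function runWithModelPreference(models, runOnModel, { acceptUnknown = false } = {}) {
  const attempts = [];
  for (const model of models) {
    const task = await runOnModel(model);
    const actual = task?.result?.actualModel ?? null;
    attempts.push({ requested: model, status: task?.status ?? null, actual });
    if (task?.status !== 'done') continue;                  // task failed: try the next model
    if (actual === model) return { model, task, attempts }; // verified match
    if (actual === null && acceptUnknown) return { model, task, attempts }; // unverified, allowed
    // Otherwise the model differs (silent fallback) or is unconfirmed: try the next one.
  }
  const err = new Error(`no candidate model was verified: ${JSON.stringify(attempts)}`);
  err.attempts = attempts;
  throw err;
}

// Example with a fake runner: the account only has access to Flash,
// so the Pro request falls back and the second candidate is accepted.
const flashOnly = new Set(['gemini-3-flash-preview']);
const fakeRun = async (model) => ({
  status: 'done',
  result: {
    actualModel: flashOnly.has(model) ? model : 'gemini-3-flash-preview',
    markdown: 'ok',
  },
});

runWithModelPreference(['gemini-3-pro-preview', 'gemini-3-flash-preview'], fakeRun)
  .then(({ model, attempts }) => console.log(model, attempts.length)); // gemini-3-flash-preview 2
```

The thrown error carries every attempt, so a fail-fast caller can log exactly which models fell back and what they ran on instead.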

package/bugs/bug1.md ADDED

@@ -0,0 +1,370 @@
+# Bug 1: `open-tab` API silently downgrades model — no validation, no error
+**Reporter**: medgraph integration (chrome-mcp-vgcoder consumer)
+**Date filed**: 2026-05-10
+**Severity**: 🟠 MAJOR — silent quality degradation, hard for caller to detect
+**Affects**: `vetgo-server2.duckdns.org` (server2, account `phathuy.vetgo@gmail.com`), production deployment
+**Service URL**: `https://vetgo.webmcp.vn/vg/api/launcher/open-tab`
+---
+## Status: 🟢 RESOLVED (2026-05-10, v2.0.53+)
+**Root cause confirmed**: AI Studio silently falls back to another model when the
+account lacks access. Newer AI Studio versions (verified 2026-05-10) do NOT redirect
+the URL: the URL param keeps the requested value, and only the `<ms-model-selector>`
+DOM reflects the actual model.
+**Fix shipped**:
+- v2.0.52: the `open-tab` response adds `requested_model` / `actual_model` /
+  `fallback_occurred` (URL-based; kept for backward compatibility but NOT
+  reliable with newer AI Studio versions).
+- v2.0.53: **`task.result.actualModel`**: the worker scrapes the DOM
+  `<ms-model-selector> [data-test-id="model-name"]` after the task is done. This
+  is the **source of truth** for clients. Verified to work with:
+  - Request `gemini-99-fake` → `actualModel: "gemini-3.1-pro-preview"` (an obvious AI Studio fallback).
+**Recommended client pattern**: ignore `actual_model` in the open-tab response and check
+`task.result.actualModel === requestedModel` after the task is done; if they differ, a
+fallback occurred, so retry or fail fast depending on context.
+**Code refs**:
+- `vetgo-auto/chrome/src/launcher.ts` — open-tab handler (URL-based detection,
+  legacy)
+- `src/server/views/js/features/task-worker.js:201-235` — `readActualModel()`
+  scrapes the DOM, retrying without and then with toggling the panel
+- `src/server/task-queue.js:417-422` — persists `actualModel` into `task.result`
+- `INTEGRATION.md` — client docs with the recommended pattern
+---
+## Summary
+The `POST /api/launcher/open-tab` endpoint accepts any string in the
+`model` field, returns a success response with that model name in the
+URL, but **the actual Chromium tab silently loads a different model**
+(typically `gemini-3-flash-preview`) when the requested model is not
+available to the worker's logged-in Google account.
+The API caller has **no way to detect** the downgrade — both the API
+response and the URL contain the requested model name. Only by visually
+inspecting the AI Studio sidebar via noVNC can you see which model is
+actually loaded.
+This causes downstream tasks to run on a different model than the
+caller assumes, which silently degrades extraction quality without any
+warning.
+---
+## How to reproduce
+### Step 1 — Request `gemini-3-pro-preview` (or any model not available to the account)
+```bash
+BASE=https://vetgo.webmcp.vn/vg
+# Close existing tab first
+curl -X POST -H 'Content-Type: application/json' -d '{}' $BASE/api/launcher/close-tab
+# Open with pro-preview model
+curl -X POST -H 'Content-Type: application/json' \
+  -d '{"model":"gemini-3-pro-preview"}' \
+  $BASE/api/launcher/open-tab
+```
+### Step 2 — Observe the API response (looks successful)
+```json
+{
+  "ok": true,
+  "tabId": 477055248,
+  "windowId": 477055217,
+  "url": "https://aistudio.google.com/prompts/new_chat?model=gemini-3-pro-preview"
+}
+```
+HTTP status: `200 OK`. URL embeds the requested model. Caller assumes success.
+### Step 3 — Inspect actual tab via noVNC at `https://vetgo.webmcp.vn/vnc.html`
+Look at AI Studio sidebar (right pane). Observe:
+- **Sidebar title**: "Gemini 3 Flash Preview" (NOT "Gemini 3 Pro Preview")
+- **Model identifier** under title: `gemini-3-flash-preview`
+- **Browser URL bar**: shows `?model=gemini-3-flash-preview` (changed from request)
+Screenshot evidence (medgraph user observed 2026-05-10):
+```
+URL: aistudio.google.com/prompts/new_chat?model=gemini-3-flash-preview
+Sidebar: "Gemini 3 Flash Preview"
+         "gemini-3-flash-preview"
+         "Our most intelligent model built for speed,
+          combining frontier intelligence with superior
+          search and grounding."
+```
+### Step 4 — Confirm task uses Flash, not Pro
+Submit a multimodal task and measure duration:
+```bash
+curl -F prompt="Describe this PDF chapter structure as JSON" \
+     -F files=@/tmp/test_chapter.pdf \
+     $BASE/api/tasks
+```
+Observed: 38-page PDF processed in **58 seconds**.
+Expected on `gemini-3-pro-preview`: ~3–5 minutes for 38-page PDF (per
+Google's published Pro model latency benchmarks).
+→ 58s ≪ Pro baseline strongly indicates Flash, not Pro.
+---
+## Expected behavior
+`/api/launcher/open-tab` SHOULD **either**:
+**Option A (preferred — fail-fast)**:
+- Validate the requested `model` against a whitelist of models actually available to the worker account
+- If model not available, return `400 Bad Request` with body
+  ```json
+  {
+    "ok": false,
+    "error": "model_not_available",
+    "requested": "gemini-3-pro-preview",
+    "available": ["gemini-3-flash-preview", "gemini-2.5-flash", "gemini-2.5-pro"],
+    "message": "Model 'gemini-3-pro-preview' not accessible to account phathuy.vetgo@gmail.com. Choose from available list."
+  }
+  ```
+**Option B (acceptable — report fallback)**:
+- Allow tab to open with whatever AI Studio gives back
+- Detect the actually loaded model from the resulting tab URL or DOM
+- Return the **actually loaded** model in the response
+  ```json
+  {
+    "ok": true,
+    "tabId": 477055248,
+    "requested_model": "gemini-3-pro-preview",
+    "actual_model": "gemini-3-flash-preview",
+    "fallback_occurred": true,
+    "url": "https://aistudio.google.com/prompts/new_chat?model=gemini-3-flash-preview"
+  }
+  ```
+Either way, the caller MUST be informed when the loaded model differs
+from the requested model.
+---
+## Actual behavior (broken)
+API response:
+```json
+{
+  "ok": true,
+  "tabId": 477055248,
+  "windowId": 477055217,
+  "url": "https://aistudio.google.com/prompts/new_chat?model=gemini-3-pro-preview"
+}
+```
+But Chromium tab loads `?model=gemini-3-flash-preview` (visible in noVNC
+URL bar + sidebar). The `url` field in the API response is **the
+requested URL, not the loaded URL**.
+---
+## Root cause hypotheses (for fixer to investigate)
+1. **AI Studio frontend silently falls back** when the account lacks
+   access to a preview model. The Chromium tab navigates from
+   `?model=gemini-3-pro-preview` → `?model=gemini-3-flash-preview`
+   without any error toast.
+2. **`open-tab` handler returns the requested URL immediately** without
+   waiting for navigation to settle. It does NOT poll the tab's actual
+   final URL after AI Studio's redirect.
+3. **No model whitelist validation** in `open-tab` — any string is
+   accepted, even nonsense like `gemini-99-fake`. Tested with multiple
+   strings including `gemini-3-pro` (note: without `-preview` suffix);
+   all return `ok: true` regardless of validity. Each request returns
+   the requested URL verbatim, regardless of whether AI Studio actually
+   honors that model.
+---
+## Test cases for fixer
+| Input model | Account permission | Expected result |
+|-------------|-------------------|-----------------|
+| `gemini-3-flash-preview` | ✅ has access | open success, actual_model = requested |
+| `gemini-2.5-flash` | ✅ has access | open success, actual_model = requested |
+| `gemini-3-pro-preview` | ❌ no access | Option A: 400 error / Option B: ok with `actual_model: gemini-3-flash-preview, fallback_occurred: true` |
+| `gemini-3-pro` | ❌ doesn't exist | Option A: 400 invalid model / Option B: report actual fallback |
+| `gemini-99-fake` | ❌ doesn't exist | Option A: 400 invalid model |
+| (omitted `model` field) | — | Use default model, log which default chosen |
+---
+## Workaround (current — manual)
+Until fixed, callers must:
+1. Open tab via API
+2. Open `https://vetgo.webmcp.vn/vnc.html` separately
+3. Visually inspect AI Studio sidebar to verify model
+4. If wrong model loaded, manually click model dropdown in AI Studio UI
+   to select correct one
+This defeats the purpose of programmatic tab control. The bug is
+particularly insidious because:
+- **Output quality silently degrades** (Flash vs Pro for the same task)
+- **No error logged anywhere** — task succeeds, looks fine
+- **Token count similar** so caller can't detect via metrics
+- **Only manual visual inspection** reveals the issue
+In the medgraph case, this caused a recon task to run on Flash when
+caller assumed Pro, leading to ~5x faster completion time but with
+unknown quality trade-off. For production extraction of clinical
+veterinary protocols (high-liability content), this silent downgrade
+is unacceptable.
+---
+## Evidence collected (2026-05-10 03:29 UTC)
+### API responses
+```bash
+# Test 1: Request pro-preview
+$ curl -X POST -d '{"model":"gemini-3-pro-preview"}' \
+    $BASE/api/launcher/open-tab
+{"ok":true,"tabId":477055264,"windowId":477055217,
+ "url":"https://aistudio.google.com/prompts/new_chat?model=gemini-3-pro-preview"}
+# Test 2: Request 2.5-pro (might also fall back if the account lacks access)
+$ curl -X POST -d '{"model":"gemini-2.5-pro"}' \
+    $BASE/api/launcher/open-tab
+{"ok":true,"tabId":477055260,"windowId":477055217,
+ "url":"https://aistudio.google.com/prompts/new_chat?model=gemini-2.5-pro"}
+# Test 3: Request gemini-3-pro (no -preview suffix, possibly invalid)
+$ curl -X POST -d '{"model":"gemini-3-pro"}' \
+    $BASE/api/launcher/open-tab
+{"ok":true,"tabId":477055262,"windowId":477055217,
+ "url":"https://aistudio.google.com/prompts/new_chat?model=gemini-3-pro"}
+```
+All three return `ok: true` regardless of whether the account actually
+has access to the model. The API is unable to distinguish a successful
+load from a silent fallback.
+### Visual evidence from noVNC (after Test 1)
+Browser URL bar reads: `aistudio.google.com/prompts/new_chat?model=gemini-3-flash-preview`
+AI Studio Run-settings sidebar shows:
+```
+Gemini 3 Flash Preview
+gemini-3-flash-preview
+Our most intelligent model built for speed,
+combining frontier intelligence with superior
+search and grounding.
+```
+NOT `gemini-3-pro-preview` as requested.
+### Task duration evidence
+Submitted PDF (38 pages, 2.2MB, structured-extraction prompt):
+- Task ID: `t_1778383267520_b0c68b`
+- `durationMs`: `58047` ms (≈ 58 seconds)
+- Worker: `phathuy.vetgo@gmail.com`
+Reference benchmarks (Google AI Studio published latencies, Apr 2026):
+- `gemini-3-flash-preview` typical 30-60s for 30+ page PDF + JSON output
+- `gemini-3-pro-preview` typical 3-9 minutes for same workload
+Observed 58s strongly aligns with Flash, not Pro.
+---
+## Fix priority justification
+This bug breaks the **only programmatic mechanism for model selection**
+in chrome-mcp-vgcoder. README.md (lines 64-69) documents `open-tab` as
+the way to "lock model" — but in practice, this lock is non-functional
+when the account lacks the requested model.
+For automated pipelines (e.g. medgraph Layer 2 extraction), this means:
+- Cannot guarantee task quality without manual noVNC verification
+- Cannot run unattended batch jobs with confidence
+- Cannot programmatically detect/recover from model unavailability
+- Account quota issues are masked as "task succeeded" when actually
+  running on a degraded model
+Recommend Option A (fail-fast validation) for production deployments
+to surface account permission issues immediately, with Option B as a
+fallback for graceful degradation in dev/testing.
+---
+## Suggested implementation pointers
+For the AI agent fixing this bug:
+1. **Locate `open-tab` handler** — likely in `vg-coder-cli` source under
+   `src/server/launcher/` or similar. Search for handler accepting POST
+   on path matching `/api/launcher/open-tab`.
+2. **Add post-navigation poll**: after Chromium navigates to the
+   requested URL, wait for AI Studio frontend to settle (debounce ~2-3s
+   or watch for DOM "model selector loaded" event), then read the
+   actual `model` query param from `tab.url` (the URL after any
+   redirects).
+3. **Compare requested vs actual**:
+   ```js
+   const actualModel = new URL(tab.url).searchParams.get('model');
+   const requestedModel = req.body.model;
+   const fallbackOccurred = actualModel && actualModel !== requestedModel;
+   ```
+4. **For Option A**: also maintain a per-account model whitelist
+   (probably needs to scrape AI Studio model dropdown DOM once at
+   worker boot and cache).
+5. **Test on multiple accounts**: server2 (`phathuy.vetgo`) and
+   server3 (`udymec`) may have different model access — the fix must
+   work for both.
+6. **Update README.md**: document the new response shape and any
+   account-tier limitations on which models are available.
+7. **Add integration test** in `vg-coder-cli/tests/` that asserts
+   `actual_model === requested_model` after `open-tab`, and fails
+   loudly when fallback occurs without warning.
+---
+## Related code areas (for fixer to grep)
+- Handler: search for `app.post('/api/launcher/open-tab'` or similar route registration
+- Tab navigation: search for `chrome.tabs.update` or `chrome.tabs.create` with `url:` containing `aistudio.google.com`
+- Worker registration: `meta.domain === 'aistudio.google.com'` likely involves model param parsing already
+- Reference: `chrome-mcp-vgcoder/README.md` lines 64-69 document the contract this bug violates
+---
+## Out of scope for this bug
+- Fixing AI Studio's silent fallback behavior itself (that's Google's UI)
+- Adding model selection per-task (current architecture is per-worker; this bug only addresses
+  per-worker model selection accuracy)
+- Quota management / fallback strategy when preview models hit limits
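Option A asks `open-tab` to reject models that the worker account cannot use. The following is a minimal sketch of that check as a pure function. It assumes the per-account model list has already been collected and cached; the implementation pointers suggest scraping the AI Studio model dropdown at worker boot. `validateRequestedModel` and its parameters are hypothetical, and the 400 body mirrors the shape proposed under Option A. The fix that actually shipped (v2.0.52/v2.0.53) reports the model rather than validating it.

```javascript
// Hypothetical sketch of Option A (not the shipped vg-coder-cli fix).
// `availableModels` is assumed to be a cached, per-account list of model ids.
function validateRequestedModel({ requested, availableModels, account, defaultModel }) {
  // Omitted `model` field: use the configured default and report that we did,
  // so the handler can log which default was chosen.
  const usedDefault = requested == null;
  const model = usedDefault ? defaultModel : requested;
  if (availableModels.includes(model)) {
    return { status: 200, body: { ok: true, model, usedDefault } };
  }
  return {
    status: 400,
    body: {
      ok: false,
      error: 'model_not_available',
      requested: model,
      available: [...availableModels],
      message: `Model '${model}' not accessible to account ${account}. Choose from available list.`,
    },
  };
}

const available = ['gemini-3-flash-preview', 'gemini-2.5-flash', 'gemini-2.5-pro'];
const res = validateRequestedModel({
  requested: 'gemini-3-pro-preview',
  availableModels: available,
  account: 'phathuy.vetgo@gmail.com',
  defaultModel: 'gemini-3-flash-preview',
});
console.log(res.status, res.body.error); // 400 model_not_available
```

Assuming an Express-style route, a handler could call this before creating the tab and answer with `res.status(result.status).json(result.body)`. Because the default is validated too, a misconfigured default surfaces as a 400 instead of another silent fallback.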