
vg-coder-cli 2.0.59 → 2.0.61

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (7)

package/INTEGRATION.md +5 -5
package/dist/vg-coder-bundle.js +1 -1
package/package.json +1 -1
package/src/server/task-queue.js +12 -2
package/src/server/views/js/features/task-worker.js +29 -4
package/src/server/views/vg-coder/background.js +2 -2
package/bugs/bug1.md +0 -493

package/bugs/bug1.md DELETED

@@ -1,493 +0,0 @@
-# Bug 1: `open-tab` API silently downgrades model — no validation, no error
-**Reporter**: medgraph integration (chrome-mcp-vgcoder consumer)
-**Date filed**: 2026-05-10
-**Severity**: 🟠 MAJOR — silent quality degradation, hard for caller to detect
-**Affects**: `vetgo-server2.duckdns.org` (server2, account `phathuy.vetgo@gmail.com`), production deployment
-**Service URL**: `https://vetgo.webmcp.vn/vg/api/launcher/open-tab`
----
-## Status: 🟢 RESOLVED (2026-05-10, v2.0.57)
-**Real root cause** (finally identified after 5 rounds of debugging):
-`vetgo-auto/scripts/aistudio.google.com/main.js` hardcodes
-`VG_DEFAULT_MODEL='gemini-3-flash-preview'`. `task-worker.js:handleTaskExecute`
-calls `startNewChat()` at the start of every task, which navigates to
-`/prompts/new_chat?model=gemini-3-flash-preview` (dropping the `?model=` pinned by the caller).
-The task runs on Flash regardless of what the caller requested.
-**Evidence** (verified on server2 ULTRA, 2026-05-10):
-- Before v2.0.57: pin `gemini-3.1-pro-preview` → observed via noVNC: tab navigates to
-  Pro → reloads as Flash → `actualModel: gemini-3-flash-preview`
-- After v2.0.57: pin `gemini-3.1-pro-preview` → tab stays on Pro → `actualModel:
-  gemini-3.1-pro-preview` ✅
-**Fix shipped**:
-- v2.0.52: the `open-tab` response adds `requested_model` / `actual_model` /
-  `fallback_occurred` (URL-based, partial detection)
-- v2.0.53: `task.result.actualModel` — the worker scrapes the `<ms-model-selector>` DOM
-  after the task finishes (accurate detection)
-- v2.0.55: `_pinnedModelByEmail` Map — `_recycleWorkerTab` reopens the tab with the pinned
-  model (keeps the pin across multiple tasks)
-- **v2.0.57: REAL FIX** — `getTargetModel()` reads the model from the current URL plus a
-  sessionStorage cache, replacing every hardcoded `VG_DEFAULT_MODEL`. `startNewChat`
-  and `pinPromptModel` no longer override the caller's pin.
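A minimal sketch of the `getTargetModel()` idea described in the v2.0.57 entry, assuming the model is read from the `model` query parameter and cached in a storage object. The signature, the `vg_target_model` key, and the injected `storage` argument are illustrative assumptions; the function shipped in `main.js` may look different.

```javascript
// Hypothetical sketch: resolve the model for the next chat without a hardcoded default.
// `storage` is any object with getItem/setItem (for example sessionStorage in a browser).
const STORAGE_KEY = 'vg_target_model'; // assumed key name, not taken from the source

function getTargetModel(href, storage, fallbackModel) {
  // Prefer the model pinned in the current URL, and remember it for later navigations.
  const fromUrl = new URL(href).searchParams.get('model');
  if (fromUrl) {
    storage.setItem(STORAGE_KEY, fromUrl);
    return fromUrl;
  }
  // The URL carries no pin: reuse the cached pin, else fall back to the default.
  return storage.getItem(STORAGE_KEY) || fallbackModel;
}
```

Passing the URL and storage in as arguments keeps the sketch testable outside a browser; in a content script they would presumably be `location.href` and `sessionStorage`.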
-**Recommended client pattern**:
-1. Pin the model via `POST /api/launcher/open-tab` with body `{model: "..."}`
-2. Submit the task, then check `task.result.actualModel === expectedModel`
-3. If they differ, AI Studio fell back (the account has no access to the model). Decide whether
-   to retry, fail fast, or accept, depending on the use case.
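The verification step of this pattern can be sketched as a small pure function. `verifyTaskModel` and its return shape are hypothetical names chosen for illustration; only the `task.result.actualModel` field comes from the report.

```javascript
// Hypothetical helper: classify a finished task against the model the caller pinned.
function verifyTaskModel(task, expectedModel) {
  const actualModel = task && task.result ? task.result.actualModel : undefined;
  if (!actualModel) {
    // The worker did not report a model, so nothing can be concluded.
    return { ok: false, reason: 'unknown', actualModel: null };
  }
  if (actualModel === expectedModel) {
    return { ok: true, reason: 'match', actualModel };
  }
  // AI Studio loaded a different model than the one requested.
  return { ok: false, reason: 'fallback', actualModel };
}
```

A caller would then branch on `reason` to retry, fail fast, or accept the result, depending on the use case.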
-**Code refs**:
-- `vetgo-auto/scripts/aistudio.google.com/main.js:20-37` — `getTargetModel()`
-  (real fix, primary)
-- `src/server/views/js/features/task-worker.js:210-236` — `readActualModel()`
-  DOM scrape (verification layer)
-- `src/server/task-queue.js:42-44, 491-494` — `_pinnedModelByEmail` persists
-  across recycles (secondary fix)
-- `vetgo-auto/chrome/src/launcher.ts` — open-tab handler, URL-based detection
-  (legacy, not reliable with newer AI Studio versions)
-- `INTEGRATION.md "Verify model thực"` — client docs with the recommended pattern
----
-## Summary
-The `POST /api/launcher/open-tab` endpoint accepts any string in the
-`model` field, returns a success response with that model name in the
-URL, but **the actual Chromium tab silently loads a different model**
-(typically `gemini-3-flash-preview`) when the requested model is not
-available to the worker's logged-in Google account.
-The API caller has **no way to detect** the downgrade — both the API
-response and the URL contain the requested model name. Only by visually
-inspecting the AI Studio sidebar via noVNC can you see which model is
-actually loaded.
-This causes downstream tasks to run on a different model than the
-caller assumes, which silently degrades extraction quality without any
-warning.
----
-## How to reproduce
-### Step 1 — Request `gemini-3-pro-preview` (or any model not available to the account)
-```bash
-BASE=https://vetgo.webmcp.vn/vg
-# Close existing tab first
-curl -X POST -H 'Content-Type: application/json' -d '{}' $BASE/api/launcher/close-tab
-# Open with pro-preview model
-curl -X POST -H 'Content-Type: application/json' \
-  -d '{"model":"gemini-3-pro-preview"}' \
-  $BASE/api/launcher/open-tab
-```
-### Step 2 — Observe API response (looks successful)
-```json
-{
-  "ok": true,
-  "tabId": 477055248,
-  "windowId": 477055217,
-  "url": "https://aistudio.google.com/prompts/new_chat?model=gemini-3-pro-preview"
-}
-```
-HTTP status: `200 OK`. URL embeds the requested model. Caller assumes success.
-### Step 3 — Inspect actual tab via noVNC at `https://vetgo.webmcp.vn/vnc.html`
-Look at AI Studio sidebar (right pane). Observe:
-- **Sidebar title**: "Gemini 3 Flash Preview" (NOT "Gemini 3 Pro Preview")
-- **Model identifier** under title: `gemini-3-flash-preview`
-- **Browser URL bar**: shows `?model=gemini-3-flash-preview` (changed from request)
-Screenshot evidence (medgraph user observed 2026-05-10):
-```
-URL: aistudio.google.com/prompts/new_chat?model=gemini-3-flash-preview
-Sidebar: "Gemini 3 Flash Preview"
-         "gemini-3-flash-preview"
-         "Our most intelligent model built for speed,
-          combining frontier intelligence with superior
-          search and grounding."
-```
-### Step 4 — Confirm task uses Flash, not Pro
-Submit a multimodal task and measure duration:
-```bash
-curl -F prompt="Describe this PDF chapter structure as JSON" \
-     -F files=@/tmp/test_chapter.pdf \
-     $BASE/api/tasks
-```
-Observed: 38-page PDF processed in **58 seconds**.
-Expected on `gemini-3-pro-preview`: ~3–5 minutes for 38-page PDF (per
-Google's published Pro model latency benchmarks).
-→ 58s ≪ Pro baseline strongly indicates Flash, not Pro.
----
-## Expected behavior
-`/api/launcher/open-tab` SHOULD **either**:
-**Option A (preferred — fail-fast)**:
-- Validate the requested `model` against a whitelist of models actually available to the worker account
-- If model not available, return `400 Bad Request` with body
-  ```json
-  {
-    "ok": false,
-    "error": "model_not_available",
-    "requested": "gemini-3-pro-preview",
-    "available": ["gemini-3-flash-preview", "gemini-2.5-flash", "gemini-2.5-pro"],
-    "message": "Model 'gemini-3-pro-preview' not accessible to account phathuy.vetgo@gmail.com. Choose from available list."
-  }
-  ```
-**Option B (acceptable — report fallback)**:
-- Allow tab to open with whatever AI Studio gives back
-- Detect the actual loaded model from the resulting tab URL or DOM
-- Return the **actually loaded** model in the response
-  ```json
-  {
-    "ok": true,
-    "tabId": 477055248,
-    "requested_model": "gemini-3-pro-preview",
-    "actual_model": "gemini-3-flash-preview",
-    "fallback_occurred": true,
-    "url": "https://aistudio.google.com/prompts/new_chat?model=gemini-3-flash-preview"
-  }
-  ```
-Either way, the caller MUST be informed when the loaded model differs
-from the requested model.
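As a rough illustration of Option A, the fail-fast check could look like the sketch below. `validateRequestedModel` is a hypothetical name, and the list of available models is assumed to be supplied by the caller (the report does not say how it would be obtained); the error body mirrors the JSON shape proposed above.

```javascript
// Hypothetical Option A check: reject models the worker account cannot load.
// `available` is assumed to be a pre-computed list of model ids for `account`.
function validateRequestedModel(requested, available, account) {
  if (available.includes(requested)) {
    return { status: 200, body: { ok: true, requested_model: requested } };
  }
  return {
    status: 400,
    body: {
      ok: false,
      error: 'model_not_available',
      requested,
      available,
      message: `Model '${requested}' not accessible to account ${account}. Choose from available list.`,
    },
  };
}
```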
----
-## Actual behavior (broken)
-API response:
-```json
-{
-  "ok": true,
-  "tabId": 477055248,
-  "windowId": 477055217,
-  "url": "https://aistudio.google.com/prompts/new_chat?model=gemini-3-pro-preview"
-}
-```
-But Chromium tab loads `?model=gemini-3-flash-preview` (visible in noVNC
-URL bar + sidebar). The `url` field in the API response is **the
-requested URL, not the loaded URL**.
----
-## Root cause hypotheses (for fixer to investigate)
-1. **AI Studio frontend silently falls back** when the account lacks
-   access to a preview model. The Chromium tab navigates from
-   `?model=gemini-3-pro-preview` → `?model=gemini-3-flash-preview`
-   without any error toast.
-2. **`open-tab` handler returns the requested URL immediately** without
-   waiting for navigation to settle. It does NOT poll the tab's actual
-   final URL after AI Studio's redirect.
-3. **No model whitelist validation** in `open-tab` — any string is
-   accepted, even nonsense like `gemini-99-fake`. Tested with multiple
-   strings including `gemini-3-pro` (note: without `-preview` suffix);
-   all return `ok: true` regardless of validity. Each request returns
-   the requested URL verbatim, regardless of whether AI Studio actually
-   honors that model.
----
-## Test cases for fixer
-| Input model | Account permission | Expected result |
-|-------------|-------------------|-----------------|
-| `gemini-3-flash-preview` | ✅ has access | open success, actual_model = requested |
-| `gemini-2.5-flash` | ✅ has access | open success, actual_model = requested |
-| `gemini-3-pro-preview` | ❌ no access | Option A: 400 error / Option B: ok with `actual_model: gemini-3-flash-preview, fallback_occurred: true` |
-| `gemini-3-pro` | ❌ doesn't exist | Option A: 400 invalid model / Option B: report actual fallback |
-| `gemini-99-fake` | ❌ doesn't exist | Option A: 400 invalid model |
-| (omitted `model` field) | — | Use default model, log which default chosen |
----
-## Workaround (current — manual)
-Until fixed, callers must:
-1. Open tab via API
-2. Open `https://vetgo.webmcp.vn/vnc.html` separately
-3. Visually inspect AI Studio sidebar to verify model
-4. If wrong model loaded, manually click model dropdown in AI Studio UI
-   to select correct one
-This defeats the purpose of programmatic tab control. The bug is
-particularly insidious because:
-- **Output quality silently degrades** (Flash vs Pro for the same task)
-- **No error logged anywhere** — task succeeds, looks fine
-- **Token count similar** so caller can't detect via metrics
-- **Only manual visual inspection** reveals the issue
-In the medgraph case, this caused a recon task to run on Flash when
-caller assumed Pro, leading to ~5x faster completion time but with
-unknown quality trade-off. For production extraction of clinical
-veterinary protocols (high-liability content), this silent downgrade
-is unacceptable.
----
-## Evidence collected (2026-05-10 03:29 UTC)
-### API responses
-```bash
-# Test 1: Request pro-preview
-$ curl -X POST -d '{"model":"gemini-3-pro-preview"}' \
-    $BASE/api/launcher/open-tab
-{"ok":true,"tabId":477055264,"windowId":477055217,
- "url":"https://aistudio.google.com/prompts/new_chat?model=gemini-3-pro-preview"}
-# Test 2: Request 2.5-pro (might also fallback if account lacks access)
-$ curl -X POST -d '{"model":"gemini-2.5-pro"}' \
-    $BASE/api/launcher/open-tab
-{"ok":true,"tabId":477055260,"windowId":477055217,
- "url":"https://aistudio.google.com/prompts/new_chat?model=gemini-2.5-pro"}
-# Test 3: Request gemini-3-pro (no -preview suffix, possibly invalid)
-$ curl -X POST -d '{"model":"gemini-3-pro"}' \
-    $BASE/api/launcher/open-tab
-{"ok":true,"tabId":477055262,"windowId":477055217,
- "url":"https://aistudio.google.com/prompts/new_chat?model=gemini-3-pro"}
-```
-All three return `ok: true` regardless of whether the account actually
-has access to the model. The API is unable to distinguish a successful
-load from a silent fallback.
-### Visual evidence from noVNC (after Test 1)
-Browser URL bar reads: `aistudio.google.com/prompts/new_chat?model=gemini-3-flash-preview`
-AI Studio Run-settings sidebar shows:
-```
-Gemini 3 Flash Preview
-gemini-3-flash-preview
-Our most intelligent model built for speed,
-combining frontier intelligence with superior
-search and grounding.
-```
-NOT `gemini-3-pro-preview` as requested.
-### Task duration evidence
-Submitted PDF (38 pages, 2.2MB, structured-extraction prompt):
-- Task ID: `t_1778383267520_b0c68b`
-- `durationMs`: `58047` ms (≈ 58 seconds)
-- Worker: `phathuy.vetgo@gmail.com`
-Reference benchmarks (Google AI Studio published latencies, Apr 2026):
-- `gemini-3-flash-preview` typical 30-60s for 30+ page PDF + JSON output
-- `gemini-3-pro-preview` typical 3-9 minutes for same workload
-Observed 58s strongly aligns with Flash, not Pro.
----
-## Fix priority justification
-This bug breaks the **only programmatic mechanism for model selection**
-in chrome-mcp-vgcoder. README.md (lines 64-69) documents `open-tab` as
-the way to "lock model" — but in practice, this lock is non-functional
-when the account lacks the requested model.
-For automated pipelines (e.g. medgraph Layer 2 extraction), this means:
-- Cannot guarantee task quality without manual noVNC verification
-- Cannot run unattended batch jobs with confidence
-- Cannot programmatically detect/recover from model unavailability
-- Account quota issues are masked as "task succeeded" when actually
-  running on a degraded model
-Recommend Option A (fail-fast validation) for production deployments
-to surface account permission issues immediately, with Option B as a
-fallback for graceful degradation in dev/testing.
----
-## Suggested implementation pointers
-For the AI agent fixing this bug:
-1. **Locate `open-tab` handler** — likely in `vg-coder-cli` source under
-   `src/server/launcher/` or similar. Search for handler accepting POST
-   on path matching `/api/launcher/open-tab`.
-2. **Add post-navigation poll**: after Chromium navigates to the
-   requested URL, wait for AI Studio frontend to settle (debounce ~2-3s
-   or watch for DOM "model selector loaded" event), then read the
-   actual `model` query param from `tab.url` (the URL after any
-   redirects).
-3. **Compare requested vs actual**:
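-   ```js
-   // Model actually loaded, read from the tab URL after any redirects
-   const actualModel = new URL(tab.url).searchParams.get('model');
-   const requestedModel = req.body.model;
-   // Strict boolean: true only when a model was read and it differs from the request
-   const fallbackOccurred = Boolean(actualModel) && actualModel !== requestedModel;
-   ```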
-4. **For Option A**: also maintain a per-account model whitelist
-   (probably needs to scrape AI Studio model dropdown DOM once at
-   worker boot and cache).
-5. **Test on multiple accounts**: server2 (`phathuy.vetgo`) and
-   server3 (`udymec`) may have different model access — the fix must
-   work for both.
-6. **Update README.md**: document the new response shape and any
-   account-tier limitations on which models are available.
-7. **Add integration test** in `vg-coder-cli/tests/` that asserts
-   `actual_model === requested_model` after `open-tab`, and fails
-   loudly when fallback occurs without warning.
----
-## Related code areas (for fixer to grep)
-- Handler: search for `app.post('/api/launcher/open-tab'` or similar route registration
-- Tab navigation: search for `chrome.tabs.update` or `chrome.tabs.create` with `url:` containing `aistudio.google.com`
-- Worker registration: `meta.domain === 'aistudio.google.com'` likely involves model param parsing already
-- Reference: `chrome-mcp-vgcoder/README.md` lines 64-69 document the contract this bug violates
----
-## Out of scope for this bug
-- Fixing AI Studio's silent fallback behavior itself (that's Google's UI)
-- Adding model selection per-task (current architecture is per-worker; this bug only addresses
-  per-worker model selection accuracy)
-- Quota management / fallback strategy when preview models hit limits
----
-## Debug timeline (2026-05-10)
-Fixing the bug took 5 rounds because the root cause was guessed wrong several times. Recorded here to avoid repeating the mistakes:
-### Round 1 (v2.0.52) — URL-based detection ❌
-Hypothesis: AI Studio redirects the URL on fallback. Added `requested_model` /
-`actual_model`, derived from the URL, to the open-tab response.
-**Wrong because**: current AI Studio versions do **not** redirect the URL (they may have
-in the past). The URL stays unchanged even on fallback.
-### Round 2 (v2.0.53) — DOM scrape after task ✅ partial
-The worker's `readActualModel()` scrapes `<ms-model-selector>` after the task finishes. Verified
-working with a free-tier account (request a fake model → the DOM returns the real model).
-### Round 3 — Tested with ULTRA account, still Flash
-Hypothesis: the model pin is lost after `_recycleWorkerTab` (close + reopen with the default).
-v2.0.55 added the `_pinnedModelByEmail` Map.
-**Partly right**: the pin persists across recycles. But the test still ran on Flash.
-### Round 4 — Hypothesis "AI Studio strips ?model="
-Observation: URL during the task = `/prompts/new_chat` (no query). Guessed that AI Studio
-auto-cleans the URL after prompt submission.
-**Wrong because**: the vg-coder code itself navigates the URL (not AI Studio).
-### Round 5 (v2.0.57) — REAL ROOT CAUSE ✅
-The user observed via noVNC: the tab navigates correctly to Pro → reloads as Flash before
-the chat starts. Grepping the code found `vetgo-auto/scripts/aistudio.google.com/main.js`
-hardcoding `VG_DEFAULT_MODEL='gemini-3-flash-preview'`. `startNewChat()` at the start of
-every task navigates to `/prompts/new_chat?model=Flash` → overrides the caller's pin.
-Fix: `getTargetModel()` resolves the model dynamically from the URL + sessionStorage. **Verified working.**
-### Round 6 — CI missed a step (real-real fix)
-After v2.0.57 was deployed, the test on server3 still failed. Checking the bundle on the server:
-- `grep getTargetModel /usr/local/lib/node_modules/vg-coder-cli/dist/vg-coder-bundle.js` → 0 matches
-- v2.0.57 was published successfully, but the new code was **not** in the bundle
-**Cause**: `vetgo-auto/scripts/aistudio.google.com/main.js` is deployed via Firebase
-RTDB (`ENV/VGCODER`), NOT bundled into the npm package. The extension fetches the script at
-runtime from Firebase. The CI workflow `publish.yml` only ran `build:extension` + `build:copy` +
-`build:inject` and skipped `deploy-scripts`. The new code had to be pushed manually by running
-`cd vetgo-auto && node deploy-scripts.js` from a local machine.
-**CI fix**: added an `npm run deploy-scripts` step to `publish.yml` (commit
-83186ba). On the next version bump, CI pushes to Firebase automatically.
-### Round 7 — File upload race (server4 Windows-specific)
-After the bug1 fix worked on all 3 servers for text-only tasks, multimodal tasks were tested:
-- server2/3 (Linux native): image + PDF on Pro work ✅
-- server4 (Docker Desktop Windows): for both image and PDF, the model replied "no file
-  uploaded" even though the chip showed the file as attached in the UI.
-Two user screenshots showed:
-1. Chip "feline-xray-chest7.jpg Loading..." — file still uploading
-2. Chip "feline-xray-chest7.jpg 1,101 tokens" — upload finished, but the textarea
-   was still empty; the worker had not pasted the prompt yet
-DOM inspection found that the chip element is `<ms-prompt-media>` /
-`[data-test-id="prompt-media-container"]`, which does NOT match the old selectors
-`ms-prompt-chip-file, ms-file-chip, ms-attachment-chip`. The wait loop passed with
-`chips=0`, fell through the 30s timeout, and submitted Run before tokenization finished, so AI
-Studio silently dropped the file.
-**Specific cause**: 2 overlapping problems:
-1. Outdated selector (AI Studio renamed the element in 2026)
-2. No check that the token count has finalized (the chip appears right after the drop; tokenization
-   completes 5-15s later, especially with Windows filesystem latency)
-**Fix v2.0.58**:
-- Updated CHIP_SELECTORS to add `ms-prompt-media`, `[data-test-id="prompt-media-container"]`
-- Added a wait loop: the chip text must match `/[\d,]+ tokens/` (supports commas) AND
-  must NOT contain `Calculating|Processing|Uploading|Loading`
-- 60s timeout → proceed anyway
-**Result**: the server4 image task went from 105s to 23s after fixing the selector and regex.
-Multimodal Pro works end-to-end on all 3 servers.
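The readiness condition from the v2.0.58 fix can be sketched as a predicate over chip text. The two regular expressions are the ones quoted in the report; the function names and the rule that an empty chip list counts as "not ready" are assumptions added here, not the shipped worker code.

```javascript
// Patterns quoted in the report: a finalized token count, and in-progress markers.
const TOKEN_COUNT = /[\d,]+ tokens/;
const BUSY = /Calculating|Processing|Uploading|Loading/;

// A chip is ready once it shows a token count and no in-progress marker.
function isChipReady(chipText) {
  return TOKEN_COUNT.test(chipText) && !BUSY.test(chipText);
}

// Assumption: zero matched chips is treated as not ready, so a dead selector
// does not let the wait loop pass vacuously.
function allChipsReady(chipTexts) {
  return chipTexts.length > 0 && chipTexts.every(isChipReady);
}
```

A wait loop would poll `allChipsReady` on the text of each matched chip element until it returns true or the timeout elapses.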
-### Lessons learned
-- **Visual observation (noVNC) > async DOM eval**: the user's report "tab reloads before
-  chat" plus the screenshots of the chip going "Loading..." → "1,101 tokens" with an empty
-  textarea were the decisive clues. Evaluating the DOM at different moments yields fragmented
-  data that is hard to piece together; only real-time visual observation shows the state transitions.
-- **Grep for hardcoded constants before guessing at external behavior**: the first 4 rounds guessed
-  that AI Studio was doing something (redirect, strip query). Round 5 found the hardcoded
-  `VG_DEFAULT_MODEL` in our own code.
-- **Verify the deployed artifact on the target server**: Round 6 showed that a version bump
-  does NOT guarantee the new code runs if the deploy pipeline has a gap. After each fix, grep for
-  the new symbol in the actual production file; do not trust "CI passed = code runs".
-- **Selectors change over time (round 7)**: AI Studio's Angular app renames
-  tags/classes with each version. Match both new and legacy selectors in an array. When the
-  wait loop passes with count=0 (warning ignored), the selector may be dead.
-- **Multi-issue stacking**: Round 7 had 2 overlapping causes (outdated selector
-  AND tokenize race). Fixing 1 cause is not enough; verify fully E2E after each fix.
-- **Multi-layer detection is valuable**: `actualModel` (DOM scrape) has been the
-  correct source of truth since v2.0.53. It confirmed the bug was real and defined the
-  expected behavior for the fix, independent of which fix failed.
-- **Firebase deploy is not synchronized with npm publish**: the AIChat code in
-  `vetgo-auto/scripts/aistudio.google.com/*.js` is deployed separately via Firebase RTDB.
-  Bumping the npm package version did NOT push this code (before the v2.0.58 CI fix).
-  Quick way to test a fix: `cd vetgo-auto && node deploy-scripts.js` + restart Chromium.
-  There is NO need to rebuild the image or update the package.