tuna-agent 0.1.151 → 0.1.153

@@ -291,17 +291,18 @@ Rules:
  - NO GROUP ENTRIES (CRITICAL): NEVER output a collective/crowd label as a single entry — forbidden: "VILLAGERS", "LADIES GROUP", "KNITTING GROUP", "CROWD", "GROUP OF ...", any "*_GROUP". If 2+ similar secondary people RECUR across scenes, list them as SEPARATE numbered individuals (e.g. WOMAN_1, WOMAN_2, WOMAN_3), each with its OWN distinct face/hair/body/age. Only a truly anonymous one-off background that never recurs may be omitted entirely.
  - characters.description: ENGLISH only, factual, no camera/action words.
  - DISTINCT FACES (CRITICAL): every character MUST have a HIGHLY UNIQUE facial structure, a distinct hairstyle, a specific body type and a clearly different age. NEVER reuse the same or a similar facial description for two characters — they must look completely different from one another.`;
- // Phase-1 on Gemini 2.5 Flash: image-heavy read is far cheaper than gpt-4o,
- // and cast recall is backstopped by the post-Phase-2 reconcile pass, so a
- // small frame sample suffices here.
+ // Phase-1 on Gemini 3 Flash (strong multimodal, far cheaper image tokens
+ // than gpt-4o) with a dense 30-frame seed. 1 call/video; final cast
+ // recall is double-covered by the reconcile pass. Generous output budget
+ // so any model-side thinking can't starve the JSON answer.
  const parts = [
  { text: promptText },
  ...frames.map(b64 => ({ inlineData: { mimeType: 'image/jpeg', data: b64 } })),
  ];
- const { text: rawTxt, usage } = await geminiGenerate(parts, 1600, 'gemini-2.5-flash');
+ const { text: rawTxt, usage } = await geminiGenerate(parts, 3000, 'gemini-3-flash-preview');
  if (!rawTxt)
  return empty;
- cost?.geminiVision('phase1', usage, 'gemini-2.5-flash');
+ cost?.geminiVision('phase1', usage, 'gemini-3-flash-preview');
  let parsed = {};
  try {
  const m = rawTxt.match(/\{[\s\S]*\}/);
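
The hunk above ends at the regex that pulls a JSON object out of the model reply; what the package does with the match is not visible here. As a minimal sketch of that extraction pattern (the helper name `extractJsonObject` and the empty-object fallback are assumptions for illustration, not code from tuna-agent), the greedy pattern spans from the first `{` to the last `}`, and a reply cut off by the output budget simply fails to parse:

```javascript
// Hypothetical helper, not taken from the package: extract and parse the
// JSON object embedded in a free-text model reply.
function extractJsonObject(rawTxt) {
  // Greedy match: from the first "{" to the last "}" anywhere in the reply.
  const m = rawTxt.match(/\{[\s\S]*\}/);
  if (!m) return {}; // no brace pair at all (e.g. reply truncated early)
  try {
    return JSON.parse(m[0]);
  } catch {
    // Braces were found but the span is not valid JSON
    // (e.g. the answer was cut off partway through a nested object).
    return {};
  }
}

// Prose around the object is ignored.
console.log(extractJsonObject('Sure: {"characters": [{"name": "WOMAN_1"}]} Done.'));
// A reply that was cut off before any closing brace falls back to {}.
console.log(extractJsonObject('{"characters": ['));
```

This may be one reason the diff raises the output budget from 1600 to 3000: if the reply is truncated, there is either no closing brace to match or the matched span is incomplete, and the Phase-1 result degrades to an empty object.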
@@ -604,12 +605,12 @@ export async function analyzeVideo(url, onProgress) {
  // master cast + characters. Runs before per-scene describe so the cast
  // context keeps naming consistent across the whole timeline.
  progress('Đang phân tích tổng thể (summary + style + master cast)...');
- // Sample up to 10 frames evenly enough for summary + style + a naming
- // seed. Cast RECALL no longer depends on this sample: the post-Phase-2
- // reconcile pass derives the definitive cast from every per-scene
- // description, so a small sample keeps the (now Gemini 2.5 Flash) Phase-1
- // call cheap.
- const p1SampleCount = Math.min(10, frameBuffers.length);
+ // Sample up to 30 frames evenly (matches AI_Video_Clone). Final cast
+ // RECALL is owned by the post-Phase-2 reconcile pass (reads EVERY scene),
+ // but a dense 30-frame seed gives the per-scene pass a consistent naming
+ // vocabulary up-front cleaner reconcile + safer on hard cases. Only 1
+ // call/video so the richer sample is worth it.
+ const p1SampleCount = Math.min(30, frameBuffers.length);
  const p1Step = Math.max(1, Math.floor(frameBuffers.length / p1SampleCount));
  const p1Samples = frameBuffers
  .filter((_, i) => i % p1Step === 0)
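
The sampling arithmetic in this hunk can be sketched in isolation. The function name `sampleEvenly`, the empty-input guard, and the trailing `slice()` are assumptions for illustration: the hunk is cut off after `.filter(...)`, so whether the package caps the result the same way is not visible here.

```javascript
// Hypothetical stand-alone version of the stride sampling shown in the hunk.
function sampleEvenly(items, maxCount) {
  const count = Math.min(maxCount, items.length);
  if (count === 0) return []; // guard: avoids a 0 / 0 step on empty input
  // Integer stride, never below 1 (same formula as p1Step in the diff).
  const step = Math.max(1, Math.floor(items.length / count));
  // Keep every step-th item; the slice() cap is an assumption of this sketch.
  return items.filter((_, i) => i % step === 0).slice(0, count);
}

const frames = Array.from({ length: 100 }, (_, i) => i);
// With 100 frames and a cap of 30 the stride is 3, so the filter alone
// keeps 34 indices (0, 3, ..., 99) before any cap is applied.
console.log(frames.filter((_, i) => i % 3 === 0).length);
console.log(sampleEvenly(frames, 30).length);
```

Because the stride is rounded down, the filter on its own can return more than the requested count, and any cap applied afterwards drops frames from the end of the video rather than thinning the sample uniformly.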
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "tuna-agent",
- "version": "0.1.151",
+ "version": "0.1.153",
  "description": "Tuna Agent - Run AI coding tasks on your machine",
  "bin": {
  "tuna-agent": "dist/cli/index.js"