@gmickel/gno 0.40.2 → 0.41.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -87,7 +87,7 @@ gno daemon
 
 ## What's New
 
-> Latest release: [v0.39.1](./CHANGELOG.md#0391---2026-04-06)
+> Latest release: [v0.40.2](./CHANGELOG.md#0402---2026-04-06)
 > Full release history: [CHANGELOG.md](./CHANGELOG.md)
 
 - **Retrieval Quality Upgrade**: stronger BM25 lexical handling, code-aware chunking, terminal result hyperlinks, and per-collection model overrides
@@ -108,6 +108,35 @@ gno embed
 That regenerates embeddings for the new default model. Old vectors are kept
 until you explicitly clear stale embeddings.
 
+If the release also changes the embedding formatting/profile behavior for your
+active model, prefer one of these stronger migration paths:
+
+```bash
+gno embed --force
+```
+
+or per collection:
+
+```bash
+gno collection clear-embeddings my-collection --all
+gno embed my-collection
+```
+
+If a re-embed run still reports failures, rerun with:
+
+```bash
+gno --verbose embed --force
+```
+
+Recent releases now print sample embedding errors and a concrete retry hint when
+batch recovery cannot fully recover on its own.
+
+Model guides:
+
+- [Code Embeddings](./docs/guides/code-embeddings.md)
+- [Per-Collection Models](./docs/guides/per-collection-models.md)
+- [Bring Your Own Models](./docs/guides/bring-your-own-models.md)
+
 ### Fine-Tuned Model Quick Use
 
 ```yaml
@@ -672,22 +701,23 @@ graph TD
 
 Models auto-download on first use to `~/.cache/gno/models/`. For deterministic startup, set `GNO_NO_AUTO_DOWNLOAD=1` and use `gno models pull` explicitly. Alternatively, offload to a GPU server on your network using HTTP backends.
 
-| Model                | Purpose                               | Size         |
-| :------------------- | :------------------------------------ | :----------- |
-| Qwen3-Embedding-0.6B | Embeddings (multilingual)             | ~640MB       |
-| Qwen3-Reranker-0.6B  | Cross-encoder reranking (32K context) | ~700MB       |
-| Qwen/SmolLM          | Query expansion + AI answers          | ~600MB-1.2GB |
+| Model                  | Purpose                               | Size         |
+| :--------------------- | :------------------------------------ | :----------- |
+| Qwen3-Embedding-0.6B   | Embeddings (multilingual)             | ~640MB       |
+| Qwen3-Reranker-0.6B    | Cross-encoder reranking (32K context) | ~700MB       |
+| Qwen3 / Qwen2.5 family | Query expansion + AI answers          | ~600MB-2.5GB |
 
 ### Model Presets
 
-| Preset     | Disk   | Best For                     |
-| :--------- | :----- | :--------------------------- |
-| `slim`     | ~1GB   | Fast, good quality (default) |
-| `balanced` | ~2GB   | Slightly larger model        |
-| `quality`  | ~2.5GB | Best answers                 |
+| Preset       | Disk   | Best For                                                |
+| :----------- | :----- | :------------------------------------------------------ |
+| `slim-tuned` | ~1GB   | Current default, tuned retrieval in a compact footprint |
+| `slim`       | ~1GB   | Fast, good quality                                      |
+| `balanced`   | ~2GB   | Slightly larger model                                   |
+| `quality`    | ~2.5GB | Best answers                                            |
 
 ```bash
-gno models use slim
+gno models use slim-tuned
 gno models pull --all # Optional: pre-download models (auto-downloads on first use)
 ```
 
@@ -720,7 +750,7 @@ models:
   presets:
     - id: remote-gpu
       name: Remote GPU Server
-      embed: "http://192.168.1.100:8081/v1/embeddings#bge-m3"
+      embed: "http://192.168.1.100:8081/v1/embeddings#qwen3-embedding-0.6b"
       rerank: "http://192.168.1.100:8082/v1/completions#reranker"
      expand: "http://192.168.1.100:8083/v1/chat/completions#gno-expand"
      gen: "http://192.168.1.100:8083/v1/chat/completions#qwen3-4b"
@@ -730,6 +760,11 @@ Works with llama-server, Ollama, LocalAI, vLLM, or any OpenAI-compatible server.
 
 > **Configuration**: [Model Setup](https://gno.sh/docs/CONFIGURATION/)
 
+Remote/BYOM guides:
+
+- [Bring Your Own Models](./docs/guides/bring-your-own-models.md)
+- [Per-Collection Models](./docs/guides/per-collection-models.md)
+
 ---
 
 ## Architecture
@@ -801,33 +836,29 @@ If a model turns out to be better specifically for code, the intended user story
 
 That lets GNO stay sane by default while still giving power users a clean path to code-specialist retrieval.
 
-Current code-focused recommendation:
+More model docs:
 
-```yaml
-collections:
-  - name: gno-code
-    path: /Users/you/work/gno/src
-    pattern: "**/*.{ts,tsx,js,jsx,go,rs,py,swift,c}"
-    models:
-      embed: "hf:Qwen/Qwen3-Embedding-0.6B-GGUF/Qwen3-Embedding-0.6B-Q8_0.gguf"
-```
+- [Code Embeddings](./docs/guides/code-embeddings.md)
+- [Per-Collection Models](./docs/guides/per-collection-models.md)
+- [Bring Your Own Models](./docs/guides/bring-your-own-models.md)
 
-GNO treats that override like any other model URI:
+Current product stance:
 
-- auto-downloads on first use by default
-- manual-only if `GNO_NO_AUTO_DOWNLOAD=1`
-- offline-safe if the model is already cached
+- `Qwen3-Embedding-0.6B-GGUF` is already the global default embed model
+- you do **not** need a collection override just to get Qwen on code collections
+- use a collection override only when one collection should intentionally diverge from that default
 
-Why this is the current recommendation:
+Why Qwen is the current default:
 
-- matches `bge-m3` on the tiny canonical benchmark
+- matches or exceeds `bge-m3` on the tiny canonical benchmark
 - significantly beats `bge-m3` on the real GNO `src/serve` code slice
 - also beats `bge-m3` on a pinned public-OSS code slice
+- also beats `bge-m3` on the multilingual prose/docs benchmark lane
 
-Trade-off:
+Current trade-off:
 
 - Qwen is slower to embed than `bge-m3`
-- existing users upgrading to the new default may need to run `gno embed` again so vector and hybrid retrieval catch up
+- existing users upgrading or adopting a new embedding formatting profile may need to run `gno embed` again so stored vectors match the current formatter/runtime path
 
 ### General Multilingual Embedding Benchmark
 
@@ -841,8 +872,8 @@ bun run bench:general-embeddings --candidate qwen3-embedding-0.6b --write
 
 Current signal on the public multilingual FastAPI-docs fixture:
 
-- `bge-m3`: vector nDCG@10 `0.350`, hybrid nDCG@10 `0.642`
-- `Qwen3-Embedding-0.6B-GGUF`: vector nDCG@10 `0.859`, hybrid nDCG@10 `0.947`
+- `bge-m3`: vector nDCG@10 `0.3508`, hybrid nDCG@10 `0.6756`
+- `Qwen3-Embedding-0.6B-GGUF`: vector nDCG@10 `0.9891`, hybrid nDCG@10 `0.9891`
 
 Interpretation:
 
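As a sanity check on the benchmark figures above, the nDCG@10 metric itself is a few lines. This is a generic sketch of the standard metric, not GNO's actual bench harness; the function names here are illustrative:

```typescript
// Generic nDCG@10: discounted cumulative gain of the returned ranking,
// normalized by the ideal (relevance-sorted) ranking of the same gains.
function dcg(gains: number[]): number {
  return gains.reduce((sum, gain, i) => sum + gain / Math.log2(i + 2), 0);
}

export function ndcgAt10(rankedGains: number[]): number {
  // rankedGains[i] is the relevance of the i-th returned result.
  const top = rankedGains.slice(0, 10);
  const ideal = [...rankedGains].sort((a, b) => b - a).slice(0, 10);
  const idealDcg = dcg(ideal);
  return idealDcg === 0 ? 0 : dcg(top) / idealDcg;
}
```

A perfect ranking scores 1.0; pushing relevant hits below irrelevant ones discounts them logarithmically by position, which is why the hybrid lane can outscore the pure vector lane on the same fixture.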
package/package.json CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@gmickel/gno",
-  "version": "0.40.2",
+  "version": "0.41.1",
   "description": "Local semantic search for your documents. Index Markdown, PDF, and Office files with hybrid BM25 + vector search.",
   "keywords": [
     "embeddings",
@@ -17,6 +17,7 @@ import {
   isInitialized,
   loadConfig,
 } from "../../config";
+import { embedTextsWithRecovery } from "../../embed/batch";
 import { LlmAdapter } from "../../llm/nodeLlamaCpp/adapter";
 import { resolveDownloadPolicy } from "../../llm/policy";
 import { resolveModelUri } from "../../llm/registry";
@@ -70,6 +71,9 @@ export type EmbedResult =
       duration: number;
       model: string;
       searchAvailable: boolean;
+      errorSamples?: string[];
+      suggestion?: string;
+      syncError?: string;
     }
   | { success: false; error: string };
 
@@ -86,6 +90,30 @@ function formatDuration(seconds: number): string {
   return `${mins}m ${secs.toFixed(0)}s`;
 }
 
+function formatLlmFailure(
+  error: { message: string; cause?: unknown } | undefined
+): string {
+  if (!error) {
+    return "Unknown embedding failure";
+  }
+  const cause =
+    error.cause &&
+    typeof error.cause === "object" &&
+    "message" in error.cause &&
+    typeof error.cause.message === "string"
+      ? error.cause.message
+      : typeof error.cause === "string"
+        ? error.cause
+        : "";
+  return cause && cause !== error.message
+    ? `${error.message} - ${cause}`
+    : error.message;
+}
+
+function isDisposedBatchError(message: string): boolean {
+  return message.toLowerCase().includes("object is disposed");
+}
+
 async function checkVecAvailable(
   db: import("bun:sqlite").Database
 ): Promise<boolean> {
@@ -110,10 +138,20 @@ interface BatchContext {
   showProgress: boolean;
   totalToEmbed: number;
   verbose: boolean;
+  recreateEmbedPort?: () => Promise<
+    { ok: true; value: EmbeddingPort } | { ok: false; error: string }
+  >;
 }
 
 type BatchResult =
-  | { ok: true; embedded: number; errors: number; duration: number }
+  | {
+      ok: true;
+      embedded: number;
+      errors: number;
+      duration: number;
+      errorSamples: string[];
+      suggestion?: string;
+    }
   | { ok: false; error: string };
 
 interface Cursor {
@@ -125,8 +163,21 @@ async function processBatches(ctx: BatchContext): Promise<BatchResult> {
   const startTime = Date.now();
   let embedded = 0;
   let errors = 0;
+  const errorSamples: string[] = [];
+  let suggestion: string | undefined;
   let cursor: Cursor | undefined;
 
+  const pushErrorSamples = (samples: string[]): void => {
+    for (const sample of samples) {
+      if (errorSamples.length >= 5) {
+        break;
+      }
+      if (!errorSamples.includes(sample)) {
+        errorSamples.push(sample);
+      }
+    }
+  };
+
   while (embedded + errors < ctx.totalToEmbed) {
     // Get next batch using seek pagination (cursor-based)
     const batchResult = ctx.force
@@ -153,10 +204,96 @@ async function processBatches(ctx: BatchContext): Promise<BatchResult> {
     }
 
     // Embed batch with contextual formatting (title prefix)
-    const batchEmbedResult = await ctx.embedPort.embedBatch(
-      batch.map((b) => formatDocForEmbedding(b.text, b.title ?? undefined))
+    const batchEmbedResult = await embedTextsWithRecovery(
+      ctx.embedPort,
+      batch.map((b) =>
+        formatDocForEmbedding(b.text, b.title ?? undefined, ctx.modelUri)
+      )
     );
     if (!batchEmbedResult.ok) {
+      const formattedError = formatLlmFailure(batchEmbedResult.error);
+      if (ctx.recreateEmbedPort && isDisposedBatchError(formattedError)) {
+        if (ctx.verbose) {
+          process.stderr.write(
+            "\n[embed] Embedding port disposed; recreating model/contexts and retrying batch once\n"
+          );
+        }
+        const recreated = await ctx.recreateEmbedPort();
+        if (recreated.ok) {
+          ctx.embedPort = recreated.value;
+          const retryResult = await embedTextsWithRecovery(
+            ctx.embedPort,
+            batch.map((b) =>
+              formatDocForEmbedding(b.text, b.title ?? undefined, ctx.modelUri)
+            )
+          );
+          if (retryResult.ok) {
+            if (ctx.verbose) {
+              process.stderr.write(
+                "\n[embed] Retry after port reset succeeded\n"
+              );
+            }
+            pushErrorSamples(retryResult.value.failureSamples);
+            suggestion ||= retryResult.value.retrySuggestion;
+
+            const retryVectors: VectorRow[] = [];
+            for (const [idx, item] of batch.entries()) {
+              const embedding = retryResult.value.vectors[idx];
+              if (!embedding) {
+                errors += 1;
+                continue;
+              }
+              retryVectors.push({
+                mirrorHash: item.mirrorHash,
+                seq: item.seq,
+                model: ctx.modelUri,
+                embedding: new Float32Array(embedding),
+              });
+            }
+
+            if (retryVectors.length === 0) {
+              if (ctx.verbose) {
+                process.stderr.write(
+                  "\n[embed] No recoverable embeddings in retry batch\n"
+                );
+              }
+              continue;
+            }
+
+            const retryStoreResult =
+              await ctx.vectorIndex.upsertVectors(retryVectors);
+            if (!retryStoreResult.ok) {
+              if (ctx.verbose) {
+                process.stderr.write(
+                  `\n[embed] Store failed: ${retryStoreResult.error.message}\n`
+                );
+              }
+              pushErrorSamples([retryStoreResult.error.message]);
+              suggestion ??=
+                "Store write failed. Rerun `gno embed` once more; if it repeats, run `gno doctor` and `gno vec sync`.";
+              errors += retryVectors.length;
+              continue;
+            }
+
+            embedded += retryVectors.length;
+            if (ctx.showProgress) {
+              const embeddedDisplay = Math.min(embedded, ctx.totalToEmbed);
+              const completed = Math.min(embedded + errors, ctx.totalToEmbed);
+              const pct = (completed / ctx.totalToEmbed) * 100;
+              const elapsed = (Date.now() - startTime) / 1000;
+              const rate = embedded / Math.max(elapsed, 0.001);
+              const eta =
+                Math.max(0, ctx.totalToEmbed - completed) /
+                Math.max(rate, 0.001);
+              process.stdout.write(
+                `\rEmbedding: ${embeddedDisplay.toLocaleString()}/${ctx.totalToEmbed.toLocaleString()} (${pct.toFixed(1)}%) | ${rate.toFixed(1)} chunks/s | ETA ${formatDuration(eta)}`
+              );
+            }
+            continue;
+          }
+        }
+      }
+
       if (ctx.verbose) {
         const err = batchEmbedResult.error;
         const cause = err.cause;
@@ -174,30 +311,52 @@ async function processBatches(ctx: BatchContext): Promise<BatchResult> {
           `\n[embed] Batch failed (${batch.length} chunks: ${titles}${batch.length > 3 ? "..." : ""}): ${err.message}${causeMsg ? ` - ${causeMsg}` : ""}\n`
         );
       }
+      pushErrorSamples([formattedError]);
+      suggestion =
+        "Try rerunning the same command. If failures persist, rerun with `gno --verbose embed --batch-size 1` to isolate failing chunks.";
       errors += batch.length;
       continue;
     }
 
-    // Validate batch/embedding count match
-    const embeddings = batchEmbedResult.value;
-    if (embeddings.length !== batch.length) {
+    if (ctx.verbose && batchEmbedResult.value.batchFailed) {
+      const titles = batch
+        .slice(0, 3)
+        .map((b) => b.title ?? b.mirrorHash.slice(0, 8))
+        .join(", ");
+      process.stderr.write(
+        `\n[embed] Batch fallback (${batch.length} chunks: ${titles}${batch.length > 3 ? "..." : ""}): ${batchEmbedResult.value.batchError ?? "unknown batch error"}\n`
+      );
+    }
+    pushErrorSamples(batchEmbedResult.value.failureSamples);
+    suggestion ||= batchEmbedResult.value.retrySuggestion;
+    if (ctx.verbose && batchEmbedResult.value.failureSamples.length > 0) {
+      for (const sample of batchEmbedResult.value.failureSamples) {
+        process.stderr.write(`\n[embed] Sample failure: ${sample}\n`);
+      }
+    }
+
+    const vectors: VectorRow[] = [];
+    for (const [idx, item] of batch.entries()) {
+      const embedding = batchEmbedResult.value.vectors[idx];
+      if (!embedding) {
+        errors += 1;
+        continue;
+      }
+      vectors.push({
+        mirrorHash: item.mirrorHash,
+        seq: item.seq,
+        model: ctx.modelUri,
+        embedding: new Float32Array(embedding),
+      });
+    }
+
+    if (vectors.length === 0) {
       if (ctx.verbose) {
-        process.stderr.write(
-          `\n[embed] Count mismatch: got ${embeddings.length}, expected ${batch.length}\n`
-        );
+        process.stderr.write("\n[embed] No recoverable embeddings in batch\n");
       }
-      errors += batch.length;
       continue;
     }
 
-    // Store vectors (embeddedAt set by DB)
-    const vectors: VectorRow[] = batch.map((b, idx) => ({
-      mirrorHash: b.mirrorHash,
-      seq: b.seq,
-      model: ctx.modelUri,
-      embedding: new Float32Array(embeddings[idx] as number[]),
-    }));
-
     const storeResult = await ctx.vectorIndex.upsertVectors(vectors);
     if (!storeResult.ok) {
       if (ctx.verbose) {
@@ -205,21 +364,26 @@ async function processBatches(ctx: BatchContext): Promise<BatchResult> {
           `\n[embed] Store failed: ${storeResult.error.message}\n`
         );
       }
-      errors += batch.length;
+      pushErrorSamples([storeResult.error.message]);
+      suggestion ??=
+        "Store write failed. Rerun `gno embed` once more; if it repeats, run `gno doctor` and `gno vec sync`.";
+      errors += vectors.length;
       continue;
     }
 
-    embedded += batch.length;
+    embedded += vectors.length;
 
     // Progress output
     if (ctx.showProgress) {
-      const pct = ((embedded + errors) / ctx.totalToEmbed) * 100;
+      const embeddedDisplay = Math.min(embedded, ctx.totalToEmbed);
+      const completed = Math.min(embedded + errors, ctx.totalToEmbed);
+      const pct = (completed / ctx.totalToEmbed) * 100;
       const elapsed = (Date.now() - startTime) / 1000;
       const rate = embedded / Math.max(elapsed, 0.001);
       const eta =
-        (ctx.totalToEmbed - embedded - errors) / Math.max(rate, 0.001);
+        Math.max(0, ctx.totalToEmbed - completed) / Math.max(rate, 0.001);
       process.stdout.write(
-        `\rEmbedding: ${embedded.toLocaleString()}/${ctx.totalToEmbed.toLocaleString()} (${pct.toFixed(1)}%) | ${rate.toFixed(1)} chunks/s | ETA ${formatDuration(eta)}`
+        `\rEmbedding: ${embeddedDisplay.toLocaleString()}/${ctx.totalToEmbed.toLocaleString()} (${pct.toFixed(1)}%) | ${rate.toFixed(1)} chunks/s | ETA ${formatDuration(eta)}`
       );
     }
   }
@@ -233,6 +397,8 @@ async function processBatches(ctx: BatchContext): Promise<BatchResult> {
     embedded,
     errors,
     duration: (Date.now() - startTime) / 1000,
+    errorSamples,
+    suggestion,
   };
 }
 
@@ -338,6 +504,7 @@ export async function embed(options: EmbedOptions = {}): Promise<EmbedResult> {
       duration: 0,
       model: modelUri,
       searchAvailable: vecAvailable,
+      errorSamples: [],
     };
   }
 
@@ -350,6 +517,7 @@ export async function embed(options: EmbedOptions = {}): Promise<EmbedResult> {
       duration: 0,
      model: modelUri,
      searchAvailable: vecAvailable,
+      errorSamples: [],
    };
  }
 
@@ -366,6 +534,27 @@ export async function embed(options: EmbedOptions = {}): Promise<EmbedResult> {
     : undefined;
 
   const llm = new LlmAdapter(config);
+  const recreateEmbedPort = async () => {
+    if (embedPort) {
+      await embedPort.dispose();
+    }
+    await llm.getManager().dispose(modelUri);
+    const recreated = await llm.createEmbeddingPort(modelUri, {
+      policy,
+      onProgress: downloadProgress
+        ? (progress) => downloadProgress("embed", progress)
+        : undefined,
+    });
+    if (!recreated.ok) {
+      return { ok: false as const, error: recreated.error.message };
+    }
+    const initResult = await recreated.value.init();
+    if (!initResult.ok) {
+      await recreated.value.dispose();
+      return { ok: false as const, error: initResult.error.message };
+    }
+    return { ok: true as const, value: recreated.value };
+  };
   const embedResult = await llm.createEmbeddingPort(modelUri, {
     policy,
     onProgress: downloadProgress
@@ -412,6 +601,7 @@ export async function embed(options: EmbedOptions = {}): Promise<EmbedResult> {
     showProgress: !options.json,
     totalToEmbed,
     verbose: options.verbose ?? false,
+    recreateEmbedPort,
   });
 
   if (!result.ok) {
@@ -431,10 +621,27 @@ export async function embed(options: EmbedOptions = {}): Promise<EmbedResult> {
         }
       }
       vectorIndex.vecDirty = false;
-    } else if (!options.json) {
-      process.stdout.write(
-        `\n[vec] Sync failed: ${syncResult.error.message}\n`
-      );
+    } else {
+      if (!options.json) {
+        process.stdout.write(
+          `\n[vec] Sync failed: ${syncResult.error.message}\n`
+        );
+      }
+      return {
+        success: true,
+        embedded: result.embedded,
+        errors: result.errors,
+        duration: result.duration,
+        model: modelUri,
+        searchAvailable: vectorIndex.searchAvailable,
+        errorSamples: [
+          ...result.errorSamples,
+          syncResult.error.message,
+        ].slice(0, 5),
+        suggestion:
+          "Vector index sync failed after embedding. Rerun `gno embed` once more. If it repeats, run `gno vec sync`.",
+        syncError: syncResult.error.message,
+      };
     }
   }
 
@@ -445,6 +652,8 @@ export async function embed(options: EmbedOptions = {}): Promise<EmbedResult> {
       duration: result.duration,
       model: modelUri,
      searchAvailable: vectorIndex.searchAvailable,
+      errorSamples: result.errorSamples,
+      suggestion: result.suggestion,
    };
  } finally {
    if (embedPort) {
@@ -569,6 +778,9 @@ export function formatEmbed(
         duration: result.duration,
         model: result.model,
         searchAvailable: result.searchAvailable,
+        errorSamples: result.errorSamples ?? [],
+        suggestion: result.suggestion,
+        syncError: result.syncError,
       },
       null,
       2
@@ -590,6 +802,14 @@ export function formatEmbed(
 
   if (result.errors > 0) {
     lines.push(`${result.errors} chunks failed to embed.`);
+    if ((result.errorSamples?.length ?? 0) > 0) {
+      for (const sample of result.errorSamples ?? []) {
+        lines.push(`Sample error: ${sample}`);
+      }
+    }
+    if (result.suggestion) {
+      lines.push(`Hint: ${result.suggestion}`);
+    }
   }
 
   if (!result.searchAvailable) {
@@ -598,5 +818,9 @@
     );
   }
 
+  if (result.syncError) {
+    lines.push(`Vec sync error: ${result.syncError}`);
+  }
+
   return lines.join("\n");
 }
@@ -97,7 +97,7 @@ export async function vsearch(
   try {
     // Embed query with contextual formatting (also determines dimensions)
     const queryEmbedResult = await embedPort.embed(
-      formatQueryForEmbedding(query)
+      formatQueryForEmbedding(query, embedPort.modelUri)
     );
     if (!queryEmbedResult.ok) {
       return { success: false, error: queryEmbedResult.error.message };
@@ -16,6 +16,7 @@ import type {
 
 import { formatDocForEmbedding } from "../pipeline/contextual";
 import { err, ok } from "../store/types";
+import { embedTextsWithRecovery } from "./batch";
 
 // ─────────────────────────────────────────────────────────────────────────────
 // Types
@@ -85,9 +86,14 @@ export async function embedBacklog(
     }
 
     // Embed batch with contextual formatting (title prefix)
-    const embedResult = await embedPort.embedBatch(
+    const embedResult = await embedTextsWithRecovery(
+      embedPort,
       batch.map((b: BacklogItem) =>
-        formatDocForEmbedding(b.text, b.title ?? undefined)
+        formatDocForEmbedding(
+          b.text,
+          b.title ?? undefined,
+          embedPort.modelUri
+        )
       )
     );
 
@@ -96,28 +102,29 @@ export async function embedBacklog(
       continue;
     }
 
-    // Validate batch/embedding count match
-    const embeddings = embedResult.value;
-    if (embeddings.length !== batch.length) {
-      errors += batch.length;
-      continue;
+    const vectors: VectorRow[] = [];
+    for (const [idx, item] of batch.entries()) {
+      const embedding = embedResult.value.vectors[idx];
+      if (!embedding) {
+        errors += 1;
+        continue;
+      }
+      vectors.push({
+        mirrorHash: item.mirrorHash,
+        seq: item.seq,
+        model: modelUri,
+        embedding: new Float32Array(embedding),
+      });
     }
 
-    // Store vectors (embeddedAt set by DB)
-    const vectors: VectorRow[] = batch.map((b: BacklogItem, idx: number) => ({
-      mirrorHash: b.mirrorHash,
-      seq: b.seq,
-      model: modelUri,
-      embedding: new Float32Array(embeddings[idx] as number[]),
-    }));
-
-    const storeResult = await vectorIndex.upsertVectors(vectors);
-    if (!storeResult.ok) {
-      errors += batch.length;
-      continue;
+    if (vectors.length > 0) {
+      const storeResult = await vectorIndex.upsertVectors(vectors);
+      if (!storeResult.ok) {
+        errors += vectors.length;
+        continue;
+      }
+      embedded += vectors.length;
     }
-
-    embedded += batch.length;
   }
 
   // Sync vec index once at end if any vec0 writes failed
@@ -0,0 +1,277 @@
1
+ /**
2
+ * Shared embedding batch helpers.
3
+ *
4
+ * @module src/embed/batch
5
+ */
6
+
7
+ import type { EmbeddingPort, LlmResult } from "../llm/types";
8
+
9
+ import { getEmbeddingCompatibilityProfile } from "../llm/embedding-compatibility";
10
+ import { inferenceFailedError } from "../llm/errors";
11
+
12
+ export interface EmbedBatchRecoveryResult {
13
+ vectors: Array<number[] | null>;
14
+ batchFailed: boolean;
15
+ batchError?: string;
16
+ fallbackErrors: number;
17
+ failureSamples: string[];
18
+ retrySuggestion?: string;
19
+ }
20
+
21
+ const MAX_FAILURE_SAMPLES = 5;
22
+
23
+ function errorMessage(error: unknown): string {
24
+ if (
25
+ error &&
26
+ typeof error === "object" &&
27
+ "message" in error &&
28
+ typeof error.message === "string"
29
+ ) {
30
+ return error.message;
31
+ }
32
+ return String(error);
33
+ }
34
+
35
+ function formatFailureMessage(error: {
36
+ message: string;
37
+ cause?: unknown;
38
+ }): string {
39
+ const cause = error.cause ? errorMessage(error.cause) : "";
40
+ return cause && cause !== error.message
41
+ ? `${error.message} - ${cause}`
42
+ : error.message;
43
+ }
44
+
45
+ function isDisposedFailure(message: string): boolean {
46
+ return message.toLowerCase().includes("object is disposed");
47
+ }
48
+
49
+ async function resetEmbeddingPort(
50
+ embedPort: EmbeddingPort
51
+ ): Promise<LlmResult<void>> {
52
+ await embedPort.dispose();
53
+ return embedPort.init();
54
+ }
55
+
56
+ export async function embedTextsWithRecovery(
57
+ embedPort: EmbeddingPort,
58
+ texts: string[]
59
+ ): Promise<LlmResult<EmbedBatchRecoveryResult>> {
60
+ if (texts.length === 0) {
61
+ return {
62
+ ok: true,
63
+ value: {
64
+ vectors: [],
65
+ batchFailed: false,
66
+ fallbackErrors: 0,
67
+ failureSamples: [],
68
+ },
69
+ };
70
+ }
71
+
72
+ const profile = getEmbeddingCompatibilityProfile(embedPort.modelUri);
73
+ if (profile.batchEmbeddingTrusted) {
74
+ let batchResult = await embedPort.embedBatch(texts);
75
+ if (!batchResult.ok) {
76
+ const formattedBatchError = formatFailureMessage(batchResult.error);
77
+ if (isDisposedFailure(formattedBatchError)) {
78
+ const reset = await resetEmbeddingPort(embedPort);
79
+ if (!reset.ok) {
80
+ return reset;
81
+ }
82
+ batchResult = await embedPort.embedBatch(texts);
83
+ }
84
+ }
85
+ if (batchResult.ok && batchResult.value.length === texts.length) {
86
+ return {
87
+ ok: true,
88
+ value: {
89
+ vectors: batchResult.value,
90
+ batchFailed: false,
91
+ fallbackErrors: 0,
92
+ failureSamples: [],
93
+ },
94
+ };
95
+ }
96
+
97
+ const recovered = await recoverWithAdaptiveBatches(embedPort, texts, {
98
+ rootBatchAlreadyFailed: true,
99
+ });
100
+ if (!recovered.ok) {
101
+ return recovered;
102
+ }
103
+ return {
104
+ ok: true,
105
+ value: {
106
+ ...recovered.value,
107
+ batchFailed: true,
108
+ batchError: batchResult.ok
109
+ ? `Embedding count mismatch: got ${batchResult.value.length}, expected ${texts.length}`
110
+ : formatFailureMessage(batchResult.error),
111
+ retrySuggestion:
112
+ recovered.value.fallbackErrors > 0
113
+ ? "Try rerunning the same command. If failures persist, rerun with `gno --verbose embed --batch-size 1` to isolate failing chunks."
114
+ : undefined,
115
+ },
116
+ };
117
+ }
118
+
119
+ const recovered = await recoverIndividually(embedPort, texts);
120
+ if (!recovered.ok) {
121
+ return recovered;
122
+ }
123
+ return {
124
+ ok: true,
125
+ value: {
126
+ ...recovered.value,
127
+ batchFailed: true,
128
+ batchError: "Batch embedding disabled for this compatibility profile",
129
+ retrySuggestion:
130
+ recovered.value.fallbackErrors > 0
131
+ ? "Some chunks still failed individually. Rerun with `gno --verbose embed --batch-size 1` for exact chunk errors."
132
+ : undefined,
133
+ },
134
+ };
135
+ }
136
+
137
+ async function recoverWithAdaptiveBatches(
138
+ embedPort: EmbeddingPort,
139
+ texts: string[],
140
+ options: { rootBatchAlreadyFailed?: boolean } = {}
141
+ ): Promise<
142
+ LlmResult<Omit<EmbedBatchRecoveryResult, "batchFailed" | "batchError">>
143
+ > {
144
+ try {
145
+ const vectors: Array<number[] | null> = Array.from(
146
+ { length: texts.length },
147
+ () => null
148
+ );
149
+ const failureSamples: string[] = [];
150
+ let fallbackErrors = 0;
151
+
152
+ const recordFailure = (message: string): void => {
153
+ if (failureSamples.length < MAX_FAILURE_SAMPLES) {
154
+ failureSamples.push(message);
155
+ }
156
+ };
157
+
158
+ const processRange = async (
159
+ rangeTexts: string[],
160
+ offset: number,
161
+ batchAlreadyFailed = false
162
+ ): Promise<void> => {
163
+ if (rangeTexts.length === 0) {
164
+ return;
165
+ }
166
+
167
+ if (rangeTexts.length === 1) {
168
+ const result = await embedPort.embed(rangeTexts[0] ?? "");
169
+ if (result.ok) {
170
+ vectors[offset] = result.value;
171
+ return;
172
+ }
173
+ fallbackErrors += 1;
174
+ recordFailure(formatFailureMessage(result.error));
175
+ return;
176
+ }
177
+
178
+ let batchResult: Awaited<ReturnType<typeof embedPort.embedBatch>> | null =
179
+ null;
180
+ if (!batchAlreadyFailed) {
181
+ batchResult = await embedPort.embedBatch(rangeTexts);
182
+ }
183
+ if (
184
+ batchResult &&
185
+ batchResult.ok &&
186
+ batchResult.value.length === rangeTexts.length
187
+ ) {
188
+ for (const [index, vector] of batchResult.value.entries()) {
189
+ vectors[offset + index] = vector;
190
+ }
191
+ return;
192
+ }
193
+
194
+ const mid = Math.ceil(rangeTexts.length / 2);
195
+ await processRange(rangeTexts.slice(0, mid), offset);
196
+ await processRange(rangeTexts.slice(mid), offset + mid);
197
+ };
198
+
199
+ await processRange(texts, 0, options.rootBatchAlreadyFailed ?? false);
200
+
201
+ if (fallbackErrors === texts.length) {
202
+ const reinit = await resetEmbeddingPort(embedPort);
203
+ if (!reinit.ok) {
204
+ return reinit;
205
+ }
206
+
207
+ const retry = await recoverIndividually(embedPort, texts);
208
+ if (!retry.ok) {
209
+ return retry;
210
+ }
211
+ return {
212
+ ok: true,
213
+ value: retry.value,
214
+ };
215
+ }
216
+
217
+ return {
218
+ ok: true,
219
+ value: {
220
+ vectors,
221
+ fallbackErrors,
222
+ failureSamples,
223
+ },
224
+ };
225
+ } catch (error) {
226
+ return {
227
+ ok: false,
228
+ error: inferenceFailedError(
229
+ embedPort.modelUri,
230
+ new Error(errorMessage(error))
231
+ ),
232
+ };
233
+ }
234
+ }
235
+
236
+ async function recoverIndividually(
237
+ embedPort: EmbeddingPort,
238
+ texts: string[]
239
+ ): Promise<
240
+ LlmResult<Omit<EmbedBatchRecoveryResult, "batchFailed" | "batchError">>
241
+ > {
242
+ try {
243
+ const vectors: Array<number[] | null> = [];
244
+ const failureSamples: string[] = [];
245
+ let fallbackErrors = 0;
246
+
247
+ for (const text of texts) {
248
+ const result = await embedPort.embed(text);
249
+ if (result.ok) {
250
+ vectors.push(result.value);
251
+ } else {
252
+ vectors.push(null);
253
+ fallbackErrors += 1;
254
+ if (failureSamples.length < MAX_FAILURE_SAMPLES) {
255
+ failureSamples.push(formatFailureMessage(result.error));
256
+ }
257
+ }
258
+ }
259
+
260
+ return {
261
+ ok: true,
262
+ value: {
263
+ vectors,
264
+ fallbackErrors,
265
+ failureSamples,
266
+ },
267
+ };
268
+ } catch (error) {
269
+ return {
270
+ ok: false,
271
+ error: inferenceFailedError(
272
+ embedPort.modelUri,
273
+ new Error(errorMessage(error))
274
+ ),
275
+ };
276
+ }
277
+ }
package/src/llm/embedding-compatibility.ts ADDED
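The halve-and-retry recovery above can be reduced to a standalone sketch. The `embedOne` and `embedBatch` mocks below are hypothetical stand-ins for the real `EmbeddingPort` (they are not GNO APIs); the mock batch call fails whenever a poisoned item is present, which is the failure mode the splitting strategy is designed to isolate:

```typescript
// Standalone sketch of the halve-and-retry batch recovery.
// embedOne / embedBatch are hypothetical mocks, not GNO's EmbeddingPort.
type Vec = number[] | null;

const embedOne = async (text: string): Promise<Vec> =>
  text === "bad" ? null : [text.length];

const embedBatch = async (texts: string[]): Promise<Vec[] | null> =>
  texts.some((t) => t === "bad") ? null : texts.map((t) => [t.length]);

async function embedWithRecovery(texts: string[]): Promise<Vec[]> {
  const vectors: Vec[] = new Array(texts.length).fill(null);

  const processRange = async (
    range: string[],
    offset: number
  ): Promise<void> => {
    if (range.length === 0) return;
    if (range.length === 1) {
      // Single item: embed individually so one bad text cannot poison others.
      vectors[offset] = await embedOne(range[0] ?? "");
      return;
    }
    const batch = await embedBatch(range);
    if (batch) {
      for (const [i, v] of batch.entries()) vectors[offset + i] = v;
      return;
    }
    // Batch failed: split in half so healthy items still embed in batches.
    const mid = Math.ceil(range.length / 2);
    await processRange(range.slice(0, mid), offset);
    await processRange(range.slice(mid), offset + mid);
  };

  await processRange(texts, 0);
  return vectors;
}

embedWithRecovery(["aa", "bad", "cccc"]).then((vecs) => {
  // Only the poisoned slot stays null; the rest embed in sub-batches.
  console.log(JSON.stringify(vecs));
});
```

This keeps the common case (healthy batch) at one call while degrading to at most O(n) individual calls when failures are scattered.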
@@ -0,0 +1,82 @@
1
+ /**
2
+ * Embedding compatibility profiles.
3
+ *
4
+ * Encodes model-specific formatting/runtime hints for embedding models without
5
+ * forcing every caller to special-case URIs inline.
6
+ *
7
+ * @module src/llm/embedding-compatibility
8
+ */
9
+
10
+ export type EmbeddingQueryFormat = "contextual-task" | "qwen-instruct";
11
+ export type EmbeddingDocumentFormat = "title-prefixed" | "raw-text";
12
+
13
+ export interface EmbeddingCompatibilityProfile {
14
+ id: string;
15
+ queryFormat: EmbeddingQueryFormat;
16
+ documentFormat: EmbeddingDocumentFormat;
17
+ /**
18
+ * Whether embedBatch is trusted for this model in GNO's current native path.
19
+ * If false, callers should use per-item embedding until compatibility is
20
+ * better understood.
21
+ */
22
+ batchEmbeddingTrusted: boolean;
23
+ notes?: string[];
24
+ }
25
+
26
+ const DEFAULT_PROFILE: EmbeddingCompatibilityProfile = {
27
+ id: "default",
28
+ queryFormat: "contextual-task",
29
+ documentFormat: "title-prefixed",
30
+ batchEmbeddingTrusted: true,
31
+ };
32
+
33
+ const QWEN_PROFILE: EmbeddingCompatibilityProfile = {
34
+ id: "qwen-embedding",
35
+ queryFormat: "qwen-instruct",
36
+ documentFormat: "raw-text",
37
+ batchEmbeddingTrusted: true,
38
+ notes: [
39
+ "Uses Qwen-style instruct query formatting.",
40
+ "Documents are embedded as raw text (optionally prefixed with title).",
41
+ ],
42
+ };
43
+
44
+ const JINA_PROFILE: EmbeddingCompatibilityProfile = {
45
+ id: "jina-embedding",
46
+ queryFormat: "contextual-task",
47
+ documentFormat: "title-prefixed",
48
+ batchEmbeddingTrusted: false,
49
+ notes: [
50
+ "Current native runtime path has batch-embedding issues on real fixtures.",
51
+ "Prefer per-item embedding fallback until compatibility improves.",
52
+ ],
53
+ };
54
+
55
+ function normalizeModelUri(modelUri?: string): string {
56
+ return modelUri?.toLowerCase() ?? "";
57
+ }
58
+
59
+ function hasAllTerms(haystack: string, terms: string[]): boolean {
60
+ return terms.every((term) => haystack.includes(term));
61
+ }
62
+
63
+ export function getEmbeddingCompatibilityProfile(
64
+ modelUri?: string
65
+ ): EmbeddingCompatibilityProfile {
66
+ const normalizedUri = normalizeModelUri(modelUri);
67
+
68
+ if (hasAllTerms(normalizedUri, ["qwen", "embed"])) {
69
+ return QWEN_PROFILE;
70
+ }
71
+
72
+ if (
73
+ normalizedUri.includes("jina-embeddings-v4-text-code") ||
74
+ normalizedUri.includes("jina-code-embeddings") ||
75
+ hasAllTerms(normalizedUri, ["jina", "embeddings-v4-text-code"]) ||
76
+ hasAllTerms(normalizedUri, ["jina", "code-embeddings"])
77
+ ) {
78
+ return JINA_PROFILE;
79
+ }
80
+
81
+ return DEFAULT_PROFILE;
82
+ }
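As a sanity check of the URI routing in `getEmbeddingCompatibilityProfile`, the matching rules reduce to the standalone copy below. The example model URIs are illustrative only, not verified registry entries:

```typescript
// Standalone copy of the profile-selection rules from the diff above.
// The example model URIs passed in are illustrative, not real entries.
function profileIdFor(modelUri?: string): string {
  const uri = modelUri?.toLowerCase() ?? "";
  const hasAll = (terms: string[]) => terms.every((t) => uri.includes(t));

  if (hasAll(["qwen", "embed"])) return "qwen-embedding";
  if (
    uri.includes("jina-embeddings-v4-text-code") ||
    uri.includes("jina-code-embeddings") ||
    hasAll(["jina", "embeddings-v4-text-code"]) ||
    hasAll(["jina", "code-embeddings"])
  ) {
    return "jina-embedding";
  }
  return "default";
}

console.log(profileIdFor("hf:Qwen/Qwen3-Embedding-0.6B")); // qwen-embedding
console.log(profileIdFor("hf:jinaai/jina-code-embeddings-0.5b")); // jina-embedding
console.log(profileIdFor(undefined)); // default
```

Callers then branch on the returned profile's `batchEmbeddingTrusted` flag to decide between batch and per-item embedding.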
@@ -149,7 +149,7 @@ export function handleVsearch(
149
149
  try {
150
150
  // Embed query with contextual formatting
151
151
  const queryEmbedResult = await embedPort.embed(
152
- formatQueryForEmbedding(args.query)
152
+ formatQueryForEmbedding(args.query, embedPort.modelUri)
153
153
  );
154
154
  if (!queryEmbedResult.ok) {
155
155
  throw new Error(queryEmbedResult.error.message);
package/src/pipeline/contextual.ts CHANGED
@@ -10,6 +10,8 @@
10
10
  * @module src/pipeline/contextual
11
11
  */
12
12
 
13
+ import { getEmbeddingCompatibilityProfile } from "../llm/embedding-compatibility";
14
+
13
15
  // Top-level regex for performance
14
16
  const HEADING_REGEX = /^##?\s+(.+)$/m;
15
17
  const SUBHEADING_REGEX = /^##\s+(.+)$/m;
@@ -19,8 +21,16 @@ const EXT_REGEX = /\.\w+$/;
19
21
  * Format document text for embedding.
20
22
  * Prepends title for contextual retrieval.
21
23
  */
22
- export function formatDocForEmbedding(text: string, title?: string): string {
24
+ export function formatDocForEmbedding(
25
+ text: string,
26
+ title?: string,
27
+ modelUri?: string
28
+ ): string {
29
+ const profile = getEmbeddingCompatibilityProfile(modelUri);
23
30
  const safeTitle = title?.trim() || "none";
31
+ if (profile.documentFormat === "raw-text") {
32
+ return title?.trim() ? `${title.trim()}\n${text}` : text;
33
+ }
24
34
  return `title: ${safeTitle} | text: ${text}`;
25
35
  }
26
36
 
@@ -28,7 +38,14 @@ export function formatDocForEmbedding(text: string, title?: string): string {
28
38
  * Format query for embedding.
29
39
  * Uses task-prefixed format for asymmetric retrieval.
30
40
  */
31
- export function formatQueryForEmbedding(query: string): string {
41
+ export function formatQueryForEmbedding(
42
+ query: string,
43
+ modelUri?: string
44
+ ): string {
45
+ const profile = getEmbeddingCompatibilityProfile(modelUri);
46
+ if (profile.queryFormat === "qwen-instruct") {
47
+ return `Instruct: Retrieve relevant documents for the given query\nQuery: ${query}`;
48
+ }
32
49
  return `task: search result | query: ${query}`;
33
50
  }
34
51
 
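Concretely, the two query formats above produce the strings below. This is a standalone sketch mirroring the branch in the hunk; the boolean flag stands in for the profile lookup:

```typescript
// Standalone sketch of the two query formats from the diff.
// The boolean parameter substitutes for the compatibility-profile lookup.
function formatQuery(query: string, qwenInstruct: boolean): string {
  if (qwenInstruct) {
    return `Instruct: Retrieve relevant documents for the given query\nQuery: ${query}`;
  }
  return `task: search result | query: ${query}`;
}

console.log(formatQuery("vector upsert", false));
// task: search result | query: vector upsert
console.log(formatQuery("vector upsert", true).split("\n")[1]);
// Query: vector upsert
```

Asymmetric formats like these mean query and document embeddings must come from the same profile, which is why re-embedding is recommended when the active model's profile changes.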
@@ -18,6 +18,7 @@ import type {
18
18
  SearchResults,
19
19
  } from "./types";
20
20
 
21
+ import { embedTextsWithRecovery } from "../embed/batch";
21
22
  import { err, ok } from "../store/types";
22
23
  import { createChunkLookup } from "./chunk-lookup";
23
24
  import { formatQueryForEmbedding } from "./contextual";
@@ -213,7 +214,9 @@ async function searchVectorChunks(
213
214
  }
214
215
 
215
216
  // Embed query with contextual formatting
216
- const embedResult = await embedPort.embed(formatQueryForEmbedding(query));
217
+ const embedResult = await embedPort.embed(
218
+ formatQueryForEmbedding(query, embedPort.modelUri)
219
+ );
217
220
  if (!embedResult.ok) {
218
221
  return [];
219
222
  }
@@ -443,17 +446,6 @@ export async function searchHybrid(
443
446
  const vectorStartedAt = performance.now();
444
447
 
445
448
  if (vectorAvailable && vectorIndex && embedPort) {
446
- // Original query (increase limit when post-filters are active).
447
- const vecChunks = await searchVectorChunks(vectorIndex, embedPort, query, {
448
- limit: limit * 2 * retrievalMultiplier,
449
- });
450
-
451
- vecCount = vecChunks.length;
452
- if (vecCount > 0) {
453
- rankedInputs.push(toRankedInput("vector", vecChunks));
454
- }
455
-
456
- // Semantic variants + HyDE (optional; run in parallel and ignore failures)
457
449
  const vectorVariantQueries = [
458
450
  ...(expansion?.vectorQueries?.map((query) => ({
459
451
  source: "vector_variant" as const,
@@ -464,22 +456,72 @@ export async function searchHybrid(
464
456
  : []),
465
457
  ];
466
458
 
467
- if (vectorVariantQueries.length > 0) {
468
- const optionalVectorResults = await Promise.allSettled(
469
- vectorVariantQueries.map((variant) =>
470
- searchVectorChunks(vectorIndex, embedPort, variant.query, {
471
- limit: limit * retrievalMultiplier,
472
- })
459
+ if (vectorVariantQueries.length === 0) {
460
+ const vecChunks = await searchVectorChunks(
461
+ vectorIndex,
462
+ embedPort,
463
+ query,
464
+ {
465
+ limit: limit * 2 * retrievalMultiplier,
466
+ }
467
+ );
468
+
469
+ vecCount = vecChunks.length;
470
+ if (vecCount > 0) {
471
+ rankedInputs.push(toRankedInput("vector", vecChunks));
472
+ }
473
+ } else {
474
+ const batchedQueries = [
475
+ {
476
+ source: "vector" as const,
477
+ query,
478
+ limit: limit * 2 * retrievalMultiplier,
479
+ },
480
+ ...vectorVariantQueries.map((variant) => ({
481
+ ...variant,
482
+ limit: limit * retrievalMultiplier,
483
+ })),
484
+ ];
485
+
486
+ const embedResult = await embedTextsWithRecovery(
487
+ embedPort,
488
+ batchedQueries.map((variant) =>
489
+ formatQueryForEmbedding(variant.query, embedPort.modelUri)
473
490
  )
474
491
  );
475
492
 
476
- for (const [index, settled] of optionalVectorResults.entries()) {
477
- if (settled.status !== "fulfilled" || settled.value.length === 0) {
478
- continue;
493
+ if (!embedResult.ok) {
494
+ counters.fallbackEvents.push("vector_embed_error");
495
+ } else {
496
+ if (embedResult.value.batchFailed) {
497
+ counters.fallbackEvents.push("vector_embed_batch_fallback");
479
498
  }
480
- const variant = vectorVariantQueries[index];
481
- if (variant) {
482
- rankedInputs.push(toRankedInput(variant.source, settled.value));
499
+
500
+ for (const [index, variant] of batchedQueries.entries()) {
501
+ const embedding = embedResult.value.vectors[index];
502
+ if (!embedding || !variant) {
503
+ continue;
504
+ }
505
+
506
+ const searchResult = await vectorIndex.searchNearest(
507
+ new Float32Array(embedding),
508
+ variant.limit
509
+ );
510
+ if (!searchResult.ok || searchResult.value.length === 0) {
511
+ continue;
512
+ }
513
+
514
+ const chunks = searchResult.value.map((item) => ({
515
+ mirrorHash: item.mirrorHash,
516
+ seq: item.seq,
517
+ }));
518
+ if (variant.source === "vector") {
519
+ vecCount = chunks.length;
520
+ }
521
+ if (chunks.length === 0) {
522
+ continue;
523
+ }
524
+ rankedInputs.push(toRankedInput(variant.source, chunks));
483
525
  }
484
526
  }
485
527
  }
@@ -353,7 +353,9 @@ export async function searchVector(
353
353
  }
354
354
 
355
355
  // Embed query with contextual formatting
356
- const embedResult = await embedPort.embed(formatQueryForEmbedding(query));
356
+ const embedResult = await embedPort.embed(
357
+ formatQueryForEmbedding(query, embedPort.modelUri)
358
+ );
357
359
  if (!embedResult.ok) {
358
360
  return err(
359
361
  "QUERY_FAILED",
package/src/sdk/client.ts CHANGED
@@ -401,7 +401,7 @@ class GnoClientImpl implements GnoClient {
401
401
  }
402
402
 
403
403
  const queryEmbedResult = await ports.embedPort.embed(
404
- formatQueryForEmbedding(query)
404
+ formatQueryForEmbedding(query, ports.embedPort.modelUri)
405
405
  );
406
406
  if (!queryEmbedResult.ok) {
407
407
  throw sdkError("MODEL", queryEmbedResult.error.message, {
package/src/sdk/embed.ts CHANGED
@@ -19,6 +19,7 @@ import type {
19
19
  import type { GnoEmbedOptions, GnoEmbedResult } from "./types";
20
20
 
21
21
  import { embedBacklog } from "../embed";
22
+ import { embedTextsWithRecovery } from "../embed/batch";
22
23
  import { resolveModelUri } from "../llm/registry";
23
24
  import { formatDocForEmbedding } from "../pipeline/contextual";
24
25
  import { err, ok } from "../store/types";
@@ -139,29 +140,45 @@ async function forceEmbedAll(
139
140
  cursor = { mirrorHash: lastItem.mirrorHash, seq: lastItem.seq };
140
141
  }
141
142
 
142
- const embedResult = await embedPort.embedBatch(
143
+ const embedResult = await embedTextsWithRecovery(
144
+ embedPort,
143
145
  batch.map((item) =>
144
- formatDocForEmbedding(item.text, item.title ?? undefined)
146
+ formatDocForEmbedding(
147
+ item.text,
148
+ item.title ?? undefined,
149
+ embedPort.modelUri
150
+ )
145
151
  )
146
152
  );
147
- if (!embedResult.ok || embedResult.value.length !== batch.length) {
153
+
154
+ if (!embedResult.ok) {
148
155
  errors += batch.length;
149
156
  continue;
150
157
  }
151
158
 
152
- const vectors: VectorRow[] = batch.map((item, idx) => ({
153
- mirrorHash: item.mirrorHash,
154
- seq: item.seq,
155
- model: modelUri,
156
- embedding: new Float32Array(embedResult.value[idx] as number[]),
157
- }));
158
- const storeResult = await vectorIndex.upsertVectors(vectors);
159
- if (!storeResult.ok) {
160
- errors += batch.length;
161
- continue;
159
+ const vectors: VectorRow[] = [];
160
+ for (const [idx, item] of batch.entries()) {
161
+ const embedding = embedResult.value.vectors[idx];
162
+ if (!embedding) {
163
+ errors += 1;
164
+ continue;
165
+ }
166
+ vectors.push({
167
+ mirrorHash: item.mirrorHash,
168
+ seq: item.seq,
169
+ model: modelUri,
170
+ embedding: new Float32Array(embedding),
171
+ });
162
172
  }
163
173
 
164
- embedded += batch.length;
174
+ if (vectors.length > 0) {
175
+ const storeResult = await vectorIndex.upsertVectors(vectors);
176
+ if (!storeResult.ok) {
177
+ errors += vectors.length;
178
+ continue;
179
+ }
180
+ embedded += vectors.length;
181
+ }
165
182
  }
166
183
 
167
184
  if (vectorIndex.vecDirty) {
@@ -117,10 +117,12 @@ export async function createVectorIndexPort(
117
117
  `);
118
118
 
119
119
  // Prepared statements for vec0 table (if available)
120
- const upsertVecStmt = searchAvailable
121
- ? db.prepare(
122
- `INSERT OR REPLACE INTO ${tableName} (chunk_id, embedding) VALUES (?, ?)`
123
- )
120
+ const deleteVecChunkStmt = searchAvailable
121
+ ? db.prepare(`DELETE FROM ${tableName} WHERE chunk_id = ?`)
122
+ : null;
123
+
124
+ const insertVecStmt = searchAvailable
125
+ ? db.prepare(`INSERT INTO ${tableName} (chunk_id, embedding) VALUES (?, ?)`)
124
126
  : null;
125
127
 
126
128
  const searchStmt = searchAvailable
@@ -175,12 +177,15 @@ export async function createVectorIndexPort(
175
177
  }
176
178
 
177
179
  // 2. Best-effort update vec0 (graceful degradation)
178
- if (upsertVecStmt) {
180
+ if (deleteVecChunkStmt && insertVecStmt) {
179
181
  try {
180
182
  db.transaction(() => {
181
183
  for (const row of rows) {
182
184
  const chunkId = `${row.mirrorHash}:${row.seq}`;
183
- upsertVecStmt.run(chunkId, encodeEmbedding(row.embedding));
185
+ // sqlite-vec vec0 tables do not reliably support OR REPLACE semantics.
186
+ // Delete first, then insert the fresh vector row.
187
+ deleteVecChunkStmt.run(chunkId);
188
+ insertVecStmt.run(chunkId, encodeEmbedding(row.embedding));
184
189
  }
185
190
  })();
186
191
  } catch (e) {