npm - @henrychong-ai/mcp-neo4j-knowledge-graph - Versions diffs - 2.5.0 → 2.7.0 - Mend

@henrychong-ai/mcp-neo4j-knowledge-graph 2.5.0 → 2.7.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (29)

package/README.md +181 -4
package/dist/KnowledgeGraphManager.d.ts +35 -7
package/dist/KnowledgeGraphManager.js +120 -44
package/dist/KnowledgeGraphManager.js.map +1 -1
package/dist/cli/generate-embeddings.js +12 -0
package/dist/cli/generate-embeddings.js.map +1 -1
package/dist/embeddings/EmbeddingServiceFactory.d.ts +26 -1
package/dist/embeddings/EmbeddingServiceFactory.js +80 -5
package/dist/embeddings/EmbeddingServiceFactory.js.map +1 -1
package/dist/embeddings/OpenAIEmbeddingService.d.ts +6 -0
package/dist/embeddings/OpenAIEmbeddingService.js +14 -2
package/dist/embeddings/OpenAIEmbeddingService.js.map +1 -1
package/dist/index.js +9 -0
package/dist/index.js.map +1 -1
package/dist/retrieval/RerankerService.d.ts +19 -6
package/dist/retrieval/RerankerService.js +30 -10
package/dist/retrieval/RerankerService.js.map +1 -1
package/dist/server/handlers/callToolHandler.js +5 -3
package/dist/server/handlers/callToolHandler.js.map +1 -1
package/dist/server/handlers/listToolsHandler.js +2 -2
package/dist/server/handlers/listToolsHandler.js.map +1 -1
package/dist/server/setup.d.ts +10 -0
package/dist/server/setup.js +16 -1
package/dist/server/setup.js.map +1 -1
package/dist/storage/neo4j/Neo4jStorageProvider.d.ts +21 -0
package/dist/storage/neo4j/Neo4jStorageProvider.js +88 -17
package/dist/storage/neo4j/Neo4jStorageProvider.js.map +1 -1
package/example.env +63 -0
package/package.json +3 -2

package/README.md CHANGED

@@ -524,8 +524,8 @@ The following tools are available to LLM client hosts through the Model Context
   - Search for entities semantically using vector embeddings and similarity
   - Input:
     - `query` (string): The text query to search for semantically
-    - `limit` (number, optional): Maximum results to return (default: 10)
-    - `min_similarity` (number, optional): Minimum similarity threshold (0.0-1.0, default: 0.6)
+    - `limit` (number, optional): Maximum results to return (default: 10; with a reranker configured, default: 5 reranked best-first — an explicit `limit` is always honoured exactly)
+    - `min_similarity` (number, optional): Minimum similarity threshold on Neo4j's normalised cosine scale (0.0-1.0, where 0.5 ≈ unrelated; default: 0 = disabled — see [Result counts, ordering & `min_similarity`](#result-counts-ordering--min_similarity))
     - `entity_types` (string[], optional): Filter results by entity types
     - `domain` (string, optional): Filter by user-defined domain. Omit to search all domains
     - `hybrid_search` (boolean, optional): Combine keyword and semantic search (default: true)
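
The new `min_similarity` description places the threshold on Neo4j's normalised cosine scale, which the README text added later in this diff defines as `(cosine + 1) / 2`. A minimal sketch of that mapping follows; the helper names `toNormalisedScore` and `toRawCosine` are hypothetical and are not part of the package.

```javascript
// Hypothetical helpers: only the (cosine + 1) / 2 mapping comes from the README text.

// Map a raw cosine similarity in [-1, 1] onto the normalised [0, 1] scale.
function toNormalisedScore(cosine) {
  return (cosine + 1) / 2;
}

// Inverse mapping: recover the raw cosine from a normalised score.
function toRawCosine(normalised) {
  return normalised * 2 - 1;
}

console.log(toNormalisedScore(0)); // 0.5 (orthogonal vectors, i.e. unrelated)
console.log(toNormalisedScore(1)); // 1 (identical direction)
console.log(toRawCosine(0.75));    // 0.5
```

On this scale a floor of 0.5 only excludes vectors that point away from the query, which is one way to read the README's remark that absolute floors carry little meaning here.
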
@@ -563,6 +563,180 @@ The following tools are available to LLM client hosts through the Model Context
     - `reference_time` (number): Reference timestamp for decay calculation (milliseconds since epoch)
     - `decay_factor` (number): Optional decay factor override
+## Embeddings & Reranking Setup
+Semantic search needs an embedding provider. The server speaks the **OpenAI-compatible `/embeddings` API**, so it works with OpenAI, Cloudflare Workers AI, or any self-hosted OpenAI-compatible endpoint (Ollama, LM Studio, vLLM). An optional **cross-encoder reranker** re-scores semantic search candidates for better precision.
+### The one rule that matters: dimensions must match
+```
+EMBEDDING_DIMENSIONS  ==  NEO4J_VECTOR_DIMENSIONS  ==  the model's NATIVE output dimension
+```
+The Neo4j vector index is created at a fixed dimension. A vector of any other length can never be indexed — and as of v2.6.0 the server **refuses to write it** (see [Graceful degradation](#graceful-degradation--failure-behaviour)). The dimension is a property of the *model*, so pick the model first, then set both variables to its native output size.
+### Option A — OpenAI (default)
+```bash
+OPENAI_API_KEY=sk-...
+OPENAI_EMBEDDING_MODEL=text-embedding-3-small   # 1536 dimensions (default)
+NEO4J_VECTOR_DIMENSIONS=1536
+```
+Nothing else needed — the OpenAI endpoint is the built-in default.
+### Option B — Cloudflare Workers AI (free plan works)
+Cloudflare's free Workers AI allocation (**10,000 neurons/day**) comfortably covers a personal knowledge graph — a full re-embed of ~2,000 entities fits inside a single day's free quota, and steady-state usage (query embeddings + incremental backfill) is a tiny fraction of that.
+1. **Create a token**: Cloudflare dashboard → My Profile → **API Tokens** → Create Token → use the **Workers AI** template (or a custom token with `Account → Workers AI → Read`). This single permission covers both embeddings and the reranker.
+2. **Find your account ID**: dashboard → any zone → right sidebar, or **Workers & Pages** overview.
+3. **Configure:**
+```bash
+EMBEDDING_API_KEY=<your-cf-workers-ai-token>
+EMBEDDING_API_ENDPOINT=https://api.cloudflare.com/client/v4/accounts/<your-account-id>/ai/v1/embeddings
+EMBEDDING_MODEL=@cf/qwen/qwen3-embedding-0.6b   # native 1024 dimensions
+EMBEDDING_DIMENSIONS=1024
+NEO4J_VECTOR_DIMENSIONS=1024
+# Optional but recommended: cross-encoder reranker (same token)
+RERANK_ENABLED=true
+RERANK_ACCOUNT_ID=<your-account-id>
+RERANK_MODEL=@cf/baai/bge-reranker-base
+RERANK_API_KEY=<your-cf-workers-ai-token>
+```
+### Option C — Any OpenAI-compatible endpoint (Ollama, LM Studio, vLLM)
+```bash
+EMBEDDING_API_KEY=anything-non-empty            # some local servers ignore auth but the key must be set
+EMBEDDING_API_BASE_URL=http://localhost:11434/v1  # /embeddings is appended automatically
+EMBEDDING_MODEL=nomic-embed-text                # check your model's native dimension!
+EMBEDDING_DIMENSIONS=768
+NEO4J_VECTOR_DIMENSIONS=768
+```
+### Result counts, ordering & `min_similarity`
+Defaults are **reranker-aware** (v2.7.0+). Vector recall is always `limit ?? 10`; the reranker only re-orders *within* that recalled set and trims the default return:
+| Scenario | Vector recall | Returned | Final order |
+|---|---|---|---|
+| No reranker, default | 10 | **10** | hybrid score, best-first |
+| Reranker configured, default | 10 | **5** (`RERANK_TOP_K`) | cross-encoder, best-first |
+| Explicit `limit: N` (either mode) | N | **N** (always honoured exactly) | as above |
+| Reranker fails → fail-open | 10 / N | **5 / N** | hybrid score, sliced to the return count |
+Two env knobs govern the reranker, and they mean different things:
+- **`RERANK_TOP_K`** (default **5**) — the default *return count* when a reranker is configured. Only applies when no explicit `limit` is given.
+- **`RERANK_TOP_N`** (default **20**) — the *scoring-payload cap*: how many recall candidates are sent to the cross-encoder for scoring. It is **not** a return count. With an explicit `limit` larger than `RERANK_TOP_N`, the first `RERANK_TOP_N` candidates are cross-encoder-ordered and the unscored remainder is appended in recall order, so the `limit` contract always holds.
+**Ordering guarantees:** with a reranker, results are cross-encoder best-first (the response is defensively score-sorted server-side). Without a reranker — and on any reranker failure (fail-open) — results follow the hybrid-score order, which is preserved through entity hydration on both search paths (v2.7.0+).
+**`min_similarity`:** the threshold applies to **Neo4j's normalised cosine score** — `(cosine + 1) / 2`, so 0.5 ≈ unrelated and 1.0 = identical. The default is **0 (disabled)**. Absolute floors are not meaningful on this scale for typical embedding models: measured with qwen3 embeddings, top-20 scores cluster around 0.71–0.90 for relevant *and* irrelevant queries alike, so any floor that blocks junk also blocks real queries. The parameter is retained per-call for power users (an explicit `0` works).
+### Switching models (dimension migration)
+Changing to a model with a **different native dimension** requires rebuilding the vector index and re-embedding — vectors of the old dimension cannot coexist with the new index. With the server stopped:
+```cypher
+DROP INDEX entity_embeddings IF EXISTS;
+MATCH (e:Entity) WHERE e.embedding IS NOT NULL
+SET e.embedding = NULL, e.embeddingModel = NULL, e.embeddingGeneratedAt = NULL;
+CREATE VECTOR INDEX entity_embeddings IF NOT EXISTS
+FOR (n:Entity) ON (n.embedding)
+OPTIONS { indexConfig: {
+  `vector.dimensions`: 1024,            // the NEW dimension
+  `vector.similarity_function`: 'cosine'
+} };
+```
+Then update the `EMBEDDING_*` / `NEO4J_VECTOR_DIMENSIONS` variables and restart. The backfill cron (`EMBEDDING_BACKFILL_CRON`) re-embeds every entity automatically — tighten it to `*/1 * * * *` for the duration of the migration if you want it done in minutes rather than at the next daily tick.
+### Graceful degradation / failure behaviour
+The embedding pipeline is designed to fail **loudly into a safe state**, never silently corrupt:
+| Condition | Behaviour |
+|---|---|
+| No provider configured (no `EMBEDDING_API_KEY`/`OPENAI_API_KEY`) | Server runs in **keyword-only mode**: BM25/keyword search works, `semantic_search` falls back, nothing is ever embedded. Random/mock vectors are never generated implicitly. |
+| Embedding API call fails on entity write | Entity is persisted with `embedding = NULL`; the backfill cron retries later. Writes never block on the embedding provider. |
+| Reranker errors (timeout, bad response, quota) | **Fail-open**: `semantic_search` returns the hybrid-ordered recall sliced to the return count (v2.7.0+; previously the full widened recall, unordered). Reranking is strictly additive. |
+| Vector length ≠ `NEO4J_VECTOR_DIMENSIONS` (v2.6.0+) | Write is **rejected with a loud error** — a mismatched vector can never be indexed, so persisting it would silently corrupt search. The startup log also warns if `EMBEDDING_DIMENSIONS` ≠ `NEO4J_VECTOR_DIMENSIONS`. |
+| `NODE_ENV=production` with a mock/fallback embedding service (v2.6.0+) | Embedding **writes are refused** (keyword-only mode + hard error log). `MOCK_EMBEDDINGS=true` is for tests and never counts as a provider in production. |
+## Multi-Surface MCP Client Setup
+When several MCP clients (Claude Code, Claude Desktop, Codex, etc.) share one knowledge graph, use a **hub-and-spoke topology**:
+- **One server-side instance** owns all embedding writes: `WRITE_EMBEDDINGS_LOCALLY=true` (the default) plus a tight backfill cron (`EMBEDDING_BACKFILL_CRON='*/1 * * * *'`).
+- **Every interactive client** runs as a **thin client**: `WRITE_EMBEDDINGS_LOCALLY=false`. Thin clients embed *queries* (so `semantic_search` works) but never write embeddings — a misconfigured laptop can therefore never pollute the shared store.
+The canonical thin-client environment (substitute your own values):
+```bash
+NEO4J_URI=bolt://<your-neo4j-host>:7687
+NEO4J_USERNAME=neo4j                  # NOTE: NEO4J_USERNAME — "NEO4J_USER" is silently ignored
+NEO4J_PASSWORD=<password>
+NEO4J_DATABASE=neo4j
+NEO4J_VECTOR_DIMENSIONS=1024
+EMBEDDING_API_KEY=<token>
+EMBEDDING_API_ENDPOINT=https://api.cloudflare.com/client/v4/accounts/<account-id>/ai/v1/embeddings
+EMBEDDING_MODEL=@cf/qwen/qwen3-embedding-0.6b
+EMBEDDING_DIMENSIONS=1024
+RERANK_ENABLED=true
+RERANK_ACCOUNT_ID=<account-id>
+RERANK_MODEL=@cf/baai/bge-reranker-base
+RERANK_API_KEY=<token>
+WRITE_EMBEDDINGS_LOCALLY=false
+```
+**Claude Code** (user scope, all projects):
+```bash
+claude mcp add-json kg -s user '{
+  "command": "npx",
+  "args": ["-y", "@henrychong-ai/mcp-neo4j-knowledge-graph"],
+  "env": { /* canonical thin-client env above */ }
+}'
+```
+**Claude Desktop** (`~/Library/Application Support/Claude/claude_desktop_config.json` on macOS):
+```json
+{
+  "mcpServers": {
+    "kg": {
+      "command": "npx",
+      "args": ["-y", "@henrychong-ai/mcp-neo4j-knowledge-graph"],
+      "env": { "...": "canonical thin-client env above" }
+    }
+  }
+}
+```
+**Codex** (`~/.codex/config.toml`):
+```toml
+[mcp_servers.kg]
+command = "npx"
+args = ["-y", "@henrychong-ai/mcp-neo4j-knowledge-graph"]
+[mcp_servers.kg.env]
+NEO4J_URI = "bolt://<your-neo4j-host>:7687"
+# ... canonical thin-client env above, TOML syntax
+```
+Tips:
+- **Secrets**: prefer a secret-manager wrapper (e.g. 1Password: `command: "op"`, `args: ["run", "--", "npx", "-y", "@henrychong-ai/mcp-neo4j-knowledge-graph"]` with `op://` references in `env`) over literal tokens in config files.
+- **Query embeddings must match the index**: every client embeds its own queries, so all clients must use the same model/dimension as the server's index. A client on a different model returns no semantic hits.
+- **After upgrading**: clear the npx cache so clients pick up the new version — `rm -rf ~/.npm/_npx/*/node_modules/@henrychong-ai` — then restart the client app. Long-lived apps (Claude Desktop) keep old server processes alive until restarted.
 ## Configuration
 ### Environment Variables
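
The result-count table added to the README in this hunk can be reproduced with two nullish-coalescing expressions, matching the `recallLimit` and `returnCount` lines that appear in the `KnowledgeGraphManager.js` hunk further down. The standalone function `resolveCounts` below is a hypothetical wrapper for illustration only, and its `rerankTopK` parameter stands in for `RERANK_TOP_K`.

```javascript
// Hypothetical wrapper: the two expressions mirror the diff, the function itself does not exist in the package.
function resolveCounts(limit, rerankerEnabled, rerankTopK = 5) {
  // Vector recall: the explicit limit if given, otherwise a fixed default of 10.
  const recallLimit = limit ?? 10;
  // Return count: the explicit limit always wins; otherwise topK with a reranker, 10 without.
  const returnCount = limit ?? (rerankerEnabled ? rerankTopK : 10);
  return { recallLimit, returnCount };
}

console.log(resolveCounts(undefined, false)); // { recallLimit: 10, returnCount: 10 }
console.log(resolveCounts(undefined, true));  // { recallLimit: 10, returnCount: 5 }
console.log(resolveCounts(25, true));         // { recallLimit: 25, returnCount: 25 }
console.log(resolveCounts(0, true));          // { recallLimit: 0, returnCount: 0 }
```
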
@@ -600,13 +774,16 @@ OPENAI_EMBEDDING_MODEL=text-embedding-3-small
 # (no random-vector mock). Set MOCK_EMBEDDINGS=true for deterministic test vectors.
 # Optional cross-encoder reranker (v2.5.0+) — re-scores semantic_search candidates.
-# Disabled unless RERANK_ENABLED=true AND an endpoint + key resolve. Fail-open on any error.
+# Disabled unless RERANK_ENABLED=true AND an endpoint + key resolve. Fail-open on any error
+# (v2.7.0+: fail-open returns the hybrid-ordered recall sliced to the return count).
 RERANK_ENABLED=false
 # RERANK_MODEL=@cf/baai/bge-reranker-base
 # RERANK_ENDPOINT=https://api.cloudflare.com/client/v4/accounts/<id>/ai/run/@cf/baai/bge-reranker-base
 # RERANK_ACCOUNT_ID=<id>          # alternative to RERANK_ENDPOINT (derives the URL from model)
 # RERANK_API_KEY=<token>          # falls back to EMBEDDING_API_KEY
-# RERANK_TOP_N=20  RERANK_TOP_K=10  RERANK_MAX_PASSAGE_CHARS=2000  RERANK_TIMEOUT_MS=5000
+# RERANK_TOP_N=20                 # scoring-payload cap (candidates sent to the cross-encoder) — NOT a return count
+# RERANK_TOP_K=5                  # default return count with a reranker (explicit `limit` always wins; v2.7.0: was 10)
+# RERANK_MAX_PASSAGE_CHARS=2000  RERANK_TIMEOUT_MS=5000
 # Embedding Pipeline Topology (v2.3.0+)
 WRITE_EMBEDDINGS_LOCALLY=true       # Default true. Set to "false" on thin-client hosts (e.g. laptops)
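
The README changes above state that a vector whose length differs from `NEO4J_VECTOR_DIMENSIONS` is rejected on write (v2.6.0+). The provider code that enforces this is not shown in this part of the diff, so the following is only a sketch of what such a guard could look like; `assertVectorDimension` is a hypothetical name and the error wording is an assumption.

```javascript
// Hypothetical guard: illustrates the documented rule, not the package's actual implementation.
function assertVectorDimension(vector, expectedDimensions) {
  if (!Array.isArray(vector)) {
    throw new TypeError("embedding must be an array of numbers");
  }
  if (vector.length !== expectedDimensions) {
    // A mismatched vector can never be indexed, so fail before anything is written.
    throw new Error(
      `Embedding length ${vector.length} does not match index dimension ${expectedDimensions}; refusing to write`
    );
  }
  return vector;
}

const indexDimensions = 1024; // stands in for NEO4J_VECTOR_DIMENSIONS
assertVectorDimension(new Array(1024).fill(0), indexDimensions); // passes

try {
  // A 1536-dimension vector (e.g. from a different model) against a 1024-dimension index.
  assertVectorDimension(new Array(1536).fill(0), indexDimensions);
} catch (error) {
  console.log(error.message);
}
```
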

package/dist/KnowledgeGraphManager.d.ts CHANGED

@@ -58,6 +58,8 @@ export declare class KnowledgeGraphManager {
     private vectorStore?;
     private writeEmbeddingsLocally;
     private reranker?;
+    /** Once-per-process latch so the keyword-only fallback warn does not spam logs. */
+    private static keywordFallbackWarned;
     constructor(options?: KnowledgeGraphManagerOptions);
     private queueEmbeddings;
     /**
@@ -159,17 +161,43 @@ export declare class KnowledgeGraphManager {
         includeNullDomain?: boolean;
     }): Promise<KnowledgeGraph>;
     /**
-     * Optionally rerank semantic-search results with a cross-encoder (RerankerService).
+     * Trim an ordered entity list to `returnCount` and rebuild the result around it.
      *
-     * Strictly additive and FAIL-OPEN: if no reranker is configured, the candidate set is
-     * trivial (<=1), or the rerank call errors/times out/returns garbage, the original
-     * vector/hybrid ordering is returned unchanged. This also imposes a meaningful final
-     * order (the underlying openNodes() hop does not preserve rank order).
+     * Recall order is meaningful as of v2.7.0: Neo4jStorageProvider.semanticSearch reorders
+     * hydrated entities to match the ranked name list, so slicing recall preserves the
+     * vector/hybrid ranking. Relations are filtered to the surviving entities and `total`
+     * reflects the returned entity count so it can't overstate the trimmed result.
+     *
+     * @param recall - The recall result whose non-entity fields are preserved
+     * @param ordered - The entities in final (rerank or recall) order
+     * @param returnCount - Maximum number of entities to return
+     * @returns The recall result rebuilt around the trimmed, ordered entities
+     */
+    /**
+     * Honour an explicit caller limit on a keyword-fallback result (v2.7.0).
+     * No limit given → the graph passes through unchanged (keyword search keeps
+     * its own result-size semantics); an explicit limit is enforced exactly,
+     * matching the documented semantic_search contract even in degraded mode.
+     */
+    private applyExplicitLimit;
+    private trimToReturnCount;
+    /**
+     * Order semantic-search results and trim them to `returnCount`.
+     *
+     * With a cross-encoder reranker (RerankerService) configured, entities are reordered
+     * best-first by rerank score; if the rerank ordering covers fewer entities than
+     * `returnCount` (e.g. an explicit limit above the RERANK_TOP_N scoring cap), the
+     * unscored remainder is appended in recall order. Strictly additive and FAIL-OPEN: if
+     * no reranker is configured, the candidate set is trivial (<=1), or the rerank call
+     * errors/times out/returns garbage, the recall ordering is used instead (meaningful as
+     * of v2.7.0 — the provider preserves rank order through entity hydration). Every path
+     * returns at most `returnCount` entities, filters relations to the surviving entities,
+     * and sets `total` to the returned entity count.
      *
      * @param query - The search query
      * @param recall - The vector/hybrid recall result to reorder
-     * @param topK - Number of results to keep after reranking
-     * @returns The reranked (or, on any failure, the original) knowledge graph
+     * @param returnCount - Number of results to return after ordering and trimming
+     * @returns The reranked (or, on any rerank failure, recall-ordered) knowledge graph
      */
     private maybeRerank;
     /**
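
The doc comments in this declaration file describe three steps: order entities by the reranker's result, append any unscored remainder in recall order, then trim and filter relations. A compact sketch of that flow is shown below, assuming entities carry a `name` and relations carry `from` and `to` as in the compiled code; `orderAndTrim` is a hypothetical standalone function, and `rerankOrder` stands in for the index list the reranker returns.

```javascript
// Hypothetical standalone version of the documented ordering and trimming behaviour.
function orderAndTrim(recallEntities, relations, rerankOrder, returnCount) {
  // 1. Reorder by the reranker's index list, dropping indices that point nowhere.
  const ordered = rerankOrder.map(index => recallEntities[index]).filter(Boolean);

  // 2. Append unscored entities in recall order until returnCount is reached.
  const included = new Set(ordered.map(entity => entity.name));
  for (const entity of recallEntities) {
    if (ordered.length >= returnCount) break;
    if (!included.has(entity.name)) {
      included.add(entity.name);
      ordered.push(entity);
    }
  }

  // 3. Trim, keep only relations between surviving entities, and report the real count.
  const entities = ordered.slice(0, returnCount);
  const names = new Set(entities.map(entity => entity.name));
  return {
    entities,
    relations: relations.filter(relation => names.has(relation.from) && names.has(relation.to)),
    total: entities.length,
  };
}

const recall = [{ name: "a" }, { name: "b" }, { name: "c" }, { name: "d" }];
const links = [{ from: "a", to: "b" }, { from: "c", to: "d" }];
// The reranker scored only two candidates (c best, then a); the caller asked for three results.
const result = orderAndTrim(recall, links, [2, 0], 3);
console.log(result.entities.map(entity => entity.name)); // [ 'c', 'a', 'b' ]
console.log(result.relations); // [ { from: 'a', to: 'b' } ]
console.log(result.total);     // 3
```

An empty `rerankOrder` degrades to plain recall order in this sketch, which matches the documented fail-open result even though the compiled code reaches it through a separate early return.
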

package/dist/KnowledgeGraphManager.js CHANGED

@@ -24,6 +24,8 @@ export class KnowledgeGraphManager {
     vectorStore;
     writeEmbeddingsLocally;
     reranker;
+    /** Once-per-process latch so the keyword-only fallback warn does not spam logs. */
+    static keywordFallbackWarned = false;
     constructor(options) {
         this.storageProvider = options?.storageProvider;
         this.embeddingJobManager = options?.embeddingJobManager;
@@ -303,8 +305,9 @@ export class KnowledgeGraphManager {
             // Ensure vector store is available
             const vectorStore = await this.ensureVectorStore().catch(() => { });
             if (vectorStore) {
-                const limit = options.limit || 10;
-                const minSimilarity = options.threshold || 0.7;
+                // ?? (not ||) so an explicit limit/threshold of 0 is honoured (v2.7.0)
+                const limit = options.limit ?? 10;
+                const minSimilarity = options.threshold ?? 0.7;
                 // Search the vector store
                 const results = await vectorStore.search(embedding, {
                     limit,
@@ -323,7 +326,7 @@ export class KnowledgeGraphManager {
         }
         // If we have a vector search method in the storage provider, use it
         if (this.storageProvider && hasSearchVectors(this.storageProvider)) {
-            return this.storageProvider.searchVectors(embedding, options.limit || 10, options.threshold || 0.7);
+            return this.storageProvider.searchVectors(embedding, options.limit ?? 10, options.threshold ?? 0.7);
         }
         // Otherwise, return an empty result
         return [];
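
The two hunks above swap `||` for `??` so that an explicit `0` survives. The difference is easy to see in isolation; the `options` object below is illustrative only.

```javascript
// Illustrative values: an explicit limit and threshold of 0.
const options = { limit: 0, threshold: 0 };

// || treats every falsy value (including 0) as "missing" and substitutes the default.
const withOr = { limit: options.limit || 10, threshold: options.threshold || 0.7 };

// ?? substitutes the default only for null or undefined, so 0 is kept.
const withNullish = { limit: options.limit ?? 10, threshold: options.threshold ?? 0.7 };

console.log(withOr);      // { limit: 10, threshold: 0.7 }
console.log(withNullish); // { limit: 0, threshold: 0 }

// An omitted field still falls back to the default with ??.
const unset = {};
console.log(unset.limit ?? 10); // 10
```
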
@@ -349,6 +352,17 @@ export class KnowledgeGraphManager {
         if (options.hybridSearch) {
             options = { ...options, semanticSearch: true };
         }
+        // v2.7.0: normalise an explicit limit once at the entry point — fractional
+        // values floor, negatives clamp to 0 (explicit "no results"), and non-finite
+        // values (NaN/Infinity) fall back to the defaults as if no limit were given.
+        if (options.limit !== undefined) {
+            const normalisedLimit = Number.isFinite(options.limit)
+                ? Math.max(0, Math.floor(options.limit))
+                : undefined;
+            if (normalisedLimit !== options.limit) {
+                options = { ...options, limit: normalisedLimit };
+            }
+        }
         // Check if semantic search is requested
         if (options.semanticSearch || options.hybridSearch) {
             // Check if we have a storage provider with semanticSearch method
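
The limit normalisation added in this hunk can be exercised on its own. The function name `normaliseLimit` below is hypothetical (the diff inlines the logic), but the expression is the same one shown above.

```javascript
// Hypothetical extraction of the inline normalisation from the hunk above.
function normaliseLimit(limit) {
  if (limit === undefined) {
    return undefined; // no limit given: defaults apply later
  }
  // Finite values floor and clamp at 0; NaN and Infinity behave as if no limit were given.
  return Number.isFinite(limit) ? Math.max(0, Math.floor(limit)) : undefined;
}

console.log(normaliseLimit(7));         // 7
console.log(normaliseLimit(3.9));       // 3
console.log(normaliseLimit(-2));        // 0
console.log(normaliseLimit(NaN));       // undefined
console.log(normaliseLimit(Infinity));  // undefined
console.log(normaliseLimit(undefined)); // undefined
```

A normalised `0` is then treated as an explicit request for no results, which the next hunk short-circuits before any embedding call is made.
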
@@ -358,39 +372,53 @@ export class KnowledgeGraphManager {
                     if (this.embeddingJobManager) {
                         const embeddingService = this.embeddingJobManager.embeddingService;
                         if (embeddingService) {
+                            // Recall/return counts (v2.7.0): recall a fixed default of 10 unless the
+                            // caller sets an explicit limit; when a reranker is configured and no limit
+                            // is given, return its topK (default 5) best candidates from that recall.
+                            const recallLimit = options.limit ?? 10;
+                            const returnCount = options.limit ?? (this.reranker?.enabled ? this.reranker.topK : 10);
+                            // An explicit limit of 0 is empty by construction — skip the billable
+                            // query-embedding call, the recall pipeline, and any rerank call entirely.
+                            if (returnCount === 0) {
+                                return { entities: [], relations: [], total: 0 };
+                            }
                             const queryVector = await embeddingService.generateEmbedding(query);
-                            // Widen recall when reranking so the reranker has candidates to reorder.
-                            const recallLimit = this.reranker?.enabled
-                                ? Math.max(options.limit ?? 10, this.reranker.topN)
-                                : options.limit;
                             const recall = await this.storageProvider.semanticSearch(query, {
                                 ...options,
                                 limit: recallLimit,
                                 queryVector,
                             });
-                            return await this.maybeRerank(query, recall, options.limit ?? 10);
+                            return await this.maybeRerank(query, recall, returnCount);
                         }
                     }
                     // Fall back to text search if no embedding service
-                    return await this.storageProvider.searchNodes(query, {
+                    const fallbackMessage = 'Semantic search requested but no embedding service is available — falling back to keyword-only searchNodes. Configure EMBEDDING_API_KEY (or OPENAI_API_KEY) for semantic retrieval.';
+                    if (KnowledgeGraphManager.keywordFallbackWarned) {
+                        logger.debug(fallbackMessage);
+                    }
+                    else {
+                        KnowledgeGraphManager.keywordFallbackWarned = true;
+                        logger.warn(fallbackMessage);
+                    }
+                    return this.applyExplicitLimit(await this.storageProvider.searchNodes(query, {
                         domain: options.domain,
                         includeNullDomain: options.includeNullDomain,
-                    });
+                    }), options.limit);
                 }
                 catch (error) {
                     logger.error('Provider semanticSearch failed, falling back to basic search', error);
-                    return this.storageProvider.searchNodes(query, {
+                    return this.applyExplicitLimit(await this.storageProvider.searchNodes(query, {
                         domain: options.domain,
                         includeNullDomain: options.includeNullDomain,
-                    });
+                    }), options.limit);
                 }
             }
             else if (this.storageProvider) {
                 // Fall back to searchNodes if semanticSearch is not available in the provider
-                return this.storageProvider.searchNodes(query, {
+                return this.applyExplicitLimit(await this.storageProvider.searchNodes(query, {
                     domain: options.domain,
                     includeNullDomain: options.includeNullDomain,
-                });
+                }), options.limit);
             }
             // If no storage provider or its semanticSearch is not available, try internal semantic search
             if (this.embeddingJobManager) {
@@ -398,8 +426,9 @@ export class KnowledgeGraphManager {
                     // Try to use semantic search
                     const results = await this.semanticSearch(query, {
                         hybridSearch: options.hybridSearch || false,
-                        limit: options.limit || 10,
-                        threshold: options.threshold || options.minSimilarity || 0.5,
+                        // ?? (not ||) so an explicit limit/threshold of 0 is honoured (v2.7.0)
+                        limit: options.limit ?? 10,
+                        threshold: options.threshold ?? options.minSimilarity ?? 0.5,
                         entityTypes: options.entityTypes || [],
                         facets: options.facets || [],
                         offset: options.offset || 0,
@@ -413,10 +442,10 @@ export class KnowledgeGraphManager {
                     logger.error('Semantic search failed, falling back to basic search', error);
                     // Explicitly call searchNodes if available in the provider
                     if (this.storageProvider) {
-                        return this.storageProvider.searchNodes(query, {
+                        return this.applyExplicitLimit(await this.storageProvider.searchNodes(query, {
                             domain: options.domain,
                             includeNullDomain: options.includeNullDomain,
-                        });
+                        }), options.limit);
                     }
                 }
             }
@@ -431,45 +460,92 @@ export class KnowledgeGraphManager {
         });
     }
     /**
-     * Optionally rerank semantic-search results with a cross-encoder (RerankerService).
+     * Trim an ordered entity list to `returnCount` and rebuild the result around it.
+     *
+     * Recall order is meaningful as of v2.7.0: Neo4jStorageProvider.semanticSearch reorders
+     * hydrated entities to match the ranked name list, so slicing recall preserves the
+     * vector/hybrid ranking. Relations are filtered to the surviving entities and `total`
+     * reflects the returned entity count so it can't overstate the trimmed result.
+     *
+     * @param recall - The recall result whose non-entity fields are preserved
+     * @param ordered - The entities in final (rerank or recall) order
+     * @param returnCount - Maximum number of entities to return
+     * @returns The recall result rebuilt around the trimmed, ordered entities
+     */
+    /**
+     * Honour an explicit caller limit on a keyword-fallback result (v2.7.0).
+     * No limit given → the graph passes through unchanged (keyword search keeps
+     * its own result-size semantics); an explicit limit is enforced exactly,
+     * matching the documented semantic_search contract even in degraded mode.
+     */
+    applyExplicitLimit(graph, limit) {
+        if (limit === undefined) {
+            return graph;
+        }
+        return this.trimToReturnCount(graph, graph.entities ?? [], limit);
+    }
+    trimToReturnCount(recall, ordered, returnCount) {
+        const entities = ordered.slice(0, returnCount);
+        const names = new Set(entities.map(entity => entity.name));
+        return {
+            ...recall,
+            entities,
+            relations: (recall.relations || []).filter(relation => names.has(relation.from) && names.has(relation.to)),
+            total: entities.length,
+        };
+    }
+    /**
+     * Order semantic-search results and trim them to `returnCount`.
      *
-     * Strictly additive and FAIL-OPEN: if no reranker is configured, the candidate set is
-     * trivial (<=1), or the rerank call errors/times out/returns garbage, the original
-     * vector/hybrid ordering is returned unchanged. This also imposes a meaningful final
-     * order (the underlying openNodes() hop does not preserve rank order).
+     * With a cross-encoder reranker (RerankerService) configured, entities are reordered
+     * best-first by rerank score; if the rerank ordering covers fewer entities than
+     * `returnCount` (e.g. an explicit limit above the RERANK_TOP_N scoring cap), the
+     * unscored remainder is appended in recall order. Strictly additive and FAIL-OPEN: if
+     * no reranker is configured, the candidate set is trivial (<=1), or the rerank call
+     * errors/times out/returns garbage, the recall ordering is used instead (meaningful as
+     * of v2.7.0 — the provider preserves rank order through entity hydration). Every path
+     * returns at most `returnCount` entities, filters relations to the surviving entities,
+     * and sets `total` to the returned entity count.
      *
      * @param query - The search query
      * @param recall - The vector/hybrid recall result to reorder
-     * @param topK - Number of results to keep after reranking
-     * @returns The reranked (or, on any failure, the original) knowledge graph
+     * @param returnCount - Number of results to return after ordering and trimming
+     * @returns The reranked (or, on any rerank failure, recall-ordered) knowledge graph
      */
-    async maybeRerank(query, recall, topK) {
-        if (!this.reranker?.enabled || !recall.entities || recall.entities.length <= 1) {
-            return recall;
+    async maybeRerank(query, recall, returnCount) {
+        const recallEntities = recall.entities ?? [];
+        if (!this.reranker?.enabled || recallEntities.length <= 1) {
+            return this.trimToReturnCount(recall, recallEntities, returnCount);
         }
         try {
-            const passages = recall.entities.map(entity => prepareEntityText(entity));
+            const passages = recallEntities.map(entity => prepareEntityText(entity));
             const order = await this.reranker.rerank(query, passages);
-            if (order.length === 0)
-                return recall;
             const reordered = order
-                .map(index => recall.entities[index])
-                .filter((entity) => Boolean(entity))
-                .slice(0, topK);
-            const names = new Set(reordered.map(entity => entity.name));
-            return {
-                ...recall,
-                entities: reordered,
-                relations: (recall.relations || []).filter(relation => names.has(relation.from) && names.has(relation.to)),
-                // Reflect the post-rerank trim so `total` can't overstate the returned entity count.
-                total: reordered.length,
-            };
+                .map(index => recallEntities[index])
+                .filter((entity) => Boolean(entity));
+            if (reordered.length === 0) {
+                return this.trimToReturnCount(recall, recallEntities, returnCount);
+            }
+            // An explicit limit above the reranker's scoring cap (RERANK_TOP_N) leaves some
+            // recall entities unscored — append them in recall order to honour the limit.
+            if (reordered.length < returnCount) {
+                const included = new Set(reordered.map(entity => entity.name));
+                for (const entity of recallEntities) {
+                    if (reordered.length >= returnCount)
+                        break;
+                    if (!included.has(entity.name)) {
+                        included.add(entity.name);
+                        reordered.push(entity);
+                    }
+                }
+            }
+            return this.trimToReturnCount(recall, reordered, returnCount);
         }
         catch (error) {
-            logger.warn('Reranker failed; returning vector/hybrid order unchanged (fail-open)', {
+            logger.warn('Reranker failed; returning recall order trimmed to returnCount (fail-open)', {
                 error: error instanceof Error ? error.message : String(error),
             });
-            return recall;
+            return this.trimToReturnCount(recall, recallEntities, returnCount);
         }
     }
     /**