npm - @mastra/pg - Versions diffs - 0.16.1-alpha.0 → 0.16.1-alpha.1 - Mend

@mastra/pg 0.16.1-alpha.0 → 0.16.1-alpha.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (9)

package/CHANGELOG.md +13 -0
package/README.md +113 -7
package/dist/index.cjs +163 -36
package/dist/index.cjs.map +1 -1
package/dist/index.js +163 -36
package/dist/index.js.map +1 -1
package/dist/vector/index.d.ts +10 -0
package/dist/vector/index.d.ts.map +1 -1
package/package.json +4 -4

package/CHANGELOG.md CHANGED

@@ -1,5 +1,18 @@
 # @mastra/pg
+## 0.16.1-alpha.1
+### Patch Changes
+- Fix PostgreSQL vector index recreation issue and add optional index configuration ([#8020](https://github.com/mastra-ai/mastra/pull/8020))
+  - Fixed critical bug where memory vector indexes were unnecessarily recreated on every operation
+  - Added support for configuring vector index types (HNSW, IVFFlat, flat) and parameters
+- fix(pg-vector): Fix vector type qualification for custom schemas on RDS ([#8070](https://github.com/mastra-ai/mastra/pull/8070))
+- Updated dependencies [[`4b339b8`](https://github.com/mastra-ai/mastra/commit/4b339b8141c20d6a6d80583c7e8c5c05d8c19492), [`c591dfc`](https://github.com/mastra-ai/mastra/commit/c591dfc1e600fae1dedffe239357d250e146378f), [`1920c5c`](https://github.com/mastra-ai/mastra/commit/1920c5c6d666f687785c73021196aa551e579e0d), [`b6a3b65`](https://github.com/mastra-ai/mastra/commit/b6a3b65d830fa0ca7754ad6481661d1f2c878f21), [`af3abb6`](https://github.com/mastra-ai/mastra/commit/af3abb6f7c7585d856e22d27f4e7d2ece2186b9a)]:
+  - @mastra/core@0.18.0-alpha.3
 ## 0.16.1-alpha.0
 ### Patch Changes

package/README.md CHANGED

@@ -27,6 +27,14 @@ await vectorStore.createIndex({
   indexName: 'my_vectors',
   dimension: 1536,
   metric: 'cosine',
+  // Optional: Configure index type and parameters
+  indexConfig: {
+    type: 'hnsw',  // 'ivfflat' (default), 'hnsw', or 'flat'
+    hnsw: {
+      m: 16,              // Number of connections per layer (default: 8)
+      efConstruction: 64  // Size of dynamic list (default: 32)
+    }
+  }
 });
 // Add vectors
@@ -104,14 +112,15 @@ Connection pool settings:
 ### Vector Store Features
-- Vector similarity search with cosine, euclidean, and dot product metrics
+- Vector similarity search with cosine, euclidean, and dot product (inner) metrics
 - Advanced metadata filtering with MongoDB-like query syntax
 - Minimum score threshold for queries
 - Automatic UUID generation for vectors
 - Table management (create, list, describe, delete, truncate)
-- Uses pgvector's IVFFLAT indexing with 100 lists by default
-- Supports HNSW indexing with configurable parameters
-- Supports flat indexing
+- Configurable vector index types:
+  - **IVFFlat** (default): Balanced speed/accuracy, auto-calculates optimal lists parameter
+  - **HNSW**: Fastest queries, higher memory usage, best for large datasets
+  - **Flat**: No index, 100% accuracy, best for small datasets (<1000 vectors)
 ### Storage Features
@@ -139,14 +148,111 @@ Example filter:
 }
 ```
+## Vector Index Configuration
+pgvector supports three index types, each with different performance characteristics:
+### IVFFlat Index (Default)
+IVFFlat groups vectors into clusters for efficient searching:
+```typescript
+await vectorStore.createIndex({
+  indexName: 'my_vectors',
+  dimension: 1536,
+  metric: 'cosine',
+  indexConfig: {
+    type: 'ivfflat',
+    ivf: {
+      lists: 1000, // Number of clusters (default: auto-calculated as sqrt(rows) * 2)
+    },
+  },
+});
+```
+- **Best for:** Medium to large datasets (10K-1M vectors)
+- **Build time:** Minutes for millions of vectors
+- **Query speed:** Fast (tens of milliseconds)
+- **Memory:** Moderate
+- **Accuracy:** ~95-99%
+### HNSW Index
+HNSW builds a graph structure for extremely fast searches:
+```typescript
+await vectorStore.createIndex({
+  indexName: 'my_vectors',
+  dimension: 1536,
+  metric: 'dotproduct', // Recommended for normalized embeddings (OpenAI, etc.)
+  indexConfig: {
+    type: 'hnsw',
+    hnsw: {
+      m: 16, // Connections per layer (default: 8, range: 2-100)
+      efConstruction: 64, // Dynamic list size (default: 32, range: 4-1000)
+    },
+  },
+});
+```
+- **Best for:** Large datasets (100K+ vectors) requiring fastest searches
+- **Build time:** Can take hours for large datasets
+- **Query speed:** Very fast (milliseconds even for millions)
+- **Memory:** High (can be 2-3x vector size)
+- **Accuracy:** ~99%
+**Tuning HNSW:**
+- Higher `m`: Better accuracy, more memory (16-32 for high accuracy)
+- Higher `efConstruction`: Better index quality, slower builds (64-200 for quality)
+### Flat Index (No Index)
+Uses sequential scan for 100% accuracy:
+```typescript
+await vectorStore.createIndex({
+  indexName: 'my_vectors',
+  dimension: 1536,
+  metric: 'cosine',
+  indexConfig: {
+    type: 'flat',
+  },
+});
+```
+- **Best for:** Small datasets (<1000 vectors) or when 100% accuracy is required
+- **Build time:** None
+- **Query speed:** Slow for large datasets (linear scan)
+- **Memory:** Minimal (just vectors)
+- **Accuracy:** 100%
+### Distance Metrics
+Choose the appropriate metric for your embeddings:
+- **`cosine`** (default): Angular similarity, good for text embeddings
+- **`euclidean`**: L2 distance, for unnormalized embeddings
+- **`dotproduct`**: Dot product, optimal for normalized embeddings (OpenAI, Cohere)
+### Index Recreation
+The system automatically detects configuration changes and only rebuilds indexes when necessary, preventing the performance issues from unnecessary recreations.
+**Important behaviors:**
+- If no `indexConfig` is provided, existing indexes are preserved as-is
+- If `indexConfig` is provided, indexes are only rebuilt if the configuration differs
+- New indexes default to IVFFlat with cosine distance when no config is specified
 ## Vector Store Methods
-- `createIndex({indexName, dimension, metric?, indexConfig?, defineIndex?})`: Create a new table with vector support
+- `createIndex({indexName, dimension, metric?, indexConfig?, buildIndex?})`: Create a new table with vector support
+- `buildIndex({indexName, metric?, indexConfig?})`: Build or rebuild vector index
 - `upsert({indexName, vectors, metadata?, ids?})`: Add or update vectors
 - `query({indexName, queryVector, topK?, filter?, includeVector?, minScore?})`: Search for similar vectors
-- `defineIndex({indexName, metric?, indexConfig?})`: Define an index
 - `listIndexes()`: List all vector-enabled tables
-- `describeIndex(indexName)`: Get table statistics
+- `describeIndex(indexName)`: Get table statistics and index configuration
 - `deleteIndex(indexName)`: Delete a table
 - `truncateIndex(indexName)`: Remove all data from a table
 - `disconnect()`: Close all database connections
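
The "Index Recreation" behaviors listed in the README hunk above can be modeled as a pure decision function. This is a minimal sketch, not the package's code: it loosely mirrors the comparison visible in the compiled `setupIndex` in this diff, and the names `ExistingIndex`, `IndexConfig`, `IndexAction`, and `decideIndexAction` are hypothetical. The defaults of 8 for `m` and 32 for `efConstruction` are taken from the diff.

```typescript
type Metric = 'cosine' | 'euclidean' | 'dotproduct';
type IndexType = 'ivfflat' | 'hnsw' | 'flat';

// Hypothetical shapes for illustration; the real types live in @mastra/pg.
interface IndexConfig {
  type?: IndexType;
  ivf?: { lists?: number };
  hnsw?: { m?: number; efConstruction?: number };
}

interface ExistingIndex {
  type: IndexType;
  metric: Metric;
  config: { m?: number; efConstruction?: number; lists?: number };
}

// 'create': nothing exists yet; 'keep': leave as-is;
// 'rebuild': drop the current index and apply the requested configuration
// (for 'flat' that means dropping without creating a new index).
type IndexAction = 'create' | 'keep' | 'rebuild';

function decideIndexAction(
  existing: ExistingIndex | null,
  metric: Metric,
  indexConfig?: IndexConfig,
): IndexAction {
  if (!existing) return 'create';

  const isConfigEmpty =
    !indexConfig || (!indexConfig.type && !indexConfig.ivf && !indexConfig.hnsw);
  const requestedType: IndexType = indexConfig?.type ?? 'ivfflat';

  // No indexConfig: preserve whatever real index is already there.
  if (isConfigEmpty && existing.metric === metric && existing.type !== 'flat') {
    return 'keep';
  }

  let matches = existing.metric === metric && existing.type === requestedType;
  if (requestedType === 'hnsw') {
    matches =
      matches &&
      existing.config.m === (indexConfig?.hnsw?.m ?? 8) &&
      existing.config.efConstruction === (indexConfig?.hnsw?.efConstruction ?? 32);
  } else if (requestedType === 'ivfflat') {
    const lists = indexConfig?.ivf?.lists;
    // Only compare lists when the caller asked for a specific value.
    if (lists) matches = matches && existing.config.lists === lists;
  }
  return matches ? 'keep' : 'rebuild';
}
```

Under this model, calling `createIndex` repeatedly with an unchanged configuration resolves to `'keep'`, which is the behavior the changelog entry for #8020 describes.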

package/dist/index.cjs CHANGED

@@ -371,7 +371,9 @@ var PgVector = class extends vector.MastraVector {
   setupSchemaPromise = null;
   installVectorExtensionPromise = null;
   vectorExtensionInstalled = void 0;
+  vectorExtensionSchema = null;
   schemaSetupComplete = void 0;
+  cacheWarmupPromise = null;
   constructor({
     connectionString,
     schemaName,
@@ -402,18 +404,24 @@ var PgVector = class extends vector.MastraVector {
           "vector.type": "postgres"
         }
       }) ?? basePool;
-      void (async () => {
-        const existingIndexes = await this.listIndexes();
-        void existingIndexes.map(async (indexName) => {
-          const info = await this.getIndexInfo({ indexName });
-          const key = await this.getIndexCacheKey({
-            indexName,
-            metric: info.metric,
-            dimension: info.dimension,
-            type: info.type
-          });
-          this.createdIndexes.set(indexName, key);
-        });
+      this.cacheWarmupPromise = (async () => {
+        try {
+          const existingIndexes = await this.listIndexes();
+          await Promise.all(
+            existingIndexes.map(async (indexName) => {
+              const info = await this.getIndexInfo({ indexName });
+              const key = await this.getIndexCacheKey({
+                indexName,
+                metric: info.metric,
+                dimension: info.dimension,
+                type: info.type
+              });
+              this.createdIndexes.set(indexName, key);
+            })
+          );
+        } catch (error) {
+          this.logger?.debug("Cache warming skipped or failed", { error });
+        }
       })();
     } catch (error$1) {
       throw new error.MastraError(
@@ -433,6 +441,45 @@ var PgVector = class extends vector.MastraVector {
     if (!this.mutexesByName.has(indexName)) this.mutexesByName.set(indexName, new asyncMutex.Mutex());
     return this.mutexesByName.get(indexName);
   }
+  /**
+   * Detects which schema contains the vector extension
+   */
+  async detectVectorExtensionSchema(client) {
+    try {
+      const result = await client.query(`
+        SELECT n.nspname as schema_name
+        FROM pg_extension e
+        JOIN pg_namespace n ON e.extnamespace = n.oid
+        WHERE e.extname = 'vector'
+        LIMIT 1;
+      `);
+      if (result.rows.length > 0) {
+        this.vectorExtensionSchema = result.rows[0].schema_name;
+        this.logger.debug("Vector extension found in schema", { schema: this.vectorExtensionSchema });
+        return this.vectorExtensionSchema;
+      }
+      return null;
+    } catch (error) {
+      this.logger.debug("Could not detect vector extension schema", { error });
+      return null;
+    }
+  }
+  /**
+   * Gets the properly qualified vector type name
+   */
+  getVectorTypeName() {
+    if (this.vectorExtensionSchema) {
+      if (this.vectorExtensionSchema === "pg_catalog") {
+        return "vector";
+      }
+      if (this.vectorExtensionSchema === (this.schema || "public")) {
+        return "vector";
+      }
+      const validatedSchema = utils.parseSqlIdentifier(this.vectorExtensionSchema, "vector extension schema");
+      return `${validatedSchema}.vector`;
+    }
+    return "vector";
+  }
   getTableName(indexName) {
     const parsedIndexName = utils.parseSqlIdentifier(indexName, "index name");
     const quotedIndexName = `"${parsedIndexName}"`;
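
The qualification rule added in `getVectorTypeName` can be read as a small pure function. The sketch below restates it outside the class for illustration. `qualifiedVectorType` is a hypothetical name, and the regular-expression check is a simplified stand-in for `utils.parseSqlIdentifier`, whose exact rules are not shown in this diff.

```typescript
// Simplified stand-in for the identifier validation used in the real code (assumption).
const SIMPLE_IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]*$/;

function qualifiedVectorType(extensionSchema: string | null, tableSchema?: string): string {
  // Unknown extension schema: fall back to the unqualified type name.
  if (!extensionSchema) return 'vector';
  // pg_catalog is always on the search path, so no qualification is needed.
  if (extensionSchema === 'pg_catalog') return 'vector';
  // Extension lives in the same schema as the tables (defaulting to public).
  if (extensionSchema === (tableSchema || 'public')) return 'vector';
  if (!SIMPLE_IDENTIFIER.test(extensionSchema)) {
    throw new Error(`Invalid vector extension schema: ${extensionSchema}`);
  }
  return `${extensionSchema}.vector`;
}
```

For example, with the extension installed in `public` and tables in a custom schema `app`, the function yields `public.vector`, which matches the custom-schema case the changelog entry for #8070 refers to.
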
@@ -504,11 +551,12 @@ var PgVector = class extends vector.MastraVector {
         await client.query(`SET LOCAL ivfflat.probes = ${probes}`);
       }
       const { tableName } = this.getTableName(indexName);
+      const vectorType = this.getVectorTypeName();
       const query = `
         WITH vector_scores AS (
           SELECT
             vector_id as id,
-            1 - (embedding <=> '${vectorStr}'::vector) as score,
+            1 - (embedding <=> '${vectorStr}'::${vectorType}) as score,
             metadata
             ${includeVector ? ", embedding" : ""}
           FROM ${tableName}
@@ -552,13 +600,14 @@ var PgVector = class extends vector.MastraVector {
     try {
       await client.query("BEGIN");
       const vectorIds = ids || vectors.map(() => crypto.randomUUID());
+      const vectorType = this.getVectorTypeName();
       for (let i = 0; i < vectors.length; i++) {
         const query = `
           INSERT INTO ${tableName} (vector_id, embedding, metadata)
-          VALUES ($1, $2::vector, $3::jsonb)
+          VALUES ($1, $2::${vectorType}, $3::jsonb)
           ON CONFLICT (vector_id)
           DO UPDATE SET
-            embedding = $2::vector,
+            embedding = $2::${vectorType},
             metadata = $3::jsonb
           RETURNING embedding::text
         `;
@@ -705,11 +754,15 @@ var PgVector = class extends vector.MastraVector {
       try {
         await this.setupSchema(client);
         await this.installVectorExtension(client);
+        if (this.schema && this.vectorExtensionSchema && this.schema !== this.vectorExtensionSchema && this.vectorExtensionSchema !== "pg_catalog") {
+          await client.query(`SET search_path TO ${this.getSchemaName()}, "${this.vectorExtensionSchema}"`);
+        }
+        const vectorType = this.getVectorTypeName();
         await client.query(`
           CREATE TABLE IF NOT EXISTS ${tableName} (
             id SERIAL PRIMARY KEY,
             vector_id TEXT UNIQUE NOT NULL,
-            embedding vector(${dimension}),
+            embedding ${vectorType}(${dimension}),
             metadata JSONB DEFAULT '{}'::jsonb
           );
         `);
@@ -764,17 +817,63 @@ var PgVector = class extends vector.MastraVector {
   async setupIndex({ indexName, metric, indexConfig }, client) {
     const mutex = this.getMutexByName(`build-${indexName}`);
     await mutex.runExclusive(async () => {
+      const isConfigEmpty = !indexConfig || Object.keys(indexConfig).length === 0 || !indexConfig.type && !indexConfig.ivf && !indexConfig.hnsw;
+      const indexType = isConfigEmpty ? "ivfflat" : indexConfig.type || "ivfflat";
       const { tableName, vectorIndexName } = this.getTableName(indexName);
-      if (this.createdIndexes.has(indexName)) {
+      let existingIndexInfo = null;
+      let dimension = 0;
+      try {
+        existingIndexInfo = await this.getIndexInfo({ indexName });
+        dimension = existingIndexInfo.dimension;
+        if (isConfigEmpty && existingIndexInfo.metric === metric) {
+          if (existingIndexInfo.type === "flat") {
+            this.logger?.debug(`No index exists for ${vectorIndexName}, will create default ivfflat index`);
+          } else {
+            this.logger?.debug(
+              `Index ${vectorIndexName} already exists (type: ${existingIndexInfo.type}, metric: ${existingIndexInfo.metric}), preserving existing configuration`
+            );
+            const cacheKey = await this.getIndexCacheKey({
+              indexName,
+              dimension,
+              type: existingIndexInfo.type,
+              metric: existingIndexInfo.metric
+            });
+            this.createdIndexes.set(indexName, cacheKey);
+            return;
+          }
+        }
+        let configMatches = existingIndexInfo.metric === metric && existingIndexInfo.type === indexType;
+        if (indexType === "hnsw") {
+          configMatches = configMatches && existingIndexInfo.config.m === (indexConfig.hnsw?.m ?? 8) && existingIndexInfo.config.efConstruction === (indexConfig.hnsw?.efConstruction ?? 32);
+        } else if (indexType === "flat") {
+          configMatches = configMatches && existingIndexInfo.type === "flat";
+        } else if (indexType === "ivfflat" && indexConfig.ivf?.lists) {
+          configMatches = configMatches && existingIndexInfo.config.lists === indexConfig.ivf?.lists;
+        }
+        if (configMatches) {
+          this.logger?.debug(`Index ${vectorIndexName} already exists with same configuration, skipping recreation`);
+          const cacheKey = await this.getIndexCacheKey({
+            indexName,
+            dimension,
+            type: existingIndexInfo.type,
+            metric: existingIndexInfo.metric
+          });
+          this.createdIndexes.set(indexName, cacheKey);
+          return;
+        }
+        this.logger?.info(`Index ${vectorIndexName} configuration changed, rebuilding index`);
         await client.query(`DROP INDEX IF EXISTS ${vectorIndexName}`);
+        this.describeIndexCache.delete(indexName);
+      } catch {
+        this.logger?.debug(`Index ${indexName} doesn't exist yet, will create it`);
       }
-      if (indexConfig.type === "flat") {
+      if (indexType === "flat") {
         this.describeIndexCache.delete(indexName);
         return;
       }
       const metricOp = metric === "cosine" ? "vector_cosine_ops" : metric === "euclidean" ? "vector_l2_ops" : "vector_ip_ops";
       let indexSQL;
-      if (indexConfig.type === "hnsw") {
+      if (indexType === "hnsw") {
         const m = indexConfig.hnsw?.m ?? 8;
         const efConstruction = indexConfig.hnsw?.efConstruction ?? 32;
         indexSQL = `
@@ -811,27 +910,48 @@ var PgVector = class extends vector.MastraVector {
     if (!this.installVectorExtensionPromise) {
       this.installVectorExtensionPromise = (async () => {
         try {
-          const extensionCheck = await client.query(`
-            SELECT EXISTS (
-              SELECT 1 FROM pg_extension WHERE extname = 'vector'
+          const existingSchema = await this.detectVectorExtensionSchema(client);
+          if (existingSchema) {
+            this.vectorExtensionInstalled = true;
+            this.vectorExtensionSchema = existingSchema;
+            this.logger.info(`Vector extension already installed in schema: ${existingSchema}`);
+            return;
+          }
+          try {
+            if (this.schema && this.schema !== "public") {
+              try {
+                await client.query(`CREATE EXTENSION IF NOT EXISTS vector SCHEMA ${this.getSchemaName()}`);
+                this.vectorExtensionInstalled = true;
+                this.vectorExtensionSchema = this.schema;
+                this.logger.info(`Vector extension installed in schema: ${this.schema}`);
+                return;
+              } catch (schemaError) {
+                this.logger.debug(`Could not install vector extension in schema ${this.schema}, trying public schema`, {
+                  error: schemaError
+                });
+              }
+            }
+            await client.query("CREATE EXTENSION IF NOT EXISTS vector");
+            const installedSchema = await this.detectVectorExtensionSchema(client);
+            if (installedSchema) {
+              this.vectorExtensionInstalled = true;
+              this.vectorExtensionSchema = installedSchema;
+              this.logger.info(`Vector extension installed in schema: ${installedSchema}`);
+            }
+          } catch (error) {
+            this.logger.warn(
+              "Could not install vector extension. This requires superuser privileges. If the extension is already installed, you can ignore this warning.",
+              { error }
             );
-          `);
-          this.vectorExtensionInstalled = extensionCheck.rows[0].exists;
-          if (!this.vectorExtensionInstalled) {
-            try {
-              await client.query("CREATE EXTENSION IF NOT EXISTS vector");
+            const existingSchema2 = await this.detectVectorExtensionSchema(client);
+            if (existingSchema2) {
               this.vectorExtensionInstalled = true;
-              this.logger.info("Vector extension installed successfully");
-            } catch {
-              this.logger.warn(
-                "Could not install vector extension. This requires superuser privileges. If the extension is already installed globally, you can ignore this warning."
-              );
+              this.vectorExtensionSchema = existingSchema2;
+              this.logger.info(`Vector extension found in schema: ${existingSchema2}`);
             }
-          } else {
-            this.logger.debug("Vector extension already installed, skipping installation");
           }
         } catch (error) {
-          this.logger.error("Error checking vector extension status", { error });
+          this.logger.error("Error setting up vector extension", { error });
           this.vectorExtensionInstalled = void 0;
           this.installVectorExtensionPromise = null;
           throw error;
@@ -1033,6 +1153,12 @@ var PgVector = class extends vector.MastraVector {
     }
   }
   async disconnect() {
+    if (this.cacheWarmupPromise) {
+      try {
+        await this.cacheWarmupPromise;
+      } catch {
+      }
+    }
     await this.pool.end();
   }
   /**
@@ -1055,8 +1181,9 @@ var PgVector = class extends vector.MastraVector {
       let updateParts = [];
       let values = [id];
       let valueIndex = 2;
+      const vectorType = this.getVectorTypeName();
       if (update.vector) {
-        updateParts.push(`embedding = $${valueIndex}::vector`);
+        updateParts.push(`embedding = $${valueIndex}::${vectorType}`);
         values.push(`[${update.vector.join(",")}]`);
         valueIndex++;
       }
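
The last hunk threads the qualified type name into the `SET` clause of an update while keeping the values parameterized. A minimal sketch of that pattern follows. `buildUpdateParts` and its interfaces are hypothetical names, and the metadata branch is an assumption, since the diff only shows the `embedding` branch.

```typescript
interface VectorUpdate {
  vector?: number[];
  metadata?: Record<string, unknown>;
}

interface UpdateParts {
  setClause: string;
  values: unknown[];
}

function buildUpdateParts(id: string, update: VectorUpdate, vectorType = 'vector'): UpdateParts {
  const parts: string[] = [];
  const values: unknown[] = [id]; // $1 is reserved for the row id
  let valueIndex = 2;

  if (update.vector) {
    // The type name is interpolated into the SQL text; the value stays a bound parameter.
    parts.push(`embedding = $${valueIndex}::${vectorType}`);
    values.push(`[${update.vector.join(',')}]`);
    valueIndex++;
  }
  if (update.metadata) {
    // Assumption: metadata is bound as JSON text and cast to jsonb.
    parts.push(`metadata = $${valueIndex}::jsonb`);
    values.push(JSON.stringify(update.metadata));
    valueIndex++;
  }
  return { setClause: parts.join(', '), values };
}
```

Because `vectorType` becomes part of the SQL string rather than a bound parameter, it has to be a validated identifier, which is why the diff passes the schema through `utils.parseSqlIdentifier` before building the qualified name.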