npm - @harperfast/skills - Versions diffs - 1.5.1 → 1.6.1 - Mend

@harperfast/skills 1.5.1 → 1.6.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5)

package/dist/index.js +1 -1
package/harper-best-practices/AGENTS.md +809 -189
package/harper-best-practices/rules/vector-indexing.md +84 -121
package/harper-best-practices/rules.manifest.yaml +13 -1
package/package.json +3 -1

package/dist/index.js CHANGED

@@ -47,7 +47,7 @@ export const rules = {
 	"serving-web-content": "---\nname: serving-web-content\ndescription: How to serve static files and integrated Vite/React applications in Harper.\nmetadata:\n  mode: synthesized\n---\n\n# Serving Web Content\n\nInstructions for the agent to follow when serving web content from Harper.\n\n## When to Use\n\nUse this skill when you need to serve a frontend (HTML, CSS, JS, or a React app) directly from your Harper instance.\n\n## How It Works\n\n1. **Choose a Method**: Decide between the simple Static Plugin or the integrated Vite Plugin.\n2. **Option A: Static Plugin (Simple)**:\n   - Add to `config.yaml`:\n     ```yaml\n     static:\n       files: 'web/*'\n     ```\n   - Place files in a `web/` folder in the project root.\n   - Files are served at the root URL (e.g., `http://localhost:9926/index.html`).\n3. **Option B: Vite Plugin (Advanced/Development)**:\n   - Add to `config.yaml`:\n     ```yaml\n     '@harperfast/vite-plugin':\n       package: '@harperfast/vite-plugin'\n     ```\n   - Ensure `vite.config.ts` and `index.html` are in the project root.\n\n   ```javascript\n   import vue from '@vitejs/plugin-vue';\n   import path from 'node:path';\n   import { defineConfig } from 'vite';\n\n   // https://vite.dev/config/\n   export default defineConfig({\n   \tplugins: [vue()],\n   \tresolve: {\n   \t\talias: {\n   \t\t\t'@': path.resolve(import.meta.dirname, './src'),\n   \t\t},\n   \t},\n   \tbuild: {\n   \t\toutDir: 'web',\n   \t\temptyOutDir: true,\n   \t\trolldownOptions: {\n   \t\t\texternal: ['**/*.test.*', '**/*.spec.*'],\n   \t\t},\n   \t},\n   });\n   ```\n\n   - Install dependencies: `npm install --save-dev vite @harperfast/vite-plugin`.\n   - Then `harper run .` will start up Harper and Vite with HMR. Vite does _not_ need to be executed separately.\n\n4. **Deploy for Production**: For Vite apps, use a build script to generate static files into a `web/` folder and deploy them using the static handler pattern. 
For example, these scripts in a package.json can perform the necessary steps:\n   ```json\n   \"build\": \"vite build\",\n   \"deploy\": \"rm -Rf deploy && npm run build && mkdir deploy && mv web deploy/ && cp -R deploy-template/* deploy/ && cp -R schemas resources deploy/ && (cd deploy && harper deploy_component . project=web restart=rolling replicated=true) && rm -Rf deploy\",\n   ```\n   Then in production, the \"Static Plugin\" option will performantly and securely serve your assets. `npm create harper@latest` scaffolds all of this for you.\n",
 	"typescript-type-stripping": "---\nname: typescript-type-stripping\ndescription: How to run TypeScript files directly in Harper without a build step.\nmetadata:\n  mode: synthesized\n---\n\n# TypeScript Type Stripping\n\nInstructions for the agent to follow when using TypeScript in Harper.\n\n## When to Use\n\nUse this skill when you want to write Harper Resources in TypeScript and have them execute directly in Node.js without an intermediate build or compilation step.\n\n## How It Works\n\n1. **Verify Node.js Version**: Ensure you are using Node.js v22.6.0 or higher.\n2. **Name Files with `.ts`**: Create your resource files in the `resources/` directory with a `.ts` extension.\n3. **Use TypeScript Syntax**: Write your resource classes using standard TypeScript (interfaces, types, etc.).\n   ```typescript\n   import { Resource } from 'harper';\n   export class MyResource extends Resource {\n   \tasync get(): Promise<{ message: string }> {\n   \t\treturn { message: 'Running TS directly!' };\n   \t}\n   }\n   ```\n4. **Use Explicit Extensions in Imports**: When importing other local modules, include the `.ts` extension: `import { helper } from './helper.ts'`.\n5. **Configure `config.yaml`**: Ensure `jsResource` points to your `.ts` files:\n   ```yaml\n   jsResource:\n     files: 'resources/*.ts'\n   ```\n",
 	"using-blob-datatype": "---\nname: using-blob-datatype\ndescription: How to use the Blob data type for efficient binary storage in Harper.\nmetadata:\n  mode: synthesized\n---\n\n# Using Blob Datatype\n\nInstructions for the agent to follow when working with the Blob data type in Harper.\n\n## When to Use\n\nUse this skill when you need to store unstructured or large binary data (media, documents) that is too large for standard JSON fields. Blobs provide efficient storage and integrated streaming support.\n\n## How It Works\n\n1. **Define Blob Fields**: In your GraphQL schema, use the `Blob` type:\n   ```graphql\n   type MyTable @table {\n   \tid: ID @primaryKey\n   \tdata: Blob\n   }\n   ```\n2. **Create and Store Blobs**: Use `createBlob()` from Harper's globals to wrap Buffers or Streams:\n   ```javascript\n   import { tables } from 'harper';\n   const blob = createBlob(largeBuffer);\n   await tables.MyTable.put('my-id', { data: blob });\n   ```\n3. **Use Streaming (Optional)**: For very large files, pass a stream to `createBlob()` to avoid loading the entire file into memory.\n4. **Read Blob Data**: Retrieve the record and use `.bytes()` or streaming interfaces on the blob field:\n   ```javascript\n   const record = await tables.MyTable.get('my-id');\n   const buffer = await record.data.bytes();\n   ```\n5. **Ensure Write Completion**: Use `saveBeforeCommit: true` in `createBlob` options if you need the blob fully written before the record is committed.\n6. **Handle Errors**: Attach error listeners to the blob object to handle streaming failures.\n",
-	"vector-indexing": "---\nname: vector-indexing\ndescription: How to enable and query vector indexes for similarity search in Harper.\nmetadata:\n  mode: synthesized\n---\n\n# Vector Indexing\n\nInstructions for the agent to follow when implementing vector search in Harper.\n\n## When to Use\n\nUse this skill when you need to perform similarity searches on high-dimensional data, such as AI embeddings for semantic search, recommendations, or image retrieval.\n\n## How It Works\n\n1. **Enable Vector Indexing**: In your GraphQL schema, add `@indexed(type: \"HNSW\")` to a numeric array field:\n   ```graphql\n   type Product @table {\n   \tid: ID @primaryKey\n   \ttextEmbeddings: [Float] @indexed(type: \"HNSW\")\n   }\n   ```\n2. **Configure Index Options (Optional)**: Fine-tune the index with parameters like `distance` (`cosine` or `euclidean`), `M`, and `efConstruction`.\n3. **Query with Vector Search**: Use `tables.Table.search()` with a `sort` object containing the `target` vector:\n   ```javascript\n   const results = await tables.Product.search({\n     select: ['name', '$distance'],\n     sort: {\n       attribute: 'textEmbeddings',\n       target: [0.1, 0.2, ...], // query vector\n     },\n     limit: 5,\n   });\n   ```\n4. **Filter by Distance**: Use `conditions` with a `target` vector and a `comparator` (e.g., `lt`) to return results within a similarity threshold:\n   ```javascript\n   const results = await tables.Product.search({\n   \tconditions: {\n   \t\tattribute: 'textEmbeddings',\n   \t\tcomparator: 'lt',\n   \t\tvalue: 0.1,\n   \t\ttarget: searchVector,\n   \t},\n   });\n   ```\n5. 
**Generate Embeddings**: Use external services (OpenAI, Ollama) to generate the numeric vectors before storing or searching them in Harper.\n\n```typescript\nimport OpenAI from 'openai';\nimport ollama from 'ollama';\n\nconst { Product } = tables;\nconst openai = new OpenAI();\n// the name of the OpenAI embedding model\nconst OPENAI_EMBEDDING_MODEL = 'text-embedding-3-small';\n\n// the name of the Ollama embedding model\nconst OLLAMA_EMBEDDING_MODEL = 'llama3';\n\nconst SIMILARITY_THRESHOLD = 0.5;\n\nexport class ProductSearch extends Resource {\n\t// based on env variable we choose the appropriate embedding generator\n\tgenerateEmbedding =\n\t\tprocess.env.EMBEDDING_GENERATOR === 'ollama'\n\t\t\t? this._generateOllamaEmbedding\n\t\t\t: this._generateOpenAIEmbedding;\n\n\t/**\n\t * Executes a search query using a generated text embedding and returns the matching products.\n\t *\n\t * @param {Object} data - The input data for the request.\n\t * @param {string} data.prompt - The prompt to generate the text embedding from.\n\t * @return {Promise<Array>} Returns a promise that resolves to an array of products matching the conditions,\n\t * including fields: name, description, price, and $distance.\n\t */\n\tasync post(data) {\n\t\tconst embedding = await this.generateEmbedding(data.prompt);\n\n\t\treturn await Product.search({\n\t\t\tselect: ['name', 'description', 'price', '$distance'],\n\t\t\tconditions: {\n\t\t\t\tattribute: 'textEmbeddings',\n\t\t\t\tcomparator: 'lt',\n\t\t\t\tvalue: SIMILARITY_THRESHOLD,\n\t\t\t\ttarget: embedding[0],\n\t\t\t},\n\t\t\tlimit: 5,\n\t\t});\n\t}\n\n\t/**\n\t * Generates an embedding using the Ollama API.\n\t *\n\t * @param {string} promptData - The input data for which the embedding is to be generated.\n\t * @return {Promise<number[][]>} A promise that resolves to the generated embedding as an array of numbers.\n\t */\n\tasync _generateOllamaEmbedding(promptData) {\n\t\tconst embedding = await ollama.embed({\n\t\t\tmodel: 
OLLAMA_EMBEDDING_MODEL,\n\t\t\tinput: promptData,\n\t\t});\n\t\treturn embedding?.embeddings;\n\t}\n\n\t/**\n\t * Generates OpenAI embeddings based on the given prompt data.\n\t *\n\t * @param {string} promptData - The input data used for generating the embedding.\n\t * @return {Promise<number[][]>} A promise that resolves to an array of embeddings, where each embedding is an array of floats.\n\t */\n\tasync _generateOpenAIEmbedding(promptData) {\n\t\tconst embedding = await openai.embeddings.create({\n\t\t\tmodel: OPENAI_EMBEDDING_MODEL,\n\t\t\tinput: promptData,\n\t\t\tencoding_format: 'float',\n\t\t});\n\n\t\tlet embeddings = [];\n\t\tembedding.data.forEach((embeddingData) => {\n\t\t\tembeddings.push(embeddingData.embedding);\n\t\t});\n\n\t\treturn embeddings;\n\t}\n}\n```\n\n## Examples\n\nSample request to the `ProductSearch` resource which prompts to find \"shorts for the gym\":\n\n```bash\ncurl -X POST \"http://localhost:9926/ProductSearch/\" \\\n-H \"Accept: application/json\" \\\n-H \"Content-Type: application/json\" \\\n-H \"Authorization: Basic <YOUR_AUTH>\" \\\n-d '{\"prompt\": \"shorts for the gym\"}'\n```\n\n---\n\n## When to Use Vector Indexing\n\nVector indexing is ideal when:\n\n- Storing embedding vectors from ML models\n- Performing semantic or similarity-based search\n- Working with high-dimensional numeric data\n- Exact-match indexes are insufficient\n\n---\n\n## Summary\n\n- Vector indexing enables fast similarity search on numeric arrays\n- Defined using `@indexed(type: \"HNSW\")`\n- Queried using a target vector in search sorting\n- Tunable for performance and accuracy\n"
+	"vector-indexing": "---\nname: vector-indexing\ndescription: How to enable and query vector indexes for similarity search in Harper.\nmetadata:\n  mode: generate\n  sources:\n    - reference/v5/database/schema.md#Vector Indexing\n  sourceCommit: 6d4a30ccd5b32528e0e9963565782dca9fff5ada\n  inputHash: 3732961c671aac00\n---\n\n# Vector Indexing\n\nInstructions for the agent to follow when enabling and querying vector indexes for similarity search in Harper using the HNSW algorithm.\n\n## When to Use\n\nApply this rule when adding a vector index to a Harper table schema or writing similarity search queries against high-dimensional vector fields. Use it whenever you need approximate nearest-neighbor search, distance-threshold filtering, or distance-scored results.\n\n## How It Works\n\n1. **Declare a vector index on a `[Float]` field**: Add `@indexed(type: \"HNSW\")` to any `[Float]` attribute in a `@table` type. See [adding-tables-with-schemas.md](adding-tables-with-schemas.md) for general schema setup.\n\n   ```graphql\n   type Document @table {\n   \tid: Long @primaryKey\n   \ttextEmbeddings: [Float] @indexed(type: \"HNSW\")\n   }\n   ```\n\n2. **Query by nearest neighbors using `sort`**: Call `Document.search()` with a `sort` object specifying `attribute` (the indexed field) and `target` (the query vector). Include `limit` to cap results.\n\n   ```javascript\n   let results = Document.search({\n   \tsort: { attribute: 'textEmbeddings', target: searchVector },\n   \tlimit: 5,\n   });\n   ```\n\n3. **Combine HNSW with filter conditions**: Add a `conditions` array alongside `sort` to pre-filter records before ranking by similarity.\n\n   ```javascript\n   let results = Document.search({\n   \tconditions: [{ attribute: 'price', comparator: 'lt', value: 50 }],\n   \tsort: { attribute: 'textEmbeddings', target: searchVector },\n   \tlimit: 5,\n   });\n   ```\n\n4. 
**Filter by distance threshold**: Place `target` directly on a condition (alongside `attribute`, `comparator`, and `value`) to return only records whose distance to the target vector is below a threshold. Use this form to bound result quality by a similarity cutoff rather than ranking.\n\n   ```javascript\n   let results = Document.search({\n   \tconditions: {\n   \t\tattribute: 'textEmbeddings',\n   \t\tcomparator: 'lt',\n   \t\tvalue: 0.1,\n   \t\ttarget: searchVector,\n   \t},\n   });\n   ```\n\n5. **Include computed distance in results**: Add `'$distance'` to the `select` array to return the computed distance from the target vector alongside each record. `$distance` works in both `sort`-based and `conditions`-based queries.\n\n   ```javascript\n   let results = Document.search({\n   \tselect: ['name', '$distance'],\n   \tsort: { attribute: 'textEmbeddings', target: searchVector },\n   \tlimit: 5,\n   });\n   ```\n\n6. **Tune HNSW parameters**: Pass additional parameters to `@indexed(type: \"HNSW\", ...)` to control index quality and performance:\n\n   | Parameter              | Default           | Description                                                                                         |\n   | ---------------------- | ----------------- | --------------------------------------------------------------------------------------------------- |\n   | `distance`             | `\"cosine\"`        | Distance function: `\"euclidean\"` or `\"cosine\"` (negative cosine similarity)                         |\n   | `efConstruction`       | `100`             | Max nodes explored during index construction. Higher = better recall, lower = better performance    |\n   | `M`                    | `16`              | Preferred connections per graph layer. 
Higher = more space, better recall for high-dimensional data |\n   | `optimizeRouting`      | `0.5`             | Heuristic aggressiveness for omitting redundant connections (0 = off, 1 = most aggressive)          |\n   | `mL`                   | computed from `M` | Normalization factor for level generation                                                           |\n   | `efSearchConstruction` | `50`              | Max nodes explored during search                                                                    |\n\n## Examples\n\n**Schema with custom HNSW parameters:**\n\n```graphql\ntype Document @table {\n\tid: Long @primaryKey\n\ttextEmbeddings: [Float]\n\t\t@indexed(type: \"HNSW\", distance: \"euclidean\", optimizeRouting: 0, efSearchConstruction: 100)\n}\n```\n\n**Nearest-neighbor search with distance output:**\n\n```javascript\nlet results = Document.search({\n\tselect: ['name', '$distance'],\n\tsort: { attribute: 'textEmbeddings', target: searchVector },\n\tlimit: 5,\n});\n```\n\n**Distance-threshold filter (no ranking):**\n\n```javascript\nlet results = Document.search({\n\tconditions: {\n\t\tattribute: 'textEmbeddings',\n\t\tcomparator: 'lt',\n\t\tvalue: 0.1,\n\t\ttarget: searchVector,\n\t},\n});\n```\n\n## Notes\n\n- The default `distance` function is `cosine`. To use Euclidean distance, set `distance: \"euclidean\"` in the `@indexed` directive.\n- `efConstruction` controls index build quality; increase it to improve recall at the cost of slower indexing.\n- `$distance` is a special field — prefix it with `$` exactly as shown; it is not a schema attribute.\n- `target` is required in both `sort`-based and threshold-based condition queries to identify the reference vector for distance computation.\n"
 };
 /**