@neuralsea/workspace-indexer 0.1.0

Files changed (8)

package/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

package/README.md ADDED

@@ -0,0 +1,356 @@
+# @neuralsea/workspace-indexer
+A **local-first**, **multi-repo** workspace indexer for AI agents (e.g. your custom agent “Damocles”).
+It provides:
+- **Whole-workspace indexing** (multiple Git repos under a workspace root)
+- **Meaningful chunking** (TypeScript/JavaScript AST-aware chunking + robust fallback for other files)
+- **Semantic embeddings** (pluggable: **Ollama local**, **OpenAI**, or deterministic offline **hash**)
+- **Hybrid retrieval**: vector similarity **plus** lexical search (SQLite FTS5) with configurable weights
+- **Pluggable vector backends**: `bruteforce`, `hnswlib` (HNSW), `qdrant` (local/remote), `faiss`, or a custom provider
+- **Branch/commit isolation**: a separate index per repo per Git **HEAD** commit (reduces stale-context errors)
+- **Fast incremental updates**: file watching + `.git/HEAD` switch detection
+- **Security controls**: respects `.gitignore` via `git ls-files`, honours `.petriignore` and `.augmentignore`, and applies redaction hooks
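To make the branch/commit isolation and `.git/HEAD` switch detection bullets concrete, here is a minimal sketch of how a HEAD file could be interpreted and turned into a per-commit index key. The names `parseGitHead` and `indexKey`, and the `repo@commit` key scheme, are hypothetical illustrations and are not taken from the package; only the two on-disk HEAD formats (a `ref: ...` line or a raw commit hash) are standard Git behaviour.

```typescript
// Hypothetical helpers for illustration; not the package's documented API.
type HeadState =
  | { kind: "branch"; ref: string } // HEAD points at a branch ref
  | { kind: "detached"; commit: string }; // HEAD stores a commit hash directly

// Interpret the text content of a `.git/HEAD` file.
function parseGitHead(content: string): HeadState {
  const text = content.trim();
  const prefix = "ref: ";
  if (text.startsWith(prefix)) {
    return { kind: "branch", ref: text.slice(prefix.length).trim() };
  }
  // SHA-1 repositories use 40 hex characters, SHA-256 repositories use 64.
  if (/^[0-9a-f]{40}$/i.test(text) || /^[0-9a-f]{64}$/i.test(text)) {
    return { kind: "detached", commit: text.toLowerCase() };
  }
  throw new Error(`Unrecognised .git/HEAD content: ${text}`);
}

// Assumed key scheme: one index per (repo, commit) pair, so switching
// branches selects a different index instead of mutating the current one.
function indexKey(repoId: string, commit: string): string {
  return `${repoId}@${commit}`;
}
```

A watcher built on this idea would re-read `.git/HEAD` when the file changes, resolve a branch ref to its commit, and switch to the index stored under the new key.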
+This package is designed so Damocles can use the same index in different problem domains:
+- **Search**
+- **Refactor**
+- **Review**
+- **Architecture understanding**
+- **RCA (root cause analysis)**
+…by selecting different **retrieval profiles** (k/weights/context-expansion/scope).
+---
+## Install
+```bash
+npm i @neuralsea/workspace-indexer
+```
+Node 18+ recommended.
+---
+## Quick start (library)
+```ts
+import { WorkspaceIndexer, OllamaEmbeddingsProvider } from "@neuralsea/workspace-indexer";
+const embedder = new OllamaEmbeddingsProvider({ model: "nomic-embed-text" });
+const ix = new WorkspaceIndexer("/path/to/workspace", embedder);
+await ix.indexAll();
+// Domain: search
+const search = await ix.retrieve("Where is authentication enforced?", { profile: "search" });
+// Domain: refactor (more context)
+const refactor = await ix.retrieve("Refactor the caching layer to support TTL per key", { profile: "refactor" });
+// Domain: review (changed files only)
+const review = await ix.retrieve("Explain the risk of this change", {
+  profile: "review",
+  scope: { changedOnly: true, baseRef: "origin/main" }
+});
+console.log(search.hits.map(h => h.chunk.path));
+await ix.closeAsync();
+```
+---
+## CLI
+### Index a workspace
+```bash
+npx petri-index index /path/to/workspace --provider ollama --model nomic-embed-text
+```
+### Watch (keeps index current)
+```bash
+npx petri-index watch /path/to/workspace --provider ollama --model nomic-embed-text
+```
+### Query (profile: search)
+```bash
+npx petri-index query "rate limiting middleware" /path/to/workspace --k 8
+```
+### Retrieve (full context bundle as JSON)
+```bash
+npx petri-index retrieve "Why are requests timing out?" /path/to/workspace \
+  --profile rca \
+  --changedOnly true \
+  --baseRef origin/main
+```
+---
+## Retrieval profiles (how Petri adapts per domain)
+The same index can be used differently depending on the task. The package provides defaults:
+- `search`
+  Tight top-k; favours precise matches; minimal context expansion.
+- `refactor`
+  Wider k; includes adjacent chunks and follows relative imports to pull in dependent modules.
+- `review`
+  Biases to changed files (when scoped) and includes file synopsis for reviewer context.
+- `architecture`
+  Larger candidate pools; prioritises file synopses and follows imports more aggressively.
+- `rca`
+  Like `review`, plus a recency bias (recently modified files rank higher).
+Each profile controls:
+- **k** (how many primary hits)
+- **weights** (vector/lexical/recency)
+- **expand** (adjacent chunks, follow imports, include file synopsis)
+- **candidate pool sizes** (vectorK/lexicalK)
+You can override any of these at runtime:
+```ts
+const bundle = await ix.retrieve("Explain auth flow", {
+  profile: "architecture",
+  profileOverrides: {
+    k: 30,
+    weights: { vector: 0.6, lexical: 0.3, recency: 0.1 },
+    expand: { followImports: 5 }
+  }
+});
+```
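The `weights` in a profile can be read as the coefficients of a weighted sum over per-candidate scores. The sketch below assumes each signal has already been normalised to the range 0 to 1 and that the signals are combined linearly. The README does not state the actual fusion formula, so treat `hybridScore` and `rankCandidates` as hypothetical names for one plausible scheme.

```typescript
// Assumed shapes for illustration; field names mirror the `weights` keys above.
interface ProfileWeights {
  vector: number;
  lexical: number;
  recency: number;
}

interface ScoredCandidate {
  id: string;
  vector: number; // vector similarity, assumed normalised to [0, 1]
  lexical: number; // lexical (FTS) relevance, assumed normalised to [0, 1]
  recency: number; // 1 = modified very recently, 0 = old
}

// Linear combination of the three signals (an assumption, see lead-in).
function hybridScore(c: ScoredCandidate, w: ProfileWeights): number {
  return w.vector * c.vector + w.lexical * c.lexical + w.recency * c.recency;
}

// Return the top-k candidates by combined score without mutating the input.
function rankCandidates(
  candidates: ScoredCandidate[],
  w: ProfileWeights,
  k: number
): ScoredCandidate[] {
  return [...candidates]
    .sort((a, b) => hybridScore(b, w) - hybridScore(a, w))
    .slice(0, k);
}
```

Under this model, raising `recency` (as an `rca`-style profile might) lets a recently edited file outrank a slightly better semantic match.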
+---
+## Config file
+The CLI supports `--config` pointing to a JSON file.
+Example: `petri-index.config.json`
+```json
+{
+  "storage": {
+    "storeText": true,
+    "ftsMode": "full"
+  },
+  "vector": {
+    "provider": "hnswlib",
+    "metric": "cosine",
+    "hnswlib": {
+      "persist": true,
+      "persistDebounceMs": 2000,
+      "efSearch": 64
+    }
+  },
+  "chunk": {
+    "maxLines": 260,
+    "overlapLines": 50
+  },
+  "profiles": {
+    "architecture": {
+      "k": 30,
+      "expand": { "followImports": 4 }
+    },
+    "rca": {
+      "weights": { "recency": 0.35 }
+    }
+  }
+}
+```
+Run:
+```bash
+npx petri-index retrieve "How does login work?" /path/to/workspace --config petri-index.config.json --profile architecture
+```
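The `chunk.maxLines` and `chunk.overlapLines` settings in the config above suggest a sliding window over source lines. The following is a minimal sketch of such a line-based fallback chunker, assuming each chunk holds at most `maxLines` lines and consecutive chunks share `overlapLines` lines. The function name `chunkByLines` and the `LineChunk` shape are hypothetical; the package's AST-aware chunker for TypeScript/JavaScript is not modelled here.

```typescript
// Hypothetical chunk shape; line numbers are 1-based and inclusive.
interface LineChunk {
  startLine: number;
  endLine: number;
  text: string;
}

// Sliding-window chunking: each window starts (maxLines - overlapLines)
// lines after the previous one, so neighbours share `overlapLines` lines.
function chunkByLines(
  source: string,
  maxLines: number,
  overlapLines: number
): LineChunk[] {
  if (maxLines <= 0 || overlapLines < 0 || overlapLines >= maxLines) {
    throw new Error("require maxLines > 0 and 0 <= overlapLines < maxLines");
  }
  const lines = source.split("\n");
  const step = maxLines - overlapLines;
  const chunks: LineChunk[] = [];
  for (let start = 0; start < lines.length; start += step) {
    const end = Math.min(start + maxLines, lines.length);
    chunks.push({
      startLine: start + 1,
      endLine: end,
      text: lines.slice(start, end).join("\n"),
    });
    if (end === lines.length) break; // final window reached the end of file
  }
  return chunks;
}
```

With the values from the example config (`maxLines: 260`, `overlapLines: 50`), a 600-line file would yield three windows under this scheme: lines 1-260, 211-470, and 421-600.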
+### Lexical modes (`storage.ftsMode`)
+- `"full"` (default): best retrieval; stores (redacted) chunk text in the FTS table.
+- `"tokens"`: stores only extracted identifiers/tokens for lexical search (less sensitive; still useful for code search).
+- `"off"`: disables lexical indexing entirely (vector-only retrieval).
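To illustrate what the `"tokens"` mode might store instead of full chunk text, here is a rough sketch that extracts identifiers and splits them on camelCase and snake_case boundaries. The README does not describe the package's tokeniser, so `extractIdentifierTokens` is a hypothetical stand-in showing the general idea: keep searchable names and drop everything else.

```typescript
// Hypothetical tokeniser; the package's real extraction rules may differ.
function extractIdentifierTokens(code: string): string[] {
  const seen = new Set<string>();
  // Identifier-like runs: a letter, `_` or `$`, then letters, digits, `_`, `$`.
  const identifiers = code.match(/[A-Za-z_$][A-Za-z0-9_$]*/g) || [];
  for (const ident of identifiers) {
    seen.add(ident.toLowerCase());
    // Also index the parts of snake_case and camelCase names.
    for (const part of ident.split(/_|(?<=[a-z0-9])(?=[A-Z])/)) {
      if (part.length > 1) seen.add(part.toLowerCase());
    }
  }
  return Array.from(seen).sort();
}
```

A query for `user` would then still match a chunk containing `getUserById`, even though punctuation and numeric literals from the original text are never written to the lexical index. Note that words inside string literals would still be captured by this simple pattern, which is one reason to keep redaction enabled.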
+---
+## Vector backends
+Configure the vector backend via `vector.provider`:
+- `"bruteforce"` (default): in-memory exact search, no extra dependencies
+- `"hnswlib"`: fast local ANN using HNSW via `hnswlib-node`
+- `"qdrant"`: Qdrant (local or remote) via `@qdrant/js-client-rest`
+- `"faiss"`: FAISS via `faiss-node` (rebuild-on-write; good for experimentation)
+- `"auto"`: picks the best available backend (prefers Qdrant if configured)
+- `"custom"`: load a custom provider module that implements the `VectorIndex` interface
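For intuition about the `"bruteforce"` option, exact search with the cosine metric amounts to scoring the query against every stored vector and keeping the best `k`. The sketch below shows that idea in isolation; `cosineSimilarity` and `bruteForceSearch` are illustrative names, not the package's internals.

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom; // treat zero vectors as dissimilar
}

// Exact top-k: score every item, sort descending, keep the first k.
function bruteForceSearch(
  query: number[],
  items: { id: string; vector: number[] }[],
  k: number
): { id: string; score: number }[] {
  return items
    .map((item) => ({ id: item.id, score: cosineSimilarity(query, item.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

The cost grows linearly with the number of stored chunks, which is why an approximate index such as HNSW becomes attractive for large workspaces.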
+### HNSW (local)
+Install:
+```bash
+npm i hnswlib-node
+```
+Config:
+```json
+{
+  "vector": {
+    "provider": "hnswlib",
+    "metric": "cosine",
+    "hnswlib": {
+      "persist": true,
+      "persistDebounceMs": 2000,
+      "m": 16,
+      "efConstruction": 200,
+      "efSearch": 64
+    }
+  }
+}
+```
+### Qdrant (local)
+Start a local Qdrant:
+```bash
+docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant
+```
+Install client:
+```bash
+npm i @qdrant/js-client-rest
+```
+Config:
+```json
+{
+  "vector": {
+    "provider": "qdrant",
+    "metric": "cosine",
+    "qdrant": {
+      "url": "http://127.0.0.1:6333",
+      "collectionPrefix": "petri",
+      "mode": "commit",
+      "recreateOnRebuild": true
+    }
+  }
+}
+```
+### FAISS
+Install:
+```bash
+npm i faiss-node
+```
+Config:
+```json
+{
+  "vector": {
+    "provider": "faiss",
+    "metric": "cosine",
+    "faiss": {
+      "descriptor": "HNSW,Flat",
+      "persist": true,
+      "persistDebounceMs": 2000,
+      "rebuildStrategy": "lazy"
+    }
+  }
+}
+```
+### Custom provider
+Point `vector.custom` to an ES module that exports either:
+- a class implementing `VectorIndex`, or
+- a factory function returning a `VectorIndex`
+```json
+{
+  "vector": {
+    "provider": "custom",
+    "custom": {
+      "module": "./my-vector-provider.mjs",
+      "export": "default",
+      "options": { "foo": "bar" }
+    }
+  }
+}
+```
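As a rough picture of what a custom provider module could contain, the sketch below defines a stand-in interface and an in-memory implementation. This README does not show the real `VectorIndex` interface, so `SketchVectorIndex` and its `upsert`/`remove`/`search` methods are assumptions made purely for illustration; check the package's type definitions for the actual contract. In a real module, the class or the factory would be the export named by `vector.custom.export`.

```typescript
// ASSUMPTION: stand-in for the package's `VectorIndex`; method names are guesses.
interface SketchVectorIndex {
  upsert(id: string, vector: number[]): void;
  remove(id: string): void;
  search(query: number[], k: number): { id: string; score: number }[];
}

// In-memory provider that scores by dot product (equivalent to cosine
// similarity when all vectors are normalised to unit length).
class MapVectorIndex implements SketchVectorIndex {
  private readonly vectors = new Map<string, number[]>();
  readonly options: Record<string, unknown>;

  constructor(options: Record<string, unknown> = {}) {
    this.options = options; // e.g. the `options` object from the JSON config
  }

  upsert(id: string, vector: number[]): void {
    this.vectors.set(id, vector);
  }

  remove(id: string): void {
    this.vectors.delete(id);
  }

  search(query: number[], k: number): { id: string; score: number }[] {
    const scored: { id: string; score: number }[] = [];
    this.vectors.forEach((vector, id) => {
      let dot = 0;
      const n = Math.min(query.length, vector.length);
      for (let i = 0; i < n; i++) dot += query[i] * vector[i];
      scored.push({ id, score: dot });
    });
    return scored.sort((a, b) => b.score - a.score).slice(0, k);
  }
}

// Factory form: the alternative to exporting the class itself.
function createVectorIndex(
  options: Record<string, unknown> = {}
): SketchVectorIndex {
  return new MapVectorIndex(options);
}
```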
+## Security model
+Local indexing means **your source stays on your machine**.
+Controls:
+1. **Git-native ignore**: files are selected via:
+   - `git ls-files --cached --others --exclude-standard`
+   which honours `.gitignore` exactly.
+2. **Extra ignores**: `.petriignore` and `.augmentignore`
+3. **Redaction hooks** (on by default):
+   - skip obvious secret files by path substring
+   - redact patterns (e.g. private keys) before embedding + storage
+> For higher assurance, set `storage.ftsMode = "tokens"` and review `redact.patterns`.
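The redaction controls above can be pictured as two small checks: a path filter that skips likely secret files, and a set of regular expressions applied to text before it is embedded or stored. The sketch below is a minimal illustration under those assumptions. The specific path hints, the private-key pattern, and the names `looksLikeSecretFile` and `redactText` are hypothetical, not the package's defaults.

```typescript
// Hypothetical defaults for illustration only.
const SECRET_PATH_HINTS = [".env", "id_rsa", ".pem"];

// Matches PEM-style private key blocks, including the key body.
const PRIVATE_KEY_BLOCK =
  /-----BEGIN [A-Z ]*PRIVATE KEY-----[\s\S]*?-----END [A-Z ]*PRIVATE KEY-----/g;

// Skip files whose path contains an obvious secret-related substring.
function looksLikeSecretFile(path: string): boolean {
  const lower = path.toLowerCase();
  return SECRET_PATH_HINTS.some((hint) => lower.includes(hint));
}

// Replace every match of every pattern with a fixed placeholder.
function redactText(
  text: string,
  patterns: RegExp[] = [PRIVATE_KEY_BLOCK]
): string {
  return patterns.reduce(
    (acc, pattern) => acc.replace(pattern, "[REDACTED]"),
    text
  );
}
```

Substring matching on paths is deliberately coarse and can over-match, so a real deployment would want to review both the hints and the patterns against its own repositories.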
+---
+## Output format for agents
+`WorkspaceIndexer.retrieve()` returns a `ContextBundle`:
+- `hits[]` — ranked primary chunks with scores and previews
+- `context[]` — expanded context blocks with reasons (adjacency/imports/synopsis)
+- `stats` — diagnostics useful for your agent logs
+This is a good structure for:
+- Search answers (just `hits`)
+- Multi-file refactoring (use `context` as grounded evidence)
+- Review/RCA (scope to changed files, include synopsis, bias by recency)
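One way an agent might consume this structure is to flatten it into a prompt section. The sketch below assumes a simplified bundle shape: `hits`, `context`, `stats`, and `chunk.path` appear in this README, while the `score`, `preview`, `reason`, `path`, and `text` field names are guesses made for illustration. `renderForPrompt` is a hypothetical helper, not part of the package.

```typescript
// ASSUMPTION: simplified shapes; consult the package's types for the real ones.
interface SketchHit {
  score: number;
  preview: string;
  chunk: { path: string };
}

interface SketchContextBlock {
  reason: "adjacency" | "imports" | "synopsis";
  path: string;
  text: string;
}

interface SketchContextBundle {
  hits: SketchHit[];
  context: SketchContextBlock[];
  stats: Record<string, unknown>;
}

// Render the ranked hits, then the expanded context with its reasons.
function renderForPrompt(bundle: SketchContextBundle, maxHits = 5): string {
  const lines: string[] = ["## Primary hits"];
  for (const hit of bundle.hits.slice(0, maxHits)) {
    lines.push(
      `- ${hit.chunk.path} (score ${hit.score.toFixed(2)}): ${hit.preview}`
    );
  }
  lines.push("## Supporting context");
  for (const block of bundle.context) {
    lines.push(`- [${block.reason}] ${block.path}`);
  }
  return lines.join("\n");
}
```

Keeping the reason next to each context block lets the agent explain why a file was pulled in, which is useful when logging or debugging retrieval.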
+---
+## Performance notes
+- Default vector backend is **bruteforce** (exact search in memory). For large repos, use:
+  - `vector.provider = "hnswlib"` for fast local ANN (HNSW)
+  - `vector.provider = "qdrant"` for durable, scalable vector search
+  - `vector.provider = "faiss"` if you already run FAISS locally
+- SQLite remains the source-of-truth for file/chunk metadata, so you can rebuild the vector index at any time.
+---
+## Recommended ignores
+Create a `.petriignore` in each repo to exclude heavy or noisy artefacts:
+```txt
+dist/
+build/
+coverage/
+**/*.min.js
+**/*.map
+```
+---
+## Licence
+MIT (see `LICENSE`).