npm - opencode-semantic-search - Versions diffs - 0.1.0 - Mend

opencode-semantic-search 0.1.0

Files changed (37)

package/AGENTS.md +165 -0
package/README.md +138 -0
package/SETUP.md +541 -0
package/bin/opencode-semantic-search.mjs +70 -0
package/bun.lock +61 -0
package/index.ts +138 -0
package/install.sh +260 -0
package/package.json +67 -0
package/src/chunker/fallback.ts +77 -0
package/src/chunker/index.ts +16 -0
package/src/chunker/treesitter.ts +119 -0
package/src/config.ts +157 -0
package/src/diagnostics/bundle.ts +63 -0
package/src/diagnostics/routing.ts +37 -0
package/src/embedder/interface.ts +62 -0
package/src/embedder/ollama.ts +60 -0
package/src/embedder/openai.ts +71 -0
package/src/indexer/delta.ts +165 -0
package/src/indexer/gc.ts +10 -0
package/src/indexer/incremental.ts +105 -0
package/src/indexer/pipeline.test.ts +126 -0
package/src/indexer/pipeline.ts +394 -0
package/src/indexer/pool.ts +25 -0
package/src/indexer/resume.ts +14 -0
package/src/logger.ts +121 -0
package/src/runtime.ts +111 -0
package/src/search/context.ts +17 -0
package/src/search/hybrid.ts +65 -0
package/src/store/schema.sql +31 -0
package/src/store/sqlite.ts +269 -0
package/src/tools/diagnostic_bundle.ts +34 -0
package/src/tools/index_status.ts +73 -0
package/src/tools/reindex.ts +71 -0
package/src/tools/semantic_search.ts +91 -0
package/src/tools/smart_grep.ts +198 -0
package/src/tui_toast.ts +191 -0
package/src/types.d.ts +1 -0

package/AGENTS.md ADDED

@@ -0,0 +1,165 @@
+# AGENTS.md — Guide for Coding Agents
+This file provides conventions and commands for agents working in this repository.
+## Project Overview
+A semantic search plugin for [OpenCode AI](https://github.com/opencode-ai/opencode). It indexes code files using tree-sitter chunking and embeddings (OpenAI / Ollama), stores vectors in SQLite (sqlite-vec), and exposes search tools via the plugin API.
+**Runtime:** Bun (not Node.js)
+**Language:** TypeScript, ESM (`"type": "module"`)
+**Entry point:** `index.ts`
+## Commands
+```bash
+# Type-check the entire project
+bun run typecheck        # alias: bun run check
+# Run the plugin (typically via OpenCode host)
+bun run index.ts
+# Install dependencies
+bun install
+```
+- **No build step** — Bun runs TypeScript directly.
+- **No test framework** is configured. If you add tests, prefer `bun test` (built-in runner) with `*.test.ts` files co-located in `src/`.
+- **No linter/formatter** is configured. Follow the existing style (see below).
+## Project Structure
+```
+index.ts                  # Plugin entrypoint — wires tools + events
+src/
+  config.ts               # Plugin config schema + loader
+  runtime.ts              # Runtime setup helpers
+  types.d.ts              # Ambient type declarations
+  embedder/               # Embedding providers (openai.ts, ollama.ts, interface.ts)
+  chunker/                # Code chunking (tree-sitter + line-based fallback)
+  search/                 # Hybrid search (vector + BM25 ranking, context building)
+  store/                  # SQLite wrapper, schema.sql, FTS integration
+  indexer/                # Incremental indexing, delta scanning, GC, resume
+  tools/                  # Plugin tools: semantic_search, smart_grep, reindex, index_status
+```
+## Code Style
+### Formatting
+- **Double quotes** for strings (`"foo"`, not `'foo'`)
+- **Semicolons** at end of statements
+- **No trailing commas** in function params (match existing code)
+- ~100 char soft line limit; wrap long signatures naturally
+### Imports
+- Use `import type { ... }` for type-only imports (enforced by `verbatimModuleSyntax` in tsconfig)
+- Use `import { ... } from "..."` for value imports
+- Bare specifiers for npm packages; relative paths for local modules
+- No barrel files — import directly from the module that exports
+### Types
+- `strict: true` is on — all code must type-check cleanly
+- Prefer explicit return types on exported functions
+- Use `interface` for object shapes, `type` for unions/intersections
+- Avoid `any` — use `unknown` + narrowing, or specific types
+- `noUncheckedIndexedAccess: true` — always guard indexed access
+### Naming
+- **Files:** `snake_case.ts` (e.g., `semantic_search.ts`, `smart_grep.ts`)
+- **Variables/functions:** `camelCase`
+- **Types/interfaces/classes:** `PascalCase`
+- **Constants:** `camelCase` (no SCREAMING_CASE unless truly global config)
+- **SQL files:** `snake_case.sql`
+### Functions & Error Handling
+- Prefer `async/await` over raw Promises
+- Use early returns to reduce nesting
+- Wrap fallible operations in try/catch; log errors with context
+- Fallback-first pattern is common — degrade gracefully (e.g., tree-sitter → line chunking, semantic → ripgrep)
+### General Patterns
+- Small, focused modules — one concern per file
+- Factory functions for provider selection (see `createEmbedder` in `src/embedder/interface.ts`)
+- Class-based wrappers for stateful resources (see `SqliteStore` in `src/store/sqlite.ts`)
+- Config-driven behavior via `src/config.ts` defaults + deep merge
+- Keep utility functions local to the file that uses them; don't create shared util files unless truly shared
+## TypeScript Config Highlights
+Key `tsconfig.json` settings to be aware of:
+| Setting | Value | Impact |
+|---|---|---|
+| `strict` | `true` | Full strict mode |
+| `verbatimModuleSyntax` | `true` | Must use `import type` for type imports |
+| `noUncheckedIndexedAccess` | `true` | Index signatures return `T | undefined` |
+| `noFallthroughCasesInSwitch` | `true` | No silent switch fallthrough |
+| `noImplicitOverride` | `true` | Must use `override` keyword |
+| `moduleResolution` | `bundler` | Modern ESM resolution |
+Run `bun run typecheck` before submitting changes to catch all of the above.
+## Adding a New Tool
+1. Create `src/tools/your_tool.ts` exporting a tool definition
+2. Register it in the tools map in `index.ts`
+3. Follow the pattern of existing tools (see `semantic_search.ts` for reference)
+## Dependencies
+- `@opencode-ai/plugin` — Plugin API (types + runtime)
+- `web-tree-sitter` + `tree-sitter-wasms` — Syntax-aware code chunking
+- `sqlite-vec` — Vector similarity search in SQLite
+- `picomatch` — Glob pattern matching
+## Common Pitfalls
+- **`verbatimModuleSyntax`** — importing a type without `import type` will fail at runtime. Always check: is this value used at runtime, or only in type positions?
+- **`noUncheckedIndexedAccess`** — `arr[0]` returns `T | undefined`. Always guard with `if` or use `?.` before accessing properties.
+- **Bun, not Node** — don't use `node:` prefixed imports unless the module requires it. Prefer standard ESM imports.
+- **No barrel re-exports** — each module exports its own API. Import from the specific file, not an `index.ts` barrel.
+- **Config merging** — config uses deep merge. Partial configs are fine; missing keys fall back to defaults in `src/config.ts`.
+## Working with the Plugin API
+- Tools are defined as objects with `name`, `description`, `parameters`, and `execute` — see `@opencode-ai/plugin` types.
+- Event handlers (`onFileChanged`, etc.) are registered in `index.ts`.
+- The plugin receives a context object with access to the host filesystem and shell — use these instead of raw `fs`/`child_process`.
+- **TUI feedback:** `ToolContext.metadata()` updates inline tool-call cards during `execute`. For transient OpenCode TUI pop-ups, use the SDK HTTP client (`runtime.opencodeClient.tui.showToast` → POST `/tui/show-toast`); the plugin uses `src/tui_toast.ts` for throttled **progress** toasts during indexing (phase changes + ~2.2s cadence) and for **completion** toasts (session sync summary, reindex done).
+## SQL Conventions
+- Schema lives in `src/store/schema.sql` — edit there, then the `SqliteStore` class applies it on init.
+- Use parameterized queries (`db.query(sql).all(params)`) — never interpolate user input into SQL.
+- FTS5 tables mirror the main tables; keep column names in sync.
+## Debugging Tips
+- Run `bun run typecheck` first — catches most issues before runtime.
+- The plugin logs to stderr via the host; add `console.error(...)` for quick debugging.
+- SQLite DB is stored in the OpenCode data directory — inspect with `sqlite3` CLI if needed.
+- For embedding issues, test the provider directly (e.g., curl the Ollama/OpenAI endpoint) before debugging the plugin layer.
+## Verification Checklist
+Before submitting changes:
+1. `bun run typecheck` — must pass with zero errors
+2. Review `import type` usage — runtime imports must not be type-only
+3. Guard all indexed access (`arr[i]`, `obj[key]`) for `undefined`
+4. Confirm new files follow `snake_case.ts` naming
+5. If adding a tool, verify it's registered in `index.ts`
+## Learned User Preferences
+- For codebase search, prefer the plugin’s semantic path for conceptual or multi-word queries; rely on the ripgrep fallback for exact symbols, tight literals, or when the index or embedder is unavailable.
+## Learned Workspace Facts
+- OpenCode does not load plugins from `package.json` alone; a TypeScript shim under the project `.opencode/plugins/` directory or under `~/.config/opencode/plugins/` must import this plugin’s entry so the host merges its tools (see `README.md` / `SETUP.md`). End users can run `bunx opencode-semantic-search@latest` (after npm publish) or `bash install.sh` from a clone to generate that shim.
+- Tools registered on the plugin object use the keys `semantic_search`, `grep` (smart grep implementation in `src/tools/smart_grep.ts`), `index_status`, `reindex`, and `diagnostic_bundle`. Slash aliases (`/sem-status`, `/sem-search`, etc.) are purely LLM-driven: users add Markdown stubs under `.opencode/commands/` whose bodies tell the assistant which tool to call (see SETUP.md). The `command.execute.before` hook is not used because it does not suppress the LLM invocation in the current plugin API version.
+- Resolve bundled assets (e.g. tree-sitter WASM and packages under this plugin’s `node_modules`) from the plugin install directory (e.g. paths anchored with `import.meta.dir`), not `process.cwd()`, because the OpenCode host’s working directory is the user’s open project.
+- Config loading merges in order: built-in defaults, then optional `~/.config/opencode/semantic-search.json`, then optional `<worktree>/.opencode/semantic-search.json` (project overrides global for overlapping keys).
+- Restart OpenCode after changing merged `semantic-search.json` so the plugin reloads options (e.g. `indexing.embed_concurrency`, embedding settings); config is not hot-reloaded from disk while the host is running.
+- The logger supports a `log_file` config option (`logging.log_file` in `semantic-search.json`); it defaults to `~/.cache/opencode/semantic-search/plugin.log` and appends newline-delimited JSON entries. Set it to `null` or `""` in a config override to disable file logging.
+- The plugin uses `experimental.chat.system.transform` (fires on every LLM turn) to inject a live `## Semantic Code Search` block into the system prompt with index stats and usage guidance, in addition to `experimental.session.compacting` (fires only during context compaction). Both hooks live in `index.ts`.
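
The AGENTS.md hunk above says config is deep-merged in a fixed order: built-in defaults, then the global `semantic-search.json`, then the project one, with later layers winning for overlapping keys. The sketch below illustrates that layering. It is not the code in `src/config.ts`, which this diff view does not show. The `deepMerge` and `mergeLayers` helpers and the sample values (such as `embed_concurrency: 4`) are assumptions made for illustration; only the key names `indexing.embed_concurrency` and `logging.log_file` and the default log path come from the text.

```typescript
// Hypothetical sketch of layered deep merging; the real loader may differ.
type ConfigValue = string | number | boolean | null | ConfigObject | ConfigValue[];
interface ConfigObject {
  [key: string]: ConfigValue;
}

function isPlainObject(value: ConfigValue | undefined): value is ConfigObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Recursively merge `override` into `base` without mutating either input.
// Nested objects merge key by key; scalars, arrays, and null replace the base value.
function deepMerge(base: ConfigObject, override: ConfigObject): ConfigObject {
  const result: ConfigObject = { ...base };
  for (const [key, value] of Object.entries(override)) {
    const existing = result[key];
    result[key] = isPlainObject(existing) && isPlainObject(value)
      ? deepMerge(existing, value)
      : value;
  }
  return result;
}

// Apply layers in order; a missing (undefined) layer is skipped.
function mergeLayers(...layers: (ConfigObject | undefined)[]): ConfigObject {
  let merged: ConfigObject = {};
  for (const layer of layers) {
    if (layer !== undefined) merged = deepMerge(merged, layer);
  }
  return merged;
}

// Illustrative values only; the real defaults are not shown in this diff.
const defaults: ConfigObject = {
  indexing: { embed_concurrency: 4 },
  logging: { log_file: "~/.cache/opencode/semantic-search/plugin.log" }
};
const globalConfig: ConfigObject = { indexing: { embed_concurrency: 8 } };
const projectConfig: ConfigObject = { logging: { log_file: null } };

const merged = mergeLayers(defaults, globalConfig, projectConfig);
console.log(JSON.stringify(merged));
```

In this sketch a partial project config that only sets `logging.log_file` to `null` leaves the globally configured `indexing.embed_concurrency` untouched, which matches the "partial configs are fine" note in the Common Pitfalls section.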

package/README.md ADDED

@@ -0,0 +1,138 @@
+# opencode-semantic-search
+Local-first semantic search plugin for [OpenCode](https://opencode.ai), with smart `grep` routing and incremental indexing.
+## Quickstart
+### Prerequisites
+- [Bun](https://bun.sh) 1.1+
+- [OpenCode](https://opencode.ai)
+- [ripgrep](https://github.com/BurntSushi/ripgrep) (`rg`)
+- Either [Ollama](https://ollama.com) or an OpenAI-compatible embedding API
+### Install
+**Option A — `bunx` (recommended after the package is on npm)**
+Runs the same installer as `install.sh` from the published package (no clone):
+```bash
+# Global shim (~/.config/opencode/plugins) — default
+bunx opencode-semantic-search@latest
+# Project-local only (./.opencode/plugins)
+bunx opencode-semantic-search@latest install --local
+```
+Other flags match `install.sh` (see below).
+**Option B — Remote `install.sh` (curl)**
+```bash
+curl -fsSL https://raw.githubusercontent.com/jainprashul/opencode-semantic-search/main/install.sh | bash
+```
+Use `bash -s -- --local` after the pipe for a project-local shim instead of the default global install.
+**Option C — Git clone**
+```bash
+git clone https://github.com/jainprashul/opencode-semantic-search.git
+cd opencode-semantic-search
+# Global (default): ~/.config/opencode/plugins
+bash install.sh
+# Project-local only: ./.opencode/plugins
+bash install.sh --local
+```
+Common installer modes (clone or remote script):
+```bash
+# explicit global install (default)
+bash install.sh --global
+# project-local only
+bash install.sh --local
+# OpenAI embeddings instead of Ollama
+bash install.sh --openai-key-env OPENAI_API_KEY --skip-ollama
+# custom Ollama model
+bash install.sh --ollama-model nomic-embed-text
+# skip Ollama checks/model pull
+bash install.sh --skip-ollama
+```
+The script installs dependencies, writes a plugin shim, optionally pulls the Ollama model, writes config, and runs integration self-tests when the test scripts are present (git clone); npm installs skip that step.
+### Start
+```bash
+# Ollama users
+ollama serve
+# open your codebase
+cd /path/to/your/repo
+opencode
+```
+## Available tools
+- `semantic_search(query, top_k?, threshold?, path?)`
+- `grep(pattern|query, ...)` smart route: conceptual -> semantic, exact/regex -> `rg`
+- `index_status()` health and coverage stats
+- `reindex()` full index rebuild
+- `diagnostic_bundle()` JSON support bundle (provider, index, routing history)
+Optional **slash aliases** (`/sem-status`, `/sem-search`, etc.): add Markdown stubs under `.opencode/commands/` as described in [SETUP.md](SETUP.md#8-using-the-plugin-in-opencode). Each stub body is an LLM prompt that asks the assistant to call the matching tool.
+## Verify it works
+From this plugin repo:
+```bash
+bun run check
+bun run test:integration
+bun run diagnostic:bundle
+```
+Expected:
+- `bun run check` exits successfully (no TypeScript errors).
+- `bun run test:integration` prints JSON with `"ok":true` from both suites.
+- `bun run diagnostic:bundle` prints one JSON bundle with provider health, index stats, DB path, and recent routing outcomes.
+In OpenCode (after startup indexing):
+- `index_status()` should report `provider_healthy: true`, `files_indexed > 0`, `total_chunks > 0`.
+- `semantic_search("authentication flow")` should return JSON results with file paths and scores.
+- `grep("auth retry flow")` should return scored semantic matches (`score=...`) when provider/index are healthy.
+## Debugging pointers
+- No semantic results: ensure embedder is reachable (`ollama serve` or valid OpenAI key) and run `reindex()`.
+- `grep` behaving like plain text search: this is expected for exact/regex/single-token patterns.
+- Wrong results after changing embedding model/dimensions: run `reindex()`.
+- Persistent index issues: remove the project DB under `~/.cache/opencode/semantic-search/<project-hash>/embeddings.db`.
+## Publishing (maintainers)
+Bump `version` in `package.json`, then:
+```bash
+bun run typecheck
+npm publish --access public
+```
+`prepublishOnly` runs `typecheck` automatically. The npm package name is [`opencode-semantic-search`](https://www.npmjs.com/package/opencode-semantic-search).
+## Docs
+- `docs/ARCHITECTURE.md` architecture, data flow, index lifecycle, smart grep routing
+- `docs/CONFIG.md` full configuration reference + install script behavior
+- `docs/DEBUGGING.md` logging/diagnostics, troubleshooting, and verification playbook
+- `SETUP.md` extended setup walkthrough and notes
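
The README hunk above describes `grep` as a smart router: conceptual queries go to semantic search, while exact, regex, and single-token patterns go to `rg`, with ripgrep also serving as the fallback when the index or embedder is unavailable. A minimal sketch of such a routing decision follows. The actual heuristics live in `src/tools/smart_grep.ts`, which is not shown here, so the `chooseRoute` function, the `RouteInput` shape, and the specific rules are assumptions made for illustration.

```typescript
// Hypothetical routing sketch; not the plugin's actual implementation.
type Route = "semantic" | "ripgrep";

interface RouteInput {
  pattern: string;
  providerHealthy: boolean; // assumed flag: embedder reachable
  indexReady: boolean;      // assumed flag: index has been built
}

// Characters that usually signal a regular expression rather than a natural-language query.
const regexMetaChars = /[\\^$.*+?()[\]{}|]/;

function chooseRoute(input: RouteInput): Route {
  const pattern = input.pattern.trim();

  // Degrade gracefully: without a healthy provider and index, only ripgrep can answer.
  if (!input.providerHealthy || !input.indexReady) return "ripgrep";

  // Anything that looks like a regex is treated as an exact-match request.
  if (regexMetaChars.test(pattern)) return "ripgrep";

  // Single tokens (symbols, identifiers) are better served by literal search.
  const tokens = pattern.split(/\s+/).filter((token) => token.length > 0);
  if (tokens.length < 2) return "ripgrep";

  // Multi-word queries without regex syntax are treated as conceptual.
  return "semantic";
}

const healthy = { providerHealthy: true, indexReady: true };
console.log(chooseRoute({ pattern: "auth retry flow", ...healthy }));
console.log(chooseRoute({ pattern: "handleAuth", ...healthy }));
console.log(chooseRoute({ pattern: "foo.*bar", ...healthy }));
```

The rules here are deliberately conservative: a multi-word query that happens to contain a dot or parenthesis is sent to ripgrep even if it was meant conceptually. A real router could weigh such signals instead of treating any one of them as decisive.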