npm - purecontext-mcp - Versions diffs - 1.1.1 → 1.1.2 - Mend

purecontext-mcp 1.1.1 → 1.1.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (37)

package/package.json +1 -1
package/docs/dev/API_STABILITY.md +0 -319
package/docs/dev/DECISIONS.md +0 -22
package/docs/dev/DOCUMENTATION_PLAN.md +0 -113
package/docs/dev/PHASE10_TASKS.md +0 -476
package/docs/dev/PHASE11_TASKS.md +0 -385
package/docs/dev/PHASE12_TASKS.md +0 -335
package/docs/dev/PHASE13_TASKS.md +0 -381
package/docs/dev/PHASE14_TASKS.md +0 -371
package/docs/dev/PHASE15_TASKS.md +0 -256
package/docs/dev/PHASE16_TASKS.md +0 -314
package/docs/dev/PHASE17_TASKS.md +0 -321
package/docs/dev/PHASE18_TASKS.md +0 -345
package/docs/dev/PHASE19_TASKS.md +0 -261
package/docs/dev/PHASE1_TASKS.md +0 -443
package/docs/dev/PHASE20_TASKS.md +0 -280
package/docs/dev/PHASE21_TASKS.md +0 -355
package/docs/dev/PHASE22_TASKS.md +0 -371
package/docs/dev/PHASE23_TASKS.md +0 -274
package/docs/dev/PHASE24_TASKS.md +0 -326
package/docs/dev/PHASE25_TASKS.md +0 -452
package/docs/dev/PHASE26_TASKS.md +0 -253
package/docs/dev/PHASE27_TASKS.md +0 -410
package/docs/dev/PHASE2_TASKS.md +0 -328
package/docs/dev/PHASE3_TASKS.md +0 -571
package/docs/dev/PHASE4_TASKS.md +0 -531
package/docs/dev/PHASE5_TASKS.md +0 -835
package/docs/dev/PHASE6_TASKS.md +0 -347
package/docs/dev/PHASE7_TASKS.md +0 -257
package/docs/dev/PHASE8_TASKS.md +0 -299
package/docs/dev/PHASE9_TASKS.md +0 -320
package/docs/dev/PureContext_MCP_PRD_v1.0.docx +0 -0
package/docs/dev/SELF_HOSTING.md +0 -142
package/docs/dev/TEAM_SETUP.md +0 -316
package/docs/dev/TELEMETRY.md +0 -99
package/docs/dev/feature-analysis.md +0 -305
package/docs/dev/phase-1-notes.md +0 -3

package/docs/dev/PHASE11_TASKS.md DELETED

@@ -1,385 +0,0 @@
-# Phase 11 — Task Breakdown
-**Goal**: Implement approximate nearest-neighbor semantic search using HNSW (Hierarchical Navigable Small Worlds) for repositories with more than 50,000 symbols. This enables AI agents to find conceptually related code even when keyword search fails.
-**Scope rationale**: Large codebases can have tens or hundreds of thousands of symbols. Traditional FTS (full-text search) based on keyword matching becomes less effective at scale. Semantic search using embeddings and HNSW allows finding symbols by meaning, not just lexical match. The PRD Section 4.7 (Phase 41 item) identified this as a scalability enhancement.
-**Approach**: Tasks build from embedding infrastructure through HNSW indexing to hybrid search integration. The system degrades gracefully — repos under 50k symbols continue using FTS.
----
-## Task 90: Embedding Provider Abstraction
-Create an abstraction layer for generating embeddings that supports multiple providers.
-**Deliverables:**
-- `src/semantic/embedding-provider.ts`
-  - `interface EmbeddingProvider`
-    ```typescript
-    interface EmbeddingProvider {
-      name: string;
-      dimension: number;
-      maxTokens: number;
-      embed(texts: string[]): Promise<Float32Array[]>;
-    }
-    ```
-  - `AnthropicEmbedding` — implements `EmbeddingProvider`
-    - Uses Claude's embedding endpoint (when available)
-    - Dimension: 1024
-    - Batch size: 100 texts per call
-    - API key from `config.ai.apiKey` or `ANTHROPIC_API_KEY` env var
-  - `OpenAIEmbedding` — implements `EmbeddingProvider`
-    - Uses `text-embedding-3-small` model
-    - Dimension: 1536
-    - Batch size: 2048 texts per call (OpenAI limit)
-    - API key from `config.ai.openaiApiKey` or `OPENAI_API_KEY`
-  - `LocalEmbedding` — implements `EmbeddingProvider`
-    - Uses a local embedding model via Ollama or similar
-    - Endpoint from `config.ai.localEmbeddingEndpoint`
-    - Dimension: depends on model (typically 384–1024)
-    - No API key required
-  - `createEmbeddingProvider(config: Config): EmbeddingProvider`
-    - Factory function that returns the configured provider
-    - Throws `PureContextError` if no provider configured
-- `src/semantic/text-preparation.ts`
-  - `prepareSymbolText(symbol: SymbolRecord): string`
-    - Combines name, signature, summary, and docstring into a single text
-    - Format: `{name}: {signature}\n{summary}\n{docstring}`
-    - Truncate to provider's `maxTokens` limit
-  - `prepareBatch(symbols: SymbolRecord[]): string[]`
-    - Prepares multiple symbols for batch embedding
-- Update `src/config/config-schema.ts`:
-  - `semantic.provider`: `'anthropic' | 'openai' | 'local' | 'none'` (default: `'none'`)
-  - `semantic.localEmbeddingEndpoint`: `string | null`
-  - `semantic.dimension`: `number` (auto-detected from provider if not set)
-  - `semantic.enabled`: `boolean` (default: `false`)
-**Key technical notes:**
-- Embeddings are `Float32Array` for memory efficiency
-- Batch embedding reduces API calls and improves throughput
-- Local providers (Ollama) enable offline operation
-- Text preparation is consistent across providers for comparable embeddings
-**Verify:** Configure OpenAI provider. Embed a batch of 10 symbol texts. Verify returned arrays are correct dimension. Verify batch sizes respected.
-**Tests:** Text preparation: verify format and truncation. Provider factory: verify correct provider selected. Batch embedding: mock API, verify request format. Error handling: API failure returns error, not crash.
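The text-preparation step in Task 90 could be sketched roughly as follows. This is an illustration, not the package's code: the `SymbolRecord` shape is a hypothetical minimal version, the extra `maxTokens` parameter is an assumption (the plan's signature takes only the symbol), and the four-characters-per-token budget is a crude stand-in for a real tokenizer.

```typescript
// Hypothetical minimal shape; the real SymbolRecord is not shown in the diff.
interface SymbolRecord {
  name: string;
  signature: string;
  summary?: string;
  docstring?: string;
}

// Assumption: roughly 4 characters per token, used only as a truncation budget.
const CHARS_PER_TOKEN = 4;

// Builds "{name}: {signature}\n{summary}\n{docstring}" and truncates it.
function prepareSymbolText(symbol: SymbolRecord, maxTokens: number): string {
  const text =
    `${symbol.name}: ${symbol.signature}\n` +
    `${symbol.summary ?? ""}\n` +
    `${symbol.docstring ?? ""}`;
  return text.slice(0, maxTokens * CHARS_PER_TOKEN);
}

// Applies the same preparation to every symbol so embeddings stay comparable.
function prepareBatch(symbols: SymbolRecord[], maxTokens: number): string[] {
  return symbols.map((s) => prepareSymbolText(s, maxTokens));
}
```

Passing the token limit explicitly keeps the helper independent of any particular provider; a real implementation would more likely read it from the provider's `maxTokens` field.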
----
-## Task 91: HNSW Index Implementation
-Implement HNSW indexing for fast approximate nearest-neighbor search.
-**Deliverables:**
-- `src/semantic/hnsw-index.ts`
-  - Uses `hnswlib-node` package (Node.js bindings for hnswlib)
-  - `HNSWIndex` class:
-    ```typescript
-    class HNSWIndex {
-      constructor(dimension: number, maxElements: number);
-      add(ids: string[], vectors: Float32Array[]): void;
-      search(query: Float32Array, k: number): SearchResult[];
-      save(path: string): void;
-      load(path: string): void;
-      remove(ids: string[]): void;
-      size(): number;
-    }
-    interface SearchResult {
-      id: string;
-      distance: number;
-    }
-    ```
-  - HNSW parameters:
-    - `M = 16` (connections per node)
-    - `efConstruction = 200` (build-time quality)
-    - `efSearch = 100` (search-time quality)
-    - Distance metric: cosine similarity (L2-normalized vectors)
-  - Index persistence: `~/.purecontext/indexes/{repoId}/hnsw.idx`
-  - Incremental updates: add new vectors, mark removed IDs
-- `src/semantic/vector-store.ts`
-  - `VectorStore` class:
-    ```typescript
-    class VectorStore {
-      constructor(repoId: string, provider: EmbeddingProvider);
-      indexSymbols(symbols: SymbolRecord[]): Promise<void>;
-      searchSimilar(query: string, k: number): Promise<SymbolRecord[]>;
-      removeSymbols(ids: string[]): void;
-      rebuild(): Promise<void>;
-    }
-    ```
-  - Coordinates embedding generation and HNSW indexing
-  - Maintains mapping from symbol IDs to HNSW internal IDs
-  - Handles incremental indexing (new symbols) and removal
-- `src/core/db/embedding-store.ts`
-  - SQLite table `embeddings`:
-    ```sql
-    CREATE TABLE embeddings (
-      symbol_id TEXT PRIMARY KEY,
-      repo_id TEXT NOT NULL,
-      embedding BLOB NOT NULL,  -- Float32Array as binary
-      dimension INTEGER NOT NULL,
-      provider TEXT NOT NULL,
-      created_at TEXT NOT NULL,
-      FOREIGN KEY (symbol_id) REFERENCES symbols(id)
-    );
-    CREATE INDEX idx_embeddings_repo ON embeddings(repo_id);
-    ```
-  - `saveEmbeddings(repoId: string, embeddings: Map<string, Float32Array>): void`
-  - `loadEmbeddings(repoId: string): Map<string, Float32Array>`
-  - `deleteEmbeddings(symbolIds: string[]): void`
-  - Embeddings stored in SQLite for persistence across restarts
-**Key technical notes:**
-- HNSW provides O(log n) search vs O(n) for brute force — critical for large repos
-- Embeddings are stored both in SQLite (persistence) and HNSW index (fast search)
-- Incremental updates avoid full re-embedding on code changes
-- hnswlib-node is a native addon — verify Windows/Mac/Linux compatibility
-**Verify:** Create HNSW index with 10,000 random vectors. Search for 5 nearest neighbors. Verify results are plausible (same vectors return themselves). Save/load index and verify persistence.
-**Tests:** Add vectors: verify insertion. Search: verify k results returned. Remove: verify deleted vectors not returned. Persistence: save, reload, search again. Incremental: add more vectors after initial build.
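Task 91 specifies cosine similarity over L2-normalized vectors. The sketch below shows that relationship on its own, without calling `hnswlib-node`; the helper names `l2Normalize` and `cosineDistance` are hypothetical and do not appear in the plan.

```typescript
// Scales a vector to unit length. A zero vector is returned as zeros.
function l2Normalize(v: Float32Array): Float32Array {
  let sumSq = 0;
  for (let i = 0; i < v.length; i++) sumSq += v[i] * v[i];
  const norm = Math.sqrt(sumSq);
  if (norm === 0) return new Float32Array(v.length);
  return v.map((x) => x / norm);
}

// For unit-length inputs the dot product equals cosine similarity,
// so the distance is simply 1 - dot.
function cosineDistance(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) throw new RangeError("dimension mismatch");
  let dot = 0;
  for (let i = 0; i < a.length; i++) dot += a[i] * b[i];
  return 1 - dot;
}
```

Normalizing once at insertion time means a search only needs dot products, which is presumably why the plan pairs the two.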
----
-## Task 92: Semantic Indexing Pipeline
-Integrate semantic indexing into the main indexing pipeline for large repos.
-**Deliverables:**
-- Update `src/core/index-manager.ts`:
-  - After symbol extraction: check if `config.semantic.enabled` and symbol count > 50,000
-  - If yes, trigger semantic indexing via `VectorStore.indexSymbols()`
-  - Track semantic indexing status in repo metadata
-  - Progress reporting: emit events for embedding progress
-  - Batch embedding: process symbols in batches of 500
-- `src/semantic/semantic-indexer.ts`
-  - `SemanticIndexer` class:
-    ```typescript
-    class SemanticIndexer {
-      constructor(config: Config);
-      shouldIndex(repoId: string, symbolCount: number): boolean;
-      index(repoId: string, symbols: SymbolRecord[]): Promise<IndexResult>;
-      updateIncremental(repoId: string, added: SymbolRecord[], removed: string[]): Promise<void>;
-    }
-    interface IndexResult {
-      symbolsIndexed: number;
-      embeddingTimeMs: number;
-      indexBuildTimeMs: number;
-      indexSizeBytes: number;
-    }
-    ```
-  - Orchestrates the full semantic indexing flow
-  - Handles rate limiting for API-based embedding providers
-  - Reports progress via logger
-- Update incremental indexing:
-  - When symbols are added: embed and add to HNSW
-  - When symbols are removed: remove from HNSW and delete embeddings
-  - When symbols are modified: re-embed and update HNSW
-- Add config options:
-  - `semantic.threshold`: minimum symbol count to enable (default: 50000)
-  - `semantic.batchSize`: embedding batch size (default: 500)
-  - `semantic.concurrency`: parallel embedding batches (default: 2)
-**Key technical notes:**
-- Embedding API calls are the bottleneck — batch and parallelize
-- Rate limiting prevents API quota exhaustion
-- Progress events enable UI feedback for long indexing operations
-- Threshold of 50k is configurable — lower for testing, higher for cost control
-**Verify:** Index a large fixture project (50k+ symbols generated). Verify semantic indexing triggered. Verify embeddings stored in SQLite. Verify HNSW index saved to disk.
-**Tests:** Threshold check: 40k symbols (skip), 60k symbols (index). Batch processing: verify correct batch sizes. Incremental update: add 100 symbols, verify HNSW updated. Progress events: verify emission.
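The threshold check and batching in Task 92 might look something like this. `SemanticConfig` and `toBatches` are hypothetical names introduced for illustration; only the `enabled`, `threshold`, and `batchSize` options and the "greater than threshold" rule come from the plan.

```typescript
// Hypothetical subset of the semantic.* config options listed in the plan.
interface SemanticConfig {
  enabled: boolean;
  threshold: number; // plan default: 50000
  batchSize: number; // plan default: 500
}

// Semantic indexing runs only when enabled and the repo exceeds the threshold.
function shouldIndex(config: SemanticConfig, symbolCount: number): boolean {
  return config.enabled && symbolCount > config.threshold;
}

// Splits items into consecutive batches; the last batch may be smaller.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (batchSize <= 0) throw new RangeError("batchSize must be positive");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}
```

With the plan's defaults, a repo of 60,000 symbols would be split into 120 batches of 500, while a 40,000-symbol repo would skip semantic indexing entirely.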
----
-## Task 93: Hybrid Search Implementation
-Implement hybrid search that combines keyword FTS with semantic similarity.
-**Deliverables:**
-- `src/semantic/hybrid-search.ts`
-  - `HybridSearcher` class:
-    ```typescript
-    class HybridSearcher {
-      constructor(repoId: string, vectorStore: VectorStore, symbolStore: SymbolStore);
-      search(query: string, options: SearchOptions): Promise<SearchResult[]>;
-    }
-    interface SearchOptions {
-      maxResults: number;
-      semanticWeight: number;  // 0.0–1.0, default 0.5
-      keywordWeight: number;   // 0.0–1.0, default 0.5
-      threshold: number;       // minimum combined score
-    }
-    interface SearchResult {
-      symbol: SymbolRecord;
-      keywordScore: number;
-      semanticScore: number;
-      combinedScore: number;
-    }
-    ```
-  - Search algorithm:
-    1. Run FTS search → get top N keyword matches
-    2. Run semantic search → get top N vector matches
-    3. Merge results using Reciprocal Rank Fusion (RRF):
-       `score = keywordWeight * (1 / (k + keywordRank)) + semanticWeight * (1 / (k + semanticRank))`
-       where `k = 60` (standard RRF constant)
-    4. Sort by combined score, return top `maxResults`
-- Update `src/server/tools/search-symbols.ts`:
-  - Add `mode` parameter: `'keyword' | 'semantic' | 'hybrid'`
-  - Default: `'hybrid'` if semantic index exists, else `'keyword'`
-  - Add `semantic_weight` and `keyword_weight` parameters
-  - Return scores in result metadata
-- `src/semantic/query-expansion.ts`
-  - `expandQuery(query: string): string[]`
-    - Generate query variations for better semantic coverage
-    - Techniques: lemmatization, synonym expansion, abbreviation expansion
-    - Example: "auth" → ["auth", "authentication", "authorize", "authorization"]
-  - Query expansion is optional and lightweight (no API calls)
-**Key technical notes:**
-- RRF is a robust fusion method that handles different score distributions
-- Keyword search handles exact matches; semantic handles conceptual matches
-- Weights allow tuning for different use cases (code review vs exploration)
-- Query expansion improves recall without adding latency
-**Verify:** Index a large project with semantic. Search for "authentication" with hybrid mode. Verify results include both exact matches (keyword) and conceptually related symbols (semantic). Compare to keyword-only search.
-**Tests:** RRF calculation: verify score formula. Hybrid merge: overlapping results handled correctly. Weight adjustment: semantic_weight=1.0 returns semantic results only. Graceful fallback: no semantic index → keyword only.
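The weighted Reciprocal Rank Fusion step in Task 93 can be sketched as below. Several details are assumptions rather than anything the plan states: ranks are 1-based, a symbol missing from one list contributes zero for that list, each input list has no duplicate IDs, and the `Fused` type and `fuseRanks` name are hypothetical.

```typescript
// Hypothetical result shape; the plan's SearchResult carries a full symbol.
interface Fused {
  id: string;
  keywordScore: number;
  semanticScore: number;
  combinedScore: number;
}

const RRF_K = 60; // the RRF constant named in the plan

function fuseRanks(
  keywordIds: string[],
  semanticIds: string[],
  keywordWeight = 0.5,
  semanticWeight = 0.5,
  maxResults = 10
): Fused[] {
  const byId = new Map<string, Fused>();
  const entry = (id: string): Fused => {
    let e = byId.get(id);
    if (!e) {
      e = { id, keywordScore: 0, semanticScore: 0, combinedScore: 0 };
      byId.set(id, e);
    }
    return e;
  };
  // Rank is the 1-based position in each list (assumption).
  keywordIds.forEach((id, i) => { entry(id).keywordScore = 1 / (RRF_K + i + 1); });
  semanticIds.forEach((id, i) => { entry(id).semanticScore = 1 / (RRF_K + i + 1); });

  const fused = Array.from(byId.values());
  for (const e of fused) {
    e.combinedScore = keywordWeight * e.keywordScore + semanticWeight * e.semanticScore;
  }
  return fused.sort((a, b) => b.combinedScore - a.combinedScore).slice(0, maxResults);
}
```

Because the fusion uses only ranks, the raw FTS and vector scores never need to share a scale. A symbol that appears in both lists, even at a lower position, tends to outrank one that appears in only one.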
----
-## Task 94: Semantic Search Tool
-Add a dedicated semantic search tool for advanced use cases.
-**Deliverables:**
-- `src/server/tools/search-semantic.ts`
-  - Tool name: `search-semantic`
-  - Input schema:
-    ```json
-    {
-      "repo": { "type": "string" },
-      "query": { "type": "string" },
-      "mode": { "type": "string", "enum": ["semantic", "hybrid"], "default": "hybrid" },
-      "semantic_weight": { "type": "number", "default": 0.5 },
-      "keyword_weight": { "type": "number", "default": 0.5 },
-      "max_results": { "type": "number", "default": 10 },
-      "kind": { "type": "string", "description": "Filter by symbol kind" },
-      "file_pattern": { "type": "string" }
-    }
-    ```
-  - Output includes similarity scores:
-    ```json
-    {
-      "results": [
-        {
-          "id": "abc123",
-          "name": "validateCredentials",
-          "kind": "function",
-          "file": "src/auth/validator.ts",
-          "signature": "function validateCredentials(user: User): boolean",
-          "scores": {
-            "keyword": 0.0,
-            "semantic": 0.92,
-            "combined": 0.46
-          }
-        }
-      ],
-      "_meta": {
-        "mode": "hybrid",
-        "semantic_index_size": 75000,
-        "query_embedding_ms": 45,
-        "search_ms": 12
-      }
-    }
-    ```
-  - Register in `src/server/mcp-server.ts`
-- Update `search-symbols` tool:
-  - Automatically use hybrid search when semantic index exists
-  - Add `"mode"` to output `_meta` to indicate which search mode used
-  - No breaking changes to existing behavior
-**Key technical notes:**
-- Dedicated tool allows explicit semantic-only search
-- Updated `search-symbols` gets hybrid upgrade transparently
-- Timing metadata helps diagnose slow searches
-**Verify:** Run `search-semantic` on a semantically-indexed repo. Verify results include semantic scores. Verify `search-symbols` automatically uses hybrid when available.
-**Tests:** Semantic-only search: keyword_weight=0. Hybrid search: both weights > 0. Fallback: repo without semantic index returns error. Filtering: kind and file_pattern filters work.
----
-## Task 95: Phase 11 Test Fixtures and Integration Tests
-Validate the complete semantic search pipeline.
-**Deliverables:**
-- `test/fixtures/large-project/` — generated project with 60,000+ symbols
-  - Script to generate: create synthetic TypeScript files with functions/classes
-  - Realistic naming: `user_`, `auth_`, `payment_`, `order_`, `product_` prefixes
-  - Each symbol has signature and docstring
-  - Used only for semantic tests (not run in normal test suite due to size)
-- Integration tests `test/integration/phase11.test.ts`:
-  1. Embedding provider: generate embeddings for 10 symbols, verify dimensions
-  2. HNSW index: add 1000 vectors, search, verify nearest neighbors
-  3. Semantic indexer: threshold check, batch processing
-  4. Hybrid search: combine keyword + semantic results
-  5. RRF scoring: verify combined scores
-  6. Incremental update: add symbols, verify HNSW updated
-  7. Remove symbols: verify removed from index
-  8. Persistence: save HNSW index, reload, search again
-  9. Graceful fallback: semantic disabled → keyword only
-  10. Full suite regression: all Phase 1–10 tests still green
-- Performance benchmarks:
-  - Embedding latency: < 50ms per symbol (batched)
-  - HNSW search: < 10ms for k=10 in 100k vector index
-  - Hybrid search: < 100ms total
-**Verify:** `npm run test` passes. Semantic tests run in separate suite due to API/size requirements.
----
-## Order of Execution
-```
-Task 90: Embedding provider abstraction    ██░░░░░░░░ Foundation
-Task 91: HNSW index implementation         ████░░░░░░ Index
-Task 92: Semantic indexing pipeline        ██████░░░░ Integration
-Task 93: Hybrid search implementation      ████████░░ Search
-Task 94: Semantic search tool              █████████░ Tool          DONE
-Task 95: Fixtures + integration tests      ██████████ Polish        DONE
-```
-Tasks proceed in order: embedding provider (90) is needed for HNSW (91), which is needed for the indexing pipeline (92), which enables hybrid search (93) and the search tool (94).
----
-## Post-Phase 11: What Comes Next
-Phase 11 adds powerful semantic search capabilities for large codebases. Remaining phases:
-- **Phase 12**: Rate limiting and multi-tenant auth for hosted deployments
-- **Phase 13**: Web UI for exploring the symbol graph visually