@pi-unipi/cocoindex 2.0.0

package/README.md ADDED
@@ -0,0 +1,93 @@
1
+ # @pi-unipi/cocoindex
2
+
3
+ CocoIndex integration for the Pi coding agent: AST-aware content indexing, semantic vector search, and incremental pipeline management.
4
+
5
+ ## Overview
6
+
7
+ Replaces the compactor's FTS5-based content indexing with [CocoIndex](https://cocoindex.io/), providing:
8
+
9
+ - **AST-aware code chunking** — language-aware splitting for code files
10
+ - **Semantic vector search** — find content by meaning, not just keywords
11
+ - **Incremental indexing** — only reprocesses changed files (delta-only)
12
+ - **LanceDB storage** — zero-config, local file-based vector database
13
+ - **Shared embeddings** — reuses memory package's OpenRouter API key and model
14
+
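To make "AST-aware code chunking" concrete, here is a minimal sketch that splits Python source into one chunk per top-level function or class using the standard-library `ast` module. It only illustrates the idea; it is not CocoIndex's chunker, and the function name `chunk_python_source` and the chunk dictionary layout are assumptions made for this example.

```python
import ast


def chunk_python_source(source: str) -> list[dict]:
    """Split Python source into one chunk per top-level function or class.

    Illustrative only: a real AST-aware chunker would also handle other
    languages, nested definitions, and module-level statements.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Include decorators, which sit above the `def`/`class` line.
            start = min([d.lineno for d in node.decorator_list] + [node.lineno])
            end = node.end_lineno  # 1-based, inclusive
            chunks.append(
                {
                    "name": node.name,
                    "start": start,
                    "end": end,
                    "text": "\n".join(lines[start - 1 : end]),
                }
            )
    return chunks
```

Splitting on syntax boundaries keeps each function or class intact, whereas heading- or paragraph-based splitting can cut a definition in half.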
15
+ ## Prerequisites
16
+
17
+ 1. **Python 3.10+**
18
+ 2. **CocoIndex CLI**: `pip install cocoindex 'cocoindex[lancedb]'` (requires cocoindex >= 1.0)
19
+ 3. **LanceDB SDK** (optional, for search): `npm install @lancedb/lancedb`
20
+ 4. **Embedding API key** — configured via `/unipi:memory-settings`
21
+
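The Python-side prerequisites above can be checked with a short script. This is a sketch under assumptions: the helper name `check_prerequisites` is hypothetical, and it assumes the `cocoindex[lancedb]` extra installs an importable `lancedb` module. It does not verify the cocoindex version, the npm package, or the embedding API key.

```python
import importlib.util
import sys


def check_prerequisites() -> dict[str, bool]:
    """Report which Python-side prerequisites appear to be satisfied."""
    return {
        # The README asks for Python 3.10 or newer.
        "python_3_10_plus": sys.version_info >= (3, 10),
        # find_spec returns None when a top-level module cannot be found.
        "cocoindex_installed": importlib.util.find_spec("cocoindex") is not None,
        # Assumption: the lancedb extra provides a module named `lancedb`.
        "lancedb_installed": importlib.util.find_spec("lancedb") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```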
22
+ ## Quick Start
23
+
24
+ ```
25
+ # 1. Initialize the pipeline (once per project)
26
+ /unipi:cocoindex-init
27
+
28
+ # 2. Index the project
29
+ /unipi:cocoindex-update
30
+
31
+ # 3. Search indexed content
32
+ cocoindex_search({ query: "how does authentication work?" })
33
+ ```
34
+
35
+ ## Architecture
36
+
37
+ ```
38
+ Project files ──→ localfs.walk_dir (recursive)
39
+                         │
40
+                         ▼
41
+           chunk_text (@coco.fn, memoized)
42
+                         │
43
+                         ▼
44
+           LanceDB target (via ContextKey)
45
+                         │
46
+                         ▼
47
+           Vector search → ranked results
48
+ ```
49
+
50
+ Uses the cocoindex v1.0+ App/fn/mount API with:
51
+ - `@coco.lifespan` for async environment setup (LanceDB connection)
52
+ - `@coco.fn` for memoized processing functions
53
+ - `coco.mount()` / `coco.mount_target()` for component management
54
+ - `localfs.walk_dir` for file enumeration
55
+ - `lancedb.TableTarget` for row-level target state management
56
+
57
+ ## Tools
58
+
59
+ | Tool | Description |
60
+ |------|-------------|
61
+ | `cocoindex_search` | Search indexed content: semantic vector search when a vector column exists, LanceDB FTS when an FTS index exists, and a lexical scan fallback for text-only indexes |
62
+ | `cocoindex_status` | Check indexing status, freshness, doc count |
63
+
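The fallback behavior described for `cocoindex_search` can be sketched as a small selection function. The priority order (vector, then FTS, then lexical) is assumed from the order in which the table lists the modes; the names `SearchMode` and `pick_search_mode` are hypothetical and do not come from the package.

```python
from enum import Enum


class SearchMode(Enum):
    VECTOR = "vector"    # semantic search over an embedding column
    FTS = "fts"          # LanceDB full-text index
    LEXICAL = "lexical"  # plain scan over text-only tables


def pick_search_mode(has_vector_column: bool, has_fts_index: bool) -> SearchMode:
    """Choose a search mode from what the table supports.

    Assumed priority: vector search first, then FTS, then a lexical scan.
    """
    if has_vector_column:
        return SearchMode.VECTOR
    if has_fts_index:
        return SearchMode.FTS
    return SearchMode.LEXICAL
```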
64
+ ## Commands
65
+
66
+ | Command | Description |
67
+ |---------|-------------|
68
+ | `/unipi:cocoindex-update` | Run incremental indexing |
69
+ | `/unipi:cocoindex-status` | Show pipeline status |
70
+ | `/unipi:cocoindex-init` | Scaffold default pipeline |
71
+ | `/unipi:cocoindex-settings` | View configuration |
72
+
73
+ ## Configuration
74
+
75
+ - **Pipeline**: `.unipi/cocoindex/main.py` — auto-generated, fully customizable
76
+ - **Data store**: `.unipi/cocoindex/.lancedb/`
77
+ - **Embeddings**: `~/.unipi/memory/config.json` (shared with memory package)
78
+ - **Search fallback**: Existing text-only LanceDB tables remain searchable through a lexical scan fallback when no vector column or FTS index exists
79
+
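As an illustration of the lexical scan fallback mentioned above, the sketch below scores in-memory rows by how often the query terms occur in their text. It is a simplified stand-in, not the package's implementation: the row shape (`path` and `text` keys) and the function name `lexical_scan` are assumptions, and a real fallback would read rows from the LanceDB table.

```python
def lexical_scan(rows: list[dict], query: str, limit: int = 5) -> list[dict]:
    """Rank rows by the total number of query-term occurrences in their text."""
    terms = query.lower().split()
    scored = []
    for row in rows:
        text = row["text"].lower()
        # Count case-insensitive substring occurrences of every term.
        score = sum(text.count(term) for term in terms)
        if score > 0:
            scored.append((score, row))
    # Highest score first; ties keep their original order (stable sort).
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [row for _, row in scored[:limit]]
```

A scan like this needs no vector column or FTS index, which is why it can serve existing text-only tables, at the cost of reading every row.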
80
+ ## What Changed from FTS5
81
+
82
+ This package replaces compactor's content indexing subsystem:
83
+
84
+ | Feature | Before (FTS5) | After (CocoIndex) |
85
+ |---------|---------------|-------------------|
86
+ | Chunking | Heading/paragraph | AST-aware recursive |
87
+ | Search | BM25 + trigram | Vector + full-text |
88
+ | Incremental | No (full re-index) | Yes (delta-only) |
89
+ | Storage | SQLite FTS5 | LanceDB |
90
+
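The "delta-only" row in the table can be illustrated with content fingerprints: compare each file's hash with the hash recorded on the previous run and reprocess only the files that differ. This is a conceptual sketch, not how CocoIndex tracks state internally; `fingerprint` and `plan_delta` are hypothetical names.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a stable content hash for change detection."""
    return hashlib.sha256(content).hexdigest()


def plan_delta(
    previous: dict[str, str], current_files: dict[str, bytes]
) -> tuple[list[str], list[str], dict[str, str]]:
    """Compare stored fingerprints with the current files.

    Returns (changed_or_new_paths, removed_paths, new_fingerprint_state).
    """
    current = {path: fingerprint(data) for path, data in current_files.items()}
    # New files have no previous fingerprint; edited files have a different one.
    changed = sorted(p for p, h in current.items() if previous.get(p) != h)
    # Files that vanished should have their rows deleted from the index.
    removed = sorted(p for p in previous if p not in current)
    return changed, removed, current
```

Only the paths in the first list need re-chunking and re-embedding; unchanged files are skipped, in contrast to the full re-index of the FTS5 path.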
91
+ ## Status
92
+
93
+ ⚠️ **Experimental**: this feature lives on the `experiment/cocoindex` branch and is not yet merged to main.