cocoindex-code 0.1.14__tar.gz → 0.2.1__tar.gz
- {cocoindex_code-0.1.14 → cocoindex_code-0.2.1}/.gitignore +5 -2
- cocoindex_code-0.2.1/PKG-INFO +503 -0
- cocoindex_code-0.2.1/README.md +464 -0
- {cocoindex_code-0.1.14 → cocoindex_code-0.2.1}/pyproject.toml +11 -1
- cocoindex_code-0.2.1/src/cocoindex_code/__init__.py +10 -0
- cocoindex_code-0.2.1/src/cocoindex_code/_version.py +34 -0
- cocoindex_code-0.2.1/src/cocoindex_code/cli.py +549 -0
- cocoindex_code-0.2.1/src/cocoindex_code/client.py +443 -0
- cocoindex_code-0.2.1/src/cocoindex_code/daemon.py +633 -0
- cocoindex_code-0.2.1/src/cocoindex_code/indexer.py +220 -0
- cocoindex_code-0.2.1/src/cocoindex_code/project.py +124 -0
- cocoindex_code-0.2.1/src/cocoindex_code/protocol.py +184 -0
- {cocoindex_code-0.1.14 → cocoindex_code-0.2.1}/src/cocoindex_code/query.py +12 -14
- cocoindex_code-0.2.1/src/cocoindex_code/server.py +342 -0
- cocoindex_code-0.2.1/src/cocoindex_code/settings.py +332 -0
- cocoindex_code-0.2.1/src/cocoindex_code/shared.py +88 -0
- cocoindex_code-0.1.14/PKG-INFO +0 -428
- cocoindex_code-0.1.14/README.md +0 -393
- cocoindex_code-0.1.14/src/cocoindex_code/__init__.py +0 -11
- cocoindex_code-0.1.14/src/cocoindex_code/indexer.py +0 -168
- cocoindex_code-0.1.14/src/cocoindex_code/server.py +0 -249
- cocoindex_code-0.1.14/src/cocoindex_code/shared.py +0 -88
- {cocoindex_code-0.1.14 → cocoindex_code-0.2.1}/LICENSE +0 -0
- {cocoindex_code-0.1.14 → cocoindex_code-0.2.1}/src/cocoindex_code/__main__.py +0 -0
- {cocoindex_code-0.1.14 → cocoindex_code-0.2.1}/src/cocoindex_code/config.py +0 -0
- {cocoindex_code-0.1.14 → cocoindex_code-0.2.1}/src/cocoindex_code/schema.py +0 -0
@@ -0,0 +1,503 @@
+Metadata-Version: 2.4
+Name: cocoindex-code
+Version: 0.2.1
+Summary: MCP server for indexing and querying codebases using CocoIndex
+Project-URL: Homepage, https://github.com/cocoindex-io/cocoindex-code
+Project-URL: Repository, https://github.com/cocoindex-io/cocoindex-code
+Project-URL: Issues, https://github.com/cocoindex-io/cocoindex-code/issues
+License-Expression: Apache-2.0
+License-File: LICENSE
+Keywords: cocoindex,codebase,indexing,mcp,vector-search
+Classifier: Development Status :: 3 - Alpha
+Classifier: Intended Audience :: Developers
+Classifier: License :: OSI Approved :: Apache Software License
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Topic :: Software Development :: Libraries :: Python Modules
+Requires-Python: >=3.11
+Requires-Dist: cocoindex[litellm]==1.0.0a32
+Requires-Dist: einops>=0.8.2
+Requires-Dist: mcp>=1.0.0
+Requires-Dist: msgspec>=0.19.0
+Requires-Dist: numpy>=1.24.0
+Requires-Dist: pathspec>=0.12.1
+Requires-Dist: pydantic>=2.0.0
+Requires-Dist: pyyaml>=6.0
+Requires-Dist: sentence-transformers>=2.2.0
+Requires-Dist: sqlite-vec>=0.1.0
+Requires-Dist: typer>=0.9.0
+Provides-Extra: dev
+Requires-Dist: mypy>=1.0.0; extra == 'dev'
+Requires-Dist: prek>=0.1.0; extra == 'dev'
+Requires-Dist: pytest-asyncio>=0.21.0; extra == 'dev'
+Requires-Dist: pytest-cov>=4.0.0; extra == 'dev'
+Requires-Dist: pytest>=7.0.0; extra == 'dev'
+Requires-Dist: ruff>=0.1.0; extra == 'dev'
+Description-Content-Type: text/markdown
+
+<p align="center">
+<img width="2428" alt="cocoindex code" src="https://github.com/user-attachments/assets/d05961b4-0b7b-42ea-834a-59c3c01717ca" />
+</p>
+
+
+<h1 align="center">AST-based semantic code search that just works</h1>
+
+
+
+
+A lightweight, effective **(AST-based)** semantic code search tool for your codebase. Built on [CocoIndex](https://github.com/cocoindex-io/cocoindex) — a Rust-based ultra-performant data transformation engine. Use it from the CLI, or integrate with Claude, Codex, Cursor — any coding agent — via [Skill](#skill-recommended) or [MCP](#mcp-server).
+
+- Instant token savings of 70%.
+- **1 min setup** — install and go, zero config needed!
+
+<div align="center">
+
+[](https://discord.com/invite/zpA9S2DR7s)
+[](https://github.com/cocoindex-io/cocoindex)
+[](https://cocoindex.io/docs/getting_started/quickstart)
+[](https://opensource.org/licenses/Apache-2.0)
+<!--[](https://pypistats.org/packages/cocoindex) -->
+[](https://pepy.tech/projects/cocoindex)
+[](https://github.com/cocoindex-io/cocoindex/actions/workflows/CI.yml)
+[](https://github.com/cocoindex-io/cocoindex/actions/workflows/release.yml)
+
+
+🌟 Please help star [CocoIndex](https://github.com/cocoindex-io/cocoindex) if you like this project!
+
+[Deutsch](https://readme-i18n.com/cocoindex-io/cocoindex-code?lang=de) |
+[English](https://readme-i18n.com/cocoindex-io/cocoindex-code?lang=en) |
+[Español](https://readme-i18n.com/cocoindex-io/cocoindex-code?lang=es) |
+[français](https://readme-i18n.com/cocoindex-io/cocoindex-code?lang=fr) |
+[日本語](https://readme-i18n.com/cocoindex-io/cocoindex-code?lang=ja) |
+[한국어](https://readme-i18n.com/cocoindex-io/cocoindex-code?lang=ko) |
+[Português](https://readme-i18n.com/cocoindex-io/cocoindex-code?lang=pt) |
+[Русский](https://readme-i18n.com/cocoindex-io/cocoindex-code?lang=ru) |
+[中文](https://readme-i18n.com/cocoindex-io/cocoindex-code?lang=zh)
+
+</div>
+
+
+## Get Started — zero config, let's go!
+
+### Install
+
+Using [pipx](https://pipx.pypa.io/stable/installation/):
+```bash
+pipx install cocoindex-code  # first install
+pipx upgrade cocoindex-code  # upgrade
+```
+
+Using [uv](https://docs.astral.sh/uv/getting-started/installation/):
+```bash
+uv tool install --upgrade cocoindex-code --prerelease explicit --with "cocoindex>=1.0.0a24"
+```
+
+The default embedding model runs locally ([sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)) — no API key required, completely free.
+
+Next, set up your [coding agent integration](#coding-agent-integration) — or jump to [Manual CLI Usage](#manual-cli-usage) if you prefer direct control.
+
+## Coding Agent Integration
+
+### Skill (Recommended)
+
+Install the `ccc` skill so your coding agent automatically uses semantic search when needed:
+
+```bash
+npx skills add cocoindex-io/cocoindex-code
+```
+
+That's it — no `ccc init` or `ccc index` needed. The skill teaches the agent to handle initialization, indexing, and searching on its own. It will automatically keep the index up to date as you work.
+
+The agent uses semantic search automatically when it would be helpful. You can also nudge it explicitly — just ask it to search the codebase, e.g. *"find how user sessions are managed"*, or type `/ccc` to invoke the skill directly.
+
+Works with [Claude Code](https://docs.anthropic.com/en/docs/claude-code) and other skill-compatible agents.
+
+### MCP Server
+
+Alternatively, use `ccc mcp` to run as an MCP server:
+
+<details>
+<summary>Claude Code</summary>
+
+```bash
+claude mcp add cocoindex-code -- ccc mcp
+```
+</details>
+
+<details>
+<summary>Codex</summary>
+
+```bash
+codex mcp add cocoindex-code -- ccc mcp
+```
+</details>
+
+<details>
+<summary>OpenCode</summary>
+
+```bash
+opencode mcp add
+```
+Enter MCP server name: `cocoindex-code`
+Select MCP server type: `local`
+Enter command to run: `ccc mcp`
+
+Or use opencode.json:
+```json
+{
+  "$schema": "https://opencode.ai/config.json",
+  "mcp": {
+    "cocoindex-code": {
+      "type": "local",
+      "command": [
+        "ccc", "mcp"
+      ]
+    }
+  }
+}
+```
+</details>
+
+Once configured, the agent automatically decides when semantic code search is helpful — finding code by description, exploring unfamiliar codebases, fuzzy/conceptual matches, or locating implementations without knowing exact names.
+
+> **Note:** The `cocoindex-code` command (without subcommand) still works as an MCP server for backward compatibility. It auto-creates settings from environment variables on first run.
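The `opencode.json` entry from the OpenCode instructions in this section can also be produced from a script. A minimal sketch that only rebuilds the documented structure; the function name `opencode_mcp_config` is hypothetical, and merging the result into an existing configuration file is left to the reader:

```python
import json


def opencode_mcp_config(server_name: str = "cocoindex-code") -> dict:
    """Build the documented opencode.json content for a local `ccc mcp` server."""
    return {
        "$schema": "https://opencode.ai/config.json",
        "mcp": {
            server_name: {
                "type": "local",
                # Same command the interactive `opencode mcp add` prompt asks for.
                "command": ["ccc", "mcp"],
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(opencode_mcp_config(), indent=2))
```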
+
+<details>
+<summary>MCP Tool Reference</summary>
+
+When running as an MCP server (`ccc mcp`), the following tool is exposed:
+
+**`search`** — Search the codebase using semantic similarity.
+
+```
+search(
+    query: str,  # Natural language query or code snippet
+    limit: int = 5,  # Maximum results (1-100)
+    offset: int = 0,  # Pagination offset
+    refresh_index: bool = True,  # Refresh index before querying
+    languages: list[str] | None = None,  # Filter by language (e.g. ["python", "typescript"])
+    paths: list[str] | None = None,  # Filter by path glob (e.g. ["src/utils/*"])
+)
+```
+
+Returns matching code chunks with file path, language, code content, line numbers, and similarity score.
+</details>
+
+## Manual CLI Usage
+
+You can also use the CLI directly — useful for manual control, running indexing after changing settings, checking status, or searching outside an agent.
+
+```bash
+ccc init  # initialize project (creates settings)
+ccc index  # build the index
+ccc search "authentication logic"  # search!
+```
+
+The background daemon starts automatically on first use.
+
+> **Tip:** `ccc index` auto-initializes if you haven't run `ccc init` yet, so you can skip straight to indexing.
+
+### CLI Reference
+
+| Command | Description |
+|---------|-------------|
+| `ccc init` | Initialize a project — creates settings files, adds `.cocoindex_code/` to `.gitignore` |
+| `ccc index` | Build or update the index (auto-inits if needed). Shows streaming progress. |
+| `ccc search <query>` | Semantic search across the codebase |
+| `ccc status` | Show index stats (chunk count, file count, language breakdown) |
+| `ccc mcp` | Run as MCP server in stdio mode |
+| `ccc reset` | Delete index databases. `--all` also removes settings. `-f` skips confirmation. |
+| `ccc daemon status` | Show daemon version, uptime, and loaded projects |
+| `ccc daemon restart` | Restart the background daemon |
+| `ccc daemon stop` | Stop the daemon |
+
+### Search Options
+
+```bash
+ccc search database schema  # basic search
+ccc search --lang python --lang markdown schema  # filter by language
+ccc search --path 'src/utils/*' query handler  # filter by path
+ccc search --offset 10 --limit 5 database schema  # pagination
+ccc search --refresh database schema  # update index first, then search
+```
+
+By default, `ccc search` scopes results to your current working directory (relative to the project root). Use `--path` to override.
+
+## Features
+- **Semantic Code Search**: Find relevant code using natural language queries when grep doesn't work well, and save tokens immediately.
+- **Ultra Performant**: ⚡ Built on top of an ultra-performant [Rust indexing engine](https://github.com/cocoindex-io/cocoindex). Only re-indexes changed files for fast updates.
+- **Multi-Language Support**: Python, JavaScript/TypeScript, Rust, Go, Java, C/C++, C#, SQL, Shell, and more.
+- **Embedded**: Portable and just works, no database setup required!
+- **Flexible Embeddings**: Local SentenceTransformers by default (free!) or 100+ cloud providers via LiteLLM.
+
+## Configuration
+
+Configuration lives in two YAML files, both created automatically by `ccc init`.
+
+### User Settings (`~/.cocoindex_code/global_settings.yml`)
+
+Shared across all projects. Controls the embedding model and environment variables for the daemon.
+
+```yaml
+embedding:
+  provider: sentence-transformers  # or "litellm"
+  model: sentence-transformers/all-MiniLM-L6-v2
+  device: mps  # optional: cpu, cuda, mps (auto-detected if omitted)
+
+envs:  # extra environment variables for the daemon
+  OPENAI_API_KEY: your-key  # only needed if not already in your shell environment
+```
+
+> **Note:** The daemon inherits your shell environment. If an API key (e.g. `OPENAI_API_KEY`) is already set as an environment variable, you don't need to duplicate it in `envs`. The `envs` field is only for values that aren't in your environment.
+
+### Project Settings (`<project>/.cocoindex_code/settings.yml`)
+
+Per-project. Controls which files to index.
+
+```yaml
+include_patterns:
+  - "**/*.py"
+  - "**/*.js"
+  - "**/*.ts"
+  - "**/*.rs"
+  - "**/*.go"
+  # ... (sensible defaults for 28+ file types)
+
+exclude_patterns:
+  - "**/.*"  # hidden directories
+  - "**/__pycache__"
+  - "**/node_modules"
+  - "**/dist"
+  # ...
+
+language_overrides:
+  - ext: inc  # treat .inc files as PHP
+    lang: php
+```
+
+> `.cocoindex_code/` is automatically added to `.gitignore` during init.
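To get a feel for how `include_patterns` and `exclude_patterns` interact, the sketch below approximates the selection with the standard library only. It is an illustration under stated assumptions, not the tool's matcher: the real glob semantics may differ (for instance, `fnmatch` lets `*` cross `/`), the names `_matches` and `is_indexed` are hypothetical, and the rule that excludes win over includes is an assumption:

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def _matches(path: str, pattern: str) -> bool:
    """Approximate glob match; also try the pattern without a leading '**/'."""
    if fnmatchcase(path, pattern):
        return True
    # Let "**/x" also match a root-level "x".
    return pattern.startswith("**/") and fnmatchcase(path, pattern[3:])


def is_indexed(path: str, include_patterns: list[str], exclude_patterns: list[str]) -> bool:
    """Return True if a project-relative POSIX path would be selected."""
    p = PurePosixPath(path)
    # Check the file itself and every ancestor directory against the excludes,
    # so "web/node_modules/pkg/index.js" is rejected via "web/node_modules".
    candidates = [str(p)] + [str(a) for a in p.parents if str(a) != "."]
    if any(_matches(c, pat) for c in candidates for pat in exclude_patterns):
        return False  # assumption: excludes take precedence over includes
    return any(_matches(str(p), pat) for pat in include_patterns)


if __name__ == "__main__":
    include = ["**/*.py", "**/*.js"]
    exclude = ["**/.*", "**/__pycache__", "**/node_modules", "**/dist"]
    for candidate in ["src/app.py", "web/node_modules/pkg/index.js", ".git/config"]:
        print(candidate, is_indexed(candidate, include, exclude))
```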
+
+## Embedding Models
+
+By default, a local SentenceTransformers model ([sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)) is used — no API key required. To use a different model, edit `~/.cocoindex_code/global_settings.yml`.
+
+> The `envs` entries below are only needed if the key isn't already in your shell environment — the daemon inherits your environment automatically.
+
+<details>
+<summary>Ollama (Local)</summary>
+
+```yaml
+embedding:
+  model: ollama/nomic-embed-text
+```
+
+Set `OLLAMA_API_BASE` in `envs:` if your Ollama server is not at `http://localhost:11434`.
+
+</details>
+
+<details>
+<summary>OpenAI</summary>
+
+```yaml
+embedding:
+  model: text-embedding-3-small
+envs:
+  OPENAI_API_KEY: your-api-key
+```
+
+</details>
+
+<details>
+<summary>Azure OpenAI</summary>
+
+```yaml
+embedding:
+  model: azure/your-deployment-name
+envs:
+  AZURE_API_KEY: your-api-key
+  AZURE_API_BASE: https://your-resource.openai.azure.com
+  AZURE_API_VERSION: "2024-06-01"
+```
+
+</details>
+
+<details>
+<summary>Gemini</summary>
+
+```yaml
+embedding:
+  model: gemini/gemini-embedding-001
+envs:
+  GEMINI_API_KEY: your-api-key
+```
+
+</details>
+
+<details>
+<summary>Mistral</summary>
+
+```yaml
+embedding:
+  model: mistral/mistral-embed
+envs:
+  MISTRAL_API_KEY: your-api-key
+```
+
+</details>
+
+<details>
+<summary>Voyage (Code-Optimized)</summary>
+
+```yaml
+embedding:
+  model: voyage/voyage-code-3
+envs:
+  VOYAGE_API_KEY: your-api-key
+```
+
+</details>
+
+<details>
+<summary>Cohere</summary>
+
+```yaml
+embedding:
+  model: cohere/embed-v4.0
+envs:
+  COHERE_API_KEY: your-api-key
+```
+
+</details>
+
+<details>
+<summary>AWS Bedrock</summary>
+
+```yaml
+embedding:
+  model: bedrock/amazon.titan-embed-text-v2:0
+envs:
+  AWS_ACCESS_KEY_ID: your-access-key
+  AWS_SECRET_ACCESS_KEY: your-secret-key
+  AWS_REGION_NAME: us-east-1
+```
+
+</details>
+
+<details>
+<summary>Nebius</summary>
+
+```yaml
+embedding:
+  model: nebius/BAAI/bge-en-icl
+envs:
+  NEBIUS_API_KEY: your-api-key
+```
+
+</details>
+
+Any [LiteLLM-supported model](https://docs.litellm.ai/docs/embedding/supported_embedding) works. When using a LiteLLM model, set `provider: litellm` (or omit `provider` — LiteLLM is the default for non-`sentence-transformers` models).
+
+### Local SentenceTransformers Models
+
+Set `provider: sentence-transformers` and use any [SentenceTransformers](https://www.sbert.net/) model (no API key required).
+
+**Example — general purpose text model:**
+```yaml
+embedding:
+  provider: sentence-transformers
+  model: nomic-ai/nomic-embed-text-v1.5
+```
+
+**GPU-optimised code retrieval:**
+
+[`nomic-ai/CodeRankEmbed`](https://huggingface.co/nomic-ai/CodeRankEmbed) delivers significantly better code retrieval than the default model. It is 137M parameters, requires ~1 GB VRAM, and has an 8192-token context window.
+
+```yaml
+embedding:
+  provider: sentence-transformers
+  model: nomic-ai/CodeRankEmbed
+```
+
+> **Note:** Switching models requires re-indexing your codebase (`ccc reset && ccc index`), because vectors produced by different models are not comparable and their dimensions usually differ.
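The re-indexing requirement follows from how similarity search works: a query vector can only be compared with stored vectors of the same length. The sketch below is a plain-Python illustration, not the tool's search code. The 384-dimensional vector matches the default `all-MiniLM-L6-v2` model, and the 768-dimensional one stands in for a hypothetical replacement model:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; raises if their dimensions differ."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


if __name__ == "__main__":
    stored = [0.1] * 384  # vector indexed with the old model
    query = [0.1] * 768   # query embedded with a hypothetical new model
    try:
        cosine_similarity(stored, query)
    except ValueError as err:
        print(f"stale index: {err}")
```

Even when two models happen to share a dimension, their vector spaces are unrelated, so the index still has to be rebuilt.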
+
+## Supported Languages
+
+| Language | Aliases | File Extensions |
+|----------|---------|-----------------|
+| c | | `.c` |
+| cpp | c++ | `.cpp`, `.cc`, `.cxx`, `.h`, `.hpp` |
+| csharp | csharp, cs | `.cs` |
+| css | | `.css`, `.scss` |
+| dtd | | `.dtd` |
+| fortran | f, f90, f95, f03 | `.f`, `.f90`, `.f95`, `.f03` |
+| go | golang | `.go` |
+| html | | `.html`, `.htm` |
+| java | | `.java` |
+| javascript | js | `.js` |
+| json | | `.json` |
+| kotlin | | `.kt`, `.kts` |
+| lua | | `.lua` |
+| markdown | md | `.md`, `.mdx` |
+| pascal | pas, dpr, delphi | `.pas`, `.dpr` |
+| php | | `.php` |
+| python | | `.py` |
+| r | | `.r` |
+| ruby | | `.rb` |
+| rust | rs | `.rs` |
+| scala | | `.scala` |
+| solidity | | `.sol` |
+| sql | | `.sql` |
+| swift | | `.swift` |
+| toml | | `.toml` |
+| tsx | | `.tsx` |
+| typescript | ts | `.ts` |
+| xml | | `.xml` |
+| yaml | | `.yaml`, `.yml` |
+
+## Troubleshooting
+
+### `sqlite3.Connection object has no attribute enable_load_extension`
+
+Some Python installations (e.g. the one pre-installed on macOS) ship with a SQLite library that doesn't enable extensions.
+
+**macOS fix:** Install Python through [Homebrew](https://brew.sh/):
+
+```bash
+brew install python3
+```
+
+Then re-install cocoindex-code (see [Get Started](#get-started--zero-config-lets-go) for install options):
+
+Using pipx:
+```bash
+pipx install cocoindex-code  # first install
+pipx upgrade cocoindex-code  # upgrade
+```
+
+Using uv (install or upgrade):
+```bash
+uv tool install --upgrade cocoindex-code --prerelease explicit --with "cocoindex>=1.0.0a24"
+```
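To find out whether a given interpreter is affected by the `enable_load_extension` error described in this section, its `sqlite3` module can be probed directly. A minimal sketch; the function name `sqlite_supports_extensions` is hypothetical, and the check relies on the method being absent from connections when Python was built without loadable-extension support:

```python
import sqlite3


def sqlite_supports_extensions() -> bool:
    """Report whether this Python's sqlite3 module can load extensions."""
    conn = sqlite3.connect(":memory:")
    try:
        # Builds without extension support do not define this method at all.
        return hasattr(conn, "enable_load_extension")
    finally:
        conn.close()


if __name__ == "__main__":
    if sqlite_supports_extensions():
        print("sqlite3 extension loading is available")
    else:
        print("sqlite3 was built without extension support; try another Python build")
```

Run it with the same interpreter that `pipx` or `uv` uses for the tool, since different Python installations on one machine can differ here.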
+
+## Legacy: Environment Variables
+
+If you previously configured `cocoindex-code` via environment variables, the `cocoindex-code` MCP command still reads them and auto-migrates to YAML settings on first run. We recommend switching to the YAML settings for new setups.
+
+| Environment Variable | YAML Equivalent |
+|---------------------|-----------------|
+| `COCOINDEX_CODE_EMBEDDING_MODEL` | `embedding.model` in `global_settings.yml` |
+| `COCOINDEX_CODE_DEVICE` | `embedding.device` in `global_settings.yml` |
+| `COCOINDEX_CODE_ROOT_PATH` | Run `ccc init` in your project root instead |
+| `COCOINDEX_CODE_EXCLUDED_PATTERNS` | `exclude_patterns` in project `settings.yml` |
+| `COCOINDEX_CODE_EXTRA_EXTENSIONS` | `include_patterns` + `language_overrides` in project `settings.yml` |
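The first two rows of the table map one-to-one onto keys in `global_settings.yml`, so a manual migration can be sketched as a small lookup. This illustrates the mapping only and is not the package's migration code: the function name `legacy_env_to_global_settings` is hypothetical, and the project-level variables are left out because the format of their values is not described here:

```python
import os
from collections.abc import Mapping

# Environment variable -> key under `embedding:` in global_settings.yml.
_GLOBAL_ENV_TO_KEY = {
    "COCOINDEX_CODE_EMBEDDING_MODEL": "model",
    "COCOINDEX_CODE_DEVICE": "device",
}


def legacy_env_to_global_settings(environ: Mapping[str, str]) -> dict:
    """Translate legacy environment variables into a global-settings dict."""
    embedding = {
        key: environ[name]
        for name, key in _GLOBAL_ENV_TO_KEY.items()
        if environ.get(name)  # skip unset or empty variables
    }
    return {"embedding": embedding} if embedding else {}


if __name__ == "__main__":
    print(legacy_env_to_global_settings(os.environ))
```

The resulting dict has the same shape as the `embedding:` block shown under User Settings and could be written out with any YAML library.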
+
+## Large codebase / Enterprise
+[CocoIndex](https://github.com/cocoindex-io/cocoindex) is an ultra-efficient indexing engine that also scales to large enterprise codebases. In enterprise scenarios with large or numerous repos, it is much more efficient to share indexes with teammates. We also have advanced features, such as branch deduplication, designed for enterprise users.
+
+If you need help with remote setup, please email our maintainer at linghua@cocoindex.io. We are happy to help!
+
+## License
+
+Apache-2.0