@mars167/git-ai 2.3.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +22 -0
- package/README.md +364 -0
- package/README.zh-CN.md +361 -0
- package/assets/hooks/post-checkout +28 -0
- package/assets/hooks/post-merge +28 -0
- package/assets/hooks/pre-commit +17 -0
- package/assets/hooks/pre-push +29 -0
- package/dist/bin/git-ai.js +62 -0
- package/dist/src/commands/ai.js +30 -0
- package/dist/src/commands/checkIndex.js +19 -0
- package/dist/src/commands/dsr.js +156 -0
- package/dist/src/commands/graph.js +203 -0
- package/dist/src/commands/hooks.js +125 -0
- package/dist/src/commands/index.js +92 -0
- package/dist/src/commands/pack.js +31 -0
- package/dist/src/commands/query.js +139 -0
- package/dist/src/commands/semantic.js +134 -0
- package/dist/src/commands/serve.js +14 -0
- package/dist/src/commands/status.js +78 -0
- package/dist/src/commands/trae.js +75 -0
- package/dist/src/commands/unpack.js +28 -0
- package/dist/src/core/archive.js +91 -0
- package/dist/src/core/astGraph.js +127 -0
- package/dist/src/core/astGraphQuery.js +142 -0
- package/dist/src/core/cozo.js +266 -0
- package/dist/src/core/cpg/astLayer.js +56 -0
- package/dist/src/core/cpg/callGraph.js +483 -0
- package/dist/src/core/cpg/cfgLayer.js +490 -0
- package/dist/src/core/cpg/dfgLayer.js +237 -0
- package/dist/src/core/cpg/index.js +80 -0
- package/dist/src/core/cpg/types.js +108 -0
- package/dist/src/core/crypto.js +10 -0
- package/dist/src/core/dsr/generate.js +308 -0
- package/dist/src/core/dsr/gitContext.js +74 -0
- package/dist/src/core/dsr/indexMaterialize.js +106 -0
- package/dist/src/core/dsr/paths.js +26 -0
- package/dist/src/core/dsr/query.js +73 -0
- package/dist/src/core/dsr/snapshotParser.js +73 -0
- package/dist/src/core/dsr/state.js +27 -0
- package/dist/src/core/dsr/types.js +2 -0
- package/dist/src/core/embedding/fusion.js +52 -0
- package/dist/src/core/embedding/index.js +43 -0
- package/dist/src/core/embedding/parser.js +14 -0
- package/dist/src/core/embedding/semantic.js +254 -0
- package/dist/src/core/embedding/structural.js +97 -0
- package/dist/src/core/embedding/symbolic.js +117 -0
- package/dist/src/core/embedding/tokenizer.js +91 -0
- package/dist/src/core/embedding/types.js +2 -0
- package/dist/src/core/embedding.js +36 -0
- package/dist/src/core/git.js +49 -0
- package/dist/src/core/gitDiff.js +73 -0
- package/dist/src/core/indexCheck.js +131 -0
- package/dist/src/core/indexer.js +185 -0
- package/dist/src/core/indexerIncremental.js +303 -0
- package/dist/src/core/indexing/config.js +51 -0
- package/dist/src/core/indexing/hnsw.js +568 -0
- package/dist/src/core/indexing/index.js +17 -0
- package/dist/src/core/indexing/monitor.js +82 -0
- package/dist/src/core/indexing/parallel.js +252 -0
- package/dist/src/core/lancedb.js +111 -0
- package/dist/src/core/lfs.js +27 -0
- package/dist/src/core/log.js +62 -0
- package/dist/src/core/manifest.js +88 -0
- package/dist/src/core/parser/adapter.js +2 -0
- package/dist/src/core/parser/c.js +93 -0
- package/dist/src/core/parser/chunkRelations.js +178 -0
- package/dist/src/core/parser/chunker.js +274 -0
- package/dist/src/core/parser/go.js +98 -0
- package/dist/src/core/parser/java.js +80 -0
- package/dist/src/core/parser/markdown.js +76 -0
- package/dist/src/core/parser/python.js +81 -0
- package/dist/src/core/parser/rust.js +103 -0
- package/dist/src/core/parser/typescript.js +98 -0
- package/dist/src/core/parser/utils.js +62 -0
- package/dist/src/core/parser/yaml.js +53 -0
- package/dist/src/core/parser.js +75 -0
- package/dist/src/core/paths.js +10 -0
- package/dist/src/core/repoMap.js +164 -0
- package/dist/src/core/retrieval/cache.js +31 -0
- package/dist/src/core/retrieval/classifier.js +74 -0
- package/dist/src/core/retrieval/expander.js +80 -0
- package/dist/src/core/retrieval/fuser.js +40 -0
- package/dist/src/core/retrieval/index.js +32 -0
- package/dist/src/core/retrieval/reranker.js +304 -0
- package/dist/src/core/retrieval/types.js +2 -0
- package/dist/src/core/retrieval/weights.js +42 -0
- package/dist/src/core/search.js +41 -0
- package/dist/src/core/sq8.js +65 -0
- package/dist/src/core/symbolSearch.js +143 -0
- package/dist/src/core/types.js +2 -0
- package/dist/src/core/workspace.js +116 -0
- package/dist/src/mcp/server.js +794 -0
- package/docs/README.md +44 -0
- package/docs/cross-encoder.md +157 -0
- package/docs/embedding.md +158 -0
- package/docs/logo.png +0 -0
- package/docs/windows-setup.md +67 -0
- package/docs/zh-CN/DESIGN.md +102 -0
- package/docs/zh-CN/README.md +46 -0
- package/docs/zh-CN/advanced.md +26 -0
- package/docs/zh-CN/architecture_explained.md +116 -0
- package/docs/zh-CN/cli.md +109 -0
- package/docs/zh-CN/dsr.md +91 -0
- package/docs/zh-CN/graph_scenarios.md +173 -0
- package/docs/zh-CN/hooks.md +14 -0
- package/docs/zh-CN/manifests.md +136 -0
- package/docs/zh-CN/mcp.md +205 -0
- package/docs/zh-CN/quickstart.md +35 -0
- package/docs/zh-CN/rules.md +7 -0
- package/docs/zh-CN/technical-details.md +454 -0
- package/docs/zh-CN/troubleshooting.md +19 -0
- package/docs/zh-CN/windows-setup.md +67 -0
- package/install.sh +183 -0
- package/package.json +97 -0
- package/skills/git-ai-mcp/SKILL.md +86 -0
- package/skills/git-ai-mcp/references/constraints.md +143 -0
- package/skills/git-ai-mcp/references/tools.md +263 -0
- package/templates/agents/common/documents/Fix EISDIR error and enable multi-language indexing.md +14 -0
- package/templates/agents/common/documents/Fix git-ai index error in CodaGraph directory.md +13 -0
- package/templates/agents/common/skills/git-ai-mcp/SKILL.md +86 -0
- package/templates/agents/common/skills/git-ai-mcp/references/constraints.md +143 -0
- package/templates/agents/common/skills/git-ai-mcp/references/tools.md +263 -0
package/docs/README.md
ADDED
@@ -0,0 +1,44 @@
# Documentation Center

This is the central collection of documentation for `git-ai`.

## Overview

`git-ai` is a global CLI:
- Default behavior acts like `git`: `git-ai status/commit/push/...` proxies to the system `git`.
- AI capabilities are under `git-ai ai ...`: Indexing, Retrieval, Packing, Hooks, MCP Server.

### Core Goals
- Store structured code repository indexes under `.git-ai/`, shareable via the archive `.git-ai/lancedb.tar.gz`.
- Enable Agents to hit symbols/snippets via MCP tools at low cost, then read files as needed.
- Persist per-commit semantic changes as DSR (immutable, deterministic), and rebuild caches from it.

### Important Directories
- `.git-ai/meta.json`: Index metadata (locally generated, usually not committed).
- `.git-ai/lancedb/`: Local vector index directory (usually not committed).
- `.git-ai/lancedb.tar.gz`: Archived index (can be committed/tracked via git-lfs).
- `.git-ai/ast-graph.sqlite`: AST graph database (CozoDB).
- `.git-ai/ast-graph.export.json`: AST graph export snapshot (for cross-process reuse on non-SQLite backends).
- `.git-ai/dsr/<commit_hash>.json`: Per-commit DSR (canonical artifact, immutable).
- `.git-ai/dsr/dsr-index.sqlite`: DSR query accelerator (rebuildable cache from DSR + Git).

## Contents

### Usage Guides
- [Installation & Quick Start](./zh-CN/quickstart.md) (Chinese)
- [Windows Setup Guide](./windows-setup.md)
- [CLI Usage](./zh-CN/cli.md) (Chinese)
- [Hooks Workflow](./zh-CN/hooks.md) (Chinese)
- [MCP Server Integration](./zh-CN/mcp.md) (Chinese)
- [Manifest Workspace Support](./zh-CN/manifests.md) (Chinese)
- [Troubleshooting](./zh-CN/troubleshooting.md) (Chinese)
- [DSR (Deterministic Semantic Record)](./zh-CN/dsr.md) (Chinese)

### Advanced & Principles
- [Advanced: Index Archiving & LFS](./zh-CN/advanced.md) (Chinese)
- [Architecture Design](./zh-CN/DESIGN.md) (Chinese)
- [Development Rules](./zh-CN/rules.md) (Chinese)
- [Cross-Encoder Reranking](./cross-encoder.md) (English)

## Agent Integration
- [MCP Skill & Rule Templates](./zh-CN/mcp.md#agent-skills--rules) (Chinese)
package/docs/cross-encoder.md
ADDED
@@ -0,0 +1,157 @@
# Cross-Encoder Reranking & ONNX Runtime

## Overview

git-ai v2.2+ includes an optional **Cross-Encoder Reranking** feature that uses ONNX Runtime for high-quality result re-ranking. It improves search result quality whenever a compatible model is available.

## Architecture

```
Query → [Vector Search] → [Graph Search] → [DSR Search] → [Cross-Encoder Rerank] → Results
```

The cross-encoder takes query-candidate pairs and scores their relevance, providing higher quality re-ranking than simple score fusion.

## Configuration

### Model Path

The cross-encoder uses a configurable model path. By default, it looks for:
1. `<modelName>` (as an absolute or relative path)
2. `<modelName>/model.onnx`
3. `<modelName>/onnx/model.onnx`

The default model name is `non-existent-model.onnx`, which means the system uses the hash-based fallback by default.

```typescript
// Reranker configuration
interface RerankerConfig {
  modelName: string;       // Path to ONNX model
  device: 'cpu' | 'gpu';   // Execution device
  batchSize: number;       // Batch processing size
  topK: number;            // Max candidates to re-rank
  scoreWeights: {
    original: number;      // Weight for original retrieval score
    crossEncoder: number;  // Weight for cross-encoder score
  };
}
```
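As a rough illustration of how `scoreWeights` blends the two signals, here is a minimal sketch that reuses the `RerankerConfig` type above (not the shipped implementation; the real reranker may normalize scores differently):

```typescript
// Illustrative score fusion only; the package's internal logic may differ.
function fuseScores(
  originalScore: number,
  crossEncoderScore: number,
  weights: RerankerConfig['scoreWeights'],
): number {
  // Weighted blend of the retrieval score and the cross-encoder relevance score.
  return weights.original * originalScore + weights.crossEncoder * crossEncoderScore;
}
```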
### Default Behavior

When no model is found, the system automatically falls back to **hash-based scoring**:
- Uses `hashEmbedding` to create query-content vectors
- Computes similarity via sigmoid(sum)
- No external dependencies required

This ensures the system works even without ONNX models.
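A minimal sketch of that fallback path, assuming a deterministic `hashEmbedding` helper (hypothetical here) and reading "sigmoid(sum)" as the sigmoid of the summed element-wise products:

```typescript
// Hedged sketch of the hash-based fallback; `hashEmbedding` is a hypothetical
// stand-in for the package's internal deterministic embedding helper.
function hashFallbackScore(
  query: string,
  content: string,
  hashEmbedding: (text: string, dim: number) => Float32Array,
  dim = 256, // illustrative dimension
): number {
  const q = hashEmbedding(query, dim);
  const c = hashEmbedding(content, dim);
  let sum = 0;
  for (let i = 0; i < dim; i++) sum += q[i] * c[i];
  return 1 / (1 + Math.exp(-sum)); // sigmoid(sum)
}
```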
## Installing ONNX Models

To enable cross-encoder reranking, download a compatible model (e.g., MiniLM, CodeBERT) and configure the path:

```bash
# Example: Download a cross-encoder model
mkdir -p models/cross-encoder
cd models/cross-encoder
# Download your ONNX model (e.g., from HuggingFace, ONNX Model Zoo)
# Place model.onnx in this directory
```

## Performance Considerations

### Memory
- ONNX Runtime loads models into memory
- GPU memory is required for GPU inference
- CPU inference works on any modern CPU

### Batch Processing
- Configure `batchSize` based on available memory
- Larger batches = better throughput but more memory

### Supported Backends
- **CPU**: All platforms, no additional setup
- **GPU**: CUDA-enabled systems (optional CUDA execution provider)

## API Usage

### CLI (Not yet exposed)

The cross-encoder is currently used internally by the retrieval pipeline.

### Programmatic

```typescript
import { CrossEncoderReranker } from 'git-ai';

const reranker = new CrossEncoderReranker({
  modelName: './models/cross-encoder',
  device: 'cpu',
  batchSize: 32,
  topK: 100,
  scoreWeights: {
    original: 0.3,
    crossEncoder: 0.7,
  },
});

// `candidates` is the list of previously retrieved results to re-rank.
const results = await reranker.rerank('authentication logic', candidates);
```

## Fallback Mechanism

The system handles missing models gracefully:

1. **Model file missing** → Log `cross_encoder_model_missing` and use hash fallback
2. **ONNX load failed** → Log `cross_encoder_fallback` and use hash fallback
3. **Inference error** → Log the error and continue with the fallback

No crashes or service interruption occur when the model is unavailable.

## Comparison: Hash vs ONNX

| Aspect | Hash Fallback | ONNX Cross-Encoder |
|--------|---------------|--------------------|
| Quality | Good for exact matches | Excellent for semantic matching |
| Speed | <1ms | 10-100ms (depending on model) |
| Dependencies | None | onnxruntime-node |
| Memory | <1MB | 50-500MB (model size) |
| GPU Required | No | Optional |

## Troubleshooting

### Model Load Failed

```
{"level":"warn","msg":"cross_encoder_fallback","err":"..."}
```

Causes:
- Model file doesn't exist
- Corrupted model file
- Incompatible ONNX opset version

Solutions:
1. Verify the model path is correct
2. Check the model file is valid ONNX
3. Ensure onnxruntime-node is installed

### Out of Memory

Reduce `batchSize` in the configuration or use the CPU backend.

### Slow Inference

- Use smaller models (MiniLM instead of large BERT)
- Enable batching for multiple queries
- Consider GPU for large-scale usage

## Dependencies

```json
{
  "onnxruntime-node": "^1.19.2"
}
```

Required only for ONNX-based reranking; without it, the system falls back to hash-based scoring.
package/docs/embedding.md
ADDED
@@ -0,0 +1,158 @@
# Embedding Models

git-ai uses ONNX-compatible embedding models for semantic code search. This document covers model configuration, available options, and setup instructions.

## Overview

The embedding system converts code snippets into vector representations for similarity search. git-ai supports:

- **Semantic Embedding**: Neural network-based code representation (CodeBERT, MiniLM)
- **Structural Embedding**: AST-based structural features (WL kernel hashing)
- **Symbolic Embedding**: Identifier and symbol relationships

## Configuration

### Environment Variable

Set `GIT_AI_EMBEDDING_MODEL` to override the default embedding model:

```bash
export GIT_AI_EMBEDDING_MODEL="$HOME/.cache/git-ai/models/minilm/model.onnx"
```

Add it to your shell profile for permanent use:

```bash
# ~/.zshrc or ~/.bashrc
export GIT_AI_EMBEDDING_MODEL="$HOME/.cache/git-ai/models/minilm/model.onnx"
```

### Default Paths

| Model | Default Path |
|-------|--------------|
| CodeBERT | `~/.cache/git-ai/models/codebert/model.onnx` |
| MiniLM | `~/.cache/git-ai/models/minilm/model.onnx` |

The system automatically detects the model type and sets the appropriate embedding dimension (a sketch of one possible detection follows the list):
- CodeBERT: 768 dimensions
- MiniLM-L6: 384 dimensions
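One plausible way such detection could key off the configured model path; a hedged sketch, not the package's actual internals:

```typescript
// Hedged sketch: infer the embedding dimension from the model path.
// The real detection logic inside git-ai may use model metadata instead.
function inferEmbeddingDim(modelPath: string): number {
  const p = modelPath.toLowerCase();
  if (p.includes('minilm')) return 384;   // all-MiniLM-L6-v2
  if (p.includes('codebert')) return 768; // CodeBERT
  return 768; // assumption: default to the larger dimension
}
```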
## Available Models

### MiniLM-L6 (Recommended)

A lightweight, fast model ideal for local development.

- **Size**: ~86MB
- **Dimensions**: 384
- **Speed**: Fast (<100ms per query)
- **Download**:

```python
import os
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="Xenova/all-MiniLM-L6-v2",
    filename="onnx/model.onnx",
    local_dir=os.path.expanduser("~/.cache/git-ai/models/minilm"),
)
```

### CodeBERT

Microsoft CodeBERT for code understanding.

- **Size**: ~500MB
- **Dimensions**: 768
- **Quality**: Higher semantic understanding
- **Download**:

```bash
huggingface-cli download onnx-community/codebert-javascript-ONNX \
  --local-dir "$HOME/.cache/git-ai/models/codebert"
```

## Model Directory Structure

```
~/.cache/git-ai/models/
├── codebert/
│   ├── model.onnx                      # ONNX model file
│   └── config.json                     # Model configuration
└── minilm/
    ├── model.onnx -> onnx/model.onnx   # Symlink to ONNX model
    ├── onnx/
    │   └── model.onnx
    └── config.json
```

## Fallback Behavior

If no model is found, git-ai automatically falls back to hash-based embedding:

- **Quality**: Good for exact matches
- **Speed**: <1ms
- **Memory**: <1MB
- **Dependencies**: None

No crashes or service interruption occur when the model is unavailable.
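For intuition, here is a minimal sketch of a deterministic hash-based embedding of this kind (illustrative only; not the package's exact implementation):

```typescript
import { createHash } from 'node:crypto';

// Hedged sketch: hash tokens into a fixed number of buckets and L2-normalize.
function hashEmbedding(text: string, dim = 256): Float32Array {
  const v = new Float32Array(dim);
  for (const token of text.toLowerCase().split(/\W+/).filter(Boolean)) {
    const digest = createHash('sha256').update(token).digest();
    v[digest.readUInt32BE(0) % dim] += 1; // bucket count
  }
  let norm = 0;
  for (const x of v) norm += x * x;
  norm = Math.sqrt(norm) || 1;
  for (let i = 0; i < dim; i++) v[i] /= norm;
  return v;
}
```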
## Performance Considerations

| Model | Memory | CPU Inference | GPU Recommended |
|-------|--------|---------------|-----------------|
| MiniLM | ~200MB | Excellent | Optional |
| CodeBERT | ~800MB | Good | Yes |

### Batch Processing

Configure the batch size via the environment:

```bash
export GIT_AI_EMBEDDING_BATCH_SIZE=8
```

## Troubleshooting

### Model Load Failed

```
{"level":"warn","msg":"semantic_embed_fallback","err":"..."}
```

Causes:
- Model file doesn't exist
- Corrupted model file
- Incompatible ONNX opset version

Solutions:
1. Verify the model path is correct
2. Check the model file is valid ONNX
3. Ensure onnxruntime-node is installed

### Dimension Mismatch

If you see dimension errors, verify the model path matches the expected dimension:
- MiniLM: 384 dimensions
- CodeBERT: 768 dimensions

## Comparison

| Aspect | MiniLM | CodeBERT | Hash Fallback |
|--------|--------|----------|---------------|
| Size | 86MB | 500MB | <1MB |
| Dimensions | 384 | 768 | N/A |
| Speed | <100ms | 100-500ms | <1ms |
| Quality | Good | Excellent | Exact matches |
| Memory | Low | High | Minimal |

## Dependencies

```json
{
  "onnxruntime-node": "^1.19.2"
}
```

Required only for ONNX-based embeddings; the system works with the hash fallback without it.
package/docs/logo.png
ADDED
Binary file
package/docs/windows-setup.md
ADDED
@@ -0,0 +1,67 @@
# Windows Development and Installation Guide

[简体中文](./zh-CN/windows-setup.md) | **English**

This guide describes how to set up the development environment for `git-ai` on Windows, specifically for the multi-language support (C, Go, Python, Rust).

## Prerequisites

1. **Node.js**: Install Node.js (LTS version recommended) from [nodejs.org](https://nodejs.org/).
2. **Git**: Install Git for Windows from [git-scm.com](https://git-scm.com/).

## Build Tools for Native Dependencies

`git-ai` relies on libraries with native bindings:
* `tree-sitter`: For code parsing (C++)
* `cozo-node`: Graph database engine (Rust/C++)

While these libraries typically provide prebuilt binaries, you may need to build from source in certain environments (e.g., mismatched Node versions or specific architectures). Therefore, setting up a build environment is recommended.

### Option 1: Install via Admin PowerShell (Recommended)

Open PowerShell as Administrator and run:

```powershell
npm install --global --production windows-build-tools
```

*Note: This package is deprecated upstream and can hang or fail during installation. If it does, use Option 2.*

### Option 2: Manual Installation

1. **Python**: Install Python 3 from [python.org](https://www.python.org/) or the Microsoft Store.
2. **Visual Studio Build Tools**:
   * Download [Visual Studio Build Tools](https://visualstudio.microsoft.com/visual-cpp-build-tools/).
   * Run the installer and select the **"Desktop development with C++"** workload.
   * Ensure "MSVC ... C++ x64/x86 build tools" and "Windows 10/11 SDK" are selected.

## Installation

Once the prerequisites are met:

```bash
git clone https://github.com/mars167/git-ai-cli.git
cd git-ai-cli
npm install
npm run build
```

## Running Examples

To verify support for different languages, run the parsing test:

```bash
npx ts-node test/verify_parsing.ts
```

To fully develop with the polyglot examples, you may need to install the respective language runtimes:

* **C**: Install MinGW or use MSVC (cl.exe).
* **Go**: Install from [go.dev](https://go.dev/dl/).
* **Python**: [python.org](https://www.python.org/).
* **Rust**: Install via [rustup.rs](https://rustup.rs/).

## Troubleshooting

* **node-gyp errors**: Ensure Python and the Visual Studio Build Tools are correctly installed and on PATH. You can point npm at a specific Python version with `npm config set python python3`.
* **Path issues**: If running globally, ensure the `git-ai` binary (the npm global bin directory) is on your PATH.
package/docs/zh-CN/DESIGN.md
ADDED
@@ -0,0 +1,102 @@
# git-ai Design (LanceDB + SQ8 + Deduplication)

## 1. Goals
- Use LanceDB as the local index store (columnar + scalable).
- Introduce SQ8 (8-bit scalar quantization) to reduce vector storage size.
- Introduce content-hash deduplication: identical content stores a single vector; additional occurrences store only a reference.
- Index scope: only the currently checked-out HEAD worktree; historical versions are managed by Git (check out a commit to get the matching index snapshot).
- Provide a per-commit DSR (Deterministic Semantic Record) as the semantic artifact: per commit, immutable, deterministic; databases are only rebuildable caches.

## 2. Storage Layout
Index artifacts live at the repository root:
- `.git-ai/`: index directory
  - `lancedb/`: LanceDB data directory
  - `lancedb.tar.gz`: packed LanceDB archive (for Git LFS tracking and transfer)
  - `ast-graph.sqlite`: AST relation graph database (CozoDB, SQLite engine preferred)
  - `ast-graph.export.json`: AST graph export snapshot (used for cross-process reuse only on non-SQLite backends)
  - `meta.json`: index metadata (dimension, encoding, build time, etc.)

DSR artifacts (per commit, canonical):
- `.git-ai/dsr/<commit_hash>.json`: single-commit DSR (immutable, deterministic)
- `.git-ai/dsr/dsr-index.sqlite`: DSR query accelerator (deletable cache, rebuildable from DSR + Git)

## 3. Data Model (Two Tables)

### 3.1 chunks (deduplicated content vectors)
- One row represents one "deduplicated content block" (e.g., the skeleton/signature text of a symbol).
- Primary key: `content_hash` (sha256)
- Fields:
  - `content_hash: string`
  - `text: string` (skeleton/signature text, kept for explainability)
  - `dim: int32`
  - `scale: float32` (SQ8 dequantization scale)
  - `qvec_b64: string` (SQ8-quantized vector, base64-encoded Int8Array)

### 3.2 refs (references)
- One row represents one occurrence (file/line/symbol, etc.) and points to `content_hash` in chunks.
- Fields (both tables are sketched below):
  - `ref_id: string` (sha256(file + symbol + range + content_hash))
  - `content_hash: string`
  - `file: string`
  - `symbol: string`
  - `kind: string`
  - `signature: string`
  - `start_line: int32`
  - `end_line: int32`
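A minimal TypeScript sketch of the two row shapes and the id derivations described above (field names follow this section; the actual row types live in the package's compiled sources):

```typescript
import { createHash } from 'node:crypto';

const sha256 = (s: string) => createHash('sha256').update(s).digest('hex');

interface ChunkRow {
  content_hash: string; // sha256(text), primary key
  text: string;         // skeleton / signature text kept for explainability
  dim: number;
  scale: number;        // SQ8 dequantization scale
  qvec_b64: string;     // base64-encoded Int8Array
}

interface RefRow {
  ref_id: string;       // sha256(file + symbol + range + content_hash)
  content_hash: string; // points into chunks (many refs → one chunk)
  file: string;
  symbol: string;
  kind: string;
  signature: string;
  start_line: number;
  end_line: number;
}

// Deduplication: identical text yields the same content_hash, so the chunk is
// written once while every occurrence still gets its own ref row.
function makeIds(file: string, symbol: string, range: string, text: string) {
  const content_hash = sha256(text);
  const ref_id = sha256(file + symbol + range + content_hash);
  return { content_hash, ref_id };
}
```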
## 4. Vector Generation and SQ8
- By default, v2 does not depend on an external embedding API: a deterministic local hash embedding (fixed dimension) produces the float vectors.
- SQ8 (symmetric quantization), sketched below:
  - `scale = max(|v|) / 127`
  - `q[i] = clamp(round(v[i] / scale), -127..127)`
  - Dequantization: `v'[i] = q[i] * scale`
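A minimal sketch of that scheme (illustrative; the package's implementation may handle edge cases differently):

```typescript
// Symmetric 8-bit scalar quantization: scale = max(|v|)/127.
function sq8Quantize(v: Float32Array): { scale: number; q: Int8Array } {
  let maxAbs = 0;
  for (const x of v) maxAbs = Math.max(maxAbs, Math.abs(x));
  const scale = maxAbs > 0 ? maxAbs / 127 : 1; // assumption: guard the all-zero vector
  const q = new Int8Array(v.length);
  for (let i = 0; i < v.length; i++) {
    q[i] = Math.max(-127, Math.min(127, Math.round(v[i] / scale)));
  }
  return { scale, q };
}

// Dequantization: v'[i] = q[i] * scale.
function sq8Dequantize(q: Int8Array, scale: number): Float32Array {
  const v = new Float32Array(q.length);
  for (let i = 0; i < q.length; i++) v[i] = q[i] * scale;
  return v;
}
```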
## 5. Deduplication Strategy
- `content_hash = sha256(text)`; identical text is written into chunks only once.
- refs rows are always written, forming a many-to-one relationship.

## 6. Query Capabilities
- `search_symbols(query)`: filter the refs table with `symbol LIKE %query%` and return file + line numbers + signature.
- `semantic_search(text, k)` (sketched below):
  - compute the query embedding → SQ8;
  - scan chunks (or narrow via filters), dequantize, and compute cosine similarity;
  - take the top-k and join against refs to output locations.
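A minimal sketch of that flow, reusing `ChunkRow` and `sq8Dequantize` from the sketches above; `embed` stands in for the local hash embedding:

```typescript
function cosine(a: Float32Array, b: Float32Array): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

function semanticSearch(
  query: string,
  chunks: ChunkRow[],
  embed: (text: string) => Float32Array,
  k: number,
): Array<{ content_hash: string; score: number }> {
  const qv = embed(query);
  return chunks
    .map((c) => {
      const bytes = Buffer.from(c.qvec_b64, 'base64');
      const q = new Int8Array(bytes.buffer, bytes.byteOffset, bytes.byteLength);
      return { content_hash: c.content_hash, score: cosine(qv, sq8Dequantize(q, c.scale)) };
    })
    .sort((a, b) => b.score - a.score)
    .slice(0, k); // top-k; a real implementation would then join against refs
}
```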
## 6.1 AST Graph Queries (CozoDB)

During indexing, symbols and their relationships are written to CozoDB to represent data that is better served by graph/recursive queries, such as containment and inheritance:

### Relations
- `ast_file(file_id => file)`: file nodes (file_id is `sha256("file:" + file)`)
- `ast_symbol(ref_id => file, name, kind, signature, start_line, end_line)`: symbol nodes (ref_id matches the refs table)
- `ast_contains(parent_id, child_id)`: containment edges (parent_id may be a file_id or a ref_id)
- `ast_extends_name(sub_id, super_name)`: inheritance (recorded by name for later join/resolution)
- `ast_implements_name(sub_id, iface_name)`: implementation (recorded by name)

### CLI / MCP
- CLI: `git-ai ai graph ...`
- MCP: `ast_graph_query({query, params})` (an illustrative payload is sketched below)
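For example, finding the symbols contained in a file could look roughly like the payload below. The CozoScript is illustrative only, written against the relation schemas above; the exact syntax accepted by `ast_graph_query` may differ:

```typescript
// Illustrative { query, params } payload for ast_graph_query; not verified syntax.
const payload = {
  query: `
    ?[name, kind, start_line] :=
      *ast_contains{parent_id: $file_id, child_id},
      *ast_symbol{ref_id: child_id, name, kind, start_line}
  `,
  params: { file_id: '<sha256("file:" + path)>' },
};
```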
## 7. Git Hooks Integration
- `pre-commit`: automatically rebuilds the index (index --overwrite), packs it (pack), and stages `.git-ai/lancedb.tar.gz`; if git-lfs is installed, lfs track is run automatically.
- `pre-push`: packs again and verifies the archive has not changed; if it has, the push is blocked with a prompt to commit the archive first.
- `post-checkout` / `post-merge`: if `.git-ai/lancedb.tar.gz` exists, it is automatically unpacked into `.git-ai/lancedb/`.
- Installation: run `git-ai ai hooks install` inside the repository (writes .githooks/* and sets core.hooksPath=.githooks).

## 8. DSR (Deterministic Semantic Record)

DSR captures "the semantic change of each commit" and strictly follows these rules (the write rule is sketched after this list):

- The Git DAG is the sole authority for history/branches (DSR only enriches nodes; it does not define edges).
- One commit → one DSR file (`.git-ai/dsr/<commit_hash>.json`).
- Once generated, a DSR must not be overwritten; if one already exists with different content, stop and report an error.
- Any database/index must be rebuildable from DSR + Git (caches are deletable).
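A minimal sketch of the write-once rule (illustrative; the actual generator lives in dist/src/core/dsr/generate.js):

```typescript
import { existsSync, readFileSync, writeFileSync } from 'node:fs';

// Write a DSR exactly once per commit; refuse to overwrite differing content.
function writeDsr(path: string, canonicalJson: string): void {
  if (existsSync(path)) {
    const existing = readFileSync(path, 'utf8');
    if (existing !== canonicalJson) {
      throw new Error(`DSR already exists with different content: ${path}`);
    }
    return; // identical content: nothing to do
  }
  writeFileSync(path, canonicalJson);
}
```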
CLI entry points:

- `git-ai ai dsr context`
- `git-ai ai dsr generate <commit>`
- `git-ai ai dsr rebuild-index`
- `git-ai ai dsr query symbol-evolution <symbol>`

See the [DSR documentation](./dsr.md) for details.
package/docs/zh-CN/README.md
ADDED
@@ -0,0 +1,46 @@
# Documentation Center

[**English**](../README.md) | 简体中文

This is the central collection of documentation for `git-ai`.

## Overview

`git-ai` is a global CLI:
- Default behavior acts like `git`: `git-ai status/commit/push/...` proxies to the system `git`.
- AI capabilities live under `git-ai ai ...`: indexing, retrieval, archiving, hooks, MCP Server.

### Core Goals
- Keep the repository's structured index under `.git-ai/`, shareable via the archive `.git-ai/lancedb.tar.gz`.
- Let Agents hit symbols/snippets cheaply via MCP tools, then read files as needed.
- Persist each commit's semantic changes as DSR (per commit, immutable, deterministic) and rebuild caches from it.

### Important Directories
- `.git-ai/meta.json`: index metadata (locally generated, usually not committed)
- `.git-ai/lancedb/`: local vector index directory (usually not committed)
- `.git-ai/lancedb.tar.gz`: archived index (can be committed / tracked via git-lfs)
- `.git-ai/ast-graph.sqlite`: AST graph database (CozoDB)
- `.git-ai/ast-graph.export.json`: AST graph export snapshot (for cross-process reuse on non-SQLite backends)
- `.git-ai/dsr/<commit_hash>.json`: single-commit DSR (canonical artifact, per commit, immutable)
- `.git-ai/dsr/dsr-index.sqlite`: DSR query accelerator (deletable cache, rebuildable from DSR + Git)

## Contents

### Usage Guides
- [Installation & Quick Start](./quickstart.md)
- [Windows Development & Setup Guide](./windows-setup.md)
- [CLI Usage](./cli.md)
- [Hooks Workflow](./hooks.md)
- [MCP Server Integration](./mcp.md)
- [Manifest Workspace Support](./manifests.md)
- [Troubleshooting](./troubleshooting.md)
- [DSR (Deterministic Semantic Record)](./dsr.md)

### Advanced & Internals
- [Technical Details](./technical-details.md)
- [Advanced: Index Archiving & LFS](./advanced.md)
- [Architecture Design](./DESIGN.md)
- [Development Rules](./rules.md)

## Agent Integration
- [MCP Skill & Rule Templates](./mcp.md#agent-skills--rules)
package/docs/zh-CN/advanced.md
ADDED
@@ -0,0 +1,26 @@
# Index Archiving & LFS

## pack/unpack
- `pack`: packs `.git-ai/lancedb/` into `.git-ai/lancedb.tar.gz`
- `unpack`: unpacks `.git-ai/lancedb.tar.gz` back into `.git-ai/lancedb/`

```bash
git-ai ai pack
git-ai ai unpack
```

## Git LFS (optional)
If git-lfs is installed in the repository, tracking `.git-ai/lancedb.tar.gz` with LFS is recommended:

```bash
git lfs install
git lfs track ".git-ai/lancedb.tar.gz"
git add .gitattributes
git commit -m "track lancedb archive via lfs"
```

Alternatively, run once:

```bash
git-ai ai pack --lfs
```