@lorrylurui/code-intelligence-mcp 1.0.1
- package/README.md +276 -0
- package/dist/cli/detect-duplicates.js +349 -0
- package/dist/cli/index-codebase.js +43 -0
- package/dist/config/env.js +23 -0
- package/dist/db/mysql.js +20 -0
- package/dist/index.js +14 -0
- package/dist/indexer/embedText.js +28 -0
- package/dist/indexer/extractMeta.js +118 -0
- package/dist/indexer/heuristics.js +96 -0
- package/dist/indexer/indexProject.js +178 -0
- package/dist/indexer/persistSymbols.js +55 -0
- package/dist/prompts/recommendComponentPrompt.js +30 -0
- package/dist/prompts/reusableCodeAdvisorPrompt.js +63 -0
- package/dist/repositories/symbolRepository.js +219 -0
- package/dist/server/createServer.js +29 -0
- package/dist/services/embeddingClient.js +37 -0
- package/dist/services/ranking.js +161 -0
- package/dist/services/reindex.js +45 -0
- package/dist/services/vectorMath.js +17 -0
- package/dist/skills/rankSymbols.js +24 -0
- package/dist/skills/recommendComponent.js +39 -0
- package/dist/tools/getSymbolDetail.js +32 -0
- package/dist/tools/recommendComponent.js +28 -0
- package/dist/tools/reindex.js +37 -0
- package/dist/tools/searchByStructure.js +55 -0
- package/dist/tools/searchSymbols.js +81 -0
- package/dist/types/symbol.js +1 -0
- package/package.json +45 -0
package/README.md
ADDED
@@ -0,0 +1,276 @@
# Code Intelligence MCP (Minimal)

A minimal working Node MCP server skeleton that includes:

- MCP Server (stdio)
- Tool: `search_symbols` (supports semantic search via `semantic=true`, Phase 5)
- Tool: `get_symbol_detail`
- Tool: `search_by_structure`
- Tool: `reindex`
- Tool: `recommend_component`
- Prompt: `reusable-code-advisor` (same workflow as the Cursor Skill; see `src/prompts/reusableCodeAdvisorPrompt.ts`)
- MySQL Repository (optional)
- Cursor Skill: `reusable-code-advisor` (`.cursor/skills/reusable-code-advisor/`, unchanged, maintained in parallel with the MCP Prompt)

## 1) Installation

```bash
npm install
```

## 2) Environment variables

Copy `.env.example` to `.env`.

A MySQL connection is not required by default (when MySQL is not configured, the server uses in-memory sample data).

To connect to MySQL, set:

```env
MYSQL_ENABLED=true
MYSQL_HOST=127.0.0.1
MYSQL_PORT=3306
MYSQL_USER=root
MYSQL_PASSWORD=devpassword
MYSQL_DATABASE=code_intelligence

# Phase 5 (optional): root URL of the sentence-embedding service; matches the default port of `npm run embedding:dev`
# EMBEDDING_SERVICE_URL=http://127.0.0.1:8765
```

The password must match your Docker or local MySQL configuration (the `devpassword` used in the examples corresponds to the Compose setup).

### Start MySQL with Docker (recommended for local development)

1. Install [Docker Desktop](https://www.docker.com/products/docker-desktop/) (or Docker Engine with the Compose plugin).
2. From the project root, run:

```bash
npm run docker:up
# or: docker compose up -d
```

3. On first start, `sql/schema.sql` is mounted into `docker-entrypoint-initdb.d` automatically and **creates the database tables** (this runs once, only when the **data volume is empty**).
4. Copy `.env.example` to `.env`, set `MYSQL_ENABLED=true`, and make `MYSQL_PASSWORD` match `MYSQL_ROOT_PASSWORD` in `docker-compose.yml` (default `devpassword`).
5. Wait for the container to become healthy (a few tens of seconds):

```bash
docker compose ps
```

6. Then run `npm run index` or start the MCP server.

Common commands:

| Command                  | Description                                                            |
| ------------------------ | ---------------------------------------------------------------------- |
| `npm run docker:logs`    | View the MySQL logs                                                    |
| `npm run docker:down`    | Stop the container (the data volume is kept, so the database persists) |
| `docker compose down -v` | **Delete the volume** (wipes the database; use with caution)           |

**Port conflict**: if another service on your machine already uses `3306`, change `ports` in `docker-compose.yml` to `"3307:3306"` and set `MYSQL_PORT=3307` in `.env`.

## 3) Initialize the database (optional)

- **Already started via Docker as described above**: if the volume was empty, the tables were created automatically from `sql/schema.sql`, so the command below is usually unnecessary.
- **Local mysql client / manual execution**:

```bash
mysql -u root -p code_intelligence < sql/schema.sql
```

## 4) Running locally

### Regular development (hot reload)

```bash
npm run dev
```

This uses `tsx watch`: changes under `src/` restart the process automatically. Screen clearing is disabled (`--clear-screen=false`), and `node_modules` and `dist` are excluded.

### Connecting to Cursor MCP (without polluting stdout)

MCP runs over **stdio**, so protocol data must travel on the child process's **stdout**. If you connect MCP through `npm run dev`, `npm` or other tools may write noise to stdout and break the handshake.

The recommended approach is a **dedicated script**: the child process runs only `tsx src/index.ts`, and **watch/restart logs go to stderr only**.

```bash
npm run dev:mcp
```

**Cursor `mcp.json` example (invoking node directly is recommended, to avoid npm):**

```json
{
  "mcpServers": {
    "code-intelligence-mcp": {
      "command": "node",
      "args": ["/absolute/path/Intelligence-code/scripts/mcp-dev-watch.mjs"],
      "cwd": "/absolute/path/Intelligence-code"
    }
  }
}
```

You can also keep using `"command": "npm"` with `"args": ["run", "dev:mcp"]`, but npm may still produce extra output in some environments. If the tools are unstable, switch to the `node .../mcp-dev-watch.mjs` form above.

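The stdout/stderr split described above can be sketched as follows. This is a minimal illustration, assuming newline-delimited JSON-RPC messages on stdout as used by the MCP stdio transport; `logDiagnostic`, `frameMessage`, and `sendMessage` are hypothetical helper names and are not part of this package.

```javascript
// Hypothetical helpers illustrating the stdout/stderr split; not this package's code.

// Diagnostics go to stderr so they never mix with protocol frames on stdout.
function logDiagnostic(...parts) {
  console.error('[mcp-dev]', ...parts);
}

// Serialize one JSON-RPC message as a single newline-terminated line.
// JSON.stringify escapes embedded newlines, so the frame stays on one line.
function frameMessage(message) {
  return JSON.stringify(message) + '\n';
}

// Only framed protocol messages are ever written to stdout.
function sendMessage(message) {
  process.stdout.write(frameMessage(message));
}

logDiagnostic('watcher restarted'); // stderr only
const frame = frameMessage({ jsonrpc: '2.0', id: 1, result: { ok: true } });
```

Any watcher or restart log that went through `console.log` instead would land between frames on stdout, which is the failure mode the dedicated script is meant to avoid.
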
### MCP Prompt (non-Cursor clients)

The server registers the Prompt **`reusable-code-advisor`**: clients can see it via `prompts/list`, and `prompts/get` accepts an optional **`userRequest`** argument (the user's current requirement or keywords). The returned message body matches the Cursor Skill workflow.
The text must be **synced manually** with the body of `.cursor/skills/reusable-code-advisor/SKILL.md` (see the comment at the top of `src/prompts/reusableCodeAdvisorPrompt.ts`).

In **MCP Inspector**, switch to the **Prompts** panel to select and debug it.

## 5) Phase 2: code indexing (ts-morph + fast-glob → MySQL)

1. **Create tables / migrate**
    - New database: run `sql/schema.sql` (it already includes a unique index on `(path, name)`, so repeated `npm run index` runs can upsert).
    - Older database with only the early table structure: run `sql/migrations/002_symbols_unique_path_name.sql` (clean up any duplicate `path+name` rows first).

2. **Configure MySQL** (`MYSQL_ENABLED=true` and related settings in `.env`).

3. **Run the indexer** (logs go to stderr and do not pollute the MCP stdout):

```bash
npm run index
```

Optional environment variables (see `.env.example`):

| Variable       | Meaning                                                           |
| -------------- | ----------------------------------------------------------------- |
| `INDEX_ROOT`   | Project root directory; defaults to the current working directory |
| `INDEX_GLOB`   | Comma-separated globs; defaults to `src/**/*.{ts,tsx}`            |
| `INDEX_IGNORE` | Additional glob fragments to ignore (comma-separated)             |

**Classification rules (first-pass heuristics)**: `interface` / `type` → `type`; a `.tsx` file whose function body contains JSX → `component`; a path or export name containing `selector` → `selector`; all other exported functions → `util`; `class` → `util` (may be refined later).

**Common error `ECONNREFUSED 127.0.0.1:3306`**: nothing is listening for MySQL on that port locally. Start the database service first (for example, with Homebrew on macOS: `brew services start mysql` / `mariadb`), or change `MYSQL_HOST` and `MYSQL_PORT` in `.env` to the instance you actually use (including a Docker-mapped port). The index script runs `SELECT 1` before scanning code, so it does not parse the whole codebase when the database is unavailable.

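The classification heuristic described in this section can be sketched as a small function. This is only an illustration and not the package's `heuristics.js`: the input shape (`kind`, `filePath`, `name`, `hasJsx`), the rule precedence, and the case-insensitive `selector` match are all assumptions.

```javascript
// Hypothetical sketch of the first-pass classification rules; not the package's heuristics.js.
// Assumed input: { kind: 'interface' | 'type' | 'class' | 'function', filePath, name, hasJsx }.
function classifySymbol({ kind, filePath, name, hasJsx = false }) {
  // interface / type alias -> type
  if (kind === 'interface' || kind === 'type') return 'type';
  // class -> util (the README notes this may be refined later)
  if (kind === 'class') return 'util';
  // .tsx file whose function body contains JSX -> component
  if (filePath.endsWith('.tsx') && hasJsx) return 'component';
  // path or export name containing "selector" -> selector (case-insensitive by assumption)
  if (/selector/i.test(filePath) || /selector/i.test(name)) return 'selector';
  // every other exported function -> util
  return 'util';
}

const examples = [
  classifySymbol({ kind: 'interface', filePath: 'src/types.ts', name: 'ButtonProps' }),
  classifySymbol({ kind: 'function', filePath: 'src/ui/Button.tsx', name: 'Button', hasJsx: true }),
  classifySymbol({ kind: 'function', filePath: 'src/selectors/user.ts', name: 'selectUser' }),
  classifySymbol({ kind: 'function', filePath: 'src/utils/date.ts', name: 'formatDate' }),
];
```
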
## 6) Suggested next steps

- New Tools: `list_dependencies`, `get_usage_stats`
- Indexer: finer-grained selector detection, naming for `export default`, class components, and so on
- Phase 5 semantic search is implemented (see below); later options include switching to pgvector / FAISS or a larger model

## 8) Phase 3 (enhancements)

- `search_symbols` supports a `ranked` parameter (default `true`) and returns `score` and `reason`.
- New `search_by_structure` tool: searches by `fields` (matched against `meta.props/params/properties/hooks`).
- Ranking for both search tools has been upgraded: besides the human-readable `reason`, they also return a structured `reasonDetail` (per-dimension scores, weights, and match method) so that front ends and agents can explain results.

Example:

```json
{
  "fields": ["onChange", "value"],
  "type": "component",
  "limit": 10
}
```

`reindex` example (callable directly from Inspector or an agent, without going back to the terminal):

```json
{
  "dryRun": false
}
```

Optional parameters:

- `projectRoot`: root directory to index (defaults to the MCP process's current directory)
- `globPatterns`: custom list of glob patterns to scan
- `ignore`: additional ignore rules
- `dryRun`: when `true`, scan only and do not write to MySQL

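To make the `fields` matching of `search_by_structure` more concrete, here is a minimal in-memory sketch. It is not the package's `searchByStructure.js`: the case-insensitive comparison, the "at least one field matches" rule, and ordering by the fraction of requested fields matched are assumptions, as are the names `collectFields` and `matchByFields`.

```javascript
// Hypothetical in-memory sketch of field-based matching; not the package's implementation.
const STRUCTURE_KEYS = ['props', 'params', 'properties', 'hooks'];

// Gather every string found under meta.props / params / properties / hooks, lowercased.
function collectFields(meta) {
  const out = new Set();
  for (const key of STRUCTURE_KEYS) {
    const value = meta ? meta[key] : undefined;
    if (!Array.isArray(value)) continue;
    for (const item of value) {
      if (typeof item === 'string') out.add(item.toLowerCase());
    }
  }
  return out;
}

// Keep symbols of the requested type that match at least one field, best coverage first.
function matchByFields(symbols, { fields, type, limit = 10 }) {
  const wanted = fields.map((f) => f.toLowerCase());
  return symbols
    .filter((s) => !type || s.type === type)
    .map((s) => {
      const have = collectFields(s.meta);
      const matched = wanted.filter((f) => have.has(f));
      const coverage = wanted.length ? matched.length / wanted.length : 0;
      return { name: s.name, matched, coverage };
    })
    .filter((r) => r.matched.length > 0)
    .sort((a, b) => b.coverage - a.coverage)
    .slice(0, limit);
}

const sampleSymbols = [
  { name: 'TextInput', type: 'component', meta: { props: ['value', 'onChange', 'disabled'] } },
  { name: 'Button', type: 'component', meta: { props: ['onClick', 'disabled'] } },
  { name: 'useField', type: 'util', meta: { params: ['value'], hooks: ['useState'] } },
];
const structureHits = matchByFields(sampleSymbols, {
  fields: ['onChange', 'value'],
  type: 'component',
  limit: 10,
});
```
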
## 9) Phase 4 (Skill)

- New Skill Tool: `recommend_component`
- Implemented flow: keyword search -> structure filter (optional `props`) -> ranking -> detail completion -> return reason
- New Prompt: `recommend-component` (for quickly triggering this flow in clients that support MCP Prompts)

Example:

```json
{
  "query": "form component with validation",
  "props": ["value", "onChange"],
  "limit": 3
}
```

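The `recommend_component` flow listed above (keyword search, optional structure filter, ranking, detail completion, reason) can be sketched over an in-memory catalog. Everything here is hypothetical: the catalog entries, the whitespace keyword matching, and the score formula are placeholders chosen for illustration and do not reflect the package's ranking.

```javascript
// Hypothetical end-to-end sketch of the recommend_component flow; data and scoring are made up.
const catalog = [
  { id: 1, name: 'ValidatedForm', type: 'component', description: 'form with validation',
    meta: { props: ['value', 'onChange', 'rules'] }, usageCount: 12 },
  { id: 2, name: 'SearchForm', type: 'component', description: 'search form',
    meta: { props: ['query', 'onSearch'] }, usageCount: 30 },
  { id: 3, name: 'formatDate', type: 'util', description: 'date formatting',
    meta: {}, usageCount: 50 },
];

function haystack(symbol) {
  return `${symbol.name} ${symbol.description}`.toLowerCase();
}

function recommendComponent({ query, props = [], limit = 3 }) {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  // 1) Keyword search: components whose name or description contains any query term.
  const hits = catalog.filter(
    (s) => s.type === 'component' && terms.some((t) => haystack(s).includes(t)),
  );
  // 2) Structure filter (optional): every requested prop must be present.
  const wanted = props.map((p) => p.toLowerCase());
  const structured = hits.filter((s) => {
    const have = new Set((s.meta.props ?? []).map((p) => p.toLowerCase()));
    return wanted.every((p) => have.has(p));
  });
  // 3) Ranking: placeholder score = matched term count + log-damped usage count.
  const ranked = structured
    .map((s) => {
      const termHits = terms.filter((t) => haystack(s).includes(t)).length;
      return { symbol: s, termHits, score: termHits + Math.log1p(s.usageCount) };
    })
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
  // 4) Detail completion and 5) reason string for each recommendation.
  return ranked.map(({ symbol, termHits, score }) => ({
    id: symbol.id,
    name: symbol.name,
    props: symbol.meta.props ?? [],
    score: Number(score.toFixed(4)),
    reason: `matched ${termHits} of ${terms.length} query term(s); has all requested props`,
  }));
}

const recommendations = recommendComponent({
  query: 'form validation',
  props: ['value', 'onChange'],
  limit: 3,
});
```

In this toy catalog both form components pass the keyword step, but only `ValidatedForm` survives the structure filter because `SearchForm` lacks `value` and `onChange`.
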
## 10) Phase 5 (semantic search, optional)

1. **Migration**: if the database was created before the `embedding` column was added, run:

```bash
mysql -u root -p code_intelligence < sql/migrations/003_add_embedding.sql
```

2. **Python dependencies** (a virtual environment is recommended; the first run downloads model weights, a few hundred MB):

```bash
cd embedding-service
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

3. **Start the embedding service** (default `127.0.0.1:8765`):

```bash
npm run embedding:dev
```

4. Add `EMBEDDING_SERVICE_URL=http://127.0.0.1:8765` to **`.env`**, then run **`npm run index`** or the MCP **`reindex`** tool (`dryRun=false`) to write vectors. Without the URL, behavior is the same as in Phase 2 and `embedding` is not written.

5. **`search_symbols`**: pass `semantic: true` for natural-language search; `limit` is optional (default 20). Results include `semanticSimilarity` (cosine similarity). The current implementation takes up to 3000 candidates that have vectors, ordered by `usage_count`, and then re-ranks them; for very large repositories, switch to ANN.

Environment variable **`EMBEDDING_MODEL`** (Python only): overrides the default `all-MiniLM-L6-v2`.

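The re-ranking step described in item 5 amounts to computing cosine similarity between the query vector and each candidate vector and sorting by it. The sketch below shows that idea on toy 3-dimensional vectors; it is not the package's `vectorMath.js` or `searchSymbols.js`, and `rankBySimilarity` is a hypothetical name.

```javascript
// Hypothetical sketch of cosine re-ranking; not the package's vectorMath.js.
function cosineSimilarity(a, b) {
  if (a.length !== b.length || a.length === 0) return 0;
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Score every candidate with a matching dimension and keep the top `limit`.
function rankBySimilarity(queryVec, candidates, limit = 20) {
  return candidates
    .filter((c) => c.embedding.length === queryVec.length)
    .map((c) => ({ name: c.name, semanticSimilarity: cosineSimilarity(queryVec, c.embedding) }))
    .sort((x, y) => y.semanticSimilarity - x.semanticSimilarity)
    .slice(0, limit);
}

// Toy vectors; real sentence embeddings have hundreds of dimensions.
const semanticHits = rankBySimilarity(
  [1, 0, 0],
  [
    { name: 'closeMatch', embedding: [0.9, 0.1, 0] },
    { name: 'orthogonal', embedding: [0, 1, 0] },
    { name: 'wrongDimension', embedding: [1, 0] },
  ],
);
```

Scanning every candidate is linear in the candidate count, which is why the text above caps candidates at 3000 and suggests ANN for larger repositories.
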
## 7) VS Code migration

For migration steps, see `docs/vscode-mcp-migration.md`.

# Usage

Run with:

```bash
npx code-intelligence-mcp
```

---

### MCP configuration (core)

```json
{
  "mcpServers": {
    "code-intelligence": {
      "command": "npx",
      "args": ["code-intelligence-mcp"]
    }
  }
}
```

---

### Supported Tools and Prompts

#### Tools

- search_symbols
- get_symbol_detail
- search_by_structure
- recommend_component
- reindex

#### Prompts

- recommend-component
- reusable-code-advisor

package/dist/cli/detect-duplicates.js
ADDED
@@ -0,0 +1,349 @@
#!/usr/bin/env node
/**
 * Duplicate-implementation detection for PR/CI (minimal version)
 *
 * Strategy:
 * - Analyze only the indexable exported code blocks in the changed files
 * - Match them by semantic (cosine) similarity against existing code blocks of the same type in the database
 * - For components: require newProps to be a superset of oldProps (or at least to cover most of them) before flagging as "duplicate / mergeable"
 *
 * Output:
 * - duplicate-report.json
 * - duplicate-report.md (in Chinese, suitable for PR comments)
 */
import { readFileSync, writeFileSync } from 'node:fs';
import { resolve } from 'node:path';
import dotenv from 'dotenv';
import { env, validateEnv } from '../config/env.js';
import { getMySqlPool } from '../db/mysql.js';
import { indexedRowToEmbedText } from '../indexer/embedText.js';
import { indexProject } from '../indexer/indexProject.js';
import { createEmbeddingClient, embedAll } from '../services/embeddingClient.js';
import { cosineSimilarity } from '../services/vectorMath.js';
dotenv.config();
// Parse `--key value` and bare `--flag` CLI arguments into a Map (a bare flag maps to 'true').
function parseArgs(argv) {
    const args = new Map();
    for (let i = 0; i < argv.length; i++) {
        const a = argv[i];
        if (!a.startsWith('--'))
            continue;
        const key = a.slice(2);
        const value = argv[i + 1] && !argv[i + 1].startsWith('--') ? argv[i + 1] : 'true';
        args.set(key, value);
    }
    return args;
}
// Read a text file and return its trimmed, non-empty lines.
function readLines(path) {
    const raw = readFileSync(path, 'utf8');
    return raw
        .split(/\r?\n/g)
        .map((l) => l.trim())
        .filter(Boolean);
}
// De-duplicate case-insensitively, keeping the first-seen spelling of each item.
function uniqueLower(items) {
    const out = [];
    const seen = new Set();
    for (const it of items) {
        const k = it.trim();
        if (!k)
            continue;
        const low = k.toLowerCase();
        if (seen.has(low))
            continue;
        seen.add(low);
        out.push(k);
    }
    return out;
}
// Return meta[key] as an array of strings, or [] when it is missing or not an array.
function getMetaArray(meta, key) {
    if (!meta)
        return [];
    const v = meta[key];
    if (!Array.isArray(v))
        return [];
    return v.filter((x) => typeof x === 'string');
}
// Props only apply to components; every other symbol type yields [].
function propsForRow(row) {
    if (row.type !== 'component')
        return [];
    return uniqueLower(getMetaArray(row.meta, 'props'));
}
// Fraction of oldProps present in newProps (case-insensitive), and whether all of them are present.
function coverageAndSuperset(newProps, oldProps) {
    if (oldProps.length === 0) {
        return { coverage: 1, isSuperset: true };
    }
    const newSet = new Set(newProps.map((p) => p.toLowerCase()));
    let hit = 0;
    for (const p of oldProps) {
        if (newSet.has(p.toLowerCase()))
            hit += 1;
    }
    const coverage = hit / oldProps.length;
    const isSuperset = hit === oldProps.length;
    return { coverage, isSuperset };
}
// Round to four decimal places.
function toFixed4(n) {
    return Number(n.toFixed(4));
}
// Escape pipe characters so the text is safe inside a Markdown table cell.
function escapeMd(text) {
    return text.replace(/\|/g, '\\|');
}
async function main() {
    const args = parseArgs(process.argv.slice(2));
    const changedFilesPath = args.get('changed-files') ?? 'changed_files.txt';
    const outJson = args.get('out-json') ?? 'duplicate-report.json';
    const outMd = args.get('out-md') ?? 'duplicate-report.md';
    const blockThreshold = Number(args.get('block-threshold') ??
        process.env.DUPLICATE_BLOCK_THRESHOLD ??
        '0.95');
    const warnThreshold = Number(args.get('warn-threshold') ??
        process.env.DUPLICATE_WARN_THRESHOLD ??
        '0.85');
    const propsCoverageThreshold = Number(args.get('props-coverage') ?? '1');
    const candidateLimit = Number(args.get('candidate-limit') ?? '3000');
    // Mock mode: do not connect to real services; only exercise the report-generation flow
    const isMockMode = args.get('is-mock-mode') === 'true';
    if (isMockMode) {
        console.log('[duplicate-check] 🔧 Mock 模式:跳过 MySQL 和 embedding service');
    }
    else {
        validateEnv();
        const pool = getMySqlPool();
        if (!pool || !env.mysqlEnabled) {
            throw new Error('duplicate-check 需要 MYSQL_ENABLED=true 并可连接 MySQL。');
        }
        if (!env.embeddingServiceUrl) {
            throw new Error('duplicate-check 需要 EMBEDDING_SERVICE_URL(embedding service)。');
        }
    }
    // Type narrowing for TS (outside mock mode, the pool is guaranteed non-null after the guards above)
    const mysqlPool = isMockMode ? null : getMySqlPool();
    const projectRoot = resolve(process.cwd());
    // Keep only .ts/.tsx files outside node_modules and dist
    const changed = readLines(changedFilesPath)
        .filter((p) => p.endsWith('.ts') || p.endsWith('.tsx'))
        .filter((p) => !p.includes('/node_modules/') && !p.includes('/dist/'));
    if (changed.length === 0) {
        // Nothing indexable changed: write an empty report and exit successfully
        const empty = {
            ok: true,
            blockingCount: 0,
            warningCount: 0,
            maxSimilarity: 0,
            findings: [],
            note: '本次 PR 未包含可索引的 .ts/.tsx 变更文件。',
        };
        writeFileSync(outJson, JSON.stringify(empty, null, 2));
        writeFileSync(outMd, '## 重复实现检测(CI)\n\n本次 PR 未包含可索引的 `.ts/.tsx` 变更文件。\n');
        return;
    }
    // 1) Parse only the changed files (by passing absolute paths as indexProject.globPatterns)
    const absPatterns = changed.map((p) => resolve(projectRoot, p).replace(/\\/g, '/'));
    const rows = await indexProject({ projectRoot, globPatterns: absPatterns });
    if (rows.length === 0) {
        // No indexable exported code blocks were extracted: write an empty report and exit successfully
        const empty = {
            ok: true,
            blockingCount: 0,
            warningCount: 0,
            maxSimilarity: 0,
            findings: [],
            note: '变更文件中未抽取到可索引导出代码块。',
        };
        writeFileSync(outJson, JSON.stringify(empty, null, 2));
        writeFileSync(outMd, '## 重复实现检测(CI)\n\n变更文件中未抽取到可索引导出代码块。\n');
        return;
    }
    // 2) Compute embeddings for the changed code blocks (batched), or use mock vectors
    let vecs;
    let client = null;
    if (isMockMode) {
        // Mock mode: generate random 1024-dimensional vectors (intended to match the embedding service)
        vecs = rows.map(() => Array.from({ length: 1024 }, () => Math.random() * 2 - 1));
    }
    else {
        client = createEmbeddingClient(env.embeddingServiceUrl);
        const texts = rows.map(indexedRowToEmbedText);
        vecs = await embedAll(client, texts);
    }
    // Load existing symbols of the given type that already have an embedding, most used first.
    async function loadCandidates(type) {
        if (isMockMode) {
            // Mock mode: return no candidates, simulating a "no duplicates" result
            return [];
        }
        if (!mysqlPool)
            return [];
        const [dbRows] = await mysqlPool.query(`
      SELECT id, name, type, path, CAST(meta AS CHAR) AS meta, embedding
      FROM symbols
      WHERE type = ? AND embedding IS NOT NULL
      ORDER BY usage_count DESC
      LIMIT ?
      `, [type, candidateLimit]);
        const out = [];
        for (const r of dbRows) {
            // meta is stored as JSON text; treat unparsable values as missing
            let meta = null;
            try {
                meta = r.meta ? JSON.parse(r.meta) : null;
            }
            catch {
                meta = null;
            }
            // The embedding may arrive as a JSON string or as an already-parsed array
            let emb = null;
            try {
                const parsed = typeof r.embedding === 'string'
                    ? JSON.parse(r.embedding)
                    : r.embedding;
                if (Array.isArray(parsed)) {
                    const nums = parsed.map((x) => Number(x));
                    if (nums.every((n) => Number.isFinite(n)))
                        emb = nums;
                }
            }
            catch {
                emb = null;
            }
            // Skip rows without a usable numeric vector
            if (!emb)
                continue;
            out.push({
                id: Number(r.id),
                name: String(r.name),
                type: r.type,
                path: String(r.path),
                meta,
                embedding: emb,
            });
        }
        return out;
    }
    // Cache candidates per symbol type so each type is queried at most once
    const candidatesByType = new Map();
    async function getCandidates(type) {
        const cached = candidatesByType.get(type);
        if (cached)
            return cached;
        const loaded = await loadCandidates(type);
        candidatesByType.set(type, loaded);
        return loaded;
    }
    const findings = [];
    let maxSimilarity = 0;
    for (let i = 0; i < rows.length; i++) {
        const row = rows[i];
        const qv = vecs[i];
        const cand = await getCandidates(row.type);
        // Find the most similar existing symbol of the same type
        let best = null;
        for (const c of cand) {
            if (c.embedding.length !== qv.length)
                continue;
            const sim = cosineSimilarity(qv, c.embedding);
            if (!best || sim > best.sim)
                best = { c, sim };
        }
        if (!best)
            continue;
        const sim = best.sim;
        if (sim > maxSimilarity)
            maxSimilarity = sim;
        // Props superset check: applies to components only; other types use semantic similarity alone.
        const newProps = propsForRow(row);
        const oldProps = best.c.type === 'component'
            ? uniqueLower(getMetaArray(best.c.meta, 'props'))
            : [];
        const { coverage, isSuperset } = row.type === 'component'
            ? coverageAndSuperset(newProps, oldProps)
            : { coverage: 1, isSuperset: true };
        const propsOk = row.type !== 'component'
            ? true
            : coverage >= propsCoverageThreshold;
        // blocking: sim >= blockThreshold; warning: sim >= warnThreshold; both also require propsOk and isSuperset
        const level = sim >= blockThreshold && propsOk && isSuperset
            ? 'blocking'
            : sim >= warnThreshold && propsOk && isSuperset
                ? 'warning'
                : null;
        if (!level)
            continue;
        findings.push({
            level,
            symbol: {
                name: row.name,
                type: row.type,
                path: row.path,
                props: newProps,
            },
            bestMatch: {
                id: best.c.id,
                name: best.c.name,
                type: best.c.type,
                path: best.c.path,
                similarity: toFixed4(sim),
                props: oldProps,
                propsCoverage: toFixed4(coverage),
                propsIsSuperset: isSuperset,
            },
        });
    }
    const blockingCount = findings.filter((f) => f.level === 'blocking').length;
    const warningCount = findings.filter((f) => f.level === 'warning').length;
    // JSON report (the workflow reads blockingCount/maxSimilarity from it)
    const report = {
        ok: true,
        mockMode: isMockMode,
        blockingCount,
        warningCount,
        maxSimilarity: toFixed4(maxSimilarity),
        thresholds: {
            blockThreshold,
            warnThreshold,
            propsCoverageThreshold,
            candidateLimit,
        },
        changedFiles: changed,
        extractedSymbols: rows.map((r) => ({
            name: r.name,
            type: r.type,
            path: r.path,
        })),
        findings,
    };
    writeFileSync(outJson, JSON.stringify(report, null, 2));
    // Chinese-language Markdown report (for PR comments)
    const lines = [];
    lines.push('## 重复实现检测(CI)'); // heading: "Duplicate implementation check (CI)"
    if (isMockMode) {
        lines.push('');
        // Mock-mode notice: no real MySQL/embedding service was used, so results are for reference only
        lines.push('> ⚠️ **Mock 模式**:本次检测未连接真实 MySQL/embedding service,结果仅供参考。');
    }
    lines.push('');
    // Threshold summary: blocking and warning both require similarity >= threshold AND a props superset
    lines.push(`- 阻断(blocking)阈值:语义相似度 ≥ **${blockThreshold}** 且 **props 超集**`);
    lines.push(`- 告警(warning)阈值:语义相似度 ≥ **${warnThreshold}** 且 **props 超集**`);
    if (propsCoverageThreshold < 1) {
        // Props coverage threshold (coverage is still checked while a superset is required)
        lines.push(`- props 覆盖阈值:覆盖率 ≥ **${propsCoverageThreshold}**(当前要求超集时仍会校验覆盖率)`);
    }
    lines.push('');
    if (findings.length === 0) {
        lines.push('未发现需要提示的重复实现候选。'); // "No duplicate-implementation candidates found."
        lines.push('');
    }
    else {
        // Summary line: number of blocking and warning findings
        lines.push(`本次检测发现:**阻断 ${blockingCount}** 条,**告警 ${warningCount}** 条。`);
        lines.push('');
        // Columns: level | new/changed code block | type | most similar existing block | similarity | props superset | props coverage
        lines.push('| 级别 | 新增/改动代码块 | 类型 | 最相似存量 | 相似度 | props 超集 | props 覆盖率 |');
        lines.push('|---|---|---|---|---:|---:|---:|');
        for (const f of findings) {
            const newRef = `${escapeMd(f.symbol.name)} \\(${escapeMd(f.symbol.path)}\\)`;
            const oldRef = `${escapeMd(f.bestMatch.name)} \\(${escapeMd(f.bestMatch.path)}\\)`;
            lines.push(`| ${f.level === 'blocking' ? '阻断' : '告警'} | ${newRef} | ${f.symbol.type} | ${oldRef} | ${f.bestMatch.similarity} | ${f.bestMatch.propsIsSuperset ? '是' : '否'} | ${f.bestMatch.propsCoverage} |`);
        }
        lines.push('');
        // Handling suggestions: prefer reusing/extending the existing component; if a new one is
        // truly needed, explain why in the PR description and have the Owner approve it
        lines.push('### 处理建议');
        lines.push('- **优先复用/扩展存量组件**:如果只是新增少量属性,建议把属性合并到存量组件并统一出口。');
        lines.push('- **若确需新建**:请在 PR 描述中说明为什么不能复用(领域差异、历史包袱、兼容性约束等),并由 Owner 审核通过。');
        lines.push('');
    }
    writeFileSync(outMd, lines.join('\n'));
    // Exit code: non-zero when there are blocking findings (so the workflow fails)
    if (blockingCount > 0) {
        process.exit(1);
    }
}
main().catch((err) => {
    console.error('[duplicate-check] failed:', err);
    process.exit(2);
});
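The level decision inside `main()` of `detect-duplicates.js` can be restated as a small pure function, which makes the interaction between the similarity thresholds and the props check easier to see. This is a condensed sketch rather than code from the package: `classifyFinding` is a hypothetical name, and `coverageAndSuperset` is repeated here only so the block is self-contained.

```javascript
// Condensed restatement of the blocking/warning rule; classifyFinding is a hypothetical name.
function coverageAndSuperset(newProps, oldProps) {
  if (oldProps.length === 0) return { coverage: 1, isSuperset: true };
  const newSet = new Set(newProps.map((p) => p.toLowerCase()));
  const hit = oldProps.filter((p) => newSet.has(p.toLowerCase())).length;
  return { coverage: hit / oldProps.length, isSuperset: hit === oldProps.length };
}

function classifyFinding(
  { sim, isComponent, newProps = [], oldProps = [] },
  { blockThreshold = 0.95, warnThreshold = 0.85, propsCoverageThreshold = 1 } = {},
) {
  // Non-components skip the props check entirely.
  const { coverage, isSuperset } = isComponent
    ? coverageAndSuperset(newProps, oldProps)
    : { coverage: 1, isSuperset: true };
  const propsOk = !isComponent || coverage >= propsCoverageThreshold;
  // Both levels require the props conditions; only the similarity threshold differs.
  if (!(propsOk && isSuperset)) return null;
  if (sim >= blockThreshold) return 'blocking';
  if (sim >= warnThreshold) return 'warning';
  return null;
}

const decisions = [
  classifyFinding({ sim: 0.97, isComponent: true, newProps: ['value', 'onChange', 'size'], oldProps: ['value', 'onChange'] }),
  classifyFinding({ sim: 0.9, isComponent: true, newProps: ['value', 'onChange'], oldProps: ['value', 'onChange'] }),
  classifyFinding({ sim: 0.97, isComponent: true, newProps: ['value'], oldProps: ['value', 'onChange'] }),
  classifyFinding({ sim: 0.5, isComponent: false }),
];
```

Because `isSuperset` is required at both levels, lowering `propsCoverageThreshold` below 1 does not by itself let a component with partial coverage be flagged; the third case above stays `null` for that reason.
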
package/dist/cli/index-codebase.js
ADDED
@@ -0,0 +1,43 @@
#!/usr/bin/env node
/**
 * Phase 2 CLI: scan the codebase and write to the MySQL `symbols` table (requires `MYSQL_ENABLED=true`).
 */
import { resolve } from "node:path";
import dotenv from "dotenv";
import { runReindex } from "../services/reindex.js";
dotenv.config();
/**
 * Entry point: validate the environment → connection pool → call `indexProject` according to `INDEX_*` → `upsertSymbols`.
 * Progress and statistics are written to **stderr** to keep stdout free (safer when used alongside MCP).
 * Process exit code: `0` on success, `1` when MySQL is unavailable or an error occurs.
 */
async function main() {
    const projectRoot = resolve(process.env.INDEX_ROOT ?? process.cwd());
    // INDEX_GLOB and INDEX_IGNORE are comma-separated lists; undefined means "use the defaults"
    const globPatterns = process.env.INDEX_GLOB
        ? process.env.INDEX_GLOB.split(",").map((s) => s.trim())
        : undefined;
    const ignore = process.env.INDEX_IGNORE
        ? process.env.INDEX_IGNORE.split(",").map((s) => s.trim())
        : undefined;
    console.error(`[index] projectRoot=${projectRoot}`);
    const result = await runReindex({ projectRoot, globPatterns, ignore, dryRun: false });
    console.error(`[index] extracted ${result.extractedCount} symbol(s)`);
    console.error(`[index] embeddings computed: ${result.embeddingsComputed}`);
    console.error("[index] upserted into MySQL");
}
main().catch((err) => {
    console.error("[index] failed:", err);
    const anyErr = err;
    if (anyErr.code === "ECONNREFUSED") {
        const host = process.env.MYSQL_HOST ?? "127.0.0.1";
        const port = process.env.MYSQL_PORT ?? "3306";
        // Hint: connection refused; start MySQL/MariaDB locally or fix MYSQL_HOST / MYSQL_PORT in .env
        console.error(`[index] 原因: 无法连接 ${host}:${port}(连接被拒绝)。请先在本机启动 MySQL/MariaDB,或把 .env 里的 MYSQL_HOST / MYSQL_PORT 改成实际地址。macOS 可用 brew services start mysql 等方式启动。`);
    }
    else if (anyErr.code === "ER_ACCESS_DENIED_ERROR") {
        // Hint: wrong user name or password; check MYSQL_USER / MYSQL_PASSWORD
        console.error("[index] 原因: 用户名或密码错误,请检查 MYSQL_USER / MYSQL_PASSWORD。");
    }
    else if (anyErr.code === "ENOTFOUND" || anyErr.code === "ETIMEDOUT") {
        // Hint: network unreachable or timed out; check that MYSQL_HOST resolves, plus firewall and security-group rules
        console.error("[index] 原因: 网络不可达或超时,请检查 MYSQL_HOST 是否可解析、防火墙与安全组。");
    }
    process.exit(1);
});
package/dist/config/env.js
ADDED
@@ -0,0 +1,23 @@
import dotenv from "dotenv";
dotenv.config();
// Variables that must be set explicitly once MYSQL_ENABLED=true
const requiredWhenEnabled = ["MYSQL_HOST", "MYSQL_USER", "MYSQL_DATABASE"];
export const env = {
    mysqlEnabled: process.env.MYSQL_ENABLED === "true",
    mysqlHost: process.env.MYSQL_HOST ?? "127.0.0.1",
    mysqlPort: Number(process.env.MYSQL_PORT ?? "3306"),
    mysqlUser: process.env.MYSQL_USER ?? "root",
    mysqlPassword: process.env.MYSQL_PASSWORD ?? "",
    mysqlDatabase: process.env.MYSQL_DATABASE ?? "code_intelligence",
    /** Phase 5: root URL of the Python FastAPI embedding service, e.g. http://127.0.0.1:8765 */
    embeddingServiceUrl: (process.env.EMBEDDING_SERVICE_URL ?? "").trim()
};
// Throw on the first required variable that is missing; a no-op when MySQL is disabled.
export function validateEnv() {
    if (!env.mysqlEnabled) {
        return;
    }
    for (const key of requiredWhenEnabled) {
        if (!process.env[key]) {
            throw new Error(`Missing environment variable: ${key}`);
        }
    }
}
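The validation rule in `env.js` can be exercised without touching `process.env` by passing the variables in as a plain object. The sketch below is a variant for illustration only: `findMissingMysqlKeys` is a hypothetical name, and unlike `validateEnv()`, which throws on the first missing key, it returns every missing key.

```javascript
// Hypothetical variant of the env.js check that takes the variables as an argument.
const REQUIRED_WHEN_ENABLED = ['MYSQL_HOST', 'MYSQL_USER', 'MYSQL_DATABASE'];

// Returns the required keys that are unset or empty; always [] when MySQL is disabled.
function findMissingMysqlKeys(source) {
  if (source.MYSQL_ENABLED !== 'true') return [];
  return REQUIRED_WHEN_ENABLED.filter((key) => !source[key]);
}

const disabledResult = findMissingMysqlKeys({});
const partialResult = findMissingMysqlKeys({ MYSQL_ENABLED: 'true', MYSQL_HOST: '127.0.0.1' });
const completeResult = findMissingMysqlKeys({
  MYSQL_ENABLED: 'true',
  MYSQL_HOST: '127.0.0.1',
  MYSQL_USER: 'root',
  MYSQL_DATABASE: 'code_intelligence',
});
```

Note that the `env` object supplies defaults such as `127.0.0.1` and `root`, yet the check looks at the raw variables, so with `MYSQL_ENABLED=true` these three must be set explicitly even when the defaults would have been acceptable.
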