@gulu9527/code-trust 0.2.1 → 0.3.0
- package/README-CN.md +256 -0
- package/README.md +42 -4
- package/dist/cli/index.js +337 -122
- package/dist/cli/index.js.map +1 -1
- package/dist/index.d.ts +46 -2
- package/dist/index.js +291 -108
- package/dist/index.js.map +1 -1
- package/docs/codetrust-deep-research-report-zh-en.md +802 -0
- package/package.json +6 -2
@@ -0,0 +1,802 @@

# CodeTrust 深度调研报告 / CodeTrust Deep Research Report

> 版本 / Version: v1
> 日期 / Date: 2026-03-11
> 形式 / Format: 中文主文 + 英文译文
> 研究范围 / Scope: CodeTrust 产品方向、CI 接入策略、baseline/suppression/SARIF/PR summary 最佳实践

---
## 1. 执行摘要 / Executive Summary

### 中文

CodeTrust 已经不再是“想法验证”阶段,而是进入了“第一轮收敛”阶段。当前仓库已经具备一套可运行的产品骨架:CLI 入口、核心扫描引擎、规则引擎、评分逻辑、自动修复骨架、规则测试,以及 GitHub Action 初版。这说明产品方向已经成立,下一阶段的关键不再是继续增加规则数量,而是把工具从“会扫描、会打分”升级为“可以稳定进入 CI 的 trust gate”。

这轮调研后的核心判断是:**CodeTrust 下一阶段最关键的升级,不是 score model 更复杂,而是建立完整的 finding lifecycle。**

也就是说,CodeTrust 需要从下面这个旧模型:

- scanner + score

升级为这个新模型:

- finding identity
- finding lifecycle
- policy engine
- delivery channels

只有这四层建立起来,CodeTrust 才能真正支撑以下场景:

- 只拦截新增问题,而不是历史遗留问题
- 正确区分“文件未扫描”和“问题被压制”
- 在 PR、CLI、SARIF 中保持一致的结果语义
- 让 CI 决策有解释性,而不是只靠一个分数

本报告的最终建议是:**把 P0 重心从“继续扩规则/优化分数”转移到“稳定 finding 指纹、baseline 比对、suppression 语义、tool health 可见性”。**

### English Translation

CodeTrust is no longer in the “idea validation” stage. It has entered its first consolidation phase. The repository already contains a runnable product skeleton: CLI entrypoints, a core scan engine, a rule engine, scoring logic, an autofix foundation, rule-level tests, and an initial GitHub Action. This means the direction is already validated. The next step is not to keep adding more rules, but to turn the tool from “something that scans and scores” into “a trust gate that can reliably live inside CI.”

The central conclusion from this research is: **the most important upgrade for CodeTrust is not a smarter scoring model, but a complete finding lifecycle.**

In other words, CodeTrust needs to evolve from this old model:

- scanner + score

into this new model:

- finding identity
- finding lifecycle
- policy engine
- delivery channels

Only when these four layers exist can CodeTrust truly support the following workflows:

- block only newly introduced issues instead of legacy debt
- correctly distinguish “not scanned” from “suppressed”
- keep result semantics consistent across CLI, PR, and SARIF
- make CI decisions explainable instead of relying on a single score

The final recommendation of this report is: **move P0 focus away from “more rules / better scores” and toward stable finding fingerprints, baseline comparison, suppression semantics, and tool-health visibility.**

---
## 2. 调研方法 / Research Method

### 中文

本报告结合了两类输入:

1. **仓库现状判断**
   - 结合当前 CodeTrust 代码结构与命令面,评估其所处阶段与短板。
2. **Exa 定向调研**
   - 调研对象主要包括:
     - Semgrep:diff-aware scan、finding lifecycle、ignore/suppression、blocking 行为
     - GitHub Code Scanning:SARIF 支持子集、上传限制、结果去重、身份字段
     - Snyk:ignore / policy file 模型
     - reviewdog:PR annotations、changed-only 工作流

本次调研关注的不是“谁功能更多”,而是“成熟工具在 CI 接入、误报控制、问题生命周期管理上是怎么设计的”。

### English Translation

This report combines two inputs:

1. **Repository assessment**
   - Reviewing the current CodeTrust structure and command surface to understand its maturity and its main gaps.
2. **Directed Exa research**
   - The main systems examined were:
     - Semgrep: diff-aware scan, finding lifecycle, ignore/suppression, blocking behavior
     - GitHub Code Scanning: SARIF subset support, upload limits, deduplication, identity fields
     - Snyk: ignore / policy file model
     - reviewdog: PR annotations and changed-only workflows

The goal of this research was not to compare feature counts, but to understand how mature tools design CI integration, false-positive control, and finding lifecycle management.

---
## 3. 当前阶段判断 / Current Stage Assessment

### 中文

CodeTrust 的现状可以概括为:**产品骨架已经成立,可信度体系开始工程化,但 finding lifecycle / suppression / delivery layer 仍未完成。**

目前已经具备:

- CLI 入口与多命令表面
- 核心扫描引擎
- builtin rules 管理机制
- severity + dimension weight 的评分模型
- 自动修复骨架
- 规则单测
- 初版 GitHub Action
- 稳定 issue fingerprint 输出
- `toolHealth` 可见性字段
- 一轮面向 self-scan / CI trust gate 的安全规则误报收敛

但下一阶段最大的缺口不在“还缺几条规则”,而在于以下三件事:

1. **finding lifecycle 尚未落地**:虽然已有稳定指纹,但还没有 baseline/new/existing/fixed/suppressed 生命周期模型。
2. **suppression 语义缺失**:没有正式的 suppression 模型,就很难管理误报和可接受风险。
3. **delivery layer 不完整**:CLI、PR summary、annotation、SARIF 的职责还没有清晰分层。

### English Translation

CodeTrust’s current state can be summarized as: **the product skeleton exists, and parts of the trust system are now engineered, but finding lifecycle, suppression, and delivery are still incomplete.**

It already has:

- a CLI surface with multiple commands
- a core scan engine
- builtin rule management
- a severity + dimension-weight scoring model
- an autofix foundation
- rule-level tests
- an initial GitHub Action
- stable issue fingerprint output
- visible `toolHealth` metadata
- an initial round of false-positive reduction for self-scan / CI trust-gate scenarios

But the biggest gap is not “a few missing rules.” It is the absence of three key capabilities:

1. **Finding lifecycle is not implemented yet**: stable fingerprints now exist, but there is still no baseline/new/existing/fixed/suppressed lifecycle.
2. **Missing suppression semantics**: without a formal suppression model, false positives and acceptable risk cannot be managed well.
3. **Incomplete delivery layer**: CLI, PR summary, annotations, and SARIF do not yet have clearly separated responsibilities.

---
## 4. 核心研究发现 / Core Research Findings

### 4.1 Baseline 不是附加功能,而是核心数据模型 / Baseline Is a Core Data Model, Not an Optional Feature

#### 中文

Semgrep 的 diff-aware scan 不是简单“只看改动文件”,而是通过 finding identity 跟踪问题生命周期。它关注的不是某条告警的行号,而是更稳定的识别元素,例如规则 ID、文件路径、语义上下文和重复索引。

这对 CodeTrust 的直接启发是:

- baseline 不能只是“本次 JSON 与上次 JSON 做字符串对比”
- 必须先定义稳定 fingerprint
- baseline 设计应该早于 score model v2

否则会出现:

- 行号变化导致老问题变成新问题
- 重排代码引发误报回潮
- CI gate 噪音过高

#### English Translation

Semgrep’s diff-aware scan is not simply “check changed files only.” It tracks the lifecycle of a finding through finding identity. The goal is not to match alerts by line number, but by more stable elements such as rule ID, file path, semantic context, and occurrence index.

The direct implication for CodeTrust is:

- baseline cannot be implemented as a raw string comparison between this run's JSON and the previous run's JSON
- a stable fingerprint must come first
- baseline should be prioritized before score model v2

Without this, you will get:

- old findings being treated as new because line numbers moved
- noisy regressions caused by code reordering
- an overly noisy CI gate
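The fingerprint idea above can be sketched in TypeScript as follows. This is a minimal illustration, assuming the four inputs this report proposes (rule ID, normalized file path, context hash, occurrence index). The `RawFinding` shape, the helper names, and the FNV-1a hash are assumptions chosen to keep the example free of imports; they are not CodeTrust's actual implementation.

```typescript
// Hypothetical finding shape, for illustration only.
interface RawFinding {
  ruleId: string;
  filePath: string;
  line: number; // intentionally NOT part of the fingerprint
  snippet: string; // source text the rule matched
}

// FNV-1a (32-bit): a small non-cryptographic hash, used here to avoid imports.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("0000000" + hash.toString(16)).slice(-8);
}

// Use forward slashes and drop a leading "./" so paths compare equal across platforms.
function normalizePath(filePath: string): string {
  return filePath.replace(/\\/g, "/").replace(/^\.\//, "");
}

// Assumes `findings` arrive in a deterministic order (for example sorted by
// position), so the occurrence index of repeated matches is stable between runs.
function fingerprintAll(findings: RawFinding[]): string[] {
  const seen = new Map<string, number>();
  return findings.map((f) => {
    const contextHash = fnv1a(f.snippet.replace(/\s+/g, " ").trim());
    const base = [f.ruleId, normalizePath(f.filePath), contextHash].join("|");
    const occurrenceIndex = seen.get(base) || 0;
    seen.set(base, occurrenceIndex + 1);
    return fnv1a(base + "|" + occurrenceIndex);
  });
}
```

Because the line number never enters the hash, moving a finding to a different line leaves its fingerprint unchanged, while two identical matches in the same file still get distinct fingerprints through the occurrence index.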
### 4.2 exclude、disable、suppress 必须拆开 / Exclude, Disable, and Suppress Must Be Separate Concepts

#### 中文

成熟工具通常把这三件事分开:

1. **exclude / ignore path**:根本不扫描该目标
2. **disable rule / policy off**:扫描目标,但不启用某条规则
3. **suppress finding**:发现问题后,显式标记为 false positive 或 acceptable risk

CodeTrust 不能把它们都合并成一个“ignore”。

推荐最少暴露三类计数:

- `filesExcluded`
- `rulesDisabled`
- `findingsSuppressed`

如果把它们混为一谈,用户会无法判断:

- 是工具没扫到
- 是规则没开
- 还是问题被人为接受了

#### English Translation

Mature tools usually separate these three concepts:

1. **exclude / ignore path**: do not scan the target at all
2. **disable rule / policy off**: scan the target, but do not run a given rule
3. **suppress finding**: detect the issue, then explicitly mark it as a false positive or acceptable risk

CodeTrust should not collapse all of these into a single “ignore” concept.

At minimum, it should expose three counters:

- `filesExcluded`
- `rulesDisabled`
- `findingsSuppressed`

If these are blended together, users cannot tell whether:

- the tool never scanned the file
- the rule was disabled
- or the finding was explicitly accepted
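The three concepts above act at different stages of a scan, which the following TypeScript sketch tries to make concrete. Only the three counter names come from this report; `ScanConfig`, `Detect`, and `scan` are hypothetical names, and `detect` stands in for a real rule engine.

```typescript
interface ScanConfig {
  excludePrefixes: string[]; // paths that are never scanned
  disabledRules: string[]; // rules that never run
  suppressedFingerprints: string[]; // findings explicitly accepted
}

interface ScanSummary {
  filesExcluded: number;
  rulesDisabled: number;
  findingsSuppressed: number;
  reported: string[];
}

// Stand-in for a rule engine: returns the fingerprints found in one file by one rule.
type Detect = (file: string, rule: string) => string[];

function scan(files: string[], rules: string[], config: ScanConfig, detect: Detect): ScanSummary {
  // 1. exclude: pre-scan filtering, the file is never read.
  const scanned = files.filter((f) => !config.excludePrefixes.some((p) => f.indexOf(p) === 0));
  // 2. disable: the file is scanned, but the rule does not run.
  const enabled = rules.filter((r) => config.disabledRules.indexOf(r) === -1);
  const summary: ScanSummary = {
    filesExcluded: files.length - scanned.length,
    rulesDisabled: rules.length - enabled.length,
    findingsSuppressed: 0,
    reported: [],
  };
  for (const file of scanned) {
    for (const rule of enabled) {
      for (const fingerprint of detect(file, rule)) {
        // 3. suppress: the finding was detected, then explicitly accepted.
        if (config.suppressedFingerprints.indexOf(fingerprint) !== -1) {
          summary.findingsSuppressed++;
        } else {
          summary.reported.push(fingerprint);
        }
      }
    }
  }
  return summary;
}
```

Keeping the three counters separate lets a reader of the summary see why a finding is absent, instead of guessing between "not scanned", "rule off", and "accepted".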
### 4.3 SARIF 是目标平台协议,不是简单导出格式 / SARIF Is a Platform Protocol, Not Just Another Export Format

#### 中文

GitHub 对 SARIF 的支持是受限子集,而不是无条件兼容全部字段。调研中最值得注意的点包括:

- 文件大小限制
- 结果数、规则数、位置数等限制
- 结果去重与稳定身份字段的重要性
- `partialFingerprints` 对结果稳定性的重要价值
- `automationDetails.id` / category 对不同自动化流程的区分作用

因此,CodeTrust 后续做 SARIF 时,不能把现有 JSON 直接映射一层就结束,而应该单独设计 GitHub 兼容模式。

#### English Translation

GitHub supports a restricted subset of SARIF rather than the full model. The most important findings from the research were:

- file size limits
- limits on results, rules, and locations
- the importance of stable identity for deduplication
- the value of `partialFingerprints` for result stability
- the role of `automationDetails.id` / category in distinguishing automation streams

Therefore, when CodeTrust adds SARIF, it should not simply map the current JSON format into SARIF. It should design a GitHub-compatible SARIF mode deliberately.
### 4.4 suppressed finding 进入 SARIF 可能产生错误体验 / Suppressed Findings in SARIF Can Produce Bad UX

#### 中文

Semgrep 的历史讨论说明:即便把 suppressed findings 放进 SARIF 的 `suppressions` 字段,也不代表 GitHub 一定会按用户期待进行展示。现实中,这可能导致“本来已抑制的问题在 GitHub 里仍像开放告警一样出现”。

因此,CodeTrust 的默认策略应当是:

- CLI / JSON:可以保留 suppressed findings,并明确状态
- SARIF:默认不输出 suppressed findings
- 如有需要,再通过显式开关启用

推荐未来参数:

- `--sarif-include-suppressed`

默认值建议为 `false`。

#### English Translation

Semgrep’s history shows that even when suppressed findings are exported through SARIF `suppressions`, GitHub may not present them the way users expect. In practice, this can make previously suppressed issues appear as if they are still open alerts.

Therefore, CodeTrust’s default strategy should be:

- CLI / JSON: keep suppressed findings, but mark them clearly
- SARIF: do not export suppressed findings by default
- offer an explicit opt-in flag if needed

A future option could be:

- `--sarif-include-suppressed`

The default should likely be `false`.
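A rough TypeScript sketch of an exporter that follows this default, together with the identity fields from section 4.3. The property names `automationDetails.id`, `partialFingerprints`, `suppressions`, and the location structure are part of SARIF 2.1.0. Everything else is an assumption made for illustration: the `Finding` shape, the `toSarif` name, the fixed `warning` level, the `external` suppression kind, and the `codetrustFingerprint/v1` key.

```typescript
// Hypothetical internal finding shape.
interface Finding {
  ruleId: string;
  message: string;
  filePath: string;
  startLine: number;
  fingerprint: string;
  suppressed: boolean;
}

interface SarifOptions {
  category: string; // becomes runs[0].automationDetails.id
  includeSuppressed?: boolean; // mirrors the proposed --sarif-include-suppressed flag
}

function toSarif(findings: Finding[], options: SarifOptions) {
  // Default: suppressed findings never reach the SARIF file.
  const exported = findings.filter((f) => options.includeSuppressed === true || !f.suppressed);
  return {
    $schema: "https://json.schemastore.org/sarif-2.1.0.json",
    version: "2.1.0",
    runs: [
      {
        tool: { driver: { name: "CodeTrust" } },
        automationDetails: { id: options.category },
        results: exported.map((f) => ({
          ruleId: f.ruleId,
          level: "warning",
          message: { text: f.message },
          locations: [
            {
              physicalLocation: {
                artifactLocation: { uri: f.filePath },
                region: { startLine: f.startLine },
              },
            },
          ],
          // The key name is a placeholder chosen for this sketch.
          partialFingerprints: { "codetrustFingerprint/v1": f.fingerprint },
          // Only set when suppressed findings were explicitly opted in;
          // JSON.stringify drops the property when it is undefined.
          suppressions: f.suppressed ? [{ kind: "external" }] : undefined,
        })),
      },
    ],
  };
}
```

Calling `JSON.stringify(toSarif(findings, { category: "codetrust/pr" }), null, 2)` would produce the file content to upload; the category value here is only an example.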
### 4.5 PR 集成应该分层,不应该只输出一种结果 / PR Integration Should Be Layered, Not Monolithic

#### 中文

PR 集成至少应该分成三层:

1. **Job Summary**:给人看的决策摘要
2. **Annotations / changed-line comments**:给开发者的精准修复提示
3. **SARIF upload**:给 GitHub Security / 历史跟踪的平台视图

它们的职责分别是:

- Summary 负责解释“为什么这次通过/失败”
- Annotation 负责指出“改哪一行、修什么”
- SARIF 负责“沉淀与平台化消费”

如果只做其中一种,体验会失衡。

#### English Translation

PR integration should have at least three layers:

1. **Job Summary**: a human-readable decision summary
2. **Annotations / changed-line comments**: precise developer-facing remediation hints
3. **SARIF upload**: the platform-facing security and historical view

Their responsibilities are different:

- Summary explains why the run passed or failed
- Annotations explain exactly what to fix and where
- SARIF supports persistence and platform-level consumption

If you build only one of these, the experience becomes unbalanced.
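The first two layers can be sketched as two small pure functions in TypeScript. The `GateResult` and `LineFinding` shapes and both function names are hypothetical. The `::warning file=...,line=...,title=...::message` form is the GitHub Actions workflow-command syntax for annotations.

```typescript
interface GateResult {
  passed: boolean;
  score: number;
  minScore: number;
  newBlocking: number;
  existing: number;
  suppressed: number;
}

interface LineFinding {
  filePath: string;
  line: number;
  ruleId: string;
  message: string;
}

// Layer 1: a Markdown summary that explains the pass/fail decision.
function renderJobSummary(r: GateResult): string {
  return [
    `## CodeTrust: ${r.passed ? "PASSED" : "FAILED"}`,
    "",
    "| Metric | Value |",
    "| --- | --- |",
    `| Score (minimum ${r.minScore}) | ${r.score} |`,
    `| New blocking findings | ${r.newBlocking} |`,
    `| Existing findings | ${r.existing} |`,
    `| Suppressed findings | ${r.suppressed} |`,
  ].join("\n");
}

// Layer 2: one workflow command per finding, pointing at the exact line.
function toAnnotation(f: LineFinding): string {
  // Workflow commands are single-line, so "%", CR, and LF in the message are escaped.
  const escaped = f.message.replace(/%/g, "%25").replace(/\r/g, "%0D").replace(/\n/g, "%0A");
  return `::warning file=${f.filePath},line=${f.line},title=${f.ruleId}::${escaped}`;
}
```

In a GitHub Actions job, the summary string would typically be appended to the file named by the `GITHUB_STEP_SUMMARY` environment variable, and each annotation line printed to standard output. The third layer, SARIF upload, is a separate artifact.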
### 4.6 score model 的优先级低于 lifecycle 与 policy / Score Model Is Lower Priority Than Lifecycle and Policy

#### 中文

调研后最明确的结论之一是:成熟工具真正支撑 adoption 的,通常不是分数公式本身,而是:

- baseline 稳不稳定
- 新旧问题分得清不清楚
- suppression 是否合理
- CI 输出是否可解释
- 误报管理是否可持续

所以 CodeTrust 应该把 gate 设计成“双轨制”:

1. **blocking findings / blocking policy**
2. **score threshold**

也就是说:

- 某些问题 regardless of score 直接 fail
- 其余问题再用 score 评估整体可信度

#### English Translation

One of the clearest conclusions from the research is that adoption is rarely driven by the scoring formula alone. Mature tools succeed because of:

- stable baseline behavior
- clear new-vs-existing distinction
- practical suppression handling
- explainable CI output
- sustainable false-positive control

So CodeTrust should design its gate as a dual-track system:

1. **blocking findings / blocking policy**
2. **score threshold**

In other words:

- some issues should fail regardless of score
- the remaining issues can then contribute to the overall trust score
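A minimal TypeScript sketch of such a dual-track gate, assuming the `off / warn / block` policy modes this report proposes. The `GateFinding` shape, the `evaluateGate` name, and the reason strings are illustrative only.

```typescript
type PolicyMode = "off" | "warn" | "block";

interface GateFinding {
  ruleId: string;
  isNew: boolean; // true when the finding is absent from the baseline
}

interface GateDecision {
  passed: boolean;
  reasons: string[];
}

function evaluateGate(
  findings: GateFinding[],
  policy: { [ruleId: string]: PolicyMode },
  score: number,
  minScore: number,
): GateDecision {
  const reasons: string[] = [];
  // Track 1: blocking policy. A new finding on a "block" rule fails the run
  // no matter how high the score is.
  for (const f of findings) {
    if (f.isNew && policy[f.ruleId] === "block") {
      reasons.push(`new blocking finding: ${f.ruleId}`);
    }
  }
  // Track 2: score threshold, covering everything the policy did not block.
  if (score < minScore) {
    reasons.push(`score ${score} is below minimum ${minScore}`);
  }
  // Returning the reasons keeps the CI decision explainable.
  return { passed: reasons.length === 0, reasons };
}
```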
---

## 5. 设计原则 / Design Principles

### 中文

基于这轮调研,建议 CodeTrust 采用以下产品与工程原则:

1. **先定义 finding identity,再做 baseline 与 SARIF。**
2. **先区分 tool health 和 code risk,再谈评分可信度。**
3. **在 CI 中优先支持 only-new-findings,而不是全量历史问题阻塞。**
4. **suppressions 必须显式、有理由、最好可过期。**
5. **SARIF 默认走 GitHub-safe 策略。**
6. **PR 集成必须同时服务“决策者”和“修复者”。**

### English Translation

Based on this research, CodeTrust should adopt the following product and engineering principles:

1. **Define finding identity before baseline and SARIF.**
2. **Separate tool health from code risk before investing in score credibility.**
3. **In CI, prioritize only-new-findings over blocking on the entire historical backlog.**
4. **Suppressions must be explicit, justified, and ideally expirable.**
5. **Use GitHub-safe defaults for SARIF.**
6. **PR integration must serve both decision-makers and fixers.**

---
## 6. 推荐路线图 / Recommended Roadmap

### 6.1 P0:信任地基 / P0: Trust Foundations

#### 中文

**目标:先解决“工具是否真的可信”这个问题。**

推荐任务:

1. **实现稳定 finding fingerprint**
   - 输入建议:`ruleId + normalizedFilePath + contextHash + occurrenceIndex`
2. **让 include/exclude 真正生效**
   - 严格定义为 pre-scan filtering
3. **让规则失败与扫描异常可见**
   - 输出 `rulesExecuted`、`rulesFailed`、`filesSkipped`、`scanErrors`
4. **固化 JSON schema v1**
   - 明确 `toolHealth` 与 `analysisResult` 分层
5. **收敛 `scan` 与 `report` 的职责边界**
   - `scan` 做即时扫描
   - `report` 做 artifact / baseline / previous result 展示

#### English Translation

**Goal: solve the question of whether the tool itself is trustworthy.**

Recommended tasks:

1. **Implement stable finding fingerprints**
   - Suggested input: `ruleId + normalizedFilePath + contextHash + occurrenceIndex`
2. **Make include/exclude truly effective**
   - Define it strictly as pre-scan filtering
3. **Make rule failures and scan errors visible**
   - Output `rulesExecuted`, `rulesFailed`, `filesSkipped`, `scanErrors`
4. **Freeze JSON schema v1**
   - Separate `toolHealth` and `analysisResult`
5. **Clarify the boundary between `scan` and `report`**
   - `scan` performs live analysis
   - `report` renders artifacts, baseline comparison, or previous results
### 6.2 P1:把工具变成可进 CI 的 trust gate / P1: Turn the Tool into a CI-Ready Trust Gate

#### 中文

**目标:把“可扫描”升级为“可决策”。**

推荐任务:

1. **baseline / lifecycle 比对**
   - 支持 `new / existing / fixed / suppressed`
2. **suppression 模型**
   - 支持 inline、file、rule、config 级 suppression
   - 建议带 `reason`、`source`、`expiresAt`
3. **policy engine**
   - 支持 `off / warn / block`
4. **GitHub Action v2**
   - job summary
   - changed-line annotation
   - json artifact
   - baseline ref 输入
   - fail-on-new-blocking
   - fail-on-score-below

#### English Translation

**Goal: move from “scan-capable” to “decision-capable.”**

Recommended tasks:

1. **baseline / lifecycle comparison**
   - Support `new / existing / fixed / suppressed`
2. **suppression model**
   - Support inline, file-level, rule-level, and config-level suppression
   - Ideally include `reason`, `source`, and `expiresAt`
3. **policy engine**
   - Support `off / warn / block`
4. **GitHub Action v2**
   - job summary
   - changed-line annotations
   - JSON artifact output
   - baseline ref input
   - fail-on-new-blocking
   - fail-on-score-below
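The baseline comparison and suppression tasks above could fit together roughly as in this TypeScript sketch. It assumes findings are already reduced to stable fingerprints. The four lifecycle states and the `reason`, `source`, and `expiresAt` fields come from this report; the `classify` function, its signature, and the choice to treat an expired suppression as inactive are assumptions.

```typescript
type Lifecycle = "new" | "existing" | "fixed" | "suppressed";

interface Suppression {
  fingerprint: string;
  reason: string;
  source: "inline" | "file" | "rule" | "config";
  expiresAt?: string; // ISO 8601 timestamp; omitted means no expiry
}

function classify(
  baseline: string[],
  current: string[],
  suppressions: Suppression[],
  now: Date,
): { [fingerprint: string]: Lifecycle } {
  const active = new Set<string>();
  for (const s of suppressions) {
    // An expired suppression no longer hides the finding.
    if (s.expiresAt === undefined || new Date(s.expiresAt).getTime() > now.getTime()) {
      active.add(s.fingerprint);
    }
  }
  const known = new Set(baseline);
  const present = new Set(current);
  const states: { [fingerprint: string]: Lifecycle } = {};
  for (const fp of current) {
    states[fp] = active.has(fp) ? "suppressed" : known.has(fp) ? "existing" : "new";
  }
  // Anything in the baseline that no longer appears is considered fixed.
  for (const fp of baseline) {
    if (!present.has(fp)) states[fp] = "fixed";
  }
  return states;
}
```

A `fail-on-new-blocking` gate would then only look at fingerprints in the `new` state, which is what keeps legacy findings from blocking a pull request.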
### 6.3 P2:专业化与生态桥接 / P2: Professionalization and Ecosystem Bridge

#### 中文

**目标:提高专业感与外部系统兼容性。**

推荐任务:

1. **GitHub-compatible SARIF exporter**
   - 稳定 `partialFingerprints`
   - 设定 `automationDetails.id`
   - 默认排除 suppressed findings
2. **explain 模式**
   - `codetrust explain <rule-id>`
3. **presets**
   - `recommended`
   - `strict`
   - `ci-gate`
   - `ai-suspicious`
4. **top risk file / top risk dimension / top risk module**

#### English Translation

**Goal: improve professionalism and external system compatibility.**

Recommended tasks:

1. **GitHub-compatible SARIF exporter**
   - stable `partialFingerprints`
   - explicit `automationDetails.id`
   - exclude suppressed findings by default
2. **Explain mode**
   - `codetrust explain <rule-id>`
3. **Presets**
   - `recommended`
   - `strict`
   - `ci-gate`
   - `ai-suspicious`
4. **top risk file / top risk dimension / top risk module**

---
## 7. 建议的输出模型 / Suggested Output Model

### 中文

建议 CodeTrust 的 JSON 输出模型从一开始就分为两部分:

### 7.1 toolHealth

用于说明工具这次“执行得怎么样”:

- `scanMode`
- `rulesExecuted`
- `rulesFailed`
- `filesConsidered`
- `filesExcluded`
- `filesSkipped`
- `scanErrors`
- `durationMs`

### 7.2 analysisResult

用于说明代码“风险长什么样”:

- `overall`
- `dimensions`
- `issues`
- `topRiskFiles`
- `thresholdResult`
- `lifecycleSummary`

这样做的价值在于:

- 用户可以区分“分数低是因为代码差,还是工具没跑完整”
- 后续 PR summary 和 SARIF exporter 也更容易消费

### English Translation

CodeTrust’s JSON output should be separated into two parts from the start:

### 7.1 toolHealth

This explains how well the tool executed:

- `scanMode`
- `rulesExecuted`
- `rulesFailed`
- `filesConsidered`
- `filesExcluded`
- `filesSkipped`
- `scanErrors`
- `durationMs`

### 7.2 analysisResult

This explains what the code risk looks like:

- `overall`
- `dimensions`
- `issues`
- `topRiskFiles`
- `thresholdResult`
- `lifecycleSummary`

The value of this split is:

- users can tell whether a low score comes from bad code or an incomplete scan
- PR summaries and SARIF exporters become much easier to build cleanly
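One possible way to type this split in TypeScript is sketched below. The field names are taken from the two lists above. The value types, the `ScanReport` wrapper, and the `isScanComplete` helper are assumptions made for illustration, since this report does not specify them.

```typescript
// How well the tool executed. Value types are assumed.
interface ToolHealth {
  scanMode: string;
  rulesExecuted: number;
  rulesFailed: number;
  filesConsidered: number;
  filesExcluded: number;
  filesSkipped: number;
  scanErrors: string[];
  durationMs: number;
}

// What the code risk looks like. Value types are assumed.
interface AnalysisResult {
  overall: number;
  dimensions: { [dimension: string]: number };
  issues: { ruleId: string; fingerprint: string }[];
  topRiskFiles: string[];
  thresholdResult: { passed: boolean; minScore: number };
  lifecycleSummary: { [state: string]: number };
}

// Hypothetical top-level report that keeps the two concerns apart.
interface ScanReport {
  toolHealth: ToolHealth;
  analysisResult: AnalysisResult;
}

// A low score is only meaningful when the scan itself ran cleanly, so a
// consumer can check tool health before interpreting the analysis result.
function isScanComplete(health: ToolHealth): boolean {
  return health.rulesFailed === 0 && health.filesSkipped === 0 && health.scanErrors.length === 0;
}
```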
---

## 8. 可直接创建的 GitHub Issues Backlog / GitHub-Issue-Ready Backlog

### 中文

#### P0

1. **feat: add stable finding fingerprint generation**
2. **fix: apply include/exclude filtering before scan execution**
3. **feat: surface rule execution failures in scan metadata**
4. **feat: add strict engine mode for CI**
5. **feat: freeze JSON output schema v1**
6. **refactor: make report artifact-based instead of live scan**

#### P1

7. **feat: implement baseline comparison and finding lifecycle states**
8. **feat: add suppression model with reason and optional expiry**
9. **feat: add policy modes (off/warn/block) per rule/category**
10. **feat: add GitHub Action job summary and changed-line annotations**

#### P2

11. **feat: export GitHub-compatible SARIF with stable fingerprints**
12. **feat: add explain command for rules and findings**
13. **feat: add recommended/strict/ci-gate presets**
14. **feat: report top-risk files and top-risk dimensions**

### English Translation

#### P0

1. **feat: add stable finding fingerprint generation**
2. **fix: apply include/exclude filtering before scan execution**
3. **feat: surface rule execution failures in scan metadata**
4. **feat: add strict engine mode for CI**
5. **feat: freeze JSON output schema v1**
6. **refactor: make report artifact-based instead of live scan**

#### P1

7. **feat: implement baseline comparison and finding lifecycle states**
8. **feat: add suppression model with reason and optional expiry**
9. **feat: add policy modes (off/warn/block) per rule/category**
10. **feat: add GitHub Action job summary and changed-line annotations**

#### P2

11. **feat: export GitHub-compatible SARIF with stable fingerprints**
12. **feat: add explain command for rules and findings**
13. **feat: add recommended/strict/ci-gate presets**
14. **feat: report top-risk files and top-risk dimensions**

---
## 9. 建议的成功指标 / Suggested Success Metrics

### 中文

为了避免路线图只停留在“功能完成”,建议给下一阶段补上结果指标:

1. **规则执行可靠性**
   - `rulesFailed / rulesExecuted` 持续下降
2. **baseline 稳定性**
   - 同一问题在小范围重构后,不应被大量重新识别为 new
3. **CI 可接受性**
   - 团队实际愿意开启 blocking mode
4. **误报管理成本**
   - suppression 的创建与追踪成本可控
5. **PR 可读性**
   - summary 与 annotations 能帮助开发者在一次 review 中完成修复

### English Translation

To avoid a roadmap that only measures “features shipped,” the next phase should also include outcome metrics:

1. **Rule execution reliability**
   - `rulesFailed / rulesExecuted` should trend downward
2. **Baseline stability**
   - the same issue should not frequently reappear as new after small refactors
3. **CI acceptability**
   - teams should actually be willing to enable blocking mode
4. **False-positive management cost**
   - creating and tracking suppressions should remain manageable
5. **PR readability**
   - summaries and annotations should help developers fix issues in a single review cycle

---
## 10. 当前不建议优先投入的方向 / Areas Not Worth Prioritizing Yet

### 中文

在 finding lifecycle 与 CI trust gate 建好之前,不建议把主精力投入到以下方向:

- 多语言支持
- VS Code 插件
- MCP server
- SaaS dashboard
- “AI probability” 或模糊型 AI 检测能力

原因不是这些方向没价值,而是它们都建立在“核心工作流足够可信”之上。

### English Translation

Before finding lifecycle and the CI trust gate are solid, the main effort should not go into:

- multi-language support
- a VS Code extension
- an MCP server
- a SaaS dashboard
- fuzzy “AI probability” style detectors

The reason is not that these are worthless, but that they all depend on a trustworthy core workflow.

---
## 11. 最终结论 / Final Conclusion

### 中文

CodeTrust 当前最重要的升级方向,不是让自己成为“更强的 scanner”,而是让自己成为“更可靠的 decision system”。

一句话总结:

**CodeTrust 的下一阶段,不应再以“规则数量”来定义进度,而应以“finding lifecycle、policy、delivery 是否成立”来定义成熟度。**

如果只能保留一个优先级判断,那就是:

**先把 CodeTrust 做成一个让团队敢放进 CI 的工具,再考虑把它做成一个让人兴奋的工具。**

### English Translation

The most important upgrade for CodeTrust is not to become “a stronger scanner,” but to become “a more reliable decision system.”

In one sentence:

**The next stage of CodeTrust should not be measured by rule count, but by whether finding lifecycle, policy, and delivery are truly in place.**

If there is only one priority judgment to keep, it is this:

**First make CodeTrust something teams are willing to put into CI. Then make it something that excites them.**

---
## 12. 参考资料 / References

### 中文

以下资料为本次 Exa 定向调研中的核心参考:

- Semgrep Findings in CI
  https://semgrep.dev/docs/semgrep-ci/findings-ci
- Semgrep Configure blocking findings
  https://semgrep.dev/docs/semgrep-ci/configuring-blocking-and-errors-in-ci
- Semgrep Ignore files, folders, and code
  https://semgrep.dev/docs/ignoring-files-folders-code
- Semgrep Semgrepignore v2 reference
  https://semgrep.dev/docs/semgrepignore-v2-reference
- GitHub Uploading a SARIF file to GitHub
  https://docs.github.com/en/code-security/code-scanning/integrating-with-code-scanning/uploading-a-sarif-file-to-github
- GitHub SARIF support for code scanning
  https://docs.github.com/en/code-security/reference/code-scanning/sarif-files/sarif-support-for-code-scanning
- GitHub SARIF results exceed one or more limits
  https://docs.github.com/en/code-security/code-scanning/troubleshooting-sarif-uploads/results-exceed-limit
- GitHub SARIF file is too large
  https://docs.github.com/en/code-security/reference/code-scanning/sarif-files/troubleshoot-sarif-uploads/file-too-large
- Snyk Ignore issues
  https://docs.snyk.io/manage-risk/prioritize-issues-for-fixing/ignore-issues
- Snyk The .snyk file
  https://docs.snyk.io/manage-risk/policies/the-.snyk-file
- reviewdog repository
  https://github.com/reviewdog/reviewdog
- Reviewdog filter settings with GitHub Actions
  https://lornajane.net/posts/2024/reviewdog-filter-settings-with-github-actions
- Semgrep PR discussion: suppressed findings in SARIF
  https://github.com/returntocorp/semgrep/pull/3616
- Semgrep issue: SARIF and suppressed findings caveat
  https://github.com/returntocorp/semgrep/issues/7121

### English Translation

The following references were the most important sources used in this directed Exa research:

- Semgrep Findings in CI
  https://semgrep.dev/docs/semgrep-ci/findings-ci
- Semgrep Configure blocking findings
  https://semgrep.dev/docs/semgrep-ci/configuring-blocking-and-errors-in-ci
- Semgrep Ignore files, folders, and code
  https://semgrep.dev/docs/ignoring-files-folders-code
- Semgrep Semgrepignore v2 reference
  https://semgrep.dev/docs/semgrepignore-v2-reference
- GitHub Uploading a SARIF file to GitHub
  https://docs.github.com/en/code-security/code-scanning/integrating-with-code-scanning/uploading-a-sarif-file-to-github
- GitHub SARIF support for code scanning
  https://docs.github.com/en/code-security/reference/code-scanning/sarif-files/sarif-support-for-code-scanning
- GitHub SARIF results exceed one or more limits
  https://docs.github.com/en/code-security/code-scanning/troubleshooting-sarif-uploads/results-exceed-limit
- GitHub SARIF file is too large
  https://docs.github.com/en/code-security/reference/code-scanning/sarif-files/troubleshoot-sarif-uploads/file-too-large
- Snyk Ignore issues
  https://docs.snyk.io/manage-risk/prioritize-issues-for-fixing/ignore-issues
- Snyk The .snyk file
  https://docs.snyk.io/manage-risk/policies/the-.snyk-file
- reviewdog repository
  https://github.com/reviewdog/reviewdog
- Reviewdog filter settings with GitHub Actions
  https://lornajane.net/posts/2024/reviewdog-filter-settings-with-github-actions
- Semgrep PR discussion: suppressed findings in SARIF
  https://github.com/returntocorp/semgrep/pull/3616
- Semgrep issue: SARIF and suppressed findings caveat
  https://github.com/returntocorp/semgrep/issues/7121