ccbot-cli 2.0.0 → 2.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (83)
  1. package/LICENSE +21 -0
  2. package/bin/adapters/claude.js +150 -0
  3. package/bin/adapters/codex.js +439 -0
  4. package/bin/install.js +509 -349
  5. package/bin/lib/ccline.js +82 -0
  6. package/bin/lib/utils.js +87 -34
  7. package/bin/uninstall.js +48 -0
  8. package/config/AGENTS.md +630 -0
  9. package/config/CLAUDE.md +229 -20
  10. package/config/ccline/config.toml +161 -0
  11. package/config/codex-config.example.toml +22 -0
  12. package/config/settings.example.json +32 -0
  13. package/output-styles/abyss-cultivator.md +399 -0
  14. package/package.json +14 -5
  15. package/skills/SKILL.md +159 -0
  16. package/skills/domains/ai/SKILL.md +34 -0
  17. package/skills/domains/ai/agent-dev.md +242 -0
  18. package/skills/domains/ai/llm-security.md +288 -0
  19. package/skills/domains/ai/prompt-and-eval.md +279 -0
  20. package/skills/domains/ai/rag-system.md +542 -0
  21. package/skills/domains/architecture/SKILL.md +42 -0
  22. package/skills/domains/architecture/api-design.md +225 -0
  23. package/skills/domains/architecture/caching.md +299 -0
  24. package/skills/domains/architecture/cloud-native.md +285 -0
  25. package/skills/domains/architecture/message-queue.md +329 -0
  26. package/skills/domains/architecture/security-arch.md +297 -0
  27. package/skills/domains/data-engineering/SKILL.md +207 -0
  28. package/skills/domains/development/SKILL.md +46 -0
  29. package/skills/domains/development/cpp.md +246 -0
  30. package/skills/domains/development/go.md +323 -0
  31. package/skills/domains/development/java.md +277 -0
  32. package/skills/domains/development/python.md +288 -0
  33. package/skills/domains/development/rust.md +313 -0
  34. package/skills/domains/development/shell.md +313 -0
  35. package/skills/domains/development/typescript.md +277 -0
  36. package/skills/domains/devops/SKILL.md +39 -0
  37. package/skills/domains/devops/cost-optimization.md +272 -0
  38. package/skills/domains/devops/database.md +217 -0
  39. package/skills/domains/devops/devsecops.md +198 -0
  40. package/skills/domains/devops/git-workflow.md +181 -0
  41. package/skills/domains/devops/observability.md +280 -0
  42. package/skills/domains/devops/performance.md +336 -0
  43. package/skills/domains/devops/testing.md +283 -0
  44. package/skills/domains/frontend-design/SKILL.md +38 -0
  45. package/skills/domains/frontend-design/claymorphism/SKILL.md +119 -0
  46. package/skills/domains/frontend-design/claymorphism/references/tokens.css +52 -0
  47. package/skills/domains/frontend-design/component-patterns.md +202 -0
  48. package/skills/domains/frontend-design/engineering.md +287 -0
  49. package/skills/domains/frontend-design/glassmorphism/SKILL.md +140 -0
  50. package/skills/domains/frontend-design/glassmorphism/references/tokens.css +32 -0
  51. package/skills/domains/frontend-design/liquid-glass/SKILL.md +137 -0
  52. package/skills/domains/frontend-design/liquid-glass/references/tokens.css +81 -0
  53. package/skills/domains/frontend-design/neubrutalism/SKILL.md +143 -0
  54. package/skills/domains/frontend-design/neubrutalism/references/tokens.css +44 -0
  55. package/skills/domains/frontend-design/state-management.md +680 -0
  56. package/skills/domains/frontend-design/ui-aesthetics.md +110 -0
  57. package/skills/domains/frontend-design/ux-principles.md +156 -0
  58. package/skills/domains/infrastructure/SKILL.md +200 -0
  59. package/skills/domains/mobile/SKILL.md +224 -0
  60. package/skills/domains/orchestration/SKILL.md +29 -0
  61. package/skills/domains/orchestration/multi-agent.md +263 -0
  62. package/skills/domains/security/SKILL.md +54 -0
  63. package/skills/domains/security/blue-team.md +436 -0
  64. package/skills/domains/security/code-audit.md +265 -0
  65. package/skills/domains/security/pentest.md +226 -0
  66. package/skills/domains/security/red-team.md +375 -0
  67. package/skills/domains/security/threat-intel.md +372 -0
  68. package/skills/domains/security/vuln-research.md +369 -0
  69. package/skills/orchestration/multi-agent/SKILL.md +493 -0
  70. package/skills/run_skill.js +129 -0
  71. package/skills/tools/gen-docs/SKILL.md +116 -0
  72. package/skills/tools/gen-docs/scripts/doc_generator.js +435 -0
  73. package/skills/tools/lib/shared.js +98 -0
  74. package/skills/tools/verify-change/SKILL.md +140 -0
  75. package/skills/tools/verify-change/scripts/change_analyzer.js +289 -0
  76. package/skills/tools/verify-module/SKILL.md +127 -0
  77. package/skills/tools/verify-module/scripts/module_scanner.js +171 -0
  78. package/skills/tools/verify-quality/SKILL.md +160 -0
  79. package/skills/tools/verify-quality/scripts/quality_checker.js +337 -0
  80. package/skills/tools/verify-security/SKILL.md +143 -0
  81. package/skills/tools/verify-security/scripts/security_scanner.js +283 -0
  82. package/bin/lib/registry.js +0 -61
  83. package/config/.claudeignore +0 -11
@@ -0,0 +1,279 @@
---
name: prompt-and-eval
description: Prompt engineering and model evaluation. Prompt patterns (Zero-shot, Few-shot, CoT, ReAct, ToT), template design, RAGAS, LLM-as-Judge, benchmarking, A/B testing, continuous monitoring. Use when the user mentions prompt engineering, Few-shot, CoT, model evaluation, RAGAS, LLM-as-Judge, or benchmarking.
---

# Prompt Engineering and Model Evaluation

## 1. Prompt Patterns

### Pattern Comparison

| Pattern | Complexity | Accuracy | Token cost | Best for |
|------|--------|--------|------------|----------|
| Zero-shot | Low | Medium | Low | Simple tasks, general questions |
| Few-shot | Medium | High | Medium | Formatted output, classification |
| CoT | Medium | High | Medium | Reasoning, math, logic |
| Self-Consistency | High | Very high | High | Critical decisions |
| ToT | Very high | Very high | Very high | Complex planning |
| ReAct | High | High | High | Tool calling, agents |

### Zero-shot

```python
# Key: clear instruction + role + output format
prompt = """
You are a senior security engineer.
Task: classify the following text as positive, negative, or neutral.
Input: {text}
Output format: JSON {"sentiment": "...", "confidence": 0.0-1.0}
"""
```
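
A sketch of filling and sending the template above (the model name is illustrative, and `.replace` is used because the JSON braces in the template would trip up `str.format`):

```python
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": prompt.replace("{text}", review)}],  # review: text to classify
)
print(resp.choices[0].message.content)
```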

### Few-shot

```python
# Key: 2-5 high-quality examples + semantic-similarity example selection
prompt = """
Classify the review:

Review: Great sound quality and comfortable to wear. → Positive
Review: Battery life is way too short. → Negative
Review: {new_review} →
"""

# Dynamic example selection (LangChain; import paths vary across versions)
from langchain.prompts.example_selector import SemanticSimilarityExampleSelector
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# examples: list of dicts such as {"review": "...", "label": "..."}
selector = SemanticSimilarityExampleSelector.from_examples(
    examples, OpenAIEmbeddings(), Chroma, k=2
)
```

### Chain-of-Thought (CoT)

```python
from collections import Counter

# Zero-shot CoT: the magic phrase
prompt = f"Question: {question}\n\nLet's think step by step:"

# Self-Consistency: sample several reasoning paths and take a majority vote
answers = [extract_answer(llm.predict(prompt, temperature=0.7)) for _ in range(5)]
final = Counter(answers).most_common(1)[0][0]
```
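
`extract_answer` above is an assumed helper; a naive version simply takes the last number in the sampled reasoning:

```python
import re

def extract_answer(text):
    # Naive: treat the last number in the chain of thought as the final answer
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text)
    return numbers[-1] if numbers else None
```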

### ReAct

```python
# Thought → Action → Observation loop
prompt = """
Tools: Search[query], Calculate[expr], Finish[answer]

Thought: I need to look up the height of the Eiffel Tower
Action: Search[Eiffel Tower height]
Observation: 330 meters
Thought: Now I know the answer
Action: Finish[330 meters]
"""
```
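
A minimal driver for this loop, reusing the `prompt` above as the preamble (illustrative; `llm` is assumed to be a text-completion callable and `tools` a dict mapping tool names to functions):

```python
import re

def react_loop(llm, question, tools, max_steps=5):
    transcript = f"{prompt}\nQuestion: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript + "Thought:")              # model continues the trace
        transcript += "Thought:" + step + "\n"
        match = re.search(r"Action: (\w+)\[(.*?)\]", step)
        if not match:
            break
        name, arg = match.groups()
        if name == "Finish":
            return arg                                   # final answer
        observation = tools[name](arg)                   # e.g. {"Search": search_fn, "Calculate": calc_fn}
        transcript += f"Observation: {observation}\n"
    return None
```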

### Tree-of-Thoughts (ToT)

```python
# Generate several candidate thoughts → score them → keep the best via beam search → recurse
class TreeOfThoughts:
    def __init__(self, llm, beam_width=2):
        self.llm, self.beam_width = llm, beam_width

    def solve(self, problem):
        thoughts = self._generate(problem, n=3)        # sample candidate next steps
        scored = self._evaluate(problem, thoughts)     # [(thought, score), ...]
        best = sorted(scored, key=lambda x: x[1], reverse=True)[:self.beam_width]
        # recurse into the best paths until a solution is reached
```

## 2. Prompt Design Techniques

### Template Structure

```python
messages = [
    {"role": "system", "content": "Role + capability boundaries + output constraints"},
    {"role": "user", "content": "### Instruction\n{task}\n### Input\n{input}\n### Output format\n{format}"},
]
```

### Optimization Principles

| Principle | Do | Don't |
|------|-----|------|
| Clarity | Specific, actionable, constrained | Vague instructions |
| Structure | Delimiters, numbering, explicit format | Walls of text |
| Example-driven | 2-5 high-quality examples | No examples |
| Step-by-step | Steps 1/2/3 | One sentence for everything |
| Boundaries | State what to do and what not to do | No limits |

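For illustration, a vague prompt rewritten along these principles (the triage task and field names are made up):

```python
# Vague: no structure, no output contract
vague = "Summarize this and tell me if it's important: {text}"

# Structured: role, numbered steps, delimiters, explicit format, constraints
structured = """
You are a support-ticket triage assistant.
### Instruction
1. Summarize the ticket below in one sentence.
2. Assign a priority: P0, P1, or P2.
3. Do not invent details that are not in the ticket.
### Input
<ticket>
{text}
</ticket>
### Output format
JSON {"summary": "...", "priority": "P0|P1|P2"}
"""
```
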
### Advanced Techniques

```python
# Meta-prompting: use an LLM to generate the prompt
meta = "You are a prompt engineering expert. Write an optimal prompt for the following task: {task}"

# Self-critique: generate → critique → improve
answer = llm(question)
critique = llm(f"Critique this answer: {answer}")
improved = llm(f"Improve the answer based on the critique.\nAnswer: {answer}\nCritique: {critique}")
```

### Prompt Template Quick Reference

```yaml
code_generation: "Generate {lang} code for: {desc}. Requirements: best practices + comments + error handling"
summarization: "Summarize in {n} words: {text}. Keep the key information, stay concise"
data_extraction: "Extract {fields} from the text and output JSON: {text}"
nl2sql: "Convert natural language to SQL: {query}. Schema: {schema}"
```

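If these templates are loaded into a dict (for example with `yaml.safe_load`), filling one is a single `str.format` call; the file name below is hypothetical:

```python
import yaml

templates = yaml.safe_load(open("prompt_templates.yaml"))  # hypothetical file holding the block above
prompt = templates["data_extraction"].format(fields="name, email", text=document)
```
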
## 3. Model Evaluation

### Evaluation Dimensions

| Dimension | Metrics | Best for |
|------|------|----------|
| Accuracy | Accuracy, F1, Precision, Recall | Classification, NER |
| Relevance | Relevance, Context Precision | RAG, retrieval |
| Faithfulness | Faithfulness, Hallucination Rate | Generation tasks |
| Efficiency | Latency P95, Throughput, Cost per 1K tokens | Production deployment |

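For the accuracy row, classification-style outputs can be scored with standard metrics once the model's answers are parsed into labels (a scikit-learn sketch; `y_true`/`y_pred` are assumed to exist):

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

acc = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="macro")
```
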
### RAGAS Framework

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_precision, context_recall

dataset = Dataset.from_dict({
    "question": questions,
    "answer": answers,
    "contexts": contexts,
    "ground_truth": ground_truths,
})

result = evaluate(dataset, metrics=[
    faithfulness,         # is the answer grounded in the retrieved context (0-1)
    answer_relevancy,     # how relevant the answer is to the question (0-1)
    context_precision,    # share of the retrieved context that is relevant (0-1)
    context_recall,       # does the context contain all the information needed (0-1)
])
```

### LLM-as-Judge

```python
import json

class LLMJudge:
    def __init__(self, llm):
        self.llm = llm

    def evaluate(self, question, answer, criteria):
        prompt = f"""
Rate the answer quality (1-5):
Question: {question}
Answer: {answer}
Criteria: {criteria}

Output JSON: {{"accuracy": N, "completeness": N, "clarity": N, "overall": N, "feedback": "..."}}
"""
        return json.loads(self.llm.predict(prompt))

# Pairwise comparison + ELO ranking
def pairwise(q, a, b):
    # returns {"winner": "A"|"B", "confidence": 0-1}
    ...
```
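
A sketch of the ELO update that pairwise judgments feed into (standard ELO formula; the K-factor of 32 is a common default, not something this file prescribes):

```python
def elo_update(rating_a, rating_b, winner, k=32):
    # Expected score of A, then shift both ratings toward the observed outcome
    expected_a = 1 / (1 + 10 ** ((rating_b - rating_a) / 400))
    score_a = 1.0 if winner == "A" else 0.0
    return rating_a + k * (score_a - expected_a), rating_b + k * ((1 - score_a) - (1 - expected_a))
```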

### Benchmark Quick Reference

| Benchmark | Capability tested | Core metric |
|------|----------|----------|
| MMLU | Multi-task language understanding | Accuracy |
| HumanEval | Code generation | Pass@k |
| GSM8K | Math reasoning | Accuracy (CoT) |
| Custom | Business scenarios | Weighted score + latency |

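Pass@k in the table is usually computed with the unbiased estimator from the HumanEval paper, given n samples per problem of which c pass:

```python
import numpy as np

def pass_at_k(n, c, k):
    # 1 - C(n-c, k) / C(n, k), evaluated as a numerically stable product
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))
```
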
### Retrieval Metrics

```python
import math

def evaluate_retrieval(retrieved, relevant, k=5):
    relevant = set(relevant)
    hits = [doc in relevant for doc in retrieved[:k]]
    precision_at_k = sum(hits) / k
    recall_at_k = sum(hits) / len(relevant)
    # MRR: reciprocal rank of the first relevant document (0 if none in the top k)
    mrr = next((1 / (i + 1) for i, hit in enumerate(hits) if hit), 0.0)
    # NDCG: discounted cumulative gain normalized by the ideal ranking
    dcg = sum(hit / math.log2(i + 2) for i, hit in enumerate(hits))
    idcg = sum(1 / math.log2(i + 2) for i in range(min(k, len(relevant))))
    return {"precision@k": precision_at_k, "recall@k": recall_at_k, "mrr": mrr, "ndcg": dcg / idcg if idcg else 0.0}
```

### Generation Metrics

```python
# ROUGE: summarization quality (rouge-1, rouge-2, rouge-l)
# BLEU: translation quality
from rouge import Rouge
rouge_scores = Rouge().get_scores(predictions, references, avg=True)
```
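
BLEU from the comment above can be computed with NLTK (a sketch assuming whitespace tokenization and one reference per prediction):

```python
from nltk.translate.bleu_score import corpus_bleu

bleu = corpus_bleu(
    [[ref.split()] for ref in references],   # each prediction gets a list of reference token lists
    [pred.split() for pred in predictions],
)
```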

## 4. A/B Testing and Monitoring

### A/B Testing

```python
import hashlib
import numpy as np
from scipy import stats

class ABTest:
    def __init__(self, variants):  # variants: [Variant(name, model, ratio)], ratios summing to 1.0
        self.variants = variants

    def get_variant(self, user_id):
        # Stable hash-based bucketing: the same user always lands in the same variant
        bucket = int(hashlib.md5(str(user_id).encode()).hexdigest(), 16) % 100
        cumulative = 0
        for variant in self.variants:
            cumulative += variant.ratio * 100
            if bucket < cumulative:
                return variant
        return self.variants[-1]

    def check_significance(self, a_scores, b_scores, alpha=0.05):
        t_stat, p_value = stats.ttest_ind(a_scores, b_scores)
        pooled_std = np.sqrt((np.var(a_scores, ddof=1) + np.var(b_scores, ddof=1)) / 2)
        cohens_d = (np.mean(a_scores) - np.mean(b_scores)) / pooled_std
        return {"p_value": p_value, "significant": p_value < alpha, "effect": cohens_d}
```
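
A usage sketch with a hypothetical `Variant` namedtuple matching the constructor comment:

```python
from collections import namedtuple

Variant = namedtuple("Variant", ["name", "model", "ratio"])
ab = ABTest([Variant("control", "gpt-4o", 0.5), Variant("candidate", "gpt-4o-mini", 0.5)])
variant = ab.get_variant("user-42")  # stable assignment per user
```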

### Continuous Monitoring

```python
import statistics
from prometheus_client import Counter, Histogram, Gauge

request_count = Counter('llm_requests_total', 'Total requests', ['model', 'status'])
latency = Histogram('llm_latency_seconds', 'Latency', ['model'])
quality = Gauge('llm_quality_score', 'Quality', ['model'])

# Anomaly detection: alert when the Z-score exceeds 2.0
class AnomalyDetector:
    def __init__(self, window, threshold=2.0):
        self.window = window        # recent metric values, e.g. a rolling list of quality scores
        self.threshold = threshold

    def check(self, value):
        z = abs((value - statistics.mean(self.window)) / statistics.stdev(self.window))
        return z > self.threshold
```
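
A sketch of wiring these metrics into a request path (the model label and the `llm()` call are illustrative):

```python
with latency.labels(model="gpt-4o").time():           # observe request latency
    answer = llm(prompt)
request_count.labels(model="gpt-4o", status="ok").inc()
quality.labels(model="gpt-4o").set(judge_score)       # e.g. an LLM-as-Judge overall score
```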

## 5. Checklist

### Prompt Engineering

- Clear instructions + role + output format constraints
- Use CoT / ReAct for complex tasks
- Use Self-Consistency multi-path voting for critical decisions
- Version-control prompts and compare variants with A/B tests
- Iterate: test → analyze → improve

### Model Evaluation

- Evaluate on multiple dimensions: accuracy + relevance + faithfulness + efficiency
- Use the four RAGAS metrics for RAG
- Automate with LLM-as-Judge, plus periodic human spot checks
- Standard benchmarks (MMLU/HumanEval) + custom business benchmarks
- A/B test before launch; monitor continuously with anomaly alerts after launch
- Close the feedback loop: collect user feedback and keep improving

## Tool Quick Reference

| Tool | Purpose |
|------|------|
| RAGAS | RAG-specific evaluation |
| LangSmith | LLM application monitoring |
| Phoenix | Observability platform |
| LangChain | Prompt template management |
| Guidance | Structured generation |
| OpenAI Evals | Model evaluation framework |
| W&B | Experiment tracking |

---