paperfit-cli 1.0.0
- package/.claude/commands/adjust-length.md +21 -0
- package/.claude/commands/check-visual.md +27 -0
- package/.claude/commands/fix-layout.md +31 -0
- package/.claude/commands/migrate-template.md +23 -0
- package/.claude/commands/repair-table.md +21 -0
- package/.claude/commands/show-status.md +32 -0
- package/.claude-plugin/README.md +77 -0
- package/.claude-plugin/marketplace.json +41 -0
- package/.claude-plugin/plugin.json +39 -0
- package/CLAUDE.md +266 -0
- package/CONTRIBUTING.md +131 -0
- package/LICENSE +21 -0
- package/README.md +164 -0
- package/agents/code-surgeon-agent.md +214 -0
- package/agents/layout-detective-agent.md +229 -0
- package/agents/orchestrator-agent.md +254 -0
- package/agents/quality-gatekeeper-agent.md +270 -0
- package/agents/rule-engine-agent.md +224 -0
- package/agents/semantic-polish-agent.md +250 -0
- package/bin/paperfit.js +176 -0
- package/config/agent_roles.yaml +56 -0
- package/config/layout_rules.yaml +54 -0
- package/config/templates.yaml +241 -0
- package/config/vto_taxonomy.yaml +489 -0
- package/config/writing_rules.yaml +64 -0
- package/install.sh +30 -0
- package/package.json +52 -0
- package/requirements.txt +5 -0
- package/scripts/benchmark_runner.py +629 -0
- package/scripts/compile.sh +244 -0
- package/scripts/config_validator.py +339 -0
- package/scripts/cv_detector.py +600 -0
- package/scripts/evidence_collector.py +167 -0
- package/scripts/float_fixers.py +861 -0
- package/scripts/inject_defects.py +549 -0
- package/scripts/install-claude-global.js +148 -0
- package/scripts/install.js +66 -0
- package/scripts/install.sh +106 -0
- package/scripts/overflow_fixers.py +656 -0
- package/scripts/package-for-opensource.sh +138 -0
- package/scripts/parse_log.py +260 -0
- package/scripts/postinstall.js +38 -0
- package/scripts/pre_tool_use.py +265 -0
- package/scripts/render_pages.py +244 -0
- package/scripts/session_logger.py +329 -0
- package/scripts/space_util_fixers.py +773 -0
- package/scripts/state_manager.py +352 -0
- package/scripts/test_commands.py +187 -0
- package/scripts/test_cv_detector.py +214 -0
- package/scripts/test_integration.py +290 -0
- package/skills/consistency-polisher/SKILL.md +337 -0
- package/skills/float-optimizer/SKILL.md +284 -0
- package/skills/latex_fixers/__init__.py +82 -0
- package/skills/latex_fixers/float_fixers.py +392 -0
- package/skills/latex_fixers/fullwidth_fixers.py +375 -0
- package/skills/latex_fixers/overflow_fixers.py +250 -0
- package/skills/latex_fixers/semantic_micro_tuning.py +362 -0
- package/skills/latex_fixers/space_util_fixers.py +389 -0
- package/skills/latex_fixers/utils.py +55 -0
- package/skills/overflow-repair/SKILL.md +304 -0
- package/skills/space-util-fixer/SKILL.md +307 -0
- package/skills/taxonomy-vto/SKILL.md +486 -0
- package/skills/template-migrator/SKILL.md +251 -0
- package/skills/visual-inspector/SKILL.md +217 -0
- package/skills/writing-polish/SKILL.md +289 -0
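@@ -0,0 +1,289 @@
# Writing Polish Skill

## Overview

This skill gives the **Semantic Polish Agent** concrete, executable strategies for semantic micro-tuning, together with the rules on what must never be touched. It defines how, once typesetting measures are exhausted, to eliminate widows and orphans, meet a page budget, or reduce trailing whitespace on the last page by adding or removing as few words as possible, while keeping the meaning, data, and conclusions of the academic content strictly unchanged.

This skill is not invoked directly by `orchestrator-agent`. It serves as the knowledge base and behavioral specification for `semantic-polish-agent`, and every semantic-level rewrite must follow the techniques and constraints defined here.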
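
## Applicable Scenarios

| Triggering defect | Direction | Allowed rewrite size |
|-------------------|-----------|----------------------|
| A1 (widows and orphans) | Shorten by 1-2 lines | Add or remove 3-8 words |
| A2 (trailing whitespace on the last page) | Expand by 2-4 lines | Add 20-50 words |
| A3 (page budget) | Shorten or expand by multiple lines | As needed, but applied in stages |
| Explicit user request | Tighten or expand specific paragraphs | User-specified |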
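
The budgets in the table can be checked mechanically before a rewrite is accepted. The following is a minimal sketch, not part of the package: `WORD_BUDGETS` and `within_budget` are hypothetical names, and it assumes the "allowed rewrite size" refers to the net change in whitespace-delimited words.

```python
from typing import Dict, Optional, Tuple

# Word budgets per defect, copied from the table as (min, max) words changed.
# None means the table gives no fixed range (A3: as needed; user request: user-specified).
WORD_BUDGETS: Dict[str, Optional[Tuple[int, int]]] = {
    "A1": (3, 8),
    "A2": (20, 50),
    "A3": None,
    "user_request": None,
}


def within_budget(defect: str, net_word_change: int) -> bool:
    """Return True if a net word change fits the budget for the given defect."""
    if defect not in WORD_BUDGETS:
        raise KeyError(f"unknown defect type: {defect}")
    budget = WORD_BUDGETS[defect]
    if budget is None:
        return True  # no fixed range to enforce
    if defect == "A2" and net_word_change < 0:
        return False  # A2 only expands, so a net removal is out of scope
    low, high = budget
    return low <= abs(net_word_change) <= high


print(within_budget("A1", -6))   # a 6-word cut for a widow/orphan fix
print(within_budget("A2", -30))  # removing words cannot fill last-page whitespace
```

If the intended meaning of the budget is the total number of words touched rather than the net change, the comparison would need the per-word diff instead of `net_word_change`.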
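
## Input Specification

| Input | Source | Description |
|-------|--------|-------------|
| Target paragraph source | Extracted by `semantic-polish-agent` | One or more complete paragraphs to rewrite |
| Rewrite goal | Requested by the caller | `shorten` or `expand`, plus the desired change in line count |
| Writing rules | `config/writing_rules.yaml` | Constraints on tense, terminology, banned words, and so on |
| Context paragraphs | Extracted by `semantic-polish-agent` | One paragraph before and one after, to keep the rewrite semantically coherent |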
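
One way to picture these inputs is as a single validated request object. The sketch below is illustrative only: `RewriteRequest` and its field names are assumptions, not an interface defined by the package. Only the `shorten`/`expand` values and the `config/writing_rules.yaml` path come from the table.

```python
from dataclasses import dataclass
from typing import List

VALID_GOALS = ("shorten", "expand")


@dataclass
class RewriteRequest:
    """Hypothetical container for the four inputs listed in the table."""
    paragraphs: List[str]          # target paragraph source, complete paragraphs only
    goal: str                      # "shorten" or "expand"
    line_delta: int                # desired change in line count, as a positive number
    context_before: str = ""       # the paragraph preceding the target
    context_after: str = ""        # the paragraph following the target
    writing_rules_path: str = "config/writing_rules.yaml"

    def __post_init__(self) -> None:
        if not self.paragraphs:
            raise ValueError("at least one target paragraph is required")
        if self.goal not in VALID_GOALS:
            raise ValueError(f"goal must be one of {VALID_GOALS}, got {self.goal!r}")
        if self.line_delta <= 0:
            raise ValueError("line_delta must be a positive number of lines")


request = RewriteRequest(paragraphs=["Some paragraph."], goal="shorten", line_delta=1)
print(request.goal, request.line_delta)
```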

## Output Specification

This skill outputs the rewritten text together with change metadata, which `semantic-polish-agent` integrates into the final report.

```json
{
  "skill": "writing-polish",
  "changes": [
    {
      "paragraph_id": 3,
      "action": "shorten",
      "net_word_change": -6,
      "before_snippet": "It is worth noting that our method achieves state-of-the-art performance on several benchmark datasets.",
      "after_snippet": "Our method achieves state-of-the-art results on several benchmarks.",
      "techniques_used": ["remove_redundant", "phrase_to_word"],
      "rationale": "Removed the redundant filler phrase and compressed 'achieves state-of-the-art performance on several benchmark datasets' to 'achieves state-of-the-art results on several benchmarks'. Semantically equivalent; data unchanged."
    }
  ],
  "warnings": []
}
```
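
A record in this shape can be assembled so that `net_word_change` is derived from the two snippets rather than typed by hand. This is a sketch under stated assumptions: the field names follow the JSON above, while `Change`, `count_words`, and `build_report` are hypothetical helpers, and words are counted as whitespace-delimited tokens.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


def count_words(text: str) -> int:
    """Count whitespace-delimited tokens; hyphenated compounds count as one word."""
    return len(text.split())


@dataclass
class Change:
    paragraph_id: int
    action: str
    before_snippet: str
    after_snippet: str
    techniques_used: List[str]
    rationale: str
    net_word_change: int = field(init=False)  # derived, never passed in

    def __post_init__(self) -> None:
        self.net_word_change = count_words(self.after_snippet) - count_words(self.before_snippet)


def build_report(changes: List[Change], warnings: Optional[List[str]] = None) -> str:
    """Serialize changes into the report layout shown in the output specification."""
    payload = {
        "skill": "writing-polish",
        "changes": [asdict(change) for change in changes],
        "warnings": list(warnings or []),
    }
    return json.dumps(payload, ensure_ascii=False, indent=2)


change = Change(
    paragraph_id=3,
    action="shorten",
    before_snippet="It is worth noting that our method achieves state-of-the-art performance on several benchmark datasets.",
    after_snippet="Our method achieves state-of-the-art results on several benchmarks.",
    techniques_used=["remove_redundant", "phrase_to_word"],
    rationale="Removed the filler phrase and compressed the object phrase.",
)
print(build_report([change]))
```

Deriving the count keeps the metadata consistent with the snippets, which is the same consistency a reviewer would otherwise have to check by hand.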
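
## Rewriting Strategies

### General Principles

1. **Smallest edit first**: change a word rather than a sentence, and a sentence rather than a paragraph.
2. **Preserve academic rigor**: never change data values, citation markers, proper nouns, or core claims.
3. **Assess local impact**: recompile after every rewrite to confirm that it introduces no new widows, orphans, or overflows.
4. **Reversibility**: keep the pre-rewrite text so that the change can be reviewed by a human or rolled back.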
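
Principles 3 and 4 together suggest an apply-validate-revert loop. The sketch below illustrates that loop only; `apply_with_rollback` is a hypothetical helper, and the `validate` callback stands in for the recompile and visual check, which are not implemented here.

```python
from typing import Callable, Tuple


def apply_with_rollback(
    source: str,
    before: str,
    after: str,
    validate: Callable[[str], bool],
) -> Tuple[str, bool]:
    """Replace `before` with `after`; keep the change only if `validate` accepts it.

    Returns the resulting text and a flag saying whether the change was kept.
    The original `source` is returned untouched on any failure.
    """
    if source.count(before) != 1:
        return source, False  # target missing or ambiguous: do not guess
    candidate = source.replace(before, after, 1)
    if validate(candidate):
        return candidate, True
    return source, False  # roll back to the pre-rewrite text


paragraph = "It is worth noting that our method outperforms the baseline by 3.2%."
updated, applied = apply_with_rollback(
    paragraph,
    before="It is worth noting that our method",
    after="Our method",
    validate=lambda text: "3.2%" in text,  # placeholder for compile + visual check
)
print(applied, updated)
```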

---

### Strategy Group 1: Shorten

Goal: reduce the word count and line count without losing information.

#### Technique 1.1: Remove redundant modifiers

Remove modifiers that contribute nothing of substance to the academic content.

- Delete intensifying adverbs: `very`, `quite`, `extremely`, `highly`
- Delete filler phrases: `It is worth noting that`, `It should be emphasized that`, `It is important to mention that`
- Delete redundant hedges: `in a certain sense`, `to some extent`

Example:

```
Before: It is worth noting that our method achieves very competitive performance.
After: Our method achieves competitive performance.
Reduction: 6 words
```
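
The first two bullet lists lend themselves to a simple pattern-based pass. The following is a rough sketch, assuming plain English prose with no LaTeX commands inside the matched phrases; `remove_redundant` is a hypothetical helper, and the hedging phrases from the third bullet are left out because they usually sit between commas and need punctuation handling.

```python
import re

FILLER_PHRASES = [
    "It is worth noting that",
    "It should be emphasized that",
    "It is important to mention that",
]
INTENSIFIERS = ["very", "quite", "extremely", "highly"]

_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in FILLER_PHRASES) + r")\s+",
    re.IGNORECASE,
)
_INTENSIFIER_RE = re.compile(r"\b(?:" + "|".join(INTENSIFIERS) + r")\s+", re.IGNORECASE)


def remove_redundant(sentence: str) -> str:
    """Drop filler phrases and intensifiers, then restore sentence-initial capitalization."""
    started_with_filler = _FILLER_RE.match(sentence) is not None
    result = _FILLER_RE.sub("", sentence)
    result = _INTENSIFIER_RE.sub("", result)
    if started_with_filler and result:
        result = result[0].upper() + result[1:]
    return result


print(remove_redundant(
    "It is worth noting that our method achieves very competitive performance."
))
```

Blind deletion can change meaning (for example, `highly` in a technical term), so the output should be treated as a candidate for review rather than a final rewrite.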

#### Technique 1.2: Replace phrases with single words

Replace multi-word phrases with a more concise word or abbreviation.

| Original phrase | Replacement |
|-----------------|-------------|
| `in order to` | `to` |
| `a large number of` | `many` |
| `due to the fact that` | `because` |
| `at the present time` | `now` |
| `state-of-the-art methods` | `SOTA methods` (only if already defined) |
| `with respect to` | `regarding` or `on` |

Example:

```
Before: We conduct experiments in order to evaluate the performance of the proposed approach.
After: We conduct experiments to evaluate our approach.
Reduction: 6 words
```
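
The replacement table maps directly onto a lookup-driven substitution. This is a minimal sketch, not the package's implementation: `PHRASE_TO_WORD` and `phrase_to_word` are hypothetical names, the `SOTA` row is omitted because it depends on a prior definition, and `with respect to` is mapped to `regarding` only, since choosing `on` depends on context.

```python
import re

PHRASE_TO_WORD = {
    "in order to": "to",
    "a large number of": "many",
    "due to the fact that": "because",
    "at the present time": "now",
    "with respect to": "regarding",
}

# Longest phrases first so that a longer phrase is never shadowed by a shorter one.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(p) for p in sorted(PHRASE_TO_WORD, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def _replace(match: "re.Match[str]") -> str:
    original = match.group(0)
    replacement = PHRASE_TO_WORD[original.lower()]
    if original[0].isupper():
        # Preserve sentence-initial capitalization.
        replacement = replacement[0].upper() + replacement[1:]
    return replacement


def phrase_to_word(text: str) -> str:
    """Replace each listed multi-word phrase with its shorter equivalent."""
    return _PATTERN.sub(_replace, text)


print(phrase_to_word(
    "We conduct experiments in order to evaluate the performance of the proposed approach."
))
```

The sketch only matches phrases written with single spaces on one line; LaTeX source that wraps a phrase across lines would need whitespace normalization first. It also applies only the table substitution, so it does not reproduce the further tightening (`the performance of the proposed approach` to `our approach`) shown in the example.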

#### Technique 1.3: Convert passive voice to active voice

Active voice is usually shorter and more direct.

```
Before: The experiments were conducted by us on three datasets.
After: We conducted experiments on three datasets.
Reduction: 3 words
```

#### Technique 1.4: Merge adjacent short sentences

Merge two closely related short sentences into one.

```
Before: We used the Adam optimizer. The learning rate was set to 1e-4.
After: We used Adam with a learning rate of 1e-4.
Reduction: 3 words
```

#### Technique 1.5: Use standard academic abbreviations

Use widely accepted abbreviations once they have been defined at first use in the paper.

| Full term | Abbreviation |
|-----------|--------------|
| `state-of-the-art` | `SOTA` |
| `natural language processing` | `NLP` |
| `mean average precision` | `mAP` |

```
Before: Our method outperforms previous state-of-the-art approaches on the natural language processing benchmark.
After: Our method outperforms previous SOTA approaches on the NLP benchmark.
Reduction: 2 words, counting state-of-the-art as one word (assumes SOTA and NLP are already defined)
```
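
The "defined at first use" condition can be enforced before any abbreviation is substituted. The sketch below is one possible check, assuming definitions are written in the form `full term (ABBR)` in the text that precedes the paragraph; `is_defined` and `abbreviate` are hypothetical helpers.

```python
import re

ABBREVIATIONS = {
    "state-of-the-art": "SOTA",
    "natural language processing": "NLP",
    "mean average precision": "mAP",
}


def is_defined(full_term: str, abbreviation: str, earlier_text: str) -> bool:
    """True if earlier_text contains a definition such as 'full term (ABBR)'.

    The full term is matched case-insensitively, the abbreviation exactly.
    """
    pattern = r"(?i:" + re.escape(full_term) + r")\s*\(" + re.escape(abbreviation) + r"\)"
    return re.search(pattern, earlier_text) is not None


def abbreviate(paragraph: str, earlier_text: str) -> str:
    """Abbreviate only those terms that earlier_text has already defined."""
    for full_term, abbreviation in ABBREVIATIONS.items():
        if is_defined(full_term, abbreviation, earlier_text):
            paragraph = re.sub(
                r"\b" + re.escape(full_term) + r"\b",
                abbreviation,
                paragraph,
                flags=re.IGNORECASE,
            )
    return paragraph


earlier = "Natural language processing (NLP) has advanced quickly."
target = (
    "Our method outperforms previous state-of-the-art approaches "
    "on the natural language processing benchmark."
)
print(abbreviate(target, earlier))  # only NLP is defined, so only NLP is substituted
```

Passing only the text before the target paragraph as `earlier_text` avoids rewriting the defining occurrence itself.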

#### Technique 1.6: Simplify clause structure

Compress a relative clause into a participial phrase or a premodifier.

```
Before: The model which is trained on ImageNet achieves high accuracy.
After: The ImageNet-trained model achieves high accuracy.
Reduction: 4 words, counting ImageNet-trained as one word
```

---

### Strategy Group 2: Expand

Goal: add text with real substance, without padding.

#### Technique 2.1: Make implicit causal relations explicit

After stating a result, add a brief explanation of its cause.

```
Before: Our method outperforms the baseline by 3.2%.
After: Our method outperforms the baseline by 3.2%, likely because the attention mechanism better captures long-range dependencies.
Increase: 9 words, counting long-range as one word
```

#### Technique 2.2: Add interpretation of results

After a reference to a table or data, add one sentence that interprets the key finding.

```
Before: Table 2 shows the ablation results.
After: Table 2 summarizes the ablation study. Removing the temporal module causes a significant drop of 5.1%, confirming its importance for sequential modeling.
Increase: 16 words
```

#### Technique 2.3: Strengthen the contrast with related work

When mentioning existing work, add a concrete statement of how this work differs.

```
Before: Unlike previous work, we use a transformer-based architecture.
After: Unlike previous work that relied on recurrent networks with limited parallelization, we adopt a transformer architecture that scales more efficiently to long sequences.
Increase: 15 words, counting transformer-based as one word
```

#### Technique 2.4: Add a discussion of limitations

In the conclusion or discussion section, add one objective sentence about a limitation of the current method.

```
Before: Future work will explore larger-scale datasets.
After: Future work will explore larger-scale datasets. A current limitation is the reliance on pre-trained word embeddings, which may not fully capture domain-specific terminology.
Increase: 17 words
```

#### Technique 2.5: Split long sentences into short ones

Splitting a long sentence into several shorter ones by adding periods can increase the line count without adding much content.

```
Before: Our method consists of three components: an encoder, a decoder, and a refinement module.
After: Our method consists of three components. First, the encoder extracts features from the input. Second, the decoder generates initial predictions. Finally, the refinement module iteratively improves the output.
Increase: 14 words; the gain in lines is proportionally larger
```

#### Technique 2.6: Add technical details (with caution)

Provided that no unpublished information is disclosed, technical details that already appear elsewhere in the paper may be added where appropriate.

```
Before: We use a standard cross-entropy loss.
After: We use a standard cross-entropy loss with label smoothing of 0.1, following common practice in image classification.
Increase: 11 words
```
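
---

## Forbidden Rewrites (Absolutely Prohibited)

The following operations are **strictly forbidden**. If a rewrite violates any one of them, it must not be output.

### Forbidden Zone 1: Tampering with data and results

- Do not modify any numeric value, percentage, or metric name.
- Do not add, remove, or change experimental settings, dataset names, or model parameters.
- Do not modify the content of any table cell.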
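
The first rule in this zone can be partly automated by comparing the numeric tokens of the paragraph before and after a rewrite. This is a sketch only: `numeric_tokens` and `numbers_preserved` are hypothetical helpers, the pattern covers plain decimals, percentages, and scientific notation such as `1e-4`, and metric or dataset names are not checked.

```python
import re
from collections import Counter

# Integers, decimals, scientific notation (1e-4), each with an optional trailing %.
_NUMBER_RE = re.compile(r"\d+(?:\.\d+)?(?:[eE][-+]?\d+)?%?")


def numeric_tokens(text: str) -> Counter:
    """Count every numeric token in the text."""
    return Counter(_NUMBER_RE.findall(text))


def numbers_preserved(before: str, after: str, action: str) -> bool:
    """Check that a rewrite left the original numeric values intact.

    For "shorten" the numeric tokens must match exactly. For "expand" every
    original token must still be present; any newly added value has to be
    verified separately against the rest of the paper.
    """
    old, new = numeric_tokens(before), numeric_tokens(after)
    if action == "expand":
        return all(new[token] >= count for token, count in old.items())
    return old == new


print(numbers_preserved(
    "We used the Adam optimizer. The learning rate was set to 1e-4.",
    "We used Adam with a learning rate of 1e-4.",
    "shorten",
))
```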
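
### Forbidden Zone 2: Fabricating content

- Do not introduce citations, related work, or method details that are absent from the original paper.
- Do not invent experiments, ablation studies, or user surveys.
- Do not add limitations or future-work directions that the authors have not confirmed (unless they are explicitly stated elsewhere in the paper).

### Forbidden Zone 3: Changing core claims

- Do not weaken or overstate the paper's contribution claims.
- Do not modify the main assertions in the conclusion paragraphs.
- Do not change what any `\ref{}` or `\cite{}` refers to.

### Forbidden Zone 4: Introducing non-academic language

- Do not use colloquial, emotional, or subjective language.
- Do not add meaningless filler sentences (for example, `This is a very interesting result.` with no analysis following it).
- Do not violate any hard rule in `config/writing_rules.yaml` (for example, inconsistent tense or colloquial contractions).

### Forbidden Zone 5: Breaking LaTeX structure

- Do not modify key commands such as `\section`, `\label`, `\ref`, and `\cite`.
- Do not add, remove, or modify `\begin{...}` and `\end{...}` environment boundaries.
- Do not perform semantic rewrites inside math environments.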
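
The first two rules of Forbidden Zone 5 can be approximated by comparing the protected commands found before and after a rewrite. The sketch below is an illustration under stated assumptions: `protected_commands` and `latex_structure_preserved` are hypothetical helpers, the command list is a guess at what "key commands" covers (including common variants such as `\citep` and `\eqref`), and arguments containing nested braces are not handled.

```python
import re
from collections import Counter

# Sectioning, labels, references, citations, and environment boundaries,
# with an optional star, optional [..] arguments, and one brace argument.
_PROTECTED_RE = re.compile(
    r"\\(?:(?:sub)*section|label|[A-Za-z]*ref|cite[A-Za-z]*|begin|end)"
    r"\*?(?:\[[^\]]*\])*\{[^{}]*\}"
)


def protected_commands(tex: str) -> Counter:
    """Count each protected command together with its arguments."""
    return Counter(_PROTECTED_RE.findall(tex))


def latex_structure_preserved(before: str, after: str) -> bool:
    """True if the rewrite kept exactly the same protected commands."""
    return protected_commands(before) == protected_commands(after)


original = r"As shown in Table~\ref{tab:ablation}, our method follows \cite{vaswani2017}."
rewritten = r"Table~\ref{tab:ablation} shows that our method follows \cite{vaswani2017}."
print(latex_structure_preserved(original, rewritten))
```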
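
---

## Rewrite Verification Checklist

After every rewrite, `semantic-polish-agent` must check the following items itself:

- [ ] Are all numeric values, percentages, and metric names exactly the same as in the original paragraph?
- [ ] Are all `\ref{}` and `\cite{}` commands untouched?
- [ ] Is the tense consistent with the context (present tense for related work, past tense for methods and experiments)?
- [ ] Are proper nouns (method, model, and dataset names) spelled correctly and unchanged?
- [ ] If a new abbreviation was introduced, is it defined at its first occurrence?
- [ ] Is the rewritten paragraph semantically coherent with the surrounding text?
- [ ] Is the rewrite free of any forbidden-zone violation?

If any item fails, roll back and try a different rewrite.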
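
---

## Collaboration with Other Skills

- **Space Utilization Fixer**: when typesetting measures (`\looseness` and the like) cannot resolve a widow, an orphan, or a page-count problem, it sends a request to `semantic-polish-agent`.
- **Semantic Polish Agent**: the direct user of this skill; it performs rewrites strictly according to the strategies and forbidden zones defined here.
- **Quality Gatekeeper Agent**: at final acceptance, it reviews whether the semantic rewrites are reasonable and confirms that no forbidden zone was violated.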
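
## FAQ and Edge Cases

**Q: What if the paragraph is already concise and cannot be shortened further?**
A: Mark it as `failed` and return a clear reason to `semantic-polish-agent` (for example, "The paragraph has only 3 sentences, each carrying essential information, so it cannot be shortened without harming the meaning").

**Q: How can padding be avoided when expanding?**
A: Prefer techniques 2.1-2.4, which all make explicit, or interpret in more depth, information already in the paper. If there is truly no room to expand, mark the request as `failed` as well.

**Q: Are cross-paragraph operations allowed?**
A: In principle, resolve the problem within the target paragraph first. If a cross-paragraph change is truly needed (for example, moving a sentence from the previous page onto the next to eliminate an orphan), state the scope of the cross-paragraph change explicitly and make sure the logic stays coherent.

**Q: Is recompilation required after a rewrite?**
A: Yes. Any semantic rewrite can change pagination, so the document must be recompiled and pass visual acceptance to confirm that the intended effect was achieved and no new defect was introduced.

---

**Writing Polish Skill ready.**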