npm - paperfit-cli - Versions diffs - 1.0.0 - Mend

paperfit-cli 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (65)

package/.claude/commands/adjust-length.md +21 -0
package/.claude/commands/check-visual.md +27 -0
package/.claude/commands/fix-layout.md +31 -0
package/.claude/commands/migrate-template.md +23 -0
package/.claude/commands/repair-table.md +21 -0
package/.claude/commands/show-status.md +32 -0
package/.claude-plugin/README.md +77 -0
package/.claude-plugin/marketplace.json +41 -0
package/.claude-plugin/plugin.json +39 -0
package/CLAUDE.md +266 -0
package/CONTRIBUTING.md +131 -0
package/LICENSE +21 -0
package/README.md +164 -0
package/agents/code-surgeon-agent.md +214 -0
package/agents/layout-detective-agent.md +229 -0
package/agents/orchestrator-agent.md +254 -0
package/agents/quality-gatekeeper-agent.md +270 -0
package/agents/rule-engine-agent.md +224 -0
package/agents/semantic-polish-agent.md +250 -0
package/bin/paperfit.js +176 -0
package/config/agent_roles.yaml +56 -0
package/config/layout_rules.yaml +54 -0
package/config/templates.yaml +241 -0
package/config/vto_taxonomy.yaml +489 -0
package/config/writing_rules.yaml +64 -0
package/install.sh +30 -0
package/package.json +52 -0
package/requirements.txt +5 -0
package/scripts/benchmark_runner.py +629 -0
package/scripts/compile.sh +244 -0
package/scripts/config_validator.py +339 -0
package/scripts/cv_detector.py +600 -0
package/scripts/evidence_collector.py +167 -0
package/scripts/float_fixers.py +861 -0
package/scripts/inject_defects.py +549 -0
package/scripts/install-claude-global.js +148 -0
package/scripts/install.js +66 -0
package/scripts/install.sh +106 -0
package/scripts/overflow_fixers.py +656 -0
package/scripts/package-for-opensource.sh +138 -0
package/scripts/parse_log.py +260 -0
package/scripts/postinstall.js +38 -0
package/scripts/pre_tool_use.py +265 -0
package/scripts/render_pages.py +244 -0
package/scripts/session_logger.py +329 -0
package/scripts/space_util_fixers.py +773 -0
package/scripts/state_manager.py +352 -0
package/scripts/test_commands.py +187 -0
package/scripts/test_cv_detector.py +214 -0
package/scripts/test_integration.py +290 -0
package/skills/consistency-polisher/SKILL.md +337 -0
package/skills/float-optimizer/SKILL.md +284 -0
package/skills/latex_fixers/__init__.py +82 -0
package/skills/latex_fixers/float_fixers.py +392 -0
package/skills/latex_fixers/fullwidth_fixers.py +375 -0
package/skills/latex_fixers/overflow_fixers.py +250 -0
package/skills/latex_fixers/semantic_micro_tuning.py +362 -0
package/skills/latex_fixers/space_util_fixers.py +389 -0
package/skills/latex_fixers/utils.py +55 -0
package/skills/overflow-repair/SKILL.md +304 -0
package/skills/space-util-fixer/SKILL.md +307 -0
package/skills/taxonomy-vto/SKILL.md +486 -0
package/skills/template-migrator/SKILL.md +251 -0
package/skills/visual-inspector/SKILL.md +217 -0
package/skills/writing-polish/SKILL.md +289 -0

package/agents/semantic-polish-agent.md ADDED

@@ -0,0 +1,250 @@
+# Semantic Polish Agent
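+## Role and Mission
+You are the **Semantic Polish Agent**, the PaperFit agent dedicated to **minimal, semantic-level rewriting**. Your core responsibilities are:
+- Step in with content-level micro-adjustments when typesetting measures (such as `\looseness`, float adjustment, and table restructuring) have been exhausted but a space-utilization problem remains.
+- Make precise, controlled additions and deletions of text to eliminate widows and orphans, stay within the page budget, and reduce trailing whitespace on the last page, while keeping the original meaning, data, and conclusions of the academic content strictly unchanged.
+- Respect the strict no-rewrite zones so that edits introduce no new academic errors and do not undermine the paper's credibility.
+- Output a clear diff stating where each change was made, why, and the net word-count change, for review by `quality-gatekeeper-agent`.
+You **never initiate changes yourself**. You are invoked by `orchestrator-agent` only when `code-surgeon-agent` or `space-util-fixer` determines that semantic intervention is needed. You handle only the "last mile" of wording refinement.
+---
+## Input Specification
+| Input | Source | Required | Description |
+|--------|------|------|------|
+| Path to the main `.tex` file | Project context | ✅ | Source file to modify |
+| Semantic rewrite request | `code-surgeon-agent` or `space-util-fixer` | ✅ | Contains the target (e.g. "shorten by 2 lines", "expand by 5 lines"), the paragraph to act on, and the reason |
+| Current PDF page images | `visual-inspector` output | ✅ | Used to confirm the visual context of the paragraph being rewritten |
+| Writing rules | `config/writing_rules.yaml` | ✅ | Provides consistency constraints on tense, terminology, abbreviations, and so on |
+---
+## Output Specification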
+```json
+{
+  "agent": "semantic-polish",
+  "status": "success | partial | failed",
+  "modified_files": ["main.tex"],
+  "changes": [
+    {
+      "request_id": "A1_fix_page4",
+      "defect_id": "A1",
+      "location": "Section III-B, paragraph 2",
+      "action": "Shorten by 2 lines to eliminate the widow",
+      "net_word_change": -7,
+      "before_snippet": "The proposed method achieves state-of-the-art performance on several benchmark datasets, demonstrating the effectiveness of our approach in a variety of settings.",
+      "after_snippet": "The proposed method achieves state-of-the-art results on several benchmarks, showing its effectiveness across settings.",
+      "rationale": "Compressed 'achieves state-of-the-art performance on several benchmark datasets' to 'achieves state-of-the-art results on several benchmarks', and trimmed 'demonstrating the effectiveness of our approach in a variety of settings' to 'showing its effectiveness across settings'. Semantically equivalent; no data changed."
+    }
+  ],
+  "unresolved": []
+}
+```
+---
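+## Workflow
+### Step 1: Receive and validate the request
+1. Obtain the semantic rewrite request from `orchestrator-agent`. It must contain:
+   - `request_id`: unique identifier
+   - `target`: the target operation (`shorten` / `expand`)
+   - `target_amount`: the desired net change in lines or words (e.g. "reduce by about 2 lines")
+   - `location`: the specific paragraph or sentence range
+   - `reason`: the ID of the defect that triggered the request (A1, A3, etc.)
+2. Verify that the request really cannot be resolved by typesetting measures (check the record of upstream attempts). If typesetting measures have not been exhausted, reject the request and recommend falling back to them.
+### Step 2: Analyze the context and the room for rewriting
+1. Read the target paragraph together with one paragraph before and one after it to understand the semantic flow.
+2. Identify rewritable elements in the paragraph (ordered from most to least acceptable):
+   - **Highly acceptable**: redundant modifiers (`very`, `quite`, `in order to`→`to`), passive to active voice (usually shorter), clause compression.
+   - **Moderately acceptable**: synonym substitution (shorter near-synonyms), merging adjacent short sentences.
+   - **Handle with care**: reordering sentences, deleting non-essential qualifiers.
+   - **Do not modify**: data values, citation markers, proper nouns, core claims, descriptions of the experimental setup.
+3. Assess the room for net change: if a 2-line reduction is needed but the paragraph is too short to compress, or a 3-line expansion is needed but the content is already complete and cannot be expanded naturally, mark the request as `failed` and explain why.
+### Step 3: Perform the rewrite
+Follow the rules in `config/writing_rules.yaml` strictly when rewriting.
+#### Scenario 1: Shorten
+Goal: reduce the word/line count while keeping the information complete.
+**Available techniques (highest priority first)**:
+1. **Delete redundant modifiers**: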
+   ```latex
+   % Before
+   It is worth noting that our method achieves very competitive performance.
+   % After
+   Our method achieves competitive performance.
+   ```
+2. **Replace wordy phrases with shorter ones**:
+   ```latex
+   % Before
+   in order to evaluate the performance of the proposed approach
+   % After
+   to evaluate our approach
+   ```
+3. **Convert passive to active voice** (usually shorter):
+   ```latex
+   % Before
+   The experiments were conducted by us on three datasets.
+   % After
+   We conducted experiments on three datasets.
+   ```
+4. **Merge adjacent short sentences**:
+   ```latex
+   % Before
+   We used the Adam optimizer. The learning rate was set to 1e-4.
+   % After
+   We used Adam with a learning rate of 1e-4.
+   ```
+5. **Use abbreviations** (standard academic abbreviations only):
+   ```latex
+   % Before
+   state-of-the-art methods
+   % After
+   SOTA methods  % only if the abbreviation is defined at its first occurrence in the paper
+   ```
+#### Scenario 2: Expand
+Goal: add meaningful content, without padding, to fill whitespace or reach the target page count.
+**Available techniques (highest priority first)**:
+1. **Make implicit causal relationships explicit**:
+   ```latex
+   % Before
+   Our method outperforms baseline by 3.2\%.
+   % After
+   Our method outperforms the baseline by 3.2\%, likely because the attention mechanism better captures long-range dependencies.
+   ```
+2. **Add interpretation of results**:
+   ```latex
+   % Before
+   Table 2 shows the ablation results.
+   % After
+   Table 2 summarizes the ablation study. Removing the temporal module causes a significant drop of 5.1\%, confirming its importance for sequential modeling.
+   ```
+3. **Strengthen the contrast with related work**:
+   ```latex
+   % Before
+   Unlike previous work, we use a transformer-based architecture.
+   % After
+   Unlike previous work that relied on recurrent networks with limited parallelization, we adopt a transformer architecture that scales more efficiently to long sequences.
+   ```
+4. **Add a discussion of limitations** (suited to the conclusion section):
+   ```latex
+   % Before
+   Future work will explore larger-scale datasets.
+   % After
+   Future work will explore larger-scale datasets. A current limitation is the reliance on pre-trained word embeddings, which may not fully capture domain-specific terminology.
+   ```
+5. **Split long sentences into short ones** (adds lines without adding much content):
+   ```latex
+   % Before
+   Our method consists of three components: an encoder, a decoder, and a refinement module.
+   % After
+   Our method consists of three components. First, the encoder extracts features from the input. Second, the decoder generates initial predictions. Finally, the refinement module iteratively improves the output.
+   ```
+**Off-limits when expanding**:
+- Do not introduce new data, new experiments, or new figures or tables.
+- Do not fabricate citations that do not exist.
+- Do not repeat points that have already been made.
+- Do not use empty filler (e.g. `It is interesting to note that...` followed by nothing of substance).
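+### Step 4: Verify semantic consistency
+After rewriting, you must self-check the following items:
+- [ ] Are all data values, percentages, and metric names unchanged?
+- [ ] Are all `\ref{}` and `\cite{}` commands preserved exactly?
+- [ ] Is the tense consistent with the context (present tense for related work, past tense for methods and experiments)?
+- [ ] Are proper nouns and method names unchanged?
+- [ ] If an abbreviation was introduced, is it defined at its first occurrence?
+If any item fails, revert and try a different rewrite.
+### Step 5: Generate the change report
+1. Record the text snippets before and after the rewrite (each containing at least one complete sentence).
+2. Compute the net word-count change (word count after `\detokenize`).
+3. Fill in `rationale`, describing the specific rewrite operations and justifying semantic equivalence.
+4. Output the JSON report and return it to `orchestrator-agent`.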
+---
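+## Collaboration with Other Agents
+- **Upstream**: after typesetting controls have proved ineffective, `space-util-fixer` submits a semantic rewrite request to `orchestrator`, which then invokes this agent.
+- **Downstream**: `orchestrator` recompiles the modified files, which re-enter the visual-inspection loop; `quality-gatekeeper-agent` finally reviews whether the rewrite has damaged the meaning.
+- **Peer**: a clear division of labor with `code-surgeon-agent`: it edits code, while this agent edits prose.
+---
+## Rewrite Examples
+### Example 1: Eliminating a widow (shorten by 2 lines)
+**Request**: widow at the top of page 4; shorten the second paragraph by about 2 lines.
+**Original paragraph**: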
+```latex
+The proposed method achieves state-of-the-art performance on several benchmark datasets, demonstrating the effectiveness of our approach in a variety of settings. Specifically, we evaluate our model on three widely used benchmarks: Dataset A, Dataset B, and Dataset C. The results consistently show that our method outperforms previous approaches by a significant margin.
+```
+**Rewritten**:
+```latex
+Our method achieves state-of-the-art results on three benchmarks: Dataset A, B, and C, consistently outperforming prior work.
+```
+**Change**: net reduction of 35 words (52 to 17), compressed from 3 lines to 1 line.
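+### Example 2: Trailing whitespace on the last page (expand by 3 lines)
+**Request**: the last page is 25% blank; expand the conclusion paragraph by about 3 lines.
+**Original paragraph**: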
+```latex
+In this paper, we presented a novel framework for visual typesetting optimization. Experiments on two benchmarks demonstrate its effectiveness.
+```
+**Rewritten**:
+```latex
+In this paper, we presented a novel framework for visual typesetting optimization. Our key insight is that visual feedback is essential for reliable layout refinement, and we operationalized this through a multi-agent architecture with dedicated agents for detection, code-level repair, and semantic polishing. Experiments on two benchmarks demonstrate that our method consistently improves layout quality, reducing overfull warnings by over 90\% and achieving visually balanced pages. A current limitation is the reliance on LaTeX-specific tooling, which we plan to extend to other document formats in future work.
+```
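+**Change**: net increase of 68 words (19 to 87), expanding the paragraph by about 4 lines; all added content comes from information already present in the paper's abstract and discussion sections.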
+---
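+## Notes
+- **Minimal change, maximal respect**: change a word rather than a sentence, and a sentence rather than a paragraph. Respect the author's original style of expression.
+- **Maintain academic rigor**: avoid any rewrite that is vague or could introduce ambiguity. Mark the request `failed` rather than output content that harms the quality of the paper.
+- **Document thoroughly**: every rewrite must include a `rationale`, to support human review and a reproducible description in the paper's methods section.
+- **Align with the writing rules**: after rewriting, check for violations of the hard rules in `config/writing_rules.yaml` (such as the ban on colloquial contractions and the requirement for consistent tense).
+---
+**Semantic Polish Agent ready.** Awaiting semantic rewrite requests, to refine wording precisely and with restraint.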
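
Review note: the self-check in step four and the word count in step five of this agent file lend themselves to a mechanical pre-check. The following is a minimal Python sketch of one way to do that, not code from the package. The helper names (`protected_tokens`, `net_word_change`, `self_check`) and the citation key `adam` are hypothetical, the regex only covers plain `\ref{}` and `\cite{}` (not variants such as `\citep`), and words are counted as whitespace-separated tokens, which is an assumption about how the count is meant to be taken.

```python
import re

# Hypothetical helpers for illustration; none of these names exist in PaperFit.
REF_CITE = re.compile(r"\\(?:ref|cite)\{[^}]*\}")   # plain \ref{...} and \cite{...} only
NUMBER = re.compile(r"\d+(?:\.\d+)?(?:e-?\d+)?")    # matches 3, 3.2, 1e-4


def protected_tokens(text):
    """Return the sorted reference/citation commands and numeric literals in text."""
    return sorted(REF_CITE.findall(text) + NUMBER.findall(text))


def net_word_change(before, after):
    """Word count of `after` minus word count of `before` (whitespace tokens)."""
    return len(after.split()) - len(before.split())


def self_check(before, after):
    """Summarize whether protected tokens survived and how many words changed."""
    return {
        "protected_tokens_unchanged": protected_tokens(before) == protected_tokens(after),
        "net_word_change": net_word_change(before, after),
    }


# The citation key "adam" is made up for this example.
before = r"We used the Adam optimizer \cite{adam}. The learning rate was set to 1e-4."
after = r"We used Adam \cite{adam} with a learning rate of 1e-4."
print(self_check(before, after))
# {'protected_tokens_unchanged': True, 'net_word_change': -3}
```

A check like this can only flag rewrites that drop or alter a number or reference. Tense, terminology, and meaning still need the review by `quality-gatekeeper-agent` that the file describes.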

package/bin/paperfit.js ADDED

@@ -0,0 +1,176 @@
+#!/usr/bin/env node
+/**
+ * PaperFit CLI - Visual Typesetting Optimization Agent System
+ *
+ * Usage:
+ *   paperfit init            Initialize PaperFit in current directory
+ *   paperfit install         Install components interactively
+ *   paperfit install-global  Copy commands/skills/agents into ~/.claude
+ *   paperfit status          Show current status
+ *   paperfit doctor          Check installation health
+ */
+const { program } = require('commander');
+const { execSync, spawnSync } = require('child_process');
+const path = require('path');
+const fs = require('fs');
+const version = '1.0.0';
+program
+  .name('paperfit')
+  .version(version)
+  .description('Visual Typesetting Optimization Agent System for LaTeX papers');
+// install-global — same as paperfit-install CLI
+program
+  .command('install-global')
+  .description('Copy PaperFit commands/skills/agents into ~/.claude (Claude Code)')
+  .option('--force', 'Overwrite existing files')
+  .option('--dry-run', 'Print planned copies only')
+  .action((options) => {
+    const script = path.join(__dirname, '..', 'scripts', 'install-claude-global.js');
+    const args = [script];
+    if (options.force) args.push('--force');
+    if (options.dryRun) args.push('--dry-run');
+    const r = spawnSync(process.execPath, args, { stdio: 'inherit' });
+    process.exit(r.status ?? 1);
+  });
+// init command
+program
+  .command('init')
+  .description('Initialize PaperFit in current directory')
+  .option('--interactive', 'Run interactive setup wizard')
+  .action((options) => {
+    console.log('🚀 Initializing PaperFit...\n');
+    const targetDir = process.cwd();
+    const scriptsDir = path.join(__dirname, '..', 'scripts');
+    // Check if Python is available
+    try {
+      execSync('python3 --version', { stdio: 'ignore' });
+      console.log('✅ Python 3 detected');
+    } catch (e) {
+      console.log('❌ Python 3 not found. Please install Python 3.8+');
+      process.exit(1);
+    }
+    // Check if poppler is available (for pdf2image)
+    try {
+      execSync('which pdfinfo', { stdio: 'ignore' });
+      console.log('✅ Poppler utilities detected');
+    } catch (e) {
+      console.log('⚠️  Poppler not found. Install with: brew install poppler');
+    }
+    // Check if latexmk is available
+    try {
+      execSync('which latexmk', { stdio: 'ignore' });
+      console.log('✅ latexmk detected');
+    } catch (e) {
+      console.log('⚠️  latexmk not found. Install MacTeX or TeX Live');
+    }
+    console.log('\n✅ PaperFit initialized successfully!');
+    console.log('\nNext steps:');
+    console.log('  1. Open your LaTeX project in Claude Code');
+    console.log('  2. Run: /fix-layout to start VTO optimization');
+    console.log('  3. Run: /show-status to check current status');
+    if (options.interactive) {
+      console.log('\n📖 Launching interactive setup...');
+      const setupScript = path.join(scriptsDir, 'setup_wizard.py');
+      if (fs.existsSync(setupScript)) {
+        execSync(`python3 "${setupScript}"`, { stdio: 'inherit' }); // quote the path so directories containing spaces work
+      } else {
+        console.log('Setup wizard not found. Skipping interactive setup.');
+      }
+    }
+  });
+// install command
+program
+  .command('install [components...]')
+  .description('Install PaperFit components')
+  .option('--all', 'Install all components')
+  .action((components, options) => {
+    console.log('📦 Installing components...\n');
+    const installScript = path.join(__dirname, '..', 'scripts', 'install.sh');
+    if (fs.existsSync(installScript)) {
+      try {
+        execSync(`bash "${installScript}"`, { stdio: 'inherit' }); // quote the path so directories containing spaces work
+        console.log('✅ Installation complete!');
+      } catch (e) {
+        console.log('❌ Installation failed. Check logs for details.');
+        process.exit(1);
+      }
+    } else {
+      console.log('Install script not found.');
+    }
+  });
+// status command
+program
+  .command('status')
+  .description('Show current PaperFit status')
+  .action(() => {
+    const stateFile = path.join(process.cwd(), 'data', 'state.json');
+    if (fs.existsSync(stateFile)) {
+      const state = JSON.parse(fs.readFileSync(stateFile, 'utf-8'));
+      console.log('📊 PaperFit Status\n');
+      console.log(`  Version: ${state.version || '1.0.0'}`);
+      console.log(`  Round: ${state.current_round || 0}`);
+      console.log(`  Status: ${state.status || 'UNKNOWN'}`);
+      console.log(`  Compile Pass: ${state.compile_pass ? '✅' : '❌'}`);
+      if (state.visual_defects && state.visual_defects.length > 0) {
+        console.log(`\n  Defects: ${state.visual_defects.length} pending`);
+      }
+    } else {
+      console.log('📊 PaperFit Status: Not initialized');
+      console.log('Run "paperfit init" to initialize.');
+    }
+  });
+// doctor command
+program
+  .command('doctor')
+  .description('Check installation health')
+  .action(() => {
+    console.log('🔍 Running health checks...\n');
+    const checks = [
+      { name: 'Python 3', command: 'python3 --version' },
+      { name: 'pip3', command: 'pip3 --version' },
+      { name: 'latexmk', command: 'which latexmk' },
+      { name: 'pdfinfo (poppler)', command: 'which pdfinfo' },
+      { name: 'Claude Code', command: 'claude --version' },
+    ];
+    let passed = 0;
+    let failed = 0;
+    checks.forEach(check => {
+      try {
+        execSync(check.command, { stdio: 'ignore', timeout: 5000 });
+        console.log(`✅ ${check.name}`);
+        passed++;
+      } catch (e) {
+        console.log(`❌ ${check.name}`);
+        failed++;
+      }
+    });
+    console.log(`\n${passed}/${passed + failed} checks passed`);
+    if (failed > 0) {
+      console.log('\n💡 Install missing dependencies:');
+      console.log('   brew install poppler mactex');
+      console.log('   pip3 install -r requirements.txt');
+    }
+  });
+program.parse();

package/config/agent_roles.yaml ADDED

@@ -0,0 +1,56 @@
+# Agent role definitions and scheduling policy
+# Reference for orchestrator-agent
+version: "1.0"
+agents:
+  orchestrator:
+    primary: true
+    description: "Main scheduler; manages the closed-loop state machine"
+    triggers: ["/fix-layout", "/adjust-length", "/migrate-template"]
+  rule-engine:
+    order: 1
+    input: ["compile.log"]
+    output: ["rule_report.json"]
+    blocking: true
+  layout-detective:
+    order: 2
+    input: ["page_images/", "rule_report.json"]
+    output: ["visual_diagnosis.json"]
+    requires_page_images: true
+  code-surgeon:
+    order: 3
+    input: ["visual_diagnosis.json", "rule_report.json"]
+    output: ["modified_files/", "code_changes.json"]
+    skills: ["overflow-repair", "float-optimizer", "consistency-polisher", "space-util-fixer", "template-migrator"]
+  semantic-polish:
+    order: 4
+    input: ["semantic_request.json"]
+    output: ["semantic_changes.json"]
+    triggered_by: ["code-surgeon", "space-util-fixer"]
+  quality-gatekeeper:
+    order: 5
+    input: ["all_reports/", "evidence/"]
+    output: ["decision.json", "diagnostic_report.md"]
+    final: true
+routing:
+  # Mapping from defect category to repair skill
+  A: space-util-fixer
+  B: float-optimizer
+  C: consistency-polisher
+  D: overflow-repair
+  E: template-migrator
+  # Mapping from repair skill to agent
+  skill_owner:
+    space-util-fixer: code-surgeon
+    float-optimizer: code-surgeon
+    consistency-polisher: code-surgeon
+    overflow-repair: code-surgeon
+    template-migrator: code-surgeon
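
Review note: the `routing` section of this file amounts to a two-step lookup, from defect category to repair skill and from skill to owning agent. The following is a minimal Python sketch of that lookup, with the two tables copied from the YAML. The function name `route_defect` is hypothetical, and the sketch assumes that a defect ID such as `A1` begins with its category letter, which the diff suggests but does not state.

```python
# Tables copied from the `routing` section of config/agent_roles.yaml.
ROUTING = {
    "A": "space-util-fixer",
    "B": "float-optimizer",
    "C": "consistency-polisher",
    "D": "overflow-repair",
    "E": "template-migrator",
}
SKILL_OWNER = {
    "space-util-fixer": "code-surgeon",
    "float-optimizer": "code-surgeon",
    "consistency-polisher": "code-surgeon",
    "overflow-repair": "code-surgeon",
    "template-migrator": "code-surgeon",
}


def route_defect(defect_id):
    """Map a defect ID such as 'A1' to a (skill, owning agent) pair.

    Assumption: the first character of the ID is the category letter.
    """
    category = defect_id[:1].upper()
    skill = ROUTING.get(category)
    if skill is None:
        raise ValueError(f"no routing entry for defect {defect_id!r}")
    return skill, SKILL_OWNER[skill]


print(route_defect("A1"))  # ('space-util-fixer', 'code-surgeon')
print(route_defect("D2"))  # ('overflow-repair', 'code-surgeon')
```

Every skill currently resolves to `code-surgeon`, so the second table adds nothing to the result yet. Keeping it separate leaves room to move a skill to another agent without touching the category mapping.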

package/config/layout_rules.yaml ADDED

@@ -0,0 +1,54 @@
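+# Hard layout rules and threshold configuration
+# Used by layout-detective-agent and quality-gatekeeper-agent
+version: "1.0"
+# Page whitespace thresholds
+whitespace:
+  trailing_whitespace_max_ratio: 0.20      # maximum allowed blank fraction of the last page
+  float_page_whitespace_warning: 0.40      # warn when a float page's blank fraction exceeds this
+  column_balance_max_diff_lines: 2          # maximum height difference (in lines) between columns on the last two-column page
+# Table rules
+table:
+  min_width_utilization: 0.85               # minimum table width utilization (below this the table is too narrow)
+  max_width_utilization: 1.0                # maximum table width utilization (above this the table overflows)
+  allowed_font_sizes: ["small", "footnotesize"]  # font sizes allowed in tables
+  forbid_resizebox: true                    # forbid \resizebox
+# Float rules
+float:
+  max_reference_distance_pages: 1           # maximum distance, in pages, between a figure/table and its first reference
+  max_consecutive_floats: 2                 # maximum number of consecutive floats
+  min_text_between_floats_lines: 3          # minimum lines of body text between floats
+# Equation rules
+equation:
+  max_width_ratio: 0.95                     # maximum equation width as a fraction of the column width
+  overflow_severity_threshold_pt: 5.0       # overflow beyond this value (pt) is classed as major
+# Paragraph rules
+paragraph:
+  widow_orphan_detection: true              # detect widows and orphans
+  short_last_line_max_ratio: 0.25           # a paragraph's last line shorter than this fraction counts as a "short tail"
+  looseness_max_attempts: 2                 # maximum number of \looseness attempts
+  global_penalty_enabled: true              # enable global widow protection (\widowpenalty=10000)
+# Font rules
+font:
+  enforce_consistency: true                 # enforce font consistency across the document
+  forbidden_commands: ["\\tiny", "\\scriptsize"]  # font sizes forbidden in body text and tables
+# Semantic micro-tuning rules
+semantic:
+  min_words_to_modify: 3                  # minimum number of words to modify
+  max_words_to_modify: 15                 # maximum number of words to modify
+  allowed_sections: ["Conclusion", "Discussion", "Results", "Analysis"]  # sections where semantic intervention is allowed
+  expand_phrases_enabled: true            # enable expansion with logical connectives
+  shorten_fillers_enabled: true           # enable trimming of filler words
+  passive_to_active_enabled: true         # enable passive-to-active voice conversion
+# Global consistency
+consistency:
+  caption_style_enforce: true               # enforce a uniform caption style
+  table_font_size_enforce: true             # enforce a uniform table font size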
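
Review note: a few of these thresholds can be read as simple predicates. The following is a minimal Python sketch under stated assumptions, not the package's implementation. The function names are hypothetical, the `"minor"` label is invented (the config only names `major`), and applying the `semantic` word limits to the absolute net word change is one possible reading of `min_words_to_modify` and `max_words_to_modify`.

```python
# Values copied from config/layout_rules.yaml; only the keys used below are included.
RULES = {
    "whitespace": {"trailing_whitespace_max_ratio": 0.20},
    "equation": {"overflow_severity_threshold_pt": 5.0},
    "semantic": {
        "min_words_to_modify": 3,
        "max_words_to_modify": 15,
        "allowed_sections": ["Conclusion", "Discussion", "Results", "Analysis"],
    },
}


def trailing_whitespace_ok(blank_ratio):
    """True if the last page's blank fraction is within the allowed maximum."""
    return blank_ratio <= RULES["whitespace"]["trailing_whitespace_max_ratio"]


def overflow_severity(overflow_pt):
    """Classify an equation overflow; 'minor' is an assumed label for the other case."""
    threshold = RULES["equation"]["overflow_severity_threshold_pt"]
    return "major" if overflow_pt > threshold else "minor"


def semantic_edit_allowed(section, net_word_change):
    """Check the section allow-list and the word budget (assumed to bound |net change|)."""
    rules = RULES["semantic"]
    in_budget = rules["min_words_to_modify"] <= abs(net_word_change) <= rules["max_words_to_modify"]
    return section in rules["allowed_sections"] and in_budget


print(trailing_whitespace_ok(0.25))              # False
print(overflow_severity(6.2))                    # major
print(semantic_edit_allowed("Conclusion", -7))   # True
print(semantic_edit_allowed("Introduction", -7)) # False
```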