chem-pdf2ppt 2.0.0
- package/README.md +235 -0
- package/README_EN.md +239 -0
- package/SKILL.md +469 -0
- package/SKILL_EN.md +473 -0
- package/assets/academic_template.html +197 -0
- package/cli.js +57 -0
- package/examples/example_usage.py +407 -0
- package/index.js +109 -0
- package/package.json +50 -0
- package/references/chemistry_templates.md +228 -0
- package/references/chemistry_templates_en.md +228 -0
- package/references/visual_style.md +172 -0
- package/references/visual_style_en.md +172 -0
- package/requirements.txt +20 -0
- package/scripts/analyze_paper.py +334 -0
- package/scripts/convert_to_images.py +67 -0
- package/scripts/create_ppt.py +712 -0
- package/scripts/extract_charts.py +425 -0
- package/scripts/generate_html.py +288 -0
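package/README.md
ADDED
@@ -0,0 +1,235 @@

# PDF2PPT: Chemistry Academic Paper → Presentation Converter

Converts chemistry academic paper PDFs into professional academic presentations. Supports two output formats: **PPTX** and **HTML**.

## Key Features

- **Automatic paper type recognition**: experimental / theoretical-computational / hybrid experimental + theoretical, each matched automatically to a narrative structure
- **Dual output formats**: PPTX (python-pptx) and single-file HTML (horizontal paging, figures embedded as base64)
- **Deep chemistry domain support**: catalysis, materials, organic synthesis, computational chemistry, electrochemistry, energy, environmental chemistry, radiation chemistry, and more
- **Intelligent content generation**: extracts real information from the paper and never emits "please add XX manually" placeholders
- **Multi-strategy figure extraction**: vector-drawing clustering + embedded images + full-page rendering fallback, compatible with PyMuPDF 1.19–1.23+
- **Error tracking and reporting**: JSON reports for the full chain (analysis → extraction → build), safe under Windows console encodings
- **4 academic color themes**: Academic Classic / Molecular Tech / Green Chemistry / Nature Style
- **7 slide types**: title, section divider, content bullets, figure with explanation (4 layouts), data table, summary, thank you; the PPTX builder additionally offers an image grid (see the Slide Types table)
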
---

## Installation

```bash
pip install -r requirements.txt
```

**Dependencies**: `pymupdf>=1.19.0` · `python-pptx>=0.6.23` · `pdfplumber>=0.10.0` · `Pillow>=10.0.0`

`pdf2image` is optional (it requires Poppler to be installed on the system).

---

## Quick Start

### Complete Workflow

```bash
# Step 1: Analyze the paper type and structure
python scripts/analyze_paper.py paper.pdf --json analysis.json

# Step 2: Extract figures (multi-strategy + automatic fallback)
python scripts/extract_charts.py paper.pdf output/figures 300 --report

# Step 3: Build the PPTX or HTML
# (build_my_ppt.py is a script you write yourself; it is not shipped in the package)
python build_my_ppt.py
```

### PPTX Format

```python
import sys
sys.path.insert(0, 'scripts')  # make scripts/create_ppt.py importable
from create_ppt import ChemistryPPT

ppt = ChemistryPPT(theme='academic')

# Cover slide: Chinese and English titles, authors, journal, DOI
ppt.add_title_slide(
    title_cn='Ru₁/Cu 单原子合金高效电催化 CO₂ 还原',  # Chinese title, kept as-is
    title_en='Single-Atom Ru Alloyed with Cu for Efficient CO₂RR',
    authors='Zhang, L. et al.',
    journal='J. Am. Chem. Soc., 2024, 146, 12345',
    doi='10.1021/jacs.4c01234'
)

ppt.add_section_slide('Background')
ppt.add_content_slide(
    title='Core Challenges in Electrocatalytic CO₂ Reduction',
    bullets=[
        'CO₂RR yields a broad product distribution; selectivity is hard to control',
        'Cu-based catalysts typically reach C₂₊ FE < 50%',
        'Key bottleneck: *CO adsorption energy conflicts with C-C coupling kinetics'
    ]
)
# Figure slide: the image path comes from the extraction step
ppt.add_figure_slide(
    title='HAADF-STEM Confirms Atomically Dispersed Ru',
    figure_path='figures/p3_fig1.png',
    bullets=['Bright dots uniformly dispersed, no clusters', 'EDS mapping confirms uniform distribution'],
    figure_label='Figure 1',
    layout='figure_right'
)
ppt.add_table_slide(
    title='Catalytic Performance Comparison',
    headers=['Catalyst', 'FE(C₂₊)%', 'j (mA/cm²)', 'Stability (h)'],
    rows=[['Ru₁/Cu', '82%', '300', '100'], ['Cu NPs', '45%', '150', '20']]
)
ppt.add_summary_slide(
    title='Summary',
    bullets=['Key finding 1', 'Key finding 2', 'Key finding 3']
)
ppt.add_thankyou_slide()

ppt.save('output/presentation.pptx')
ppt.save_report('output/presentation.pptx')  # writes the JSON build report
```

### HTML Format

```python
import sys
sys.path.insert(0, 'scripts')  # make scripts/generate_html.py importable
from generate_html import HtmlPPT

html = HtmlPPT(title="Academic Report", theme="molecular")

# Slide methods mirror ChemistryPPT (add_image_grid_slide is PPTX-only)
html.add_title_slide("Title", title_en="Title", authors="...", journal="...")
html.add_section_slide("Part 1")
html.add_content_slide("Key Point", ["bullet 1", "bullet 2"])
html.add_figure_slide("Figure", figure_path="figures/fig1.png",
                      bullets=["description"], figure_label="Figure 1",
                      layout="figure_right")
html.add_summary_slide("Summary", ["conclusion 1", "conclusion 2"])
html.add_thankyou_slide()

html.save('output/presentation.html')  # single file, opens directly in a browser
```

**HTML features**:
- Figures are embedded as base64: a single file with zero dependencies
- Horizontal paging: keyboard ← → Home End, mouse wheel, touch swipe, dot navigation at the bottom
- Page counter + keyboard hints
- Responsive design for projectors and mobile devices

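The single-file property rests on inlining each figure as a base64 `data:` URI. The following is a minimal sketch of that technique using only the standard library. The helper names `image_to_data_uri` and `img_tag` are hypothetical and are not taken from `generate_html.py`, whose actual implementation may differ. Base64 text is roughly a third larger than the raw bytes, so the resulting HTML grows with the number and resolution of figures.

```python
import base64
import html
import mimetypes
from pathlib import Path


def image_to_data_uri(path):
    """Return a data: URI that carries the file's bytes as base64 text."""
    path = Path(path)
    mime, _ = mimetypes.guess_type(path.name)
    if mime is None:
        mime = "application/octet-stream"  # unknown extension
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def img_tag(path, alt=""):
    """Build an <img> element whose src needs no external file."""
    return f'<img src="{image_to_data_uri(path)}" alt="{html.escape(alt)}">'
```
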
---

## Color Themes

| Theme | PPTX parameter | HTML parameter | Best for |
|-------|----------------|----------------|----------|
| Academic Classic | `theme="academic"` | `theme="academic"` | General chemistry (default) |
| Molecular Tech | `theme="molecular"` | `theme="molecular"` | Computational chemistry / materials |
| Green Chemistry | `theme="green"` | `theme="green"` | Catalysis / energy / environment |
| Nature Style | `theme="nature"` | `theme="nature"` | CNS journal presentations |

---

## Slide Types

| Method | PPTX | HTML | Purpose |
|--------|------|------|---------|
| `add_title_slide()` | ✓ | ✓ | Cover slide (Chinese and English titles, authors, journal, DOI) |
| `add_section_slide()` | ✓ | ✓ | Section divider (dark background) |
| `add_content_slide()` | ✓ | ✓ | Text slide (title + bullets + notes) |
| `add_figure_slide()` | ✓ | ✓ | Figure + explanation (4 layouts: right/top/left/full) |
| `add_table_slide()` | ✓ | ✓ | Data comparison table (zebra stripes, colored header) |
| `add_image_grid_slide()` | ✓ | — | Multi-image grid |
| `add_summary_slide()` | ✓ | ✓ | Summary slide (light background) |
| `add_thankyou_slide()` | ✓ | ✓ | Thank you / Q&A slide |

---

## Figure Extraction: Multi-Strategy + Version Compatibility

```
Strategy 1: cluster_drawings() with default tolerance (3,3)
    ↓ fewer than 3 results
Strategy 2: retry with wider tolerances (6,6) → (10,10) → (15,15) → (20,20)
    ↓
Strategy 3: extract embedded bitmaps (get_images)
    ↓ fewer than 3 results
Strategy 4: full-page rendering fallback for pages that contain figures
```

- Compatible with PyMuPDF 1.19+ (manual clustering via `get_drawings`) and 1.23+ (`cluster_drawings`)
- `--report` writes `extraction_report.json` (extraction details for each strategy)

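The cascade above can be pictured as an ordered list of strategies, some of which run only while too few figures have been found. The sketch below shows that control flow in isolation; it is not the code of `extract_charts.py`. The function name `extract_with_fallback`, the `only_if_short` flag, and the choice to accumulate results across strategies (without deduplication) are all assumptions made for illustration. Each strategy is passed in as a plain callable, so the sketch needs no PDF library.

```python
def extract_with_fallback(strategies, min_figures=3):
    """Run extraction strategies in order and collect their results.

    `strategies` is a list of (name, callable, only_if_short) tuples. A
    strategy flagged only_if_short is skipped once min_figures results
    have been collected; unflagged strategies always run.
    """
    figures, log = [], []
    for name, run, only_if_short in strategies:
        if only_if_short and len(figures) >= min_figures:
            log.append({"strategy": name, "status": "skipped", "found": 0})
            continue
        try:
            found = list(run())
        except Exception as exc:  # one failing strategy must not stop the chain
            log.append({"strategy": name, "status": f"error: {exc}", "found": 0})
            continue
        figures.extend(found)
        log.append({"strategy": name, "status": "ok", "found": len(found)})
    return figures, log
```
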
---

## Error Handling and Reports

JSON reports cover the full chain, which helps with automated integration and troubleshooting:

| Stage | Report file | Generated by |
|-------|-------------|--------------|
| Paper analysis | `analysis.json` | `analyze_paper.py --json analysis.json` |
| Figure extraction | `extraction_report.json` | `extract_charts.py --report` |
| PPTX build | `presentation_report.json` | `ppt.save_report("output.pptx")` |

**Windows encoding safety**: all scripts use `_safe_print()` to avoid crashes caused by GBK encoding.

**Automatic diagnosis of common problems**:

| Symptom | Likely cause | Script output |
|---------|--------------|---------------|
| 0 vector figures extracted | PyMuPDF < 1.23, or the PDF is rendered in an unusual way | Automatic fallback to manual clustering via `get_drawings` |
| Total figure count still too low | All figures are embedded bitmaps | Strategy 3 picks them up automatically |
| Windows `print` crash | Unicode characters (e.g. − ₂) | `_safe_print` falls back to ASCII |
| Paper type misclassified | Reference list contains characterization terms | Weighted detection + confidence annotation |
| Images missing in the PPT | Image path does not exist | Recorded in `missing_images`; the build continues |

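The encoding problem arises when a console stream cannot encode a character such as `₂`, in which case the built-in `print` raises `UnicodeEncodeError`. A minimal sketch of an ASCII fallback follows. The function is named `safe_print` here and is a hypothetical stand-in; the package's own `_safe_print()` is not shown in this README and may behave differently.

```python
import sys


def safe_print(*args, sep=" ", end="\n", file=None):
    """print() replacement that degrades to ASCII instead of raising."""
    stream = sys.stdout if file is None else file
    text = sep.join(str(arg) for arg in args) + end
    try:
        stream.write(text)
    except UnicodeEncodeError:
        # Replace every character outside ASCII with '?' and retry.
        stream.write(text.encode("ascii", errors="replace").decode("ascii"))
```
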
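---

## Paper Type Adaptation

| Experimental Chemistry | Theoretical/Computational Chemistry | Hybrid Experimental + Theoretical |
|------------------------|-------------------------------------|-----------------------------------|
| Synthesis → Characterization → Performance → Mechanism | Method → Model → Electronic Structure → Energetics → Mechanism | Experiment → Computation → Cross-Validation → Unified Mechanism |
| Catalysis / Materials / Organic / Energy | DFT / MM / AIMD / Electronic Structure | Joint experiment + DFT studies |

Detailed templates are in `references/chemistry_templates.md`.
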
---

## File Structure

```
PDF2PPT/
├── SKILL.md                        # Skill main file
├── README.md                       # This file
├── requirements.txt
├── assets/
│   └── academic_template.html      # HTML PPT template (CSS + paging JS)
├── scripts/
│   ├── create_ppt.py               # PPTX builder (ChemistryPPT)
│   ├── generate_html.py            # HTML builder (HtmlPPT)
│   ├── extract_charts.py           # Multi-strategy figure extraction
│   ├── analyze_paper.py            # Paper analysis + type classification
│   └── convert_to_images.py        # PDF pages → images
├── references/
│   ├── chemistry_templates.md      # Slide-by-slide templates for the three paper types
│   └── visual_style.md             # Visual design spec for academic PPTs
└── examples/
    └── example_usage.py            # Complete examples for the three chemistry paper types
```

---

## Compatibility

- **OS**: macOS / Linux / Windows
- **Python**: 3.8+
- **PyMuPDF**: 1.19+ (works with both the old and the new API automatically)
- **Environment**: Claude Code / Claude Desktop / Cursor / VS Code / any Python environment
- **HTML output**: any modern browser (Chrome / Firefox / Edge / Safari)

## License

MIT License

package/README_EN.md
ADDED
@@ -0,0 +1,239 @@

# PDF2PPT: Chemistry Academic Paper → Presentation Converter

Convert chemistry academic paper PDFs into professional presentations for group meetings, defenses, and academic reports. Supports both **PPTX** and **HTML** output formats.

## Key Features

- **Automatic paper type recognition**: Experimental / Computational / Hybrid chemistry, each matched automatically to a narrative structure
- **Dual output formats**: PPTX (python-pptx) and single-file HTML (horizontal slides, base64-embedded figures)
- **Deep chemistry domain support**: Catalysis, materials, organic synthesis, computational chemistry, electrochemistry, energy, environmental, radiation chemistry, and more
- **Intelligent content generation**: Extracts real information from papers, with no "please fill in XX" placeholder content
- **Multi-strategy figure extraction**: Vector clustering + embedded images + page rendering fallback, compatible with PyMuPDF 1.19–1.23+
- **Error tracking & reporting**: Full-chain JSON reports (analysis → extraction → build), Windows encoding-safe
- **4 academic color themes**: Academic Classic / Molecular Tech / Green Chemistry / Nature Style
- **7 slide types**: Title, section divider, content, figure (4 layouts), data table, summary, thank you; the PPTX builder additionally offers an image grid (see the Slide Types table)

---

## Installation

```bash
pip install -r requirements.txt
```

**Dependencies**: `pymupdf>=1.19.0` · `python-pptx>=0.6.23` · `pdfplumber>=0.10.0` · `Pillow>=10.0.0`

`pdf2image` is optional (requires system Poppler installation).

---

## Quick Start

### Complete Workflow

```bash
# Step 1: Analyze paper type and structure
python scripts/analyze_paper.py paper.pdf --json analysis.json

# Step 2: Extract figures (multi-strategy + auto-fallback)
python scripts/extract_charts.py paper.pdf output/figures 300 --report

# Step 3: Build PPTX or HTML
# (build_my_ppt.py is a script you write yourself; it is not shipped in the package)
python build_my_ppt.py
```

### PPTX Format

```python
import sys
sys.path.insert(0, 'scripts')  # make scripts/create_ppt.py importable
from create_ppt import ChemistryPPT

ppt = ChemistryPPT(theme='academic')

# Cover slide: bilingual title, authors, journal, DOI
ppt.add_title_slide(
    title_cn='Ru₁/Cu Single-Atom Alloy for Efficient CO₂ Electroreduction',
    title_en='Single-Atom Ru Alloyed with Cu for Efficient CO₂RR',
    authors='Zhang, L. et al.',
    journal='J. Am. Chem. Soc., 2024, 146, 12345',
    doi='10.1021/jacs.4c01234'
)

ppt.add_section_slide('Background')
ppt.add_content_slide(
    title='Core Challenge in Electrocatalytic CO₂ Reduction',
    bullets=[
        'CO₂RR produces diverse products; selectivity control is difficult',
        'Cu-based catalysts achieve C₂₊ FE typically < 50%',
        'Key bottleneck: conflicting demands on *CO binding and C-C coupling'
    ]
)
# Figure slide: the image path comes from the extraction step
ppt.add_figure_slide(
    title='HAADF-STEM Confirms Single-Atom Ru Dispersion',
    figure_path='figures/p3_fig1.png',
    bullets=['Bright dots uniformly dispersed, no clusters', 'EDS mapping confirms uniformity'],
    figure_label='Figure 1',
    layout='figure_right'
)
ppt.add_table_slide(
    title='Performance Comparison',
    headers=['Catalyst', 'FE(C₂₊)%', 'j (mA/cm²)', 'Stability (h)'],
    rows=[['Ru₁/Cu', '82%', '300', '100'], ['Cu NPs', '45%', '150', '20']]
)
ppt.add_summary_slide(
    title='Summary',
    bullets=['Finding 1', 'Finding 2', 'Finding 3']
)
ppt.add_thankyou_slide()

ppt.save('output/presentation.pptx')
ppt.save_report('output/presentation.pptx')  # JSON build report
```

### HTML Format

```python
import sys
sys.path.insert(0, 'scripts')  # make scripts/generate_html.py importable
from generate_html import HtmlPPT

html = HtmlPPT(title="Academic Report", theme="molecular")

# Slide methods mirror ChemistryPPT (add_image_grid_slide is PPTX-only)
html.add_title_slide("Title", title_en="Title EN", authors="...", journal="...")
html.add_section_slide("Part 1")
html.add_content_slide("Key Point", ["bullet 1", "bullet 2"])
html.add_figure_slide("Figure", figure_path="figures/fig1.png",
                      bullets=["description"], figure_label="Figure 1",
                      layout="figure_right")
html.add_summary_slide("Summary", ["finding 1", "finding 2"])
html.add_thankyou_slide()

html.save('output/presentation.html')  # Single file, open directly in browser
```

**HTML features**:
- Figures embedded as base64: single file, zero dependencies
- Horizontal navigation: keyboard ← → Home End, scroll wheel, touch swipe, dot navigation
- Page counter + keyboard hints
- Responsive design for projectors and mobile

---

## Color Themes

| Theme | PPTX param | HTML param | Best for |
|-------|-----------|------------|----------|
| Academic Classic | `theme="academic"` | `theme="academic"` | General chemistry (default) |
| Molecular Tech | `theme="molecular"` | `theme="molecular"` | Computational/materials |
| Green Chemistry | `theme="green"` | `theme="green"` | Catalysis/energy/environment |
| Nature Style | `theme="nature"` | `theme="nature"` | CNS journal presentations |

---

## Slide Types

| Method | PPTX | HTML | Purpose |
|--------|------|------|---------|
| `add_title_slide()` | ✓ | ✓ | Cover: bilingual title, authors, journal, DOI |
| `add_section_slide()` | ✓ | ✓ | Section divider (dark background) |
| `add_content_slide()` | ✓ | ✓ | Text content: title + bullets + notes |
| `add_figure_slide()` | ✓ | ✓ | Figure + explanation (4 layouts: right/top/left/full) |
| `add_table_slide()` | ✓ | ✓ | Data comparison table (striped rows, colored header) |
| `add_image_grid_slide()` | ✓ | — | Multi-image grid |
| `add_summary_slide()` | ✓ | ✓ | Summary (light background) |
| `add_thankyou_slide()` | ✓ | ✓ | Thank you / Q&A |

---

## Figure Extraction: Multi-Strategy + Version Compatibility

```
Strategy 1: cluster_drawings() default tolerance (3,3)
    ↓ < 3 results
Strategy 2: Multi-tolerance retry (6,6) → (10,10) → (15,15) → (20,20)
    ↓
Strategy 3: Extract embedded bitmaps (get_images)
    ↓ < 3 results
Strategy 4: Full-page rendering fallback for figure pages
```

- Compatible with PyMuPDF 1.19+ (`get_drawings` manual clustering) and 1.23+ (`cluster_drawings`)
- `--report` outputs `extraction_report.json` (per-strategy details)

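On PyMuPDF versions without `cluster_drawings`, the manual route starts from `page.get_drawings()`, where each returned path dictionary carries a `"rect"` bounding box. One way to picture the "manual clustering" step is to merge boxes whose gap is within a tolerance until nothing changes. The sketch below does this on plain `(x0, y0, x1, y1)` tuples, so it runs without PyMuPDF. The function names are hypothetical, and this is an illustrative algorithm rather than the code in `extract_charts.py`. A caller could pick the code path with a check such as `hasattr(page, "cluster_drawings")`.

```python
def _near(a, b, tol_x, tol_y):
    """True if the gap between boxes a and b is within the tolerance on both axes."""
    return (a[0] - tol_x <= b[2] and b[0] - tol_x <= a[2] and
            a[1] - tol_y <= b[3] and b[1] - tol_y <= a[3])


def _union(a, b):
    """Smallest box that contains both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def cluster_rects(rects, tol_x=3.0, tol_y=3.0):
    """Merge (x0, y0, x1, y1) boxes into clusters, one box per candidate figure."""
    clusters = [tuple(r) for r in rects]
    merged_any = True
    while merged_any:  # repeat: a grown box may now reach boxes it missed before
        merged_any = False
        result = []
        for rect in clusters:
            for i, other in enumerate(result):
                if _near(rect, other, tol_x, tol_y):
                    result[i] = _union(rect, other)
                    merged_any = True
                    break
            else:
                result.append(rect)
        clusters = result
    return clusters
```
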
---

## Error Handling & Reports

Full-chain JSON reports for workflow automation and diagnostics:

| Stage | Report File | Generated By |
|-------|------------|--------------|
| Paper analysis | `analysis.json` | `analyze_paper.py --json analysis.json` |
| Figure extraction | `extraction_report.json` | `extract_charts.py --report` |
| PPTX build | `presentation_report.json` | `ppt.save_report("output.pptx")` |

**Windows encoding safety**: All scripts use `_safe_print()` to prevent GBK encoding crashes.

**Common issue diagnostics**:

| Symptom | Likely Cause | Script Behavior |
|---------|-------------|-----------------|
| 0 vector figures extracted | PyMuPDF < 1.23 or special PDF rendering | Auto-fallback to `get_drawings` manual clustering |
| Still too few figures in total | All figures are embedded bitmaps | Strategy 3 picks them up automatically |
| Windows `print` crash | Unicode chars (e.g. − ₂) | `_safe_print` falls back to ASCII |
| Paper type misclassified | Characterization terms in references | Weighted detection + confidence annotation |
| Images missing in PPT | Non-existent file paths | Recorded in `missing_images`, build continues |

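The last table row describes a "record and continue" policy: a missing figure is noted in the report rather than aborting the build. A minimal sketch of that pattern is below. The class `BuildReport`, its `resolve_image` and `save` methods, and the `slides_built` field are hypothetical; only the `missing_images` key is named in this README, and the real report written by `save_report()` may have a different structure.

```python
import json
from pathlib import Path


class BuildReport:
    """Collects problems during a build instead of raising on the first one."""

    def __init__(self):
        self.slides_built = 0
        self.missing_images = []

    def resolve_image(self, path):
        """Return the path if the file exists, else record it and return None."""
        if Path(path).is_file():
            return str(path)
        self.missing_images.append(str(path))
        return None

    def save(self, report_path):
        """Write the collected information as UTF-8 JSON."""
        payload = {"slides_built": self.slides_built,
                   "missing_images": self.missing_images}
        Path(report_path).write_text(
            json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8")
```
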
---

## Paper Type Adaptations

| Experimental Chemistry | Computational Chemistry | Experimental + Theoretical |
|------------------------|------------------------|---------------------------|
| Synthesis → Characterization → Performance → Mechanism | Method → Model → Electronic Structure → Energetics → Mechanism | Experiment → Computation → Cross-Validation → Unified Mechanism |
| Catalysis / Materials / Organic / Energy | DFT / MM / AIMD / Electronic Structure | Experiment + DFT joint studies |

Detailed templates in `references/chemistry_templates.md`.

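The README mentions weighted detection with a confidence value but does not describe the scheme. As a rough illustration of how section weighting can keep a reference list from skewing the result, here is a small sketch. Everything in it is an assumption: the keyword lists, the section weights, the `hybrid_share` threshold, and the confidence formula are invented for the example and are not taken from `analyze_paper.py`.

```python
import re

# Hypothetical weights: terms in the abstract count most, references not at all.
SECTION_WEIGHTS = {"abstract": 3.0, "methods": 2.0, "results": 1.0, "references": 0.0}
KEYWORDS = {
    "experimental": ("synthesis", "xrd", "xps", "haadf-stem"),
    "computational": ("dft", "aimd", "transition state"),
}


def _count(word, text):
    """Count whole-word occurrences of word in already lower-cased text."""
    return len(re.findall(r"(?<!\w)" + re.escape(word) + r"(?!\w)", text))


def classify_paper(sections, hybrid_share=0.3):
    """Return (label, confidence) from a {section_name: text} mapping."""
    scores = dict.fromkeys(KEYWORDS, 0.0)
    for name, text in sections.items():
        weight = SECTION_WEIGHTS.get(name, 1.0)
        lowered = text.lower()
        for label, words in KEYWORDS.items():
            scores[label] += weight * sum(_count(w, lowered) for w in words)
    total = sum(scores.values())
    if total == 0:
        return "unknown", 0.0
    top = max(scores, key=scores.get)
    minor = min(scores.values()) / total  # share of the weaker category
    if minor >= hybrid_share:
        return "hybrid", round(2 * minor, 2)  # 1.0 when perfectly balanced
    return top, round(scores[top] / total, 2)
```
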
---

## File Structure

```
PDF2PPT/
├── SKILL.md                        # Skill main file (Chinese)
├── SKILL_EN.md                     # Skill main file (English)
├── README.md                       # This file (Chinese)
├── README_EN.md                    # This file (English)
├── requirements.txt
├── assets/
│   └── academic_template.html      # HTML PPT template (CSS + navigation JS)
├── scripts/
│   ├── create_ppt.py               # PPTX builder (ChemistryPPT)
│   ├── generate_html.py            # HTML builder (HtmlPPT)
│   ├── extract_charts.py           # Multi-strategy figure extraction
│   ├── analyze_paper.py            # Paper analysis + type classification
│   └── convert_to_images.py        # PDF page → image conversion
├── references/
│   ├── chemistry_templates.md      # Slide-by-slide templates for 3 paper types
│   ├── chemistry_templates_en.md   # (English version)
│   ├── visual_style.md             # Academic PPT visual design spec
│   └── visual_style_en.md          # (English version)
└── examples/
    └── example_usage.py            # Complete examples for 3 chemistry paper types
```

---

## Compatibility

- **OS**: macOS / Linux / Windows
- **Python**: 3.8+
- **PyMuPDF**: 1.19+ (auto-detects and adapts API)
- **Environment**: Claude Code / Claude Desktop / Cursor / VS Code / any Python environment
- **HTML output**: Any modern browser (Chrome / Firefox / Edge / Safari)

## License

MIT License