paper-search-cli 0.2.0 → 0.3.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.env.example +2 -6
- package/README.md +149 -653
- package/README.zh.md +270 -0
- package/dist/cli.js +184 -21
- package/dist/cli.js.map +1 -1
- package/dist/config/ConfigService.d.ts +1 -1
- package/dist/config/ConfigService.d.ts.map +1 -1
- package/dist/config/ConfigService.js +1 -3
- package/dist/config/ConfigService.js.map +1 -1
- package/dist/config/ResultCaps.d.ts +4 -0
- package/dist/config/ResultCaps.d.ts.map +1 -0
- package/dist/config/ResultCaps.js +10 -0
- package/dist/config/ResultCaps.js.map +1 -0
- package/dist/core/capabilityProfile.d.ts +18 -0
- package/dist/core/capabilityProfile.d.ts.map +1 -0
- package/dist/core/capabilityProfile.js +167 -0
- package/dist/core/capabilityProfile.js.map +1 -0
- package/dist/core/diagnostics.js +16 -16
- package/dist/core/diagnostics.js.map +1 -1
- package/dist/core/handleToolCall.d.ts.map +1 -1
- package/dist/core/handleToolCall.js +33 -0
- package/dist/core/handleToolCall.js.map +1 -1
- package/dist/core/liveSmoke.d.ts +42 -0
- package/dist/core/liveSmoke.d.ts.map +1 -0
- package/dist/core/liveSmoke.js +226 -0
- package/dist/core/liveSmoke.js.map +1 -0
- package/dist/core/platformMetadata.js +2 -2
- package/dist/core/platformMetadata.js.map +1 -1
- package/dist/core/schemas.d.ts +77 -2
- package/dist/core/schemas.d.ts.map +1 -1
- package/dist/core/schemas.js +58 -3
- package/dist/core/schemas.js.map +1 -1
- package/dist/core/textReports.d.ts +21 -0
- package/dist/core/textReports.d.ts.map +1 -0
- package/dist/core/textReports.js +85 -0
- package/dist/core/textReports.js.map +1 -0
- package/dist/core/tools.d.ts.map +1 -1
- package/dist/core/tools.js +60 -1
- package/dist/core/tools.js.map +1 -1
- package/dist/platforms/BioRxivSearcher.d.ts.map +1 -1
- package/dist/platforms/BioRxivSearcher.js +40 -21
- package/dist/platforms/BioRxivSearcher.js.map +1 -1
- package/dist/platforms/CORESearcher.d.ts.map +1 -1
- package/dist/platforms/CORESearcher.js +39 -9
- package/dist/platforms/CORESearcher.js.map +1 -1
- package/dist/platforms/GoogleScholarSearcher.d.ts.map +1 -1
- package/dist/platforms/GoogleScholarSearcher.js +3 -2
- package/dist/platforms/GoogleScholarSearcher.js.map +1 -1
- package/dist/platforms/OpenAIRESearcher.js +1 -1
- package/dist/platforms/OpenAIRESearcher.js.map +1 -1
- package/dist/services/CitationService.d.ts.map +1 -1
- package/dist/services/CitationService.js +8 -2
- package/dist/services/CitationService.js.map +1 -1
- package/dist/services/JournalMetricsService.js +1 -1
- package/dist/services/JournalMetricsService.js.map +1 -1
- package/dist/services/OpenAccessFallbackService.d.ts +20 -0
- package/dist/services/OpenAccessFallbackService.d.ts.map +1 -1
- package/dist/services/OpenAccessFallbackService.js +95 -72
- package/dist/services/OpenAccessFallbackService.js.map +1 -1
- package/dist/skills/SkillInstaller.d.ts +108 -0
- package/dist/skills/SkillInstaller.d.ts.map +1 -0
- package/dist/skills/SkillInstaller.js +389 -0
- package/dist/skills/SkillInstaller.js.map +1 -0
- package/dist/utils/RateLimiter.d.ts.map +1 -1
- package/dist/utils/RateLimiter.js +7 -0
- package/dist/utils/RateLimiter.js.map +1 -1
- package/package.json +2 -2
- package/skills/paper-search/SKILL.md +52 -143
- package/skills/paper-search/references/capability-routing.md +147 -0
- package/skills/paper-search/references/cli-contract.md +152 -0
- package/skills/paper-search/references/management-layer.md +140 -0
- package/README-sc.md +0 -766
package/skills/paper-search/SKILL.md
@@ -5,164 +5,99 @@ description: |
 Use for: searching papers, finding similar studies, initial literature-review screening, verifying PMID/DOI, downloading paper PDFs,
 calling sources such as Crossref/OpenAlex/PubMed/PMC/Europe PMC/arXiv/bioRxiv/medRxiv/Semantic Scholar/CORE/OpenAIRE/DBLP/ACM/USENIX/OpenReview/IACR,
 searching paper body snippets for methodological details via the Semantic Scholar Open Access snippet index,
+querying the citing papers and references of a known paper via the Semantic Scholar Graph API,
 and querying journal impact factor, JCR/SSCI quartiles, CAS quartiles, JCI, ESI, warning status, and rank metrics via EasyScholar.
 Use it when the user mentions "search the literature", "look for papers", "literature retrieval", "search papers", "find papers", "literature search",
-
-
-
-
+"check whether there is related research", "help me find a few references", "see how others did it", "how others wrote it",
+"how it was done in the Methods", "Methods wording", "compare methodology wording", "download paper PDF",
+"verify PMID", "verify DOI", "body snippet search", "snippet search", "methodology detail search",
+"impact factor", "IF", "JCR quartile", "CAS quartile", "journal quartile", "journal rank",
+"target journal metrics", "journal metrics", or similar tasks.
 This skill only guides the agent in calling the paper-search CLI; API keys must be configured via paper-search setup,
 paper-search config, .env, or environment variables, and must never be written into Skill files.
 ---
 
 # Paper Search CLI
 
-
+You are an academic literature search dispatcher. This Skill is a Routing Skill: it routes user intent to the `paper-search` CLI and maintains the boundaries for evidence, secrets, and downloads. Prefer the `paper-search` CLI for paper search, metadata verification, citing/reference expansion, body snippet search, journal metric lookup, and PDF retrieval; do not use this Skill as a place to store keys, cookies, accounts, or download strategies.
 
-
+Rules for reading references:
 
-
+- To check installation, configuration, doctor, smoke, Skill sync, or health status, read `references/management-layer.md`.
+- To route between search, journal metrics, PDF retrieval, and body snippet search, read `references/capability-routing.md`.
+- To check stable CLI commands, `paper-search run` tool names, output formats, or secret boundaries, read `references/cli-contract.md`.
+- If a reference conflicts with the actual `paper-search --help` / `paper-search tools` output, trust the actual CLI and report that the Skill needs updating.
 
-
-command -v paper-search
-paper-search status --pretty
-```
+## Quick Self-Check
 
-
+On first use, when the environment is uncertain, or when the user asks "which capabilities are available now":
 
 ```bash
-paper-search
+command -v paper-search
+paper-search doctor --pretty
 ```
 
-
+When the user needs a readable health report:
 
 ```bash
-paper-search
-paper-search config set SEMANTIC_SCHOLAR_API_KEY your_key
-paper-search setup EASYSCHOLAR_KEY
-paper-search config doctor --pretty
+paper-search doctor --format text
 ```
 
-
-
-1. Shell environment variables
-2. `.env` in the current directory
-3. User-level config `~/.config/paper-search-cli/config.json`
-4. Built-in defaults for free sources
-
-## When Installation Is Missing
-
-If `paper-search` is not found, first tell the user it needs to be installed. Run the following only when the user asks you to install it:
+If the installation is missing, say so first; run the following only when the user asks for installation:
 
 ```bash
 npm install -g paper-search-cli
 paper-search setup
-paper-search
+paper-search doctor --pretty
 ```
 
-
+When the user asks "how do I update", suspects after installation that the Skill is out of date, or `doctor`/`skills status` reports that the Skill is out of sync, first distinguish between updating the package itself and syncing the Skill; do not just run `skills update`. For the full procedure, see `Package Update And Capability Setup` in `references/management-layer.md`.
 
-
+Update for regular users:
 
 ```bash
-
-paper-search
-paper-search
-paper-search search "graph neural networks" --platform dblp --max-results 5 --pretty
-paper-search search "large language models" --platform openreview --max-results 5 --pretty
+npm install -g paper-search-cli@latest
+paper-search skills update --targets agents --pretty
+paper-search doctor --pretty
 ```
 
-
-
-### Precise Tool Calls
+Update for local maintainers:
 
 ```bash
-
-
-
-
-paper-search
+git pull
+npm install
+npm run build
+npm install -g .
+paper-search skills update --targets agents --pretty
+paper-search doctor --pretty
 ```
 
-
+When an API key or email is missing, do not ask the user to send secrets in chat; prompt them to configure it locally with `paper-search setup` or `paper-search config`.
 
-
-paper-search run search_semantic_scholar --json-args '{"query":"graph neural network medicine","maxResults":5,"year":"2020-2025"}' --pretty
-```
-
-### Journal Metrics Lookup
-
-When the user asks about impact factor, JCR/SSCI quartiles, CAS quartiles, JCI, ESI, warning lists, PKU Core (Peking University core journals), Nanjing University Core (CSSCI), CSCD, A&HCI, CCF, EI, or target journal rank, use the native EasyScholar CLI tool rather than general paper search.
-
-```bash
-paper-search journal-metrics "Nature" "BMJ" --pretty
-paper-search journal-metrics --file journals.txt --include-raw --pretty
-paper-search run query_journal_metrics --json-args '{"journals":["Nature"],"includeRaw":true}' --pretty
-```
-
-When returning results, explain the normalized `core` fields first: `impact_factor`, `impact_factor_5y`, `jcr_quartile`, `ssci_quartile`, `jci`, `cas_base`, `cas_upgraded`, `cas_small`, `cas_top`, `cas_zone`, `esi`, `warning`, `pku`, `cssci`, `cscd`, `ahci`, `ccf`, `ei`, `china_st_core`. If the full fields are needed, use `--include-raw` and check `official_all`, `official_select`, and `custom_rank`.
-
-The EasyScholar documentation allows at most 2 requests per second; do not make concurrent calls for batch lookups.
+## Capability Map
 
-
+This Skill has only five main literature capabilities. `doctor`, `smoke`, `config`, and `skills` are management-layer commands, not literature tasks.
 
-
-
-paper-search
-paper-search run
-
+| User intent | Capability | Preferred entrypoint | Key boundary |
+|---|---|---|---|
+| Search papers, find related research, verify DOI/PMID, initial literature screening | `metadata_search` | `paper-search search` integrated entrypoint / `paper-search run search_*` precise tool entrypoint | Returns and verifies paper metadata only; Sci-Hub is not a search source |
+| Look up citing papers or references of a known paper | `citation_expansion` | `paper-search run get_paper_citations` / `paper-search run get_paper_references` | Requires a known `paperId`, DOI, or arXiv ID; not keyword search |
+| Look up impact factor, JCR/SSCI/CAS quartiles, JCI, ESI, warnings, journal rank | `journal_metrics` | `paper-search journal-metrics` / `paper-search run query_journal_metrics` | Journal metrics lookup, not paper search; requires `EASYSCHOLAR_KEY` |
+| Get or download the PDF of a confirmed paper | `pdf_discovery` | `paper-search download` / `paper-search run download_with_fallback` | Verify paper identity before downloading; Sci-Hub is the final fallback and is enabled by default |
+| Find Methods/parameter/wording clues in paper body snippets | `body_snippet_search` | `paper-search run search_semantic_snippets` | Queries the Semantic Scholar OA snippet index; requires `SEMANTIC_SCHOLAR_API_KEY`; not full-text parsing |
 
-
+## Default Workflow
 
-
+Open-ended literature tasks use the Two-Stage Paper Workflow:
 
-
-
-| Biomedical, clinical, pharmacy, public health | `pubmed` | `pmc`, `europepmc`, `semantic`, `crossref` |
-| Body methodology snippets | `search_semantic_snippets` | Use `pubmed`/`semantic` first to find titles and synonyms |
-| Computer science, AI, mathematics, physics | `arxiv` | `semantic`, `crossref`, `openalex` |
-| CS bibliographies/conference metadata | `dblp` | `acm`, `usenix`, `openreview`, `ieee` requires a key |
-| Broad cross-disciplinary coverage | `crossref` | `openalex`, `semantic` |
-| Open-access full-text discovery | `pmc`, `europepmc`, `core`, `openaire`, `unpaywall` | `download_with_fallback` |
-| Journal impact factor/quartile/rank | `journal-metrics` | `query_journal_metrics` |
-| Cryptography | `iacr` | `arxiv` |
-| Citation-count ranking | `semantic`, `crossref`, `openalex` | `webofscience`, `scopus` require keys |
-| Publisher/paid databases | `webofscience`, `ieee`, `scopus`, `sciencedirect`, `springer`/`springerlink`, `wiley` | Use only when the key is configured |
+1. Run `metadata_search` first: search for and verify literature entries, confirming title, authors, year, journal, DOI, PMID/PMCID, URL, abstract clues, and relevance.
+2. Run `pdf_discovery` only after the user confirms entries or the task explicitly requires PDFs: download the selected verified entries; record the reason for any failed download without blocking the other entries.
 
-
+A Direct Paper Request may skip broad discovery: when the user gives a single DOI, PMID, PMCID, arXiv ID, or an already verified list and explicitly asks for a download, verify the target's identity first and then proceed to download.
 
-
-- `usenix` returns USENIX-related conference/journal metadata through DBLP; it does not scrape the USENIX search page.
-- `springerlink` is an alias for `springer` and still requires a Springer API key.
+## Verification and Output Boundaries
 
-
-
-- Translate Chinese questions into English keywords by default.
-- Use 3-8 core concept terms; do not write long sentences.
-- For medical topics, MeSH or standard terminology can be added.
-- When looking for method details, add software names, parameter names, model names, and section words, e.g. `methods`, `statistical analysis`, `adjusted for`, `bootstrap`, `sensitivity analysis`.
-
-## Body Snippet Search
-
-PubMed provides only metadata such as title, authors, abstract, PMID, DOI, journal, and year; it does not retrieve paper body text.
-
-Body snippet search uses:
-
-```bash
-paper-search run search_semantic_snippets --arg query="CMAverse mediation bootstrap confidence interval" --arg limit=5 --arg fieldsOfStudy=Medicine --pretty
-```
-
-Usage rules:
-
-1. This tool requires `SEMANTIC_SCHOLAR_API_KEY`.
-2. It searches the Semantic Scholar Open Access snippet index, which is not the same as full-text parsing.
-3. Only results with `snippetKind="body"` count as body-snippet evidence; `title` or `abstract` results are clues only.
-4. Before outputting a body snippet, fill in and verify the title, authors, year, journal, and DOI or PMID.
-5. No snippet results does not mean the research does not exist; fall back to `search_pubmed`, `search_semantic_scholar`, or `search_crossref` for abstract-level search.
-
-## Verification Standards
-
-Before output to the user, key papers must be verified as far as possible:
+Before output, verify key papers as far as possible:
 
 ```bash
 paper-search run search_pubmed --arg query="37654321[PMID]" --arg maxResults=1 --pretty
@@ -173,36 +108,11 @@ paper-search run search_crossref --arg query="full paper title" --arg maxResults
 Rules:
 
 - Do not fabricate PMID, DOI, journal, year, or authors from model memory.
-- A PMID must be ... by PubMed
-- A DOI must be supported by a DOI lookup or by Crossref/OpenAlex/Semantic Scholar results.
+- A PMID must be confirmable by a PubMed query; a DOI must be supported by a DOI lookup or by Crossref/OpenAlex/Semantic Scholar results.
 - The PMID, DOI, title, first author, and year of the same paper should agree; flag the paper as suspicious if they do not.
--
-
-## Output Format
-
-### Literature List
-
-```markdown
-| # | Title | Authors | Year | Journal/Source | DOI | PMID | Verification |
-|---|---|---|---:|---|---|---|---|
-| 1 | [Title](URL) | First Author et al. | 2024 | Journal | 10.xxxx/xxxxx | 12345678 | Verified |
-```
-
-### Body Snippet Results
-
-```markdown
-### Finding 1
-
-**Paper:** Full paper title
-**Citation:** Author et al. Journal. Year. DOI/PMID.
-**Snippet kind:** body
-**Section:** Methods / Statistical Analysis
-**Source:** Semantic Scholar URL
-
-> snippet text
-```
+- Treat `config` / `doctor` output as already masked, but do not repeat, save, or write down any raw secret.
 
-##
+## Handling Common Failures
 
 | Scenario | Handling |
 |---|---|
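@@ -210,7 +120,6 @@ paper-search run search_crossref --arg query="full paper title" --arg maxResults
 | API key missing | Suggest running `paper-search setup`; do not ask for or store the key |
 | EasyScholar key missing | Suggest running `paper-search setup EASYSCHOLAR_KEY`; do not let the user send the SecretKey in chat |
 | 429 rate limit | Lower `--max-results`, switch platforms, or suggest configuring optional keys |
-| EasyScholar batch lookup | Keep to at most 2 requests per second; prefer a single `journal-metrics` call with multiple journals; do not run requests concurrently |
 | 0 results | Broaden keywords, try English synonyms, switch platforms, or expand with `--sources` |
 | Download failure | Prefer open-access sources and `download_with_fallback`; report the failure reason |
 | User requests full text | Download the PDF first, then hand it to the PDF/MinerU parsing pipeline available in the current environment |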
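SKILL.md has a verification rule: a paper's PMID, DOI, title, first author, and year should agree across sources, and any disagreement marks the paper as suspicious. The TypeScript sketch below illustrates that rule. Everything in it is hypothetical and is not taken from the package's `dist/` code, including:

- the `PaperRecord` shape,
- the normalization choices (case-folding DOIs, comparing titles on ASCII letters and digits, comparing first-author surnames),
- the function names.

```typescript
// Hypothetical shape of one metadata hit for a paper, as returned by a single source.
interface PaperRecord {
  source: string;
  pmid?: string;
  doi?: string;
  title?: string;
  firstAuthor?: string; // assumed to hold the first author's surname
  year?: number;
}

type IdentityField = "pmid" | "doi" | "title" | "firstAuthor" | "year";

const IDENTITY_FIELDS: IdentityField[] = ["pmid", "doi", "title", "firstAuthor", "year"];

// Normalize values so pure formatting differences are not reported as conflicts.
function normalize(field: IdentityField, value: string | number): string {
  const s = String(value).trim();
  switch (field) {
    case "doi":
      // DOIs are case-insensitive; strip a resolver URL or "doi:" prefix.
      return s.toLowerCase().replace(/^https?:\/\/(dx\.)?doi\.org\//, "").replace(/^doi:/, "");
    case "title":
      // Compare on lowercase ASCII letters and digits only (non-ASCII titles would need more care).
      return s.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
    case "firstAuthor":
      return s.toLowerCase();
    default:
      return s;
  }
}

// Return the identity fields whose non-empty values disagree across records.
function findConflicts(records: PaperRecord[]): IdentityField[] {
  return IDENTITY_FIELDS.filter((field) => {
    const values = new Set<string>();
    for (const record of records) {
      const value = record[field];
      if (value !== undefined && value !== "") values.add(normalize(field, value));
    }
    return values.size > 1;
  });
}

// Any conflict marks the paper as suspicious; a single record cannot be cross-checked.
function verificationStatus(records: PaperRecord[]): "consistent" | "suspicious" | "single-source" {
  if (records.length < 2) return "single-source";
  return findConflicts(records).length > 0 ? "suspicious" : "consistent";
}
```

The normalization step matters in two cases. A DOI written as a `https://doi.org/...` URL or in a different letter case still matches, and a title that differs only in punctuation still matches. By contrast, a PubMed record dated 2024 paired with a Crossref record dated 2023 is reported as `suspicious` with `year` as the conflicting field.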
package/skills/paper-search/references/capability-routing.md
@@ -0,0 +1,147 @@
+# Capability Routing Reference
+
+Use this reference when mapping a user literature request to one of the five main `paper-search` workflow capabilities.
+
+## Functional Map
+
+| User Intent | Capability | Preferred Entrypoint | Boundary |
+|---|---|---|---|
+| Search papers, find related work, verify DOI/PMID, screen literature | `metadata_search` | `paper-search search` integrated entrypoint / `paper-search run search_*` precise tool entrypoint | Returns and verifies paper metadata only; Sci-Hub is not a search source |
+| Expand citation graph for a known paper | `citation_expansion` | `paper-search run get_paper_citations` / `paper-search run get_paper_references` | Requires a known `paperId`, DOI, or arXiv ID; returns citing papers or cited references, not general keyword search |
+| Query impact factor, JCR/SSCI/CAS quartiles, JCI, ESI, warnings, journal rank | `journal_metrics` | `paper-search journal-metrics` / `paper-search run query_journal_metrics` | Journal-level lookup, not paper search; requires `EASYSCHOLAR_KEY` |
+| Get or download a verified paper PDF | `pdf_discovery` | `paper-search download` / `paper-search run download_with_fallback` | Verify identity before download; Sci-Hub is the default enabled final fallback |
+| Find Methods text, parameters, software, models, or statistical wording in body snippets | `body_snippet_search` | `paper-search run search_semantic_snippets` | Searches Semantic Scholar OA snippet index; requires `SEMANTIC_SCHOLAR_API_KEY`; not full-text parsing |
+
+## Workflow Boundaries
+
+Open-ended literature tasks use the Two-Stage Paper Workflow:
+
+1. Run `metadata_search`: build and verify a paper list with title, authors, year, journal/source, DOI, PMID/PMCID, URL, abstract clues, and relevance.
+2. Run `pdf_discovery` only after the user confirms selected papers or the task explicitly requires PDFs. Record failed downloads without blocking other items.
+
+Direct Paper Requests may skip broad discovery when the user provides a DOI, PMID, PMCID, arXiv ID, or already verified paper list. The target identity still needs verification before download.
+
+Do not fabricate PMID, DOI, title, author, journal, or year from model memory. Important citations should include the supported claim, title, authors, journal/source, year, DOI or PMID when available, and a stable URL.
+
+## Metadata Search
+
+Use `metadata_search` for finding papers, expanding keywords, literature screening, and verifying DOI/PMID/PMCID/arXiv ID.
+
+`paper-search search` is the integrated metadata entrypoint:
+
+- use `--platform NAME` for one source
+- use `--sources a,b,c` for explicit multi-source search
+- use `--platform all` or `--sources all` only when broad recall matters more than precision
+
+It does not call `citation_expansion`, `journal_metrics`, `pdf_discovery`, or `body_snippet_search`.
+
+```bash
+paper-search search "machine learning" --platform crossref --max-results 5 --pretty
+paper-search search "osteoarthritis occupational exposure" --platform pubmed --max-results 10 --pretty
+paper-search search "transformer attention mechanism" --sources arxiv,semantic,crossref --max-results 5 --pretty
+paper-search search "causal inference target trial emulation" --sources all --max-results 5 --pretty
+```
+
+Precise tool entrypoints:
+
+```bash
+paper-search run search_pubmed --arg query="osteoarthritis occupational exposure" --arg maxResults=10 --pretty
+paper-search run search_openalex --arg query="causal inference target trial emulation" --arg maxResults=5 --pretty
+paper-search run get_paper_by_doi --arg doi="10.xxxx/xxxxx" --pretty
+```
+
+Do not treat `search_scihub` as a search source. It is DOI/URL-targeted lookup, not `metadata_search`.
+
+## Citation Expansion
+
+Use `citation_expansion` when the user has a known paper and asks which papers cite it or which references it cites.
+
+```bash
+paper-search run get_paper_citations --arg doi="10.1038/nature12373" --arg limit=5 --pretty
+paper-search run get_paper_references --arg doi="10.1038/nature12373" --arg limit=5 --pretty
+```
+
+Target priority is `paperId` > `doi` > `arxivId`. This capability uses Semantic Scholar Graph API and is separate from keyword-based `metadata_search`.
+
+## Journal Metrics
+
+Use `journal_metrics` for journal-level metrics: impact factor, JCR/SSCI quartiles, CAS quartiles, JCI, ESI, warnings, and rank.
+
+```bash
+paper-search journal-metrics "Nature" "BMJ" --pretty
+paper-search journal-metrics --file journals.txt --include-raw --pretty
+paper-search run query_journal_metrics --json-args '{"journals":["Nature"],"includeRaw":true}' --pretty
+```
+
+`journal_metrics` requires `EASYSCHOLAR_KEY`. If missing, tell the user to configure it locally:
+
+```bash
+paper-search setup EASYSCHOLAR_KEY
+```
+
+For batch journal lookups, prefer one `journal-metrics` call with multiple journal names or `--file`; do not run parallel EasyScholar requests.
+
+## PDF Discovery
+
+Use `pdf_discovery` to get an already verified paper PDF. For open-ended literature tasks, do not begin with batch downloads.
+
+```bash
+paper-search download 2301.12345 --platform arxiv --save-path ./downloads --pretty
+paper-search run download_paper --arg paperId="10.xxxx/xxxxx" --arg platform=springer --arg savePath="./downloads" --pretty
+paper-search run download_with_fallback --json-args '{"source":"crossref","paperId":"10.xxxx/xxxxx","doi":"10.xxxx/xxxxx","title":"Paper title","savePath":"./downloads"}' --pretty
+```
+
+`download_with_fallback` order:
+
+1. source-native download
+2. metadata PDF URL
+3. repository discovery through PMC, Europe PMC, CORE, OpenAIRE
+4. Unpaywall DOI resolution
+5. Sci-Hub as the final fallback
+
+Sci-Hub Fallback is enabled by default. To suppress that final stage for one request:
+
+```bash
+paper-search run download_with_fallback --json-args '{"source":"crossref","paperId":"10.xxxx/xxxxx","doi":"10.xxxx/xxxxx","title":"Paper title","savePath":"./downloads","useSciHub":false}' --pretty
+```
+
+PDF source groups:
+
+- `open_access_sources`: arXiv, bioRxiv, medRxiv, PMC, Europe PMC, CORE, OpenAIRE, Unpaywall, OpenAlex OA metadata, Semantic Scholar openAccessPdf, publisher open-access modes, IACR
+- `entitled_access_sources`: Web of Science, ScienceDirect, Scopus, Springer, IEEE, Wiley TDM, or other sources requiring user keys, subscriptions, TDM tokens, or institutional entitlements
+- `scihub_sources`: Sci-Hub, separately identified as the default enabled final fallback; not OA and not entitled access
+
+## Body Snippet Search
+
+Use `body_snippet_search` to find Methods wording, parameters, software names, model descriptions, statistical analysis text, or similar body-snippet clues.
+
+```bash
+paper-search run search_semantic_snippets --arg query="comparative risk assessment methods uncertainty propagation" --arg limit=5 --arg fieldsOfStudy=Medicine --pretty
+```
+
+`search_semantic_snippets` requires `SEMANTIC_SCHOLAR_API_KEY` and uses `limit`, not `maxResults`.
+
+Only results with `snippetKind="body"` can be used as body-snippet evidence. Results from `title` or `abstract` are clues only. Before quoting or relying on a snippet, verify title, authors, year, journal/source, DOI or PMID.
+
+## Platform Selection
+
+| Task | First Choice | Supplements |
+|---|---|---|
+| Biomedical, clinical, pharmaceutical, public health | `pubmed` | `pmc`, `europepmc`, `semantic`, `crossref` |
+| Methods/body snippet clues | `search_semantic_snippets` | Use `pubmed`/`semantic` first for titles and synonyms |
+| Citation graph expansion | `get_paper_citations`, `get_paper_references` | Use only after a target paper identifier is known |
+| Computer science, AI, math, physics | `arxiv` | `semantic`, `crossref`, `openalex` |
+| CS bibliographies and conference metadata | `dblp` | `acm`, `usenix`, `openreview`, `ieee` requires key |
+| Cross-disciplinary coverage | `crossref` | `openalex`, `semantic` |
+| Open-access full-text discovery | `pmc`, `europepmc`, `core`, `openaire`, `unpaywall` | `download_with_fallback` |
+| Journal IF/quartiles/rank | `journal-metrics` | `query_journal_metrics` |
+| Cryptography | `iacr` | `arxiv` |
+| Citation-count sorting | `semantic`, `crossref`, `openalex` | `webofscience`, `scopus` require keys |
+| Publisher or paid databases | `webofscience`, `ieee`, `scopus`, `sciencedirect`, `springer`/`springerlink`, `wiley` | Use only when key is configured |
+
+## Query Construction
+
+- Translate Chinese research questions into English keywords by default.
+- Use 3-8 core concept terms rather than long sentences.
+- For medical topics, include MeSH or standard terminology when useful.
+- For method details, include software names, parameter names, model names, or section words such as `methods`, `statistical analysis`, `adjusted for`, `bootstrap`, `sensitivity analysis`.
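capability-routing.md documents the `download_with_fallback` order: source-native download, metadata PDF URL, repository discovery, Unpaywall, then Sci-Hub unless `useSciHub` is `false`. That order is a first-success pipeline, and the TypeScript sketch below shows the control flow under stated assumptions. The stage names, the `Attempt` callback, the `FallbackResult` shape, and the `plannedStages`/`downloadWithFallback` functions are all hypothetical. They are not taken from the package's `OpenAccessFallbackService`.

```typescript
// Hypothetical stage names mirroring the documented fallback order.
type Stage = "source-native" | "metadata-pdf-url" | "repository-discovery" | "unpaywall" | "scihub";

// A stage attempt resolves to a PDF location on success or null when nothing was found (assumed signature).
type Attempt = (stage: Stage) => Promise<string | null>;

interface StageFailure {
  stage: Stage;
  reason: string;
}

interface FallbackResult {
  pdf: string | null;
  stage: Stage | null;
  failures: StageFailure[];
}

// Build the ordered stage list for one request.
function plannedStages(useSciHub = true): Stage[] {
  const stages: Stage[] = ["source-native", "metadata-pdf-url", "repository-discovery", "unpaywall"];
  // The final Sci-Hub stage is on by default; useSciHub === false drops it for this request.
  if (useSciHub) stages.push("scihub");
  return stages;
}

// Try each stage in order, stop at the first success, and record why earlier stages failed.
async function downloadWithFallback(attempt: Attempt, useSciHub = true): Promise<FallbackResult> {
  const failures: StageFailure[] = [];
  for (const stage of plannedStages(useSciHub)) {
    try {
      const pdf = await attempt(stage);
      if (pdf) return { pdf, stage, failures };
      failures.push({ stage, reason: "no PDF found" });
    } catch (err) {
      // A thrown error in one stage is recorded and the funnel moves on to the next stage.
      failures.push({ stage, reason: err instanceof Error ? err.message : String(err) });
    }
  }
  return { pdf: null, stage: null, failures };
}
```

The sketch stops at the first stage that yields a PDF and keeps per-stage reasons. This matches the Two-Stage workflow's instruction to record failed downloads rather than abort. A caller handling a batch would run this once per verified paper and continue past any paper whose `pdf` comes back `null`.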
package/skills/paper-search/references/cli-contract.md
@@ -0,0 +1,152 @@
+# Paper Search CLI Contract
+
+This contract records the stable CLI surface that the `paper-search` Routing Skill may rely on. The Routing Skill should stay short and should not describe commands, flags, or defaults that are absent from this contract and the current CLI.
+
+## Entrypoints
+
+- `paper-search` is the primary executable.
+- `paper-search --version`, `paper-search -v`, and `paper-search version` print the installed version.
+- `paper-search --help` and `paper-search help` print usage.
+- `paper-search tools --pretty` lists direct `run` tool names and schemas.
+- Private API keys, emails, and tokens must be configured through `paper-search setup`, `paper-search config`, `.env`, or shell environment variables. They must not be written into Skill files.
+
+## Top-Level Commands
+
+- `paper-search search <query> [--platform NAME] [--sources CSV] [--max-results N] [--year YEAR] [--pretty]`
+- `paper-search run <tool-name> --arg key=value --json-args '{"key":"value"}' [--pretty]`
+- `paper-search download <paper-id> --platform NAME [--save-path PATH] [--pretty]`
+- `paper-search journal-metrics <journal...> [--file PATH] [--include-raw] [--pretty]`
+- `paper-search metrics ...` is an alias for `journal-metrics`.
+- `paper-search status [--validate] [--pretty]`
+- `paper-search doctor [--validate] [--format text] [--pretty]`
+- `paper-search smoke --mock [--pretty]`
+- `paper-search smoke --live [--pretty]`
+- `paper-search skills status [--targets CSV] [--pretty]`
+- `paper-search skills diff [--targets CSV] [--format text] [--pretty]`
+- `paper-search skills update [--targets CSV] [--pretty]`
+- `paper-search setup [--all] [--keys CSV] [--install-skills CSV] [--skip-skills]`
+- `paper-search tools [--pretty]`
+- `paper-search diagnostics [--pretty]`
+- `paper-search requirements [--pretty]` is an alias for `diagnostics`.
+- `paper-search config init [--pretty]`
+- `paper-search config path [--pretty]`
+- `paper-search config keys [--pretty]`
+- `paper-search config list [--all] [--pretty]`
+- `paper-search config doctor [--pretty]` is a compatibility config summary; use top-level `doctor` for the full health report.
+- `paper-search config get KEY [--raw] [--pretty]`
+- `paper-search config set KEY VALUE [--pretty]`
+- `paper-search config unset KEY [--pretty]`
+- `paper-search config delete KEY [--pretty]` and `paper-search config remove KEY [--pretty]` are aliases for `unset`.
+- `paper-search config import-env [file] [--pretty]`
+
+## Direct Run Tools
+
+These names can be used with `paper-search run <tool-name>`:
+
+- `search_papers`
+- `search_arxiv`
+- `search_webofscience`
+- `search_pubmed`
+- `search_biorxiv`
+- `search_medrxiv`
+- `search_semantic_scholar`
+- `search_semantic_snippets`
+- `get_paper_citations`
+- `get_paper_references`
+- `search_iacr`
+- `download_paper`
+- `search_google_scholar`
+- `get_paper_by_doi`
+- `search_scihub`
+- `check_scihub_mirrors`
+- `get_platform_status`
+- `query_journal_metrics`
+- `search_sciencedirect`
+- `search_springer`
+- `search_wiley`
+- `search_scopus`
+- `search_crossref`
+- `search_openalex`
+- `search_unpaywall`
+- `search_pmc`
+- `search_europepmc`
+- `search_core`
+- `search_openaire`
+- `download_with_fallback`
+- `search_dblp`
+- `search_ieee`
+- `search_acm`
+- `search_usenix`
+- `search_openreview`
+- `search_springerlink`
+
+## Output Expectations
+
+- JSON is the default machine-readable output for agent and script callers.
+- `--pretty` pretty-prints JSON.
+- `--format text` is supported by top-level `doctor` and `skills diff` for explicitly requested human-readable reports.
+- `--include-text` keeps raw tool response text in JSON for commands where the CLI supports it.
+- The Routing Skill should parse JSON when making decisions and use text format only when the user needs a readable report.
+
+## Search Command Contract
+
+- `paper-search search` is the integrated metadata search entrypoint.
+- Use `--platform NAME` for one source and `--sources a,b,c` for explicit multi-source search.
+- Use `--platform all` or `--sources all` only when broad recall matters more than precision.
+- `search_papers` is the direct tool behind the integrated `search` command.
+- `search_semantic_snippets` uses `limit`, not `maxResults`, and is for body/title/abstract snippets rather than complete full text.
+- `search_unpaywall` resolves DOI-based OA metadata and returns at most one result.
+- `search_scihub` is DOI/URL-targeted lookup and is not a metadata search source.
+- `CORE_MAX_RESULTS_CAP` controls the configurable CORE-only result cap. Default is `100`; hard maximum is `500`. Other platforms keep their own current limits.
+
+## Citation Expansion Contract
+
+`get_paper_citations` and `get_paper_references` query Semantic Scholar Graph API for citation graph expansion.
+
+- Provide at least one of `paperId`, `doi`, or `arxivId`.
+- Target priority is `paperId`, then `doi`, then `arxivId`.
+- `doi` is converted to `DOI:<doi>`.
+- `arxivId` is converted to `ARXIV:<id>`.
+- `limit` defaults to `100` and accepts values from `1` to `100`.
+
+Examples:
+
+```bash
+paper-search run get_paper_citations --arg doi="10.1038/nature12373" --arg limit=5 --pretty
+paper-search run get_paper_references --arg doi="10.1038/nature12373" --arg limit=5 --pretty
+```
+
+## Download Command Contract
+
+`download_paper` tries source-native download first when available. Unsupported or failed native downloads route into the same fallback funnel used by `download_with_fallback`.
+
+`download_with_fallback` order is source-native download, metadata PDF URL, repository discovery through PMC/Europe PMC/CORE/OpenAIRE, Unpaywall DOI resolution, then Sci-Hub as the final fallback.
+
+Sci-Hub Fallback is enabled by default. To suppress that final stage for a request, pass:
+
+```json
+{"useSciHub": false}
+```
+
+The Routing Skill must not describe future-only download commands or strategy flags until they appear in `paper-search --help` or `paper-search tools`.
+
+## Configuration And Secret Boundaries
+
+Configuration sources, in priority order:
+
+1. Shell environment variables
+2. Current directory `.env`
+3. User config file under `~/.config/paper-search-cli/config.json`
+4. Free-source built-in defaults
+
+Useful configuration commands:
+
+```bash
+paper-search setup
+paper-search config set SEMANTIC_SCHOLAR_API_KEY your_key
+paper-search setup EASYSCHOLAR_KEY
+paper-search config list --pretty
+paper-search doctor --pretty
+```
+
+Do not ask users to paste secrets into chat. Do not write secrets into Skill, README, tests, or logs. `doctor` and `config` output should mask configured secret values.
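cli-contract.md states the citation-expansion rules:

- target priority is `paperId`, then `doi`, then `arxivId`;
- `doi` and `arxivId` get `DOI:` and `ARXIV:` prefixes;
- `limit` defaults to 100 and accepts 1 to 100.

These rules can be sketched as small TypeScript helpers. The `CitationArgs` interface and the helper names are hypothetical. The sketch also makes two assumptions:

- `limit` has already been parsed from `--arg limit=5` into a number;
- an out-of-range `limit` is rejected rather than clamped. The contract only states the accepted range.

```typescript
// Hypothetical argument shape for get_paper_citations / get_paper_references.
interface CitationArgs {
  paperId?: string;
  doi?: string;
  arxivId?: string;
  limit?: number; // assumed to be already parsed to a number
}

const DEFAULT_LIMIT = 100;
const MAX_LIMIT = 100;

// Pick the Semantic Scholar paper identifier using the documented priority order.
function resolveTarget(args: CitationArgs): string {
  if (args.paperId) return args.paperId;
  if (args.doi) return `DOI:${args.doi}`;
  if (args.arxivId) return `ARXIV:${args.arxivId}`;
  throw new Error("Provide at least one of paperId, doi, or arxivId");
}

// Apply the documented default and reject values outside 1..100 (rejection is an assumption).
function resolveLimit(limit?: number): number {
  if (limit === undefined) return DEFAULT_LIMIT;
  if (!Number.isInteger(limit) || limit < 1 || limit > MAX_LIMIT) {
    throw new RangeError(`limit must be an integer from 1 to ${MAX_LIMIT}`);
  }
  return limit;
}

// Combine both rules into the request a hypothetical client would send.
function resolveCitationRequest(args: CitationArgs): { target: string; limit: number } {
  return { target: resolveTarget(args), limit: resolveLimit(args.limit) };
}
```

The contract's examples pass `--arg doi="10.1038/nature12373" --arg limit=5`. Under this sketch, that input resolves to the identifier `DOI:10.1038/nature12373` with a limit of 5. A `paperId`, if also supplied, would take precedence over the DOI.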