@fbraza/pi-cite 0.2.0 → 0.3.0
- package/README.md +9 -12
- package/package.json +3 -4
- package/skills/literature/SKILL.md +21 -40
- package/skills/literature/references/preclinical-extraction-guide.md +1 -1
- package/skills/literature/scripts/generate_table.py +1 -3
- package/skills/literature/scripts/synthesis.py +4 -3
- package/src/index.ts +0 -4
- package/src/literature-search.ts +2 -110
- package/src/rendering.ts +13 -23
- package/src/shared.ts +0 -21
- package/src/types.ts +0 -13
- package/skills/literature/references/full-text-access-guide.md +0 -34
- package/skills/literature/references/scihub_routine.md +0 -40
- package/skills/literature/references/semanticscholar_routine.md +0 -50
- package/skills/literature/scripts/scihub_pdf_resolver.py +0 -289
- package/src/fulltext.ts +0 -524
- package/src/semantic-scholar.ts +0 -199
package/README.md
CHANGED
|
@@ -1,24 +1,22 @@
|
|
|
1
1
|
# @fbraza/pi-cite
|
|
2
2
|
|
|
3
3
|
A standalone [Pi](https://pi.dev) extension providing literature-research tools for
|
|
4
|
-
academic workflows. Registers
|
|
4
|
+
academic workflows. Registers two tools callable by the agent:
|
|
5
5
|
|
|
6
|
-
- **`literature_search`** —
|
|
7
|
-
|
|
6
|
+
- **`literature_search`** — literature workflow search against PubMed using a
|
|
7
|
+
PubMed-ready query (MeSH `[mh]`, `[tiab]`, `[pt]`, substance `[nm]`, and Boolean
|
|
8
|
+
logic), with streaming progress and deduplicated results.
|
|
8
9
|
- **`pubmed_search`** — direct PubMed query (MeSH, `[tiab]`, `[pt]`, etc.).
|
|
9
|
-
- **`fetch_fulltext`** — retrieve a paper PDF via PMC → publisher OA → fallback.
|
|
10
|
-
- (`semantic_scholar` helper used internally by the search tools.)
|
|
11
10
|
|
|
12
11
|
## Bundled skill
|
|
13
12
|
|
|
14
13
|
Ships with the **`literature`** skill (`skills/literature/`), which turns these
|
|
15
|
-
tools into an end-to-end review workflow: verified-citation search,
|
|
16
|
-
|
|
17
|
-
|
|
18
|
-
|
|
14
|
+
tools into an end-to-end review workflow: verified-citation search, per-paper
|
|
15
|
+
experiment extraction, and a structured hypothesis synthesis. Its frontmatter
|
|
16
|
+
declares `allowed-tools` covering the extension's tools above, so the skill and
|
|
17
|
+
extension are paired on purpose.
|
|
19
18
|
|
|
20
|
-
- `references/` — PubMed
|
|
21
|
-
full-text access routines.
|
|
19
|
+
- `references/` — PubMed query syntax, API reference, and common queries.
|
|
22
20
|
- `scripts/` — Python helpers (`extract_experiments.py`, `synthesis.py`,
|
|
23
21
|
`generate_table.py`, `export_all.py`) invoked by the skill.
|
|
24
22
|
|
|
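As an illustration of the PubMed-ready query that `literature_search` expects, the sketch below assembles one from field-tagged terms. The helper names (`tag`, `any_of`, `all_of`) and the example topic are assumptions made for this example and are not part of the package; only the field tags `[mh]`, `[tiab]`, `[pt]`, and `[nm]` come from the README text.

```python
def tag(term: str, field: str) -> str:
    """Quote a term and attach a PubMed field tag such as mh, tiab, pt, or nm."""
    return f'"{term}"[{field}]'


def any_of(*clauses: str) -> str:
    """Join clauses with OR and wrap them in parentheses."""
    return "(" + " OR ".join(clauses) + ")"


def all_of(*clauses: str) -> str:
    """Join clauses with AND."""
    return " AND ".join(clauses)


# Hypothetical topic: sirolimus in lung transplantation, review articles only.
pubmed_query = all_of(
    any_of(tag("Lung Transplantation", "mh"), tag("lung transplant", "tiab")),
    tag("sirolimus", "nm"),
    tag("Review", "pt"),
)
print(pubmed_query)
```

A string built this way could be passed as the `pubmed_query` parameter; whether a given term is a valid MeSH heading still has to be checked against PubMed itself.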
@@ -54,4 +52,3 @@ npm run pack:check # preview the published tarball contents
|
|
|
54
52
|
| Variable | Purpose |
|
|
55
53
|
|---|---|
|
|
56
54
|
| `NCBI_API_KEY` / `api_key` env | PubMed rate limit + E-utilities auth |
|
|
57
|
-
| `SEMANTIC_SCHOLAR_API_KEY` | Enables Semantic Scholar supplementary search |
|
package/package.json
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "@fbraza/pi-cite",
|
|
3
|
-
"version": "0.
|
|
4
|
-
"description": "Pi extension with PubMed
|
|
3
|
+
"version": "0.3.0",
|
|
4
|
+
"description": "Pi extension with PubMed and literature search tools.",
|
|
5
5
|
"license": "MIT",
|
|
6
6
|
"type": "module",
|
|
7
7
|
"files": [
|
|
@@ -13,8 +13,7 @@
|
|
|
13
13
|
"pi-package",
|
|
14
14
|
"pi-extension",
|
|
15
15
|
"literature",
|
|
16
|
-
"pubmed"
|
|
17
|
-
"semantic-scholar"
|
|
16
|
+
"pubmed"
|
|
18
17
|
],
|
|
19
18
|
"pi": {
|
|
20
19
|
"extensions": [
|
package/skills/literature/SKILL.md
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: literature
|
|
3
|
-
description: Unified literature search, verification,
|
|
4
|
-
allowed-tools: Read, Write, WebFetch, WebSearch, literature_search, pubmed_search
|
|
3
|
+
description: Unified literature search, verification, and synthesis workflow for scientific questions. Use when any biological claim needs a verified citation, when reviewing a gene/pathway/disease/drug/target, when surveying preclinical evidence for a target in a disease, when checking novelty, or when turning a paper set into a structured hypothesis synthesis.
|
|
4
|
+
allowed-tools: Read, Write, WebFetch, WebSearch, literature_search, pubmed_search
|
|
5
5
|
starting-prompt: Conduct a literature review on my research topic with verified citations, structured synthesis, and a per-paper summary table.
|
|
6
6
|
---
|
|
7
7
|
|
|
@@ -16,7 +16,6 @@ Use this skill when you need to:
|
|
|
16
16
|
- review literature on a gene, pathway, disease, drug, or molecular target
|
|
17
17
|
- survey preclinical evidence for a target in a disease context
|
|
18
18
|
- check whether a finding appears novel or already published
|
|
19
|
-
- retrieve full text or PDFs for key papers
|
|
20
19
|
- synthesize a paper set into hypotheses, contradictions, and evidence-weighted conclusions
|
|
21
20
|
|
|
22
21
|
Do not use this skill for:
|
|
@@ -30,7 +29,6 @@ Do not use this skill for:
|
|
|
30
29
|
- Never fabricate PMIDs, DOIs, titles, journals, years, or author lists.
|
|
31
30
|
- Distinguish human, animal, and in vitro evidence.
|
|
32
31
|
- Weight evidence quality by study design and replication.
|
|
33
|
-
- Record how full text was obtained for each paper.
|
|
34
32
|
- Use inline numbered citations like `[1]` or `[1, 2]` in narrative synthesis.
|
|
35
33
|
- Never overwrite outputs from a previous literature search.
|
|
36
34
|
- Never write literature-review outputs directly to generic shared paths under `results/`.
|
|
@@ -48,26 +46,29 @@ Always clarify:
|
|
|
48
46
|
|
|
49
47
|
### Step 2 — Create a dedicated output folder
|
|
50
48
|
|
|
51
|
-
For every new literature review or literature research task, create a new dedicated folder
|
|
49
|
+
For every new literature review or literature research task, create a new dedicated folder under `results/literature_review/` before generating files.
|
|
52
50
|
|
|
53
|
-
|
|
54
|
-
- `results/literature_multiomics_ML_biomarkers_PGD/`
|
|
55
|
-
- `results/literature_siRNA_lung_transplant_new_treatments/`
|
|
51
|
+
Use the path `results/literature_review/<subject_of_study>/`, where `<subject_of_study>` is a short **snake_case title summary of the theme** of the literature search. Derive it from the scope clarified in Step 1: lower case, words separated by single underscores, no spaces, hyphens, or punctuation. For example, a review on **trained immunity in transplantation** becomes:
|
|
56
52
|
|
|
57
|
-
|
|
53
|
+
- `results/literature_review/trained_immunity_in_transplantation/`
|
|
54
|
+
|
|
55
|
+
Other examples:
|
|
56
|
+
- `results/literature_review/sirna_lung_transplant_new_treatments/`
|
|
57
|
+
- `results/literature_review/multiomics_ml_biomarkers_in_pgd/`
|
|
58
|
+
|
|
59
|
+
All generated files for that search session must be saved inside this dedicated subject folder, including:
|
|
58
60
|
- `literature_report.md`
|
|
59
61
|
- `paper_summary_table.csv`
|
|
60
62
|
- `search_log.md`
|
|
61
|
-
- `pdfs/`
|
|
62
63
|
- any optional analysis/export artifacts such as `analysis_object.pkl`
|
|
63
64
|
|
|
64
|
-
Never write directly to
|
|
65
|
+
Never write outputs directly to the parent folder or to the `results/` root, for example:
|
|
66
|
+
- `results/literature_review/literature_report.md`
|
|
67
|
+
- `results/literature_review/paper_summary_table.csv`
|
|
68
|
+
- `results/literature_review/analysis_object.pkl`
|
|
65
69
|
- `results/literature_report.md`
|
|
66
|
-
- `results/paper_summary_table.csv`
|
|
67
|
-
- `results/analysis_object.pkl`
|
|
68
|
-
- `results/literature_pdfs/`
|
|
69
70
|
|
|
70
|
-
If a folder for a previous search already exists, create a new folder with a distinct descriptive
|
|
71
|
+
If a folder for a previous search on the same subject already exists, create a new folder with a distinct descriptive `<subject_of_study>` title rather than using versioned filenames.
|
|
71
72
|
|
|
72
73
|
At the end of the task, clearly report the exact output folder and generated file paths to the user.
|
|
73
74
|
|
|
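The naming rule in Step 2 (lower case, single underscores, no spaces, hyphens, or punctuation) can be sketched as a small helper. The function names `subject_slug` and `subject_folder` are hypothetical and do not exist in the skill's scripts; the root path is the one given in the hunk above.

```python
import re
from pathlib import Path

REVIEW_ROOT = Path("results/literature_review")


def subject_slug(theme: str) -> str:
    """Lower-case the theme and collapse each run of non-alphanumerics into one underscore."""
    # Note: this simple rule also replaces non-ASCII letters with underscores.
    return re.sub(r"[^a-z0-9]+", "_", theme.lower()).strip("_")


def subject_folder(theme: str, root: Path = REVIEW_ROOT) -> Path:
    """Return the dedicated folder path for a review theme (the folder is not created)."""
    slug = subject_slug(theme)
    if not slug:
        raise ValueError("theme must contain at least one ASCII letter or digit")
    return root / slug


print(subject_folder("Trained immunity in transplantation").as_posix())
```

Returning the path without creating it leaves the "folder already exists" decision to the caller, which matches the instruction to pick a new descriptive title rather than reuse a folder.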
@@ -79,7 +80,6 @@ Use the custom literature tool as the primary search path:
|
|
|
79
80
|
When calling `literature_search`:
|
|
80
81
|
- Always construct `pubmed_query` using PubMed-specific syntax from the references below.
|
|
81
82
|
- Use MeSH terms (`[mh]` / `[majr]`), title/abstract terms (`[tiab]`), publication types (`[pt]`), substance names (`[nm]`), date filters, and Boolean logic as appropriate.
|
|
82
|
-
- Construct `semantic_scholar_query` separately as broader natural-language search terms when useful. Semantic Scholar is used automatically as supplementary search only when `SEMANTIC_SCHOLAR_API_KEY` is configured.
|
|
83
83
|
- Do not pass a generic natural-language query as `pubmed_query` when a PubMed/MeSH query can be constructed.
|
|
84
84
|
|
|
85
85
|
These extension tools are the preferred search path for this skill. Do not fall back to generic `WebFetch` / `WebSearch` first when one of these typed tools fits the task.
|
|
@@ -88,29 +88,15 @@ Read these references before constructing queries:
|
|
|
88
88
|
- `references/pubmed_routine.md`
|
|
89
89
|
- `references/pubmed_search_syntax.md`
|
|
90
90
|
- `references/pubmed_common_queries.md`
|
|
91
|
-
- `references/semanticscholar_routine.md`
|
|
92
91
|
|
|
93
92
|
### Step 4 — Screen and prioritise
|
|
94
93
|
|
|
95
|
-
- Deduplicate
|
|
96
|
-
- Prioritise by relevance, recency,
|
|
94
|
+
- Deduplicate PubMed results.
|
|
95
|
+
- Prioritise by relevance, recency, and study type.
|
|
97
96
|
- Default to deep reading of the top 20 papers unless the user asks otherwise.
|
|
98
97
|
- For preclinical requests, keep studies with experimental target perturbation evidence.
|
|
99
98
|
|
|
100
|
-
### Step 5 —
|
|
101
|
-
|
|
102
|
-
Use `fetch_fulltext` for top papers. Prefer it over ad-hoc `WebFetch` PDF retrieval because it applies the defined PMC → publisher OA → Sci-Hub chain.
|
|
103
|
-
|
|
104
|
-
Access chain:
|
|
105
|
-
1. PMC
|
|
106
|
-
2. publisher open-access page
|
|
107
|
-
3. Sci-Hub fallback
|
|
108
|
-
|
|
109
|
-
Read:
|
|
110
|
-
- `references/full-text-access-guide.md`
|
|
111
|
-
- `references/scihub_routine.md`
|
|
112
|
-
|
|
113
|
-
### Step 6 — Synthesis
|
|
99
|
+
### Step 5 — Synthesis
|
|
114
100
|
|
|
115
101
|
Always produce:
|
|
116
102
|
1. a narrative synthesis with inline numbered citations
|
|
@@ -179,14 +165,13 @@ After reviewing the core paper set, optionally produce:
|
|
|
179
165
|
|
|
180
166
|
## Expected files
|
|
181
167
|
|
|
182
|
-
Typical outputs must be placed in a dedicated
|
|
168
|
+
Typical outputs must be placed in a dedicated subject folder under `./results/literature_review/`, for example `./results/literature_review/<subject_of_study>/`:
|
|
183
169
|
- `literature_report.md`
|
|
184
170
|
- `paper_summary_table.csv`
|
|
185
171
|
- `search_log.md`
|
|
186
|
-
- `pdfs/`
|
|
187
172
|
- optional `analysis_object.pkl` or other export artifacts when produced
|
|
188
173
|
|
|
189
|
-
Do not write these outputs directly to `./results/` or reuse a previous
|
|
174
|
+
Do not write these outputs directly to `./results/literature_review/` or to `./results/`, and do not reuse a previous subject folder.
|
|
190
175
|
|
|
191
176
|
## Companion references
|
|
192
177
|
|
|
@@ -194,10 +179,7 @@ Do not write these outputs directly to `./results/` or reuse a previous search f
|
|
|
194
179
|
- `references/pubmed_routine.md`
|
|
195
180
|
- `references/pubmed_search_syntax.md`
|
|
196
181
|
- `references/pubmed_common_queries.md`
|
|
197
|
-
- `references/semanticscholar_routine.md`
|
|
198
182
|
- `references/preclinical-extraction-guide.md`
|
|
199
|
-
- `references/full-text-access-guide.md`
|
|
200
|
-
- `references/scihub_routine.md`
|
|
201
183
|
|
|
202
184
|
## Companion scripts
|
|
203
185
|
|
|
@@ -205,4 +187,3 @@ Do not write these outputs directly to `./results/` or reuse a previous search f
|
|
|
205
187
|
- `scripts/synthesis.py`
|
|
206
188
|
- `scripts/generate_table.py`
|
|
207
189
|
- `scripts/export_all.py`
|
|
208
|
-
- `scripts/scihub_pdf_resolver.py`
|
package/skills/literature/references/preclinical-extraction-guide.md
CHANGED
|
@@ -173,7 +173,7 @@ The `experiment_extraction.csv` file contains one row per paper with these colum
|
|
|
173
173
|
### 1. Abstract-only extraction
|
|
174
174
|
The script only reads abstracts, not full text. Papers that describe experiments only in the methods/results sections (not the abstract) will be misclassified as "unclassified".
|
|
175
175
|
|
|
176
|
-
**Mitigation:**
|
|
176
|
+
**Mitigation:** No full-text enrichment step is currently available; papers whose experiments appear only in the methods/results sections may be misclassified as "unclassified".
|
|
177
177
|
|
|
178
178
|
### 2. Keyword sensitivity
|
|
179
179
|
- **False positives:** A paper mentioning "mouse model" in the introduction (not as an experiment performed) may be classified as in_vivo.
|
package/skills/literature/scripts/generate_table.py
CHANGED
|
@@ -24,8 +24,6 @@ def _identifier(paper: Dict) -> str:
|
|
|
24
24
|
return f"PMID:{paper['pmid']}"
|
|
25
25
|
if paper.get("doi"):
|
|
26
26
|
return paper["doi"]
|
|
27
|
-
if paper.get("s2_id"):
|
|
28
|
-
return paper["s2_id"]
|
|
29
27
|
return "NA"
|
|
30
28
|
|
|
31
29
|
|
|
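With the `s2_id` branch removed, the identifier fallback appears to be PMID, then DOI, then `"NA"`. The sketch below restates that order as a standalone function; the name `identifier` is used here for illustration, and the `pmid` guard on the first branch is an assumption because that line lies outside the hunk.

```python
from typing import Dict


def identifier(paper: Dict) -> str:
    """Return "PMID:<id>" when a PMID exists, else the bare DOI, else "NA"."""
    if paper.get("pmid"):  # assumed guard; this line is not visible in the hunk
        return f"PMID:{paper['pmid']}"
    if paper.get("doi"):
        return paper["doi"]
    return "NA"


print(identifier({"pmid": "12345678", "doi": "10.1000/example"}))
print(identifier({"doi": "10.1000/example"}))
print(identifier({"s2_id": "abc"}))
```

One consequence, under these assumptions, is that a record carrying only a Semantic Scholar ID now renders as `NA` in the `PMID/DOI` column.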
@@ -51,7 +49,7 @@ def build_table_rows(papers: List[Dict], experiments: List[Dict] | None = None,
|
|
|
51
49
|
"#": idx,
|
|
52
50
|
"PMID/DOI": _identifier(paper),
|
|
53
51
|
"Authors (year)": _authors_year(paper),
|
|
54
|
-
"Key Message": _truncate(paper.get("
|
|
52
|
+
"Key Message": _truncate(paper.get("title") or ""),
|
|
55
53
|
"Key Results": _truncate(paper.get("abstract") or exp.get("key_findings") or ""),
|
|
56
54
|
"Key Methods": _truncate(
|
|
57
55
|
"; ".join(filter(None, [
|
package/skills/literature/scripts/synthesis.py
CHANGED
|
@@ -35,15 +35,16 @@ def classify_study_type(paper: Dict) -> str:
|
|
|
35
35
|
|
|
36
36
|
def classify_evidence_quality(paper: Dict) -> str:
|
|
37
37
|
study_type = classify_study_type(paper)
|
|
38
|
-
citation_count = int(paper.get("citation_count") or 0)
|
|
39
38
|
if study_type in {"Systematic review / meta-analysis", "Randomized controlled trial"}:
|
|
40
39
|
return "High"
|
|
41
40
|
if study_type in {"Clinical study", "In vitro + in vivo"}:
|
|
42
41
|
return "Moderate"
|
|
43
42
|
if paper.get("is_preprint"):
|
|
44
43
|
return "Preliminary (preprint)"
|
|
45
|
-
if study_type
|
|
46
|
-
return "Moderate"
|
|
44
|
+
if study_type == "In vivo":
|
|
45
|
+
return "Moderate"
|
|
46
|
+
if study_type == "In vitro":
|
|
47
|
+
return "Preliminary"
|
|
47
48
|
return "Preliminary"
|
|
48
49
|
|
|
49
50
|
|
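The new branch order in `classify_evidence_quality` can be exercised in isolation. The sketch below mirrors the added lines but takes the study type and preprint flag as plain arguments instead of calling `classify_study_type`; that signature and the name `evidence_quality` are assumptions for this example.

```python
def evidence_quality(study_type: str, is_preprint: bool = False) -> str:
    """Mirror the post-change branch order: design first, then preprint, then model system."""
    if study_type in {"Systematic review / meta-analysis", "Randomized controlled trial"}:
        return "High"
    if study_type in {"Clinical study", "In vitro + in vivo"}:
        return "Moderate"
    if is_preprint:
        return "Preliminary (preprint)"
    if study_type == "In vivo":
        return "Moderate"
    if study_type == "In vitro":
        return "Preliminary"
    return "Preliminary"


print(evidence_quality("In vivo"))
print(evidence_quality("In vivo", is_preprint=True))
print(evidence_quality("Clinical study", is_preprint=True))
```

Because the preprint check sits after the first two design checks, a preprint clinical study is still rated `Moderate`, while a preprint in vivo study is rated `Preliminary (preprint)`. The explicit `In vitro` branch returns the same value as the final fallback.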
package/src/index.ts
CHANGED
|
@@ -1,12 +1,8 @@
|
|
|
1
1
|
import type { ExtensionAPI } from "@earendil-works/pi-coding-agent";
|
|
2
|
-
import { registerFetchFulltextTool } from "./fulltext.ts";
|
|
3
2
|
import { registerLiteratureSearchTool } from "./literature-search.ts";
|
|
4
3
|
import { registerPubmedSearchTool } from "./pubmed.ts";
|
|
5
|
-
import { registerSemanticScholarSearchTool } from "./semantic-scholar.ts";
|
|
6
4
|
|
|
7
5
|
export default function literatureToolsExtension(pi: ExtensionAPI) {
|
|
8
6
|
registerLiteratureSearchTool(pi);
|
|
9
7
|
registerPubmedSearchTool(pi);
|
|
10
|
-
registerSemanticScholarSearchTool(pi);
|
|
11
|
-
registerFetchFulltextTool(pi);
|
|
12
8
|
}
|
package/src/literature-search.ts
CHANGED
|
@@ -7,7 +7,6 @@ import {
|
|
|
7
7
|
type LiteratureSearchDisplayEvent,
|
|
8
8
|
type LiteratureSearchDisplaySearch,
|
|
9
9
|
} from "./rendering.ts";
|
|
10
|
-
import { searchSemanticScholar } from "./semantic-scholar.ts";
|
|
11
10
|
import { formatPaperText, normalizeDoi, unique } from "./shared.ts";
|
|
12
11
|
import { emitProgress, textResult, type TextToolUpdate } from "./tool-output.ts";
|
|
13
12
|
import type { PaperRecord } from "./types.ts";
|
|
@@ -17,12 +16,6 @@ export const LITERATURE_SEARCH_PARAMS = Type.Object({
|
|
|
17
16
|
description:
|
|
18
17
|
"PubMed-ready query using PubMed syntax such as MeSH [mh], title/abstract [tiab], publication type [pt], substance [nm], and Boolean logic.",
|
|
19
18
|
}),
|
|
20
|
-
semantic_scholar_query: Type.Optional(
|
|
21
|
-
Type.String({
|
|
22
|
-
description:
|
|
23
|
-
"Optional natural-language Semantic Scholar query for supplementary search. If omitted and Semantic Scholar is configured, a simplified query is derived from pubmed_query.",
|
|
24
|
-
}),
|
|
25
|
-
),
|
|
26
19
|
max_results: Type.Optional(
|
|
27
20
|
Type.Number({ description: "Maximum results per provider (default 20)" }),
|
|
28
21
|
),
|
|
@@ -51,27 +44,11 @@ export type LiteratureSearchResult = {
|
|
|
51
44
|
papers: PaperRecord[];
|
|
52
45
|
providers: {
|
|
53
46
|
pubmed: ProviderExecution;
|
|
54
|
-
semantic_scholar: ProviderExecution;
|
|
55
47
|
};
|
|
56
48
|
searches: LiteratureSearchDisplaySearch[];
|
|
57
49
|
events: LiteratureSearchDisplayEvent[];
|
|
58
50
|
};
|
|
59
51
|
|
|
60
|
-
function firstYear(value?: string): number | undefined {
|
|
61
|
-
const match = value?.match(/^(\d{4})/);
|
|
62
|
-
return match?.[1] ? Number(match[1]) : undefined;
|
|
63
|
-
}
|
|
64
|
-
|
|
65
|
-
export function simplifyPubmedQueryForSemanticScholar(query: string): string {
|
|
66
|
-
const simplified = query
|
|
67
|
-
.replace(/\[[^\]]+\]/g, " ")
|
|
68
|
-
.replace(/\b(?:AND|OR|NOT)\b/gi, " ")
|
|
69
|
-
.replace(/[()"']/g, " ")
|
|
70
|
-
.replace(/\s+/g, " ")
|
|
71
|
-
.trim();
|
|
72
|
-
return simplified || query.trim();
|
|
73
|
-
}
|
|
74
|
-
|
|
75
52
|
function sourceList(paper: PaperRecord): string[] {
|
|
76
53
|
return unique([
|
|
77
54
|
...(paper.sources ?? []),
|
|
@@ -92,7 +69,6 @@ function dedupeKeys(paper: PaperRecord): string[] {
|
|
|
92
69
|
const keys = [
|
|
93
70
|
doi ? `doi:${doi}` : undefined,
|
|
94
71
|
paper.pmid ? `pmid:${paper.pmid}` : undefined,
|
|
95
|
-
paper.s2_id ? `s2:${paper.s2_id}` : undefined,
|
|
96
72
|
];
|
|
97
73
|
const title = normalizedTitle(paper.title);
|
|
98
74
|
if (title && paper.year) keys.push(`title-year:${title}:${paper.year}`);
|
|
@@ -106,7 +82,6 @@ function mergePapers(existing: PaperRecord, incoming: PaperRecord): PaperRecord
|
|
|
106
82
|
...existing,
|
|
107
83
|
doi: normalizeDoi(existing.doi) ?? normalizeDoi(incoming.doi),
|
|
108
84
|
pmid: existing.pmid ?? incoming.pmid,
|
|
109
|
-
s2_id: existing.s2_id ?? incoming.s2_id,
|
|
110
85
|
title: existing.title !== "Untitled" ? existing.title : incoming.title,
|
|
111
86
|
abstract: existing.abstract ?? incoming.abstract,
|
|
112
87
|
authors: unique([...(existing.authors ?? []), ...(incoming.authors ?? [])]),
|
|
@@ -117,10 +92,6 @@ function mergePapers(existing: PaperRecord, incoming: PaperRecord): PaperRecord
|
|
|
117
92
|
...(incoming.publication_types ?? []),
|
|
118
93
|
]),
|
|
119
94
|
mesh_terms: unique([...(existing.mesh_terms ?? []), ...(incoming.mesh_terms ?? [])]),
|
|
120
|
-
citation_count: existing.citation_count ?? incoming.citation_count,
|
|
121
|
-
tldr: existing.tldr ?? incoming.tldr,
|
|
122
|
-
open_access_pdf: existing.open_access_pdf ?? incoming.open_access_pdf,
|
|
123
|
-
external_ids: { ...(incoming.external_ids ?? {}), ...(existing.external_ids ?? {}) },
|
|
124
95
|
source: sources.join(";"),
|
|
125
96
|
sources,
|
|
126
97
|
};
|
|
@@ -208,88 +179,10 @@ export async function searchLiterature(
|
|
|
208
179
|
});
|
|
209
180
|
emitEvent(`PubMed q1 found ${pubmed.count} candidate papers.`);
|
|
210
181
|
|
|
211
|
-
const semanticScholarApiKey = process.env.SEMANTIC_SCHOLAR_API_KEY?.trim();
|
|
212
|
-
let semanticScholar: ProviderExecution = {
|
|
213
|
-
searched: false,
|
|
214
|
-
reason: "SEMANTIC_SCHOLAR_API_KEY not configured",
|
|
215
|
-
};
|
|
216
|
-
let semanticScholarPapers: PaperRecord[] = [];
|
|
217
|
-
|
|
218
|
-
if (semanticScholarApiKey) {
|
|
219
|
-
const semanticScholarQuery =
|
|
220
|
-
params.semantic_scholar_query?.trim() ||
|
|
221
|
-
simplifyPubmedQueryForSemanticScholar(params.pubmed_query);
|
|
222
|
-
|
|
223
|
-
events.push({
|
|
224
|
-
phase: "query_start",
|
|
225
|
-
provider: "semantic_scholar",
|
|
226
|
-
query_index: 1,
|
|
227
|
-
query: semanticScholarQuery,
|
|
228
|
-
});
|
|
229
|
-
emitEvent(`Searching Semantic Scholar q1: ${semanticScholarQuery}`);
|
|
230
|
-
|
|
231
|
-
try {
|
|
232
|
-
const semanticScholarResult = await searchSemanticScholar(
|
|
233
|
-
{
|
|
234
|
-
query: semanticScholarQuery,
|
|
235
|
-
max_results: Math.min(100, maxResults),
|
|
236
|
-
year_from: firstYear(params.date_from),
|
|
237
|
-
year_to: firstYear(params.date_to),
|
|
238
|
-
},
|
|
239
|
-
signal,
|
|
240
|
-
undefined,
|
|
241
|
-
);
|
|
242
|
-
semanticScholarPapers = semanticScholarResult.papers;
|
|
243
|
-
const semanticScholarDisplayPapers = compactPapersForDisplay(
|
|
244
|
-
semanticScholarResult.papers,
|
|
245
|
-
);
|
|
246
|
-
searches.push({
|
|
247
|
-
provider: "semantic_scholar",
|
|
248
|
-
query_index: 1,
|
|
249
|
-
query: semanticScholarQuery,
|
|
250
|
-
count: semanticScholarResult.count,
|
|
251
|
-
papers: semanticScholarDisplayPapers,
|
|
252
|
-
});
|
|
253
|
-
events.push({
|
|
254
|
-
phase: "query_results",
|
|
255
|
-
provider: "semantic_scholar",
|
|
256
|
-
query_index: 1,
|
|
257
|
-
query: semanticScholarQuery,
|
|
258
|
-
count: semanticScholarResult.count,
|
|
259
|
-
papers: semanticScholarDisplayPapers,
|
|
260
|
-
});
|
|
261
|
-
emitEvent(
|
|
262
|
-
`Semantic Scholar q1 found ${semanticScholarResult.count} candidate papers.`,
|
|
263
|
-
);
|
|
264
|
-
semanticScholar = {
|
|
265
|
-
searched: true,
|
|
266
|
-
count: semanticScholarResult.count,
|
|
267
|
-
query: semanticScholarQuery,
|
|
268
|
-
};
|
|
269
|
-
} catch (err) {
|
|
270
|
-
const message = err instanceof Error ? err.message : String(err);
|
|
271
|
-
events.push({
|
|
272
|
-
phase: "query_error",
|
|
273
|
-
provider: "semantic_scholar",
|
|
274
|
-
query_index: 1,
|
|
275
|
-
query: semanticScholarQuery,
|
|
276
|
-
error: message,
|
|
277
|
-
});
|
|
278
|
-
semanticScholar = {
|
|
279
|
-
searched: false,
|
|
280
|
-
reason: `Semantic Scholar search failed: ${message}`,
|
|
281
|
-
};
|
|
282
|
-
emitEvent(`Semantic Scholar q1 failed: ${message}`);
|
|
283
|
-
}
|
|
284
|
-
}
|
|
285
|
-
|
|
286
182
|
events.push({ phase: "dedupe" });
|
|
287
183
|
emitEvent("Deduplicating literature results...");
|
|
288
184
|
|
|
289
|
-
const papers = dedupeLiteraturePapers(
|
|
290
|
-
...pubmed.papers,
|
|
291
|
-
...semanticScholarPapers,
|
|
292
|
-
]);
|
|
185
|
+
const papers = dedupeLiteraturePapers(pubmed.papers);
|
|
293
186
|
events.push({
|
|
294
187
|
phase: "complete",
|
|
295
188
|
count: papers.length,
|
|
@@ -307,7 +200,6 @@ export async function searchLiterature(
|
|
|
307
200
|
query: pubmed.query ?? params.pubmed_query,
|
|
308
201
|
total: pubmed.total,
|
|
309
202
|
},
|
|
310
|
-
semantic_scholar: semanticScholar,
|
|
311
203
|
},
|
|
312
204
|
searches,
|
|
313
205
|
events,
|
|
@@ -319,7 +211,7 @@ export function createLiteratureSearchTool() {
|
|
|
319
211
|
name: "literature_search",
|
|
320
212
|
label: "Literature Search",
|
|
321
213
|
description:
|
|
322
|
-
"Run the literature workflow search
|
|
214
|
+
"Run the literature workflow search against PubMed using a PubMed-ready query (MeSH [mh], title/abstract [tiab], publication type [pt], substance [nm], and Boolean logic).",
|
|
323
215
|
parameters: LITERATURE_SEARCH_PARAMS,
|
|
324
216
|
async execute(
|
|
325
217
|
_toolCallId: string,
|
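After this change, deduplication runs over PubMed results only and keys each paper by DOI, PMID, and title plus year. The following Python sketch illustrates that idea; it is not a port of the TypeScript. The key prefixes (`doi:`, `pmid:`, `title-year:`) and the "existing value wins" merge rule come from the hunks above, while `normalize_doi`, the title normalisation, and the first-match lookup are assumptions.

```python
import re
from typing import Dict, List, Optional


def normalize_doi(doi: Optional[str]) -> Optional[str]:
    """Assumed normalisation: drop a doi.org resolver prefix and lower-case."""
    if not doi:
        return None
    return re.sub(r"^https?://(dx\.)?doi\.org/", "", doi.strip(), flags=re.I).lower() or None


def dedupe_keys(paper: Dict) -> List[str]:
    """Build the three key kinds kept by the change: doi, pmid, title-year."""
    keys = []
    doi = normalize_doi(paper.get("doi"))
    if doi:
        keys.append(f"doi:{doi}")
    if paper.get("pmid"):
        keys.append(f"pmid:{paper['pmid']}")
    # Assumed title normalisation: lower-case, punctuation collapsed to spaces.
    title = re.sub(r"[^a-z0-9]+", " ", (paper.get("title") or "").lower()).strip()
    if title and paper.get("year"):
        keys.append(f"title-year:{title}:{paper['year']}")
    return keys


def dedupe(papers: List[Dict]) -> List[Dict]:
    """Merge papers that share any key; fields already set on the first record win."""
    merged: List[Dict] = []
    index: Dict[str, int] = {}  # key -> position in merged
    for raw in papers:
        paper = {**raw, "doi": normalize_doi(raw.get("doi"))}
        pos = next((index[k] for k in dedupe_keys(paper) if k in index), None)
        if pos is None:
            merged.append(paper)
            pos = len(merged) - 1
        else:
            for field, value in paper.items():
                if merged[pos].get(field) is None:
                    merged[pos][field] = value
        for key in dedupe_keys(merged[pos]):
            index[key] = pos
    return merged


records = [
    {"pmid": "111", "title": "A Study of X", "year": 2020},
    {"doi": "https://doi.org/10.1000/ABC", "title": "A study of X.", "year": 2020},
    {"pmid": "222", "title": "Another paper", "year": 2021},
]
result = dedupe(records)
print(len(result), result[0]["pmid"], result[0]["doi"])
```

With a single provider, duplicates can still arise when the same paper is returned more than once or when records are combined across calls, which is presumably why the dedupe step was kept.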
package/src/rendering.ts
CHANGED
|
@@ -16,20 +16,19 @@ export type CompactPaperForDisplay = {
|
|
|
16
16
|
source: string;
|
|
17
17
|
year?: number;
|
|
18
18
|
journal?: string;
|
|
19
|
-
citation_count?: number;
|
|
20
19
|
};
|
|
21
20
|
|
|
22
21
|
export type LiteratureSearchDisplayEvent =
|
|
23
22
|
| { phase: "start" }
|
|
24
23
|
| {
|
|
25
24
|
phase: "query_start";
|
|
26
|
-
provider: "pubmed"
|
|
25
|
+
provider: "pubmed";
|
|
27
26
|
query_index: number;
|
|
28
27
|
query: string;
|
|
29
28
|
}
|
|
30
29
|
| {
|
|
31
30
|
phase: "query_results";
|
|
32
|
-
provider: "pubmed"
|
|
31
|
+
provider: "pubmed";
|
|
33
32
|
query_index: number;
|
|
34
33
|
query: string;
|
|
35
34
|
count: number;
|
|
@@ -37,7 +36,7 @@ export type LiteratureSearchDisplayEvent =
|
|
|
37
36
|
}
|
|
38
37
|
| {
|
|
39
38
|
phase: "query_error";
|
|
40
|
-
provider: "pubmed"
|
|
39
|
+
provider: "pubmed";
|
|
41
40
|
query_index: number;
|
|
42
41
|
query: string;
|
|
43
42
|
error: string;
|
|
@@ -46,7 +45,7 @@ export type LiteratureSearchDisplayEvent =
|
|
|
46
45
|
| { phase: "complete"; count: number; papers: CompactPaperForDisplay[] };
|
|
47
46
|
|
|
48
47
|
export type LiteratureSearchDisplaySearch = {
|
|
49
|
-
provider: "pubmed"
|
|
48
|
+
provider: "pubmed";
|
|
50
49
|
query_index: number;
|
|
51
50
|
query: string;
|
|
52
51
|
count: number;
|
|
@@ -107,7 +106,6 @@ export function authorRange(paper: PaperRecord): string {
|
|
|
107
106
|
export function paperIdentifier(paper: PaperRecord): string {
|
|
108
107
|
if (paper.doi) return `DOI:${paper.doi}`;
|
|
109
108
|
if (paper.pmid) return `PMID:${paper.pmid}`;
|
|
110
|
-
if (paper.s2_id) return `S2:${paper.s2_id}`;
|
|
111
109
|
return "—";
|
|
112
110
|
}
|
|
113
111
|
|
|
@@ -120,11 +118,7 @@ export function sourceLabel(paper: PaperRecord): string {
|
|
|
120
118
|
.map((source) => source.trim())
|
|
121
119
|
.filter(Boolean),
|
|
122
120
|
);
|
|
123
|
-
|
|
124
|
-
const hasS2 = sources.has("semantic_scholar");
|
|
125
|
-
if (hasPubmed && hasS2) return "PM+S2";
|
|
126
|
-
if (hasPubmed) return "PM";
|
|
127
|
-
if (hasS2) return "S2";
|
|
121
|
+
if (sources.has("pubmed")) return "PM";
|
|
128
122
|
return paper.source ?? "—";
|
|
129
123
|
}
|
|
130
124
|
|
|
@@ -136,7 +130,6 @@ export function compactPaperForDisplay(paper: PaperRecord): CompactPaperForDispl
|
|
|
136
130
|
source: sourceLabel(paper),
|
|
137
131
|
year: paper.year,
|
|
138
132
|
journal: paper.journal,
|
|
139
|
-
citation_count: paper.citation_count,
|
|
140
133
|
};
|
|
141
134
|
}
|
|
142
135
|
|
|
@@ -144,12 +137,12 @@ export function compactPapersForDisplay(papers: PaperRecord[]): CompactPaperForD
|
|
|
144
137
|
return papers.map(compactPaperForDisplay);
|
|
145
138
|
}
|
|
146
139
|
|
|
147
|
-
function providerLabel(provider: "pubmed"
|
|
148
|
-
return
|
|
140
|
+
function providerLabel(provider: "pubmed"): string {
|
|
141
|
+
return "PubMed";
|
|
149
142
|
}
|
|
150
143
|
|
|
151
|
-
function providerColor(provider: "pubmed"
|
|
152
|
-
return
|
|
144
|
+
function providerColor(provider: "pubmed"): string {
|
|
145
|
+
return "success";
|
|
153
146
|
}
|
|
154
147
|
|
|
155
148
|
export function formatFoundLine(
|
|
@@ -168,7 +161,7 @@ export function formatMergedLine(
|
|
|
168
161
|
theme?: ThemeLike,
|
|
169
162
|
): string {
|
|
170
163
|
const title = truncateText(paper.title, 72);
|
|
171
|
-
const source = color(theme,
|
|
164
|
+
const source = color(theme, "success", `(${paper.source})`);
|
|
172
165
|
return ` ${color(theme, "success", "+")} ${index + 1}. ${title} ${source}`;
|
|
173
166
|
}
|
|
174
167
|
|
|
@@ -237,7 +230,6 @@ type LiteratureResultDetails = {
|
|
|
237
230
|
papers?: PaperRecord[];
|
|
238
231
|
providers?: {
|
|
239
232
|
pubmed?: ProviderSearchSummary;
|
|
240
|
-
semantic_scholar?: ProviderSearchSummary;
|
|
241
233
|
};
|
|
242
234
|
events?: LiteratureSearchDisplayEvent[];
|
|
243
235
|
};
|
|
@@ -250,11 +242,9 @@ type ProviderResultDetails = {
|
|
|
250
242
|
|
|
251
243
|
function renderCollapsedLiteratureResult(details: LiteratureResultDetails, theme?: ThemeLike): string {
|
|
252
244
|
const pubmed = details?.providers?.pubmed;
|
|
253
|
-
const s2 = details?.providers?.semantic_scholar;
|
|
254
245
|
const pubmedText = pubmed?.searched ? `PubMed: ${pubmed.count}` : "PubMed: —";
|
|
255
|
-
const s2Text = s2?.searched ? `S2: ${s2.count}` : "S2: skipped";
|
|
256
246
|
const count = details?.count ?? details?.papers?.length ?? 0;
|
|
257
|
-
return `${color(theme, "success", "✓")} ${color(theme, "toolTitle", "literature_search")} ${color(theme, "success", pubmedText)} |
|
|
247
|
+
return `${color(theme, "success", "✓")} ${color(theme, "toolTitle", "literature_search")} ${color(theme, "success", pubmedText)} | merged: ${count}`;
|
|
258
248
|
}
|
|
259
249
|
|
|
260
250
|
export function renderLiteratureSearchResult(
|
|
@@ -284,7 +274,7 @@ export function renderLiteratureSearchResult(
|
|
|
284
274
|
}
|
|
285
275
|
|
|
286
276
|
export function renderProviderSearchResult(
|
|
287
|
-
provider: "pubmed"
|
|
277
|
+
provider: "pubmed",
|
|
288
278
|
result: ToolRenderResult<ProviderResultDetails>,
|
|
289
279
|
options: RenderOptions,
|
|
290
280
|
theme?: ThemeLike,
|
|
@@ -298,7 +288,7 @@ export function renderProviderSearchResult(
|
|
|
298
288
|
return terminalText(color(theme, "warning", text));
|
|
299
289
|
}
|
|
300
290
|
if (!options.expanded) {
|
|
301
|
-
return terminalText(`${color(theme, "success", "✓")} ${color(theme, "toolTitle",
|
|
291
|
+
return terminalText(`${color(theme, "success", "✓")} ${color(theme, "toolTitle", "pubmed_search")} ${papers.length} papers`);
|
|
302
292
|
}
|
|
303
293
|
const lines = [
|
|
304
294
|
`${color(theme, providerColor(provider), "→")} ${color(theme, providerColor(provider), providerName)} q1: ${query}`,
|
package/src/shared.ts
CHANGED
|
@@ -1,5 +1,3 @@
|
|
|
1
|
-
import { mkdir, writeFile } from "node:fs/promises";
|
|
2
|
-
import path from "node:path";
|
|
3
1
|
import type { PaperRecord } from "./types.ts";
|
|
4
2
|
|
|
5
3
|
export const USER_AGENT = "research-skills-literature-tools/0.1 (+https://github.com/fbraza/research-skills)";
|
|
@@ -82,22 +80,3 @@ export async function fetchJson<T>(url: string, signal?: AbortSignal, headers?:
|
|
|
82
80
|
export function formatPaperText(papers: PaperRecord[]): string {
|
|
83
81
|
return JSON.stringify(papers, null, 2);
|
|
84
82
|
}
|
|
85
|
-
|
|
86
|
-
export function sanitizeFilename(value: string): string {
|
|
87
|
-
return value.replace(/[^a-z0-9._-]+/gi, "_").replace(/^_+|_+$/g, "") || "paper";
|
|
88
|
-
}
|
|
89
|
-
|
|
90
|
-
export async function savePdf(pdfUrl: string, outputDir: string, preferredId: string, signal?: AbortSignal): Promise<string> {
|
|
91
|
-
await mkdir(outputDir, { recursive: true });
|
|
92
|
-
const response = await fetch(pdfUrl, {
|
|
93
|
-
method: "GET",
|
|
94
|
-
signal,
|
|
95
|
-
headers: { "user-agent": USER_AGENT, accept: "application/pdf,*/*" },
|
|
96
|
-
redirect: "follow",
|
|
97
|
-
});
|
|
98
|
-
if (!response.ok) throw new Error(`Failed to download PDF (${response.status})`);
|
|
99
|
-
const bytes = Buffer.from(await response.arrayBuffer());
|
|
100
|
-
const filePath = path.resolve(outputDir, `${sanitizeFilename(preferredId)}.pdf`);
|
|
101
|
-
await writeFile(filePath, bytes);
|
|
102
|
-
return filePath;
|
|
103
|
-
}
|
package/src/types.ts
CHANGED
|
@@ -1,7 +1,6 @@
|
|
|
1
1
|
export type PaperRecord = {
|
|
2
2
|
pmid?: string;
|
|
3
3
|
doi?: string;
|
|
4
|
-
s2_id?: string;
|
|
5
4
|
title: string;
|
|
6
5
|
abstract?: string;
|
|
7
6
|
authors?: string[];
|
|
@@ -9,22 +8,10 @@ export type PaperRecord = {
|
|
|
9
8
|
year?: number;
|
|
10
9
|
publication_types?: string[];
|
|
11
10
|
mesh_terms?: string[];
|
|
12
|
-
citation_count?: number;
|
|
13
|
-
tldr?: string;
|
|
14
|
-
open_access_pdf?: string;
|
|
15
|
-
external_ids?: Record<string, string>;
|
|
16
11
|
source?: string;
|
|
17
12
|
sources?: string[];
|
|
18
13
|
date?: string;
|
|
19
14
|
category?: string;
|
|
20
15
|
version?: string;
|
|
21
16
|
license?: string;
|
|
22
|
-
pdf_url?: string;
|
|
23
|
-
};
|
|
24
|
-
|
|
25
|
-
export type FullTextRouteResult = {
|
|
26
|
-
source: string;
|
|
27
|
-
pdf_url?: string;
|
|
28
|
-
access_note: string;
|
|
29
|
-
is_preprint?: boolean;
|
|
30
17
|
};
|
|
@@ -1,34 +0,0 @@
|
|
|
1
|
-
# Full-Text Access Guide
|
|
2
|
-
|
|
3
|
-
**Workflow:** literature
|
|
4
|
-
**Purpose:** Retrieve PDFs for prioritised papers using a consistent fallback chain.
|
|
5
|
-
|
|
6
|
-
## Access order
|
|
7
|
-
|
|
8
|
-
1. **PubMed Central (PMC)**
|
|
9
|
-
- Preferred for PubMed-indexed papers with open full text.
|
|
10
|
-
- Use PubMed/PMC linking first when a PMID is available.
|
|
11
|
-
|
|
12
|
-
2. **Publisher open-access page**
|
|
13
|
-
- Resolve DOI at `https://doi.org/<doi>`.
|
|
14
|
-
- Look for `citation_pdf_url`, explicit PDF links, or embedded PDF viewers.
|
|
15
|
-
|
|
16
|
-
3. **Sci-Hub fallback**
|
|
17
|
-
- Use only as the final fallback after OA routes are exhausted.
|
|
18
|
-
- Record that Sci-Hub was used.
|
|
19
|
-
|
|
20
|
-
## Per-paper logging
|
|
21
|
-
|
|
22
|
-
For each paper, record:
|
|
23
|
-
- PMID
|
|
24
|
-
- DOI
|
|
25
|
-
- source used: `pmc`, `publisher_oa`, `scihub`, or `not_found`
|
|
26
|
-
- direct PDF URL if found
|
|
27
|
-
- local saved path if downloaded
|
|
28
|
-
- access note
|
|
29
|
-
|
|
30
|
-
## Notes
|
|
31
|
-
|
|
32
|
-
- PMC and publisher OA should always be attempted before Sci-Hub.
|
|
33
|
-
- If no DOI is known but PMID exists, try resolving identifiers from PubMed metadata first.
|
|
34
|
-
- If no PDF is found, keep the paper in the synthesis and note `not_found`.
|