paper-manager 0.11.2 → 0.12.1

package/README.md CHANGED
@@ -8,10 +8,11 @@ A CLI tool for managing academic papers with knowledge base and vector search su
 ## Features
 
 - **Semantic search** — FAISS vector indexing with configurable embedding models, query your papers by meaning rather than keywords
+- **Add papers by DOI** — automatically download Open Access PDFs via [Unpaywall API](https://unpaywall.org/products/api) with `--doi` ([integration guide](paper-cli/unpaywall.md))
 - **PDF metadata extraction** — automatically extracts title, author, keywords, DOI, and more from PDF files
 - **DOI deduplication** — detects duplicate papers by DOI before adding, with `--force` override
 - **Multi-format support** — import from PDF, TXT, MD, TEX, and other text-based formats
-- **PDF-to-Markdown conversion** — optional high-quality conversion via [opendataloader-pdf](https://github.com/nicobailon/opendataloader-pdf) with image extraction
+- **PDF-to-Markdown conversion** — optional high-quality conversion via [opendataloader-pdf](https://github.com/opendataloader-project/opendataloader-pdf) with image extraction ([integration guide](paper-cli/opendataloader-pdf.md))
 - **Dual-scope data model** — user-level (`~/.paper-manager/`) for global collections and project-level (`./.paper-manager/`) for project-specific papers, with automatic scope resolution
 - **DOI-to-BibTeX** — convert DOI to BibTeX citation in one command
 - **Machine-readable output** — `--json` and `--jq` flags on all read commands for scripting and automation
@@ -49,6 +50,10 @@ paper kb create my-papers -d "My research papers"
 # Add a paper (supports PDF, TXT, MD, TEX, etc.)
 paper lit add <knowledge-base-id> ./paper.pdf
 
+# Or add an Open Access paper by DOI
+paper config set email '"you@example.com"' --user # one-time setup for Unpaywall API
+paper lit add <knowledge-base-id> --doi 10.1038/nature12373
+
 # Search across papers
 paper kb query <knowledge-base-id> "attention mechanism"
 ```
@@ -78,7 +83,8 @@ paper kb query <id> <query-text> [--json] [--jq <expr>] # Query a knowledge bas
 ### Literature (`paper lit`)
 
 ```bash
-paper lit add <kb-id> <file-path> [-f] # Add a literature (auto-extracts PDF metadata, rejects duplicate DOI)
+paper lit add <kb-id> <file-path> [-f] # Add a literature from file (auto-extracts PDF metadata)
+paper lit add <kb-id> --doi <doi> [-f] # Add an Open Access paper by DOI via Unpaywall
 paper lit remove <kb-id> <id> # Remove a literature
 paper lit update <kb-id> <id> [opts] # Update literature metadata
 paper lit list <kb-id> [--json] [--jq <expr>] # List literatures
@@ -96,6 +102,72 @@ paper util doi2bib <doi> # Convert a DOI to BibTeX citation
 paper util pdf-meta <file> [--json] [--jq <expr>] # Extract metadata from a PDF file
 ```
 
+## Usage Scenarios
+
+### Building a paper collection for a research project
+
+```bash
+# Initialize a project-scoped data directory (version-controllable)
+paper config init
+
+# Create a knowledge base for your topic
+paper kb create "llm-agents" -d "Papers on LLM-based autonomous agents"
+# Output: Knowledge base created: 9f3a...
+
+# Add papers — by file or by DOI
+paper lit add 9f3a ./downloaded-paper.pdf
+paper lit add 9f3a --doi 10.48550/arXiv.2305.10601
+
+# Ask questions across all your papers
+paper kb query 9f3a "how do agents handle long-term memory?"
+```
+
+### Quick-adding Open Access papers you find online
+
+```bash
+# Spot a DOI in a reference list? One command to ingest it.
+paper lit add <kb-id> --doi 10.1038/nature12373
+
+# Unpaywall checks OA status, downloads the PDF, extracts metadata,
+# and builds a vector index — all in one step.
+# If the paper is not OA, you'll get a clear error with instructions.
+```
+
+### Generating BibTeX citations for your bibliography
+
+```bash
+paper util doi2bib 10.1145/3586183.3606763
+# @inproceedings{...}
+
+# Combine with jq to batch-extract DOIs from your knowledge base
+paper lit list <kb-id> --jq '.[].doi | select(. != null)'
+```
+
+### Scripting with JSON output
+
+```bash
+# Export all papers in a knowledge base as JSON
+paper lit list <kb-id> --json > papers.json
+
+# Filter with jq expressions inline
+paper lit list <kb-id> --jq '[.[] | {title, doi, author}]'
+
+# Find papers by a specific author
+paper lit search <kb-id> -a "Vaswani" --json
+```
+
+### Annotating papers for a literature review
+
+```bash
+# Attach notes to track your reading progress
+paper lit note set <lit-id> status "read"
+paper lit note set <lit-id> relevance "high"
+paper lit note set <lit-id> summary "Proposes transformer architecture..."
+
+# Review all notes
+paper lit note list <lit-id>
+```
+
 ## Configuration
 
 See [Configuration Reference](docs/configuration.md) for all available config fields and detailed usage.
@@ -47,6 +47,11 @@
     "defaultEmbeddingModelId": {
       "type": "string",
       "minLength": 1
+    },
+    "email": {
+      "type": "string",
+      "format": "email",
+      "pattern": "^(?!\\.)(?!.*\\.\\.)([A-Za-z0-9_'+\\-\\.]*)[A-Za-z0-9_+-]@([A-Za-z0-9][A-Za-z0-9\\-]*\\.)+[A-Za-z]{2,}$"
     }
   },
   "required": ["embeddingModels"],
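Many JSON Schema validators treat `format` as annotation-only, so the explicit `pattern` above is what actually enforces the shape. It can be exercised directly in Node; a minimal sketch (the regex is copied from the schema hunk with JSON string escaping removed, and the sample addresses are illustrative):

```javascript
// The schema's "pattern" as a JS RegExp: no leading dot, no consecutive
// dots, restricted local-part charset, dotted alphanumeric domain.
const emailPattern = /^(?!\.)(?!.*\.\.)([A-Za-z0-9_'+\-\.]*)[A-Za-z0-9_+-]@([A-Za-z0-9][A-Za-z0-9\-]*\.)+[A-Za-z]{2,}$/;

console.log(emailPattern.test("you@example.com"));   // true
console.log(emailPattern.test(".lead@example.com")); // false — leading dot rejected
console.log(emailPattern.test("a..b@example.com"));  // false — consecutive dots rejected
```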
@@ -1,17 +1,20 @@
 import * as fs from "node:fs";
+import * as os from "node:os";
 import * as path from "node:path";
 import chalk from "chalk";
 import cliProgress from "cli-progress";
 import { Command } from "commander";
-import { getFilesDir, getModelConfig, getProjectDataDir, getUserDataDir, getVectorStoreDir, } from "../config/index.js";
+import { getConfig, getFilesDir, getModelConfig, getProjectDataDir, getUserDataDir, getVectorStoreDir, } from "../config/index.js";
 import * as projectKb from "../db/project/knowledge-bases.js";
 import * as projectLit from "../db/project/literatures.js";
 import * as userKb from "../db/user/knowledge-bases.js";
 import * as userLit from "../db/user/literatures.js";
+import { isHybridBackendAvailable } from "../dep/index.js";
 import { extractContent, extractPdfMetadata } from "../extractor/index.js";
 import { convertPdfToMarkdown, isOpendataLoaderAvailable, removeImageDir, saveConvertResult, } from "../extractor/markdown.js";
 import { log } from "../logger.js";
 import { splitDocuments } from "../text-splitter.js";
+import { downloadPdf, lookupDoi, normalizeDoi, UnpaywallError } from "../unpaywall/index.js";
 import { addDocuments, createVectorStore } from "../vector-store/index.js";
 import { outputJson } from "./output.js";
 function resolveKnowledgeBase(id) {
@@ -33,11 +36,21 @@ export function createLiteratureCommand() {
     const lit = new Command("lit").description("Manage literatures");
     // ─── lit add ───────────────────────────────────────────────
     lit
-        .command("add <knowledge-base-id> <lit-path>")
-        .description("Add a literature from a file (PDF, TXT, MD, TEX, etc.)")
+        .command("add <knowledge-base-id> [lit-path]")
+        .description("Add a literature from a file (PDF, TXT, MD, TEX, etc.) or by DOI via Unpaywall")
         .option("-t, --title <title>", "Literature title")
         .option("-f, --force", "Force add even if a literature with the same DOI already exists")
+        .option("-d, --doi <doi>", "Add paper by DOI (downloads Open Access PDF via Unpaywall)")
         .action(async (kbId, litPath, options) => {
+        // Mutual exclusivity check
+        if (litPath && options.doi) {
+            log.error("Cannot specify both <lit-path> and --doi. Use one or the other.");
+            process.exit(1);
+        }
+        if (!litPath && !options.doi) {
+            log.error("Either <lit-path> or --doi is required.");
+            process.exit(1);
+        }
         const resolved = resolveKnowledgeBase(kbId);
         if (!resolved) {
             log.error(`Knowledge base not found: ${kbId}`);
@@ -46,108 +59,214 @@ export function createLiteratureCommand() {
         const { kb, scope } = resolved;
         const baseDir = getBaseDir(scope);
         const litOps = getLitOps(scope);
-        // Resolve file path
-        const absolutePath = path.resolve(litPath);
-        if (!fs.existsSync(absolutePath)) {
-            log.error(`File not found: ${absolutePath}`);
-            process.exit(1);
-        }
-        log.info("Extracting content...");
-        const docs = await extractContent(absolutePath);
-        log.step(`Extracted ${String(docs.length)} pages.`);
-        // Extract PDF metadata if available
-        const isPdf = absolutePath.toLowerCase().endsWith(".pdf");
-        const pdfMeta = isPdf ? await extractPdfMetadata(absolutePath) : null;
-        if (pdfMeta) {
-            const hasAny = pdfMeta.title ?? pdfMeta.author ?? pdfMeta.doi ?? pdfMeta.subject;
-            if (hasAny || pdfMeta.keywords.length > 0) {
-                log.info("Extracted PDF metadata:");
-                if (pdfMeta.title)
-                    log.step(`Title: ${pdfMeta.title}`);
-                if (pdfMeta.author)
-                    log.step(`Author: ${pdfMeta.author}`);
-                if (pdfMeta.subject)
-                    log.step(`Subject: ${pdfMeta.subject}`);
-                if (pdfMeta.doi)
-                    log.step(`DOI: ${pdfMeta.doi}`);
-                if (pdfMeta.keywords.length > 0)
-                    log.step(`Keywords: ${pdfMeta.keywords.join(", ")}`);
-                if (pdfMeta.creationDate)
-                    log.step(`Created: ${pdfMeta.creationDate.toISOString()}`);
-                if (pdfMeta.creator)
-                    log.step(`Creator: ${pdfMeta.creator}`);
+        let absolutePath;
+        let tempDir = null;
+        let doiFromFlag = null;
+        let unpaywallMeta = null;
+        if (options.doi) {
+            // ─── DOI mode: lookup Unpaywall and download OA PDF ───
+            const normalizedDoi = normalizeDoi(options.doi);
+            doiFromFlag = normalizedDoi;
+            const email = getConfig("email");
+            if (!email) {
+                log.error('Email is required for Unpaywall API. Set it with: paper config set email "you@example.com"');
+                process.exit(1);
             }
-        }
-        // Check for duplicate DOI in the knowledge base
-        if (pdfMeta?.doi && !options.force) {
-            const existing = litOps.findLiteratureByDoi(kbId, pdfMeta.doi);
-            if (existing) {
-                log.error(`A literature with DOI "${pdfMeta.doi}" already exists in this knowledge base: ${existing.id} (${existing.title})`);
-                log.info("Use --force to add anyway.");
+            // Check for duplicate DOI before downloading
+            if (!options.force) {
+                const existing = litOps.findLiteratureByDoi(kbId, normalizedDoi);
+                if (existing) {
+                    log.error(`A literature with DOI "${normalizedDoi}" already exists in this knowledge base: ${existing.id} (${existing.title})`);
+                    log.info("Use --force to add anyway.");
+                    process.exit(1);
+                }
+            }
+            log.info(`Looking up DOI: ${normalizedDoi}`);
+            try {
+                unpaywallMeta = await lookupDoi(normalizedDoi, email);
+            }
+            catch (err) {
+                if (err instanceof UnpaywallError) {
+                    log.error(err.message);
+                }
+                else {
+                    log.error(`Unpaywall lookup failed: ${err instanceof Error ? err.message : String(err)}`);
+                }
+                process.exit(1);
+            }
+            if (!unpaywallMeta.is_oa) {
+                log.error(`Paper is not Open Access (status: ${unpaywallMeta.oa_status}).`);
+                log.info(`Add it manually: paper lit add ${kbId} <file>`);
+                process.exit(1);
+            }
+            const pdfUrl = unpaywallMeta.best_oa_location?.url_for_pdf;
+            if (!pdfUrl) {
+                const landingPage = unpaywallMeta.best_oa_location?.url_for_landing_page;
+                log.error("Paper is Open Access but no direct PDF URL is available.");
+                if (landingPage) {
+                    log.info(`Landing page: ${landingPage}`);
+                }
+                log.info(`Download the PDF manually and use: paper lit add ${kbId} <file>`);
                 process.exit(1);
             }
+            // Show Unpaywall metadata
+            log.info("Unpaywall metadata:");
+            if (unpaywallMeta.title)
+                log.step(`Title: ${unpaywallMeta.title}`);
+            if (unpaywallMeta.z_authors && unpaywallMeta.z_authors.length > 0) {
+                log.step(`Authors: ${unpaywallMeta.z_authors.map((a) => a.raw_author_name).join(", ")}`);
+            }
+            if (unpaywallMeta.journal_name)
+                log.step(`Journal: ${unpaywallMeta.journal_name}`);
+            if (unpaywallMeta.year)
+                log.step(`Year: ${String(unpaywallMeta.year)}`);
+            log.step(`OA Status: ${unpaywallMeta.oa_status}`);
+            // Download PDF to temp location
+            tempDir = fs.mkdtempSync(path.join(os.tmpdir(), "paper-unpaywall-"));
+            absolutePath = path.join(tempDir, `${normalizedDoi.replace(/\//g, "_")}.pdf`);
+            log.info("Downloading PDF...");
+            try {
+                await downloadPdf(pdfUrl, absolutePath);
+            }
+            catch (err) {
+                fs.rmSync(tempDir, { recursive: true, force: true });
+                if (err instanceof UnpaywallError) {
+                    log.error(err.message);
+                }
+                else {
+                    log.error(`PDF download failed: ${err instanceof Error ? err.message : String(err)}`);
+                }
+                process.exit(1);
+            }
+            log.step("PDF downloaded.");
         }
-        const title = options.title ?? pdfMeta?.title ?? path.basename(litPath, path.extname(litPath));
-        // Create literature record
-        const literature = litOps.createLiterature({
-            title,
-            titleTranslation: null,
-            author: pdfMeta?.author ?? null,
-            abstract: pdfMeta?.subject ?? null,
-            summary: null,
-            keywords: pdfMeta?.keywords ?? [],
-            url: null,
-            doi: pdfMeta?.doi ?? null,
-            notes: {},
-            knowledgeBaseId: kbId,
-        });
-        // Copy file to storage
-        const filesDir = getFilesDir(baseDir);
-        const ext = path.extname(litPath);
-        fs.mkdirSync(filesDir, { recursive: true });
-        fs.copyFileSync(absolutePath, path.join(filesDir, `${literature.id}${ext}`));
-        // Convert PDF to Markdown if opendataloader is available
-        if (isPdf && (await isOpendataLoaderAvailable())) {
-            const result = await convertPdfToMarkdown(absolutePath);
-            if (result) {
-                saveConvertResult(filesDir, literature.id, result);
-                log.step("Converted to Markdown via opendataloader-pdf.");
+        else {
+            // ─── File mode (existing behavior) ────────────────────
+            // litPath is guaranteed to be defined here by the mutual exclusivity check above
+            const filePath = litPath ?? "";
+            absolutePath = path.resolve(filePath);
+            if (!fs.existsSync(absolutePath)) {
+                log.error(`File not found: ${absolutePath}`);
+                process.exit(1);
             }
         }
-        // Split text and add to vector store
-        log.info("Splitting text...");
-        const splitDocs = splitDocuments(docs, { chunkSize: 1000, chunkOverlap: 200 });
-        log.step(`Created ${String(splitDocs.length)} chunks.`);
-        // Add literature ID metadata to each chunk
-        for (const doc of splitDocs) {
-            doc.metadata = { ...doc.metadata, literatureId: literature.id };
-        }
-        const vectorDir = path.join(getVectorStoreDir(baseDir), kbId);
-        const modelConfig = getModelConfig(kb.embeddingModelId);
-        log.info("Embedding and storing vectors...");
-        const bar = new cliProgress.SingleBar({}, cliProgress.Presets.shades_classic);
-        bar.start(splitDocs.length, 0);
-        // Check if both FAISS index files exist (not just the directory)
-        const hasIndex = fs.existsSync(path.join(vectorDir, "faiss.index")) &&
-            fs.existsSync(path.join(vectorDir, "docstore.json"));
-        if (hasIndex) {
-            await addDocuments(splitDocs, modelConfig, vectorDir);
+        // ─── Shared flow ──────────────────────────────────────
+        try {
+            log.info("Extracting content...");
+            const docs = await extractContent(absolutePath);
+            log.step(`Extracted ${String(docs.length)} pages.`);
+            // Extract PDF metadata if available
+            const isPdf = absolutePath.toLowerCase().endsWith(".pdf");
+            const pdfMeta = isPdf ? await extractPdfMetadata(absolutePath) : null;
+            if (pdfMeta) {
+                const hasAny = pdfMeta.title ?? pdfMeta.author ?? pdfMeta.doi ?? pdfMeta.subject;
+                if (hasAny || pdfMeta.keywords.length > 0) {
+                    log.info("Extracted PDF metadata:");
+                    if (pdfMeta.title)
+                        log.step(`Title: ${pdfMeta.title}`);
+                    if (pdfMeta.author)
+                        log.step(`Author: ${pdfMeta.author}`);
+                    if (pdfMeta.subject)
+                        log.step(`Subject: ${pdfMeta.subject}`);
+                    if (pdfMeta.doi)
+                        log.step(`DOI: ${pdfMeta.doi}`);
+                    if (pdfMeta.keywords.length > 0)
+                        log.step(`Keywords: ${pdfMeta.keywords.join(", ")}`);
+                    if (pdfMeta.creationDate)
+                        log.step(`Created: ${pdfMeta.creationDate.toISOString()}`);
+                    if (pdfMeta.creator)
+                        log.step(`Creator: ${pdfMeta.creator}`);
+                }
+            }
+            // Check for duplicate DOI (file mode only — DOI mode already checked above)
+            const effectiveDoi = doiFromFlag ?? pdfMeta?.doi ?? null;
+            if (effectiveDoi && !doiFromFlag && !options.force) {
+                const existing = litOps.findLiteratureByDoi(kbId, effectiveDoi);
+                if (existing) {
+                    log.error(`A literature with DOI "${effectiveDoi}" already exists in this knowledge base: ${existing.id} (${existing.title})`);
+                    log.info("Use --force to add anyway.");
+                    process.exit(1);
+                }
+            }
+            // Resolve metadata: CLI option > Unpaywall > PDF metadata > fallback
+            const unpaywallAuthors = unpaywallMeta?.z_authors && unpaywallMeta.z_authors.length > 0
+                ? unpaywallMeta.z_authors.map((a) => a.raw_author_name).join(", ")
+                : null;
+            const title = options.title ??
+                unpaywallMeta?.title ??
+                pdfMeta?.title ??
+                (litPath
+                    ? path.basename(litPath, path.extname(litPath))
+                    : (effectiveDoi ?? "Untitled"));
+            // Create literature record
+            const literature = litOps.createLiterature({
+                title,
+                titleTranslation: null,
+                author: pdfMeta?.author ?? unpaywallAuthors,
+                abstract: pdfMeta?.subject ?? null,
+                summary: null,
+                keywords: pdfMeta?.keywords ?? [],
+                url: null,
+                doi: effectiveDoi,
+                notes: {},
+                knowledgeBaseId: kbId,
+            });
+            // Copy file to storage
+            const filesDir = getFilesDir(baseDir);
+            const ext = path.extname(absolutePath);
+            fs.mkdirSync(filesDir, { recursive: true });
+            fs.copyFileSync(absolutePath, path.join(filesDir, `${literature.id}${ext}`));
+            // Convert PDF to Markdown if opendataloader is available
+            if (isPdf && (await isOpendataLoaderAvailable())) {
+                if (!(await isHybridBackendAvailable())) {
+                    log.step("Hybrid backend (localhost:5002) is not running; using basic conversion. Start the backend for better quality.");
+                }
+                const result = await convertPdfToMarkdown(absolutePath);
+                if (result) {
+                    saveConvertResult(filesDir, literature.id, result);
+                    log.step("Converted to Markdown via opendataloader-pdf.");
+                }
+            }
+            // Split text and add to vector store
+            log.info("Splitting text...");
+            const splitDocs = splitDocuments(docs, { chunkSize: 1000, chunkOverlap: 200 });
+            log.step(`Created ${String(splitDocs.length)} chunks.`);
+            // Add literature ID metadata to each chunk
+            for (const doc of splitDocs) {
+                doc.metadata = { ...doc.metadata, literatureId: literature.id };
+            }
+            const vectorDir = path.join(getVectorStoreDir(baseDir), kbId);
+            const modelConfig = getModelConfig(kb.embeddingModelId);
+            log.info("Embedding and storing vectors...");
+            const bar = new cliProgress.SingleBar({}, cliProgress.Presets.shades_classic);
+            bar.start(splitDocs.length, 0);
+            // Check if both FAISS index files exist (not just the directory)
+            const hasIndex = fs.existsSync(path.join(vectorDir, "faiss.index")) &&
+                fs.existsSync(path.join(vectorDir, "docstore.json"));
+            if (hasIndex) {
+                await addDocuments(splitDocs, modelConfig, vectorDir);
+            }
+            else {
+                await createVectorStore(splitDocs, modelConfig, vectorDir);
+            }
+            bar.update(splitDocs.length);
+            bar.stop();
+            log.success(`Literature added: ${literature.id}`);
+            log.label("Title:", literature.title);
+            if (literature.author)
+                log.label("Author:", literature.author);
+            if (literature.abstract)
+                log.label("Abstract:", literature.abstract);
+            if (literature.doi)
+                log.label("DOI:", literature.doi);
+            if (literature.keywords.length > 0)
+                log.label("Keywords:", literature.keywords.join(", "));
+        }
+        finally {
+            if (tempDir) {
+                fs.rmSync(tempDir, { recursive: true, force: true });
+            }
         }
-        else {
-            await createVectorStore(splitDocs, modelConfig, vectorDir);
-        }
-        bar.update(splitDocs.length);
-        bar.stop();
-        log.success(`Literature added: ${literature.id}`);
-        log.label("Title:", literature.title);
-        if (literature.author)
-            log.label("Author:", literature.author);
-        if (literature.abstract)
-            log.label("Abstract:", literature.abstract);
-        if (literature.doi)
-            log.label("DOI:", literature.doi);
-        if (literature.keywords.length > 0)
-            log.label("Keywords:", literature.keywords.join(", "));
     });
     // ─── lit convert ────────────────────────────────────────────
     lit
@@ -61,6 +61,7 @@ function getProjectConfigPath() {
 const configSchemas = {
     embeddingModels: z.record(z.string().min(1), EmbeddingModelConfigSchema),
     defaultEmbeddingModelId: z.string().min(1),
+    email: z.email(),
 };
 // ─── Config File I/O ────────────────────────────────────────
 export function readConfigFile(filePath) {
@@ -0,0 +1,19 @@
+import { request } from "node:http";
+const HYBRID_BACKEND_URL = "http://localhost:5002";
+const HYBRID_PROBE_TIMEOUT_MS = 1500;
+/** Check if the opendataloader hybrid backend is reachable at localhost:5002. */
+export function isHybridBackendAvailable() {
+    return new Promise((resolve) => {
+        const req = request(HYBRID_BACKEND_URL, { method: "GET", timeout: HYBRID_PROBE_TIMEOUT_MS }, (res) => {
+            // Any response means the server is running
+            res.resume();
+            resolve(true);
+        });
+        req.on("error", () => resolve(false));
+        req.on("timeout", () => {
+            req.destroy();
+            resolve(false);
+        });
+        req.end();
+    });
+}
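The probe above resolves `true` on any HTTP response and `false` on connection error or timeout. The same pattern can be demonstrated standalone; in this sketch the URL is a parameter so it can point at a throwaway local server (an adaptation for illustration; the shipped function hard-codes localhost:5002):

```javascript
import { createServer, request } from "node:http";

// Same probe pattern as isHybridBackendAvailable, with the target URL
// made a parameter for this demo.
function probeHttp(url, timeoutMs = 1500) {
    return new Promise((resolve) => {
        const req = request(url, { method: "GET", timeout: timeoutMs }, (res) => {
            res.resume(); // drain the body; any response counts as "up"
            resolve(true);
        });
        req.on("error", () => resolve(false));
        req.on("timeout", () => {
            req.destroy();
            resolve(false);
        });
        req.end();
    });
}

// Demo: a throwaway server on an ephemeral port answers the probe.
const server = createServer((_req, res) => res.end("ok"));
server.listen(0, async () => {
    const { port } = server.address();
    const up = await probeHttp(`http://127.0.0.1:${port}`);
    server.close();
    console.log(up); // a reachable server probes as true
});
```

Resolving `false` instead of rejecting keeps callers like the `lit add` conversion step to a single `if`, with no try/catch around the availability check.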
@@ -1,8 +1,8 @@
 import { execFile } from "node:child_process";
 import { existsSync, mkdirSync, readdirSync, readFileSync, rmSync, writeFileSync } from "node:fs";
-import { request } from "node:http";
 import { tmpdir } from "node:os";
 import * as path from "node:path";
+import { isHybridBackendAvailable } from "../dep/index.js";
 /**
  * Check whether opendataloader-pdf is available (package installed + Java runtime).
  * Result is cached after the first call.
@@ -124,24 +124,6 @@ export async function checkOpendataLoaderStatus() {
         hybridBackendAvailable,
     };
 }
-const HYBRID_BACKEND_URL = "http://localhost:5002";
-const HYBRID_PROBE_TIMEOUT_MS = 1500;
-/** Check if the opendataloader hybrid backend is reachable at localhost:5002. */
-function isHybridBackendAvailable() {
-    return new Promise((resolve) => {
-        const req = request(HYBRID_BACKEND_URL, { method: "GET", timeout: HYBRID_PROBE_TIMEOUT_MS }, (res) => {
-            // Any response means the server is running
-            res.resume();
-            resolve(true);
-        });
-        req.on("error", () => resolve(false));
-        req.on("timeout", () => {
-            req.destroy();
-            resolve(false);
-        });
-        req.end();
-    });
-}
 // execFile is safe — arguments are passed as an array, no shell interpolation.
 function getJavaVersion() {
     return new Promise((resolve) => {
@@ -32,4 +32,5 @@ export const ConfigSchema = z.object({
     $schema: z.string().optional(),
     embeddingModels: z.record(z.string().min(1), EmbeddingModelConfigSchema).default({}),
     defaultEmbeddingModelId: z.string().min(1).optional(),
+    email: z.email().optional(),
 });
@@ -0,0 +1,71 @@
+import { writeFile } from "node:fs/promises";
+import * as z from "zod";
+// ─── Unpaywall Response Schema ─────────────────────────────
+const UnpaywallOaLocationSchema = z.object({
+    url_for_pdf: z.string().nullable(),
+    url_for_landing_page: z.string().nullable(),
+    license: z.string().nullable(),
+    version: z.string().nullable(),
+    host_type: z.string().nullable(),
+});
+const UnpaywallAuthorSchema = z.object({
+    raw_author_name: z.string(),
+});
+const UnpaywallResponseSchema = z.object({
+    is_oa: z.boolean(),
+    oa_status: z.string(),
+    title: z.string().nullable().optional(),
+    z_authors: z.array(UnpaywallAuthorSchema).nullable().optional(),
+    published_date: z.string().nullable().optional(),
+    journal_name: z.string().nullable().optional(),
+    year: z.number().nullable().optional(),
+    publisher: z.string().nullable().optional(),
+    best_oa_location: UnpaywallOaLocationSchema.nullable(),
+    doi: z.string(),
+});
+export class UnpaywallError extends Error {
+    code;
+    constructor(message, code) {
+        super(message);
+        this.name = "UnpaywallError";
+        this.code = code;
+    }
+}
+// ─── DOI Normalization ─────────────────────────────────────
+export function normalizeDoi(input) {
+    return input.replace(/^https?:\/\/(dx\.)?doi\.org\//, "").replace(/^doi:/i, "");
+}
+// ─── API Client ────────────────────────────────────────────
+export async function lookupDoi(doi, email) {
+    const url = `https://api.unpaywall.org/v2/${encodeURIComponent(doi)}?email=${encodeURIComponent(email)}`;
+    const response = await fetch(url, {
+        headers: { Accept: "application/json" },
+        redirect: "follow",
+    });
+    if (response.status === 404) {
+        throw new UnpaywallError(`DOI not found in Unpaywall: ${doi}`, "not_found");
+    }
+    if (!response.ok) {
+        throw new UnpaywallError(`Unpaywall API error: HTTP ${String(response.status)}`, "api_error");
+    }
+    const json = await response.json();
+    const result = UnpaywallResponseSchema.safeParse(json);
+    if (!result.success) {
+        throw new UnpaywallError(`Invalid Unpaywall API response: ${result.error.message}`, "parse_error");
+    }
+    return result.data;
+}
+// ─── PDF Download ──────────────────────────────────────────
+export async function downloadPdf(url, destPath) {
+    const response = await fetch(url, { redirect: "follow" });
+    if (!response.ok) {
+        throw new UnpaywallError(`Failed to download PDF: HTTP ${String(response.status)}`, "download_error");
+    }
+    const contentType = response.headers.get("content-type") ?? "";
+    if (!contentType.includes("application/pdf") &&
+        !contentType.includes("application/octet-stream")) {
+        throw new UnpaywallError(`Expected PDF but received: ${contentType}`, "download_error");
+    }
+    const buffer = new Uint8Array(await response.arrayBuffer());
+    await writeFile(destPath, buffer);
+}
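Of the new helpers, `normalizeDoi` is easy to sanity-check in isolation. A quick sketch, with the one-liner restated inline from the diff above so it runs standalone:

```javascript
// Strip a doi.org / dx.doi.org URL prefix or a "doi:" scheme prefix,
// leaving the bare DOI untouched (restated from the unpaywall module).
function normalizeDoi(input) {
    return input.replace(/^https?:\/\/(dx\.)?doi\.org\//, "").replace(/^doi:/i, "");
}

console.log(normalizeDoi("https://doi.org/10.1038/nature12373")); // 10.1038/nature12373
console.log(normalizeDoi("doi:10.1038/nature12373"));             // 10.1038/nature12373
console.log(normalizeDoi("10.1038/nature12373"));                 // 10.1038/nature12373
```

Note the URL regex is case-sensitive while the `doi:` prefix check is not; the normalized form is what both the duplicate check and the temp-file name in `lit add` rely on.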
package/package.json CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "paper-manager",
-  "version": "0.11.2",
+  "version": "0.12.1",
   "description": "A paper management system.",
   "keywords": [],
   "homepage": "https://github.com/EurFelux/paper-manager",