academic-refchecker 2.0.18__py3-none-any.whl → 2.0.19__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- academic_refchecker-2.0.19.dist-info/METADATA +366 -0
- {academic_refchecker-2.0.18.dist-info → academic_refchecker-2.0.19.dist-info}/RECORD +7 -7
- refchecker/__version__.py +1 -1
- academic_refchecker-2.0.18.dist-info/METADATA +0 -877
- {academic_refchecker-2.0.18.dist-info → academic_refchecker-2.0.19.dist-info}/WHEEL +0 -0
- {academic_refchecker-2.0.18.dist-info → academic_refchecker-2.0.19.dist-info}/entry_points.txt +0 -0
- {academic_refchecker-2.0.18.dist-info → academic_refchecker-2.0.19.dist-info}/licenses/LICENSE +0 -0
- {academic_refchecker-2.0.18.dist-info → academic_refchecker-2.0.19.dist-info}/top_level.txt +0 -0
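A diff like this can be reproduced locally: a wheel is an ordinary zip archive, so each version's `*.dist-info/METADATA` can be pulled out and compared with standard tools. A minimal sketch (the in-memory wheel below is a stand-in for a real file fetched with `pip download --no-deps`, so the sketch is self-contained):

```python
import io
import zipfile

def wheel_metadata(wheel_bytes: bytes) -> str:
    """Extract the METADATA text from a wheel; wheels are plain zip archives."""
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as zf:
        meta = next(n for n in zf.namelist() if n.endswith(".dist-info/METADATA"))
        return zf.read(meta).decode("utf-8")

# Build a tiny stand-in wheel in memory (not the real package contents).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("academic_refchecker-2.0.19.dist-info/METADATA",
                "Metadata-Version: 2.4\nName: academic-refchecker\nVersion: 2.0.19\n")

print(wheel_metadata(buf.getvalue()).splitlines()[2])  # -> Version: 2.0.19
```

Running the same extraction on the real 2.0.18 and 2.0.19 wheels and diffing the two strings reproduces the hunks shown in this report.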
academic_refchecker-2.0.19.dist-info/METADATA ADDED

@@ -0,0 +1,366 @@
+Metadata-Version: 2.4
+Name: academic-refchecker
+Version: 2.0.19
+Summary: A comprehensive tool for validating reference accuracy in academic papers
+Author-email: Mark Russinovich <markrussinovich@hotmail.com>
+License-Expression: MIT
+Project-URL: Homepage, https://github.com/markrussinovich/refchecker
+Project-URL: Repository, https://github.com/markrussinovich/refchecker
+Project-URL: Bug Tracker, https://github.com/markrussinovich/refchecker/issues
+Classifier: Development Status :: 4 - Beta
+Classifier: Intended Audience :: Science/Research
+Classifier: Topic :: Scientific/Engineering :: Information Analysis
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.7
+Classifier: Programming Language :: Python :: 3.8
+Classifier: Programming Language :: Python :: 3.9
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Operating System :: OS Independent
+Requires-Python: >=3.7
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: requests>=2.25.0
+Requires-Dist: beautifulsoup4>=4.9.0
+Requires-Dist: pypdf>=5.0.0
+Requires-Dist: arxiv>=1.4.0
+Requires-Dist: python-dateutil>=2.8.0
+Requires-Dist: tqdm>=4.60.0
+Requires-Dist: colorama>=0.4.4
+Requires-Dist: fuzzywuzzy>=0.18.0
+Requires-Dist: python-Levenshtein>=0.12.0
+Requires-Dist: pandas<2.4.0,>=1.3.0
+Requires-Dist: numpy<2.0.0,>=1.22.4
+Requires-Dist: pdfplumber>=0.6.0
+Requires-Dist: bibtexparser>=1.4.0
+Provides-Extra: dev
+Requires-Dist: pytest>=6.0.0; extra == "dev"
+Requires-Dist: pytest-cov>=2.0.0; extra == "dev"
+Requires-Dist: black>=21.0.0; extra == "dev"
+Requires-Dist: isort>=5.0.0; extra == "dev"
+Requires-Dist: flake8>=3.9.0; extra == "dev"
+Requires-Dist: mypy>=0.910; extra == "dev"
+Provides-Extra: docs
+Requires-Dist: sphinx>=4.0.0; extra == "docs"
+Requires-Dist: sphinx-rtd-theme>=0.5.0; extra == "docs"
+Provides-Extra: llm
+Requires-Dist: openai>=1.0.0; extra == "llm"
+Requires-Dist: anthropic>=0.7.0; extra == "llm"
+Requires-Dist: google-generativeai>=0.3.0; extra == "llm"
+Provides-Extra: optional
+Requires-Dist: lxml>=4.6.0; extra == "optional"
+Requires-Dist: selenium>=4.0.0; extra == "optional"
+Requires-Dist: pikepdf>=5.0.0; extra == "optional"
+Requires-Dist: nltk>=3.6.0; extra == "optional"
+Requires-Dist: scikit-learn>=1.0.0; extra == "optional"
+Requires-Dist: joblib>=1.1.0; extra == "optional"
+Provides-Extra: vllm
+Requires-Dist: vllm>=0.3.0; extra == "vllm"
+Requires-Dist: huggingface_hub>=0.17.0; extra == "vllm"
+Requires-Dist: torch>=2.0.0; extra == "vllm"
+Provides-Extra: webui
+Requires-Dist: fastapi>=0.100.0; extra == "webui"
+Requires-Dist: uvicorn[standard]>=0.22.0; extra == "webui"
+Requires-Dist: pydantic>=2.0.0; extra == "webui"
+Requires-Dist: aiosqlite>=0.19.0; extra == "webui"
+Requires-Dist: httpx>=0.24.0; extra == "webui"
+Requires-Dist: cryptography>=42.0.0; extra == "webui"
+Requires-Dist: pymupdf>=1.23.0; extra == "webui"
+Requires-Dist: Pillow>=9.0.0; extra == "webui"
+Requires-Dist: python-multipart>=0.0.6; extra == "webui"
+Dynamic: license-file
+
+# RefChecker
+
+Validate reference accuracy in academic papers. Useful for authors checking bibliographies and reviewers ensuring citations are authentic. RefChecker verifies citations against Semantic Scholar, OpenAlex, and CrossRef.
+
+*Built by Mark Russinovich with AI assistants (Cursor, GitHub Copilot, Claude Code). [Watch the deep dive video](https://www.youtube.com/watch?v=n929Alz-fjo).*
+
+## Contents
+
+- [Quick Start](#quick-start)
+- [Features](#features)
+- [Sample Output](#sample-output)
+- [Install](#install)
+- [Run](#run)
+- [Output](#output)
+- [Configure](#configure)
+- [Docker](#docker)
+- [Local Database](#local-database)
+- [Testing](#testing)
+- [License](#license)
+
+## Quick Start
+
+### Web UI (Docker)
+
+```bash
+docker run -p 8000:8000 ghcr.io/markrussinovich/refchecker:latest
+```
+
+Open **http://localhost:8000** in your browser.
+
+### Web UI (pip)
+
+```bash
+pip install academic-refchecker[llm,webui]
+refchecker-webui
+```
+
+### CLI (pip)
+
+```bash
+pip install academic-refchecker[llm]
+academic-refchecker --paper 1706.03762
+academic-refchecker --paper /path/to/paper.pdf
+```
+
+> **Performance**: Set `SEMANTIC_SCHOLAR_API_KEY` for 1-2s per reference vs 5-10s without.
+
+## Features
+
+- **Multiple formats**: ArXiv papers, PDFs, LaTeX, text files
+- **LLM-powered extraction**: OpenAI, Anthropic, Google, Azure, vLLM
+- **Multi-source verification**: Semantic Scholar, OpenAlex, CrossRef
+- **Comprehensive checks**: Titles, authors, years, venues, DOIs, ArXiv IDs
+- **Smart matching**: Handles formatting variations (BERT vs B-ERT, pre-trained vs pretrained)
+- **Detailed reports**: Errors, warnings, corrected references
+
+## Sample Output
+
+**Web UI**
+
+
+
+**CLI**
+
+```
+📄 Processing: Attention Is All You Need
+URL: https://arxiv.org/abs/1706.03762
+
+[1/45] Neural machine translation in linear time
+Nal Kalchbrenner et al. | 2017
+⚠️ Warning: Year mismatch: cited '2017', actual '2016'
+
+[2/45] Effective approaches to attention-based neural machine translation
+Minh-Thang Luong et al. | 2015
+❌ Error: First author mismatch: cited 'Minh-Thang Luong', actual 'Thang Luong'
+
+[3/45] Deep Residual Learning for Image Recognition
+Kaiming He et al. | 2016 | https://doi.org/10.1109/CVPR.2016.91
+❌ Error: DOI mismatch: cited '10.1109/CVPR.2016.91', actual '10.1109/CVPR.2016.90'
+
+============================================================
+📋 SUMMARY
+📚 Total references processed: 68
+❌ Total errors: 55 ⚠️ Total warnings: 16 ❓ Unverified: 15
+```
+
+## Install
+
+### PyPI (Recommended)
+
+```bash
+pip install academic-refchecker[llm,webui]  # Web UI + CLI + LLM providers
+pip install academic-refchecker            # CLI only
+```
+
+### From Source (Development)
+
+```bash
+git clone https://github.com/markrussinovich/refchecker.git && cd refchecker
+python -m venv .venv && source .venv/bin/activate
+pip install -e ".[llm,webui]"
+```
+
+**Requirements:** Python 3.7+ (3.10+ recommended). Node.js 18+ is only needed for Web UI development.
+
+## Run
+
+### Web UI
+
+The Web UI shows live progress, history, and export (including corrected values).
+
+```bash
+refchecker-webui --port 8000
+```
+
+#### Development (frontend)
+
+```bash
+cd web-ui
+npm install
+npm start
+```
+
+Open **http://localhost:5173**.
+
+Alternative (separate servers):
+
+```bash
+# Terminal 1
+python -m uvicorn backend.main:app --reload --port 8000
+
+# Terminal 2
+cd web-ui
+npm run dev
+```
+
+Verify the backend is running:
+
+```bash
+curl http://localhost:8000/
+```
+
+Web UI documentation: see [web-ui/README.md](web-ui/README.md).
+
+### CLI
+
+```bash
+# ArXiv (ID or URL)
+academic-refchecker --paper 1706.03762
+academic-refchecker --paper https://arxiv.org/abs/1706.03762
+
+# Local files
+academic-refchecker --paper paper.pdf
+academic-refchecker --paper paper.tex
+academic-refchecker --paper paper.txt
+academic-refchecker --paper refs.bib
+
+# Faster/offline verification (local DB)
+academic-refchecker --paper paper.pdf --db-path semantic_scholar_db/semantic_scholar.db
+
+# Save results
+academic-refchecker --paper 1706.03762 --output-file errors.txt
+```
+
+## Output
+
+RefChecker reports these result types:
+
+| Type | Description | Examples |
+|------|-------------|----------|
+| ❌ **Error** | Critical issues needing correction | Author/title/DOI mismatches, incorrect ArXiv IDs |
+| ⚠️ **Warning** | Minor issues to review | Year differences, venue variations |
+| ℹ️ **Suggestion** | Recommended improvements | Add missing ArXiv/DOI URLs, small metadata fixes |
+| ❓ **Unverified** | Could not verify against any source | Rare publications, preprints |
+
+Verified references include discovered URLs (Semantic Scholar, ArXiv, DOI). Suggestions are non-blocking improvements.
+
+<details>
+<summary>Detailed examples</summary>
+
+```
+❌ Error: First author mismatch: cited 'T. Xie', actual 'Zhao Xu'
+❌ Error: DOI mismatch: cited '10.5555/3295222.3295349', actual '10.48550/arXiv.1706.03762'
+⚠️ Warning: Year mismatch: cited '2024', actual '2023'
+ℹ️ Suggestion: Add ArXiv URL https://arxiv.org/abs/1706.03762
+❓ Could not verify: Llama guard (M. A. Research, 2024)
+```
+
+</details>
+
+## Configure
+
+### LLM
+
+LLM-powered extraction improves accuracy with complex bibliographies. Claude Sonnet 4 performs best; GPT-4o may hallucinate DOIs.
+
+| Provider | Env Variable | Example Model |
+|----------|--------------|---------------|
+| Anthropic | `ANTHROPIC_API_KEY` | `claude-sonnet-4-20250514` |
+| OpenAI | `OPENAI_API_KEY` | `gpt-4o` |
+| Google | `GOOGLE_API_KEY` | `gemini-2.5-flash` |
+| Azure | `AZURE_OPENAI_API_KEY` | `gpt-4` |
+| vLLM | (local) | `meta-llama/Llama-3.1-8B-Instruct` |
+
+```bash
+export ANTHROPIC_API_KEY=your_key
+academic-refchecker --paper 1706.03762 --llm-provider anthropic
+
+academic-refchecker --paper paper.pdf --llm-provider openai --llm-model gpt-4o
+academic-refchecker --paper paper.pdf --llm-provider vllm --llm-model meta-llama/Llama-3.1-8B-Instruct
+```
+
+#### Local models (vLLM)
+
+There is no separate “GPU Docker image”. For local inference, install the vLLM extra and run an OpenAI-compatible vLLM server:
+
+```bash
+pip install "academic-refchecker[vllm]"
+python scripts/start_vllm_server.py --model meta-llama/Llama-3.1-8B-Instruct --port 8001
+academic-refchecker --paper paper.pdf --llm-provider vllm --llm-endpoint http://localhost:8001/v1
+```
+
+### Command Line
+
+```bash
+--paper PAPER            # ArXiv ID, URL, or file path
+--llm-provider PROVIDER  # openai, anthropic, google, azure, vllm
+--llm-model MODEL        # Override default model
+--db-path PATH           # Local database for offline verification
+--output-file [PATH]     # Save results (default: reference_errors.txt)
+--debug                  # Verbose output
+```
+
+### Environment Variables
+
+```bash
+# LLM
+export REFCHECKER_LLM_PROVIDER=anthropic
+export ANTHROPIC_API_KEY=your_key  # Also: OPENAI_API_KEY, GOOGLE_API_KEY
+
+# Performance
+export SEMANTIC_SCHOLAR_API_KEY=your_key  # Higher rate limits / faster verification
+```
+
+## Docker
+
+Pre-built images are published to GitHub Container Registry.
+
+```bash
+docker run -p 8000:8000 \
+  -e ANTHROPIC_API_KEY=your_key \
+  -v refchecker-data:/app/data \
+  ghcr.io/markrussinovich/refchecker:latest
+```
+
+Docker Compose:
+
+```bash
+git clone https://github.com/markrussinovich/refchecker.git && cd refchecker
+cp .env.example .env  # Add your API keys
+docker compose up -d
+```
+
+| Tag | Description | Arch | Size |
+|-----|-------------|------|------|
+| `latest` | RefChecker (Web UI + API-based LLM support) | amd64, arm64 | ~800MB |
+
+## Local Database
+
+For offline verification or faster processing:
+
+```bash
+python scripts/download_db.py \
+  --field "computer science" \
+  --start-year 2020 --end-year 2024
+
+academic-refchecker --paper paper.pdf --db-path semantic_scholar_db/semantic_scholar.db
+```
+
+## Testing
+
+490+ tests covering unit, integration, and end-to-end scenarios.
+
+```bash
+pytest tests/            # All tests
+pytest tests/unit/       # Unit only
+pytest --cov=src tests/  # With coverage
+```
+
+See [tests/README.md](tests/README.md) for details.
+
+## License
+
+MIT License - see [LICENSE](LICENSE).
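METADATA is an RFC 822-style header block, so the standard library's email parser reads it directly; repeated `Requires-Dist` keys carry the dependencies and their extras. A short sketch over a trimmed excerpt of the 2.0.19 METADATA above:

```python
from email.parser import HeaderParser

# Trimmed excerpt of the METADATA added in 2.0.19.
metadata = """\
Metadata-Version: 2.4
Name: academic-refchecker
Version: 2.0.19
Requires-Dist: requests>=2.25.0
Requires-Dist: openai>=1.0.0; extra == "llm"
"""

msg = HeaderParser().parsestr(metadata)
print(msg["Version"])                # -> 2.0.19
print(msg.get_all("Requires-Dist"))  # both requirements, in file order
```

The same approach works on the full file; `get_all("Requires-Dist")` then returns every dependency, with extras distinguishable by their `extra == "..."` markers.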
{academic_refchecker-2.0.18.dist-info → academic_refchecker-2.0.19.dist-info}/RECORD CHANGED

@@ -1,4 +1,4 @@
-academic_refchecker-2.0.
+academic_refchecker-2.0.19.dist-info/licenses/LICENSE,sha256=Kwrx3fePVCeEFDCZvCW4OuoTNBiSoYbpGBI6qzGhWF0,1067
 backend/__init__.py,sha256=TFVkOx5tSp3abty15RzUbaSwQ9ZD0kfUn7PDh63xkYY,521
 backend/__main__.py,sha256=74V7yUMsRSZaaRyXYm-rZVc3TVUcUgwsoTQTUbV5EqM,211
 backend/cli.py,sha256=xV3l9M5OdNQQYOcrzj2d_7RmCgj7CXP_1oi0TPe6zNo,1672
@@ -19,7 +19,7 @@ backend/static/assets/index-DMZJNrR0.js,sha256=UhK5CQ8IufZmx6FTvXUCtkRxTqpGK7czS
 backend/static/assets/index-hk21nqxR.js,sha256=z2agP8ZFYw4AfYi-GJ5E_8_k-lPF-frXOJtPk-I0hDs,369533
 refchecker/__init__.py,sha256=Pg5MrtLxDBRcNYcI02N-bv3tzURVd1S3nQ8IyF7Zw7E,322
 refchecker/__main__.py,sha256=agBbT9iKN0g2xXtRNCoh29Nr7z2n5vU-r0MCVJKi4tI,232
-refchecker/__version__.py,sha256=
+refchecker/__version__.py,sha256=vHLK-xjo5zn4rni47JXc2jHVKcszjZka6aMOXd6sYg8,66
 refchecker/checkers/__init__.py,sha256=-dR7HX0bfPq9YMXrnODoYbfNWFLqu706xoVsUdWHYRI,611
 refchecker/checkers/arxiv_citation.py,sha256=j_waQmQSP3iuZdVuBE92ghtiOdGFTCx09s6f4mHik6o,27777
 refchecker/checkers/crossref.py,sha256=88moAyTudBqf9SKqTQkNAq1yyuRe95f8r4EpmJznupQ,20937
@@ -62,8 +62,8 @@ refchecker/utils/mock_objects.py,sha256=QxU-UXyHSY27IZYN8Sb8ei0JtNkpGSdMXoErrRLH
 refchecker/utils/text_utils.py,sha256=Tx1k0SqS1cmw4N9BDJY-Ipep2T-HMmKPqi4SMcq1ZJ8,235751
 refchecker/utils/unicode_utils.py,sha256=-WBKarXO756p7fd7gCeNsMag4ztDNURwFX5IVniOtwY,10366
 refchecker/utils/url_utils.py,sha256=7b0rWCQJSajzqOvD7ghsBZPejiq6mUIz6SGhvU_WGDs,9441
-academic_refchecker-2.0.
-academic_refchecker-2.0.
-academic_refchecker-2.0.
-academic_refchecker-2.0.
-academic_refchecker-2.0.
+academic_refchecker-2.0.19.dist-info/METADATA,sha256=RAuuYEAPfr8Q2SU2aDvHhsBKoYYAXOxyfgBTlTDC4oc,11224
+academic_refchecker-2.0.19.dist-info/WHEEL,sha256=wUyA8OaulRlbfwMtmQsvNngGrxQHAvkKcvRmdizlJi0,92
+academic_refchecker-2.0.19.dist-info/entry_points.txt,sha256=9cREsaKwlp05Ql0CBIjKrNHk5IG2cHY5LvJPsV2-SxA,108
+academic_refchecker-2.0.19.dist-info/top_level.txt,sha256=FfNvrvpj25gfpUBjW0epvz7Qrdejhups5Za_DBiSRu4,19
+academic_refchecker-2.0.19.dist-info/RECORD,,
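For reference, RECORD is a plain CSV manifest: path, `sha256=` digest, and size in bytes, with the RECORD entry itself left unhashed (hence the trailing `,,`). A small parsing sketch using two entries from this diff:

```python
import csv
import io

# Two rows copied from the 2.0.19 RECORD above.
record_text = """\
academic_refchecker-2.0.19.dist-info/licenses/LICENSE,sha256=Kwrx3fePVCeEFDCZvCW4OuoTNBiSoYbpGBI6qzGhWF0,1067
academic_refchecker-2.0.19.dist-info/RECORD,,
"""

entries = [tuple(row) for row in csv.reader(io.StringIO(record_text))]
for path, digest, size in entries:
    print(path, size or "(unhashed)")
```

Because every entry carries a hash and size, the RECORD file itself pins which files changed between releases, which is why the version bump shows up here as a new `__version__.py` digest.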
refchecker/__version__.py CHANGED
@@ -1,877 +0,0 @@
|
|
|
1
|
-
Metadata-Version: 2.4
|
|
2
|
-
Name: academic-refchecker
|
|
3
|
-
Version: 2.0.18
|
|
4
|
-
Summary: A comprehensive tool for validating reference accuracy in academic papers
|
|
5
|
-
Author-email: Mark Russinovich <markrussinovich@hotmail.com>
|
|
6
|
-
License-Expression: MIT
|
|
7
|
-
Project-URL: Homepage, https://github.com/markrussinovich/refchecker
|
|
8
|
-
Project-URL: Repository, https://github.com/markrussinovich/refchecker
|
|
9
|
-
Project-URL: Bug Tracker, https://github.com/markrussinovich/refchecker/issues
|
|
10
|
-
Classifier: Development Status :: 4 - Beta
|
|
11
|
-
Classifier: Intended Audience :: Science/Research
|
|
12
|
-
Classifier: Topic :: Scientific/Engineering :: Information Analysis
|
|
13
|
-
Classifier: Programming Language :: Python :: 3
|
|
14
|
-
Classifier: Programming Language :: Python :: 3.7
|
|
15
|
-
Classifier: Programming Language :: Python :: 3.8
|
|
16
|
-
Classifier: Programming Language :: Python :: 3.9
|
|
17
|
-
Classifier: Programming Language :: Python :: 3.10
|
|
18
|
-
Classifier: Programming Language :: Python :: 3.11
|
|
19
|
-
Classifier: Operating System :: OS Independent
|
|
20
|
-
Requires-Python: >=3.7
|
|
21
|
-
Description-Content-Type: text/markdown
|
|
22
|
-
License-File: LICENSE
|
|
23
|
-
Requires-Dist: requests>=2.25.0
|
|
24
|
-
Requires-Dist: beautifulsoup4>=4.9.0
|
|
25
|
-
Requires-Dist: pypdf>=5.0.0
|
|
26
|
-
Requires-Dist: arxiv>=1.4.0
|
|
27
|
-
Requires-Dist: python-dateutil>=2.8.0
|
|
28
|
-
Requires-Dist: tqdm>=4.60.0
|
|
29
|
-
Requires-Dist: colorama>=0.4.4
|
|
30
|
-
Requires-Dist: fuzzywuzzy>=0.18.0
|
|
31
|
-
Requires-Dist: python-Levenshtein>=0.12.0
|
|
32
|
-
Requires-Dist: pandas<2.4.0,>=1.3.0
|
|
33
|
-
Requires-Dist: numpy<2.0.0,>=1.22.4
|
|
34
|
-
Requires-Dist: pdfplumber>=0.6.0
|
|
35
|
-
Requires-Dist: bibtexparser>=1.4.0
|
|
36
|
-
Provides-Extra: dev
|
|
37
|
-
Requires-Dist: pytest>=6.0.0; extra == "dev"
|
|
38
|
-
Requires-Dist: pytest-cov>=2.0.0; extra == "dev"
|
|
39
|
-
Requires-Dist: black>=21.0.0; extra == "dev"
|
|
40
|
-
Requires-Dist: isort>=5.0.0; extra == "dev"
|
|
41
|
-
Requires-Dist: flake8>=3.9.0; extra == "dev"
|
|
42
|
-
Requires-Dist: mypy>=0.910; extra == "dev"
|
|
43
|
-
Provides-Extra: docs
|
|
44
|
-
Requires-Dist: sphinx>=4.0.0; extra == "docs"
|
|
45
|
-
Requires-Dist: sphinx-rtd-theme>=0.5.0; extra == "docs"
|
|
46
|
-
Provides-Extra: llm
|
|
47
|
-
Requires-Dist: openai>=1.0.0; extra == "llm"
|
|
48
|
-
Requires-Dist: anthropic>=0.7.0; extra == "llm"
|
|
49
|
-
Requires-Dist: google-generativeai>=0.3.0; extra == "llm"
|
|
50
|
-
Provides-Extra: optional
|
|
51
|
-
Requires-Dist: lxml>=4.6.0; extra == "optional"
|
|
52
|
-
Requires-Dist: selenium>=4.0.0; extra == "optional"
|
|
53
|
-
Requires-Dist: pikepdf>=5.0.0; extra == "optional"
|
|
54
|
-
Requires-Dist: nltk>=3.6.0; extra == "optional"
|
|
55
|
-
Requires-Dist: scikit-learn>=1.0.0; extra == "optional"
|
|
56
|
-
Requires-Dist: joblib>=1.1.0; extra == "optional"
|
|
57
|
-
Provides-Extra: vllm
|
|
58
|
-
Requires-Dist: vllm>=0.3.0; extra == "vllm"
|
|
59
|
-
Requires-Dist: huggingface_hub>=0.17.0; extra == "vllm"
|
|
60
|
-
Requires-Dist: torch>=2.0.0; extra == "vllm"
|
|
61
|
-
Provides-Extra: webui
|
|
62
|
-
Requires-Dist: fastapi>=0.100.0; extra == "webui"
|
|
63
|
-
Requires-Dist: uvicorn[standard]>=0.22.0; extra == "webui"
|
|
64
|
-
Requires-Dist: pydantic>=2.0.0; extra == "webui"
|
|
65
|
-
Requires-Dist: aiosqlite>=0.19.0; extra == "webui"
|
|
66
|
-
Requires-Dist: httpx>=0.24.0; extra == "webui"
|
|
67
|
-
Requires-Dist: cryptography>=42.0.0; extra == "webui"
|
|
68
|
-
Requires-Dist: pymupdf>=1.23.0; extra == "webui"
|
|
69
|
-
Requires-Dist: Pillow>=9.0.0; extra == "webui"
|
|
70
|
-
Requires-Dist: python-multipart>=0.0.6; extra == "webui"
|
|
71
|
-
Dynamic: license-file
|
|
72
|
-
|
|
73
|
-
# 📚 Academic Paper Reference Checker
|
|
74
|
-
|
|
75
|
-
*Developed by Mark Russinovich with various AI assistants, including Cursor, GitHub Copilot and Claude Code*
|
|
76
|
-
|
|
77
|
-
A comprehensive tool for validating reference accuracy in academic papers, useful for both authors checking their bibliography and conference reviewers ensuring that paper references are authentic and accurate. This tool processes papers from various local and online sources including ArXiv, PDF files, LaTeX documents, and text files to verify the accuracy of references by comparing cited information against authoritative sources.
|
|
78
|
-
|
|
79
|
-
## 🎥 Project Deep Dive
|
|
80
|
-
|
|
81
|
-
Learn about RefChecker's design philosophy and development process in this detailed discussion between Mark Russinovich (RefChecker's author) and Scott Hanselman. Mark shares insights into how he leveraged AI coding assistants including Cursor, GitHub Copilot, and Claude to build this comprehensive academic reference validation tool.
|
|
82
|
-
|
|
83
|
-
**[📺 Watch: "AI Coding with Mark Russinovich: Building RefChecker"](https://www.youtube.com/watch?v=n929Alz-fjo)**
|
|
84
|
-
|
|
85
|
-
*This video provides valuable insights into modern AI-assisted development workflows and the technical decisions behind RefChecker's architecture.*
|
|
86
|
-
|
|
87
|
-
## 📊 Sample Output
|
|
88
|
-
|
|
89
|
-
```
|
|
90
|
-
📄 Processing: Attention Is All You Need
|
|
91
|
-
URL: https://arxiv.org/abs/1706.03762
|
|
92
|
-
|
|
93
|
-
[1/45] Neural machine translation in linear time
|
|
94
|
-
Nal Kalchbrenner, Lasse Espeholt, Karen Simonyan, Aaron van den Oord, Alex Graves, Koray Kavukcuoglu
|
|
95
|
-
2017
|
|
96
|
-
|
|
97
|
-
Verified URL: https://www.semanticscholar.org/paper/5f4ac1ac7ca4b17d3db1b52d9aafd9e8b26c0d7
|
|
98
|
-
ArXiv URL: https://arxiv.org/abs/1610.10099
|
|
99
|
-
DOI URL: https://doi.org/10.48550/arxiv.1610.10099
|
|
100
|
-
⚠️ Warning: Year mismatch:
|
|
101
|
-
cited: '2017'
|
|
102
|
-
actual: '2016'
|
|
103
|
-
|
|
104
|
-
[2/45] Effective approaches to attention-based neural machine translation
|
|
105
|
-
Minh-Thang Luong, Hieu Pham, Christopher D. Manning
|
|
106
|
-
2015
|
|
107
|
-
|
|
108
|
-
Verified URL: https://www.semanticscholar.org/paper/93499a7c7f699b6630a86fad964536f9423bb6d0
|
|
109
|
-
ArXiv URL: https://arxiv.org/abs/1508.04025
|
|
110
|
-
DOI URL: https://doi.org/10.18653/v1/d15-1166
|
|
111
|
-
❌ Error: First author mismatch:
|
|
112
|
-
cited: 'Minh-Thang Luong'
|
|
113
|
-
actual: 'Thang Luong'
|
|
114
|
-
|
|
115
|
-
[3/45] Deep Residual Learning for Image Recognition
|
|
116
|
-
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
|
|
117
|
-
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
|
|
118
|
-
2016
|
|
119
|
-
https://doi.org/10.1109/CVPR.2016.91
|
|
120
|
-
|
|
121
|
-
Verified URL: https://www.semanticscholar.org/paper/2c03df8b48bf3fa39054345bafabfeff15bfd11d
|
|
122
|
-
ArXiv URL: https://arxiv.org/abs/1512.03385
|
|
123
|
-
DOI URL: https://doi.org/10.1109/CVPR.2016.90
|
|
124
|
-
❌ Error: DOI mismatch:
|
|
125
|
-
cited: '10.1109/CVPR.2016.91'
|
|
126
|
-
actual: '10.1109/CVPR.2016.90'
|
|
127
|
-
|
|
128
|
-
============================================================
|
|
129
|
-
📋 SUMMARY
|
|
130
|
-
============================================================
|
|
131
|
-
📚 Total references processed: 68
|
|
132
|
-
❌ Total errors: 55
|
|
133
|
-
⚠️ Total warnings: 16
|
|
134
|
-
❓ References that couldn't be verified: 15
|
|
135
|
-
```
|
|
136
|
-
|
|
137
|
-
## 📋 Table of Contents
|
|
138
|
-
|
|
139
|
-
- [🎥 Project Deep Dive](#-project-deep-dive)
|
|
140
|
-
- [📊 Sample Output](#-sample-output)
|
|
141
|
-
- [🎯 Features](#-features)
|
|
142
|
-
- [🚀 Quick Start](#-quick-start)
|
|
143
|
-
- [🐳 Docker (Recommended)](#-docker-recommended)
|
|
144
|
-
- [🌐 Web UI](#-web-ui)
|
|
145
|
-
- [🤖 LLM-Enhanced Reference Extraction](#-llm-enhanced-reference-extraction)
|
|
146
|
-
- [📦 Installation](#-installation)
|
|
147
|
-
- [📖 Usage](#-usage)
|
|
148
|
-
- [📊 Output and Results](#-output-and-results)
|
|
149
|
-
- [⚙️ Configuration](#-configuration)
|
|
150
|
-
- [🗄️ Local Database Setup](#-local-database-setup)
|
|
151
|
-
- [🧪 Testing](#-testing)
|
|
152
|
-
- [📄 License](#-license)
|
|
153
|
-
|
|
154
|
-
## 🎯 Features
|
|
155
|
-
|
|
156
|
-
- **📄 Multiple Input Formats**: Process ArXiv papers, local PDFs, LaTeX files, and text documents
|
|
157
|
-
- **🔍 Advanced Bibliography Detection**: Uses intelligent pattern matching to identify bibliography sections
|
|
158
|
-
- **🤖 LLM-Enhanced Reference Extraction**: Recommended AI-powered bibliography parsing with support for OpenAI, Anthropic, Google, Azure, and local vLLM
|
|
159
|
-
- **✅ Comprehensive Error Detection**: Identifies issues with titles, authors, years, venues, URLs, and DOIs
|
|
160
|
-
- **🔄 Multi-Tier Verification Sources**: Uses a prioritized check of Semantic Scholar, OpenAlex, and CrossRef with intelligent retry logic
|
|
161
|
-
- **🔗 Enhanced URL Discovery**: Automatically discovers and displays additional authoritative URLs (Semantic Scholar, ArXiv, DOI) obtained through verification
|
|
162
|
-
- **🧠 Smart Title Matching**: Advanced similarity algorithms handle common academic formatting variations (BERT vs B-ERT, pre-trained vs pretrained)
|
|
163
|
-
- **🏢 Venue Normalization**: Recognizes common journal and conference abbreviation patterns
|
|
164
|
-
- **📊 Detailed Reporting**: Generates comprehensive error reports with drop-in corrected references
|
|
165
|
-
|
|
166
|
-
## 🚀 Quick Start
|
|
167
|
-
|
|
168
|
-
### Easiest: Use Docker
|
|
169
|
-
|
|
170
|
-
```bash
|
|
171
|
-
docker run -p 8000:8000 -e ANTHROPIC_API_KEY=your_key ghcr.io/markrussinovich/refchecker:latest
|
|
172
|
-
```
|
|
173
|
-
|
|
174
|
-
Open **http://localhost:8000** in your browser. See [Docker](#-docker-recommended) for more options.
|
|
175
|
-
|
|
176
|
-
### Command Line (requires Python installation)
|
|
177
|
-
|
|
178
|
-
1. **Check a famous paper:**
|
|
179
|
-
```bash
|
|
180
|
-
python run_refchecker.py --paper 1706.03762
|
|
181
|
-
```
|
|
182
|
-
|
|
183
|
-
2. **Check your own PDF:**
|
|
184
|
-
```bash
|
|
185
|
-
python run_refchecker.py --paper /path/to/your/paper.pdf
|
|
186
|
-
```
|
|
187
|
-
|
|
188
|
-
3. **For faster processing with local database** (see [Local Database Setup](#local-database-setup)):
|
|
189
|
-
```bash
|
|
190
|
-
python run_refchecker.py --paper 1706.03762 --db-path semantic_scholar_db/semantic_scholar.db
|
|
191
|
-
```
|
|
192
|
-
|
|
193
|
-
> **⚡ Performance Tip**: Reference verification takes 5-10 seconds per reference without a Semantic Scholar API key due to rate limiting. With an API key, verification speeds up to 1-2 seconds per reference. Set `SEMANTIC_SCHOLAR_API_KEY` environment variable or use `--semantic-scholar-api-key` for faster processing.
|
|
194
|
-
|
|
195
|
-
## 🐳 Docker (Recommended)
|
|
196
|
-
|
|
197
|
-
The easiest way to run RefChecker is with Docker. Pre-built images are automatically published to GitHub Container Registry on every release.
|
|
198
|
-
|
|
199
|
-
### Quick Start
|
|
200
|
-
|
|
201
|
-
```bash
|
|
202
|
-
# Pull and run the latest image (works on Intel/AMD and Apple Silicon)
|
|
203
|
-
docker run -p 8000:8000 ghcr.io/markrussinovich/refchecker:latest
|
|
204
|
-
```
|
|
205
|
-
|
|
206
|
-
Open **http://localhost:8000** in your browser.
|
|
207
|
-
|
|
208
|
-
### With LLM API Keys
|
|
209
|
-
|
|
210
|
-
To use LLM-powered reference extraction (recommended for best accuracy), pass your API key:
|
|
211
|
-
|
|
212
|
-
```bash
|
|
213
|
-
# Using Anthropic Claude (recommended)
|
|
214
|
-
docker run -p 8000:8000 \
|
|
215
|
-
-e ANTHROPIC_API_KEY=your_key_here \
|
|
216
|
-
ghcr.io/markrussinovich/refchecker:latest
|
|
217
|
-
|
|
218
|
-
# Using OpenAI
|
|
219
|
-
docker run -p 8000:8000 \
|
|
220
|
-
-e OPENAI_API_KEY=your_key_here \
|
|
221
|
-
ghcr.io/markrussinovich/refchecker:latest
|
|
222
|
-
|
|
223
|
-
# Using Google Gemini
|
|
224
|
-
docker run -p 8000:8000 \
|
|
225
|
-
-e GOOGLE_API_KEY=your_key_here \
|
|
226
|
-
ghcr.io/markrussinovich/refchecker:latest
|
|
227
|
-
```
|
|
228
|
-
|
|
229
|
-
### Persistent Data
|
|
230
|
-
|
|
231
|
-
To persist check history and settings between container restarts:
|
|
232
|
-
|
|
233
|
-
```bash
|
|
234
|
-
docker run -p 8000:8000 \
|
|
235
|
-
-v refchecker-data:/app/data \
|
|
236
|
-
-e ANTHROPIC_API_KEY=your_key_here \
|
|
237
|
-
ghcr.io/markrussinovich/refchecker:latest
|
|
238
|
-
```
|
|
239
|
-
|
|
240
|
-
### Using Docker Compose
|
|
241
|
-
|
|
242
|
-
For easier configuration, clone the repo and use the included `docker-compose.yml`:
|
|
243
|
-
|
|
244
|
-
```bash
|
|
245
|
-
git clone https://github.com/markrussinovich/refchecker.git
|
|
246
|
-
cd refchecker
|
|
247
|
-
|
|
248
|
-
# Copy the example environment file and add your API keys
|
|
249
|
-
cp .env.example .env
|
|
250
|
-
# Edit .env with your API keys
|
|
251
|
-
|
|
252
|
-
# Start RefChecker
|
|
253
|
-
docker compose up
|
|
254
|
-
|
|
255
|
-
# Or run in background
|
|
256
|
-
docker compose up -d
|
|
257
|
-
```
|
|
258
|
-
|
|
### GPU Support for Local Models

For running local LLMs with vLLM (no API keys needed), use the GPU-enabled image:

```bash
# Pull the GPU image (amd64/NVIDIA only)
docker pull ghcr.io/markrussinovich/refchecker:gpu

# Run with GPU access
docker run --gpus all -p 8000:8000 \
  -v refchecker-data:/app/data \
  -v refchecker-models:/app/models \
  ghcr.io/markrussinovich/refchecker:gpu
```

Or with Docker Compose:

```bash
# Start with GPU profile
docker compose --profile gpu up
```

### Available Image Tags

| Tag | Description | Architectures | Size |
|-----|-------------|---------------|------|
| `latest` | Latest stable release with cloud LLM support | amd64, arm64 | ~800MB |
| `X.Y.Z` | Specific version (e.g., `1.2.50`) | amd64, arm64 | ~800MB |
| `gpu` | GPU-enabled with vLLM/PyTorch for local models | amd64 only | ~12GB |
| `X.Y.Z-gpu` | Specific GPU version | amd64 only | ~12GB |

> **Note**: The `gpu` images are amd64 only because CUDA is not available on ARM. For Apple Silicon Macs, use the standard `latest` tag with cloud LLM APIs.

### Building Locally

```bash
# Build standard image
make docker-build

# Build GPU image
make docker-build-gpu

# Run locally
make docker-run

# Test the build
make docker-test
```

## 🌐 Web UI

RefChecker includes a modern web interface with real-time progress updates, check history, and export options.



### Features

- ✨ Real-time validation with live progress updates
- 📄 Support for ArXiv URLs and file uploads (PDF, LaTeX, text)
- 📊 Live statistics with filtering by status
- 📋 Export references as Markdown, plain text, or BibTeX (with corrected values)
- 📚 Persistent check history
- 🌓 Automatic dark/light mode

### Option 1: Docker (See Above)

The Docker image includes the complete Web UI - just run the container and open http://localhost:8000.

### Option 2: Install from PyPI

```bash
# Install RefChecker with Web UI support
pip install academic-refchecker[llm,webui]

# Start the web server
refchecker-webui
```

Then open **http://localhost:8000** in your browser.

**Options:**
```bash
refchecker-webui --port 8080     # Use a different port
refchecker-webui --host 0.0.0.0  # Allow external connections
```

### Option 3: Run from Cloned Repository (Development)

If you're developing or modifying the Web UI:

**Prerequisites:**
- **Python 3.8+** with dependencies installed
- **Node.js 18+** and npm

```bash
# Clone the repository
git clone https://github.com/markrussinovich/refchecker.git
cd refchecker

# Install Python dependencies
pip install -e ".[llm,webui]"

# Install and run the frontend development server
cd web-ui
npm install   # First time only
npm start     # Starts both backend and frontend
```

Then open **http://localhost:5173** in your browser.

**Alternative: Start Servers Separately**

*Terminal 1 - Backend:*
```bash
python -m uvicorn backend.main:app --reload --port 8000
```

*Terminal 2 - Frontend:*
```bash
cd web-ui
npm run dev
```

For complete Web UI documentation, see **[web-ui/README.md](web-ui/README.md)**.

## 🤖 LLM-Enhanced Reference Extraction

RefChecker supports AI-powered bibliography parsing using Large Language Models (LLMs) for improved accuracy with complex citation formats. While models as small as Llama 3.1-8B are fairly reliable at reference extraction, they can struggle with non-standard bibliographies. GPT-4o frequently hallucinates DOIs, while Sonnet 4 has shown the best performance on large, complex bibliographies.

### Supported LLM Providers

- **OpenAI**: e.g., GPT-4.1, o3
- **Anthropic**: e.g., Claude Sonnet 4
- **Google**: e.g., Gemini 2.5
- **Azure OpenAI**: e.g., GPT-4o, o3
- **vLLM**: e.g., local Hugging Face models via an OpenAI-compatible server

### Quick LLM Setup

1. **Using Environment Variables**:
   ```bash
   # Enable LLM with Anthropic Claude
   export REFCHECKER_USE_LLM=true
   export REFCHECKER_LLM_PROVIDER=anthropic
   export ANTHROPIC_API_KEY=your_api_key_here

   python run_refchecker.py --paper 1706.03762
   ```

2. **Using Command Line Arguments**:
   ```bash
   # Enable LLM with specific provider and model
   python run_refchecker.py --paper 1706.03762 \
     --llm-provider anthropic \
     --llm-model claude-sonnet-4-20250514
   ```

API keys are obtained from environment variables, or if not found, the tool will prompt you interactively to enter them securely.

### LLM Examples

#### OpenAI GPT-4

With `OPENAI_API_KEY` environment variable:

```bash
python run_refchecker.py --paper /path/to/paper.pdf \
  --llm-provider openai \
  --llm-model gpt-4o
```

#### Anthropic Claude

With `ANTHROPIC_API_KEY` environment variable:

```bash
python run_refchecker.py --paper https://arxiv.org/abs/1706.03762 \
  --llm-provider anthropic \
  --llm-model claude-sonnet-4-20250514
```

#### Google Gemini

With `GOOGLE_API_KEY` environment variable:

```bash
python run_refchecker.py --paper paper.tex \
  --llm-provider google \
  --llm-model gemini-2.5-flash
```

#### Azure OpenAI

With `AZURE_OPENAI_API_KEY` environment variable:

```bash
python run_refchecker.py --paper paper.txt \
  --llm-provider azure \
  --llm-model gpt-4 \
  --llm-endpoint https://your-resource.openai.azure.com/
```

#### vLLM (Local Models)

For running models locally:

```bash
# Automatic Hugging Face model download with vLLM server launch
python run_refchecker.py --paper paper.pdf \
  --llm-provider vllm \
  --llm-model meta-llama/Llama-3.1-8B-Instruct
```

You can debug vLLM server issues by running RefChecker with the `--debug` flag.

## 📦 Installation

### Option 1: Docker (Recommended)

No installation required - just run the Docker image:

```bash
docker run -p 8000:8000 ghcr.io/markrussinovich/refchecker:latest
```

See the [Docker](#-docker-recommended) section for full details on configuration, persistent data, and GPU support.

### Option 2: Install from PyPI

**Prerequisites:**
- **Python 3.8+** (3.10+ recommended)

For the latest stable release with all features:

```bash
pip install academic-refchecker[llm,webui]
```

This installs RefChecker with:
- **llm**: Support for OpenAI, Anthropic, Google, Azure, and vLLM providers
- **webui**: Web interface dependencies (FastAPI, uvicorn, etc.)

For a minimal installation (CLI only, no LLM or Web UI):
```bash
pip install academic-refchecker
```

Other optional extras:
- **dev**: Development tools (pytest, black, flake8, mypy)
- **optional**: Enhanced features (lxml, selenium, pikepdf, nltk, scikit-learn)
- **vllm**: Local model inference with vLLM

### Option 3: Install from Source

**Prerequisites:**
- **Python 3.8+** (3.10+ recommended)
- **Node.js 18+** and npm (only required for Web UI development)

#### 1. Clone the Repository

```bash
git clone https://github.com/markrussinovich/refchecker.git
cd refchecker
```

#### 2. Create and Activate Virtual Environment (Recommended)

```bash
python -m venv .venv
# On Windows:
.venv\Scripts\activate
# On macOS/Linux:
source .venv/bin/activate
```

#### 3. Install Dependencies

```bash
# Install all dependencies including LLM and Web UI support
pip install -e ".[llm,webui]"

# Or install from requirements.txt
pip install -r requirements.txt
```

#### 4. (Optional) Install Additional Dependencies

For enhanced performance and LLM support, you can install optional dependencies:

```bash
# For LLM providers
pip install openai                # For OpenAI GPT models
pip install anthropic             # For Anthropic Claude models
pip install google-generativeai   # For Google Gemini models

# For faster XML/HTML parsing
pip install lxml

# For dynamic web scraping (if needed)
pip install selenium

# For better PDF processing
pip install pikepdf
```

### Web UI Installation

The Web UI requires Node.js 18+ in addition to the Python dependencies:

```bash
cd web-ui
npm install
```

## 📖 Usage

Check papers in various formats and online locations:

#### ArXiv Papers

```bash
# Check a specific ArXiv paper by ID
python run_refchecker.py --paper 1706.03762

# Check by ArXiv URL
python run_refchecker.py --paper https://arxiv.org/abs/1706.03762

# Check by ArXiv PDF URL
python run_refchecker.py --paper https://arxiv.org/pdf/1706.03762.pdf
```

#### Local PDF Files

```bash
# Check a local PDF file
python run_refchecker.py --paper /path/to/your/paper.pdf

# Check with offline database for faster processing
python run_refchecker.py --paper /path/to/your/paper.pdf --db-path semantic_scholar_db/semantic_scholar.db
```

#### LaTeX Files

```bash
# Check a LaTeX document
python run_refchecker.py --paper /path/to/your/paper.tex

# Check with debug mode for detailed processing info
python run_refchecker.py --paper /path/to/your/paper.tex --debug
```

#### Text Files

```bash
# Check a plain text file containing paper content
python run_refchecker.py --paper /path/to/your/paper.txt

# Combine with local database for offline verification
python run_refchecker.py --paper /path/to/your/paper.txt --db-path semantic_scholar_db/semantic_scholar.db
```

## 📊 Output and Results

### Generated Files

By default, no files are generated. To save detailed results, use the `--output-file` option:

```bash
# Save to default filename (reference_errors.txt)
python run_refchecker.py --paper 1706.03762 --output-file

# Save to custom filename
python run_refchecker.py --paper 1706.03762 --output-file my_errors.txt
```

The output file contains a detailed report of references with errors and warnings, including corrected references.

### Enhanced URL Display

RefChecker automatically discovers and displays authoritative URLs for verified references:

- **Verified URL**: The primary authoritative source (typically Semantic Scholar)
- **ArXiv URL**: Direct link to the ArXiv preprint when available
- **DOI URL**: Digital Object Identifier link when available
- **Additional URLs**: Other relevant sources discovered during verification

This enhanced URL display helps users access multiple authoritative sources for each reference and provides comprehensive citation information.

### Error Types

- **❌ Errors**: Critical issues that need correction
  - `author`: Author name mismatches
    ```
    [16/19] Bag of tricks: Benchmarking of jailbreak attacks on llms
    T. Xie, X. Qi, Y. Zeng, Y. Huang, U. M. Sehwag, K. Huang, L. He, B. Wei, D. Li, Y. Sheng et al

    Verified URL: https://www.semanticscholar.org/paper/a1b2c3d4e5f6789012345678901234567890abcd
    ArXiv URL: https://arxiv.org/abs/2312.02119
    DOI URL: https://doi.org/10.48550/arxiv.2312.02119
    ❌ Error: First author mismatch:
      cited: 'T. Xie'
      actual: 'Zhao Xu'
    ```
  - `title`: Title discrepancies
    ```
    [8/19] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
    J. Devlin, M.-W. Chang, K. Lee, K. Toutanova

    Verified URL: https://www.semanticscholar.org/paper/df2b0e26d0599ce3e70df8a9da02e51594e0e992
    ArXiv URL: https://arxiv.org/abs/1810.04805
    DOI URL: https://doi.org/10.18653/v1/n19-1423
    ❌ Error: Title mismatch:
      cited: 'BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding'
      actual: 'BERT: Pre-training of Deep Bidirectional Transformers for Language Comprehension'
    ```
  - `arxiv_id`: Incorrect URLs or arXiv IDs
    ```
    [5/19] Jbshield: Neural representation-level defense against adversarial prompts in large language models
    W. Zhang, M. Li, H. Wang
    https://arxiv.org/abs/2503.01234

    Verified URL: https://www.semanticscholar.org/paper/e1f2a3b4c5d6e7f8901234567890123456789012
    DOI URL: https://doi.org/10.48550/arxiv.2401.12345
    ❌ Error: Incorrect ArXiv ID: ArXiv ID 2503.01234 points to 'Self-Adaptive Gamma Context-Aware SSM-based Model for Metal Defect Detection'
    ```
  - `doi`: DOI mismatches
    ```
    [12/19] Attention Is All You Need
    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
    Neural Information Processing Systems
    2017
    https://doi.org/10.5555/3295222.3295349

    Verified URL: https://www.semanticscholar.org/paper/204e3073870fae3d05bcbc2f6a8e263d9b72e776
    ArXiv URL: https://arxiv.org/abs/1706.03762
    DOI URL: https://doi.org/10.48550/arXiv.1706.03762
    ❌ Error: DOI mismatch:
      cited: '10.5555/3295222.3295349'
      actual: '10.48550/arXiv.1706.03762'
    ```

- **⚠️ Warnings**: Minor issues that may need attention
  - `year`: Publication year differences (common due to multiple paper versions)
    ```
    [14/19] Smoothllm: Defending large language models against jailbreaking attacks
    A. Robey, E. Wong, H. Hassani, G. J. Pappas
    2024

    Verified URL: https://www.semanticscholar.org/paper/f1a2b3c4d5e6f7890123456789012345678901ab
    ArXiv URL: https://arxiv.org/abs/2310.03684
    DOI URL: https://doi.org/10.48550/arxiv.2310.03684
    ⚠️ Warning: Year mismatch:
      cited: '2024'
      actual: '2023'
    ```
  - `venue`: Venue format variations
    ```
    [2/19] Gradient cuff: Detecting jailbreak attacks on large language models by exploring refusal loss landscapes
    X. Hu, P.-Y. Chen, T.-Y. Ho
    arXiv, 2024

    Verified URL: https://www.semanticscholar.org/paper/c1d2e3f4a5b6c7d8e9f0123456789012345678ab
    ArXiv URL: https://arxiv.org/abs/2403.02151
    DOI URL: https://doi.org/10.48550/arxiv.2403.02151
    ⚠️ Warning: Venue mismatch:
      cited: 'arXiv, 2024'
      actual: 'Neural Information Processing Systems'
    ```

- **❓ Unverified**: References that couldn't be verified with any of the checker APIs
  ```
  [15/19] Llama guard: A fine-tuned safety model for prompt moderation
  M. A. Research
  ❓ Could not verify: Llama guard: A fine-tuned safety model for prompt moderation
    Cited as: M. A. Research (2024)
    URL: https://research.meta.com/publications/llama-guard-a-fine-tuned-safety-model-for-prompt-moderation/
  ```

## ⚙️ Configuration

### Command Line Arguments

```bash
# Basic options
--paper PAPER                   # Paper to check (ArXiv ID, URL, or file path)
--debug                         # Enable debug mode
--semantic-scholar-api-key KEY  # Semantic Scholar API key (1-2s vs 5-10s without key; can also use SEMANTIC_SCHOLAR_API_KEY env var)
--db-path PATH                  # Local database path
--output-file [PATH]            # Path to output file for reference discrepancies (default: reference_errors.txt if flag provided, no file if not provided)

# LLM options
--llm-provider {openai,anthropic,google,azure,vllm}  # Enable LLM with provider
--llm-model MODEL               # Override default model
--llm-endpoint URL              # Override endpoint (for Azure/vLLM)
```

### API Key Handling

RefChecker automatically handles API keys for LLM providers in the following order:

1. **Environment Variables** (recommended): The tool checks for provider-specific environment variables
2. **Interactive Prompts**: If no API key is found in environment variables, the tool will securely prompt you to enter it

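The lookup order can be sketched in a few lines. This is a minimal illustration, not RefChecker's actual code: `resolve_api_key` is a hypothetical name, and the variable names follow the `REFCHECKER_*`/native pattern documented under Environment Variables.

```python
import os
from typing import Optional

def resolve_api_key(provider: str) -> Optional[str]:
    """Hypothetical sketch of the documented lookup order:
    REFCHECKER_<PROVIDER>_API_KEY first, then <PROVIDER>_API_KEY."""
    for name in (f"REFCHECKER_{provider.upper()}_API_KEY",
                 f"{provider.upper()}_API_KEY"):
        value = os.environ.get(name)
        if value:
            return value
    return None  # no key found: fall back to a hidden interactive prompt
```

Returning `None` rather than raising lets the caller decide between prompting interactively and aborting.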
When you use an LLM provider without setting the corresponding environment variable, you'll see a prompt like:
```
OpenAI API key not found in environment variables.
Checked environment variables: REFCHECKER_OPENAI_API_KEY, OPENAI_API_KEY
Please enter your OpenAI API key (input will be hidden):
API key: [your input is hidden]
```

This approach ensures your API keys are never exposed in command line history while providing a seamless user experience.

### Environment Variables

```bash
# Enable/disable LLM
export REFCHECKER_USE_LLM=true

# Provider selection
export REFCHECKER_LLM_PROVIDER=anthropic  # openai, anthropic, google, azure

# Semantic Scholar API key (for higher rate limits and faster verification: 1-2s vs 5-10s without key)
export SEMANTIC_SCHOLAR_API_KEY=your_key

# Provider-specific API keys (native environment variables preferred)
export OPENAI_API_KEY=your_key              # or REFCHECKER_OPENAI_API_KEY
export ANTHROPIC_API_KEY=your_key           # or REFCHECKER_ANTHROPIC_API_KEY
export GOOGLE_API_KEY=your_key              # or REFCHECKER_GOOGLE_API_KEY
export AZURE_OPENAI_API_KEY=your_key        # or REFCHECKER_AZURE_API_KEY
export AZURE_OPENAI_ENDPOINT=your_endpoint  # or REFCHECKER_AZURE_ENDPOINT

# Model configuration
export REFCHECKER_LLM_MODEL=claude-sonnet-4-20250514
export REFCHECKER_LLM_MAX_TOKENS=4000
export REFCHECKER_LLM_TEMPERATURE=0.1
```

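As an illustration of how the model-configuration variables above might feed a runtime configuration, here is a hedged sketch. `LLMConfig` and `load_llm_config` are hypothetical names, the defaults simply mirror the example values shown above, and this is not RefChecker's actual loader:

```python
import os
from dataclasses import dataclass

@dataclass
class LLMConfig:
    provider: str
    model: str
    max_tokens: int
    temperature: float

def load_llm_config() -> LLMConfig:
    # Illustrative only: reads the documented REFCHECKER_* variables,
    # coercing numeric settings from their string values.
    return LLMConfig(
        provider=os.environ.get("REFCHECKER_LLM_PROVIDER", "anthropic"),
        model=os.environ.get("REFCHECKER_LLM_MODEL", "claude-sonnet-4-20250514"),
        max_tokens=int(os.environ.get("REFCHECKER_LLM_MAX_TOKENS", "4000")),
        temperature=float(os.environ.get("REFCHECKER_LLM_TEMPERATURE", "0.1")),
    )
```

Because environment variables are always strings, the numeric casts are where a typo like `REFCHECKER_LLM_MAX_TOKENS=4k` would surface as an error.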
## 🗄️ Local Database Setup

### Downloading the Database

Create a local database for offline verification:

```bash
# Download recent computer science papers
python download_semantic_scholar_db.py \
  --field "computer science" \
  --start-year 2020 \
  --end-year 2024 \
  --batch-size 100

# Download papers matching a specific query
python download_semantic_scholar_db.py \
  --query "attention is all you need" \
  --batch-size 50

# Download with API key for higher rate limits
python download_semantic_scholar_db.py \
  --api-key YOUR_API_KEY \
  --field "machine learning" \
  --start-year 2023
```

### Database Options

- **`--output-dir`**: Directory to store database (default: `semantic_scholar_db`)
- **`--batch-size`**: Papers per batch (default: 100)
- **`--api-key`**: Semantic Scholar API key for higher limits
- **`--fields`**: Metadata fields to include
- **`--query`**: Search query for specific papers
- **`--start-year`/`--end-year`**: Year range filter

## 🧪 Testing

RefChecker includes a comprehensive test suite with **490+ tests** covering unit, integration, and end-to-end scenarios. The tests ensure reliability across all components and provide examples of how to use the system.

### Quick Test Run

```bash
# Run all tests
pytest tests/

# Run specific test categories
pytest tests/unit/         # Unit tests only
pytest tests/integration/  # Integration tests only
pytest tests/e2e/          # End-to-end tests only

# Run with coverage
pytest --cov=src --cov-report=html tests/

# Run tests in parallel (if pytest-xdist installed)
pytest -n auto tests/
```

### Test Categories

- **Unit Tests**: Individual components like text utilities, error handling, and reference extraction
- **Integration Tests**: API interactions, LLM providers, and component integration
- **End-to-End Tests**: Complete workflows, performance testing, and edge cases

### Test Structure

```
tests/
├── unit/           # Unit tests for individual components
├── integration/    # Integration tests for APIs and services
├── e2e/            # End-to-end workflow tests
├── fixtures/       # Test data and mock objects
└── README.md       # Detailed testing documentation
```

For detailed testing information, test execution options, and guidance on writing new tests, see the **[Testing Documentation](tests/README.md)**.

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.