EvoScientist 0.0.1.dev3__py3-none-any.whl → 0.1.0rc1__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- EvoScientist/EvoScientist.py +17 -49
- EvoScientist/backends.py +0 -26
- EvoScientist/cli.py +1109 -255
- EvoScientist/middleware.py +8 -61
- EvoScientist/stream/__init__.py +0 -25
- EvoScientist/stream/utils.py +16 -23
- EvoScientist/tools.py +0 -64
- evoscientist-0.1.0rc1.dist-info/METADATA +199 -0
- evoscientist-0.1.0rc1.dist-info/RECORD +21 -0
- evoscientist-0.1.0rc1.dist-info/entry_points.txt +2 -0
- EvoScientist/memory.py +0 -715
- EvoScientist/paths.py +0 -45
- EvoScientist/skills/accelerate/SKILL.md +0 -332
- EvoScientist/skills/accelerate/references/custom-plugins.md +0 -453
- EvoScientist/skills/accelerate/references/megatron-integration.md +0 -489
- EvoScientist/skills/accelerate/references/performance.md +0 -525
- EvoScientist/skills/bitsandbytes/SKILL.md +0 -411
- EvoScientist/skills/bitsandbytes/references/memory-optimization.md +0 -521
- EvoScientist/skills/bitsandbytes/references/qlora-training.md +0 -521
- EvoScientist/skills/bitsandbytes/references/quantization-formats.md +0 -447
- EvoScientist/skills/find-skills/SKILL.md +0 -133
- EvoScientist/skills/find-skills/scripts/install_skill.py +0 -211
- EvoScientist/skills/flash-attention/SKILL.md +0 -367
- EvoScientist/skills/flash-attention/references/benchmarks.md +0 -215
- EvoScientist/skills/flash-attention/references/transformers-integration.md +0 -293
- EvoScientist/skills/llama-cpp/SKILL.md +0 -258
- EvoScientist/skills/llama-cpp/references/optimization.md +0 -89
- EvoScientist/skills/llama-cpp/references/quantization.md +0 -213
- EvoScientist/skills/llama-cpp/references/server.md +0 -125
- EvoScientist/skills/lm-evaluation-harness/SKILL.md +0 -490
- EvoScientist/skills/lm-evaluation-harness/references/api-evaluation.md +0 -490
- EvoScientist/skills/lm-evaluation-harness/references/benchmark-guide.md +0 -488
- EvoScientist/skills/lm-evaluation-harness/references/custom-tasks.md +0 -602
- EvoScientist/skills/lm-evaluation-harness/references/distributed-eval.md +0 -519
- EvoScientist/skills/ml-paper-writing/SKILL.md +0 -937
- EvoScientist/skills/ml-paper-writing/references/checklists.md +0 -361
- EvoScientist/skills/ml-paper-writing/references/citation-workflow.md +0 -562
- EvoScientist/skills/ml-paper-writing/references/reviewer-guidelines.md +0 -367
- EvoScientist/skills/ml-paper-writing/references/sources.md +0 -159
- EvoScientist/skills/ml-paper-writing/references/writing-guide.md +0 -476
- EvoScientist/skills/ml-paper-writing/templates/README.md +0 -251
- EvoScientist/skills/ml-paper-writing/templates/aaai2026/README.md +0 -534
- EvoScientist/skills/ml-paper-writing/templates/aaai2026/aaai2026-unified-supp.tex +0 -144
- EvoScientist/skills/ml-paper-writing/templates/aaai2026/aaai2026-unified-template.tex +0 -952
- EvoScientist/skills/ml-paper-writing/templates/aaai2026/aaai2026.bib +0 -111
- EvoScientist/skills/ml-paper-writing/templates/aaai2026/aaai2026.bst +0 -1493
- EvoScientist/skills/ml-paper-writing/templates/aaai2026/aaai2026.sty +0 -315
- EvoScientist/skills/ml-paper-writing/templates/acl/README.md +0 -50
- EvoScientist/skills/ml-paper-writing/templates/acl/acl.sty +0 -312
- EvoScientist/skills/ml-paper-writing/templates/acl/acl_latex.tex +0 -377
- EvoScientist/skills/ml-paper-writing/templates/acl/acl_lualatex.tex +0 -101
- EvoScientist/skills/ml-paper-writing/templates/acl/acl_natbib.bst +0 -1940
- EvoScientist/skills/ml-paper-writing/templates/acl/anthology.bib.txt +0 -26
- EvoScientist/skills/ml-paper-writing/templates/acl/custom.bib +0 -70
- EvoScientist/skills/ml-paper-writing/templates/acl/formatting.md +0 -326
- EvoScientist/skills/ml-paper-writing/templates/colm2025/README.md +0 -3
- EvoScientist/skills/ml-paper-writing/templates/colm2025/colm2025_conference.bib +0 -11
- EvoScientist/skills/ml-paper-writing/templates/colm2025/colm2025_conference.bst +0 -1440
- EvoScientist/skills/ml-paper-writing/templates/colm2025/colm2025_conference.pdf +0 -0
- EvoScientist/skills/ml-paper-writing/templates/colm2025/colm2025_conference.sty +0 -218
- EvoScientist/skills/ml-paper-writing/templates/colm2025/colm2025_conference.tex +0 -305
- EvoScientist/skills/ml-paper-writing/templates/colm2025/fancyhdr.sty +0 -485
- EvoScientist/skills/ml-paper-writing/templates/colm2025/math_commands.tex +0 -508
- EvoScientist/skills/ml-paper-writing/templates/colm2025/natbib.sty +0 -1246
- EvoScientist/skills/ml-paper-writing/templates/iclr2026/fancyhdr.sty +0 -485
- EvoScientist/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.bib +0 -24
- EvoScientist/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.bst +0 -1440
- EvoScientist/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.pdf +0 -0
- EvoScientist/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.sty +0 -246
- EvoScientist/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.tex +0 -414
- EvoScientist/skills/ml-paper-writing/templates/iclr2026/math_commands.tex +0 -508
- EvoScientist/skills/ml-paper-writing/templates/iclr2026/natbib.sty +0 -1246
- EvoScientist/skills/ml-paper-writing/templates/icml2026/algorithm.sty +0 -79
- EvoScientist/skills/ml-paper-writing/templates/icml2026/algorithmic.sty +0 -201
- EvoScientist/skills/ml-paper-writing/templates/icml2026/example_paper.bib +0 -75
- EvoScientist/skills/ml-paper-writing/templates/icml2026/example_paper.pdf +0 -0
- EvoScientist/skills/ml-paper-writing/templates/icml2026/example_paper.tex +0 -662
- EvoScientist/skills/ml-paper-writing/templates/icml2026/fancyhdr.sty +0 -864
- EvoScientist/skills/ml-paper-writing/templates/icml2026/icml2026.bst +0 -1443
- EvoScientist/skills/ml-paper-writing/templates/icml2026/icml2026.sty +0 -767
- EvoScientist/skills/ml-paper-writing/templates/icml2026/icml_numpapers.pdf +0 -0
- EvoScientist/skills/ml-paper-writing/templates/neurips2025/Makefile +0 -36
- EvoScientist/skills/ml-paper-writing/templates/neurips2025/extra_pkgs.tex +0 -53
- EvoScientist/skills/ml-paper-writing/templates/neurips2025/main.tex +0 -38
- EvoScientist/skills/ml-paper-writing/templates/neurips2025/neurips.sty +0 -382
- EvoScientist/skills/peft/SKILL.md +0 -431
- EvoScientist/skills/peft/references/advanced-usage.md +0 -514
- EvoScientist/skills/peft/references/troubleshooting.md +0 -480
- EvoScientist/skills/ray-data/SKILL.md +0 -326
- EvoScientist/skills/ray-data/references/integration.md +0 -82
- EvoScientist/skills/ray-data/references/transformations.md +0 -83
- EvoScientist/skills/skill-creator/LICENSE.txt +0 -202
- EvoScientist/skills/skill-creator/SKILL.md +0 -356
- EvoScientist/skills/skill-creator/references/output-patterns.md +0 -82
- EvoScientist/skills/skill-creator/references/workflows.md +0 -28
- EvoScientist/skills/skill-creator/scripts/init_skill.py +0 -303
- EvoScientist/skills/skill-creator/scripts/package_skill.py +0 -110
- EvoScientist/skills/skill-creator/scripts/quick_validate.py +0 -95
- EvoScientist/skills_manager.py +0 -392
- EvoScientist/stream/display.py +0 -604
- EvoScientist/stream/events.py +0 -415
- EvoScientist/stream/state.py +0 -343
- evoscientist-0.0.1.dev3.dist-info/METADATA +0 -321
- evoscientist-0.0.1.dev3.dist-info/RECORD +0 -113
- evoscientist-0.0.1.dev3.dist-info/entry_points.txt +0 -5
- {evoscientist-0.0.1.dev3.dist-info → evoscientist-0.1.0rc1.dist-info}/WHEEL +0 -0
- {evoscientist-0.0.1.dev3.dist-info → evoscientist-0.1.0rc1.dist-info}/licenses/LICENSE +0 -0
- {evoscientist-0.0.1.dev3.dist-info → evoscientist-0.1.0rc1.dist-info}/top_level.txt +0 -0
@@ -1,562 +0,0 @@

# Citation Management & Hallucination Prevention

This reference provides a complete workflow for managing citations programmatically, preventing AI-generated citation hallucinations, and maintaining clean bibliographies.

---

## Contents

- [Why Citation Verification Matters](#why-citation-verification-matters)
- [Citation APIs Overview](#citation-apis-overview)
- [Verified Citation Workflow](#verified-citation-workflow)
- [Python Implementation](#python-implementation)
- [BibTeX Management](#bibtex-management)
- [Common Citation Formats](#common-citation-formats)
- [Troubleshooting](#troubleshooting)
- [Additional Resources](#additional-resources)

---

## Why Citation Verification Matters

### The Hallucination Problem

Research has documented significant issues with AI-generated citations:

- **~40% error rate** in AI-generated citations (Enago Academy research)
- **100+ hallucinated citations** slipped through NeurIPS 2025 review
- Common errors include:
  - Fabricated paper titles with real author names
  - Wrong publication venues or years
  - Non-existent papers with plausible metadata
  - Incorrect DOIs or arXiv IDs

### Consequences

- Desk rejection at some venues
- Loss of credibility with reviewers
- Potential retraction if published
- Wasted time chasing non-existent sources

### Solution

**Never generate citations from memory; always verify them programmatically.**

---

## Citation APIs Overview

### Primary APIs

| API | Coverage | Rate Limits | Best For |
|-----|----------|-------------|----------|
| **Semantic Scholar** | 214M papers | 1 RPS (free key) | ML/AI papers, citation graphs |
| **CrossRef** | 140M+ DOIs | Polite pool (include `mailto`) | DOI lookup, BibTeX retrieval |
| **arXiv** | Preprints | 3 s between requests | ML preprints, PDF access |
| **OpenAlex** | 240M+ works | 100K/day, 10 RPS | Open alternative to Microsoft Academic Graph (MAG) |
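
The per-API limits in the table are easiest to respect with a small client-side throttle. The sketch below is a hypothetical helper, not part of any of these APIs: `MinIntervalLimiter` enforces a minimum gap between calls to each named API, and the 1 s and 3 s intervals are assumptions taken from the table (check each API's current documentation). The clock and sleep functions are injectable so the logic can be tested without actually waiting.

```python
import time
from typing import Callable, Dict


class MinIntervalLimiter:
    """Hypothetical helper: enforce a minimum gap between calls to each API."""

    def __init__(
        self,
        intervals: Dict[str, float],
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ):
        self.intervals = intervals          # seconds between calls, per API name
        self.clock = clock
        self.sleep = sleep
        self.last_call: Dict[str, float] = {}

    def wait(self, api: str) -> float:
        """Block until `api` may be called again; return the seconds slept."""
        gap = self.intervals.get(api, 0.0)  # unknown APIs are not throttled
        now = self.clock()
        last = self.last_call.get(api)
        delay = 0.0 if last is None else max(0.0, last + gap - now)
        if delay > 0:
            self.sleep(delay)
        self.last_call[api] = now + delay   # time at which this call goes out
        return delay


# Intervals based on the table above (assumed values)
limiter = MinIntervalLimiter({"semantic_scholar": 1.0, "arxiv": 3.0})
```

Call `limiter.wait("arxiv")` right before each arXiv request; each API name is throttled independently, so a Semantic Scholar call never waits on arXiv.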

### API Selection Guide

```
Need ML paper search?          → Semantic Scholar
Have DOI, need BibTeX?         → DOI content negotiation (doi.org)
Looking for preprint?          → arXiv API
Need open data, bulk access?   → OpenAlex
```
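
Part of the selection guide can be automated by looking at the identifier you already have. A minimal sketch, assuming only three input shapes: DOIs (starting with `10.`), new-style arXiv IDs (`YYMM.NNNNN`, optionally with a version suffix), and free text. arXiv DOIs (`10.48550/arXiv.<id>`) are registered with DataCite rather than Crossref, so they are routed to arXiv. The `lookup_url` name is hypothetical, and old-style arXiv IDs such as `hep-th/9901001` are not handled.

```python
import re
from typing import Tuple
from urllib.parse import quote, urlencode

# New-style arXiv IDs: YYMM.NNNN or YYMM.NNNNN, optional version suffix
ARXIV_NEW_STYLE = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")


def lookup_url(identifier: str) -> Tuple[str, str]:
    """Map an identifier to (api_name, request_url) following the guide above."""
    ident = identifier.strip()
    if ident.lower().startswith("https://doi.org/"):
        ident = ident[len("https://doi.org/"):]
    if ident.lower().startswith("10.48550/arxiv."):
        ident = ident.split(".", 2)[2]  # arXiv DOI -> bare arXiv ID
    if ident.startswith("10."):
        # Crossref DOI: metadata from the works endpoint
        return "crossref", f"https://api.crossref.org/works/{quote(ident)}"
    if ARXIV_NEW_STYLE.match(ident):
        return "arxiv", f"https://export.arxiv.org/api/query?id_list={ident}"
    # Anything else is treated as free text for Semantic Scholar search
    query = urlencode({"query": ident})
    return "semantic_scholar", f"https://api.semanticscholar.org/graph/v1/paper/search?{query}"
```

For BibTeX, a Crossref DOI still goes through `https://doi.org/<doi>` with an `Accept` header, as in Step 3 below.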

### No Official Google Scholar API

Google Scholar has no official API, and scraping it violates Google's Terms of Service. Use a paid service such as SerpApi ($75-275/month) only if Semantic Scholar coverage is insufficient.

---

## Verified Citation Workflow

### 5-Step Process

```
1. SEARCH   → Query Semantic Scholar with specific keywords
       ↓
2. VERIFY   → Confirm paper exists in 2+ sources
       ↓
3. RETRIEVE → Get BibTeX via DOI content negotiation
       ↓
4. VALIDATE → Confirm the claim appears in source
       ↓
5. ADD      → Add verified entry to .bib file
```
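
The five steps can be chained so that a failure at any step keeps the entry out of the `.bib` file. In this sketch each step is passed in as a function (for example, thin wrappers around the Step 1-5 code below), which keeps the control flow testable offline. `run_pipeline` and the callable signatures are hypothetical, chosen for illustration.

```python
from typing import Callable, List, Optional


def run_pipeline(
    query: str,
    claim: str,
    search: Callable[[str], Optional[str]],    # query -> DOI (or None)
    verify: Callable[[str], bool],              # DOI -> found in 2+ sources?
    retrieve: Callable[[str], Optional[str]],   # DOI -> BibTeX text
    validate: Callable[[str, str], bool],       # (DOI, claim) -> supported?
    add: Callable[[str], None],                 # BibTeX -> write to .bib
    log: Optional[List[str]] = None,
) -> bool:
    """Run SEARCH -> VERIFY -> RETRIEVE -> VALIDATE -> ADD; stop at the first failure."""
    log = log if log is not None else []
    doi = search(query)
    if not doi:
        log.append("search: no match")
        return False
    if not verify(doi):
        log.append(f"verify: {doi} not confirmed in 2+ sources")
        return False
    bibtex = retrieve(doi)
    if not bibtex:
        log.append(f"retrieve: no BibTeX for {doi}")
        return False
    if not validate(doi, claim):
        log.append(f"validate: claim not supported by {doi}")
        return False
    add(bibtex)
    log.append(f"add: {doi}")
    return True
```

Keeping ADD last means nothing reaches the bibliography unless every earlier check passed, and the log records which step rejected a candidate.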

### Step 1: Search

Use Semantic Scholar for ML/AI papers:

```python
from itertools import islice

from semanticscholar import SemanticScholar

sch = SemanticScholar()
results = sch.search_paper("transformer attention mechanism", limit=10)

# Iterating the results object may fetch further pages,
# so cap the loop at the number of results actually wanted.
for paper in islice(results, 10):
    ids = paper.externalIds or {}  # externalIds can be missing
    print(f"Title: {paper.title}")
    print(f"Year: {paper.year}")
    print(f"DOI: {ids.get('DOI', 'N/A')}")
    print(f"arXiv: {ids.get('ArXiv', 'N/A')}")
    print(f"Citation count: {paper.citationCount}")
    print("---")
```
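
The top search hit is not always the paper you meant, so check the returned title against the one you expect before moving on. A minimal sketch using the standard library's `difflib`; the `best_title_match` name and the 0.9 similarity threshold are assumptions to tune, and the function takes plain `(title, doi)` pairs so it works with any search client.

```python
import difflib
import re
from typing import List, Optional, Tuple


def normalize_title(title: str) -> str:
    """Lowercase and drop punctuation so formatting differences don't matter."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", title.lower()).split())


def best_title_match(
    expected: str,
    candidates: List[Tuple[str, Optional[str]]],  # (title, doi) pairs from a search
    threshold: float = 0.9,
) -> Optional[Tuple[str, Optional[str]]]:
    """Return the candidate whose title best matches `expected`, or None if none is close."""
    target = normalize_title(expected)
    best, best_score = None, 0.0
    for title, doi in candidates:
        score = difflib.SequenceMatcher(None, target, normalize_title(title)).ratio()
        if score > best_score:
            best, best_score = (title, doi), score
    return best if best_score >= threshold else None
```

Build the candidate list from the titles and DOIs returned by the search above; a `None` result means none of the hits is the paper you were looking for.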

### Step 2: Verify Existence

Confirm the paper exists in at least two sources:

```python
import requests
from semanticscholar import SemanticScholar


def verify_paper(doi=None, arxiv_id=None):
    """Return (verified, sources); verified is True if found in 2+ sources."""
    sources_found = []

    # Check Semantic Scholar (accepts "DOI:<doi>" and "ARXIV:<id>" identifiers)
    sch = SemanticScholar()
    s2_id = f"DOI:{doi}" if doi else (f"ARXIV:{arxiv_id}" if arxiv_id else None)
    if s2_id:
        try:
            if sch.get_paper(s2_id):
                sources_found.append("Semantic Scholar")
        except Exception:  # the client may raise for unknown IDs
            pass

    # Check CrossRef (covers Crossref DOIs only; arXiv DOIs are DataCite and return 404)
    if doi:
        resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
        if resp.status_code == 200:
            sources_found.append("CrossRef")

    # Check arXiv: a real hit links to arxiv.org/abs/<id>; errors come back as entries too
    if arxiv_id:
        resp = requests.get(
            "https://export.arxiv.org/api/query",
            params={"id_list": arxiv_id},
            timeout=10,
        )
        if "arxiv.org/abs/" in resp.text:
            sources_found.append("arXiv")

    return len(sources_found) >= 2, sources_found
```
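
Searching the raw arXiv response for a substring is fragile, because the API returns an Atom feed and reports problems such as malformed IDs as feed entries too. A sketch that parses the feed with the standard library instead, assuming the Atom namespace `http://www.w3.org/2005/Atom` and that genuine entries have an `<id>` under `arxiv.org/abs/`; `parse_arxiv_feed` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

ATOM = "{http://www.w3.org/2005/Atom}"


def parse_arxiv_feed(xml_text: str) -> List[Tuple[str, str]]:
    """Return (abs_url, title) for each real paper entry in an arXiv API response."""
    root = ET.fromstring(xml_text)
    papers = []
    for entry in root.findall(f"{ATOM}entry"):
        entry_id = (entry.findtext(f"{ATOM}id") or "").strip()
        # Titles can wrap across lines in the feed; collapse whitespace
        title = " ".join((entry.findtext(f"{ATOM}title") or "").split())
        # Error entries point at .../api/errors#..., not at an abstract page
        if "arxiv.org/abs/" in entry_id:
            papers.append((entry_id, title))
    return papers
```

Pass it `resp.text` from the arXiv request and treat a non-empty result as a hit; the parsed title can then be compared with the title from Semantic Scholar.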

### Step 3: Retrieve BibTeX

Use DOI content negotiation so the BibTeX comes from the metadata registered for the DOI, not from memory:

```python
import requests


def doi_to_bibtex(doi: str) -> str:
    """Get BibTeX for a DOI via doi.org content negotiation.

    doi.org forwards the request to the DOI's registration agency
    (Crossref, DataCite, ...), so this also works for arXiv DOIs.
    """
    response = requests.get(
        f"https://doi.org/{doi}",
        headers={"Accept": "application/x-bibtex"},
        allow_redirects=True,
        timeout=10,
    )
    response.raise_for_status()
    return response.text


# Example: "Attention Is All You Need" (arXiv DOI, registered with DataCite).
# The returned entry describes the arXiv preprint, not the NeurIPS version.
bibtex = doi_to_bibtex("10.48550/arXiv.1706.03762")
print(bibtex)
```
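
Content negotiation is not limited to BibTeX. Requesting `application/vnd.citationstyles.csl+json` returns CSL-JSON, which is easier to compare field by field against Semantic Scholar or arXiv metadata (for example, to catch a wrong year or venue). A sketch using only the standard library; `fetch_csl_json` and `summarize_csl` are hypothetical helper names, only the CSL fields `title`, `author`, `issued`, and `container-title` are read, and the list handling for `title` is defensive in case a source returns a list.

```python
import json
import urllib.request
from typing import Optional, Tuple


def fetch_csl_json(doi: str) -> dict:
    """Fetch CSL-JSON metadata for a DOI via doi.org content negotiation."""
    req = urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": "application/vnd.citationstyles.csl+json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:  # follows redirects
        return json.load(resp)


def summarize_csl(csl: dict) -> Tuple[str, Optional[str], Optional[int], Optional[str]]:
    """Extract (title, first author's family name, year, venue) from CSL-JSON."""
    title = csl.get("title", "")
    if isinstance(title, list):
        title = title[0] if title else ""
    authors = csl.get("author") or []
    family = authors[0].get("family") if authors else None
    parts = (csl.get("issued") or {}).get("date-parts") or [[None]]
    year = parts[0][0] if parts and parts[0] else None
    venue = csl.get("container-title") or None
    if isinstance(venue, list):
        venue = venue[0] if venue else None
    return title, family, year, venue
```

Comparing `summarize_csl(fetch_csl_json(doi))` with the year and venue reported by Semantic Scholar targets the "wrong venue or year" class of errors listed at the top of this guide.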

### Step 4: Validate Claims

Before citing a paper for a specific claim, verify that the paper supports it. Matching against the abstract is only a first filter; for anything beyond the paper's headline contribution, check the full text:

```python
from semanticscholar import SemanticScholar


def get_paper_abstract(doi):
    """Return the abstract for a DOI, or None if unavailable."""
    sch = SemanticScholar()
    try:
        paper = sch.get_paper(f"DOI:{doi}")
    except Exception:  # unknown IDs may raise rather than return None
        return None
    return paper.abstract if paper else None  # abstract itself can be None


# Check whether the claim's wording appears in the abstract
abstract = get_paper_abstract("10.48550/arXiv.1706.03762")
claim = "attention mechanism"
if abstract and claim.lower() in abstract.lower():
    print("Claim appears in abstract")
else:
    print("Not found in abstract; check the full text")
```
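
A plain substring test misses paraphrases: "dispenses with recurrence" does not occur in an abstract that says "dispensing with recurrence". A slightly more tolerant heuristic, sketched here under the assumption that word overlap is a reasonable first filter, reports what fraction of the claim's content words appear in the abstract. The `claim_coverage` name, the small stopword list, and any threshold you apply are illustrative; a high score still does not prove the paper supports the claim.

```python
import re

# Small illustrative stopword list (assumption; extend as needed)
STOPWORDS = {"a", "an", "the", "of", "in", "on", "for", "to", "and", "or",
             "is", "are", "with", "by", "that", "this", "we", "our"}


def content_words(text: str) -> set:
    """Lowercased words of 3+ letters (hyphens allowed), minus stopwords."""
    return {w for w in re.findall(r"[a-z][a-z\-]{2,}", text.lower()) if w not in STOPWORDS}


def claim_coverage(claim: str, abstract: str) -> float:
    """Fraction of the claim's content words that occur in the abstract (0.0-1.0)."""
    claim_words = content_words(claim)
    if not claim_words:
        return 0.0
    return len(claim_words & content_words(abstract)) / len(claim_words)
```

Exact word matching still misses inflections ("dispenses" vs. "dispensing"), so treat low scores as "read the paper", not as "the claim is false".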

### Step 5: Add to Bibliography

Add the verified entry to your `.bib` file with a consistent key format:

```python
import re


def generate_citation_key(bibtex: str) -> str:
    """Generate consistent citation key: author_year_firstword."""
    # \b keeps "title" from matching inside "booktitle"
    author_match = re.search(r'\bauthor\s*=\s*\{([^}]+)\}', bibtex, re.I)
    if author_match:
        # First author only: split the author list on " and "
        first = re.split(r'\s+and\s+', author_match.group(1).strip())[0]
        # "Last, First" -> "Last";  "First Last" -> "Last"
        last = first.split(',')[0] if ',' in first else (first.split() or [''])[-1]
        first_author = re.sub(r'[^a-z]', '', last.lower()) or "unknown"
    else:
        first_author = "unknown"

    # Extract year (braced or bare)
    year_match = re.search(r'\byear\s*=\s*\{?(\d{4})\}?', bibtex, re.I)
    year = year_match.group(1) if year_match else "0000"

    # Extract first word of the title
    title_match = re.search(r'\btitle\s*=\s*\{([^}]+)\}', bibtex, re.I)
    words = title_match.group(1).split() if title_match else []
    first_word = re.sub(r'[^a-z]', '', words[0].lower()) if words else ""
    first_word = first_word or "paper"

    return f"{first_author}_{year}_{first_word}"
```
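
When adding the entry, rewrite the key returned by content negotiation (often a URL-like key, not in your format) and skip keys that already exist in the file. A minimal sketch, assuming the entry starts with `@type{key,` and that a key function such as `generate_citation_key` is passed in as `make_key`; `add_to_bib` is a hypothetical name, and key detection is regex-based rather than a full BibTeX parser.

```python
import re
from pathlib import Path
from typing import Callable

# Matches an entry header such as "@misc{some-key," and captures type and key
ENTRY_HEAD = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")


def add_to_bib(bibtex: str, bib_path: str, make_key: Callable[[str], str]) -> bool:
    """Append `bibtex` to bib_path under a normalized key; return False if the key exists."""
    match = ENTRY_HEAD.search(bibtex)
    if not match:
        raise ValueError("Not a BibTeX entry")
    key = make_key(bibtex)
    # Replace only the key in the entry header; keep all fields verbatim
    entry = bibtex[:match.start(2)] + key + bibtex[match.end(2):]

    path = Path(bib_path)
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    if key in {m.group(2) for m in ENTRY_HEAD.finditer(existing)}:
        return False  # already in the bibliography; don't add a duplicate

    with path.open("a", encoding="utf-8") as f:
        f.write("\n" + entry.strip() + "\n")
    return True
```

For example, `add_to_bib(doi_to_bibtex(doi), "references.bib", generate_citation_key)` appends the entry once and returns `False` on later runs.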

---

## Python Implementation

### Complete Citation Manager Class

```python
"""
Citation Manager - Verified citation workflow for ML papers.
"""

from dataclasses import dataclass
from itertools import islice
from typing import Dict, List, Optional, Tuple

import requests

try:
    from semanticscholar import SemanticScholar
except ImportError:
    print("Install: pip install semanticscholar")
    SemanticScholar = None


@dataclass
class Paper:
    title: str
    authors: List[str]
    year: int
    doi: Optional[str]
    arxiv_id: Optional[str]
    venue: Optional[str]
    citation_count: int
    abstract: Optional[str]


class CitationManager:
    """Manage citations with verification."""

    def __init__(self, api_key: Optional[str] = None):
        self.sch = SemanticScholar(api_key=api_key) if SemanticScholar else None
        self.verified_papers: Dict[str, Paper] = {}

    def search(self, query: str, limit: int = 10) -> List[Paper]:
        """Search for papers using Semantic Scholar."""
        if not self.sch:
            raise RuntimeError("Semantic Scholar not available")

        results = self.sch.search_paper(query, limit=limit)
        papers = []

        # Cap iteration at `limit`: iterating the results object may page onward
        for r in islice(results, limit):
            ids = r.externalIds or {}
            papers.append(Paper(
                title=r.title,
                authors=[a.name for a in (r.authors or [])],
                year=r.year or 0,
                doi=ids.get('DOI'),
                arxiv_id=ids.get('ArXiv'),
                venue=r.venue or None,  # empty strings become None
                citation_count=r.citationCount or 0,
                abstract=r.abstract,
            ))

        return papers

    def verify(self, paper: Paper) -> Tuple[bool, List[str]]:
        """Verify paper exists in multiple sources."""
        # Already found in Semantic Scholar via search
        sources = ["Semantic Scholar"]

        # Check CrossRef if DOI available (DataCite DOIs, e.g. arXiv's, return 404)
        if paper.doi:
            try:
                resp = requests.get(
                    f"https://api.crossref.org/works/{paper.doi}", timeout=10
                )
                if resp.status_code == 200:
                    sources.append("CrossRef")
            except requests.RequestException:
                pass

        # Check arXiv if ID available; a real hit links to arxiv.org/abs/<id>
        if paper.arxiv_id:
            try:
                resp = requests.get(
                    "https://export.arxiv.org/api/query",
                    params={"id_list": paper.arxiv_id},
                    timeout=10,
                )
                if "arxiv.org/abs/" in resp.text:
                    sources.append("arXiv")
            except requests.RequestException:
                pass

        return len(sources) >= 2, sources

    def get_bibtex(self, paper: Paper) -> Optional[str]:
        """Get BibTeX for verified paper."""
        if paper.doi:
            try:
                resp = requests.get(
                    f"https://doi.org/{paper.doi}",
                    headers={"Accept": "application/x-bibtex"},
                    timeout=10,
                    allow_redirects=True,
                )
                if resp.status_code == 200:
                    return resp.text
            except requests.RequestException:
                pass

        # Fallback: generate from API metadata
        return self._generate_bibtex(paper)

    def _generate_bibtex(self, paper: Paper) -> str:
        """Generate BibTeX from paper metadata (check the entry type by hand)."""
        # Citation key: author_year_firstword, letters and digits only
        last_name = (paper.authors[0].split() or ["unknown"])[-1] if paper.authors else "unknown"
        words = (paper.title or "").split()
        first_word = words[0] if words else "paper"
        key = "_".join(
            "".join(c for c in part.lower() if c.isalnum())
            for part in (last_name, str(paper.year), first_word)
        )

        fields = {
            "title": paper.title,
            "author": " and ".join(paper.authors) if paper.authors else "Unknown",
            "year": str(paper.year),
            "journal": paper.venue,
            "doi": paper.doi,
            "eprint": paper.arxiv_id,
            "archiveprefix": "arXiv" if paper.arxiv_id else None,
        }
        # @article requires a journal, so fall back to @misc when there is no venue
        entry_type = "article" if paper.venue else "misc"
        body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items() if value)
        return f"@{entry_type}{{{key},\n{body}\n}}"

    def cite(self, query: str) -> Optional[str]:
        """Full workflow: search, verify, return BibTeX (None if unverified)."""
        # Search
        papers = self.search(query, limit=5)
        if not papers:
            return None

        # Take the top result; confirm its title is the paper you meant to cite
        paper = papers[0]

        # Verify: do not return entries that fail the 2-source check
        verified, sources = self.verify(paper)
        if not verified:
            print(f"Skipping '{paper.title}': only found in {sources}")
            return None

        # Get BibTeX
        bibtex = self.get_bibtex(paper)

        # Cache
        if bibtex:
            self.verified_papers[paper.title] = paper

        return bibtex


# Usage example
if __name__ == "__main__":
    cm = CitationManager()

    # Search and cite
    bibtex = cm.cite("attention is all you need transformer")
    if bibtex:
        print(bibtex)
```

### Quick Functions

```python
import time
from typing import List, Optional

# Assumes CitationManager from the block above is in scope (same module or imported)


def quick_cite(query: str) -> Optional[str]:
    """One-liner citation; returns None if the paper could not be verified."""
    cm = CitationManager()
    return cm.cite(query)


def batch_cite(queries: List[str], output_file: str = "references.bib"):
    """Cite multiple papers and save to file."""
    cm = CitationManager()
    bibtex_entries = []

    for query in queries:
        print(f"Processing: {query}")
        bibtex = cm.cite(query)
        if bibtex:
            bibtex_entries.append(bibtex.strip())
        time.sleep(3)  # Rate limiting: each cite() may call arXiv, which asks for 3 s gaps

    with open(output_file, 'w', encoding='utf-8') as f:
        f.write("\n\n".join(bibtex_entries) + "\n")

    print(f"Saved {len(bibtex_entries)} citations to {output_file}")
```
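
`batch_cite` repeats every network lookup each time it runs. A small on-disk cache avoids re-querying papers that were already resolved. `JsonCache` is a hypothetical helper, not part of any library; it stores JSON-serializable results keyed by query string, deliberately does not cache failures (so they are retried), and assumes only one process writes the cache file at a time.

```python
import json
from pathlib import Path
from typing import Callable, Optional


class JsonCache:
    """Persist query -> result pairs in a JSON file between runs."""

    def __init__(self, path: str = "citation_cache.json"):
        self.path = Path(path)
        self.data = (
            json.loads(self.path.read_text(encoding="utf-8")) if self.path.exists() else {}
        )

    def get_or_compute(self, key: str, compute: Callable[[str], Optional[str]]) -> Optional[str]:
        """Return the cached value for `key`, calling `compute` only on a miss."""
        if key in self.data:
            return self.data[key]
        value = compute(key)
        if value is not None:  # don't cache failures, so they are retried next run
            self.data[key] = value
            self.path.write_text(json.dumps(self.data, indent=2), encoding="utf-8")
        return value
```

Inside `batch_cite`, `cache.get_or_compute(query, cm.cite)` would replace the direct `cm.cite(query)` call; the sleep is then only needed on cache misses.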

---

## BibTeX Management

### BibTeX vs BibLaTeX

| Feature | BibTeX | BibLaTeX |
|---------|--------|----------|
| Unicode support | Limited | Full |
| Entry types | Standard | Extended (@online, @dataset) |
| Customization | Limited | Highly flexible |
| Backend | bibtex | Biber (recommended) |

**Recommendation**: Use BibLaTeX with Biber for new papers unless the venue template dictates otherwise. Many ML conference templates (for example ICML, ICLR, ACL, and AAAI) ship their own `.bst` style file and expect BibTeX; in that case, follow the template.

### LaTeX Setup

```latex
% In preamble
\usepackage[
  backend=biber,
  style=numeric,
  sorting=none
]{biblatex}
\addbibresource{references.bib}

% In document
\cite{vaswani_2017_attention}

% At end
\printbibliography
```

### Citation Commands

```latex
% natbib commands (loaded by many ML conference templates)
\cite{key}        % Style-dependent; numeric styles give [1]
\citep{key}       % Parenthetical: (Author, 2020) with author-year styles
\citet{key}       % Textual: Author (2020)
\citeauthor{key}  % Just author name
\citeyear{key}    % Just year
% biblatex equivalents: \parencite{key}, \textcite{key};
% load biblatex with natbib=true to keep \citep/\citet working
```

### Consistent Citation Keys

Use the format `author_year_firstword`:

```
vaswani_2017_attention
devlin_2019_bert
brown_2020_language
```
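
The `author_year_firstword` scheme can collide, for example when one first author has two papers in the same year whose titles start with the same word. A common convention is to append a letter suffix to later duplicates. A sketch of that convention; `unique_keys` is a hypothetical name, and suffixes are assigned in input order, so keep that order stable between runs or keys will shift.

```python
import string
from typing import Dict, List


def unique_keys(base_keys: List[str]) -> List[str]:
    """Make keys unique by suffixing repeats: ['k', 'k', 'k'] -> ['k', 'kb', 'kc']."""
    counts: Dict[str, int] = {}
    result = []
    for key in base_keys:
        n = counts.get(key, 0)
        counts[key] = n + 1
        if n == 0:
            result.append(key)                               # first occurrence keeps the plain key
        elif n < len(string.ascii_lowercase):
            result.append(key + string.ascii_lowercase[n])   # 2nd -> 'b', 3rd -> 'c'
        else:
            raise ValueError(f"Too many papers share the key {key!r}")
    return result
```

Starting the suffixes at `b` leaves the first paper's key unchanged, so existing `\cite` commands keep working when a second paper is added.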

---

## Common Citation Formats

### Conference Paper

```bibtex
@inproceedings{vaswani_2017_attention,
  title     = {Attention Is All You Need},
  author    = {Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and
               Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N and
               Kaiser, Lukasz and Polosukhin, Illia},
  booktitle = {Advances in Neural Information Processing Systems},
  volume    = {30},
  year      = {2017},
  publisher = {Curran Associates, Inc.}
}
```

### Journal Article

```bibtex
@article{hochreiter_1997_long,
  title     = {Long Short-Term Memory},
  author    = {Hochreiter, Sepp and Schmidhuber, J{\"u}rgen},
  journal   = {Neural Computation},
  volume    = {9},
  number    = {8},
  pages     = {1735--1780},
  year      = {1997},
  publisher = {MIT Press}
}
```

### arXiv Preprint

```bibtex
@misc{brown_2020_language,
  title         = {Language Models are Few-Shot Learners},
  author        = {Brown, Tom and Mann, Benjamin and Ryder, Nick and others},
  year          = {2020},
  eprint        = {2005.14165},
  archiveprefix = {arXiv},
  primaryclass  = {cs.CL}
}
```
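
The "entry type correct" check from the verification checklist can be partly automated. Standard BibTeX requires `author`, `title`, `journal`, and `year` for `@article`, and `author`, `title`, `booktitle`, and `year` for `@inproceedings`; `@misc` has no required fields. The sketch below checks a single entry against those three types only (other types pass unchecked), using a simple regex rather than a BibTeX parser; `missing_fields` is a hypothetical name, so treat it as a lint, not a validator.

```python
import re
from typing import List

REQUIRED = {
    "article": {"author", "title", "journal", "year"},
    "inproceedings": {"author", "title", "booktitle", "year"},
    "misc": set(),
}


def missing_fields(entry: str) -> List[str]:
    """Return required fields absent from a single BibTeX entry (sorted)."""
    head = re.match(r"\s*@(\w+)\s*\{", entry)
    if not head:
        raise ValueError("Not a BibTeX entry")
    entry_type = head.group(1).lower()
    # Field names are the words before "="; "booktitle" is captured whole, not as "title"
    present = {name.lower() for name in re.findall(r"(\w+)\s*=", entry)}
    return sorted(REQUIRED.get(entry_type, set()) - present)
```

A non-empty result usually means the entry type is wrong (for example, a conference paper filed as `@article` with no `journal`) rather than that a field was forgotten.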

---

## Troubleshooting

### Common Issues

**Issue: Semantic Scholar returns no results**
- Try more specific keywords, or the exact paper title
- Check spelling of author names
- Replace hyphens with spaces; the relevance search endpoint can return no matches for hyphenated terms
- Quoted exact phrases are supported by the bulk search endpoint (`/paper/search/bulk`); the relevance search endpoint does not support special query syntax

**Issue: DOI doesn't resolve to BibTeX**
- The DOI's registration agency may not support content negotiation (Crossref and DataCite do)
- Try the arXiv ID instead, if available
- Generate BibTeX from metadata manually

**Issue: Rate limiting errors**
- Add delays between requests (1-3 seconds; arXiv asks for 3 seconds)
- Use an API key if available
- Cache results to avoid repeat queries
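
When an API does return a rate-limit error (HTTP 429), retrying with exponentially growing delays is a common pattern. A sketch, assuming the caller wraps each request in a `fetch` function that returns `(status_code, body)`; `with_backoff` and its default delays are illustrative, and the sleep function is injectable for testing. If the response carries a `Retry-After` header, honoring it is preferable to a fixed schedule.

```python
import time
from typing import Callable, Tuple


def with_backoff(
    fetch: Callable[[], Tuple[int, str]],
    max_retries: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call fetch(); on HTTP 429 wait base_delay * 2**attempt and retry."""
    status, body = fetch()
    for attempt in range(max_retries):
        if status != 429:
            break
        sleep(base_delay * 2 ** attempt)  # 1 s, 2 s, 4 s, 8 s with the defaults
        status, body = fetch()
    return status, body
```

After `max_retries` attempts the last response is returned as-is, so callers still need to check the status code.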

**Issue: Encoding problems in BibTeX**
- Use proper LaTeX escaping: `{\"u}` for ü
- Ensure the file is UTF-8 encoded
- Use BibLaTeX with Biber for better Unicode support
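
If the bibliography has to stay on classic BibTeX, accented letters can be converted to LaTeX accent commands before writing the `.bib` file. A sketch using Unicode decomposition (`unicodedata.normalize("NFD", ...)`) with a small, partial accent table; `latex_escape_accents` is a hypothetical name, and characters outside the table (for example `Ø`, which does not decompose) are left unchanged, so the file should still be saved as UTF-8.

```python
import unicodedata

# Combining mark -> LaTeX accent command (partial, illustrative table)
ACCENTS = {
    "\u0301": "\\'",   # acute:      é -> {\'e}
    "\u0300": "\\`",   # grave:      è -> {\`e}
    "\u0308": '\\"',   # diaeresis:  ü -> {\"u}
    "\u0302": "\\^",   # circumflex: ê -> {\^e}
    "\u0303": "\\~",   # tilde:      ñ -> {\~n}
    "\u0327": "\\c ",  # cedilla:    ç -> {\c c}
    "\u030c": "\\v ",  # caron:      š -> {\v s}
}


def latex_escape_accents(text: str) -> str:
    """Replace single-accent Latin letters with LaTeX accent commands."""
    out = []
    for ch in text:
        base, *marks = unicodedata.normalize("NFD", ch)
        if ord(base) < 128 and len(marks) == 1 and marks[0] in ACCENTS:
            out.append("{" + ACCENTS[marks[0]] + base + "}")
        else:
            out.append(ch)  # ASCII, or characters this table doesn't cover
    return "".join(out)
```

Wrapping each command in braces (`{\"u}`) matches the escaping shown above and keeps BibTeX from splitting the accent when it changes case or sorts names.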

### Verification Checklist

Before adding a citation:

- [ ] Paper found in at least 2 sources
- [ ] DOI or arXiv ID verified
- [ ] BibTeX retrieved (not generated from memory)
- [ ] Entry type correct (@inproceedings vs @article)
- [ ] Author names complete and correctly formatted
- [ ] Year and venue verified
- [ ] Citation key follows consistent format

---

## Additional Resources

**APIs:**
- Semantic Scholar: https://api.semanticscholar.org/api-docs/
- CrossRef: https://www.crossref.org/documentation/retrieve-metadata/rest-api/
- arXiv: https://info.arxiv.org/help/api/basics.html
- OpenAlex: https://docs.openalex.org/

**Python Libraries:**
- `semanticscholar`: https://pypi.org/project/semanticscholar/
- `arxiv`: https://pypi.org/project/arxiv/
- `habanero` (CrossRef): https://github.com/sckott/habanero

**Verification Tools:**
- Citely: https://citely.ai/citation-checker
- ReciteWorks: https://reciteworks.com/