papergraph 1.0.0
- package/LICENSE +21 -0
- package/README.md +229 -0
- package/dist/index.js +2695 -0
- package/package.json +63 -0
package/LICENSE

MIT License

Copyright (c) 2026 Dashanka De Silva

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

package/README.md

# PaperGraph

**Build interactive research-paper connectivity graphs from any topic.**

PaperGraph is a command-line tool that discovers academic papers, traces their citation networks, computes text similarity, runs graph algorithms, and produces explorable visualizations, all from a single command.

---

## Motivation

Navigating academic literature is hard. A single topic can span thousands of papers across decades, and understanding *how* they connect (who cites whom, which share methods, which disagree) requires hours of manual work.

PaperGraph automates this:

1. **You provide a topic** (e.g., *"transformer attention mechanisms"*)
2. **It discovers papers** via the OpenAlex or Semantic Scholar APIs
3. **It traces citations** with a breadth-first search (BFS) of configurable depth
4. **It computes relationships**: text similarity, co-citation, and bibliographic coupling
5. **It ranks and clusters** papers using PageRank and Louvain community detection
6. **It produces outputs**: an interactive HTML viewer, JSON, GraphML, GEXF, CSV, or Mermaid diagrams

The result is a navigable knowledge graph that reveals the structure of a research field at a glance.
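
Step 3's depth-limited citation tracing can be pictured as a level-by-level breadth-first search over citation links. The sketch below is illustrative only: `traverseCitations` and the `getReferences` lookup are hypothetical names, and a real run would fetch references through the source adapters rather than from an in-memory table.

```typescript
// Hypothetical lookup: returns the IDs of the papers that `id` cites.
// In a real build this data would come from an API adapter (OpenAlex / Semantic Scholar).
type GetReferences = (id: string) => string[];

/**
 * Depth-limited breadth-first traversal over citation links.
 * Returns each reached paper ID mapped to the depth at which it was first found
 * (seeds are depth 0). Papers are never revisited, so citation cycles are harmless.
 */
function traverseCitations(
  seeds: string[],
  getReferences: GetReferences,
  maxDepth: number,
): Map<string, number> {
  const depthOf = new Map<string, number>();
  let frontier: string[] = [];
  for (const seed of seeds) {
    if (!depthOf.has(seed)) {
      depthOf.set(seed, 0);
      frontier.push(seed);
    }
  }
  // Expand one whole level at a time, so every paper gets its shortest distance from a seed.
  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const id of frontier) {
      for (const ref of getReferences(id)) {
        if (depthOf.has(ref)) continue; // already reached at this depth or shallower
        depthOf.set(ref, depth);
        next.push(ref);
      }
    }
    frontier = next;
  }
  return depthOf;
}

// Usage with a toy in-memory citation table (D -> A forms a cycle).
const references: Record<string, string[]> = {
  A: ["B", "C"],
  B: ["D"],
  C: ["D", "E"],
  D: ["A", "F"],
};
const reached = traverseCitations(["A"], (id) => references[id] ?? [], 2);
console.log(JSON.stringify([...reached]));
// [["A",0],["B",1],["C",1],["D",2],["E",2]]
```

Because each level is fully expanded before the next, the recorded depth is the shortest citation distance from any seed. The number of papers tends to grow roughly with the branching factor raised to `maxDepth`, which is presumably why the depth is kept configurable.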

---

## Architecture

```
┌──────────────────────────────────────────────────┐
│                 CLI (Commander)                  │
│     build · export · view · inspect · cache      │
└───────────────┬──────────────────────────────────┘
                │
┌───────────────▼──────────────────────────────────┐
│                  Graph Builder                   │
│  Orchestrates the full pipeline:                 │
│  seed → traverse → NLP → algorithms → store      │
└───┬───────────┬────────────┬─────────────┬───────┘
    │           │            │             │
    ▼           ▼            ▼             ▼
┌────────┐  ┌────────┐  ┌──────────┐  ┌──────────┐
│Source  │  │  NLP   │  │  Graph   │  │  SQLite  │
│Adapt.  │  │Pipeline│  │  Algos   │  │ Storage  │
├────────┤  ├────────┤  ├──────────┤  ├──────────┤
│OpenAlex│  │TF-IDF  │  │PageRank  │  │10 tables │
│S2      │  │Cosine  │  │Louvain   │  │WAL mode  │
│        │  │Entity  │  │Co-cite   │  │Migrations│
│        │  │Extract │  │Coupling  │  │          │
└───┬────┘  └────────┘  │Scoring   │  └──────────┘
    │                   └──────────┘
    ▼
┌──────────────────┐
│   HTTP Client    │
│  Rate limiting   │
│  Retry + backoff │
│  Token bucket    │
└──────────────────┘
```
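
The HTTP Client box lists a token bucket for rate limiting. As a rough sketch of that idea (the class name, the injectable fake clock, and the numbers are assumptions, not the package's implementation), each source gets a bucket that holds up to `capacity` tokens and refills at a steady rate; a request may proceed only if it can spend a token.

```typescript
/**
 * Minimal token-bucket rate limiter (illustrative sketch, not PaperGraph's code).
 * Tokens refill continuously at `ratePerSec`, up to `capacity`; each request spends one.
 */
class TokenBucket {
  private readonly capacity: number;
  private readonly ratePerSec: number;
  private readonly now: () => number;
  private tokens: number;
  private lastRefill: number;

  // `now` returns milliseconds; it is injectable so the example can use a fake clock.
  constructor(capacity: number, ratePerSec: number, now: () => number = () => Date.now()) {
    this.capacity = capacity;
    this.ratePerSec = ratePerSec;
    this.now = now;
    this.tokens = capacity; // start full, allowing an initial burst
    this.lastRefill = now();
  }

  // Credit the tokens earned since the last refill, capped at capacity.
  private refill(): void {
    const t = this.now();
    const elapsedSec = (t - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.ratePerSec);
    this.lastRefill = t;
  }

  /** Spend one token if available; returns false when the caller should wait. */
  tryTake(): boolean {
    this.refill();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }

  /** Milliseconds until a full token is available (0 if one is ready now). */
  waitTimeMs(): number {
    this.refill();
    return this.tokens >= 1 ? 0 : ((1 - this.tokens) / this.ratePerSec) * 1000;
  }

  /** Sleep until a token is available, then spend it (intended for a real clock). */
  async take(): Promise<void> {
    while (!this.tryTake()) {
      await new Promise<void>((resolve) => setTimeout(resolve, this.waitTimeMs()));
    }
  }
}

// Usage with a fake clock: 10 requests/second, bursts of up to 2.
let fakeNowMs = 0;
const bucket = new TokenBucket(2, 10, () => fakeNowMs);
console.log(bucket.tryTake(), bucket.tryTake(), bucket.tryTake()); // true true false
fakeNowMs += 100; // 100 ms at 10 tokens/s refills exactly one token
console.log(bucket.tryTake()); // true
```

A per-source bucket sized to each API's limit lets short bursts through while keeping the long-run request rate at or below that limit.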

### Data Flow

```mermaid
graph LR
    A["Topic / Papers / DOIs"] --> B["Seed Discovery"]
    B --> C["BFS Citation Traversal"]
    C --> D["TF-IDF Corpus"]
    D --> E["Similarity Edges"]
    C --> F["Co-Citation / Coupling"]
    D --> G["PageRank + Louvain"]
    E --> H["SQLite Database"]
    F --> H
    G --> H
    H --> I["Exporters / Viewer"]
```
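
The Louvain step in this flow optimizes a quantity called modularity, which compares the edges inside each community with the number expected if edges were rewired at random while keeping node degrees. The function below only *scores* a given partition of an undirected, unweighted graph; it does not perform Louvain's node moves or community aggregation, and its name and input shapes are assumptions. It can be a handy sanity check on the clusters a build produces.

```typescript
// An undirected, unweighted edge between two paper IDs.
type UndirectedEdge = [string, string];

/**
 * Newman-Girvan modularity of a partition:
 *   Q = sum over communities c of ( L_c / m - (d_c / (2m))^2 )
 * m = number of edges, L_c = edges inside c, d_c = summed degree of c's nodes.
 * Louvain searches for a partition with high Q; this function only scores one.
 */
function modularity(edges: UndirectedEdge[], community: Map<string, number>): number {
  const m = edges.length;
  if (m === 0) return 0;
  const internal = new Map<number, number>(); // L_c
  const degreeSum = new Map<number, number>(); // d_c
  for (const [u, v] of edges) {
    const cu = community.get(u);
    const cv = community.get(v);
    if (cu === undefined || cv === undefined) {
      throw new Error(`edge ${u}-${v} has a node with no community`);
    }
    // Each edge adds 1 to the degree of both endpoints.
    degreeSum.set(cu, (degreeSum.get(cu) ?? 0) + 1);
    degreeSum.set(cv, (degreeSum.get(cv) ?? 0) + 1);
    if (cu === cv) internal.set(cu, (internal.get(cu) ?? 0) + 1);
  }
  let q = 0;
  for (const [c, d] of degreeSum) {
    q += (internal.get(c) ?? 0) / m - (d / (2 * m)) ** 2;
  }
  return q;
}

// Two triangles joined by a single "bridge" edge.
const edges: UndirectedEdge[] = [
  ["a", "b"], ["b", "c"], ["a", "c"],
  ["d", "e"], ["e", "f"], ["d", "f"],
  ["c", "d"],
];
const twoClusters = new Map([["a", 0], ["b", 0], ["c", 0], ["d", 1], ["e", 1], ["f", 1]]);
const oneCluster = new Map([...twoClusters.keys()].map((id) => [id, 0] as [string, number]));
console.log(modularity(edges, twoClusters).toFixed(3)); // 0.357
console.log(modularity(edges, oneCluster)); // 0
```

Putting everything in one community always scores 0, so a clearly positive Q (here 10/28 for the two triangles) indicates that the detected clusters are denser internally than chance would predict.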

---

## Project Structure

```
Paper-Graph/
├── src/
│   ├── cli/                     # CLI entry point (Commander)
│   │   └── index.ts             # 5 commands: build, export, view, inspect, cache
│   │
│   ├── builder/                 # Graph build orchestrator
│   │   └── graph-builder.ts     # Full pipeline: seed → traverse → NLP → rank → store
│   │
│   ├── sources/                 # API data source adapters
│   │   ├── openalex.ts          # OpenAlex API adapter
│   │   ├── semantic-scholar.ts  # Semantic Scholar API adapter
│   │   └── utils.ts             # Shared utilities (DOI stripping, title similarity)
│   │
│   ├── nlp/                     # Natural language processing
│   │   ├── tokenizer.ts         # Deterministic tokenization (no stemming)
│   │   ├── stopwords.ts         # 175+ English + academic stopwords
│   │   ├── tfidf.ts             # TF-IDF corpus building + topic relevance
│   │   ├── similarity.ts        # Cosine similarity + edge generation
│   │   └── entity-extraction.ts # Dictionary-based entity extraction
│   │
│   ├── graph/                   # Graph algorithms
│   │   ├── algorithms.ts        # PageRank, Louvain, co-citation, coupling
│   │   └── scoring.ts           # Composite ranking (PageRank + relevance + recency)
│   │
│   ├── storage/                 # Persistence layer
│   │   └── database.ts          # SQLite via better-sqlite3 (10 tables, WAL mode)
│   │
│   ├── exporters/               # Output format exporters
│   │   └── export.ts            # JSON, GraphML, GEXF, CSV, Mermaid
│   │
│   ├── viewer/                  # Interactive visualization
│   │   └── html-viewer.ts       # Self-contained Cytoscape.js HTML viewer
│   │
│   ├── cache/                   # API response caching
│   │   └── response-cache.ts    # File-system cache with SHA-256 keys + TTL
│   │
│   ├── utils/                   # Shared infrastructure
│   │   ├── http-client.ts       # HTTP client with rate limiting + retries
│   │   ├── logger.ts            # Pino-based structured logging
│   │   └── config.ts            # Cosmiconfig configuration resolver
│   │
│   ├── types/                   # TypeScript type definitions
│   │   ├── index.ts             # Paper, Edge, Cluster, Entity, Config interfaces
│   │   └── config.ts            # Config types + defaults
│   │
│   └── __tests__/               # Test suites (86 tests)
│
├── dist/                        # Built output (82 KB ESM bundle)
├── package.json
├── tsconfig.json
├── tsup.config.ts
└── vitest.config.ts
```

---

## Features

### Data Sources

| Source | API | Rate Limit | Key Required |
|--------|-----|------------|--------------|
| **OpenAlex** | REST | 10 req/s (polite pool) | Optional (email for polite pool) |
| **Semantic Scholar** | REST | 1 req/s (100 with key) | Optional |
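
Both sources are queried over plain REST. The helpers below only build request URLs and headers (no network calls), as a sketch of what a topic search might send to each API. The helper names, the default page size, and the requested `fields` are assumptions; the endpoints and parameters (`search`, `per-page`, and `mailto` for OpenAlex; `query`, `fields`, `limit`, and the `x-api-key` header for Semantic Scholar) follow the public API documentation.

```typescript
// Build a topic search URL for OpenAlex's /works endpoint.
// Supplying a contact email via `mailto` opts into OpenAlex's "polite pool".
function openAlexSearchUrl(topic: string, perPage = 25, email?: string): string {
  const url = new URL("https://api.openalex.org/works");
  url.searchParams.set("search", topic);
  url.searchParams.set("per-page", String(perPage));
  if (email) url.searchParams.set("mailto", email);
  return url.toString();
}

// Build a topic search request for Semantic Scholar's Graph API.
// The API key, if any, travels in the `x-api-key` header rather than the URL.
function semanticScholarSearch(
  topic: string,
  limit = 25,
  apiKey?: string,
): { url: string; headers: Record<string, string> } {
  const url = new URL("https://api.semanticscholar.org/graph/v1/paper/search");
  url.searchParams.set("query", topic);
  url.searchParams.set("fields", "title,year,abstract,externalIds");
  url.searchParams.set("limit", String(limit));
  const headers: Record<string, string> = {};
  if (apiKey) headers["x-api-key"] = apiKey;
  return { url: url.toString(), headers };
}

console.log(openAlexSearchUrl("transformer attention", 25, "you@example.com"));
// https://api.openalex.org/works?search=transformer+attention&per-page=25&mailto=you%40example.com
```

On Node.js 20+, either request could then be sent with the global `fetch(url, { headers })`.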

### Graph Spine Strategies

| Spine | Description |
|-------|-------------|
| `citation` | Direct citation links (A cites B) |
| `similarity` | TF-IDF cosine similarity between abstracts |
| `co-citation` | Papers frequently cited together |
| `coupling` | Papers that cite the same references |
| `hybrid` | All of the above combined |
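
The `co-citation` and `coupling` spines can both be derived from the same directed citation links. A sketch under simple assumptions (links arrive as `[citing, cited]` ID pairs; the function names and the `minCount` threshold are illustrative): co-citation groups references by the paper that cites them, coupling groups citing papers by the reference they share, and in both cases every pair within a group gets one count.

```typescript
// A directed citation link: `citing` lists `cited` in its references.
type CitationPair = [citing: string, cited: string];

// For each group, count every unordered pair of members; keep pairs seen >= minCount times.
// Pair keys are "x|y" with x < y, so IDs are assumed not to contain "|".
function countPairs(groups: Iterable<Set<string>>, minCount: number): Map<string, number> {
  const counts = new Map<string, number>();
  for (const group of groups) {
    const members = [...group].sort();
    for (let i = 0; i < members.length; i++) {
      for (let j = i + 1; j < members.length; j++) {
        const key = `${members[i]}|${members[j]}`;
        counts.set(key, (counts.get(key) ?? 0) + 1);
      }
    }
  }
  for (const [key, n] of counts) if (n < minCount) counts.delete(key);
  return counts;
}

// Group one side of each link by the other side.
function groupLinks(links: CitationPair[], by: "citing" | "cited"): Map<string, Set<string>> {
  const groups = new Map<string, Set<string>>();
  for (const [citing, cited] of links) {
    const key = by === "citing" ? citing : cited;
    const member = by === "citing" ? cited : citing;
    if (!groups.has(key)) groups.set(key, new Set<string>());
    groups.get(key)!.add(member);
  }
  return groups;
}

// Co-citation: two papers are co-cited once for every paper whose references include both.
function coCitation(links: CitationPair[], minCount = 1): Map<string, number> {
  return countPairs(groupLinks(links, "citing").values(), minCount);
}

// Bibliographic coupling: two papers are coupled once for every reference they share.
function coupling(links: CitationPair[], minCount = 1): Map<string, number> {
  return countPairs(groupLinks(links, "cited").values(), minCount);
}

const links: CitationPair[] = [
  ["P1", "R1"], ["P1", "R2"],
  ["P2", "R1"], ["P2", "R2"],
  ["P3", "R2"],
];
console.log(JSON.stringify(Object.fromEntries(coCitation(links)))); // {"R1|R2":2}
console.log(JSON.stringify(Object.fromEntries(coupling(links)))); // {"P1|P2":2,"P1|P3":1,"P2|P3":1}
```

Raising `minCount` is one simple way to keep only the stronger relationships, since a single shared reference is weak evidence of topical overlap.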

### Graph Algorithms

- **PageRank**: identifies the most influential papers
- **Louvain**: community detection for topic clustering
- **Composite scoring**: weighted combination of PageRank, topic relevance, and recency
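
For intuition, here is PageRank computed by power iteration on a tiny citation graph, followed by a weighted composite score. This is a sketch rather than PaperGraph's implementation: the README does not specify the damping factor, the weights, or the normalization, so `damping = 0.85` (the conventional choice), dividing PageRank by its maximum, and the example weights are all assumptions.

```typescript
// Directed edge [from, to]: `from` cites `to`, so rank flows from `from` to `to`.
type DirectedEdge = [string, string];

/** PageRank by power iteration; dangling nodes spread their rank uniformly. */
function pageRank(
  nodes: string[],
  edges: DirectedEdge[],
  damping = 0.85,
  maxIterations = 200,
  tolerance = 1e-10,
): Map<string, number> {
  const n = nodes.length;
  const outLinks = new Map<string, string[]>(nodes.map((id): [string, string[]] => [id, []]));
  for (const [from, to] of edges) {
    if (outLinks.has(from) && outLinks.has(to)) outLinks.get(from)!.push(to);
  }
  let rank = new Map<string, number>(nodes.map((id): [string, number] => [id, 1 / n]));
  for (let iter = 0; iter < maxIterations; iter++) {
    let danglingMass = 0;
    for (const id of nodes) if (outLinks.get(id)!.length === 0) danglingMass += rank.get(id)!;
    // Teleport share plus redistributed dangling mass, identical for every node.
    const base = (1 - damping) / n + (damping * danglingMass) / n;
    const next = new Map<string, number>(nodes.map((id): [string, number] => [id, base]));
    for (const id of nodes) {
      const targets = outLinks.get(id)!;
      for (const t of targets) {
        next.set(t, next.get(t)! + (damping * rank.get(id)!) / targets.length);
      }
    }
    let delta = 0;
    for (const id of nodes) delta += Math.abs(next.get(id)! - rank.get(id)!);
    rank = next;
    if (delta < tolerance) break;
  }
  return rank;
}

// Hypothetical composite: weights and [0, 1] inputs are assumptions, not PaperGraph's values.
interface ScoreWeights { pageRank: number; relevance: number; recency: number }

function compositeScore(
  normalizedPageRank: number,
  relevance: number,
  recency: number,
  w: ScoreWeights,
): number {
  return w.pageRank * normalizedPageRank + w.relevance * relevance + w.recency * recency;
}

const ranks = pageRank(["A", "B", "C"], [["A", "C"], ["B", "C"], ["C", "A"]]);
const maxRank = Math.max(...ranks.values());
const weights: ScoreWeights = { pageRank: 0.5, relevance: 0.3, recency: 0.2 };
for (const [id, pr] of ranks) {
  // Relevance and recency are fixed placeholders here; a real build would compute them.
  console.log(id, pr.toFixed(3), compositeScore(pr / maxRank, 0.8, 0.5, weights).toFixed(3));
}
```

Normalizing PageRank before mixing matters because raw PageRank values sum to 1 across the whole graph and shrink as the graph grows, whereas relevance and recency are assumed here to already lie in [0, 1].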

### Export Formats

| Format | Extension | Use Case |
|--------|-----------|----------|
| JSON | `.json` | Programmatic access, custom visualization |
| GraphML | `.graphml` | yEd, Gephi, NetworkX |
| GEXF | `.gexf` | Gephi (with attributes) |
| CSV | `.csv` | Spreadsheets, pandas |
| Mermaid | `.md` | GitHub/GitLab rendered diagrams |
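
Mermaid is the simplest of these formats to picture: each paper becomes a quoted node declaration and each edge an arrow, in the same syntax as the Data Flow diagram above. A sketch of such an exporter, assuming minimal `{ id, title }` and `{ source, target, type }` shapes (illustrative, not the package's interfaces) and placeholder DOIs; quotes and pipes are replaced with Mermaid entity codes so titles cannot break the syntax.

```typescript
// Minimal shapes for illustration; PaperGraph's own Paper/Edge interfaces may differ.
interface MermaidPaper { id: string; title: string }
interface MermaidEdge { source: string; target: string; type?: string }

// Replace characters that would break a quoted Mermaid label or an edge label.
function escapeLabel(text: string): string {
  return text.replace(/"/g, "#quot;").replace(/\|/g, "#124;");
}

/** Render papers and edges as the body of a Mermaid flowchart (`graph LR`). */
function toMermaid(papers: MermaidPaper[], edges: MermaidEdge[]): string {
  // DOIs and URLs are not valid Mermaid node IDs, so map each paper to n0, n1, ...
  const nodeId = new Map(papers.map((p, i) => [p.id, `n${i}`] as [string, string]));
  const lines = ["graph LR"];
  for (const p of papers) lines.push(`    ${nodeId.get(p.id)}["${escapeLabel(p.title)}"]`);
  for (const e of edges) {
    const from = nodeId.get(e.source);
    const to = nodeId.get(e.target);
    if (from === undefined || to === undefined) continue; // skip edges to unknown papers
    lines.push(e.type ? `    ${from} -->|${escapeLabel(e.type)}| ${to}` : `    ${from} --> ${to}`);
  }
  return lines.join("\n");
}

const papers: MermaidPaper[] = [
  { id: "10.1000/example.1", title: "Attention Is All You Need" },
  { id: "10.1000/example.2", title: "BERT" },
];
const citations: MermaidEdge[] = [
  { source: "10.1000/example.2", target: "10.1000/example.1", type: "citation" },
];
console.log(toMermaid(papers, citations));
// graph LR
//     n0["Attention Is All You Need"]
//     n1["BERT"]
//     n1 -->|citation| n0
```

Wrapping the output in a `mermaid` code fence inside a `.md` file is what lets GitHub and GitLab render it as a diagram.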

### Interactive Viewer

- **Cytoscape.js**: force-directed layout
- **Dark glassmorphism** UI with blur effects
- **Cluster coloring**: papers colored by community
- **Node sizing**: scaled by influence score
- **Edge coloring**: by relationship type
- **Search**: real-time filter by title, venue, or DOI
- **Neighbor highlighting**: click a paper to highlight its connections
- **Detail panel**: paper metadata with DOI/URL links
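
Cluster coloring and node sizing map naturally onto Cytoscape.js's elements JSON, in which every node and edge is an object with a `data` field, and onto stylesheet `data(...)` mappers that read those fields. A sketch of the conversion, assuming each paper carries a community index and an influence score in [0, 1]; the field names, palette, and pixel range are assumptions, not the viewer's actual schema.

```typescript
// Illustrative input shapes; field names are assumptions, not the viewer's actual schema.
interface ViewerPaper { id: string; title: string; cluster: number; score: number }
interface ViewerEdge { source: string; target: string; type: string }
type CyElement = { data: Record<string, string | number> };

// Arbitrary categorical palette; community IDs are assumed to be non-negative integers.
const PALETTE = ["#4e79a7", "#f28e2b", "#e15759", "#76b7b2", "#59a14f", "#edc948"];

function toCytoscapeElements(
  papers: ViewerPaper[],
  edges: ViewerEdge[],
  minSize = 12,
  maxSize = 60,
): CyElement[] {
  const nodes: CyElement[] = papers.map((p) => ({
    data: {
      id: p.id,
      label: p.title,
      // Clamp the influence score to [0, 1], then scale it linearly to a pixel size.
      size: minSize + Math.min(1, Math.max(0, p.score)) * (maxSize - minSize),
      color: PALETTE[p.cluster % PALETTE.length],
    },
  }));
  // An element whose data has `source` and `target` is treated as an edge by Cytoscape.js.
  const links: CyElement[] = edges.map((e, i) => ({
    data: { id: `e${i}`, source: e.source, target: e.target, type: e.type },
  }));
  return [...nodes, ...links];
}

// Stylesheet entries that read those fields through Cytoscape.js `data(...)` mappers.
const style = [
  {
    selector: "node",
    style: { width: "data(size)", height: "data(size)", "background-color": "data(color)", label: "data(label)" },
  },
  { selector: 'edge[type = "citation"]', style: { "line-color": "#999999" } },
];

const elements = toCytoscapeElements(
  [
    { id: "p1", title: "Paper one", cluster: 0, score: 1 },
    { id: "p2", title: "Paper two", cluster: 1, score: 0.25 },
  ],
  [{ source: "p2", target: "p1", type: "citation" }],
);
console.log(elements.length, style.length); // 3 2
```

In a browser page these arrays would be passed to `cytoscape({ container, elements, style, layout: { name: "cose" } })`, where `cose` is Cytoscape.js's built-in force-directed layout; the real viewer may use a different layout or styling.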

### NLP Pipeline

- Deterministic TF-IDF with no stemming, for reproducible results
- 175+ stopwords, including common academic terms
- Cosine similarity with a configurable threshold
- Dictionary-based entity extraction (120+ known entities)
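
The TF-IDF and cosine-similarity steps are short enough to sketch in full. This version lowercases, splits on non-alphanumeric characters, drops stopwords, and applies no stemming, in line with the bullets above; the tiny stopword set, the `ln(N / df)` IDF variant, and the `0.1` default threshold are assumptions rather than the package's actual settings.

```typescript
// A tiny illustrative stopword list; PaperGraph's list is much larger (175+ terms).
const STOPWORDS = new Set(["the", "a", "an", "of", "and", "for", "in", "on", "we", "this", "paper", "using"]);

// Lowercase, split on anything that is not a letter or digit, drop stopwords. No stemming.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 1 && !STOPWORDS.has(t));
}

// TF-IDF vectors with tf = count / document length and idf = ln(N / df).
function tfidfVectors(docs: string[]): Map<string, number>[] {
  const tokenized = docs.map(tokenize);
  const df = new Map<string, number>();
  for (const tokens of tokenized) {
    for (const term of new Set(tokens)) df.set(term, (df.get(term) ?? 0) + 1);
  }
  return tokenized.map((tokens) => {
    const counts = new Map<string, number>();
    for (const term of tokens) counts.set(term, (counts.get(term) ?? 0) + 1);
    const vec = new Map<string, number>();
    for (const [term, count] of counts) {
      const weight = (count / tokens.length) * Math.log(docs.length / df.get(term)!);
      if (weight > 0) vec.set(term, weight); // terms found in every document get weight 0
    }
    return vec;
  });
}

function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const [term, w] of a) {
    normA += w * w;
    dot += w * (b.get(term) ?? 0);
  }
  for (const w of b.values()) normB += w * w;
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Emit an undirected similarity edge [i, j, score] for every pair at or above the threshold.
function similarityEdges(docs: string[], threshold = 0.1): [number, number, number][] {
  const vecs = tfidfVectors(docs);
  const edges: [number, number, number][] = [];
  for (let i = 0; i < vecs.length; i++) {
    for (let j = i + 1; j < vecs.length; j++) {
      const score = cosine(vecs[i], vecs[j]);
      if (score >= threshold) edges.push([i, j, score]);
    }
  }
  return edges;
}

const abstracts = [
  "Self-attention replaces recurrence in sequence transduction models.",
  "Multi-head self-attention improves sequence transduction quality.",
  "Graph convolutional networks for semi-supervised node classification.",
];
// Only abstracts 0 and 1 share vocabulary, so only that pair is linked.
console.log(similarityEdges(abstracts).map(([i, j, s]) => `${i}-${j}: ${s.toFixed(2)}`));
```

Skipping stemming means "transformer" and "transformers" count as different terms; the trade-off is slightly sparser vectors in exchange for output that does not depend on any stemmer's rules.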

### Infrastructure

- **Rate limiting**: per-source token bucket that keeps requests within each API's limits
- **Retry logic**: exponential backoff with jitter for 429 and 5xx responses
- **Response cache**: file-system cache keyed by SHA-256 hashes (24-hour TTL by default)
- **SQLite with WAL**: 10-table schema; write-ahead logging lets reads proceed concurrently with writes
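
Two of these pieces are easy to show in isolation: the retry delay and the cache key. The sketch below uses "full jitter" (a uniform random delay between zero and an exponentially growing ceiling) and hashes the HTTP method plus the full URL with SHA-256 via Node's built-in `node:crypto`. The base delay, cap, retry count, key recipe, and function names are all assumptions, not documented PaperGraph behavior.

```typescript
import { createHash } from "node:crypto";

// Retry only on rate limiting (429) and server errors (5xx).
function isRetryable(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}

// Exponential backoff with "full jitter": a uniform random delay in [0, min(cap, base * 2^attempt)).
function backoffDelayMs(
  attempt: number,
  baseMs = 500,
  capMs = 30_000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Call `request` until it returns a non-retryable status or the retries run out.
async function withRetry<T extends { status: number }>(
  request: () => Promise<T>,
  maxRetries = 4,
): Promise<T> {
  let attempt = 0;
  while (true) {
    const response = await request();
    if (!isRetryable(response.status) || attempt >= maxRetries) return response;
    await new Promise<void>((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    attempt++;
  }
}

// Cache key: SHA-256 hex digest of the method and the full URL (query string included).
function cacheKey(method: string, url: string): string {
  return createHash("sha256").update(`${method.toUpperCase()} ${url}`).digest("hex");
}

// An entry is fresh while younger than the TTL (24 hours by default, as listed above).
function isFresh(storedAtMs: number, nowMs: number, ttlMs = 24 * 60 * 60 * 1000): boolean {
  return nowMs - storedAtMs < ttlMs;
}

console.log(cacheKey("GET", "https://api.openalex.org/works?search=transformer").length); // 64
console.log(isFresh(0, 60 * 60 * 1000)); // true (one hour old)
```

Jitter spreads out retries from concurrent requests that failed at the same moment, so they do not all hit a rate-limited API again in lockstep.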

---

## Tech Stack

| Layer | Technology |
|-------|------------|
| Language | TypeScript (ESM, NodeNext) |
| Runtime | Node.js 20+ |
| CLI | Commander.js |
| HTTP | undici (Node.js built-in HTTP/1.1 & HTTP/2) |
| Database | better-sqlite3 (WAL mode) |
| Graph | graphology + graphology-communities |
| Logging | pino (JSON + pretty-print) |
| Config | cosmiconfig |
| Bundler | tsup |
| Testing | vitest (86 tests, 6 suites) |

---

## Quick Start

```bash
# Install dependencies
npm install

# Build
npm run build

# Run
npx papergraph build -t "transformer attention" -o graph.db
npx papergraph view -i graph.db
```

See [USAGE.md](./USAGE.md) for detailed usage instructions.

---

## License

MIT