lumina-wiki 1.6.2 → 1.7.0

package/CHANGELOG.md CHANGED
@@ -5,6 +5,35 @@ Format follows [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
 
  ## [Unreleased]
 
+ ## [1.7.0] - 2026-06-16
+
+ ### Added
+
+ - **Advanced paper ranking** via the new research-pack skill
+   `/lumi-research-rank`. It scores an already-ingested paper and records the
+   results on its source page, both as a machine-readable `ranking:` frontmatter
+   block and a human-readable `## Ranking` scorecard. Re-running refreshes the
+   ranking and preserves any notes inside `<!-- user-edited -->` markers.
+ - **Citation influence signal**: surfaces Semantic Scholar's influential-citation
+   count alongside the raw citation count (reuses the existing `fetch_s2.py`; no
+   new key required).
+ - **4C qualitative rubric** (Correctness, Clarity, Contribution, Context, each
+   scored 1-5) produced with a three-pass reading method to keep the assessment
+   efficient. Scores are explicitly recorded as LLM-assessed with a timestamp.
+ - **Venue prestige** recorded from the agent's own knowledge and explicitly
+   flagged as an estimate (`venue_source: llm-estimated`) — no live API or
+   bundled dataset.
+ - **Optional, key-gated influence fetchers** `fetch_scite.py` (Scite.ai Smart
+   Citation tallies) and `fetch_altmetric.py` (Altmetric attention score). Both
+   degrade gracefully: with no key set they exit with a clear message and the
+   skill simply skips that signal. New `.env` keys `SCITE_API_KEY` and
+   `ALTMETRIC_API_KEY`.
+
+ ### Changed
+
+ - Source page schema gains an optional `ranking` frontmatter object (no change
+   required for existing un-ranked pages).
+
  ## [1.6.2] - 2026-06-15
 
  ### Fixed
package/README.md CHANGED
@@ -185,6 +185,7 @@ These are the commands you can use when chatting with your AI agent.
185
185
  | | `/lumi-research-survey` | Create a survey or summary from existing knowledge. |
186
186
  | | `/lumi-research-prefill` | Seed foundational concepts to avoid duplicates. |
187
187
  | | `/lumi-research-topic` | Create a topic page at `wiki/topics/<slug>.md` by gathering related concepts and sources already in your wiki. The AI proposes what to include and you confirm before anything is written. Use this after several `/lumi-ingest` runs when you want to give a theme its own page. |
188
+ | | `/lumi-research-rank` | Score a paper you have already ingested so you know what to read first. It looks up how influential the paper is (citation signals), estimates how respected its venue is, and rates its quality on four points — Correctness, Clarity, Contribution, and Context — then adds a clear scorecard to the paper's page. Measured numbers and the AI's own estimates are always kept separate. |
188
189
  | | `/lumi-research-setup` | Help configure API keys for research tools. |
189
190
  | | `/lumi-research-watch-run` | Run one scheduled-discovery pass over your watchlist (topics + RSS / Atom feeds). Polls only when you ask. |
190
191
  | **Reading** | `/lumi-reading-chapter-ingest` | Ingest a book chapter by chapter. |
@@ -209,7 +210,7 @@ Lumina-Wiki is evolving rapidly. Here is our user-facing roadmap:
209
210
  - [x] **Improved CI/CD:** Native support for Bun and Node 22 environments. *(shipped in v1.2)*
210
211
  - [x] **Global Source Expansion:** Direct integration with OpenAlex, CORE, and Unpaywall for reliable DOI-to-PDF resolution. *(shipped in v1.6)*
211
212
  - [x] **RSS & Blog Monitoring:** Automatically surface new papers from your favorite lab blogs and journals via `type: feed` watchlist items. *(shipped in v1.6)*
212
- - [ ] **Advanced Paper Ranking:** See influence scores and quality signals for your research papers.
213
+ - [x] **Advanced Paper Ranking:** See influence scores and quality signals for your research papers via `/lumi-research-rank`. *(shipped in v1.7)*
213
214
 
214
215
  **Long-term (Deep Research & Integration)**
215
216
  - [ ] **Image OCR & Scanned PDFs:** Ingest screenshots and scanned PDFs into your wiki.
package/README.vi.md CHANGED
@@ -185,6 +185,7 @@ Xem [Hướng dẫn Nâng cao](docs/user-guide/advanced-qmd.vi.md) để biết
185
185
  | | `/lumi-research-survey` | Tạo một bài tổng quan/khảo sát từ kiến thức hiện có. |
186
186
  | | `/lumi-research-prefill` | Tạo trước các khái niệm nền tảng để tránh trùng lặp. |
187
187
  | | `/lumi-research-topic` | Gom các khái niệm và nguồn liên quan trong wiki thành một trang chủ đề tại `wiki/topics/<slug>.md`. AI đề xuất danh sách để bạn xem và xác nhận trước khi trang được tạo. Dùng sau khi đã nạp nhiều tài liệu và muốn tổng hợp một nhóm ý tưởng thành trang riêng. |
188
+ | | `/lumi-research-rank` | Chấm điểm một bài báo bạn đã nạp để biết nên đọc gì trước. Nó tra mức độ ảnh hưởng của bài (tín hiệu trích dẫn), ước lượng uy tín nơi công bố, và đánh giá chất lượng theo bốn tiêu chí — Tính đúng đắn, Sự rõ ràng, Đóng góp, và Bối cảnh — rồi thêm một bảng điểm rõ ràng vào trang của bài báo. Các con số đo được và ước lượng của AI luôn được tách bạch. |
188
189
  | | `/lumi-research-setup` | Giúp cấu hình API key cho các công cụ nghiên cứu. |
189
190
  | | `/lumi-research-watch-run` | Chạy một lượt khám phá theo lịch dựa trên watchlist (chủ đề + nguồn RSS / Atom). Chỉ chạy khi bạn yêu cầu. |
190
191
  | **Reading** | `/lumi-reading-chapter-ingest`| Nạp kiến thức sách theo từng chương. |
@@ -209,7 +210,7 @@ Lumina-Wiki đang phát triển nhanh chóng. Dưới đây là lộ trình hư
209
210
  - [x] **Cải thiện CI/CD:** Hỗ trợ chính thức cho môi trường Bun và Node 22. *(đã phát hành trong v1.2)*
210
211
  - [x] **Mở rộng nguồn dữ liệu toàn cầu:** Tích hợp trực tiếp với OpenAlex, CORE và Unpaywall để tra cứu DOI-to-PDF đáng tin cậy. *(ra mắt trong v1.6)*
211
212
  - [x] **Theo dõi RSS & Blog:** Tự động phát hiện bài báo mới từ các blog phòng thí nghiệm và tạp chí yêu thích qua các mục `type: feed` trong watchlist. *(ra mắt trong v1.6)*
212
- - [ ] **Xếp hạng bài báo nâng cao:** Xem điểm số ảnh hưởng và tín hiệu chất lượng cho các nghiên cứu của bạn.
213
+ - [x] **Xếp hạng bài báo nâng cao:** Xem điểm số ảnh hưởng và tín hiệu chất lượng cho các nghiên cứu của bạn qua `/lumi-research-rank`. *(ra mắt ở v1.7)*
213
214
 
214
215
  **Dài hạn (Nghiên cứu sâu & Tích hợp)**
215
216
  - [ ] **OCR ảnh & PDF scan:** Nạp ảnh chụp màn hình và PDF dạng scan vào wiki.
package/README.zh.md CHANGED
@@ -186,6 +186,7 @@ npx skills add https://github.com/tobi/qmd --skill qmd
186
186
  | | `/lumi-research-survey` | 从现有知识创建综述/调研。 |
187
187
  | | `/lumi-research-prefill` | 预先生成基础概念,避免重复。 |
188
188
  | | `/lumi-research-topic` | 把 wiki 中已有的相关概念和来源汇聚成一个主题页,保存在 `wiki/topics/<slug>.md`。AI 会提议收录哪些内容,由你确认后再生成页面。多次 `/lumi-ingest` 之后,用它把一组相关想法整理成独立的主题页。 |
189
+ | | `/lumi-research-rank` | 给已经纳入的论文打分,帮你决定先读哪一篇。它会查询论文的影响力(引用信号)、估计发表场所的声望,并从四个方面评估质量——正确性、清晰度、贡献、背景——然后在论文页面上加上一份清晰的评分卡。实测数据与 AI 的估计始终分开标注。 |
189
190
  | | `/lumi-research-setup` | 帮助配置研究工具的 API key。 |
190
191
  | | `/lumi-research-watch-run` | 基于 watchlist 运行一次计划式发现(主题 + RSS / Atom 源)。仅在你要求时才运行。 |
191
192
  | **Reading** | `/lumi-reading-chapter-ingest`| 按章节导入书籍知识。 |
@@ -210,7 +211,7 @@ Lumina-Wiki 正在快速演进。这是我们的用户路线图:
210
211
  - [x] **改进的 CI/CD:** 正式支持 Bun 和 Node 22 环境。*(v1.2 已发布)*
211
212
  - [x] **全球数据源扩展:** 直接集成 OpenAlex、CORE 和 Unpaywall,实现可靠的 DOI-to-PDF 解析。*(在 v1.6 中发布)*
212
213
  - [x] **RSS 与博客监控:** 通过 watchlist 中的 `type: feed` 项,自动从您喜爱的实验室博客和期刊中发现新论文。*(在 v1.6 中发布)*
213
- - [ ] **高级论文排名:** 查看研究论文的影响力评分和质量信号。
214
+ - [x] **高级论文排名:** 通过 `/lumi-research-rank` 查看研究论文的影响力评分和质量信号。*(在 v1.7 中发布)*
214
215
 
215
216
  **长期计划(深度研究与集成)**
216
217
  - [ ] **图片 OCR 与扫描 PDF:** 将截图与扫描版 PDF 导入维基。
package/package.json CHANGED
@@ -1,7 +1,7 @@
1
1
  {
2
2
  "$schema": "https://json.schemastore.org/package.json",
3
3
  "name": "lumina-wiki",
4
- "version": "1.6.2",
4
+ "version": "1.7.0",
5
5
  "description": "Domain-agnostic, multi-IDE wiki scaffolder — Karpathy's LLM-Wiki vision, cross-platform and pack-based.",
6
6
  "keywords": [
7
7
  "llm-wiki",
@@ -69,6 +69,8 @@
69
69
  "src/tools/fetch_core.py",
70
70
  "src/tools/resolve_pdf.py",
71
71
  "src/tools/fetch_rss.py",
72
+ "src/tools/fetch_scite.py",
73
+ "src/tools/fetch_altmetric.py",
72
74
  "src/tools/id_utils.py",
73
75
  "src/tools/requirements.txt",
74
76
  "CHANGELOG.md",
@@ -131,7 +131,7 @@ const RESEARCH_TOOL_FILES = [
131
131
  '_env.py', '_cache.py', 'discover.py', 'init_discovery.py', 'prepare_source.py',
132
132
  'fetch_arxiv.py', 'fetch_wikipedia.py', 'fetch_s2.py', 'fetch_deepxiv.py',
133
133
  'fetch_openalex.py', 'fetch_unpaywall.py', 'fetch_core.py', 'resolve_pdf.py',
134
- 'fetch_rss.py',
134
+ 'fetch_rss.py', 'fetch_scite.py', 'fetch_altmetric.py',
135
135
  ];
136
136
 
137
137
  async function findEnclosingWorkspace(startDir) {
@@ -1234,6 +1234,7 @@ function getSkillDefs(packs) {
1234
1234
  { name: 'prefill', canonicalId: 'lumi-research-prefill', displayName: '/lumi-research-prefill' },
1235
1235
  { name: 'setup', canonicalId: 'lumi-research-setup', displayName: '/lumi-research-setup' },
1236
1236
  { name: 'topic', canonicalId: 'lumi-research-topic', displayName: '/lumi-research-topic' },
1237
+ { name: 'rank', canonicalId: 'lumi-research-rank', displayName: '/lumi-research-rank' },
1237
1238
  { name: 'watchlist', canonicalId: 'lumi-research-watchlist', displayName: '/lumi-research-watchlist' },
1238
1239
  { name: 'watch-run', canonicalId: 'lumi-research-watch-run', displayName: '/lumi-research-watch-run' },
1239
1240
  ];
@@ -1322,6 +1323,10 @@ async function renderEnvExample(projectRoot) {
1322
1323
  `OPENALEX_API_KEY=\n\n` +
1323
1324
  `# DeepXiv token (optional; enables full-text PDF access)\n` +
1324
1325
  `DEEPXIV_TOKEN=\n\n` +
1326
+ `# Scite.ai API key (optional; enables Smart Citation tallies in /lumi-research-rank)\n` +
1327
+ `SCITE_API_KEY=\n\n` +
1328
+ `# Altmetric API key (optional; enables attention scores in /lumi-research-rank)\n` +
1329
+ `ALTMETRIC_API_KEY=\n\n` +
1325
1330
  `# arXiv does not require an API key in v0.1\n`;
1326
1331
  }
1327
1332
  await atomicWrite(destPath, content);
@@ -283,6 +283,7 @@ export const REQUIRED_FRONTMATTER = {
283
283
  { key: 'findings', type: 'array', required: false },
284
284
  { key: 'external_ids', type: 'object', required: false },
285
285
  { key: 'sources', type: 'array', required: false },
286
+ { key: 'ranking', type: 'object', required: false },
286
287
  ],
287
288
 
288
289
  // Concept page
@@ -0,0 +1,170 @@
+ ---
+ name: lumi-research-rank
+ description: >
+   Score an already-ingested paper's influence and quality. Fetches citation
+   influence (and optional Scite/Altmetric signals when keys are set), estimates
+   venue prestige, and runs a structured 4C quality assessment, then writes a
+   transparent ranking block onto the source page. Helps prioritize what to read
+   next.
+ allowed-tools:
+   - Bash
+   - Read
+   - Edit
+ ---
+
+ # /lumi-research-rank
+
+ ## Role
+
+ You are the wiki's paper-ranking assistant. For one or more source pages the
+ user names, you gather influence signals and produce a short quality scorecard,
+ then record them on the source page so the user can prioritize their reading.
+ You score papers; you never change a paper's summary, claims, or other content.
+
+ ## Context
+
+ Read `README.md` first. This skill is available only when the research pack is
+ installed. It works on papers already in `wiki/sources/`; if a paper has not
+ been ingested yet, suggest `/lumi-ingest` first.
+
+ Every figure you record must say where it came from and when. Influence numbers
+ come from APIs (Semantic Scholar always; Scite and Altmetric only when the user
+ has set keys). Venue prestige and the 4C quality scores come from your own
+ judgment — always mark those as estimates, never as authoritative facts.
+
+ References:
+ - Read `references/three-pass.md` before reading the paper, to keep the
+   assessment efficient.
+ - Read `references/4c-rubric.md` before scoring quality.
+
+ ## Instructions
+
+ 1. **Resolve the target.** Take the slug(s) the user named. To confirm a slug
+    exists and read its identifiers:
+
+    ```bash
+    node _lumina/scripts/wiki.mjs read-meta <slug>
+    ```
+
+    Note the `external_ids` block. You need an `s2` id, `doi`, or `arxiv` id for
+    the influence lookup, and a `doi` for the optional Scite/Altmetric lookups.
+    If none are present, you can still do the qualitative 4C assessment — just
+    tell the user the influence numbers are unavailable.
+
+ 2. **Fetch citation influence (uses the optional Semantic Scholar key).**
+
+    ```bash
+    python3 _lumina/tools/fetch_s2.py paper <s2-id|arXiv:ID|DOI:ID>
+    ```
+
+    From the result, keep `influentialCitationCount`, `citationCount`, and the
+    `journal` name. These become `influential_citations`, `citation_count`, and
+    the venue hint, with `citation_source: semantic-scholar`.
+
+    This tool needs `SEMANTIC_SCHOLAR_API_KEY`. If it is not set, the tool exits
+    with a clear "no key set" message (exit code 2) — treat this exactly like the
+    optional signals in step 3: skip the citation-influence numbers, continue
+    with the qualitative assessment, and tell the user that influence figures are
+    unavailable until they add the key (offer `/lumi-research-setup`). Do not
+    abort the ranking over a missing S2 key.
+
+ 3. **Optional key-gated signals.** Only attempt these when the paper has a DOI.
+    Each tool exits with a clear "no key set" message (exit code 2) when the key
+    is missing — if that happens, skip the signal silently and continue; do not
+    treat it as an error or ask the user to add a key unless they want it.
+
+    ```bash
+    python3 _lumina/tools/fetch_scite.py tally <doi>
+    python3 _lumina/tools/fetch_altmetric.py doi <doi>
+    ```
+
+    A `found: false` result means the service has no data for that paper — record
+    nothing for that signal rather than zeros.
+
+ 4. **Estimate venue prestige from your own knowledge.** Using the journal or
+    conference name, state a tier such as "CORE A*", "SJR Q1", or "top-tier
+    workshop" if you are reasonably confident. This is your estimate, not a
+    looked-up fact: always set `venue_source: llm-estimated`. If you are unsure,
+    leave the venue tier out rather than guess.
+
+ 5. **Assess quality (4C rubric).** Follow `references/three-pass.md` to read the
+    paper efficiently, then score Correctness, Clarity, Contribution, and Context
+    from 1 to 5 each per `references/4c-rubric.md`. Keep a one-line rationale for
+    each score.
+
+ 6. **Write the ranking block.** Assemble a flat object of the values you have
+    (omit keys you do not) and store it on the page. Use `--json-value`:
+
+    ```bash
+    node _lumina/scripts/wiki.mjs set-meta <slug> ranking '{
+      "influential_citations": 42,
+      "citation_count": 318,
+      "citation_source": "semantic-scholar",
+      "citation_fetched": "YYYY-MM-DD",
+      "venue_name": "NeurIPS",
+      "venue_tier": "CORE A*",
+      "venue_source": "llm-estimated",
+      "venue_estimated": "YYYY-MM-DD",
+      "scite_supporting": 12,
+      "scite_contrasting": 1,
+      "scite_mentioning": 64,
+      "scite_fetched": "YYYY-MM-DD",
+      "altmetric_score": 287,
+      "altmetric_fetched": "YYYY-MM-DD",
+      "quality_correctness": 4,
+      "quality_clarity": 5,
+      "quality_contribution": 4,
+      "quality_context": 3,
+      "quality_source": "llm",
+      "quality_assessed": "YYYY-MM-DD"
+    }' --json-value
+    ```
+
+    Use today's date (`node _lumina/scripts/wiki.mjs read-meta` output or the
+    system date) for the `_fetched` / `_assessed` / `_estimated` fields. The
+    `ranking` field is a one-level map of plain values — do not nest objects
+    inside it.
+
+ 7. **Write the human-readable scorecard.** The `## Ranking` section holds a
+    **managed region** bounded by marker comments:
+
+    ```markdown
+    ## Ranking
+
+    <!-- lumina:ranking -->
+    (influence numbers and the 4C scorecard with one-line rationales go here)
+    <!-- /lumina:ranking -->
+    ```
+
+    Refresh rules, so re-running is safe in any session:
+    - If the markers already exist, **replace only the text between them** with
+      the new scorecard. Use `Edit` with the whole marked block (markers
+      included) as the search target so you never create a second `## Ranking`.
+    - If the section does not exist yet, add it once, with both markers.
+    - **Never write inside or remove `<!-- user-edited -->` blocks**, and keep any
+      user prose that sits *outside* the `lumina:ranking` markers untouched.
+
+    Put the influence figures and their dates inside the managed region so the
+    provenance is visible to a reader who never opens the frontmatter.
+
+ 8. **Log the activity.**
+
+    ```bash
+    node _lumina/scripts/wiki.mjs log lumi-research-rank "ranked <slug>: infl=<n>, 4C=<c/c/c/c>"
+    ```
+
+ 9. **Report to the user in plain language.** Summarize what you found — how
+    influential the paper is, any quality concerns from the 4C pass, and where it
+    sits relative to other ranked papers if you know. Clearly separate measured
+    numbers from your own estimates. Do not present your venue guess or 4C scores
+    as hard facts.
+
+ ## Boundaries
+
+ - Ranking is **additive metadata**. Do not edit the summary, key claims,
+   evidence, links, or any other section of the source page.
+ - Do not create new pages, graph edges, or index entries.
+ - Do not invent citation numbers. If an API returns nothing, say the number is
+   unavailable rather than recording a zero.
+ - Re-running on the same paper refreshes the ranking; it must not duplicate the
+   `## Ranking` section or clobber user notes.
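The refresh rules in step 7 of the skill can be illustrated with a small Python sketch. This is not code from the package: the function name `refresh_ranking` is hypothetical, and it assumes that a page without the `lumina:ranking` markers has no `## Ranking` section yet. It only shows the idea of rewriting the managed region while leaving everything outside it untouched.

```python
import re

OPEN_MARKER = "<!-- lumina:ranking -->"
CLOSE_MARKER = "<!-- /lumina:ranking -->"

# Non-greedy match of the whole managed region, markers included.
_REGION = re.compile(
    re.escape(OPEN_MARKER) + r".*?" + re.escape(CLOSE_MARKER), re.DOTALL
)


def refresh_ranking(page: str, scorecard: str) -> str:
    """Return the page text with the managed ranking region refreshed.

    Hypothetical helper: if the markers exist, only the text between them is
    replaced; otherwise a single `## Ranking` section is appended with both
    markers. Text outside the markers is never modified.
    """
    block = f"{OPEN_MARKER}\n{scorecard.strip()}\n{CLOSE_MARKER}"
    if _REGION.search(page):
        # A callable replacement avoids backslash processing in the scorecard.
        return _REGION.sub(lambda _match: block, page, count=1)
    return page.rstrip("\n") + "\n\n## Ranking\n\n" + block + "\n"
```

Calling the function twice on the same page yields one `## Ranking` section whose managed region holds only the latest scorecard, which is the re-run behavior the Boundaries list asks for.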
@@ -0,0 +1,62 @@
+ # The 4C Quality Rubric
+
+ Score each of the four dimensions from **1 to 5**. These are *your* qualitative
+ judgments, not measured facts — record them with `quality_source: llm` and keep
+ one short rationale per score. When the evidence for a dimension is thin (e.g.
+ you only read the abstract), score conservatively and say so.
+
+ | Score | Meaning            |
+ |-------|--------------------|
+ | 5     | Exceptional        |
+ | 4     | Strong             |
+ | 3     | Adequate / typical |
+ | 2     | Weak               |
+ | 1     | Poor / unreliable  |
+
+ ## Correctness — *Is the work sound?*
+
+ Methodological integrity and the absence of obvious flaws. Look for: sensible
+ experimental design, appropriate baselines and ablations, honest treatment of
+ limitations, claims that are actually supported by the evidence presented,
+ reproducibility signals (released code/data, clear hyperparameters).
+
+ Lower the score for: overclaiming beyond the evidence, missing baselines,
+ cherry-picked results, statistical weaknesses, or unaddressed confounds.
+
+ ## Clarity — *Can a reader follow it?*
+
+ Logical flow and presentation quality. Look for: a clear problem statement, a
+ readable structure, well-labeled figures and tables, defined notation, and a
+ contribution that is easy to state in one sentence.
+
+ Lower the score for: disorganized structure, undefined terms, figures that do
+ not support the text, or a thesis you have to reconstruct yourself.
+
+ ## Contribution — *Does it matter?*
+
+ Novelty and impact on the field. Look for: a genuinely new idea, method, result,
+ dataset, or synthesis; a meaningful improvement over prior work; usefulness to
+ other researchers or practitioners.
+
+ Lower the score for: incremental tweaks framed as breakthroughs, results already
+ well established elsewhere, or a contribution that is hard to identify.
+
+ ## Context — *Is it well situated?*
+
+ Quality of citations and relationship to prior work. Look for: fair and
+ reasonably complete related-work coverage, accurate characterization of what
+ came before, and a clear positioning of this work against it.
+
+ Lower the score for: thin or one-sided citations, ignoring obvious prior art, or
+ mischaracterizing competing approaches.
+
+ ## Recording
+
+ The four scores go into the `ranking` frontmatter as
+ `quality_correctness`, `quality_clarity`, `quality_contribution`,
+ `quality_context`, plus `quality_source: llm` and `quality_assessed: <date>`.
+ The rationales go into the human-readable `## Ranking` section, not the
+ frontmatter.
+
+ Do **not** invent a single "overall" number. The four dimensions are reported
+ separately so a reader can weigh them for their own purpose.
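As an illustration of the Recording rules in the rubric, the sketch below turns four scores into the flat `quality_*` fields. The helper name `quality_fields` and its validation behavior are assumptions made for this example, not part of the package; the field names and the 1 to 5 range come from the rubric itself. Note that it deliberately computes no overall score.

```python
from __future__ import annotations

from datetime import date

# The four dimensions named by the rubric, in its own order.
DIMENSIONS = ("correctness", "clarity", "contribution", "context")


def quality_fields(scores: dict[str, int], assessed: date) -> dict[str, object]:
    """Build the flat quality_* frontmatter fields (hypothetical helper).

    Raises ValueError if a dimension is missing or a score is not an
    integer from 1 to 5. No aggregate "overall" number is produced.
    """
    fields: dict[str, object] = {}
    for dim in DIMENSIONS:
        if dim not in scores:
            raise ValueError(f"missing score for {dim}")
        value = scores[dim]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{dim} must be an integer, got {value!r}")
        if not 1 <= value <= 5:
            raise ValueError(f"{dim} must be between 1 and 5, got {value}")
        fields[f"quality_{dim}"] = value
    fields["quality_source"] = "llm"
    fields["quality_assessed"] = assessed.isoformat()
    return fields
```

The one-line rationales are intentionally absent from the returned map, since the rubric places them in the `## Ranking` section rather than the frontmatter.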
@@ -0,0 +1,40 @@
+ # Three-Pass Reading for Efficient Assessment
+
+ Do not read a whole paper end-to-end before scoring it. Use the three-pass
+ method to spend context budget where it matters and to stop early when the paper
+ clearly does not warrant a deep read.
+
+ ## Pass 1 — Shallow overview (always)
+
+ Read only: title, abstract, section headings, figures/tables captions, and the
+ conclusion. Goal: understand the claimed contribution and the shape of the
+ evidence.
+
+ After Pass 1 you should be able to draft a *provisional* Contribution and
+ Clarity score and decide whether a deeper read is worthwhile. If the paper is
+ clearly out of scope for the user's interest, stop here and tell them — record
+ only the influence numbers and a note that quality was not deeply assessed.
+
+ ## Pass 2 — Targeted read (usual depth)
+
+ Read the introduction, the method/approach, the main results, and the
+ related-work section. Skim proofs and appendices. Goal: judge whether the claims
+ are supported and whether prior work is fairly situated.
+
+ Pass 2 is normally enough to finalize all four 4C scores with confidence.
+
+ ## Pass 3 — Deep read (only when needed)
+
+ Reserve for papers that are central to the user's work, surprising, or where
+ correctness is in doubt. Re-derive key arguments, scrutinize assumptions, and
+ check whether the experiments actually test the claims. Goal: a defensible
+ Correctness score for a paper that matters.
+
+ ## Budgeting
+
+ - Most papers: Pass 1 + Pass 2.
+ - Skip to a Pass-1-only verdict for clearly off-topic or low-relevance papers.
+ - Escalate to Pass 3 sparingly — it is the most expensive.
+
+ If you only completed Pass 1, say so in your report and score conservatively;
+ do not present a shallow read as a thorough evaluation.
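The Budgeting rules above amount to a small decision procedure. The sketch below encodes them in Python purely for illustration; the function `choose_passes` and its boolean parameters are hypothetical, and in practice these judgments are made by the agent while reading, not computed from flags.

```python
def choose_passes(
    on_topic: bool,
    central: bool = False,
    surprising: bool = False,
    correctness_in_doubt: bool = False,
) -> int:
    """Return how many reading passes to spend on a paper (1, 2, or 3).

    Hypothetical encoding of the budgeting guidance: off-topic papers get a
    Pass-1-only verdict, most papers get two passes, and the third pass is
    reserved for central, surprising, or doubtful papers.
    """
    if not on_topic:
        return 1
    if central or surprising or correctness_in_doubt:
        return 3
    return 2
```

A result of 1 also implies the reporting obligation stated above: the report should say that only a shallow read was done.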
@@ -14,5 +14,13 @@ OPENALEX_API_KEY=
14
14
  # Obtain at: https://deepxiv.com
15
15
  DEEPXIV_TOKEN=
16
16
 
17
+ # Scite.ai API key (optional; enables Smart Citation tallies in /lumi-research-rank)
18
+ # Obtain at: https://scite.ai/apis
19
+ SCITE_API_KEY=
20
+
21
+ # Altmetric API key (optional; enables attention scores in /lumi-research-rank)
22
+ # Obtain at: https://www.altmetric.com/products/altmetric-api/
23
+ ALTMETRIC_API_KEY=
24
+
17
25
  # arXiv does not require an API key in v0.1.
18
26
  # The fetcher uses the public XML API at export.arxiv.org.
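How a caller might treat the optional fetchers' exit codes can be sketched as follows. The exit codes (0, 2, 3) and the `found` field come from the skill text and the `fetch_altmetric.py` docstring in this diff; the function name `interpret_fetch`, the treatment of a payload without a `found` key as data, and the `score` field used in the usage note are assumptions for illustration only.

```python
import json
from typing import Any, Optional


def interpret_fetch(returncode: int, stdout: str) -> Optional[dict[str, Any]]:
    """Map a key-gated fetcher's result to a signal or None (hypothetical).

    Returns None when the signal should be skipped: exit code 2 (missing key
    or bad DOI) or a payload with found=false. Raises RuntimeError for any
    other non-zero exit code, such as the transient-error code 3.
    """
    if returncode == 2:
        # Key not set (or unusable DOI): skip this signal and continue.
        return None
    if returncode != 0:
        raise RuntimeError(f"fetcher failed with exit code {returncode}; retry later")
    payload = json.loads(stdout)
    # Assumption: a payload lacking "found" is treated as real data.
    if not payload.get("found", True):
        # The service has no data: record nothing rather than zeros.
        return None
    return payload
```

For example, `interpret_fetch(0, '{"found": true, "score": 287}')` returns the parsed dictionary, while `interpret_fetch(2, "")` returns `None` so the ranking can proceed without that signal.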
@@ -197,7 +197,7 @@ Skills live in `.agents/skills/` and are invoked via slash commands. Active inst
197
197
 
198
198
  {{#if pack_research}}### Pack: research
199
199
 
200
- Adds `/lumi-research-discover` (ranked candidate shortlist), `/lumi-research-watchlist` (choose topics for scheduled discovery with AI help), `/lumi-research-watch-run` (run one scheduled-discovery pass over the watchlist — topics + RSS / Atom feeds — only when you ask), `/lumi-research-survey` (narrative synthesis), `/lumi-research-prefill` (seed foundations/ to prevent concept duplication), `/lumi-research-topic` (cluster existing concepts and sources into a thematic topic page; AI proposes the cluster from the graph, you confirm before anything is written), `/lumi-research-setup` (interactive API key configuration).
200
+ Adds `/lumi-research-discover` (ranked candidate shortlist), `/lumi-research-watchlist` (choose topics for scheduled discovery with AI help), `/lumi-research-watch-run` (run one scheduled-discovery pass over the watchlist — topics + RSS / Atom feeds — only when you ask), `/lumi-research-survey` (narrative synthesis), `/lumi-research-prefill` (seed foundations/ to prevent concept duplication), `/lumi-research-topic` (cluster existing concepts and sources into a thematic topic page; AI proposes the cluster from the graph, you confirm before anything is written), `/lumi-research-rank` (score an already-ingested paper's citation influence and 4C quality, recorded on its source page; optional Scite/Altmetric signals when keys are set), `/lumi-research-setup` (interactive API key configuration).
201
201
  {{/if}}
202
202
  {{#if pack_reading}}### Pack: reading
203
203
 
@@ -196,7 +196,7 @@ Các skill nằm trong `.agents/skills/` và được gọi qua lệnh slash. C
196
196
 
197
197
  {{#if pack_research}}### Gói: research
198
198
 
199
- Thêm `/lumi-research-discover` (danh sách ứng viên được xếp hạng), `/lumi-research-watchlist` (chọn chủ đề để khám phá theo lịch với sự hỗ trợ AI), `/lumi-research-watch-run` (chạy một lượt khám phá theo lịch trên watchlist — chủ đề + nguồn RSS / Atom — chỉ khi bạn yêu cầu), `/lumi-research-survey` (tổng hợp dạng tường thuật), `/lumi-research-prefill` (tạo nền tảng để ngăn trùng lặp khái niệm), `/lumi-research-topic` (nhóm các khái niệm và nguồn hiện có thành trang chủ đề; AI đề xuất cụm từ đồ thị, bạn xác nhận trước khi ghi bất cứ thứ gì), `/lumi-research-setup` (cấu hình API key tương tác).
199
+ Thêm `/lumi-research-discover` (danh sách ứng viên được xếp hạng), `/lumi-research-watchlist` (chọn chủ đề để khám phá theo lịch với sự hỗ trợ AI), `/lumi-research-watch-run` (chạy một lượt khám phá theo lịch trên watchlist — chủ đề + nguồn RSS / Atom — chỉ khi bạn yêu cầu), `/lumi-research-survey` (tổng hợp dạng tường thuật), `/lumi-research-prefill` (tạo nền tảng để ngăn trùng lặp khái niệm), `/lumi-research-topic` (nhóm các khái niệm và nguồn hiện có thành trang chủ đề; AI đề xuất cụm từ đồ thị, bạn xác nhận trước khi ghi bất cứ thứ gì), `/lumi-research-rank` (chấm điểm mức độ ảnh hưởng trích dẫn và chất lượng 4C của một bài báo đã nạp, ghi vào trang nguồn của nó; có thêm tín hiệu Scite/Altmetric khi đã đặt key), `/lumi-research-setup` (cấu hình API key tương tác).
200
200
  {{/if}}
201
201
  {{#if pack_reading}}### Gói: reading
202
202
 
@@ -197,7 +197,7 @@
197
197
 
198
198
  {{#if pack_research}}### 包:research
199
199
 
200
- 添加 `/lumi-research-discover`(排名候选人简短列表)、`/lumi-research-watchlist`(借助 AI 为定期发现选择主题)、`/lumi-research-watch-run`(基于 watchlist 运行一次计划式发现——主题 + RSS / Atom 源——仅在您要求时执行)、`/lumi-research-survey`(叙事综合)、`/lumi-research-prefill`(播种基础知识以防止概念重复)、`/lumi-research-topic`(将现有概念和来源聚类为主题页面;AI 从图谱中提出聚类,您在写入任何内容之前确认)、`/lumi-research-setup`(交互式 API 密钥配置)。
200
+ 添加 `/lumi-research-discover`(排名候选人简短列表)、`/lumi-research-watchlist`(借助 AI 为定期发现选择主题)、`/lumi-research-watch-run`(基于 watchlist 运行一次计划式发现——主题 + RSS / Atom 源——仅在您要求时执行)、`/lumi-research-survey`(叙事综合)、`/lumi-research-prefill`(播种基础知识以防止概念重复)、`/lumi-research-topic`(将现有概念和来源聚类为主题页面;AI 从图谱中提出聚类,您在写入任何内容之前确认)、`/lumi-research-rank`(为已纳入的论文评估引用影响力和 4C 质量,记录到其来源页面;设置密钥后还可加入 Scite/Altmetric 信号)、`/lumi-research-setup`(交互式 API 密钥配置)。
201
201
  {{/if}}
202
202
  {{#if pack_reading}}### 包:reading
203
203
 
@@ -14,6 +14,7 @@ lumi-research-watchlist,RW,research,anytime,,lumi-research-discover,false,,_lumi
14
14
  lumi-research-watch-run,RWR,research,anytime,lumi-research-watchlist,,false,,raw/discovered/**,run one scheduled-discovery pass over the watchlist (topics + RSS / Atom feeds)
15
15
  lumi-research-survey,RV,research,3-query,lumi-ingest,,false,,wiki/summary/**,narrative synthesis across a topic's sources
16
16
  lumi-research-topic,RT,research,3-query,lumi-ingest,,false,,wiki/topics/**,cluster concepts and sources into a thematic topic page
17
+ lumi-research-rank,RR,research,3-query,lumi-ingest,,false,[source-slug],wiki/sources/**,"score a paper's citation influence and 4C quality, recorded on its source page"
17
18
  lumi-research-setup,RSS,research,anytime,,,false,,.env,interactive API key configuration
18
19
  {{/if}}
19
20
  {{#if pack_reading}}
@@ -19,6 +19,28 @@ source_type: paper # paper | article | book | podcast | note | other
  importance: 3 # 1=niche 2=useful 3=field-standard 4=influential 5=seminal
  confidence: high # high | medium | low
  tags: []
+ ranking: # optional; written by /lumi-research-rank. Omit until the paper is ranked.
+   # Flat map of scalars (one level only, like external_ids). Only include keys you have.
+   influential_citations: 42 # Semantic Scholar influentialCitationCount
+   citation_count: 318 # Semantic Scholar citationCount
+   citation_source: semantic-scholar
+   citation_fetched: YYYY-MM-DD
+   venue_name: "NeurIPS"
+   venue_tier: "CORE A*" # free-form; agent-estimated, NOT authoritative
+   venue_source: llm-estimated
+   venue_estimated: YYYY-MM-DD
+   scite_supporting: 12 # only when SCITE_API_KEY is set
+   scite_contrasting: 1
+   scite_mentioning: 64
+   scite_fetched: YYYY-MM-DD
+   altmetric_score: 287 # only when ALTMETRIC_API_KEY is set
+   altmetric_fetched: YYYY-MM-DD
+   quality_correctness: 4 # 4C rubric, 1-5 each (LLM-assessed)
+   quality_clarity: 5
+   quality_contribution: 4
+   quality_context: 3
+   quality_source: llm
+   quality_assessed: YYYY-MM-DD
  ---
23
45
  ```
24
46
 
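The "flat map of scalars, only include keys you have" rule for `ranking` can be illustrated with a short Python sketch. The helper `flat_ranking` is hypothetical and not part of the package; it assumes that an unavailable signal is represented as `None` by the caller and that any container value counts as nesting.

```python
from typing import Any


def flat_ranking(candidate: dict[str, Any]) -> dict[str, Any]:
    """Return a one-level ranking map (hypothetical helper).

    Keys whose value is None are omitted instead of being written as zero
    or null. Nested containers are rejected, since the schema describes
    `ranking` as a flat map of scalars.
    """
    ranking: dict[str, Any] = {}
    for key, value in candidate.items():
        if value is None:
            # Signal unavailable: leave the key out entirely.
            continue
        if isinstance(value, (dict, list, tuple, set)):
            raise TypeError(f"ranking[{key!r}] must be a scalar, got {type(value).__name__}")
        ranking[key] = value
    return ranking
```

Passing `{"citation_count": 318, "altmetric_score": None}` yields a map with only `citation_count`, which matches the instruction to omit keys that were not obtained.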
@@ -30,6 +52,7 @@ tags: []
30
52
  - `## Related sources` — wikilinks to other source pages
31
53
  - `## People` — wikilinks to person pages
32
54
  - `## Open questions` — unanswered questions this source raises
55
+ - `## Ranking` — *(optional; managed by `/lumi-research-rank`)* human-readable influence signals and the 4C quality scorecard (Correctness, Clarity, Contribution, Context) with one-line rationales. Each figure states its source and date. The scorecard lives inside a managed region bounded by `<!-- lumina:ranking -->` and `<!-- /lumina:ranking -->`; only that region is rewritten on refresh. Free-text notes you add outside those markers (or inside `<!-- user-edited -->` markers) are preserved.
33
56
  - `## Notes` — free-form notes (user-owned; mark with `<!-- user-edited -->` to preserve on upgrade)
34
57
 
35
58
  ---
@@ -0,0 +1,229 @@
1
+ """
2
+ fetch_altmetric.py — Altmetric attention-score wrapper.
3
+
4
+ CLI:
5
+ python fetch_altmetric.py doi <doi>
6
+
7
+ Surfaces the Altmetric attention score and the broad social/news engagement
8
+ counts for a publication, used by /lumi-research-rank as a non-citation
9
+ influence signal.
10
+
11
+ Requires ALTMETRIC_API_KEY in the environment (via _env.load_env()). The signal
12
+ is key-gated: when the key is absent the tool exits 2 with an actionable message
13
+ so the skill can skip this signal and continue. JSON emitted to stdout on
14
+ success.
15
+
16
+ Exit codes:
17
+ 0 success (includes the "no Altmetric data for this DOI" case, found=false)
18
+ 2 user error (missing key, bad/empty DOI) — actionable
19
+ 3 internal/transient error (network, API 5xx) — includes retry hint
20
+ """
21
+
22
+ from __future__ import annotations
23
+
24
+ import argparse
25
+ import json
26
+ import sys
27
+ from pathlib import Path
28
+ from typing import Any
29
+ from urllib.parse import quote
30
+
31
+ import requests
32
+
33
+ # Import HTTP cache helper at module load (before any test patches requests.Session)
34
+ from _cache import wrap_session
35
+ from id_utils import normalize_external_id
36
+
37
+ # Import env loader; fall back to loading _env.py by file path when it is not importable as a module
38
+ try:
39
+ from _env import load_env
40
+ except ImportError:
41
+ import importlib.util
42
+ _spec = importlib.util.spec_from_file_location(
43
+ "_env", Path(__file__).parent / "_env.py"
44
+ )
45
+ _mod = importlib.util.module_from_spec(_spec) # type: ignore[arg-type]
46
+ _spec.loader.exec_module(_mod) # type: ignore[union-attr]
47
+ load_env = _mod.load_env
48
+
49
+ # ---------------------------------------------------------------------------
50
+ # Constants
51
+ # ---------------------------------------------------------------------------
52
+
53
+ ALTMETRIC_API_BASE = "https://api.altmetric.com/v1"
54
+ REQUEST_TIMEOUT = 30
55
+
56
+ ENV_KEY_NAME = "ALTMETRIC_API_KEY"
57
+ KEY_OBTAIN_URL = "https://www.altmetric.com/products/altmetric-api/"
58
+
59
+
60
+ # ---------------------------------------------------------------------------
61
+ # Helpers
62
+ # ---------------------------------------------------------------------------
63
+
64
+ def _err(msg: str) -> None:
65
+ print(msg, file=sys.stderr)
66
+
67
+
68
+ def _get_api_key() -> str:
69
+ """Load and return the Altmetric API key.
70
+
71
+ Raises SystemExit(2) with an actionable message if missing.
72
+ """
73
+ env = load_env()
74
+ key = env.get(ENV_KEY_NAME, "").strip()
75
+ if not key:
76
+ _err(
77
+ f"Error: {ENV_KEY_NAME} is not set.\n"
78
+ f"Altmetric attention is an optional, key-gated ranking signal.\n"
79
+ f"Set it in your project .env file or ~/.env:\n"
80
+ f" {ENV_KEY_NAME}=<your-key>\n"
81
+ f"Obtain an API key at: {KEY_OBTAIN_URL}"
82
+ )
83
+ sys.exit(2)
84
+ return key
85
+
86
+
87
+ def _clean_doi(raw: str) -> str:
88
+ """Validate and normalize a DOI; SystemExit(2) on an invalid value."""
89
+ r = normalize_external_id("doi", raw)
90
+ if not r["valid"] or not r["id"]:
91
+ _err(f"Error: {raw!r} is not a valid DOI (expected form like 10.1234/abcd).")
92
+ sys.exit(2)
93
+ return r["id"]
94
+
95
+
96
+ def _make_session() -> requests.Session:
97
+ session = requests.Session()
98
+ session.headers.update({
99
+ "User-Agent": "lumina-wiki/0.1 (research-pack; altmetric fetcher)",
100
+ })
101
+ # Strip the secret `key` query param from the cache key so the credential is
102
+ # never part of the on-disk cache slot and rotating the key still hits the
103
+ # same cache entry (mirrors fetch_openalex.py / fetch_unpaywall.py).
104
+ return wrap_session(session, namespace="altmetric", strip_params=["key"])
105
+
106
+
107
+ def _handle_response_errors(resp: requests.Response, context: str) -> None:
108
+ """Check for API-level errors and raise appropriate exceptions.
109
+
110
+ 404 is NOT raised here — a DOI with no Altmetric attention is a normal
111
+ "no data" outcome handled by the caller, not an error.
112
+
113
+ ValueError -> exit 2 (request problem); RuntimeError -> exit 3 (transient).
114
+ """
115
+ if resp.status_code == 404:
116
+ return
117
+ if resp.status_code in (401, 403):
118
+ raise ValueError(f"Altmetric rejected the API key (HTTP {resp.status_code}). Check ALTMETRIC_API_KEY.")
119
+ if resp.status_code == 429:
120
+ raise RuntimeError("Rate limit exceeded. Wait before retrying.")
121
+ if resp.status_code >= 500:
122
+ raise RuntimeError(f"Altmetric API returned HTTP {resp.status_code}")
123
+ if 400 <= resp.status_code < 500:
124
+ # Any other 4xx is a request-shape problem, not a transient one.
125
+ raise ValueError(f"Altmetric rejected the request for {context} (HTTP {resp.status_code}).")
126
+ resp.raise_for_status()
127
+
128
+
129
+ def _parse_json(resp: requests.Response, context: str) -> dict[str, Any] | None:
130
+ """Parse a JSON object body. Returns None when the body is JSON but not an
131
+ object (treated as "no usable data"). A non-JSON body is a server-side
132
+ problem and is raised as RuntimeError -> exit 3, not mislabeled exit 2.
133
+ """
134
+ try:
135
+ data = resp.json()
136
+ except ValueError as exc:
137
+ raise RuntimeError(f"Altmetric returned a non-JSON body for {context}: {exc}") from exc
138
+ return data if isinstance(data, dict) else None
139
+
140
+
141
+ # ---------------------------------------------------------------------------
142
+ # Core fetch
143
+ # ---------------------------------------------------------------------------
144
+
145
+ def fetch_doi(doi: str, api_key: str, session: requests.Session) -> dict[str, Any]:
146
+ """Fetch the Altmetric record for a single DOI.
147
+
148
+ Returns a normalized dict. When the DOI has no Altmetric attention, returns
149
+ {"found": False, "doi": doi} rather than raising.
150
+ """
151
+ url = f"{ALTMETRIC_API_BASE}/doi/{quote(doi, safe='/')}"
152
+ resp = session.get(url, params={"key": api_key}, timeout=REQUEST_TIMEOUT)
153
+ _handle_response_errors(resp, f"attention for '{doi}'")
154
+ if resp.status_code == 404:
155
+ return {"found": False, "doi": doi}
156
+
157
+ raw = _parse_json(resp, f"attention for '{doi}'")
158
+ if raw is None:
159
+ # 200 with no usable attention object — treat as no data, not zeros.
160
+ return {"found": False, "doi": doi}
161
+
162
+ result: dict[str, Any] = {
163
+ "found": True,
164
+ "doi": raw.get("doi", doi),
165
+ "source": "altmetric.com",
166
+ }
167
+ # Only surface fields the API actually returned — never fabricate zeros that
168
+ # the ranking skill would record as provenance-bearing facts.
169
+ for key in (
170
+ "score", "readers_count", "cited_by_posts_count",
171
+ "cited_by_tweeters_count", "cited_by_msm_count", "details_url",
172
+ ):
173
+ if raw.get(key) is not None:
174
+ result[key] = raw[key]
175
+ return result
176
+
177
+
178
+ # ---------------------------------------------------------------------------
179
+ # CLI
180
+ # ---------------------------------------------------------------------------
181
+
182
+ def main(argv: list[str] | None = None) -> None:
183
+ parser = argparse.ArgumentParser(
184
+ prog="fetch_altmetric.py",
185
+ description="Fetch Altmetric attention data. Requires ALTMETRIC_API_KEY.",
186
+ )
187
+ subparsers = parser.add_subparsers(dest="command", required=True)
188
+
189
+ d = subparsers.add_parser("doi", help="Fetch attention data for a DOI.")
190
+ d.add_argument("doi", help="DOI of the publication (e.g. 10.1234/abcd).")
191
+
192
+ args = parser.parse_args(argv)
193
+
194
+ if not args.doi.strip():
195
+ _err("Error: DOI must not be empty.")
196
+ sys.exit(2)
197
+ doi = _clean_doi(args.doi.strip())
198
+
199
+ api_key = _get_api_key()
200
+ session = _make_session()
201
+
202
+ try:
203
+ result = fetch_doi(doi, api_key, session)
204
+ print(json.dumps(result, ensure_ascii=False, indent=2))
205
+ sys.exit(0)
206
+ except ValueError as exc:
207
+ _err(f"Error: {exc}")
208
+ sys.exit(2)
209
+ except RuntimeError as exc:
210
+ _err(f"API error: {exc}")
211
+ _err("Retry hint: wait a few seconds and try again.")
212
+ sys.exit(3)
213
+ except requests.exceptions.ConnectionError as exc:
214
+ _err(f"Network error: {str(exc).replace(api_key, '<redacted>')}")
215
+ _err("Retry hint: check your internet connection and try again.")
216
+ sys.exit(3)
217
+ except requests.exceptions.Timeout:
218
+ _err("Request timed out while contacting Altmetric.")
219
+ _err("Retry hint: Altmetric may be slow; try again in a few minutes.")
220
+ sys.exit(3)
221
+ except requests.exceptions.HTTPError as exc:
222
+ code = exc.response.status_code if exc.response is not None else "unknown"
223
+ _err(f"HTTP error {code} from Altmetric.")
224
+ _err("Retry hint: try again later.")
225
+ sys.exit(3)
226
+
227
+
228
+ if __name__ == "__main__":
229
+ main()
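The exit-code contract in the module docstring (0 success, 2 user error, 3 transient error) suggests how a caller might decide whether to record, skip, or retry the signal. The sketch below is one possible reading under stated assumptions: `interpret_fetch` and `run_altmetric` are hypothetical names, the status labels are invented for the example, and the script is assumed to sit in the current working directory.

```python
import json
import subprocess
import sys


def interpret_fetch(returncode: int, stdout: str) -> dict:
    """Map the fetcher's exit code and stdout to a caller decision (hypothetical)."""
    if returncode == 0:
        data = json.loads(stdout)
        if data.get("found"):
            return {"status": "ok", "data": data}
        # Exit 0 with found=false: the DOI simply has no data.
        return {"status": "no_data", "data": None}
    if returncode == 2:
        # Missing key or bad DOI: skip this optional signal and continue.
        return {"status": "skip", "data": None}
    if returncode == 3:
        # Network or server trouble: worth retrying later.
        return {"status": "retry", "data": None}
    return {"status": "unexpected", "data": None}


def run_altmetric(doi: str) -> dict:
    """Run the CLI as documented: python fetch_altmetric.py doi <doi>."""
    proc = subprocess.run(
        [sys.executable, "fetch_altmetric.py", "doi", doi],
        capture_output=True,
        text=True,
    )
    return interpret_fetch(proc.returncode, proc.stdout)
```

Keeping the interpretation separate from the subprocess call makes the decision logic easy to exercise without network access or an API key.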
@@ -0,0 +1,236 @@
1
+ """
2
+ fetch_scite.py — Scite.ai citation-tally wrapper.
3
+
4
+ CLI:
5
+ python fetch_scite.py tally <doi>
6
+
7
+ Surfaces the Smart Citation tallies (supporting / contrasting / mentioning)
8
+ that Scite.ai computes for a publication, used by /lumi-research-rank as a
9
+ quality signal beyond raw citation counts.
10
+
11
+ Requires SCITE_API_KEY in the environment (via _env.load_env()). The signal is
12
+ key-gated: when the key is absent the tool exits 2 with an actionable message
13
+ so the skill can skip this signal and continue. JSON emitted to stdout on
14
+ success.
15
+
16
+ Exit codes:
17
+ 0 success (includes the "no Scite data for this DOI" case, found=false)
18
+ 2 user error (missing or rejected key, bad/empty DOI, other 4xx); actionable
19
+ 3 internal/transient error (network, timeout, rate limit, API 5xx, non-JSON body); includes retry hint
20
+ """
21
+
22
+ from __future__ import annotations
23
+
24
+ import argparse
25
+ import json
26
+ import sys
27
+ from pathlib import Path
28
+ from typing import Any
29
+ from urllib.parse import quote
30
+
31
+ import requests
32
+
33
+ # Import HTTP cache helper at module load (before any test patches requests.Session)
34
+ from _cache import wrap_session
35
+ from id_utils import normalize_external_id
36
+
37
+ # Import env loader; fall back to loading _env.py by file path when it is not importable as a module
38
+ try:
39
+ from _env import load_env
40
+ except ImportError:
41
+ import importlib.util
42
+ _spec = importlib.util.spec_from_file_location(
43
+ "_env", Path(__file__).parent / "_env.py"
44
+ )
45
+ _mod = importlib.util.module_from_spec(_spec) # type: ignore[arg-type]
46
+ _spec.loader.exec_module(_mod) # type: ignore[union-attr]
47
+ load_env = _mod.load_env
48
+
49
+ # ---------------------------------------------------------------------------
50
+ # Constants
51
+ # ---------------------------------------------------------------------------
52
+
53
+ SCITE_API_BASE = "https://api.scite.ai"
54
+ REQUEST_TIMEOUT = 30
55
+
56
+ ENV_KEY_NAME = "SCITE_API_KEY"
57
+ KEY_OBTAIN_URL = "https://scite.ai/apis"
58
+
59
+
60
+ # ---------------------------------------------------------------------------
61
+ # Helpers
62
+ # ---------------------------------------------------------------------------
63
+
64
+ def _err(msg: str) -> None:
65
+ print(msg, file=sys.stderr)
66
+
67
+
68
+ def _get_api_key() -> str:
69
+ """Load and return the Scite API key.
70
+
71
+ Raises SystemExit(2) with an actionable message if missing.
72
+ """
73
+ env = load_env()
74
+ key = env.get(ENV_KEY_NAME, "").strip()
75
+ if not key:
76
+ _err(
77
+ f"Error: {ENV_KEY_NAME} is not set.\n"
78
+ f"Scite tallies are an optional, key-gated ranking signal.\n"
79
+ f"Set it in your project .env file or ~/.env:\n"
80
+ f" {ENV_KEY_NAME}=<your-key>\n"
81
+ f"Obtain an API key at: {KEY_OBTAIN_URL}"
82
+ )
83
+ sys.exit(2)
84
+ return key
85
+
86
+
87
+ def _clean_doi(raw: str) -> str:
88
+ """Validate and normalize a DOI; SystemExit(2) on an invalid value."""
89
+ r = normalize_external_id("doi", raw)
90
+ if not r["valid"] or not r["id"]:
91
+ _err(f"Error: {raw!r} is not a valid DOI (expected form like 10.1234/abcd).")
92
+ sys.exit(2)
93
+ return r["id"]
94
+
95
+
96
+ def _make_session(api_key: str) -> requests.Session:
97
+ session = requests.Session()
98
+ session.headers.update({
99
+ "User-Agent": "lumina-wiki/0.1 (research-pack; scite fetcher)",
100
+ "Authorization": f"Bearer {api_key}",
101
+ })
102
+ return wrap_session(session, namespace="scite")
103
+
104
+
105
+ def _handle_response_errors(resp: requests.Response, context: str) -> None:
106
+ """Check for API-level errors and raise appropriate exceptions.
107
+
108
+ 404 is NOT raised here — a DOI absent from the Scite index is a normal
109
+ "no data" outcome handled by the caller, not an error.
110
+
111
+ ValueError -> exit 2 (request problem); RuntimeError -> exit 3 (transient).
112
+ """
113
+ if resp.status_code == 404:
114
+ return
115
+ if resp.status_code in (401, 403):
116
+ raise ValueError(f"Scite rejected the API key (HTTP {resp.status_code}). Check SCITE_API_KEY.")
117
+ if resp.status_code == 429:
118
+ raise RuntimeError("Rate limit exceeded. Wait before retrying.")
119
+ if resp.status_code >= 500:
120
+ raise RuntimeError(f"Scite API returned HTTP {resp.status_code}")
121
+ if 400 <= resp.status_code < 500:
122
+ # Any other 4xx is a request-shape problem, not a transient one.
123
+ raise ValueError(f"Scite rejected the request for {context} (HTTP {resp.status_code}).")
124
+ resp.raise_for_status()
125
+
126
+
127
+ def _parse_json(resp: requests.Response, context: str) -> dict[str, Any] | None:
128
+ """Parse a JSON object body. Returns None when the body is JSON but not an
129
+ object (treated as "no usable data"). A non-JSON body is a server-side
130
+ problem raised as RuntimeError -> exit 3, not mislabeled exit 2.
131
+ """
132
+ try:
133
+ data = resp.json()
134
+ except ValueError as exc:
135
+ raise RuntimeError(f"Scite returned a non-JSON body for {context}: {exc}") from exc
136
+ return data if isinstance(data, dict) else None
137
+
138
+
139
+ # ---------------------------------------------------------------------------
140
+ # Core fetch
141
+ # ---------------------------------------------------------------------------
142
+
143
+ def fetch_tally(doi: str, session: requests.Session) -> dict[str, Any]:
144
+ """Fetch the Smart Citation tally for a single DOI.
145
+
146
+ Returns a normalized dict. When the DOI is not indexed by Scite, returns
147
+ {"found": False, "doi": doi} rather than raising.
148
+ """
149
+ url = f"{SCITE_API_BASE}/tallies/{quote(doi, safe='/')}"
150
+ resp = session.get(url, timeout=REQUEST_TIMEOUT)
151
+ _handle_response_errors(resp, f"tally for '{doi}'")
152
+ if resp.status_code == 404:
153
+ return {"found": False, "doi": doi}
154
+
155
+ raw = _parse_json(resp, f"tally for '{doi}'")
156
+ if raw is None:
157
+ # 200 with no usable tally object — treat as no data, not zeros.
158
+ return {"found": False, "doi": doi}
159
+ # Scite returns either a flat tally object or {"tally": {...}}.
160
+ nested = raw.get("tally")
161
+ tally = nested if isinstance(nested, dict) else raw
162
+
163
+ result: dict[str, Any] = {
164
+ "found": True,
165
+ "doi": tally.get("doi", doi),
166
+ "source": "scite.ai",
167
+ }
168
+ # Only surface fields the API actually returned — never fabricate zeros.
169
+ # `contradicting` is Scite's name for what the wiki records as `contrasting`.
170
+ field_aliases = {
171
+ "supporting": ("supporting",),
172
+ "contrasting": ("contradicting", "contrasting"),
173
+ "mentioning": ("mentioning",),
174
+ "unclassified": ("unclassified",),
175
+ "total": ("total",),
176
+ }
177
+ for out_key, candidates in field_aliases.items():
178
+ for name in candidates:
179
+ if tally.get(name) is not None:
180
+ result[out_key] = tally[name]
181
+ break
182
+ return result
183
+
184
+
185
+ # ---------------------------------------------------------------------------
186
+ # CLI
187
+ # ---------------------------------------------------------------------------
188
+
189
+ def main(argv: list[str] | None = None) -> None:
190
+ parser = argparse.ArgumentParser(
191
+ prog="fetch_scite.py",
192
+ description="Fetch Scite.ai Smart Citation tallies. Requires SCITE_API_KEY.",
193
+ )
194
+ subparsers = parser.add_subparsers(dest="command", required=True)
195
+
196
+ t = subparsers.add_parser("tally", help="Fetch supporting/contrasting/mentioning tallies for a DOI.")
197
+ t.add_argument("doi", help="DOI of the publication (e.g. 10.1234/abcd).")
198
+
199
+ args = parser.parse_args(argv)
200
+
201
+ if not args.doi.strip():
202
+ _err("Error: DOI must not be empty.")
203
+ sys.exit(2)
204
+ doi = _clean_doi(args.doi.strip())
205
+
206
+ api_key = _get_api_key()
207
+ session = _make_session(api_key)
208
+
209
+ try:
210
+ result = fetch_tally(doi, session)
211
+ print(json.dumps(result, ensure_ascii=False, indent=2))
212
+ sys.exit(0)
213
+ except ValueError as exc:
214
+ _err(f"Error: {exc}")
215
+ sys.exit(2)
216
+ except RuntimeError as exc:
217
+ _err(f"API error: {exc}")
218
+ _err("Retry hint: wait a few seconds and try again.")
219
+ sys.exit(3)
220
+ except requests.exceptions.ConnectionError as exc:
221
+ _err(f"Network error: {exc}")
222
+ _err("Retry hint: check your internet connection and try again.")
223
+ sys.exit(3)
224
+ except requests.exceptions.Timeout:
225
+ _err("Request timed out while contacting Scite.")
226
+ _err("Retry hint: Scite may be slow; try again in a few minutes.")
227
+ sys.exit(3)
228
+ except requests.exceptions.HTTPError as exc:
229
+ code = exc.response.status_code if exc.response is not None else "unknown"
230
+ _err(f"HTTP error {code} from Scite.")
231
+ _err("Retry hint: try again later.")
232
+ sys.exit(3)
233
+
234
+
235
+ if __name__ == "__main__":
236
+ main()
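The alias handling in `fetch_tally` is easier to follow with concrete inputs. The standalone sketch below repeats that normalization logic outside the HTTP path and runs it on two payloads. Both payloads are invented for illustration and are not real Scite responses, and `normalize_tally` is a hypothetical name for the extracted logic.

```python
from __future__ import annotations

from typing import Any

# Output key -> accepted input names, mirroring the mapping in fetch_tally.
FIELD_ALIASES = {
    "supporting": ("supporting",),
    "contrasting": ("contradicting", "contrasting"),
    "mentioning": ("mentioning",),
    "unclassified": ("unclassified",),
    "total": ("total",),
}


def normalize_tally(raw: dict[str, Any], doi: str) -> dict[str, Any]:
    """Normalize a flat or {"tally": {...}} payload (hypothetical helper)."""
    nested = raw.get("tally")
    tally = nested if isinstance(nested, dict) else raw
    result: dict[str, Any] = {
        "found": True,
        "doi": tally.get("doi", doi),
        "source": "scite.ai",
    }
    for out_key, candidates in FIELD_ALIASES.items():
        for name in candidates:
            if tally.get(name) is not None:
                result[out_key] = tally[name]
                break  # first alias present wins
    return result


# Invented sample payloads: one flat, one nested.
flat = {"doi": "10.1234/abcd", "supporting": 3, "contradicting": 1, "mentioning": 20}
nested = {"tally": {"supporting": 3, "contrasting": 1}}

flat_result = normalize_tally(flat, "10.1234/abcd")
nested_result = normalize_tally(nested, "10.9999/x")
```

Fields absent from the payload, such as `total` here, are left out of the result rather than reported as zero, which matches the intent stated in the source comments.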