sciverse 0.3.0__tar.gz

__pycache__/
*.pyc
.venv/
.pytest_cache/
node_modules/
packages/python/dist/
packages/python/build/
packages/typescript/dist/
packages/typescript/node_modules/
packages/mcp/dist/
packages/mcp/src/generated/

Metadata-Version: 2.4
Name: sciverse
Version: 0.3.0
Summary: SciVerse Agent Tools — OpenAI/Anthropic/LangChain compatible tool schema and async client for SciVerse retrieval APIs
Project-URL: Homepage, https://github.com/opendatalab/SciVerse-agent-tools
Project-URL: Repository, https://github.com/opendatalab/SciVerse-agent-tools.git
Project-URL: Documentation, https://github.com/opendatalab/SciVerse-agent-tools#readme
Project-URL: Changelog, https://github.com/opendatalab/SciVerse-agent-tools/blob/main/CHANGELOG.md
Project-URL: Issues, https://github.com/opendatalab/SciVerse-agent-tools/issues
Author: SciVerse Platform Team
License: Apache-2.0
Keywords: agent,llm,scientific-papers,sciverse,tool-calling
Requires-Python: >=3.10
Requires-Dist: httpx>=0.27
Requires-Dist: pydantic>=2.5
Description-Content-Type: text/markdown

# sciverse

The SciVerse open-platform Python SDK and CLI for academic paper retrieval. It
wraps five retrieval tools (`search_papers`, `semantic_search`, `read_content`,
`list_catalog`, `get_resource`) in one async client and ships ready-to-use
`OPENAI_TOOLS` / `ANTHROPIC_TOOLS` constants for direct tool calling.

> Tools: `search_papers` (structured metadata) / `semantic_search` (semantic retrieval) / `read_content` (original-text slices) / `list_catalog` (field introspection) / `get_resource` (paper image binaries)

---

## English

### Install

```bash
pip install sciverse
# or, if you only want the CLI:
pipx install sciverse
```

### Configure once (no env vars needed afterwards)

```bash
sciverse auth login
# - opens https://sciverse.space/tokens in your browser
# - paste the token you create
# - saved to ~/.sciverse/credentials.json (file mode 0600)
```

After this, any `AgentToolsClient()` created without explicit arguments picks up
the saved token automatically. Resolution order, highest priority first: explicit
argument → `SCIVERSE_API_TOKEN` environment variable → credentials file → default.

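A minimal sketch of that lookup order, plus writing the file with mode 0600, for tooling that has to reproduce it outside the SDK. This is not the SDK's implementation: `save_credentials` and `resolve_token` are hypothetical helpers, and the JSON keys `token`, `endpoint`, and `saved_at` are assumptions based on what `sciverse auth status` reports.

```python
from __future__ import annotations

import json
import os
from datetime import datetime, timezone
from pathlib import Path

# Path documented above; the JSON layout used here is an assumption.
CREDENTIALS_PATH = Path.home() / ".sciverse" / "credentials.json"


def save_credentials(token: str, endpoint: str | None = None,
                     path: Path = CREDENTIALS_PATH) -> None:
    """Write the credentials file readable/writable by the owner only (0600)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    data = {"token": token, "endpoint": endpoint,
            "saved_at": datetime.now(timezone.utc).isoformat()}
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    # The mode passed to os.open only applies to a newly created file,
    # so tighten a pre-existing one before the secret is written.
    os.chmod(path, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(data, f)


def resolve_token(explicit: str | None = None, path: Path = CREDENTIALS_PATH,
                  default: str | None = None) -> str | None:
    """explicit arg -> SCIVERSE_API_TOKEN env var -> credentials file -> default."""
    if explicit:
        return explicit
    env_token = os.environ.get("SCIVERSE_API_TOKEN")
    if env_token:
        return env_token
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):  # missing, unreadable, or malformed file
        data = {}
    if isinstance(data, dict) and data.get("token"):
        return data["token"]
    return default
```
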
### CLI

```bash
sciverse auth login [--token <t>] [--endpoint <url>] [--no-browser]
sciverse auth status   # show masked token / endpoint / saved_at
sciverse auth logout   # delete the credentials file
```

`--token` passes the token directly and skips the interactive paste, which is
useful in CI scripts. `--no-browser` is for remote or headless machines.

### Quick start

```python
import asyncio
from sciverse import AgentToolsClient

async def main():
    async with AgentToolsClient() as c:  # token + endpoint auto-resolved
        r = await c.semantic_search(query="Transformer attention mechanism", top_k=3)
        for hit in r["hits"]:
            print(hit["doc_id"], hit["score"], hit["title"])

asyncio.run(main())
```

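Because `semantic_search` returns chunks rather than whole papers, several hits can share a `doc_id`. A minimal sketch for collapsing them to one hit per paper, assuming (this README does not say so) that a higher `score` means a better match; `best_hit_per_doc` is a hypothetical helper, not part of the SDK.

```python
def best_hit_per_doc(hits: list[dict]) -> list[dict]:
    """Keep the highest-scoring chunk for each doc_id, best paper first."""
    best: dict[str, dict] = {}
    for hit in hits:
        kept = best.get(hit["doc_id"])
        if kept is None or hit["score"] > kept["score"]:
            best[hit["doc_id"]] = hit
    # Assumes higher score = more relevant; flip `reverse` if the API ranks the other way.
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)
```

In the quick start this would read `for hit in best_hit_per_doc(r["hits"]): ...`.
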
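### Long-lived client (web server / agent runtime)

Construct one client at startup, reuse it for every request, and close it on
shutdown:

```python
from sciverse import AgentToolsClient

async def serve(queries):  # `queries`: any async iterable of incoming query strings
    client = AgentToolsClient()  # construct once at startup
    try:
        async for query in queries:
            r = await client.semantic_search(query=query)
            ...  # hand `r` back to the caller
    finally:
        await client.aclose()  # release the underlying httpx connection pool
```
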
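A long-lived client shared by many tasks can run into the per-user limit listed under Error handling (60 requests per 60 s, answered with HTTP 429). A minimal client-side sliding-window limiter, sketched on the assumption that pacing on your side is acceptable; `SlidingWindowLimiter` is hypothetical, not part of the SDK, and it only counts calls made from this process, not other clients sharing the same token.

```python
import asyncio
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side pacing: at most `limit` calls in any `window`-second span."""

    def __init__(self, limit: int = 60, window: float = 60.0) -> None:
        self.limit = limit
        self.window = window
        self._stamps: deque[float] = deque()  # start times of calls still in the window

    def delay(self, now: float) -> float:
        """Seconds to wait before a call at time `now` fits the quota (0.0 if it fits)."""
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()  # drop calls that have aged out of the window
        if len(self._stamps) < self.limit:
            return 0.0
        return self._stamps[0] + self.window - now

    def record(self, now: float) -> None:
        self._stamps.append(now)

    async def acquire(self) -> None:
        """Sleep until a call is allowed, then count it. There is no await between
        the final check and record(), so tasks on one event loop cannot both take
        the last slot."""
        while (wait := self.delay(time.monotonic())) > 0:
            await asyncio.sleep(wait)
        self.record(time.monotonic())
```

In `serve` above, one shared limiter would be awaited as `await limiter.acquire()` right before each `client.semantic_search(...)` call.
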
### Five retrieval tools

```python
# All calls run inside an async function, with `c` an open AgentToolsClient.

# 1. Structured metadata search (Boolean filters + sort + pagination)
await c.search_papers(
    query="transformer",          # full-text BM25 (optional)
    authors=["Hinton"],
    year_from=2020, year_to=2024,
    journals=["Nature", "Science"],
    sort_by_year="desc",          # "desc" / "asc" / "none"
    page_size=10,
)

# 2. Natural-language semantic search (vector + BM25 hybrid, returns chunks)
await c.semantic_search(query="How does attention work?", top_k=10, mode="balanced")

# 3. Byte-range read of the original paper text
#    (use doc_id + offset from semantic_search hits)
await c.read_content(doc_id="p_xxx", offset=0, limit=8192)

# 4. Schema introspection: call once to discover field names + enum values
await c.list_catalog(include_sample_values=True)

# 5. Fetch a paper figure / table image (when read_content Markdown contains
#    ![alt](file_name) placeholders)
bytes_, mime_type = await c.get_resource(file_name="dt=xxx/p_yyy/f3.png")
```

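When `read_content` returns Markdown with `![alt](file_name)` image placeholders, the `file_name` part is what `get_resource` takes. A minimal sketch for collecting those names; `resource_names` is a hypothetical helper, and the regex assumes file names without spaces or parentheses, like the `dt=xxx/p_yyy/f3.png` example above.

```python
import re

# ![alt text](file_name): capture the target; assumes no spaces or ")" in file names.
_IMAGE_PLACEHOLDER = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")


def resource_names(markdown: str) -> list[str]:
    """Return each image file_name once, in order of first appearance."""
    seen: dict[str, None] = {}  # dict keeps insertion order, so it doubles as an ordered set
    for match in _IMAGE_PLACEHOLDER.finditer(markdown):
        seen.setdefault(match.group(1), None)
    return list(seen)
```

Each name can then be passed on as `await c.get_resource(file_name=name)`. Where the Markdown sits in the `read_content` response is not shown in this README.
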
### Use with Anthropic / OpenAI tool-calling

The SDK exports ready-to-use tool schemas in each provider's format; pass them
straight to `messages.create(tools=...)` or `chat.completions.create(tools=...)`.

```python
import asyncio
from anthropic import AsyncAnthropic
from sciverse import ANTHROPIC_TOOLS, AgentToolsClient

async def main():
    anthropic = AsyncAnthropic()
    async with AgentToolsClient() as sv:
        messages = [{"role": "user", "content": "Find 3 transformer papers"}]
        resp = await anthropic.messages.create(
            model="claude-opus-4-7",
            max_tokens=4096,
            tools=ANTHROPIC_TOOLS,
            messages=messages,
        )
        # ... handle tool_use blocks by dispatching to sv.search_papers / ...

asyncio.run(main())
```

The OpenAI equivalent passes `OPENAI_TOOLS`:

```python
import asyncio
from openai import AsyncOpenAI
from sciverse import OPENAI_TOOLS, AgentToolsClient

async def main():
    openai = AsyncOpenAI()
    async with AgentToolsClient() as sv:
        resp = await openai.chat.completions.create(
            model="gpt-4o",
            tools=OPENAI_TOOLS,
            messages=[{"role": "user", "content": "Find 3 transformer papers"}],
        )
        # ... handle tool_calls similarly

asyncio.run(main())
```

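Both provider snippets leave the dispatch step as `...`. A minimal sketch of it, assuming (this README does not say so) that the tool names in `ANTHROPIC_TOOLS` / `OPENAI_TOOLS` equal the `AgentToolsClient` method names and that each tool's input keys match that method's keyword arguments. `call_tool`, `anthropic_tool_results`, and `openai_tool_messages` are hypothetical helpers.

```python
import json
from typing import Any

# Tools whose results are JSON-serialisable. get_resource returns (bytes, mime_type)
# and would need an image content block instead, so this sketch does not route it.
JSON_TOOLS = {"search_papers", "semantic_search", "read_content", "list_catalog"}


async def call_tool(client: Any, name: str, args: dict) -> tuple[str, bool]:
    """Run one tool call on the SciVerse client; return (content, is_error)."""
    if name not in JSON_TOOLS:
        return f"unsupported tool: {name}", True
    try:
        result = await getattr(client, name)(**args)
    except Exception as exc:  # e.g. httpx.HTTPStatusError: report it to the model
        return f"{type(exc).__name__}: {exc}", True
    return json.dumps(result, ensure_ascii=False), False


async def anthropic_tool_results(client: Any, content_blocks: list) -> list[dict]:
    """Map the tool_use blocks of an Anthropic response to tool_result blocks."""
    results = []
    for block in content_blocks:
        if block.type == "tool_use":
            text, is_error = await call_tool(client, block.name, block.input)
            results.append({"type": "tool_result", "tool_use_id": block.id,
                            "content": text, "is_error": is_error})
    return results


async def openai_tool_messages(client: Any, tool_calls: list) -> list[dict]:
    """Map OpenAI tool_calls to role="tool" messages (arguments arrive as a JSON string)."""
    messages = []
    for call in tool_calls:
        args = json.loads(call.function.arguments or "{}")
        text, _ = await call_tool(client, call.function.name, args)
        messages.append({"role": "tool", "tool_call_id": call.id, "content": text})
    return messages
```

With Anthropic, while `resp.stop_reason == "tool_use"`, append `{"role": "assistant", "content": resp.content}` and then `{"role": "user", "content": await anthropic_tool_results(sv, resp.content)}` to `messages` and call `messages.create` again. With OpenAI, append `resp.choices[0].message` followed by the messages from `openai_tool_messages(sv, resp.choices[0].message.tool_calls)`.
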
If you use the Claude Agent SDK or the OpenAI Agents SDK, which accept an MCP
server configuration and run the agent loop themselves, see
[`sciverse-mcp-server`](https://www.npmjs.com/package/sciverse-mcp-server).

### Error handling

Non-2xx responses raise `httpx.HTTPStatusError`. The platform's error body has
the shape `{code, message, request_id}`.

```python
import httpx

# inside an async function, with `c` an open AgentToolsClient
try:
    await c.search_papers(query="x")
except httpx.HTTPStatusError as e:
    print(e.response.status_code, e.response.text)
```

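To log the structured body instead of raw text, a minimal sketch that reads `{code, message, request_id}` through `e.response.json()` and falls back to the raw text when the body is not JSON (for example, an error page from a proxy). `describe_error` is a hypothetical helper; the README does not specify the types of the three fields.

```python
def describe_error(exc) -> str:
    """Summarise an httpx.HTTPStatusError, using the platform error body when present."""
    response = exc.response
    try:
        body = response.json()
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        body = None
    if isinstance(body, dict) and "code" in body:
        return (f"HTTP {response.status_code} code={body.get('code')} "
                f"message={body.get('message')} request_id={body.get('request_id')}")
    return f"HTTP {response.status_code}: {response.text}"
```

In the handler above this becomes `print(describe_error(e))`.
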
| HTTP | Meaning |
|---|---|
| 400 | Bad request (unknown field, conflicting query + sort, ...) |
| 401 | Token missing / invalid / user disabled |
| 403 | Field permission denied |
| 429 | Rate limit (60 requests / 60 s per user, shared across protected endpoints) |
| 502 | Upstream metadata-service unavailable |

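429 and 502 are typically transient, so a caller can retry them with backoff while surfacing 400/401/403 immediately. A minimal sketch, assuming retries are acceptable for these read-only tools; `with_retries` is hypothetical, and it reads the status from `exc.response.status_code`, which is where `httpx.HTTPStatusError` keeps it.

```python
import asyncio
import random

RETRYABLE_STATUS = {429, 502}  # rate limited / upstream unavailable, per the table above


async def with_retries(make_call, *, attempts: int = 4, base_delay: float = 1.0,
                       sleep=asyncio.sleep):
    """Await make_call(), retrying HTTP 429/502 with exponential backoff and jitter.

    make_call must return a fresh awaitable on every call (e.g. a lambda),
    because a coroutine object can only be awaited once.
    """
    for attempt in range(attempts):
        try:
            return await make_call()
        except Exception as exc:
            status = getattr(getattr(exc, "response", None), "status_code", None)
            if status not in RETRYABLE_STATUS or attempt == attempts - 1:
                raise  # not retryable, or out of attempts
            # 1 s, 2 s, 4 s, ... plus up to 50 % jitter so parallel tasks spread out
            delay = base_delay * 2 ** attempt
            await sleep(delay * (1 + random.random() / 2))
    raise ValueError("attempts must be at least 1")
```

Usage: `r = await with_retries(lambda: c.search_papers(query="x"))`.
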
### Typed request models (optional)

```python
from sciverse.types import SearchPapersRequest, SemanticSearchRequest
# Pydantic v2 models, for explicit validation when constructing requests.
```

### Links

- Source repo: <https://github.com/opendatalab/SciVerse-agent-tools>
- Changelog: <https://github.com/opendatalab/SciVerse-agent-tools/blob/main/CHANGELOG.md>
- Console (get a token): <https://sciverse.space>
- License: Apache-2.0