sciverse 0.3.0

package/README.md ADDED
@@ -0,0 +1,282 @@
+ # sciverse
+
+ [English](#english) | [中文](#中文)
+
+ SciVerse open-platform TypeScript SDK for academic paper retrieval. It wraps five
+ retrieval tools (`searchPapers`, `semanticSearch`, `readContent`, `listCatalog`,
+ `getResource`) in a single fetch-based client and ships ready-to-use `OPENAI_TOOLS` /
+ `ANTHROPIC_TOOLS` constants for direct tool calling.
+
+ > Tools: `searchPapers` (structured metadata) / `semanticSearch` (semantic retrieval) / `readContent` (byte-range read of paper text) / `listCatalog` (field introspection) / `getResource` (paper figure binary).
+ >
+ > 工具:`searchPapers`(结构化元数据)/ `semanticSearch`(语义检索)/ `readContent`(原文切片)/ `listCatalog`(字段 introspection)/ `getResource`(论文图片二进制)
+
+ ---
+
+ ## English
+
+ ### Install
+
+ ```bash
+ npm install sciverse # or pnpm add / yarn add
+ ```
+
+ Node.js ≥ 18 (uses native `fetch`).
+
+ ### Configure once via Python CLI (optional but recommended)
+
+ ```bash
+ pip install sciverse && sciverse auth login
+ # - opens https://sciverse.space/tokens in your browser
+ # - paste the token you create
+ # - saved to ~/.sciverse/credentials.json (file mode 0600)
+ ```
+
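+ After this, any `new AgentToolsClient()` created without explicit arguments picks up the saved
+ credentials. Resolution order: explicit argument → `SCIVERSE_API_TOKEN` / `SCIVERSE_BASE_URL` env →
+ credentials file → default. Only `baseUrl` has a default (`https://api.sciverse.space`); if no token
+ is found, the constructor throws. Pure Node.js shops can skip the CLI and use env vars or explicit constructor args.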
+
+ ### Quick start
+
+ ```ts
+ import { AgentToolsClient } from "sciverse";
+
+ const c = new AgentToolsClient(); // token + baseUrl auto-resolved
+
+ const r: any = await c.semanticSearch({ query: "Transformer attention mechanism", top_k: 3 });
+ for (const hit of r.hits) {
+   console.log(hit.doc_id, hit.score, hit.title);
+ }
+ ```
+
+ ### Explicit construction
+
+ ```ts
+ const c = new AgentToolsClient({
+   baseUrl: "https://api.sciverse.space",
+   token: process.env.MY_TOKEN!,
+ });
+ ```
+
+ ### Five retrieval tools
+
+ ```ts
+ // 1. Structured metadata search (Boolean filters + sort + pagination)
+ await c.searchPapers({
+   query: "transformer", // full-text BM25 (optional)
+   authors: ["Hinton"],
+   year_from: 2020, year_to: 2024,
+   journals: ["Nature", "Science"],
+   sort_by_year: "desc", // "desc" / "asc" / "none"
+   page_size: 10,
+ });
+
+ // 2. Natural-language semantic search (vector + BM25 hybrid, returns chunks)
+ await c.semanticSearch({ query: "How does attention work?", top_k: 10, mode: "balanced" });
+
+ // 3. Byte-range read of original paper text
+ await c.readContent({ doc_id: "p_xxx", offset: 0, limit: 8192 });
+
+ // 4. Schema introspection: call once to discover field names and enum values
+ await c.listCatalog({ include_sample_values: true });
+
+ // 5. Fetch a paper figure / table image
+ const { bytes, mimeType } = await c.getResource({ file_name: "dt=xxx/p_yyy/f3.png" });
+ // `bytes` is a Uint8Array; `mimeType` is e.g. "image/png"
+ ```
+
+ ### Response typing
+
+ Responses are typed as `unknown`. Cast them to the generated OpenAPI types (the cast is not checked at runtime):
+
+ ```ts
+ import type { components } from "sciverse";
+ type SemanticSearchResp = components["schemas"]["SemanticSearchResponse"];
+ const r = (await c.semanticSearch({ query: "x" })) as SemanticSearchResp;
+ ```
+
+ ### Use with OpenAI / Anthropic tool-calling
+
+ ```ts
+ import OpenAI from "openai";
+ import { AgentToolsClient, OPENAI_TOOLS } from "sciverse";
+
+ const openai = new OpenAI();
+ const sv = new AgentToolsClient();
+
+ const resp = await openai.chat.completions.create({
+   model: "gpt-4o",
+   tools: OPENAI_TOOLS as any,
+   messages: [{ role: "user", content: "Find 3 transformer papers" }],
+ });
+ // ... dispatch tool_calls to sv.searchPapers / sv.semanticSearch / ...
+ ```
+
+ `ANTHROPIC_TOOLS` is exported the same way for use with `@anthropic-ai/sdk`.
+
+ For the Claude Agent SDK or the OpenAI Agents SDK, where the framework runs the agent loop,
+ see [`sciverse-mcp-server`](https://www.npmjs.com/package/sciverse-mcp-server).
+
+ ### Error handling
+
+ Non-2xx responses throw a plain `Error` whose message has the form `SciVerse API <status>: <body>`:
+
+ ```ts
+ try {
+   await c.searchPapers({ query: "x" });
+ } catch (e) {
+   console.error(e); // "SciVerse API 401: {...}"
+ }
+ ```
+
+ | HTTP | Meaning |
+ |---|---|
+ | 400 | Bad request (unknown field, conflicting query+sort, ...) |
+ | 401 | Token missing / invalid / user disabled |
+ | 403 | Field permission denied |
+ | 429 | Rate limit (60 req / 60s per user, shared across protected endpoints) |
+ | 502 | Upstream metadata-service unavailable |
+
+ ### Links
+
+ - Source repo: <https://github.com/opendatalab/SciVerse-agent-tools>
+ - Changelog: <https://github.com/opendatalab/SciVerse-agent-tools/blob/main/CHANGELOG.md>
+ - Console (get a token): <https://sciverse.space>
+ - License: Apache-2.0
+
+ ---
+
+ ## 中文
+
+ SciVerse 开放平台 TypeScript SDK,5 个学术文献检索 tool(结构化元数据、
+ 语义检索、原文切片、字段 introspection、论文图片)。
+
+ ### 安装
+
+ ```bash
+ npm install sciverse # 或 pnpm add / yarn add
+ ```
+
+ 要求 Node.js ≥ 18(使用 native fetch)。
+
+ ### 通过 Python CLI 登录一次(推荐)
+
+ ```bash
+ pip install sciverse && sciverse auth login
+ # - 浏览器打开 https://sciverse.space/tokens
+ # - 复制控制台生成的 token,粘贴回 CLI
+ # - 保存到 ~/.sciverse/credentials.json(文件权限 0600)
+ ```
+
+ 之后任何 `new AgentToolsClient()` 不传 token 都自动 fallback 读取。优先级:
+ 显式参数 → `SCIVERSE_API_TOKEN` 环境变量 → 凭据文件 → 默认值。纯 Node 用户
+ 不想装 Python 也可以直接通过环境变量或构造参数传 token。
+
+ ### 快速开始
+
+ ```ts
+ import { AgentToolsClient } from "sciverse";
+
+ const c = new AgentToolsClient(); // token + baseUrl 自动解析
+
+ const r: any = await c.semanticSearch({ query: "Transformer 注意力机制", top_k: 3 });
+ for (const hit of r.hits) {
+   console.log(hit.doc_id, hit.score, hit.title);
+ }
+ ```
+
+ ### 显式构造
+
+ ```ts
+ const c = new AgentToolsClient({
+   baseUrl: "https://api.sciverse.space",
+   token: process.env.MY_TOKEN!,
+ });
+ ```
+
+ ### 5 个检索 tool
+
+ ```ts
+ // 1. 结构化元数据查询(布尔过滤 + 排序 + 分页)
+ await c.searchPapers({
+   query: "transformer", // 全文 BM25(可选)
+   authors: ["Hinton"],
+   year_from: 2020, year_to: 2024,
+   journals: ["Nature", "Science"],
+   sort_by_year: "desc", // "desc" / "asc" / "none"
+   page_size: 10,
+ });
+
+ // 2. 自然语言语义检索(向量 + BM25 混合,返回 chunk)
+ await c.semanticSearch({ query: "注意力机制如何工作?", top_k: 10, mode: "balanced" });
+
+ // 3. 按字节区间读原文
+ await c.readContent({ doc_id: "p_xxx", offset: 0, limit: 8192 });
+
+ // 4. 字段 introspection:Agent 接入第一步
+ await c.listCatalog({ include_sample_values: true });
+
+ // 5. 取文献附属图片(read_content Markdown 中 ![alt](file_name) 占位时)
+ const { bytes, mimeType } = await c.getResource({ file_name: "dt=xxx/p_yyy/f3.png" });
+ // `bytes` 是 Uint8Array;`mimeType` 形如 "image/png"
+ ```
+
+ ### 响应类型化
+
+ 响应默认 `unknown`,用派生自 OpenAPI 的类型 cast:
+
+ ```ts
+ import type { components } from "sciverse";
+ type SemanticSearchResp = components["schemas"]["SemanticSearchResponse"];
+ const r = (await c.semanticSearch({ query: "x" })) as SemanticSearchResp;
+ ```
+
+ ### 接入 OpenAI / Anthropic tool calling
+
+ ```ts
+ import OpenAI from "openai";
+ import { AgentToolsClient, OPENAI_TOOLS } from "sciverse";
+
+ const openai = new OpenAI();
+ const sv = new AgentToolsClient();
+
+ const resp = await openai.chat.completions.create({
+   model: "gpt-4o",
+   tools: OPENAI_TOOLS as any,
+   messages: [{ role: "user", content: "找 3 篇 Transformer 论文" }],
+ });
+ // ... 同理 dispatch tool_calls 到 sv.searchPapers / sv.semanticSearch / ...
+ ```
+
+ `ANTHROPIC_TOOLS` 同样导出,用于 `@anthropic-ai/sdk`。
+
+ Claude Agent SDK / OpenAI Agents SDK 写法更简洁(agent loop 由框架处理),
+ 详见 [`sciverse-mcp-server`](https://www.npmjs.com/package/sciverse-mcp-server)。
+
+ ### 错误处理
+
+ 非 2xx 响应抛 `new Error("SciVerse API <status>: <body>")`:
+
+ ```ts
+ try {
+   await c.searchPapers({ query: "x" });
+ } catch (e) {
+   console.error(e); // "SciVerse API 401: {...}"
+ }
+ ```
+
+ | HTTP | 含义 |
+ |---|---|
+ | 400 | 请求参数错误(未知字段 / query 与 sort 冲突等) |
+ | 401 | Token 缺失 / 无效 / 用户被禁用 |
+ | 403 | 字段权限不足 |
+ | 429 | 用户级限流(60 请求 / 60 秒,受保护接口共享额度) |
+ | 502 | 上游 metadata-service 不可用 |
+
+ ### 链接
+
+ - 源码仓库:<https://github.com/opendatalab/SciVerse-agent-tools>
+ - 变更日志:<https://github.com/opendatalab/SciVerse-agent-tools/blob/main/CHANGELOG.md>
+ - 控制台申请 Token:<https://sciverse.space>
+ - 协议:Apache-2.0
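The README's tool-calling example ends with a `// ... dispatch tool_calls to ...` placeholder. One way to fill it in is sketched below. It assumes that the `function.name` values in `OPENAI_TOOLS` equal the client method names (`searchPapers`, `semanticSearch`, and so on); `OPENAI_TOOLS` itself is not shown in this diff, so check the real names before relying on the `switch`. `ToolClient`, `ToolCall`, `ToolMessage`, and `dispatchToolCall` are hypothetical names; `ToolClient` mirrors the method signatures in `client.d.ts` so the sketch compiles without the package installed.

```typescript
// Structural view of the client: only the methods the dispatcher calls.
// Signatures follow client.d.ts; the dispatcher itself is hypothetical.
interface ToolClient {
  searchPapers(args: Record<string, unknown>): Promise<unknown>;
  semanticSearch(body: { query: string } & Record<string, unknown>): Promise<unknown>;
  readContent(params: { doc_id: string; offset?: number; limit?: number }): Promise<unknown>;
  listCatalog(params?: { include_sample_values?: boolean }): Promise<unknown>;
  getResource(params: { file_name: string }): Promise<{ bytes: Uint8Array; mimeType: string }>;
}

// The subset of an OpenAI chat-completions `tool_calls` entry used here.
interface ToolCall {
  id: string;
  function: { name: string; arguments: string };
}

// A `role: "tool"` message to append to the conversation.
interface ToolMessage {
  role: "tool";
  tool_call_id: string;
  content: string;
}

// Runs one tool call against the client.
// ASSUMPTION: tool names equal the client method names.
async function dispatchToolCall(client: ToolClient, call: ToolCall): Promise<ToolMessage> {
  const args = JSON.parse(call.function.arguments || "{}");
  let result: unknown;
  switch (call.function.name) {
    case "searchPapers": result = await client.searchPapers(args); break;
    case "semanticSearch": result = await client.semanticSearch(args); break;
    case "readContent": result = await client.readContent(args); break;
    case "listCatalog": result = await client.listCatalog(args); break;
    case "getResource": {
      // Raw image bytes do not fit in a text tool message; report metadata only.
      const { bytes, mimeType } = await client.getResource(args);
      result = { mimeType, byteLength: bytes.length };
      break;
    }
    default:
      result = { error: `unknown tool: ${call.function.name}` };
  }
  return { role: "tool", tool_call_id: call.id, content: JSON.stringify(result) };
}
```

Each returned message can be appended to `messages` before the next `chat.completions.create` call. Returning an `error` payload for unknown names, instead of throwing, lets the model see the mistake and try again.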
@@ -0,0 +1,37 @@
+ /**
+  * AgentToolsClientOptions
+  *
+  * Both `token` and `baseUrl` are optional. When omitted, each is resolved in this order:
+  * 1. Explicit argument
+  * 2. Environment variable SCIVERSE_API_TOKEN / SCIVERSE_BASE_URL
+  * 3. ~/.sciverse/credentials.json (written by the `sciverse auth login` Python CLI)
+  * 4. baseUrl falls back to https://api.sciverse.space; if no token is found, the constructor throws
+  */
+ export interface AgentToolsClientOptions {
+     baseUrl?: string;
+     token?: string;
+ }
+ export declare class AgentToolsClient {
+     private baseUrl;
+     private token;
+     constructor(opts?: AgentToolsClientOptions);
+     private request;
+     searchPapers(args: Record<string, unknown>): Promise<unknown>;
+     semanticSearch(body: {
+         query: string;
+     } & Record<string, unknown>): Promise<unknown>;
+     listCatalog(params?: {
+         include_sample_values?: boolean;
+     }): Promise<unknown>;
+     getResource(params: {
+         file_name: string;
+     }): Promise<{
+         bytes: Uint8Array;
+         mimeType: string;
+     }>;
+     readContent(params: {
+         doc_id: string;
+         offset?: number;
+         limit?: number;
+     }): Promise<unknown>;
+ }
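`client.d.ts` declares every JSON response as `Promise<unknown>`, and the README recommends an `as` cast, which TypeScript does not check at runtime. If a runtime check is wanted, a type guard is one option. The sketch below assumes only the fields the README's quick start reads (`hits`, `doc_id`, `score`, `title`); the full `SemanticSearchResponse` schema is not part of this diff. `SemanticHit` and `isSemanticSearchResult` are hypothetical names.

```typescript
// Partial, assumed shape of one semanticSearch hit (fields used in the README quick start).
interface SemanticHit {
  doc_id: string;
  score: number;
  title?: string;
}

// Type guard: true only if `value` has a `hits` array whose entries all carry
// a string `doc_id` and a numeric `score`. Extra fields are allowed and ignored.
function isSemanticSearchResult(value: unknown): value is { hits: SemanticHit[] } {
  if (typeof value !== "object" || value === null) return false;
  const hits = (value as { hits?: unknown }).hits;
  return (
    Array.isArray(hits) &&
    hits.every(
      (h) =>
        typeof h === "object" &&
        h !== null &&
        typeof (h as SemanticHit).doc_id === "string" &&
        typeof (h as SemanticHit).score === "number",
    )
  );
}
```

Typical use: call `isSemanticSearchResult(r)` on the value returned by `c.semanticSearch(...)` and throw if it returns `false`; inside the `true` branch, `r.hits` is typed without a cast.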
package/dist/client.js ADDED
@@ -0,0 +1,124 @@
+ import { randomUUID } from "node:crypto";
+ import { resolveEndpoint, resolveToken } from "./credentials.js";
+ const SKILL_NAME = "sciverse";
+ const CHANNEL = "typescript-sdk";
+ const PLATFORM = process.platform;
+ const PASSTHROUGH = ["query", "page", "page_size", "fields"];
+ function toBackendPayload(args) {
+     const out = {};
+     const filters = [];
+     const sort = [];
+     for (const k of PASSTHROUGH) {
+         if (args[k] !== undefined && args[k] !== null)
+             out[k] = args[k];
+     }
+     const addFilter = (field, operator, value) => {
+         filters.push({ field, operator, value });
+     };
+     if (args.title_contains !== undefined && args.title_contains !== null) {
+         addFilter("title", "FILTER_OP_CONTAINS", args.title_contains);
+     }
+     if (args.abstract_contains !== undefined && args.abstract_contains !== null) {
+         addFilter("abstract", "FILTER_OP_CONTAINS", args.abstract_contains);
+     }
+     if (Array.isArray(args.authors) && args.authors.length > 0) {
+         addFilter("author", "FILTER_OP_IN", args.authors);
+     }
+     if (args.year_from !== undefined && args.year_from !== null) {
+         addFilter("publication_published_year", "FILTER_OP_GTE", args.year_from);
+     }
+     if (args.year_to !== undefined && args.year_to !== null) {
+         addFilter("publication_published_year", "FILTER_OP_LTE", args.year_to);
+     }
+     if (Array.isArray(args.journals) && args.journals.length > 0) {
+         addFilter("publication_venue_name", "FILTER_OP_IN", args.journals);
+     }
+     if (Array.isArray(args.subjects) && args.subjects.length > 0) {
+         addFilter("subjects", "FILTER_OP_IN", args.subjects);
+     }
+     if (Array.isArray(args.filters_advanced)) {
+         for (const item of args.filters_advanced) {
+             filters.push({ operator: "FILTER_OP_EQ", ...item });
+         }
+     }
+     const sortByYear = args.sort_by_year;
+     if (sortByYear && sortByYear !== "none") {
+         sort.push({
+             field: "publication_published_year",
+             order: sortByYear === "desc" ? "SORT_ORDER_DESC" : "SORT_ORDER_ASC",
+         });
+     }
+     if (filters.length > 0)
+         out.filters = filters;
+     if (sort.length > 0)
+         out.sort = sort;
+     return out;
+ }
+ export class AgentToolsClient {
+     baseUrl;
+     token;
+     constructor(opts = {}) {
+         const token = resolveToken(opts.token);
+         if (!token) {
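+             throw new Error("未找到 SciVerse API Token。请显式传 token、或设 SCIVERSE_API_TOKEN 环境变量、" + // "SciVerse API Token not found. Pass a token explicitly, set the SCIVERSE_API_TOKEN env var,
+                 "或运行 `pip install sciverse && sciverse auth login` 保存凭据到 ~/.sciverse/credentials.json。"); // or run `pip install sciverse && sciverse auth login` to save credentials to ~/.sciverse/credentials.json."
+         }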
+         this.baseUrl = resolveEndpoint(opts.baseUrl).replace(/\/$/, "");
+         this.token = token;
+     }
+     async request(path, init) {
+         const res = await fetch(`${this.baseUrl}${path}`, {
+             ...init,
+             headers: {
+                 ...(init.headers ?? {}),
+                 authorization: `Bearer ${this.token}`,
+                 "content-type": "application/json",
+                 "x-request-id": `${SKILL_NAME}-${PLATFORM}-${CHANNEL}-${randomUUID()}`,
+             },
+         });
+         if (!res.ok) {
+             const body = await res.text();
+             throw new Error(`SciVerse API ${res.status}: ${body}`);
+         }
+         return (await res.json());
+     }
+     async searchPapers(args) {
+         const body = toBackendPayload(args);
+         return this.request("/meta-search", { method: "POST", body: JSON.stringify(body) });
+     }
+     async semanticSearch(body) {
+         const cleaned = Object.fromEntries(Object.entries(body).filter(([, v]) => v !== undefined));
+         return this.request("/agentic-search", { method: "POST", body: JSON.stringify(cleaned) });
+     }
+     async listCatalog(params = {}) {
+         const qs = new URLSearchParams();
+         qs.set("include_sample_values", String(Boolean(params.include_sample_values)));
+         return this.request(`/meta-catalog?${qs.toString()}`, { method: "GET" });
+     }
+     async getResource(params) {
+         const qs = new URLSearchParams({ file_name: params.file_name });
+         const res = await fetch(`${this.baseUrl}/resource?${qs.toString()}`, {
+             method: "GET",
+             headers: {
+                 authorization: `Bearer ${this.token}`,
+                 accept: "image/*",
+             },
+         });
+         if (!res.ok) {
+             const body = await res.text();
+             throw new Error(`SciVerse API ${res.status}: ${body}`);
+         }
+         const mimeType = (res.headers.get("content-type") || "application/octet-stream").split(";")[0].trim();
+         const buf = new Uint8Array(await res.arrayBuffer());
+         return { bytes: buf, mimeType };
+     }
+     async readContent(params) {
+         const qs = new URLSearchParams();
+         qs.set("doc_id", params.doc_id);
+         if (params.offset !== undefined)
+             qs.set("offset", String(params.offset));
+         if (params.limit !== undefined)
+             qs.set("limit", String(params.limit));
+         return this.request(`/content?${qs.toString()}`, { method: "GET" });
+     }
+ }
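Both `request()` and `getResource()` above throw an `Error` with the message `SciVerse API <status>: <body>` on non-2xx responses, and the README lists a per-user limit of 60 requests per 60 seconds. A minimal retry wrapper could parse that message and back off. This is a sketch that assumes the message format stays stable; `apiStatus` and `withRetry` are hypothetical helpers, and treating only 429 and 502 as retryable is a judgment call, not documented client behavior.

```typescript
// Extracts the HTTP status from errors shaped like "SciVerse API 429: {...}", or null otherwise.
function apiStatus(err: unknown): number | null {
  if (!(err instanceof Error)) return null;
  const m = /^SciVerse API (\d{3}):/.exec(err.message);
  return m ? Number(m[1]) : null;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retries `fn` on 429 (rate limit) and 502 (upstream unavailable) with exponential backoff.
// Any other error (400/401/403, network failures from fetch, ...) is rethrown immediately.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 4, baseDelayMs = 1000): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const status = apiStatus(err);
      const retryable = status === 429 || status === 502;
      if (!retryable || attempt >= maxAttempts) throw err;
      await sleep(baseDelayMs * 2 ** (attempt - 1)); // 1s, 2s, 4s, ... with the default base
    }
  }
}
```

Usage: `await withRetry(() => c.searchPapers({ query: "transformer" }))`. Because the limit is counted over a 60-second window, sustained batch jobs may need a larger `baseDelayMs` than the default.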
@@ -0,0 +1,10 @@
+ export interface StoredCredentials {
+     token?: string;
+     endpoint?: string;
+     saved_at?: string;
+ }
+ export declare const DEFAULT_ENDPOINT = "https://api.sciverse.space";
+ export declare function credentialsPath(): string;
+ export declare function loadStoredCredentials(): StoredCredentials | null;
+ export declare function resolveToken(explicit?: string, env?: NodeJS.ProcessEnv): string | null;
+ export declare function resolveEndpoint(explicit?: string, env?: NodeJS.ProcessEnv): string;
@@ -0,0 +1,53 @@
+ // Shared credentials-file reader (same contract as packages/mcp/src/credentials.ts and the
+ // Python sciverse.credentials module).
+ //
+ // The file `~/.sciverse/credentials.json` is written by the Python CLI `sciverse auth login`;
+ // this module only reads it, never writes, so that "install the Python CLI once and no client needs an explicit token" holds.
+ import { readFileSync, existsSync } from "node:fs";
+ import { homedir } from "node:os";
+ import { join } from "node:path";
+ export const DEFAULT_ENDPOINT = "https://api.sciverse.space";
+ /** Returns the user's home directory. Reads the HOME/USERPROFILE environment variables first so tests can override it,
+  * otherwise falls back to os.homedir() (Node built-in). */
+ function getHomeDir() {
+     return process.env.HOME ?? process.env.USERPROFILE ?? homedir();
+ }
+ export function credentialsPath() {
+     return join(getHomeDir(), ".sciverse", "credentials.json");
+ }
+ export function loadStoredCredentials() {
+     const path = credentialsPath();
+     if (!existsSync(path))
+         return null;
+     try {
+         const raw = readFileSync(path, "utf8");
+         const data = JSON.parse(raw);
+         if (data && typeof data === "object" && !Array.isArray(data)) {
+             return data;
+         }
+         return null;
+     }
+     catch {
+         return null;
+     }
+ }
+ export function resolveToken(explicit, env = process.env) {
+     if (explicit)
+         return explicit;
+     if (env.SCIVERSE_API_TOKEN)
+         return env.SCIVERSE_API_TOKEN;
+     const creds = loadStoredCredentials();
+     if (creds?.token)
+         return creds.token;
+     return null;
+ }
+ export function resolveEndpoint(explicit, env = process.env) {
+     if (explicit)
+         return explicit;
+     if (env.SCIVERSE_BASE_URL)
+         return env.SCIVERSE_BASE_URL;
+     const creds = loadStoredCredentials();
+     if (creds?.endpoint)
+         return creds.endpoint;
+     return DEFAULT_ENDPOINT;
+ }
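The README notes that pure Node.js shops can skip the Python CLI. If you still want every client to pick the token up from the shared file, you could write that file yourself. The sketch below is hypothetical: `saveCredentials` is not part of the package, the JSON shape is inferred from the `StoredCredentials` interface above, and the `saved_at` format (ISO-8601 here) is an assumption, since the Python CLI's exact output is not shown in this diff. The home-directory lookup mirrors `getHomeDir()` above so that `loadStoredCredentials()` reads the same path.

```typescript
import { chmodSync, mkdirSync, writeFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: writes <home>/.sciverse/credentials.json in the StoredCredentials shape.
// ASSUMPTION: `saved_at` is an ISO-8601 timestamp string.
function saveCredentials(
  token: string,
  endpoint?: string,
  home: string = process.env.HOME ?? process.env.USERPROFILE ?? homedir(), // same lookup as getHomeDir()
): string {
  const dir = join(home, ".sciverse");
  mkdirSync(dir, { recursive: true, mode: 0o700 });
  const path = join(dir, "credentials.json");
  const data: { token: string; endpoint?: string; saved_at: string } = {
    token,
    saved_at: new Date().toISOString(),
  };
  if (endpoint) data.endpoint = endpoint; // omit the key so resolveEndpoint() falls through to its default
  writeFileSync(path, JSON.stringify(data, null, 2), { mode: 0o600 });
  // `mode` only applies when the file is created; chmod also tightens an existing file.
  // On Windows, POSIX permission bits are largely ignored.
  chmodSync(path, 0o600);
  return path;
}
```

After a one-time `saveCredentials(process.env.MY_TOKEN!)`, `new AgentToolsClient()` resolves the token through `loadStoredCredentials()`, unless an explicit argument or `SCIVERSE_API_TOKEN` takes precedence.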
@@ -0,0 +1,4 @@
+ export { AgentToolsClient } from "./client";
+ export type { AgentToolsClientOptions } from "./client";
+ export { TOOLS_VERSION, OPENAI_TOOLS, ANTHROPIC_TOOLS } from "./tools";
+ export type { components, paths } from "./types";
package/dist/index.js ADDED
@@ -0,0 +1,2 @@
+ export { AgentToolsClient } from "./client";
+ export { TOOLS_VERSION, OPENAI_TOOLS, ANTHROPIC_TOOLS } from "./tools";