chinese-summary 1.0.1 → 1.0.2
- package/README.md +255 -0
- package/package.json +4 -2
- package/tsconfig.json +13 -0
- package/tsup.config.ts +28 -0
package/README.md
ADDED
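@@ -0,0 +1,255 @@
# chinese-summary

[npm](https://www.npmjs.com/package/chinese-summary)
[License: MIT](https://opensource.org/licenses/MIT)

A Chinese text summarization library: purely algorithmic, with no AI dependency and zero external dependencies.

It combines TextRank, position weighting, TF-IDF keyword weighting, and MMR diversity-based sentence selection. It supports 5 compression levels and can compress a long article into a single sentence.

## Features

- **Zero external dependencies**: pure TypeScript implementation; no word segmenter and no AI model required
- **Character-level n-grams**: bypasses Chinese word segmentation and computes similarity directly over a sliding window of characters
- **5 compression levels**: from "no compression" to "extreme compression into a single sentence", for flexible control over summary length
- **Position weighting**: the first sentence of the first paragraph and the first and last sentences of each paragraph receive higher prior weights
- **TF-IDF keyword weighting**: sentences containing document-wide keywords receive extra weight
- **MMR redundancy removal**: sentence selection balances relevance and diversity to avoid semantic repetition
- **Clause conjunction handling**: under extreme compression, conjunctions that have lost their context are stripped automatically
- **Robustness**: thorough input validation, parameter validation, and numerical safety guards
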
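To make the character-level n-gram idea concrete, here is a minimal sketch of how two sentences could be compared without a word segmenter. The helper names `charNgrams` and `ngramSimilarity` are hypothetical, and Jaccard overlap is an assumed measure: the README does not say which similarity function the library actually uses.

```ts
// Hypothetical helpers for illustration; not the library's internal API.

/** Collect the set of character n-grams of a string using a sliding window. */
function charNgrams(text: string, n = 2): Set<string> {
  const chars = Array.from(text); // split by code point so each CJK character stays whole
  const grams = new Set<string>();
  if (chars.length === 0) return grams;
  if (chars.length < n) {
    grams.add(chars.join("")); // text shorter than n: keep it as a single gram
    return grams;
  }
  for (let i = 0; i + n <= chars.length; i++) {
    grams.add(chars.slice(i, i + n).join(""));
  }
  return grams;
}

/** Jaccard similarity between the n-gram sets of two strings (assumed measure). */
function ngramSimilarity(a: string, b: string, n = 2): number {
  const ga = charNgrams(a, n);
  const gb = charNgrams(b, n);
  if (ga.size === 0 || gb.size === 0) return 0;
  let shared = 0;
  ga.forEach((g) => {
    if (gb.has(g)) shared++;
  });
  return shared / (ga.size + gb.size - shared);
}

// "深度学习" and "机器学习" share one bigram ("学习") out of five distinct bigrams.
console.log(ngramSimilarity("深度学习", "机器学习"));
```

With `n = 2` this matches the default `ngramSize` listed in the configuration options; larger values make the comparison stricter.
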
## Installation

```bash
npm install chinese-summary
```

## Usage

### Node.js (ESM)

```ts
import { extractSummary, rankSentences } from "chinese-summary";

const text = "人工智能是计算机科学的重要分支。深度学习推动了AI的快速发展。自然语言处理取得了突破性进展。";

// Default: level 3 (moderate compression, about 30% of sentences)
const result = extractSummary(text);
console.log(result.text);

// Extreme compression: compress into a single sentence
const extreme = extractSummary(text, { compressionLevel: 1 });
console.log(extreme.text);

// Specify the number of sentences (compatible with the legacy interface)
const legacy = extractSummary(text, { sentenceCount: 2 });
console.log(legacy.text);
```

### Node.js (CJS)

```js
const { extractSummary, rankSentences } = require("chinese-summary");

const text = "人工智能是计算机科学的重要分支。深度学习推动了AI的快速发展。";
const result = extractSummary(text, { compressionLevel: 3 });
console.log(result.text);
```

### Browser (IIFE)

```html
<script src="node_modules/chinese-summary/dist/chinese-summary.iife.js"></script>
<script>
  var text = "人工智能是计算机科学的重要分支。深度学习推动了AI的快速发展。";
  var result = ChineseSummary.extractSummary(text, { compressionLevel: 2 });
  console.log(result.text);
</script>
```

### Browser (ES Module)

```html
<script type="module">
  import { extractSummary } from "./node_modules/chinese-summary/dist/chinese-summary.mjs";
  const text = "人工智能是计算机科学的重要分支。深度学习推动了AI的快速发展。";
  const result = extractSummary(text, { compressionLevel: 2 });
</script>
```

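## Compression levels

| Level | Description | Strategy | Use cases |
|-------|-------------|----------|-----------|
| 1 | Extreme compression | Clause-level extraction, joined into a single sentence | Headline generation, push-notification summaries |
| 2 | High compression | About 20% of sentences + multi-round re-ranking | Short summaries, list previews |
| 3 | Moderate compression | About 30% of sentences (**default**) | General-purpose summaries |
| 4 | Light compression | About 50% of sentences | Long summaries, skimming |
| 5 | No compression | Returns all sentences | Ranking only, debugging |
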
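As a rough illustration of how levels 2 to 5 could translate into a sentence budget, the sketch below maps each level to the percentage from the table. The function name `targetSentenceCount`, the round-up rule, and the minimum of one sentence are assumptions; the README only says "about" 20%, 30%, and 50%, and level 1 works at clause level, so it is not modeled here.

```ts
// Hypothetical helper; the exact rounding used by the library is not documented.
type SentenceLevel = 2 | 3 | 4 | 5;

// Percentages taken from the compression level table.
const SENTENCE_PERCENT: Record<SentenceLevel, number> = { 2: 20, 3: 30, 4: 50, 5: 100 };

/** Estimate how many sentences a level keeps (assumed: round up, keep at least one). */
function targetSentenceCount(totalSentences: number, level: SentenceLevel): number {
  if (totalSentences <= 0) return 0;
  // Integer arithmetic before the division avoids floating-point surprises with Math.ceil.
  const wanted = Math.ceil((totalSentences * SENTENCE_PERCENT[level]) / 100);
  return Math.min(totalSentences, Math.max(1, wanted));
}

console.log(targetSentenceCount(40, 3)); // 30% of 40 sentences
```
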
Example compression results (a 1584-character article about AI):

| Level | Output length | Compression rate |
|-------|---------------|------------------|
| 1 | 69 characters | 95.6% |
| 2 | 408 characters | 74.2% |
| 3 | 536 characters | 66.2% |
| 4 | 886 characters | 44.1% |
| 5 | 1603 characters | -1.2% |

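The compression rates in the table are consistent with the share of the original length that was removed, that is `(1 - output / original) × 100`. The snippet below reproduces the figures; `compressionRate` is a hypothetical helper written for this check, not part of the package. The negative value at level 5 is plausibly explained by the separators inserted when sentences are joined, though the README does not state this.

```ts
/** Percentage of the original length that was removed (negative if the output grew). */
function compressionRate(originalChars: number, outputChars: number): number {
  if (originalChars <= 0) throw new RangeError("originalChars must be positive");
  return (1 - outputChars / originalChars) * 100;
}

const originalLength = 1584;
const outputs: Array<[number, number]> = [[1, 69], [2, 408], [3, 536], [4, 886], [5, 1603]];

outputs.forEach(([level, length]) => {
  console.log(`level ${level}: ${compressionRate(originalLength, length).toFixed(1)}%`);
});
// level 1: 95.6%
// level 2: 74.2%
// level 3: 66.2%
// level 4: 44.1%
// level 5: -1.2%
```
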
## API

### `extractSummary(text, options?)`

Extracts a summary from Chinese text and returns a `SummaryResult`:

```ts
interface SummaryResult {
  summary: string[];           // Summary sentences (in original order)
  sentences: SentenceInfo[];   // All sentences with their scores
  text: string;                // Summary text (sentences joined with spaces)
  compressionLevel: 1|2|3|4|5;
  clauses?: ClauseInfo[];      // Clause information (level 1 only)
}
```

### `rankSentences(text, options?)`

Returns only the sentence score ranking, without extracting a summary. The return value is a `SentenceInfo[]` sorted by score in descending order.

## Configuration options

### Compression control

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `compressionLevel` | `1\|2\|3\|4\|5` | `3` | Compression level; mutually exclusive with `sentenceCount` |
| `sentenceCount` | `number` | `3` | Number of summary sentences (legacy interface) |
| `maxClauses` | `number` | `3` | Maximum number of clauses under extreme compression (level 1 only) |

### TextRank algorithm

| Option | Type | Default | Range | Description |
|--------|------|---------|-------|-------------|
| `ngramSize` | `number` | `2` | 1-5 | n-gram size |
| `dampingFactor` | `number` | `0.85` | 0.1-0.95 | Damping factor |
| `maxIterations` | `number` | `30` | 1-200 | Maximum number of iterations |
| `convergenceThreshold` | `number` | `0.0001` | 1e-8 to 1 | Convergence threshold |

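To show where `dampingFactor`, `maxIterations`, and `convergenceThreshold` enter the computation, here is a minimal sketch of the standard weighted TextRank iteration over a sentence similarity matrix. It is a textbook formulation under stated assumptions, not the library's source: the function name `textRank`, the uniform starting scores, and the max-change convergence test are all illustrative choices.

```ts
interface TextRankOptions {
  dampingFactor?: number;        // default 0.85, as in the options table
  maxIterations?: number;        // default 30
  convergenceThreshold?: number; // default 0.0001
}

/**
 * Weighted TextRank over a similarity matrix.
 * sim[i][j] is the similarity between sentences i and j; the diagonal is ignored.
 */
function textRank(sim: number[][], opts: TextRankOptions = {}): number[] {
  const d = opts.dampingFactor ?? 0.85;
  const maxIter = opts.maxIterations ?? 30;
  const eps = opts.convergenceThreshold ?? 0.0001;
  const n = sim.length;
  if (n === 0) return [];

  // Total edge weight leaving each node, used to normalize its contributions.
  const outWeight = sim.map((row, i) =>
    row.reduce((sum, w, j) => (j === i ? sum : sum + w), 0)
  );

  let scores = new Array<number>(n).fill(1 / n);
  for (let iter = 0; iter < maxIter; iter++) {
    const next = new Array<number>(n).fill((1 - d) / n);
    for (let i = 0; i < n; i++) {
      for (let j = 0; j < n; j++) {
        if (i === j || outWeight[j] === 0) continue;
        next[i] += d * (sim[j][i] / outWeight[j]) * scores[j];
      }
    }
    // Stop early once no score moved by more than the threshold.
    const delta = next.reduce((max, s, i) => Math.max(max, Math.abs(s - scores[i])), 0);
    scores = next;
    if (delta < eps) break;
  }
  return scores;
}

// Sentence 0 is similar to both others, so it ends up with the highest score.
console.log(textRank([[0, 1, 1], [1, 0, 0], [1, 0, 0]]));
```

A higher damping factor gives more weight to the graph structure and less to the uniform baseline, at the cost of slower convergence.
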
### Position weights

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `weightFirstSentence` | `number` | `1.5` | Weight of the first sentence of the first paragraph |
| `weightFirstParagraph` | `number` | `1.2` | Weight of the other sentences in the first paragraph |
| `weightParagraphStart` | `number` | `1.1` | Weight of the first sentence of a paragraph |
| `weightParagraphEnd` | `number` | `1.05` | Weight of the last sentence of a paragraph |

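One way to read the four position weights is as a lookup keyed on where a sentence sits. The sketch below uses the documented option names and defaults, but the function `positionWeight`, the precedence between overlapping rules, and the neutral weight of 1.0 for other sentences are assumptions made for illustration.

```ts
interface PositionWeights {
  weightFirstSentence: number;
  weightFirstParagraph: number;
  weightParagraphStart: number;
  weightParagraphEnd: number;
}

// Defaults copied from the options table.
const DEFAULT_POSITION_WEIGHTS: PositionWeights = {
  weightFirstSentence: 1.5,
  weightFirstParagraph: 1.2,
  weightParagraphStart: 1.1,
  weightParagraphEnd: 1.05,
};

/**
 * Prior weight for a sentence, given its paragraph index and its index inside that paragraph.
 * Assumed precedence: first-paragraph rules win over the generic start/end rules.
 */
function positionWeight(
  paragraphIndex: number,
  sentenceIndex: number,
  sentencesInParagraph: number,
  w: PositionWeights = DEFAULT_POSITION_WEIGHTS
): number {
  if (paragraphIndex === 0) {
    return sentenceIndex === 0 ? w.weightFirstSentence : w.weightFirstParagraph;
  }
  if (sentenceIndex === 0) return w.weightParagraphStart;
  if (sentenceIndex === sentencesInParagraph - 1) return w.weightParagraphEnd;
  return 1.0; // assumed neutral weight for all other sentences
}

console.log(positionWeight(0, 0, 3)); // first sentence of the first paragraph
```
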
### Diversity and keywords

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `mmrLambda` | `number` | `0.7` | MMR diversity coefficient λ (0.3-1.0) |
| `keywordWeight` | `number` | `1.2` | Keyword weight coefficient (0 = off) |

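The keyword side can be sketched as character-level TF-IDF with each sentence treated as a document, which is how the README describes it. Everything else below is an assumption for illustration: the helper names, the restriction to CJK ideographs, the smoothed IDF formula, and in particular the way `keywordWeight` is turned into a multiplier (chosen so that `0` leaves scores unchanged, matching "0 = off").

```ts
// Hypothetical helpers; not the library's API.

/** Score each character by TF-IDF, treating every sentence as one document. */
function charTfIdf(sentences: string[]): Map<string, number> {
  const termCount = new Map<string, number>(); // occurrences over the whole text
  const docCount = new Map<string, number>();  // number of sentences containing the character
  let total = 0;

  sentences.forEach((sentence) => {
    const seen = new Set<string>();
    Array.from(sentence).forEach((ch) => {
      if (!/[\u4e00-\u9fff]/.test(ch)) return; // assumption: only CJK ideographs count as terms
      termCount.set(ch, (termCount.get(ch) ?? 0) + 1);
      total++;
      if (!seen.has(ch)) {
        seen.add(ch);
        docCount.set(ch, (docCount.get(ch) ?? 0) + 1);
      }
    });
  });

  const scores = new Map<string, number>();
  termCount.forEach((count, ch) => {
    // Smoothed IDF, so characters present in every sentence still get a positive score.
    const idf = Math.log((1 + sentences.length) / (1 + (docCount.get(ch) ?? 0))) + 1;
    scores.set(ch, (count / total) * idf);
  });
  return scores;
}

/** Return the k highest-scoring characters. */
function topKeywords(sentences: string[], k: number): string[] {
  const entries = Array.from(charTfIdf(sentences).entries());
  entries.sort((a, b) => b[1] - a[1]);
  return entries.slice(0, k).map(([ch]) => ch);
}

/** Assumed boost: 1 + keywordWeight * (share of the keywords found in the sentence). */
function keywordBoost(sentence: string, keywords: string[], keywordWeight = 1.2): number {
  if (keywordWeight === 0 || keywords.length === 0) return 1;
  const hits = keywords.filter((kw) => sentence.indexOf(kw) !== -1).length;
  return 1 + keywordWeight * (hits / keywords.length);
}

const sample = ["猫吃鱼", "猫睡觉", "狗吃肉"];
console.log(topKeywords(sample, 2), keywordBoost(sample[0], topKeywords(sample, 2)));
```
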
## More examples

```ts
// Tune diversity: 0.3 = maximum diversity, 1.0 = pure score ranking
extractSummary(text, { compressionLevel: 3, mmrLambda: 0.3 });

// Tune topic focus: 0 = off, 2.0+ = strong focus
extractSummary(text, { compressionLevel: 3, keywordWeight: 2.0 });

// Extreme compression into 5 clauses
extractSummary(text, { compressionLevel: 1, maxClauses: 5 });

// Get the sentence ranking (for debugging)
const ranked = rankSentences(text);
ranked.slice(0, 5).forEach(s => console.log(`[${s.score.toFixed(4)}] ${s.text}`));

// Strengthen the first sentence of the first paragraph (suits news articles)
extractSummary(text, { compressionLevel: 3, weightFirstSentence: 2.0 });
```

## Build artifacts

| File | Format | Target environment |
|------|--------|--------------------|
| `dist/chinese-summary.cjs` | CJS | Node.js `require()` |
| `dist/chinese-summary.mjs` | ESM | Node.js `import`, browser `<script type="module">` |
| `dist/chinese-summary.iife.js` | IIFE | Browser `<script>` tag, global variable `ChineseSummary` |
| `dist/chinese-summary.d.ts` | Type declarations | TypeScript IntelliSense |

## Building from source

```bash
git clone https://github.com/cn-dev/chinese-summary.git
cd chinese-summary
npm install
npm run build
```

## Running tests

```bash
# Basic functionality tests
npx tsx test/test.ts

# Long-text test (about 2000 characters)
npx tsx test/test-long.ts

# Robustness tests (74 edge cases)
npx tsx test/test-robust.ts
```

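## Algorithm notes

This library implements the TextRank algorithm from scratch (without using any third-party library) and adds the following on top of it:

- **Position prior weights**: the first sentence of the first paragraph and the first and last sentences of each paragraph receive higher initial scores
- **TF-IDF keyword weighting**: keywords are extracted from character-level unigrams with each sentence treated as a document; sentences containing keywords receive extra weight
- **MMR diversity-based sentence selection**: `MMR(s) = λ×score(s) - (1-λ)×max_sim(s, selected set)`, which avoids semantic repetition
- **Clause conjunction handling**: under extreme compression, conjunctions that have lost their context (such as "然而" (however) and "因此" (therefore)) are stripped automatically
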
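The MMR formula in the list above can be turned into a greedy selection loop fairly directly. The following is a minimal sketch, assuming sentence scores and pairwise similarities are already on comparable scales (for example both in the range 0 to 1); `mmrSelect` is a hypothetical name, and returning the picks in original order is an assumption based on the `summary` field being described as "in original order".

```ts
/**
 * Greedy MMR selection: repeatedly pick the sentence maximizing
 *   lambda * score(s) - (1 - lambda) * max_sim(s, selected).
 * Returns the chosen sentence indices in original document order.
 */
function mmrSelect(scores: number[], sim: number[][], k: number, lambda = 0.7): number[] {
  const n = scores.length;
  const used = new Array<boolean>(n).fill(false);
  const selected: number[] = [];

  while (selected.length < Math.min(k, n)) {
    let best = -1;
    let bestValue = -Infinity;
    for (let i = 0; i < n; i++) {
      if (used[i]) continue;
      // Similarity to the closest already-selected sentence (0 when nothing is selected yet).
      const maxSim = selected.reduce((m, j) => Math.max(m, sim[i][j]), 0);
      const value = lambda * scores[i] - (1 - lambda) * maxSim;
      if (value > bestValue) {
        bestValue = value;
        best = i;
      }
    }
    used[best] = true;
    selected.push(best);
  }
  return selected.sort((a, b) => a - b);
}

// Sentences 0 and 1 are near-duplicates; sentence 2 scores lower but is different.
const exampleScores = [0.9, 0.85, 0.5];
const exampleSim = [
  [1.0, 0.95, 0.1],
  [0.95, 1.0, 0.1],
  [0.1, 0.1, 1.0],
];
console.log(mmrSelect(exampleScores, exampleSim, 2, 0.7)); // [0, 2]
console.log(mmrSelect(exampleScores, exampleSim, 2, 1.0)); // [0, 1]
```

With `lambda = 1.0` the redundancy penalty vanishes and the selection reduces to picking the top scores, which matches the "1.0 = pure score ranking" note in the examples.
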
## Project structure

```
chinese-summary/
├── src/
│   └── chinese-summary.ts    # Core source (single file, about 1200 lines)
├── dist/                     # Build artifacts
├── test/
│   ├── test.ts               # Basic functionality tests
│   ├── test-long.ts          # Long-text test
│   └── test-robust.ts        # Robustness tests (74 cases)
├── docs/
│   ├── usage-guide.md        # Detailed usage guide
│   └── test-report.md        # Test report
├── tsconfig.json             # TypeScript configuration
├── tsup.config.ts            # Build configuration
└── package.json
```

## License

MIT License

Copyright (c) 2025 北京锋通科技有限公司

Authors: 郭玉峰, 吴琼

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "chinese-summary",
-  "version": "1.0.
+  "version": "1.0.2",
   "description": "中文文本概要提取库(TextRank + 位置加权 + TF-IDF + MMR)",
   "author": "郭玉峰, 吴琼 <gyfinjava@163.com> (北京锋通科技有限公司)",
   "license": "MIT",
@@ -38,7 +38,9 @@
     "dist",
     "src",
     "test",
-    "docs"
+    "docs",
+    "tsconfig.json",
+    "tsup.config.ts"
   ],
   "scripts": {
     "build": "tsup",
package/tsconfig.json
ADDED
package/tsup.config.ts
ADDED
@@ -0,0 +1,28 @@
import { defineConfig } from "tsup";

export default defineConfig([
  // CJS + ESM
  {
    entry: { "chinese-summary": "src/chinese-summary.ts" },
    format: ["esm", "cjs"],
    dts: true,
    clean: true,
    minify: false,
    sourcemap: true,
    outDir: "dist",
    outExtension({ format }) {
      return format === "esm" ? { js: ".mjs" } : { js: ".cjs" };
    },
  },
  // IIFE: loaded in the browser via a <script> tag
  {
    entry: { "chinese-summary": "src/chinese-summary.ts" },
    format: ["iife"],
    globalName: "ChineseSummary",
    clean: false,
    minify: false,
    sourcemap: true,
    outDir: "dist",
    outExtension: () => ({ js: ".iife.js" }),
  },
]);