gs-search 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.ja.md ADDED
# gs-search

## Other Languages

- [中文 README](README.zh-CN.md)
- [English README](README.md)
- [한국어 README](README.ko.md)

A lightweight, fast, and memory-efficient full-text search engine for JavaScript/TypeScript applications.

## Features

- 🔍 **Full-text search** with tokenization support
- 📦 **Lightweight** with no external dependencies
- ⚡ **Fast** search performance
- 📱 **Browser & Node.js** support
- 🌐 **Multi-language** tokenization
- 🗄️ **Custom storage** support
- 📊 **Batch operations** for efficient indexing

## Installation

```bash
# Using yarn
yarn add gs-search

# Using npm
npm install gs-search
```

## Quick Start

```typescript
import { SimpleSearch } from 'gs-search';

// Add documents (SimpleSearch exposes static methods; no instance is needed)
await SimpleSearch.addDocuments([
  { id: 1, text: 'Hello world!' },
  { id: 2, text: 'This is a test document' },
  { id: 3, text: 'Another document for testing' }
]);

// Search
const results = await SimpleSearch.search('test');
console.log(results);
// Example output: results for documents 2 and 3, each with matched tokens ['test']

// Delete a document
await SimpleSearch.removeDocument(1);

// Get search engine status
const status = await SimpleSearch.getStatus();
console.log(status);
```

## Advanced Usage

### SearchEngine

For finer-grained control and advanced features, use `SearchEngine`:

```typescript
import { SearchEngine, NodeStorage } from 'gs-search';

// Create an engine with custom storage ('baseDir' is required)
const engine = new SearchEngine({
  baseDir: 'search-data',
  storage: new NodeStorage('./search-data')
});

// Initialize the engine
await engine.init();

// Add documents inside a batch
engine.startBatch();
try {
  await engine.addDocuments([
    // ... documents
  ]);
  await engine.endBatch();
} catch (error) {
  // Handle the error
}
```

## API Reference

### SimpleSearch

**Static methods (no instance creation required):**

- `configure(config: Partial<ISearchEngineConfig>): void`: Configure the search engine
- `addDocument(doc: IDocument): Promise<void>`: Add a single document
- `addDocuments(docs: IDocument[]): Promise<void>`: Add multiple documents
- `removeDocument(id: number): Promise<void>`: Delete a document
- `search(query: string, limit?: number): Promise<IResult[]>`: Search for documents
- `getStatus(): Promise<IStatus>`: Get search engine status
- `startBatch(): Promise<void>`: Start batch operations
- `endBatch(): Promise<void>`: End batch operations

### SearchEngine

- `constructor(config: ISearchEngineConfig)`: Create a new engine instance
- `init(): Promise<void>`: Initialize the engine
- `addDocument(doc: IDocument): Promise<void>`: Add a single document
- `addDocuments(docs: IDocument[]): Promise<void>`: Add multiple documents
- `removeDocument(id: number): Promise<void>`: Delete a document
- `search(query: string, limit?: number): Promise<IResult[]>`: Search for documents
- `getStatus(): Promise<IStatus>`: Get search engine status
- `startBatch(): void`: Start batch operations
- `endBatch(): Promise<void>`: End batch operations

## Storage

The search engine supports custom storage implementations:

- `BrowserStorage`: For browser environments (uses OPFS)
- `NodeStorage`: For Node.js environments (uses the file system)

## License

MIT License

## Links

- [GitHub Repository](https://github.com/grain-sand/gs-search)
- [npm Package](https://www.npmjs.com/package/gs-search)
package/README.ko.md ADDED
# gs-search

## Other Languages

- [中文 README](README.zh-CN.md)
- [English README](README.md)
- [日本語 README](README.ja.md)

A lightweight, fast, and memory-efficient full-text search engine for JavaScript/TypeScript applications.

## Features

- 🔍 **Full-text search** with tokenization support
- 📦 **Lightweight** with no external dependencies
- ⚡ **Fast** search performance
- 📱 **Browser & Node.js** support
- 🌐 **Multi-language** tokenization
- 🗄️ **Custom storage** support
- 📊 **Batch operations** for efficient indexing

## Installation

```bash
# Using yarn
yarn add gs-search

# Using npm
npm install gs-search
```

## Quick Start

```typescript
import { SimpleSearch } from 'gs-search';

// Add documents (SimpleSearch exposes static methods; no instance is needed)
await SimpleSearch.addDocuments([
  { id: 1, text: 'Hello world!' },
  { id: 2, text: 'This is a test document' },
  { id: 3, text: 'Another document for testing' }
]);

// Search
const results = await SimpleSearch.search('test');
console.log(results);
// Example output: results for documents 2 and 3, each with matched tokens ['test']

// Delete a document
await SimpleSearch.removeDocument(1);

// Get search engine status
const status = await SimpleSearch.getStatus();
console.log(status);
```

## Advanced Usage

### SearchEngine

For finer-grained control and advanced features, use `SearchEngine`:

```typescript
import { SearchEngine, NodeStorage } from 'gs-search';

// Create an engine with custom storage ('baseDir' is required)
const engine = new SearchEngine({
  baseDir: 'search-data',
  storage: new NodeStorage('./search-data')
});

// Initialize the engine
await engine.init();

// Add documents inside a batch
engine.startBatch();
try {
  await engine.addDocuments([
    // ... documents
  ]);
  await engine.endBatch();
} catch (error) {
  // Handle the error
}
```

## API Reference

### SimpleSearch

**Static methods (no instance creation required):**

- `configure(config: Partial<ISearchEngineConfig>): void`: Configure the search engine
- `addDocument(doc: IDocument): Promise<void>`: Add a single document
- `addDocuments(docs: IDocument[]): Promise<void>`: Add multiple documents
- `removeDocument(id: number): Promise<void>`: Delete a document
- `search(query: string, limit?: number): Promise<IResult[]>`: Search for documents
- `getStatus(): Promise<IStatus>`: Get search engine status
- `startBatch(): Promise<void>`: Start batch operations
- `endBatch(): Promise<void>`: End batch operations

### SearchEngine

- `constructor(config: ISearchEngineConfig)`: Create a new engine instance
- `init(): Promise<void>`: Initialize the engine
- `addDocument(doc: IDocument): Promise<void>`: Add a single document
- `addDocuments(docs: IDocument[]): Promise<void>`: Add multiple documents
- `removeDocument(id: number): Promise<void>`: Delete a document
- `search(query: string, limit?: number): Promise<IResult[]>`: Search for documents
- `getStatus(): Promise<IStatus>`: Get search engine status
- `startBatch(): void`: Start batch operations
- `endBatch(): Promise<void>`: End batch operations

## Storage

The search engine supports custom storage implementations:

- `BrowserStorage`: For browser environments (uses OPFS)
- `NodeStorage`: For Node.js environments (uses the file system)

## License

MIT License

## Links

- [GitHub Repository](https://github.com/grain-sand/gs-search)
- [npm Package](https://www.npmjs.com/package/gs-search)
package/README.md ADDED
# gs-search

## Other Languages

- [中文 README](README.zh-CN.md)
- [日本語 README](README.ja.md)
- [한국어 README](README.ko.md)

A lightweight, fast, and memory-efficient full-text search engine for JavaScript/TypeScript applications.

## Features

- 🔍 **Full-text search** with tokenization support
- 📦 **Lightweight** with zero external dependencies
- ⚡ **Fast** search performance
- 📱 **Browser & Node.js** support
- 🌐 **Multi-language** tokenization
- 🗄️ **Custom storage** support
- 📊 **Batch operations** for efficient indexing

## Installation

```bash
# Using yarn
yarn add gs-search

# Using npm
npm install gs-search
```

## Quick Start

```typescript
import { SimpleSearch } from 'gs-search';

// Add documents in batch
await SimpleSearch.addDocuments([
  { id: 1, text: 'Hello world!' },
  { id: 2, text: 'This is a test document' },
  { id: 3, text: 'Another document for testing' }
]);

// Add a single document
await SimpleSearch.addDocument({ id: 4, text: 'Single document addition' });

// Search
const results = await SimpleSearch.search('test');
console.log(results);
// Example output: results for documents 2 and 3, each with matched tokens ['test']

// Delete a document
await SimpleSearch.removeDocument(1);

// Get search engine status
const status = await SimpleSearch.getStatus();
console.log(status);
```

## Advanced Usage

### SearchEngine

For more control and advanced features, use the `SearchEngine`:

```typescript
import { SearchEngine, NodeStorage } from 'gs-search';

// Create an engine with custom storage ('baseDir' is required)
const engine = new SearchEngine({
  baseDir: 'search-data',
  storage: new NodeStorage('./search-data')
});

// Initialize the engine
await engine.init();

// Add documents in batch
engine.startBatch();
try {
  await engine.addDocuments([
    // ... documents
  ]);
  await engine.endBatch();
} catch (error) {
  // Handle the error
}
```

## API Reference

### SimpleSearch

**Static methods (no instance creation required):**

- `configure(config: Partial<ISearchEngineConfig>): void`: Configure the search engine
- `addDocument(doc: IDocument): Promise<void>`: Add a single document
- `addDocuments(docs: IDocument[]): Promise<void>`: Add multiple documents
- `removeDocument(id: number): Promise<void>`: Delete a document
- `search(query: string, limit?: number): Promise<IResult[]>`: Search for documents
- `getStatus(): Promise<IStatus>`: Get search engine status
- `startBatch(): Promise<void>`: Start batch operations
- `endBatch(): Promise<void>`: End batch operations
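To illustrate what a tokenizer passed to `configure` can look like, here is a hedged sketch. The tokenizer below is a hypothetical example, not part of the library: it lowercases, splits on non-alphanumeric runs, and emits one token per CJK character so that Chinese queries can match individual characters.

```typescript
// Hypothetical tokenizer sketch: lowercase, split on non-alphanumeric runs,
// and emit single characters for CJK text so Chinese queries can match.
const simpleTokenizer = (text: string): string[] => {
  const tokens: string[] = [];
  for (const chunk of text.toLowerCase().split(/[^a-z0-9\u4e00-\u9fff]+/)) {
    if (!chunk) continue;
    if (/[\u4e00-\u9fff]/.test(chunk)) {
      tokens.push(...chunk); // one token per CJK character
    } else {
      tokens.push(chunk); // whole word for Latin text
    }
  }
  return tokens;
};

console.log(simpleTokenizer('Hello 世界 test'));
// → [ 'hello', '世', '界', 'test' ]

// Wiring it in would look like (not executed here):
// SimpleSearch.configure({
//   indexingTokenizer: simpleTokenizer,
//   searchTokenizer: simpleTokenizer
// });
```

Using the same function for both `indexingTokenizer` and `searchTokenizer` keeps index-time and query-time tokens consistent, which the config documentation recommends for accurate matching.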
101
+
102
+ ### CoreSearchEngine
103
+
104
+ - `constructor(options: ICoreSearchOptions)`: Create a new core engine instance
105
+ - `init(): Promise<void>`: Initialize the engine
106
+ - `addDocument(doc: IDocument): Promise<void>`: Add a single document
107
+ - `addDocuments(docs: IDocument[]): Promise<void>`: Add multiple documents
108
+ - `removeDocument(id: number): Promise<void>`: Delete a document
109
+ - `search(query: string, limit?: number): Promise<IResult[]>`: Search for documents
110
+ - `getStatus(): Promise<IStatus>`: Get search engine status
111
+ - `startBatch(): void`: Start batch operations
112
+ - `endBatch(): Promise<void>`: End batch operations
113
+
114
+ ## Storage
115
+
116
+ The search engine supports custom storage implementations:
117
+
118
+ - `BrowserStorage`: For browser environments (uses IndexedDB)
119
+ - `NodeStorage`: For Node.js environments (uses file system)
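A custom backend only needs to satisfy the storage interface declared in `lib/index.d.ts` (`write`, `append`, `read`, `readRange`, `remove`, `listFiles`, `clearAll`, `getFileSize`). As a rough, self-contained sketch (an in-memory map, not shipped by the package), it could look like:

```typescript
// Minimal in-memory storage sketch matching the package's IStorage shape.
// Hypothetical example; the package itself ships BrowserStorage and NodeStorage.
class MemoryStorage {
  private files = new Map<string, Uint8Array>();

  async write(filename: string, data: ArrayBuffer): Promise<void> {
    this.files.set(filename, new Uint8Array(data.slice(0)));
  }

  async append(filename: string, data: ArrayBuffer): Promise<void> {
    const prev = this.files.get(filename) ?? new Uint8Array(0);
    const next = new Uint8Array(prev.length + data.byteLength);
    next.set(prev, 0);
    next.set(new Uint8Array(data), prev.length);
    this.files.set(filename, next);
  }

  async read(filename: string): Promise<ArrayBuffer | null> {
    const file = this.files.get(filename);
    return file ? (file.slice(0).buffer as ArrayBuffer) : null;
  }

  async readRange(filename: string, start: number, end: number): Promise<ArrayBuffer | null> {
    const file = this.files.get(filename);
    return file ? (file.slice(start, end).buffer as ArrayBuffer) : null;
  }

  async remove(filename: string): Promise<void> {
    this.files.delete(filename);
  }

  async listFiles(): Promise<string[]> {
    return [...this.files.keys()];
  }

  async clearAll(): Promise<void> {
    this.files.clear();
  }

  async getFileSize(filename: string): Promise<number> {
    return this.files.get(filename)?.length ?? 0;
  }
}
```

An instance could then be passed as the `storage` option of `SearchEngine`, alongside the required `baseDir`.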
120
+
121
+ ## License
122
+
123
+ MIT License
124
+
125
+ ## Links
126
+
127
+ - [GitHub Repository](https://github.com/grain-sand/gs-search)
128
+ - [npm Package](https://www.npmjs.com/package/gs-search)
package/README.zh-CN.md ADDED
# gs-search

## Other Languages

- [English README](README.md)
- [日本語 README](README.ja.md)
- [한국어 README](README.ko.md)

A tiny, pure front-end search library that runs in modern browsers and persists its index automatically. It works even better when paired with a more powerful external tokenizer library.

## Features

- 🔍 **Full-text search** with multi-language tokenization
- 📦 **Lightweight** with no third-party dependencies and a small footprint
- ⚡ **High performance** for both search and index building
- 📱 **Browser compatible** with modern browsers
- 🌐 **Multi-language support** adaptable to different tokenization needs
- 🗄️ **Custom storage** via a pluggable storage interface
- 📊 **Batch operations** for efficient bulk document indexing

## Installation

```bash
# Using yarn
yarn add gs-search

# Using npm
npm install gs-search
```

## Quick Start

```typescript
import { SimpleSearch } from 'gs-search';

// Add documents in batch
await SimpleSearch.addDocuments([
  { id: 1, text: 'Hello world!' },
  { id: 2, text: '这是一个测试文档' },
  { id: 3, text: '另一个用于测试的文档' }
]);

// Add a single document
await SimpleSearch.addDocument({ id: 4, text: '单个文档添加示例' });

// Search
const results = await SimpleSearch.search('测试');
console.log(results);
// Example output: results for documents 2 and 3, each with matched tokens ['测试']

// Delete a document
await SimpleSearch.removeDocument(1);

// Get search engine status
const status = await SimpleSearch.getStatus();
console.log(status);
```

## Advanced Usage

### SearchEngine

For more control and advanced features, use `SearchEngine`:

```typescript
import { SearchEngine, NodeStorage } from 'gs-search';

// Create an engine with custom storage ('baseDir' is required)
const engine = new SearchEngine({
  baseDir: 'search-data',
  storage: new NodeStorage('./search-data')
});

// Initialize the engine
await engine.init();

// Add documents in a batch
engine.startBatch();
try {
  await engine.addDocuments([
    // ... documents
  ]);
  await engine.endBatch();
} catch (error) {
  // Handle the error
}
```

### Custom Storage

You can implement custom storage to persist data. Any object implementing the storage interface declared in `lib/index.d.ts` can be passed as the `storage` option:

```typescript
import { SearchEngine } from 'gs-search';

// Implements the storage interface (IStorage) structurally
class CustomStorage {
  async write(filename: string, data: ArrayBuffer): Promise<void> {
    // Save data to your backing store
  }

  async append(filename: string, data: ArrayBuffer): Promise<void> {
    // Append data to an existing file
  }

  async read(filename: string): Promise<ArrayBuffer | null> {
    // Load a file, or return null if it does not exist
    return null;
  }

  async readRange(filename: string, start: number, end: number): Promise<ArrayBuffer | null> {
    // Load a byte range of a file
    return null;
  }

  async remove(filename: string): Promise<void> {
    // Delete a file
  }

  async listFiles(): Promise<string[]> {
    // List stored files
    return [];
  }

  async clearAll(): Promise<void> {
    // Clear the store
  }

  async getFileSize(filename: string): Promise<number> {
    // Return a file's size in bytes, or 0 if missing
    return 0;
  }
}

// Use the custom storage
const storage = new CustomStorage();
const engine = new SearchEngine({ baseDir: 'search-data', storage });
```

### Batch Indexing

Use batch mode for bulk operations to improve performance. While a batch is active, `addDocuments` only writes to the cache; index segments are built when the batch ends:

```typescript
engine.startBatch();

try {
  // Add documents in bulk
  for (let i = 0; i < 1000; i++) {
    await engine.addDocuments([{ id: i, text: `文档 ${i}` }]);
  }
} finally {
  // Flush cached tokens and save metadata
  await engine.endBatch();
}
```

## Custom Tokenizer

You can configure a custom tokenizer to support specific languages or tokenization needs. The following is a simple example that splits on whitespace and breaks words longer than 5 characters into single-character tokens:

```typescript
import { SimpleSearch } from 'gs-search';

// Custom tokenizer: split on whitespace; words longer than 5 characters
// are split into single-character tokens
const customTokenizer = (text: string): string[] => {
  const tokens: string[] = [];
  const words = text.toLowerCase().split(/\s+/);

  for (const word of words) {
    if (word.length <= 5) {
      tokens.push(word);
    } else {
      // Split words longer than 5 characters into single characters
      for (let i = 0; i < word.length; i++) {
        tokens.push(word[i]);
      }
    }
  }

  return tokens;
};

// Configure the custom tokenizer
SimpleSearch.configure({
  indexingTokenizer: customTokenizer,
  searchTokenizer: customTokenizer
});
```

## API Reference

### SimpleSearch

**Static methods (no instance creation required):**

- `configure(config: Partial<ISearchEngineConfig>): void`: Configure the search engine
- `addDocument(doc: IDocument): Promise<void>`: Add a single document
- `addDocuments(docs: IDocument[]): Promise<void>`: Add multiple documents
- `removeDocument(id: number): Promise<void>`: Delete a document
- `search(query: string, limit?: number): Promise<IResult[]>`: Search for documents
- `getStatus(): Promise<IStatus>`: Get search engine status
- `startBatch(): Promise<void>`: Start batch operations
- `endBatch(): Promise<void>`: End batch operations

### SearchEngine

- `constructor(config: ISearchEngineConfig)`: Create an engine instance
- `init(): Promise<void>`: Initialize the engine
- `addDocument(doc: IDocument): Promise<void>`: Add a single document
- `addDocuments(docs: IDocument[]): Promise<void>`: Add multiple documents
- `removeDocument(id: number): Promise<void>`: Delete a document
- `search(query: string, limit?: number): Promise<IResult[]>`: Search for documents
- `getStatus(): Promise<IStatus>`: Get search engine status
- `startBatch(): void`: Start batch mode
- `endBatch(): Promise<void>`: End batch mode

## Storage Support

gs-search supports several storage backends:

- **BrowserStorage**: browser storage built on OPFS (`navigator.storage.getDirectory()`)
- **NodeStorage**: Node.js file-system storage
- **Custom storage**: any object implementing the IStorage interface

## Browser Support

- Chrome (latest)
- Firefox (latest)
- Safari (latest)
- Edge (latest)

## License

[MIT License](LICENSE)

## Contributing

Contributions are welcome! See the [GitHub repository](https://github.com/grain-sand/gs-search) for more information.

## Contact

- [GitHub Repository](https://github.com/grain-sand/gs-search)
- [npm Package](https://www.npmjs.com/package/gs-search)
package/lib/index.cjs ADDED
+ "use strict";Object.create,Object.defineProperty,Object.getOwnPropertyDescriptor,Object.getOwnPropertyNames,Object.getPrototypeOf,Object.prototype.hasOwnProperty;class t{#t;constructor(t){this.#t=t}async#e(){return await(await navigator.storage.getDirectory()).getDirectoryHandle(this.#t,{create:!0})}async write(t,e){const s=await(await(await this.#e()).getFileHandle(t,{create:!0})).createWritable();await s.write(e),await s.close()}async append(t,e){const s=await this.#e();let i;try{i=await s.getFileHandle(t,{create:!0})}catch{i=await s.getFileHandle(t,{create:!0})}const n=await i.getFile(),a=await i.createWritable({keepExistingData:!0});await a.seek(n.size),await a.write(e),await a.close()}async read(t){const e=await this.#e();try{return await(await(await e.getFileHandle(t)).getFile()).arrayBuffer()}catch{return null}}async readRange(t,e,s){const i=await this.#e();try{return await(await(await i.getFileHandle(t)).getFile()).slice(e,s).arrayBuffer()}catch{return null}}async remove(t){const e=await this.#e();try{await e.removeEntry(t)}catch{}}async listFiles(){const t=await this.#e(),e=[];for await(const s of t.keys())e.push(s);return e}async clearAll(){const t=await this.#e();for await(const e of t.keys())await t.removeEntry(e,{recursive:!0})}async getFileSize(t){const e=await this.#e();try{return(await(await e.getFileHandle(t)).getFile()).size}catch{return 0}}}class e{#s=null;#i=null;#t;#n="";constructor(t){this.#t=t}async#a(){if(this.#s)return;const t=await import("node:fs"),e=await import("node:path");this.#s=t.promises,this.#i=e.default||e,this.#n=this.#i.join(process.cwd(),this.#t);try{await this.#s.access(this.#n)}catch{await this.#s.mkdir(this.#n,{recursive:!0})}}#r(t){return this.#i.join(this.#n,t)}async write(t,e){await this.#a(),await this.#s.writeFile(this.#r(t),Buffer.from(e))}async append(t,e){await this.#a(),await this.#s.appendFile(this.#r(t),Buffer.from(e))}async read(t){await this.#a();try{const e=await this.#s.readFile(this.#r(t));return 
e.buffer.slice(e.byteOffset,e.byteOffset+e.byteLength)}catch{return null}}async readRange(t,e,s){await this.#a();try{const i=await this.#s.open(this.#r(t),"r"),n=s-e,a=Buffer.alloc(n);return await i.read(a,0,n,e),await i.close(),a.buffer.slice(a.byteOffset,a.byteOffset+a.byteLength)}catch{return null}}async remove(t){await this.#a();try{await this.#s.unlink(this.#r(t))}catch{}}async listFiles(){await this.#a();try{return await this.#s.readdir(this.#n)}catch{return[]}}async clearAll(){await this.#a();try{const t=await this.#s.readdir(this.#n);for(const e of t)await this.#s.unlink(this.#i.join(this.#n,e))}catch{}}async getFileSize(t){await this.#a();try{return(await this.#s.stat(this.#r(t))).size}catch{return 0}}}const s="search_meta.json",i="deleted_ids.bin",n="added_ids.bin";class a{#o;#h={wordSegments:[],charSegments:[]};#c=new Set;#d=new Set;constructor(t){this.#o=t}async load(){const t=await this.#o.read(s);if(t){const e=(new TextDecoder).decode(t);this.#h=JSON.parse(e)}else this.#h={wordSegments:[],charSegments:[]};const e=await this.#o.read(i);if(e){const t=new DataView(e);let s=0;const i=e.byteLength;for(;s<i&&!(s+4>i);){const e=t.getUint32(s,!0);this.#c.add(e),s+=4,s<i&&30===t.getUint8(s)&&(s+=1)}}const a=await this.#o.read(n);if(a){const t=new DataView(a);let e=0;const s=a.byteLength;for(;e<s&&!(e+4>s);){const i=t.getUint32(e,!0);this.#d.add(i),e+=4,e<s&&30===t.getUint8(e)&&(e+=1)}}}async save(){const t=JSON.stringify(this.#h);if(await this.#o.write(s,(new TextEncoder).encode(t).buffer),0===this.#c.size)await this.#o.remove(i);else{const t=4*this.#c.size+this.#c.size,e=new ArrayBuffer(t),s=new DataView(e);let n=0;for(const t of this.#c)s.setUint32(n,t,!0),n+=4,s.setUint8(n,30),n+=1;await this.#o.write(i,e)}if(0===this.#d.size)await this.#o.remove(n);else{const t=4*this.#d.size+this.#d.size,e=new ArrayBuffer(t),s=new DataView(e);let i=0;for(const t of this.#d)s.setUint32(i,t,!0),i+=4,s.setUint8(i,30),i+=1;await 
this.#o.write(n,e)}}getSegments(t){return"word"===t?this.#h.wordSegments:this.#h.charSegments}getDeletedIds(){return this.#c}addDeletedId(t){this.#c.add(t)}isDeleted(t){return this.#c.has(t)}addAddedId(t){this.#d.add(t)}removeAddedId(t){this.#d.delete(t)}isAdded(t){return this.#d.has(t)}getAddedIds(){return this.#d}getLastSegmentInfo(t){const e=this.getSegments(t);return 0===e.length?null:e[e.length-1]}updateSegment(t,e,s,i,n,a){const r="word"===t?this.#h.wordSegments:this.#h.charSegments;if(a)r.push({filename:e,start:s,end:i,tokenCount:n});else{const t=r[r.length-1];t&&t.filename===e&&(t.end=i,t.tokenCount=n)}}reset(){this.#h={wordSegments:[],charSegments:[]},this.#c.clear(),this.#d.clear()}}class r{static SEPARATOR=30;#o;constructor(t){this.#o=t}async appendBatch(t,e){if(0===e.length)return await this.#o.getFileSize(t);const s=new TextEncoder;let i=0;for(const t of e){i+=8;for(const e of t.tokens){i+=2+Math.min(s.encode(e).byteLength,65535)}i+=1}const n=new Uint8Array(i);let a=0;for(const t of e){const e=[];for(const i of t.tokens){const t=s.encode(i),n=t.byteLength>65535?t.slice(0,65535):t;e.push(n)}const i=new DataView(n.buffer,a);i.setUint32(0,t.id,!0),i.setUint32(4,e.length,!0),a+=8;for(const t of e)new DataView(n.buffer,a).setUint16(0,t.byteLength,!0),a+=2,n.set(t,a),a+=t.byteLength;n[a++]=r.SEPARATOR}return await this.#o.append(t,n.buffer),await this.#o.getFileSize(t)}async readRange(t,e,s){const i=await this.#o.readRange(t,e,s);if(!i||0===i.byteLength)return[];const n=new DataView(i),a=new Uint8Array(i),o=new TextDecoder,h=[];let c=0;const d=i.byteLength;for(;c<d&&!(c+8>d);){const t=n.getUint32(c,!0);c+=4;const e=n.getUint32(c,!0);c+=4;const s=[];for(let t=0;t<e&&!(c+2>d);t++){const t=n.getUint16(c,!0);if(c+=2,c+t>d)break;const e=new Uint8Array(i,c,t);s.push(o.decode(e)),c+=t}c<d&&a[c]===r.SEPARATOR&&(c+=1),h.push({id:t,tokens:s})}return h}async getCurrentSize(t){return await this.#o.getFileSize(t)}}class o{#g;#o;#l=null;#f=null;static hash(t){let 
e=5381;for(let s=0;s<t.length;s++)e=(e<<5)+e^t.charCodeAt(s);return e>>>0}constructor(t,e){this.#g=t,this.#o=e}async loadIndex(){return!!this.#l||(this.#l=await this.#o.read(this.#g),!!this.#l&&(this.#f=new DataView(this.#l),!0))}async buildAndSave(t){const e=new Map;for(const s of t){const t=new Map;for(const i of s.tokens)if(!t.has(i)){t.set(i,!0);const n=o.hash(i);e.has(n)||e.set(n,[]),e.get(n).push(s.id)}}const s=Array.from(e.keys()).sort((t,e)=>t-e);let i=0;const n=new Array(s.length);for(let t=0;t<s.length;t++){const a=s[t],r=e.get(a);n[t]=r,i+=r.length}const a=12*s.length,r=new ArrayBuffer(8+a+4*i),h=new DataView(r);h.setUint32(0,1229866072),h.setUint32(4,s.length);let c=8,d=8+a;for(let t=0;t<s.length;t++){const e=s[t],i=n[t];h.setUint32(c,e),h.setUint32(c+4,d),h.setUint32(c+8,i.length),c+=12;for(let t=0;t<i.length;t++)h.setUint32(d,i[t],!0),d+=4}await this.#o.write(this.#g,r),this.#l=r,this.#f=h}search(t){if(!this.#f||!this.#l)return[];const e=o.hash(t);let s=0,i=this.#f.getUint32(4)-1;for(;s<=i;){const t=s+i>>>1,n=8+12*t,a=this.#f.getUint32(n);if(a<e)s=t+1;else{if(!(a>e)){const t=this.#f.getUint32(n+4),e=this.#f.getUint32(n+8),s=[];for(let i=0;i<e;i++)s.push(this.#f.getUint32(t+4*i,!0));return s}i=t-1}}return[]}}const h="word_cache.bin",c="char_cache.bin";class d{#o;#h;#w;#u;#m=!1;#y;#p=!1;#S={word:0,char:0};constructor(s){if(!s.baseDir)throw new Error("SearchEngine requires 'baseDir' in config.");if(this.#y={wordSegmentTokenThreshold:1e5,charSegmentTokenThreshold:5e5,minWordTokenSave:0,minCharTokenSave:0,...s},(this.#y.minWordTokenSave||0)>=(this.#y.wordSegmentTokenThreshold||1e5))throw new Error("minWordTokenSave must be less than wordSegmentTokenThreshold");if((this.#y.minCharTokenSave||0)>=(this.#y.charSegmentTokenThreshold||5e5))throw new Error("minCharTokenSave must be less than charSegmentTokenThreshold");let i=null;if(this.#y.storage&&("object"==typeof this.#y.storage?i=this.#y.storage:"browser"===this.#y.storage?i=new 
t(this.#y.baseDir):"node"===this.#y.storage&&(i=new e(this.#y.baseDir))),!i){const s=typeof navigator<"u"&&navigator?.storage?.getDirectory instanceof Function,n=typeof process<"u"&&null!=process.versions&&null!=process.versions.node;s?i=new t(this.#y.baseDir):n&&(i=new e(this.#y.baseDir))}if(!i)throw new Error('Storage initialization failed. Please configure "storage" explicitly or ensure you are in a supported environment (Browser/Node).');this.#o=i,this.#h=new a(this.#o),this.#w=new r(this.#o),this.#u=new Map}async init(){if(this.#m)return;await this.#h.load();const t=[...this.#h.getSegments("word"),...this.#h.getSegments("char")];for(const e of t)this.#u.has(e.filename)||this.#u.set(e.filename,new o(e.filename,this.#o)),await this.#u.get(e.filename).loadIndex();this.#m=!0}startBatch(){this.#p=!0,this.#S={word:0,char:0}}async endBatch(){this.#p=!1,this.#S.word>0&&await this.#D("word",this.#S.word),this.#S.char>0&&await this.#D("char",this.#S.char),this.#S={word:0,char:0},await this.#h.save()}#k(t){if(typeof Intl<"u"&&Intl.Segmenter){const e=new Intl.Segmenter([],{granularity:"word"});return Array.from(e.segment(t)).filter(t=>t.isWordLike).map(t=>t.segment.toLowerCase())}return t.toLowerCase().split(/[^a-z0-9\u4e00-\u9fa5]+/g).filter(t=>t.length>0)}#b(t){return this.#y.indexingTokenizer?this.#y.indexingTokenizer(t):this.#k(t)}#T(t){return this.#y.searchTokenizer?this.#y.searchTokenizer(t):this.#y.indexingTokenizer?this.#y.indexingTokenizer(t):this.#k(t)}async addDocument(t){return this.addDocuments([t])}async addDocuments(t){if(this.#m||await this.init(),0===t.length)return;const e=this.#h.getDeletedIds(),s=[],i=[];for(const n of t){if(e.has(n.id))throw new Error(`Document ID ${n.id} has been deleted and cannot be re-added.`);if(this.#h.isAdded(n.id))throw new Error(`Document ID ${n.id} already exists.`);const t=this.#b(n.text),a=[],r=[];for(const e of 
t)e.length>1?a.push(e):1===e.length&&r.push(e);a.length>0&&s.push({id:n.id,tokens:a}),r.length>0&&i.push({id:n.id,tokens:r})}let n=0,a=0;if(s.length>0){await this.#w.appendBatch(h,s);for(const t of s)n+=t.tokens.length}if(i.length>0){await this.#w.appendBatch(c,i);for(const t of i)a+=t.tokens.length}for(const e of t)this.#h.addAddedId(e.id);this.#p?(this.#S.word+=n,this.#S.char+=a):(n>0&&await this.#D("word",n),a>0&&await this.#D("char",a),await this.#h.save())}async#D(t,e){const s="word"===t?h:c,i=await this.#w.getCurrentSize(s),n="word"===t?this.#y.wordSegmentTokenThreshold||1e5:this.#y.charSegmentTokenThreshold||5e5,a="word"===t?this.#y.minWordTokenSave||0:this.#y.minCharTokenSave||0,r=this.#h.getLastSegmentInfo(t);let d,g,l,f;const w=()=>{const e=this.#h.getSegments(t).length+1;return`${t}_seg_${e}.bin`};if(r){const t=r.tokenCount;t>=n||t+e>=n?(d=w(),l=!0,g=r.end,f=e):(d=r.filename,l=!1,g=r.start,f=t+e)}else d=w(),l=!0,g=0,f=e;if(f<a)return void this.#h.updateSegment(t,d,g,i,f,l);const u=await this.#w.readRange(s,g,i);let m=this.#u.get(d);m||(m=new o(d,this.#o),this.#u.set(d,m)),await m.buildAndSave(u),this.#h.updateSegment(t,d,g,i,f,l)}async search(t,e){this.#m||await this.init();const s=this.#T(t),i=s.filter(t=>t.length>1),n=s.filter(t=>1===t.length),a=this.#h.getDeletedIds(),r=new Map,h=new Map,c=t=>{const e=this.#h.getSegments(t);for(const t of e){const e=t.filename;!this.#u.has(e)&&!h.has(e)&&h.set(e,new o(e,this.#o))}};c("word"),c("char"),await Promise.all(Array.from(h.entries()).map(([t,e])=>e.loadIndex().then(s=>{s&&this.#u.set(t,e)})));const d=async(t,e)=>{if(0===e.length)return;const s=this.#h.getSegments(t);for(const t of s){const s=t.filename,i=this.#u.get(s);if(i)for(const t of e){const e=i.search(t),s=1+.1*t.length;for(const i of e)if(!a.has(i))if(r.has(i)){const e=r.get(i);e.score+=s,e.tokens.add(t)}else r.set(i,{score:0,tokens:new Set([t])})}}};await d("word",i),await d("char",n);const g=[];return 
r.forEach((t,e)=>{g.push({id:e,score:t.score,tokens:Array.from(t.tokens)})}),g.sort((t,e)=>e.score-t.score),"number"==typeof e&&e>0?g.slice(0,e):g}async removeDocument(t){this.#m||await this.init(),this.#h.addDeletedId(t),this.#h.removeAddedId(t),await this.#h.save()}async clearAll(){await this.#o.clearAll(),this.#u.clear(),this.#h.reset(),this.#m=!1,this.#p=!1,this.#S={word:0,char:0}}async getStatus(){return this.#m||await this.init(),{wordSegments:this.#h.getSegments("word").length,charSegments:this.#h.getSegments("char").length,deleted:this.#h.getDeletedIds().size,wordCacheSize:await this.#w.getCurrentSize(h),charCacheSize:await this.#w.getCurrentSize(c),inBatch:this.#p}}}exports.BrowserStorage=t,exports.NodeStorage=e,exports.SearchEngine=d,exports.SimpleSearch=class{static#I=null;static#v={baseDir:"simple_search_data",wordSegmentTokenThreshold:1e5,minWordTokenSave:0};static configure(t){const e={...this.#v,...t};this.#I=new d(e)}static#z(){return this.#I||(this.#I=new d(this.#v)),this.#I}static async startBatch(){this.#z().startBatch()}static async endBatch(){return this.#z().endBatch()}static async addDocument(t){return this.#z().addDocument(t)}static async addDocuments(t){return this.#z().addDocuments(t)}static async search(t,e){return this.#z().search(t,e)}static async removeDocument(t){return this.#z().removeDocument(t)}static async clearAll(){return this.#z().clearAll()}static async getStatus(){return this.#z().getStatus()}};
package/lib/index.d.ts ADDED
@@ -0,0 +1,199 @@
1
+ /**
2
+ * 核心类型定义
3
+ */
4
+ interface IDocument {
5
+ id: number;
6
+ text: string;
7
+ }
8
+ interface IResult {
9
+ id: number;
10
+ score: number;
11
+ tokens: string[];
12
+ }
13
+ interface ISegmentMeta {
14
+ filename: string;
15
+ start: number;
16
+ end: number;
17
+ tokenCount: number;
18
+ }
19
+ interface IIndexMeta {
20
+ wordSegments: ISegmentMeta[];
21
+ charSegments: ISegmentMeta[];
22
+ }
23
+ interface ITokenizedDoc {
24
+ id: number;
25
+ tokens: string[];
26
+ }
27
+ type IndexType = 'word' | 'char';
28
+ /**
29
+ * 存储层接口 (外部化)
30
+ */
31
+ interface IStorage {
32
+ write(filename: string, data: ArrayBuffer): Promise<void>;
33
+ append(filename: string, data: ArrayBuffer): Promise<void>;
34
+ read(filename: string): Promise<ArrayBuffer | null>;
35
+ readRange(filename: string, start: number, end: number): Promise<ArrayBuffer | null>;
36
+ remove(filename: string): Promise<void>;
37
+ listFiles(): Promise<string[]>;
38
+ clearAll(): Promise<void>;
39
+ getFileSize(filename: string): Promise<number>;
40
+ }
41
+ /**
+ * Core search engine configuration
+ */
+ interface ISearchEngineConfig {
+ /**
+ * Base directory for data storage (required).
+ * Used to keep separate search engine instances apart.
+ */
+ baseDir: string;
+ /**
+ * Storage implementation (optional)
+ * - 'browser': force OPFS (BrowserStorage)
+ * - 'node': force Node.js fs (NodeStorage)
+ * - IStorage: pass a custom storage instance
+ * - undefined: auto-detect the environment
+ */
+ storage?: 'browser' | 'node' | IStorage;
+ /**
+ * Tokenizer used at indexing time (core algorithm setting)
+ * - Purpose: converts document text into the token sequence to be indexed
+ * - Algorithm: custom tokenization logic; must return an array of strings
+ * - Recommendation: use a dedicated tokenizer per language (Chinese/English/Japanese, etc.)
+ * - Impact: directly determines index granularity and search accuracy
+ */
+ indexingTokenizer?: (text: string) => string[];
+ /**
+ * Tokenizer used at search time (core algorithm setting)
+ * - Purpose: converts query text into the token sequence used for searching
+ * - Algorithm: custom tokenization logic; must return an array of strings
+ * - Recommendation: keep the strategy consistent with indexingTokenizer to ensure accurate matching
+ * - Impact: directly determines the match scope and result relevance
+ */
+ searchTokenizer?: (text: string) => string[];
+ /**
+ * Word-index segmentation threshold (token count) - segmentation setting
+ * - Purpose: caps the size of word-index files; a new index segment is created once the threshold is exceeded
+ * - Algorithm: token-count-based segmentation; a new segment is started when the incoming token count plus the existing token count exceeds the threshold
+ * - Default: 100000
+ * - Impact: too small produces too many index files; too large can hurt search performance
+ */
+ wordSegmentTokenThreshold?: number;
+ /**
+ * Character-index segmentation threshold (token count) - segmentation setting
+ * - Purpose: caps the size of character-index files; a new index segment is created once the threshold is exceeded
+ * - Algorithm: token-count-based segmentation; a new segment is started when the incoming token count plus the existing token count exceeds the threshold
+ * - Default: 500000
+ * - Impact: too small produces too many index files; too large can hurt search performance
+ */
+ charSegmentTokenThreshold?: number;
+ /**
+ * Minimum save threshold for the word index (token count) - caching setting
+ * - Purpose: controls whether the word index is written to disk immediately; below the threshold it is kept only in the in-memory cache
+ * - Algorithm: token-count-based caching; the index is persisted once the accumulated token count reaches the threshold
+ * - Default: 0
+ * - Impact: a sensible value reduces disk I/O and speeds up indexing
+ */
+ minWordTokenSave?: number;
+ /**
+ * Minimum save threshold for the character index (token count) - caching setting
+ * - Purpose: controls whether the character index is written to disk immediately; below the threshold it is kept only in the in-memory cache
+ * - Algorithm: token-count-based caching; the index is persisted once the accumulated token count reaches the threshold
+ * - Default: 0
+ * - Impact: a sensible value reduces disk I/O and speeds up indexing
+ */
+ minCharTokenSave?: number;
+ }
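Any `(text: string) => string[]` function can serve as the `indexingTokenizer` / `searchTokenizer` hook. As a sketch, here is a tokenizer mirroring the fallback behavior of the bundled default, which lowercases the text and splits on anything outside `a-z0-9` and the CJK ideograph range (the function name is ours, not part of the package):

```typescript
// Hypothetical custom tokenizer for indexingTokenizer / searchTokenizer.
// Lowercases, then splits on any run of characters that is not a-z, 0-9,
// or a CJK ideograph (U+4E00..U+9FA5), dropping empty tokens.
const simpleTokenizer = (text: string): string[] =>
  text
    .toLowerCase()
    .split(/[^a-z0-9\u4e00-\u9fa5]+/g)
    .filter((t) => t.length > 0);
```

It would be passed in the config, e.g. `new SearchEngine({ baseDir: 'data', indexingTokenizer: simpleTokenizer })`; as the docs above recommend, the search-time tokenizer should use the same strategy as the indexing-time one.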
106
+ 
+ /**
+ * Core search engine class (multi-instance support)
+ */
+ declare class SearchEngine {
+ #private;
+ constructor(config: ISearchEngineConfig);
+ init(): Promise<void>;
+ /**
+ * Start batch mode.
+ * While batching, addDocuments only writes to the cache and does not trigger index segment builds.
+ */
+ startBatch(): void;
+ /**
+ * End batch mode.
+ * Triggers an index-build check and saves the metadata.
+ */
+ endBatch(): Promise<void>;
+ addDocument(doc: IDocument): Promise<void>;
+ addDocuments(docs: IDocument[]): Promise<void>;
+ search(query: string, limit?: number): Promise<IResult[]>;
+ removeDocument(id: number): Promise<void>;
+ clearAll(): Promise<void>;
+ getStatus(): Promise<{
+ wordSegments: number;
+ charSegments: number;
+ deleted: number;
+ wordCacheSize: number;
+ charCacheSize: number;
+ inBatch: boolean;
+ }>;
+ }
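The segmentation rule described by `wordSegmentTokenThreshold` / `charSegmentTokenThreshold` can be sketched as a pure decision function. This is illustrative only, mirroring the documented behavior rather than any exported API:

```typescript
// Decide whether an incoming token batch starts a new index segment.
// Per the config docs: a new segment is created when there is no segment
// yet, when the last segment is already at the threshold, or when the
// combined token count would reach it.
function startsNewSegment(
  lastSegmentTokens: number | null, // token count of the last segment, or null if none
  incomingTokens: number,
  threshold: number // e.g. 100000 for the word index, 500000 for the char index
): boolean {
  if (lastSegmentTokens === null) return true;
  return (
    lastSegmentTokens >= threshold ||
    lastSegmentTokens + incomingTokens >= threshold
  );
}
```

During batch mode (`startBatch`/`endBatch`) this check is deferred: documents accumulate in the cache and the segment decision runs once at `endBatch`.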
138
+ 
+ /**
+ * Quick-use wrapper
+ * Provides a singleton and default configuration
+ */
+ declare class SimpleSearch {
+ #private;
+ /**
+ * Configure and initialize the singleton
+ */
+ static configure(config: Partial<ISearchEngineConfig>): void;
+ static startBatch(): Promise<void>;
+ static endBatch(): Promise<void>;
+ static addDocument(doc: IDocument): Promise<void>;
+ static addDocuments(docs: IDocument[]): Promise<void>;
+ static search(query: string, limit?: number): Promise<IResult[]>;
+ static removeDocument(id: number): Promise<void>;
+ static clearAll(): Promise<void>;
+ static getStatus(): Promise<{
+ wordSegments: number;
+ charSegments: number;
+ deleted: number;
+ wordCacheSize: number;
+ charCacheSize: number;
+ inBatch: boolean;
+ }>;
+ }
165
+ 
+ /**
+ * Browser implementation (OPFS, isolated per subdirectory)
+ * Works on the main thread and in Web Workers
+ */
+ declare class BrowserStorage implements IStorage {
+ #private;
+ constructor(baseDir: string);
+ write(filename: string, data: ArrayBuffer): Promise<void>;
+ append(filename: string, data: ArrayBuffer): Promise<void>;
+ read(filename: string): Promise<ArrayBuffer | null>;
+ readRange(filename: string, start: number, end: number): Promise<ArrayBuffer | null>;
+ remove(filename: string): Promise<void>;
+ listFiles(): Promise<string[]>;
+ clearAll(): Promise<void>;
+ getFileSize(filename: string): Promise<number>;
+ }
+ /**
+ * Node.js implementation
+ */
+ declare class NodeStorage implements IStorage {
+ #private;
+ constructor(baseDir: string);
+ write(filename: string, data: ArrayBuffer): Promise<void>;
+ append(filename: string, data: ArrayBuffer): Promise<void>;
+ read(filename: string): Promise<ArrayBuffer | null>;
+ readRange(filename: string, start: number, end: number): Promise<ArrayBuffer | null>;
+ remove(filename: string): Promise<void>;
+ listFiles(): Promise<string[]>;
+ clearAll(): Promise<void>;
+ getFileSize(filename: string): Promise<number>;
+ }
+ 
+ export { BrowserStorage, NodeStorage, SearchEngine, SimpleSearch };
+ export type { IDocument, IIndexMeta, IResult, ISearchEngineConfig, ISegmentMeta, IStorage, ITokenizedDoc, IndexType };
package/lib/index.js ADDED
@@ -0,0 +1 @@
+ class t{#t;constructor(t){this.#t=t}async#e(){return await(await navigator.storage.getDirectory()).getDirectoryHandle(this.#t,{create:!0})}async write(t,e){const s=await(await(await this.#e()).getFileHandle(t,{create:!0})).createWritable();await s.write(e),await s.close()}async append(t,e){const s=await this.#e();let i;try{i=await s.getFileHandle(t,{create:!0})}catch{i=await s.getFileHandle(t,{create:!0})}const n=await i.getFile(),a=await i.createWritable({keepExistingData:!0});await a.seek(n.size),await a.write(e),await a.close()}async read(t){const e=await this.#e();try{return await(await(await e.getFileHandle(t)).getFile()).arrayBuffer()}catch{return null}}async readRange(t,e,s){const i=await this.#e();try{return await(await(await i.getFileHandle(t)).getFile()).slice(e,s).arrayBuffer()}catch{return null}}async remove(t){const e=await this.#e();try{await e.removeEntry(t)}catch{}}async listFiles(){const t=await this.#e(),e=[];for await(const s of t.keys())e.push(s);return e}async clearAll(){const t=await this.#e();for await(const e of t.keys())await t.removeEntry(e,{recursive:!0})}async getFileSize(t){const e=await this.#e();try{return(await(await e.getFileHandle(t)).getFile()).size}catch{return 0}}}class e{#s=null;#i=null;#t;#n="";constructor(t){this.#t=t}async#a(){if(this.#s)return;const t=await import("node:fs"),e=await import("node:path");this.#s=t.promises,this.#i=e.default||e,this.#n=this.#i.join(process.cwd(),this.#t);try{await this.#s.access(this.#n)}catch{await this.#s.mkdir(this.#n,{recursive:!0})}}#r(t){return this.#i.join(this.#n,t)}async write(t,e){await this.#a(),await this.#s.writeFile(this.#r(t),Buffer.from(e))}async append(t,e){await this.#a(),await this.#s.appendFile(this.#r(t),Buffer.from(e))}async read(t){await this.#a();try{const e=await this.#s.readFile(this.#r(t));return e.buffer.slice(e.byteOffset,e.byteOffset+e.byteLength)}catch{return null}}async readRange(t,e,s){await this.#a();try{const i=await this.#s.open(this.#r(t),"r"),n=s-e,a=Buffer.alloc(n);return await i.read(a,0,n,e),await i.close(),a.buffer.slice(a.byteOffset,a.byteOffset+a.byteLength)}catch{return null}}async remove(t){await this.#a();try{await this.#s.unlink(this.#r(t))}catch{}}async listFiles(){await this.#a();try{return await this.#s.readdir(this.#n)}catch{return[]}}async clearAll(){await this.#a();try{const t=await this.#s.readdir(this.#n);for(const e of t)await this.#s.unlink(this.#i.join(this.#n,e))}catch{}}async getFileSize(t){await this.#a();try{return(await this.#s.stat(this.#r(t))).size}catch{return 0}}}const s="search_meta.json",i="deleted_ids.bin",n="added_ids.bin";class a{#o;#h={wordSegments:[],charSegments:[]};#c=new Set;#d=new Set;constructor(t){this.#o=t}async load(){const t=await this.#o.read(s);if(t){const e=(new TextDecoder).decode(t);this.#h=JSON.parse(e)}else this.#h={wordSegments:[],charSegments:[]};const e=await this.#o.read(i);if(e){const t=new DataView(e);let s=0;const i=e.byteLength;for(;s<i&&!(s+4>i);){const e=t.getUint32(s,!0);this.#c.add(e),s+=4,s<i&&30===t.getUint8(s)&&(s+=1)}}const a=await this.#o.read(n);if(a){const t=new DataView(a);let e=0;const s=a.byteLength;for(;e<s&&!(e+4>s);){const i=t.getUint32(e,!0);this.#d.add(i),e+=4,e<s&&30===t.getUint8(e)&&(e+=1)}}}async save(){const t=JSON.stringify(this.#h);if(await this.#o.write(s,(new TextEncoder).encode(t).buffer),0===this.#c.size)await this.#o.remove(i);else{const t=4*this.#c.size+this.#c.size,e=new ArrayBuffer(t),s=new DataView(e);let n=0;for(const t of this.#c)s.setUint32(n,t,!0),n+=4,s.setUint8(n,30),n+=1;await this.#o.write(i,e)}if(0===this.#d.size)await this.#o.remove(n);else{const t=4*this.#d.size+this.#d.size,e=new ArrayBuffer(t),s=new DataView(e);let i=0;for(const t of this.#d)s.setUint32(i,t,!0),i+=4,s.setUint8(i,30),i+=1;await this.#o.write(n,e)}}getSegments(t){return"word"===t?this.#h.wordSegments:this.#h.charSegments}getDeletedIds(){return this.#c}addDeletedId(t){this.#c.add(t)}isDeleted(t){return this.#c.has(t)}addAddedId(t){this.#d.add(t)}removeAddedId(t){this.#d.delete(t)}isAdded(t){return this.#d.has(t)}getAddedIds(){return this.#d}getLastSegmentInfo(t){const e=this.getSegments(t);return 0===e.length?null:e[e.length-1]}updateSegment(t,e,s,i,n,a){const r="word"===t?this.#h.wordSegments:this.#h.charSegments;if(a)r.push({filename:e,start:s,end:i,tokenCount:n});else{const t=r[r.length-1];t&&t.filename===e&&(t.end=i,t.tokenCount=n)}}reset(){this.#h={wordSegments:[],charSegments:[]},this.#c.clear(),this.#d.clear()}}class r{static SEPARATOR=30;#o;constructor(t){this.#o=t}async appendBatch(t,e){if(0===e.length)return await this.#o.getFileSize(t);const s=new TextEncoder;let i=0;for(const t of e){i+=8;for(const e of t.tokens){i+=2+Math.min(s.encode(e).byteLength,65535)}i+=1}const n=new Uint8Array(i);let a=0;for(const t of e){const e=[];for(const i of t.tokens){const t=s.encode(i),n=t.byteLength>65535?t.slice(0,65535):t;e.push(n)}const i=new DataView(n.buffer,a);i.setUint32(0,t.id,!0),i.setUint32(4,e.length,!0),a+=8;for(const t of e)new DataView(n.buffer,a).setUint16(0,t.byteLength,!0),a+=2,n.set(t,a),a+=t.byteLength;n[a++]=r.SEPARATOR}return await this.#o.append(t,n.buffer),await this.#o.getFileSize(t)}async readRange(t,e,s){const i=await this.#o.readRange(t,e,s);if(!i||0===i.byteLength)return[];const n=new DataView(i),a=new Uint8Array(i),o=new TextDecoder,h=[];let c=0;const d=i.byteLength;for(;c<d&&!(c+8>d);){const t=n.getUint32(c,!0);c+=4;const e=n.getUint32(c,!0);c+=4;const s=[];for(let t=0;t<e&&!(c+2>d);t++){const t=n.getUint16(c,!0);if(c+=2,c+t>d)break;const e=new Uint8Array(i,c,t);s.push(o.decode(e)),c+=t}c<d&&a[c]===r.SEPARATOR&&(c+=1),h.push({id:t,tokens:s})}return h}async getCurrentSize(t){return await this.#o.getFileSize(t)}}class o{#g;#o;#l=null;#f=null;static hash(t){let e=5381;for(let s=0;s<t.length;s++)e=(e<<5)+e^t.charCodeAt(s);return e>>>0}constructor(t,e){this.#g=t,this.#o=e}async loadIndex(){return!!this.#l||(this.#l=await this.#o.read(this.#g),!!this.#l&&(this.#f=new DataView(this.#l),!0))}async buildAndSave(t){const e=new Map;for(const s of t){const t=new Map;for(const i of s.tokens)if(!t.has(i)){t.set(i,!0);const n=o.hash(i);e.has(n)||e.set(n,[]),e.get(n).push(s.id)}}const s=Array.from(e.keys()).sort((t,e)=>t-e);let i=0;const n=new Array(s.length);for(let t=0;t<s.length;t++){const a=s[t],r=e.get(a);n[t]=r,i+=r.length}const a=12*s.length,r=new ArrayBuffer(8+a+4*i),h=new DataView(r);h.setUint32(0,1229866072),h.setUint32(4,s.length);let c=8,d=8+a;for(let t=0;t<s.length;t++){const e=s[t],i=n[t];h.setUint32(c,e),h.setUint32(c+4,d),h.setUint32(c+8,i.length),c+=12;for(let t=0;t<i.length;t++)h.setUint32(d,i[t],!0),d+=4}await this.#o.write(this.#g,r),this.#l=r,this.#f=h}search(t){if(!this.#f||!this.#l)return[];const e=o.hash(t);let s=0,i=this.#f.getUint32(4)-1;for(;s<=i;){const t=s+i>>>1,n=8+12*t,a=this.#f.getUint32(n);if(a<e)s=t+1;else{if(!(a>e)){const t=this.#f.getUint32(n+4),e=this.#f.getUint32(n+8),s=[];for(let i=0;i<e;i++)s.push(this.#f.getUint32(t+4*i,!0));return s}i=t-1}}return[]}}const h="word_cache.bin",c="char_cache.bin";class d{#o;#h;#w;#u;#m=!1;#y;#p=!1;#S={word:0,char:0};constructor(s){if(!s.baseDir)throw new Error("SearchEngine requires 'baseDir' in config.");if(this.#y={wordSegmentTokenThreshold:1e5,charSegmentTokenThreshold:5e5,minWordTokenSave:0,minCharTokenSave:0,...s},(this.#y.minWordTokenSave||0)>=(this.#y.wordSegmentTokenThreshold||1e5))throw new Error("minWordTokenSave must be less than wordSegmentTokenThreshold");if((this.#y.minCharTokenSave||0)>=(this.#y.charSegmentTokenThreshold||5e5))throw new Error("minCharTokenSave must be less than charSegmentTokenThreshold");let i=null;if(this.#y.storage&&("object"==typeof this.#y.storage?i=this.#y.storage:"browser"===this.#y.storage?i=new t(this.#y.baseDir):"node"===this.#y.storage&&(i=new e(this.#y.baseDir))),!i){const s=typeof navigator<"u"&&navigator?.storage?.getDirectory instanceof Function,n=typeof process<"u"&&null!=process.versions&&null!=process.versions.node;s?i=new t(this.#y.baseDir):n&&(i=new e(this.#y.baseDir))}if(!i)throw new Error('Storage initialization failed. Please configure "storage" explicitly or ensure you are in a supported environment (Browser/Node).');this.#o=i,this.#h=new a(this.#o),this.#w=new r(this.#o),this.#u=new Map}async init(){if(this.#m)return;await this.#h.load();const t=[...this.#h.getSegments("word"),...this.#h.getSegments("char")];for(const e of t)this.#u.has(e.filename)||this.#u.set(e.filename,new o(e.filename,this.#o)),await this.#u.get(e.filename).loadIndex();this.#m=!0}startBatch(){this.#p=!0,this.#S={word:0,char:0}}async endBatch(){this.#p=!1,this.#S.word>0&&await this.#D("word",this.#S.word),this.#S.char>0&&await this.#D("char",this.#S.char),this.#S={word:0,char:0},await this.#h.save()}#k(t){if(typeof Intl<"u"&&Intl.Segmenter){const e=new Intl.Segmenter([],{granularity:"word"});return Array.from(e.segment(t)).filter(t=>t.isWordLike).map(t=>t.segment.toLowerCase())}return t.toLowerCase().split(/[^a-z0-9\u4e00-\u9fa5]+/g).filter(t=>t.length>0)}#b(t){return this.#y.indexingTokenizer?this.#y.indexingTokenizer(t):this.#k(t)}#T(t){return this.#y.searchTokenizer?this.#y.searchTokenizer(t):this.#y.indexingTokenizer?this.#y.indexingTokenizer(t):this.#k(t)}async addDocument(t){return this.addDocuments([t])}async addDocuments(t){if(this.#m||await this.init(),0===t.length)return;const e=this.#h.getDeletedIds(),s=[],i=[];for(const n of t){if(e.has(n.id))throw new Error(`Document ID ${n.id} has been deleted and cannot be re-added.`);if(this.#h.isAdded(n.id))throw new Error(`Document ID ${n.id} already exists.`);const t=this.#b(n.text),a=[],r=[];for(const e of t)e.length>1?a.push(e):1===e.length&&r.push(e);a.length>0&&s.push({id:n.id,tokens:a}),r.length>0&&i.push({id:n.id,tokens:r})}let n=0,a=0;if(s.length>0){await this.#w.appendBatch(h,s);for(const t of s)n+=t.tokens.length}if(i.length>0){await this.#w.appendBatch(c,i);for(const t of i)a+=t.tokens.length}for(const e of t)this.#h.addAddedId(e.id);this.#p?(this.#S.word+=n,this.#S.char+=a):(n>0&&await this.#D("word",n),a>0&&await this.#D("char",a),await this.#h.save())}async#D(t,e){const s="word"===t?h:c,i=await this.#w.getCurrentSize(s),n="word"===t?this.#y.wordSegmentTokenThreshold||1e5:this.#y.charSegmentTokenThreshold||5e5,a="word"===t?this.#y.minWordTokenSave||0:this.#y.minCharTokenSave||0,r=this.#h.getLastSegmentInfo(t);let d,g,l,f;const w=()=>{const e=this.#h.getSegments(t).length+1;return`${t}_seg_${e}.bin`};if(r){const t=r.tokenCount;t>=n||t+e>=n?(d=w(),l=!0,g=r.end,f=e):(d=r.filename,l=!1,g=r.start,f=t+e)}else d=w(),l=!0,g=0,f=e;if(f<a)return void this.#h.updateSegment(t,d,g,i,f,l);const u=await this.#w.readRange(s,g,i);let m=this.#u.get(d);m||(m=new o(d,this.#o),this.#u.set(d,m)),await m.buildAndSave(u),this.#h.updateSegment(t,d,g,i,f,l)}async search(t,e){this.#m||await this.init();const s=this.#T(t),i=s.filter(t=>t.length>1),n=s.filter(t=>1===t.length),a=this.#h.getDeletedIds(),r=new Map,h=new Map,c=t=>{const e=this.#h.getSegments(t);for(const t of e){const e=t.filename;!this.#u.has(e)&&!h.has(e)&&h.set(e,new o(e,this.#o))}};c("word"),c("char"),await Promise.all(Array.from(h.entries()).map(([t,e])=>e.loadIndex().then(s=>{s&&this.#u.set(t,e)})));const d=async(t,e)=>{if(0===e.length)return;const s=this.#h.getSegments(t);for(const t of s){const s=t.filename,i=this.#u.get(s);if(i)for(const t of e){const e=i.search(t),s=1+.1*t.length;for(const i of e)if(!a.has(i))if(r.has(i)){const e=r.get(i);e.score+=s,e.tokens.add(t)}else r.set(i,{score:0,tokens:new Set([t])})}}};await d("word",i),await d("char",n);const g=[];return r.forEach((t,e)=>{g.push({id:e,score:t.score,tokens:Array.from(t.tokens)})}),g.sort((t,e)=>e.score-t.score),"number"==typeof e&&e>0?g.slice(0,e):g}async removeDocument(t){this.#m||await this.init(),this.#h.addDeletedId(t),this.#h.removeAddedId(t),await this.#h.save()}async clearAll(){await this.#o.clearAll(),this.#u.clear(),this.#h.reset(),this.#m=!1,this.#p=!1,this.#S={word:0,char:0}}async getStatus(){return this.#m||await this.init(),{wordSegments:this.#h.getSegments("word").length,charSegments:this.#h.getSegments("char").length,deleted:this.#h.getDeletedIds().size,wordCacheSize:await this.#w.getCurrentSize(h),charCacheSize:await this.#w.getCurrentSize(c),inBatch:this.#p}}}class g{static#I=null;static#v={baseDir:"simple_search_data",wordSegmentTokenThreshold:1e5,minWordTokenSave:0};static configure(t){const e={...this.#v,...t};this.#I=new d(e)}static#z(){return this.#I||(this.#I=new d(this.#v)),this.#I}static async startBatch(){this.#z().startBatch()}static async endBatch(){return this.#z().endBatch()}static async addDocument(t){return this.#z().addDocument(t)}static async addDocuments(t){return this.#z().addDocuments(t)}static async search(t,e){return this.#z().search(t,e)}static async removeDocument(t){return this.#z().removeDocument(t)}static async clearAll(){return this.#z().clearAll()}static async getStatus(){return this.#z().getStatus()}}export{t as BrowserStorage,e as NodeStorage,d as SearchEngine,g as SimpleSearch};
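For readers of the minified bundle above: token lookup hashes each token with a DJB2-xor variant and binary-searches the sorted hash table stored in each segment file. A readable form of that hash:

```typescript
// DJB2-xor string hash (readable form of the bundled `hash` helper):
// h = ((h << 5) + h) ^ charCode, seeded with 5381, forced unsigned 32-bit.
function djb2Xor(token: string): number {
  let h = 5381;
  for (let i = 0; i < token.length; i++) {
    h = ((h << 5) + h) ^ token.charCodeAt(i);
  }
  return h >>> 0;
}
```

Because postings are keyed by this 32-bit hash, distinct tokens can in principle collide into the same postings list.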
package/package.json ADDED
@@ -0,0 +1,28 @@
+ {
+ "name": "gs-search",
+ "version": "0.1.0",
+ "type": "module",
+ "main": "lib/index.cjs",
+ "module": "lib/index.js",
+ "exports": {
+ ".": {
+ "types": "./lib/index.d.ts",
+ "import": "./lib/index.js",
+ "require": "./lib/index.cjs"
+ }
+ },
+ "types": "lib/index.d.ts",
+ "keywords": [],
+ "homepage": "https://github.com/grain-sand/gs-search",
+ "repository": {
+ "type": "git",
+ "url": "git+https://github.com/grain-sand/gs-search.git"
+ },
+ "bugs": {
+ "url": "https://github.com/grain-sand/gs-search/issues"
+ },
+ "author": "grain-sand",
+ "license": "MIT",
+ "dependencies": {}
+ }