@1-/scan 0.1.4 → 0.1.6

Files changed (5)
  1. package/README.md +70 -51
  2. package/_.js +2 -2
  3. package/load.js +2 -2
  4. package/package.json +6 -6
  5. package/save.js +4 -2
package/README.md CHANGED
@@ -9,11 +9,11 @@ Incrementally scans directory files, compares file sizes and modification times
 
  ## Features
 
- - **Incremental Scanning**: Detects and processes only new, modified, or deleted files, avoiding redundant file system operations.
+ - **Incremental Scanning**: Compares file sizes and modification times to process only new, modified, or deleted files, avoiding redundant read/write operations.
  - **Key Optimization**: Stores relative paths within 16 bytes directly as raw bytes; hashes longer paths to 16-byte MD5 digests to optimize database index space and query performance.
  - **Metadata Compression**: Compresses file sizes and modification times using Varint (variable-length byte) encoding.
- - **Transactional Integrity**: Packages updates and deletions in a single database transaction to guarantee consistency.
- - **Flexible Filtering**: Supports custom ignore callback functions to filter specific files and directories.
+ - **Transactional Integrity**: Performs updates and deletions within database transactions to guarantee consistency.
+ - **File Filtering**: Supports custom ignore callback functions to filter files and directories.
  - **Native Database**: Integrates Bun native `bun:sqlite` module, eliminating external database driver dependencies.
 
  ## Usage
@@ -58,41 +58,47 @@ using _upsert = upsert;
 
  console.log("Synced. Updated files:", updated_paths);
 
+ // Update scanned file metadata in database
  for (const rel_path of updated_paths) {
    await upsert(rel_path);
  }
  ```
 
+ ### Bulk Storage Module Usage
+
+ ```javascript
+ import save from "@1-/scan/save.js";
+ import sqlite from "@1-/scan/sqlite.js";
+
+ const db = sqlite("./scan_record.db");
+
+ // Bulk update and delete metadata
+ save(db, [["file.txt", new Uint8Array([1, 2, 3]), 123, 1620000000]], [new Uint8Array([4, 5, 6])]);
+
+ db.close();
+ ```
+
  ## Design Ideas
 
  The main entry orchestrates independent modules to execute the incremental scanning and synchronization flow.
 
- ```mermaid
- graph TD
- Entry["_.js (Entry Point)"] -->|1. Initialize Connection| Sqlite["sqlite.js"]
- Entry -->|2. Load Existing Records| Load["load.js"]
- Entry -->|3. Walk & Compare Files| DirWalk["dirWalk.js"]
- DirWalk -->|Invoke| Walk["@1-/walk/walkRelIgnore"]
- DirWalk -->|Process Path Keys| Hash["hash.js"]
- Entry -->|4. Delete Absent & Return Upsert| Trans["trans.js"]
- Save["save.js (Independent Sync Helper)"] -->|Transaction Wrapper| Trans
- ```
+ ![](https://i-01.eu.org/KDY9Lax79XwPBKztEqhU5A)
 
  1. **Initialize Connection (`sqlite.js`)**: Opens SQLite database connection and configures automatic connection disposal.
- 2. **Load Records (`load.js`)**: Automatically creates schema if missing, retrieves existing file hashes, sizes, and modification times, and reconstructs reference set in memory.
+ 2. **Load Records (`load.js`)**: Automatically creates `scanMtimeLen` table if missing, retrieves existing file hashes, sizes, and modification times, and reconstructs reference set in memory.
  3. **Walk & Compare (`dirWalk.js`)**: Traverses directory structure recursively. Paths are transformed into 16-byte keys via `hash.js`. File attributes are encoded using `@3-/vb` and compared against database records to identify additions and modifications.
  4. **Delete & Return Upsert**: Uses `trans.js` to execute transaction-safe deletions for deleted files, and returns modified relative paths and an `upsert` function so that caller can update database records.
- 5. **Independent Sync Helper (`save.js`)**: Exported independent module to execute bulk inserts and deletions in a single transaction.
+ 5. **Independent Sync Helper (`save.js`)**: Exported independent module to execute bulk updates and deletions in transactions.
 
  ## Tech Stack
 
  - **Bun**: Runtime environment and test framework.
- - **Bun SQLite**: Native high-performance SQLite engine built into Bun.
+ - **Bun SQLite**: Native SQLite engine built into Bun.
  - **@1-/walk**: Directory walker with ignore support.
  - **@3-/vb**: Variable-length byte (Varint) encoder and decoder.
  - **@3-/binmap / @3-/binset**: Memory-efficient collections designed for binary keys.
 
- ## Directory Structure
+ ## Code Structure
 
  ```
  .
@@ -104,7 +110,7 @@ graph TD
  │ ├── save.js # Independent helper executing bulk updates and deletions
  │ ├── sqlite.js # Connection manager instantiating SQLite database
  │ └── trans.js # Transaction wrapper providing rollback mechanism
- └── tests # Test suites
+ └── tests # Test directory
  ```
 
  ## History
@@ -112,22 +118,26 @@ graph TD
  SQLite was created by D. Richard Hipp in 2000 while designing board software for US Navy guided-missile destroyers. The system originally depended on a commercial database that required constant database administration; a connection loss could stall the entire damage control application. To resolve this vulnerability, Hipp designed a serverless, zero-configuration embedded database that directly reads and writes local files—marking the birth of SQLite.
 
  To conserve disk space and reduce I/O overhead, SQLite utilizes Varint (variable-length integer) encoding for metadata storage. Under this scheme, small integers consume only 1 byte, while larger numbers scale dynamically. This library inherits that design philosophy, compressing file metadata into varints before storing it, ensuring minimal footprint and high sync performance.
- ../doc/en/about.md
+ ## About
+
+ This library is developed by [WebC.site](https://webc.site).
+
+ [WebC.site](https://webc.site): A new paradigm of web development for AI
 
  ---
 
  <a id="zh"></a>
  # @1-/scan : 增量扫描目录文件并使用 SQLite 记录元数据
 
- 增量扫描目录文件,通过比对文件大小和修改时间检测变更,并同步至 SQLite 数据库中,最终返回有更新的相对路径列表。
+ 增量扫描目录文件,通过比对大小与修改时间检测变更,同步元数据至 SQLite 数据库,返回发生变更之相对路径列表。
 
  ## 功能介绍
 
- - **增量扫描**:仅处理新增、修改或删除的文件,避免冗余的文件系统读写,提升同步速度。
- - **路径压缩**:当相对路径长度小于等于 16 字节时保留原始字节;超出 16 字节则转换为 16 字节 MD5 值作为数据库主键,优化索引空间与查询性能。
- - **元数据压缩**:使用 Varint(可变字节整型)编码方式压缩存储文件大小和修改时间。
- - **事务安全**:将更新与删除操作合并在单个数据库事务中执行,确保数据一致性。
- - **灵活过滤**:支持通过自定义回调函数过滤指定类型的文件与目录。
+ - **增量扫描**:比对文件大小与修改时间,仅对新增、修改或删除之文件执行操作,避免冗余读写,提升效率。
+ - **路径压缩**:当相对路径长度不大于 16 字节时保留原始字节;超出 16 字节则转换为 16 字节 MD5 值作为主键,优化索引空间与查询性能。
+ - **元数据压缩**:使用 Varint(可变字节整型)编码方式压缩存储文件大小与修改时间。
+ - **事务安全**:更新与删除操作合并在数据库事务中执行,确保数据一致性。
+ - **过滤规则**:支持传入自定义过滤函数,按需排除特定文件与目录。
  - **原生依赖**:基于 Bun 内置 `bun:sqlite` 模块,无需额外安装或编译数据库驱动。
 
  ## 使用演示
@@ -140,7 +150,7 @@ import scan from "@1-/scan";
  const dir = "./data";
  const db_path = "./scan_record.db";
 
- // 扫描目录并同步至 SQLite,返回发生变更的相对路径列表与更新函数
+ // 扫描目录并同步至 SQLite,返回发生变更之相对路径列表与更新函数
  const [updated_paths, upsert] = await scan(dir, db_path);
 
  // 退出作用域时自动关闭数据库
@@ -148,13 +158,13 @@ using _upsert = upsert;
 
  console.log("更新文件列表:", updated_paths);
 
- // 更新已处理文件的元数据至数据库
+ // 更新已处理文件元数据至数据库
  for (const rel_path of updated_paths) {
    await upsert(rel_path);
  }
  ```
 
- ### 带有忽略规则的扫描
+ ### 过滤规则扫描
 
  ```javascript
  import scan from "@1-/scan";
@@ -162,7 +172,7 @@ import scan from "@1-/scan";
  const dir = "./data";
  const db_path = "./scan_record.db";
 
- // 忽略特定文件或目录
+ // 过滤临时文件与特定配置
  const ignore = (kind, rel_path) => {
    return rel_path.startsWith("temp/") || rel_path === "config.json";
  };
@@ -177,36 +187,41 @@ for (const rel_path of updated_paths) {
  }
  ```
 
- ## 设计思路
+ ### 批量存储模块使用
+
+ ```javascript
+ import save from "@1-/scan/save.js";
+ import sqlite from "@1-/scan/sqlite.js";
 
- 系统主入口调用各个独立模块完成增量扫描与数据同步流程。
-
- ```mermaid
- graph TD
- Entry["_.js (主入口)"] -->|1. 初始化连接| Sqlite["sqlite.js"]
- Entry -->|2. 加载已有记录| Load["load.js"]
- Entry -->|3. 扫描文件系统并对比| DirWalk["dirWalk.js"]
- DirWalk -->|调用| Walk["@1-/walk/walkRelIgnore"]
- DirWalk -->|处理路径键| Hash["hash.js"]
- Entry -->|4. 删除失效记录并返回更新函数| Trans["trans.js"]
- Save["save.js (独立批量存储辅助模块)"] -->|事务保障| Trans
+ const db = sqlite("./scan_record.db");
+
+ // 批量更新与删除元数据
+ save(db, [["file.txt", new Uint8Array([1, 2, 3]), 123, 1620000000]], [new Uint8Array([4, 5, 6])]);
+
+ db.close();
  ```
 
+ ## 设计思路
+
+ 系统主入口调度各独立模块完成增量扫描与数据同步。
+
+ ![](https://i-01.eu.org/o9xJHQ8BDxiBtfjhCcTUtQ)
+
  1. **初始化连接 (`sqlite.js`)**:打开 SQLite 数据库,并配置自动释放连接机制。
- 2. **加载记录 (`load.js`)**:若表不存在则自动创建,读取已记录的文件哈希、大小及修改时间,在内存中还原比对集合。
+ 2. **加载记录 (`load.js`)**:若数据表 `scanMtimeLen` 不存在则自动创建,读取已记录的文件哈希、大小及修改时间,在内存中还原比对集合。
  3. **文件系统扫描 (`dirWalk.js`)**:递归遍历目录,利用 `hash.js` 将路径映射为 16 字节键。对比当前文件与数据库元数据(利用 `@3-/vb` 进行压缩状态对比),筛选出新增和修改的文件。
- 4. **删除与返回更新函数**:使用 `trans.js` 开启事务,批量删除已被移除的无效记录,并返回变更的相对路径列表与 `upsert` 函数,供调用者按需持久化数据。
- 5. **独立批量存储辅助模块 (`save.js`)**:导出的独立工具模块,用于在单个事务中一次性批量写入与删除。
+ 4. **删除与返回更新函数**:使用 `trans.js` 开启事务,批量删除已被移除的记录,并返回变更的相对路径列表与 `upsert` 函数,供调用者持久化数据。
+ 5. **独立批量存储模块 (`save.js`)**:供外部调用的独立工具模块,用于在事务中批量写入与删除。
 
  ## 技术栈
 
- - **Bun**:JavaScript 运行时及测试框架。
- - **Bun SQLite**:内置的轻量级、高性能 SQLite 实现。
+ - **Bun**:JavaScript 运行时与测试框架。
+ - **Bun SQLite**:内置 SQLite 实现。
  - **@1-/walk**:支持过滤规则的目录递归遍历工具。
  - **@3-/vb**:Varint(可变字节)编码与解码器。
  - **@3-/binmap / @3-/binset**:针对二进制键优化的 Map 和 Set 容器。
 
- ## 目录结构
+ ## 代码结构
 
  ```
  .
@@ -218,12 +233,16 @@ graph TD
  │ ├── save.js # 独立导出的批量持久化与删除辅助函数
  │ ├── sqlite.js # 创建并配置 SQLite 数据库实例
  │ └── trans.js # 封装 SQLite 事务,提供异常回滚机制
- └── tests # 单元测试模块
+ └── tests # 单元测试目录
  ```
 
  ## 历史故事
 
- SQLite 的诞生与军事应用密切相关。2000 年,D. Richard Hipp 在为美国海军陆战队设计导弹驱逐舰板载损害控制系统软件时,遇到商业数据库由于配置复杂、日常需要专业维护且一旦连接丢失便会导致整个软件瘫痪的问题。Hipp 随即着手设计了一套无需任何独立服务器、零配置且直接对本地文件进行读写的嵌入式数据库,这便是 SQLite。
+ SQLite 的诞生源自海军军工项目。2000 年,D. Richard Hipp 为美国海军陆战队设计导弹驱逐舰板载损害控制软件时,遭遇商业数据库因配置复杂、日常维护繁琐且连接丢失即导致系统瘫痪之痛点。Hipp 随后设计出免服务器配置、直接读写本地文件之嵌入式数据库,即 SQLite。
+
+ 为了节省磁盘空间与降低读写延迟,SQLite 广泛应用了 Varint(可变字节整型)编码。在这种编码下,数值较小的整数仅占用 1 字节,只有大数值才会占用更多字节。本项目中对文件大小和修改时间采用同样的压缩设计,秉承了 SQLite 节省空间与高效之设计哲学。
+ ## 关于
+
+ 本库由 [WebC.site](https://webc.site) 开发。
 
- 为极限节约磁盘空间 and 降低读写延迟,SQLite 广泛应用了 Varint(可变字节整型)编码。在这种编码下,数值较小的整数(如常见的文件大小、序列号)仅占用 1 个字节,只有大数值才会占用更多字节。本项目中对文件大小和修改时间采用同样的压缩设计,从而秉承了 SQLite 极致节约空间与高效率的系统设计哲学。
- ../doc/zh/about.md
+ [WebC.site](https://webc.site) : 面向人工智能的网站开发新范式
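
The Key Optimization bullet in the README states that relative paths of up to 16 bytes are stored as raw bytes, while longer paths are replaced by a 16-byte MD5 digest. The package's `hash.js` is not part of this diff, so the following is only a minimal sketch of that rule. `pathKey` and `KEY_LEN` are hypothetical names, and the sketch assumes UTF-8 encoding and no padding of short keys.

```javascript
import { createHash } from "node:crypto";

// Assumption: keys are capped at 16 bytes, as described in the README.
const KEY_LEN = 16;
const encoder = new TextEncoder();

// Hypothetical helper, not the package's hash.js.
// Short paths are kept as their raw UTF-8 bytes; longer paths are
// replaced by their MD5 digest, which is always 16 bytes long.
const pathKey = (rel_path) => {
  const raw = encoder.encode(rel_path);
  if (raw.length <= KEY_LEN) return raw;
  return new Uint8Array(createHash("md5").update(raw).digest());
};
```

Under this scheme a key never exceeds 16 bytes, which keeps the `hash` primary-key index compact regardless of how deep the directory tree is.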
package/_.js CHANGED
@@ -21,12 +21,12 @@ export default async (dir, db_path, ignore) => {
 
    if (to_delete.length > 0) {
      trans(db, () => {
-       const del = db.prepare("DELETE FROM file WHERE hash=?");
+       const del = db.prepare("DELETE FROM scanMtimeLen WHERE hash=?");
        to_delete.forEach((h) => del.run(h));
      });
    }
 
-   const insert = db.prepare("INSERT OR REPLACE INTO file(hash,size,mtime)VALUES(?,?,?)"),
+   const insert = db.prepare("INSERT OR REPLACE INTO scanMtimeLen(hash,size,mtime)VALUES(?,?,?)"),
      upsert = async (rel_path) => {
        const fp = join(dir, rel_path),
          { size, mtimeMs } = await stat(fp),
package/load.js CHANGED
@@ -2,10 +2,10 @@ const SQLITE_ERROR = 1;
 
  export default (db) => {
    try {
-     return db.prepare("SELECT hash,size,mtime FROM file").all();
+     return db.prepare("SELECT hash,size,mtime FROM scanMtimeLen").all();
    } catch (err) {
      if (err.errno === SQLITE_ERROR) {
-       db.exec("CREATE TABLE file(hash PRIMARY KEY,size INT UNSIGNED,mtime INT UNSIGNED)");
+       db.exec("CREATE TABLE scanMtimeLen(hash PRIMARY KEY,size INT UNSIGNED,mtime INT UNSIGNED)");
        return [];
      }
      throw err;
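
Because the table is renamed from `file` to `scanMtimeLen`, a database written by 0.1.4 has no `scanMtimeLen` table, so, as far as this diff shows, `load.js` would create an empty one and every file would be reported as new on the first scan after upgrading. The following is a minimal sketch of a one-off rename that would preserve the old records. `tableExists` and `migrateTable` are hypothetical helpers that are not part of the package, and `db` is assumed to be a `bun:sqlite` `Database` opened on the same file before `scan` runs.

```javascript
// Hypothetical migration helpers; @1-/scan does not ship these.

// Returns true when a table with the given name exists in the database.
const tableExists = (db, name) =>
  db.prepare("SELECT 1 FROM sqlite_master WHERE type='table' AND name=?").get(name) != null;

// Renames the 0.1.4 table to the 0.1.6 name, but only when the old table
// exists and the new one has not been created yet.
const migrateTable = (db) => {
  if (tableExists(db, "file") && !tableExists(db, "scanMtimeLen")) {
    db.exec("ALTER TABLE file RENAME TO scanMtimeLen");
    return true;
  }
  return false;
};
```

The column list is the same in both versions of the `CREATE TABLE` statement, so a plain rename is enough in this sketch; no data needs to be copied.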
package/package.json CHANGED
@@ -1,13 +1,13 @@
  {
    "name": "@1-/scan",
-   "version": "0.1.4",
+   "version": "0.1.6",
    "description": "Incrementally scan directory files and track metadata in SQLite / 增量扫描目录文件并使用 SQLite 记录元数据",
    "keywords": [
-     "scan",
-     "incremental",
-     "sqlite",
      "directory",
-     "metadata"
+     "incremental",
+     "metadata",
+     "scan",
+     "sqlite"
    ],
    "homepage": "https://github.com/webc-site/npm/tree/main/scan",
    "license": "MulanPSL-2.0",
@@ -22,7 +22,7 @@
      "./*": "./*"
    },
    "dependencies": {
-     "@1-/walk": "^0.1.0",
+     "@1-/walk": "^0.1.1",
      "@3-/binmap": "^0.1.20",
      "@3-/binset": "^0.1.6",
      "@3-/int": "^0.1.1",
package/save.js CHANGED
@@ -4,11 +4,13 @@ export default (db, to_update, to_delete) => {
    if (to_update.length > 0 || to_delete.length > 0) {
      trans(db, () => {
        if (to_update.length > 0) {
-         const insert = db.prepare("INSERT OR REPLACE INTO file(hash,size,mtime)VALUES(?,?,?)");
+         const insert = db.prepare(
+           "INSERT OR REPLACE INTO scanMtimeLen(hash,size,mtime)VALUES(?,?,?)",
+         );
          to_update.forEach(([_, h, size, mtime]) => insert.run(h, size, mtime));
        }
        if (to_delete.length > 0) {
-         const del = db.prepare("DELETE FROM scanMtimeLen WHERE hash=?");
+         const del = db.prepare("DELETE FROM scanMtimeLen WHERE hash=?");
          to_delete.forEach((h) => del.run(h));
        }
      });
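
Both `_.js` and `save.js` run their statements inside `trans(db, fn)`, which the README describes as a transaction wrapper with a rollback mechanism. `trans.js` is unchanged and therefore not shown in this diff, so the following is only a sketch of how such a wrapper could behave, not the package's implementation. It assumes `db` exposes `exec`, as the `db.exec` call in `load.js` suggests.

```javascript
// Hypothetical stand-in for trans.js, written under the assumptions above.
// Runs fn between BEGIN and COMMIT; if fn throws, the transaction is
// rolled back and the original error is rethrown to the caller.
const trans = (db, fn) => {
  db.exec("BEGIN");
  try {
    const result = fn();
    db.exec("COMMIT");
    return result;
  } catch (err) {
    db.exec("ROLLBACK");
    throw err;
  }
};
```

With a wrapper of this shape, the inserts and deletes issued by `save.js` either all take effect or none do. `bun:sqlite` also provides `db.transaction(fn)`, which may be what the real module builds on.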