@1-/scan 0.1.0

package/README.md ADDED
@@ -0,0 +1,145 @@
1
+ [English](#en) | [中文](#zh)
2
+
3
+ ---
4
+
5
+ <a id="en"></a>
6
+ # @1-/scan : Incrementally scan directory files and track metadata in SQLite
7
+
8
+ Incrementally scans the files in a directory, tracks each file's size and modification time, and syncs the resulting records into a SQLite database.
9
+
10
+ ## Features
11
+
12
+ - **Incremental Scanning**: Detects and updates only new, modified, or deleted files, avoiding redundant file system operations
13
+ - **Space-Efficient Storage**: Encodes file size and modification time as varint bytes (`@3-/vb`) for compact in-memory comparison; the database itself stores both values as integer columns
14
+ - **Smart Path Key**: Stores short relative paths (≤ 16 bytes) as raw binary to preserve readability, while hashing longer paths to 16-byte MD5 digests to optimize index performance
15
+ - **Database Synchronization**: Synchronizes updates and deletions in a single atomic transaction
16
+ - **Ignore Pattern Support**: Applies the ignore rules of `@1-/walk` during traversal, so ignored files and directories are skipped
17
+
18
+ ## Usage
19
+
20
+ ```javascript
21
+ import scan from "@1-/scan";
22
+
23
+ const directoryPath = "./src";
24
+ const sqliteDbPath = "./files.db";
25
+
26
+ // Scan directory and sync records into SQLite
27
+ await scan(directoryPath, sqliteDbPath);
28
+ ```
29
+
30
+ ## Design Ideas
31
+
32
+ Execution flow of modules:
33
+
34
+ ```mermaid
35
+ graph TD
36
+ Entry["_.js (Main)"] -->|Open database| Sqlite[sqlite.js]
37
+ Entry -->|Load existing records| FileAll[fileAll.js]
38
+ Entry -->|Walk and compare| DirWalk[dirWalk.js]
39
+ DirWalk -->|Traverse files| Walk["@1-/walk/walkRelIgnore"]
40
+ DirWalk -->|Optimize keys| Hash[hash.js]
41
+ Entry -->|Apply modifications| FileWrite[fileWrite.js]
42
+ FileWrite -->|Wrap transaction| Trans[trans.js]
43
+ ```
44
+
45
+ ## Tech Stack
46
+
47
+ - **Bun**: Runtime and test runner
48
+ - **SQLite**: Local relational database engine
49
+ - **@1-/walk**: Directory walker with ignore support
50
+ - **@3-/vb**: Variable-length byte encoder
51
+ - **@3-/binmap** / **@3-/binset**: Efficient binary collection structures
52
+
53
+ ## Directory Structure
54
+
55
+ ```
56
+ .
57
+ ├── src
58
+ │ ├── _.js # Entry point orchestrating the scanning and sync process
59
+ │ ├── dirWalk.js # Recursively scans files and filters modified ones
60
+ │ ├── fileAll.js # Retrieves database records and initializes schema
61
+ │ ├── fileWrite.js # Performs bulk database inserts and deletes
62
+ │ ├── hash.js # Processes path keys into raw bytes or MD5 digests
63
+ │ ├── sqlite.js # Manages SQLite database connection and disposal
64
+ │ └── trans.js # Wraps operations inside an SQL transaction
65
+ └── tests # Test suites
66
+ ```
67
+
68
+ ## History
69
+
70
+ SQLite was designed in 2000 by D. Richard Hipp while working on a US Navy damage control system. The application originally relied on an Informix database, which required extensive database administration. Hipp designed SQLite to be a serverless, self-contained library requiring zero configuration, allowing the software to function reliably even when database services were unavailable.
71
+
72
+ To save space inside the database file, SQLite internally encodes metadata as variable-length integers (varints). This project borrows the technique: it encodes each file's size and modification time as varints and compares the resulting bytes to detect changes, while the `file` table itself stores both values as integer columns. The aim is the same minimalism and space efficiency, applied to local file synchronization.
73
+ ../doc/en/about.md
74
+
75
+ ---
76
+
77
+ <a id="zh"></a>
78
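+ # @1-/scan : Incrementally scan directory files and record metadata in SQLite
79
+
80
+ Incrementally scans a given directory, compares and syncs file sizes and modification times, and stores the records in a SQLite database.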
81
+
82
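+ ## Features
83
+
84
+ - **Incremental Scanning**: Processes only new, modified, or deleted files, avoiding redundant file operations
85
+ - **Compact Storage**: Encodes file size and modification time as varint bytes (`@3-/vb`) for comparison; the database stores both values as integer columns
86
+ - **Smart Hashing**: Keeps short relative paths (≤ 16 bytes) as raw bytes and hashes longer ones to a 16-byte MD5 digest, improving database index efficiency
87
+ - **Transactional Sync**: Combines updates and deletions into a single database transaction to ensure consistency
88
+ - **Rule Filtering**: Filters out specific files and directories using the ignore rules of `@1-/walk`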
89
+
90
+ ## Usage
91
+
92
+ ```javascript
93
+ import scan from "@1-/scan";
94
+
95
+ const directoryPath = "./src";
96
+ const sqliteDbPath = "./files.db";
97
+
98
+ // Scan the directory and sync it into the SQLite database
99
+ await scan(directoryPath, sqliteDbPath);
100
+ ```
101
+
102
+ ## Design Ideas
103
+
104
+ The modules call each other as follows:
105
+
106
+ ```mermaid
107
+ graph TD
108
+ Entry["_.js (main entry)"] -->|Open database| Sqlite[sqlite.js]
109
+ Entry -->|Load existing records| FileAll[fileAll.js]
110
+ Entry -->|Walk and compare| DirWalk[dirWalk.js]
111
+ DirWalk -->|Scan file system| Walk["@1-/walk/walkRelIgnore"]
112
+ DirWalk -->|Compute path hash| Hash[hash.js]
113
+ Entry -->|Write changes| FileWrite[fileWrite.js]
114
+ FileWrite -->|Run in transaction| Trans[trans.js]
115
+ ```
116
+
117
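+ ## Tech Stack
118
+
119
+ - **Bun**: Runtime and test tool
120
+ - **SQLite**: Local relational database
121
+ - **@1-/walk**: Directory traversal tool with ignore-rule support
122
+ - **@3-/vb**: Variable-length integer encoder
123
+ - **@3-/binmap** / **@3-/binset**: Containers keyed by binary hashes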
124
+
125
+ ## Directory Structure
126
+
127
+ ```
128
+ .
129
+ ├── src
130
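+ │ ├── _.js # Main entry; coordinates the scanning and sync logic
131
+ │ ├── dirWalk.js # Recursively walks the directory and picks out changed files by comparison
132
+ │ ├── fileAll.js # Reads all records from the database and initializes the table
133
+ │ ├── fileWrite.js # Runs bulk inserts and deletes inside a transaction
134
+ │ ├── hash.js # Hashes the relative path or keeps its raw bytes
135
+ │ ├── sqlite.js # Manages the SQLite connection and resource release
136
+ │ └── trans.js # Wraps database transaction control
137
+ └── tests # Test directory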
138
+ ```
139
+
140
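+ ## History
141
+
142
+ D. Richard Hipp wrote SQLite in 2000 for the control system of US Navy destroyers. The commercial database the system used at the time required tedious administration, and whenever the database failed the system could not run. Hipp therefore designed SQLite to be serverless, zero-configuration, and to read and write a single file directly.
143
+
144
+ To save as much storage space as possible, SQLite makes heavy internal use of variable-length integer (varint) encoding. This project likewise uses varint encoding: it encodes each file's size and modification time compactly before comparing them against the stored records, continuing SQLite's tradition of pursuing performance and compactness.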
145
+ ../doc/zh/about.md
package/_.js ADDED
@@ -0,0 +1,26 @@
1
+ import { BinMap } from "@3-/binmap";
2
+ import vbE from "@3-/vb/vbE.js";
3
+ import sqlite from "./sqlite.js";
4
+ import fileAll from "./fileAll.js";
5
+ import dirWalk from "./dirWalk.js";
6
+ import fileWrite from "./fileWrite.js";
7
+
8
+ export default async (dir, db_path) => {
9
+ using db = sqlite(db_path);
10
+ const existing = new BinMap(),
11
+ db_rows = fileAll(db);
12
+
13
+ for (const row of db_rows) {
14
+ existing.set(row.hash, vbE([row.size, row.mtime]));
15
+ }
16
+
17
+ const [scanned, to_update] = await dirWalk(dir, existing),
18
+ to_delete = [];
19
+ for (const row of db_rows) {
20
+ if (!scanned.has(row.hash)) {
21
+ to_delete.push(row.hash);
22
+ }
23
+ }
24
+
25
+ fileWrite(db, to_update, to_delete);
26
+ };
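The reconciliation that `_.js` performs (load the stored rows, walk the directory, then derive what to upsert and what to delete) can be illustrated without any of the package's dependencies. The following is a minimal sketch, not the package's code: it assumes plain `Map` objects with string keys in place of `BinMap`/`BinSet`, and the helper name `reconcile` and the sample data are hypothetical.

```javascript
// Hypothetical helper: `stored` and `current` map a path key to [size, mtime].
// Plain Maps with string keys stand in for BinMap/BinSet (an assumption).
const reconcile = (stored, current) => {
  const to_update = [],
    to_delete = [];

  // New or changed files: key is missing from the database, or size/mtime differ
  for (const [key, [size, mtime]] of current) {
    const old = stored.get(key);
    if (!old || old[0] !== size || old[1] !== mtime) {
      to_update.push([key, size, mtime]);
    }
  }

  // Deleted files: key is in the database but was not seen during the walk
  for (const key of stored.keys()) {
    if (!current.has(key)) {
      to_delete.push(key);
    }
  }

  return [to_update, to_delete];
};

const stored = new Map([
  ["a.js", [10, 1000]],
  ["b.js", [20, 2000]],
  ["c.js", [30, 3000]],
]);
const current = new Map([
  ["a.js", [10, 1000]], // unchanged
  ["b.js", [20, 2500]], // modified
  ["d.js", [40, 4000]], // new
]);

const [to_update, to_delete] = reconcile(stored, current);
console.log(to_update); // [ [ 'b.js', 20, 2500 ], [ 'd.js', 40, 4000 ] ]
console.log(to_delete); // [ 'c.js' ]
```

Unchanged files produce no work at all, which is what keeps a repeat scan of a mostly static directory cheap on the database side.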
package/dirWalk.js ADDED
@@ -0,0 +1,32 @@
1
+ import { stat } from "node:fs/promises";
2
+ import { join } from "node:path";
3
+ import { FILE } from "@1-/walk";
4
+ import walkRelIgnore from "@1-/walk/walkRelIgnore.js";
5
+ import { BinSet } from "@3-/binset";
6
+ import u8eq from "@3-/u8/u8eq.js";
7
+ import vbE from "@3-/vb/vbE.js";
8
+ import int from "@3-/int";
9
+ import hash from "./hash.js";
10
+
11
+ export default async (dir, existing) => {
12
+ const scanned = new BinSet(),
13
+ to_update = [];
14
+
15
+ await walkRelIgnore(dir, async (kind, rel_path) => {
16
+ if (kind === FILE) {
17
+ const { size, mtimeMs } = await stat(join(dir, rel_path)),
18
+ mtime = int(mtimeMs),
19
+ h = hash(rel_path);
20
+
21
+ scanned.add(h);
22
+
23
+ const val = existing.get(h);
24
+ if (val && u8eq(val, vbE([size, mtime]))) {
25
+ return;
26
+ }
27
+ to_update.push([h, size, mtime]);
28
+ }
29
+ });
30
+
31
+ return [scanned, to_update];
32
+ };
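`dirWalk.js` detects a change by encoding `[size, mtime]` with `vbE` and comparing the bytes with `u8eq`. The byte layout that `@3-/vb` produces is not shown in this diff, so the sketch below assumes a generic LEB128-style varint (7 payload bits per byte, high bit as a continuation flag); `varint`, `encode`, and `bytesEqual` are hypothetical names.

```javascript
// Generic LEB128-style varint; an assumption, not necessarily the @3-/vb format.
// BigInt is used because mtime in milliseconds exceeds 32 bits, and JavaScript's
// bitwise operators on Number truncate to 32 bits.
const varint = (n) => {
  const out = [];
  let v = BigInt(n);
  do {
    let byte = Number(v & 0x7fn);
    v >>= 7n;
    if (v > 0n) byte |= 0x80; // more bytes follow
    out.push(byte);
  } while (v > 0n);
  return out;
};

// Concatenate the varints of several integers into one byte string
const encode = (nums) => Uint8Array.from(nums.flatMap(varint));

// Byte-wise equality, standing in for u8eq
const bytesEqual = (a, b) => a.length === b.length && a.every((x, i) => x === b[i]);

const before = encode([300, 1700000000000]);
const same = encode([300, 1700000000000]);
const touched = encode([300, 1700000000001]);

console.log(encode([300, 1])); // Uint8Array(3) [ 172, 2, 1 ]
console.log(bytesEqual(before, same)); // true
console.log(bytesEqual(before, touched)); // false
```

Comparing one short byte string per file means a single equality check covers both fields, at the cost of re-encoding the fresh `stat` result for every file visited.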
package/fileAll.js ADDED
@@ -0,0 +1,13 @@
1
+ const SQLITE_ERROR = 1;
2
+
3
+ export default (db) => {
4
+ try {
5
+ return db.prepare("SELECT hash,size,mtime FROM file").all();
6
+ } catch (err) {
7
+ if (err.errcode === SQLITE_ERROR) {
8
+ db.exec("CREATE TABLE file(hash PRIMARY KEY,size INT UNSIGNED,mtime INT UNSIGNED)");
9
+ return [];
10
+ }
11
+ throw err;
12
+ }
13
+ };
package/fileWrite.js ADDED
@@ -0,0 +1,20 @@
1
+ import trans from "./trans.js";
2
+
3
+ export default (db, to_update, to_delete) => {
4
+ if (to_update.length > 0 || to_delete.length > 0) {
5
+ trans(db, () => {
6
+ if (to_update.length > 0) {
7
+ const insert = db.prepare("INSERT OR REPLACE INTO file(hash,size,mtime)VALUES(?,?,?)");
8
+ for (const record of to_update) {
9
+ insert.run(...record);
10
+ }
11
+ }
12
+ if (to_delete.length > 0) {
13
+ const del = db.prepare("DELETE FROM file WHERE hash=?");
14
+ for (const h of to_delete) {
15
+ del.run(h);
16
+ }
17
+ }
18
+ });
19
+ }
20
+ };
package/hash.js ADDED
@@ -0,0 +1,7 @@
1
+ import { createHash } from "node:crypto";
2
+ import utf8e from "@3-/utf8/utf8e.js";
3
+
4
+ export default (str) => {
5
+ const buf = utf8e(str);
6
+ return buf.length <= 16 ? buf : createHash("md5").update(buf).digest();
7
+ };
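The key selection in `hash.js` is easy to try in isolation. This sketch assumes the built-in `TextEncoder` as a stand-in for `@3-/utf8/utf8e.js` (presumably a UTF-8 encoder, though its source is not part of this diff); the name `pathKey` and the sample paths are hypothetical.

```javascript
import { createHash } from "node:crypto";

// Hypothetical stand-in for hash.js: TextEncoder replaces @3-/utf8/utf8e.js
const pathKey = (rel_path) => {
  const buf = new TextEncoder().encode(rel_path);
  // Up to 16 bytes: keep the path itself; longer: 16-byte MD5 digest
  return buf.length <= 16 ? buf : createHash("md5").update(buf).digest();
};

const short_key = pathKey("src/_.js"); // 8 UTF-8 bytes, kept as-is
const long_key = pathKey("src/components/button/index.js"); // 30 bytes, hashed

console.log(short_key.length, long_key.length); // 8 16
console.log(new TextDecoder().decode(short_key)); // src/_.js
```

Every key is therefore at most 16 bytes, and short paths stay readable when inspecting the table. The length is measured in UTF-8 bytes, so a path with non-ASCII characters reaches the 16-byte limit sooner than its character count suggests.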
package/package.json ADDED
@@ -0,0 +1,33 @@
1
+ {
2
+ "name": "@1-/scan",
3
+ "version": "0.1.0",
4
+ "description": "Incrementally scan directory files and track metadata in SQLite / 增量扫描目录文件并使用 SQLite 记录元数据",
5
+ "keywords": [
6
+ "scan",
7
+ "incremental",
8
+ "sqlite",
9
+ "directory",
10
+ "metadata"
11
+ ],
12
+ "homepage": "https://github.com/webc-site/npm/tree/main/scan",
13
+ "license": "MulanPSL-2.0",
14
+ "author": "x-at-01@googlegroups.com",
15
+ "repository": {
16
+ "type": "git",
17
+ "url": "git+https://github.com/webc-site/npm.git"
18
+ },
19
+ "type": "module",
20
+ "exports": {
21
+ ".": "./_.js",
22
+ "./*": "./*"
23
+ },
24
+ "dependencies": {
25
+ "@1-/walk": "^0.1.0",
26
+ "@3-/binmap": "^0.1.20",
27
+ "@3-/binset": "^0.1.6",
28
+ "@3-/int": "^0.1.1",
29
+ "@3-/u8": "^0.1.2",
30
+ "@3-/utf8": "^0.1.1",
31
+ "@3-/vb": "^0.1.6"
32
+ }
33
+ }
package/sqlite.js ADDED
@@ -0,0 +1,7 @@
1
+ const { DatabaseSync } = await import("node:sqlite");
2
+
3
+ export default (db_path) => {
4
+ const db = new DatabaseSync(db_path);
5
+ db[Symbol.dispose] = () => db.close();
6
+ return db;
7
+ };
package/trans.js ADDED
@@ -0,0 +1,11 @@
1
+ export default (db, run) => {
2
+ db.exec("BEGIN");
3
+ try {
4
+ const res = run();
5
+ db.exec("COMMIT");
6
+ return res;
7
+ } catch (err) {
8
+ db.exec("ROLLBACK");
9
+ throw err;
10
+ }
11
+ };
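The commit and rollback paths of `trans.js` can be observed without opening a database. The sketch below inlines the same wrapper and passes it a hypothetical stub object whose `exec` only records the statements it receives; the stub and the variable names are illustrative and not part of the package.

```javascript
// Same wrapper as trans.js above, inlined so the sketch runs on its own
const trans = (db, run) => {
  db.exec("BEGIN");
  try {
    const res = run();
    db.exec("COMMIT");
    return res;
  } catch (err) {
    db.exec("ROLLBACK");
    throw err;
  }
};

// Hypothetical stub: records each statement instead of talking to SQLite
const statements = [];
const db = { exec: (sql) => statements.push(sql) };

const result = trans(db, () => "ok"); // commits and returns the callback's value

let caught = null;
try {
  trans(db, () => {
    throw new Error("write failed");
  });
} catch (err) {
  caught = err; // the original error is rethrown after ROLLBACK
}

console.log(result); // ok
console.log(statements); // [ 'BEGIN', 'COMMIT', 'BEGIN', 'ROLLBACK' ]
console.log(caught.message); // write failed
```

Because `run()` is not awaited, the callback has to be synchronous: an `async` callback would return a pending promise and `COMMIT` would execute before its work finished.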