encoding-aware-fs 0.1.3 → 0.1.5

package/README.md ADDED
@@ -0,0 +1,54 @@
1
+ # encoding-aware-fs
2
+
3
+ An encoding-aware file-operation MCP server for AI coding tools (Claude Code, OpenCode, and others), designed for projects encoded in GB18030/GBK/GB2312.
4
+
5
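+ ## Background
6
+
7
+ AI coding assistants assume that every file is UTF-8. Reading or writing GB18030/GBK/GB2312 files under that assumption silently garbles the text, which is very common in legacy Chinese projects. This MCP server handles encoding detection and conversion transparently so that AI tools can work correctly with non-UTF-8 files.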
8
+
9
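+ ## Features
10
+
11
+ - **Automatic encoding detection**: detects the original encoding (GB18030/GBK/GB2312) on read and returns UTF-8 to the AI tool
12
+ - **Transparent write-back**: converts UTF-8 content back to the file's original encoding on write
13
+ - **Line-ending preservation**: detects the file's existing CRLF/LF style and keeps it unchanged after edits
14
+ - **Three MCP tools**: `read_file`, `write_file`, and `edit_file`, which replace the built-in file tools
15
+ - **One-command install/uninstall**: supports Claude Code and OpenCode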
16
+
17
+ ## Installation
18
+
19
+ ```bash
20
+ npx encoding-aware-fs install
21
+ ```
22
+
23
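+ This command registers the MCP server in the project's `.mcp.json` (Claude Code) or `.opencode.json` (OpenCode) and adds a rules file that instructs the AI to use the MCP tools for all file operations.
24
+
25
+ ## Uninstallation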
26
+
27
+ ```bash
28
+ npx encoding-aware-fs uninstall
29
+ ```
30
+
31
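+ ## MCP Tools
32
+
33
+ | Tool | Description |
34
+ |------|------|
35
+ | `read_file` | Reads a file, detects its encoding automatically, and returns UTF-8 content |
36
+ | `write_file` | Writes UTF-8 content, converting it to the file's original encoding |
37
+ | `edit_file` | Find-and-replace editing that preserves encoding and line endings |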
38
+
39
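+ ## How It Works
40
+
41
+ 1. **On read**: detects the file encoding via BOM and heuristics, then decodes the content to UTF-8 and returns it
42
+ 2. **On write/edit**: reads the original file first to detect its encoding and line-ending style, performs the operation in UTF-8, then converts back to the original encoding and restores the line endings before writing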
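
To illustrate why the decode step matters, here is a minimal sketch that decodes the same four bytes twice. It uses the standard `TextDecoder` purely as a stand-in (RESEARCH.md in this diff describes the package as using iconv-lite), and it assumes a runtime whose `TextDecoder` accepts the `gb18030` label, as Node.js builds with full ICU do.

```typescript
// The two-character string "中文" encoded as GBK/GB18030 bytes.
const gbBytes = new Uint8Array([0xd6, 0xd0, 0xce, 0xc4]);

// Wrong assumption: reading the bytes as UTF-8 produces replacement characters.
const asUtf8 = new TextDecoder("utf-8").decode(gbBytes);

// Decoding with the file's real encoding recovers the original text.
const asGb18030 = new TextDecoder("gb18030").decode(gbBytes);

console.log(JSON.stringify(asUtf8));
console.log(asGb18030);
```

Writing `asUtf8` back to disk would make the damage permanent, which is the silent corruption described in the background section.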
43
+
44
+ ## Development
45
+
46
+ ```bash
47
+ npm install
48
+ npm run build
49
+ npm test
50
+ ```
51
+
52
+ ## License
53
+
54
+ MIT
package/RESEARCH.md ADDED
@@ -0,0 +1,502 @@
1
+ # Encoding-Aware-FS MCP Project Research
2
+
3
+ **Date**: 2026-03-26
4
+ **Project Location**: F:\WorkSpace\projects\filesystemex
5
+ **Version**: 0.1.4
6
+
7
+ ## Executive Summary
8
+
9
+ **encoding-aware-fs** is an MCP (Model Context Protocol) server that performs file operations with transparent encoding detection and conversion. It targets GB18030/GBK/GB2312-encoded files and presents their content to AI tools as UTF-8.
10
+
11
+ ---
12
+
13
+ ## 1. Overall Project Structure
14
+
15
+ ### Directory Layout
16
+ ```
17
+ filesystemex/
18
+ ├── .git/ # Git repository
19
+ ├── .claude/ # Claude-specific config
20
+ ├── .agents/ # Agent definitions
21
+ ├── src/ # TypeScript source code
22
+ │ ├── server.ts # Main MCP server entry
23
+ │ ├── index.ts # CLI router (install/uninstall/serve)
24
+ │ ├── tools/ # MCP Tool implementations
25
+ │ │ ├── read-file.ts # Read file tool with encoding detection
26
+ │ │ ├── write-file.ts # Write file tool with encoding preservation
27
+ │ │ └── edit-file.ts # Edit file tool (THE KEY TOOL)
28
+ │ ├── encoding/ # Encoding utilities
29
+ │ │ ├── detector.ts # Python chardet integration
30
+ │ │ ├── converter.ts # iconv-lite encoding/decoding
31
+ │ │ ├── line-endings.ts # CRLF/LF detection and restoration
32
+ │ │ └── binary.ts # Binary file detection
33
+ │ ├── config.ts # Project configuration loader
34
+ │ ├── path-validation.ts # Security path validation with symlink checks
35
+ │ ├── path-utils.ts # Path utilities (normalization, home expansion)
36
+ │ ├── config-io.ts # Config file I/O
37
+ │ ├── logger.ts # Structured logging
38
+ │ ├── installer.ts # One-click installation
39
+ │ └── uninstaller.ts # One-click uninstallation
40
+ ├── dist/ # Compiled JavaScript
41
+ │ ├── server.js # Bundled MCP server
42
+ │ └── index.js # Bundled CLI entry point
43
+ ├── tests/ # Vitest test suite
44
+ │ ├── fixtures/ # Test files (GB18030, UTF-8, binary)
45
+ │ └── encoding/ # Encoding tests
46
+ ├── docs/ # Documentation
47
+ ├── skills/ # Claude Code skills
48
+ ├── package.json # NPM package metadata
49
+ ├── tsconfig.json # TypeScript configuration
50
+ ├── build.js # esbuild build script
51
+ ├── .mcp.json # MCP server configuration
52
+ ├── .encoding-converter.json # Project encoding config
53
+ └── README.md # Project documentation
54
+ ```
55
+
56
+ ### Key Dependencies
57
+ - `@modelcontextprotocol/sdk` - MCP protocol implementation
58
+ - `zod` - Schema validation
59
+ - `iconv-lite` - Encoding conversion
60
+ - `diff` - Unified diff generation
61
+ - `@inquirer/prompts` - Interactive CLI
62
+
63
+ ---
64
+
65
+ ## 2. The edit_file Tool - Complete Implementation
66
+
67
+ ### 2.1 What edit_file Returns
68
+
69
+ The `edit_file` tool returns a **unified diff** showing all changes:
70
+
71
+ ```typescript
72
+ return {
73
+   content: [{ type: "text" as const, text: diff }]
74
+ }
75
+ ```
76
+
77
+ Example return format:
78
+ ```
79
+ --- /path/to/file.txt
80
+ +++ /path/to/file.txt
81
+ @@ -10,3 +10,3 @@
82
+ context line
83
+ -removed text
84
+ +new text
85
+ context line
86
+ ```
87
+
88
+ ### 2.2 Tool Input Schema
89
+
90
+ ```typescript
91
+ {
92
+   path: z.string().describe("Absolute file path"),
93
+   edits: z.array(z.object({
94
+     oldText: z.string().describe("Text to search for"),
95
+     newText: z.string().describe("Text to replace with"),
96
+   })),
97
+   dryRun: z.boolean().default(false).describe("Preview changes without writing"),
98
+ }
99
+ ```
100
+
101
+ ### 2.3 Handler Execution Flow (9 Steps)
102
+
103
+ **Step 1: Path Validation**
104
+ - Validates path is within allowed directories
105
+ - Resolves symlinks to prevent escape attacks
106
+ - For new files, validates parent directory exists
107
+
108
+ **Step 2: Read Raw Bytes**
109
+ - Reads file as binary buffer
110
+ - Preserves original line endings
111
+
112
+ **Step 3: Detect Line Ending Style**
113
+ - Examines raw bytes for CRLF vs LF
114
+ - Returns 'CRLF' or 'LF'
115
+ - Ties favor CRLF
116
+
117
+ **Step 4: Detect Encoding**
118
+ - Spawns Python process with chardet library
119
+ - Reads first 4KB of file
120
+ - Returns encoding and confidence (0-1)
121
+ - Accepts the detection only when confidence reaches the threshold (default 0.8)
122
+ - Otherwise falls back to `sourceEncoding` from the project config
123
+
124
+ **Step 5: Decode to UTF-8**
125
+ - Uses iconv-lite to convert detected encoding to UTF-8
126
+ - All further processing uses UTF-8
127
+
128
+ **Step 6: Apply Edits**
129
+ - Normalizes line endings to LF for comparison
130
+ - Tries exact string match first
131
+ - Falls back to fuzzy line-by-line matching
132
+ - Fuzzy matching compares trimmed lines, preserves indentation
133
+ - Throws error if no match found
134
+
135
+ **Step 7: Generate Diff**
136
+ - Uses `diff` package's `createTwoFilesPatch()` function
137
+ - Generates unified diff format
138
+ - Normalizes line endings to LF for comparison
139
+
140
+ **Step 8: Conditional Write**
141
+ - If dryRun=true: returns diff only
142
+ - If dryRun=false:
143
+   - Restores original line ending style
144
+   - Encodes back to original detected encoding
145
+   - Performs atomic write (temp file → rename)
146
+
147
+ **Step 9: Return Result**
148
+ - Returns standardized MCP response with diff text
149
+
150
+ ---
151
+
152
+ ## 3. Fuzzy Edit Matching Algorithm
153
+
154
+ Location: `src/tools/edit-file.ts`, lines 25-74
155
+
156
+ Two-phase matching strategy:
157
+
158
+ **Phase 1: Exact Match**
159
+ ```
160
+ Search for oldText substring in content
161
+ If found, replace with newText
162
+ ```
163
+
164
+ **Phase 2: Fuzzy Line-by-Line Match**
165
+ ```
166
+ For each position in content:
167
+   Compare trimmed lines with oldText lines
168
+   If all lines match (trimmed):
169
+     - Get original indentation from first line
170
+     - Apply newText preserving indentation
171
+     - Handle relative indentation for multi-line replacements
172
+     - Continue to next edit
173
+ If no match found:
174
+   Throw "Could not find match for edit" error
175
+ ```
176
+
177
+ Key features:
178
+ - Whitespace-tolerant
179
+ - Indentation-aware
180
+ - Relative indentation handling
181
+ - Deterministic (uses first match)
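
The two phases can be sketched as a single function. This is a minimal illustration rather than the code in `src/tools/edit-file.ts`: the name `applyEdit` is hypothetical, the input is assumed to be LF-normalized already, and the relative-indentation rule (extra indent measured against the first line of `oldText`) is one plausible reading of the description.

```typescript
// Hypothetical helper: apply one edit to LF-normalized content.
function applyEdit(content: string, oldText: string, newText: string): string {
  // Phase 1: exact substring match, first occurrence only.
  const at = content.indexOf(oldText);
  if (at !== -1) {
    return content.slice(0, at) + newText + content.slice(at + oldText.length);
  }

  // Phase 2: compare line by line, ignoring leading and trailing whitespace.
  const contentLines = content.split("\n");
  const oldLines = oldText.split("\n");
  const newLines = newText.split("\n");

  for (let i = 0; i + oldLines.length <= contentLines.length; i++) {
    const matches = oldLines.every(
      (line, j) => line.trim() === contentLines[i + j].trim(),
    );
    if (!matches) continue;

    // Reuse the indentation of the first matched line in the file.
    const fileIndent = /^\s*/.exec(contentLines[i])![0];
    const oldIndent = /^\s*/.exec(oldLines[0])![0];
    const replaced = newLines.map((line, j) => {
      if (j === 0) return fileIndent + line.trimStart();
      // Assumption: later lines keep their indent relative to oldText's first line.
      const lineIndent = /^\s*/.exec(line)![0];
      const extra = Math.max(0, lineIndent.length - oldIndent.length);
      return fileIndent + " ".repeat(extra) + line.trimStart();
    });
    contentLines.splice(i, oldLines.length, ...replaced);
    return contentLines.join("\n");
  }
  throw new Error("Could not find match for edit");
}
```

Splicing with `slice` instead of `String.prototype.replace` keeps sequences such as `$&` in `newText` literal, which matters when the replacement text is arbitrary source code.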
182
+
183
+ ---
184
+
185
+ ## 4. MCP Server Communication
186
+
187
+ ### 4.1 Server Startup
188
+
189
+ ```typescript
190
+ // 1. Create McpServer instance
191
+ const server = new McpServer({
192
+ name: "encoding-aware-fs",
193
+ version: "0.1.0",
194
+ })
195
+
196
+ // 2. Load configuration from .encoding-converter.json
197
+ const config = await loadProjectConfig(process.cwd())
198
+
199
+ // 3. Define allowed directories
200
+ const allowedDirectories = [process.cwd()]
201
+
202
+ // 4. Register tools
203
+ registerReadFile(server, config, allowedDirectories)
204
+ registerWriteFile(server, config, allowedDirectories)
205
+ registerEditFile(server, config, allowedDirectories)
206
+
207
+ // 5. Connect stdio transport
208
+ const transport = new StdioServerTransport()
209
+ await server.connect(transport)
210
+ ```
211
+
212
+ ### 4.2 Communication Protocol
213
+
214
+ **Transport**: stdio (stdin/stdout)
215
+
216
+ **Message Flow**:
217
+ 1. Client sends JSON-RPC request via stdin
218
+ 2. Server parses and validates with Zod schema
219
+ 3. Server executes async handler
220
+ 4. Handler returns result object
221
+ 5. Server wraps in JSON-RPC response
222
+ 6. Server writes to stdout
223
+ 7. Client parses and displays result
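
As an illustration of step 1, the request for an `edit_file` dry run might look like the object below. The `tools/call` method and the `name`/`arguments` parameters come from the MCP specification; the `id`, file path, and edit text are made-up values.

```typescript
// A JSON-RPC 2.0 request asking the server to run edit_file in dry-run mode.
// The id, path, and edit text are invented for illustration.
const editRequest = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: {
    name: "edit_file",
    arguments: {
      path: "/project/src/main.c",
      edits: [{ oldText: "版本 1", newText: "版本 2" }],
      dryRun: true,
    },
  },
};

// Over the stdio transport each message travels as a single line of JSON.
const wireLine = JSON.stringify(editRequest) + "\n";
console.log(wireLine);
```

The request itself is always UTF-8 JSON; conversion to the file's own encoding happens only inside the server.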
224
+
225
+ ### 4.3 Response Format
226
+
227
+ Standard MCP response:
228
+
229
+ ```typescript
230
+ {
231
+   content: [
231
+     {
232
+       type: "text", // Can also be "image" or "resource"
233
+       text: string // The actual content
234
+     }
235
+   ]
237
+ }
238
+ ```
239
+
240
+ ---
241
+
242
+ ## 5. Encoding Handling
243
+
244
+ ### 5.1 Encoding Detection
245
+
246
+ Uses Python's `chardet` library:
247
+
248
+ ```typescript
249
+ async function detectEncoding(filePath: string): Promise<DetectionResult> {
250
+ // Spawns Python process running chardet.detect()
251
+ // Reads first 4KB of file
252
+ // Returns { encoding: string | null, confidence: number }
253
+ }
254
+ ```
255
+
256
+ Supported encodings:
257
+ - GB18030 (default for legacy projects)
258
+ - GBK
259
+ - GB2312
260
+ - UTF-8
261
+ - Any encoding supported by iconv-lite
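
How the confidence threshold and the fallback might combine is sketched below. The interface shapes follow the descriptions in this document, while `chooseEncoding` and the `>=` comparison are assumptions, not the package's confirmed logic.

```typescript
// Shapes assumed from the description above; the real types may differ.
interface DetectionResult {
  encoding: string | null;
  confidence: number; // 0 to 1
}

interface EncodingConfig {
  sourceEncoding: string;
  confidenceThreshold: number;
}

// Hypothetical helper: decide which encoding to decode the file with.
function chooseEncoding(detected: DetectionResult, config: EncodingConfig): string {
  if (detected.encoding !== null && detected.confidence >= config.confidenceThreshold) {
    return detected.encoding;
  }
  // Failed or low-confidence detection falls back to the project default.
  return config.sourceEncoding;
}
```

For example, a `GB2312` guess at confidence 0.5 would be discarded in favour of the configured `sourceEncoding`, which helps with short files where a statistical detector has little to go on.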
262
+
263
+ ### 5.2 Encoding Conversion
264
+
265
+ Uses iconv-lite:
266
+
267
+ ```typescript
268
+ // Decoding: Any encoding → UTF-8
269
+ function decodeToUtf8(buffer: Buffer, encoding: string): string {
270
+ return iconv.decode(buffer, encoding)
271
+ }
272
+
273
+ // Encoding: UTF-8 → Target encoding
274
+ function encodeFromUtf8(text: string, encoding: string): Buffer {
275
+ return iconv.encode(text, encoding)
276
+ }
277
+ ```
278
+
279
+ ### 5.3 Line Ending Preservation
280
+
281
+ Detects and preserves CRLF vs LF:
282
+
283
+ ```typescript
284
+ function detectLineEnding(buffer: Buffer): 'CRLF' | 'LF' {
285
+ // Count 0x0D 0x0A (CRLF) vs standalone 0x0A (LF)
286
+ // Returns whichever is dominant
287
+ }
288
+
289
+ function restoreLineEndings(text: string, style: 'CRLF' | 'LF'): string {
290
+ // Normalize to LF first, then convert to target style
291
+ }
292
+ ```
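
The two stubs above could be filled in roughly as follows. This is a sketch under stated assumptions: it takes a `Uint8Array` rather than a `Buffer` to stay dependency-free (a Node.js `Buffer` can be passed as-is), and it treats a file with no newlines as a tie.

```typescript
type LineEnding = "CRLF" | "LF";

// Count CRLF pairs against bare LF bytes; a tie (including no newlines) gives CRLF.
function detectLineEnding(bytes: Uint8Array): LineEnding {
  let crlf = 0;
  let lf = 0;
  for (let i = 0; i < bytes.length; i++) {
    if (bytes[i] !== 0x0a) continue;
    if (i > 0 && bytes[i - 1] === 0x0d) crlf++;
    else lf++;
  }
  return crlf >= lf ? "CRLF" : "LF";
}

// Normalize to LF first so mixed input never ends up with doubled "\r".
function restoreLineEndings(text: string, style: LineEnding): string {
  const normalized = text.replace(/\r\n/g, "\n");
  return style === "CRLF" ? normalized.replace(/\n/g, "\r\n") : normalized;
}
```

Scanning the raw bytes before decoding is safe for the GB18030 family because the bytes 0x0A and 0x0D never occur inside a multi-byte character in those encodings.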
293
+
294
+ ---
295
+
296
+ ## 6. Security: Path Validation
297
+
298
+ Location: `src/path-validation.ts`
299
+
300
+ Multi-layer security checks:
301
+
302
+ 1. Type validation
303
+ 2. Null byte prevention (path injection)
304
+ 3. Absolute path enforcement
305
+ 4. Allowed directory boundary check
306
+ 5. Symlink resolution & re-validation
307
+ 6. Parent directory validation (for new files)
308
+
309
+ Prevents:
310
+ - Directory traversal attacks
311
+ - Symlink escape attacks
312
+ - Writing outside project boundaries
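
Checks 1 to 4 can be sketched with Node's `path` module alone. The function name `assertInsideAllowed` and the error messages are hypothetical; symlink resolution (check 5) and the parent-directory check (check 6) would additionally need filesystem calls such as `fs.realpath` and are left out.

```typescript
import * as path from "node:path";

// Hypothetical helper covering checks 1-4; it does not touch the filesystem.
function assertInsideAllowed(requested: unknown, allowedDirs: string[]): string {
  if (typeof requested !== "string") {
    throw new Error("Path must be a string");
  }
  if (requested.includes("\0")) {
    throw new Error("Path contains a null byte");
  }
  if (!path.isAbsolute(requested)) {
    throw new Error("Path must be absolute");
  }
  // normalize() collapses "." and ".." segments before the boundary check.
  const resolved = path.normalize(requested);
  const inside = allowedDirs.some((dir) => {
    const rel = path.relative(path.resolve(dir), resolved);
    const escapes =
      rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
    return !escapes;
  });
  if (!inside) {
    throw new Error("Path is outside the allowed directories");
  }
  return resolved;
}
```

Comparing with `path.relative` rather than a plain string prefix means a sibling directory such as `/project-evil` is not mistaken for a child of `/project`.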
313
+
314
+ ---
315
+
316
+ ## 7. Atomic File Write Pattern
317
+
318
+ Used by both write_file and edit_file:
319
+
320
+ ```typescript
321
+ const tempPath = `${validPath}.${randomBytes(16).toString('hex')}.tmp`
322
+ try {
323
+   await fs.writeFile(tempPath, encoded)
324
+   await fs.rename(tempPath, validPath) // Temp file sits beside the target, so the rename stays on one filesystem
325
+ } catch (error) {
326
+   try { await fs.unlink(tempPath) } catch {} // Best-effort cleanup of the temp file
327
+   throw error
328
+ }
329
+ ```
330
+
331
+ Benefits:
332
+ - Prevents partial writes on crash
333
+ - Atomic rename prevents corruption
334
+ - Automatic cleanup on error
335
+
336
+ ---
337
+
338
+ ## 8. Diff Generation
339
+
340
+ Location: `src/tools/edit-file.ts`, lines 16-18
341
+
342
+ ```typescript
343
+ function createUnifiedDiff(
344
+   original: string,
344
+   modified: string,
345
+   filepath: string
346
+ ): string {
347
+   return createTwoFilesPatch(
348
+     filepath,
349
+     filepath,
350
+     normalizeLineEndings(original),
351
+     normalizeLineEndings(modified),
352
+     'original',
353
+     'modified'
354
+   )
356
+ }
357
+ ```
358
+
359
+ Uses `diff` package to generate standard unified diff format.
360
+
361
+ ---
362
+
363
+ ## 9. Logging
364
+
365
+ Location: `src/logger.ts`
366
+
367
+ Structured logging with configurable levels:
368
+
369
+ ```typescript
370
+ function log(
371
+ level: 'debug' | 'info' | 'error',
372
+ message: string,
373
+ data?: object
374
+ ): void
375
+ ```
376
+
377
+ - Logs to: `~/.encoding-aware-fs/logs/server.log`
378
+ - Level controlled by: `ENCODING_CONVERTER_LOG` env variable
379
+ - Default level: 'error'
380
+ - Includes timestamps and JSON data
381
+
382
+ Used to track encoding detection and file operations.
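
A level filter along these lines would give the behaviour described above. All names here are hypothetical, and the accepted values of `ENCODING_CONVERTER_LOG` (the three level names, case-insensitive) are an assumption.

```typescript
type LogLevel = "debug" | "info" | "error";

// Lower number means more verbose.
const LEVEL_ORDER: Record<LogLevel, number> = { debug: 0, info: 1, error: 2 };

// Hypothetical helper: interpret the ENCODING_CONVERTER_LOG value,
// defaulting to "error" when it is unset or unrecognized.
function parseLogLevel(raw: string | undefined): LogLevel {
  const value = (raw ?? "").toLowerCase();
  return value === "debug" || value === "info" || value === "error" ? value : "error";
}

// A message is written only if it is at least as severe as the configured level.
function shouldLog(messageLevel: LogLevel, configured: LogLevel): boolean {
  return LEVEL_ORDER[messageLevel] >= LEVEL_ORDER[configured];
}

// Format one log line: ISO timestamp, level, message, optional JSON data.
function formatLine(level: LogLevel, message: string, data?: object): string {
  const suffix = data ? " " + JSON.stringify(data) : "";
  return `${new Date().toISOString()} [${level}] ${message}${suffix}`;
}
```

With the default level of `error`, routine detection messages are filtered out and the log file stays small unless debugging is switched on.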
383
+
384
+ ---
385
+
386
+ ## 10. Configuration
387
+
388
+ ### .encoding-converter.json
389
+
390
+ ```json
391
+ {
391
+   "sourceEncoding": "GB18030",
392
+   "targetEncoding": "UTF-8",
393
+   "confidenceThreshold": 0.8
394
+ }
396
+ ```
397
+
398
+ Controls:
399
+ - Default encoding for undetected files
400
+ - Encoding detection confidence threshold
401
+ - Target output encoding
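
A loader for this file could merge the parsed values over defaults, as in the sketch below. `parseProjectConfig` is a hypothetical name, and the defaults simply mirror the sample file; the behaviour of the real `loadProjectConfig` is not shown in this document.

```typescript
interface ProjectConfig {
  sourceEncoding: string;
  targetEncoding: string;
  confidenceThreshold: number;
}

// Defaults copied from the sample file; the package's real defaults are not shown.
const DEFAULT_CONFIG: ProjectConfig = {
  sourceEncoding: "GB18030",
  targetEncoding: "UTF-8",
  confidenceThreshold: 0.8,
};

// Hypothetical parser: merge known keys over the defaults, ignoring invalid values.
function parseProjectConfig(jsonText: string): ProjectConfig {
  const raw: unknown = JSON.parse(jsonText);
  const merged: ProjectConfig = { ...DEFAULT_CONFIG };
  if (typeof raw !== "object" || raw === null) return merged;
  const fields = raw as Record<string, unknown>;

  const source = fields["sourceEncoding"];
  if (typeof source === "string") merged.sourceEncoding = source;
  const target = fields["targetEncoding"];
  if (typeof target === "string") merged.targetEncoding = target;
  const threshold = fields["confidenceThreshold"];
  if (typeof threshold === "number" && threshold >= 0 && threshold <= 1) {
    merged.confidenceThreshold = threshold;
  }
  return merged;
}
```

Ignoring out-of-range thresholds instead of failing keeps the server usable when the config file contains a typo, at the cost of hiding the mistake; a stricter loader could throw instead.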
402
+
403
+ ### .mcp.json
404
+
405
+ ```json
406
+ {
406
+   "mcpServers": {
407
+     "encoding-aware-fs": {
408
+       "command": "cmd",
409
+       "args": ["/c", "npx", "-y", "encoding-aware-fs", "serve"],
410
+       "env": {}
411
+     }
412
+   }
413
+ }
415
+ ```
416
+
417
+ Registers the MCP server with Claude Code. The `cmd /c` wrapper in this example is specific to Windows.
418
+
419
+ ---
420
+
421
+ ## 11. Data Processing Flow
422
+
423
+ ```
424
+ User Input (AI Tool)
425
+
426
+ [stdin] JSON-RPC Request
427
+
428
+ Path Validation (security checks)
429
+
430
+ Read raw bytes
431
+
432
+ Detect line ending (CRLF vs LF)
433
+
434
+ Detect encoding (Python chardet)
435
+
436
+ Decode to UTF-8 (iconv-lite)
437
+
438
+ Apply edits (exact or fuzzy matching)
439
+
440
+ Generate unified diff
441
+
442
+ DryRun?
443
+   YES → Return diff only
444
+   NO → Restore line endings → Encode to original → Atomic write
445
+
446
+ [stdout] JSON-RPC Response with diff
447
+
448
+ Client displays diff
449
+ ```
450
+
451
+ ---
452
+
453
+ ## 12. Testing
454
+
455
+ Framework: Vitest
456
+
457
+ Test files:
458
+ - `tests/encoding/line-endings.test.ts`
459
+ - `tests/encoding/detector.test.ts`
460
+ - `tests/encoding/converter.test.ts`
461
+ - `tests/encoding/binary.test.ts`
462
+ - `tests/config.test.ts`
463
+
464
+ Run tests:
465
+ ```bash
466
+ npm test # Run once
467
+ npm run test:watch # Watch mode
468
+ npm run typecheck # TypeScript validation
469
+ ```
470
+
471
+ ---
472
+
473
+ ## 13. Summary
474
+
475
+ ### What edit_file Returns
476
+
477
+ A **unified diff** in standard format, showing:
478
+ - File paths, labelled `original` and `modified` (the header arguments passed to `createTwoFilesPatch`)
479
+ - Line numbers and change counts
480
+ - Context lines (unchanged)
481
+ - Removed lines (prefixed with `-`)
482
+ - Added lines (prefixed with `+`)
483
+
484
+ ### Communication Model
485
+
486
+ 1. **Input**: JSON-RPC call with path and edits array
487
+ 2. **Processing**: Detect encoding → decode → apply edits → generate diff → optionally write
488
+ 3. **Output**: MCP response with diff text
489
+ 4. **Optional**: Write changes to disk (atomic, preserves encoding/line-endings)
490
+
491
+ ### Unique Features
492
+
493
+ 1. **Transparent Encoding Handling**: Works with legacy GB18030 projects
494
+ 2. **Line Ending Preservation**: Maintains original CRLF vs LF
495
+ 3. **Smart Matching**: Fuzzy matching handles indentation variations
496
+ 4. **Safe Operations**: Atomic writes prevent corruption
497
+ 5. **DryRun Support**: Preview before committing
498
+ 6. **Security**: Comprehensive path validation and symlink checks
499
+
500
+ ---
501
+
502
+ **End of Research Document**