vaultex-0.1.0.tar.gz

vaultex-0.1.0/PKG-INFO ADDED
@@ -0,0 +1,106 @@
+ Metadata-Version: 2.4
+ Name: vaultex
+ Version: 0.1.0
+ Summary: Extract and merge files from a folder with precision — a lightweight GUI tool for developers and LLM workflows.
+ Author-email: Ian Gong <gongzhijie535@gmail.com>
+ License: MIT
+ Project-URL: Homepage, https://github.com/gongzhijie535-ctrl/vaultex
+ Project-URL: Repository, https://github.com/gongzhijie535-ctrl/vaultex
+ Project-URL: Issues, https://github.com/gongzhijie535-ctrl/vaultex
+ Keywords: file,extract,merge,llm,gradio,code,tools
+ Classifier: Programming Language :: Python :: 3
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Operating System :: OS Independent
+ Classifier: Intended Audience :: Developers
+ Classifier: Topic :: Utilities
+ Requires-Python: >=3.10
+ Description-Content-Type: text/markdown
+ Requires-Dist: gradio>=4.0
+
+ # 🔐 Vaultex
+
+ > Extract what you need. Nothing more.
+
+ Vaultex is a lightweight GUI tool that lets you **scan a folder and merge all matching files into a single text output** — ready to paste into an LLM, a doc, or anywhere you need a full-project snapshot.
+
+ The idea is simple: when you're working with a codebase or a collection of files, you often need to quickly gather *specific types* of files across nested folders. Vaultex gives you precise control over what gets included, what gets skipped, and how the result is organized — all through a clean interface.
+
+ ---
+
+ ## ✨ Features
+
+ - 📂 **Folder picker** — browse or paste a path directly
+ - 📄 **File type selector** — pick from common extensions or add your own
+ - 🎯 **Whitelist / blacklist filtering** — specify exactly which folders and files to include or exclude
+ - 📦 **File size limit** — skip files that are too large
+ - 🔁 **Recursive or flat mode** — go deep or stay shallow
+ - 🔃 **Sort options** — by path, filename, or last modified time
+ - 🔍 **Preview before extracting** — scan first, extract when ready
+ - 💾 **Save to file** — optionally write the merged output back to disk
+ - 🤖 **Token estimator** — rough count to check if output fits your LLM context window
+
+ ---
+
+ ## 🚀 Installation
+
+ ```bash
+ pip install vaultex
+ ```
+
+ Then launch:
+
+ ```bash
+ vaultex
+ ```
+
+ Or run directly from source:
+
+ ```bash
+ git clone https://github.com/gongzhijie535-ctrl/vaultex
+ cd vaultex
+ pip install -e .
+ python -m vaultex
+ ```
+
+ ---
+
+ ## 🖥️ Usage
+
+ 1. Select a folder using the **📂 picker** or paste a path
+ 2. Check the file types you want (`.py`, `.md`, `.json`, etc.)
+ 3. Expand **Filter Options** to narrow down by folder or filename
+ 4. Click **🔍 Preview** to confirm the file list
+ 5. Click **🚀 Extract** to merge and view the output
+
+ ---
+
+ ## 📁 Project Structure
+
+ ```
+ vaultex/
+ ├── __init__.py
+ ├── __main__.py # entry point: python -m vaultex
+ ├── core.py # file collection + merging logic
+ └── app.py # Gradio UI
+ ```
+
+ ---
+
+ ## 📦 Requirements
+
+ - Python ≥ 3.10
+ - gradio
+
+ ---
+
+ ## 👤 Author
+
+ **Ian Gong (龚智杰)**
+ 📧 gongzhijie535@gmail.com
+ 🐙 [@gongzhijie535-ctrl](https://github.com/gongzhijie535-ctrl)
+
+ ---
+
+ ## 📄 License
+
+ MIT
vaultex-0.1.0/README.md ADDED
@@ -0,0 +1,87 @@
+ # 🔐 Vaultex
+
+ > Extract what you need. Nothing more.
+
+ Vaultex is a lightweight GUI tool that lets you **scan a folder and merge all matching files into a single text output** — ready to paste into an LLM, a doc, or anywhere you need a full-project snapshot.
+
+ The idea is simple: when you're working with a codebase or a collection of files, you often need to quickly gather *specific types* of files across nested folders. Vaultex gives you precise control over what gets included, what gets skipped, and how the result is organized — all through a clean interface.
+
+ ---
+
+ ## ✨ Features
+
+ - 📂 **Folder picker** — browse or paste a path directly
+ - 📄 **File type selector** — pick from common extensions or add your own
+ - 🎯 **Whitelist / blacklist filtering** — specify exactly which folders and files to include or exclude
+ - 📦 **File size limit** — skip files that are too large
+ - 🔁 **Recursive or flat mode** — go deep or stay shallow
+ - 🔃 **Sort options** — by path, filename, or last modified time
+ - 🔍 **Preview before extracting** — scan first, extract when ready
+ - 💾 **Save to file** — optionally write the merged output back to disk
+ - 🤖 **Token estimator** — rough count to check if output fits your LLM context window
+
+ ---
+
+ ## 🚀 Installation
+
+ ```bash
+ pip install vaultex
+ ```
+
+ Then launch:
+
+ ```bash
+ vaultex
+ ```
+
+ Or run directly from source:
+
+ ```bash
+ git clone https://github.com/gongzhijie535-ctrl/vaultex
+ cd vaultex
+ pip install -e .
+ python -m vaultex
+ ```
+
+ ---
+
+ ## 🖥️ Usage
+
+ 1. Select a folder using the **📂 picker** or paste a path
+ 2. Check the file types you want (`.py`, `.md`, `.json`, etc.)
+ 3. Expand **Filter Options** to narrow down by folder or filename
+ 4. Click **🔍 Preview** to confirm the file list
+ 5. Click **🚀 Extract** to merge and view the output
+
+ ---
+
+ ## 📁 Project Structure
+
+ ```
+ vaultex/
+ ├── __init__.py
+ ├── __main__.py # entry point: python -m vaultex
+ ├── core.py # file collection + merging logic
+ └── app.py # Gradio UI
+ ```
+
+ ---
+
+ ## 📦 Requirements
+
+ - Python ≥ 3.10
+ - gradio
+
+ ---
+
+ ## 👤 Author
+
+ **Ian Gong (龚智杰)**
+ 📧 gongzhijie535@gmail.com
+ 🐙 [@gongzhijie535-ctrl](https://github.com/gongzhijie535-ctrl)
+
+ ---
+
+ ## 📄 License
+
+ MIT
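The README describes the core idea as scanning a folder and merging every matching file into one text output. The sketch below illustrates that idea in isolation; it is not the package's implementation. The names `merge_folder`, `SEPARATOR`, and `demo` are hypothetical, and it assumes every file is UTF-8 and applies no filters beyond the extension.

```python
import os
import tempfile

SEPARATOR = "=" * 60  # assumed divider; mirrors the default used in the package


def merge_folder(folder: str, extensions: tuple[str, ...]) -> str:
    """Concatenate every file under `folder` whose name ends with one of `extensions`."""
    paths = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            # str.endswith accepts a tuple of suffixes.
            if name.lower().endswith(extensions):
                paths.append(os.path.join(root, name))
    paths.sort()

    parts = []
    for path in paths:
        rel = os.path.relpath(path, folder)
        with open(path, "r", encoding="utf-8") as fh:
            parts.append(f"{SEPARATOR}\n# {rel}\n{SEPARATOR}\n{fh.read()}")
    return "\n".join(parts)


def demo() -> str:
    """Build a throwaway tree with two text files and one binary file, then merge it."""
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, "src"))
        with open(os.path.join(tmp, "src", "a.py"), "w", encoding="utf-8") as fh:
            fh.write("print('a')\n")
        with open(os.path.join(tmp, "notes.md"), "w", encoding="utf-8") as fh:
            fh.write("# notes\n")
        with open(os.path.join(tmp, "image.png"), "wb") as fh:
            fh.write(b"\x89PNG")
        return merge_folder(tmp, (".py", ".md"))
```

Calling `demo()` returns a single string with the two text files, each under its own header, and without the `.png` file.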
vaultex-0.1.0/pyproject.toml ADDED
@@ -0,0 +1,37 @@
+ [build-system]
+ requires = ["setuptools>=68", "wheel"]
+ build-backend = "setuptools.build_meta"
+
+ [project]
+ name = "vaultex"
+ version = "0.1.0"
+ description = "Extract and merge files from a folder with precision — a lightweight GUI tool for developers and LLM workflows."
+ readme = "README.md"
+ license = { text = "MIT" }
+ authors = [
+ { name = "Ian Gong", email = "gongzhijie535@gmail.com" }
+ ]
+ keywords = ["file", "extract", "merge", "llm", "gradio", "code", "tools"]
+ classifiers = [
+ "Programming Language :: Python :: 3",
+ "License :: OSI Approved :: MIT License",
+ "Operating System :: OS Independent",
+ "Intended Audience :: Developers",
+ "Topic :: Utilities",
+ ]
+ requires-python = ">=3.10"
+ dependencies = [
+ "gradio>=4.0",
+ ]
+
+ [project.urls]
+ Homepage = "https://github.com/gongzhijie535-ctrl/vaultex"
+ Repository = "https://github.com/gongzhijie535-ctrl/vaultex"
+ Issues = "https://github.com/gongzhijie535-ctrl/vaultex"
+
+ [project.scripts]
+ vaultex = "vaultex.app:launch"
+
+ [tool.setuptools.packages.find]
+ where = ["."]
+ include = ["vaultex*"]
vaultex-0.1.0/setup.cfg ADDED
@@ -0,0 +1,4 @@
+ [egg_info]
+ tag_build =
+ tag_date = 0
+
vaultex-0.1.0/setup.py ADDED
@@ -0,0 +1,3 @@
+ from setuptools import setup
+
+ setup()
vaultex-0.1.0/vaultex/__init__.py ADDED
@@ -0,0 +1,4 @@
+ from vaultex.core import extract, DEFAULT_EXTENSIONS
+
+ __version__ = "0.1.0"
+ __all__ = ["extract", "DEFAULT_EXTENSIONS"]
vaultex-0.1.0/vaultex/__main__.py ADDED
@@ -0,0 +1,4 @@
+ from vaultex.app import launch
+
+ if __name__ == "__main__":
+     launch()
vaultex-0.1.0/vaultex/app.py ADDED
@@ -0,0 +1,343 @@
+ import os
+ import gradio as gr
+ from vaultex.core import extract, DEFAULT_EXTENSIONS
+
+
+ def pick_folder():
+     try:
+         import tkinter as tk
+         from tkinter import filedialog
+         root = tk.Tk()
+         root.withdraw()
+         root.wm_attributes("-topmost", True)
+         folder = filedialog.askdirectory(title="Select a folder")
+         root.destroy()
+         return folder or ""
+     except Exception as e:
+         return f"⚠️ Could not open the folder picker: {e}"
+
+
+ def _merge_extensions(selected, custom_str):
+     exts = set(selected or [])
+     for line in (custom_str or "").splitlines():
+         line = line.strip()
+         if not line:
+             continue
+         if not line.startswith("."):
+             line = "." + line
+         exts.add(line.lower())
+     return list(exts)
+
+
+ def _parse_lines(raw: str) -> list:
+     return [s.strip() for s in raw.splitlines() if s.strip()]
+
+
+ def _parse_keyword_files(raw: str) -> set:
+     return {s.strip().lower() for s in raw.splitlines() if s.strip()}
+
+
+ def _build_common_args(folder_path, selected_extensions, custom_extensions_str,
+                        recursive, only_folders_str, skip_folders_str,
+                        keyword_files_str, skip_files_str, max_file_kb):
+     folder_path = folder_path.strip()
+     extensions = _merge_extensions(selected_extensions, custom_extensions_str)
+     only_folders = _parse_lines(only_folders_str)
+     skip_folders = _parse_lines(skip_folders_str)
+     keyword_files = _parse_keyword_files(keyword_files_str)
+     skip_files = _parse_lines(skip_files_str)
+     max_kb = int(max_file_kb) if str(max_file_kb).strip().isdigit() else 0
+     return folder_path, extensions, only_folders, skip_folders, keyword_files, skip_files, max_kb
+
+
+ def run_scan(folder_path, selected_extensions, custom_extensions_str,
+              recursive, only_folders_str, skip_folders_str,
+              keyword_files_str, skip_files_str, max_file_kb):
+
+     folder_path, extensions, only_folders, skip_folders, keyword_files, skip_files, max_kb = _build_common_args(
+         folder_path, selected_extensions, custom_extensions_str,
+         recursive, only_folders_str, skip_folders_str,
+         keyword_files_str, skip_files_str, max_file_kb
+     )
+
+     if not folder_path:
+         return "⚠️ Please enter a folder path"
+     if not os.path.isdir(folder_path):
+         return "⚠️ Path does not exist or is not a folder"
+     if not extensions:
+         return "⚠️ Select or enter at least one file type"
+
+     from vaultex.core import _collect_files
+     file_list = _collect_files(
+         folder_path, extensions, recursive,
+         skip_folders, skip_files, only_folders, max_kb, keyword_files
+     )
+     file_list.sort()
+
+     if not file_list:
+         return "⚠️ No matching files found"
+
+     lines = [f"🔍 Scan complete: found {len(file_list)} file(s):", ""]
+     for i, f in enumerate(file_list, 1):
+         rel = os.path.relpath(f, folder_path)
+         size = os.path.getsize(f) / 1024
+         lines.append(f" {i:>3}. {rel} ({size:.1f} KB)")
+
+     return "\n".join(lines)
+
+
+ def run_extract(folder_path, selected_extensions, custom_extensions_str,
+                 recursive, separator, only_folders_str, skip_folders_str,
+                 keyword_files_str, skip_files_str, max_file_kb,
+                 sort_by, save_to_file, output_filename):
+
+     folder_path, extensions, only_folders, skip_folders, keyword_files, skip_files, max_kb = _build_common_args(
+         folder_path, selected_extensions, custom_extensions_str,
+         recursive, only_folders_str, skip_folders_str,
+         keyword_files_str, skip_files_str, max_file_kb
+     )
+
+     if not folder_path:
+         return "⚠️ Please enter a folder path", "No folder selected"
+     if not os.path.isdir(folder_path):
+         return "⚠️ Path does not exist or is not a folder", "Invalid path"
+     if not extensions:
+         return "⚠️ Select or enter at least one file type", "No file type selected"
+
+     separator_str = separator.strip() if separator.strip() else "=" * 60
+
+     merged, file_list, stats = extract(
+         folder_path=folder_path,
+         extensions=extensions,
+         recursive=recursive,
+         separator=separator_str,
+         skip_folders=skip_folders,
+         skip_files=skip_files,
+         only_folders=only_folders,
+         max_file_kb=max_kb,
+         sort_by=sort_by,
+         keyword_files=keyword_files,
+     )
+
+     if not file_list:
+         return "⚠️ No matching files found", "0 files"
+
+     saved_msg = ""
+     if save_to_file:
+         out_name = output_filename.strip() or "code_summary.txt"
+         if not out_name.endswith(".txt"):
+             out_name += ".txt"
+         out_path = os.path.join(folder_path, out_name)
+         try:
+             with open(out_path, "w", encoding="utf-8") as f:
+                 f.write(merged)
+             saved_msg = f"\n💾 Saved to: {out_path}"
+         except Exception as e:
+             saved_msg = f"\n⚠️ Save failed: {e}"
+
+     summary_lines = [
+         f"✅ Extracted {stats['file_count']} file(s)",
+         f"📊 Total characters: {stats['char_count']:,}",
+         f"🤖 Estimated tokens: {stats['token_est']:,} (characters ÷ 4; accurate for English, low for Chinese)",
+         "",
+         "📋 File list:",
+     ]
+     for f in file_list:
+         summary_lines.append(f" • {os.path.relpath(f, folder_path)}")
+     if saved_msg:
+         summary_lines.append(saved_msg)
+
+     return merged, "\n".join(summary_lines)
+
+
+ HELP_TEXT = """
+ ## 📖 Usage Guide
+
+ **Basic workflow**
+ 1. Click the `📂 Browse` button to choose the target folder, or paste a path directly
+ 2. Check the file types you need (or add custom ones below)
+ 3. Expand "Filter Options" and "Output Settings" as needed and fill in the options
+ 4. Click `🔍 Preview file list` to confirm which files are in scope
+ 5. Once the list looks right, click `🚀 Extract`
+
+ ---
+
+ **Filter reference**
+
+ | Option | Description |
+ |------|------|
+ | ✅ Only folders | Search only in these folders, one per line; leave empty for no restriction |
+ | ⛔ Skip folders | Exclude these folders, one per line |
+ | ✅ Only files | Exact match on the full file name (with extension), one per line; leave empty for no restriction |
+ | ⛔ Skip files | Exclude these files, one full file name (with extension) per line |
+ | 📦 Max file size | Skip files larger than the given size in KB; 0 = no limit |
+ | 🔁 Recursive mode | When checked, descend into all subfolders; when unchecked, read only the top level |
+
+ ---
+
+ **Token estimate**
+
+ - Pure English code: small error
+ - With Chinese comments: the actual token count will be higher
+
+ Context window reference for mainstream models: GPT-4o ≈ 128K, Claude ≈ 200K, Gemini 1.5 Pro ≈ 1M
+ """.strip()
+
+
+ def launch():
+     with gr.Blocks(title="Vaultex") as demo:
+
+         gr.Markdown("# 🔐 Vaultex\n### Extract and merge text files from a folder")
+
+         with gr.Tabs():
+
+             # ── Tab 1: main interface ────────────────────────────────────
+             with gr.Tab("🚀 Extract"):
+                 with gr.Row():
+
+                     # Left column: configuration
+                     with gr.Column(scale=1, min_width=340):
+
+                         # Path
+                         with gr.Row():
+                             folder_input = gr.Textbox(
+                                 label="📁 Folder path",
+                                 placeholder="Type a path or use the button on the right",
+                                 lines=1,
+                                 scale=5
+                             )
+                             pick_btn = gr.Button("📂 Browse", scale=1, min_width=60)
+                         pick_btn.click(fn=pick_folder, inputs=[], outputs=[folder_input])
+
+                         # File types
+                         with gr.Accordion("📄 File types", open=True):
+                             ext_selector = gr.CheckboxGroup(
+                                 choices=DEFAULT_EXTENSIONS,
+                                 value=[".txt", ".md", ".py", ".js", ".json"],
+                                 label="Select types"
+                             )
+                             custom_ext_input = gr.Textbox(
+                                 label="➕ Custom types (one per line)",
+                                 placeholder=".vue\n.svelte\n.lock",
+                                 lines=2
+                             )
+
+                         # Filter options (collapsed by default)
+                         with gr.Accordion("🔽 Filter Options", open=False):
+
+                             gr.Markdown("**📂 Folders**")
+                             with gr.Row():
+                                 only_folders_input = gr.Textbox(
+                                     label="✅ Only folders (search only these)",
+                                     placeholder="src\nlib\nutils",
+                                     lines=4
+                                 )
+                                 skip_folders_input = gr.Textbox(
+                                     label="⛔ Skip folders (exclude these)",
+                                     placeholder="__pycache__\n.git\nnode_modules",
+                                     lines=4
+                                 )
+
+                             gr.Markdown("**📄 Files**")
+                             with gr.Row():
+                                 keyword_files_input = gr.Textbox(
+                                     label="✅ Only files (keep only these, with extension)",
+                                     placeholder="model.py\nconfig.json",
+                                     lines=4
+                                 )
+                                 skip_files_input = gr.Textbox(
+                                     label="⛔ Skip files (exclude these, with extension)",
+                                     placeholder="setup.py\nconfig.py",
+                                     lines=4
+                                 )
+
+                             max_kb_input = gr.Number(
+                                 label="📦 Max file size (KB, 0 = no limit)",
+                                 value=0,
+                                 precision=0
+                             )
+
+                         # Output settings (collapsed by default)
+                         with gr.Accordion("💾 Output Settings", open=False):
+                             with gr.Row():
+                                 recursive_toggle = gr.Checkbox(
+                                     label="🔁 Include subfolders",
+                                     value=True,
+                                     scale=1
+                                 )
+                                 sort_selector = gr.Radio(
+                                     choices=[("Path", "path"), ("File name", "name"), ("Modified time", "mtime")],
+                                     value="path",
+                                     label="🔃 Sort by",
+                                     scale=2
+                                 )
+                             separator_input = gr.Textbox(
+                                 label="✂️ File separator",
+                                 value="=" * 60,
+                                 lines=1
+                             )
+                             save_toggle = gr.Checkbox(
+                                 label="💾 Also save to file",
+                                 value=False
+                             )
+                             output_filename_input = gr.Textbox(
+                                 label="📄 Output file name",
+                                 value="code_summary.txt",
+                                 lines=1
+                             )
+
+                         # Buttons
+                         with gr.Row():
+                             scan_btn = gr.Button("🔍 Preview file list", variant="secondary", scale=1)
+                             extract_btn = gr.Button("🚀 Extract", variant="primary", scale=1)
+
+                     # Right column: output
+                     with gr.Column(scale=2):
+                         scan_output = gr.Textbox(
+                             label="🔍 File list preview",
+                             lines=8,
+                             interactive=False
+                         )
+                         summary_output = gr.Textbox(
+                             label="📋 Extraction summary",
+                             lines=8,
+                             interactive=False
+                         )
+                         result_output = gr.Textbox(
+                             label="📝 Merged content",
+                             lines=18,
+                             interactive=False
+                         )
+
+             # ── Tab 2: usage guide ──────────────────────────────────
+             with gr.Tab("📖 Usage Guide"):
+                 gr.Markdown(HELP_TEXT)
+
+         # Event bindings
+         scan_btn.click(
+             fn=run_scan,
+             inputs=[
+                 folder_input, ext_selector, custom_ext_input,
+                 recursive_toggle, only_folders_input, skip_folders_input,
+                 keyword_files_input, skip_files_input, max_kb_input
+             ],
+             outputs=[scan_output]
+         )
+
+         extract_btn.click(
+             fn=run_extract,
+             inputs=[
+                 folder_input, ext_selector, custom_ext_input,
+                 recursive_toggle, separator_input,
+                 only_folders_input, skip_folders_input,
+                 keyword_files_input, skip_files_input, max_kb_input,
+                 sort_selector, save_toggle, output_filename_input
+             ],
+             outputs=[result_output, summary_output]
+         )
+
+     demo.launch(theme=gr.themes.Soft())
+
+
+ if __name__ == "__main__":
+     launch()
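The UI merges the checked extensions with free-form entries from the custom box: entries are trimmed, blank lines are ignored, a missing leading dot is added, and everything is lowercased. The standalone sketch below restates those rules so they can be tried without Gradio. `normalize_extensions` is a hypothetical name, and unlike `_merge_extensions` it returns a sorted list so the result is deterministic.

```python
def normalize_extensions(selected: list[str], custom: str) -> list[str]:
    """Merge checked extensions with custom ones typed one per line."""
    exts = set(selected)
    for line in custom.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if not line.startswith("."):
            line = "." + line  # "vue" becomes ".vue"
        exts.add(line.lower())
    return sorted(exts)


result = normalize_extensions([".py"], "vue\n .Svelte \n\n.py")
print(result)  # ['.py', '.svelte', '.vue']
```

Because custom entries are lowercased here, file names need to be lowercased before comparison as well, otherwise a file such as `MAIN.PY` would never match.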
vaultex-0.1.0/vaultex/core.py ADDED
@@ -0,0 +1,152 @@
+ import os
+
+ DEFAULT_EXTENSIONS = [
+     ".py", ".js", ".ts", ".jsx", ".tsx",
+     ".html", ".css", ".scss",
+     ".json", ".yaml", ".yml", ".toml", ".ini", ".env",
+     ".md", ".txt", ".rst", ".csv", ".xml",
+     ".sh", ".bat", ".ps1",
+     ".c", ".cpp", ".h", ".java", ".go", ".rs",
+ ]
+
+
+ def extract(
+     folder_path: str,
+     extensions: list,
+     recursive: bool = True,
+     separator: str = "=" * 60,
+     skip_folders: list | None = None,
+     skip_files: list | None = None,
+     only_folders: list | None = None,
+     max_file_kb: int = 0,
+     sort_by: str = "path",
+     keyword_files: set | None = None,
+ ) -> tuple[str, list, dict]:
+
+     skip_folders = skip_folders or []
+     skip_files = skip_files or []
+     only_folders = only_folders or []
+     keyword_files = keyword_files or set()
+
+     file_list = _collect_files(
+         folder_path, extensions, recursive,
+         skip_folders, skip_files, only_folders, max_file_kb, keyword_files
+     )
+
+     if sort_by == "name":
+         file_list.sort(key=lambda p: os.path.basename(p).lower())
+     elif sort_by == "mtime":
+         file_list.sort(key=lambda p: os.path.getmtime(p), reverse=True)
+     else:
+         file_list.sort()
+
+     if not file_list:
+         return "", [], {}
+
+     lines = []
+     total = len(file_list)
+     char_count = 0
+
+     lines.append("📁 Code Summary Report")
+     lines.append(f"Root directory: {folder_path}")
+     lines.append(f"Files read: {total}")
+     lines.append(separator)
+     lines.append("")
+
+     lines.append("📋 File Index")
+     lines.append("-" * 40)
+     for i, filepath in enumerate(file_list, 1):
+         rel = os.path.relpath(filepath, folder_path)
+         size_kb = os.path.getsize(filepath) / 1024
+         lines.append(f" {i:>3}. {rel} ({size_kb:.1f} KB)")
+     lines.append("")
+     lines.append("")
+
+     for i, filepath in enumerate(file_list, 1):
+         rel = os.path.relpath(filepath, folder_path)
+         lines.append(separator)
+         lines.append(f"# File {i}/{total}: {rel}")
+         lines.append(separator)
+         lines.append("")
+         content = _read_file(filepath)
+         char_count += len(content)  # count while merging so each file is read only once
+         lines.append(content)
+         lines.extend(["", ""])
+
+     merged = "\n".join(lines)
+
+     stats = {
+         "file_count": total,
+         "char_count": char_count,
+         "token_est": char_count // 4,
+     }
+
+     return merged, file_list, stats
+
+
+ def _collect_files(folder, extensions, recursive, skip_folders, skip_files,
+                    only_folders, max_file_kb, keyword_files=None):
+     collected = []
+     keyword_files = keyword_files or set()
+
+     if recursive:
+         for root, dirs, files in os.walk(folder):
+             # Only-folders: directory names at the current level must be in the list
+             if only_folders:
+                 dirs[:] = [
+                     d for d in dirs
+                     if d in only_folders and d not in skip_folders and not d.startswith(".")
+                 ]
+             else:
+                 dirs[:] = [
+                     d for d in dirs
+                     if d not in skip_folders and not d.startswith(".")
+                 ]
+
+             # With only-folders set, each directory is also checked against the allowed scope:
+             # files in the root are always collected; subdirectories only if listed in only_folders
+             rel_root = os.path.relpath(root, folder)
+             if only_folders and rel_root != ".":
+                 top_dir = rel_root.split(os.sep)[0]
+                 if top_dir not in only_folders:
+                     continue
+
+             for file in files:
+                 full_path = os.path.join(root, file)
+                 if not _passes_filters(file, extensions, skip_files, max_file_kb, keyword_files, full_path):
+                     continue
+                 collected.append(full_path)
+     else:
+         for item in os.listdir(folder):
+             full_path = os.path.join(folder, item)
+             if not os.path.isfile(full_path):
+                 continue
+             if not _passes_filters(item, extensions, skip_files, max_file_kb, keyword_files, full_path):
+                 continue
+             collected.append(full_path)
+
+     return collected
+
+
+ def _passes_filters(filename, extensions, skip_files, max_file_kb, keyword_files, full_path):
+     if filename in skip_files:
+         return False
+     if not any(filename.lower().endswith(ext.lower()) for ext in extensions):
+         return False
+     if max_file_kb > 0 and os.path.getsize(full_path) / 1024 > max_file_kb:
+         return False
+     if keyword_files and filename.lower() not in keyword_files:
+         return False
+     return True
+
+
+ def _read_file(filepath):
+     for encoding in ("utf-8", "gbk", "latin-1"):
+         try:
+             with open(filepath, "r", encoding=encoding) as f:
+                 return f.read()
+         except UnicodeDecodeError:
+             continue
+         except Exception as e:
+             return f"⚠️ Read failed: {e}"
+     return "⚠️ Read failed: no supported encoding could decode the file"
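The recursive branch of `_collect_files` relies on a documented property of `os.walk`: when walking top-down, assigning to `dirs[:]` in place controls which subdirectories are visited next. The sketch below isolates that technique on a temporary tree. `walk_pruned` and `demo` are hypothetical names, and the sketch covers only the skip-list and hidden-directory rules, not the only-folders logic.

```python
import os
import tempfile


def walk_pruned(folder: str, skip: set[str]) -> list[str]:
    """Return relative file paths, never descending into skipped or hidden directories."""
    found = []
    for root, dirs, files in os.walk(folder):
        # Slice assignment mutates the very list os.walk will iterate next,
        # so pruned directories are never entered (requires the default topdown=True).
        dirs[:] = [d for d in dirs if d not in skip and not d.startswith(".")]
        for name in files:
            found.append(os.path.relpath(os.path.join(root, name), folder))
    return sorted(found)


def demo() -> list[str]:
    """Create a small tree and list what survives pruning."""
    with tempfile.TemporaryDirectory() as tmp:
        for rel in ("src/app.py", "node_modules/lib/index.js", ".git/config", "README.md"):
            path = os.path.join(tmp, *rel.split("/"))
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "w", encoding="utf-8") as fh:
                fh.write("x")
        # Normalise separators so the result reads the same on every platform.
        return [p.replace(os.sep, "/") for p in walk_pruned(tmp, {"node_modules"})]
```

Rebinding the name instead (`dirs = [...]`) would not prune anything, because `os.walk` keeps its own reference to the original list. The same mechanism means that an allow-list applied at every level, as in the only-folders branch, also prunes nested subdirectories whose names are not on the list.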
vaultex-0.1.0/vaultex.egg-info/SOURCES.txt ADDED
@@ -0,0 +1,13 @@
+ README.md
+ pyproject.toml
+ setup.py
+ vaultex/__init__.py
+ vaultex/__main__.py
+ vaultex/app.py
+ vaultex/core.py
+ vaultex.egg-info/PKG-INFO
+ vaultex.egg-info/SOURCES.txt
+ vaultex.egg-info/dependency_links.txt
+ vaultex.egg-info/entry_points.txt
+ vaultex.egg-info/requires.txt
+ vaultex.egg-info/top_level.txt
vaultex-0.1.0/vaultex.egg-info/entry_points.txt ADDED
@@ -0,0 +1,2 @@
+ [console_scripts]
+ vaultex = vaultex.app:launch
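The `console_scripts` entry uses the `name = module:attr` form: the installer generates a `vaultex` command that imports `vaultex.app` and calls `launch`. The sketch below shows roughly how such a reference can be resolved by hand. `resolve_entry_point` is a hypothetical helper, not what pip or setuptools actually runs, and the demonstration targets a standard-library function because `vaultex` may not be installed.

```python
import importlib


def resolve_entry_point(spec: str):
    """Resolve a 'name = package.module:attr' line to a (name, object) pair."""
    name, _, target = (part.strip() for part in spec.partition("="))
    module_name, _, attr_path = target.partition(":")
    obj = importlib.import_module(module_name)
    if attr_path:
        # Dotted attribute paths are allowed after the colon.
        for attr in attr_path.split("."):
            obj = getattr(obj, attr)
    return name, obj


# Stand-in target from the standard library; the package's own line would be
# "vaultex = vaultex.app:launch".
name, func = resolve_entry_point("basename = os.path:basename")
print(name, func("a/b.txt"))
```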
vaultex-0.1.0/vaultex.egg-info/requires.txt ADDED
@@ -0,0 +1 @@
+ gradio>=4.0
vaultex-0.1.0/vaultex.egg-info/top_level.txt ADDED
@@ -0,0 +1 @@
+ vaultex