large-file-upload-sdk 1.0.0__tar.gz

@@ -0,0 +1,67 @@
1
+ # Python
2
+ __pycache__/
3
+ *.py[cod]
4
+ *$py.class
5
+ *.so
6
+ .Python
7
+ build/
8
+ develop-eggs/
9
+ dist/
10
+ downloads/
11
+ eggs/
12
+ .eggs/
13
+ lib/
14
+ lib64/
15
+ parts/
16
+ sdist/
17
+ var/
18
+ wheels/
19
+ *.egg-info/
20
+ .installed.cfg
21
+ *.egg
22
+
23
+ # Virtual environments
24
+ venv/
25
+ env/
26
+ ENV/
27
+
28
+ # IDE
29
+ .vscode/
30
+ .idea/
31
+ *.swp
32
+ *.swo
33
+
34
+ # OS
35
+ .DS_Store
36
+ Thumbs.db
37
+
38
+ # Git
39
+ .git/
40
+ .gitignore
41
+
42
+ # Docker
43
+ Dockerfile*
44
+ docker-compose*.yml
45
+
46
+ # Logs
47
+ *.log
48
+ logs/
49
+
50
+ # Uploads (don't include in image)
51
+ uploads/
52
+ temp/
53
+ **/uploads/
54
+ **/temp/
55
+
56
+ # Environment files
57
+ .env
58
+ .env.local
59
+
60
+ # Test files
61
+ tests/
62
+ test_*.py
63
+ *_test.py
64
+
65
+ # Documentation
66
+ *.md
67
+ docs/
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2024 SAIS Development Team
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,16 @@
1
+ include README.md
2
+ include LICENSE
3
+ include SDK_INSTALL.md
4
+ include file_upload_sdk.py
5
+ include sdk_usage_example.py
6
+ exclude test_*.py
7
+ exclude API_TEST.md
8
+ exclude Dockerfile*
9
+ exclude .env*
10
+ exclude *.sh
11
+ exclude requirements-docker.txt
12
+ exclude Jenkinsfile
13
+ recursive-exclude uploads *
14
+ recursive-exclude temp *
15
+ recursive-exclude __pycache__ *
16
+ recursive-exclude *.egg-info *
@@ -0,0 +1,344 @@
1
+ Metadata-Version: 2.4
2
+ Name: large-file-upload-sdk
3
+ Version: 1.0.0
4
+ Summary: Large File Upload SDK - Support chunked upload and resumable transfer
5
+ Home-page: https://github.com/your-username/file-upload-server
6
+ Author: SAIS Development Team
7
+ Author-email: SAIS Development Team <dev@sais.com.cn>
8
+ License: MIT
9
+ Project-URL: Homepage, https://github.com/your-username/file-upload-server
10
+ Project-URL: Documentation, https://github.com/your-username/file-upload-server#readme
11
+ Project-URL: Repository, https://github.com/your-username/file-upload-server
12
+ Project-URL: Bug Reports, https://github.com/your-username/file-upload-server/issues
13
+ Keywords: file upload,large file,chunked upload,resumable upload,file transfer
14
+ Classifier: Development Status :: 5 - Production/Stable
15
+ Classifier: Intended Audience :: Developers
16
+ Classifier: Topic :: Internet :: WWW/HTTP :: HTTP Servers
17
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
18
+ Classifier: Topic :: System :: Archiving
19
+ Classifier: License :: OSI Approved :: MIT License
20
+ Classifier: Operating System :: OS Independent
21
+ Classifier: Programming Language :: Python :: 3
22
+ Classifier: Programming Language :: Python :: 3.7
23
+ Classifier: Programming Language :: Python :: 3.8
24
+ Classifier: Programming Language :: Python :: 3.9
25
+ Classifier: Programming Language :: Python :: 3.10
26
+ Classifier: Programming Language :: Python :: 3.11
27
+ Classifier: Programming Language :: Python :: 3.12
28
+ Requires-Python: >=3.7
29
+ Description-Content-Type: text/markdown
30
+ License-File: LICENSE
31
+ Requires-Dist: requests>=2.25.0
32
+ Requires-Dist: urllib3>=1.26.0
33
+ Provides-Extra: dev
34
+ Requires-Dist: pytest>=6.0; extra == "dev"
35
+ Requires-Dist: pytest-cov>=2.0; extra == "dev"
36
+ Requires-Dist: black>=21.0; extra == "dev"
37
+ Requires-Dist: flake8>=3.8; extra == "dev"
38
+ Dynamic: author
39
+ Dynamic: home-page
40
+ Dynamic: license-file
41
+ Dynamic: requires-python
42
+
43
+ # File Upload Server
44
+
45
+ A server for uploading large files for 星河启智 Phase II. Built on FastAPI, it supports chunked upload and resumable transfer.
46
+
47
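+ ## Features
48
+
49
+ - ✅ **Chunked upload**: Splits large files into chunks to improve the upload success rate
50
+ - ✅ **Resumable transfer**: Continues after a network interruption without starting over
51
+ - ✅ **MD5 verification**: Optional file-level and chunk-level MD5 checks to ensure data integrity
52
+ - ✅ **Concurrency control**: Supports uploading multiple files concurrently
53
+ - ✅ **Automatic cleanup**: Automatically removes expired temporary files
54
+ - ✅ **RESTful API**: Provides a complete REST API
55
+ - ✅ **Online documentation**: Automatically generated API documentation
56
+ - ✅ **Docker support**: Supports containerized deployment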
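
To make the chunked upload and MD5 verification features concrete, here is a minimal sketch of splitting a stream into fixed-size chunks and hashing each one. The names `iter_chunks` and `describe_chunks` are hypothetical and are not taken from `file_upload_sdk.py`; the 10 MB default simply mirrors the documented `CHUNK_SIZE` setting.

```python
import hashlib
import io
from typing import BinaryIO, Iterator, List, Tuple

CHUNK_SIZE = 10 * 1024 * 1024  # assumption: mirrors the documented 10MB default


def iter_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, bytes, str]]:
    """Yield (chunk_index, chunk_bytes, chunk_md5_hex) for each chunk of the stream."""
    index = 0
    while True:
        data = stream.read(chunk_size)
        if not data:
            break  # end of stream
        yield index, data, hashlib.md5(data).hexdigest()
        index += 1


def describe_chunks(payload: bytes, chunk_size: int) -> List[Tuple[int, int, str]]:
    """Return (index, length, md5) for every chunk of an in-memory payload."""
    return [(i, len(d), m) for i, d, m in iter_chunks(io.BytesIO(payload), chunk_size)]
```

Only the last chunk can be shorter than `chunk_size`, so a receiver can validate every chunk independently before the final merge.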
57
+
58
+ ## Tech Stack
59
+
60
+ - **Backend framework**: FastAPI 0.104+
61
+ - **Async processing**: asyncio + aiofiles
62
+ - **Data validation**: Pydantic
63
+ - **ASGI server**: Uvicorn
64
+ - **Containerization**: Docker
65
+
66
+ ## Quick Start
67
+
68
+ ### Requirements
69
+
70
+ - Python 3.11+
71
+ - or Docker
72
+
73
+ ### Local Development
74
+
75
+ 1. **Clone the project**
76
+ ```bash
77
+ git clone <repository-url>
78
+ cd file-upload-server
79
+ ```
80
+
81
+ 2. **Install dependencies**
82
+ ```bash
83
+ pip install -r requirements.txt
84
+ ```
85
+
86
+ 3. **Configure environment variables**
87
+ ```bash
88
+ cp env.example .env
89
+ # Edit the .env file and set UPLOAD_BASE_DIR for your environment
90
+ ```
91
+
92
+ 4. **Start the service**
93
+ ```bash
94
+ python main.py
95
+ ```
96
+
97
+ The service starts at `http://localhost:8000`
98
+
99
+ ### Docker Deployment
100
+
101
+ 1. **Build the image**
102
+ ```bash
103
+ docker build -t file-upload-server .
104
+ ```
105
+
106
+ 2. **Run the container**
107
+ ```bash
108
+ # Staging environment (uses the default configuration)
109
+ docker run -d \
110
+ --name file-upload-server \
111
+ -p 8000:8000 \
112
+ -v /cpfs-nfs/sais/data-plaza-svc:/cpfs-nfs/sais/data-plaza-svc \
113
+ file-upload-server
114
+
115
+ # Production environment (override UPLOAD_BASE_DIR and TEMP_DIR)
116
+ docker run -d \
117
+ --name file-upload-server \
118
+ -p 8000:8000 \
119
+ -v /normal-cpfs-datasets:/normal-cpfs-datasets \
120
+ -v /normal-cpfs-datasets-temp:/normal-cpfs-datasets-temp \
121
+ -e UPLOAD_BASE_DIR=/normal-cpfs-datasets \
122
+ -e TEMP_DIR=/normal-cpfs-datasets-temp \
123
+ file-upload-server
124
+ ```
125
+
126
+ ## API Endpoints
127
+
128
+ Once the service is running, visit `http://localhost:8000/docs` for the complete API documentation
129
+
130
+ ### Main Endpoints
131
+
132
+ | Endpoint | Method | Description |
133
+ |------|------|------|
134
+ | `/` | GET | Service information |
135
+ | `/health` | GET | Health check |
136
+ | `/api/upload/check` | POST | Check file status; supports resumable transfer |
137
+ | `/api/upload/chunk` | POST | Upload a file chunk |
138
+ | `/api/upload/merge` | POST | Merge chunks to complete the upload |
139
+ | `/api/upload/status/{file_id}` | GET | Get upload status |
140
+ | `/api/upload/{file_id}` | DELETE | Cancel an upload |
141
+ | `/api/download/{filename}` | GET | Download a file |
142
+
143
+ ## Usage Examples
144
+
145
+ ### Using the Python SDK (recommended)
146
+
147
+ #### Install the SDK
148
+ ```bash
149
+ pip install large-file-upload-sdk
150
+ ```
151
+
152
+ #### Basic usage (simplest)
153
+ ```python
154
+ import file_upload_sdk as api
155
+
156
+ # Initialize the SDK
157
+ api.init_sdk("http://localhost:8000")
158
+
159
+ # Upload a file (as simple as calling an ordinary API)
160
+ result = api.upload_file(
161
+     file_path="/path/to/local/your_file.suffix",
162
+     file_name="your_file.suffix"
163
+ )
164
+
165
+ if result['success']:
166
+     print(f"Upload succeeded! File URL: {result['file_url']}")
167
+ else:
168
+     print(f"Upload failed: {result['error']}")
169
+ ```
170
+
171
+ #### Usage with progress display
172
+ ```python
173
+ import file_upload_sdk as api
174
+
175
+ api.init_sdk("http://localhost:8000")
176
+
177
+ def show_progress(progress):
178
+     print(f"\rUpload progress: {progress:.1f}%", end="", flush=True)
179
+
180
+ result = api.upload_file(
181
+     file_path="/path/to/large_file.zip",
182
+     file_name="uploaded_file.zip",
183
+     progress_callback=show_progress
184
+ )
185
+ ```
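
As a variation on the callback above, here is a minimal sketch that draws a fixed-width text bar. It assumes the callback receives a percentage between 0 and 100, which the `{progress:.1f}%` formatting suggests but the source does not state outright; `render_progress` is a hypothetical helper.

```python
def render_progress(progress: float, width: int = 20) -> str:
    """Render a percentage (0-100) as a fixed-width text bar."""
    clamped = max(0.0, min(100.0, progress))  # guard against out-of-range values
    filled = int(width * clamped / 100)
    return "[" + "#" * filled + "-" * (width - filled) + f"] {clamped:5.1f}%"


def show_progress(progress: float) -> None:
    # "\r" returns to the start of the line so the bar redraws in place.
    print("\r" + render_progress(progress), end="", flush=True)
```

Keeping the formatting in a pure function makes it easy to test separately from the printing side effect.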
186
+
187
+ #### Object-oriented usage
188
+ ```python
189
+ from file_upload_sdk import FileUploadSDK
190
+
191
+ sdk = FileUploadSDK(
192
+     base_url="http://localhost:8000",
193
+     chunk_size=5 * 1024 * 1024,  # 5MB chunks
194
+     retry_times=5
195
+ )
196
+
197
+ result = sdk.upload_file(
198
+     file_path="/path/to/your_file.suffix",
199
+     file_name="your_file.suffix",
200
+     enable_md5_check=True
201
+ )
202
+ ```
203
+
204
+ #### Resumable upload example
205
+ ```python
206
+ import file_upload_sdk as api
207
+
208
+ api.init_sdk("http://localhost:8000")
209
+
210
+ # Use a custom file ID to enable resumable upload
211
+ file_id = "my_unique_file_id"
212
+
213
+ result = api.upload_file(
214
+     file_path="/path/to/large_file.zip",
215
+     file_name="large_file.zip",
216
+     custom_file_id=file_id
217
+ )
218
+
219
+ # If the upload is interrupted, run the same code again to continue
220
+ ```
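
Resuming depends on two things: an ID that stays the same across retries, and knowing which chunks the server already holds. The sketch below illustrates both under stated assumptions. `stable_file_id` and `missing_chunks` are hypothetical helpers, and the idea that the server reports uploaded chunk indices is an assumption about the `/api/upload/check` response, not something the source documents.

```python
import hashlib
from typing import Iterable, List


def stable_file_id(file_name: str, file_size: int) -> str:
    """Derive a repeatable ID so a retried upload maps to the same server-side state."""
    key = f"{file_name}:{file_size}".encode("utf-8")
    return hashlib.sha256(key).hexdigest()[:32]


def missing_chunks(total_chunks: int, uploaded: Iterable[int]) -> List[int]:
    """Return the chunk indices that still need to be sent, in ascending order."""
    done = set(uploaded)
    return [i for i in range(total_chunks) if i not in done]
```

Note that an ID built only from name and size collides for two different files that share both; mixing in a content hash would avoid that at the cost of reading the file first.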
221
+
222
+ ### cURL Examples
223
+
224
+ #### 1. Check file status
225
+
226
+ ```bash
227
+ curl -X POST "http://localhost:8000/api/upload/check" \
228
+ -H "Content-Type: application/json" \
229
+ -d '{
230
+ "file_id": "unique-file-id",
231
+ "file_name": "large-file.zip",
232
+ "file_size": 1073741824,
233
+ "total_chunks": 100,
234
+ "file_md5": "d41d8cd98f00b204e9800998ecf8427e"
235
+ }'
236
+ ```
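
The same check request can be assembled in Python with only the standard library. This is a sketch, not the SDK's implementation: `build_check_request` is a hypothetical helper, the field names are copied from the cURL example above, and treating `file_md5` as optional is an assumption based on MD5 verification being described as optional.

```python
import json
import urllib.request
from typing import Optional


def build_check_request(base_url: str, file_id: str, file_name: str,
                        file_size: int, chunk_size: int,
                        file_md5: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/upload/check."""
    payload = {
        "file_id": file_id,
        "file_name": file_name,
        "file_size": file_size,
        # Ceiling division: the final chunk may be shorter than chunk_size.
        "total_chunks": (file_size + chunk_size - 1) // chunk_size,
    }
    if file_md5 is not None:
        payload["file_md5"] = file_md5
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/upload/check",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; deriving `total_chunks` from `file_size` and `chunk_size` keeps the three values consistent.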
237
+
238
+ #### 2. Upload a chunk
239
+
240
+ ```bash
241
+ curl -X POST "http://localhost:8000/api/upload/chunk" \
242
+ -F "file_id=unique-file-id" \
243
+ -F "file_name=large-file.zip" \
244
+ -F "file_size=1073741824" \
245
+ -F "chunk_index=0" \
246
+ -F "total_chunks=100" \
247
+ -F "chunk_size=10485760" \
248
+ -F "chunk_file=@chunk_0.bin"
249
+ ```
250
+
251
+ #### 3. Merge the file
252
+
253
+ ```bash
254
+ curl -X POST "http://localhost:8000/api/upload/merge" \
255
+ -H "Content-Type: application/json" \
256
+ -d '{
257
+ "file_id": "unique-file-id",
258
+ "file_name": "large-file.zip",
259
+ "total_chunks": 100,
260
+ "file_md5": "d41d8cd98f00b204e9800998ecf8427e"
261
+ }'
262
+ ```
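
Conceptually, a merge step concatenates the chunks in index order and, when an MD5 is supplied, checks the result against it. The sketch below shows that idea only. The `<index>.part` naming scheme and the `merge_chunks` function are hypothetical and do not describe how the server actually stores or merges chunks.

```python
import hashlib
from pathlib import Path
from typing import Optional


def merge_chunks(chunk_dir: Path, total_chunks: int, target: Path,
                 expected_md5: Optional[str] = None) -> str:
    """Concatenate chunk files in index order; return the MD5 of the merged file."""
    digest = hashlib.md5()
    with target.open("wb") as out:
        for index in range(total_chunks):
            part = chunk_dir / f"{index}.part"  # hypothetical naming scheme
            data = part.read_bytes()  # raises FileNotFoundError if a chunk is missing
            out.write(data)
            digest.update(data)
    merged_md5 = digest.hexdigest()
    if expected_md5 is not None and merged_md5 != expected_md5:
        target.unlink()  # discard the corrupt result
        raise ValueError(f"MD5 mismatch: expected {expected_md5}, got {merged_md5}")
    return merged_md5
```

Updating the digest while writing avoids a second pass over the merged file, which matters for files in the hundreds of gigabytes.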
263
+
264
+ ## Configuration
265
+
266
+ Main settings (set in `config.py` or through environment variables):
267
+
268
+ | Setting | Default | Description |
269
+ |--------|--------|------|
270
+ | `HOST` | 0.0.0.0 | Server address |
271
+ | `PORT` | 8000 | Server port |
272
+ | `UPLOAD_BASE_DIR` | /cpfs-nfs/sais/data-plaza-svc/datasets | Base path for datasets |
273
+ | `TEMP_DIR` | /data/uploads/temp | Temporary file directory |
274
+ | `MAX_FILE_SIZE` | 500GB | Maximum size of a single file |
275
+ | `CHUNK_SIZE` | 10MB | Chunk size |
276
+ | `CHUNK_EXPIRE_HOURS` | 24 | Chunk expiry time (hours) |
277
+ | `ENABLE_MD5_CHECK` | true | Whether MD5 verification is enabled |
278
+
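
One way to read these settings from the environment is sketched below. It is not the project's `config.py`: `parse_size` and `load_settings` are hypothetical, and accepting size strings such as `10MB` (in binary units, so 10MB equals 10485760 bytes as in the cURL example) is an assumption about how the values are expressed.

```python
import os
import re
from typing import Mapping

_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_size(text: str) -> int:
    """Convert a string such as '10MB' or '500GB' to a byte count (binary units)."""
    match = re.fullmatch(r"\s*(\d+)\s*([KMGT]?B)\s*", text.upper())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read settings from the environment, falling back to the documented defaults."""
    return {
        "HOST": env.get("HOST", "0.0.0.0"),
        "PORT": int(env.get("PORT", "8000")),
        "UPLOAD_BASE_DIR": env.get("UPLOAD_BASE_DIR", "/cpfs-nfs/sais/data-plaza-svc/datasets"),
        "TEMP_DIR": env.get("TEMP_DIR", "/data/uploads/temp"),
        "MAX_FILE_SIZE": parse_size(env.get("MAX_FILE_SIZE", "500GB")),
        "CHUNK_SIZE": parse_size(env.get("CHUNK_SIZE", "10MB")),
        "CHUNK_EXPIRE_HOURS": int(env.get("CHUNK_EXPIRE_HOURS", "24")),
        "ENABLE_MD5_CHECK": env.get("ENABLE_MD5_CHECK", "true").lower() in ("1", "true", "yes"),
    }
```

Accepting any mapping rather than reading `os.environ` directly lets the defaults and overrides be checked without touching the real environment.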
279
+ ### Environment Configuration
280
+
281
+ Depending on the deployment environment, only the `UPLOAD_BASE_DIR` environment variable needs to change:
282
+
283
+ - **Staging environment**: `UPLOAD_BASE_DIR=/cpfs-nfs/sais/data-plaza-svc/datasets`
284
+ - **Production environment**: `UPLOAD_BASE_DIR=/normal-cpfs-datasets`
285
+
286
+ Files are ultimately saved to: `{UPLOAD_BASE_DIR}/{datasetName}/full/download/`
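
The destination layout can be expressed as a small path-building helper. This sketch assumes the file name is appended directly under `download/` and that both name components should be validated. Neither detail is stated in the source, and `final_path` is a hypothetical function.

```python
from pathlib import PurePosixPath


def final_path(upload_base_dir: str, dataset_name: str, file_name: str) -> PurePosixPath:
    """Build {UPLOAD_BASE_DIR}/{datasetName}/full/download/{file_name}."""
    for part in (dataset_name, file_name):
        # Reject separators and dot segments so a name cannot escape the base directory.
        if not part or "/" in part or "\\" in part or part in (".", ".."):
            raise ValueError(f"unsafe path component: {part!r}")
    return PurePosixPath(upload_base_dir) / dataset_name / "full" / "download" / file_name
```

For example, `final_path("/normal-cpfs-datasets", "demo", "a.zip")` produces a path under the production base directory, while a dataset name of `..` is rejected.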
287
+
288
+ ## Project Structure
289
+
290
+ ```
291
+ file-upload-server/
292
+ ├── main.py # Main application entry point
293
+ ├── config.py # Configuration management
294
+ ├── models.py # Data models
295
+ ├── services.py # Business logic
296
+ ├── routers.py # Route definitions
297
+ ├── file_upload_sdk.py # Python SDK
298
+ ├── sdk_usage_example.py # SDK usage examples
299
+ ├── test_client.py # Test client
300
+ ├── setup.py # SDK packaging configuration
301
+ ├── requirements.txt # Dependencies
302
+ ├── Dockerfile # Docker image definition
303
+ ├── .dockerignore # Docker ignore file
304
+ ├── env.example # Example environment variables
305
+ └── README.md # Project documentation
306
+ ```
307
+
308
+ ## Development Guide
309
+
310
+ ### Running Tests
311
+
312
+ ```bash
313
+ pytest
314
+ ```
315
+
316
+ ### Linting and Formatting
317
+
318
+ ```bash
319
+ flake8 .
320
+ black .
321
+ ```
322
+
323
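+ ## Deployment Recommendations
324
+
325
+ ### Production
326
+
327
+ 1. **Use a reverse proxy**: Nginx is recommended as the reverse proxy
328
+ 2. **File storage**: Use network storage (such as NFS) or object storage
329
+ 3. **Monitoring**: Set up logging and a monitoring system
330
+ 4. **Security**: Configure HTTPS and access control
331
+
332
+ ### Performance Tuning
333
+
334
+ 1. **Increase the number of workers**: `uvicorn main:app --workers 4`
335
+ 2. **Tune the chunk size**: Adjust `CHUNK_SIZE` to suit network conditions
336
+ 3. **Use SSDs**: Put the temporary file directory on SSD storage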
337
+
338
+ ## License
339
+
340
+ This project is for internal use
341
+
342
+ ## Support
343
+
344
+ If you have questions, contact the development team or open an Issue