synapse-sdk 2025.10.1__py3-none-any.whl → 2025.10.4__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- synapse_sdk/devtools/docs/docs/plugins/categories/pre-annotation-plugins/pre-annotation-plugin-overview.md +198 -0
- synapse_sdk/devtools/docs/docs/plugins/categories/pre-annotation-plugins/to-task-action-development.md +1645 -0
- synapse_sdk/devtools/docs/docs/plugins/categories/pre-annotation-plugins/to-task-overview.md +717 -0
- synapse_sdk/devtools/docs/docs/plugins/categories/pre-annotation-plugins/to-task-template-development.md +1380 -0
- synapse_sdk/devtools/docs/docs/plugins/categories/upload-plugins/upload-plugin-action.md +934 -0
- synapse_sdk/devtools/docs/docs/plugins/categories/upload-plugins/upload-plugin-overview.md +560 -0
- synapse_sdk/devtools/docs/docs/plugins/categories/upload-plugins/upload-plugin-template.md +715 -0
- synapse_sdk/devtools/docs/docs/plugins/plugins.md +12 -5
- synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/categories/pre-annotation-plugins/pre-annotation-plugin-overview.md +198 -0
- synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/categories/pre-annotation-plugins/to-task-action-development.md +1645 -0
- synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/categories/pre-annotation-plugins/to-task-overview.md +717 -0
- synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/categories/pre-annotation-plugins/to-task-template-development.md +1380 -0
- synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/categories/upload-plugins/upload-plugin-action.md +934 -0
- synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/categories/upload-plugins/upload-plugin-overview.md +560 -0
- synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/categories/upload-plugins/upload-plugin-template.md +715 -0
- synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current.json +16 -4
- synapse_sdk/devtools/docs/sidebars.ts +27 -1
- synapse_sdk/plugins/README.md +487 -80
- synapse_sdk/plugins/categories/export/actions/export/action.py +8 -3
- synapse_sdk/plugins/categories/export/actions/export/utils.py +108 -8
- synapse_sdk/plugins/categories/pre_annotation/actions/__init__.py +4 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/pre_annotation/__init__.py +3 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/pre_annotation/action.py +10 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/__init__.py +28 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/action.py +145 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/enums.py +269 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/exceptions.py +14 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/factory.py +76 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/models.py +97 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/orchestrator.py +250 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/run.py +64 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/strategies/__init__.py +17 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/strategies/annotation.py +284 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/strategies/base.py +170 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/strategies/extraction.py +83 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/strategies/metrics.py +87 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/strategies/preprocessor.py +127 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/strategies/validation.py +143 -0
- synapse_sdk/plugins/categories/upload/actions/upload/__init__.py +2 -1
- synapse_sdk/plugins/categories/upload/actions/upload/models.py +134 -94
- synapse_sdk/plugins/categories/upload/actions/upload/steps/cleanup.py +2 -2
- synapse_sdk/plugins/categories/upload/actions/upload/steps/metadata.py +106 -14
- synapse_sdk/plugins/categories/upload/actions/upload/steps/organize.py +113 -36
- synapse_sdk/plugins/categories/upload/templates/README.md +365 -0
- {synapse_sdk-2025.10.1.dist-info → synapse_sdk-2025.10.4.dist-info}/METADATA +1 -1
- {synapse_sdk-2025.10.1.dist-info → synapse_sdk-2025.10.4.dist-info}/RECORD +50 -22
- synapse_sdk/devtools/docs/docs/plugins/developing-upload-template.md +0 -1463
- synapse_sdk/devtools/docs/docs/plugins/upload-plugins.md +0 -1964
- synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/developing-upload-template.md +0 -1463
- synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/upload-plugins.md +0 -2077
- {synapse_sdk-2025.10.1.dist-info → synapse_sdk-2025.10.4.dist-info}/WHEEL +0 -0
- {synapse_sdk-2025.10.1.dist-info → synapse_sdk-2025.10.4.dist-info}/entry_points.txt +0 -0
- {synapse_sdk-2025.10.1.dist-info → synapse_sdk-2025.10.4.dist-info}/licenses/LICENSE +0 -0
- {synapse_sdk-2025.10.1.dist-info → synapse_sdk-2025.10.4.dist-info}/top_level.txt +0 -0
@@ -1,1463 +0,0 @@
# Developing Upload Templates with BaseUploader

This guide provides comprehensive documentation for plugin developers who want to create custom upload plugins using the BaseUploader template class. The BaseUploader follows the template method pattern to provide a structured, extensible foundation for file processing workflows.

## Quick Start

### Basic Plugin Structure

Create your upload plugin by inheriting from BaseUploader:

```python
from pathlib import Path
from typing import List, Dict, Any
from . import BaseUploader

class MyUploader(BaseUploader):
    def __init__(self, run, path: Path, file_specification: List = None,
                 organized_files: List = None, extra_params: Dict = None):
        super().__init__(run, path, file_specification, organized_files, extra_params)

    def process_files(self, organized_files: List) -> List:
        """Implement your custom file processing logic here."""
        # Your processing logic goes here
        return organized_files

    def handle_upload_files(self) -> List[Dict[str, Any]]:
        """Main entry point called by the upload action."""
        return super().handle_upload_files()
```

### Minimal Working Example

```python
from datetime import datetime

class SimpleUploader(BaseUploader):
    def process_files(self, organized_files: List) -> List:
        """Add metadata to each file group."""
        for file_group in organized_files:
            file_group['processed_by'] = 'SimpleUploader'
            file_group['processing_timestamp'] = datetime.now().isoformat()
        return organized_files
```

## Architecture Deep Dive

### Workflow Pipeline

The BaseUploader implements a comprehensive 6-step workflow pipeline:

```
1. setup_directories()   # Initialize directory structure
2. organize_files()      # Group and structure files
3. before_process()      # Pre-processing hooks
4. process_files()       # Main processing logic (REQUIRED)
5. after_process()       # Post-processing hooks
6. validate_files()      # Final validation and filtering
```

### Template Method Pattern

BaseUploader uses the template method pattern, where:

- **Concrete methods** provide default behavior that works for most cases
- **Hook methods** allow customization at specific points
- **Abstract methods** must be implemented by subclasses
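The guide's snippets never show the orchestration itself, so the sketch below illustrates how a template method such as `handle_upload_files()` could wire these hooks together. It is an illustrative reconstruction rather than the actual BaseUploader source; the method names follow the pipeline above, but the real base class may differ in details such as error handling and validation defaults.

```python
# Illustrative sketch only -- not the actual BaseUploader implementation.
# It assumes the hook names listed in the pipeline above.
class SketchedBaseUploader:
    def handle_upload_files(self):
        """Template method: fixed pipeline, customizable steps."""
        self.setup_directories()                        # hook (no-op by default)
        organized = self.organize_files(self.organized_files)
        organized = self.before_process(organized)      # hook
        processed = self.process_files(organized)       # abstract: subclasses implement
        processed = self.after_process(processed)       # hook
        return self.validate_files(processed)           # default validation and filtering

    # Default (concrete) behavior; subclasses override selectively.
    def setup_directories(self): pass
    def organize_files(self, files): return files
    def before_process(self, files): return files
    def after_process(self, files): return files
    def validate_files(self, files): return files

    def process_files(self, organized_files):
        raise NotImplementedError('Subclasses must implement process_files()')
```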
## Core Methods Reference

### Required Methods

#### `process_files(organized_files: List) -> List`

**Purpose**: Main processing method that transforms files according to your plugin's logic.

**When to use**: Always - this is the core method every plugin must implement.

**Parameters**:
- `organized_files`: List of file group dictionaries containing organized file data

**Returns**: List of processed file groups ready for upload

**Example**:
```python
def process_files(self, organized_files: List) -> List:
    """Convert TIFF images to JPEG format."""
    processed_files = []

    for file_group in organized_files:
        files_dict = file_group.get('files', {})
        converted_files = {}

        for spec_name, file_path in files_dict.items():
            if file_path.suffix.lower() in ['.tif', '.tiff']:
                # Convert TIFF to JPEG
                jpeg_path = self.convert_tiff_to_jpeg(file_path)
                converted_files[spec_name] = jpeg_path
                self.run.log_message(f"Converted {file_path} to {jpeg_path}")
            else:
                converted_files[spec_name] = file_path

        file_group['files'] = converted_files
        processed_files.append(file_group)

    return processed_files
```
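For reference, the file-group dictionaries that flow through `process_files` follow the shape used by the other examples in this guide: a `files` mapping from specification name to `Path`, plus optional extra fields. The values below are purely illustrative.

```python
from pathlib import Path

# Illustrative shape of one entry in `organized_files` (paths and ids are made up).
example_file_group = {
    'files': {
        'image_data': Path('/data/upload/images/scan_001.tif'),
        'text_data': Path('/data/upload/labels/scan_001.txt'),
    },
    'metadata': {'group_id': 1},  # optional fields added by earlier steps or your hooks
}
```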
### Optional Hook Methods

#### `setup_directories() -> None`

**Purpose**: Create custom directory structures before processing begins.

**When to use**: When your plugin needs specific directories for processing, temporary files, or output.

**Example**:
```python
def setup_directories(self):
    """Create processing directories."""
    (self.path / 'temp').mkdir(exist_ok=True)
    (self.path / 'processed').mkdir(exist_ok=True)
    (self.path / 'thumbnails').mkdir(exist_ok=True)
    self.run.log_message("Created processing directories")
```

#### `organize_files(files: List) -> List`

**Purpose**: Reorganize and structure files before main processing.

**When to use**: When you need to group files differently, filter by criteria, or restructure the data.

**Example**:
```python
def organize_files(self, files: List) -> List:
    """Group files by type and size."""
    large_files = []
    small_files = []

    for file_group in files:
        total_size = sum(f.stat().st_size for f in file_group.get('files', {}).values())
        if total_size > 100 * 1024 * 1024:  # 100MB
            large_files.append(file_group)
        else:
            small_files.append(file_group)

    # Process large files first
    return large_files + small_files
```

#### `before_process(organized_files: List) -> List`

**Purpose**: Pre-processing hook for setup tasks before main processing.

**When to use**: For validation, preparation, or initialization tasks.

**Example**:
```python
def before_process(self, organized_files: List) -> List:
    """Validate and prepare files for processing."""
    self.run.log_message(f"Starting processing of {len(organized_files)} file groups")

    # Check available disk space
    if not self.check_disk_space(organized_files):
        raise Exception("Insufficient disk space for processing")

    # Initialize processing resources
    self.processing_queue = Queue()
    return organized_files
```

#### `after_process(processed_files: List) -> List`

**Purpose**: Post-processing hook for cleanup and finalization.

**When to use**: For cleanup, final transformations, or resource deallocation.

**Example**:
```python
def after_process(self, processed_files: List) -> List:
    """Clean up temporary files and generate summary."""
    # Remove temporary files
    temp_dir = self.path / 'temp'
    if temp_dir.exists():
        shutil.rmtree(temp_dir)

    # Generate processing summary
    summary = {
        'total_processed': len(processed_files),
        'processing_time': time.time() - self.start_time,
        'plugin_version': '1.0.0'
    }

    self.run.log_message(f"Processing complete: {summary}")
    return processed_files
```

#### `validate_files(files: List) -> List`

**Purpose**: Custom validation logic beyond type checking.

**When to use**: When you need additional validation rules beyond the built-in file type validation.

**Example**:
```python
def validate_files(self, files: List) -> List:
    """Custom validation with size and format checks."""
    # First apply built-in type validation
    validated_files = self.validate_file_types(files)

    # Then apply custom validation
    final_files = []
    for file_group in validated_files:
        if self.validate_file_group(file_group):
            final_files.append(file_group)
        else:
            self.run.log_message(f"File group failed custom validation: {file_group}")

    return final_files

def validate_file_group(self, file_group: Dict) -> bool:
    """Custom validation for individual file groups."""
    files_dict = file_group.get('files', {})

    for spec_name, file_path in files_dict.items():
        # Check file size limits
        if file_path.stat().st_size > 500 * 1024 * 1024:  # 500MB limit
            return False

        # Check file accessibility
        if not os.access(file_path, os.R_OK):
            return False

    return True
```

## Advanced Features

### File Type Validation System

The BaseUploader includes a sophisticated validation system that you can customize:

#### Default File Extensions

```python
def get_file_extensions_config(self) -> Dict[str, List[str]]:
    """Override to customize allowed file extensions."""
    return {
        'pcd': ['.pcd'],
        'text': ['.txt', '.html'],
        'audio': ['.wav', '.mp3'],
        'data': ['.bin', '.json', '.fbx'],
        'image': ['.jpg', '.jpeg', '.png'],
        'video': ['.mp4'],
    }
```

#### Custom Extension Configuration

```python
class CustomUploader(BaseUploader):
    def get_file_extensions_config(self) -> Dict[str, List[str]]:
        """Add support for additional formats."""
        config = super().get_file_extensions_config()
        config.update({
            'cad': ['.dwg', '.dxf', '.step'],
            'archive': ['.zip', '.rar', '.7z'],
            'document': ['.pdf', '.docx', '.xlsx']
        })
        return config
```

#### Conversion Warnings

```python
def get_conversion_warnings_config(self) -> Dict[str, str]:
    """Override to customize conversion warnings."""
    return {
        '.tif': ' .jpg, .png',
        '.tiff': ' .jpg, .png',
        '.avi': ' .mp4',
        '.mov': ' .mp4',
        '.raw': ' .jpg, .png',
        '.bmp': ' .jpg, .png',
    }
```
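The mapping above pairs a source extension with the formats users are encouraged to convert to. How BaseUploader itself surfaces these warnings is not shown in this guide, so the snippet below is only a hedged sketch of one way a plugin could apply the config on its own; the `warn_about_conversions` helper and the `WarningAwareUploader` class are hypothetical names, not part of the SDK.

```python
class WarningAwareUploader(BaseUploader):
    def warn_about_conversions(self, organized_files: List) -> None:
        """Hypothetical helper: log a suggestion for extensions flagged in the config."""
        warnings_config = self.get_conversion_warnings_config()
        for file_group in organized_files:
            for spec_name, file_path in file_group.get('files', {}).items():
                suggested = warnings_config.get(file_path.suffix.lower())
                if suggested:
                    # Config values already start with a space, e.g. ' .jpg, .png'
                    self.run.log_message(
                        f"{file_path.name}: consider converting to{suggested} before upload"
                    )

    def before_process(self, organized_files: List) -> List:
        self.warn_about_conversions(organized_files)
        return organized_files
```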
### Custom Filtering

Implement the `filter_files` method for fine-grained control:

```python
def filter_files(self, organized_file: Dict[str, Any]) -> bool:
    """Custom filtering logic."""
    # Filter by file size
    files_dict = organized_file.get('files', {})
    total_size = sum(f.stat().st_size for f in files_dict.values())

    if total_size < 1024:  # Skip files smaller than 1KB
        self.run.log_message(f"Skipping small file group: {total_size} bytes")
        return False

    # Filter by file age
    oldest_file = min(files_dict.values(), key=lambda f: f.stat().st_mtime)
    age_days = (time.time() - oldest_file.stat().st_mtime) / 86400

    if age_days > 365:  # Skip files older than 1 year
        self.run.log_message(f"Skipping old file group: {age_days} days old")
        return False

    return True
```

## Real-World Examples

### Example 1: Image Processing Plugin

```python
class ImageProcessingUploader(BaseUploader):
    """Converts TIFF images to JPEG and generates thumbnails."""

    def setup_directories(self):
        """Create directories for processed images and thumbnails."""
        (self.path / 'processed').mkdir(exist_ok=True)
        (self.path / 'thumbnails').mkdir(exist_ok=True)

    def organize_files(self, files: List) -> List:
        """Separate raw and processed images."""
        raw_images = []
        processed_images = []

        for file_group in files:
            has_raw = any(
                f.suffix.lower() in ['.tif', '.tiff', '.raw']
                for f in file_group.get('files', {}).values()
            )

            if has_raw:
                raw_images.append(file_group)
            else:
                processed_images.append(file_group)

        # Process raw images first
        return raw_images + processed_images

    def process_files(self, organized_files: List) -> List:
        """Convert images and generate thumbnails."""
        processed_files = []

        for file_group in organized_files:
            files_dict = file_group.get('files', {})
            converted_files = {}

            for spec_name, file_path in files_dict.items():
                if file_path.suffix.lower() in ['.tif', '.tiff']:
                    # Convert to JPEG
                    jpeg_path = self.convert_to_jpeg(file_path)
                    converted_files[spec_name] = jpeg_path

                    # Generate thumbnail
                    thumbnail_path = self.generate_thumbnail(jpeg_path)
                    converted_files[f"{spec_name}_thumbnail"] = thumbnail_path

                    self.run.log_message(f"Processed {file_path.name} -> {jpeg_path.name}")
                else:
                    converted_files[spec_name] = file_path

            file_group['files'] = converted_files
            processed_files.append(file_group)

        return processed_files

    def convert_to_jpeg(self, tiff_path: Path) -> Path:
        """Convert TIFF to JPEG using PIL."""
        from PIL import Image

        output_path = self.path / 'processed' / f"{tiff_path.stem}.jpg"

        with Image.open(tiff_path) as img:
            # Convert to RGB if necessary
            if img.mode in ('RGBA', 'LA', 'P'):
                img = img.convert('RGB')

            img.save(output_path, 'JPEG', quality=95)

        return output_path

    def generate_thumbnail(self, image_path: Path) -> Path:
        """Generate thumbnail for processed image."""
        from PIL import Image

        thumbnail_path = self.path / 'thumbnails' / f"{image_path.stem}_thumb.jpg"

        with Image.open(image_path) as img:
            img.thumbnail((200, 200), Image.Resampling.LANCZOS)
            img.save(thumbnail_path, 'JPEG', quality=85)

        return thumbnail_path
```
### Example 2: Data Validation Plugin

```python
import os
from datetime import datetime

class DataValidationUploader(BaseUploader):
    """Validates data files and generates quality reports."""

    def __init__(self, run, path: Path, file_specification: List = None,
                 organized_files: List = None, extra_params: Dict = None):
        super().__init__(run, path, file_specification, organized_files, extra_params)

        # Initialize validation config from extra_params
        extra_params = extra_params or {}  # guard against a missing parameter dict
        self.validation_config = extra_params.get('validation_config', {})
        self.strict_mode = extra_params.get('strict_validation', False)

    def before_process(self, organized_files: List) -> List:
        """Initialize validation engine."""
        self.validation_results = []
        self.run.log_message(f"Starting validation of {len(organized_files)} file groups")
        return organized_files

    def process_files(self, organized_files: List) -> List:
        """Validate files and generate quality reports."""
        processed_files = []

        for file_group in organized_files:
            validation_result = self.validate_file_group(file_group)

            # Add validation metadata
            file_group['validation'] = validation_result
            file_group['quality_score'] = validation_result['score']

            # Include file group based on validation results
            if self.should_include_file_group(validation_result):
                processed_files.append(file_group)
                self.run.log_message(f"File group passed validation: {validation_result['score']}")
            else:
                self.run.log_message(f"File group failed validation: {validation_result['errors']}")

        return processed_files

    def validate_file_group(self, file_group: Dict) -> Dict:
        """Comprehensive validation of file group."""
        files_dict = file_group.get('files', {})
        errors = []
        warnings = []
        score = 100

        for spec_name, file_path in files_dict.items():
            # File existence and accessibility
            if not file_path.exists():
                errors.append(f"File not found: {file_path}")
                score -= 50
                continue

            if not os.access(file_path, os.R_OK):
                errors.append(f"File not readable: {file_path}")
                score -= 30
                continue

            # File size validation
            file_size = file_path.stat().st_size
            if file_size == 0:
                errors.append(f"Empty file: {file_path}")
                score -= 40
            elif file_size > 1024 * 1024 * 1024:  # 1GB
                warnings.append(f"Large file: {file_path} ({file_size} bytes)")
                score -= 10

            # Content validation based on extension
            try:
                if file_path.suffix.lower() == '.json':
                    self.validate_json_file(file_path)
                elif file_path.suffix.lower() in ['.jpg', '.png']:
                    self.validate_image_file(file_path)
                # Add more content validations as needed
            except Exception as e:
                errors.append(f"Content validation failed for {file_path}: {str(e)}")
                score -= 25

        return {
            'score': max(0, score),
            'errors': errors,
            'warnings': warnings,
            'validated_at': datetime.now().isoformat()
        }

    def should_include_file_group(self, validation_result: Dict) -> bool:
        """Determine if file group should be included based on validation."""
        if validation_result['errors'] and self.strict_mode:
            return False

        min_score = self.validation_config.get('min_score', 50)
        return validation_result['score'] >= min_score

    def validate_json_file(self, file_path: Path):
        """Validate JSON file structure."""
        import json
        with open(file_path, 'r') as f:
            json.load(f)  # Will raise exception if invalid JSON

    def validate_image_file(self, file_path: Path):
        """Validate image file integrity."""
        from PIL import Image
        with Image.open(file_path) as img:
            img.verify()  # Will raise exception if corrupted
```
### Example 3: Batch Processing Plugin

```python
from datetime import datetime

class BatchProcessingUploader(BaseUploader):
    """Processes files in configurable batches with progress tracking."""

    def __init__(self, run, path: Path, file_specification: List = None,
                 organized_files: List = None, extra_params: Dict = None):
        super().__init__(run, path, file_specification, organized_files, extra_params)

        extra_params = extra_params or {}  # guard against a missing parameter dict
        self.batch_size = extra_params.get('batch_size', 10)
        self.parallel_processing = extra_params.get('use_parallel', True)
        self.max_workers = extra_params.get('max_workers', 4)

    def organize_files(self, files: List) -> List:
        """Organize files into processing batches."""
        batches = []
        current_batch = []

        for file_group in files:
            current_batch.append(file_group)

            if len(current_batch) >= self.batch_size:
                batches.append({
                    'batch_id': len(batches) + 1,
                    'files': current_batch,
                    'batch_size': len(current_batch)
                })
                current_batch = []

        # Add remaining files as final batch
        if current_batch:
            batches.append({
                'batch_id': len(batches) + 1,
                'files': current_batch,
                'batch_size': len(current_batch)
            })

        self.run.log_message(f"Organized {len(files)} files into {len(batches)} batches")
        return batches

    def process_files(self, organized_files: List) -> List:
        """Process files in batches with progress tracking."""
        all_processed_files = []
        total_batches = len(organized_files)

        if self.parallel_processing:
            all_processed_files = self.process_batches_parallel(organized_files)
        else:
            all_processed_files = self.process_batches_sequential(organized_files)

        self.run.log_message(f"Completed processing {total_batches} batches")
        return all_processed_files

    def process_batches_sequential(self, batches: List) -> List:
        """Process batches sequentially."""
        all_files = []

        for i, batch in enumerate(batches, 1):
            self.run.log_message(f"Processing batch {i}/{len(batches)}")

            processed_batch = self.process_single_batch(batch)
            all_files.extend(processed_batch)

            # Update progress
            progress = (i / len(batches)) * 100
            self.run.log_message(f"Progress: {progress:.1f}% complete")

        return all_files

    def process_batches_parallel(self, batches: List) -> List:
        """Process batches in parallel using ThreadPoolExecutor."""
        from concurrent.futures import ThreadPoolExecutor, as_completed

        all_files = []
        completed_batches = 0

        with ThreadPoolExecutor(max_workers=self.max_workers) as executor:
            # Submit all batches
            future_to_batch = {
                executor.submit(self.process_single_batch, batch): batch
                for batch in batches
            }

            # Process completed batches
            for future in as_completed(future_to_batch):
                batch = future_to_batch[future]
                try:
                    processed_files = future.result()
                    all_files.extend(processed_files)
                    completed_batches += 1

                    progress = (completed_batches / len(batches)) * 100
                    self.run.log_message(f"Batch {batch['batch_id']} complete. Progress: {progress:.1f}%")

                except Exception as e:
                    self.run.log_message(f"Batch {batch['batch_id']} failed: {str(e)}")

        return all_files

    def process_single_batch(self, batch: Dict) -> List:
        """Process a single batch of files."""
        batch_files = batch['files']
        processed_files = []

        for file_group in batch_files:
            # Apply your specific processing logic here
            processed_file = self.process_file_group(file_group)
            processed_files.append(processed_file)

        return processed_files

    def process_file_group(self, file_group: Dict) -> Dict:
        """Process individual file group - implement your logic here."""
        # Example: Add batch processing metadata
        file_group['batch_processed'] = True
        file_group['processed_timestamp'] = datetime.now().isoformat()
        return file_group
```
## Error Handling and Logging

### Comprehensive Error Handling

```python
import json
import os
from datetime import datetime

class RobustUploader(BaseUploader):
    def process_files(self, organized_files: List) -> List:
        """Process files with comprehensive error handling."""
        processed_files = []
        failed_files = []

        for i, file_group in enumerate(organized_files):
            try:
                self.run.log_message(f"Processing file group {i+1}/{len(organized_files)}")

                # Process with validation
                processed_file = self.process_file_group_safely(file_group)
                processed_files.append(processed_file)

            except Exception as e:
                error_info = {
                    'file_group': file_group,
                    'error': str(e),
                    'error_type': type(e).__name__,
                    'timestamp': datetime.now().isoformat()
                }
                failed_files.append(error_info)

                self.run.log_message(f"Failed to process file group: {str(e)}")

                # Continue processing other files
                continue

        # Log summary
        self.run.log_message(
            f"Processing complete: {len(processed_files)} successful, {len(failed_files)} failed"
        )

        if failed_files:
            # Save error report
            self.save_error_report(failed_files)

        return processed_files

    def process_file_group_safely(self, file_group: Dict) -> Dict:
        """Process file group with validation and error checking."""
        # Validate file group structure
        if 'files' not in file_group:
            raise ValueError("File group missing 'files' key")

        files_dict = file_group['files']
        if not files_dict:
            raise ValueError("File group has no files")

        # Validate file accessibility
        for spec_name, file_path in files_dict.items():
            if not file_path.exists():
                raise FileNotFoundError(f"File not found: {file_path}")

            if not os.access(file_path, os.R_OK):
                raise PermissionError(f"Cannot read file: {file_path}")

        # Perform actual processing
        return self.apply_processing_logic(file_group)

    def save_error_report(self, failed_files: List):
        """Save detailed error report for debugging."""
        error_report_path = self.path / 'error_report.json'

        report = {
            'timestamp': datetime.now().isoformat(),
            'plugin_name': self.__class__.__name__,
            'total_errors': len(failed_files),
            'errors': failed_files
        }

        with open(error_report_path, 'w') as f:
            json.dump(report, f, indent=2, default=str)

        self.run.log_message(f"Error report saved to: {error_report_path}")
```

### Structured Logging

```python
import time
from datetime import datetime

class LoggingUploader(BaseUploader):
    def setup_directories(self):
        """Setup logging directory."""
        log_dir = self.path / 'logs'
        log_dir.mkdir(exist_ok=True)

        # Initialize structured logging
        self.setup_structured_logging(log_dir)

    def setup_structured_logging(self, log_dir: Path):
        """Setup structured logging with different levels."""
        import logging
        import json

        # Create custom formatter for structured logs
        class StructuredFormatter(logging.Formatter):
            def format(self, record):
                log_entry = {
                    'timestamp': datetime.now().isoformat(),
                    'level': record.levelname,
                    'message': record.getMessage(),
                    'plugin': 'LoggingUploader'
                }

                # Add extra fields if present
                if hasattr(record, 'file_path'):
                    log_entry['file_path'] = str(record.file_path)
                if hasattr(record, 'operation'):
                    log_entry['operation'] = record.operation
                if hasattr(record, 'duration'):
                    log_entry['duration'] = record.duration

                return json.dumps(log_entry)

        # Setup logger
        self.logger = logging.getLogger('upload_plugin')
        self.logger.setLevel(logging.INFO)

        # File handler
        handler = logging.FileHandler(log_dir / 'plugin.log')
        handler.setFormatter(StructuredFormatter())
        self.logger.addHandler(handler)

    def process_files(self, organized_files: List) -> List:
        """Process files with detailed logging."""
        start_time = time.time()

        self.logger.info(
            "Starting file processing",
            extra={'operation': 'process_files', 'file_count': len(organized_files)}
        )

        processed_files = []

        for i, file_group in enumerate(organized_files):
            file_start_time = time.time()

            try:
                # Process file group
                processed_file = self.process_file_group(file_group)
                processed_files.append(processed_file)

                # Log success
                duration = time.time() - file_start_time
                self.logger.info(
                    f"Successfully processed file group {i+1}",
                    extra={
                        'operation': 'process_file_group',
                        'file_group_index': i,
                        'duration': duration
                    }
                )

            except Exception as e:
                # Log error
                duration = time.time() - file_start_time
                self.logger.error(
                    f"Failed to process file group {i+1}: {str(e)}",
                    extra={
                        'operation': 'process_file_group',
                        'file_group_index': i,
                        'duration': duration,
                        'error': str(e)
                    }
                )
                raise

        # Log overall completion
        total_duration = time.time() - start_time
        self.logger.info(
            "Completed file processing",
            extra={
                'operation': 'process_files',
                'total_duration': total_duration,
                'processed_count': len(processed_files)
            }
        )

        return processed_files
```
## Performance Optimization

### Memory Management

```python
class MemoryEfficientUploader(BaseUploader):
    """Uploader optimized for large file processing."""

    def __init__(self, run, path: Path, file_specification: List = None,
                 organized_files: List = None, extra_params: Dict = None):
        super().__init__(run, path, file_specification, organized_files, extra_params)

        extra_params = extra_params or {}  # guard against a missing parameter dict
        self.chunk_size = extra_params.get('chunk_size', 8192)  # 8KB chunks
        self.memory_limit = extra_params.get('memory_limit_mb', 100) * 1024 * 1024

    def process_files(self, organized_files: List) -> List:
        """Process files with memory management."""
        import psutil
        import gc

        processed_files = []

        for file_group in organized_files:
            # Check memory usage before processing
            memory_usage = psutil.Process().memory_info().rss

            if memory_usage > self.memory_limit:
                self.run.log_message(f"High memory usage: {memory_usage / 1024 / 1024:.1f}MB")

                # Force garbage collection
                gc.collect()

                # Check again after cleanup
                memory_usage = psutil.Process().memory_info().rss
                if memory_usage > self.memory_limit:
                    self.run.log_message("Memory limit exceeded, processing in smaller chunks")
                    processed_file = self.process_file_group_chunked(file_group)
                else:
                    processed_file = self.process_file_group_normal(file_group)
            else:
                processed_file = self.process_file_group_normal(file_group)

            processed_files.append(processed_file)

        return processed_files

    def process_file_group_chunked(self, file_group: Dict) -> Dict:
        """Process large files in chunks to manage memory."""
        files_dict = file_group.get('files', {})
        processed_files = {}

        for spec_name, file_path in files_dict.items():
            if file_path.stat().st_size > 50 * 1024 * 1024:  # 50MB
                # Process large files in chunks
                processed_path = self.process_large_file_chunked(file_path)
                processed_files[spec_name] = processed_path
            else:
                # Process smaller files normally
                processed_files[spec_name] = file_path

        file_group['files'] = processed_files
        return file_group

    def process_large_file_chunked(self, file_path: Path) -> Path:
        """Process large file in chunks."""
        output_path = self.path / 'processed' / file_path.name

        with open(file_path, 'rb') as infile, open(output_path, 'wb') as outfile:
            while True:
                chunk = infile.read(self.chunk_size)
                if not chunk:
                    break

                # Apply processing to chunk
                processed_chunk = self.process_chunk(chunk)
                outfile.write(processed_chunk)

        return output_path

    def process_chunk(self, chunk: bytes) -> bytes:
        """Process individual chunk - override with your logic."""
        # Example: Simple pass-through
        return chunk
```

### Async Processing

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor
from datetime import datetime

class AsyncUploader(BaseUploader):
    """Uploader with asynchronous processing capabilities."""

    def __init__(self, run, path: Path, file_specification: List = None,
                 organized_files: List = None, extra_params: Dict = None):
        super().__init__(run, path, file_specification, organized_files, extra_params)

        extra_params = extra_params or {}  # guard against a missing parameter dict
        self.max_concurrent = extra_params.get('max_concurrent', 5)
        self.use_process_pool = extra_params.get('use_process_pool', False)

    def process_files(self, organized_files: List) -> List:
        """Process files asynchronously."""
        # Run async processing in sync context
        return asyncio.run(self._process_files_async(organized_files))

    async def _process_files_async(self, organized_files: List) -> List:
        """Main async processing method."""
        if self.use_process_pool:
            return await self._process_with_process_pool(organized_files)
        else:
            return await self._process_with_async_tasks(organized_files)

    async def _process_with_async_tasks(self, organized_files: List) -> List:
        """Process using async tasks with concurrency limit."""
        semaphore = asyncio.Semaphore(self.max_concurrent)

        async def process_with_semaphore(file_group):
            async with semaphore:
                return await self._process_file_group_async(file_group)

        # Create tasks for all file groups
        tasks = [
            process_with_semaphore(file_group)
            for file_group in organized_files
        ]

        # Wait for all tasks to complete
        processed_files = await asyncio.gather(*tasks, return_exceptions=True)

        # Filter out exceptions and log errors
        valid_files = []
        for i, result in enumerate(processed_files):
            if isinstance(result, Exception):
                self.run.log_message(f"Error processing file group {i}: {str(result)}")
            else:
                valid_files.append(result)

        return valid_files

    async def _process_with_process_pool(self, organized_files: List) -> List:
        """Process using process pool for CPU-intensive tasks."""
        loop = asyncio.get_running_loop()

        with ProcessPoolExecutor(max_workers=self.max_concurrent) as executor:
            # Submit all tasks to process pool
            futures = [
                loop.run_in_executor(executor, self._process_file_group_sync, file_group)
                for file_group in organized_files
            ]

            # Wait for completion
            processed_files = await asyncio.gather(*futures, return_exceptions=True)

        # Filter exceptions
        valid_files = []
        for i, result in enumerate(processed_files):
            if isinstance(result, Exception):
                self.run.log_message(f"Error in process pool for file group {i}: {str(result)}")
            else:
                valid_files.append(result)

        return valid_files

    async def _process_file_group_async(self, file_group: Dict) -> Dict:
        """Async processing of individual file group."""
        # Simulate async I/O operation
        await asyncio.sleep(0.1)

        # Apply your processing logic here
        file_group['async_processed'] = True
        file_group['processed_timestamp'] = datetime.now().isoformat()

        return file_group

    def _process_file_group_sync(self, file_group: Dict) -> Dict:
        """Synchronous processing for process pool."""
        # This runs in a separate process
        import time
        time.sleep(0.1)  # Simulate CPU work

        file_group['process_pool_processed'] = True
        file_group['processed_timestamp'] = datetime.now().isoformat()

        return file_group
```
## Testing and Debugging

### Unit Testing Framework

```python
import unittest
from unittest.mock import Mock, patch, MagicMock
from pathlib import Path
from typing import Dict
import tempfile
import shutil

class TestMyUploader(unittest.TestCase):
    """Test suite for custom uploader."""

    def setUp(self):
        """Set up test environment."""
        # Create temporary directory
        self.temp_dir = Path(tempfile.mkdtemp())

        # Mock run object
        self.mock_run = Mock()
        self.mock_run.log_message = Mock()

        # Sample file specification
        self.file_specification = [
            {'name': 'image_data', 'file_type': 'image'},
            {'name': 'text_data', 'file_type': 'text'}
        ]

        # Create test files
        self.test_files = self.create_test_files()

        # Sample organized files
        self.organized_files = [
            {
                'files': {
                    'image_data': self.test_files['image'],
                    'text_data': self.test_files['text']
                },
                'metadata': {'group_id': 1}
            }
        ]

    def tearDown(self):
        """Clean up test environment."""
        shutil.rmtree(self.temp_dir)

    def create_test_files(self) -> Dict[str, Path]:
        """Create test files for testing."""
        files = {}

        # Create test image file
        image_file = self.temp_dir / 'test_image.jpg'
        with open(image_file, 'wb') as f:
            f.write(b'fake_image_data')
        files['image'] = image_file

        # Create test text file
        text_file = self.temp_dir / 'test_text.txt'
        with open(text_file, 'w') as f:
            f.write('test content')
        files['text'] = text_file

        return files

    def test_initialization(self):
        """Test uploader initialization."""
        uploader = MyUploader(
            run=self.mock_run,
            path=self.temp_dir,
            file_specification=self.file_specification,
            organized_files=self.organized_files
        )

        self.assertEqual(uploader.path, self.temp_dir)
        self.assertEqual(uploader.file_specification, self.file_specification)
        self.assertEqual(uploader.organized_files, self.organized_files)

    def test_process_files(self):
        """Test process_files method."""
        uploader = MyUploader(
            run=self.mock_run,
            path=self.temp_dir,
            file_specification=self.file_specification,
            organized_files=self.organized_files
        )

        result = uploader.process_files(self.organized_files)

        # Verify result structure
        self.assertIsInstance(result, list)
        self.assertEqual(len(result), 1)

        # Verify processing occurred
        processed_file = result[0]
        self.assertIn('processed_by', processed_file)
        self.assertEqual(processed_file['processed_by'], 'MyUploader')

    def test_handle_upload_files_workflow(self):
        """Test complete workflow."""
        uploader = MyUploader(
            run=self.mock_run,
            path=self.temp_dir,
            file_specification=self.file_specification,
            organized_files=self.organized_files
        )

        # Mock workflow methods
        with patch.object(uploader, 'setup_directories') as mock_setup, \
             patch.object(uploader, 'organize_files', return_value=self.organized_files) as mock_organize, \
             patch.object(uploader, 'before_process', return_value=self.organized_files) as mock_before, \
             patch.object(uploader, 'process_files', return_value=self.organized_files) as mock_process, \
             patch.object(uploader, 'after_process', return_value=self.organized_files) as mock_after, \
             patch.object(uploader, 'validate_files', return_value=self.organized_files) as mock_validate:

            result = uploader.handle_upload_files()

            # Verify all methods were called in correct order
            mock_setup.assert_called_once()
            mock_organize.assert_called_once()
            mock_before.assert_called_once()
            mock_process.assert_called_once()
            mock_after.assert_called_once()
            mock_validate.assert_called_once()

            self.assertEqual(result, self.organized_files)

    def test_error_handling(self):
        """Test error handling in process_files."""
        uploader = MyUploader(
            run=self.mock_run,
            path=self.temp_dir,
            file_specification=self.file_specification,
            organized_files=self.organized_files
        )

        # Test with invalid file group
        invalid_files = [{'invalid': 'structure'}]

        with self.assertRaises(Exception):
            uploader.process_files(invalid_files)

    @patch('your_module.some_external_dependency')
    def test_external_dependencies(self, mock_dependency):
        """Test integration with external dependencies."""
        mock_dependency.return_value = 'mocked_result'

        uploader = MyUploader(
            run=self.mock_run,
            path=self.temp_dir,
            file_specification=self.file_specification,
            organized_files=self.organized_files
        )

        # Test method that uses external dependency
        result = uploader.some_method_using_dependency()

        mock_dependency.assert_called_once()
        self.assertEqual(result, 'expected_result_based_on_mock')

if __name__ == '__main__':
    # Run specific test
    unittest.main()
```
### Integration Testing

```python
import json  # in addition to the imports from the unit-test listing above

class TestUploaderIntegration(unittest.TestCase):
    """Integration tests for uploader with real file operations."""

    def setUp(self):
        """Set up integration test environment."""
        self.temp_dir = Path(tempfile.mkdtemp())
        self.mock_run = Mock()

        # Create realistic test files
        self.create_realistic_test_files()

    def create_realistic_test_files(self):
        """Create realistic test files for integration testing."""
        # Create various file types
        (self.temp_dir / 'images').mkdir()
        (self.temp_dir / 'data').mkdir()

        # TIFF image that can be actually processed
        tiff_path = self.temp_dir / 'images' / 'test.tif'
        # Create a minimal valid TIFF file
        self.create_minimal_tiff(tiff_path)

        # JSON data file
        json_path = self.temp_dir / 'data' / 'test.json'
        with open(json_path, 'w') as f:
            json.dump({'test': 'data', 'values': [1, 2, 3]}, f)

        self.test_files = {
            'image_file': tiff_path,
            'data_file': json_path
        }

    def create_minimal_tiff(self, path: Path):
        """Create a minimal valid TIFF file for testing."""
        try:
            from PIL import Image
            import numpy as np

            # Create a small test image
            array = np.zeros((50, 50, 3), dtype=np.uint8)
            array[10:40, 10:40] = [255, 0, 0]  # Red square

            image = Image.fromarray(array)
            image.save(path, 'TIFF')
        except ImportError:
            # Fallback: create empty file if PIL not available
            path.touch()

    def test_full_workflow_with_real_files(self):
        """Test complete workflow with real file operations."""
        file_specification = [
            {'name': 'test_image', 'file_type': 'image'},
            {'name': 'test_data', 'file_type': 'data'}
        ]

        organized_files = [
            {
                'files': {
                    'test_image': self.test_files['image_file'],
                    'test_data': self.test_files['data_file']
                }
            }
        ]

        uploader = ImageProcessingUploader(
            run=self.mock_run,
            path=self.temp_dir,
            file_specification=file_specification,
            organized_files=organized_files
        )

        # Run complete workflow
        result = uploader.handle_upload_files()

        # Verify results
        self.assertIsInstance(result, list)
        self.assertTrue(len(result) > 0)

        # Check if processing directories were created
        self.assertTrue((self.temp_dir / 'processed').exists())
        self.assertTrue((self.temp_dir / 'thumbnails').exists())

        # Verify logging calls
        self.assertTrue(self.mock_run.log_message.called)
```
|
|

### Debugging Utilities

```python
import json
import os
import time
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, List


class DebuggingUploader(BaseUploader):
    """Uploader with enhanced debugging capabilities."""

    def __init__(self, run, path: Path, file_specification: List = None,
                 organized_files: List = None, extra_params: Dict = None):
        super().__init__(run, path, file_specification, organized_files, extra_params)

        # Guard against extra_params being None (the default).
        self.debug_mode = (extra_params or {}).get('debug_mode', False)
        self.debug_dir = self.path / 'debug'

        if self.debug_mode:
            self.debug_dir.mkdir(exist_ok=True)
            self.setup_debugging()

    def setup_debugging(self):
        """Initialize debugging infrastructure."""
        # Save initialization state
        init_state = {
            'path': str(self.path),
            'file_specification': self.file_specification,
            'organized_files_count': len(self.organized_files),
            'extra_params': self.extra_params,
            'timestamp': datetime.now().isoformat()
        }

        with open(self.debug_dir / 'init_state.json', 'w') as f:
            json.dump(init_state, f, indent=2, default=str)

    def debug_log(self, message: str, data: Any = None):
        """Enhanced debug logging."""
        if not self.debug_mode:
            return

        debug_entry = {
            'timestamp': datetime.now().isoformat(),
            'message': message,
            'data': data
        }

        # Write to debug log
        debug_log_path = self.debug_dir / 'debug.log'
        with open(debug_log_path, 'a') as f:
            f.write(json.dumps(debug_entry, default=str) + '\n')

        # Also log to main run
        self.run.log_message(f"DEBUG: {message}")

    def setup_directories(self):
        """Set up directories with debugging."""
        self.debug_log("Setting up directories")
        super().setup_directories()

        if self.debug_mode:
            # Save directory state
            dirs_state = {
                'existing_dirs': [str(p) for p in self.path.iterdir() if p.is_dir()],
                'path_exists': self.path.exists(),
                'path_writable': os.access(self.path, os.W_OK)
            }
            self.debug_log("Directory setup complete", dirs_state)

    def process_files(self, organized_files: List) -> List:
        """Process files with debugging instrumentation."""
        self.debug_log(f"Starting process_files with {len(organized_files)} file groups")

        # Save input state
        if self.debug_mode:
            with open(self.debug_dir / 'input_files.json', 'w') as f:
                json.dump(organized_files, f, indent=2, default=str)

        processed_files = []

        for i, file_group in enumerate(organized_files):
            self.debug_log(f"Processing file group {i + 1}")

            try:
                # Process with timing
                start_time = time.time()
                processed_file = self.process_file_group_with_debug(file_group, i)
                duration = time.time() - start_time

                processed_files.append(processed_file)
                self.debug_log(f"File group {i + 1} processed successfully", {'duration': duration})

            except Exception as e:
                error_data = {
                    'file_group_index': i,
                    'error': str(e),
                    'error_type': type(e).__name__,
                    'file_group': file_group
                }
                self.debug_log(f"Error processing file group {i + 1}", error_data)

                # Save error state
                if self.debug_mode:
                    with open(self.debug_dir / f'error_group_{i}.json', 'w') as f:
                        json.dump(error_data, f, indent=2, default=str)

                raise

        # Save output state
        if self.debug_mode:
            with open(self.debug_dir / 'output_files.json', 'w') as f:
                json.dump(processed_files, f, indent=2, default=str)

        self.debug_log(f"process_files completed with {len(processed_files)} processed files")
        return processed_files

    def process_file_group_with_debug(self, file_group: Dict, index: int) -> Dict:
        """Process individual file group with debugging."""
        if self.debug_mode:
            # Save intermediate state
            with open(self.debug_dir / f'group_{index}_input.json', 'w') as f:
                json.dump(file_group, f, indent=2, default=str)

        # Apply your processing logic
        processed_group = self.apply_custom_processing(file_group)

        if self.debug_mode:
            # Save result state
            with open(self.debug_dir / f'group_{index}_output.json', 'w') as f:
                json.dump(processed_group, f, indent=2, default=str)

        return processed_group

    def apply_custom_processing(self, file_group: Dict) -> Dict:
        """Your custom processing logic - implement as needed."""
        # Example implementation
        file_group['debug_processed'] = True
        file_group['processing_timestamp'] = datetime.now().isoformat()
        return file_group

    def generate_debug_report(self):
        """Generate comprehensive debug report."""
        if not self.debug_mode:
            return

        report = {
            'plugin_name': self.__class__.__name__,
            'debug_session': datetime.now().isoformat(),
            'files_processed': 0,
            'errors': [],
            'performance': {}
        }

        # Analyze debug files
        for debug_file in self.debug_dir.glob('*.json'):
            if debug_file.name.startswith('error_'):
                with open(debug_file) as f:
                    error_data = json.load(f)
                report['errors'].append(error_data)
            elif debug_file.name == 'output_files.json':
                with open(debug_file) as f:
                    output_data = json.load(f)
                report['files_processed'] = len(output_data)

        # Save final report
        with open(self.debug_dir / 'debug_report.json', 'w') as f:
            json.dump(report, f, indent=2, default=str)

        self.run.log_message(f"Debug report generated at: {self.debug_dir / 'debug_report.json'}")
```
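
The class above can be exercised the same way as the other uploaders in this guide. The following is a minimal, hypothetical usage sketch: the `MagicMock` run object and the temporary paths stand in for the SDK-provided run and real dataset files, and `handle_upload_files()` is the workflow entry point used in the test examples earlier.

```python
from pathlib import Path
from tempfile import mkdtemp
from unittest.mock import MagicMock

work_dir = Path(mkdtemp())                       # throwaway working directory
sample = work_dir / 'sample.jpg'
sample.touch()                                   # placeholder file for this sketch

uploader = DebuggingUploader(
    run=MagicMock(),                             # stands in for the SDK-provided run
    path=work_dir,
    file_specification=[{'name': 'image', 'file_type': 'image'}],
    organized_files=[{'files': {'image': sample}}],
    extra_params={'debug_mode': True},           # creates work_dir/debug and the JSON dumps
)

result = uploader.handle_upload_files()
uploader.generate_debug_report()                 # writes work_dir/debug/debug_report.json
```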

## Best Practices Summary

### 1. Code Organization
- Keep `process_files()` focused on core logic
- Use hook methods for setup, cleanup, and validation (see the sketch after this list)
- Separate concerns using helper methods
- Follow single responsibility principle
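
As a rough illustration of this layout (the subclass and helper names below are hypothetical; only `BaseUploader`, `setup_directories()`, and `process_files()` come from the guide itself):

```python
class WellOrganizedUploader(BaseUploader):
    """Hypothetical uploader showing hooks plus small single-purpose helpers."""

    def setup_directories(self):
        # Hook: prepare working directories before any processing happens.
        super().setup_directories()
        (self.path / 'staging').mkdir(exist_ok=True)

    def process_files(self, organized_files):
        # Core logic only: iterate groups and delegate the details to helpers.
        return [self._process_group(group) for group in organized_files]

    def _process_group(self, file_group):
        # Helper with one responsibility: transform a single file group.
        file_group['valid'] = self._has_required_files(file_group)
        return file_group

    def _has_required_files(self, file_group):
        # Helper with one responsibility: validate a single file group.
        return bool(file_group.get('files'))
```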

### 2. Error Handling
- Implement comprehensive error handling
- Log errors with context information (see the sketch after this list)
- Fail gracefully when possible
- Provide meaningful error messages
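
For example, the per-group loop can attach context to every failure and decide whether to skip or re-raise; the `_convert_group()` helper below is illustrative, not part of the BaseUploader API:

```python
# Inside a BaseUploader subclass
def process_files(self, organized_files):
    processed = []
    for index, file_group in enumerate(organized_files):
        try:
            processed.append(self._convert_group(file_group))   # hypothetical helper
        except FileNotFoundError as e:
            # Recoverable: log with context and continue with the remaining groups.
            self.run.log_message(f"Skipping file group {index}: missing file ({e})")
        except Exception as e:
            # Unrecoverable: surface a meaningful, contextual message and re-raise.
            self.run.log_message(f"File group {index} failed ({type(e).__name__}): {e}")
            raise
    return processed
```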

### 3. Performance
- Profile your processing logic
- Use appropriate data structures
- Consider memory usage for large files
- Implement async processing for I/O-heavy operations (one concurrency sketch follows this list)
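
When per-group work is dominated by reads and network transfers rather than CPU, either `asyncio` or a thread pool can help. A minimal thread-pool sketch, with a hypothetical `_process_group()` helper:

```python
from concurrent.futures import ThreadPoolExecutor

# Inside a BaseUploader subclass
def process_files(self, organized_files):
    # Overlap I/O-bound work (file reads, uploads) across a small worker pool.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(self._process_group, organized_files))
```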

### 4. Testing
- Write unit tests for all methods
- Include integration tests with real files
- Test error conditions and edge cases
- Use mocking for external dependencies (see the sketch after this list)
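
A small unit-test sketch that mocks the run object so no external service is touched; it reuses the `DebuggingUploader` defined above and assumes, as in the integration test earlier, that the base constructor accepts empty specifications:

```python
import tempfile
import unittest
from pathlib import Path
from unittest.mock import MagicMock

class ApplyCustomProcessingTest(unittest.TestCase):
    def test_marks_group_as_processed(self):
        uploader = DebuggingUploader(
            run=MagicMock(),                     # no real platform calls
            path=Path(tempfile.mkdtemp()),       # throwaway working directory
            file_specification=[],
            organized_files=[],
            extra_params={'debug_mode': False},  # keep debug I/O out of unit tests
        )

        result = uploader.apply_custom_processing({'files': {}})

        self.assertTrue(result['debug_processed'])
        self.assertIn('processing_timestamp', result)
```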

### 5. Logging
- Log important operations and milestones
- Include timing information for performance analysis
- Use structured logging for better analysis (see the sketch after this list)
- Provide different log levels (info, warning, error)
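
A structured-logging sketch: each event is serialized as one JSON object before it reaches `run.log_message()`, so logs can later be filtered by level or event name. The field names are illustrative:

```python
import json
import time

# Inside a BaseUploader subclass
def log_event(self, level, event, **fields):
    # One JSON object per line keeps the log machine-parsable.
    entry = {'ts': time.time(), 'level': level, 'event': event, **fields}
    self.run.log_message(json.dumps(entry, default=str))

# Example calls from process_files():
#   self.log_event('info', 'group_processed', index=3, duration=0.42)
#   self.log_event('error', 'group_failed', index=7, error='FileNotFoundError')
```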

### 6. Configuration
- Use `extra_params` for plugin configuration
- Provide sensible defaults
- Validate configuration parameters (see the sketch after this list)
- Document all configuration options
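
A configuration sketch that merges `extra_params` over defaults and validates before any file is touched; the parameter names are examples, not a fixed schema:

```python
# Inside a BaseUploader subclass
DEFAULTS = {'quality': 85, 'max_dimension': 2048, 'generate_thumbnails': True}

def load_config(self):
    # User-supplied extra_params override the defaults.
    config = {**self.DEFAULTS, **(self.extra_params or {})}

    # Fail early with clear messages instead of partway through processing.
    if not 1 <= config['quality'] <= 100:
        raise ValueError(f"quality must be between 1 and 100, got {config['quality']}")
    if config['max_dimension'] <= 0:
        raise ValueError('max_dimension must be a positive integer')
    return config
```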

### 7. Documentation
- Document all methods with clear docstrings (see the example after this list)
- Provide usage examples
- Document configuration options
- Include troubleshooting information
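
For instance, a docstring that spells out arguments, return value, and failure modes; the content here is illustrative:

```python
def process_files(self, organized_files):
    """Convert each organized file group into the platform upload format.

    Args:
        organized_files: List of dicts whose 'files' mapping pairs each
            file_specification name with a local path.

    Returns:
        A list of processed file-group dicts ready for upload.

    Raises:
        ValueError: If a group is missing a file required by the specification.
    """
```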

This comprehensive guide should help you develop robust, efficient, and maintainable upload plugins using the BaseUploader template. Remember to adapt the examples to your specific use case and requirements.