synapse-sdk 2025.10.1__py3-none-any.whl → 2025.10.4__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

This version of synapse-sdk has been flagged as a potentially problematic release.

Files changed (54)
  1. synapse_sdk/devtools/docs/docs/plugins/categories/pre-annotation-plugins/pre-annotation-plugin-overview.md +198 -0
  2. synapse_sdk/devtools/docs/docs/plugins/categories/pre-annotation-plugins/to-task-action-development.md +1645 -0
  3. synapse_sdk/devtools/docs/docs/plugins/categories/pre-annotation-plugins/to-task-overview.md +717 -0
  4. synapse_sdk/devtools/docs/docs/plugins/categories/pre-annotation-plugins/to-task-template-development.md +1380 -0
  5. synapse_sdk/devtools/docs/docs/plugins/categories/upload-plugins/upload-plugin-action.md +934 -0
  6. synapse_sdk/devtools/docs/docs/plugins/categories/upload-plugins/upload-plugin-overview.md +560 -0
  7. synapse_sdk/devtools/docs/docs/plugins/categories/upload-plugins/upload-plugin-template.md +715 -0
  8. synapse_sdk/devtools/docs/docs/plugins/plugins.md +12 -5
  9. synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/categories/pre-annotation-plugins/pre-annotation-plugin-overview.md +198 -0
  10. synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/categories/pre-annotation-plugins/to-task-action-development.md +1645 -0
  11. synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/categories/pre-annotation-plugins/to-task-overview.md +717 -0
  12. synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/categories/pre-annotation-plugins/to-task-template-development.md +1380 -0
  13. synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/categories/upload-plugins/upload-plugin-action.md +934 -0
  14. synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/categories/upload-plugins/upload-plugin-overview.md +560 -0
  15. synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/categories/upload-plugins/upload-plugin-template.md +715 -0
  16. synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current.json +16 -4
  17. synapse_sdk/devtools/docs/sidebars.ts +27 -1
  18. synapse_sdk/plugins/README.md +487 -80
  19. synapse_sdk/plugins/categories/export/actions/export/action.py +8 -3
  20. synapse_sdk/plugins/categories/export/actions/export/utils.py +108 -8
  21. synapse_sdk/plugins/categories/pre_annotation/actions/__init__.py +4 -0
  22. synapse_sdk/plugins/categories/pre_annotation/actions/pre_annotation/__init__.py +3 -0
  23. synapse_sdk/plugins/categories/pre_annotation/actions/pre_annotation/action.py +10 -0
  24. synapse_sdk/plugins/categories/pre_annotation/actions/to_task/__init__.py +28 -0
  25. synapse_sdk/plugins/categories/pre_annotation/actions/to_task/action.py +145 -0
  26. synapse_sdk/plugins/categories/pre_annotation/actions/to_task/enums.py +269 -0
  27. synapse_sdk/plugins/categories/pre_annotation/actions/to_task/exceptions.py +14 -0
  28. synapse_sdk/plugins/categories/pre_annotation/actions/to_task/factory.py +76 -0
  29. synapse_sdk/plugins/categories/pre_annotation/actions/to_task/models.py +97 -0
  30. synapse_sdk/plugins/categories/pre_annotation/actions/to_task/orchestrator.py +250 -0
  31. synapse_sdk/plugins/categories/pre_annotation/actions/to_task/run.py +64 -0
  32. synapse_sdk/plugins/categories/pre_annotation/actions/to_task/strategies/__init__.py +17 -0
  33. synapse_sdk/plugins/categories/pre_annotation/actions/to_task/strategies/annotation.py +284 -0
  34. synapse_sdk/plugins/categories/pre_annotation/actions/to_task/strategies/base.py +170 -0
  35. synapse_sdk/plugins/categories/pre_annotation/actions/to_task/strategies/extraction.py +83 -0
  36. synapse_sdk/plugins/categories/pre_annotation/actions/to_task/strategies/metrics.py +87 -0
  37. synapse_sdk/plugins/categories/pre_annotation/actions/to_task/strategies/preprocessor.py +127 -0
  38. synapse_sdk/plugins/categories/pre_annotation/actions/to_task/strategies/validation.py +143 -0
  39. synapse_sdk/plugins/categories/upload/actions/upload/__init__.py +2 -1
  40. synapse_sdk/plugins/categories/upload/actions/upload/models.py +134 -94
  41. synapse_sdk/plugins/categories/upload/actions/upload/steps/cleanup.py +2 -2
  42. synapse_sdk/plugins/categories/upload/actions/upload/steps/metadata.py +106 -14
  43. synapse_sdk/plugins/categories/upload/actions/upload/steps/organize.py +113 -36
  44. synapse_sdk/plugins/categories/upload/templates/README.md +365 -0
  45. {synapse_sdk-2025.10.1.dist-info → synapse_sdk-2025.10.4.dist-info}/METADATA +1 -1
  46. {synapse_sdk-2025.10.1.dist-info → synapse_sdk-2025.10.4.dist-info}/RECORD +50 -22
  47. synapse_sdk/devtools/docs/docs/plugins/developing-upload-template.md +0 -1463
  48. synapse_sdk/devtools/docs/docs/plugins/upload-plugins.md +0 -1964
  49. synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/developing-upload-template.md +0 -1463
  50. synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/upload-plugins.md +0 -2077
  51. {synapse_sdk-2025.10.1.dist-info → synapse_sdk-2025.10.4.dist-info}/WHEEL +0 -0
  52. {synapse_sdk-2025.10.1.dist-info → synapse_sdk-2025.10.4.dist-info}/entry_points.txt +0 -0
  53. {synapse_sdk-2025.10.1.dist-info → synapse_sdk-2025.10.4.dist-info}/licenses/LICENSE +0 -0
  54. {synapse_sdk-2025.10.1.dist-info → synapse_sdk-2025.10.4.dist-info}/top_level.txt +0 -0
@@ -1,1463 +0,0 @@
1
- # Developing Upload Templates with BaseUploader
2
-
3
- This guide provides comprehensive documentation for plugin developers who want to create custom upload plugins using the BaseUploader template class. The BaseUploader follows the template method pattern to provide a structured, extensible foundation for file processing workflows.
4
-
5
- ## Quick Start
6
-
7
- ### Basic Plugin Structure
8
-
9
- Create your upload plugin by inheriting from BaseUploader:
10
-
11
- ```python
12
- from pathlib import Path
13
- from typing import List, Dict, Any
14
- from . import BaseUploader
15
-
16
- class MyUploader(BaseUploader):
17
- def __init__(self, run, path: Path, file_specification: List = None,
18
- organized_files: List = None, extra_params: Dict = None):
19
- super().__init__(run, path, file_specification, organized_files, extra_params)
20
-
21
- def process_files(self, organized_files: List) -> List:
22
- """Implement your custom file processing logic here."""
23
- # Your processing logic goes here
24
- return organized_files
25
-
26
- def handle_upload_files(self) -> List[Dict[str, Any]]:
27
- """Main entry point called by the upload action."""
28
- return super().handle_upload_files()
29
- ```
30
-
31
- ### Minimal Working Example
32
-
33
- ```python
34
- from datetime import datetime
-
- class SimpleUploader(BaseUploader):
35
- def process_files(self, organized_files: List) -> List:
36
- """Add metadata to each file group."""
37
- for file_group in organized_files:
38
- file_group['processed_by'] = 'SimpleUploader'
39
- file_group['processing_timestamp'] = datetime.now().isoformat()
40
- return organized_files
41
- ```
42
-
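If you want to exercise an uploader like this outside the platform, a minimal local invocation might look like the sketch below. The `Mock` run object, the temporary path, and the sample specification are illustrative assumptions; in production the upload action constructs the uploader and calls `handle_upload_files()` itself.

```python
# Hypothetical local smoke test for SimpleUploader; not how the upload action
# invokes it in production. The run object, path, and specification are
# illustrative assumptions.
from pathlib import Path
from unittest.mock import Mock

my_run = Mock()  # stand-in run object; only log_message() is exercised here

uploader = SimpleUploader(
    run=my_run,
    path=Path('/tmp/upload_job'),  # assumed working directory
    file_specification=[{'name': 'image_data', 'file_type': 'image'}],
    organized_files=[{'files': {}, 'metadata': {'group_id': 1}}],
)

results = uploader.handle_upload_files()
print(f'{len(results)} file group(s) ready for upload')
```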
43
- ## Architecture Deep Dive
44
-
45
- ### Workflow Pipeline
46
-
47
- The BaseUploader implements a comprehensive 6-step workflow pipeline:
48
-
49
- ```
50
- 1. setup_directories() # Initialize directory structure
51
- 2. organize_files() # Group and structure files
52
- 3. before_process() # Pre-processing hooks
53
- 4. process_files() # Main processing logic (REQUIRED)
54
- 5. after_process() # Post-processing hooks
55
- 6. validate_files() # Final validation and filtering
56
- ```
57
-
58
- ### Template Method Pattern
59
-
60
- BaseUploader uses the template method pattern where:
61
- - **Concrete methods** provide default behavior that works for most cases
62
- - **Hook methods** allow customization at specific points
63
- **Abstract methods** must be implemented by subclasses (see the sketch below)
64
-
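The sketch below illustrates how these three kinds of methods typically fit together in the six-step pipeline. It is a simplified stand-in written for this guide, not the actual BaseUploader source; the method names mirror the pipeline above, but signatures and defaults in synapse_sdk may differ.

```python
# Simplified sketch of the template method pattern used by BaseUploader.
# Names mirror the six-step pipeline above; the real class in synapse_sdk
# may differ in signatures, defaults, and bookkeeping.
from typing import Any, Dict, List


class TemplateUploaderSketch:
    def __init__(self, organized_files: List = None):
        self.organized_files = organized_files or []

    def handle_upload_files(self) -> List[Dict[str, Any]]:
        """Template method: the order is fixed, the steps are customizable."""
        self.setup_directories()                            # hook
        files = self.organize_files(self.organized_files)   # concrete, overridable
        files = self.before_process(files)                  # hook
        files = self.process_files(files)                   # abstract
        files = self.after_process(files)                   # hook
        return self.validate_files(files)                   # concrete, overridable

    # Concrete and hook methods: sensible defaults, safe to override.
    def setup_directories(self) -> None:
        pass

    def organize_files(self, files: List) -> List:
        return files

    def before_process(self, files: List) -> List:
        return files

    def after_process(self, files: List) -> List:
        return files

    def validate_files(self, files: List) -> List:
        return files

    # Abstract method: every subclass must supply its own processing logic.
    def process_files(self, files: List) -> List:
        raise NotImplementedError('Subclasses must implement process_files()')
```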
65
- ## Core Methods Reference
66
-
67
- ### Required Methods
68
-
69
- #### `process_files(organized_files: List) -> List`
70
-
71
- **Purpose**: Main processing method that transforms files according to your plugin's logic.
72
-
73
- **When to use**: Always - this is the core method every plugin must implement.
74
-
75
- **Parameters**:
76
- - `organized_files`: List of file group dictionaries containing organized file data
77
-
78
- **Returns**: List of processed file groups ready for upload
79
-
80
- **Example**:
81
- ```python
82
- def process_files(self, organized_files: List) -> List:
83
- """Convert TIFF images to JPEG format."""
84
- processed_files = []
85
-
86
- for file_group in organized_files:
87
- files_dict = file_group.get('files', {})
88
- converted_files = {}
89
-
90
- for spec_name, file_path in files_dict.items():
91
- if file_path.suffix.lower() in ['.tif', '.tiff']:
92
- # Convert TIFF to JPEG
93
- jpeg_path = self.convert_tiff_to_jpeg(file_path)
94
- converted_files[spec_name] = jpeg_path
95
- self.run.log_message(f"Converted {file_path} to {jpeg_path}")
96
- else:
97
- converted_files[spec_name] = file_path
98
-
99
- file_group['files'] = converted_files
100
- processed_files.append(file_group)
101
-
102
- return processed_files
103
- ```
104
-
105
- ### Optional Hook Methods
106
-
107
- #### `setup_directories() -> None`
108
-
109
- **Purpose**: Create custom directory structures before processing begins.
110
-
111
- **When to use**: When your plugin needs specific directories for processing, temporary files, or output.
112
-
113
- **Example**:
114
- ```python
115
- def setup_directories(self):
116
- """Create processing directories."""
117
- (self.path / 'temp').mkdir(exist_ok=True)
118
- (self.path / 'processed').mkdir(exist_ok=True)
119
- (self.path / 'thumbnails').mkdir(exist_ok=True)
120
- self.run.log_message("Created processing directories")
121
- ```
122
-
123
- #### `organize_files(files: List) -> List`
124
-
125
- **Purpose**: Reorganize and structure files before main processing.
126
-
127
- **When to use**: When you need to group files differently, filter by criteria, or restructure the data.
128
-
129
- **Example**:
130
- ```python
131
- def organize_files(self, files: List) -> List:
132
- """Group files by type and size."""
133
- large_files = []
134
- small_files = []
135
-
136
- for file_group in files:
137
- total_size = sum(f.stat().st_size for f in file_group.get('files', {}).values())
138
- if total_size > 100 * 1024 * 1024: # 100MB
139
- large_files.append(file_group)
140
- else:
141
- small_files.append(file_group)
142
-
143
- # Process large files first
144
- return large_files + small_files
145
- ```
146
-
147
- #### `before_process(organized_files: List) -> List`
148
-
149
- **Purpose**: Pre-processing hook for setup tasks before main processing.
150
-
151
- **When to use**: For validation, preparation, or initialization tasks.
152
-
153
- **Example**:
154
- ```python
155
- def before_process(self, organized_files: List) -> List:
156
- """Validate and prepare files for processing."""
157
- self.run.log_message(f"Starting processing of {len(organized_files)} file groups")
158
-
159
- # Check available disk space
160
- if not self.check_disk_space(organized_files):
161
- raise Exception("Insufficient disk space for processing")
162
-
163
- # Initialize processing resources
164
- self.processing_queue = Queue()  # requires: from queue import Queue
165
- return organized_files
166
- ```
167
-
168
- #### `after_process(processed_files: List) -> List`
169
-
170
- **Purpose**: Post-processing hook for cleanup and finalization.
171
-
172
- **When to use**: For cleanup, final transformations, or resource deallocation.
173
-
174
- **Example**:
175
- ```python
176
- def after_process(self, processed_files: List) -> List:
177
- """Clean up temporary files and generate summary."""
178
- # Remove temporary files
179
- temp_dir = self.path / 'temp'
180
- if temp_dir.exists():
181
- shutil.rmtree(temp_dir)  # requires: import shutil
182
-
183
- # Generate processing summary
184
- summary = {
185
- 'total_processed': len(processed_files),
186
- 'processing_time': time.time() - self.start_time,  # requires: import time; assumes self.start_time was set earlier (e.g. in before_process)
187
- 'plugin_version': '1.0.0'
188
- }
189
-
190
- self.run.log_message(f"Processing complete: {summary}")
191
- return processed_files
192
- ```
193
-
194
- #### `validate_files(files: List) -> List`
195
-
196
- **Purpose**: Custom validation logic beyond type checking.
197
-
198
- **When to use**: When you need additional validation rules beyond the built-in file type validation.
199
-
200
- **Example**:
201
- ```python
202
- def validate_files(self, files: List) -> List:
203
- """Custom validation with size and format checks."""
204
- # First apply built-in type validation
205
- validated_files = self.validate_file_types(files)
206
-
207
- # Then apply custom validation
208
- final_files = []
209
- for file_group in validated_files:
210
- if self.validate_file_group(file_group):
211
- final_files.append(file_group)
212
- else:
213
- self.run.log_message(f"File group failed custom validation: {file_group}")
214
-
215
- return final_files
216
-
217
- def validate_file_group(self, file_group: Dict) -> bool:
218
- """Custom validation for individual file groups."""
219
- files_dict = file_group.get('files', {})
220
-
221
- for spec_name, file_path in files_dict.items():
222
- # Check file size limits
223
- if file_path.stat().st_size > 500 * 1024 * 1024: # 500MB limit
224
- return False
225
-
226
- # Check file accessibility
227
- if not os.access(file_path, os.R_OK):
228
- return False
229
-
230
- return True
231
- ```
232
-
233
- ## Advanced Features
234
-
235
- ### File Type Validation System
236
-
237
- The BaseUploader includes a sophisticated validation system that you can customize:
238
-
239
- #### Default File Extensions
240
-
241
- ```python
242
- def get_file_extensions_config(self) -> Dict[str, List[str]]:
243
- """Override to customize allowed file extensions."""
244
- return {
245
- 'pcd': ['.pcd'],
246
- 'text': ['.txt', '.html'],
247
- 'audio': ['.wav', '.mp3'],
248
- 'data': ['.bin', '.json', '.fbx'],
249
- 'image': ['.jpg', '.jpeg', '.png'],
250
- 'video': ['.mp4'],
251
- }
252
- ```
253
-
254
- #### Custom Extension Configuration
255
-
256
- ```python
257
- class CustomUploader(BaseUploader):
258
- def get_file_extensions_config(self) -> Dict[str, List[str]]:
259
- """Add support for additional formats."""
260
- config = super().get_file_extensions_config()
261
- config.update({
262
- 'cad': ['.dwg', '.dxf', '.step'],
263
- 'archive': ['.zip', '.rar', '.7z'],
264
- 'document': ['.pdf', '.docx', '.xlsx']
265
- })
266
- return config
267
- ```
268
-
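Neither snippet above shows how the extension map is actually consumed during validation. The helper below is one plausible reading, written for illustration only; the real `validate_file_types` step in BaseUploader may behave differently.

```python
# Illustrative only: one plausible way the extension map could be used to
# filter file groups. The real validate_file_types in BaseUploader may differ.
from pathlib import Path
from typing import Dict, List


def filter_by_extension_config(organized_files: List[Dict],
                               extensions_config: Dict[str, List[str]],
                               file_specification: List[Dict]) -> List[Dict]:
    # Map each spec entry name to its declared file_type, e.g. 'image_data' -> 'image'.
    spec_types = {spec['name']: spec.get('file_type') for spec in file_specification}

    valid = []
    for group in organized_files:
        ok = True
        for spec_name, file_path in group.get('files', {}).items():
            allowed = extensions_config.get(spec_types.get(spec_name), [])
            if allowed and Path(file_path).suffix.lower() not in allowed:
                ok = False
                break
        if ok:
            valid.append(group)
    return valid
```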
269
- #### Conversion Warnings
270
-
271
- ```python
272
- def get_conversion_warnings_config(self) -> Dict[str, str]:
273
- """Override to customize conversion warnings."""
274
- return {
275
- '.tif': ' .jpg, .png',
276
- '.tiff': ' .jpg, .png',
277
- '.avi': ' .mp4',
278
- '.mov': ' .mp4',
279
- '.raw': ' .jpg, .png',
280
- '.bmp': ' .jpg, .png',
281
- }
282
- ```
283
-
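This guide does not show where these warnings surface. The sketch below assumes the base class compares each file's suffix against the mapping and reports matches through `run.log_message`; treat it as an illustration rather than the actual implementation.

```python
# Assumption: one way the conversion-warning map could be applied. The actual
# BaseUploader behaviour is not documented in this guide.
from pathlib import Path
from typing import Dict, List


def emit_conversion_warnings(run, organized_files: List[Dict],
                             warnings_config: Dict[str, str]) -> None:
    for group in organized_files:
        for file_path in group.get('files', {}).values():
            suffix = Path(file_path).suffix.lower()
            if suffix in warnings_config:
                run.log_message(
                    f'{file_path}: consider converting {suffix} to '
                    f'{warnings_config[suffix].strip()}'
                )
```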
284
- ### Custom Filtering
285
-
286
- Implement the `filter_files` method for fine-grained control:
287
-
288
- ```python
289
- def filter_files(self, organized_file: Dict[str, Any]) -> bool:
290
- """Custom filtering logic."""
291
- # Filter by file size
292
- files_dict = organized_file.get('files', {})
293
- total_size = sum(f.stat().st_size for f in files_dict.values())
294
-
295
- if total_size < 1024: # Skip files smaller than 1KB
296
- self.run.log_message(f"Skipping small file group: {total_size} bytes")
297
- return False
298
-
299
- # Filter by file age
300
- oldest_file = min(files_dict.values(), key=lambda f: f.stat().st_mtime)
301
- age_days = (time.time() - oldest_file.stat().st_mtime) / 86400
302
-
303
- if age_days > 365: # Skip files older than 1 year
304
- self.run.log_message(f"Skipping old file group: {age_days} days old")
305
- return False
306
-
307
- return True
308
- ```
309
-
310
- ## Real-World Examples
311
-
312
- ### Example 1: Image Processing Plugin
313
-
314
- ```python
315
- class ImageProcessingUploader(BaseUploader):
316
- """Converts TIFF images to JPEG and generates thumbnails."""
317
-
318
- def setup_directories(self):
319
- """Create directories for processed images and thumbnails."""
320
- (self.path / 'processed').mkdir(exist_ok=True)
321
- (self.path / 'thumbnails').mkdir(exist_ok=True)
322
-
323
- def organize_files(self, files: List) -> List:
324
- """Separate raw and processed images."""
325
- raw_images = []
326
- processed_images = []
327
-
328
- for file_group in files:
329
- has_raw = any(
330
- f.suffix.lower() in ['.tif', '.tiff', '.raw']
331
- for f in file_group.get('files', {}).values()
332
- )
333
-
334
- if has_raw:
335
- raw_images.append(file_group)
336
- else:
337
- processed_images.append(file_group)
338
-
339
- # Process raw images first
340
- return raw_images + processed_images
341
-
342
- def process_files(self, organized_files: List) -> List:
343
- """Convert images and generate thumbnails."""
344
- processed_files = []
345
-
346
- for file_group in organized_files:
347
- files_dict = file_group.get('files', {})
348
- converted_files = {}
349
-
350
- for spec_name, file_path in files_dict.items():
351
- if file_path.suffix.lower() in ['.tif', '.tiff']:
352
- # Convert to JPEG
353
- jpeg_path = self.convert_to_jpeg(file_path)
354
- converted_files[spec_name] = jpeg_path
355
-
356
- # Generate thumbnail
357
- thumbnail_path = self.generate_thumbnail(jpeg_path)
358
- converted_files[f"{spec_name}_thumbnail"] = thumbnail_path
359
-
360
- self.run.log_message(f"Processed {file_path.name} -> {jpeg_path.name}")
361
- else:
362
- converted_files[spec_name] = file_path
363
-
364
- file_group['files'] = converted_files
365
- processed_files.append(file_group)
366
-
367
- return processed_files
368
-
369
- def convert_to_jpeg(self, tiff_path: Path) -> Path:
370
- """Convert TIFF to JPEG using PIL."""
371
- from PIL import Image
372
-
373
- output_path = self.path / 'processed' / f"{tiff_path.stem}.jpg"
374
-
375
- with Image.open(tiff_path) as img:
376
- # Convert to RGB if necessary
377
- if img.mode in ('RGBA', 'LA', 'P'):
378
- img = img.convert('RGB')
379
-
380
- img.save(output_path, 'JPEG', quality=95)
381
-
382
- return output_path
383
-
384
- def generate_thumbnail(self, image_path: Path) -> Path:
385
- """Generate thumbnail for processed image."""
386
- from PIL import Image
387
-
388
- thumbnail_path = self.path / 'thumbnails' / f"{image_path.stem}_thumb.jpg"
389
-
390
- with Image.open(image_path) as img:
391
- img.thumbnail((200, 200), Image.Resampling.LANCZOS)
392
- img.save(thumbnail_path, 'JPEG', quality=85)
393
-
394
- return thumbnail_path
395
- ```
396
-
397
- ### Example 2: Data Validation Plugin
398
-
399
- ```python
400
- class DataValidationUploader(BaseUploader):
401
- """Validates data files and generates quality reports."""
402
-
403
- def __init__(self, run, path: Path, file_specification: List = None,
404
- organized_files: List = None, extra_params: Dict = None):
405
- super().__init__(run, path, file_specification, organized_files, extra_params)
406
-
407
- # Initialize validation config from extra_params
408
- self.validation_config = (extra_params or {}).get('validation_config', {})
409
- self.strict_mode = (extra_params or {}).get('strict_validation', False)
410
-
411
- def before_process(self, organized_files: List) -> List:
412
- """Initialize validation engine."""
413
- self.validation_results = []
414
- self.run.log_message(f"Starting validation of {len(organized_files)} file groups")
415
- return organized_files
416
-
417
- def process_files(self, organized_files: List) -> List:
418
- """Validate files and generate quality reports."""
419
- processed_files = []
420
-
421
- for file_group in organized_files:
422
- validation_result = self.validate_file_group(file_group)
423
-
424
- # Add validation metadata
425
- file_group['validation'] = validation_result
426
- file_group['quality_score'] = validation_result['score']
427
-
428
- # Include file group based on validation results
429
- if self.should_include_file_group(validation_result):
430
- processed_files.append(file_group)
431
- self.run.log_message(f"File group passed validation: {validation_result['score']}")
432
- else:
433
- self.run.log_message(f"File group failed validation: {validation_result['errors']}")
434
-
435
- return processed_files
436
-
437
- def validate_file_group(self, file_group: Dict) -> Dict:
438
- """Comprehensive validation of file group."""
439
- files_dict = file_group.get('files', {})
440
- errors = []
441
- warnings = []
442
- score = 100
443
-
444
- for spec_name, file_path in files_dict.items():
445
- # File existence and accessibility
446
- if not file_path.exists():
447
- errors.append(f"File not found: {file_path}")
448
- score -= 50
449
- continue
450
-
451
- if not os.access(file_path, os.R_OK):
452
- errors.append(f"File not readable: {file_path}")
453
- score -= 30
454
- continue
455
-
456
- # File size validation
457
- file_size = file_path.stat().st_size
458
- if file_size == 0:
459
- errors.append(f"Empty file: {file_path}")
460
- score -= 40
461
- elif file_size > 1024 * 1024 * 1024: # 1GB
462
- warnings.append(f"Large file: {file_path} ({file_size} bytes)")
463
- score -= 10
464
-
465
- # Content validation based on extension
466
- try:
467
- if file_path.suffix.lower() == '.json':
468
- self.validate_json_file(file_path)
469
- elif file_path.suffix.lower() in ['.jpg', '.png']:
470
- self.validate_image_file(file_path)
471
- # Add more content validations as needed
472
- except Exception as e:
473
- errors.append(f"Content validation failed for {file_path}: {str(e)}")
474
- score -= 25
475
-
476
- return {
477
- 'score': max(0, score),
478
- 'errors': errors,
479
- 'warnings': warnings,
480
- 'validated_at': datetime.now().isoformat()
481
- }
482
-
483
- def should_include_file_group(self, validation_result: Dict) -> bool:
484
- """Determine if file group should be included based on validation."""
485
- if validation_result['errors'] and self.strict_mode:
486
- return False
487
-
488
- min_score = self.validation_config.get('min_score', 50)
489
- return validation_result['score'] >= min_score
490
-
491
- def validate_json_file(self, file_path: Path):
492
- """Validate JSON file structure."""
493
- import json
494
- with open(file_path, 'r') as f:
495
- json.load(f) # Will raise exception if invalid JSON
496
-
497
- def validate_image_file(self, file_path: Path):
498
- """Validate image file integrity."""
499
- from PIL import Image
500
- with Image.open(file_path) as img:
501
- img.verify() # Will raise exception if corrupted
502
- ```
503
-
504
- ### Example 3: Batch Processing Plugin
505
-
506
- ```python
507
- class BatchProcessingUploader(BaseUploader):
508
- """Processes files in configurable batches with progress tracking."""
509
-
510
- def __init__(self, run, path: Path, file_specification: List = None,
511
- organized_files: List = None, extra_params: Dict = None):
512
- super().__init__(run, path, file_specification, organized_files, extra_params)
513
-
514
- self.batch_size = (extra_params or {}).get('batch_size', 10)
515
- self.parallel_processing = (extra_params or {}).get('use_parallel', True)
516
- self.max_workers = (extra_params or {}).get('max_workers', 4)
517
-
518
- def organize_files(self, files: List) -> List:
519
- """Organize files into processing batches."""
520
- batches = []
521
- current_batch = []
522
-
523
- for file_group in files:
524
- current_batch.append(file_group)
525
-
526
- if len(current_batch) >= self.batch_size:
527
- batches.append({
528
- 'batch_id': len(batches) + 1,
529
- 'files': current_batch,
530
- 'batch_size': len(current_batch)
531
- })
532
- current_batch = []
533
-
534
- # Add remaining files as final batch
535
- if current_batch:
536
- batches.append({
537
- 'batch_id': len(batches) + 1,
538
- 'files': current_batch,
539
- 'batch_size': len(current_batch)
540
- })
541
-
542
- self.run.log_message(f"Organized {len(files)} files into {len(batches)} batches")
543
- return batches
544
-
545
- def process_files(self, organized_files: List) -> List:
546
- """Process files in batches with progress tracking."""
547
- all_processed_files = []
548
- total_batches = len(organized_files)
549
-
550
- if self.parallel_processing:
551
- all_processed_files = self.process_batches_parallel(organized_files)
552
- else:
553
- all_processed_files = self.process_batches_sequential(organized_files)
554
-
555
- self.run.log_message(f"Completed processing {total_batches} batches")
556
- return all_processed_files
557
-
558
- def process_batches_sequential(self, batches: List) -> List:
559
- """Process batches sequentially."""
560
- all_files = []
561
-
562
- for i, batch in enumerate(batches, 1):
563
- self.run.log_message(f"Processing batch {i}/{len(batches)}")
564
-
565
- processed_batch = self.process_single_batch(batch)
566
- all_files.extend(processed_batch)
567
-
568
- # Update progress
569
- progress = (i / len(batches)) * 100
570
- self.run.log_message(f"Progress: {progress:.1f}% complete")
571
-
572
- return all_files
573
-
574
- def process_batches_parallel(self, batches: List) -> List:
575
- """Process batches in parallel using ThreadPoolExecutor."""
576
- from concurrent.futures import ThreadPoolExecutor, as_completed
577
-
578
- all_files = []
579
- completed_batches = 0
580
-
581
- with ThreadPoolExecutor(max_workers=self.max_workers) as executor:
582
- # Submit all batches
583
- future_to_batch = {
584
- executor.submit(self.process_single_batch, batch): batch
585
- for batch in batches
586
- }
587
-
588
- # Process completed batches
589
- for future in as_completed(future_to_batch):
590
- batch = future_to_batch[future]
591
- try:
592
- processed_files = future.result()
593
- all_files.extend(processed_files)
594
- completed_batches += 1
595
-
596
- progress = (completed_batches / len(batches)) * 100
597
- self.run.log_message(f"Batch {batch['batch_id']} complete. Progress: {progress:.1f}%")
598
-
599
- except Exception as e:
600
- self.run.log_message(f"Batch {batch['batch_id']} failed: {str(e)}")
601
-
602
- return all_files
603
-
604
- def process_single_batch(self, batch: Dict) -> List:
605
- """Process a single batch of files."""
606
- batch_files = batch['files']
607
- processed_files = []
608
-
609
- for file_group in batch_files:
610
- # Apply your specific processing logic here
611
- processed_file = self.process_file_group(file_group)
612
- processed_files.append(processed_file)
613
-
614
- return processed_files
615
-
616
- def process_file_group(self, file_group: Dict) -> Dict:
617
- """Process individual file group - implement your logic here."""
618
- # Example: Add batch processing metadata
619
- file_group['batch_processed'] = True
620
- file_group['processed_timestamp'] = datetime.now().isoformat()
621
- return file_group
622
- ```
623
-
624
- ## Error Handling and Logging
625
-
626
- ### Comprehensive Error Handling
627
-
628
- ```python
629
- class RobustUploader(BaseUploader):
630
- def process_files(self, organized_files: List) -> List:
631
- """Process files with comprehensive error handling."""
632
- processed_files = []
633
- failed_files = []
634
-
635
- for i, file_group in enumerate(organized_files):
636
- try:
637
- self.run.log_message(f"Processing file group {i+1}/{len(organized_files)}")
638
-
639
- # Process with validation
640
- processed_file = self.process_file_group_safely(file_group)
641
- processed_files.append(processed_file)
642
-
643
- except Exception as e:
644
- error_info = {
645
- 'file_group': file_group,
646
- 'error': str(e),
647
- 'error_type': type(e).__name__,
648
- 'timestamp': datetime.now().isoformat()
649
- }
650
- failed_files.append(error_info)
651
-
652
- self.run.log_message(f"Failed to process file group: {str(e)}")
653
-
654
- # Continue processing other files
655
- continue
656
-
657
- # Log summary
658
- self.run.log_message(
659
- f"Processing complete: {len(processed_files)} successful, {len(failed_files)} failed"
660
- )
661
-
662
- if failed_files:
663
- # Save error report
664
- self.save_error_report(failed_files)
665
-
666
- return processed_files
667
-
668
- def process_file_group_safely(self, file_group: Dict) -> Dict:
669
- """Process file group with validation and error checking."""
670
- # Validate file group structure
671
- if 'files' not in file_group:
672
- raise ValueError("File group missing 'files' key")
673
-
674
- files_dict = file_group['files']
675
- if not files_dict:
676
- raise ValueError("File group has no files")
677
-
678
- # Validate file accessibility
679
- for spec_name, file_path in files_dict.items():
680
- if not file_path.exists():
681
- raise FileNotFoundError(f"File not found: {file_path}")
682
-
683
- if not os.access(file_path, os.R_OK):
684
- raise PermissionError(f"Cannot read file: {file_path}")
685
-
686
- # Perform actual processing
687
- return self.apply_processing_logic(file_group)
688
-
689
- def save_error_report(self, failed_files: List):
690
- """Save detailed error report for debugging."""
691
- error_report_path = self.path / 'error_report.json'
692
-
693
- report = {
694
- 'timestamp': datetime.now().isoformat(),
695
- 'plugin_name': self.__class__.__name__,
696
- 'total_errors': len(failed_files),
697
- 'errors': failed_files
698
- }
699
-
700
- with open(error_report_path, 'w') as f:
701
- json.dump(report, f, indent=2, default=str)
702
-
703
- self.run.log_message(f"Error report saved to: {error_report_path}")
704
- ```
705
-
706
- ### Structured Logging
707
-
708
- ```python
709
- class LoggingUploader(BaseUploader):
710
- def setup_directories(self):
711
- """Setup logging directory."""
712
- log_dir = self.path / 'logs'
713
- log_dir.mkdir(exist_ok=True)
714
-
715
- # Initialize structured logging
716
- self.setup_structured_logging(log_dir)
717
-
718
- def setup_structured_logging(self, log_dir: Path):
719
- """Setup structured logging with different levels."""
720
- import logging
721
- import json
722
-
723
- # Create custom formatter for structured logs
724
- class StructuredFormatter(logging.Formatter):
725
- def format(self, record):
726
- log_entry = {
727
- 'timestamp': datetime.now().isoformat(),
728
- 'level': record.levelname,
729
- 'message': record.getMessage(),
730
- 'plugin': 'LoggingUploader'
731
- }
732
-
733
- # Add extra fields if present
734
- if hasattr(record, 'file_path'):
735
- log_entry['file_path'] = str(record.file_path)
736
- if hasattr(record, 'operation'):
737
- log_entry['operation'] = record.operation
738
- if hasattr(record, 'duration'):
739
- log_entry['duration'] = record.duration
740
-
741
- return json.dumps(log_entry)
742
-
743
- # Setup logger
744
- self.logger = logging.getLogger('upload_plugin')
745
- self.logger.setLevel(logging.INFO)
746
-
747
- # File handler
748
- handler = logging.FileHandler(log_dir / 'plugin.log')
749
- handler.setFormatter(StructuredFormatter())
750
- self.logger.addHandler(handler)
751
-
752
- def process_files(self, organized_files: List) -> List:
753
- """Process files with detailed logging."""
754
- start_time = time.time()
755
-
756
- self.logger.info(
757
- f"Starting file processing",
758
- extra={'operation': 'process_files', 'file_count': len(organized_files)}
759
- )
760
-
761
- processed_files = []
762
-
763
- for i, file_group in enumerate(organized_files):
764
- file_start_time = time.time()
765
-
766
- try:
767
- # Process file group
768
- processed_file = self.process_file_group(file_group)
769
- processed_files.append(processed_file)
770
-
771
- # Log success
772
- duration = time.time() - file_start_time
773
- self.logger.info(
774
- f"Successfully processed file group {i+1}",
775
- extra={
776
- 'operation': 'process_file_group',
777
- 'file_group_index': i,
778
- 'duration': duration
779
- }
780
- )
781
-
782
- except Exception as e:
783
- # Log error
784
- duration = time.time() - file_start_time
785
- self.logger.error(
786
- f"Failed to process file group {i+1}: {str(e)}",
787
- extra={
788
- 'operation': 'process_file_group',
789
- 'file_group_index': i,
790
- 'duration': duration,
791
- 'error': str(e)
792
- }
793
- )
794
- raise
795
-
796
- # Log overall completion
797
- total_duration = time.time() - start_time
798
- self.logger.info(
799
- f"Completed file processing",
800
- extra={
801
- 'operation': 'process_files',
802
- 'total_duration': total_duration,
803
- 'processed_count': len(processed_files)
804
- }
805
- )
806
-
807
- return processed_files
808
- ```
809
-
810
- ## Performance Optimization
811
-
812
- ### Memory Management
813
-
814
- ```python
815
- class MemoryEfficientUploader(BaseUploader):
816
- """Uploader optimized for large file processing."""
817
-
818
- def __init__(self, run, path: Path, file_specification: List = None,
819
- organized_files: List = None, extra_params: Dict = None):
820
- super().__init__(run, path, file_specification, organized_files, extra_params)
821
-
822
- self.chunk_size = (extra_params or {}).get('chunk_size', 8192)  # 8KB chunks
823
- self.memory_limit = (extra_params or {}).get('memory_limit_mb', 100) * 1024 * 1024
824
-
825
- def process_files(self, organized_files: List) -> List:
826
- """Process files with memory management."""
827
- import psutil
828
- import gc
829
-
830
- processed_files = []
831
-
832
- for file_group in organized_files:
833
- # Check memory usage before processing
834
- memory_usage = psutil.Process().memory_info().rss
835
-
836
- if memory_usage > self.memory_limit:
837
- self.run.log_message(f"High memory usage: {memory_usage / 1024 / 1024:.1f}MB")
838
-
839
- # Force garbage collection
840
- gc.collect()
841
-
842
- # Check again after cleanup
843
- memory_usage = psutil.Process().memory_info().rss
844
- if memory_usage > self.memory_limit:
845
- self.run.log_message("Memory limit exceeded, processing in smaller chunks")
846
- processed_file = self.process_file_group_chunked(file_group)
847
- else:
848
- processed_file = self.process_file_group_normal(file_group)
849
- else:
850
- processed_file = self.process_file_group_normal(file_group)
851
-
852
- processed_files.append(processed_file)
853
-
854
- return processed_files
855
-
856
- def process_file_group_chunked(self, file_group: Dict) -> Dict:
857
- """Process large files in chunks to manage memory."""
858
- files_dict = file_group.get('files', {})
859
- processed_files = {}
860
-
861
- for spec_name, file_path in files_dict.items():
862
- if file_path.stat().st_size > 50 * 1024 * 1024: # 50MB
863
- # Process large files in chunks
864
- processed_path = self.process_large_file_chunked(file_path)
865
- processed_files[spec_name] = processed_path
866
- else:
867
- # Process smaller files normally
868
- processed_files[spec_name] = file_path
869
-
870
- file_group['files'] = processed_files
871
- return file_group
872
-
873
- def process_large_file_chunked(self, file_path: Path) -> Path:
874
- """Process large file in chunks."""
875
- output_path = self.path / 'processed' / file_path.name
876
-
877
- with open(file_path, 'rb') as infile, open(output_path, 'wb') as outfile:
878
- while True:
879
- chunk = infile.read(self.chunk_size)
880
- if not chunk:
881
- break
882
-
883
- # Apply processing to chunk
884
- processed_chunk = self.process_chunk(chunk)
885
- outfile.write(processed_chunk)
886
-
887
- return output_path
888
-
889
- def process_chunk(self, chunk: bytes) -> bytes:
890
- """Process individual chunk - override with your logic."""
891
- # Example: Simple pass-through
892
- return chunk
893
- ```
894
-
895
- ### Async Processing
896
-
897
- ```python
898
- import asyncio
899
- from concurrent.futures import ProcessPoolExecutor
900
-
901
- class AsyncUploader(BaseUploader):
902
- """Uploader with asynchronous processing capabilities."""
903
-
904
- def __init__(self, run, path: Path, file_specification: List = None,
905
- organized_files: List = None, extra_params: Dict = None):
906
- super().__init__(run, path, file_specification, organized_files, extra_params)
907
-
908
- self.max_concurrent = (extra_params or {}).get('max_concurrent', 5)
909
- self.use_process_pool = (extra_params or {}).get('use_process_pool', False)
910
-
911
- def process_files(self, organized_files: List) -> List:
912
- """Process files asynchronously."""
913
- # Run async processing in sync context
914
- return asyncio.run(self._process_files_async(organized_files))
915
-
916
- async def _process_files_async(self, organized_files: List) -> List:
917
- """Main async processing method."""
918
- if self.use_process_pool:
919
- return await self._process_with_process_pool(organized_files)
920
- else:
921
- return await self._process_with_async_tasks(organized_files)
922
-
923
- async def _process_with_async_tasks(self, organized_files: List) -> List:
924
- """Process using async tasks with concurrency limit."""
925
- semaphore = asyncio.Semaphore(self.max_concurrent)
926
-
927
- async def process_with_semaphore(file_group):
928
- async with semaphore:
929
- return await self._process_file_group_async(file_group)
930
-
931
- # Create tasks for all file groups
932
- tasks = [
933
- process_with_semaphore(file_group)
934
- for file_group in organized_files
935
- ]
936
-
937
- # Wait for all tasks to complete
938
- processed_files = await asyncio.gather(*tasks, return_exceptions=True)
939
-
940
- # Filter out exceptions and log errors
941
- valid_files = []
942
- for i, result in enumerate(processed_files):
943
- if isinstance(result, Exception):
944
- self.run.log_message(f"Error processing file group {i}: {str(result)}")
945
- else:
946
- valid_files.append(result)
947
-
948
- return valid_files
949
-
950
- async def _process_with_process_pool(self, organized_files: List) -> List:
951
- """Process using process pool for CPU-intensive tasks."""
952
- loop = asyncio.get_event_loop()
953
-
954
- with ProcessPoolExecutor(max_workers=self.max_concurrent) as executor:
955
- # Submit all tasks to process pool
956
- futures = [
957
- loop.run_in_executor(executor, self._process_file_group_sync, file_group)
958
- for file_group in organized_files
959
- ]
960
-
961
- # Wait for completion
962
- processed_files = await asyncio.gather(*futures, return_exceptions=True)
963
-
964
- # Filter exceptions
965
- valid_files = []
966
- for i, result in enumerate(processed_files):
967
- if isinstance(result, Exception):
968
- self.run.log_message(f"Error in process pool for file group {i}: {str(result)}")
969
- else:
970
- valid_files.append(result)
971
-
972
- return valid_files
973
-
974
- async def _process_file_group_async(self, file_group: Dict) -> Dict:
975
- """Async processing of individual file group."""
976
- # Simulate async I/O operation
977
- await asyncio.sleep(0.1)
978
-
979
- # Apply your processing logic here
980
- file_group['async_processed'] = True
981
- file_group['processed_timestamp'] = datetime.now().isoformat()
982
-
983
- return file_group
984
-
985
- def _process_file_group_sync(self, file_group: Dict) -> Dict:
986
- """Synchronous processing for process pool."""
987
- # This runs in a separate process
988
- import time
989
- time.sleep(0.1) # Simulate CPU work
990
-
991
- file_group['process_pool_processed'] = True
992
- file_group['processed_timestamp'] = datetime.now().isoformat()
993
-
994
- return file_group
995
- ```
996
-
997
- ## Testing and Debugging
998
-
999
- ### Unit Testing Framework
1000
-
1001
- ```python
1002
- import unittest
1003
- from unittest.mock import Mock, patch, MagicMock
1004
- from pathlib import Path
1005
- import tempfile
1006
- import shutil
1007
-
1008
- class TestMyUploader(unittest.TestCase):
1009
- """Test suite for custom uploader."""
1010
-
1011
- def setUp(self):
1012
- """Set up test environment."""
1013
- # Create temporary directory
1014
- self.temp_dir = Path(tempfile.mkdtemp())
1015
-
1016
- # Mock run object
1017
- self.mock_run = Mock()
1018
- self.mock_run.log_message = Mock()
1019
-
1020
- # Sample file specification
1021
- self.file_specification = [
1022
- {'name': 'image_data', 'file_type': 'image'},
1023
- {'name': 'text_data', 'file_type': 'text'}
1024
- ]
1025
-
1026
- # Create test files
1027
- self.test_files = self.create_test_files()
1028
-
1029
- # Sample organized files
1030
- self.organized_files = [
1031
- {
1032
- 'files': {
1033
- 'image_data': self.test_files['image'],
1034
- 'text_data': self.test_files['text']
1035
- },
1036
- 'metadata': {'group_id': 1}
1037
- }
1038
- ]
1039
-
1040
- def tearDown(self):
1041
- """Clean up test environment."""
1042
- shutil.rmtree(self.temp_dir)
1043
-
1044
- def create_test_files(self) -> Dict[str, Path]:
1045
- """Create test files for testing."""
1046
- files = {}
1047
-
1048
- # Create test image file
1049
- image_file = self.temp_dir / 'test_image.jpg'
1050
- with open(image_file, 'wb') as f:
1051
- f.write(b'fake_image_data')
1052
- files['image'] = image_file
1053
-
1054
- # Create test text file
1055
- text_file = self.temp_dir / 'test_text.txt'
1056
- with open(text_file, 'w') as f:
1057
- f.write('test content')
1058
- files['text'] = text_file
1059
-
1060
- return files
1061
-
1062
- def test_initialization(self):
1063
- """Test uploader initialization."""
1064
- uploader = MyUploader(
1065
- run=self.mock_run,
1066
- path=self.temp_dir,
1067
- file_specification=self.file_specification,
1068
- organized_files=self.organized_files
1069
- )
1070
-
1071
- self.assertEqual(uploader.path, self.temp_dir)
1072
- self.assertEqual(uploader.file_specification, self.file_specification)
1073
- self.assertEqual(uploader.organized_files, self.organized_files)
1074
-
1075
- def test_process_files(self):
1076
- """Test process_files method."""
1077
- uploader = MyUploader(
1078
- run=self.mock_run,
1079
- path=self.temp_dir,
1080
- file_specification=self.file_specification,
1081
- organized_files=self.organized_files
1082
- )
1083
-
1084
- result = uploader.process_files(self.organized_files)
1085
-
1086
- # Verify result structure
1087
- self.assertIsInstance(result, list)
1088
- self.assertEqual(len(result), 1)
1089
-
1090
- # Verify processing occurred
1091
- processed_file = result[0]
1092
- self.assertIn('processed_by', processed_file)
1093
- self.assertEqual(processed_file['processed_by'], 'MyUploader')
1094
-
1095
- def test_handle_upload_files_workflow(self):
1096
- """Test complete workflow."""
1097
- uploader = MyUploader(
1098
- run=self.mock_run,
1099
- path=self.temp_dir,
1100
- file_specification=self.file_specification,
1101
- organized_files=self.organized_files
1102
- )
1103
-
1104
- # Mock workflow methods
1105
- with patch.object(uploader, 'setup_directories') as mock_setup, \
1106
- patch.object(uploader, 'organize_files', return_value=self.organized_files) as mock_organize, \
1107
- patch.object(uploader, 'before_process', return_value=self.organized_files) as mock_before, \
1108
- patch.object(uploader, 'process_files', return_value=self.organized_files) as mock_process, \
1109
- patch.object(uploader, 'after_process', return_value=self.organized_files) as mock_after, \
1110
- patch.object(uploader, 'validate_files', return_value=self.organized_files) as mock_validate:
1111
-
1112
- result = uploader.handle_upload_files()
1113
-
1114
- # Verify all methods were called in correct order
1115
- mock_setup.assert_called_once()
1116
- mock_organize.assert_called_once()
1117
- mock_before.assert_called_once()
1118
- mock_process.assert_called_once()
1119
- mock_after.assert_called_once()
1120
- mock_validate.assert_called_once()
1121
-
1122
- self.assertEqual(result, self.organized_files)
1123
-
1124
- def test_error_handling(self):
1125
- """Test error handling in process_files."""
1126
- uploader = MyUploader(
1127
- run=self.mock_run,
1128
- path=self.temp_dir,
1129
- file_specification=self.file_specification,
1130
- organized_files=self.organized_files
1131
- )
1132
-
1133
- # Test with invalid file group
1134
- invalid_files = [{'invalid': 'structure'}]
1135
-
1136
- with self.assertRaises(Exception):
1137
- uploader.process_files(invalid_files)
1138
-
1139
- @patch('your_module.some_external_dependency')
1140
- def test_external_dependencies(self, mock_dependency):
1141
- """Test integration with external dependencies."""
1142
- mock_dependency.return_value = 'mocked_result'
1143
-
1144
- uploader = MyUploader(
1145
- run=self.mock_run,
1146
- path=self.temp_dir,
1147
- file_specification=self.file_specification,
1148
- organized_files=self.organized_files
1149
- )
1150
-
1151
- # Test method that uses external dependency
1152
- result = uploader.some_method_using_dependency()
1153
-
1154
- mock_dependency.assert_called_once()
1155
- self.assertEqual(result, 'expected_result_based_on_mock')
1156
-
1157
- if __name__ == '__main__':
1158
- # Run the test suite
1159
- unittest.main()
1160
- ```
1161
-
1162
- ### Integration Testing
1163
-
1164
- ```python
1165
- class TestUploaderIntegration(unittest.TestCase):
1166
- """Integration tests for uploader with real file operations."""
1167
-
1168
- def setUp(self):
1169
- """Set up integration test environment."""
1170
- self.temp_dir = Path(tempfile.mkdtemp())
1171
- self.mock_run = Mock()
1172
-
1173
- # Create realistic test files
1174
- self.create_realistic_test_files()
1175
-
1176
- def create_realistic_test_files(self):
1177
- """Create realistic test files for integration testing."""
1178
- # Create various file types
1179
- (self.temp_dir / 'images').mkdir()
1180
- (self.temp_dir / 'data').mkdir()
1181
-
1182
- # TIFF image that can be actually processed
1183
- tiff_path = self.temp_dir / 'images' / 'test.tif'
1184
- # Create a minimal valid TIFF file
1185
- self.create_minimal_tiff(tiff_path)
1186
-
1187
- # JSON data file
1188
- json_path = self.temp_dir / 'data' / 'test.json'
1189
- with open(json_path, 'w') as f:
1190
- json.dump({'test': 'data', 'values': [1, 2, 3]}, f)
1191
-
1192
- self.test_files = {
1193
- 'image_file': tiff_path,
1194
- 'data_file': json_path
1195
- }
1196
-
1197
- def create_minimal_tiff(self, path: Path):
1198
- """Create a minimal valid TIFF file for testing."""
1199
- try:
1200
- from PIL import Image
1201
- import numpy as np
1202
-
1203
- # Create a small test image
1204
- array = np.zeros((50, 50, 3), dtype=np.uint8)
1205
- array[10:40, 10:40] = [255, 0, 0] # Red square
1206
-
1207
- image = Image.fromarray(array)
1208
- image.save(path, 'TIFF')
1209
- except ImportError:
1210
- # Fallback: create empty file if PIL not available
1211
- path.touch()
1212
-
1213
- def test_full_workflow_with_real_files(self):
1214
- """Test complete workflow with real file operations."""
1215
- file_specification = [
1216
- {'name': 'test_image', 'file_type': 'image'},
1217
- {'name': 'test_data', 'file_type': 'data'}
1218
- ]
1219
-
1220
- organized_files = [
1221
- {
1222
- 'files': {
1223
- 'test_image': self.test_files['image_file'],
1224
- 'test_data': self.test_files['data_file']
1225
- }
1226
- }
1227
- ]
1228
-
1229
- uploader = ImageProcessingUploader(
1230
- run=self.mock_run,
1231
- path=self.temp_dir,
1232
- file_specification=file_specification,
1233
- organized_files=organized_files
1234
- )
1235
-
1236
- # Run complete workflow
1237
- result = uploader.handle_upload_files()
1238
-
1239
- # Verify results
1240
- self.assertIsInstance(result, list)
1241
- self.assertTrue(len(result) > 0)
1242
-
1243
- # Check if processing directories were created
1244
- self.assertTrue((self.temp_dir / 'processed').exists())
1245
- self.assertTrue((self.temp_dir / 'thumbnails').exists())
1246
-
1247
- # Verify logging calls
1248
- self.assertTrue(self.mock_run.log_message.called)
1249
- ```
1250
-
1251
- ### Debugging Utilities
1252
-
1253
- ```python
1254
- class DebuggingUploader(BaseUploader):
1255
- """Uploader with enhanced debugging capabilities."""
1256
-
1257
- def __init__(self, run, path: Path, file_specification: List = None,
1258
- organized_files: List = None, extra_params: Dict = None):
1259
- super().__init__(run, path, file_specification, organized_files, extra_params)
1260
-
1261
- self.debug_mode = (extra_params or {}).get('debug_mode', False)
1262
- self.debug_dir = self.path / 'debug'
1263
-
1264
- if self.debug_mode:
1265
- self.debug_dir.mkdir(exist_ok=True)
1266
- self.setup_debugging()
1267
-
1268
- def setup_debugging(self):
1269
- """Initialize debugging infrastructure."""
1270
- import json
1271
-
1272
- # Save initialization state
1273
- init_state = {
1274
- 'path': str(self.path),
1275
- 'file_specification': self.file_specification,
1276
- 'organized_files_count': len(self.organized_files),
1277
- 'extra_params': self.extra_params,
1278
- 'timestamp': datetime.now().isoformat()
1279
- }
1280
-
1281
- with open(self.debug_dir / 'init_state.json', 'w') as f:
1282
- json.dump(init_state, f, indent=2, default=str)
1283
-
1284
- def debug_log(self, message: str, data: Any = None):
1285
- """Enhanced debug logging."""
1286
- if not self.debug_mode:
1287
- return
1288
-
1289
- debug_entry = {
1290
- 'timestamp': datetime.now().isoformat(),
1291
- 'message': message,
1292
- 'data': data
1293
- }
1294
-
1295
- # Write to debug log
1296
- debug_log_path = self.debug_dir / 'debug.log'
1297
- with open(debug_log_path, 'a') as f:
1298
- f.write(json.dumps(debug_entry, default=str) + '\n')
1299
-
1300
- # Also log to main run
1301
- self.run.log_message(f"DEBUG: {message}")
1302
-
1303
- def setup_directories(self):
1304
- """Setup directories with debugging."""
1305
- self.debug_log("Setting up directories")
1306
- super().setup_directories()
1307
-
1308
- if self.debug_mode:
1309
- # Save directory state
1310
- dirs_state = {
1311
- 'existing_dirs': [str(p) for p in self.path.iterdir() if p.is_dir()],
1312
- 'path_exists': self.path.exists(),
1313
- 'path_writable': os.access(self.path, os.W_OK)
1314
- }
1315
- self.debug_log("Directory setup complete", dirs_state)
1316
-
1317
- def process_files(self, organized_files: List) -> List:
1318
- """Process files with debugging instrumentation."""
1319
- self.debug_log(f"Starting process_files with {len(organized_files)} file groups")
1320
-
1321
- # Save input state
1322
- if self.debug_mode:
1323
- with open(self.debug_dir / 'input_files.json', 'w') as f:
1324
- json.dump(organized_files, f, indent=2, default=str)
1325
-
1326
- processed_files = []
1327
-
1328
- for i, file_group in enumerate(organized_files):
1329
- self.debug_log(f"Processing file group {i+1}")
1330
-
1331
- try:
1332
- # Process with timing
1333
- start_time = time.time()
1334
- processed_file = self.process_file_group_with_debug(file_group, i)
1335
- duration = time.time() - start_time
1336
-
1337
- processed_files.append(processed_file)
1338
- self.debug_log(f"File group {i+1} processed successfully", {'duration': duration})
1339
-
1340
- except Exception as e:
1341
- error_data = {
1342
- 'file_group_index': i,
1343
- 'error': str(e),
1344
- 'error_type': type(e).__name__,
1345
- 'file_group': file_group
1346
- }
1347
- self.debug_log(f"Error processing file group {i+1}", error_data)
1348
-
1349
- # Save error state
1350
- if self.debug_mode:
1351
- with open(self.debug_dir / f'error_group_{i}.json', 'w') as f:
1352
- json.dump(error_data, f, indent=2, default=str)
1353
-
1354
- raise
1355
-
1356
- # Save output state
1357
- if self.debug_mode:
1358
- with open(self.debug_dir / 'output_files.json', 'w') as f:
1359
- json.dump(processed_files, f, indent=2, default=str)
1360
-
1361
- self.debug_log(f"process_files completed with {len(processed_files)} processed files")
1362
- return processed_files
1363
-
1364
- def process_file_group_with_debug(self, file_group: Dict, index: int) -> Dict:
1365
- """Process individual file group with debugging."""
1366
- if self.debug_mode:
1367
- # Save intermediate state
1368
- with open(self.debug_dir / f'group_{index}_input.json', 'w') as f:
1369
- json.dump(file_group, f, indent=2, default=str)
1370
-
1371
- # Apply your processing logic
1372
- processed_group = self.apply_custom_processing(file_group)
1373
-
1374
- if self.debug_mode:
1375
- # Save result state
1376
- with open(self.debug_dir / f'group_{index}_output.json', 'w') as f:
1377
- json.dump(processed_group, f, indent=2, default=str)
1378
-
1379
- return processed_group
1380
-
1381
- def apply_custom_processing(self, file_group: Dict) -> Dict:
1382
- """Your custom processing logic - implement as needed."""
1383
- # Example implementation
1384
- file_group['debug_processed'] = True
1385
- file_group['processing_timestamp'] = datetime.now().isoformat()
1386
- return file_group
1387
-
1388
- def generate_debug_report(self):
1389
- """Generate comprehensive debug report."""
1390
- if not self.debug_mode:
1391
- return
1392
-
1393
- report = {
1394
- 'plugin_name': self.__class__.__name__,
1395
- 'debug_session': datetime.now().isoformat(),
1396
- 'files_processed': 0,
1397
- 'errors': [],
1398
- 'performance': {}
1399
- }
1400
-
1401
- # Analyze debug files
1402
- for debug_file in self.debug_dir.glob('*.json'):
1403
- if debug_file.name.startswith('error_'):
1404
- with open(debug_file) as f:
1405
- error_data = json.load(f)
1406
- report['errors'].append(error_data)
1407
- elif debug_file.name == 'output_files.json':
1408
- with open(debug_file) as f:
1409
- output_data = json.load(f)
1410
- report['files_processed'] = len(output_data)
1411
-
1412
- # Save final report
1413
- with open(self.debug_dir / 'debug_report.json', 'w') as f:
1414
- json.dump(report, f, indent=2, default=str)
1415
-
1416
- self.run.log_message(f"Debug report generated at: {self.debug_dir / 'debug_report.json'}")
1417
- ```
1418
-
1419
- ## Best Practices Summary
1420
-
1421
- ### 1. Code Organization
1422
- - Keep `process_files()` focused on core logic
1423
- - Use hook methods for setup, cleanup, and validation
1424
- - Separate concerns using helper methods
1425
- - Follow single responsibility principle
1426
-
1427
- ### 2. Error Handling
1428
- - Implement comprehensive error handling
1429
- - Log errors with context information
1430
- - Fail gracefully when possible
1431
- - Provide meaningful error messages
1432
-
1433
- ### 3. Performance
1434
- Profile your processing logic (see the profiling sketch below)
1435
- - Use appropriate data structures
1436
- - Consider memory usage for large files
1437
- - Implement async processing for I/O-heavy operations
1438
-
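For the profiling point above, a quick way to measure where `process_files()` spends its time during development is the standard-library `cProfile`; the `uploader` and `organized_files` names below are assumed to exist in your test harness.

```python
# Development-time profiling sketch; `uploader` and `organized_files` are
# assumed to already exist in your test harness.
import cProfile
import pstats

profiler = cProfile.Profile()
profiler.enable()
uploader.process_files(organized_files)
profiler.disable()

# Show the 15 most expensive calls by cumulative time.
pstats.Stats(profiler).sort_stats('cumulative').print_stats(15)
```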
1439
- ### 4. Testing
1440
- - Write unit tests for all methods
1441
- - Include integration tests with real files
1442
- - Test error conditions and edge cases
1443
- - Use mocking for external dependencies
1444
-
1445
- ### 5. Logging
1446
- - Log important operations and milestones
1447
- - Include timing information for performance analysis
1448
- - Use structured logging for better analysis
1449
- - Provide different log levels (info, warning, error)
1450
-
1451
- ### 6. Configuration
1452
- - Use `extra_params` for plugin configuration
1453
- - Provide sensible defaults
1454
- Validate configuration parameters (see the sketch below)
1455
- - Document all configuration options
1456
-
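A small sketch of how these configuration practices can be applied to `extra_params`; the keys and limits below are examples chosen for this guide, not a schema required by the SDK.

```python
# Example of defaulting and validating extra_params; the keys shown here are
# illustrative, not a schema required by the SDK.
from typing import Any, Dict


def resolve_config(extra_params: Dict[str, Any] = None) -> Dict[str, Any]:
    params = extra_params or {}

    config = {
        'batch_size': params.get('batch_size', 10),          # sensible default
        'strict_validation': params.get('strict_validation', False),
        'max_workers': params.get('max_workers', 4),
    }

    # Validate configuration parameters before the workflow starts.
    if not isinstance(config['batch_size'], int) or config['batch_size'] < 1:
        raise ValueError('batch_size must be a positive integer')
    if not isinstance(config['max_workers'], int) or config['max_workers'] < 1:
        raise ValueError('max_workers must be a positive integer')

    return config
```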
1457
- ### 7. Documentation
1458
- - Document all methods with clear docstrings
1459
- - Provide usage examples
1460
- - Document configuration options
1461
- - Include troubleshooting information
1462
-
1463
- This comprehensive guide should help you develop robust, efficient, and maintainable upload plugins using the BaseUploader template. Remember to adapt the examples to your specific use case and requirements.