synapse-sdk 2025.10.1__py3-none-any.whl → 2025.10.4__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

This version of synapse-sdk might be problematic.

Files changed (54)
  1. synapse_sdk/devtools/docs/docs/plugins/categories/pre-annotation-plugins/pre-annotation-plugin-overview.md +198 -0
  2. synapse_sdk/devtools/docs/docs/plugins/categories/pre-annotation-plugins/to-task-action-development.md +1645 -0
  3. synapse_sdk/devtools/docs/docs/plugins/categories/pre-annotation-plugins/to-task-overview.md +717 -0
  4. synapse_sdk/devtools/docs/docs/plugins/categories/pre-annotation-plugins/to-task-template-development.md +1380 -0
  5. synapse_sdk/devtools/docs/docs/plugins/categories/upload-plugins/upload-plugin-action.md +934 -0
  6. synapse_sdk/devtools/docs/docs/plugins/categories/upload-plugins/upload-plugin-overview.md +560 -0
  7. synapse_sdk/devtools/docs/docs/plugins/categories/upload-plugins/upload-plugin-template.md +715 -0
  8. synapse_sdk/devtools/docs/docs/plugins/plugins.md +12 -5
  9. synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/categories/pre-annotation-plugins/pre-annotation-plugin-overview.md +198 -0
  10. synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/categories/pre-annotation-plugins/to-task-action-development.md +1645 -0
  11. synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/categories/pre-annotation-plugins/to-task-overview.md +717 -0
  12. synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/categories/pre-annotation-plugins/to-task-template-development.md +1380 -0
  13. synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/categories/upload-plugins/upload-plugin-action.md +934 -0
  14. synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/categories/upload-plugins/upload-plugin-overview.md +560 -0
  15. synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/categories/upload-plugins/upload-plugin-template.md +715 -0
  16. synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current.json +16 -4
  17. synapse_sdk/devtools/docs/sidebars.ts +27 -1
  18. synapse_sdk/plugins/README.md +487 -80
  19. synapse_sdk/plugins/categories/export/actions/export/action.py +8 -3
  20. synapse_sdk/plugins/categories/export/actions/export/utils.py +108 -8
  21. synapse_sdk/plugins/categories/pre_annotation/actions/__init__.py +4 -0
  22. synapse_sdk/plugins/categories/pre_annotation/actions/pre_annotation/__init__.py +3 -0
  23. synapse_sdk/plugins/categories/pre_annotation/actions/pre_annotation/action.py +10 -0
  24. synapse_sdk/plugins/categories/pre_annotation/actions/to_task/__init__.py +28 -0
  25. synapse_sdk/plugins/categories/pre_annotation/actions/to_task/action.py +145 -0
  26. synapse_sdk/plugins/categories/pre_annotation/actions/to_task/enums.py +269 -0
  27. synapse_sdk/plugins/categories/pre_annotation/actions/to_task/exceptions.py +14 -0
  28. synapse_sdk/plugins/categories/pre_annotation/actions/to_task/factory.py +76 -0
  29. synapse_sdk/plugins/categories/pre_annotation/actions/to_task/models.py +97 -0
  30. synapse_sdk/plugins/categories/pre_annotation/actions/to_task/orchestrator.py +250 -0
  31. synapse_sdk/plugins/categories/pre_annotation/actions/to_task/run.py +64 -0
  32. synapse_sdk/plugins/categories/pre_annotation/actions/to_task/strategies/__init__.py +17 -0
  33. synapse_sdk/plugins/categories/pre_annotation/actions/to_task/strategies/annotation.py +284 -0
  34. synapse_sdk/plugins/categories/pre_annotation/actions/to_task/strategies/base.py +170 -0
  35. synapse_sdk/plugins/categories/pre_annotation/actions/to_task/strategies/extraction.py +83 -0
  36. synapse_sdk/plugins/categories/pre_annotation/actions/to_task/strategies/metrics.py +87 -0
  37. synapse_sdk/plugins/categories/pre_annotation/actions/to_task/strategies/preprocessor.py +127 -0
  38. synapse_sdk/plugins/categories/pre_annotation/actions/to_task/strategies/validation.py +143 -0
  39. synapse_sdk/plugins/categories/upload/actions/upload/__init__.py +2 -1
  40. synapse_sdk/plugins/categories/upload/actions/upload/models.py +134 -94
  41. synapse_sdk/plugins/categories/upload/actions/upload/steps/cleanup.py +2 -2
  42. synapse_sdk/plugins/categories/upload/actions/upload/steps/metadata.py +106 -14
  43. synapse_sdk/plugins/categories/upload/actions/upload/steps/organize.py +113 -36
  44. synapse_sdk/plugins/categories/upload/templates/README.md +365 -0
  45. {synapse_sdk-2025.10.1.dist-info → synapse_sdk-2025.10.4.dist-info}/METADATA +1 -1
  46. {synapse_sdk-2025.10.1.dist-info → synapse_sdk-2025.10.4.dist-info}/RECORD +50 -22
  47. synapse_sdk/devtools/docs/docs/plugins/developing-upload-template.md +0 -1463
  48. synapse_sdk/devtools/docs/docs/plugins/upload-plugins.md +0 -1964
  49. synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/developing-upload-template.md +0 -1463
  50. synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/upload-plugins.md +0 -2077
  51. {synapse_sdk-2025.10.1.dist-info → synapse_sdk-2025.10.4.dist-info}/WHEEL +0 -0
  52. {synapse_sdk-2025.10.1.dist-info → synapse_sdk-2025.10.4.dist-info}/entry_points.txt +0 -0
  53. {synapse_sdk-2025.10.1.dist-info → synapse_sdk-2025.10.4.dist-info}/licenses/LICENSE +0 -0
  54. {synapse_sdk-2025.10.1.dist-info → synapse_sdk-2025.10.4.dist-info}/top_level.txt +0 -0
@@ -1,1964 +0,0 @@
- ---
- id: upload-plugins
- title: Upload Plugins
- sidebar_position: 3
- ---
-
- # Upload Plugins
-
- Upload plugins provide file upload and data ingestion operations for processing files into the Synapse platform with comprehensive metadata support, security validation, and organized data unit generation.
-
- ## Overview
-
- **Available Actions:**
-
- `upload` - Upload files and directories to storage with optional Excel metadata processing
-
- **Use Cases:**
-
- Bulk file uploads with metadata annotation
- Excel-based metadata mapping and validation
- Recursive directory processing
- Type-based file organization
- Batch data unit creation
- Secure file processing with size and content validation
-
- **Supported Upload Sources:**
-
- Local file system paths (files and directories)
- Recursive directory scanning
- Excel metadata files for enhanced file annotation
- Mixed file types with automatic organization
-
- ## Upload Action Architecture
-
- The upload system uses a modern, extensible architecture built on proven design patterns. The refactored implementation transforms the previous monolithic approach into a modular, strategy-based system with clear separation of concerns.
-
- ### Design Patterns
-
- The architecture leverages several key design patterns:
-
- **Strategy Pattern**: Pluggable behaviors for validation, file discovery, metadata processing, upload operations, and data unit creation
- **Facade Pattern**: UploadOrchestrator provides a simplified interface to coordinate complex workflows
- **Factory Pattern**: StrategyFactory creates appropriate strategy implementations based on runtime parameters
- **Context Pattern**: UploadContext maintains shared state and communication between workflow components
-
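How these four patterns fit together can be sketched in a few lines of illustrative Python. This is not SDK code; every name below is a hypothetical stand-in for the classes described on this page (a factory picks a strategy from runtime parameters, a facade-style orchestrator runs it, and a context object carries shared state):

```python
# Illustrative composition of the Strategy/Factory/Facade/Context patterns.
# All class names are hypothetical stand-ins, not the actual SDK API.

class Context:
    """Shared state passed between workflow components (Context pattern)."""
    def __init__(self, params):
        self.params = params
        self.results = []

class UpperStrategy:
    def apply(self, value, ctx):  # Strategy pattern: pluggable behavior
        return value.upper()

class LowerStrategy:
    def apply(self, value, ctx):
        return value.lower()

def create_strategy(params):
    """Factory pattern: choose an implementation from runtime parameters."""
    return UpperStrategy() if params.get('mode') == 'upper' else LowerStrategy()

class Orchestrator:
    """Facade pattern: one simple entry point over the moving parts."""
    def __init__(self, ctx):
        self.ctx = ctx
        self.strategy = create_strategy(ctx.params)

    def execute(self, values):
        for v in values:
            self.ctx.results.append(self.strategy.apply(v, self.ctx))
        return self.ctx.results

ctx = Context({'mode': 'upper'})
print(Orchestrator(ctx).execute(['a', 'b']))  # ['A', 'B']
```

The real system follows the same shape, with one strategy family per concern (validation, discovery, metadata, upload, data units) instead of a single one.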
- ### Component Architecture
-
- ```mermaid
- classDiagram
- %% Light/Dark mode compatible colors
- classDef coreClass fill:#e3f2fd,stroke:#1976d2,stroke-width:2px,color:#000000
- classDef strategyClass fill:#e8f5e8,stroke:#388e3c,stroke-width:2px,color:#000000
- classDef stepClass fill:#fff9c4,stroke:#f57c00,stroke-width:2px,color:#000000
- classDef contextClass fill:#ffebee,stroke:#d32f2f,stroke-width:2px,color:#000000
-
- class UploadAction {
- +name: str = "upload"
- +category: PluginCategory.UPLOAD
- +method: RunMethod.JOB
- +run_class: UploadRun
- +params_model: UploadParams
- +progress_categories: dict
- +metrics_categories: dict
- +strategy_factory: StrategyFactory
- +step_registry: StepRegistry
-
- +start() dict
- +get_workflow_summary() dict
- +_configure_workflow() None
- +_configure_strategies() dict
- }
-
- class UploadOrchestrator {
- +context: UploadContext
- +step_registry: StepRegistry
- +strategies: dict
- +executed_steps: list
- +current_step_index: int
- +rollback_executed: bool
-
- +execute() dict
- +get_workflow_summary() dict
- +get_executed_steps() list
- +is_rollback_executed() bool
- +_execute_step(step) StepResult
- +_handle_step_failure(step, error) None
- +_rollback_executed_steps() None
- }
-
- class UploadContext {
- +params: dict
- +run: UploadRun
- +client: Any
- +storage: Any
- +pathlib_cwd: Path
- +metadata: dict
- +file_specifications: dict
- +organized_files: list
- +uploaded_files: list
- +data_units: list
- +metrics: dict
- +errors: list
- +strategies: dict
- +rollback_data: dict
-
- +update(result: StepResult) None
- +get_result() dict
- +has_errors() bool
- +update_metrics(category, metrics) None
- }
-
- class StepRegistry {
- +_steps: list
- +register(step: BaseStep) None
- +get_steps() list
- +get_total_progress_weight() float
- +clear() None
- }
-
- class StrategyFactory {
- +create_validation_strategy(params, context) BaseValidationStrategy
- +create_file_discovery_strategy(params, context) BaseFileDiscoveryStrategy
- +create_metadata_strategy(params, context) BaseMetadataStrategy
- +create_upload_strategy(params, context) BaseUploadStrategy
- +create_data_unit_strategy(params, context) BaseDataUnitStrategy
- +get_available_strategies() dict
- }
-
- class BaseStep {
- <<abstract>>
- +name: str
- +progress_weight: float
- +execute(context: UploadContext) StepResult
- +can_skip(context: UploadContext) bool
- +rollback(context: UploadContext) None
- +create_success_result(data) StepResult
- +create_error_result(error) StepResult
- +create_skip_result() StepResult
- }
-
- class ExcelSecurityConfig {
- +max_file_size_mb: int = 10
- +max_rows: int = 100000
- +max_columns: int = 50
- +max_file_size_bytes: int
- +MAX_FILE_SIZE_MB: int
- +MAX_FILE_SIZE_BYTES: int
- +MAX_ROWS: int
- +MAX_COLUMNS: int
- +from_action_config(action_config) ExcelSecurityConfig
- }
-
- class StepResult {
- +success: bool
- +data: dict
- +error: str
- +rollback_data: dict
- +skipped: bool
- +original_exception: Exception
- +timestamp: datetime
- }
-
- %% Strategy Base Classes
- class BaseValidationStrategy {
- <<abstract>>
- +validate_files(files, context) bool
- +validate_security(file_path) bool
- }
-
- class BaseFileDiscoveryStrategy {
- <<abstract>>
- +discover_files(path, context) list
- +organize_files(files, specs, context) list
- }
-
- class BaseMetadataStrategy {
- <<abstract>>
- +process_metadata(context) dict
- +extract_metadata(file_path) dict
- }
-
- class BaseUploadStrategy {
- <<abstract>>
- +upload_files(files, context) list
- +upload_batch(batch, context) list
- }
-
- class BaseDataUnitStrategy {
- <<abstract>>
- +generate_data_units(files, context) list
- +create_data_unit_batch(batch, context) list
- }
-
- %% Workflow Steps
- class InitializeStep {
- +name = "initialize"
- +progress_weight = 0.05
- }
-
- class ProcessMetadataStep {
- +name = "process_metadata"
- +progress_weight = 0.05
- }
-
- class AnalyzeCollectionStep {
- +name = "analyze_collection"
- +progress_weight = 0.05
- }
-
- class OrganizeFilesStep {
- +name = "organize_files"
- +progress_weight = 0.10
- }
-
- class ValidateFilesStep {
- +name = "validate_files"
- +progress_weight = 0.05
- }
-
- class UploadFilesStep {
- +name = "upload_files"
- +progress_weight = 0.30
- }
-
- class GenerateDataUnitsStep {
- +name = "generate_data_units"
- +progress_weight = 0.35
- }
-
- class CleanupStep {
- +name = "cleanup"
- +progress_weight = 0.05
- }
-
- %% Relationships
- UploadAction --> UploadRun : uses
- UploadAction --> UploadParams : validates with
- UploadAction --> ExcelSecurityConfig : configures
- UploadAction --> UploadOrchestrator : creates and executes
- UploadAction --> StrategyFactory : configures strategies
- UploadAction --> StepRegistry : manages workflow steps
- UploadRun --> LogCode : logs with
- UploadRun --> UploadStatus : tracks status
- UploadOrchestrator --> UploadContext : coordinates state
- UploadOrchestrator --> StepRegistry : executes steps from
- UploadOrchestrator --> BaseStep : executes
- BaseStep --> StepResult : returns
- UploadContext --> StepResult : updates with
- StrategyFactory --> BaseValidationStrategy : creates
- StrategyFactory --> BaseFileDiscoveryStrategy : creates
- StrategyFactory --> BaseMetadataStrategy : creates
- StrategyFactory --> BaseUploadStrategy : creates
- StrategyFactory --> BaseDataUnitStrategy : creates
- StepRegistry --> BaseStep : contains
-
- %% Step inheritance
- InitializeStep --|> BaseStep : extends
- ProcessMetadataStep --|> BaseStep : extends
- AnalyzeCollectionStep --|> BaseStep : extends
- OrganizeFilesStep --|> BaseStep : extends
- ValidateFilesStep --|> BaseStep : extends
- UploadFilesStep --|> BaseStep : extends
- GenerateDataUnitsStep --|> BaseStep : extends
- CleanupStep --|> BaseStep : extends
-
- %% Note: Class styling defined above - Mermaid will apply based on classDef definitions
- ```
-
- ### Step-Based Workflow Execution
-
- The refactored architecture uses a step-based workflow coordinated by the UploadOrchestrator. Each step has a defined responsibility and progress weight.
-
- #### Workflow Steps Overview
-
- | Step | Name | Weight | Responsibility |
- | ---- | ------------------- | ------ | -------------------------------------------- |
- | 1 | Initialize | 5% | Setup storage, pathlib, and basic validation |
- | 2 | Process Metadata | 5% | Handle Excel metadata if provided |
- | 3 | Analyze Collection | 5% | Retrieve and validate data collection specs |
- | 4 | Organize Files | 10% | Discover and organize files by type |
- | 5 | Validate Files | 5% | Security and content validation |
- | 6 | Upload Files | 30% | Upload files to storage |
- | 7 | Generate Data Units | 35% | Create data units from uploaded files |
- | 8 | Cleanup | 5% | Clean temporary resources |
-
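The weights in the table above sum to 1.0, so overall workflow progress is simply the sum of the weights of the steps finished so far. A minimal sketch (illustrative, not SDK code; the step names and weights are taken from the table):

```python
# Per-step progress weights from the workflow table; they sum to 1.0.
STEP_WEIGHTS = {
    'initialize': 0.05,
    'process_metadata': 0.05,
    'analyze_collection': 0.05,
    'organize_files': 0.10,
    'validate_files': 0.05,
    'upload_files': 0.30,
    'generate_data_units': 0.35,
    'cleanup': 0.05,
}

def overall_progress(completed_steps):
    """Overall progress is the sum of the weights of finished steps."""
    return sum(STEP_WEIGHTS[name] for name in completed_steps)

# The weights cover the whole workflow.
assert abs(sum(STEP_WEIGHTS.values()) - 1.0) < 1e-9

print(round(overall_progress(['initialize', 'process_metadata', 'analyze_collection']), 2))  # 0.15
```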
- #### Execution Flow
-
- ```mermaid
- flowchart TD
- %% Start
- A["🚀 Upload Action Started"] --> B["📋 Create UploadContext"]
- B --> C["⚙️ Configure Strategies"]
- C --> D["📝 Register Workflow Steps"]
- D --> E["🎯 Create UploadOrchestrator"]
-
- %% Strategy Injection
- E --> F["💉 Inject Strategies into Context"]
- F --> G["📊 Initialize Progress Tracking"]
-
- %% Step Execution Loop
- G --> H["🔄 Start Step Execution Loop"]
- H --> I["📍 Get Next Step"]
- I --> J{"🤔 Can Step be Skipped?"}
- J -->|Yes| K["⏭️ Skip Step"]
- J -->|No| L["▶️ Execute Step"]
-
- %% Step Execution
- L --> M{"✅ Step Successful?"}
- M -->|Yes| N["📈 Update Progress"]
- M -->|No| O["❌ Handle Step Failure"]
-
- %% Success Path
- N --> P["💾 Store Step Result"]
- P --> Q["📝 Add to Executed Steps"]
- Q --> R{"🏁 More Steps?"}
- R -->|Yes| I
- R -->|No| S["🎉 Workflow Complete"]
-
- %% Skip Path
- K --> T["📊 Update Progress (Skip)"]
- T --> R
-
- %% Error Handling
- O --> U["🔙 Start Rollback Process"]
- U --> V["⏪ Rollback Executed Steps"]
- V --> W["📝 Log Rollback Results"]
- W --> X["💥 Propagate Exception"]
-
- %% Final Results
- S --> Y["📊 Collect Final Metrics"]
- Y --> Z["📋 Generate Result Summary"]
- Z --> AA["🔄 Return to UploadAction"]
-
- %% Apply styles - Light/Dark mode compatible
- classDef startNode fill:#e3f2fd,stroke:#1976d2,stroke-width:2px,color:#000000
- classDef processNode fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px,color:#000000
- classDef decisionNode fill:#fff3e0,stroke:#f57c00,stroke-width:2px,color:#000000
- classDef successNode fill:#e8f5e8,stroke:#388e3c,stroke-width:2px,color:#000000
- classDef errorNode fill:#ffebee,stroke:#d32f2f,stroke-width:2px,color:#000000
- classDef stepNode fill:#f0f4c3,stroke:#689f38,stroke-width:1px,color:#000000
-
- class A,B,E startNode
- class C,D,F,G,H,I,L,N,P,Q,T,Y,Z,AA processNode
- class J,M,R decisionNode
- class K,S successNode
- class O,U,V,W,X errorNode
- ```
-
- #### Strategy Integration Points
-
- Strategies are injected into the workflow at specific points:
-
- **Validation Strategy**: Used by ValidateFilesStep
- **File Discovery Strategy**: Used by OrganizeFilesStep
- **Metadata Strategy**: Used by ProcessMetadataStep
- **Upload Strategy**: Used by UploadFilesStep
- **Data Unit Strategy**: Used by GenerateDataUnitsStep
-
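The mapping above can be captured as a small lookup table. The step and strategy names come from this page; the dict itself is purely illustrative:

```python
# Which strategy each workflow step consumes (names from the docs above).
STEP_STRATEGY = {
    'ValidateFilesStep': 'validation',
    'OrganizeFilesStep': 'file_discovery',
    'ProcessMetadataStep': 'metadata',
    'UploadFilesStep': 'upload',
    'GenerateDataUnitsStep': 'data_unit',
}

def strategy_for(step_name):
    """Return the strategy key a step uses, or None for strategy-free steps."""
    return STEP_STRATEGY.get(step_name)
```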
- #### Error Handling and Rollback
-
- The orchestrator provides automatic rollback functionality:
-
- 1. **Exception Capture**: Preserves original exceptions for debugging
- 2. **Rollback Execution**: Calls rollback() on all successfully executed steps in reverse order
- 3. **Graceful Degradation**: Continues rollback even if individual step rollbacks fail
- 4. **State Preservation**: Maintains execution state for post-failure analysis
-
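A minimal sketch of that rollback behavior (not SDK code; `FakeStep` and `rollback_executed_steps` are hypothetical): rollback walks the successfully executed steps in reverse order and keeps going even when an individual rollback raises.

```python
# Illustrative rollback loop: reverse order, graceful degradation.
def rollback_executed_steps(executed_steps, context):
    failures = []
    for step in reversed(executed_steps):      # reverse of execution order
        try:
            step.rollback(context)
        except Exception as exc:               # graceful degradation:
            failures.append((step.name, exc))  # record, but keep rolling back
    return failures

class FakeStep:
    def __init__(self, name, fail=False):
        self.name = name
        self.fail = fail
        self.rolled_back = False

    def rollback(self, context):
        if self.fail:
            raise RuntimeError(f'{self.name} rollback failed')
        self.rolled_back = True

steps = [FakeStep('initialize'), FakeStep('upload_files', fail=True), FakeStep('cleanup')]
failures = rollback_executed_steps(steps, context=None)
print(len(failures))                               # 1: only the failing step is reported
print(steps[0].rolled_back, steps[2].rolled_back)  # True True
```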
- ## Development Guide
-
- This section provides comprehensive guidance for extending the upload action with custom strategies and workflow steps.
-
- ### Creating Custom Strategies
-
- Strategies implement specific behaviors for different aspects of the upload process. Each strategy type has a well-defined interface.
-
- #### Custom Validation Strategy
-
- ```python
- from pathlib import Path
- from typing import List
-
- from synapse_sdk.plugins.categories.upload.actions.upload.strategies.validation.base import BaseValidationStrategy
- from synapse_sdk.plugins.categories.upload.actions.upload.context import UploadContext
-
- class CustomValidationStrategy(BaseValidationStrategy):
-     """Custom validation strategy with advanced security checks."""
-
-     def validate_files(self, files: List[Path], context: UploadContext) -> bool:
-         """Validate files using custom business rules."""
-         for file_path in files:
-             # Custom validation logic
-             if not self._validate_custom_rules(file_path):
-                 return False
-
-             # Call security validation
-             if not self.validate_security(file_path):
-                 return False
-         return True
-
-     def validate_security(self, file_path: Path) -> bool:
-         """Custom security validation."""
-         # Implement custom security checks
-         if file_path.suffix in ['.exe', '.bat', '.sh']:
-             return False
-
-         # Check file size
-         if file_path.stat().st_size > 100 * 1024 * 1024:  # 100MB
-             return False
-
-         return True
-
-     def _validate_custom_rules(self, file_path: Path) -> bool:
-         """Implement domain-specific validation rules."""
-         # Custom business logic
-         return True
- ```
-
- #### Custom File Discovery Strategy
-
- ```python
- from pathlib import Path
- from typing import Any, Dict, List
-
- from synapse_sdk.plugins.categories.upload.actions.upload.strategies.file_discovery.base import BaseFileDiscoveryStrategy
- from synapse_sdk.plugins.categories.upload.actions.upload.context import UploadContext
-
- class CustomFileDiscoveryStrategy(BaseFileDiscoveryStrategy):
-     """Custom file discovery with advanced filtering."""
-
-     def discover_files(self, path: Path, context: UploadContext) -> List[Path]:
-         """Discover files with custom filtering rules."""
-         files = []
-
-         if context.get_param('is_recursive', False):
-             files = list(path.rglob('*'))
-         else:
-             files = list(path.iterdir())
-
-         # Apply custom filtering
-         return self._apply_custom_filters(files, context)
-
-     def organize_files(self, files: List[Path], specs: Dict[str, Any], context: UploadContext) -> List[Dict[str, Any]]:
-         """Organize files using custom categorization."""
-         organized = []
-
-         for file_path in files:
-             if file_path.is_file():
-                 category = self._determine_category(file_path)
-                 organized.append({
-                     'file_path': file_path,
-                     'category': category,
-                     'metadata': self._extract_file_metadata(file_path)
-                 })
-
-         return organized
-
-     def _apply_custom_filters(self, files: List[Path], context: UploadContext) -> List[Path]:
-         """Apply domain-specific file filters."""
-         filtered = []
-         for file_path in files:
-             if self._should_include_file(file_path):
-                 filtered.append(file_path)
-         return filtered
-
-     def _determine_category(self, file_path: Path) -> str:
-         """Determine file category using custom logic."""
-         # Custom categorization logic
-         ext = file_path.suffix.lower()
-         if ext in ['.jpg', '.png', '.gif']:
-             return 'images'
-         elif ext in ['.pdf', '.doc', '.docx']:
-             return 'documents'
-         else:
-             return 'other'
- ```
-
- #### Custom Upload Strategy
-
- ```python
- import time
- from typing import Any, Dict, List
-
- from synapse_sdk.plugins.categories.upload.actions.upload.strategies.upload.base import BaseUploadStrategy
- from synapse_sdk.plugins.categories.upload.actions.upload.context import UploadContext
-
- class CustomUploadStrategy(BaseUploadStrategy):
-     """Custom upload strategy with advanced retry logic."""
-
-     def upload_files(self, files: List[Dict[str, Any]], context: UploadContext) -> List[Dict[str, Any]]:
-         """Upload files with custom batching and retry logic."""
-         uploaded_files = []
-         batch_size = context.get_param('upload_batch_size', 10)
-
-         # Process in custom batches
-         for i in range(0, len(files), batch_size):
-             batch = files[i:i + batch_size]
-             batch_results = self.upload_batch(batch, context)
-             uploaded_files.extend(batch_results)
-
-         return uploaded_files
-
-     def upload_batch(self, batch: List[Dict[str, Any]], context: UploadContext) -> List[Dict[str, Any]]:
-         """Upload a batch of files with retry logic."""
-         results = []
-
-         for file_info in batch:
-             max_retries = 3
-             for attempt in range(max_retries):
-                 try:
-                     result = self._upload_single_file(file_info, context)
-                     results.append(result)
-                     break
-                 except Exception as e:
-                     if attempt == max_retries - 1:
-                         # Final attempt failed
-                         context.add_error(f"Failed to upload {file_info['file_path']}: {e}")
-                     else:
-                         # Wait before retry
-                         time.sleep(2 ** attempt)
-
-         return results
-
-     def _upload_single_file(self, file_info: Dict[str, Any], context: UploadContext) -> Dict[str, Any]:
-         """Upload a single file with custom logic."""
-         # Custom upload implementation
-         file_path = file_info['file_path']
-
-         # Use the storage from context
-         storage = context.storage
-
-         # Custom upload logic here
-         uploaded_file = {
-             'file_path': str(file_path),
-             'storage_path': f"uploads/{file_path.name}",
-             'size': file_path.stat().st_size,
-             'checksum': self._calculate_checksum(file_path)
-         }
-
-         return uploaded_file
- ```
-
- ### Creating Custom Workflow Steps
-
- Custom workflow steps extend the base step class and implement the required interface.
-
- #### Custom Processing Step
-
- ```python
- from datetime import datetime
- from typing import Dict, List
-
- from synapse_sdk.plugins.categories.upload.actions.upload.steps.base import BaseStep
- from synapse_sdk.plugins.categories.upload.actions.upload.context import UploadContext, StepResult
-
- class CustomProcessingStep(BaseStep):
-     """Custom processing step for specialized file handling."""
-
-     @property
-     def name(self) -> str:
-         return 'custom_processing'
-
-     @property
-     def progress_weight(self) -> float:
-         return 0.15  # 15% of total workflow
-
-     def execute(self, context: UploadContext) -> StepResult:
-         """Execute custom processing logic."""
-         try:
-             # Custom processing logic
-             processed_files = self._process_files(context)
-
-             # Update context with results
-             return self.create_success_result({
-                 'processed_files': processed_files,
-                 'processing_stats': self._get_processing_stats()
-             })
-
-         except Exception as e:
-             return self.create_error_result(f'Custom processing failed: {str(e)}')
-
-     def can_skip(self, context: UploadContext) -> bool:
-         """Determine if step can be skipped."""
-         # Skip if no files to process
-         return len(context.organized_files) == 0
-
-     def rollback(self, context: UploadContext) -> None:
-         """Rollback custom processing operations."""
-         # Clean up any resources created during processing
-         self._cleanup_processing_resources(context)
-
-     def _process_files(self, context: UploadContext) -> List[Dict]:
-         """Implement custom file processing."""
-         processed = []
-
-         for file_info in context.organized_files:
-             # Custom processing logic
-             result = self._process_single_file(file_info)
-             processed.append(result)
-
-         return processed
-
-     def _process_single_file(self, file_info: Dict) -> Dict:
-         """Process a single file."""
-         # Custom processing implementation
-         return {
-             'original': file_info,
-             'processed': True,
-             'timestamp': datetime.now()
-         }
- ```
-
- ### Strategy Factory Extension
-
- To make custom strategies available, extend the StrategyFactory:
-
- ```python
- from typing import Dict
-
- from synapse_sdk.plugins.categories.upload.actions.upload.factory import StrategyFactory
-
- class CustomStrategyFactory(StrategyFactory):
-     """Extended factory with custom strategies."""
-
-     def create_validation_strategy(self, params: Dict, context=None):
-         """Create validation strategy with custom options."""
-         validation_type = params.get('custom_validation_type', 'default')
-
-         if validation_type == 'strict':
-             return CustomValidationStrategy()
-         else:
-             return super().create_validation_strategy(params, context)
-
-     def create_file_discovery_strategy(self, params: Dict, context=None):
-         """Create file discovery strategy with custom options."""
-         discovery_mode = params.get('discovery_mode', 'default')
-
-         if discovery_mode == 'advanced':
-             return CustomFileDiscoveryStrategy()
-         else:
-             return super().create_file_discovery_strategy(params, context)
- ```
-
- ### Custom Upload Action
-
- For comprehensive customization, extend the UploadAction itself:
-
- ```python
- from typing import Any, Dict
-
- from synapse_sdk.plugins.categories.upload.actions.upload.action import UploadAction
- from synapse_sdk.plugins.categories.decorators import register_action
-
- @register_action
- class CustomUploadAction(UploadAction):
-     """Custom upload action with extended workflow."""
-
-     name = 'custom_upload'
-
-     def __init__(self, *args, **kwargs):
-         super().__init__(*args, **kwargs)
-         # Use custom strategy factory
-         self.strategy_factory = CustomStrategyFactory()
-
-     def _configure_workflow(self) -> None:
-         """Configure custom workflow with additional steps."""
-         # Register standard steps
-         super()._configure_workflow()
-
-         # Add custom processing step
-         self.step_registry.register(CustomProcessingStep())
-
-     def _configure_strategies(self, context=None) -> Dict[str, Any]:
-         """Configure strategies with custom parameters."""
-         strategies = super()._configure_strategies(context)
-
-         # Add custom strategy
-         strategies['custom_processing'] = self._create_custom_processing_strategy()
-
-         return strategies
-
-     def _create_custom_processing_strategy(self):
-         """Create custom processing strategy."""
-         return CustomProcessingStrategy(self.params)
- ```
-
- ### Testing Custom Components
-
- #### Testing Custom Strategies
-
- ```python
- import pytest
- from unittest.mock import Mock
- from pathlib import Path
-
- class TestCustomValidationStrategy:
-
-     def setup_method(self):
-         self.strategy = CustomValidationStrategy()
-         self.context = Mock()
-
-     def test_validate_files_success(self):
-         """Test successful file validation."""
-         files = [Path('/test/file1.txt'), Path('/test/file2.jpg')]
-         result = self.strategy.validate_files(files, self.context)
-         assert result is True
-
-     def test_validate_files_security_failure(self):
-         """Test validation failure for security reasons."""
-         files = [Path('/test/malware.exe')]
-         result = self.strategy.validate_files(files, self.context)
-         assert result is False
-
-     def test_validate_large_file_failure(self):
-         """Test validation failure for large files."""
-         # Mock file stat to return large size
-         large_file = Mock(spec=Path)
-         large_file.suffix = '.txt'
-         large_file.stat.return_value.st_size = 200 * 1024 * 1024  # 200MB
-
-         result = self.strategy.validate_security(large_file)
-         assert result is False
- ```
-
- #### Testing Custom Steps
-
- ```python
- class TestCustomProcessingStep:
-
-     def setup_method(self):
-         self.step = CustomProcessingStep()
-         self.context = Mock()
-         self.context.organized_files = [
-             {'file_path': '/test/file1.txt'},
-             {'file_path': '/test/file2.jpg'}
-         ]
-
-     def test_execute_success(self):
-         """Test successful step execution."""
-         result = self.step.execute(self.context)
-
-         assert result.success is True
-         assert 'processed_files' in result.data
-         assert len(result.data['processed_files']) == 2
-
-     def test_can_skip_with_no_files(self):
-         """Test step skipping logic."""
-         self.context.organized_files = []
-         assert self.step.can_skip(self.context) is True
-
-     def test_rollback_cleanup(self):
-         """Test rollback cleanup."""
-         # This should not raise an exception
-         self.step.rollback(self.context)
- ```
-
- ## Upload Parameters
-
- The upload action uses `UploadParams` for comprehensive parameter validation:
-
- ### Required Parameters
-
- | Parameter | Type | Description | Validation |
- | ----------------- | ----- | -------------------------- | ------------------ |
- | `name` | `str` | Human-readable upload name | Must be non-blank |
- | `path` | `str` | Source file/directory path | Must be valid path |
- | `storage` | `int` | Target storage ID | Must exist via API |
- | `data_collection` | `int` | Data collection ID | Must exist via API |
-
- ### Optional Parameters
-
- | Parameter | Type | Default | Description |
- | ------------------------------- | ------------- | ------- | ---------------------------------- |
- | `description` | `str \| None` | `None` | Upload description |
- | `project` | `int \| None` | `None` | Project ID (validated if provided) |
- | `excel_metadata_path` | `str \| None` | `None` | Path to Excel metadata file |
- | `is_recursive` | `bool` | `False` | Scan directories recursively |
- | `max_file_size_mb` | `int` | `50` | Maximum file size in MB |
- | `creating_data_unit_batch_size` | `int` | `100` | Batch size for data units |
- | `use_async_upload` | `bool` | `True` | Use asynchronous processing |
-
767
- ### Parameter Validation
768
-
769
- The system performs real-time validation:
770
-
771
- ```python
772
- # Storage validation
773
- @field_validator('storage', mode='before')
774
- @classmethod
775
- def check_storage_exists(cls, value: str, info) -> str:
776
- action = info.context['action']
777
- client = action.client
778
- try:
779
- client.get_storage(value)
780
- except ClientError:
781
- raise PydanticCustomError('client_error', 'Storage not found')
782
- return value
783
- ```
784
-
785
- ## Excel Metadata Processing
786
-
787
- Upload plugins process Excel metadata with flexible header support, tiered filename matching, and indexed lookups for performance:
788
-
789
- ### Excel File Format
790
-
791
- The Excel file supports flexible header formats and comprehensive filename matching:
792
-
793
- #### Supported Header Formats
794
-
795
- Both header formats are supported with case-insensitive matching:
796
-
797
- **Option 1: "filename" header**
798
- | filename | category | description | custom_field |
799
- | ---------- | -------- | ------------------ | ------------ |
800
- | image1.jpg | nature | Mountain landscape | high_res |
801
- | image2.png | urban | City skyline | processed |
802
-
803
- **Option 2: "file_name" header**
804
- | file_name | category | description | custom_field |
805
- | ---------- | -------- | ------------------ | ------------ |
806
- | image1.jpg | nature | Mountain landscape | high_res |
807
- | image2.png | urban | City skyline | processed |
808
-
809
- #### Filename Matching Strategy
810
-
811
- The system uses a comprehensive 5-tier priority matching algorithm to associate files with metadata:
812
-
813
- 1. **Exact stem match** (highest priority): `image1` matches `image1.jpg`
814
- 2. **Exact filename match**: `image1.jpg` matches `image1.jpg`
815
- 3. **Metadata key stem match**: `path/image1.ext` stem matches `image1`
816
- 4. **Partial path matching**: `/uploads/image1.jpg` contains `image1`
817
- 5. **Full path matching**: Complete path matching for complex structures
818
-
819
- This robust matching ensures metadata is correctly associated regardless of file organization or naming conventions.
820
-
821
- ### Security Validation
822
-
823
- Excel files undergo comprehensive security validation:
824
-
825
- ```python
826
- class ExcelSecurityConfig:
827
- max_file_size_mb: int = 10 # File size limit in MB
828
- max_rows: int = 100000 # Row count limit
829
- max_columns: int = 50 # Column count limit
830
- ```
831
-
832
- #### Advanced Security Features
833
-
834
- - **File format validation**: Checks Excel file signatures (PK for .xlsx, compound document for .xls)
835
- - **Memory estimation**: Prevents memory exhaustion from oversized spreadsheets
836
- - **Content sanitization**: Automatic truncation of overly long values
837
- - **Error resilience**: Graceful handling of corrupted or inaccessible files
838
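A minimal sketch of how these limit and signature checks might look, using the default limits from `ExcelSecurityConfig`; the helper names and return shapes are assumptions for illustration:

```python
# Illustrative limits mirroring the ExcelSecurityConfig defaults
MAX_FILE_SIZE_MB = 10
MAX_ROWS = 100_000
MAX_COLUMNS = 50

XLSX_SIGNATURE = b'PK'               # .xlsx files are ZIP archives
XLS_SIGNATURE = b'\xd0\xcf\x11\xe0'  # .xls compound document header

def check_excel_limits(size_bytes: int, rows: int, columns: int) -> list[str]:
    """Return a list of limit violations; an empty list means the file passes."""
    violations = []
    if size_bytes > MAX_FILE_SIZE_MB * 1024 * 1024:
        violations.append('file_too_large')
    if rows > MAX_ROWS:
        violations.append('too_many_rows')
    if columns > MAX_COLUMNS:
        violations.append('too_many_columns')
    return violations

def has_valid_signature(header: bytes, suffix: str) -> bool:
    """Check a file's leading bytes against the expected Excel signature."""
    if suffix == '.xlsx':
        return header.startswith(XLSX_SIGNATURE)
    if suffix == '.xls':
        return header.startswith(XLS_SIGNATURE)
    return False
```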
-
839
- ### Configuration via config.yaml
840
-
841
- Security limits and processing options can be configured:
842
-
843
- ```yaml
844
- actions:
845
- upload:
846
- excel_config:
847
- max_file_size_mb: 10 # Maximum Excel file size in MB
848
- max_rows: 100000 # Maximum number of rows allowed
849
- max_columns: 50 # Maximum number of columns allowed
850
- ```
851
-
852
- ### Performance Optimizations
853
-
854
- The Excel metadata processing includes several performance enhancements:
855
-
856
- #### Metadata Indexing
857
- - **O(1) hash lookups** for exact stem and filename matches
858
- - **Pre-built indexes** for common matching patterns
859
- - **Fallback algorithms** for complex path matching scenarios
860
-
861
- #### Efficient Processing
862
- - **Optimized row processing**: Skip empty rows early
863
- - **Memory-conscious operation**: Process files in batches
864
- - **Smart file discovery**: Cache path strings to avoid repeated conversions
865
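The indexing idea is simple: build hash maps once, so the common exact matches never require scanning all metadata entries. A minimal sketch (function names are hypothetical):

```python
from pathlib import Path

def build_metadata_indexes(metadata: dict) -> tuple[dict, dict]:
    """Pre-build hash indexes so exact matches resolve in O(1).

    Keys that look like paths are indexed by both their stem and
    their final path component.
    """
    stem_index, name_index = {}, {}
    for key, value in metadata.items():
        p = Path(key)
        stem_index.setdefault(p.stem, value)
        name_index.setdefault(p.name, value)
    return stem_index, name_index

def lookup(file_path: str, stem_index: dict, name_index: dict):
    """O(1) lookup by stem first, then by filename; None if unmatched."""
    p = Path(file_path)
    return stem_index.get(p.stem) or name_index.get(p.name)
```

Only files that miss both indexes would fall through to the slower path-containment tiers.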
-
866
- ### Metadata Processing Flow
867
-
868
- 1. **Security Validation**: File size, format, and content limits
869
- 2. **Header Validation**: Support for both "filename" and "file_name" with case-insensitive matching
870
- 3. **Index Building**: Create O(1) lookup structures for performance
871
- 4. **Content Processing**: Row-by-row metadata extraction with optimization
872
- 5. **Data Sanitization**: Automatic truncation and validation
873
- 6. **Pattern Matching**: 5-tier filename association algorithm
874
- 7. **Mapping Creation**: Optimized filename to metadata mapping
875
-
876
- ### Excel Metadata Parameter
877
-
878
- You can specify a custom Excel metadata file path:
879
-
880
- ```python
881
- params = {
882
- "name": "Excel Metadata Upload",
883
- "path": "/data/files",
884
- "storage": 1,
885
- "data_collection": 5,
886
- "excel_metadata_path": "/data/custom_metadata.xlsx" # Custom Excel file
887
- }
888
- ```
889
-
890
- #### Path Resolution
891
- - **Absolute paths**: Used directly if they exist and are accessible
892
- - **Relative paths**: Resolved relative to the upload path
893
- - **Default discovery**: Automatically searches for `meta.xlsx` or `meta.xls` if no path specified
894
- - **Storage integration**: Uses storage configuration for proper path resolution
895
-
896
- ### Error Handling
897
-
898
- Comprehensive error handling ensures robust operation:
899
-
900
- ```python
901
- # Excel processing errors are handled gracefully
902
- try:
903
- metadata = process_excel_metadata(excel_path)
904
- except ExcelSecurityError as e:
905
- # Security violation - file too large, too many rows, etc.
906
- log_security_violation(e)
907
- except ExcelParsingError as e:
908
- # Parsing failure - corrupted file, invalid format, etc.
909
- log_parsing_error(e)
910
- ```
911
-
912
- #### Error Recovery
913
- - **Graceful degradation**: Continue processing with empty metadata if Excel fails
914
- - **Detailed logging**: Specific error codes for different failure types
915
- - **Path validation**: Comprehensive validation during parameter processing
916
- - **Fallback behavior**: Smart defaults when metadata cannot be processed
917
-
918
- ## File Organization
919
-
920
- The upload system automatically organizes files based on their types:
921
-
922
- ### Type Detection
923
-
924
- Files are categorized based on:
925
-
926
- - File extension patterns
927
- - MIME type detection
928
- - Content analysis
929
- - Custom type rules
930
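A minimal sketch of extension- and MIME-based categorization, assuming hypothetical category names matching the directory layout below; the SDK's actual type rules may differ:

```python
import mimetypes
from pathlib import Path

# Fallback rules for types whose MIME major class is not media
EXTENSION_CATEGORIES = {
    '.pdf': 'documents', '.xlsx': 'documents', '.docx': 'documents',
}

def categorize_file(filename: str) -> str:
    """Categorize a file for directory organization (sketch only)."""
    mime, _ = mimetypes.guess_type(filename)
    if mime:
        major = mime.split('/')[0]
        if major in ('image', 'video', 'audio'):
            return major + 's'  # e.g. images/, videos/
    return EXTENSION_CATEGORIES.get(Path(filename).suffix.lower(), 'others')
```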
-
931
- ### Directory Structure
932
-
933
- ```
934
- upload_output/
935
- ├── images/
936
- │ ├── image1.jpg
937
- │ └── image2.png
938
- ├── documents/
939
- │ ├── report.pdf
940
- │ └── data.xlsx
941
- └── videos/
942
- └── presentation.mp4
943
- ```
944
-
945
- ### Batch Processing
946
-
947
- Files are processed in configurable batches:
948
-
949
- ```python
950
- # Configure batch size
951
- params = {
952
- "creating_data_unit_batch_size": 100,
953
- "use_async_upload": True
954
- }
955
- ```
956
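Batching of this kind typically reduces to slicing the work list into fixed-size chunks; a minimal sketch (the helper name is illustrative, not SDK API):

```python
def iter_batches(items: list, batch_size: int):
    """Yield successive fixed-size batches, mirroring how data units
    are created in groups of creating_data_unit_batch_size."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```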
-
957
- ## Progress Tracking and Metrics
958
-
959
- ### Progress Categories
960
-
961
- The upload action tracks progress across three main phases:
962
-
963
- | Category | Proportion | Description |
964
- | --------------------- | ---------- | ----------------------------------- |
965
- | `analyze_collection` | 2% | Parameter validation and setup |
966
- | `upload_data_files` | 38% | File upload processing |
967
- | `generate_data_units` | 60% | Data unit creation and finalization |
968
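The category proportions above combine into a single overall percentage by weighting each phase's completion. A sketch of that arithmetic (the function is illustrative, not SDK API):

```python
# Category proportions from the table above (percent of overall progress)
PROGRESS_CATEGORIES = {
    'analyze_collection': 2,
    'upload_data_files': 38,
    'generate_data_units': 60,
}

def overall_progress(category_fractions: dict) -> float:
    """Combine per-category completion (0.0-1.0) into an overall percentage."""
    return sum(
        PROGRESS_CATEGORIES[name] * category_fractions.get(name, 0.0)
        for name in PROGRESS_CATEGORIES
    )
```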
-
969
- ### Metrics Collection
970
-
971
- Real-time metrics are collected for monitoring:
972
-
973
- ```python
974
- metrics_categories = {
975
- 'data_files': {
976
- 'stand_by': 0, # Files waiting to be processed
977
- 'failed': 0, # Files that failed upload
978
- 'success': 0, # Successfully uploaded files
979
- },
980
- 'data_units': {
981
- 'stand_by': 0, # Units waiting to be created
982
- 'failed': 0, # Units that failed creation
983
- 'success': 0, # Successfully created units
984
- },
985
- }
986
- ```
987
-
988
- ## Type-Safe Logging
989
-
990
- The upload system uses enum-based logging for consistency:
991
-
992
- ### Log Codes
993
-
994
- ```python
995
- class LogCode(str, Enum):
996
- VALIDATION_FAILED = 'VALIDATION_FAILED'
997
- NO_FILES_FOUND = 'NO_FILES_FOUND'
998
- EXCEL_SECURITY_VIOLATION = 'EXCEL_SECURITY_VIOLATION'
999
- EXCEL_PARSING_ERROR = 'EXCEL_PARSING_ERROR'
1000
- FILES_DISCOVERED = 'FILES_DISCOVERED'
1001
- UPLOADING_DATA_FILES = 'UPLOADING_DATA_FILES'
1002
- GENERATING_DATA_UNITS = 'GENERATING_DATA_UNITS'
1003
- IMPORT_COMPLETED = 'IMPORT_COMPLETED'
1004
- ```
1005
-
1006
- ### Logging Usage
1007
-
1008
- ```python
1009
- # Basic logging
1010
- run.log_message_with_code(LogCode.FILES_DISCOVERED, file_count)
1011
-
1012
- # With custom level
1013
- run.log_message_with_code(
1014
- LogCode.EXCEL_SECURITY_VIOLATION,
1015
- filename,
1016
- level=Context.DANGER
1017
- )
1018
-
1019
- # Upload-specific events
1020
- run.log_upload_event(LogCode.UPLOADING_DATA_FILES, batch_size)
1021
- ```
1022
-
1023
- ## Migration Guide
1024
-
1025
- ### From Legacy to Refactored Architecture
1026
-
1027
- The upload action has been refactored using modern design patterns while remaining **backward compatible** apart from a single parameter rename (`collection` → `data_collection`, see below). Existing code otherwise continues to work without changes.
1028
-
1029
- #### Key Changes
1030
-
1031
- **Before (Legacy Monolithic):**
1032
-
1033
- - Single 900+ line action class with all logic
1034
- - Hard-coded behaviors for validation, file discovery, etc.
1035
- - No extensibility or customization options
1036
- - Manual error handling throughout
1037
-
1038
- **After (Strategy/Facade Patterns):**
1039
-
1040
- - Clean separation of concerns with 8 workflow steps
1041
- - Pluggable strategies for different behaviors
1042
- - Extensible architecture for custom implementations
1043
- - Automatic rollback and comprehensive error handling
1044
-
1045
- #### Backward Compatibility
1046
-
1047
- ```python
1048
- # Legacy usage still works, apart from the renamed parameter noted below
1049
- from synapse_sdk.plugins.categories.upload.actions.upload.action import UploadAction
1050
-
1051
- params = {
1052
- "name": "My Upload",
1053
- "path": "/data/files",
1054
- "storage": 1,
1055
- "data_collection": 5 # Changed from 'collection' to 'data_collection'
1056
- }
1057
-
1058
- action = UploadAction(params=params, plugin_config=config)
1059
- result = action.start() # Works identically to before
1060
- ```
1061
-
1062
- #### Enhanced Capabilities
1063
-
1064
- The refactored architecture provides new capabilities:
1065
-
1066
- ```python
1067
- # Get detailed workflow information
1068
- action = UploadAction(params=params, plugin_config=config)
1069
- workflow_info = action.get_workflow_summary()
1070
- print(f"Configured with {workflow_info['step_count']} steps")
1071
- print(f"Available strategies: {workflow_info['available_strategies']}")
1072
-
1073
- # Execute and get detailed results
1074
- result = action.start()
1075
- print(f"Success: {result['success']}")
1076
- print(f"Uploaded files: {result['uploaded_files_count']}")
1077
- print(f"Generated data units: {result['generated_data_units_count']}")
1078
- print(f"Errors: {result['errors']}")
1079
- print(f"Metrics: {result['metrics']}")
1080
- ```
1081
-
1082
- #### Parameter Changes
1083
-
1084
- Only one parameter name changed:
1085
-
1086
- | Legacy | Refactored | Status |
1087
- | -------------------- | ----------------- | ------------------- |
1088
- | `collection` | `data_collection` | **Required change** |
1089
- | All other parameters | Unchanged | Fully compatible |
1090
-
1091
- #### Benefits of Migration
1092
-
1093
- - **Better Error Handling**: Automatic rollback on failures
1094
- - **Progress Tracking**: Detailed progress metrics across workflow steps
1095
- - **Extensibility**: Add custom strategies and steps
1096
- - **Testing**: Better testability with mock-friendly architecture
1097
- - **Maintainability**: Clean separation of concerns
1098
- - **Performance**: More efficient resource management
1099
-
1100
- ## Usage Examples
1101
-
1102
- ### Basic File Upload (Refactored Architecture)
1103
-
1104
- ```python
1105
- from synapse_sdk.plugins.categories.upload.actions.upload.action import UploadAction
1106
-
1107
- # Basic upload configuration with new architecture
1108
- params = {
1109
- "name": "Dataset Upload",
1110
- "description": "Training dataset for ML model",
1111
- "path": "/data/training_images",
1112
- "storage": 1,
1113
- "data_collection": 5, # Note: 'data_collection' instead of 'collection'
1114
- "is_recursive": True,
1115
- "max_file_size_mb": 100
1116
- }
1117
-
1118
- action = UploadAction(
1119
- params=params,
1120
- plugin_config=plugin_config
1121
- )
1122
-
1123
- # Execute with automatic step-based workflow and rollback
1124
- result = action.start()
1125
-
1126
- # Enhanced result information
1127
- print(f"Upload successful: {result['success']}")
1128
- print(f"Uploaded {result['uploaded_files_count']} files")
1129
- print(f"Generated {result['generated_data_units_count']} data units")
1130
- print(f"Workflow errors: {result['errors']}")
1131
-
1132
- # Access detailed metrics
1133
- workflow_metrics = result['metrics'].get('workflow', {})
1134
- print(f"Total steps executed: {workflow_metrics.get('current_step', 0)}")
1135
- print(f"Progress completed: {workflow_metrics.get('progress_percentage', 0)}%")
1136
- ```
1137
-
1138
- ### Excel Metadata Upload with Progress Tracking
1139
-
1140
- ```python
1141
- # Upload with Excel metadata and progress monitoring
1142
- params = {
1143
- "name": "Annotated Dataset Upload",
1144
- "path": "/data/images",
1145
- "storage": 1,
1146
- "data_collection": 5,
1147
- "excel_metadata_path": "/data/metadata.xlsx",
1148
- "is_recursive": False,
1149
- "creating_data_unit_batch_size": 50
1150
- }
1151
-
1152
- action = UploadAction(
1153
- params=params,
1154
- plugin_config=plugin_config
1155
- )
1156
-
1157
- # Get workflow summary before execution
1158
- workflow_info = action.get_workflow_summary()
1159
- print(f"Workflow configured with {workflow_info['step_count']} steps")
1160
- print(f"Total progress weight: {workflow_info['total_progress_weight']}")
1161
- print(f"Steps: {workflow_info['steps']}")
1162
-
1163
- # Execute with enhanced error handling
1164
- try:
1165
- result = action.start()
1166
- if result['success']:
1167
- print("Upload completed successfully!")
1168
- print(f"Files: {result['uploaded_files_count']}")
1169
- print(f"Data units: {result['generated_data_units_count']}")
1170
- else:
1171
- print("Upload failed with errors:")
1172
- for error in result['errors']:
1173
- print(f" - {error}")
1174
- except Exception as e:
1175
- print(f"Upload action failed: {e}")
1176
- ```
1177
-
1178
- ### Custom Strategy Upload
1179
-
1180
- ```python
1181
- from synapse_sdk.plugins.categories.upload.actions.upload.action import UploadAction
1182
- from my_custom_strategies import CustomValidationStrategy
1183
-
1184
- # Create action with custom factory
1185
- class CustomUploadAction(UploadAction):
1186
- def _configure_strategies(self, context=None):
1187
- strategies = super()._configure_strategies(context)
1188
-
1189
- # Override with custom validation
1190
- if self.params.get('use_strict_validation'):
1191
- strategies['validation'] = CustomValidationStrategy()
1192
-
1193
- return strategies
1194
-
1195
- # Use custom action
1196
- params = {
1197
- "name": "Strict Validation Upload",
1198
- "path": "/data/sensitive_files",
1199
- "storage": 1,
1200
- "data_collection": 5,
1201
- "use_strict_validation": True,
1202
- "max_file_size_mb": 10 # Stricter limits
1203
- }
1204
-
1205
- action = CustomUploadAction(
1206
- params=params,
1207
- plugin_config=plugin_config
1208
- )
1209
-
1210
- result = action.start()
1211
- ```
1212
-
1213
- ### Batch Processing with Custom Configuration
1214
-
1215
- ```python
1216
- # Custom plugin configuration with config.yaml
1217
- plugin_config = {
1218
- "actions": {
1219
- "upload": {
1220
- "excel_config": {
1221
- "max_file_size_mb": 20,
1222
- "max_rows": 50000,
1223
- "max_columns": 100
1224
- }
1225
- }
1226
- }
1227
- }
1228
-
1229
- # Large batch upload with custom settings
1230
- params = {
1231
- "name": "Large Batch Upload",
1232
- "path": "/data/large_dataset",
1233
- "storage": 2,
1234
- "data_collection": 10,
1235
- "is_recursive": True,
1236
- "max_file_size_mb": 500,
1237
- "creating_data_unit_batch_size": 200,
1238
- "use_async_upload": True
1239
- }
1240
-
1241
- action = UploadAction(
1242
- params=params,
1243
- plugin_config=plugin_config
1244
- )
1245
-
1246
- # Execute with progress monitoring
1247
- result = action.start()
1248
-
1249
- # Analyze results
1250
- print("Batch upload summary:")
1251
- print(f" Success: {result['success']}")
1252
- print(f" Files processed: {result['uploaded_files_count']}")
1253
- print(f" Data units created: {result['generated_data_units_count']}")
1254
-
1255
- # Check metrics by category
1256
- metrics = result['metrics']
1257
- if 'data_files' in metrics:
1258
- files_metrics = metrics['data_files']
1259
- print(f" Files - Success: {files_metrics.get('success', 0)}")
1260
- print(f" Files - Failed: {files_metrics.get('failed', 0)}")
1261
-
1262
- if 'data_units' in metrics:
1263
- units_metrics = metrics['data_units']
1264
- print(f" Units - Success: {units_metrics.get('success', 0)}")
1265
- print(f" Units - Failed: {units_metrics.get('failed', 0)}")
1266
- ```
1267
-
1268
- ### Error Handling and Rollback
1269
-
1270
- ```python
1271
- # Demonstrate enhanced error handling with automatic rollback
1272
- params = {
1273
- "name": "Error Recovery Example",
1274
- "path": "/data/problematic_files",
1275
- "storage": 1,
1276
- "data_collection": 5,
1277
- "is_recursive": True
1278
- }
1279
-
1280
- action = UploadAction(
1281
- params=params,
1282
- plugin_config=plugin_config
1283
- )
1284
-
1285
- try:
1286
- result = action.start()
1287
-
1288
- if not result['success']:
1289
- print("Upload failed, but cleanup was automatic:")
1290
- print(f"Errors encountered: {len(result['errors'])}")
1291
- for i, error in enumerate(result['errors'], 1):
1292
- print(f" {i}. {error}")
1293
-
1294
- # Check if rollback was performed (via orchestrator internals)
1295
- workflow_metrics = result['metrics'].get('workflow', {})
1296
- current_step = workflow_metrics.get('current_step', 0)
1297
- total_steps = workflow_metrics.get('total_steps', 0)
1298
- print(f"Workflow stopped at step {current_step} of {total_steps}")
1299
-
1300
- except Exception as e:
1301
- print(f"Critical upload failure: {e}")
1302
- # Rollback was automatically performed before exception propagation
1303
- ```
1304
-
1305
- ## Error Handling
1306
-
1307
- ### Exception Types
1308
-
1309
- The upload system defines specific exceptions:
1310
-
1311
- ```python
1312
- # Security violations
1313
- try:
1314
- action.run_action()
1315
- except ExcelSecurityError as e:
1316
- print(f"Excel security violation: {e}")
1317
-
1318
- # Parsing errors
1319
- except ExcelParsingError as e:
1320
- print(f"Excel parsing failed: {e}")
1321
-
1322
- # General upload errors
1323
- except ActionError as e:
1324
- print(f"Upload action failed: {e}")
1325
- ```
1326
-
1327
- ### Validation Errors
1328
-
1329
- Parameter validation provides detailed error messages:
1330
-
1331
- ```python
1332
- from pydantic import ValidationError
1333
-
1334
- try:
1335
- params = UploadParams(**invalid_params)
1336
- except ValidationError as e:
1337
- for error in e.errors():
1338
- print(f"Field {error['loc']}: {error['msg']}")
1339
- ```
1340
-
1341
- ## API Reference
1342
-
1343
- ### Core Components
1344
-
1345
- #### UploadAction
1346
-
1347
- Main upload action class implementing Strategy and Facade patterns for file processing operations.
1348
-
1349
- **Class Attributes:**
1350
-
1351
- - `name = 'upload'` - Action identifier
1352
- - `category = PluginCategory.UPLOAD` - Plugin category
1353
- - `method = RunMethod.JOB` - Execution method
1354
- - `run_class = UploadRun` - Specialized run management
1355
- - `params_model = UploadParams` - Parameter validation model
1356
- - `strategy_factory: StrategyFactory` - Creates strategy implementations
1357
- - `step_registry: StepRegistry` - Manages workflow steps
1358
-
1359
- **Key Methods:**
1360
-
1361
- - `start() -> Dict[str, Any]` - Execute orchestrated upload workflow
1362
- - `get_workflow_summary() -> Dict[str, Any]` - Get configured workflow summary
1363
- - `_configure_workflow() -> None` - Register workflow steps in execution order
1364
- - `_configure_strategies(context=None) -> Dict[str, Any]` - Create strategy instances
1365
-
1366
- **Progress Categories:**
1367
-
1368
- ```python
1369
- progress_categories = {
1370
- 'analyze_collection': {'proportion': 2},
1371
- 'upload_data_files': {'proportion': 38},
1372
- 'generate_data_units': {'proportion': 60},
1373
- }
1374
- ```
1375
-
1376
- #### UploadOrchestrator
1377
-
1378
- Facade component coordinating the complete upload workflow with automatic rollback.
1379
-
1380
- **Class Attributes:**
1381
-
1382
- - `context: UploadContext` - Shared state across workflow
1383
- - `step_registry: StepRegistry` - Registry of workflow steps
1384
- - `strategies: Dict[str, Any]` - Strategy implementations
1385
- - `executed_steps: List[BaseStep]` - Successfully executed steps
1386
- - `current_step_index: int` - Current position in workflow
1387
- - `rollback_executed: bool` - Whether rollback was performed
1388
-
1389
- **Key Methods:**
1390
-
1391
- - `execute() -> Dict[str, Any]` - Execute complete workflow with error handling
1392
- - `get_workflow_summary() -> Dict[str, Any]` - Get execution summary and metrics
1393
- - `get_executed_steps() -> List[BaseStep]` - Get list of successfully executed steps
1394
- - `is_rollback_executed() -> bool` - Check if rollback was performed
1395
- - `_execute_step(step: BaseStep) -> StepResult` - Execute individual workflow step
1396
- - `_handle_step_failure(step: BaseStep, error: Exception) -> None` - Handle step failures
1397
- - `_rollback_executed_steps() -> None` - Rollback executed steps in reverse order
1398
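The reverse-order rollback behind `_rollback_executed_steps` can be sketched as follows. This is a simplified stand-in, not the orchestrator's actual code; in particular, the real implementation logs rollback failures through the run instead of a plain list:

```python
class Step:
    """Minimal stand-in for a workflow step with a rollback hook."""
    def __init__(self, name, log):
        self.name, self.log = name, log

    def execute(self, context):
        self.log.append(f'execute:{self.name}')

    def rollback(self, context):
        self.log.append(f'rollback:{self.name}')

def rollback_executed_steps(executed_steps, context, log):
    """Undo steps in reverse order; one failing rollback must not
    prevent the remaining rollbacks from running."""
    for step in reversed(executed_steps):
        try:
            step.rollback(context)
        except Exception as exc:  # record and continue with earlier steps
            log.append(f'rollback_failed:{step.name}:{exc}')
```

Rolling back in reverse order matters because later steps often depend on resources created by earlier ones.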
-
1399
- #### UploadContext
1400
-
1401
- Context object maintaining shared state and communication between workflow components.
1402
-
1403
- **State Attributes:**
1404
-
1405
- - `params: Dict` - Upload parameters
1406
- - `run: UploadRun` - Run management instance
1407
- - `client: Any` - API client for external operations
1408
- - `storage: Any` - Storage configuration object
1409
- - `pathlib_cwd: Path` - Current working directory path
1410
- - `metadata: Dict[str, Dict[str, Any]]` - File metadata mappings
1411
- - `file_specifications: Dict[str, Any]` - Data collection file specs
1412
- - `organized_files: List[Dict[str, Any]]` - Organized file information
1413
- - `uploaded_files: List[Dict[str, Any]]` - Successfully uploaded files
1414
- - `data_units: List[Dict[str, Any]]` - Generated data units
1415
-
1416
- **Progress and Metrics:**
1417
-
1418
- - `metrics: Dict[str, Any]` - Workflow metrics and statistics
1419
- - `errors: List[str]` - Accumulated error messages
1420
- - `step_results: List[StepResult]` - Results from executed steps
1421
-
1422
- **Strategy and Rollback:**
1423
-
1424
- - `strategies: Dict[str, Any]` - Injected strategy implementations
1425
- - `rollback_data: Dict[str, Any]` - Data for rollback operations
1426
-
1427
- **Key Methods:**
1428
-
1429
- - `update(result: StepResult) -> None` - Update context with step results
1430
- - `get_result() -> Dict[str, Any]` - Generate final result dictionary
1431
- - `has_errors() -> bool` - Check for accumulated errors
1432
- - `get_last_step_result() -> Optional[StepResult]` - Get most recent step result
1433
- - `update_metrics(category: str, metrics: Dict[str, Any]) -> None` - Update metrics
1434
- - `add_error(error: str) -> None` - Add error to context
1435
- - `get_param(key: str, default: Any = None) -> Any` - Get parameter with default
1436
-
1437
- #### StepRegistry
1438
-
1439
- Registry managing the collection and execution order of workflow steps.
1440
-
1441
- **Attributes:**
1442
-
1443
- - `_steps: List[BaseStep]` - Registered workflow steps in execution order
1444
-
1445
- **Key Methods:**
1446
-
1447
- - `register(step: BaseStep) -> None` - Register a workflow step
1448
- - `get_steps() -> List[BaseStep]` - Get all registered steps in order
1449
- - `get_total_progress_weight() -> float` - Calculate total progress weight
1450
- - `clear() -> None` - Clear all registered steps
1451
- - `__len__() -> int` - Get number of registered steps
1452
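The registry interface above is small enough to sketch in full. This mirrors the documented method list but is an illustrative minimal version, not the SDK's source:

```python
class StepRegistry:
    """Minimal sketch of the registry interface listed above."""

    def __init__(self):
        self._steps = []

    def register(self, step):
        self._steps.append(step)  # preserves execution order

    def get_steps(self):
        return list(self._steps)

    def get_total_progress_weight(self):
        return sum(s.progress_weight for s in self._steps)

    def clear(self):
        self._steps.clear()

    def __len__(self):
        return len(self._steps)
```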
-
1453
- #### StrategyFactory
1454
-
1455
- Factory component creating appropriate strategy implementations based on parameters.
1456
-
1457
- **Key Methods:**
1458
-
1459
- - `create_validation_strategy(params: Dict, context=None) -> BaseValidationStrategy` - Create validation strategy
1460
- - `create_file_discovery_strategy(params: Dict, context=None) -> BaseFileDiscoveryStrategy` - Create file discovery strategy
1461
- - `create_metadata_strategy(params: Dict, context=None) -> BaseMetadataStrategy` - Create metadata processing strategy
1462
- - `create_upload_strategy(params: Dict, context: UploadContext) -> BaseUploadStrategy` - Create upload strategy (requires context)
1463
- - `create_data_unit_strategy(params: Dict, context: UploadContext) -> BaseDataUnitStrategy` - Create data unit strategy (requires context)
1464
- - `get_available_strategies() -> Dict[str, List[str]]` - Get available strategy types and implementations
1465
-
1466
- ### Workflow Steps
1467
-
1468
- #### BaseStep (Abstract)
1469
-
1470
- Base class for all workflow steps providing common interface and utilities.
1471
-
1472
- **Abstract Properties:**
1473
-
1474
- - `name: str` - Unique step identifier
1475
- - `progress_weight: float` - Weight for progress calculation (sum should equal 1.0)
1476
-
1477
- **Abstract Methods:**
1478
-
1479
- - `execute(context: UploadContext) -> StepResult` - Execute step logic
1480
- - `can_skip(context: UploadContext) -> bool` - Determine if step can be skipped
1481
- - `rollback(context: UploadContext) -> None` - Rollback step operations
1482
-
1483
- **Utility Methods:**
1484
-
1485
- - `create_success_result(data: Dict = None) -> StepResult` - Create success result
1486
- - `create_error_result(error: str, original_exception: Exception = None) -> StepResult` - Create error result
1487
- - `create_skip_result() -> StepResult` - Create skip result
1488
-
1489
- #### StepResult
1490
-
1491
- Result object returned by workflow step execution.
1492
-
1493
- **Attributes:**
1494
-
1495
- - `success: bool` - Whether step executed successfully
1496
- - `data: Dict[str, Any]` - Step result data
1497
- - `error: str` - Error message if step failed
1498
- - `rollback_data: Dict[str, Any]` - Data needed for rollback
1499
- - `skipped: bool` - Whether step was skipped
1500
- - `original_exception: Optional[Exception]` - Original exception for debugging
1501
- - `timestamp: datetime` - Execution timestamp
1502
-
1503
- **Usage:**
1504
-
1505
- ```python
1506
- # Boolean evaluation
1507
- if step_result:
1508
- # Step was successful
1509
- process_success(step_result.data)
1510
- ```
1511
-
1512
- #### Concrete Steps
1513
-
1514
- **InitializeStep** (`name: "initialize"`, `weight: 0.05`)
1515
-
1516
- - Sets up storage connection and pathlib working directory
1517
- - Validates basic upload prerequisites
1518
-
1519
- **ProcessMetadataStep** (`name: "process_metadata"`, `weight: 0.05`)
1520
-
1521
- - Processes Excel metadata if provided
1522
- - Validates metadata security and format
1523
-
1524
- **AnalyzeCollectionStep** (`name: "analyze_collection"`, `weight: 0.05`)
1525
-
1526
- - Retrieves and validates data collection file specifications
1527
- - Sets up file organization rules
1528
-
1529
- **OrganizeFilesStep** (`name: "organize_files"`, `weight: 0.10`)
1530
-
1531
- - Discovers files using file discovery strategy
1532
- - Organizes files by type and specification
1533
-
1534
- **ValidateFilesStep** (`name: "validate_files"`, `weight: 0.05`)
1535
-
1536
- - Validates files using validation strategy
1537
- - Performs security and content checks
1538
-
1539
- **UploadFilesStep** (`name: "upload_files"`, `weight: 0.30`)
1540
-
1541
- - Uploads files using upload strategy
1542
- - Handles batching and progress tracking
1543
-
1544
- **GenerateDataUnitsStep** (`name: "generate_data_units"`, `weight: 0.35`)
1545
-
1546
- - Creates data units using data unit strategy
1547
- - Links uploaded files to data units
1548
-
1549
- **CleanupStep** (`name: "cleanup"`, `weight: 0.05`)
1550
-
1551
- - Cleans temporary resources and files
1552
- - Performs final validation
1553
-
1554
- ### Strategy Base Classes
1555
-
1556
- #### BaseValidationStrategy (Abstract)
1557
-
1558
- Base class for file validation strategies.
1559
-
1560
- **Abstract Methods:**
1561
-
1562
- - `validate_files(files: List[Path], context: UploadContext) -> bool` - Validate collection of files
1563
- - `validate_security(file_path: Path) -> bool` - Validate individual file security
1564
-
1565
- #### BaseFileDiscoveryStrategy (Abstract)
1566
-
1567
- Base class for file discovery and organization strategies.
1568
-
1569
- **Abstract Methods:**
1570
-
1571
- - `discover_files(path: Path, context: UploadContext) -> List[Path]` - Discover files from path
1572
- - `organize_files(files: List[Path], specs: Dict[str, Any], context: UploadContext) -> List[Dict[str, Any]]` - Organize discovered files
1573
-
1574
- #### BaseMetadataStrategy (Abstract)
1575
-
1576
- Base class for metadata processing strategies.
1577
-
1578
- **Abstract Methods:**
1579
-
1580
- - `process_metadata(context: UploadContext) -> Dict[str, Any]` - Process metadata from context
1581
- - `extract_metadata(file_path: Path) -> Dict[str, Any]` - Extract metadata from file
1582
-
1583
- #### BaseUploadStrategy (Abstract)
1584
-
1585
- Base class for file upload strategies.
1586
-
1587
- **Abstract Methods:**
1588
-
1589
- - `upload_files(files: List[Dict[str, Any]], context: UploadContext) -> List[Dict[str, Any]]` - Upload collection of files
1590
- - `upload_batch(batch: List[Dict[str, Any]], context: UploadContext) -> List[Dict[str, Any]]` - Upload file batch
1591
-
1592
- #### BaseDataUnitStrategy (Abstract)
1593
-
1594
- Base class for data unit creation strategies.
1595
-
1596
- **Abstract Methods:**
1597
-
1598
- - `generate_data_units(files: List[Dict[str, Any]], context: UploadContext) -> List[Dict[str, Any]]` - Generate data units
1599
- - `create_data_unit_batch(batch: List[Dict[str, Any]], context: UploadContext) -> List[Dict[str, Any]]` - Create data unit batch
### Legacy Components

#### UploadRun

Specialized run management for upload operations (unchanged from the legacy implementation).

**Logging Methods:**

- `log_message_with_code(code, *args, level=None)` - Type-safe logging
- `log_upload_event(code, *args, level=None)` - Upload-specific events

**Nested Models:**

- `UploadEventLog` - Upload event logging
- `DataFileLog` - Data file processing logs
- `DataUnitLog` - Data unit creation logs
- `TaskLog` - Task execution logs
- `MetricsRecord` - Metrics tracking

#### UploadParams

Parameter validation model with Pydantic integration (unchanged from the legacy implementation).

**Required Parameters:**

- `name: str` - Upload name
- `path: str` - Source path
- `storage: int` - Storage ID
- `data_collection: int` - Data collection ID

**Optional Parameters:**

- `description: str | None = None` - Upload description
- `project: int | None = None` - Project ID
- `excel_metadata_path: str | None = None` - Excel metadata file path
- `is_recursive: bool = False` - Recursive file discovery
- `max_file_size_mb: int = 50` - Maximum file size in megabytes
- `creating_data_unit_batch_size: int = 100` - Data unit batch size
- `use_async_upload: bool = True` - Asynchronous upload processing

**Validation Features:**

- Real-time API validation for `storage`/`data_collection`/`project`
- String sanitization and length validation
- Type checking and conversion
- Custom validator methods
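As a quick illustration, a parameter set combining the required and optional fields above might look like the following. All values are hypothetical; only the field names come from the lists above, and the real `UploadParams` model validates IDs against the API at runtime.

```python
# Hypothetical values throughout; only the field names come from the UploadParams model.
params = {
    # Required
    'name': 'street-scenes-2025-10',
    'path': '/data/incoming/street_scenes',
    'storage': 3,               # storage ID (hypothetical)
    'data_collection': 12,      # data collection ID (hypothetical)
    # Optional overrides (documented defaults apply when omitted)
    'description': 'October dashcam batch',
    'is_recursive': True,
    'max_file_size_mb': 100,
    'creating_data_unit_batch_size': 200,
    'use_async_upload': True,
}
```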
### Utility Classes

#### ExcelSecurityConfig

Security configuration for Excel file processing.

**Configuration Attributes:**

- `max_file_size_mb` - File size limit in megabytes (default: 10)
- `max_rows` - Row count limit (default: 100000)
- `max_columns` - Column count limit (default: 50)

**Key Methods:**

- `from_action_config(action_config)` - Create a config from `config.yaml`
#### PathAwareJSONEncoder

Custom JSON encoder for `Path` and `datetime` objects.

**Supported Types:**

- `Path` objects (converted to strings)
- `datetime` objects (converted to ISO 8601 format)
- Standard JSON-serializable types
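The behavior described above follows the standard `json.JSONEncoder.default` extension pattern. The snippet below is a minimal reimplementation for illustration, not the SDK's source:

```python
import json
from datetime import datetime
from pathlib import Path


class PathAwareJSONEncoder(json.JSONEncoder):
    """Serializes Path objects as strings and datetimes as ISO 8601."""

    def default(self, o):
        if isinstance(o, Path):
            return str(o)
        if isinstance(o, datetime):
            return o.isoformat()
        # Defer to the base class for everything else (raises TypeError if unsupported).
        return super().default(o)
```

Usage: `json.dumps(payload, cls=PathAwareJSONEncoder)` serializes mixed dictionaries containing paths and timestamps without manual conversion.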
### Enums

#### LogCode

Type-safe logging codes for upload operations.

**Categories:**

- Validation codes (`VALIDATION_FAILED`, `STORAGE_VALIDATION_FAILED`)
- File processing codes (`NO_FILES_FOUND`, `FILES_DISCOVERED`)
- Excel processing codes (`EXCEL_SECURITY_VIOLATION`, `EXCEL_PARSING_ERROR`)
- Progress codes (`UPLOADING_DATA_FILES`, `GENERATING_DATA_UNITS`)

#### UploadStatus

Upload processing status enumeration.

**Values:**

- `SUCCESS = 'success'` - Operation completed successfully
- `FAILED = 'failed'` - Operation failed with errors

### Exceptions

#### ExcelSecurityError

Raised when an Excel file violates security constraints.

**Common Causes:**

- File size exceeds limits
- Estimated memory usage is too high
- Content security violations

#### ExcelParsingError

Raised when an Excel file cannot be parsed.

**Common Causes:**

- File format corruption
- Invalid Excel structure
- Missing required columns
- Content parsing failures
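A hedged sketch of how callers commonly separate the two failure modes: security violations abort the upload, while parsing failures can fall back to defaults. The exception classes below are local stand-ins for the SDK's, and `load_metadata` with its callback parameters is a hypothetical helper, not an SDK API.

```python
class ExcelSecurityError(Exception):
    """Stand-in; the real exception ships with the SDK."""


class ExcelParsingError(Exception):
    """Stand-in; the real exception ships with the SDK."""


def load_metadata(path, parse, security_check):
    """Hypothetical helper separating the two documented failure modes."""
    try:
        security_check(path)   # may raise ExcelSecurityError
        return parse(path)     # may raise ExcelParsingError
    except ExcelSecurityError:
        # Security violations should abort the upload outright.
        raise
    except ExcelParsingError:
        # Parsing failures can degrade gracefully to empty metadata.
        return {}
```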
## Best Practices

### Architecture Patterns

1. **Strategy Selection**: Choose strategies based on the use case

   - Use `RecursiveFileDiscoveryStrategy` for deep directory structures
   - Use `BasicValidationStrategy` for standard file validation
   - Use `AsyncUploadStrategy` for large file sets

2. **Step Ordering**: Maintain logical step dependencies

   - Initialize → Process Metadata → Analyze Collection → Organize Files → Validate → Upload → Generate Data Units → Cleanup
   - Insert custom steps at the appropriate points in the workflow

3. **Context Management**: Leverage `UploadContext` for state sharing

   - Store intermediate results in the context for downstream steps
   - Use the context for cross-step communication
   - Preserve rollback data for cleanup operations
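The step ordering and rollback behavior described above can be sketched as a minimal pipeline runner. The `execute`/`rollback` method names follow the step interface described earlier; `StepError` and the runner itself are illustrative, not SDK APIs.

```python
class StepError(Exception):
    pass


def run_pipeline(steps, context):
    """Executes steps in order; on failure, rolls back completed steps in reverse.

    Each step is expected to expose execute(context) and rollback(context),
    mirroring the step interface described above.
    """
    completed = []
    for step in steps:
        try:
            step.execute(context)
            completed.append(step)
        except Exception as exc:
            for done in reversed(completed):
                done.rollback(context)  # rollback should be idempotent
            raise StepError(f'{step.name} failed: {exc}') from exc
```

Running rollbacks in reverse order mirrors the cleanup guidance later in this document: undo the most recent side effects first.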
### Performance Optimization

1. **Batch Processing**: Configure optimal batch sizes based on system resources

   ```python
   params = {
       "creating_data_unit_batch_size": 200,  # Adjust based on available memory
       "upload_batch_size": 10,  # Custom parameter for upload strategies
   }
   ```

2. **Async Operations**: Enable async processing for I/O-bound operations

   ```python
   params = {
       "use_async_upload": True,  # Better throughput for network operations
   }
   ```

3. **Memory Management**: Monitor memory usage in custom strategies

   - Process files in chunks rather than loading them all into memory
   - Use generators for large file collections
   - Configure Excel security limits appropriately

4. **Progress Monitoring**: Implement detailed progress tracking

   ```python
   # Custom step with progress updates
   def execute(self, context):
       total_files = len(context.organized_files)
       for i, file_info in enumerate(context.organized_files):
           # Process the file, then report progress
           progress = (i + 1) / total_files * 100
           context.update_metrics('custom_step', {'progress': progress})
   ```
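For the memory-management point above, a generator-based batcher is a minimal way to avoid materializing large file collections. `iter_batches` is an illustrative helper, not an SDK API:

```python
from pathlib import Path
from typing import Iterable, Iterator, List


def iter_batches(files: Iterable[Path], batch_size: int) -> Iterator[List[Path]]:
    """Yields fixed-size batches without loading the whole collection into memory."""
    batch: List[Path] = []
    for f in files:
        batch.append(f)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # trailing partial batch
```

Feeding it a lazy source such as `path.rglob('*.jpg')` keeps memory proportional to the batch size rather than the total file count.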
### Security Considerations

1. **Input Validation**: Validate all input parameters and file paths

   ```python
   # Custom validation in a strategy
   def validate_files(self, files, context):
       for file_path in files:
           if not self._is_safe_path(file_path):
               return False
       return True
   ```

2. **File Content Security**: Implement content-based security checks

   - Scan for malicious file signatures
   - Validate that file headers match extensions
   - Check for embedded executables

3. **Excel Security**: Configure appropriate security limits

   ```python
   import os
   os.environ['EXCEL_MAX_FILE_SIZE_MB'] = '10'
   os.environ['EXCEL_MAX_MEMORY_MB'] = '30'
   ```

4. **Path Sanitization**: Validate and sanitize all file paths

   - Prevent path traversal attacks
   - Validate file extensions
   - Check file permissions
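The traversal and extension checks above can be sketched with `pathlib` alone. `is_safe_path` and the extension allow-list are hypothetical, and a production check would also cover permissions and symlink policy:

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {'.jpg', '.png', '.json'}  # hypothetical allow-list


def is_safe_path(candidate: Path, root: Path) -> bool:
    """Rejects paths that escape the upload root or carry unexpected extensions."""
    try:
        resolved = candidate.resolve()
        # relative_to raises ValueError when the path escapes the root
        # (e.g. via '..' traversal), which we treat as unsafe.
        resolved.relative_to(root.resolve())
    except (OSError, ValueError):
        return False
    return resolved.suffix.lower() in ALLOWED_EXTENSIONS
```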
### Error Handling and Recovery

1. **Graceful Degradation**: Design for partial failure scenarios

   ```python
   class RobustUploadStrategy(BaseUploadStrategy):
       def upload_files(self, files, context):
           successful_uploads = []
           failed_uploads = []

           for file_info in files:
               try:
                   result = self._upload_file(file_info)
                   successful_uploads.append(result)
               except Exception as e:
                   failed_uploads.append({'file': file_info, 'error': str(e)})
                   # Continue with other files instead of failing completely

           # Update the context with partial results
           context.add_uploaded_files(successful_uploads)
           if failed_uploads:
               context.add_error(f"Failed to upload {len(failed_uploads)} files")

           return successful_uploads
   ```

2. **Rollback Design**: Implement comprehensive rollback strategies

   ```python
   def rollback(self, context):
       # Clean up in reverse order of operations
       if hasattr(self, '_created_temp_files'):
           for temp_file in self._created_temp_files:
               try:
                   temp_file.unlink()
               except Exception:
                   pass  # Don't fail rollback due to cleanup issues
   ```

3. **Detailed Logging**: Use structured logging for debugging

   ```python
   def execute(self, context):
       try:
           context.run.log_message_with_code(
               'CUSTOM_STEP_STARTED',
               {'step': self.name, 'file_count': len(context.organized_files)}
           )
           # Step logic here
       except Exception as e:
           context.run.log_message_with_code(
               'CUSTOM_STEP_FAILED',
               {'step': self.name, 'error': str(e)},
               level=Context.DANGER
           )
           raise
   ```
### Development Guidelines

1. **Custom Strategy Development**: Follow the established patterns

   ```python
   # Always extend the appropriate base class
   class MyCustomStrategy(BaseValidationStrategy):
       def __init__(self, config=None):
           self.config = config or {}

       def validate_files(self, files, context):
           # Implement validation logic
           return True

       def validate_security(self, file_path):
           # Implement security validation
           return True
   ```

2. **Testing Strategy**: Maintain comprehensive test coverage

   ```python
   # Test both success and failure scenarios
   class TestCustomStrategy:
       def test_success_case(self):
           strategy = MyCustomStrategy()
           result = strategy.validate_files([Path('valid_file.txt')], mock_context)
           assert result is True

       def test_security_failure(self):
           strategy = MyCustomStrategy()
           result = strategy.validate_security(Path('malware.exe'))
           assert result is False

       def test_rollback_cleanup(self):
           step = MyCustomStep()
           step.rollback(mock_context)
           # Assert cleanup was performed
   ```

3. **Extension Points**: Use the factory pattern for extensibility

   ```python
   class CustomStrategyFactory(StrategyFactory):
       def create_validation_strategy(self, params, context=None):
           validation_type = params.get('validation_type', 'basic')

           strategy_map = {
               'basic': BasicValidationStrategy,
               'strict': StrictValidationStrategy,
               'custom': MyCustomValidationStrategy,
           }

           strategy_class = strategy_map.get(validation_type, BasicValidationStrategy)
           return strategy_class(params)
   ```

4. **Configuration Management**: Use environment variables and parameters

   ```python
   class ConfigurableStep(BaseStep):
       def __init__(self):
           # Allow runtime configuration
           self.batch_size = int(os.getenv('STEP_BATCH_SIZE', '50'))
           self.timeout = int(os.getenv('STEP_TIMEOUT_SECONDS', '300'))

       def execute(self, context):
           # Use the configured values
           batch_size = context.get_param('step_batch_size', self.batch_size)
           timeout = context.get_param('step_timeout', self.timeout)
   ```
### Anti-Patterns to Avoid

1. **Tight Coupling**: Don't couple strategies to specific implementations
2. **State Mutation**: Don't modify context state directly outside the `update()` method
3. **Exception Swallowing**: Don't catch and ignore exceptions without proper handling
4. **Blocking Operations**: Don't perform long-running synchronous operations without progress updates
5. **Memory Leaks**: Don't hold references to large objects in step instances

### Troubleshooting Guide

1. **Step Failures**: Check step execution order and dependencies
2. **Strategy Issues**: Verify strategy factory configuration and parameter passing
3. **Context Problems**: Ensure proper context updates and state management
4. **Rollback Failures**: Design idempotent rollback operations
5. **Performance Issues**: Profile batch sizes and async operation usage
### Migration Checklist

When upgrading from the legacy implementation:

- [ ] Update the parameter name from `collection` to `data_collection`
- [ ] Test existing workflows for compatibility
- [ ] Review custom extensions for new architecture opportunities
- [ ] Update error handling to leverage the new rollback capabilities
- [ ] Consider implementing custom strategies for specialized requirements
- [ ] Update test cases to validate the new workflow steps
- [ ] Review logging and metrics collection for enhanced information

For detailed information on developing custom upload plugins using the BaseUploader template, see the [Developing Upload Templates](./developing-upload-template.md) guide.