synapse-sdk 2025.9.1__py3-none-any.whl → 2025.9.4__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- synapse_sdk/devtools/docs/docs/api/clients/annotation-mixin.md +378 -0
- synapse_sdk/devtools/docs/docs/api/clients/backend.md +368 -1
- synapse_sdk/devtools/docs/docs/api/clients/core-mixin.md +477 -0
- synapse_sdk/devtools/docs/docs/api/clients/data-collection-mixin.md +422 -0
- synapse_sdk/devtools/docs/docs/api/clients/hitl-mixin.md +554 -0
- synapse_sdk/devtools/docs/docs/api/clients/index.md +391 -0
- synapse_sdk/devtools/docs/docs/api/clients/integration-mixin.md +571 -0
- synapse_sdk/devtools/docs/docs/api/clients/ml-mixin.md +578 -0
- synapse_sdk/devtools/docs/docs/plugins/developing-upload-template.md +1463 -0
- synapse_sdk/devtools/docs/docs/plugins/export-plugins.md +161 -34
- synapse_sdk/devtools/docs/docs/plugins/upload-plugins.md +1497 -213
- synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/api/clients/annotation-mixin.md +289 -0
- synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/api/clients/backend.md +378 -11
- synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/api/clients/core-mixin.md +417 -0
- synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/api/clients/data-collection-mixin.md +356 -0
- synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/api/clients/hitl-mixin.md +192 -0
- synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/api/clients/index.md +391 -0
- synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/api/clients/integration-mixin.md +479 -0
- synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/api/clients/ml-mixin.md +284 -0
- synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/developing-upload-template.md +1463 -0
- synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/export-plugins.md +161 -34
- synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/upload-plugins.md +1752 -572
- synapse_sdk/devtools/docs/sidebars.ts +7 -0
- synapse_sdk/plugins/README.md +1 -2
- synapse_sdk/plugins/categories/base.py +7 -0
- synapse_sdk/plugins/categories/export/actions/__init__.py +3 -0
- synapse_sdk/plugins/categories/export/actions/export/__init__.py +28 -0
- synapse_sdk/plugins/categories/export/actions/export/action.py +160 -0
- synapse_sdk/plugins/categories/export/actions/export/enums.py +113 -0
- synapse_sdk/plugins/categories/export/actions/export/exceptions.py +53 -0
- synapse_sdk/plugins/categories/export/actions/export/models.py +74 -0
- synapse_sdk/plugins/categories/export/actions/export/run.py +195 -0
- synapse_sdk/plugins/categories/export/actions/export/utils.py +187 -0
- synapse_sdk/plugins/categories/export/templates/plugin/__init__.py +1 -1
- synapse_sdk/plugins/categories/upload/actions/upload/__init__.py +1 -2
- synapse_sdk/plugins/categories/upload/actions/upload/action.py +154 -531
- synapse_sdk/plugins/categories/upload/actions/upload/context.py +185 -0
- synapse_sdk/plugins/categories/upload/actions/upload/factory.py +143 -0
- synapse_sdk/plugins/categories/upload/actions/upload/models.py +66 -29
- synapse_sdk/plugins/categories/upload/actions/upload/orchestrator.py +182 -0
- synapse_sdk/plugins/categories/upload/actions/upload/registry.py +113 -0
- synapse_sdk/plugins/categories/upload/actions/upload/steps/__init__.py +1 -0
- synapse_sdk/plugins/categories/upload/actions/upload/steps/base.py +106 -0
- synapse_sdk/plugins/categories/upload/actions/upload/steps/cleanup.py +62 -0
- synapse_sdk/plugins/categories/upload/actions/upload/steps/collection.py +62 -0
- synapse_sdk/plugins/categories/upload/actions/upload/steps/generate.py +80 -0
- synapse_sdk/plugins/categories/upload/actions/upload/steps/initialize.py +66 -0
- synapse_sdk/plugins/categories/upload/actions/upload/steps/metadata.py +101 -0
- synapse_sdk/plugins/categories/upload/actions/upload/steps/organize.py +89 -0
- synapse_sdk/plugins/categories/upload/actions/upload/steps/upload.py +96 -0
- synapse_sdk/plugins/categories/upload/actions/upload/steps/validate.py +61 -0
- synapse_sdk/plugins/categories/upload/actions/upload/strategies/__init__.py +1 -0
- synapse_sdk/plugins/categories/upload/actions/upload/strategies/base.py +86 -0
- synapse_sdk/plugins/categories/upload/actions/upload/strategies/data_unit/__init__.py +1 -0
- synapse_sdk/plugins/categories/upload/actions/upload/strategies/data_unit/batch.py +39 -0
- synapse_sdk/plugins/categories/upload/actions/upload/strategies/data_unit/single.py +34 -0
- synapse_sdk/plugins/categories/upload/actions/upload/strategies/file_discovery/__init__.py +1 -0
- synapse_sdk/plugins/categories/upload/actions/upload/strategies/file_discovery/flat.py +233 -0
- synapse_sdk/plugins/categories/upload/actions/upload/strategies/file_discovery/recursive.py +253 -0
- synapse_sdk/plugins/categories/upload/actions/upload/strategies/metadata/__init__.py +1 -0
- synapse_sdk/plugins/categories/upload/actions/upload/strategies/metadata/excel.py +174 -0
- synapse_sdk/plugins/categories/upload/actions/upload/strategies/metadata/none.py +16 -0
- synapse_sdk/plugins/categories/upload/actions/upload/strategies/upload/__init__.py +1 -0
- synapse_sdk/plugins/categories/upload/actions/upload/strategies/upload/async_upload.py +109 -0
- synapse_sdk/plugins/categories/upload/actions/upload/strategies/upload/sync.py +43 -0
- synapse_sdk/plugins/categories/upload/actions/upload/strategies/validation/__init__.py +1 -0
- synapse_sdk/plugins/categories/upload/actions/upload/strategies/validation/default.py +45 -0
- synapse_sdk/plugins/categories/upload/actions/upload/utils.py +194 -83
- synapse_sdk/plugins/categories/upload/templates/config.yaml +4 -0
- synapse_sdk/plugins/categories/upload/templates/plugin/__init__.py +269 -0
- synapse_sdk/plugins/categories/upload/templates/plugin/upload.py +71 -27
- synapse_sdk/plugins/models.py +7 -0
- synapse_sdk/shared/__init__.py +21 -0
- {synapse_sdk-2025.9.1.dist-info → synapse_sdk-2025.9.4.dist-info}/METADATA +2 -1
- {synapse_sdk-2025.9.1.dist-info → synapse_sdk-2025.9.4.dist-info}/RECORD +79 -28
- synapse_sdk/plugins/categories/export/actions/export.py +0 -385
- synapse_sdk/plugins/categories/export/enums.py +0 -7
- {synapse_sdk-2025.9.1.dist-info → synapse_sdk-2025.9.4.dist-info}/WHEEL +0 -0
- {synapse_sdk-2025.9.1.dist-info → synapse_sdk-2025.9.4.dist-info}/entry_points.txt +0 -0
- {synapse_sdk-2025.9.1.dist-info → synapse_sdk-2025.9.4.dist-info}/licenses/LICENSE +0 -0
- {synapse_sdk-2025.9.1.dist-info → synapse_sdk-2025.9.4.dist-info}/top_level.txt +0 -0
## Upload Action Architecture

The upload system uses a modern, extensible architecture built on proven design patterns. The refactored implementation transforms the previous monolithic approach into a modular, strategy-based system with clear separation of concerns.

### Design Patterns

The architecture leverages several key design patterns:

- **Strategy Pattern**: Pluggable behaviors for validation, file discovery, metadata processing, upload operations, and data unit creation
- **Facade Pattern**: UploadOrchestrator provides a simplified interface to coordinate complex workflows
- **Factory Pattern**: StrategyFactory creates appropriate strategy implementations based on runtime parameters
- **Context Pattern**: UploadContext maintains shared state and communication between workflow components

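How these four patterns compose can be sketched in a few lines. This is an illustrative toy only, not the SDK API: the class and function names below are invented stand-ins for the real UploadContext, StrategyFactory, and UploadOrchestrator.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Context:
    """Context pattern: shared state passed between workflow components."""
    params: Dict[str, str]
    results: List[str] = field(default_factory=list)

def flat_discovery(ctx: Context) -> str:
    return 'discovered:flat'

def recursive_discovery(ctx: Context) -> str:
    return 'discovered:recursive'

class Factory:
    """Factory pattern: pick a strategy implementation from runtime parameters."""
    def create_discovery(self, params: Dict[str, str]) -> Callable[[Context], str]:
        return recursive_discovery if params.get('is_recursive') else flat_discovery

class Orchestrator:
    """Facade pattern: a single execute() call coordinates the step sequence."""
    def __init__(self, ctx: Context, steps: List[Callable[[Context], str]]):
        self.ctx, self.steps = ctx, steps

    def execute(self) -> List[str]:
        for step in self.steps:
            self.ctx.results.append(step(self.ctx))
        return self.ctx.results

ctx = Context(params={'is_recursive': '1'})
strategy = Factory().create_discovery(ctx.params)  # Strategy pattern
print(Orchestrator(ctx, [strategy]).execute())     # ['discovered:recursive']
```

The same shape appears in the real action: the factory chooses strategies from the validated params, the orchestrator runs the registered steps, and the context accumulates their results.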
### Component Architecture

```mermaid
classDiagram
    %% Light/Dark mode compatible colors
    classDef coreClass fill:#e3f2fd,stroke:#1976d2,stroke-width:2px,color:#000000
    classDef strategyClass fill:#e8f5e8,stroke:#388e3c,stroke-width:2px,color:#000000
    classDef stepClass fill:#fff9c4,stroke:#f57c00,stroke-width:2px,color:#000000
    classDef contextClass fill:#ffebee,stroke:#d32f2f,stroke-width:2px,color:#000000

    class UploadAction {
        +name: str = "upload"
        +params_model: UploadParams
        +progress_categories: dict
        +metrics_categories: dict
        +strategy_factory: StrategyFactory
        +step_registry: StepRegistry

        +start() dict
        +get_workflow_summary() dict
        +_configure_workflow() None
        +_configure_strategies() dict
    }

    class UploadOrchestrator {
        +context: UploadContext
        +step_registry: StepRegistry
        +strategies: dict
        +executed_steps: list
        +current_step_index: int
        +rollback_executed: bool

        +execute() dict
        +get_workflow_summary() dict
        +get_executed_steps() list
        +is_rollback_executed() bool
        +_execute_step(step) StepResult
        +_handle_step_failure(step, error) None
        +_rollback_executed_steps() None
    }

    class UploadContext {
        +params: dict
        +run: UploadRun
        +client: Any
        +storage: Any
        +pathlib_cwd: Path
        +metadata: dict
        +file_specifications: dict
        +organized_files: list
        +uploaded_files: list
        +data_units: list
        +metrics: dict
        +errors: list
        +strategies: dict
        +rollback_data: dict

        +update(result: StepResult) None
        +get_result() dict
        +has_errors() bool
        +update_metrics(category, metrics) None
    }

    class StepRegistry {
        +_steps: list
        +register(step: BaseStep) None
        +get_steps() list
        +get_total_progress_weight() float
        +clear() None
    }

    class StrategyFactory {
        +create_validation_strategy(params, context) BaseValidationStrategy
        +create_file_discovery_strategy(params, context) BaseFileDiscoveryStrategy
        +create_metadata_strategy(params, context) BaseMetadataStrategy
        +create_upload_strategy(params, context) BaseUploadStrategy
        +create_data_unit_strategy(params, context) BaseDataUnitStrategy
        +get_available_strategies() dict
    }

    class BaseStep {
        <<abstract>>
        +name: str
        +progress_weight: float
        +execute(context: UploadContext) StepResult
        +can_skip(context: UploadContext) bool
        +rollback(context: UploadContext) None
        +create_success_result(data) StepResult
        +create_error_result(error) StepResult
        +create_skip_result() StepResult
    }

    class ExcelSecurityConfig {
        +max_file_size_mb: int = 10
        +max_rows: int = 100000
        +max_columns: int = 50
        +max_file_size_bytes: int
        +MAX_FILE_SIZE_MB: int
        +MAX_FILE_SIZE_BYTES: int
        +MAX_ROWS: int
        +MAX_COLUMNS: int
        +from_action_config(action_config) ExcelSecurityConfig
    }

    class StepResult {
        +success: bool
        +data: dict
        +error: str
        +rollback_data: dict
        +skipped: bool
        +original_exception: Exception
        +timestamp: datetime
    }

    %% Strategy Base Classes
    class BaseValidationStrategy {
        <<abstract>>
        +validate_files(files, context) bool
        +validate_security(file_path) bool
    }

    class BaseFileDiscoveryStrategy {
        <<abstract>>
        +discover_files(path, context) list
        +organize_files(files, specs, context) list
    }

    class BaseMetadataStrategy {
        <<abstract>>
        +process_metadata(context) dict
        +extract_metadata(file_path) dict
    }

    class BaseUploadStrategy {
        <<abstract>>
        +upload_files(files, context) list
        +upload_batch(batch, context) list
    }

    class BaseDataUnitStrategy {
        <<abstract>>
        +generate_data_units(files, context) list
        +create_data_unit_batch(batch, context) list
    }

    %% Workflow Steps
    class InitializeStep {
        +name = "initialize"
        +progress_weight = 0.05
    }

    class ProcessMetadataStep {
        +name = "process_metadata"
        +progress_weight = 0.05
    }

    class AnalyzeCollectionStep {
        +name = "analyze_collection"
        +progress_weight = 0.05
    }

    class OrganizeFilesStep {
        +name = "organize_files"
        +progress_weight = 0.10
    }

    class ValidateFilesStep {
        +name = "validate_files"
        +progress_weight = 0.05
    }

    class UploadFilesStep {
        +name = "upload_files"
        +progress_weight = 0.30
    }

    class GenerateDataUnitsStep {
        +name = "generate_data_units"
        +progress_weight = 0.35
    }

    class CleanupStep {
        +name = "cleanup"
        +progress_weight = 0.05
    }

    %% Relationships
    UploadAction --> UploadRun : uses
    UploadAction --> UploadParams : validates with
    UploadAction --> ExcelSecurityConfig : configures
    UploadAction --> UploadOrchestrator : creates and executes
    UploadAction --> StrategyFactory : configures strategies
    UploadAction --> StepRegistry : manages workflow steps
    UploadRun --> LogCode : logs with
    UploadRun --> UploadStatus : tracks status
    UploadOrchestrator --> UploadContext : coordinates state
    UploadOrchestrator --> StepRegistry : executes steps from
    UploadOrchestrator --> BaseStep : executes
    BaseStep --> StepResult : returns
    UploadContext --> StepResult : updates with
    StrategyFactory --> BaseValidationStrategy : creates
    StrategyFactory --> BaseFileDiscoveryStrategy : creates
    StrategyFactory --> BaseMetadataStrategy : creates
    StrategyFactory --> BaseUploadStrategy : creates
    StrategyFactory --> BaseDataUnitStrategy : creates
    StepRegistry --> BaseStep : contains

    %% Step inheritance
    InitializeStep --|> BaseStep : extends
    ProcessMetadataStep --|> BaseStep : extends
    AnalyzeCollectionStep --|> BaseStep : extends
    OrganizeFilesStep --|> BaseStep : extends
    ValidateFilesStep --|> BaseStep : extends
    UploadFilesStep --|> BaseStep : extends
    GenerateDataUnitsStep --|> BaseStep : extends
    CleanupStep --|> BaseStep : extends

    %% Note: Class styling defined above - Mermaid will apply based on classDef definitions
```

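The BaseStep/StepResult contract in the diagram can be restated as a minimal, runnable sketch. Field names follow the diagram; everything else (defaults, the plain-dict context) is a simplifying assumption, not the SDK implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, Optional

@dataclass
class StepResult:
    """Outcome of one workflow step, as listed in the class diagram."""
    success: bool
    data: Dict[str, Any] = field(default_factory=dict)
    error: Optional[str] = None
    skipped: bool = False
    timestamp: datetime = field(default_factory=datetime.now)

class BaseStep:
    """Abstract step: concrete steps override execute/can_skip/rollback."""
    name: str = 'base'
    progress_weight: float = 0.0

    def execute(self, context: Dict[str, Any]) -> StepResult:
        raise NotImplementedError

    def can_skip(self, context: Dict[str, Any]) -> bool:
        return False

    def rollback(self, context: Dict[str, Any]) -> None:
        pass

    # Helper constructors so steps never build StepResult by hand
    def create_success_result(self, data: Dict[str, Any]) -> StepResult:
        return StepResult(success=True, data=data)

    def create_error_result(self, error: str) -> StepResult:
        return StepResult(success=False, error=error)

    def create_skip_result(self) -> StepResult:
        return StepResult(success=True, skipped=True)

print(BaseStep().create_skip_result().skipped)  # True
```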
### Step-Based Workflow Execution

The refactored architecture uses a step-based workflow coordinated by the UploadOrchestrator. Each step has a defined responsibility and progress weight.

#### Workflow Steps Overview

| Step | Name                | Weight | Responsibility                               |
| ---- | ------------------- | ------ | -------------------------------------------- |
| 1    | Initialize          | 5%     | Setup storage, pathlib, and basic validation |
| 2    | Process Metadata    | 5%     | Handle Excel metadata if provided            |
| 3    | Analyze Collection  | 5%     | Retrieve and validate data collection specs  |
| 4    | Organize Files      | 10%    | Discover and organize files by type          |
| 5    | Validate Files      | 5%     | Security and content validation              |
| 6    | Upload Files        | 30%    | Upload files to storage                      |
| 7    | Generate Data Units | 35%    | Create data units from uploaded files        |
| 8    | Cleanup             | 5%     | Clean temporary resources                    |

#### Execution Flow

```mermaid
flowchart TD
    %% Start
    A["🚀 Upload Action Started"] --> B["📋 Create UploadContext"]
    B --> C["⚙️ Configure Strategies"]
    C --> D["📝 Register Workflow Steps"]
    D --> E["🎯 Create UploadOrchestrator"]

    %% Strategy Injection
    E --> F["💉 Inject Strategies into Context"]
    F --> G["📊 Initialize Progress Tracking"]

    %% Step Execution Loop
    G --> H["🔄 Start Step Execution Loop"]
    H --> I["📍 Get Next Step"]
    I --> J{"🤔 Can Step be Skipped?"}
    J -->|Yes| K["⏭️ Skip Step"]
    J -->|No| L["▶️ Execute Step"]

    %% Step Execution
    L --> M{"✅ Step Successful?"}
    M -->|Yes| N["📈 Update Progress"]
    M -->|No| O["❌ Handle Step Failure"]

    %% Success Path
    N --> P["💾 Store Step Result"]
    P --> Q["📝 Add to Executed Steps"]
    Q --> R{"🏁 More Steps?"}
    R -->|Yes| I
    R -->|No| S["🎉 Workflow Complete"]

    %% Skip Path
    K --> T["📊 Update Progress (Skip)"]
    T --> R

    %% Error Handling
    O --> U["🔙 Start Rollback Process"]
    U --> V["⏪ Rollback Executed Steps"]
    V --> W["📝 Log Rollback Results"]
    W --> X["💥 Propagate Exception"]

    %% Final Results
    S --> Y["📊 Collect Final Metrics"]
    Y --> Z["📋 Generate Result Summary"]
    Z --> AA["🔄 Return to UploadAction"]

    %% Apply styles - Light/Dark mode compatible
    classDef startNode fill:#e3f2fd,stroke:#1976d2,stroke-width:2px,color:#000000
    classDef processNode fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px,color:#000000
    classDef decisionNode fill:#fff3e0,stroke:#f57c00,stroke-width:2px,color:#000000
    classDef successNode fill:#e8f5e8,stroke:#388e3c,stroke-width:2px,color:#000000
    classDef errorNode fill:#ffebee,stroke:#d32f2f,stroke-width:2px,color:#000000
    classDef stepNode fill:#f0f4c3,stroke:#689f38,stroke-width:1px,color:#000000

    class A,B,E startNode
    class C,D,F,G,H,I,L,N,P,Q,T,Y,Z,AA processNode
    class J,M,R decisionNode
    class K,S successNode
    class O,U,V,W,X errorNode
```

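The per-step weights from the table above are designed to cover the whole progress bar, so a registry can sanity-check that they sum to 1.0 and compute cumulative progress as steps finish. A small self-contained check (the dict below just restates the table; it is not the SDK's StepRegistry):

```python
# Step weights from the Workflow Steps Overview table, in execution order.
WEIGHTS = {
    'initialize': 0.05,
    'process_metadata': 0.05,
    'analyze_collection': 0.05,
    'organize_files': 0.10,
    'validate_files': 0.05,
    'upload_files': 0.30,
    'generate_data_units': 0.35,
    'cleanup': 0.05,
}

def total_progress_weight(weights: dict) -> float:
    """All weights together should cover 100% of the progress bar."""
    return round(sum(weights.values()), 10)

def progress_after(step: str, weights: dict) -> float:
    """Cumulative progress once `step` has finished, in execution order."""
    done = 0.0
    for name, w in weights.items():  # dicts preserve insertion order
        done += w
        if name == step:
            return round(done, 10)
    raise KeyError(step)

print(total_progress_weight(WEIGHTS))           # 1.0
print(progress_after('upload_files', WEIGHTS))  # 0.6
```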
#### Strategy Integration Points

Strategies are injected into the workflow at specific points:

- **Validation Strategy**: Used by ValidateFilesStep
- **File Discovery Strategy**: Used by OrganizeFilesStep
- **Metadata Strategy**: Used by ProcessMetadataStep
- **Upload Strategy**: Used by UploadFilesStep
- **Data Unit Strategy**: Used by GenerateDataUnitsStep

#### Error Handling and Rollback

The orchestrator provides automatic rollback functionality:

1. **Exception Capture**: Preserves original exceptions for debugging
2. **Rollback Execution**: Calls rollback() on all successfully executed steps in reverse order
3. **Graceful Degradation**: Continues rollback even if individual step rollbacks fail
4. **State Preservation**: Maintains execution state for post-failure analysis

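Points 2 and 3 can be sketched as a self-contained toy (the `Step` class below is illustrative, not the SDK's BaseStep): executed steps roll back in reverse order, and a rollback that raises does not stop the remaining rollbacks.

```python
class Step:
    def __init__(self, name, fail_rollback=False):
        self.name = name
        self.fail_rollback = fail_rollback

    def rollback(self, log):
        if self.fail_rollback:
            raise RuntimeError(f'rollback of {self.name} failed')
        log.append(self.name)

def rollback_executed_steps(executed, log):
    """Reverse-order rollback with graceful degradation."""
    for step in reversed(executed):
        try:
            step.rollback(log)
        except Exception:
            # Graceful degradation: swallow the failure, keep rolling back.
            continue

log = []
executed = [
    Step('initialize'),
    Step('upload_files', fail_rollback=True),
    Step('generate_data_units'),
]
rollback_executed_steps(executed, log)
print(log)  # ['generate_data_units', 'initialize']
```

Note the middle step's failed rollback is skipped, yet the earlier step is still rolled back.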
## Development Guide

This section provides comprehensive guidance for extending the upload action with custom strategies and workflow steps.

### Creating Custom Strategies

Strategies implement specific behaviors for different aspects of the upload process. Each strategy type has a well-defined interface.

#### Custom Validation Strategy

```python
from pathlib import Path
from typing import List

from synapse_sdk.plugins.categories.upload.actions.upload.strategies.validation.base import BaseValidationStrategy
from synapse_sdk.plugins.categories.upload.actions.upload.context import UploadContext


class CustomValidationStrategy(BaseValidationStrategy):
    """Custom validation strategy with advanced security checks."""

    def validate_files(self, files: List[Path], context: UploadContext) -> bool:
        """Validate files using custom business rules."""
        for file_path in files:
            # Custom validation logic
            if not self._validate_custom_rules(file_path):
                return False

            # Call security validation
            if not self.validate_security(file_path):
                return False
        return True

    def validate_security(self, file_path: Path) -> bool:
        """Custom security validation."""
        # Implement custom security checks
        if file_path.suffix in ['.exe', '.bat', '.sh']:
            return False

        # Check file size
        if file_path.stat().st_size > 100 * 1024 * 1024:  # 100MB
            return False

        return True

    def _validate_custom_rules(self, file_path: Path) -> bool:
        """Implement domain-specific validation rules."""
        # Custom business logic
        return True
```

#### Custom File Discovery Strategy

```python
from pathlib import Path
from typing import Any, Dict, List

from synapse_sdk.plugins.categories.upload.actions.upload.strategies.file_discovery.base import BaseFileDiscoveryStrategy
from synapse_sdk.plugins.categories.upload.actions.upload.context import UploadContext


class CustomFileDiscoveryStrategy(BaseFileDiscoveryStrategy):
    """Custom file discovery with advanced filtering."""

    def discover_files(self, path: Path, context: UploadContext) -> List[Path]:
        """Discover files with custom filtering rules."""
        if context.get_param('is_recursive', False):
            files = list(path.rglob('*'))
        else:
            files = list(path.iterdir())

        # Apply custom filtering
        return self._apply_custom_filters(files, context)

    def organize_files(self, files: List[Path], specs: Dict[str, Any], context: UploadContext) -> List[Dict[str, Any]]:
        """Organize files using custom categorization."""
        organized = []

        for file_path in files:
            if file_path.is_file():
                category = self._determine_category(file_path)
                organized.append({
                    'file_path': file_path,
                    'category': category,
                    'metadata': self._extract_file_metadata(file_path)
                })

        return organized

    def _apply_custom_filters(self, files: List[Path], context: UploadContext) -> List[Path]:
        """Apply domain-specific file filters."""
        filtered = []
        for file_path in files:
            if self._should_include_file(file_path):
                filtered.append(file_path)
        return filtered

    def _determine_category(self, file_path: Path) -> str:
        """Determine file category using custom logic."""
        # Custom categorization logic
        ext = file_path.suffix.lower()
        if ext in ['.jpg', '.png', '.gif']:
            return 'images'
        elif ext in ['.pdf', '.doc', '.docx']:
            return 'documents'
        else:
            return 'other'

    def _should_include_file(self, file_path: Path) -> bool:
        """Decide whether a discovered path should be kept."""
        return file_path.is_file()

    def _extract_file_metadata(self, file_path: Path) -> Dict[str, Any]:
        """Collect lightweight per-file metadata."""
        return {'size': file_path.stat().st_size}
```

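The categorization rule can be exercised on its own. This stdlib-only function restates the `_determine_category` logic from the strategy above so it can run without the SDK installed:

```python
from pathlib import Path

def determine_category(file_path: Path) -> str:
    """Same extension-based rule as CustomFileDiscoveryStrategy above."""
    ext = file_path.suffix.lower()
    if ext in ['.jpg', '.png', '.gif']:
        return 'images'
    if ext in ['.pdf', '.doc', '.docx']:
        return 'documents'
    return 'other'

print(determine_category(Path('scan.PNG')))    # images (case-insensitive)
print(determine_category(Path('report.pdf')))  # documents
print(determine_category(Path('notes.txt')))   # other
```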
#### Custom Upload Strategy

```python
import hashlib
import time
from pathlib import Path
from typing import Any, Dict, List

from synapse_sdk.plugins.categories.upload.actions.upload.strategies.upload.base import BaseUploadStrategy
from synapse_sdk.plugins.categories.upload.actions.upload.context import UploadContext


class CustomUploadStrategy(BaseUploadStrategy):
    """Custom upload strategy with advanced retry logic."""

    def upload_files(self, files: List[Dict[str, Any]], context: UploadContext) -> List[Dict[str, Any]]:
        """Upload files with custom batching and retry logic."""
        uploaded_files = []
        batch_size = context.get_param('upload_batch_size', 10)

        # Process in custom batches
        for i in range(0, len(files), batch_size):
            batch = files[i:i + batch_size]
            batch_results = self.upload_batch(batch, context)
            uploaded_files.extend(batch_results)

        return uploaded_files

    def upload_batch(self, batch: List[Dict[str, Any]], context: UploadContext) -> List[Dict[str, Any]]:
        """Upload a batch of files with retry logic."""
        results = []

        for file_info in batch:
            max_retries = 3
            for attempt in range(max_retries):
                try:
                    result = self._upload_single_file(file_info, context)
                    results.append(result)
                    break
                except Exception as e:
                    if attempt == max_retries - 1:
                        # Final attempt failed
                        context.add_error(f"Failed to upload {file_info['file_path']}: {e}")
                    else:
                        # Wait before retry
                        time.sleep(2 ** attempt)

        return results

    def _upload_single_file(self, file_info: Dict[str, Any], context: UploadContext) -> Dict[str, Any]:
        """Upload a single file with custom logic."""
        # Custom upload implementation
        file_path = file_info['file_path']

        # Use the storage from context
        storage = context.storage

        # Custom upload logic here
        uploaded_file = {
            'file_path': str(file_path),
            'storage_path': f"uploads/{file_path.name}",
            'size': file_path.stat().st_size,
            'checksum': self._calculate_checksum(file_path)
        }

        return uploaded_file

    def _calculate_checksum(self, file_path: Path) -> str:
        """Compute a SHA-256 checksum for the uploaded file."""
        return hashlib.sha256(file_path.read_bytes()).hexdigest()
```

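The retry loop inside `upload_batch` can be demonstrated standalone. This stdlib-only sketch isolates the same pattern (up to three attempts, exponential backoff between failures); the injectable `sleep` parameter is an addition for testability, not part of the strategy above:

```python
import time

def upload_with_retry(upload, max_retries=3, sleep=time.sleep):
    """Call upload() up to max_retries times, backing off 2**attempt seconds."""
    last_error = None
    for attempt in range(max_retries):
        try:
            return upload()
        except Exception as e:
            last_error = e
            if attempt < max_retries - 1:
                sleep(2 ** attempt)  # 1s, then 2s
    raise last_error

# A fake upload that fails twice, then succeeds on the third attempt.
attempts = {'n': 0}
def flaky_upload():
    attempts['n'] += 1
    if attempts['n'] < 3:
        raise IOError('transient failure')
    return 'uploaded'

print(upload_with_retry(flaky_upload, sleep=lambda s: None))  # uploaded
print(attempts['n'])  # 3
```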
### Creating Custom Workflow Steps

Custom workflow steps extend the base step class and implement the required interface.

#### Custom Processing Step

```python
from datetime import datetime
from typing import Any, Dict, List

from synapse_sdk.plugins.categories.upload.actions.upload.steps.base import BaseStep
from synapse_sdk.plugins.categories.upload.actions.upload.context import UploadContext, StepResult


class CustomProcessingStep(BaseStep):
    """Custom processing step for specialized file handling."""

    @property
    def name(self) -> str:
        return 'custom_processing'

    @property
    def progress_weight(self) -> float:
        return 0.15  # 15% of total workflow

    def execute(self, context: UploadContext) -> StepResult:
        """Execute custom processing logic."""
        try:
            # Custom processing logic
            processed_files = self._process_files(context)

            # Update context with results
            return self.create_success_result({
                'processed_files': processed_files,
                'processing_stats': self._get_processing_stats()
            })

        except Exception as e:
            return self.create_error_result(f'Custom processing failed: {str(e)}')

    def can_skip(self, context: UploadContext) -> bool:
        """Determine if step can be skipped."""
        # Skip if no files to process
        return len(context.organized_files) == 0

    def rollback(self, context: UploadContext) -> None:
        """Rollback custom processing operations."""
        # Clean up any resources created during processing
        self._cleanup_processing_resources(context)

    def _process_files(self, context: UploadContext) -> List[Dict]:
        """Implement custom file processing."""
        processed = []

        for file_info in context.organized_files:
            # Custom processing logic
            result = self._process_single_file(file_info)
            processed.append(result)

        return processed

    def _process_single_file(self, file_info: Dict) -> Dict:
        """Process a single file."""
        # Custom processing implementation
        return {
            'original': file_info,
            'processed': True,
            'timestamp': datetime.now()
        }

    def _get_processing_stats(self) -> Dict[str, Any]:
        """Summarize processing for the step result."""
        return {}

    def _cleanup_processing_resources(self, context: UploadContext) -> None:
        """Remove any temporary resources created by this step."""
        pass
```

601
|
+
|
|
602
|
+
### Strategy Factory Extension

To make custom strategies available, extend the `StrategyFactory`:

```python
from typing import Dict

from synapse_sdk.plugins.categories.upload.actions.upload.factory import StrategyFactory

class CustomStrategyFactory(StrategyFactory):
    """Extended factory with custom strategies."""

    def create_validation_strategy(self, params: Dict, context=None):
        """Create validation strategy with custom options."""
        validation_type = params.get('custom_validation_type', 'default')

        if validation_type == 'strict':
            # CustomValidationStrategy is a user-defined strategy class
            return CustomValidationStrategy()
        else:
            return super().create_validation_strategy(params, context)

    def create_file_discovery_strategy(self, params: Dict, context=None):
        """Create file discovery strategy with custom options."""
        discovery_mode = params.get('discovery_mode', 'default')

        if discovery_mode == 'advanced':
            # CustomFileDiscoveryStrategy is a user-defined strategy class
            return CustomFileDiscoveryStrategy()
        else:
            return super().create_file_discovery_strategy(params, context)
```
### Custom Upload Action

For comprehensive customization, extend the `UploadAction` itself:

```python
from typing import Any, Dict

from synapse_sdk.plugins.categories.upload.actions.upload.action import UploadAction
from synapse_sdk.plugins.categories.decorators import register_action

@register_action
class CustomUploadAction(UploadAction):
    """Custom upload action with extended workflow."""

    name = 'custom_upload'

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Use custom strategy factory
        self.strategy_factory = CustomStrategyFactory()

    def _configure_workflow(self) -> None:
        """Configure custom workflow with additional steps."""
        # Register standard steps
        super()._configure_workflow()

        # Add custom processing step
        self.step_registry.register(CustomProcessingStep())

    def _configure_strategies(self, context=None) -> Dict[str, Any]:
        """Configure strategies with custom parameters."""
        strategies = super()._configure_strategies(context)

        # Add custom strategy
        strategies['custom_processing'] = self._create_custom_processing_strategy()

        return strategies

    def _create_custom_processing_strategy(self):
        """Create custom processing strategy."""
        # CustomProcessingStrategy is a user-defined strategy class
        return CustomProcessingStrategy(self.params)
```
### Testing Custom Components

#### Testing Custom Strategies

```python
import pytest
from unittest.mock import Mock
from pathlib import Path

class TestCustomValidationStrategy:

    def setup_method(self):
        self.strategy = CustomValidationStrategy()
        self.context = Mock()

    def test_validate_files_success(self):
        """Test successful file validation."""
        files = [Path('/test/file1.txt'), Path('/test/file2.jpg')]
        result = self.strategy.validate_files(files, self.context)
        assert result is True

    def test_validate_files_security_failure(self):
        """Test validation failure for security reasons."""
        files = [Path('/test/malware.exe')]
        result = self.strategy.validate_files(files, self.context)
        assert result is False

    def test_validate_large_file_failure(self):
        """Test validation failure for large files."""
        # Mock file stat to return large size
        large_file = Mock(spec=Path)
        large_file.suffix = '.txt'
        large_file.stat.return_value.st_size = 200 * 1024 * 1024  # 200MB

        result = self.strategy.validate_security(large_file)
        assert result is False
```

#### Testing Custom Steps

```python
class TestCustomProcessingStep:

    def setup_method(self):
        self.step = CustomProcessingStep()
        self.context = Mock()
        self.context.organized_files = [
            {'file_path': '/test/file1.txt'},
            {'file_path': '/test/file2.jpg'}
        ]

    def test_execute_success(self):
        """Test successful step execution."""
        result = self.step.execute(self.context)

        assert result.success is True
        assert 'processed_files' in result.data
        assert len(result.data['processed_files']) == 2

    def test_can_skip_with_no_files(self):
        """Test step skipping logic."""
        self.context.organized_files = []
        assert self.step.can_skip(self.context) is True

    def test_rollback_cleanup(self):
        """Test rollback cleanup."""
        # This should not raise an exception
        self.step.rollback(self.context)
```
## Upload Parameters

The upload action uses `UploadParams` for comprehensive parameter validation:

### Required Parameters

| Parameter         | Type  | Description                | Validation         |
| ----------------- | ----- | -------------------------- | ------------------ |
| `name`            | `str` | Human-readable upload name | Must be non-blank  |
| `path`            | `str` | Source file/directory path | Must be valid path |
| `storage`         | `int` | Target storage ID          | Must exist via API |
| `data_collection` | `int` | Data collection ID         | Must exist via API |
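The rules in the table can be sketched as plain checks. This is an illustrative helper only, not the SDK's actual `UploadParams` model, and `validate_required_params` is a hypothetical name; the real model also verifies `storage` and `data_collection` against the API.

```python
def validate_required_params(params: dict) -> list:
    """Return a list of human-readable errors for the required upload parameters."""
    errors = []
    # name: must be non-blank
    if not str(params.get('name', '')).strip():
        errors.append('name: must be non-blank')
    # path: must be present (existence is checked later against storage)
    if not params.get('path'):
        errors.append('path: must be a valid path')
    # storage / data_collection: integer IDs (existence is checked via API)
    for key in ('storage', 'data_collection'):
        if not isinstance(params.get(key), int):
            errors.append(f'{key}: must be an integer ID')
    return errors
```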

### Optional Parameters
@@ -252,53 +784,136 @@ def check_storage_exists(cls, value: str, info) -> str:

## Excel Metadata Processing

Upload plugins provide advanced Excel metadata processing with comprehensive filename matching, flexible header support, and optimized performance:

### Excel File Format

The Excel file supports flexible header formats and comprehensive filename matching:

#### Supported Header Formats

Both header formats are supported with case-insensitive matching:

**Option 1: "filename" header**

| filename   | category | description        | custom_field |
| ---------- | -------- | ------------------ | ------------ |
| image1.jpg | nature   | Mountain landscape | high_res     |
| image2.png | urban    | City skyline       | processed    |

**Option 2: "file_name" header**

| file_name  | category | description        | custom_field |
| ---------- | -------- | ------------------ | ------------ |
| image1.jpg | nature   | Mountain landscape | high_res     |
| image2.png | urban    | City skyline       | processed    |

#### Filename Matching Strategy

The system uses a comprehensive 5-tier priority matching algorithm to associate files with metadata:

1. **Exact stem match** (highest priority): `image1` matches `image1.jpg`
2. **Exact filename match**: `image1.jpg` matches `image1.jpg`
3. **Metadata key stem match**: `path/image1.ext` stem matches `image1`
4. **Partial path matching**: `/uploads/image1.jpg` contains `image1`
5. **Full path matching**: Complete path matching for complex structures

This robust matching ensures metadata is correctly associated regardless of file organization or naming conventions.
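The priority order can be sketched as a single lookup function. This is an illustration of the tier ordering, not the SDK's actual implementation; `match_metadata` is a hypothetical name.

```python
from pathlib import Path
from typing import Dict, Optional

def match_metadata(file_path: str, metadata: Dict[str, dict]) -> Optional[dict]:
    """Return the metadata row for a file, trying each tier in priority order."""
    path = Path(file_path)
    stem, name = path.stem, path.name

    # Tier 1: exact stem match ('image1' matches 'image1.jpg')
    if stem in metadata:
        return metadata[stem]
    # Tier 2: exact filename match ('image1.jpg')
    if name in metadata:
        return metadata[name]
    # Tier 3: metadata key stem match ('path/image1.ext' stem matches 'image1')
    for key, row in metadata.items():
        if Path(key).stem == stem:
            return row
    # Tier 4: partial path match (metadata key appears somewhere in the path)
    for key, row in metadata.items():
        if key in file_path:
            return row
    # Tier 5: full path match
    return metadata.get(file_path)

meta = {'image1': {'category': 'nature'}, 'photos/image2.png': {'category': 'urban'}}
print(match_metadata('/uploads/image1.jpg', meta))  # → {'category': 'nature'} via tier 1
```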
### Security Validation

Excel files undergo comprehensive security validation:

```python
class ExcelSecurityConfig:
    max_file_size_mb: int = 10  # File size limit in MB
    max_rows: int = 100000  # Row count limit
    max_columns: int = 50  # Column count limit
```

#### Advanced Security Features

- **File format validation**: Checks Excel file signatures (PK for .xlsx, compound document for .xls)
- **Memory estimation**: Prevents memory exhaustion from oversized spreadsheets
- **Content sanitization**: Automatic truncation of overly long values
- **Error resilience**: Graceful handling of corrupted or inaccessible files
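The signature check can be sketched with the magic bytes named above: .xlsx files are ZIP containers (leading `PK`), while legacy .xls files are OLE2 compound documents. `looks_like_excel` is a hypothetical helper, not an SDK API.

```python
def looks_like_excel(path: str) -> bool:
    """Check a file's leading magic bytes against the known Excel signatures."""
    XLSX_MAGIC = b'PK\x03\x04'                       # .xlsx: ZIP container
    XLS_MAGIC = b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'  # .xls: OLE2 compound document
    with open(path, 'rb') as f:
        header = f.read(8)
    return header.startswith(XLSX_MAGIC) or header.startswith(XLS_MAGIC)
```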
### Configuration via config.yaml

Security limits and processing options can be configured:

```yaml
actions:
  upload:
    excel_config:
      max_file_size_mb: 10 # Maximum Excel file size in MB
      max_rows: 100000 # Maximum number of rows allowed
      max_columns: 50 # Maximum number of columns allowed
```

### Performance Optimizations

The Excel metadata processing includes several performance enhancements:

#### Metadata Indexing

- **O(1) hash lookups** for exact stem and filename matches
- **Pre-built indexes** for common matching patterns
- **Fallback algorithms** for complex path matching scenarios

#### Efficient Processing

- **Optimized row processing**: Skip empty rows early
- **Memory-conscious operation**: Process files in batches
- **Smart file discovery**: Cache path strings to avoid repeated conversions
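The pre-built indexes can be sketched as plain dictionaries keyed by stem and filename, so that the first two matching tiers become single hash lookups. `build_metadata_indexes` is a hypothetical helper illustrating the idea, not the SDK's code.

```python
from pathlib import Path
from typing import Dict, Tuple

def build_metadata_indexes(metadata: Dict[str, dict]) -> Tuple[Dict[str, dict], Dict[str, dict]]:
    """Pre-build stem and filename indexes so tier 1-2 lookups are O(1)."""
    by_stem: Dict[str, dict] = {}
    by_name: Dict[str, dict] = {}
    for key, row in metadata.items():
        p = Path(key)
        by_stem.setdefault(p.stem, row)  # e.g. 'image1'
        by_name.setdefault(p.name, row)  # e.g. 'image1.jpg'
    return by_stem, by_name
```

Lookups against these indexes replace the linear scans of the fallback tiers for the common case.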
### Metadata Processing Flow

1. **Security Validation**: File size, format, and content limits
2. **Header Validation**: Support for both "filename" and "file_name" with case-insensitive matching
3. **Index Building**: Create O(1) lookup structures for performance
4. **Content Processing**: Row-by-row metadata extraction with optimization
5. **Data Sanitization**: Automatic truncation and validation
6. **Pattern Matching**: 5-tier filename association algorithm
7. **Mapping Creation**: Optimized filename to metadata mapping
### Excel Metadata Parameter

You can specify a custom Excel metadata file path:

```python
params = {
    "name": "Excel Metadata Upload",
    "path": "/data/files",
    "storage": 1,
    "data_collection": 5,
    "excel_metadata_path": "/data/custom_metadata.xlsx"  # Custom Excel file
}
```

#### Path Resolution

- **Absolute paths**: Used directly if they exist and are accessible
- **Relative paths**: Resolved relative to the upload path
- **Default discovery**: Automatically searches for `meta.xlsx` or `meta.xls` if no path specified
- **Storage integration**: Uses storage configuration for proper path resolution
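The resolution precedence can be sketched as follows. `resolve_excel_path` is a hypothetical helper showing the documented order (absolute path, then relative to the upload path, then default discovery); the SDK additionally routes paths through the storage configuration.

```python
from pathlib import Path
from typing import Optional

def resolve_excel_path(upload_path: str, excel_metadata_path: Optional[str] = None) -> Optional[Path]:
    """Resolve the Excel metadata file following the documented precedence."""
    base = Path(upload_path)
    if excel_metadata_path:
        candidate = Path(excel_metadata_path)
        if candidate.is_absolute():
            # Absolute paths are used directly if they exist
            return candidate if candidate.exists() else None
        # Relative paths resolve against the upload path
        candidate = base / candidate
        return candidate if candidate.exists() else None
    # Default discovery: meta.xlsx, then meta.xls
    for default_name in ('meta.xlsx', 'meta.xls'):
        candidate = base / default_name
        if candidate.exists():
            return candidate
    return None
```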
### Error Handling

Comprehensive error handling ensures robust operation:

```python
# Excel processing errors are handled gracefully
try:
    metadata = process_excel_metadata(excel_path)
except ExcelSecurityError as e:
    # Security violation - file too large, too many rows, etc.
    log_security_violation(e)
except ExcelParsingError as e:
    # Parsing failure - corrupted file, invalid format, etc.
    log_parsing_error(e)
```

#### Error Recovery

- **Graceful degradation**: Continue processing with empty metadata if Excel fails
- **Detailed logging**: Specific error codes for different failure types
- **Path validation**: Comprehensive validation during parameter processing
- **Fallback behavior**: Smart defaults when metadata cannot be processed
## File Organization

@@ -405,20 +1020,97 @@ run.log_message_with_code(

```python
run.log_upload_event(LogCode.UPLOADING_DATA_FILES, batch_size)
```
## Migration Guide

### From Legacy to Refactored Architecture

The upload action has been refactored using modern design patterns while maintaining **100% backward compatibility**. Existing code will continue to work without changes.

#### Key Changes

**Before (Legacy Monolithic):**

- Single 900+ line action class with all logic
- Hard-coded behaviors for validation, file discovery, etc.
- No extensibility or customization options
- Manual error handling throughout

**After (Strategy/Facade Patterns):**

- Clean separation of concerns with 8 workflow steps
- Pluggable strategies for different behaviors
- Extensible architecture for custom implementations
- Automatic rollback and comprehensive error handling

#### Backward Compatibility

```python
# This legacy usage still works exactly the same
from synapse_sdk.plugins.categories.upload.actions.upload.action import UploadAction

params = {
    "name": "My Upload",
    "path": "/data/files",
    "storage": 1,
    "data_collection": 5  # Changed from 'collection' to 'data_collection'
}

action = UploadAction(params=params, plugin_config=config)
result = action.start()  # Works identically to before
```

#### Enhanced Capabilities

The refactored architecture provides new capabilities:

```python
# Get detailed workflow information
action = UploadAction(params=params, plugin_config=config)
workflow_info = action.get_workflow_summary()
print(f"Configured with {workflow_info['step_count']} steps")
print(f"Available strategies: {workflow_info['available_strategies']}")

# Execute and get detailed results
result = action.start()
print(f"Success: {result['success']}")
print(f"Uploaded files: {result['uploaded_files_count']}")
print(f"Generated data units: {result['generated_data_units_count']}")
print(f"Errors: {result['errors']}")
print(f"Metrics: {result['metrics']}")
```

#### Parameter Changes

Only one parameter name changed:

| Legacy               | Refactored        | Status              |
| -------------------- | ----------------- | ------------------- |
| `collection`         | `data_collection` | **Required change** |
| All other parameters | Unchanged         | Fully compatible    |
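For existing call sites, the single rename can be handled with a tiny shim. This helper is illustrative only, not part of the SDK.

```python
def migrate_params(legacy_params: dict) -> dict:
    """Rename the one changed key; everything else passes through unchanged."""
    params = dict(legacy_params)
    if 'collection' in params:
        params['data_collection'] = params.pop('collection')
    return params

migrate_params({"name": "My Upload", "collection": 5})
# {'name': 'My Upload', 'data_collection': 5}
```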
#### Benefits of Migration

- **Better Error Handling**: Automatic rollback on failures
- **Progress Tracking**: Detailed progress metrics across workflow steps
- **Extensibility**: Add custom strategies and steps
- **Testing**: Better testability with mock-friendly architecture
- **Maintainability**: Clean separation of concerns
- **Performance**: More efficient resource management
## Usage Examples

### Basic File Upload (Refactored Architecture)

```python
from synapse_sdk.plugins.categories.upload.actions.upload.action import UploadAction

# Basic upload configuration with new architecture
params = {
    "name": "Dataset Upload",
    "description": "Training dataset for ML model",
    "path": "/data/training_images",
    "storage": 1,
    "data_collection": 5,  # Note: 'data_collection' instead of 'collection'
    "is_recursive": True,
    "max_file_size_mb": 100
}

action = UploadAction(
    params=params,
    plugin_config=plugin_config
)

# Execute with automatic step-based workflow and rollback
result = action.start()

# Enhanced result information
print(f"Upload successful: {result['success']}")
print(f"Uploaded {result['uploaded_files_count']} files")
print(f"Generated {result['generated_data_units_count']} data units")
print(f"Workflow errors: {result['errors']}")

# Access detailed metrics
workflow_metrics = result['metrics'].get('workflow', {})
print(f"Total steps executed: {workflow_metrics.get('current_step', 0)}")
print(f"Progress completed: {workflow_metrics.get('progress_percentage', 0)}%")
```
### Excel Metadata Upload with Progress Tracking

```python
# Upload with Excel metadata and progress monitoring
params = {
    "name": "Annotated Dataset Upload",
    "path": "/data/images",
    "storage": 1,
    "data_collection": 5,
    "excel_metadata_path": "/data/metadata.xlsx",
    "is_recursive": False,
    "creating_data_unit_batch_size": 50
}

action = UploadAction(
    params=params,
    plugin_config=plugin_config
)

# Get workflow summary before execution
workflow_info = action.get_workflow_summary()
print(f"Workflow configured with {workflow_info['step_count']} steps")
print(f"Total progress weight: {workflow_info['total_progress_weight']}")
print(f"Steps: {workflow_info['steps']}")

# Execute with enhanced error handling
try:
    result = action.start()
    if result['success']:
        print("Upload completed successfully!")
        print(f"Files: {result['uploaded_files_count']}")
        print(f"Data units: {result['generated_data_units_count']}")
    else:
        print("Upload failed with errors:")
        for error in result['errors']:
            print(f"  - {error}")
except Exception as e:
    print(f"Upload action failed: {e}")
```
### Custom Strategy Upload

```python
from synapse_sdk.plugins.categories.upload.actions.upload.action import UploadAction
from my_custom_strategies import CustomValidationStrategy

# Create action with custom factory
class CustomUploadAction(UploadAction):
    def _configure_strategies(self, context=None):
        strategies = super()._configure_strategies(context)

        # Override with custom validation
        if self.params.get('use_strict_validation'):
            strategies['validation'] = CustomValidationStrategy()

        return strategies

# Use custom action
params = {
    "name": "Strict Validation Upload",
    "path": "/data/sensitive_files",
    "storage": 1,
    "data_collection": 5,
    "use_strict_validation": True,
    "max_file_size_mb": 10  # Stricter limits
}

action = CustomUploadAction(
    params=params,
    plugin_config=plugin_config
)

result = action.start()
```
### Batch Processing with Custom Configuration

```python
# Custom plugin configuration with config.yaml
plugin_config = {
    "actions": {
        "upload": {
            "excel_config": {
                "max_file_size_mb": 20,
                "max_rows": 50000,
                "max_columns": 100
            }
        }
    }
}

# Large batch upload with custom settings
params = {
    "name": "Large Batch Upload",
    "path": "/data/large_dataset",
    "storage": 2,
    "data_collection": 10,
    "is_recursive": True,
    "max_file_size_mb": 500,
    "creating_data_unit_batch_size": 200,
    "use_async_upload": True
}

action = UploadAction(
    params=params,
    plugin_config=plugin_config
)

# Execute with progress monitoring
result = action.start()

# Analyze results
print("Batch upload summary:")
print(f"  Success: {result['success']}")
print(f"  Files processed: {result['uploaded_files_count']}")
print(f"  Data units created: {result['generated_data_units_count']}")

# Check metrics by category
metrics = result['metrics']
if 'data_files' in metrics:
    files_metrics = metrics['data_files']
    print(f"  Files - Success: {files_metrics.get('success', 0)}")
    print(f"  Files - Failed: {files_metrics.get('failed', 0)}")

if 'data_units' in metrics:
    units_metrics = metrics['data_units']
    print(f"  Units - Success: {units_metrics.get('success', 0)}")
    print(f"  Units - Failed: {units_metrics.get('failed', 0)}")
```
### Error Handling and Rollback

```python
# Demonstrate enhanced error handling with automatic rollback
params = {
    "name": "Error Recovery Example",
    "path": "/data/problematic_files",
    "storage": 1,
    "data_collection": 5,
    "is_recursive": True
}

action = UploadAction(
    params=params,
    plugin_config=plugin_config
)

try:
    result = action.start()

    if not result['success']:
        print("Upload failed, but cleanup was automatic:")
        print(f"Errors encountered: {len(result['errors'])}")
        for i, error in enumerate(result['errors'], 1):
            print(f"  {i}. {error}")

        # Check if rollback was performed (via orchestrator internals)
        workflow_metrics = result['metrics'].get('workflow', {})
        current_step = workflow_metrics.get('current_step', 0)
        total_steps = workflow_metrics.get('total_steps', 0)
        print(f"Workflow stopped at step {current_step} of {total_steps}")

except Exception as e:
    print(f"Critical upload failure: {e}")
    # Rollback was automatically performed before exception propagation
```
## Error Handling

@@ -523,11 +1340,11 @@ except ValidationError as e:

## API Reference

### Core Components

#### UploadAction

Main upload action class implementing Strategy and Facade patterns for file processing operations.

**Class Attributes:**
@@ -536,17 +1353,256 @@ Main upload action class for file processing operations.
|
|
|
536
1353
|
- `method = RunMethod.JOB` - Execution method
|
|
537
1354
|
- `run_class = UploadRun` - Specialized run management
|
|
538
1355
|
- `params_model = UploadParams` - Parameter validation model
|
|
1356
|
+
- `strategy_factory: StrategyFactory` - Creates strategy implementations
|
|
1357
|
+
- `step_registry: StepRegistry` - Manages workflow steps
|
|
1358
|
+
|
|
1359
|
+
**Key Methods:**
|
|
1360
|
+
|
|
1361
|
+
- `start() -> Dict[str, Any]` - Execute orchestrated upload workflow
|
|
1362
|
+
- `get_workflow_summary() -> Dict[str, Any]` - Get configured workflow summary
|
|
1363
|
+
- `_configure_workflow() -> None` - Register workflow steps in execution order
|
|
1364
|
+
- `_configure_strategies(context=None) -> Dict[str, Any]` - Create strategy instances
|
|
1365
|
+
|
|
1366
|
+
**Progress Categories:**
|
|
1367
|
+
|
|
1368
|
+
```python
|
|
1369
|
+
progress_categories = {
|
|
1370
|
+
'analyze_collection': {'proportion': 2},
|
|
1371
|
+
'upload_data_files': {'proportion': 38},
|
|
1372
|
+
'generate_data_units': {'proportion': 60},
|
|
1373
|
+
}
|
|
1374
|
+
```
|
|
1375
|
+
|
|
1376
|
+
#### UploadOrchestrator
|
|
1377
|
+
|
|
1378
|
+
Facade component coordinating the complete upload workflow with automatic rollback.
|
|
1379
|
+
|
|
1380
|
+
**Class Attributes:**
|
|
1381
|
+
|
|
1382
|
+
- `context: UploadContext` - Shared state across workflow
|
|
1383
|
+
- `step_registry: StepRegistry` - Registry of workflow steps
|
|
1384
|
+
- `strategies: Dict[str, Any]` - Strategy implementations
|
|
1385
|
+
- `executed_steps: List[BaseStep]` - Successfully executed steps
|
|
1386
|
+
- `current_step_index: int` - Current position in workflow
|
|
1387
|
+
- `rollback_executed: bool` - Whether rollback was performed
|
|
1388
|
+
|
|
1389
|
+
**Key Methods:**
|
|
1390
|
+
|
|
1391
|
+
- `execute() -> Dict[str, Any]` - Execute complete workflow with error handling
|
|
1392
|
+
- `get_workflow_summary() -> Dict[str, Any]` - Get execution summary and metrics
|
|
1393
|
+
- `get_executed_steps() -> List[BaseStep]` - Get list of successfully executed steps
|
|
1394
|
+
- `is_rollback_executed() -> bool` - Check if rollback was performed
|
|
1395
|
+
- `_execute_step(step: BaseStep) -> StepResult` - Execute individual workflow step
|
|
1396
|
+
- `_handle_step_failure(step: BaseStep, error: Exception) -> None` - Handle step failures
|
|
1397
|
+
- `_rollback_executed_steps() -> None` - Rollback executed steps in reverse order
|
|
1398
|
+
|
|
1399
|
+
#### UploadContext
|
|
1400
|
+
|
|
1401
|
+
Context object maintaining shared state and communication between workflow components.
|
|
1402
|
+
|
|
1403
|
+
**State Attributes:**
|
|
1404
|
+
|
|
1405
|
+
- `params: Dict` - Upload parameters
|
|
1406
|
+
- `run: UploadRun` - Run management instance
|
|
1407
|
+
- `client: Any` - API client for external operations
|
|
1408
|
+
- `storage: Any` - Storage configuration object
|
|
1409
|
+
- `pathlib_cwd: Path` - Current working directory path
|
|
1410
|
+
- `metadata: Dict[str, Dict[str, Any]]` - File metadata mappings
|
|
1411
|
+
- `file_specifications: Dict[str, Any]` - Data collection file specs
|
|
1412
|
+
- `organized_files: List[Dict[str, Any]]` - Organized file information
|
|
1413
|
+
- `uploaded_files: List[Dict[str, Any]]` - Successfully uploaded files
|
|
1414
|
+
- `data_units: List[Dict[str, Any]]` - Generated data units
|
|
1415
|
+
|
|
1416
|
+
**Progress and Metrics:**
|
|
1417
|
+
|
|
1418
|
+
- `metrics: Dict[str, Any]` - Workflow metrics and statistics
|
|
1419
|
+
- `errors: List[str]` - Accumulated error messages
|
|
1420
|
+
- `step_results: List[StepResult]` - Results from executed steps
|
|
1421
|
+
|
|
1422
|
+
**Strategy and Rollback:**
|
|
1423
|
+
|
|
1424
|
+
- `strategies: Dict[str, Any]` - Injected strategy implementations
|
|
1425
|
+
- `rollback_data: Dict[str, Any]` - Data for rollback operations
|
|
1426
|
+
|
|
1427
|
+
**Key Methods:**
|
|
1428
|
+
|
|
1429
|
+
- `update(result: StepResult) -> None` - Update context with step results
|
|
1430
|
+
- `get_result() -> Dict[str, Any]` - Generate final result dictionary
|
|
1431
|
+
- `has_errors() -> bool` - Check for accumulated errors
|
|
1432
|
+
- `get_last_step_result() -> Optional[StepResult]` - Get most recent step result
|
|
1433
|
+
- `update_metrics(category: str, metrics: Dict[str, Any]) -> None` - Update metrics
|
|
1434
|
+
- `add_error(error: str) -> None` - Add error to context
|
|
1435
|
+
- `get_param(key: str, default: Any = None) -> Any` - Get parameter with default
|
|
1436
|
+
|
|
1437
|
+
#### StepRegistry

Registry managing the collection and execution order of workflow steps.

**Attributes:**

- `_steps: List[BaseStep]` - Registered workflow steps in execution order

**Key Methods:**

- `register(step: BaseStep) -> None` - Register a workflow step
- `get_steps() -> List[BaseStep]` - Get all registered steps in order
- `get_total_progress_weight() -> float` - Calculate the total progress weight
- `clear() -> None` - Clear all registered steps
- `__len__() -> int` - Get the number of registered steps
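The registry's behavior (steps run in registration order; total weight is the sum of per-step weights) can be sketched in a few lines; the classes below are illustrative stubs, not the SDK source:

```python
class Step:
    """Minimal step carrying just the registry-relevant fields."""
    def __init__(self, name: str, progress_weight: float):
        self.name = name
        self.progress_weight = progress_weight

class MiniRegistry:
    def __init__(self):
        self._steps = []

    def register(self, step: Step) -> None:
        self._steps.append(step)  # preserves registration order

    def get_steps(self):
        return list(self._steps)

    def get_total_progress_weight(self) -> float:
        return sum(s.progress_weight for s in self._steps)

    def clear(self) -> None:
        self._steps.clear()

    def __len__(self) -> int:
        return len(self._steps)

registry = MiniRegistry()
registry.register(Step("initialize", 0.05))
registry.register(Step("upload_files", 0.30))
print(len(registry), registry.get_total_progress_weight())
```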
#### StrategyFactory

Factory component creating appropriate strategy implementations based on parameters.

**Key Methods:**

- `create_validation_strategy(params: Dict, context=None) -> BaseValidationStrategy` - Create validation strategy
- `create_file_discovery_strategy(params: Dict, context=None) -> BaseFileDiscoveryStrategy` - Create file discovery strategy
- `create_metadata_strategy(params: Dict, context=None) -> BaseMetadataStrategy` - Create metadata processing strategy
- `create_upload_strategy(params: Dict, context: UploadContext) -> BaseUploadStrategy` - Create upload strategy (requires context)
- `create_data_unit_strategy(params: Dict, context: UploadContext) -> BaseDataUnitStrategy` - Create data unit strategy (requires context)
- `get_available_strategies() -> Dict[str, List[str]]` - Get available strategy types and implementations
### Workflow Steps

#### BaseStep (Abstract)

Base class for all workflow steps, providing a common interface and utilities.

**Abstract Properties:**

- `name: str` - Unique step identifier
- `progress_weight: float` - Weight for progress calculation (weights should sum to 1.0)

**Abstract Methods:**

- `execute(context: UploadContext) -> StepResult` - Execute step logic
- `can_skip(context: UploadContext) -> bool` - Determine whether the step can be skipped
- `rollback(context: UploadContext) -> None` - Roll back step operations

**Utility Methods:**

- `create_success_result(data: Dict = None) -> StepResult` - Create a success result
- `create_error_result(error: str, original_exception: Exception = None) -> StepResult` - Create an error result
- `create_skip_result() -> StepResult` - Create a skip result
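A minimal custom step against this interface might look like the sketch below. `BaseStep` is stubbed with only the helper used here so the example runs outside the SDK, and the real context is an `UploadContext` object rather than the plain dict used for illustration:

```python
class BaseStep:
    """Stand-in providing only the helper used below."""
    def create_success_result(self, data=None):
        return {"success": True, "data": data or {}}

class CountFilesStep(BaseStep):
    name = "count_files"
    progress_weight = 0.05  # illustrative; all step weights should sum to 1.0

    def execute(self, context):
        files = context.get("organized_files", [])
        return self.create_success_result({"file_count": len(files)})

    def can_skip(self, context):
        # Skip when there is nothing to count
        return not context.get("organized_files")

    def rollback(self, context):
        pass  # read-only step: nothing to undo

step = CountFilesStep()
result = step.execute({"organized_files": ["a.png", "b.png"]})
print(result["data"])  # {'file_count': 2}
```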
#### StepResult

Result object returned by workflow step execution.

**Attributes:**

- `success: bool` - Whether the step executed successfully
- `data: Dict[str, Any]` - Step result data
- `error: str` - Error message if the step failed
- `rollback_data: Dict[str, Any]` - Data needed for rollback
- `skipped: bool` - Whether the step was skipped
- `original_exception: Optional[Exception]` - Original exception for debugging
- `timestamp: datetime` - Execution timestamp

**Usage:**

```python
# Boolean evaluation
if step_result:
    # Step was successful
    process_success(step_result.data)
```
#### Concrete Steps

**InitializeStep** (`name: "initialize"`, `weight: 0.05`)

- Sets up the storage connection and pathlib working directory
- Validates basic upload prerequisites

**ProcessMetadataStep** (`name: "process_metadata"`, `weight: 0.05`)

- Processes Excel metadata if provided
- Validates metadata security and format

**AnalyzeCollectionStep** (`name: "analyze_collection"`, `weight: 0.05`)

- Retrieves and validates data collection file specifications
- Sets up file organization rules

**OrganizeFilesStep** (`name: "organize_files"`, `weight: 0.10`)

- Discovers files using the file discovery strategy
- Organizes files by type and specification

**ValidateFilesStep** (`name: "validate_files"`, `weight: 0.05`)

- Validates files using the validation strategy
- Performs security and content checks

**UploadFilesStep** (`name: "upload_files"`, `weight: 0.30`)

- Uploads files using the upload strategy
- Handles batching and progress tracking

**GenerateDataUnitsStep** (`name: "generate_data_units"`, `weight: 0.35`)

- Creates data units using the data unit strategy
- Links uploaded files to data units

**CleanupStep** (`name: "cleanup"`, `weight: 0.05`)

- Cleans up temporary resources and files
- Performs final validation
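The default weights above cover the whole workflow, so they sum to 1.0; a quick sanity check confirms this, which is also a useful habit when swapping in custom steps:

```python
# Default step weights as listed above
weights = {
    "initialize": 0.05,
    "process_metadata": 0.05,
    "analyze_collection": 0.05,
    "organize_files": 0.10,
    "validate_files": 0.05,
    "upload_files": 0.30,
    "generate_data_units": 0.35,
    "cleanup": 0.05,
}
total = sum(weights.values())
print(round(total, 2))  # 1.0
```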
### Strategy Base Classes

#### BaseValidationStrategy (Abstract)

Base class for file validation strategies.

**Abstract Methods:**

- `validate_files(files: List[Path], context: UploadContext) -> bool` - Validate a collection of files
- `validate_security(file_path: Path) -> bool` - Validate an individual file's security

#### BaseFileDiscoveryStrategy (Abstract)

Base class for file discovery and organization strategies.

**Abstract Methods:**

- `discover_files(path: Path, context: UploadContext) -> List[Path]` - Discover files from a path
- `organize_files(files: List[Path], specs: Dict[str, Any], context: UploadContext) -> List[Dict[str, Any]]` - Organize discovered files

#### BaseMetadataStrategy (Abstract)

Base class for metadata processing strategies.

**Abstract Methods:**

- `process_metadata(context: UploadContext) -> Dict[str, Any]` - Process metadata from the context
- `extract_metadata(file_path: Path) -> Dict[str, Any]` - Extract metadata from a file

#### BaseUploadStrategy (Abstract)

Base class for file upload strategies.

**Abstract Methods:**

- `upload_files(files: List[Dict[str, Any]], context: UploadContext) -> List[Dict[str, Any]]` - Upload a collection of files
- `upload_batch(batch: List[Dict[str, Any]], context: UploadContext) -> List[Dict[str, Any]]` - Upload a file batch

#### BaseDataUnitStrategy (Abstract)

Base class for data unit creation strategies.

**Abstract Methods:**

- `generate_data_units(files: List[Dict[str, Any]], context: UploadContext) -> List[Dict[str, Any]]` - Generate data units
- `create_data_unit_batch(batch: List[Dict[str, Any]], context: UploadContext) -> List[Dict[str, Any]]` - Create a data unit batch
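As one hypothetical example of implementing these interfaces, a glob-based discovery strategy could look like this (the base class is stubbed so the sketch is self-contained; `GlobFileDiscoveryStrategy` and its parameters are illustrative, not SDK names):

```python
import tempfile
from pathlib import Path
from typing import Any, Dict, List

class BaseFileDiscoveryStrategy:  # stand-in for the SDK base class
    pass

class GlobFileDiscoveryStrategy(BaseFileDiscoveryStrategy):
    """Hypothetical glob-based implementation of the discovery interface."""

    def __init__(self, pattern: str = "*", recursive: bool = False):
        self.pattern = pattern
        self.recursive = recursive

    def discover_files(self, path: Path, context=None) -> List[Path]:
        globber = path.rglob if self.recursive else path.glob
        return sorted(p for p in globber(self.pattern) if p.is_file())

    def organize_files(self, files: List[Path], specs: Dict[str, Any],
                       context=None) -> List[Dict[str, Any]]:
        # Map each file to the spec matching its extension
        return [{"path": f, "spec": specs.get(f.suffix.lstrip("."))} for f in files]

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.jpg").touch()
    (root / "b.txt").touch()
    strategy = GlobFileDiscoveryStrategy("*.jpg")
    found = strategy.discover_files(root)
    organized = strategy.organize_files(found, {"jpg": "image"})
    print([p.name for p in found])  # ['a.jpg']
```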
### Legacy Components

#### UploadRun

Specialized run management for upload operations (unchanged from legacy).

**Logging Methods:**

@@ -563,11 +1619,28 @@ Specialized run management for upload operations.

#### UploadParams
#### UploadParams

Parameter validation model with Pydantic integration (unchanged from legacy).

**Required Parameters:**

- `name: str` - Upload name
- `path: str` - Source path
- `storage: int` - Storage ID
- `data_collection: int` - Data collection ID

**Optional Parameters:**

- `description: str | None = None` - Upload description
- `project: int | None = None` - Project ID
- `excel_metadata_path: str | None = None` - Excel metadata file path
- `is_recursive: bool = False` - Recursive file discovery
- `max_file_size_mb: int = 50` - Maximum file size
- `creating_data_unit_batch_size: int = 100` - Data unit batch size
- `use_async_upload: bool = True` - Async upload processing
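Putting the required and optional fields together, a typical parameter set might look like this (names and IDs are placeholders for illustration):

```python
params = {
    # Required
    "name": "traffic-cam-batch-01",
    "path": "/data/incoming/batch_01",
    "storage": 3,            # Storage ID
    "data_collection": 12,   # Data collection ID
    # Optional overrides
    "is_recursive": True,
    "max_file_size_mb": 50,
    "use_async_upload": True,
}
```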
**Validation Features:**

- Real-time API validation for storage/data_collection/project
- String sanitization and length validation
- Type checking and conversion
- Custom validator methods

@@ -580,19 +1653,13 @@ Security configuration for Excel file processing.
**Configuration Attributes:**

- `max_file_size_mb` - File size limit in megabytes (default: 10)
- `max_rows` - Row count limit (default: 100000)
- `max_columns` - Column count limit (default: 50)

**Key Methods:**

- `from_action_config(action_config)` - Create config from config.yaml

#### PathAwareJSONEncoder

@@ -651,30 +1718,247 @@ Raised when Excel files cannot be parsed.

## Best Practices
### Architecture Patterns

1. **Strategy Selection**: Choose strategies appropriate to the use case

   - Use `RecursiveFileDiscoveryStrategy` for deep directory structures
   - Use `BasicValidationStrategy` for standard file validation
   - Use `AsyncUploadStrategy` for large file sets

2. **Step Ordering**: Maintain logical step dependencies

   - Initialize → Process Metadata → Analyze Collection → Organize Files → Validate → Upload → Generate Data Units → Cleanup
   - Insert custom steps at the appropriate points in the workflow

3. **Context Management**: Leverage UploadContext for state sharing

   - Store intermediate results in the context for downstream steps
   - Use the context for cross-step communication
   - Preserve rollback data for cleanup operations
### Performance Optimization

1. **Batch Processing**: Configure batch sizes based on available system resources

   ```python
   params = {
       "creating_data_unit_batch_size": 200,  # Adjust based on memory
       "upload_batch_size": 10,  # Custom parameter for upload strategies
   }
   ```

2. **Async Operations**: Enable async processing for I/O-bound operations

   ```python
   params = {
       "use_async_upload": True,  # Better throughput for network operations
   }
   ```

3. **Memory Management**: Monitor memory usage in custom strategies

   - Process files in chunks rather than loading everything into memory
   - Use generators for large file collections
   - Configure Excel security limits appropriately

4. **Progress Monitoring**: Implement detailed progress tracking

   ```python
   # Custom step with progress updates
   def execute(self, context):
       total_files = len(context.organized_files)
       for i, file_info in enumerate(context.organized_files):
           # Process file
           progress = (i + 1) / total_files * 100
           context.update_metrics('custom_step', {'progress': progress})
   ```
### Security Considerations

1. **Input Validation**: Validate all input parameters and file paths

   ```python
   # Custom validation in a strategy
   def validate_files(self, files, context):
       for file_path in files:
           if not self._is_safe_path(file_path):
               return False
       return True
   ```

2. **File Content Security**: Implement content-based security checks

   - Scan for malicious file signatures
   - Validate that file headers match their extensions
   - Check for embedded executables

3. **Excel Security**: Configure appropriate security limits

   ```python
   import os
   os.environ['EXCEL_MAX_FILE_SIZE_MB'] = '10'
   os.environ['EXCEL_MAX_MEMORY_MB'] = '30'
   ```

4. **Path Sanitization**: Validate and sanitize all file paths

   - Prevent path traversal attacks
   - Validate file extensions
   - Check file permissions
### Error Handling and Recovery

1. **Graceful Degradation**: Design for partial-failure scenarios

   ```python
   class RobustUploadStrategy(BaseUploadStrategy):
       def upload_files(self, files, context):
           successful_uploads = []
           failed_uploads = []

           for file_info in files:
               try:
                   result = self._upload_file(file_info)
                   successful_uploads.append(result)
               except Exception as e:
                   # Continue with the other files instead of failing completely
                   failed_uploads.append({'file': file_info, 'error': str(e)})

           # Update context with the partial results
           context.add_uploaded_files(successful_uploads)
           if failed_uploads:
               context.add_error(f"Failed to upload {len(failed_uploads)} files")

           return successful_uploads
   ```

2. **Rollback Design**: Implement comprehensive rollback strategies

   ```python
   def rollback(self, context):
       # Clean up in reverse order of operations
       if hasattr(self, '_created_temp_files'):
           for temp_file in self._created_temp_files:
               try:
                   temp_file.unlink()
               except Exception:
                   pass  # Don't fail rollback due to cleanup issues
   ```

3. **Detailed Logging**: Use structured logging for debugging

   ```python
   def execute(self, context):
       try:
           context.run.log_message_with_code(
               'CUSTOM_STEP_STARTED',
               {'step': self.name, 'file_count': len(context.organized_files)}
           )
           # Step logic here
       except Exception as e:
           context.run.log_message_with_code(
               'CUSTOM_STEP_FAILED',
               {'step': self.name, 'error': str(e)},
               level=Context.DANGER
           )
           raise
   ```
### Development Guidelines

1. **Custom Strategy Development**: Follow the established patterns

   ```python
   # Always extend the appropriate base class
   class MyCustomStrategy(BaseValidationStrategy):
       def __init__(self, config=None):
           self.config = config or {}

       def validate_files(self, files, context):
           # Implement validation logic
           return True

       def validate_security(self, file_path):
           # Implement security validation
           return True
   ```

2. **Testing Strategy**: Aim for comprehensive test coverage

   ```python
   # Test both success and failure scenarios
   class TestCustomStrategy:
       def test_success_case(self):
           strategy = MyCustomStrategy()
           result = strategy.validate_files([Path('valid_file.txt')], mock_context)
           assert result is True

       def test_security_failure(self):
           strategy = MyCustomStrategy()
           result = strategy.validate_security(Path('malware.exe'))
           assert result is False

       def test_rollback_cleanup(self):
           step = MyCustomStep()
           step.rollback(mock_context)
           # Assert cleanup was performed
   ```

3. **Extension Points**: Use the factory pattern for extensibility

   ```python
   class CustomStrategyFactory(StrategyFactory):
       def create_validation_strategy(self, params, context=None):
           validation_type = params.get('validation_type', 'basic')

           strategy_map = {
               'basic': BasicValidationStrategy,
               'strict': StrictValidationStrategy,
               'custom': MyCustomValidationStrategy,
           }

           strategy_class = strategy_map.get(validation_type, BasicValidationStrategy)
           return strategy_class(params)
   ```

4. **Configuration Management**: Use environment variables and parameters

   ```python
   class ConfigurableStep(BaseStep):
       def __init__(self):
           # Allow runtime configuration
           self.batch_size = int(os.getenv('STEP_BATCH_SIZE', '50'))
           self.timeout = int(os.getenv('STEP_TIMEOUT_SECONDS', '300'))

       def execute(self, context):
           # Use the configured values
           batch_size = context.get_param('step_batch_size', self.batch_size)
           timeout = context.get_param('step_timeout', self.timeout)
   ```

### Anti-Patterns to Avoid

1. **Tight Coupling**: Don't couple strategies to specific implementations
2. **State Mutation**: Don't modify context state directly outside of the `update()` method
3. **Exception Swallowing**: Don't catch and ignore exceptions without proper handling
4. **Blocking Operations**: Don't run long synchronous operations without progress updates
5. **Memory Leaks**: Don't hold references to large objects in step instances

### Troubleshooting Guide

1. **Step Failures**: Check step execution order and dependencies
2. **Strategy Issues**: Verify strategy factory configuration and parameter passing
3. **Context Problems**: Ensure proper context updates and state management
4. **Rollback Failures**: Design idempotent rollback operations
5. **Performance Issues**: Profile batch sizes and async operation usage

### Migration Checklist

When upgrading from the legacy implementation:

- [ ] Update the parameter name from `collection` to `data_collection`
- [ ] Test existing workflows for compatibility
- [ ] Review custom extensions for new architecture opportunities
- [ ] Update error handling to leverage the new rollback capabilities
- [ ] Consider implementing custom strategies for specialized requirements
- [ ] Update test cases to validate the new workflow steps
- [ ] Review logging and metrics collection for enhanced information

For detailed information on developing custom upload plugins using the BaseUploader template, see the [Developing Upload Templates](./developing-upload-template.md) guide.