synapse-sdk 2025.10.1__py3-none-any.whl → 2025.10.3__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Potentially problematic release.
This version of synapse-sdk might be problematic.
- synapse_sdk/devtools/docs/docs/plugins/categories/upload-plugins/upload-plugin-action.md +934 -0
- synapse_sdk/devtools/docs/docs/plugins/categories/upload-plugins/upload-plugin-overview.md +560 -0
- synapse_sdk/devtools/docs/docs/plugins/categories/upload-plugins/upload-plugin-template.md +715 -0
- synapse_sdk/devtools/docs/docs/plugins/plugins.md +12 -5
- synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/categories/upload-plugins/upload-plugin-action.md +934 -0
- synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/categories/upload-plugins/upload-plugin-overview.md +560 -0
- synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/categories/upload-plugins/upload-plugin-template.md +715 -0
- synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current.json +16 -4
- synapse_sdk/devtools/docs/sidebars.ts +13 -1
- synapse_sdk/plugins/README.md +487 -80
- synapse_sdk/plugins/categories/pre_annotation/actions/__init__.py +4 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/pre_annotation/__init__.py +3 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/pre_annotation/action.py +10 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/__init__.py +28 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/action.py +145 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/enums.py +269 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/exceptions.py +14 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/factory.py +76 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/models.py +97 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/orchestrator.py +250 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/run.py +64 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/strategies/__init__.py +17 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/strategies/annotation.py +284 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/strategies/base.py +170 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/strategies/extraction.py +83 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/strategies/metrics.py +87 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/strategies/preprocessor.py +127 -0
- synapse_sdk/plugins/categories/pre_annotation/actions/to_task/strategies/validation.py +143 -0
- synapse_sdk/plugins/categories/upload/actions/upload/__init__.py +2 -1
- synapse_sdk/plugins/categories/upload/actions/upload/models.py +134 -94
- synapse_sdk/plugins/categories/upload/actions/upload/steps/cleanup.py +2 -2
- synapse_sdk/plugins/categories/upload/actions/upload/steps/metadata.py +106 -14
- synapse_sdk/plugins/categories/upload/actions/upload/steps/organize.py +113 -36
- synapse_sdk/plugins/categories/upload/templates/README.md +365 -0
- {synapse_sdk-2025.10.1.dist-info → synapse_sdk-2025.10.3.dist-info}/METADATA +1 -1
- {synapse_sdk-2025.10.1.dist-info → synapse_sdk-2025.10.3.dist-info}/RECORD +40 -20
- synapse_sdk/devtools/docs/docs/plugins/developing-upload-template.md +0 -1463
- synapse_sdk/devtools/docs/docs/plugins/upload-plugins.md +0 -1964
- synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/developing-upload-template.md +0 -1463
- synapse_sdk/devtools/docs/i18n/ko/docusaurus-plugin-content-docs/current/plugins/upload-plugins.md +0 -2077
- {synapse_sdk-2025.10.1.dist-info → synapse_sdk-2025.10.3.dist-info}/WHEEL +0 -0
- {synapse_sdk-2025.10.1.dist-info → synapse_sdk-2025.10.3.dist-info}/entry_points.txt +0 -0
- {synapse_sdk-2025.10.1.dist-info → synapse_sdk-2025.10.3.dist-info}/licenses/LICENSE +0 -0
- {synapse_sdk-2025.10.1.dist-info → synapse_sdk-2025.10.3.dist-info}/top_level.txt +0 -0
@@ -1,1964 +0,0 @@
----
-id: upload-plugins
-title: Upload Plugins
-sidebar_position: 3
----
-
-# Upload Plugins
-
-Upload plugins provide file upload and data ingestion operations for processing files into the Synapse platform with comprehensive metadata support, security validation, and organized data unit generation.
-
-## Overview
-
-**Available Actions:**
-
-- `upload` - Upload files and directories to storage with optional Excel metadata processing
-
-**Use Cases:**
-
-- Bulk file uploads with metadata annotation
-- Excel-based metadata mapping and validation
-- Recursive directory processing
-- Type-based file organization
-- Batch data unit creation
-- Secure file processing with size and content validation
-
-**Supported Upload Sources:**
-
-- Local file system paths (files and directories)
-- Recursive directory scanning
-- Excel metadata files for enhanced file annotation
-- Mixed file types with automatic organization
-
-## Upload Action Architecture
-
-The upload system uses a modern, extensible architecture built on proven design patterns. The refactored implementation transforms the previous monolithic approach into a modular, strategy-based system with clear separation of concerns.
-
-### Design Patterns
-
-The architecture leverages several key design patterns:
-
-- **Strategy Pattern**: Pluggable behaviors for validation, file discovery, metadata processing, upload operations, and data unit creation
-- **Facade Pattern**: UploadOrchestrator provides a simplified interface to coordinate complex workflows
-- **Factory Pattern**: StrategyFactory creates appropriate strategy implementations based on runtime parameters
-- **Context Pattern**: UploadContext maintains shared state and communication between workflow components
-
-### Component Architecture
-
-```mermaid
-classDiagram
-    %% Light/Dark mode compatible colors
-    classDef coreClass fill:#e3f2fd,stroke:#1976d2,stroke-width:2px,color:#000000
-    classDef strategyClass fill:#e8f5e8,stroke:#388e3c,stroke-width:2px,color:#000000
-    classDef stepClass fill:#fff9c4,stroke:#f57c00,stroke-width:2px,color:#000000
-    classDef contextClass fill:#ffebee,stroke:#d32f2f,stroke-width:2px,color:#000000
-
-    class UploadAction {
-        +name: str = "upload"
-        +category: PluginCategory.UPLOAD
-        +method: RunMethod.JOB
-        +run_class: UploadRun
-        +params_model: UploadParams
-        +progress_categories: dict
-        +metrics_categories: dict
-        +strategy_factory: StrategyFactory
-        +step_registry: StepRegistry
-
-        +start() dict
-        +get_workflow_summary() dict
-        +_configure_workflow() None
-        +_configure_strategies() dict
-    }
-
-    class UploadOrchestrator {
-        +context: UploadContext
-        +step_registry: StepRegistry
-        +strategies: dict
-        +executed_steps: list
-        +current_step_index: int
-        +rollback_executed: bool
-
-        +execute() dict
-        +get_workflow_summary() dict
-        +get_executed_steps() list
-        +is_rollback_executed() bool
-        +_execute_step(step) StepResult
-        +_handle_step_failure(step, error) None
-        +_rollback_executed_steps() None
-    }
-
-    class UploadContext {
-        +params: dict
-        +run: UploadRun
-        +client: Any
-        +storage: Any
-        +pathlib_cwd: Path
-        +metadata: dict
-        +file_specifications: dict
-        +organized_files: list
-        +uploaded_files: list
-        +data_units: list
-        +metrics: dict
-        +errors: list
-        +strategies: dict
-        +rollback_data: dict
-
-        +update(result: StepResult) None
-        +get_result() dict
-        +has_errors() bool
-        +update_metrics(category, metrics) None
-    }
-
-    class StepRegistry {
-        +_steps: list
-        +register(step: BaseStep) None
-        +get_steps() list
-        +get_total_progress_weight() float
-        +clear() None
-    }
-
-    class StrategyFactory {
-        +create_validation_strategy(params, context) BaseValidationStrategy
-        +create_file_discovery_strategy(params, context) BaseFileDiscoveryStrategy
-        +create_metadata_strategy(params, context) BaseMetadataStrategy
-        +create_upload_strategy(params, context) BaseUploadStrategy
-        +create_data_unit_strategy(params, context) BaseDataUnitStrategy
-        +get_available_strategies() dict
-    }
-
-    class BaseStep {
-        <<abstract>>
-        +name: str
-        +progress_weight: float
-        +execute(context: UploadContext) StepResult
-        +can_skip(context: UploadContext) bool
-        +rollback(context: UploadContext) None
-        +create_success_result(data) StepResult
-        +create_error_result(error) StepResult
-        +create_skip_result() StepResult
-    }
-
-    class ExcelSecurityConfig {
-        +max_file_size_mb: int = 10
-        +max_rows: int = 100000
-        +max_columns: int = 50
-        +max_file_size_bytes: int
-        +MAX_FILE_SIZE_MB: int
-        +MAX_FILE_SIZE_BYTES: int
-        +MAX_ROWS: int
-        +MAX_COLUMNS: int
-        +from_action_config(action_config) ExcelSecurityConfig
-    }
-
-    class StepResult {
-        +success: bool
-        +data: dict
-        +error: str
-        +rollback_data: dict
-        +skipped: bool
-        +original_exception: Exception
-        +timestamp: datetime
-    }
-
-    %% Strategy Base Classes
-    class BaseValidationStrategy {
-        <<abstract>>
-        +validate_files(files, context) bool
-        +validate_security(file_path) bool
-    }
-
-    class BaseFileDiscoveryStrategy {
-        <<abstract>>
-        +discover_files(path, context) list
-        +organize_files(files, specs, context) list
-    }
-
-    class BaseMetadataStrategy {
-        <<abstract>>
-        +process_metadata(context) dict
-        +extract_metadata(file_path) dict
-    }
-
-    class BaseUploadStrategy {
-        <<abstract>>
-        +upload_files(files, context) list
-        +upload_batch(batch, context) list
-    }
-
-    class BaseDataUnitStrategy {
-        <<abstract>>
-        +generate_data_units(files, context) list
-        +create_data_unit_batch(batch, context) list
-    }
-
-    %% Workflow Steps
-    class InitializeStep {
-        +name = "initialize"
-        +progress_weight = 0.05
-    }
-
-    class ProcessMetadataStep {
-        +name = "process_metadata"
-        +progress_weight = 0.05
-    }
-
-    class AnalyzeCollectionStep {
-        +name = "analyze_collection"
-        +progress_weight = 0.05
-    }
-
-    class OrganizeFilesStep {
-        +name = "organize_files"
-        +progress_weight = 0.10
-    }
-
-    class ValidateFilesStep {
-        +name = "validate_files"
-        +progress_weight = 0.05
-    }
-
-    class UploadFilesStep {
-        +name = "upload_files"
-        +progress_weight = 0.30
-    }
-
-    class GenerateDataUnitsStep {
-        +name = "generate_data_units"
-        +progress_weight = 0.35
-    }
-
-    class CleanupStep {
-        +name = "cleanup"
-        +progress_weight = 0.05
-    }
-
-    %% Relationships
-    UploadAction --> UploadRun : uses
-    UploadAction --> UploadParams : validates with
-    UploadAction --> ExcelSecurityConfig : configures
-    UploadAction --> UploadOrchestrator : creates and executes
-    UploadAction --> StrategyFactory : configures strategies
-    UploadAction --> StepRegistry : manages workflow steps
-    UploadRun --> LogCode : logs with
-    UploadRun --> UploadStatus : tracks status
-    UploadOrchestrator --> UploadContext : coordinates state
-    UploadOrchestrator --> StepRegistry : executes steps from
-    UploadOrchestrator --> BaseStep : executes
-    BaseStep --> StepResult : returns
-    UploadContext --> StepResult : updates with
-    StrategyFactory --> BaseValidationStrategy : creates
-    StrategyFactory --> BaseFileDiscoveryStrategy : creates
-    StrategyFactory --> BaseMetadataStrategy : creates
-    StrategyFactory --> BaseUploadStrategy : creates
-    StrategyFactory --> BaseDataUnitStrategy : creates
-    StepRegistry --> BaseStep : contains
-
-    %% Step inheritance
-    InitializeStep --|> BaseStep : extends
-    ProcessMetadataStep --|> BaseStep : extends
-    AnalyzeCollectionStep --|> BaseStep : extends
-    OrganizeFilesStep --|> BaseStep : extends
-    ValidateFilesStep --|> BaseStep : extends
-    UploadFilesStep --|> BaseStep : extends
-    GenerateDataUnitsStep --|> BaseStep : extends
-    CleanupStep --|> BaseStep : extends
-
-    %% Note: Class styling defined above - Mermaid will apply based on classDef definitions
-```
-
-### Step-Based Workflow Execution
-
-The refactored architecture uses a step-based workflow coordinated by the UploadOrchestrator. Each step has a defined responsibility and progress weight.
-
-#### Workflow Steps Overview
-
-| Step | Name                | Weight | Responsibility                               |
-| ---- | ------------------- | ------ | -------------------------------------------- |
-| 1    | Initialize          | 5%     | Setup storage, pathlib, and basic validation |
-| 2    | Process Metadata    | 5%     | Handle Excel metadata if provided            |
-| 3    | Analyze Collection  | 5%     | Retrieve and validate data collection specs  |
-| 4    | Organize Files      | 10%    | Discover and organize files by type          |
-| 5    | Validate Files      | 5%     | Security and content validation              |
-| 6    | Upload Files        | 30%    | Upload files to storage                      |
-| 7    | Generate Data Units | 35%    | Create data units from uploaded files        |
-| 8    | Cleanup             | 5%     | Clean temporary resources                    |
-
-#### Execution Flow
-
-```mermaid
-flowchart TD
-    %% Start
-    A["🚀 Upload Action Started"] --> B["📋 Create UploadContext"]
-    B --> C["⚙️ Configure Strategies"]
-    C --> D["📝 Register Workflow Steps"]
-    D --> E["🎯 Create UploadOrchestrator"]
-
-    %% Strategy Injection
-    E --> F["💉 Inject Strategies into Context"]
-    F --> G["📊 Initialize Progress Tracking"]
-
-    %% Step Execution Loop
-    G --> H["🔄 Start Step Execution Loop"]
-    H --> I["📍 Get Next Step"]
-    I --> J{"🤔 Can Step be Skipped?"}
-    J -->|Yes| K["⏭️ Skip Step"]
-    J -->|No| L["▶️ Execute Step"]
-
-    %% Step Execution
-    L --> M{"✅ Step Successful?"}
-    M -->|Yes| N["📈 Update Progress"]
-    M -->|No| O["❌ Handle Step Failure"]
-
-    %% Success Path
-    N --> P["💾 Store Step Result"]
-    P --> Q["📝 Add to Executed Steps"]
-    Q --> R{"🏁 More Steps?"}
-    R -->|Yes| I
-    R -->|No| S["🎉 Workflow Complete"]
-
-    %% Skip Path
-    K --> T["📊 Update Progress (Skip)"]
-    T --> R
-
-    %% Error Handling
-    O --> U["🔙 Start Rollback Process"]
-    U --> V["⏪ Rollback Executed Steps"]
-    V --> W["📝 Log Rollback Results"]
-    W --> X["💥 Propagate Exception"]
-
-    %% Final Results
-    S --> Y["📊 Collect Final Metrics"]
-    Y --> Z["📋 Generate Result Summary"]
-    Z --> AA["🔄 Return to UploadAction"]
-
-    %% Apply styles - Light/Dark mode compatible
-    classDef startNode fill:#e3f2fd,stroke:#1976d2,stroke-width:2px,color:#000000
-    classDef processNode fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px,color:#000000
-    classDef decisionNode fill:#fff3e0,stroke:#f57c00,stroke-width:2px,color:#000000
-    classDef successNode fill:#e8f5e8,stroke:#388e3c,stroke-width:2px,color:#000000
-    classDef errorNode fill:#ffebee,stroke:#d32f2f,stroke-width:2px,color:#000000
-    classDef stepNode fill:#f0f4c3,stroke:#689f38,stroke-width:1px,color:#000000
-
-    class A,B,E startNode
-    class C,D,F,G,H,I,L,N,P,Q,T,Y,Z,AA processNode
-    class J,M,R decisionNode
-    class K,S successNode
-    class O,U,V,W,X errorNode
-```
-
-#### Strategy Integration Points
-
-Strategies are injected into the workflow at specific points:
-
-- **Validation Strategy**: Used by ValidateFilesStep
-- **File Discovery Strategy**: Used by OrganizeFilesStep
-- **Metadata Strategy**: Used by ProcessMetadataStep
-- **Upload Strategy**: Used by UploadFilesStep
-- **Data Unit Strategy**: Used by GenerateDataUnitsStep
-
-#### Error Handling and Rollback
-
-The orchestrator provides automatic rollback functionality:
-
-1. **Exception Capture**: Preserves original exceptions for debugging
-2. **Rollback Execution**: Calls rollback() on all successfully executed steps in reverse order
-3. **Graceful Degradation**: Continues rollback even if individual step rollbacks fail
-4. **State Preservation**: Maintains execution state for post-failure analysis
-
-## Development Guide
-
-This section provides comprehensive guidance for extending the upload action with custom strategies and workflow steps.
-
-### Creating Custom Strategies
-
-Strategies implement specific behaviors for different aspects of the upload process. Each strategy type has a well-defined interface.
-
-#### Custom Validation Strategy
-
-```python
-from synapse_sdk.plugins.categories.upload.actions.upload.strategies.validation.base import BaseValidationStrategy
-from synapse_sdk.plugins.categories.upload.actions.upload.context import UploadContext
-
-class CustomValidationStrategy(BaseValidationStrategy):
-    """Custom validation strategy with advanced security checks."""
-
-    def validate_files(self, files: List[Path], context: UploadContext) -> bool:
-        """Validate files using custom business rules."""
-        for file_path in files:
-            # Custom validation logic
-            if not self._validate_custom_rules(file_path):
-                return False
-
-            # Call security validation
-            if not self.validate_security(file_path):
-                return False
-        return True
-
-    def validate_security(self, file_path: Path) -> bool:
-        """Custom security validation."""
-        # Implement custom security checks
-        if file_path.suffix in ['.exe', '.bat', '.sh']:
-            return False
-
-        # Check file size
-        if file_path.stat().st_size > 100 * 1024 * 1024:  # 100MB
-            return False
-
-        return True
-
-    def _validate_custom_rules(self, file_path: Path) -> bool:
-        """Implement domain-specific validation rules."""
-        # Custom business logic
-        return True
-```
-
-#### Custom File Discovery Strategy
-
-```python
-from synapse_sdk.plugins.categories.upload.actions.upload.strategies.file_discovery.base import BaseFileDiscoveryStrategy
-from pathlib import Path
-from typing import List, Dict, Any
-
-class CustomFileDiscoveryStrategy(BaseFileDiscoveryStrategy):
-    """Custom file discovery with advanced filtering."""
-
-    def discover_files(self, path: Path, context: UploadContext) -> List[Path]:
-        """Discover files with custom filtering rules."""
-        files = []
-
-        if context.get_param('is_recursive', False):
-            files = list(path.rglob('*'))
-        else:
-            files = list(path.iterdir())
-
-        # Apply custom filtering
-        return self._apply_custom_filters(files, context)
-
-    def organize_files(self, files: List[Path], specs: Dict[str, Any], context: UploadContext) -> List[Dict[str, Any]]:
-        """Organize files using custom categorization."""
-        organized = []
-
-        for file_path in files:
-            if file_path.is_file():
-                category = self._determine_category(file_path)
-                organized.append({
-                    'file_path': file_path,
-                    'category': category,
-                    'metadata': self._extract_file_metadata(file_path)
-                })
-
-        return organized
-
-    def _apply_custom_filters(self, files: List[Path], context: UploadContext) -> List[Path]:
-        """Apply domain-specific file filters."""
-        filtered = []
-        for file_path in files:
-            if self._should_include_file(file_path):
-                filtered.append(file_path)
-        return filtered
-
-    def _determine_category(self, file_path: Path) -> str:
-        """Determine file category using custom logic."""
-        # Custom categorization logic
-        ext = file_path.suffix.lower()
-        if ext in ['.jpg', '.png', '.gif']:
-            return 'images'
-        elif ext in ['.pdf', '.doc', '.docx']:
-            return 'documents'
-        else:
-            return 'other'
-```
-
-#### Custom Upload Strategy
-
-```python
-from synapse_sdk.plugins.categories.upload.actions.upload.strategies.upload.base import BaseUploadStrategy
-from typing import List, Dict, Any
-
-class CustomUploadStrategy(BaseUploadStrategy):
-    """Custom upload strategy with advanced retry logic."""
-
-    def upload_files(self, files: List[Dict[str, Any]], context: UploadContext) -> List[Dict[str, Any]]:
-        """Upload files with custom batching and retry logic."""
-        uploaded_files = []
-        batch_size = context.get_param('upload_batch_size', 10)
-
-        # Process in custom batches
-        for i in range(0, len(files), batch_size):
-            batch = files[i:i + batch_size]
-            batch_results = self.upload_batch(batch, context)
-            uploaded_files.extend(batch_results)
-
-        return uploaded_files
-
-    def upload_batch(self, batch: List[Dict[str, Any]], context: UploadContext) -> List[Dict[str, Any]]:
-        """Upload a batch of files with retry logic."""
-        results = []
-
-        for file_info in batch:
-            max_retries = 3
-            for attempt in range(max_retries):
-                try:
-                    result = self._upload_single_file(file_info, context)
-                    results.append(result)
-                    break
-                except Exception as e:
-                    if attempt == max_retries - 1:
-                        # Final attempt failed
-                        context.add_error(f"Failed to upload {file_info['file_path']}: {e}")
-                    else:
-                        # Wait before retry
-                        time.sleep(2 ** attempt)
-
-        return results
-
-    def _upload_single_file(self, file_info: Dict[str, Any], context: UploadContext) -> Dict[str, Any]:
-        """Upload a single file with custom logic."""
-        # Custom upload implementation
-        file_path = file_info['file_path']
-
-        # Use the storage from context
-        storage = context.storage
-
-        # Custom upload logic here
-        uploaded_file = {
-            'file_path': str(file_path),
-            'storage_path': f"uploads/{file_path.name}",
-            'size': file_path.stat().st_size,
-            'checksum': self._calculate_checksum(file_path)
-        }
-
-        return uploaded_file
-```
-
-### Creating Custom Workflow Steps
-
-Custom workflow steps extend the base step class and implement the required interface.
-
-#### Custom Processing Step
-
-```python
-from synapse_sdk.plugins.categories.upload.actions.upload.steps.base import BaseStep
-from synapse_sdk.plugins.categories.upload.actions.upload.context import UploadContext, StepResult
-from pathlib import Path
-
-class CustomProcessingStep(BaseStep):
-    """Custom processing step for specialized file handling."""
-
-    @property
-    def name(self) -> str:
-        return 'custom_processing'
-
-    @property
-    def progress_weight(self) -> float:
-        return 0.15  # 15% of total workflow
-
-    def execute(self, context: UploadContext) -> StepResult:
-        """Execute custom processing logic."""
-        try:
-            # Custom processing logic
-            processed_files = self._process_files(context)
-
-            # Update context with results
-            return self.create_success_result({
-                'processed_files': processed_files,
-                'processing_stats': self._get_processing_stats()
-            })
-
-        except Exception as e:
-            return self.create_error_result(f'Custom processing failed: {str(e)}')
-
-    def can_skip(self, context: UploadContext) -> bool:
-        """Determine if step can be skipped."""
-        # Skip if no files to process
-        return len(context.organized_files) == 0
-
-    def rollback(self, context: UploadContext) -> None:
-        """Rollback custom processing operations."""
-        # Clean up any resources created during processing
-        self._cleanup_processing_resources(context)
-
-    def _process_files(self, context: UploadContext) -> List[Dict]:
-        """Implement custom file processing."""
-        processed = []
-
-        for file_info in context.organized_files:
-            # Custom processing logic
-            result = self._process_single_file(file_info)
-            processed.append(result)
-
-        return processed
-
-    def _process_single_file(self, file_info: Dict) -> Dict:
-        """Process a single file."""
-        # Custom processing implementation
-        return {
-            'original': file_info,
-            'processed': True,
-            'timestamp': datetime.now()
-        }
-```
-
-### Strategy Factory Extension
-
-To make custom strategies available, extend the StrategyFactory:
-
-```python
-from synapse_sdk.plugins.categories.upload.actions.upload.factory import StrategyFactory
-
-class CustomStrategyFactory(StrategyFactory):
-    """Extended factory with custom strategies."""
-
-    def create_validation_strategy(self, params: Dict, context=None):
-        """Create validation strategy with custom options."""
-        validation_type = params.get('custom_validation_type', 'default')
-
-        if validation_type == 'strict':
-            return CustomValidationStrategy()
-        else:
-            return super().create_validation_strategy(params, context)
-
-    def create_file_discovery_strategy(self, params: Dict, context=None):
-        """Create file discovery strategy with custom options."""
-        discovery_mode = params.get('discovery_mode', 'default')
-
-        if discovery_mode == 'advanced':
-            return CustomFileDiscoveryStrategy()
-        else:
-            return super().create_file_discovery_strategy(params, context)
-```
-
-### Custom Upload Action
-
-For comprehensive customization, extend the UploadAction itself:
-
-```python
-from synapse_sdk.plugins.categories.upload.actions.upload.action import UploadAction
-from synapse_sdk.plugins.categories.decorators import register_action
-
-@register_action
-class CustomUploadAction(UploadAction):
-    """Custom upload action with extended workflow."""
-
-    name = 'custom_upload'
-
-    def __init__(self, *args, **kwargs):
-        super().__init__(*args, **kwargs)
-        # Use custom strategy factory
-        self.strategy_factory = CustomStrategyFactory()
-
-    def _configure_workflow(self) -> None:
-        """Configure custom workflow with additional steps."""
-        # Register standard steps
-        super()._configure_workflow()
-
-        # Add custom processing step
-        self.step_registry.register(CustomProcessingStep())
-
-    def _configure_strategies(self, context=None) -> Dict[str, Any]:
-        """Configure strategies with custom parameters."""
-        strategies = super()._configure_strategies(context)
-
-        # Add custom strategy
-        strategies['custom_processing'] = self._create_custom_processing_strategy()
-
-        return strategies
-
-    def _create_custom_processing_strategy(self):
-        """Create custom processing strategy."""
-        return CustomProcessingStrategy(self.params)
-```
-
-### Testing Custom Components
-
-#### Testing Custom Strategies
-
-```python
-import pytest
-from unittest.mock import Mock
-from pathlib import Path
-
-class TestCustomValidationStrategy:
-
-    def setup_method(self):
-        self.strategy = CustomValidationStrategy()
-        self.context = Mock()
-
-    def test_validate_files_success(self):
-        """Test successful file validation."""
-        files = [Path('/test/file1.txt'), Path('/test/file2.jpg')]
-        result = self.strategy.validate_files(files, self.context)
-        assert result is True
-
-    def test_validate_files_security_failure(self):
-        """Test validation failure for security reasons."""
-        files = [Path('/test/malware.exe')]
-        result = self.strategy.validate_files(files, self.context)
-        assert result is False
-
-    def test_validate_large_file_failure(self):
-        """Test validation failure for large files."""
-        # Mock file stat to return large size
-        large_file = Mock(spec=Path)
-        large_file.suffix = '.txt'
-        large_file.stat.return_value.st_size = 200 * 1024 * 1024  # 200MB
-
-        result = self.strategy.validate_security(large_file)
-        assert result is False
-```
-
-#### Testing Custom Steps
-
-```python
-class TestCustomProcessingStep:
-
-    def setup_method(self):
-        self.step = CustomProcessingStep()
-        self.context = Mock()
-        self.context.organized_files = [
-            {'file_path': '/test/file1.txt'},
-            {'file_path': '/test/file2.jpg'}
-        ]
-
-    def test_execute_success(self):
-        """Test successful step execution."""
-        result = self.step.execute(self.context)
-
-        assert result.success is True
-        assert 'processed_files' in result.data
-        assert len(result.data['processed_files']) == 2
-
-    def test_can_skip_with_no_files(self):
-        """Test step skipping logic."""
|
|
733
|
-
self.context.organized_files = []
|
|
734
|
-
assert self.step.can_skip(self.context) is True
|
|
735
|
-
|
|
736
|
-
def test_rollback_cleanup(self):
|
|
737
|
-
"""Test rollback cleanup."""
|
|
738
|
-
# This should not raise an exception
|
|
739
|
-
self.step.rollback(self.context)
|
|
740
|
-
```
|
|
741
|
-
|
|
742
|
-
## Upload Parameters
|
|
743
|
-
|
|
744
|
-
The upload action uses `UploadParams` for comprehensive parameter validation:
|
|
745
|
-
|
|
746
|
-
### Required Parameters
|
|
747
|
-
|
|
748
|
-
| Parameter | Type | Description | Validation |
|
|
749
|
-
| ----------------- | ----- | -------------------------- | ------------------ |
|
|
750
|
-
| `name` | `str` | Human-readable upload name | Must be non-blank |
|
|
751
|
-
| `path` | `str` | Source file/directory path | Must be valid path |
|
|
752
|
-
| `storage` | `int` | Target storage ID | Must exist via API |
|
|
753
|
-
| `data_collection` | `int` | Data collection ID | Must exist via API |
|
|
754
|
-
|
|
755
|
-
### Optional Parameters
|
|
756
|
-
|
|
757
|
-
| Parameter | Type | Default | Description |
|
|
758
|
-
| ------------------------------- | ------------- | ------- | ---------------------------------- |
|
|
759
|
-
| `description` | `str \| None` | `None` | Upload description |
|
|
760
|
-
| `project` | `int \| None` | `None` | Project ID (validated if provided) |
|
|
761
|
-
| `excel_metadata_path` | `str \| None` | `None` | Path to Excel metadata file |
|
|
762
|
-
| `is_recursive` | `bool` | `False` | Scan directories recursively |
|
|
763
|
-
| `max_file_size_mb` | `int` | `50` | Maximum file size in MB |
|
|
764
|
-
| `creating_data_unit_batch_size` | `int` | `100` | Batch size for data units |
|
|
765
|
-
| `use_async_upload` | `bool` | `True` | Use asynchronous processing |
|
|
766
|
-
|
|
767
|
-
### Parameter Validation
|
|
768
|
-
|
|
769
|
-
The system performs real-time validation:
|
|
770
|
-
|
|
771
|
-
```python
|
|
772
|
-
# Storage validation
|
|
773
|
-
@field_validator('storage', mode='before')
|
|
774
|
-
@classmethod
|
|
775
|
-
def check_storage_exists(cls, value: str, info) -> str:
|
|
776
|
-
action = info.context['action']
|
|
777
|
-
client = action.client
|
|
778
|
-
try:
|
|
779
|
-
client.get_storage(value)
|
|
780
|
-
except ClientError:
|
|
781
|
-
raise PydanticCustomError('client_error', 'Storage not found')
|
|
782
|
-
return value
|
|
783
|
-
```
|
|
784
|
-
|
|
785
|
-
## Excel Metadata Processing
|
|
786
|
-
|
|
787
|
-
Upload plugins provide advanced Excel metadata processing with comprehensive filename matching, flexible header support, and optimized performance:
|
|
788
|
-
|
|
789
|
-
### Excel File Format
|
|
790
|
-
|
|
791
|
-
The Excel file supports flexible header formats and comprehensive filename matching:
|
|
792
|
-
|
|
793
|
-
#### Supported Header Formats
|
|
794
|
-
|
|
795
|
-
Both header formats are supported with case-insensitive matching:
|
|
796
|
-
|
|
797
|
-
**Option 1: "filename" header**
|
|
798
|
-
| filename | category | description | custom_field |
|
|
799
|
-
| ---------- | -------- | ------------------ | ------------ |
|
|
800
|
-
| image1.jpg | nature | Mountain landscape | high_res |
|
|
801
|
-
| image2.png | urban | City skyline | processed |
|
|
802
|
-
|
|
803
|
-
**Option 2: "file_name" header**
|
|
804
|
-
| file_name | category | description | custom_field |
|
|
805
|
-
| ---------- | -------- | ------------------ | ------------ |
|
|
806
|
-
| image1.jpg | nature | Mountain landscape | high_res |
|
|
807
|
-
| image2.png | urban | City skyline | processed |
|
|
808
|
-
|
|
809
|
-
#### Filename Matching Strategy
|
|
810
|
-
|
|
811
|
-
The system uses a comprehensive 5-tier priority matching algorithm to associate files with metadata:
|
|
812
|
-
|
|
813
|
-
1. **Exact stem match** (highest priority): `image1` matches `image1.jpg`
|
|
814
|
-
2. **Exact filename match**: `image1.jpg` matches `image1.jpg`
|
|
815
|
-
3. **Metadata key stem match**: `path/image1.ext` stem matches `image1`
|
|
816
|
-
4. **Partial path matching**: `/uploads/image1.jpg` contains `image1`
|
|
817
|
-
5. **Full path matching**: Complete path matching for complex structures
|
|
818
|
-
|
|
819
|
-
This robust matching ensures metadata is correctly associated regardless of file organization or naming conventions.
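As a concrete illustration, the priority order can be sketched as a small lookup function. `find_metadata` and the flat `metadata` dict below are hypothetical, not the SDK's actual implementation (which also pre-builds O(1) indexes for the first two tiers):

```python
from pathlib import Path
from typing import Optional


def find_metadata(file_path: Path, metadata: dict) -> Optional[dict]:
    """Illustrative 5-tier lookup; keys in `metadata` come from the Excel filename column."""
    # Tier 1: exact stem match ('image1' key matches 'image1.jpg')
    if file_path.stem in metadata:
        return metadata[file_path.stem]
    # Tier 2: exact filename match ('image1.jpg' key matches 'image1.jpg')
    if file_path.name in metadata:
        return metadata[file_path.name]
    # Tier 3: a metadata key's stem matches the file's stem ('path/image1.ext' -> 'image1')
    for key, value in metadata.items():
        if Path(key).stem == file_path.stem:
            return value
    # Tier 4: partial path match (key appears somewhere in the full path)
    for key, value in metadata.items():
        if key in str(file_path):
            return value
    # Tier 5: full path match for complex directory structures
    return metadata.get(str(file_path))
```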
|
|
820
|
-
|
|
821
|
-
### Security Validation
|
|
822
|
-
|
|
823
|
-
Excel files undergo comprehensive security validation:
|
|
824
|
-
|
|
825
|
-
```python
|
|
826
|
-
class ExcelSecurityConfig:
|
|
827
|
-
max_file_size_mb: int = 10 # File size limit in MB
|
|
828
|
-
max_rows: int = 100000 # Row count limit
|
|
829
|
-
max_columns: int = 50 # Column count limit
|
|
830
|
-
```
|
|
831
|
-
|
|
832
|
-
#### Advanced Security Features
|
|
833
|
-
|
|
834
|
-
- **File format validation**: Checks Excel file signatures (PK for .xlsx, compound document for .xls)
|
|
835
|
-
- **Memory estimation**: Prevents memory exhaustion from oversized spreadsheets
|
|
836
|
-
- **Content sanitization**: Automatic truncation of overly long values
|
|
837
|
-
- **Error resilience**: Graceful handling of corrupted or inaccessible files
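A minimal sketch of the signature and size checks, assuming the magic-byte constants below (ZIP header for `.xlsx`, OLE compound document header for `.xls`); `validate_excel_file` is illustrative, not the SDK's API:

```python
from pathlib import Path

XLSX_MAGIC = b'PK\x03\x04'        # .xlsx files are ZIP archives
XLS_MAGIC = b'\xd0\xcf\x11\xe0'   # .xls files use the OLE compound document format


def validate_excel_file(path: Path, max_file_size_mb: int = 10) -> bool:
    """Reject files that are too large or whose header bytes do not match Excel formats."""
    if path.stat().st_size > max_file_size_mb * 1024 * 1024:
        return False
    header = path.read_bytes()[:4]
    if path.suffix.lower() == '.xlsx':
        return header == XLSX_MAGIC
    if path.suffix.lower() == '.xls':
        return header == XLS_MAGIC
    return False
```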
|
|
838
|
-
|
|
839
|
-
### Configuration via config.yaml
|
|
840
|
-
|
|
841
|
-
Security limits and processing options can be configured:
|
|
842
|
-
|
|
843
|
-
```yaml
|
|
844
|
-
actions:
|
|
845
|
-
upload:
|
|
846
|
-
excel_config:
|
|
847
|
-
max_file_size_mb: 10 # Maximum Excel file size in MB
|
|
848
|
-
max_rows: 100000 # Maximum number of rows allowed
|
|
849
|
-
max_columns: 50 # Maximum number of columns allowed
|
|
850
|
-
```
|
|
851
|
-
|
|
852
|
-
### Performance Optimizations
|
|
853
|
-
|
|
854
|
-
The Excel metadata processing includes several performance enhancements:
|
|
855
|
-
|
|
856
|
-
#### Metadata Indexing
|
|
857
|
-
- **O(1) hash lookups** for exact stem and filename matches
|
|
858
|
-
- **Pre-built indexes** for common matching patterns
|
|
859
|
-
- **Fallback algorithms** for complex path matching scenarios
|
|
860
|
-
|
|
861
|
-
#### Efficient Processing
|
|
862
|
-
- **Optimized row processing**: Skip empty rows early
|
|
863
|
-
- **Memory-conscious operation**: Process files in batches
|
|
864
|
-
- **Smart file discovery**: Cache path strings to avoid repeated conversions
|
|
865
|
-
|
|
866
|
-
### Metadata Processing Flow
|
|
867
|
-
|
|
868
|
-
1. **Security Validation**: File size, format, and content limits
|
|
869
|
-
2. **Header Validation**: Support for both "filename" and "file_name" with case-insensitive matching
|
|
870
|
-
3. **Index Building**: Create O(1) lookup structures for performance
|
|
871
|
-
4. **Content Processing**: Row-by-row metadata extraction with optimization
|
|
872
|
-
5. **Data Sanitization**: Automatic truncation and validation
|
|
873
|
-
6. **Pattern Matching**: 5-tier filename association algorithm
|
|
874
|
-
7. **Mapping Creation**: Optimized filename to metadata mapping
|
|
875
|
-
|
|
876
|
-
### Excel Metadata Parameter
|
|
877
|
-
|
|
878
|
-
You can specify a custom Excel metadata file path:
|
|
879
|
-
|
|
880
|
-
```python
|
|
881
|
-
params = {
|
|
882
|
-
"name": "Excel Metadata Upload",
|
|
883
|
-
"path": "/data/files",
|
|
884
|
-
"storage": 1,
|
|
885
|
-
"data_collection": 5,
|
|
886
|
-
"excel_metadata_path": "/data/custom_metadata.xlsx" # Custom Excel file
|
|
887
|
-
}
|
|
888
|
-
```
|
|
889
|
-
|
|
890
|
-
#### Path Resolution
|
|
891
|
-
- **Absolute paths**: Used directly if they exist and are accessible
|
|
892
|
-
- **Relative paths**: Resolved relative to the upload path
|
|
893
|
-
- **Default discovery**: Automatically searches for `meta.xlsx` or `meta.xls` if no path specified
|
|
894
|
-
- **Storage integration**: Uses storage configuration for proper path resolution
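The resolution rules above can be sketched as follows; `resolve_excel_path` is a hypothetical helper, not part of the SDK:

```python
from pathlib import Path
from typing import Optional


def resolve_excel_path(upload_path: Path, excel_metadata_path: Optional[str] = None) -> Optional[Path]:
    """Resolve the Excel metadata file: explicit path first, then default discovery."""
    if excel_metadata_path:
        candidate = Path(excel_metadata_path)
        # Absolute paths are used directly if they exist
        if candidate.is_absolute():
            return candidate if candidate.exists() else None
        # Relative paths are resolved against the upload path
        candidate = upload_path / candidate
        return candidate if candidate.exists() else None
    # Default discovery: look for meta.xlsx / meta.xls next to the data
    for default_name in ('meta.xlsx', 'meta.xls'):
        candidate = upload_path / default_name
        if candidate.exists():
            return candidate
    return None
```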
|
|
895
|
-
|
|
896
|
-
### Error Handling
|
|
897
|
-
|
|
898
|
-
Comprehensive error handling ensures robust operation:
|
|
899
|
-
|
|
900
|
-
```python
|
|
901
|
-
# Excel processing errors are handled gracefully
|
|
902
|
-
try:
|
|
903
|
-
metadata = process_excel_metadata(excel_path)
|
|
904
|
-
except ExcelSecurityError as e:
|
|
905
|
-
# Security violation - file too large, too many rows, etc.
|
|
906
|
-
log_security_violation(e)
|
|
907
|
-
except ExcelParsingError as e:
|
|
908
|
-
# Parsing failure - corrupted file, invalid format, etc.
|
|
909
|
-
log_parsing_error(e)
|
|
910
|
-
```
|
|
911
|
-
|
|
912
|
-
#### Error Recovery
|
|
913
|
-
- **Graceful degradation**: Continue processing with empty metadata if Excel fails
|
|
914
|
-
- **Detailed logging**: Specific error codes for different failure types
|
|
915
|
-
- **Path validation**: Comprehensive validation during parameter processing
|
|
916
|
-
- **Fallback behavior**: Smart defaults when metadata cannot be processed
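A sketch of the graceful-degradation pattern, with the loader and logger passed in so the example stays self-contained; `load_metadata_safely` is illustrative only:

```python
def load_metadata_safely(load, excel_path, log) -> dict:
    """Run a metadata loader; on failure, log and continue with empty metadata."""
    try:
        return load(excel_path)
    except Exception as exc:  # the SDK raises ExcelSecurityError / ExcelParsingError here
        log(f'Metadata skipped for {excel_path}: {exc}')
        return {}
```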
|
|
917
|
-
|
|
918
|
-
## File Organization
|
|
919
|
-
|
|
920
|
-
The upload system automatically organizes files based on their types:
|
|
921
|
-
|
|
922
|
-
### Type Detection
|
|
923
|
-
|
|
924
|
-
Files are categorized based on:
|
|
925
|
-
|
|
926
|
-
- File extension patterns
|
|
927
|
-
- MIME type detection
|
|
928
|
-
- Content analysis
|
|
929
|
-
- Custom type rules
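A minimal sketch of extension-based categorization with a MIME fallback; the `TYPE_RULES` buckets are illustrative, since real plugins derive types from the data collection's file specifications:

```python
import mimetypes
from pathlib import Path

# Illustrative extension buckets, not the SDK's canonical mapping
TYPE_RULES = {
    'images': {'.jpg', '.jpeg', '.png', '.gif'},
    'documents': {'.pdf', '.xlsx', '.docx', '.txt'},
    'videos': {'.mp4', '.avi', '.mov'},
}


def categorize(path: Path) -> str:
    ext = path.suffix.lower()
    for category, extensions in TYPE_RULES.items():
        if ext in extensions:
            return category
    # Fall back to MIME detection for extensions outside the rule table
    mime, _ = mimetypes.guess_type(path.name)
    if mime:
        return mime.split('/')[0] + 's'  # e.g. 'audio/x-wav' -> 'audios'
    return 'others'
```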
|
|
930
|
-
|
|
931
|
-
### Directory Structure
|
|
932
|
-
|
|
933
|
-
```
|
|
934
|
-
upload_output/
|
|
935
|
-
├── images/
|
|
936
|
-
│ ├── image1.jpg
|
|
937
|
-
│ └── image2.png
|
|
938
|
-
├── documents/
|
|
939
|
-
│ ├── report.pdf
|
|
940
|
-
│ └── data.xlsx
|
|
941
|
-
└── videos/
|
|
942
|
-
└── presentation.mp4
|
|
943
|
-
```
|
|
944
|
-
|
|
945
|
-
### Batch Processing
|
|
946
|
-
|
|
947
|
-
Files are processed in configurable batches:
|
|
948
|
-
|
|
949
|
-
```python
|
|
950
|
-
# Configure batch size
|
|
951
|
-
params = {
|
|
952
|
-
"creating_data_unit_batch_size": 100,
|
|
953
|
-
"use_async_upload": True
|
|
954
|
-
}
|
|
955
|
-
```
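Batch-wise processing amounts to chunking the organized file list; a minimal sketch (the helper name is hypothetical):

```python
from typing import Any, Dict, Iterator, List


def iter_batches(files: List[Dict[str, Any]], batch_size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield files in fixed-size chunks; the last batch may be smaller."""
    for start in range(0, len(files), batch_size):
        yield files[start:start + batch_size]
```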
|
|
956
|
-
|
|
957
|
-
## Progress Tracking and Metrics
|
|
958
|
-
|
|
959
|
-
### Progress Categories
|
|
960
|
-
|
|
961
|
-
The upload action tracks progress across three main phases:
|
|
962
|
-
|
|
963
|
-
| Category | Proportion | Description |
|
|
964
|
-
| --------------------- | ---------- | ----------------------------------- |
|
|
965
|
-
| `analyze_collection` | 2% | Parameter validation and setup |
|
|
966
|
-
| `upload_data_files` | 38% | File upload processing |
|
|
967
|
-
| `generate_data_units` | 60% | Data unit creation and finalization |
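Given the proportions above, overall progress can be derived from the current phase and its completion fraction; a hypothetical helper assuming earlier phases are complete:

```python
PROGRESS_PROPORTIONS = {
    'analyze_collection': 2,
    'upload_data_files': 38,
    'generate_data_units': 60,
}


def overall_progress(category: str, fraction_done: float) -> float:
    """Overall percentage, assuming all earlier categories have finished."""
    done = 0.0
    for name, proportion in PROGRESS_PROPORTIONS.items():
        if name == category:
            return done + proportion * fraction_done
        done += proportion
    raise KeyError(category)
```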
|
|
968
|
-
|
|
969
|
-
### Metrics Collection
|
|
970
|
-
|
|
971
|
-
Real-time metrics are collected for monitoring:
|
|
972
|
-
|
|
973
|
-
```python
|
|
974
|
-
metrics_categories = {
|
|
975
|
-
'data_files': {
|
|
976
|
-
'stand_by': 0, # Files waiting to be processed
|
|
977
|
-
'failed': 0, # Files that failed upload
|
|
978
|
-
'success': 0, # Successfully uploaded files
|
|
979
|
-
},
|
|
980
|
-
'data_units': {
|
|
981
|
-
'stand_by': 0, # Units waiting to be created
|
|
982
|
-
'failed': 0, # Units that failed creation
|
|
983
|
-
'success': 0, # Successfully created units
|
|
984
|
-
},
|
|
985
|
-
}
|
|
986
|
-
```
|
|
987
|
-
|
|
988
|
-
## Type-Safe Logging
|
|
989
|
-
|
|
990
|
-
The upload system uses enum-based logging for consistency:
|
|
991
|
-
|
|
992
|
-
### Log Codes
|
|
993
|
-
|
|
994
|
-
```python
|
|
995
|
-
class LogCode(str, Enum):
|
|
996
|
-
VALIDATION_FAILED = 'VALIDATION_FAILED'
|
|
997
|
-
NO_FILES_FOUND = 'NO_FILES_FOUND'
|
|
998
|
-
EXCEL_SECURITY_VIOLATION = 'EXCEL_SECURITY_VIOLATION'
|
|
999
|
-
EXCEL_PARSING_ERROR = 'EXCEL_PARSING_ERROR'
|
|
1000
|
-
FILES_DISCOVERED = 'FILES_DISCOVERED'
|
|
1001
|
-
UPLOADING_DATA_FILES = 'UPLOADING_DATA_FILES'
|
|
1002
|
-
GENERATING_DATA_UNITS = 'GENERATING_DATA_UNITS'
|
|
1003
|
-
IMPORT_COMPLETED = 'IMPORT_COMPLETED'
|
|
1004
|
-
```
|
|
1005
|
-
|
|
1006
|
-
### Logging Usage
|
|
1007
|
-
|
|
1008
|
-
```python
|
|
1009
|
-
# Basic logging
|
|
1010
|
-
run.log_message_with_code(LogCode.FILES_DISCOVERED, file_count)
|
|
1011
|
-
|
|
1012
|
-
# With custom level
|
|
1013
|
-
run.log_message_with_code(
|
|
1014
|
-
LogCode.EXCEL_SECURITY_VIOLATION,
|
|
1015
|
-
filename,
|
|
1016
|
-
level=Context.DANGER
|
|
1017
|
-
)
|
|
1018
|
-
|
|
1019
|
-
# Upload-specific events
|
|
1020
|
-
run.log_upload_event(LogCode.UPLOADING_DATA_FILES, batch_size)
|
|
1021
|
-
```
|
|
1022
|
-
|
|
1023
|
-
## Migration Guide
|
|
1024
|
-
|
|
1025
|
-
### From Legacy to Refactored Architecture
|
|
1026
|
-
|
|
1027
|
-
The upload action has been refactored using modern design patterns while remaining **backward compatible**: apart from the `collection` parameter being renamed to `data_collection` (see Parameter Changes below), existing code continues to work without changes.
|
|
1028
|
-
|
|
1029
|
-
#### Key Changes
|
|
1030
|
-
|
|
1031
|
-
**Before (Legacy Monolithic):**
|
|
1032
|
-
|
|
1033
|
-
- Single 900+ line action class with all logic
|
|
1034
|
-
- Hard-coded behaviors for validation, file discovery, etc.
|
|
1035
|
-
- No extensibility or customization options
|
|
1036
|
-
- Manual error handling throughout
|
|
1037
|
-
|
|
1038
|
-
**After (Strategy/Facade Patterns):**
|
|
1039
|
-
|
|
1040
|
-
- Clean separation of concerns with 8 workflow steps
|
|
1041
|
-
- Pluggable strategies for different behaviors
|
|
1042
|
-
- Extensible architecture for custom implementations
|
|
1043
|
-
- Automatic rollback and comprehensive error handling
|
|
1044
|
-
|
|
1045
|
-
#### Backward Compatibility
|
|
1046
|
-
|
|
1047
|
-
```python
|
|
1048
|
-
# Legacy usage still works the same, aside from the renamed 'data_collection' parameter
|
|
1049
|
-
from synapse_sdk.plugins.categories.upload.actions.upload.action import UploadAction
|
|
1050
|
-
|
|
1051
|
-
params = {
|
|
1052
|
-
"name": "My Upload",
|
|
1053
|
-
"path": "/data/files",
|
|
1054
|
-
"storage": 1,
|
|
1055
|
-
"data_collection": 5 # Changed from 'collection' to 'data_collection'
|
|
1056
|
-
}
|
|
1057
|
-
|
|
1058
|
-
action = UploadAction(params=params, plugin_config=config)
|
|
1059
|
-
result = action.start() # Works identically to before
|
|
1060
|
-
```
|
|
1061
|
-
|
|
1062
|
-
#### Enhanced Capabilities
|
|
1063
|
-
|
|
1064
|
-
The refactored architecture provides new capabilities:
|
|
1065
|
-
|
|
1066
|
-
```python
|
|
1067
|
-
# Get detailed workflow information
|
|
1068
|
-
action = UploadAction(params=params, plugin_config=config)
|
|
1069
|
-
workflow_info = action.get_workflow_summary()
|
|
1070
|
-
print(f"Configured with {workflow_info['step_count']} steps")
|
|
1071
|
-
print(f"Available strategies: {workflow_info['available_strategies']}")
|
|
1072
|
-
|
|
1073
|
-
# Execute and get detailed results
|
|
1074
|
-
result = action.start()
|
|
1075
|
-
print(f"Success: {result['success']}")
|
|
1076
|
-
print(f"Uploaded files: {result['uploaded_files_count']}")
|
|
1077
|
-
print(f"Generated data units: {result['generated_data_units_count']}")
|
|
1078
|
-
print(f"Errors: {result['errors']}")
|
|
1079
|
-
print(f"Metrics: {result['metrics']}")
|
|
1080
|
-
```
|
|
1081
|
-
|
|
1082
|
-
#### Parameter Changes
|
|
1083
|
-
|
|
1084
|
-
Only one parameter name changed:
|
|
1085
|
-
|
|
1086
|
-
| Legacy | Refactored | Status |
|
|
1087
|
-
| -------------------- | ----------------- | ------------------- |
|
|
1088
|
-
| `collection` | `data_collection` | **Required change** |
|
|
1089
|
-
| All other parameters | Unchanged | Fully compatible |
|
|
1090
|
-
|
|
1091
|
-
#### Benefits of Migration
|
|
1092
|
-
|
|
1093
|
-
- **Better Error Handling**: Automatic rollback on failures
|
|
1094
|
-
- **Progress Tracking**: Detailed progress metrics across workflow steps
|
|
1095
|
-
- **Extensibility**: Add custom strategies and steps
|
|
1096
|
-
- **Testing**: Better testability with mock-friendly architecture
|
|
1097
|
-
- **Maintainability**: Clean separation of concerns
|
|
1098
|
-
- **Performance**: More efficient resource management
|
|
1099
|
-
|
|
1100
|
-
## Usage Examples
|
|
1101
|
-
|
|
1102
|
-
### Basic File Upload (Refactored Architecture)
|
|
1103
|
-
|
|
1104
|
-
```python
|
|
1105
|
-
from synapse_sdk.plugins.categories.upload.actions.upload.action import UploadAction
|
|
1106
|
-
|
|
1107
|
-
# Basic upload configuration with new architecture
|
|
1108
|
-
params = {
|
|
1109
|
-
"name": "Dataset Upload",
|
|
1110
|
-
"description": "Training dataset for ML model",
|
|
1111
|
-
"path": "/data/training_images",
|
|
1112
|
-
"storage": 1,
|
|
1113
|
-
"data_collection": 5, # Note: 'data_collection' instead of 'collection'
|
|
1114
|
-
"is_recursive": True,
|
|
1115
|
-
"max_file_size_mb": 100
|
|
1116
|
-
}
|
|
1117
|
-
|
|
1118
|
-
action = UploadAction(
|
|
1119
|
-
params=params,
|
|
1120
|
-
plugin_config=plugin_config
|
|
1121
|
-
)
|
|
1122
|
-
|
|
1123
|
-
# Execute with automatic step-based workflow and rollback
|
|
1124
|
-
result = action.start()
|
|
1125
|
-
|
|
1126
|
-
# Enhanced result information
|
|
1127
|
-
print(f"Upload successful: {result['success']}")
|
|
1128
|
-
print(f"Uploaded {result['uploaded_files_count']} files")
|
|
1129
|
-
print(f"Generated {result['generated_data_units_count']} data units")
|
|
1130
|
-
print(f"Workflow errors: {result['errors']}")
|
|
1131
|
-
|
|
1132
|
-
# Access detailed metrics
|
|
1133
|
-
workflow_metrics = result['metrics'].get('workflow', {})
|
|
1134
|
-
print(f"Total steps executed: {workflow_metrics.get('current_step', 0)}")
|
|
1135
|
-
print(f"Progress completed: {workflow_metrics.get('progress_percentage', 0)}%")
|
|
1136
|
-
```
|
|
1137
|
-
|
|
1138
|
-
### Excel Metadata Upload with Progress Tracking
|
|
1139
|
-
|
|
1140
|
-
```python
|
|
1141
|
-
# Upload with Excel metadata and progress monitoring
|
|
1142
|
-
params = {
|
|
1143
|
-
"name": "Annotated Dataset Upload",
|
|
1144
|
-
"path": "/data/images",
|
|
1145
|
-
"storage": 1,
|
|
1146
|
-
"data_collection": 5,
|
|
1147
|
-
"excel_metadata_path": "/data/metadata.xlsx",
|
|
1148
|
-
"is_recursive": False,
|
|
1149
|
-
"creating_data_unit_batch_size": 50
|
|
1150
|
-
}
|
|
1151
|
-
|
|
1152
|
-
action = UploadAction(
|
|
1153
|
-
params=params,
|
|
1154
|
-
plugin_config=plugin_config
|
|
1155
|
-
)
|
|
1156
|
-
|
|
1157
|
-
# Get workflow summary before execution
|
|
1158
|
-
workflow_info = action.get_workflow_summary()
|
|
1159
|
-
print(f"Workflow configured with {workflow_info['step_count']} steps")
|
|
1160
|
-
print(f"Total progress weight: {workflow_info['total_progress_weight']}")
|
|
1161
|
-
print(f"Steps: {workflow_info['steps']}")
|
|
1162
|
-
|
|
1163
|
-
# Execute with enhanced error handling
|
|
1164
|
-
try:
|
|
1165
|
-
result = action.start()
|
|
1166
|
-
if result['success']:
|
|
1167
|
-
print("Upload completed successfully!")
|
|
1168
|
-
print(f"Files: {result['uploaded_files_count']}")
|
|
1169
|
-
print(f"Data units: {result['generated_data_units_count']}")
|
|
1170
|
-
else:
|
|
1171
|
-
print("Upload failed with errors:")
|
|
1172
|
-
for error in result['errors']:
|
|
1173
|
-
print(f" - {error}")
|
|
1174
|
-
except Exception as e:
|
|
1175
|
-
print(f"Upload action failed: {e}")
|
|
1176
|
-
```
|
|
1177
|
-
|
|
1178
|
-
### Custom Strategy Upload
|
|
1179
|
-
|
|
1180
|
-
```python
|
|
1181
|
-
from synapse_sdk.plugins.categories.upload.actions.upload.action import UploadAction
|
|
1182
|
-
from my_custom_strategies import CustomValidationStrategy
|
|
1183
|
-
|
|
1184
|
-
# Create action with custom factory
|
|
1185
|
-
class CustomUploadAction(UploadAction):
|
|
1186
|
-
def _configure_strategies(self, context=None):
|
|
1187
|
-
strategies = super()._configure_strategies(context)
|
|
1188
|
-
|
|
1189
|
-
# Override with custom validation
|
|
1190
|
-
if self.params.get('use_strict_validation'):
|
|
1191
|
-
strategies['validation'] = CustomValidationStrategy()
|
|
1192
|
-
|
|
1193
|
-
return strategies
|
|
1194
|
-
|
|
1195
|
-
# Use custom action
|
|
1196
|
-
params = {
|
|
1197
|
-
"name": "Strict Validation Upload",
|
|
1198
|
-
"path": "/data/sensitive_files",
|
|
1199
|
-
"storage": 1,
|
|
1200
|
-
"data_collection": 5,
|
|
1201
|
-
"use_strict_validation": True,
|
|
1202
|
-
"max_file_size_mb": 10 # Stricter limits
|
|
1203
|
-
}
|
|
1204
|
-
|
|
1205
|
-
action = CustomUploadAction(
|
|
1206
|
-
params=params,
|
|
1207
|
-
plugin_config=plugin_config
|
|
1208
|
-
)
|
|
1209
|
-
|
|
1210
|
-
result = action.start()
|
|
1211
|
-
```
|
|
1212
|
-
|
|
1213
|
-
### Batch Processing with Custom Configuration
|
|
1214
|
-
|
|
1215
|
-
```python
|
|
1216
|
-
# Custom plugin configuration with config.yaml
|
|
1217
|
-
plugin_config = {
|
|
1218
|
-
"actions": {
|
|
1219
|
-
"upload": {
|
|
1220
|
-
"excel_config": {
|
|
1221
|
-
"max_file_size_mb": 20,
|
|
1222
|
-
"max_rows": 50000,
|
|
1223
|
-
"max_columns": 100
|
|
1224
|
-
}
|
|
1225
|
-
}
|
|
1226
|
-
}
|
|
1227
|
-
}
|
|
1228
|
-
|
|
1229
|
-
# Large batch upload with custom settings
|
|
1230
|
-
params = {
|
|
1231
|
-
"name": "Large Batch Upload",
|
|
1232
|
-
"path": "/data/large_dataset",
|
|
1233
|
-
"storage": 2,
|
|
1234
|
-
"data_collection": 10,
|
|
1235
|
-
"is_recursive": True,
|
|
1236
|
-
"max_file_size_mb": 500,
|
|
1237
|
-
"creating_data_unit_batch_size": 200,
|
|
1238
|
-
"use_async_upload": True
|
|
1239
|
-
}
|
|
1240
|
-
|
|
1241
|
-
action = UploadAction(
|
|
1242
|
-
params=params,
|
|
1243
|
-
plugin_config=plugin_config
|
|
1244
|
-
)
|
|
1245
|
-
|
|
1246
|
-
# Execute with progress monitoring
|
|
1247
|
-
result = action.start()
|
|
1248
|
-
|
|
1249
|
-
# Analyze results
|
|
1250
|
-
print("Batch upload summary:")
|
|
1251
|
-
print(f" Success: {result['success']}")
|
|
1252
|
-
print(f" Files processed: {result['uploaded_files_count']}")
|
|
1253
|
-
print(f" Data units created: {result['generated_data_units_count']}")
|
|
1254
|
-
|
|
1255
|
-
# Check metrics by category
|
|
1256
|
-
metrics = result['metrics']
|
|
1257
|
-
if 'data_files' in metrics:
|
|
1258
|
-
files_metrics = metrics['data_files']
|
|
1259
|
-
print(f" Files - Success: {files_metrics.get('success', 0)}")
|
|
1260
|
-
print(f" Files - Failed: {files_metrics.get('failed', 0)}")
|
|
1261
|
-
|
|
1262
|
-
if 'data_units' in metrics:
|
|
1263
|
-
units_metrics = metrics['data_units']
|
|
1264
|
-
print(f" Units - Success: {units_metrics.get('success', 0)}")
|
|
1265
|
-
print(f" Units - Failed: {units_metrics.get('failed', 0)}")
|
|
1266
|
-
```
|
|
1267
|
-
|
|
1268
|
-
### Error Handling and Rollback
|
|
1269
|
-
|
|
1270
|
-
```python
|
|
1271
|
-
# Demonstrate enhanced error handling with automatic rollback
|
|
1272
|
-
params = {
|
|
1273
|
-
"name": "Error Recovery Example",
|
|
1274
|
-
"path": "/data/problematic_files",
|
|
1275
|
-
"storage": 1,
|
|
1276
|
-
"data_collection": 5,
|
|
1277
|
-
"is_recursive": True
|
|
1278
|
-
}
|
|
1279
|
-
|
|
1280
|
-
action = UploadAction(
|
|
1281
|
-
params=params,
|
|
1282
|
-
plugin_config=plugin_config
|
|
1283
|
-
)
|
|
1284
|
-
|
|
1285
|
-
try:
|
|
1286
|
-
result = action.start()
|
|
1287
|
-
|
|
1288
|
-
if not result['success']:
|
|
1289
|
-
print("Upload failed, but cleanup was automatic:")
|
|
1290
|
-
print(f"Errors encountered: {len(result['errors'])}")
|
|
1291
|
-
for i, error in enumerate(result['errors'], 1):
|
|
1292
|
-
print(f" {i}. {error}")
|
|
1293
|
-
|
|
1294
|
-
# Check if rollback was performed (via orchestrator internals)
|
|
1295
|
-
workflow_metrics = result['metrics'].get('workflow', {})
|
|
1296
|
-
current_step = workflow_metrics.get('current_step', 0)
|
|
1297
|
-
total_steps = workflow_metrics.get('total_steps', 0)
|
|
1298
|
-
print(f"Workflow stopped at step {current_step} of {total_steps}")
|
|
1299
|
-
|
|
1300
|
-
except Exception as e:
|
|
1301
|
-
print(f"Critical upload failure: {e}")
|
|
1302
|
-
# Rollback was automatically performed before exception propagation
|
|
1303
|
-
```
|
|
1304
|
-
|
|
1305
|
-
## Error Handling
|
|
1306
|
-
|
|
1307
|
-
### Exception Types
|
|
1308
|
-
|
|
1309
|
-
The upload system defines specific exceptions:
|
|
1310
|
-
|
|
1311
|
-
```python
|
|
1312
|
-
# Security violations
|
|
1313
|
-
try:
|
|
1314
|
-
action.start()
|
|
1315
|
-
except ExcelSecurityError as e:
|
|
1316
|
-
print(f"Excel security violation: {e}")
|
|
1317
|
-
|
|
1318
|
-
# Parsing errors
|
|
1319
|
-
except ExcelParsingError as e:
|
|
1320
|
-
print(f"Excel parsing failed: {e}")
|
|
1321
|
-
|
|
1322
|
-
# General upload errors
|
|
1323
|
-
except ActionError as e:
|
|
1324
|
-
print(f"Upload action failed: {e}")
|
|
1325
|
-
```
|
|
1326
|
-
|
|
1327
|
-
### Validation Errors
|
|
1328
|
-
|
|
1329
|
-
Parameter validation provides detailed error messages:
|
|
1330
|
-
|
|
1331
|
-
```python
|
|
1332
|
-
from pydantic import ValidationError
|
|
1333
|
-
|
|
1334
|
-
try:
|
|
1335
|
-
params = UploadParams(**invalid_params)
|
|
1336
|
-
except ValidationError as e:
|
|
1337
|
-
for error in e.errors():
|
|
1338
|
-
print(f"Field {error['loc']}: {error['msg']}")
|
|
1339
|
-
```
|
|
1340
|
-
|
|
1341
|
-
## API Reference
|
|
1342
|
-
|
|
1343
|
-
### Core Components
|
|
1344
|
-
|
|
1345
|
-
#### UploadAction
|
|
1346
|
-
|
|
1347
|
-
Main upload action class implementing Strategy and Facade patterns for file processing operations.
|
|
1348
|
-
|
|
1349
|
-
**Class Attributes:**
|
|
1350
|
-
|
|
1351
|
-
- `name = 'upload'` - Action identifier
|
|
1352
|
-
- `category = PluginCategory.UPLOAD` - Plugin category
|
|
1353
|
-
- `method = RunMethod.JOB` - Execution method
|
|
1354
|
-
- `run_class = UploadRun` - Specialized run management
|
|
1355
|
-
- `params_model = UploadParams` - Parameter validation model
|
|
1356
|
-
- `strategy_factory: StrategyFactory` - Creates strategy implementations
|
|
1357
|
-
- `step_registry: StepRegistry` - Manages workflow steps
|
|
1358
|
-
|
|
1359
|
-
**Key Methods:**
|
|
1360
|
-
|
|
1361
|
-
- `start() -> Dict[str, Any]` - Execute orchestrated upload workflow
|
|
1362
|
-
- `get_workflow_summary() -> Dict[str, Any]` - Get configured workflow summary
|
|
1363
|
-
- `_configure_workflow() -> None` - Register workflow steps in execution order
|
|
1364
|
-
- `_configure_strategies(context=None) -> Dict[str, Any]` - Create strategy instances
|
|
1365
|
-
|
|
1366
|
-
**Progress Categories:**
|
|
1367
|
-
|
|
1368
|
-
```python
|
|
1369
|
-
progress_categories = {
|
|
1370
|
-
'analyze_collection': {'proportion': 2},
|
|
1371
|
-
'upload_data_files': {'proportion': 38},
|
|
1372
|
-
'generate_data_units': {'proportion': 60},
|
|
1373
|
-
}
|
|
1374
|
-
```
|
|
1375
|
-
|
|
1376
|
-
#### UploadOrchestrator
|
|
1377
|
-
|
|
1378
|
-
Facade component coordinating the complete upload workflow with automatic rollback.
|
|
1379
|
-
|
|
1380
|
-
**Class Attributes:**
|
|
1381
|
-
|
|
1382
|
-
- `context: UploadContext` - Shared state across workflow
|
|
1383
|
-
- `step_registry: StepRegistry` - Registry of workflow steps
|
|
1384
|
-
- `strategies: Dict[str, Any]` - Strategy implementations
|
|
1385
|
-
- `executed_steps: List[BaseStep]` - Successfully executed steps
|
|
1386
|
-
- `current_step_index: int` - Current position in workflow
|
|
1387
|
-
- `rollback_executed: bool` - Whether rollback was performed
|
|
1388
|
-
|
|
1389
|
-
**Key Methods:**
|
|
1390
|
-
|
|
1391
|
-
- `execute() -> Dict[str, Any]` - Execute complete workflow with error handling
|
|
1392
|
-
- `get_workflow_summary() -> Dict[str, Any]` - Get execution summary and metrics
|
|
1393
|
-
- `get_executed_steps() -> List[BaseStep]` - Get list of successfully executed steps
|
|
1394
|
-
- `is_rollback_executed() -> bool` - Check if rollback was performed
|
|
1395
|
-
- `_execute_step(step: BaseStep) -> StepResult` - Execute individual workflow step
|
|
1396
|
-
- `_handle_step_failure(step: BaseStep, error: Exception) -> None` - Handle step failures
|
|
1397
|
-
- `_rollback_executed_steps() -> None` - Rollback executed steps in reverse order
|
|
1398
|
-
|
|
1399
|
-
#### UploadContext
|
|
1400
|
-
|
|
1401
|
-
Context object maintaining shared state and communication between workflow components.
|
|
1402
|
-
|
|
1403
|
-
**State Attributes:**
|
|
1404
|
-
|
|
1405
|
-
- `params: Dict` - Upload parameters
|
|
1406
|
-
- `run: UploadRun` - Run management instance
|
|
1407
|
-
- `client: Any` - API client for external operations
|
|
1408
|
-
- `storage: Any` - Storage configuration object
|
|
1409
|
-
- `pathlib_cwd: Path` - Current working directory path
|
|
1410
|
-
- `metadata: Dict[str, Dict[str, Any]]` - File metadata mappings
|
|
1411
|
-
- `file_specifications: Dict[str, Any]` - Data collection file specs
|
|
1412
|
-
- `organized_files: List[Dict[str, Any]]` - Organized file information
|
|
1413
|
-
- `uploaded_files: List[Dict[str, Any]]` - Successfully uploaded files
|
|
1414
|
-
- `data_units: List[Dict[str, Any]]` - Generated data units
|
|
1415
|
-
|
|
1416
|
-
**Progress and Metrics:**
|
|
1417
|
-
|
|
1418
|
-
- `metrics: Dict[str, Any]` - Workflow metrics and statistics
|
|
1419
|
-
- `errors: List[str]` - Accumulated error messages
|
|
1420
|
-
- `step_results: List[StepResult]` - Results from executed steps
|
|
1421
|
-
|
|
1422
|
-
**Strategy and Rollback:**
|
|
1423
|
-
|
|
1424
|
-
- `strategies: Dict[str, Any]` - Injected strategy implementations
|
|
1425
|
-
- `rollback_data: Dict[str, Any]` - Data for rollback operations
|
|
1426
|
-
|
|
1427
|
-
**Key Methods:**
|
|
1428
|
-
|
|
1429
|
-
- `update(result: StepResult) -> None` - Update context with step results
|
|
1430
|
-
- `get_result() -> Dict[str, Any]` - Generate final result dictionary
|
|
1431
|
-
- `has_errors() -> bool` - Check for accumulated errors
|
|
1432
|
-
- `get_last_step_result() -> Optional[StepResult]` - Get most recent step result
|
|
1433
|
-
- `update_metrics(category: str, metrics: Dict[str, Any]) -> None` - Update metrics
|
|
1434
|
-
- `add_error(error: str) -> None` - Add error to context
|
|
1435
|
-
- `get_param(key: str, default: Any = None) -> Any` - Get parameter with default
|
|
1436
|
-
|
|
1437
|
-
#### StepRegistry

Registry managing the collection and execution order of workflow steps.

**Attributes:**

- `_steps: List[BaseStep]` - Registered workflow steps in execution order

**Key Methods:**

- `register(step: BaseStep) -> None` - Register a workflow step
- `get_steps() -> List[BaseStep]` - Get all registered steps in order
- `get_total_progress_weight() -> float` - Calculate the total progress weight
- `clear() -> None` - Clear all registered steps
- `__len__() -> int` - Get the number of registered steps
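A minimal sketch of this registry shape (illustrative, not the SDK class; any object with a `progress_weight` attribute works as a step here):

```python
from types import SimpleNamespace
from typing import List


class MinimalRegistry:
    """Illustrative registry with the same surface as StepRegistry."""

    def __init__(self):
        self._steps: List[object] = []

    def register(self, step) -> None:
        # Steps run in registration order, so order of register() calls matters.
        self._steps.append(step)

    def get_steps(self):
        return list(self._steps)

    def get_total_progress_weight(self) -> float:
        return sum(step.progress_weight for step in self._steps)

    def clear(self) -> None:
        self._steps.clear()

    def __len__(self) -> int:
        return len(self._steps)


registry = MinimalRegistry()
registry.register(SimpleNamespace(name='initialize', progress_weight=0.05))
registry.register(SimpleNamespace(name='upload_files', progress_weight=0.30))
```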
#### StrategyFactory

Factory component creating appropriate strategy implementations based on parameters.

**Key Methods:**

- `create_validation_strategy(params: Dict, context=None) -> BaseValidationStrategy` - Create a validation strategy
- `create_file_discovery_strategy(params: Dict, context=None) -> BaseFileDiscoveryStrategy` - Create a file discovery strategy
- `create_metadata_strategy(params: Dict, context=None) -> BaseMetadataStrategy` - Create a metadata processing strategy
- `create_upload_strategy(params: Dict, context: UploadContext) -> BaseUploadStrategy` - Create an upload strategy (requires context)
- `create_data_unit_strategy(params: Dict, context: UploadContext) -> BaseDataUnitStrategy` - Create a data unit strategy (requires context)
- `get_available_strategies() -> Dict[str, List[str]]` - Get available strategy types and implementations
### Workflow Steps

#### BaseStep (Abstract)

Base class for all workflow steps, providing the common interface and utilities.

**Abstract Properties:**

- `name: str` - Unique step identifier
- `progress_weight: float` - Weight for progress calculation (weights should sum to 1.0)

**Abstract Methods:**

- `execute(context: UploadContext) -> StepResult` - Execute the step logic
- `can_skip(context: UploadContext) -> bool` - Determine whether the step can be skipped
- `rollback(context: UploadContext) -> None` - Roll back the step's operations

**Utility Methods:**

- `create_success_result(data: Dict = None) -> StepResult` - Create a success result
- `create_error_result(error: str, original_exception: Exception = None) -> StepResult` - Create an error result
- `create_skip_result() -> StepResult` - Create a skip result
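A custom step along these lines might look like the following sketch; `BaseStepSketch` stands in for the SDK's `BaseStep`, and `CountFilesStep` is a hypothetical example, not a shipped step:

```python
from abc import ABC, abstractmethod


class BaseStepSketch(ABC):
    """Simplified stand-in for BaseStep's contract."""

    @property
    @abstractmethod
    def name(self) -> str: ...

    @property
    @abstractmethod
    def progress_weight(self) -> float: ...

    @abstractmethod
    def execute(self, context): ...

    def can_skip(self, context) -> bool:
        # Most steps run unconditionally unless they override this.
        return False

    def rollback(self, context) -> None:
        pass


class CountFilesStep(BaseStepSketch):
    """Hypothetical step that records how many files were organized."""

    name = 'count_files'
    progress_weight = 0.0  # Contributes nothing to overall progress

    def execute(self, context):
        # context is treated as a dict here purely for illustration.
        return {'success': True, 'data': {'count': len(context['organized_files'])}}
```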
#### StepResult

Result object returned by workflow step execution.

**Attributes:**

- `success: bool` - Whether the step executed successfully
- `data: Dict[str, Any]` - Step result data
- `error: str` - Error message if the step failed
- `rollback_data: Dict[str, Any]` - Data needed for rollback
- `skipped: bool` - Whether the step was skipped
- `original_exception: Optional[Exception]` - Original exception for debugging
- `timestamp: datetime` - Execution timestamp

**Usage:**

```python
# Boolean evaluation
if step_result:
    # Step was successful
    process_success(step_result.data)
```
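The boolean idiom works because the result defines its truthiness from `success`. A minimal dataclass sketch with the attributes listed above (illustrative shape, not the SDK's actual definition):

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, Optional


@dataclass
class StepResultSketch:
    success: bool
    data: Dict[str, Any] = field(default_factory=dict)
    error: str = ''
    rollback_data: Dict[str, Any] = field(default_factory=dict)
    skipped: bool = False
    original_exception: Optional[Exception] = None
    timestamp: datetime = field(default_factory=datetime.now)

    def __bool__(self) -> bool:
        # Enables the `if step_result:` boolean-evaluation idiom.
        return self.success
```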
#### Concrete Steps

**InitializeStep** (`name: "initialize"`, `weight: 0.05`)

- Sets up the storage connection and pathlib working directory
- Validates basic upload prerequisites

**ProcessMetadataStep** (`name: "process_metadata"`, `weight: 0.05`)

- Processes Excel metadata if provided
- Validates metadata security and format

**AnalyzeCollectionStep** (`name: "analyze_collection"`, `weight: 0.05`)

- Retrieves and validates data collection file specifications
- Sets up file organization rules

**OrganizeFilesStep** (`name: "organize_files"`, `weight: 0.10`)

- Discovers files using the file discovery strategy
- Organizes files by type and specification

**ValidateFilesStep** (`name: "validate_files"`, `weight: 0.05`)

- Validates files using the validation strategy
- Performs security and content checks

**UploadFilesStep** (`name: "upload_files"`, `weight: 0.30`)

- Uploads files using the upload strategy
- Handles batching and progress tracking

**GenerateDataUnitsStep** (`name: "generate_data_units"`, `weight: 0.35`)

- Creates data units using the data unit strategy
- Links uploaded files to data units

**CleanupStep** (`name: "cleanup"`, `weight: 0.05`)

- Cleans up temporary resources and files
- Performs final validation
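The default weights sum to 1.0, which is what the progress calculation relies on — a quick sanity check:

```python
# Default step weights as documented above.
default_weights = {
    'initialize': 0.05,
    'process_metadata': 0.05,
    'analyze_collection': 0.05,
    'organize_files': 0.10,
    'validate_files': 0.05,
    'upload_files': 0.30,
    'generate_data_units': 0.35,
    'cleanup': 0.05,
}

total = sum(default_weights.values())  # 1.0 up to float rounding
```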
### Strategy Base Classes

#### BaseValidationStrategy (Abstract)

Base class for file validation strategies.

**Abstract Methods:**

- `validate_files(files: List[Path], context: UploadContext) -> bool` - Validate a collection of files
- `validate_security(file_path: Path) -> bool` - Validate an individual file's security

#### BaseFileDiscoveryStrategy (Abstract)

Base class for file discovery and organization strategies.

**Abstract Methods:**

- `discover_files(path: Path, context: UploadContext) -> List[Path]` - Discover files from a path
- `organize_files(files: List[Path], specs: Dict[str, Any], context: UploadContext) -> List[Dict[str, Any]]` - Organize discovered files

#### BaseMetadataStrategy (Abstract)

Base class for metadata processing strategies.

**Abstract Methods:**

- `process_metadata(context: UploadContext) -> Dict[str, Any]` - Process metadata from the context
- `extract_metadata(file_path: Path) -> Dict[str, Any]` - Extract metadata from a file

#### BaseUploadStrategy (Abstract)

Base class for file upload strategies.

**Abstract Methods:**

- `upload_files(files: List[Dict[str, Any]], context: UploadContext) -> List[Dict[str, Any]]` - Upload a collection of files
- `upload_batch(batch: List[Dict[str, Any]], context: UploadContext) -> List[Dict[str, Any]]` - Upload a file batch

#### BaseDataUnitStrategy (Abstract)

Base class for data unit creation strategies.

**Abstract Methods:**

- `generate_data_units(files: List[Dict[str, Any]], context: UploadContext) -> List[Dict[str, Any]]` - Generate data units
- `create_data_unit_batch(batch: List[Dict[str, Any]], context: UploadContext) -> List[Dict[str, Any]]` - Create a data unit batch
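A file discovery strategy of this shape might be sketched as follows — the method names come from the interface above, but the glob pattern and extension-matching logic are illustrative, not the SDK's built-in behavior:

```python
from pathlib import Path
from typing import Any, Dict, List


class SimpleDiscovery:
    """Illustrative strategy duck-typing BaseFileDiscoveryStrategy."""

    def __init__(self, recursive: bool = True):
        self.recursive = recursive

    def discover_files(self, path: Path, context=None) -> List[Path]:
        # '**/*' walks subdirectories; '*' stays at the top level.
        pattern = '**/*' if self.recursive else '*'
        return sorted(p for p in path.glob(pattern) if p.is_file())

    def organize_files(self, files: List[Path], specs: Dict[str, Any],
                       context=None) -> List[Dict[str, Any]]:
        # Assign each file to the first spec whose extensions match it;
        # unmatched files are simply dropped in this sketch.
        organized = []
        for f in files:
            for spec_name, spec in specs.items():
                if f.suffix.lower() in spec.get('extensions', []):
                    organized.append({'path': f, 'spec': spec_name})
                    break
        return organized
```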
### Legacy Components

#### UploadRun

Specialized run management for upload operations (unchanged from the legacy implementation).

**Logging Methods:**

- `log_message_with_code(code, *args, level=None)` - Type-safe logging
- `log_upload_event(code, *args, level=None)` - Upload-specific events

**Nested Models:**

- `UploadEventLog` - Upload event logging
- `DataFileLog` - Data file processing logs
- `DataUnitLog` - Data unit creation logs
- `TaskLog` - Task execution logs
- `MetricsRecord` - Metrics tracking

#### UploadParams

Parameter validation model with Pydantic integration (unchanged from the legacy implementation).

**Required Parameters:**

- `name: str` - Upload name
- `path: str` - Source path
- `storage: int` - Storage ID
- `data_collection: int` - Data collection ID

**Optional Parameters:**

- `description: str | None = None` - Upload description
- `project: int | None = None` - Project ID
- `excel_metadata_path: str | None = None` - Excel metadata file path
- `is_recursive: bool = False` - Recursive file discovery
- `max_file_size_mb: int = 50` - Maximum file size in megabytes
- `creating_data_unit_batch_size: int = 100` - Data unit batch size
- `use_async_upload: bool = True` - Async upload processing

**Validation Features:**

- Real-time API validation for storage/data_collection/project
- String sanitization and length validation
- Type checking and conversion
- Custom validator methods
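A parameter payload matching the model above might look like this; the paths and IDs are purely illustrative:

```python
# Hypothetical upload parameters; unset optional keys fall back to
# the model defaults listed above (e.g. use_async_upload=True).
params = {
    # Required
    'name': 'daily-import',
    'path': '/data/incoming',
    'storage': 3,
    'data_collection': 12,
    # Optional overrides
    'is_recursive': True,
    'max_file_size_mb': 100,
}
```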
### Utility Classes

#### ExcelSecurityConfig

Security configuration for Excel file processing.

**Configuration Attributes:**

- `max_file_size_mb` - File size limit in megabytes (default: 10)
- `max_rows` - Row count limit (default: 100000)
- `max_columns` - Column count limit (default: 50)

**Key Methods:**

- `from_action_config(action_config)` - Create a config from `config.yaml`

#### PathAwareJSONEncoder

Custom JSON encoder for `Path` and `datetime` objects.

**Supported Types:**

- `Path` objects (converted to strings)
- `datetime` objects (ISO format)
- Standard JSON-serializable types
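An encoder with those conversions can be sketched as a `json.JSONEncoder` subclass; this is an illustrative equivalent, not the SDK's class:

```python
import json
from datetime import datetime
from pathlib import Path, PurePath


class PathAwareEncoderSketch(json.JSONEncoder):
    """Illustrative equivalent of PathAwareJSONEncoder."""

    def default(self, obj):
        if isinstance(obj, PurePath):
            return str(obj)  # Path objects become plain strings
        if isinstance(obj, datetime):
            return obj.isoformat()  # datetimes become ISO-8601 strings
        return super().default(obj)  # everything else: standard behavior
```

Usage: `json.dumps({'file': Path('a/b.txt')}, cls=PathAwareEncoderSketch)`.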
### Enums

#### LogCode

Type-safe logging codes for upload operations.

**Categories:**

- Validation codes (VALIDATION_FAILED, STORAGE_VALIDATION_FAILED)
- File processing codes (NO_FILES_FOUND, FILES_DISCOVERED)
- Excel processing codes (EXCEL_SECURITY_VIOLATION, EXCEL_PARSING_ERROR)
- Progress codes (UPLOADING_DATA_FILES, GENERATING_DATA_UNITS)

#### UploadStatus

Upload processing status enumeration.

**Values:**

- `SUCCESS = 'success'` - Operation completed successfully
- `FAILED = 'failed'` - Operation failed with errors

### Exceptions

#### ExcelSecurityError

Raised when an Excel file violates security constraints.

**Common Causes:**

- File size exceeds limits
- Estimated memory usage is too high
- Content security violations

#### ExcelParsingError

Raised when an Excel file cannot be parsed.

**Common Causes:**

- File format corruption
- Invalid Excel structure
- Missing required columns
- Content parsing failures
## Best Practices

### Architecture Patterns

1. **Strategy Selection**: Choose strategies appropriate to the use case

   - Use `RecursiveFileDiscoveryStrategy` for deep directory structures
   - Use `BasicValidationStrategy` for standard file validation
   - Use `AsyncUploadStrategy` for large file sets

2. **Step Ordering**: Maintain logical step dependencies

   - Initialize → Process Metadata → Analyze Collection → Organize Files → Validate → Upload → Generate Data Units → Cleanup
   - Insert custom steps at the appropriate points in the workflow

3. **Context Management**: Leverage `UploadContext` for state sharing

   - Store intermediate results in the context for downstream steps
   - Use the context for cross-step communication
   - Preserve rollback data for cleanup operations
### Performance Optimization

1. **Batch Processing**: Configure optimal batch sizes based on system resources

   ```python
   params = {
       "creating_data_unit_batch_size": 200,  # Adjust based on memory
       "upload_batch_size": 10,  # Custom parameter for upload strategies
   }
   ```

2. **Async Operations**: Enable async processing for I/O-bound operations

   ```python
   params = {
       "use_async_upload": True,  # Better throughput for network operations
   }
   ```

3. **Memory Management**: Monitor memory usage in custom strategies

   - Process files in chunks rather than loading everything into memory
   - Use generators for large file collections
   - Configure Excel security limits appropriately

4. **Progress Monitoring**: Implement detailed progress tracking

   ```python
   # Custom step with progress updates
   def execute(self, context):
       total_files = len(context.organized_files)
       for i, file_info in enumerate(context.organized_files):
           # Process the file, then report progress
           progress = (i + 1) / total_files * 100
           context.update_metrics('custom_step', {'progress': progress})
   ```
### Security Considerations

1. **Input Validation**: Validate all input parameters and file paths

   ```python
   # Custom validation in a strategy
   def validate_files(self, files, context):
       for file_path in files:
           if not self._is_safe_path(file_path):
               return False
       return True
   ```

2. **File Content Security**: Implement content-based security checks

   - Scan for malicious file signatures
   - Validate that file headers match extensions
   - Check for embedded executables

3. **Excel Security**: Configure appropriate security limits

   ```python
   import os
   os.environ['EXCEL_MAX_FILE_SIZE_MB'] = '10'
   os.environ['EXCEL_MAX_MEMORY_MB'] = '30'
   ```

4. **Path Sanitization**: Validate and sanitize all file paths

   - Prevent path traversal attacks
   - Validate file extensions
   - Check file permissions
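The path-traversal guard mentioned in the sanitization list can be sketched like this — `is_safe_path` is a hypothetical helper, shown as one way to implement such a check, resolving symlinks and rejecting anything that escapes a base directory:

```python
from pathlib import Path


def is_safe_path(file_path: Path, base_dir: Path) -> bool:
    """Reject paths that resolve outside base_dir (path traversal guard)."""
    try:
        # resolve() collapses '..' segments and symlinks before the check.
        file_path.resolve().relative_to(base_dir.resolve())
    except ValueError:
        # relative_to() raises ValueError when the path escapes base_dir.
        return False
    return True
```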
### Error Handling and Recovery

1. **Graceful Degradation**: Design for partial-failure scenarios

   ```python
   class RobustUploadStrategy(BaseUploadStrategy):
       def upload_files(self, files, context):
           successful_uploads = []
           failed_uploads = []

           for file_info in files:
               try:
                   result = self._upload_file(file_info)
                   successful_uploads.append(result)
               except Exception as e:
                   failed_uploads.append({'file': file_info, 'error': str(e)})
                   # Continue with other files instead of failing completely

           # Update the context with partial results
           context.add_uploaded_files(successful_uploads)
           if failed_uploads:
               context.add_error(f"Failed to upload {len(failed_uploads)} files")

           return successful_uploads
   ```

2. **Rollback Design**: Implement comprehensive rollback strategies

   ```python
   def rollback(self, context):
       # Clean up in reverse order of operations
       if hasattr(self, '_created_temp_files'):
           for temp_file in self._created_temp_files:
               try:
                   temp_file.unlink()
               except Exception:
                   pass  # Don't fail rollback due to cleanup issues
   ```

3. **Detailed Logging**: Use structured logging for debugging

   ```python
   def execute(self, context):
       try:
           context.run.log_message_with_code(
               'CUSTOM_STEP_STARTED',
               {'step': self.name, 'file_count': len(context.organized_files)}
           )
           # Step logic here
       except Exception as e:
           context.run.log_message_with_code(
               'CUSTOM_STEP_FAILED',
               {'step': self.name, 'error': str(e)},
               level=Context.DANGER
           )
           raise
   ```
### Development Guidelines

1. **Custom Strategy Development**: Follow the established patterns

   ```python
   # Always extend the appropriate base class
   class MyCustomStrategy(BaseValidationStrategy):
       def __init__(self, config=None):
           self.config = config or {}

       def validate_files(self, files, context):
           # Implement validation logic
           return True

       def validate_security(self, file_path):
           # Implement security validation
           return True
   ```

2. **Testing Strategy**: Maintain comprehensive test coverage

   ```python
   # Test both success and failure scenarios
   class TestCustomStrategy:
       def test_success_case(self):
           strategy = MyCustomStrategy()
           result = strategy.validate_files([Path('valid_file.txt')], mock_context)
           assert result is True

       def test_security_failure(self):
           strategy = MyCustomStrategy()
           result = strategy.validate_security(Path('malware.exe'))
           assert result is False

       def test_rollback_cleanup(self):
           step = MyCustomStep()
           step.rollback(mock_context)
           # Assert cleanup was performed
   ```

3. **Extension Points**: Use the factory pattern for extensibility

   ```python
   class CustomStrategyFactory(StrategyFactory):
       def create_validation_strategy(self, params, context=None):
           validation_type = params.get('validation_type', 'basic')

           strategy_map = {
               'basic': BasicValidationStrategy,
               'strict': StrictValidationStrategy,
               'custom': MyCustomValidationStrategy,
           }

           strategy_class = strategy_map.get(validation_type, BasicValidationStrategy)
           return strategy_class(params)
   ```

4. **Configuration Management**: Use environment variables and parameters

   ```python
   class ConfigurableStep(BaseStep):
       def __init__(self):
           # Allow runtime configuration
           self.batch_size = int(os.getenv('STEP_BATCH_SIZE', '50'))
           self.timeout = int(os.getenv('STEP_TIMEOUT_SECONDS', '300'))

       def execute(self, context):
           # Use the configured values
           batch_size = context.get_param('step_batch_size', self.batch_size)
           timeout = context.get_param('step_timeout', self.timeout)
   ```
### Anti-Patterns to Avoid

1. **Tight Coupling**: Don't couple strategies to specific implementations
2. **State Mutation**: Don't modify context state directly outside the `update()` method
3. **Exception Swallowing**: Don't catch and ignore exceptions without proper handling
4. **Blocking Operations**: Don't perform long-running synchronous operations without progress updates
5. **Memory Leaks**: Don't hold references to large objects in step instances

### Troubleshooting Guide

1. **Step Failures**: Check step execution order and dependencies
2. **Strategy Issues**: Verify strategy factory configuration and parameter passing
3. **Context Problems**: Ensure proper context updates and state management
4. **Rollback Failures**: Design idempotent rollback operations
5. **Performance Issues**: Profile batch sizes and async operation usage
### Migration Checklist

When upgrading from the legacy implementation:

- [ ] Update the parameter name from `collection` to `data_collection`
- [ ] Test existing workflows for compatibility
- [ ] Review custom extensions for new architecture opportunities
- [ ] Update error handling to leverage the new rollback capabilities
- [ ] Consider implementing custom strategies for specialized requirements
- [ ] Update test cases to validate the new workflow steps
- [ ] Review logging and metrics collection for enhanced information

For detailed information on developing custom upload plugins using the BaseUploader template, see the [Developing Upload Templates](./developing-upload-template.md) guide.