@arela/uploader 1.0.23 → 1.1.0

Files changed (85)
  1. package/docs/AUTO_PROCESSING_PIPELINE.md +258 -0
  2. package/docs/COMPLETE_USAGE_GUIDE.md +1363 -0
  3. package/docs/DATABASESERVICE_IMPROVEMENTS.md +546 -0
  4. package/docs/PASO_2_TEST_RESULTS.md +298 -0
  5. package/docs/PASO_3_PLAN.md +385 -0
  6. package/docs/PHASE_1_FILE_DETECTION.md +366 -0
  7. package/docs/PHASE_2_API_INTEGRATION.md +426 -0
  8. package/docs/PHASE_3_DATABASE_MANAGEMENT.md +480 -0
  9. package/docs/PHASE_4_FILE_OPERATIONS.md +448 -0
  10. package/docs/PHASE_5_WATCH_MODE.md +450 -0
  11. package/docs/PHASE_6_SIGNAL_HANDLING.md +472 -0
  12. package/docs/PHASE_7_ADVANCED_FEATURES.md +560 -0
  13. package/docs/PLAN_WATCH_FEATURE.md +417 -0
  14. package/docs/README.md +480 -0
  15. package/docs/SCHEMA_ALIGNMENT_SUMMARY.md +301 -0
  16. package/docs/SMARTWATCH_DATABASE_REFACTORING.md +181 -0
  17. package/docs/SMART_WATCH_DATABASE_CHANGES.md +502 -0
  18. package/docs/TESTING_WATCH_MODE.md +212 -0
  19. package/docs/WATCHER_API_IMPLEMENTATION.md +520 -0
  20. package/docs/WATCHER_API_INTEGRATION.md +562 -0
  21. package/docs/WATCHER_SETUP_GUIDE.md +614 -0
  22. package/docs/WATCH_ARCHITECTURE.md +395 -0
  23. package/docs/WATCH_AUTO_PIPELINE.md +334 -0
  24. package/docs/WATCH_CONFIGURATION.md +267 -0
  25. package/docs/WATCH_USAGE_GUIDE.md +567 -0
  26. package/docs/commands.md +14 -0
  27. package/package.json +1 -1
  28. package/scripts/scoring-compare.js +243 -0
  29. package/scripts/scoring-phase4-check.js +96 -0
  30. package/src/commands/IdentifyCommand.js +36 -0
  31. package/src/config/config.js +2 -2
  32. package/src/file-detection.js +71 -4
  33. package/src/scoring/db-matcher-adapter.js +98 -0
  34. package/src/scoring/matchers-seed.js +386 -0
  35. package/src/scoring/scoring-engine.js +246 -0
  36. package/src/services/ScanApiService.js +14 -0
  37. package/tests/unit/scoring-engine.test.js +221 -0
  38. package/.vscode/settings.json +0 -1
  39. package/coverage/IdentifyCommand.js.html +0 -1462
  40. package/coverage/PropagateCommand.js.html +0 -1507
  41. package/coverage/PushCommand.js.html +0 -1504
  42. package/coverage/ScanCommand.js.html +0 -1654
  43. package/coverage/UploadCommand.js.html +0 -1846
  44. package/coverage/WatchCommand.js.html +0 -4111
  45. package/coverage/base.css +0 -224
  46. package/coverage/block-navigation.js +0 -87
  47. package/coverage/favicon.png +0 -0
  48. package/coverage/index.html +0 -191
  49. package/coverage/lcov-report/IdentifyCommand.js.html +0 -1462
  50. package/coverage/lcov-report/PropagateCommand.js.html +0 -1507
  51. package/coverage/lcov-report/PushCommand.js.html +0 -1504
  52. package/coverage/lcov-report/ScanCommand.js.html +0 -1654
  53. package/coverage/lcov-report/UploadCommand.js.html +0 -1846
  54. package/coverage/lcov-report/WatchCommand.js.html +0 -4111
  55. package/coverage/lcov-report/base.css +0 -224
  56. package/coverage/lcov-report/block-navigation.js +0 -87
  57. package/coverage/lcov-report/favicon.png +0 -0
  58. package/coverage/lcov-report/index.html +0 -191
  59. package/coverage/lcov-report/prettify.css +0 -1
  60. package/coverage/lcov-report/prettify.js +0 -2
  61. package/coverage/lcov-report/sort-arrow-sprite.png +0 -0
  62. package/coverage/lcov-report/sorter.js +0 -210
  63. package/coverage/lcov.info +0 -1937
  64. package/coverage/prettify.css +0 -1
  65. package/coverage/prettify.js +0 -2
  66. package/coverage/sort-arrow-sprite.png +0 -0
  67. package/coverage/sorter.js +0 -210
  68. package/docs/API_ENDPOINTS_FOR_DETECTION.md +0 -647
  69. package/docs/API_RETRY_MECHANISM.md +0 -338
  70. package/docs/ARELA_IDENTIFY_IMPLEMENTATION.md +0 -489
  71. package/docs/ARELA_IDENTIFY_QUICKREF.md +0 -186
  72. package/docs/ARELA_PROPAGATE_IMPLEMENTATION.md +0 -581
  73. package/docs/ARELA_PROPAGATE_QUICKREF.md +0 -272
  74. package/docs/ARELA_PUSH_IMPLEMENTATION.md +0 -577
  75. package/docs/ARELA_PUSH_QUICKREF.md +0 -322
  76. package/docs/ARELA_SCAN_IMPLEMENTATION.md +0 -373
  77. package/docs/ARELA_SCAN_QUICKREF.md +0 -139
  78. package/docs/CROSS_PLATFORM_PATH_HANDLING.md +0 -597
  79. package/docs/DETECTION_ATTEMPT_TRACKING.md +0 -414
  80. package/docs/MIGRATION_UPLOADER_TO_FILE_STATS.md +0 -1020
  81. package/docs/MULTI_LEVEL_DIRECTORY_SCANNING.md +0 -494
  82. package/docs/QUICK_REFERENCE_API_DETECTION.md +0 -264
  83. package/docs/REFACTORING_SUMMARY_DETECT_PEDIMENTOS.md +0 -200
  84. package/docs/STATS_COMMAND_SEQUENCE_DIAGRAM.md +0 -287
  85. package/docs/STATS_COMMAND_SIMPLE.md +0 -93
@@ -0,0 +1,258 @@
1
+ # Automatic 4-Step Processing Pipeline
2
+
3
+ ## Overview
4
+
5
+ The automatic processing pipeline lets Watch Mode run a 4-step workflow whenever a new file is detected in a monitored directory, so files move from detection to upload without manual intervention.
6
+
7
+ ## The 4-Step Pipeline
8
+
9
+ When auto-processing is enabled and a new file is detected, the pipeline automatically executes:
10
+
11
+ ### Step 1: Stats Collection (`stats --stats-only`)
12
+ - Collects file statistics and metadata
13
+ - Uploads file information to the database (uploads table)
14
+ - Records file properties: size, type, modification date, path
15
+
16
+ ### Step 2: PDF/Pedimento Detection (`detect --detect-pdfs`)
17
+ - Analyzes files to detect "Pedimento Simplificado" documents
18
+ - Identifies which files are simplified customs documents
19
+ - Updates database records with detection results
20
+ - **Important**: Files in the same directory will only be uploaded if a Pedimento Simplificado is detected
21
+
22
+ ### Step 3: Arela Path Propagation (`detect --propagate-arela-path`)
23
+ - Propagates `arela_path` from detected Pedimento records to related files
24
+ - Ensures all documents in the same directory are correctly linked
25
+ - Creates the hierarchical relationship between main and supporting documents
26
+
27
+ ### Step 4: RFC-Based Upload (`upload --upload-by-rfc --folder-structure`)
28
+ - Uploads files based on RFC values from `UPLOAD_RFCS` configuration
29
+ - Uses the configured folder structure for bucket organization
30
+ - Each watched directory can have its own folder structure
31
+
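The four steps above map onto four CLI invocations run in a fixed order. The sketch below only assembles those argument lists; `buildPipelineCommands` is a hypothetical helper for illustration, not a function from the package, and it assumes the folder structure is passed as the value of `--folder-structure`, as in the manual commands shown later in this document.

```javascript
// Hypothetical helper: list the four CLI invocations in pipeline order.
// The subcommands and flags are the ones named in the step headings.
function buildPipelineCommands(folderStructure) {
  const upload = ['upload', '--upload-by-rfc'];
  if (folderStructure) {
    // Assumption: the folder structure is the value of --folder-structure.
    upload.push('--folder-structure', folderStructure);
  }
  return [
    ['stats', '--stats-only'],            // Step 1: stats collection
    ['detect', '--detect-pdfs'],          // Step 2: Pedimento detection
    ['detect', '--propagate-arela-path'], // Step 3: arela_path propagation
    upload,                               // Step 4: RFC-based upload
  ];
}

const commands = buildPipelineCommands('estructura-2023');
for (const args of commands) {
  console.log(['arela', ...args].join(' '));
}
```

Keeping the order in one array makes the dependency explicit: Step 4 is only meaningful after Steps 2 and 3 have updated the database records.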
32
+ ## Configuration
33
+
34
+ ### 1. Environment Setup (`.env`)
35
+
36
+ Configure watched directories with associated folder structures:
37
+
38
+ ```env
39
+ # JSON format: {"directory_path": "folder_structure"}
40
+ WATCH_DIRECTORY_CONFIGS={"../../Documents/2022":"estructura-2022","../../Documents/2023":"estructura-2023"}
41
+ ```
42
+
43
+ Each directory can have a unique folder structure:
44
+ - `../../Documents/2022` → uploads to bucket structure `estructura-2022`
45
+ - `../../Documents/2023` → uploads to bucket structure `estructura-2023`
46
+
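Because `WATCH_DIRECTORY_CONFIGS` is a JSON object stored in a single environment variable, a malformed value is easy to introduce. A minimal sketch of how such a value could be parsed and validated follows; `parseWatchDirectoryConfigs` is a hypothetical name, and this is not the parser in `src/config/config.js`, whose behavior is not shown in this diff.

```javascript
// Hypothetical helper: parse the WATCH_DIRECTORY_CONFIGS value into a Map
// of watched directory -> folder structure. Not the package's own parser.
function parseWatchDirectoryConfigs(raw) {
  if (!raw || raw.trim() === '') {
    return new Map(); // nothing configured
  }
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch (err) {
    throw new Error(`WATCH_DIRECTORY_CONFIGS is not valid JSON: ${err.message}`);
  }
  if (parsed === null || typeof parsed !== 'object' || Array.isArray(parsed)) {
    throw new Error('WATCH_DIRECTORY_CONFIGS must be a JSON object');
  }
  const configs = new Map();
  for (const [directory, folderStructure] of Object.entries(parsed)) {
    if (typeof folderStructure !== 'string' || folderStructure === '') {
      throw new Error(`Folder structure for "${directory}" must be a non-empty string`);
    }
    configs.set(directory, folderStructure);
  }
  return configs;
}

const configs = parseWatchDirectoryConfigs(
  '{"../../Documents/2022":"estructura-2022","../../Documents/2023":"estructura-2023"}'
);
console.log(configs.get('../../Documents/2023')); // estructura-2023
```

Failing fast with a descriptive error is a design choice of this sketch; the real parser may instead fall back to defaults or log a warning.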
47
+ ### 2. CLI Options
48
+
49
+ Enable auto-processing when running watch mode:
50
+
51
+ ```bash
52
+ # Basic usage with auto-processing enabled
53
+ arela watch --auto-processing
54
+
55
+ # With custom directories
56
+ arela watch -d "../../Documents/2022,../../Documents/2023" --auto-processing
57
+
58
+ # With other options
59
+ arela watch --auto-processing -b 10 --debounce 1000 -s batch
60
+ ```
61
+
62
+ ### 3. Processing Options
63
+
64
+ When auto-processing is enabled, you can configure:
65
+
66
+ | Option | Default | Description |
67
+ |--------|---------|-------------|
68
+ | `--batch-size` / `-b` | 10 | Files to process per batch in stats collection |
69
+ | `--debounce` | 1000ms | Wait time before processing after file event |
70
+ | `--auto-processing` | - | Enable the 4-step pipeline |
71
+
72
+ ## Workflow Example
73
+
74
+ ### Scenario: New file detected in monitored directory
75
+
76
+ ```
77
+ 📄 New file: /Documents/2023/AKS151005E46/invoice.pdf
78
+
79
+ ⚡ File Event Detected (add)
80
+
81
+ ┌─────────────────────────────────────────┐
82
+ │ Step 1: Stats Collection │
83
+ │ → Collects file info │
84
+ │ → Updates uploads table │
85
+ └─────────────────────────────────────────┘
86
+
87
+ ┌─────────────────────────────────────────┐
88
+ │ Step 2: PDF Detection │
89
+ │ → Searches for Pedimento Simplificado │
90
+ │ → Updates detection status │
91
+ └─────────────────────────────────────────┘
92
+
93
+ ┌─────────────────────────────────────────┐
94
+ │ Step 3: Arela Path Propagation │
95
+ │ → Links related documents │
96
+ │ → Propagates metadata │
97
+ └─────────────────────────────────────────┘
98
+
99
+ ┌─────────────────────────────────────────┐
100
+ │ Step 4: RFC Upload │
101
+ │ → Identifies RFC from directory/file │
102
+ │ → Uploads with folder structure │
103
+ │ → Creates bucket organization │
104
+ └─────────────────────────────────────────┘
105
+
106
+ ✅ Processing Complete
107
+ ```
108
+
109
+ ## Important Constraints
110
+
111
+ ### Pedimento Detection Requirement
112
+ - If a Pedimento Simplificado is NOT detected in Step 2, the documents in that directory will not be uploaded
113
+ - Ensure your documents include the proper Pedimento Simplificado format
114
+ - Check database records for detection status:
115
+ ```bash
116
+ arela query --ready-files
117
+ ```
118
+
119
+ ### RFC Configuration
120
+ - Ensure `UPLOAD_RFCS` environment variable contains required RFCs
121
+ - Files from unregistered RFCs will not be processed
122
+ - Example configuration:
123
+ ```env
124
+ UPLOAD_RFCS=AKS151005E46|IMS030409FZ0|RDG1107154L7
125
+ ```
126
+
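The example value above separates RFCs with `|`. The sketch below shows one way such a list could be split and matched against a file path. Both helpers are hypothetical, and the matching rule (the RFC appears as one path segment, as in `/Documents/2023/AKS151005E46/invoice.pdf`) is an assumption; the package may identify the RFC differently.

```javascript
// Hypothetical helper: split a pipe-separated UPLOAD_RFCS value.
function parseUploadRfcs(raw) {
  return (raw || '')
    .split('|')
    .map((rfc) => rfc.trim().toUpperCase())
    .filter((rfc) => rfc.length > 0);
}

// Hypothetical helper. Assumption: the RFC is one segment of the path.
// Returns the matching registered RFC, or null if none is found.
function findRegisteredRfc(filePath, registeredRfcs) {
  const segments = filePath.split(/[\\/]+/).map((s) => s.toUpperCase());
  return registeredRfcs.find((rfc) => segments.includes(rfc)) ?? null;
}

const rfcs = parseUploadRfcs('AKS151005E46|IMS030409FZ0|RDG1107154L7');
console.log(findRegisteredRfc('/Documents/2023/AKS151005E46/invoice.pdf', rfcs));
// AKS151005E46
console.log(findRegisteredRfc('/Documents/2023/other/invoice.pdf', rfcs));
// null
```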
127
+ ## Monitoring and Debugging
128
+
129
+ ### Check Auto-Processing Status
130
+
131
+ The watch command logs auto-processing events:
132
+
133
+ ```
134
+ [AutoPipeline] Triggering 4-step processing pipeline for: /path/to/file.pdf
135
+ [AutoPipeline] Step 1/4: Stats collection...
136
+ [AutoPipeline] ✅ Stats collection completed
137
+ [AutoPipeline] Step 2/4: PDF detection...
138
+ [AutoPipeline] ✅ PDF detection completed
139
+ ...
140
+ [AutoPipeline] ✅ Pipeline completed successfully (ID: pipeline-xxx-yyy)
141
+ ```
142
+
143
+ ### Query Ready Files
144
+
145
+ Check which files are prepared for upload:
146
+
147
+ ```bash
148
+ arela query --ready-files
149
+ ```
150
+
151
+ ### Manual Commands (if needed)
152
+
153
+ Run individual steps manually:
154
+
155
+ ```bash
156
+ # Stats only
157
+ arela stats --stats-only
158
+
159
+ # PDF Detection
160
+ arela detect --detect-pdfs
161
+
162
+ # Path Propagation
163
+ arela detect --propagate-arela-path
164
+
165
+ # RFC Upload
166
+ arela upload --upload-by-rfc --folder-structure estructura-2023
167
+ ```
168
+
169
+ ## Performance Considerations
170
+
171
+ - **Debounce Time**: Default 1000ms prevents redundant processing
172
+ - **Batch Size**: Affects concurrent processing in Step 1
173
+ - **Pipeline Concurrency**: Only one pipeline runs at a time (prevents conflicts)
174
+ - **Auto-Processing Overhead**: Each new file triggers all 4 steps (approximately 10-30 seconds depending on file count)
175
+
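The "only one pipeline runs at a time" constraint can be pictured as a simple gate. The class below is a hypothetical sketch, not the package's implementation, and it assumes that files arriving while a pipeline is running are queued rather than dropped, which this document does not state.

```javascript
// Hypothetical gate: allows one pipeline at a time and queues the rest.
// Assumption: files arriving during a run are queued, not dropped.
class PipelineGate {
  constructor() {
    this.running = false;
    this.pending = [];
  }

  // Returns true if the caller may start a pipeline for filePath now.
  request(filePath) {
    if (this.running) {
      this.pending.push(filePath);
      return false;
    }
    this.running = true;
    return true;
  }

  // Call when a pipeline ends; returns the next queued file, or null.
  finish() {
    const next = this.pending.shift();
    if (next === undefined) {
      this.running = false;
      return null;
    }
    return next; // gate stays closed while the next file is processed
  }
}

const gate = new PipelineGate();
console.log(gate.request('a.pdf')); // true
console.log(gate.request('b.pdf')); // false
console.log(gate.finish()); // b.pdf
console.log(gate.finish()); // null
```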
176
+ ## Disabling Auto-Processing
177
+
178
+ To use watch mode without auto-processing:
179
+
180
+ ```bash
181
+ # Watch mode without auto-processing
182
+ arela watch
183
+
184
+ # Files detected but not automatically processed
185
+ # Use manual commands instead:
186
+ arela stats --stats-only
187
+ arela detect --detect-pdfs
188
+ arela detect --propagate-arela-path
189
+ arela upload --upload-by-rfc
190
+ ```
191
+
192
+ ## Troubleshooting
193
+
194
+ ### Pipeline Not Triggering
195
+ 1. Check if `--auto-processing` flag is set
196
+ 2. Verify `WATCH_DIRECTORY_CONFIGS` is properly configured in `.env`
197
+ 3. Run with the `--verbose` flag and check the logs for configuration errors
198
+
199
+ ### Files Not Uploading (Step 4 Failing)
200
+ 1. Verify Pedimento was detected: `arela query --ready-files`
201
+ 2. Check `UPLOAD_RFCS` configuration
202
+ 3. Verify folder structure is valid
203
+
204
+ ### Performance Issues
205
+ - Increase debounce time: `--debounce 2000`
206
+ - Reduce batch size: `-b 5`
207
+ - Check system resources during processing
208
+
209
+ ## Files Modified
210
+
211
+ - **`.env`**: New `WATCH_DIRECTORY_CONFIGS` format
212
+ - **`src/config/config.js`**: Parser for JSON directory configuration
213
+ - **`src/services/AutoProcessingService.js`**: New service for 4-step pipeline
214
+ - **`src/services/WatchService.js`**: Integration of auto-processing
215
+ - **`src/commands/WatchCommand.js`**: Support for directory configurations
216
+ - **`src/index.js`**: New `--auto-processing` CLI option
217
+ - **`src/utils/WatchEventHandler.js`**: Pipeline invocation support
218
+
219
+ ## API Reference
220
+
221
+ ### WatchService Methods
222
+
223
+ ```javascript
224
+ // Enable automatic processing
225
+ watchService.enableAutoProcessing({ batchSize: 10 });
226
+
227
+ // Disable automatic processing
228
+ watchService.disableAutoProcessing();
229
+
230
+ // Check if enabled
231
+ const enabled = watchService.isAutoProcessingEnabled();
232
+
233
+ // Get stats including pipeline count
234
+ const stats = watchService.getStats();
235
+ // Example return value: { pipelinesTriggered: 5, ... }
236
+ ```
237
+
238
+ ### AutoProcessingService Methods
239
+
240
+ ```javascript
241
+ // Execute the 4-step pipeline
242
+ const result = await autoProcessingService.executeProcessingPipeline({
243
+ filePath: '/path/to/file.pdf',
244
+ watchDir: '/watched/directory',
245
+ folderStructure: 'estructura-2023',
246
+ batchSize: 10
247
+ });
248
+
249
+ // Returns:
250
+ // {
251
+ // pipelineId: 'pipeline-xxx-yyy',
252
+ // summary: {
253
+ // success: true,
254
+ // message: '✅ All 4 steps completed successfully!',
255
+ // details: { ... }
256
+ // }
257
+ // }
258
+ ```