@arela/uploader 1.0.24 → 1.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (80)
  1. package/docs/AUTO_PROCESSING_PIPELINE.md +258 -0
  2. package/docs/COMPLETE_USAGE_GUIDE.md +1363 -0
  3. package/docs/DATABASESERVICE_IMPROVEMENTS.md +546 -0
  4. package/docs/PASO_2_TEST_RESULTS.md +298 -0
  5. package/docs/PASO_3_PLAN.md +385 -0
  6. package/docs/PHASE_1_FILE_DETECTION.md +366 -0
  7. package/docs/PHASE_2_API_INTEGRATION.md +426 -0
  8. package/docs/PHASE_3_DATABASE_MANAGEMENT.md +480 -0
  9. package/docs/PHASE_4_FILE_OPERATIONS.md +448 -0
  10. package/docs/PHASE_5_WATCH_MODE.md +450 -0
  11. package/docs/PHASE_6_SIGNAL_HANDLING.md +472 -0
  12. package/docs/PHASE_7_ADVANCED_FEATURES.md +560 -0
  13. package/docs/PLAN_WATCH_FEATURE.md +417 -0
  14. package/docs/README.md +480 -0
  15. package/docs/SCHEMA_ALIGNMENT_SUMMARY.md +301 -0
  16. package/docs/SMARTWATCH_DATABASE_REFACTORING.md +181 -0
  17. package/docs/SMART_WATCH_DATABASE_CHANGES.md +502 -0
  18. package/docs/TESTING_WATCH_MODE.md +212 -0
  19. package/docs/WATCHER_API_IMPLEMENTATION.md +520 -0
  20. package/docs/WATCHER_API_INTEGRATION.md +562 -0
  21. package/docs/WATCHER_SETUP_GUIDE.md +614 -0
  22. package/docs/WATCH_ARCHITECTURE.md +395 -0
  23. package/docs/WATCH_AUTO_PIPELINE.md +334 -0
  24. package/docs/WATCH_CONFIGURATION.md +267 -0
  25. package/docs/WATCH_USAGE_GUIDE.md +567 -0
  26. package/docs/commands.md +14 -0
  27. package/package.json +1 -1
  28. package/src/commands/IdentifyCommand.js +11 -0
  29. package/src/config/config.js +2 -2
  30. package/src/file-detection.js +42 -1
  31. package/src/scoring/scoring-engine.js +40 -7
  32. package/src/services/LoggingService.js +5 -3
  33. package/.vscode/settings.json +0 -1
  34. package/coverage/IdentifyCommand.js.html +0 -1462
  35. package/coverage/PropagateCommand.js.html +0 -1507
  36. package/coverage/PushCommand.js.html +0 -1504
  37. package/coverage/ScanCommand.js.html +0 -1654
  38. package/coverage/UploadCommand.js.html +0 -1846
  39. package/coverage/WatchCommand.js.html +0 -4111
  40. package/coverage/base.css +0 -224
  41. package/coverage/block-navigation.js +0 -87
  42. package/coverage/favicon.png +0 -0
  43. package/coverage/index.html +0 -191
  44. package/coverage/lcov-report/IdentifyCommand.js.html +0 -1462
  45. package/coverage/lcov-report/PropagateCommand.js.html +0 -1507
  46. package/coverage/lcov-report/PushCommand.js.html +0 -1504
  47. package/coverage/lcov-report/ScanCommand.js.html +0 -1654
  48. package/coverage/lcov-report/UploadCommand.js.html +0 -1846
  49. package/coverage/lcov-report/WatchCommand.js.html +0 -4111
  50. package/coverage/lcov-report/base.css +0 -224
  51. package/coverage/lcov-report/block-navigation.js +0 -87
  52. package/coverage/lcov-report/favicon.png +0 -0
  53. package/coverage/lcov-report/index.html +0 -191
  54. package/coverage/lcov-report/prettify.css +0 -1
  55. package/coverage/lcov-report/prettify.js +0 -2
  56. package/coverage/lcov-report/sort-arrow-sprite.png +0 -0
  57. package/coverage/lcov-report/sorter.js +0 -210
  58. package/coverage/lcov.info +0 -1937
  59. package/coverage/prettify.css +0 -1
  60. package/coverage/prettify.js +0 -2
  61. package/coverage/sort-arrow-sprite.png +0 -0
  62. package/coverage/sorter.js +0 -210
  63. package/docs/API_ENDPOINTS_FOR_DETECTION.md +0 -647
  64. package/docs/API_RETRY_MECHANISM.md +0 -338
  65. package/docs/ARELA_IDENTIFY_IMPLEMENTATION.md +0 -489
  66. package/docs/ARELA_IDENTIFY_QUICKREF.md +0 -186
  67. package/docs/ARELA_PROPAGATE_IMPLEMENTATION.md +0 -581
  68. package/docs/ARELA_PROPAGATE_QUICKREF.md +0 -272
  69. package/docs/ARELA_PUSH_IMPLEMENTATION.md +0 -577
  70. package/docs/ARELA_PUSH_QUICKREF.md +0 -322
  71. package/docs/ARELA_SCAN_IMPLEMENTATION.md +0 -373
  72. package/docs/ARELA_SCAN_QUICKREF.md +0 -139
  73. package/docs/CROSS_PLATFORM_PATH_HANDLING.md +0 -597
  74. package/docs/DETECTION_ATTEMPT_TRACKING.md +0 -414
  75. package/docs/MIGRATION_UPLOADER_TO_FILE_STATS.md +0 -1020
  76. package/docs/MULTI_LEVEL_DIRECTORY_SCANNING.md +0 -494
  77. package/docs/QUICK_REFERENCE_API_DETECTION.md +0 -264
  78. package/docs/REFACTORING_SUMMARY_DETECT_PEDIMENTOS.md +0 -200
  79. package/docs/STATS_COMMAND_SEQUENCE_DIAGRAM.md +0 -287
  80. package/docs/STATS_COMMAND_SIMPLE.md +0 -93
@@ -1,338 +0,0 @@
1
- # API Retry Mechanism
2
-
3
- ## Overview
4
-
5
- The `arela scan` and `arela identify` commands now include robust retry logic with exponential backoff for all API requests. This ensures resilience against transient network issues, temporary server overload, and rate limiting.
6
-
7
- ## Features
8
-
9
- ### 1. **Automatic Retry on Transient Errors**
10
-
11
- Retries are automatically triggered for:
12
- - **Network errors**: Connection reset, timeout, refused, DNS failures
13
- - **HTTP 429**: Too Many Requests (rate limiting)
14
- - **HTTP 5xx**: Server errors (500, 502, 503, 504)
15
-
16
- ### 2. **Exponential Backoff (Default)**
17
-
18
- When enabled (default), the delay before each retry doubles after every failed attempt, up to a 16-second cap:
19
- - Attempt 1: ~1 second
20
- - Attempt 2: ~2 seconds
21
- - Attempt 3: ~4 seconds
22
- - Attempt 4: ~8 seconds
23
- - Attempt 5: ~16 seconds (max)
24
-
25
- ### 3. **Jitter to Prevent Thundering Herd**
26
-
27
- Each retry delay includes ±20% random jitter to prevent multiple clients from retrying simultaneously, which could overwhelm the server.
28
-
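To make the jitter range concrete, here is a minimal sketch that computes the smallest and largest delay each attempt can produce. It assumes what the backoff snippet in this document describes: a 1000 ms base that doubles per attempt, a 16000 ms cap, and ±20% jitter applied after the cap. `jitterRange` is a hypothetical helper name used only for illustration and is not part of the package.

```javascript
// Hypothetical helper (not part of @arela/uploader): returns the smallest and
// largest delay, in milliseconds, that ±20% jitter can produce for an attempt.
// Assumes a 1000 ms base delay that doubles per attempt and is capped at
// 16000 ms before jitter is applied.
function jitterRange(attempt) {
  const base = Math.min(1000 * 2 ** (attempt - 1), 16000);
  return {
    min: Math.round(base * 0.8),
    max: Math.round(base * 1.2),
  };
}

for (let attempt = 1; attempt <= 5; attempt++) {
  const { min, max } = jitterRange(attempt);
  console.log(`attempt ${attempt}: ${min}-${max} ms`);
}
```

Under these assumptions the jitter is added after the cap, so a delay that is nominally 16 seconds can land anywhere between about 12.8 and 19.2 seconds.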
29
- ### 4. **Smart Error Detection**
30
-
31
- The system distinguishes between:
32
- - **Retryable errors**: Network issues, server errors, rate limits
33
- - **Non-retryable errors**: Client errors (400, 401, 403, 404), validation errors
34
-
35
- Non-retryable errors fail immediately without wasting time on useless retries.
36
-
37
- ## Configuration
38
-
39
- ### Environment Variables
40
-
41
- Add to `.env` file:
42
-
43
- ```bash
44
- # Maximum number of retry attempts (default: 3)
45
- API_MAX_RETRIES=3
46
-
47
- # Use exponential backoff (default: true)
48
- # When true: 1s → 2s → 4s → 8s → 16s
49
- # When false: uses fixed delay
50
- API_RETRY_EXPONENTIAL_BACKOFF=true
51
-
52
- # Fixed retry delay in milliseconds (only used if exponential backoff is disabled)
53
- API_RETRY_DELAY=1000
54
- ```
55
-
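As an illustration of how these three variables and their documented defaults could be read, here is a small sketch. `loadRetryConfig` and the returned property names are hypothetical, and the rule that only the literal string `false` disables exponential backoff is an assumption; the package's actual config code may parse values differently.

```javascript
// Hypothetical loader (not the package's actual config code): reads the three
// retry variables from an env-like object and applies the documented defaults.
function loadRetryConfig(env = process.env) {
  const parsedRetries = Number.parseInt(env.API_MAX_RETRIES ?? "", 10);
  const parsedDelay = Number.parseInt(env.API_RETRY_DELAY ?? "", 10);
  return {
    // Documented default: 3 retries
    maxRetries: Number.isNaN(parsedRetries) ? 3 : parsedRetries,
    // Documented default: true. Assumption: only the string "false" disables it.
    exponentialBackoff: env.API_RETRY_EXPONENTIAL_BACKOFF !== "false",
    // Documented default: 1000 ms, used only when exponential backoff is off
    fixedDelayMs: Number.isNaN(parsedDelay) ? 1000 : parsedDelay,
  };
}

// No variables set: falls back to the defaults.
console.log(loadRetryConfig({}));

// The "Fast Failure" example from this document.
console.log(
  loadRetryConfig({
    API_MAX_RETRIES: "1",
    API_RETRY_EXPONENTIAL_BACKOFF: "false",
    API_RETRY_DELAY: "500",
  })
);
```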
56
- ### Configuration Examples
57
-
58
- #### High Reliability (Recommended)
59
-
60
- ```bash
61
- API_MAX_RETRIES=5
62
- API_RETRY_EXPONENTIAL_BACKOFF=true
63
- ```
64
-
65
- **Use case**: Production environments with unreliable network or high load
66
-
67
- #### Fast Failure (Testing/Development)
68
-
69
- ```bash
70
- API_MAX_RETRIES=1
71
- API_RETRY_EXPONENTIAL_BACKOFF=false
72
- API_RETRY_DELAY=500
73
- ```
74
-
75
- **Use case**: Local development where you want quick feedback
76
-
77
- #### Aggressive Retry (High-Volume Processing)
78
-
79
- ```bash
80
- API_MAX_RETRIES=7
81
- API_RETRY_EXPONENTIAL_BACKOFF=true
82
- ```
83
-
84
- **Use case**: Large batch operations where you can't afford to lose progress
85
-
86
- ## Retry Behavior
87
-
88
- ### Retryable Scenarios
89
-
90
- | Error Type | Example | Retry Behavior |
91
- |------------|---------|----------------|
92
- | Network timeout | `ETIMEDOUT` | ✅ Retry with backoff |
93
- | Connection reset | `ECONNRESET` | ✅ Retry with backoff |
94
- | Connection refused | `ECONNREFUSED` | ✅ Retry with backoff |
95
- | DNS failure | `ENOTFOUND`, `EAI_AGAIN` | ✅ Retry with backoff |
96
- | Rate limiting | HTTP 429 | ✅ Retry with backoff |
97
- | Server error | HTTP 5xx | ✅ Retry with backoff |
98
-
99
- ### Non-Retryable Scenarios
100
-
101
- | Error Type | Example | Retry Behavior |
102
- |------------|---------|----------------|
103
- | Unauthorized | HTTP 401 | ❌ Fail immediately |
104
- | Forbidden | HTTP 403 | ❌ Fail immediately |
105
- | Not found | HTTP 404 | ❌ Fail immediately |
106
- | Bad request | HTTP 400 | ❌ Fail immediately |
107
- | Conflict | HTTP 409 | ❌ Fail immediately |
108
-
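The two tables above can be expressed as a small classifier. This is a sketch only: the private `#isRetryableError()` method in `ScanApiService.js` is not shown in this diff and may differ. The `(error, response)` signature follows the retry loop snippet in this document; the set name and the lookup of `error.cause.code` are assumptions about how network errors are surfaced.

```javascript
// Hypothetical classifier mirroring the retryable / non-retryable tables.
// Not the package's actual #isRetryableError() implementation.
const RETRYABLE_NETWORK_CODES = new Set([
  "ETIMEDOUT",
  "ECONNRESET",
  "ECONNREFUSED",
  "ENOTFOUND",
  "EAI_AGAIN",
]);

function isRetryable(error, response) {
  if (response) {
    // HTTP 429 and any 5xx are retryable; 400, 401, 403, 404, 409 are not.
    return (
      response.status === 429 ||
      (response.status >= 500 && response.status <= 599)
    );
  }
  // Assumption: network failures carry a Node.js-style code either on the
  // error itself or on error.cause.
  const code = error?.code ?? error?.cause?.code;
  return RETRYABLE_NETWORK_CODES.has(code);
}

console.log(isRetryable(new Error("503"), { status: 503 })); // true
console.log(isRetryable(new Error("404"), { status: 404 })); // false
```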
109
- ## Logging
110
-
111
- ### Retry Warnings
112
-
113
- When a retry is triggered, you'll see warnings like:
114
-
115
- ```
116
- ⚠️ API request failed (attempt 1/4): Connection timeout. Retrying in 1234ms...
117
- ⚠️ API request failed (attempt 2/4): HTTP 503 Service Unavailable. Retrying in 2456ms...
118
- ```
119
-
120
- ### Retry Success
121
-
122
- When a retry succeeds, you'll see:
123
-
124
- ```
125
- ℹ️ API request succeeded on attempt 3/4
126
- ```
127
-
128
- ### Final Failure
129
-
130
- If all retries fail, you'll see:
131
-
132
- ```
133
- ❌ API request failed after 4 attempt(s): Connection timeout
134
- ```
135
-
136
- ## Performance Impact
137
-
138
- ### With Default Settings (3 retries, exponential backoff)
139
-
140
- **Best case** (no failures): No overhead
141
-
142
- **Worst case** (all retries fail):
143
- - Total retry time: ~1s + 2s + 4s = ~7 seconds
144
- - Total attempts: 4 (1 initial + 3 retries)
145
-
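The same arithmetic can be repeated for other `API_MAX_RETRIES` values. The sketch below sums the nominal delays, assuming the 1 s doubling schedule capped at 16 s described in this document and ignoring jitter and the time spent on the requests themselves. `worstCaseBackoffMs` is a hypothetical helper name.

```javascript
// Hypothetical helper: total nominal backoff (ms) when every attempt fails.
// Assumes delays of 1s, 2s, 4s, ... capped at 16s; jitter is ignored.
function worstCaseBackoffMs(maxRetries) {
  let total = 0;
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    total += Math.min(1000 * 2 ** (attempt - 1), 16000);
  }
  return total;
}

for (const retries of [3, 5, 7]) {
  console.log(`${retries} retries: ${worstCaseBackoffMs(retries) / 1000} s`);
}
```

Under these assumptions, 3 retries wait about 7 seconds in total, 5 retries about 31 seconds, and 7 retries about 63 seconds, which is worth keeping in mind when choosing between the configuration examples in this document.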
146
- ### Optimization Tips
147
-
148
- 1. **For stable networks**: Keep `API_MAX_RETRIES` low (2-3; the default is 3)
149
- 2. **For unstable networks**: Increase to 5-7
150
- 3. **For rate-limited APIs**: Keep exponential backoff enabled
151
- 4. **For fast development**: Disable retries or set to 1
152
-
153
- ## Integration with Commands
154
-
155
- ### arela scan
156
-
157
- All API operations during scan now have retry logic:
158
- - Instance registration (`POST /api/uploader/scan/register`)
159
- - Batch insert (`POST /api/uploader/scan/batch-insert`)
160
- - Scan completion (`PATCH /api/uploader/scan/complete`)
161
-
162
- ### arela identify
163
-
164
- All API operations during identify now have retry logic:
165
- - Fetch detection stats (`GET /api/uploader/scan/detection-stats`)
166
- - Fetch PDFs for detection (`GET /api/uploader/scan/pdfs-for-detection`)
167
- - Batch update detection (`PATCH /api/uploader/scan/batch-update-detection`)
168
-
169
- ## Comparison with DatabaseService
170
-
171
- | Feature | DatabaseService (Supabase) | ScanApiService (HTTP) |
172
- |---------|---------------------------|----------------------|
173
- | Retry Logic | ✅ Yes | ✅ Yes |
174
- | Max Retries | 3 (hardcoded) | 3 (configurable) |
175
- | Backoff Strategy | Exponential | Exponential or Fixed |
176
- | Jitter | No | ✅ Yes (±20%) |
177
- | Error Detection | Generic | HTTP-specific |
178
- | Configurable | No | ✅ Yes via .env |
179
-
180
- ## Best Practices
181
-
182
- ### 1. **Enable in Production**
183
-
184
- Always use retry logic in production:
185
-
186
- ```bash
187
- API_MAX_RETRIES=3
188
- API_RETRY_EXPONENTIAL_BACKOFF=true
189
- ```
190
-
191
- ### 2. **Monitor Retry Rates**
192
-
193
- Track retry warnings in logs to detect:
194
- - Network instability
195
- - Server overload
196
- - API rate limiting
197
-
198
- ### 3. **Adjust for Your Environment**
199
-
200
- - **Cloud/remote**: Higher retries (5-7)
201
- - **Local/LAN**: Lower retries (1-3)
202
- - **Rate-limited APIs**: Exponential backoff
203
-
204
- ### 4. **Use Jitter**
205
-
206
- Always keep jitter enabled (built-in) to prevent retry storms.
207
-
208
- ### 5. **Set Connection Timeout**
209
-
210
- Combine retries with appropriate timeout:
211
-
212
- ```bash
213
- API_CONNECTION_TIMEOUT=30000 # 30 seconds
214
- API_MAX_RETRIES=3
215
- ```
216
-
217
- This caps how long each attempt can wait, so the full retry sequence completes within a bounded time.
218
-
219
- ## Troubleshooting
220
-
221
- ### Too Many Retries
222
-
223
- **Symptom**: Commands take too long due to retries
224
-
225
- **Solution**: Reduce `API_MAX_RETRIES` or disable exponential backoff
226
-
227
- ```bash
228
- API_MAX_RETRIES=1
229
- ```
230
-
231
- ### Not Enough Retries
232
-
233
- **Symptom**: Commands fail due to transient errors
234
-
235
- **Solution**: Increase `API_MAX_RETRIES`
236
-
237
- ```bash
238
- API_MAX_RETRIES=5
239
- ```
240
-
241
- ### Rate Limiting Issues
242
-
243
- **Symptom**: Many HTTP 429 errors
244
-
245
- **Solution**: Ensure exponential backoff is enabled and increase retries
246
-
247
- ```bash
248
- API_MAX_RETRIES=5
249
- API_RETRY_EXPONENTIAL_BACKOFF=true
250
- ```
251
-
252
- ### Network Timeout Issues
253
-
254
- **Symptom**: `ETIMEDOUT` errors
255
-
256
- **Solution**: Increase connection timeout and retries
257
-
258
- ```bash
259
- API_CONNECTION_TIMEOUT=60000 # 60 seconds
260
- API_MAX_RETRIES=5
261
- ```
262
-
263
- ## Implementation Details
264
-
265
- ### Code Location
266
-
267
- - **Service**: `arela-uploader/src/services/ScanApiService.js`
268
- - **Methods**:
269
-   - `#isRetryableError()` - Determines if error should trigger retry
270
-   - `#calculateBackoff()` - Calculates delay between retries
271
-   - `#request()` - Main request method with retry loop
272
-
273
- ### Retry Loop Logic
274
-
275
- ```javascript
276
- for (let attempt = 1; attempt <= maxRetries + 1; attempt++) {
277
-   try {
278
-     // Make request
279
-     const response = await fetch(url, options);
280
-
281
-     // Check if response is ok
282
-     if (!response.ok) {
283
-       const error = new Error(`${response.status} ${response.statusText}`);
284
-
285
-       // Retry if error is retryable
286
-       if (isRetryable(error, response) && attempt <= maxRetries) {
287
-         await sleep(calculateBackoff(attempt));
288
-         continue;
289
-       }
290
-
291
-       throw error;
292
-     }
293
-
294
-     // Success
295
-     return await response.json();
296
-
297
-   } catch (error) {
298
-     // Handle network errors
299
-     if (isRetryable(error) && attempt <= maxRetries) {
300
-       await sleep(calculateBackoff(attempt));
301
-       continue;
302
-     }
303
-
304
-     throw error;
305
-   }
306
- }
307
- ```
308
-
309
- ### Backoff Calculation
310
-
311
- ```javascript
312
- function calculateBackoff(attempt) {
313
-   if (exponentialBackoff) {
314
-     // 1s, 2s, 4s, 8s, 16s (max)
315
-     const delay = Math.min(1000 * Math.pow(2, attempt - 1), 16000);
316
-
317
-     // Add jitter (±20%)
318
-     const jitter = delay * 0.2 * (Math.random() * 2 - 1);
319
-     return delay + jitter;
320
-   } else {
321
-     // Fixed delay with jitter
322
-     return fixedDelay + (fixedDelay * 0.2 * (Math.random() * 2 - 1));
323
-   }
324
- }
325
- ```
326
-
327
- ## Future Enhancements
328
-
329
- Potential improvements:
330
- 1. **Circuit breaker pattern**: Stop retrying after N consecutive failures
331
- 2. **Adaptive backoff**: Adjust delays based on error patterns
332
- 3. **Retry budget**: Limit total retry time per operation
333
- 4. **Metrics collection**: Track retry rates and success rates
334
- 5. **Per-endpoint configuration**: Different retry settings for different endpoints
335
-
336
- ## Conclusion
337
-
338
- The retry mechanism provides robust error handling for the `arela scan` and `arela identify` commands, ensuring operations can recover from transient failures without manual intervention. Proper configuration and monitoring ensure optimal performance and reliability.