@arela/uploader 1.0.13 → 1.0.15

package/README.old.md DELETED
@@ -1,673 +0,0 @@
1
- # arela-uploader
2
-
3
- CLI tool to upload files and directories to Arela API or Supabase Storage with automatic file processing, detection, and organization.
4
-
5
- ## ✨ What's New in v0.4.0
6
-
7
- - 🏒 **Simplified Multi-Tenant API**: Only 3 targets: `default`, `agencia`, `cliente`
8
- - πŸ”€ **Cross-Tenant Mode**: Read from one API, write to another with `--source-api` and `--target-api`
9
- - βš™οΈ **Dynamic Client Config**: Switch clients by updating `.env` - no code changes needed!
10
- - πŸ‘οΈ **Enhanced Watch Mode**: Full cross-tenant support in automatic processing pipeline
11
- - ⚑ **Optimized Connections**: HTTP Agent with connection pooling for high performance
12
-
13
- ## 🚀 OPTIMIZED 4-PHASE WORKFLOW
14
-
15
- **New in v0.2.0**: The tool now supports an optimized 4-phase workflow designed for maximum performance when processing large file collections:
16
-
17
- ### Phase 1: Filesystem Stats Collection 📊
18
- ```bash
19
- arela --stats-only
20
- ```
21
- - ⚑ **ULTRA FAST**: Only reads filesystem metadata (no file content)
22
- - πŸ“ˆ **Bulk database operations**: Processes 1000+ files per batch
23
- - πŸ”„ **Upsert optimization**: Handles duplicates efficiently
24
- - πŸ’Ύ **Minimal memory usage**: No file content loading
25
-
26
- ### Phase 2: PDF Detection 🔍
27
- ```bash
28
- arela --detect-pdfs
29
- ```
30
- - 🎯 **Targeted processing**: Only processes PDF files from database
31
- - οΏ½ **Pedimento-simplificado detection**: Extracts RFC, pedimento numbers, and metadata
32
- - πŸ”„ **Batched processing**: Handles large datasets efficiently
33
- - πŸ“Š **Progress tracking**: Real-time detection statistics
34
-
35
- ### Phase 3: Path Propagation οΏ½πŸ“
36
- ```bash
37
- arela --propagate-arela-path
38
- ```
39
- - 🎯 **Smart path copying**: Propagates arela_path from pedimento documents to related files
40
- - πŸ“¦ **Batch updates**: Processes files in groups for optimal database performance
41
- - πŸ”— **Relationship mapping**: Links supporting documents to their pedimento
42
-
43
- ### Phase 4: RFC-based Upload πŸš€
44
- ```bash
45
- arela --upload-by-rfc
46
- ```
47
- - 🎯 **Targeted uploads**: Only uploads files for specified RFCs
48
- - πŸ“‹ **Supporting documents**: Includes all related files, not just pedimentos
49
- - πŸ—οΈ **Structure preservation**: Maintains proper folder hierarchy
50
-
51
- ### Combined Workflow 🎯
52
- ```bash
53
- # Run all 4 phases in sequence (recommended)
54
- arela --run-all-phases
55
-
56
- # Or run phases individually for more control
57
- arela --stats-only # Phase 1: Collect filesystem stats
58
- arela --detect-pdfs # Phase 2: Detect pedimento documents
59
- arela --propagate-arela-path # Phase 3: Propagate paths to related files
60
- arela --upload-by-rfc # Phase 4: Upload by RFC
61
- ```
62
-
63
- ### Performance Benefits
64
-
65
- **Before optimization** (single phase with detection):
66
- - 🐌 Read every file for detection
67
- - πŸ’Ύ High memory usage
68
- - πŸ”„ Slow database operations
69
- - ❌ Process unsupported files
70
-
71
- **After optimization** (4-phase approach):
72
- - ⚑ **10x faster**: Phase 1 only reads filesystem metadata
73
- - πŸ“Š **Bulk operations**: Database inserts up to 1000 records per batch
74
- - 🎯 **Targeted processing**: Phase 2 only processes PDFs needing detection
75
- - πŸ’Ύ **Memory efficient**: No unnecessary file content loading
76
- - πŸ”„ **Optimized I/O**: Separates filesystem, database, and network operations
77
-
78
- ## Features
79
-
80
- - 📁 Upload entire directories or individual files
81
- - 🤖 **Automatic file detection and organization** (API mode)
82
- - 🗂️ **Smart year/pedimento auto-detection from file paths**
83
- - 🏗️ **Custom folder structure support**
84
- - 🔄 Automatic file renaming to handle problematic characters
85
- - 📝 Comprehensive logging (local and remote)
86
- - ⚡ Retry mechanism for failed uploads
87
- - 🎯 Skip duplicate files automatically
88
- - 📊 Progress bars and detailed summaries
89
- - 📂 **Preserve directory structure with auto-organization**
90
- - 🚀 **Batch processing with configurable concurrency**
91
- - 🔧 **Performance optimizations with caching**
92
- - 📋 **Upload files by specific RFC values**
93
- - 🔍 **Propagate arela_path from pedimento documents to related files**
94
- - ⚡ **4-Phase optimized workflow for maximum performance**
95
- - 👁️ **Watch Mode** - Monitor directories for changes and upload automatically
96
- - Multiple watch strategies (batch, individual, full-structure)
97
- - **Multi-tenant and cross-tenant support** ⭐ NEW
98
- - Debounce and polling support
99
- - Auto-processing pipeline
100
- - Dry-run mode for testing
101
- - Pattern-based file ignoring
102
-
103
- ## 🏒 Multi-Tenant API Support
104
-
105
- Connect to different API instances: **default**, **agencia**, or **cliente**.
106
-
107
- ```bash
108
- # Upload to client API
109
- arela upload --api cliente --upload-by-rfc
110
-
111
- # Collect stats on agencia API
112
- arela stats --api agencia
113
-
114
- # Watch mode with specific API target
115
- arela watch --api cliente
116
- ```
117
-
118
- ### Cross-Tenant Mode
119
-
120
- Process files from one tenant and upload to another:
121
-
122
- ```bash
123
- # Read data from agencia, upload files to client
124
- arela watch --source-api agencia --target-api cliente
125
-
126
- # Same for upload command
127
- arela upload --source-api agencia --target-api cliente --upload-by-rfc
128
- ```
129
-
130
- **How Cross-Tenant Works:**
131
- | Phase | Description | API Used |
132
- |-------|-------------|----------|
133
- | Phase 1 | Stats Collection | `--source-api` |
134
- | Phase 2 | PDF Detection | `--source-api` |
135
- | Phase 3 | Path Propagation | `--source-api` |
136
- | Phase 4 | File Upload | `--target-api` |
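The routing in the table above can be sketched as a small lookup. This is a minimal illustration and not the package's own code: `apiForPhase`, the phase names, and the fallback to `--api` (then `default`) when a cross-tenant flag is absent are all assumptions made for the example.

```javascript
// Hypothetical phase names for the 4-phase workflow (illustrative only).
const PHASE_NAMES = ["stats", "detect", "propagate", "upload"];

// Pick the API target for a phase, given the values of the CLI flags
// --api, --source-api and --target-api (any of which may be undefined).
function apiForPhase(phase, { api, sourceApi, targetApi } = {}) {
  if (!PHASE_NAMES.includes(phase)) {
    throw new Error(`Unknown phase: ${phase}`);
  }
  // Assumed fallback: use --api when given, otherwise the "default" target.
  const fallback = api ?? "default";
  // Phases 1-3 read and process through the source API;
  // phase 4 uploads through the target API.
  return phase === "upload" ? (targetApi ?? fallback) : (sourceApi ?? fallback);
}
```

With `--source-api agencia --target-api cliente`, the first three phases resolve to `agencia` and the upload phase resolves to `cliente`, matching the table.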
137
-
138
- ### Available API Targets
139
-
140
- Only 3 API targets are available: `default`, `agencia`, `cliente`
141
-
142
- Configure in your `.env` file:
143
-
144
- ```env
145
- # Default API (--api default or no flag)
146
- ARELA_API_URL=http://localhost:3010
147
- ARELA_API_TOKEN=your_token
148
-
149
- # Agencia API (--api agencia)
150
- ARELA_API_AGENCIA_URL=http://localhost:4012
151
- ARELA_API_AGENCIA_TOKEN=your_agencia_token
152
-
153
- # Cliente API (--api cliente)
154
- # Configure the URL/Token for the specific client you need
155
- ARELA_API_CLIENTE_URL=http://localhost:4014
156
- ARELA_API_CLIENTE_TOKEN=your_cliente_token
157
-
158
- # Examples for different clients:
159
- # Cliente AUM9207011CA: ARELA_API_CLIENTE_URL=http://localhost:4014
160
- # Cliente KTJ931117P55: ARELA_API_CLIENTE_URL=http://localhost:4013
161
- ```
162
-
163
- > 💡 **Tip**: To switch between clients, just update `ARELA_API_CLIENTE_URL` and `ARELA_API_CLIENTE_TOKEN` in your `.env` file. No code changes needed!
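One way to picture how a target name maps onto the variables above is a lookup table keyed by target. The sketch below is only an illustration under stated assumptions: `resolveApiTarget` and `TARGET_ENV_KEYS` are hypothetical names, and the error behaviour for missing values is a guess rather than documented behaviour of the CLI. Only the environment variable names come from the README.

```javascript
// Environment variable names per API target, as listed in the .env example.
const TARGET_ENV_KEYS = {
  default: { url: "ARELA_API_URL", token: "ARELA_API_TOKEN" },
  agencia: { url: "ARELA_API_AGENCIA_URL", token: "ARELA_API_AGENCIA_TOKEN" },
  cliente: { url: "ARELA_API_CLIENTE_URL", token: "ARELA_API_CLIENTE_TOKEN" },
};

// Hypothetical helper: resolve URL and token for a target from an env object.
function resolveApiTarget(target = "default", env = process.env) {
  if (!Object.prototype.hasOwnProperty.call(TARGET_ENV_KEYS, target)) {
    throw new Error(`Unknown API target: ${target}`);
  }
  const keys = TARGET_ENV_KEYS[target];
  const url = env[keys.url];
  const token = env[keys.token];
  // Assumed behaviour: fail loudly instead of silently using another target.
  if (!url || !token) {
    throw new Error(`Missing ${keys.url} or ${keys.token}`);
  }
  return { target, url, token };
}
```

Because the `cliente` entry always reads the same two variables, pointing the tool at a different client only requires changing their values in `.env`.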
164
-
165
- ## Installation
166
-
167
- ```bash
168
- npm install -g @arela/uploader
169
- ```
170
-
171
- ## Usage
172
-
173
- ### 🚀 Optimized 4-Phase Workflow (Recommended)
174
-
175
- ```bash
176
- # Run all phases automatically (most efficient)
177
- arela upload --run-all-phases --batch-size 20
178
-
179
- # Or run phases individually for fine-grained control
180
- arela stats # Phase 1: Filesystem stats only
181
- arela detect # Phase 2: PDF detection
182
- arela detect --propagate-arela-path # Phase 3: Path propagation
183
- arela upload --upload-by-rfc # Phase 4: RFC-based upload
184
- ```
185
-
186
- ### Available Commands
187
-
188
- #### 1. **upload** - Upload files to Arela
189
- ```bash
190
- # Basic upload with auto-processing (API Mode)
191
- arela upload --batch-size 10
192
-
193
- # Upload with auto-detection of year/pedimento from file paths
194
- arela upload --auto-detect-structure --batch-size 10
195
-
196
- # Upload with custom folder structure
197
- arela upload --folder-structure "2024/4023260" --batch-size 10
198
-
199
- # Upload to Supabase directly (skip API)
200
- arela upload --force-supabase --prefix "my-folder"
201
-
202
- # Upload files by specific RFC values
203
- arela upload --upload-by-rfc --batch-size 5
204
-
205
- # Upload RFC files with custom folder prefix
206
- arela upload --upload-by-rfc --folder-structure "palco" --batch-size 5
207
-
208
- # Upload RFC files with nested folder structure
209
- arela upload --upload-by-rfc --folder-structure "2024/Q1/processed" --batch-size 15
210
-
211
- # Upload with performance statistics
212
- arela upload --batch-size 10 --show-stats
213
-
214
- # Upload with client path tracking
215
- arela upload --client-path "/client/documents" --batch-size 10
216
- ```
217
-
218
- #### 2. **stats** - Collect file statistics without uploading
219
- ```bash
220
- # Collect filesystem statistics only (Phase 1)
221
- arela stats --batch-size 10
222
-
223
- # Stats with custom folder organization
224
- arela stats --folder-structure "2023/3019796" --batch-size 10
225
-
226
- # Stats with client path tracking
227
- arela stats --client-path "/client/documents" --batch-size 10
228
- ```
229
-
230
- #### 3. **detect** - Run document detection and path propagation
231
- ```bash
232
- # Run PDF detection on existing database records (Phase 2)
233
- arela detect --batch-size 10
234
-
235
- # Propagate arela_path from pedimento records to related files (Phase 3)
236
- arela detect --propagate-arela-path
237
- ```
238
-
239
- #### 4. **watch** - Monitor directories and upload automatically ⭐ NEW
240
- ```bash
241
- # Watch directories for changes with automatic upload
242
- arela watch --directories "/path/to/watch1,/path/to/watch2"
243
-
244
- # Watch with specific API target (single tenant)
245
- arela watch --api cliente
246
-
247
- # Watch with cross-tenant mode (read from agencia, upload to client)
248
- arela watch --source-api agencia --target-api cliente
249
-
250
- # Watch with custom upload strategy (default: batch)
251
- arela watch --directories "/path/to/watch" --strategy individual
252
- arela watch --directories "/path/to/watch" --strategy full-structure
253
-
254
- # Watch with custom debounce delay (default: 1000ms)
255
- arela watch --directories "/path/to/watch" --debounce 2000
256
-
257
- # Watch with automatic 4-step pipeline
258
- arela watch --directories "/path/to/watch" --auto-processing --batch-size 10
259
-
260
- # Watch with polling instead of native file system events
261
- arela watch --directories "/path/to/watch" --poll 5000
262
-
263
- # Watch with pattern ignoring
264
- arela watch --directories "/path/to/watch" --ignore "node_modules,*.log,*.tmp"
265
-
266
- # Watch in dry-run mode (simulate without uploading)
267
- arela watch --directories "/path/to/watch" --dry-run
268
-
269
- # Watch with verbose logging
270
- arela watch --directories "/path/to/watch" --verbose
271
- ```
272
-
273
- **Watch Strategies:**
274
- - `batch` **(default)**: Groups files and uploads periodically
275
- - `individual`: Uploads each file immediately as it changes
276
- - `full-structure`: Preserves directory structure during upload
277
-
278
- **Multi-Tenant Options:**
279
- - `--api <target>`: Use a single API for all operations
280
- - `--source-api <target>`: API for reading/processing (phases 1-3)
281
- - `--target-api <target>`: API for uploading (phase 4)
282
-
283
- #### 5. **query** - Query database for file status
284
- ```bash
285
- # Show files ready for upload
286
- arela query --ready-files
287
- ```
288
-
289
- #### 6. **config** - Show current configuration
290
- ```bash
291
- # Display all configuration settings
292
- arela config
293
- ```
294
-
295
- ### Legacy Syntax (Still Supported)
296
-
297
- The old flag-based syntax is still supported for backward compatibility:
298
-
299
- ```bash
300
- # These are equivalent to the commands above
301
- arela --stats-only # Same as: arela stats
302
- arela --detect-pdfs # Same as: arela detect
303
- arela --propagate-arela-path # Same as: arela detect --propagate-arela-path
304
- arela --upload-by-rfc # Same as: arela upload --upload-by-rfc
305
- ```
306
-
307
- #### Phase Control
308
- - `--stats-only`: **Phase 1** - Only collect filesystem stats (no file reading)
309
- - `--detect-pdfs`: **Phase 2** - Process PDF files for pedimento-simplificado detection
310
- - `--propagate-arela-path`: **Phase 3** - Propagate arela_path from pedimento records to related files
311
- - `--upload-by-rfc`: **Phase 4** - Upload files based on RFC values from UPLOAD_RFCS
312
- - `--run-all-phases`: **All Phases** - Run complete optimized workflow
313
-
314
- #### Global Options (all commands)
315
- - `-v, --verbose`: Enable verbose logging
316
- - `--clear-log`: Clear the log file before starting
317
- - `-h, --help`: Display help information
318
- - `--version`: Display version number
319
-
320
- #### Upload Command Options
321
- - `-b, --batch-size <size>`: API batch size (default: 10)
322
- - `--folder-structure <structure>`: Custom folder structure (e.g., "2024/4023260")
323
- - `--client-path <path>`: Client path for metadata tracking
324
- - `--auto-detect-structure`: Automatically detect year/pedimento from file paths
325
- - `--auto-detect`: Enable automatic document type detection
326
- - `--auto-organize`: Enable automatic file organization
327
- - `--force-supabase`: Force direct Supabase upload (skip API)
328
- - `--skip-processed`: Skip files already processed
329
- - `--show-stats`: Show performance statistics
330
- - `--upload-by-rfc`: Upload files based on RFC values from UPLOAD_RFCS
331
- - `--run-all-phases`: Run all processing phases sequentially
332
-
333
- #### Stats Command Options
334
- - `-b, --batch-size <size>`: Batch size for processing (default: 10)
335
- - `--client-path <path>`: Client path for metadata tracking
336
- - `--show-stats`: Show performance statistics
337
-
338
- #### Detect Command Options
339
- - `-b, --batch-size <size>`: Batch size for PDF detection (default: 10)
340
- - `--propagate-arela-path`: Propagate arela_path from pedimento records to related files
341
-
342
- #### Watch Command Options
343
- - `-d, --directories <paths>`: **Comma-separated directories to watch** (required)
344
- - `-s, --strategy <strategy>`: Upload strategy (default: batch)
345
- - `batch`: Groups files and uploads periodically
346
- - `individual`: Uploads each file immediately
347
- - `full-structure`: Preserves directory structure
348
- - `--api <target>`: Use a single API target for all operations
349
- - `--source-api <target>`: API for reading/processing (phases 1-3)
350
- - `--target-api <target>`: API for uploading (phase 4)
351
- - `--debounce <ms>`: Debounce delay in milliseconds (default: 1000)
352
- - `-b, --batch-size <size>`: Batch size for uploads (default: 10)
353
- - `--poll <ms>`: Use polling instead of native file system events (interval in ms)
354
- - `--ignore <patterns>`: Comma-separated patterns to ignore
355
- - `--auto-detect`: Enable automatic document type detection
356
- - `--auto-organize`: Enable automatic file organization
357
- - `--auto-processing`: Enable automatic 4-step pipeline (stats, detect, propagate, upload)
358
- - `--dry-run`: Simulate changes without uploading
359
- - `--verbose`: Enable verbose logging
360
-
361
- ## Environment Variables
362
-
363
- Create a `.env` file in your project root:
364
-
365
- ```env
366
- # Default API (--api default or no flag)
367
- ARELA_API_URL=http://localhost:3010
368
- ARELA_API_TOKEN=your_api_token
369
-
370
- # Agencia API (--api agencia)
371
- ARELA_API_AGENCIA_URL=http://localhost:4012
372
- ARELA_API_AGENCIA_TOKEN=your_agencia_token
373
-
374
- # Cliente API (--api cliente)
375
- # Configure for the specific client you need
376
- ARELA_API_CLIENTE_URL=http://localhost:4014
377
- ARELA_API_CLIENTE_TOKEN=your_cliente_token
378
-
379
- # For Direct Supabase Mode (fallback)
380
- SUPABASE_URL=your_supabase_url
381
- SUPABASE_KEY=your_supabase_anon_key
382
- SUPABASE_BUCKET=your_bucket_name
383
-
384
- # Required for both modes
385
- UPLOAD_BASE_PATH=/path/to/your/files
386
- UPLOAD_SOURCES=folder1|folder2|file.pdf
387
-
388
- # RFC-based Upload Configuration
389
- # Pipe-separated list of RFCs to upload files for
390
- UPLOAD_RFCS=MMJ0810145N1|ABC1234567XY|DEF9876543ZZ
391
-
392
- # Watch Mode Configuration (JSON format)
393
- WATCH_DIRECTORY_CONFIGS={"../../Documents/2022":"palco","../../Documents/2023":"palco"}
394
- ```
395
-
396
- **Environment Variable Details:**
397
-
398
- - `ARELA_API_URL`: Base URL for default API service
399
- - `ARELA_API_AGENCIA_URL`: URL for agencia API
400
- - `ARELA_API_CLIENTE_URL`: URL for client API (configure per client)
401
- - `ARELA_API_TOKEN`: Authentication token for default API
402
- - `ARELA_API_AGENCIA_TOKEN`: Token for agencia API
403
- - `ARELA_API_CLIENTE_TOKEN`: Token for client API
404
- - `SUPABASE_URL`: Your Supabase project URL
405
- - `SUPABASE_KEY`: Supabase anonymous key for direct uploads
406
- - `SUPABASE_BUCKET`: Target bucket name in Supabase Storage
407
- - `UPLOAD_BASE_PATH`: Root directory containing files to upload
408
- - `UPLOAD_SOURCES`: Pipe-separated list of folders/files to process
409
- - `UPLOAD_RFCS`: Pipe-separated list of RFC values for targeted uploads
410
- - `WATCH_DIRECTORY_CONFIGS`: JSON mapping directories to folder structures
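The two value formats described above (pipe-separated lists and a JSON object) could be parsed roughly as follows. This is a sketch, not the package's parser: `parsePipeList` and `parseWatchConfigs` are hypothetical helpers, and trimming whitespace or dropping empty entries are assumptions.

```javascript
// Parse a pipe-separated value such as UPLOAD_SOURCES or UPLOAD_RFCS.
// Assumption: surrounding whitespace and empty entries are ignored.
function parsePipeList(value) {
  return (value ?? "")
    .split("|")
    .map((entry) => entry.trim())
    .filter(Boolean);
}

// Parse WATCH_DIRECTORY_CONFIGS, a JSON object mapping a watched
// directory to the folder structure used for its uploads.
function parseWatchConfigs(value) {
  if (!value) return {};
  const parsed = JSON.parse(value); // throws SyntaxError on malformed JSON
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    throw new Error("WATCH_DIRECTORY_CONFIGS must be a JSON object");
  }
  return parsed;
}
```

For the sample `.env`, `parsePipeList` would yield three RFC strings and `parseWatchConfigs` would map both `../../Documents/2022` and `../../Documents/2023` to `palco`.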
411
-
412
- ## RFC-Based File Upload
413
-
414
- The `--upload-by-rfc` feature allows you to upload files to the Arela API based on specific RFC values. This is useful when you want to upload only files associated with certain companies or entities.
415
-
416
- ### How it works:
417
-
418
- 1. **Configure RFCs**: Set the `UPLOAD_RFCS` environment variable with pipe-separated RFC values
419
- 2. **Query Database**: The tool searches the Supabase database for files matching the specified RFCs
420
- 3. **Include Supporting Documents**: Finds all files sharing the same `arela_path` as the RFC matches (not just the pedimento files)
421
- 4. **Apply Folder Structure**: Optionally applies custom folder prefix using `--folder-structure`
422
- 5. **Group and Upload**: Files are grouped by their final destination path and uploaded with proper structure
423
-
424
- ### Folder Structure Options:
425
-
426
- **Default Behavior** (no `--folder-structure`):
427
- - Uses original `arela_path`: `CAD890407NK7/2023/3429/070/230734293000421/`
428
-
429
- **With Custom Prefix** (`--folder-structure "palco"`):
430
- - Results in: `palco/CAD890407NK7/2023/3429/070/230734293000421/`
431
-
432
- **With Nested Prefix** (`--folder-structure "2024/client1/pedimentos"`):
433
- - Results in: `2024/client1/pedimentos/CAD890407NK7/2023/3429/070/230734293000421/`
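The three cases above amount to prepending the optional `--folder-structure` value to the stored `arela_path`. A minimal sketch, assuming a hypothetical helper named `buildDestination` and assuming that duplicate or stray slashes should be normalised:

```javascript
// Combine an optional folder-structure prefix with an arela_path.
// Hypothetical helper; the real tool may normalise paths differently.
function buildDestination(arelaPath, folderStructure) {
  const segments = (path) => (path ?? "").split("/").filter(Boolean);
  const parts = [...segments(folderStructure), ...segments(arelaPath)];
  // Keep the trailing slash used in the README examples.
  return parts.join("/") + "/";
}
```

Calling it with the `arela_path` from the examples and the prefix `palco` gives `palco/CAD890407NK7/2023/3429/070/230734293000421/`; without a prefix the `arela_path` is returned unchanged.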
434
-
435
- ### Prerequisites:
436
-
437
- - Files must have been previously processed (have entries in the `uploader` table)
438
- - Files must have `rfc` field populated (from document detection)
439
- - Files must have `arela_path` populated (from pedimento processing)
440
- - Original files must still exist at their `original_path` locations
441
-
442
- ### Example:
443
-
444
- ```bash
445
- # Set RFCs in environment
446
- export UPLOAD_RFCS="MMJ0810145N1|ABC1234567XY|DEF9876543ZZ"
447
-
448
- # Upload files for these RFCs (original folder structure)
449
- arela --upload-by-rfc --batch-size 5 --show-stats
450
-
451
- # Upload with custom folder prefix
452
- arela --upload-by-rfc --folder-structure "palco" --batch-size 10
453
-
454
- # Upload with nested organization
455
- arela --upload-by-rfc --folder-structure "2024/Q1/processed" --batch-size 15
456
- ```
457
-
458
- The tool will:
459
- - Find all database records matching the specified RFCs
460
- - Include ALL supporting documents that share the same `arela_path`
461
- - Apply the optional folder structure prefix if specified
462
- - Group files by their final destination folder structure
463
- - Upload each group maintaining the correct Arela folder hierarchy
464
- - Provide detailed progress and summary statistics
465
- - Handle large datasets with automatic pagination (no 1000-file limit)
466
-
467
- ## File Processing Modes
468
-
469
- ### API Mode (Default)
470
- When `ARELA_API_URL` and `ARELA_API_TOKEN` are configured:
471
- - ✅ Automatic file detection and classification
472
- - ✅ Intelligent file organization
473
- - ✅ **Smart year/pedimento auto-detection from paths**
474
- - ✅ **Custom folder structure support**
475
- - ✅ Batch processing with progress tracking
476
- - ✅ Advanced error handling and retry logic
477
- - ✅ **Performance optimizations with file sanitization caching**
478
-
479
- ### Auto-Detection Features
480
- The tool can automatically detect year and pedimento numbers from file paths using multiple patterns:
481
-
482
- **Pattern 1: Direct Structure**
483
- ```
484
- /path/to/2024/4023260/file.pdf
485
- /path/to/pedimentos/2024/4023260/file.pdf
486
- ```
487
-
488
- **Pattern 2: Named Patterns**
489
- ```
490
- /path/to/docs/año2024/ped4023260/file.pdf
491
- /path/to/files/year2024/pedimento4023260/file.pdf
492
- ```
493
-
494
- **Pattern 3: Loose Detection**
495
- - Year: Any 4-digit number starting with "202" (2020-2029)
496
- - Pedimento: Any 4-8 consecutive digits in path
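The three patterns can be read as a cascade that tries the strictest rule first. The sketch below is one possible interpretation, not the tool's actual detector: `detectYearAndPedimento` is a hypothetical name, the exact regular expressions are assumptions, and it only inspects directory names (the file name itself is ignored).

```javascript
// Try Pattern 1 (direct), then Pattern 2 (named), then Pattern 3 (loose).
// Returns { year, pedimento, pattern } or null when nothing matches.
function detectYearAndPedimento(filePath) {
  // Directory segments only; the last segment is assumed to be the file name.
  const dirs = filePath.split(/[\\/]+/).filter(Boolean).slice(0, -1);

  // Pattern 1: a 4-digit folder directly followed by a 4-8 digit folder.
  for (let i = 0; i + 1 < dirs.length; i++) {
    if (/^\d{4}$/.test(dirs[i]) && /^\d{4,8}$/.test(dirs[i + 1])) {
      return { year: dirs[i], pedimento: dirs[i + 1], pattern: "direct" };
    }
  }

  // Pattern 2: folders named like año2024 / year2024 and ped… / pedimento….
  const firstCapture = (regex) => {
    for (const dir of dirs) {
      const match = regex.exec(dir);
      if (match) return match[1];
    }
    return null;
  };
  const namedYear = firstCapture(/^(?:año|year)(\d{4})$/i);
  const namedPedimento = firstCapture(/^(?:pedimento|ped)(\d{4,8})$/i);
  if (namedYear && namedPedimento) {
    return { year: namedYear, pedimento: namedPedimento, pattern: "named" };
  }

  // Pattern 3: any standalone 202x number as the year, and the first other
  // run of 4-8 digits as the pedimento.
  const joined = dirs.join("/");
  const looseYear = /(?<!\d)202\d(?!\d)/.exec(joined)?.[0] ?? null;
  const numbers = [...joined.matchAll(/(?<!\d)\d{4,8}(?!\d)/g)].map((m) => m[0]);
  const loosePedimento = numbers.find((n) => n !== looseYear) ?? null;
  if (looseYear && loosePedimento) {
    return { year: looseYear, pedimento: loosePedimento, pattern: "loose" };
  }
  return null;
}
```

Ordering the checks from strict to loose keeps the permissive third pattern from overriding an unambiguous `year/pedimento` folder pair.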
497
-
498
- Use `--auto-detect-structure` to enable automatic detection:
499
- ```bash
500
- arela --auto-detect-structure --batch-size 10
501
- ```
502
-
503
- ### Custom Folder Structure
504
- Specify a custom organization pattern:
505
- ```bash
506
- # Static structure
507
- arela --folder-structure "2024/4023260" --batch-size 10
508
-
509
- # Client-based structure
510
- arela --folder-structure "cliente1/pedimentos" --batch-size 10
511
- ```
512
-
513
- ### Directory Structure Preservation
514
- Use `--preserve-structure` to maintain your original folder structure even with auto-organization:
515
-
516
- ```bash
517
- # Without --preserve-structure
518
- # Files organized by API: bucket/filename.pdf
519
-
520
- # With --preserve-structure
521
- # Files keep structure: bucket/2024/4023260/filename.pdf
522
- arela --preserve-structure --batch-size 10
523
- ```
524
-
525
- ### Supabase Direct Mode (Fallback)
526
- When API is unavailable or `--force-supabase` is used:
527
- - ✅ Direct upload to Supabase Storage
528
- - ✅ File sanitization and renaming
529
- - ✅ Basic progress tracking
530
- - ✅ **Optimized sanitization with pre-compiled regex patterns**
531
- - ✅ **Performance caching for file name sanitization**
532
-
533
- ## Performance Features
534
-
535
- ### Database Pagination
536
- - **No Upload Limits**: Handles datasets larger than 1000 files through automatic pagination
537
- - **Efficient Querying**: Uses Supabase `.range()` method to fetch data in batches
538
- - **Memory Optimization**: Processes large datasets without memory overflow
539
-
540
- ### File Processing
541
- - **Pre-compiled Regex**: Sanitization patterns are compiled once for optimal performance
542
- - **Caching System**: File name sanitization results are cached to avoid re-processing
543
- - **Batch Processing**: Configurable batch sizes for optimal upload throughput
544
-
545
- ### RFC Upload Optimizations
546
- - **Smart Querying**: Three-step query process to efficiently find related files
547
- - **Supporting Document Inclusion**: Automatically includes all related documents, not just pedimentos
548
- - **Path Concatenation**: Efficiently combines custom folder structures with arela_paths
549
-
550
- ## File Sanitization
551
-
552
- The tool automatically handles problematic characters using advanced sanitization:
553
-
554
- **Character Replacements:**
555
- - **Accents**: á→a, é→e, í→i, ó→o, ú→u, ñ→n, ç→c
556
- - **Korean characters**: 멕→meok, 시→si, 코→ko, 용→yong, others→kr
557
- - **Special symbols**: &→and, {}[]~^|"<>?*: →-
558
- - **Email symbols**: @→(removed), spaces→-
559
- - **Multiple dashes**: collapsed to single dash
560
- - **Leading/trailing**: dashes and dots removed
561
-
562
- **Performance Features:**
563
- - Pre-compiled regex patterns for faster processing
564
- - Sanitization result caching to avoid re-processing
565
- - Unicode normalization (NFD) for consistent handling
566
-
567
- ### Examples
568
-
569
- | Original | Renamed |
570
- |----------|---------|
571
- | `Facturas Importación.pdf` | `Facturas-Importacion.pdf` |
572
- | `File{with}brackets.pdf` | `File-with-brackets.pdf` |
573
- | `Document ^& symbols.pdf` | `Document-and-symbols.pdf` |
574
- | `CI & PL-20221212(멕시코용).xls` | `CI-and-PL-20221212.xls` |
575
- | `impresora@nereprint.com_file.xml` | `impresoranereprint.com_file.xml` |
576
- | `07-3429-3000430 HC.pdf` | `07-3429-3000430-HC.pdf` |
577
- | `FACTURA IN 3000430.pdf` | `FACTURA-IN-3000430.pdf` |
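Most rows of the table can be reproduced with a short chain of replacements. The sketch below is an approximation under stated assumptions, not the package's sanitizer: `sanitizeFileName` and its cache are hypothetical, the order of the rules is a guess, and the Korean character handling described above is deliberately left out.

```javascript
// Patterns are created once and reused (the "pre-compiled regex" idea).
const COMBINING_MARKS = /[\u0300-\u036f]/g;
const SPECIAL_SYMBOLS = /[{}\[\]~^|"<>?*:]/g;
const sanitizeCache = new Map(); // remembers results for repeated names

function sanitizeFileName(name) {
  const cached = sanitizeCache.get(name);
  if (cached !== undefined) return cached;

  const result = name
    .normalize("NFD")              // split accented letters into base + mark
    .replace(COMBINING_MARKS, "")  // drop the accent marks (á → a, ñ → n)
    .replace(/&/g, "and")          // & → and
    .replace(/@/g, "")             // @ is removed
    .replace(SPECIAL_SYMBOLS, "-") // listed special symbols → dash
    .replace(/\s+/g, "-")          // spaces → dash
    .replace(/-+/g, "-")           // collapse repeated dashes
    .replace(/^[-.]+|[-.]+$/g, ""); // trim leading/trailing dashes and dots

  sanitizeCache.set(name, result);
  return result;
}
```

Under these rules `Document ^& symbols.pdf` first becomes `Document--and-symbols.pdf` and is then collapsed to `Document-and-symbols.pdf`, as in the table.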
578
-
579
- ## Logging and Monitoring
580
-
581
- The tool maintains comprehensive logs both locally and remotely:
582
-
583
- **Local Logging (`arela-upload.log`):**
584
- - Upload status (SUCCESS/ERROR/SKIPPED/SANITIZED)
585
- - File paths and sanitization changes
586
- - Error messages and timestamps
587
- - Rename operations with before/after names
588
- - Processing statistics and performance metrics
589
-
590
- **Log Entry Examples:**
591
- ```
592
- [2025-09-04T01:17:00.141Z] SUCCESS: /Users/.../file.xml -> 2023/2003180/file.xml
593
- [2025-09-04T01:17:00.822Z] SANITIZED: file name.pdf → file-name.pdf
594
- [2025-09-04T01:17:00.856Z] SKIPPED: /Users/.../duplicate.pdf (already exists)
595
- ```
596
-
597
- **Remote Logging:**
598
- - Integration with Supabase database for centralized logging
599
- - Upload tracking and audit trails
600
- - Error reporting and monitoring
601
-
602
- ## Performance Features
603
-
604
- **Version 0.2.0 introduced several performance optimizations:**
605
-
606
- - **Pre-compiled Regex Patterns**: Sanitization patterns are compiled once and reused
607
- - **Sanitization Caching**: File name sanitization results are cached to avoid reprocessing
608
- - **Batch Processing**: Configurable batch sizes for optimal API usage
609
- - **Concurrent Processing**: Adjustable concurrency levels for file processing
610
- - **Smart Skip Logic**: Efficiently skips already processed files using log analysis
611
- - **Memory Optimization**: Large file outputs are truncated to prevent memory issues
612
-
613
- ## Version History
614
-
615
- **v0.4.0** - Current Release 🆕
616
- - ✨ **Simplified Multi-Tenant API**: Only 3 targets: `default`, `agencia`, `cliente`
617
- - ✨ **Cross-Tenant Mode**: Read from one API, upload to another
618
- - ✨ **Dynamic Client Config**: Change client by updating `.env` (no code changes)
619
- - ✨ New `--api` flag for single API target
620
- - ✨ New `--source-api` flag for source API (phases 1-3)
621
- - ✨ New `--target-api` flag for target API (phase 4)
622
- - ✨ `WATCH_DIRECTORY_CONFIGS` environment variable for watch mode
623
- - 🔧 Enhanced pipeline routing for cross-tenant operations
624
- - 📝 Simplified documentation for multi-tenant configuration
625
-
626
- **v0.3.0** - Watch Mode Release
627
- - ✨ Added watch command with chokidar integration
628
- - ✨ Automatic 4-step pipeline (stats → detect → propagate → upload)
629
- - ✨ Multiple upload strategies (batch, individual, full-structure)
630
- - ✨ Configurable debounce and polling options
631
- - 🔧 Signal handling for graceful shutdown
632
-
633
- **v0.2.0** - Pipeline Automation
634
- - ✨ Added smart year/pedimento auto-detection from file paths
635
- - ✨ Custom folder structure support with `--folder-structure` option
636
- - ✨ Client path tracking with `--client-path` option
637
- - ✨ Performance optimizations with regex pre-compilation
638
- - ✨ Sanitization result caching for improved speed
639
- - ✨ Enhanced file sanitization with Korean character support
640
- - ✨ Improved email character handling in file names
641
- - ✨ Better error handling and logging
642
- - 📝 Comprehensive logging with SANITIZED status
643
- - 🔧 Memory optimization for large file processing
644
-
645
- **v0.1.0** - Initial Release
646
- - 📦 Basic upload functionality
647
- - 🔌 API and Supabase direct mode support
648
- - 📂 RFC-based file upload
649
-
650
- ## Troubleshooting
651
-
652
- **Connection Issues:**
653
- - Verify `ARELA_API_URL` and `ARELA_API_TOKEN` are correct
654
- - Check network connectivity to the API endpoint
655
- - The tool will automatically fall back to Supabase direct mode if the API is unavailable
656
-
657
- **Performance Issues:**
658
- - Adjust `--batch-size` for optimal API performance (default: 10)
659
- - Modify `--concurrency` to control parallel processing (default: 10)
660
- - Use `--show-stats` to monitor sanitization cache performance
661
-
662
- **File Issues:**
663
- - Check file permissions in `UPLOAD_BASE_PATH`
664
- - Verify `UPLOAD_SOURCES` paths exist and are accessible
665
- - Review `arela-upload.log` for detailed error information
666
-
667
- ## Contributing
668
-
669
- Contributions are welcome! Please feel free to submit a Pull Request.
670
-
671
- ## License
672
-
673
- ISC License - see LICENSE file for details.
package/comandos.md DELETED
@@ -1,11 +0,0 @@
1
- node src/index.js stats --stats-only
2
- node src/index.js detect --detect-pdfs
3
- node src/index.js detect --propagate-arela-path
4
- node src/index.js upload --upload-by-rfc
5
-
6
-
7
-
8
- node src/index.js scan
9
- node src/index.js identify
10
- node src/index.js propagate
11
- node src/index.js push