@arela/uploader 1.0.12 → 1.0.13
- package/.env.nuevo +277 -0
- package/.env.palco +315 -0
- package/README.md +235 -568
- package/README.old.md +673 -0
- package/comandos.md +11 -0
- package/package.json +1 -1
- package/src/config/config.js +2 -2
package/README.old.md
ADDED
@@ -0,0 +1,673 @@

# arela-uploader

CLI tool to upload files and directories to Arela API or Supabase Storage with automatic file processing, detection, and organization.

## What's New in v0.4.0

- **Simplified Multi-Tenant API**: Only 3 targets: `default`, `agencia`, `cliente`
- **Cross-Tenant Mode**: Read from one API, write to another with `--source-api` and `--target-api`
- **Dynamic Client Config**: Switch clients by updating `.env` - no code changes needed!
- **Enhanced Watch Mode**: Full cross-tenant support in automatic processing pipeline
- **Optimized Connections**: HTTP Agent with connection pooling for high performance

## OPTIMIZED 4-PHASE WORKFLOW

**New in v0.2.0**: The tool now supports an optimized 4-phase workflow designed for maximum performance when processing large file collections:

### Phase 1: Filesystem Stats Collection
```bash
arela --stats-only
```
- **ULTRA FAST**: Only reads filesystem metadata (no file content)
- **Bulk database operations**: Processes 1000+ files per batch
- **Upsert optimization**: Handles duplicates efficiently
- **Minimal memory usage**: No file content loading

### Phase 2: PDF Detection
```bash
arela --detect-pdfs
```
- **Targeted processing**: Only processes PDF files from database
- **Pedimento-simplificado detection**: Extracts RFC, pedimento numbers, and metadata
- **Batched processing**: Handles large datasets efficiently
- **Progress tracking**: Real-time detection statistics

### Phase 3: Path Propagation
```bash
arela --propagate-arela-path
```
- **Smart path copying**: Propagates arela_path from pedimento documents to related files
- **Batch updates**: Processes files in groups for optimal database performance
- **Relationship mapping**: Links supporting documents to their pedimento

### Phase 4: RFC-based Upload
```bash
arela --upload-by-rfc
```
- **Targeted uploads**: Only uploads files for specified RFCs
- **Supporting documents**: Includes all related files, not just pedimentos
- **Structure preservation**: Maintains proper folder hierarchy

### Combined Workflow
```bash
# Run all 4 phases in sequence (recommended)
arela --run-all-phases

# Or run phases individually for more control
arela --stats-only            # Phase 1: Collect filesystem stats
arela --detect-pdfs           # Phase 2: Detect pedimento documents
arela --propagate-arela-path  # Phase 3: Propagate paths to related files
arela --upload-by-rfc         # Phase 4: Upload by RFC
```

### Performance Benefits

**Before optimization** (single phase with detection):
- Read every file for detection
- High memory usage
- Slow database operations
- Process unsupported files

**After optimization** (4-phase approach):
- **10x faster**: Phase 1 only reads filesystem metadata
- **Bulk operations**: Database inserts up to 1000 records per batch
- **Targeted processing**: Phase 2 only processes PDFs needing detection
- **Memory efficient**: No unnecessary file content loading
- **Optimized I/O**: Separates filesystem, database, and network operations

## Features

- Upload entire directories or individual files
- **Automatic file detection and organization** (API mode)
- **Smart year/pedimento auto-detection from file paths**
- **Custom folder structure support**
- Automatic file renaming to handle problematic characters
- Comprehensive logging (local and remote)
- Retry mechanism for failed uploads
- Skip duplicate files automatically
- Progress bars and detailed summaries
- **Preserve directory structure with auto-organization**
- **Batch processing with configurable concurrency**
- **Performance optimizations with caching**
- **Upload files by specific RFC values**
- **Propagate arela_path from pedimento documents to related files**
- **4-Phase optimized workflow for maximum performance**
- **Watch Mode** - Monitor directories for changes and upload automatically
  - Multiple watch strategies (batch, individual, full-structure)
  - **Multi-tenant and cross-tenant support** (NEW)
  - Debounce and polling support
  - Auto-processing pipeline
  - Dry-run mode for testing
  - Pattern-based file ignoring

## Multi-Tenant API Support

Connect to different API instances: **default**, **agencia**, or **cliente**.

```bash
# Upload to client API
arela upload --api cliente --upload-by-rfc

# Collect stats on agencia API
arela stats --api agencia

# Watch mode with specific API target
arela watch --api cliente
```

### Cross-Tenant Mode

Process files from one tenant and upload to another:

```bash
# Read data from agencia, upload files to client
arela watch --source-api agencia --target-api cliente

# Same for upload command
arela upload --source-api agencia --target-api cliente --upload-by-rfc
```

**How Cross-Tenant Works:**
| Phase | Description | API Used |
|-------|-------------|----------|
| Phase 1 | Stats Collection | `--source-api` |
| Phase 2 | PDF Detection | `--source-api` |
| Phase 3 | Path Propagation | `--source-api` |
| Phase 4 | File Upload | `--target-api` |

### Available API Targets

Only 3 API targets are available: `default`, `agencia`, `cliente`

Configure in your `.env` file:

```env
# Default API (--api default or no flag)
ARELA_API_URL=http://localhost:3010
ARELA_API_TOKEN=your_token

# Agencia API (--api agencia)
ARELA_API_AGENCIA_URL=http://localhost:4012
ARELA_API_AGENCIA_TOKEN=your_agencia_token

# Cliente API (--api cliente)
# Configure the URL/Token for the specific client you need
ARELA_API_CLIENTE_URL=http://localhost:4014
ARELA_API_CLIENTE_TOKEN=your_cliente_token

# Examples for different clients:
# Cliente AUM9207011CA: ARELA_API_CLIENTE_URL=http://localhost:4014
# Cliente KTJ931117P55: ARELA_API_CLIENTE_URL=http://localhost:4013
```

> **Tip**: To switch between clients, just update `ARELA_API_CLIENTE_URL` and `ARELA_API_CLIENTE_TOKEN` in your `.env` file. No code changes needed!
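
The target names and environment variables above suggest a simple lookup. The following is a minimal sketch of how a target could be mapped to its URL and token, and how the cross-tenant table could be applied per phase. It is not the package's actual implementation: the function names `resolveApiConfig` and `apiTargetForPhase` are hypothetical, and the rule that `--source-api` and `--target-api` take precedence over `--api` is an assumption.

```javascript
// Hypothetical helpers. Only the three target names and the environment
// variable names come from the README.
const API_TARGETS = ["default", "agencia", "cliente"];

// Map a target name to the URL/token pair read from an env-like object.
function resolveApiConfig(target, env) {
  if (!API_TARGETS.includes(target)) {
    throw new Error(`Unknown API target: ${target}`);
  }
  const prefix =
    target === "default" ? "ARELA_API" : `ARELA_API_${target.toUpperCase()}`;
  return { target, url: env[`${prefix}_URL`], token: env[`${prefix}_TOKEN`] };
}

// Phases 1-3 use the source API, phase 4 uses the target API.
// Assumption: both fall back to --api, which falls back to "default".
function apiTargetForPhase(phase, { api = "default", sourceApi, targetApi } = {}) {
  if (phase < 1 || phase > 4) {
    throw new Error(`Unknown phase: ${phase}`);
  }
  return phase <= 3 ? (sourceApi ?? api) : (targetApi ?? api);
}

// Sample values copied from the .env example (in real use: process.env).
const env = {
  ARELA_API_URL: "http://localhost:3010",
  ARELA_API_TOKEN: "your_token",
  ARELA_API_AGENCIA_URL: "http://localhost:4012",
  ARELA_API_AGENCIA_TOKEN: "your_agencia_token",
  ARELA_API_CLIENTE_URL: "http://localhost:4014",
  ARELA_API_CLIENTE_TOKEN: "your_cliente_token",
};

const flags = { sourceApi: "agencia", targetApi: "cliente" };
const detectionApi = resolveApiConfig(apiTargetForPhase(2, flags), env);
const uploadApi = resolveApiConfig(apiTargetForPhase(4, flags), env);
console.log(detectionApi.url, uploadApi.url);
```

Keeping the lookup keyed on a fixed list of targets matches the README's point that switching clients only requires editing `.env`.
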

## Installation

```bash
npm install -g @arela/uploader
```

## Usage

### Optimized 4-Phase Workflow (Recommended)

```bash
# Run all phases automatically (most efficient)
arela upload --run-all-phases --batch-size 20

# Or run phases individually for fine-grained control
arela stats                          # Phase 1: Filesystem stats only
arela detect                         # Phase 2: PDF detection
arela detect --propagate-arela-path  # Phase 3: Path propagation
arela upload --upload-by-rfc         # Phase 4: RFC-based upload
```

### Available Commands

#### 1. **upload** - Upload files to Arela
```bash
# Basic upload with auto-processing (API Mode)
arela upload --batch-size 10

# Upload with auto-detection of year/pedimento from file paths
arela upload --auto-detect-structure --batch-size 10

# Upload with custom folder structure
arela upload --folder-structure "2024/4023260" --batch-size 10

# Upload to Supabase directly (skip API)
arela upload --force-supabase --prefix "my-folder"

# Upload files by specific RFC values
arela upload --upload-by-rfc --batch-size 5

# Upload RFC files with custom folder prefix
arela upload --upload-by-rfc --folder-structure "palco" --batch-size 5

# Upload RFC files with nested folder structure
arela upload --upload-by-rfc --folder-structure "2024/Q1/processed" --batch-size 15

# Upload with performance statistics
arela upload --batch-size 10 --show-stats

# Upload with client path tracking
arela upload --client-path "/client/documents" --batch-size 10
```

#### 2. **stats** - Collect file statistics without uploading
```bash
# Collect filesystem statistics only (Phase 1)
arela stats --batch-size 10

# Stats with custom folder organization
arela stats --folder-structure "2023/3019796" --batch-size 10

# Stats with client path tracking
arela stats --client-path "/client/documents" --batch-size 10
```

#### 3. **detect** - Run document detection and path propagation
```bash
# Run PDF detection on existing database records (Phase 2)
arela detect --batch-size 10

# Propagate arela_path from pedimento records to related files (Phase 3)
arela detect --propagate-arela-path
```

#### 4. **watch** - Monitor directories and upload automatically (NEW)
```bash
# Watch directories for changes with automatic upload
arela watch --directories "/path/to/watch1,/path/to/watch2"

# Watch with specific API target (single tenant)
arela watch --api cliente

# Watch with cross-tenant mode (read from agencia, upload to client)
arela watch --source-api agencia --target-api cliente

# Watch with custom upload strategy (default: batch)
arela watch --directories "/path/to/watch" --strategy individual
arela watch --directories "/path/to/watch" --strategy full-structure

# Watch with custom debounce delay (default: 1000ms)
arela watch --directories "/path/to/watch" --debounce 2000

# Watch with automatic 4-step pipeline
arela watch --directories "/path/to/watch" --auto-processing --batch-size 10

# Watch with polling instead of native file system events
arela watch --directories "/path/to/watch" --poll 5000

# Watch with pattern ignoring
arela watch --directories "/path/to/watch" --ignore "node_modules,*.log,*.tmp"

# Watch in dry-run mode (simulate without uploading)
arela watch --directories "/path/to/watch" --dry-run

# Watch with verbose logging
arela watch --directories "/path/to/watch" --verbose
```

**Watch Strategies:**
- `batch` **(default)**: Groups files and uploads periodically
- `individual`: Uploads each file immediately as it changes
- `full-structure`: Preserves directory structure during upload

**Multi-Tenant Options:**
- `--api <target>`: Use a single API for all operations
- `--source-api <target>`: API for reading/processing (phases 1-3)
- `--target-api <target>`: API for uploading (phase 4)

#### 5. **query** - Query database for file status
```bash
# Show files ready for upload
arela query --ready-files
```

#### 6. **config** - Show current configuration
```bash
# Display all configuration settings
arela config
```

### Legacy Syntax (Still Supported)

The old flag-based syntax is still supported for backward compatibility:

```bash
# These are equivalent to the commands above
arela --stats-only            # Same as: arela stats
arela --detect-pdfs           # Same as: arela detect
arela --propagate-arela-path  # Same as: arela detect --propagate-arela-path
arela --upload-by-rfc         # Same as: arela upload --upload-by-rfc
```

#### Phase Control
- `--stats-only`: **Phase 1** - Only collect filesystem stats (no file reading)
- `--detect-pdfs`: **Phase 2** - Process PDF files for pedimento-simplificado detection
- `--propagate-arela-path`: **Phase 3** - Propagate arela_path from pedimento records to related files
- `--upload-by-rfc`: **Phase 4** - Upload files based on RFC values from UPLOAD_RFCS
- `--run-all-phases`: **All Phases** - Run complete optimized workflow

#### Global Options (all commands)
- `-v, --verbose`: Enable verbose logging
- `--clear-log`: Clear the log file before starting
- `-h, --help`: Display help information
- `--version`: Display version number

#### Upload Command Options
- `-b, --batch-size <size>`: API batch size (default: 10)
- `--folder-structure <structure>`: Custom folder structure (e.g., "2024/4023260")
- `--client-path <path>`: Client path for metadata tracking
- `--auto-detect-structure`: Automatically detect year/pedimento from file paths
- `--auto-detect`: Enable automatic document type detection
- `--auto-organize`: Enable automatic file organization
- `--force-supabase`: Force direct Supabase upload (skip API)
- `--skip-processed`: Skip files already processed
- `--show-stats`: Show performance statistics
- `--upload-by-rfc`: Upload files based on RFC values from UPLOAD_RFCS
- `--run-all-phases`: Run all processing phases sequentially

#### Stats Command Options
- `-b, --batch-size <size>`: Batch size for processing (default: 10)
- `--client-path <path>`: Client path for metadata tracking
- `--show-stats`: Show performance statistics

#### Detect Command Options
- `-b, --batch-size <size>`: Batch size for PDF detection (default: 10)
- `--propagate-arela-path`: Propagate arela_path from pedimento records to related files

#### Watch Command Options
- `-d, --directories <paths>`: **Comma-separated directories to watch** (required)
- `-s, --strategy <strategy>`: Upload strategy (default: batch)
  - `batch`: Groups files and uploads periodically
  - `individual`: Uploads each file immediately
  - `full-structure`: Preserves directory structure
- `--api <target>`: Use a single API target for all operations
- `--source-api <target>`: API for reading/processing (phases 1-3)
- `--target-api <target>`: API for uploading (phase 4)
- `--debounce <ms>`: Debounce delay in milliseconds (default: 1000)
- `-b, --batch-size <size>`: Batch size for uploads (default: 10)
- `--poll <ms>`: Use polling instead of native file system events (interval in ms)
- `--ignore <patterns>`: Comma-separated patterns to ignore
- `--auto-detect`: Enable automatic document type detection
- `--auto-organize`: Enable automatic file organization
- `--auto-processing`: Enable automatic 4-step pipeline (stats, detect, propagate, upload)
- `--dry-run`: Simulate changes without uploading
- `--verbose`: Enable verbose logging

## Environment Variables

Create a `.env` file in your project root:

```env
# Default API (--api default or no flag)
ARELA_API_URL=http://localhost:3010
ARELA_API_TOKEN=your_api_token

# Agencia API (--api agencia)
ARELA_API_AGENCIA_URL=http://localhost:4012
ARELA_API_AGENCIA_TOKEN=your_agencia_token

# Cliente API (--api cliente)
# Configure for the specific client you need
ARELA_API_CLIENTE_URL=http://localhost:4014
ARELA_API_CLIENTE_TOKEN=your_cliente_token

# For Direct Supabase Mode (fallback)
SUPABASE_URL=your_supabase_url
SUPABASE_KEY=your_supabase_anon_key
SUPABASE_BUCKET=your_bucket_name

# Required for both modes
UPLOAD_BASE_PATH=/path/to/your/files
UPLOAD_SOURCES=folder1|folder2|file.pdf

# RFC-based Upload Configuration
# Pipe-separated list of RFCs to upload files for
UPLOAD_RFCS=MMJ0810145N1|ABC1234567XY|DEF9876543ZZ

# Watch Mode Configuration (JSON format)
WATCH_DIRECTORY_CONFIGS={"../../Documents/2022":"palco","../../Documents/2023":"palco"}
```

**Environment Variable Details:**

- `ARELA_API_URL`: Base URL for default API service
- `ARELA_API_AGENCIA_URL`: URL for agencia API
- `ARELA_API_CLIENTE_URL`: URL for client API (configure per client)
- `ARELA_API_TOKEN`: Authentication token for default API
- `ARELA_API_AGENCIA_TOKEN`: Token for agencia API
- `ARELA_API_CLIENTE_TOKEN`: Token for client API
- `SUPABASE_URL`: Your Supabase project URL
- `SUPABASE_KEY`: Supabase anonymous key for direct uploads
- `SUPABASE_BUCKET`: Target bucket name in Supabase Storage
- `UPLOAD_BASE_PATH`: Root directory containing files to upload
- `UPLOAD_SOURCES`: Pipe-separated list of folders/files to process
- `UPLOAD_RFCS`: Pipe-separated list of RFC values for targeted uploads
- `WATCH_DIRECTORY_CONFIGS`: JSON mapping directories to folder structures
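
To make the two value formats concrete, here is a small sketch of how pipe-separated lists and the JSON watch mapping could be parsed. The helper names `parsePipeList` and `parseWatchDirectoryConfigs` are hypothetical and the trimming and validation rules are assumptions; the package may parse these variables differently.

```javascript
// Hypothetical helpers; only the variable names and sample values come from
// the README.

// Split a pipe-separated value into trimmed, non-empty entries.
function parsePipeList(value) {
  return (value ?? "")
    .split("|")
    .map((entry) => entry.trim())
    .filter(Boolean);
}

// Parse the JSON object that maps watched directories to folder structures.
function parseWatchDirectoryConfigs(value) {
  if (!value) return {};
  const parsed = JSON.parse(value);
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    throw new Error("WATCH_DIRECTORY_CONFIGS must be a JSON object");
  }
  return parsed;
}

// Sample values copied from the .env example (in real use: process.env).
const sampleEnv = {
  UPLOAD_SOURCES: "folder1|folder2|file.pdf",
  UPLOAD_RFCS: "MMJ0810145N1|ABC1234567XY|DEF9876543ZZ",
  WATCH_DIRECTORY_CONFIGS:
    '{"../../Documents/2022":"palco","../../Documents/2023":"palco"}',
};

const sources = parsePipeList(sampleEnv.UPLOAD_SOURCES);
const rfcs = parsePipeList(sampleEnv.UPLOAD_RFCS);
const watchConfigs = parseWatchDirectoryConfigs(sampleEnv.WATCH_DIRECTORY_CONFIGS);
console.log(sources, rfcs, watchConfigs);
```
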

## RFC-Based File Upload

The `--upload-by-rfc` feature allows you to upload files to the Arela API based on specific RFC values. This is useful when you want to upload only files associated with certain companies or entities.

### How it works:

1. **Configure RFCs**: Set the `UPLOAD_RFCS` environment variable with pipe-separated RFC values
2. **Query Database**: The tool searches the Supabase database for files matching the specified RFCs
3. **Include Supporting Documents**: Finds all files sharing the same `arela_path` as the RFC matches (not just the pedimento files)
4. **Apply Folder Structure**: Optionally applies custom folder prefix using `--folder-structure`
5. **Group and Upload**: Files are grouped by their final destination path and uploaded with proper structure

### Folder Structure Options:

**Default Behavior** (no `--folder-structure`):
- Uses original `arela_path`: `CAD890407NK7/2023/3429/070/230734293000421/`

**With Custom Prefix** (`--folder-structure "palco"`):
- Results in: `palco/CAD890407NK7/2023/3429/070/230734293000421/`

**With Nested Prefix** (`--folder-structure "2024/client1/pedimentos"`):
- Results in: `2024/client1/pedimentos/CAD890407NK7/2023/3429/070/230734293000421/`
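
The prefix rule and the grouping step described above can be sketched as follows. This is an illustration only: `buildDestinationPath` and `groupByDestination` are hypothetical names, and the record shape (objects with `arela_path` and `original_path`) is assumed from the field names the README mentions.

```javascript
// Hypothetical helpers illustrating the folder-structure rule.

// Join the optional prefix and the arela_path, normalizing slashes so the
// result always ends with exactly one "/".
function buildDestinationPath(arelaPath, folderStructure) {
  const segments = [folderStructure, arelaPath]
    .filter(Boolean)
    .flatMap((part) => part.split("/"))
    .filter(Boolean);
  return segments.join("/") + "/";
}

// Group file records by their final destination path.
function groupByDestination(files, folderStructure) {
  const groups = new Map();
  for (const file of files) {
    const destination = buildDestinationPath(file.arela_path, folderStructure);
    if (!groups.has(destination)) groups.set(destination, []);
    groups.get(destination).push(file);
  }
  return groups;
}

const arelaPath = "CAD890407NK7/2023/3429/070/230734293000421/";
const files = [
  { original_path: "/data/a/pedimento.pdf", arela_path: arelaPath },
  { original_path: "/data/a/factura.pdf", arela_path: arelaPath },
];

const groups = groupByDestination(files, "palco");
console.log([...groups.keys()]);
```
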

### Prerequisites:

- Files must have been previously processed (have entries in the `uploader` table)
- Files must have `rfc` field populated (from document detection)
- Files must have `arela_path` populated (from pedimento processing)
- Original files must still exist at their `original_path` locations

### Example:

```bash
# Set RFCs in environment
export UPLOAD_RFCS="MMJ0810145N1|ABC1234567XY|DEF9876543ZZ"

# Upload files for these RFCs (original folder structure)
arela --upload-by-rfc --batch-size 5 --show-stats

# Upload with custom folder prefix
arela --upload-by-rfc --folder-structure "palco" --batch-size 10

# Upload with nested organization
arela --upload-by-rfc --folder-structure "2024/Q1/processed" --batch-size 15
```

The tool will:
- Find all database records matching the specified RFCs
- Include ALL supporting documents that share the same `arela_path`
- Apply the optional folder structure prefix if specified
- Group files by their final destination folder structure
- Upload each group maintaining the correct Arela folder hierarchy
- Provide detailed progress and summary statistics
- Handle large datasets with automatic pagination (no 1000-file limit)

## File Processing Modes

### API Mode (Default)
When `ARELA_API_URL` and `ARELA_API_TOKEN` are configured:
- Automatic file detection and classification
- Intelligent file organization
- **Smart year/pedimento auto-detection from paths**
- **Custom folder structure support**
- Batch processing with progress tracking
- Advanced error handling and retry logic
- **Performance optimizations with file sanitization caching**

### Auto-Detection Features
The tool can automatically detect year and pedimento numbers from file paths using multiple patterns:

**Pattern 1: Direct Structure**
```
/path/to/2024/4023260/file.pdf
/path/to/pedimentos/2024/4023260/file.pdf
```

**Pattern 2: Named Patterns**
```
/path/to/docs/año2024/ped4023260/file.pdf
/path/to/files/year2024/pedimento4023260/file.pdf
```

**Pattern 3: Loose Detection**
- Year: Any 4-digit number starting with "202" (2020-2029)
- Pedimento: Any 4-8 consecutive digits in path

Use `--auto-detect-structure` to enable automatic detection:
```bash
arela --auto-detect-structure --batch-size 10
```
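
The three patterns can be read as a fallback chain from strictest to loosest. The sketch below is one possible interpretation, not the tool's actual detection code: the function name `detectYearAndPedimento`, the order in which patterns are tried, and the choice to treat a "4-8 digit" pedimento as a whole digit run are all assumptions.

```javascript
// Hypothetical sketch of the three documented path patterns.
const YEAR = /^202\d$/;
const PEDIMENTO = /^\d{4,8}$/;
const NAMED_YEAR = /^(?:a\u00f1o|year)(202\d)$/i; // \u00f1 is "ñ"
const NAMED_PEDIMENTO = /^(?:pedimento|ped)(\d{4,8})$/i;

function detectYearAndPedimento(filePath) {
  const parts = filePath.split(/[\\/]+/).filter(Boolean);

  // Pattern 1: a year directory directly followed by a pedimento directory.
  for (let i = 0; i < parts.length - 1; i++) {
    if (YEAR.test(parts[i]) && PEDIMENTO.test(parts[i + 1])) {
      return { year: parts[i], pedimento: parts[i + 1], pattern: "direct" };
    }
  }

  // Pattern 2: named segments such as "year2024/pedimento4023260".
  for (let i = 0; i < parts.length - 1; i++) {
    const yearMatch = NAMED_YEAR.exec(parts[i]);
    const pedimentoMatch = NAMED_PEDIMENTO.exec(parts[i + 1]);
    if (yearMatch && pedimentoMatch) {
      return { year: yearMatch[1], pedimento: pedimentoMatch[1], pattern: "named" };
    }
  }

  // Pattern 3: loose detection over every digit run anywhere in the path.
  const runs = filePath.match(/\d+/g) ?? [];
  const year = runs.find((run) => YEAR.test(run));
  const pedimento = runs.find((run) => run !== year && PEDIMENTO.test(run));
  return year && pedimento ? { year, pedimento, pattern: "loose" } : null;
}

const direct = detectYearAndPedimento("/path/to/2024/4023260/file.pdf");
const named = detectYearAndPedimento("/path/to/files/year2024/pedimento4023260/file.pdf");
const loose = detectYearAndPedimento("/archive/exp_2023-ref3019796/file.pdf");
console.log(direct, named, loose);
```

Trying the strict patterns first keeps the loose rule from misreading an unrelated digit run when a clear `year/pedimento` directory pair is present.
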

### Custom Folder Structure
Specify a custom organization pattern:
```bash
# Static structure
arela --folder-structure "2024/4023260" --batch-size 10

# Client-based structure
arela --folder-structure "cliente1/pedimentos" --batch-size 10
```

### Directory Structure Preservation
Use `--preserve-structure` to maintain your original folder structure even with auto-organization:

```bash
# Without --preserve-structure
# Files organized by API: bucket/filename.pdf

# With --preserve-structure
# Files keep structure: bucket/2024/4023260/filename.pdf
arela --preserve-structure --batch-size 10
```

### Supabase Direct Mode (Fallback)
When API is unavailable or `--force-supabase` is used:
- Direct upload to Supabase Storage
- File sanitization and renaming
- Basic progress tracking
- **Optimized sanitization with pre-compiled regex patterns**
- **Performance caching for file name sanitization**

## Performance Features

### Database Pagination
- **No Upload Limits**: Handles datasets larger than 1000 files through automatic pagination
- **Efficient Querying**: Uses Supabase `.range()` method to fetch data in batches
- **Memory Optimization**: Processes large datasets without memory overflow

### File Processing
- **Pre-compiled Regex**: Sanitization patterns are compiled once for optimal performance
- **Caching System**: File name sanitization results are cached to avoid re-processing
- **Batch Processing**: Configurable batch sizes for optimal upload throughput

### RFC Upload Optimizations
- **Smart Querying**: Three-step query process to efficiently find related files
- **Supporting Document Inclusion**: Automatically includes all related documents, not just pedimentos
- **Path Concatenation**: Efficiently combines custom folder structures with arela_paths

## File Sanitization

The tool automatically handles problematic characters using advanced sanitization:

**Character Replacements:**
- **Accents**: á→a, é→e, í→i, ó→o, ú→u, ñ→n, ç→c
- **Korean characters**: 멕→meok, 시→si, 코→ko, 용→yong, others→kr
- **Special symbols**: &→and, {}[]~^|"<>?*: → -
- **Email symbols**: @→(removed), spaces→-
- **Multiple dashes**: collapsed to single dash
- **Leading/trailing**: dashes and dots removed

**Performance Features:**
- Pre-compiled regex patterns for faster processing
- Sanitization result caching to avoid re-processing
- Unicode normalization (NFD) for consistent handling

### Examples

| Original | Renamed |
|----------|---------|
| `Facturas Importación.pdf` | `Facturas-Importacion.pdf` |
| `File{with}brackets.pdf` | `File-with-brackets.pdf` |
| `Document ^& symbols.pdf` | `Document-and-symbols.pdf` |
| `CI & PL-20221212(멕시코용).xls` | `CI-and-PL-20221212.xls` |
| `impresora@nereprint.com_file.xml` | `impresoranereprint.com_file.xml` |
| `07-3429-3000430 HC.pdf` | `07-3429-3000430-HC.pdf` |
| `FACTURA IN 3000430.pdf` | `FACTURA-IN-3000430.pdf` |
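
As a rough illustration of the replacement rules, pre-compiled patterns, and result cache described in this section, the sketch below reproduces the non-Korean rows of the table. It is an assumption-laden approximation rather than the package's sanitizer: `sanitizeFileName` is a hypothetical name, Korean handling and parentheses are deliberately left out, and the exact order of the steps is guessed.

```javascript
// Hypothetical sanitizer. Patterns are compiled once at module load and
// results are memoized in a Map, mirroring the documented optimizations.
const COMBINING_MARKS = /[\u0300-\u036f]/g;
const AMPERSAND = /&/g;
const AT_SIGN = /@/g;
const SYMBOLS_AND_SPACES = /[{}\[\]~^|"<>?*:\s]/g;
const REPEATED_DASHES = /-{2,}/g;
const EDGE_DASHES_AND_DOTS = /^[-.]+|[-.]+$/g;

const sanitizationCache = new Map();

function sanitizeFileName(name) {
  const cached = sanitizationCache.get(name);
  if (cached !== undefined) return cached;

  const clean = name
    .normalize("NFD")                 // split accented letters into base + mark
    .replace(COMBINING_MARKS, "")     // drop the accent marks
    .replace(AMPERSAND, "and")
    .replace(AT_SIGN, "")
    .replace(SYMBOLS_AND_SPACES, "-")
    .replace(REPEATED_DASHES, "-")
    .replace(EDGE_DASHES_AND_DOTS, "");

  sanitizationCache.set(name, clean);
  return clean;
}

const samples = [
  "Facturas Importaci\u00f3n.pdf",
  "File{with}brackets.pdf",
  "Document ^& symbols.pdf",
  "impresora@nereprint.com_file.xml",
  "07-3429-3000430 HC.pdf",
];
for (const sample of samples) {
  console.log(`${sample} -> ${sanitizeFileName(sample)}`);
}
```
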

## Logging and Monitoring

The tool maintains comprehensive logs both locally and remotely:

**Local Logging (`arela-upload.log`):**
- Upload status (SUCCESS/ERROR/SKIPPED/SANITIZED)
- File paths and sanitization changes
- Error messages and timestamps
- Rename operations with before/after names
- Processing statistics and performance metrics

**Log Entry Examples:**
```
[2025-09-04T01:17:00.141Z] SUCCESS: /Users/.../file.xml -> 2023/2003180/file.xml
[2025-09-04T01:17:00.822Z] SANITIZED: file name.pdf → file-name.pdf
[2025-09-04T01:17:00.856Z] SKIPPED: /Users/.../duplicate.pdf (already exists)
```

**Remote Logging:**
- Integration with Supabase database for centralized logging
- Upload tracking and audit trails
- Error reporting and monitoring

## Performance Features

**Version 0.2.0 introduced several performance optimizations:**

- **Pre-compiled Regex Patterns**: Sanitization patterns are compiled once and reused
- **Sanitization Caching**: File name sanitization results are cached to avoid reprocessing
- **Batch Processing**: Configurable batch sizes for optimal API usage
- **Concurrent Processing**: Adjustable concurrency levels for file processing
- **Smart Skip Logic**: Efficiently skips already processed files using log analysis
- **Memory Optimization**: Large file outputs are truncated to prevent memory issues

## Version History

**v0.4.0** - Current Release
- **Simplified Multi-Tenant API**: Only 3 targets: `default`, `agencia`, `cliente`
- **Cross-Tenant Mode**: Read from one API, upload to another
- **Dynamic Client Config**: Change client by updating `.env` (no code changes)
- New `--api` flag for single API target
- New `--source-api` flag for source API (phases 1-3)
- New `--target-api` flag for target API (phase 4)
- `WATCH_DIRECTORY_CONFIGS` environment variable for watch mode
- Enhanced pipeline routing for cross-tenant operations
- Simplified documentation for multi-tenant configuration

**v0.3.0** - Watch Mode Release
- Added watch command with chokidar integration
- Automatic 4-step pipeline (stats → detect → propagate → upload)
- Multiple upload strategies (batch, individual, full-structure)
- Configurable debounce and polling options
- Signal handling for graceful shutdown

**v0.2.0** - Pipeline Automation
- Added smart year/pedimento auto-detection from file paths
- Custom folder structure support with `--folder-structure` option
- Client path tracking with `--client-path` option
- Performance optimizations with regex pre-compilation
- Sanitization result caching for improved speed
- Enhanced file sanitization with Korean character support
- Improved email character handling in file names
- Better error handling and logging
- Comprehensive logging with SANITIZED status
- Memory optimization for large file processing

**v0.1.0** - Initial Release
- Basic upload functionality
- API and Supabase direct mode support
- RFC-based file upload

## Troubleshooting

**Connection Issues:**
- Verify `ARELA_API_URL` and `ARELA_API_TOKEN` are correct
- Check network connectivity to the API endpoint
- The tool will automatically fall back to Supabase direct mode if the API is unavailable

**Performance Issues:**
- Adjust `--batch-size` for optimal API performance (default: 10)
- Modify `--concurrency` to control parallel processing (default: 10)
- Use `--show-stats` to monitor sanitization cache performance

**File Issues:**
- Check file permissions in `UPLOAD_BASE_PATH`
- Verify `UPLOAD_SOURCES` paths exist and are accessible
- Review `arela-upload.log` for detailed error information

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

ISC License - see LICENSE file for details.

package/comandos.md
ADDED
@@ -0,0 +1,11 @@

node src/index.js stats --stats-only
node src/index.js detect --detect-pdfs
node src/index.js detect --propagate-arela-path
node src/index.js upload --upload-by-rfc



node src/index.js scan
node src/index.js identify
node src/index.js propagate
node src/index.js push