@arela/uploader 1.0.11 → 1.0.13

package/README.md CHANGED
@@ -1,673 +1,340 @@
1
1
  # arela-uploader
2
2
 
3
- CLI tool to upload files and directories to Arela API or Supabase Storage with automatic file processing, detection, and organization.
3
+ CLI tool to scan, detect, and upload files to the Arela API with multi-tenant support, automatic document detection, and directory watching.
4
4
 
5
- ## ✨ What's New in v0.4.0
6
-
7
- - 🏢 **Simplified Multi-Tenant API**: Only 3 targets: `default`, `agencia`, `cliente`
8
- - 🔀 **Cross-Tenant Mode**: Read from one API, write to another with `--source-api` and `--target-api`
9
- - ⚙️ **Dynamic Client Config**: Switch clients by updating `.env` - no code changes needed!
10
- - 👁️ **Enhanced Watch Mode**: Full cross-tenant support in automatic processing pipeline
11
- - ⚡ **Optimized Connections**: HTTP Agent with connection pooling for high performance
12
-
13
- ## 🚀 OPTIMIZED 4-PHASE WORKFLOW
14
-
15
- **New in v0.2.0**: The tool now supports an optimized 4-phase workflow designed for maximum performance when processing large file collections:
16
-
17
- ### Phase 1: Filesystem Stats Collection 📊
18
- ```bash
19
- arela --stats-only
20
- ```
21
- - ⚡ **ULTRA FAST**: Only reads filesystem metadata (no file content)
22
- - 📈 **Bulk database operations**: Processes 1000+ files per batch
23
- - 🔄 **Upsert optimization**: Handles duplicates efficiently
24
- - 💾 **Minimal memory usage**: No file content loading
5
+ ## Installation
25
6
 
26
- ### Phase 2: PDF Detection 🔍
27
7
  ```bash
28
- arela --detect-pdfs
8
+ npm install -g @arela/uploader
29
9
  ```
30
- - 🎯 **Targeted processing**: Only processes PDF files from database
31
- - **Pedimento-simplificado detection**: Extracts RFC, pedimento numbers, and metadata
32
- - 🔄 **Batched processing**: Handles large datasets efficiently
33
- - 📊 **Progress tracking**: Real-time detection statistics
34
10
 
35
- ### Phase 3: Path Propagation 📁
36
- ```bash
37
- arela --propagate-arela-path
38
- ```
39
- - 🎯 **Smart path copying**: Propagates arela_path from pedimento documents to related files
40
- - 📦 **Batch updates**: Processes files in groups for optimal database performance
41
- - 🔗 **Relationship mapping**: Links supporting documents to their pedimento
11
+ ## Quick Start
42
12
 
43
- ### Phase 4: RFC-based Upload 🚀
44
- ```bash
45
- arela --upload-by-rfc
46
- ```
47
- - 🎯 **Targeted uploads**: Only uploads files for specified RFCs
48
- - 📋 **Supporting documents**: Includes all related files, not just pedimentos
49
- - 🏗️ **Structure preservation**: Maintains proper folder hierarchy
13
+ The recommended workflow runs 4 phases in sequence:
50
14
 
51
- ### Combined Workflow 🎯
52
15
  ```bash
53
- # Run all 4 phases in sequence (recommended)
54
- arela --run-all-phases
55
-
56
- # Or run phases individually for more control
57
- arela --stats-only # Phase 1: Collect filesystem stats
58
- arela --detect-pdfs # Phase 2: Detect pedimento documents
59
- arela --propagate-arela-path # Phase 3: Propagate paths to related files
60
- arela --upload-by-rfc # Phase 4: Upload by RFC
16
+ arela scan # 1. Discover files and register metadata
17
+ arela identify # 2. Detect document types from PDF content
18
+ arela propagate # 3. Propagate paths to related files
19
+ arela push # 4. Upload files to storage
61
20
  ```
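The four commands can be chained in a small wrapper that stops at the first failing phase. The sketch below is one way to automate the sequence and is not part of the package: `run_pipeline` and the `ARELA_CMD` override are hypothetical names, and it assumes each `arela` subcommand exits with a non-zero status on failure.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run the four phases in order, stopping at the first failure.
# ARELA_CMD is an illustrative override; it defaults to the real `arela` binary.
run_pipeline() {
  local cmd="${ARELA_CMD:-arela}"
  local phase
  for phase in scan identify propagate push; do
    echo ">> phase: ${phase}" >&2
    # Extra arguments (for example --api agencia) are forwarded to every phase.
    "$cmd" "$phase" "$@" || {
      echo "phase '${phase}' failed; aborting" >&2
      return 1
    }
  done
}
```

Calling `run_pipeline --api agencia` would run all four phases against the `agencia` target, since each of the four commands lists `--api <target>` among its flags.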
62
21
 
63
- ### Performance Benefits
64
-
65
- **Before optimization** (single phase with detection):
66
- - 🐌 Read every file for detection
67
- - 💾 High memory usage
68
- - 🔄 Slow database operations
69
- - ❌ Process unsupported files
70
-
71
- **After optimization** (4-phase approach):
72
- - ⚡ **10x faster**: Phase 1 only reads filesystem metadata
73
- - 📊 **Bulk operations**: Database inserts up to 1000 records per batch
74
- - 🎯 **Targeted processing**: Phase 2 only processes PDFs needing detection
75
- - 💾 **Memory efficient**: No unnecessary file content loading
76
- - 🔄 **Optimized I/O**: Separates filesystem, database, and network operations
77
-
78
- ## Features
79
-
80
- - 📁 Upload entire directories or individual files
81
- - 🤖 **Automatic file detection and organization** (API mode)
82
- - 🗂️ **Smart year/pedimento auto-detection from file paths**
83
- - 🏗️ **Custom folder structure support**
84
- - 🔄 Automatic file renaming to handle problematic characters
85
- - 📝 Comprehensive logging (local and remote)
86
- - ⚡ Retry mechanism for failed uploads
87
- - 🎯 Skip duplicate files automatically
88
- - 📊 Progress bars and detailed summaries
89
- - 📂 **Preserve directory structure with auto-organization**
90
- - 🚀 **Batch processing with configurable concurrency**
91
- - 🔧 **Performance optimizations with caching**
92
- - 📋 **Upload files by specific RFC values**
93
- - **Propagate arela_path from pedimento documents to related files**
94
- - ⚡ **4-Phase optimized workflow for maximum performance**
95
- - 👁️ **Watch Mode** - Monitor directories for changes and upload automatically
96
- - Multiple watch strategies (batch, individual, full-structure)
97
- - **Multi-tenant and cross-tenant support** ⭐ NEW
98
- - Debounce and polling support
99
- - Auto-processing pipeline
100
- - Dry-run mode for testing
101
- - Pattern-based file ignoring
102
-
103
- ## 🏢 Multi-Tenant API Support
104
-
105
- Connect to different API instances: **default**, **agencia**, or **cliente**.
22
+ ## Commands
106
23
 
107
- ```bash
108
- # Upload to client API
109
- arela upload --api cliente --upload-by-rfc
24
+ ### `arela scan`
110
25
 
111
- # Collect stats on agencia API
112
- arela stats --api agencia
26
+ Scan the filesystem and register file metadata via the API. Supports streaming discovery and multi-level directory partitioning.
113
27
 
114
- # Watch mode with specific API target
115
- arela watch --api cliente
28
+ ```bash
29
+ arela scan # Basic scan
30
+ arela scan --api agencia # Scan using agencia API
31
+ arela scan --count-first # Count files first for progress %
32
+ arela scan --no-stream # Synchronous discovery (no streaming)
116
33
  ```
117
34
 
118
- ### Cross-Tenant Mode
35
+ | Flag | Description | Default |
36
+ |------|-------------|---------|
37
+ | `--api <target>` | API target (`default`, `agencia`, `cliente`) | `default` |
38
+ | `--count-first` | Count files before scanning for progress tracking | - |
39
+ | `--no-stream` | Use synchronous file discovery | - |
119
40
 
120
- Process files from one tenant and upload to another:
41
+ ### `arela identify`
121
42
 
122
- ```bash
123
- # Read data from agencia, upload files to client
124
- arela watch --source-api agencia --target-api cliente
43
+ Detect document types (e.g. pedimento simplificado) from PDF content using pattern matchers. Runs against existing database records.
125
44
 
126
- # Same for upload command
127
- arela upload --source-api agencia --target-api cliente --upload-by-rfc
45
+ ```bash
46
+ arela identify # Identify documents
47
+ arela identify --batch-size 200 # Larger batches
48
+ arela identify --show-stats # Show performance stats
128
49
  ```
129
50
 
130
- **How Cross-Tenant Works:**
131
- | Phase | Description | API Used |
132
- |-------|-------------|----------|
133
- | Phase 1 | Stats Collection | `--source-api` |
134
- | Phase 2 | PDF Detection | `--source-api` |
135
- | Phase 3 | Path Propagation | `--source-api` |
136
- | Phase 4 | File Upload | `--target-api` |
137
-
138
- ### Available API Targets
51
+ | Flag | Description | Default |
52
+ |------|-------------|---------|
53
+ | `--api <target>` | API target | `default` |
54
+ | `-b, --batch-size <size>` | Files per batch | `100` |
55
+ | `--show-stats` | Show performance statistics | - |
139
56
 
140
- Only 3 API targets are available: `default`, `agencia`, `cliente`
57
+ ### `arela propagate`
141
58
 
142
- Configure in your `.env` file:
59
+ Propagate `arela_path` from identified pedimento records to related files in the same directory.
143
60
 
144
- ```env
145
- # Default API (--api default or no flag)
146
- ARELA_API_URL=http://localhost:3010
147
- ARELA_API_TOKEN=your_token
148
-
149
- # Agencia API (--api agencia)
150
- ARELA_API_AGENCIA_URL=http://localhost:4012
151
- ARELA_API_AGENCIA_TOKEN=your_agencia_token
152
-
153
- # Cliente API (--api cliente)
154
- # Configure the URL/Token for the specific client you need
155
- ARELA_API_CLIENTE_URL=http://localhost:4014
156
- ARELA_API_CLIENTE_TOKEN=your_cliente_token
157
-
158
- # Examples for different clients:
159
- # Cliente AUM9207011CA: ARELA_API_CLIENTE_URL=http://localhost:4014
160
- # Cliente KTJ931117P55: ARELA_API_CLIENTE_URL=http://localhost:4013
61
+ ```bash
62
+ arela propagate # Propagate paths
63
+ arela propagate --batch-size 100 # Process 100 pedimentos per batch
64
+ arela propagate --show-stats # Show statistics
161
65
  ```
162
66
 
163
- > 💡 **Tip**: To switch between clients, just update `ARELA_API_CLIENTE_URL` and `ARELA_API_CLIENTE_TOKEN` in your `.env` file. No code changes needed!
67
+ | Flag | Description | Default |
68
+ |------|-------------|---------|
69
+ | `--api <target>` | API target | `default` |
70
+ | `-b, --batch-size <size>` | Pedimentos per batch | `50` |
71
+ | `--show-stats` | Show performance statistics | - |
164
72
 
165
- ## Installation
73
+ ### `arela push`
74
+
75
+ Upload files to Arela storage, filtered by RFC and/or year. Supports cross-tenant mode.
166
76
 
167
77
  ```bash
168
- npm install -g @arela/uploader
78
+ arela push # Upload all files with arela_path
79
+ arela push --rfcs RFC1,RFC2 --years 2023,2024 # Filter by RFC and year
80
+ arela push --source-api agencia --target-api cliente # Cross-tenant upload
81
+ arela push --folder-structure "prefix/path" # Add storage path prefix
82
+ arela push --no-auto-organize # Disable auto-organization
169
83
  ```
170
84
 
171
- ## Usage
172
-
173
- ### πŸš€ Optimized 4-Phase Workflow (Recommended)
85
+ | Flag | Description | Default |
86
+ |------|-------------|---------|
87
+ | `--api <target>` | API target | `default` |
88
+ | `--scan-api <target>` | API for reading scan data | `default` |
89
+ | `--push-api <target>` | API for uploading files | - |
90
+ | `--source-api <target>` | Source API (cross-tenant) | - |
91
+ | `--target-api <target>` | Target API (cross-tenant) | - |
92
+ | `-b, --batch-size <size>` | Files to fetch per batch | `100` |
93
+ | `--upload-batch-size <size>` | Concurrent uploads | `10` |
94
+ | `--rfcs <rfcs>` | Comma-separated RFCs (overrides `PUSH_RFCS`) | - |
95
+ | `--years <years>` | Comma-separated years (overrides `PUSH_YEARS`) | - |
96
+ | `--folder-structure <path>` | Storage path prefix | - |
97
+ | `--no-auto-organize` | Disable automatic file organization | - |
98
+ | `--show-stats` | Show performance statistics | - |
99
+
100
+ ### `arela upload`
101
+
102
+ Legacy upload command with multiple modes (stats-only, RFC-based, run-all-phases).
174
103
 
175
104
  ```bash
176
- # Run all phases automatically (most efficient)
177
- arela upload --run-all-phases --batch-size 20
178
-
179
- # Or run phases individually for fine-grained control
180
- arela stats # Phase 1: Filesystem stats only
181
- arela detect # Phase 2: PDF detection
182
- arela detect --propagate-arela-path # Phase 3: Path propagation
183
- arela upload --upload-by-rfc # Phase 4: RFC-based upload
105
+ arela upload --batch-size 10 # Basic upload
106
+ arela upload --upload-by-rfc # Upload by RFC
107
+ arela upload --run-all-phases # Run all phases
108
+ arela upload --force-supabase --prefix "folder" # Direct Supabase upload
109
+ arela upload --folder-structure "2024/pedimentos" # Custom folder structure
110
+ arela upload --auto-detect-structure # Auto-detect year/pedimento from paths
184
111
  ```
185
112
 
186
- ### Available Commands
113
+ | Flag | Description | Default |
114
+ |------|-------------|---------|
115
+ | `--api <target>` | API target | `default` |
116
+ | `--source-api <target>` | Source API (cross-tenant) | - |
117
+ | `--target-api <target>` | Target API (cross-tenant) | - |
118
+ | `-b, --batch-size <size>` | Files per batch | `10` |
119
+ | `-p, --prefix <prefix>` | Prefix for uploaded files | - |
120
+ | `--folder-structure <structure>` | Custom folder structure | - |
121
+ | `--client-path <path>` | Override client path for metadata | - |
122
+ | `--auto-detect-structure` | Auto-detect folder structure from paths | - |
123
+ | `--auto-detect` | Enable document type detection | - |
124
+ | `--auto-organize` | Enable automatic file organization | - |
125
+ | `--force-supabase` | Force direct Supabase upload (skip API) | - |
126
+ | `--skip-processed` | Skip already-processed files | - |
127
+ | `--show-stats` | Show performance statistics | - |
128
+ | `--upload-by-rfc` | Upload based on RFC values from `UPLOAD_RFCS` | - |
129
+ | `--run-all-phases` | Run all processing phases sequentially | - |
130
+
131
+ ### `arela watch`
132
+
133
+ Monitor directories for file changes and process them automatically.
187
134
 
188
- #### 1. **upload** - Upload files to Arela
189
135
  ```bash
190
- # Basic upload with auto-processing (API Mode)
191
- arela upload --batch-size 10
192
-
193
- # Upload with auto-detection of year/pedimento from file paths
194
- arela upload --auto-detect-structure --batch-size 10
195
-
196
- # Upload with custom folder structure
197
- arela upload --folder-structure "2024/4023260" --batch-size 10
198
-
199
- # Upload to Supabase directly (skip API)
200
- arela upload --force-supabase --prefix "my-folder"
201
-
202
- # Upload files by specific RFC values
203
- arela upload --upload-by-rfc --batch-size 5
204
-
205
- # Upload RFC files with custom folder prefix
206
- arela upload --upload-by-rfc --folder-structure "palco" --batch-size 5
207
-
208
- # Upload RFC files with nested folder structure
209
- arela upload --upload-by-rfc --folder-structure "2024/Q1/processed" --batch-size 15
210
-
211
- # Upload with performance statistics
212
- arela upload --batch-size 10 --show-stats
213
-
214
- # Upload with client path tracking
215
- arela upload --client-path "/client/documents" --batch-size 10
136
+ arela watch -d "/path/to/dir1,/path/to/dir2" # Watch directories
137
+ arela watch --api cliente # Watch with specific API
138
+ arela watch --source-api agencia --target-api cliente # Cross-tenant watch
139
+ arela watch --strategy individual # Upload each file immediately
140
+ arela watch --auto-processing --batch-size 10 # Auto pipeline (scan→identify→propagate→push)
141
+ arela watch --dry-run # Simulate without uploading
142
+ arela watch --poll 5000 # Use polling (for NFS/remote FS)
216
143
  ```
217
144
 
218
- #### 2. **stats** - Collect file statistics without uploading
145
+ | Flag | Description | Default |
146
+ |------|-------------|---------|
147
+ | `--api <target>` | API target | `default` |
148
+ | `--source-api <target>` | Source API (cross-tenant) | - |
149
+ | `--target-api <target>` | Target API (cross-tenant) | - |
150
+ | `-d, --directories <paths>` | Comma-separated directories to watch | - |
151
+ | `-s, --strategy <strategy>` | `batch`, `individual`, or `full-structure` | `batch` |
152
+ | `--debounce <ms>` | Debounce delay in milliseconds | `1000` |
153
+ | `-b, --batch-size <size>` | Files per batch | `10` |
154
+ | `--poll <ms>` | Use polling with interval in ms | - |
155
+ | `--ignore <patterns>` | Comma-separated patterns to ignore | - |
156
+ | `--auto-detect` | Enable document type detection | - |
157
+ | `--auto-organize` | Enable file organization | - |
158
+ | `--auto-processing` | Enable automatic 4-step pipeline | - |
159
+ | `--dry-run` | Simulate without uploading | - |
160
+ | `--verbose` | Verbose logging | - |
161
+
162
+ **Watch strategies:**
163
+ - **`batch`** (default): Groups files and uploads periodically
164
+ - **`individual`**: Uploads each file immediately on change
165
+ - **`full-structure`**: Preserves full directory structure during upload
166
+
167
+ ### `arela stats`
168
+
169
+ Collect filesystem statistics without uploading (legacy, wraps upload in stats-only mode).
170
+
219
171
  ```bash
220
- # Collect filesystem statistics only (Phase 1)
221
172
  arela stats --batch-size 10
222
-
223
- # Stats with custom folder organization
224
- arela stats --folder-structure "2023/3019796" --batch-size 10
225
-
226
- # Stats with client path tracking
227
- arela stats --client-path "/client/documents" --batch-size 10
228
173
  ```
229
174
 
230
- #### 3. **detect** - Run document detection and path propagation
231
- ```bash
232
- # Run PDF detection on existing database records (Phase 2)
233
- arela detect --batch-size 10
175
+ ### `arela detect`
234
176
 
235
- # Propagate arela_path from pedimento records to related files (Phase 3)
236
- arela detect --propagate-arela-path
237
- ```
177
+ Legacy document detection command. Prefer `arela identify` and `arela propagate` instead.
238
178
 
239
- #### 4. **watch** - Monitor directories and upload automatically ⭐ NEW
240
179
  ```bash
241
- # Watch directories for changes with automatic upload
242
- arela watch --directories "/path/to/watch1,/path/to/watch2"
243
-
244
- # Watch with specific API target (single tenant)
245
- arela watch --api cliente
246
-
247
- # Watch with cross-tenant mode (read from agencia, upload to client)
248
- arela watch --source-api agencia --target-api cliente
249
-
250
- # Watch with custom upload strategy (default: batch)
251
- arela watch --directories "/path/to/watch" --strategy individual
252
- arela watch --directories "/path/to/watch" --strategy full-structure
253
-
254
- # Watch with custom debounce delay (default: 1000ms)
255
- arela watch --directories "/path/to/watch" --debounce 2000
256
-
257
- # Watch with automatic 4-step pipeline
258
- arela watch --directories "/path/to/watch" --auto-processing --batch-size 10
259
-
260
- # Watch with polling instead of native file system events
261
- arela watch --directories "/path/to/watch" --poll 5000
262
-
263
- # Watch with pattern ignoring
264
- arela watch --directories "/path/to/watch" --ignore "node_modules,*.log,*.tmp"
265
-
266
- # Watch in dry-run mode (simulate without uploading)
267
- arela watch --directories "/path/to/watch" --dry-run
268
-
269
- # Watch with verbose logging
270
- arela watch --directories "/path/to/watch" --verbose
180
+ arela detect --batch-size 10 # PDF detection
181
+ arela detect --propagate-arela-path # Path propagation
271
182
  ```
272
183
 
273
- **Watch Strategies:**
274
- - `batch` **(default)**: Groups files and uploads periodically
275
- - `individual`: Uploads each file immediately as it changes
276
- - `full-structure`: Preserves directory structure during upload
277
-
278
- **Multi-Tenant Options:**
279
- - `--api <target>`: Use a single API for all operations
280
- - `--source-api <target>`: API for reading/processing (phases 1-3)
281
- - `--target-api <target>`: API for uploading (phase 4)
184
+ ### `arela query`
282
185
 
283
- #### 5. **query** - Query database for file status
284
- ```bash
285
- # Show files ready for upload
286
- arela query --ready-files
287
- ```
186
+ Query the database for file status.
288
187
 
289
- #### 6. **config** - Show current configuration
290
188
  ```bash
291
- # Display all configuration settings
292
- arela config
189
+ arela query --ready-files # Show files ready for upload
293
190
  ```
294
191
 
295
- ### Legacy Syntax (Still Supported)
192
+ ### `arela config`
296
193
 
297
- The old flag-based syntax is still supported for backward compatibility:
194
+ Display current configuration for all API targets and settings.
298
195
 
299
196
  ```bash
300
- # These are equivalent to the commands above
301
- arela --stats-only # Same as: arela stats
302
- arela --detect-pdfs # Same as: arela detect
303
- arela --propagate-arela-path # Same as: arela detect --propagate-arela-path
304
- arela --upload-by-rfc # Same as: arela upload --upload-by-rfc
305
- ```
306
-
307
- #### Phase Control
308
- - `--stats-only`: **Phase 1** - Only collect filesystem stats (no file reading)
309
- - `--detect-pdfs`: **Phase 2** - Process PDF files for pedimento-simplificado detection
310
- - `--propagate-arela-path`: **Phase 3** - Propagate arela_path from pedimento records to related files
311
- - `--upload-by-rfc`: **Phase 4** - Upload files based on RFC values from UPLOAD_RFCS
312
- - `--run-all-phases`: **All Phases** - Run complete optimized workflow
313
-
314
- #### Global Options (all commands)
315
- - `-v, --verbose`: Enable verbose logging
316
- - `--clear-log`: Clear the log file before starting
317
- - `-h, --help`: Display help information
318
- - `--version`: Display version number
319
-
320
- #### Upload Command Options
321
- - `-b, --batch-size <size>`: API batch size (default: 10)
322
- - `--folder-structure <structure>`: Custom folder structure (e.g., "2024/4023260")
323
- - `--client-path <path>`: Client path for metadata tracking
324
- - `--auto-detect-structure`: Automatically detect year/pedimento from file paths
325
- - `--auto-detect`: Enable automatic document type detection
326
- - `--auto-organize`: Enable automatic file organization
327
- - `--force-supabase`: Force direct Supabase upload (skip API)
328
- - `--skip-processed`: Skip files already processed
329
- - `--show-stats`: Show performance statistics
330
- - `--upload-by-rfc`: Upload files based on RFC values from UPLOAD_RFCS
331
- - `--run-all-phases`: Run all processing phases sequentially
332
-
333
- #### Stats Command Options
334
- - `-b, --batch-size <size>`: Batch size for processing (default: 10)
335
- - `--client-path <path>`: Client path for metadata tracking
336
- - `--show-stats`: Show performance statistics
337
-
338
- #### Detect Command Options
339
- - `-b, --batch-size <size>`: Batch size for PDF detection (default: 10)
340
- - `--propagate-arela-path`: Propagate arela_path from pedimento records to related files
341
-
342
- #### Watch Command Options
343
- - `-d, --directories <paths>`: **Comma-separated directories to watch** (required)
344
- - `-s, --strategy <strategy>`: Upload strategy (default: batch)
345
- - `batch`: Groups files and uploads periodically
346
- - `individual`: Uploads each file immediately
347
- - `full-structure`: Preserves directory structure
348
- - `--api <target>`: Use a single API target for all operations
349
- - `--source-api <target>`: API for reading/processing (phases 1-3)
350
- - `--target-api <target>`: API for uploading (phase 4)
351
- - `--debounce <ms>`: Debounce delay in milliseconds (default: 1000)
352
- - `-b, --batch-size <size>`: Batch size for uploads (default: 10)
353
- - `--poll <ms>`: Use polling instead of native file system events (interval in ms)
354
- - `--ignore <patterns>`: Comma-separated patterns to ignore
355
- - `--auto-detect`: Enable automatic document type detection
356
- - `--auto-organize`: Enable automatic file organization
357
- - `--auto-processing`: Enable automatic 4-step pipeline (stats, detect, propagate, upload)
358
- - `--dry-run`: Simulate changes without uploading
359
- - `--verbose`: Enable verbose logging
360
-
361
- ## Environment Variables
362
-
363
- Create a `.env` file in your project root:
364
-
365
- ```env
366
- # Default API (--api default or no flag)
367
- ARELA_API_URL=http://localhost:3010
368
- ARELA_API_TOKEN=your_api_token
369
-
370
- # Agencia API (--api agencia)
371
- ARELA_API_AGENCIA_URL=http://localhost:4012
372
- ARELA_API_AGENCIA_TOKEN=your_agencia_token
373
-
374
- # Cliente API (--api cliente)
375
- # Configure for the specific client you need
376
- ARELA_API_CLIENTE_URL=http://localhost:4014
377
- ARELA_API_CLIENTE_TOKEN=your_cliente_token
378
-
379
- # For Direct Supabase Mode (fallback)
380
- SUPABASE_URL=your_supabase_url
381
- SUPABASE_KEY=your_supabase_anon_key
382
- SUPABASE_BUCKET=your_bucket_name
383
-
384
- # Required for both modes
385
- UPLOAD_BASE_PATH=/path/to/your/files
386
- UPLOAD_SOURCES=folder1|folder2|file.pdf
387
-
388
- # RFC-based Upload Configuration
389
- # Pipe-separated list of RFCs to upload files for
390
- UPLOAD_RFCS=MMJ0810145N1|ABC1234567XY|DEF9876543ZZ
391
-
392
- # Watch Mode Configuration (JSON format)
393
- WATCH_DIRECTORY_CONFIGS={"../../Documents/2022":"palco","../../Documents/2023":"palco"}
197
+ arela config
394
198
  ```
395
199
 
396
- **Environment Variable Details:**
200
+ ### Global Options
397
201
 
398
- - `ARELA_API_URL`: Base URL for default API service
399
- - `ARELA_API_AGENCIA_URL`: URL for agencia API
400
- - `ARELA_API_CLIENTE_URL`: URL for client API (configure per client)
401
- - `ARELA_API_TOKEN`: Authentication token for default API
402
- - `ARELA_API_AGENCIA_TOKEN`: Token for agencia API
403
- - `ARELA_API_CLIENTE_TOKEN`: Token for client API
404
- - `SUPABASE_URL`: Your Supabase project URL
405
- - `SUPABASE_KEY`: Supabase anonymous key for direct uploads
406
- - `SUPABASE_BUCKET`: Target bucket name in Supabase Storage
407
- - `UPLOAD_BASE_PATH`: Root directory containing files to upload
408
- - `UPLOAD_SOURCES`: Pipe-separated list of folders/files to process
409
- - `UPLOAD_RFCS`: Pipe-separated list of RFC values for targeted uploads
410
- - `WATCH_DIRECTORY_CONFIGS`: JSON mapping directories to folder structures
202
+ | Flag | Description |
203
+ |------|-------------|
204
+ | `-v, --verbose` | Enable verbose logging |
205
+ | `--clear-log` | Clear log file before starting |
206
+ | `--version` | Display version number |
207
+ | `-h, --help` | Display help |
411
208
 
412
- ## RFC-Based File Upload
209
+ ## Multi-Tenant API
413
210
 
414
- The `--upload-by-rfc` feature allows you to upload files to the Arela API based on specific RFC values. This is useful when you want to upload only files associated with certain companies or entities.
211
+ Three API targets are available: `default`, `agencia`, and `cliente`. Configure each in your `.env` file.
415
212
 
416
- ### How it works:
417
-
418
- 1. **Configure RFCs**: Set the `UPLOAD_RFCS` environment variable with pipe-separated RFC values
419
- 2. **Query Database**: The tool searches the Supabase database for files matching the specified RFCs
420
- 3. **Include Supporting Documents**: Finds all files sharing the same `arela_path` as the RFC matches (not just the pedimento files)
421
- 4. **Apply Folder Structure**: Optionally applies custom folder prefix using `--folder-structure`
422
- 5. **Group and Upload**: Files are grouped by their final destination path and uploaded with proper structure
423
-
424
- ### Folder Structure Options:
213
+ ```bash
214
+ # Single-tenant
215
+ arela scan --api cliente
216
+ arela push --api agencia
425
217
 
426
- **Default Behavior** (no `--folder-structure`):
427
- - Uses original `arela_path`: `CAD890407NK7/2023/3429/070/230734293000421/`
218
+ # Cross-tenant (read from one API, upload to another)
219
+ arela push --source-api agencia --target-api cliente
220
+ ```
428
221
 
429
- **With Custom Prefix** (`--folder-structure "palco"`):
430
- - Results in: `palco/CAD890407NK7/2023/3429/070/230734293000421/`
222
+ | Phase | `--source-api` | `--target-api` |
223
+ |-------|-----------------|-----------------|
224
+ | Scan / Identify / Propagate | ✓ | - |
225
+ | Push / Upload | - | ✓ |
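Combining this table with the per-command flag tables, a full cross-tenant run might look like the sketch below. It is an illustration under stated assumptions: `run_cross_tenant` and `ARELA_CMD` are hypothetical names, and it assumes that running `scan`, `identify`, and `propagate` with `--api <source>` fills the source role, because the flag tables for those three commands list only `--api`.

```bash
# Hypothetical helper: process on one tenant, then upload to another.
# Usage: run_cross_tenant <source-target> <destination-target>
run_cross_tenant() {
  local cmd="${ARELA_CMD:-arela}"
  if [ "$#" -ne 2 ]; then
    echo "usage: run_cross_tenant <source-target> <destination-target>" >&2
    return 2
  fi
  # Phases 1-3 read and write metadata on the source tenant (assumption: via --api).
  "$cmd" scan --api "$1" &&
    "$cmd" identify --api "$1" &&
    "$cmd" propagate --api "$1" &&
    # Phase 4 reads from the source tenant and uploads to the destination tenant.
    "$cmd" push --source-api "$1" --target-api "$2"
}
```

For example, `run_cross_tenant agencia cliente` would end with the same `arela push --source-api agencia --target-api cliente` call shown in the README.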
431
226
 
432
- **With Nested Prefix** (`--folder-structure "2024/client1/pedimentos"`):
433
- - Results in: `2024/client1/pedimentos/CAD890407NK7/2023/3429/070/230734293000421/`
227
+ ## Environment Variables
434
228
 
435
- ### Prerequisites:
229
+ Copy `.env.template` to `.env` and configure:
436
230
 
437
- - Files must have been previously processed (have entries in the `uploader` table)
438
- - Files must have `rfc` field populated (from document detection)
439
- - Files must have `arela_path` populated (from pedimento processing)
440
- - Original files must still exist at their `original_path` locations
231
+ ### API Configuration
441
232
 
442
- ### Example:
233
+ ```env
234
+ ARELA_API_URL=https://your-arela-api.example.com
235
+ ARELA_API_TOKEN=your-api-token
443
236
 
444
- ```bash
445
- # Set RFCs in environment
446
- export UPLOAD_RFCS="MMJ0810145N1|ABC1234567XY|DEF9876543ZZ"
237
+ ARELA_API_AGENCIA_URL=https://agencia-api.example.com
238
+ ARELA_API_AGENCIA_TOKEN=your-agencia-token
447
239
 
448
- # Upload files for these RFCs (original folder structure)
449
- arela --upload-by-rfc --batch-size 5 --show-stats
240
+ ARELA_API_CLIENTE_URL=https://cliente-api.example.com
241
+ ARELA_API_CLIENTE_TOKEN=your-cliente-token
242
+ ```
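Before running a command with `--api <target>`, it can help to confirm that the matching URL and token are set. The function below is a hypothetical pre-flight check, not something the CLI provides; the variable names are taken from the block above, while `check_target_env` and its exit codes are illustrative.

```bash
# Hypothetical pre-flight check: verify that URL and token are set for an API target.
# Returns 0 if both are set, 1 if either is missing, 2 for an unknown target.
check_target_env() {
  local prefix url token
  case "$1" in
    default) prefix="ARELA_API" ;;
    agencia) prefix="ARELA_API_AGENCIA" ;;
    cliente) prefix="ARELA_API_CLIENTE" ;;
    *) echo "unknown API target: $1" >&2; return 2 ;;
  esac
  # Indirect lookup of <prefix>_URL and <prefix>_TOKEN (empty when unset).
  eval "url=\${${prefix}_URL-}"
  eval "token=\${${prefix}_TOKEN-}"
  if [ -z "$url" ] || [ -z "$token" ]; then
    echo "missing ${prefix}_URL or ${prefix}_TOKEN" >&2
    return 1
  fi
}
```

The check reads the current shell environment, so the values need to be exported in the shell before calling it, for example `check_target_env cliente && arela scan --api cliente`.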
450
243
 
451
- # Upload with custom folder prefix
452
- arela --upload-by-rfc --folder-structure "palco" --batch-size 10
244
+ ### Supabase (fallback)
453
245
 
454
- # Upload with nested organization
455
- arela --upload-by-rfc --folder-structure "2024/Q1/processed" --batch-size 15
246
+ ```env
247
+ SUPABASE_URL=https://your-project.supabase.co
248
+ SUPABASE_KEY=your-supabase-key
249
+ SUPABASE_BUCKET=your-bucket-name
456
250
  ```
457
251
 
458
- The tool will:
459
- - Find all database records matching the specified RFCs
460
- - Include ALL supporting documents that share the same `arela_path`
461
- - Apply the optional folder structure prefix if specified
462
- - Group files by their final destination folder structure
463
- - Upload each group maintaining the correct Arela folder hierarchy
464
- - Provide detailed progress and summary statistics
465
- - Handle large datasets with automatic pagination (no 1000-file limit)
466
-
467
- ## File Processing Modes
468
-
469
- ### API Mode (Default)
470
- When `ARELA_API_URL` and `ARELA_API_TOKEN` are configured:
471
- - ✅ Automatic file detection and classification
472
- - ✅ Intelligent file organization
473
- - ✅ **Smart year/pedimento auto-detection from paths**
474
- - ✅ **Custom folder structure support**
475
- - ✅ Batch processing with progress tracking
476
- - ✅ Advanced error handling and retry logic
477
- - ✅ **Performance optimizations with file sanitization caching**
478
-
479
- ### Auto-Detection Features
480
- The tool can automatically detect year and pedimento numbers from file paths using multiple patterns:
481
-
482
- **Pattern 1: Direct Structure**
483
- ```
484
- /path/to/2024/4023260/file.pdf
485
- /path/to/pedimentos/2024/4023260/file.pdf
486
- ```
252
+ ### Upload & File Sources
487
253
 
488
- **Pattern 2: Named Patterns**
489
- ```
490
- /path/to/docs/año2024/ped4023260/file.pdf
491
- /path/to/files/year2024/pedimento4023260/file.pdf
254
+ ```env
255
+ UPLOAD_BASE_PATH=/path/to/files
256
+ UPLOAD_SOURCES=folder1|folder2|folder3
257
+ UPLOAD_RFCS=RFC1|RFC2|RFC3
258
+ UPLOAD_YEARS=2023|2024
492
259
  ```
493
260
 
494
- **Pattern 3: Loose Detection**
495
- - Year: Any 4-digit number starting with "202" (2020-2029)
496
- - Pedimento: Any 4-8 consecutive digits in path
261
+ ### Scan Configuration
497
262
 
498
- Use `--auto-detect-structure` to enable automatic detection:
499
- ```bash
500
- arela --auto-detect-structure --batch-size 10
263
+ ```env
264
+ ARELA_COMPANY_SLUG=your_company # Company identifier
265
+ ARELA_SERVER_ID=nas01 # Server/NAS identifier
266
+ ARELA_BASE_PATH_LABEL= # Optional (auto-derived from base path)
267
+ SCAN_EXCLUDE_PATTERNS=.DS_Store,Thumbs.db,desktop.ini
268
+ SCAN_BATCH_SIZE=2000 # Records per API call
269
+ SCAN_DIRECTORY_LEVEL=0 # Directory depth for table partitioning
501
270
  ```
502
271
 
503
- ### Custom Folder Structure
504
- Specify a custom organization pattern:
505
- ```bash
506
- # Static structure
507
- arela --folder-structure "2024/4023260" --batch-size 10
272
+ ### Push Configuration
508
273
 
509
- # Client-based structure
510
- arela --folder-structure "cliente1/pedimentos" --batch-size 10
274
+ ```env
275
+ PUSH_RFCS= # Pipe-separated RFC filter
276
+ PUSH_YEARS= # Pipe-separated year filter
277
+ PUSH_BATCH_SIZE=100 # Fetch batch size
278
+ PUSH_UPLOAD_BATCH_SIZE=10 # Concurrent uploads
279
+ PUSH_BUCKET=arela # Storage bucket
280
+ PUSH_FOLDER_STRUCTURE= # Prefix for storage paths
511
281
  ```
512
282
 
513
- ### Directory Structure Preservation
514
- Use `--preserve-structure` to maintain your original folder structure even with auto-organization:
515
-
516
- ```bash
517
- # Without --preserve-structure
518
- # Files organized by API: bucket/filename.pdf
283
+ ### Performance Tuning
519
284
 
520
- # With --preserve-structure
521
- # Files keep structure: bucket/2024/4023260/filename.pdf
522
- arela --preserve-structure --batch-size 10
285
+ ```env
286
+ MAX_API_CONNECTIONS=10 # Match your API replica count
287
+ API_CONNECTION_TIMEOUT=60000 # Timeout in ms
288
+ API_MAX_RETRIES=3
289
+ API_RETRY_EXPONENTIAL_BACKOFF=true
290
+ BATCH_SIZE=100
291
+ BATCH_DELAY=0
292
+ MAX_CONCURRENT_SOURCES=2
523
293
  ```
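`API_MAX_RETRIES` and `API_RETRY_EXPONENTIAL_BACKOFF` suggest a retry loop whose wait time grows between attempts. The sketch below shows one common way to do that. The helper names and the 1000 ms base delay are assumptions, since the README does not state the actual delays used.

```javascript
// Hypothetical helper: compute the wait before each retry.
// Assumption: a 1000 ms base delay that doubles per retry when
// exponential backoff is enabled, and stays constant otherwise.
function retryDelays(maxRetries, exponential, baseMs = 1000) {
  const delays = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(exponential ? baseMs * 2 ** attempt : baseMs);
  }
  return delays;
}

// Run fn once, then retry up to maxRetries times, sleeping between attempts.
async function withRetries(fn, { maxRetries = 3, exponential = true, baseMs = 1000 } = {}) {
  const delays = retryDelays(maxRetries, exponential, baseMs);
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= delays.length) throw err; // retries exhausted
      await new Promise((resolve) => setTimeout(resolve, delays[attempt]));
    }
  }
}

console.log(retryDelays(3, true)); // [ 1000, 2000, 4000 ]
```

With `API_MAX_RETRIES=3` this makes at most four calls in total: the initial attempt plus three retries.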
524
294
 
525
- ### Supabase Direct Mode (Fallback)
526
- When API is unavailable or `--force-supabase` is used:
527
- - ✅ Direct upload to Supabase Storage
528
- - ✅ File sanitization and renaming
529
- - ✅ Basic progress tracking
530
- - ✅ **Optimized sanitization with pre-compiled regex patterns**
531
- - ✅ **Performance caching for file name sanitization**
295
+ ### Watch Mode
532
296
 
533
- ## Performance Features
297
+ ```env
298
+ WATCH_ENABLED=false
299
+ WATCH_DIRECTORY_CONFIGS={"../docs/2023":"prefix-2023","../docs/2024":"prefix-2024"}
300
+ WATCH_STRATEGY=batch # batch|individual|full-structure
301
+ WATCH_DEBOUNCE_MS=1000
302
+ WATCH_BATCH_SIZE=10
303
+ WATCH_USE_POLLING=false
304
+ WATCH_POLL_INTERVAL=100
305
+ WATCH_STABILITY_THRESHOLD=300
306
+ WATCH_IGNORE_PATTERNS=*.tmp,*.bak,*.swp
307
+ ```
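`WATCH_DIRECTORY_CONFIGS` holds a JSON object whose keys are directories and whose values, judging by the example, are prefixes. A minimal sketch of validating such a value before use is given below; `parseWatchDirectoryConfigs` is a hypothetical name, and the error messages are illustrative only.

```javascript
// Hypothetical helper: turn the JSON string from WATCH_DIRECTORY_CONFIGS
// into a list of { directory, prefix } pairs, failing early on bad input.
function parseWatchDirectoryConfigs(raw) {
  if (!raw) return [];
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch (err) {
    throw new Error(`WATCH_DIRECTORY_CONFIGS is not valid JSON: ${err.message}`);
  }
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    throw new Error("WATCH_DIRECTORY_CONFIGS must be a JSON object");
  }
  return Object.entries(parsed).map(([directory, prefix]) => {
    if (typeof prefix !== "string") {
      throw new Error(`Prefix for ${directory} must be a string`);
    }
    return { directory, prefix };
  });
}

console.log(
  parseWatchDirectoryConfigs('{"../docs/2023":"prefix-2023","../docs/2024":"prefix-2024"}')
);
```

Because the value is JSON embedded in a `.env` file, a stray quote is easy to introduce, so failing with a clear message is usually preferable to silently watching nothing.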
534
308
 
535
- ### Database Pagination
536
- - **No Upload Limits**: Handles datasets larger than 1000 files through automatic pagination
537
- - **Efficient Querying**: Uses Supabase `.range()` method to fetch data in batches
538
- - **Memory Optimization**: Processes large datasets without memory overflow
309
+ ## Document Detection
539
310
 
540
- ### File Processing
541
- - **Pre-compiled Regex**: Sanitization patterns are compiled once for optimal performance
542
- - **Caching System**: File name sanitization results are cached to avoid re-processing
543
- - **Batch Processing**: Configurable batch sizes for optimal upload throughput
311
+ The tool detects and classifies documents:
544
312
 
545
- ### RFC Upload Optimizations
546
- - **Smart Querying**: Three-step query process to efficiently find related files
547
- - **Supporting Document Inclusion**: Automatically includes all related documents, not just pedimentos
548
- - **Path Concatenation**: Efficiently combines custom folder structures with arela_paths
313
+ - **Pedimento Simplificado**: Extracts RFC, pedimento number, patente, aduana, and year from PDF content to compose an `arela_path` (`RFC/Year/Patente/Aduana/Pedimento/`)
314
+ - **Supporting Documents**: XML, TXT, and JSON customs-related documents
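The `arela_path` layout (`RFC/Year/Patente/Aduana/Pedimento/`) can be illustrated with a small sketch. `composeArelaPath` is a hypothetical name, the sample values are made up, and returning `null` for incomplete detections is an assumption about how missing fields might be handled.

```javascript
// Hypothetical helper: join the detected fields in the documented order.
function composeArelaPath({ rfc, year, patente, aduana, pedimento }) {
  const parts = [rfc, year, patente, aduana, pedimento];
  const incomplete = parts.some(
    (part) => part === undefined || part === null || String(part).trim() === ""
  );
  if (incomplete) return null; // assumption: no path unless every field was found
  return parts.map((part) => String(part).trim()).join("/") + "/";
}

// Sample values are illustrative only.
console.log(
  composeArelaPath({
    rfc: "XAXX010101000",
    year: 2024,
    patente: "3429",
    aduana: "07",
    pedimento: "4023260",
  })
); // XAXX010101000/2024/3429/07/4023260/
```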
549
315
 
550
316
  ## File Sanitization
551
317
 
552
- The tool automatically handles problematic characters using advanced sanitization:
553
-
554
- **Character Replacements:**
555
- - **Accents**: á→a, é→e, í→i, ó→o, ú→u, ñ→n, ç→c
556
- - **Korean characters**: 멕→meok, 시→si, 코→ko, 용→yong, others→kr
557
- - **Special symbols**: &→and, {}[]~^|"<>?*: →-
558
- - **Email symbols**: @→(removed), spaces→-
559
- - **Multiple dashes**: collapsed to single dash
560
- - **Leading/trailing**: dashes and dots removed
561
-
562
- **Performance Features:**
563
- - Pre-compiled regex patterns for faster processing
564
- - Sanitization result caching to avoid re-processing
565
- - Unicode normalization (NFD) for consistent handling
318
+ Filenames are automatically sanitized before upload:
566
319
 
567
- ### Examples
568
-
569
- | Original | Renamed |
570
- |----------|---------|
320
+ | Original | Sanitized |
321
+ |----------|-----------|
571
322
  | `Facturas Importación.pdf` | `Facturas-Importacion.pdf` |
572
323
  | `File{with}brackets.pdf` | `File-with-brackets.pdf` |
573
324
  | `Document ^& symbols.pdf` | `Document-and-symbols.pdf` |
574
- | `CI & PL-20221212(멕시코용).xls` | `CI-and-PL-20221212.xls` |
575
- | `impresora@nereprint.com_file.xml` | `impresoranereprint.com_file.xml` |
576
- | `07-3429-3000430 HC.pdf` | `07-3429-3000430-HC.pdf` |
577
- | `FACTURA IN 3000430.pdf` | `FACTURA-IN-3000430.pdf` |
578
-
579
- ## Logging and Monitoring
325
+ | `file with spaces.pdf` | `file-with-spaces.pdf` |
580
326
 
581
- The tool maintains comprehensive logs both locally and remotely:
327
+ Handles accented characters, Korean glyphs, special symbols, and email addresses.
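A minimal sketch of this kind of sanitization, following the replacement rules listed in the 1.0.11 text (NFD normalization, `&` to `and`, `@` removed, special symbols and spaces to `-`, repeated dashes collapsed, leading and trailing dashes and dots trimmed). `sanitizeFileName` is a hypothetical name, not the package's API, and Korean transliteration and parenthesis handling are deliberately left out.

```javascript
// Hypothetical sanitizer covering a subset of the documented rules.
function sanitizeFileName(name) {
  return name
    .normalize("NFD") // split accented letters into base letter + combining mark
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks (á→a, ñ→n, ç→c)
    .replace(/&/g, "and")
    .replace(/@/g, "") // email symbol is removed outright
    .replace(/[\s{}\[\]~^|"<>?*:]+/g, "-") // spaces and special symbols become dashes
    .replace(/-+/g, "-") // collapse repeated dashes
    .replace(/^[-.]+|[-.]+$/g, ""); // trim leading/trailing dashes and dots
}

console.log(sanitizeFileName("Facturas Importación.pdf")); // Facturas-Importacion.pdf
console.log(sanitizeFileName("Document ^& symbols.pdf")); // Document-and-symbols.pdf
```

The regular expressions here are literals, so the engine compiles them once; caching results per file name, as the earlier README described, would be a separate optimization.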
582
328
 
583
- **Local Logging (`arela-upload.log`):**
584
- - Upload status (SUCCESS/ERROR/SKIPPED/SANITIZED)
585
- - File paths and sanitization changes
586
- - Error messages and timestamps
587
- - Rename operations with before/after names
588
- - Processing statistics and performance metrics
329
+ ## Development
589
330
 
590
- **Log Entry Examples:**
591
- ```
592
- [2025-09-04T01:17:00.141Z] SUCCESS: /Users/.../file.xml -> 2023/2003180/file.xml
593
- [2025-09-04T01:17:00.822Z] SANITIZED: file name.pdf → file-name.pdf
594
- [2025-09-04T01:17:00.856Z] SKIPPED: /Users/.../duplicate.pdf (already exists)
331
+ ```bash
332
+ npm test # Run tests
333
+ npm run test:watch # Run tests in watch mode
334
+ npm run test:coverage # Run tests with coverage
335
+ npm run format # Format code with Prettier
595
336
  ```
596
337
 
597
- **Remote Logging:**
598
- - Integration with Supabase database for centralized logging
599
- - Upload tracking and audit trails
600
- - Error reporting and monitoring
601
-
602
- ## Performance Features
603
-
604
- **Version 2.0.0 introduces several performance optimizations:**
605
-
606
- - **Pre-compiled Regex Patterns**: Sanitization patterns are compiled once and reused
607
- - **Sanitization Caching**: File name sanitization results are cached to avoid reprocessing
608
- - **Batch Processing**: Configurable batch sizes for optimal API usage
609
- - **Concurrent Processing**: Adjustable concurrency levels for file processing
610
- - **Smart Skip Logic**: Efficiently skips already processed files using log analysis
611
- - **Memory Optimization**: Large file outputs are truncated to prevent memory issues
612
-
613
- ## Version History
614
-
615
- **v0.4.0** - Current Release 🆕
616
- - ✨ **Simplified Multi-Tenant API**: Only 3 targets: `default`, `agencia`, `cliente`
617
- - ✨ **Cross-Tenant Mode**: Read from one API, upload to another
618
- - ✨ **Dynamic Client Config**: Change client by updating `.env` (no code changes)
619
- - ✨ New `--api` flag for single API target
620
- - ✨ New `--source-api` flag for source API (phases 1-3)
621
- - ✨ New `--target-api` flag for target API (phase 4)
622
- - ✨ `WATCH_DIRECTORY_CONFIGS` environment variable for watch mode
623
- - 🔧 Enhanced pipeline routing for cross-tenant operations
624
- - 📝 Simplified documentation for multi-tenant configuration
625
-
626
- **v0.3.0** - Watch Mode Release
627
- - ✨ Added watch command with chokidar integration
628
- - ✨ Automatic 4-step pipeline (stats → detect → propagate → upload)
629
- - ✨ Multiple upload strategies (batch, individual, full-structure)
630
- - ✨ Configurable debounce and polling options
631
- - 🔧 Signal handling for graceful shutdown
632
-
633
- **v0.2.0** - Pipeline Automation
634
- - ✨ Added smart year/pedimento auto-detection from file paths
635
- - ✨ Custom folder structure support with `--folder-structure` option
636
- - ✨ Client path tracking with `--client-path` option
637
- - ✨ Performance optimizations with regex pre-compilation
638
- - ✨ Sanitization result caching for improved speed
639
- - ✨ Enhanced file sanitization with Korean character support
640
- - ✨ Improved email character handling in file names
641
- - ✨ Better error handling and logging
642
- - 📝 Comprehensive logging with SANITIZED status
643
- - 🔧 Memory optimization for large file processing
644
-
645
- **v0.1.0** - Initial Release
646
- - 📦 Basic upload functionality
647
- - 🔌 API and Supabase direct mode support
648
- - 📂 RFC-based file upload
649
-
650
- ## Troubleshooting
651
-
652
- **Connection Issues:**
653
- - Verify `ARELA_API_URL` and `ARELA_API_TOKEN` are correct
654
- - Check network connectivity to the API endpoint
655
- - The tool will automatically fall back to Supabase direct mode if the API is unavailable
656
-
657
- **Performance Issues:**
658
- - Adjust `--batch-size` for optimal API performance (default: 10)
659
- - Modify `--concurrency` to control parallel processing (default: 10)
660
- - Use `--show-stats` to monitor sanitization cache performance
661
-
662
- **File Issues:**
663
- - Check file permissions in `UPLOAD_BASE_PATH`
664
- - Verify `UPLOAD_SOURCES` paths exist and are accessible
665
- - Review `arela-upload.log` for detailed error information
666
-
667
- ## Contributing
668
-
669
- Contributions are welcome! Please feel free to submit a Pull Request.
670
-
671
338
  ## License
672
339
 
673
- ISC License - see LICENSE file for details.
340
+ ISC