@arela/uploader 1.0.2 → 1.0.3
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.env.template +70 -0
- package/docs/API_RETRY_MECHANISM.md +338 -0
- package/docs/ARELA_IDENTIFY_IMPLEMENTATION.md +489 -0
- package/docs/ARELA_IDENTIFY_QUICKREF.md +186 -0
- package/docs/ARELA_PROPAGATE_IMPLEMENTATION.md +581 -0
- package/docs/ARELA_PROPAGATE_QUICKREF.md +272 -0
- package/docs/ARELA_PUSH_IMPLEMENTATION.md +577 -0
- package/docs/ARELA_PUSH_QUICKREF.md +322 -0
- package/docs/ARELA_SCAN_IMPLEMENTATION.md +373 -0
- package/docs/ARELA_SCAN_QUICKREF.md +139 -0
- package/docs/DETECTION_ATTEMPT_TRACKING.md +414 -0
- package/docs/MIGRATION_UPLOADER_TO_FILE_STATS.md +1020 -0
- package/docs/MULTI_LEVEL_DIRECTORY_SCANNING.md +494 -0
- package/docs/STATS_COMMAND_SEQUENCE_DIAGRAM.md +287 -0
- package/docs/STATS_COMMAND_SIMPLE.md +93 -0
- package/package.json +4 -2
- package/src/commands/IdentifyCommand.js +486 -0
- package/src/commands/PropagateCommand.js +474 -0
- package/src/commands/PushCommand.js +473 -0
- package/src/commands/ScanCommand.js +516 -0
- package/src/config/config.js +177 -7
- package/src/file-detection.js +9 -10
- package/src/index.js +150 -0
- package/src/services/ScanApiService.js +646 -0
@@ -0,0 +1,322 @@
# Arela Push Quick Reference

## Command

```bash
arela push [options]
```

## Options

| Option | Default | Description |
|--------|---------|-------------|
| `--api <target>` | `default` | API target for scan operations: default\|agencia\|cliente |
| `--scan-api <target>` | `default` | API used to read the `file_stats` table |
| `--push-api <target>` | Same as `--scan-api` | API used to upload files |
| `-b, --batch-size <size>` | `100` | Number of files to fetch per batch |
| `--upload-batch-size <size>` | `10` | Number of files to upload concurrently |
| `--rfcs <rfcs>` | From `PUSH_RFCS` | Comma-separated RFCs to filter by |
| `--years <years>` | From `PUSH_YEARS` | Comma-separated years to filter by |
| `--show-stats` | `false` | Show detailed statistics |

## Prerequisites

1. **Run `arela scan` first** - Push requires scanned files
2. **Run `arela identify` first** - Push requires detected pedimentos
3. **Run `arela propagate` first** - Push requires `arela_path` to be set on files
4. **Same configuration** - Use the same environment variables as scan/identify/propagate

## Required Environment Variables

```bash
ARELA_COMPANY_SLUG=your_company
ARELA_SERVER_ID=server01
UPLOAD_BASE_PATH=/path/to/files
# Quoted so the "|" separators are not treated as shell pipes
UPLOAD_SOURCES="2023|2024|2025"
```

## Optional Environment Variables

```bash
# Filter by RFC (quoted so "|" is not treated as a shell pipe)
PUSH_RFCS="RFC123456ABC|RFC789012DEF"

# Filter by year
PUSH_YEARS="2023|2024|2025"

# Batch configuration
PUSH_BATCH_SIZE=100
PUSH_UPLOAD_BATCH_SIZE=10
```

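The RFC and year filters can come from either the command-line flags or the `PUSH_RFCS`/`PUSH_YEARS` variables. The TypeScript sketch below shows one way that precedence could be resolved. It assumes that flags override environment variables, that flag values are comma-separated, and that environment values are pipe-separated, as in the examples in this document. `resolvePushFilters` and `splitList` are hypothetical names, not the actual `config.js` API.

```typescript
// Hypothetical helper: resolve push filters from CLI flags or env vars.
// Assumption: flags win over env vars; flags use ",", env vars use "|".
interface PushFilters {
  rfcs: string[];
  years: number[];
}

function splitList(value: string | undefined, separator: string): string[] {
  if (!value) return [];
  return value
    .split(separator)
    .map((item) => item.trim())
    .filter((item) => item.length > 0);
}

function resolvePushFilters(
  cli: { rfcs?: string; years?: string },
  env: Record<string, string | undefined>,
): PushFilters {
  const rfcs =
    cli.rfcs !== undefined ? splitList(cli.rfcs, ",") : splitList(env.PUSH_RFCS, "|");
  const yearStrings =
    cli.years !== undefined ? splitList(cli.years, ",") : splitList(env.PUSH_YEARS, "|");
  // Keep only 4-digit years and convert them to numbers.
  const years = yearStrings.filter((y) => /^\d{4}$/.test(y)).map((y) => Number(y));
  return { rfcs, years };
}

// Usage: --years given on the command line, RFCs taken from the environment.
const filters = resolvePushFilters(
  { years: "2023,2024" },
  { PUSH_RFCS: "RFC123456ABC|RFC789012DEF", PUSH_YEARS: "2025" },
);
console.log(filters);
```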
## Examples

```bash
# Basic push - upload all files with arela_path
arela push

# Filter by RFC
arela push --rfcs RFC123456ABC,RFC789012DEF

# Filter by year
arela push --years 2023,2024

# Use different API for uploads
arela push --scan-api agencia --push-api cliente

# Faster uploads (increase concurrent uploads)
arela push --upload-batch-size 20

# With detailed stats
arela push --show-stats
```

## What It Does

1. Fetches files that have an `arela_path` from the `file_stats_*` table
2. Filters them by RFC and/or year, if specified
3. Uploads the files to the storage API, using `arela_path` as the target path
4. Tracks upload attempts and errors for monitoring
5. Writes the results back to the database

## Output Example

```
🚀 Starting arela push command

📊 Table: file_stats_acme_corp_nas01_data
🎯 Scan API Target: default
🎯 Upload API Target: default → http://localhost:3010
📦 Fetch Batch Size: 100
📤 Upload Batch Size: 10

📊 Fetching initial push statistics...

📈 Initial Status:
Total with arela_path: 3500
Uploaded: 2800
Pending: 650
Errors: 40

📊 Top RFCs:
PED781129JT6: 140/150 (93.3%)
ABC123456XYZ: 95/100 (95.0%)
DEF789012MNO: 180/200 (90.0%)

🚀 Uploading 650 pending files...

📤 Uploading |████████████████████| 100% | 650/650 files | 45 files/sec | ✓ 600 ✗ 50

📊 Results:
Files Processed: 650
Uploaded: 600
Errors: 50
Duration: 14.4s
Speed: 45 files/sec

📈 Final Status:
Total with arela_path: 3500
Uploaded: 3400
Pending: 50
Errors: 50

📊 Top RFCs:
PED781129JT6: 150/150 (100.0%)
ABC123456XYZ: 100/100 (100.0%)
DEF789012MNO: 195/200 (97.5%)

⚠️ 5 files reached max upload attempts.
Review errors and increase max_upload_attempts if needed.

✅ Push Complete!
```

## Backend Endpoints Used

```
GET   /api/uploader/scan/files-for-push?tableName=X&rfcs=...&years=...&offset=0&limit=100
PATCH /api/uploader/scan/batch-update-upload?tableName=X
GET   /api/uploader/scan/push-stats?tableName=X
POST  /api/storage/detect-and-upload-file   (for actual file uploads)
```

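Here is a minimal sketch of building one page request for the `files-for-push` endpoint with the standard `URL` API. Several details are assumptions: encoding `rfcs`/`years` as comma-joined values, omitting empty filters, and the `buildFilesForPushUrl` name. It also assumes the base URL has no path prefix, because an endpoint path starting with `/` replaces any path on the base.

```typescript
// Hypothetical helper: build one paginated files-for-push request URL.
// Assumption: list filters are sent comma-joined and omitted when empty.
function buildFilesForPushUrl(
  baseUrl: string,
  tableName: string,
  filters: { rfcs: string[]; years: number[] },
  offset: number,
  limit: number,
): string {
  const url = new URL("/api/uploader/scan/files-for-push", baseUrl);
  url.searchParams.set("tableName", tableName);
  if (filters.rfcs.length > 0) url.searchParams.set("rfcs", filters.rfcs.join(","));
  if (filters.years.length > 0) url.searchParams.set("years", filters.years.join(","));
  url.searchParams.set("offset", String(offset));
  url.searchParams.set("limit", String(limit));
  // URLSearchParams percent-encodes reserved characters such as ",".
  return url.toString();
}

// Usage: first page of 100 rows for one RFC and two years.
console.log(
  buildFilesForPushUrl(
    "http://localhost:3010",
    "file_stats_acme_corp_nas01_data",
    { rfcs: ["PED781129JT6"], years: [2023, 2024] },
    0,
    100,
  ),
);
```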
## Database Fields Updated

| Field | Type | Description |
|-------|------|-------------|
| `upload_attempted_at` | TIMESTAMP | When upload was attempted |
| `upload_attempts` | INTEGER | Number of upload attempts |
| `upload_error` | TEXT | Error if upload failed |
| `uploaded_at` | TIMESTAMP | When file was successfully uploaded |
| `uploaded_to_storage_id` | UUID | Reference to storage.storage record |
| `upload_path` | TEXT | Final path where file was uploaded |

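The tracking columns above change together on every attempt. The sketch below shows one plausible mapping from a single upload outcome to those columns. The `UploadOutcome` type, `buildUploadUpdate`, and the rule that a successful upload also counts as an attempt are assumptions. The real `batch-update-upload` payload may be shaped differently.

```typescript
// Hypothetical sketch: map one upload outcome onto the tracking columns.
// Column names mirror the table above; the object shape is an assumption.
type UploadOutcome =
  | { ok: true; storageId: string; uploadPath: string }
  | { ok: false; error: string };

interface UploadTrackingUpdate {
  upload_attempted_at: string;
  upload_attempts: number;
  upload_error: string | null;
  uploaded_at: string | null;
  uploaded_to_storage_id: string | null;
  upload_path: string | null;
}

function buildUploadUpdate(
  previousAttempts: number,
  outcome: UploadOutcome,
  now: Date = new Date(),
): UploadTrackingUpdate {
  const attemptedAt = now.toISOString();
  const base = {
    upload_attempted_at: attemptedAt,
    upload_attempts: previousAttempts + 1, // every try counts, success or not
  };
  if (outcome.ok) {
    return {
      ...base,
      upload_error: null, // clear any error left by an earlier attempt
      uploaded_at: attemptedAt,
      uploaded_to_storage_id: outcome.storageId,
      upload_path: outcome.uploadPath,
    };
  }
  return {
    ...base,
    upload_error: outcome.error,
    uploaded_at: null,
    uploaded_to_storage_id: null,
    upload_path: null,
  };
}
```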
## Upload Logic

### Upload Path Construction

Files are uploaded using their `arela_path`:

```
arela_path format: RFC/Year/Patente/Aduana/Pedimento/
Example: PED781129JT6/2023/3429/07/3019796/

Final upload path: {arela_path}{file_name}
Example: PED781129JT6/2023/3429/07/3019796/documento.pdf
```

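Path construction is plain string concatenation. The hypothetical `buildUploadPath` helper below assumes `arela_path` normally ends with `/`, as in the example. If the slash is missing, the helper adds it, so a malformed path does not glue its last segment to the file name.

```typescript
// Hypothetical helper: join arela_path and file_name into the upload path.
// Assumption: arela_path normally ends with "/", but a missing slash is tolerated.
function buildUploadPath(arelaPath: string, fileName: string): string {
  if (arelaPath.length === 0) {
    throw new Error("arela_path is required to build an upload path");
  }
  const prefix = arelaPath.endsWith("/") ? arelaPath : `${arelaPath}/`;
  return `${prefix}${fileName}`;
}

// Segments, in order: RFC/Year/Patente/Aduana/Pedimento/
const uploadPath = buildUploadPath("PED781129JT6/2023/3429/07/3019796/", "documento.pdf");
console.log(uploadPath); // PED781129JT6/2023/3429/07/3019796/documento.pdf
```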
### Filtering

Files are filtered by:

1. **arela_path IS NOT NULL** - Only files with a valid arela_path
2. **uploaded_at IS NULL** - Skip files that are already uploaded
3. **upload_attempts < max_upload_attempts** - Skip files that have exhausted their retries
4. **rfc** (optional) - Filter by RFC if specified
5. **detected_pedimento_year** (optional) - Filter by year if specified

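The filter itself runs server-side in SQL. The in-memory predicate below mirrors the five conditions in order, as a sketch for reasoning about which rows qualify. The `FileStatsRow` shape, including `detected_pedimento_year` as a number, and `isEligibleForPush` are hypothetical.

```typescript
// Hypothetical in-memory mirror of the push filter (the real one is SQL).
interface FileStatsRow {
  arela_path: string | null;
  uploaded_at: string | null;
  upload_attempts: number;
  max_upload_attempts: number;
  rfc: string | null;
  detected_pedimento_year: number | null;
}

function isEligibleForPush(
  row: FileStatsRow,
  filters: { rfcs: string[]; years: number[] },
): boolean {
  if (row.arela_path === null) return false; // 1. needs an arela_path
  if (row.uploaded_at !== null) return false; // 2. not uploaded yet
  if (row.upload_attempts >= row.max_upload_attempts) return false; // 3. retries left
  if (filters.rfcs.length > 0 && (row.rfc === null || !filters.rfcs.includes(row.rfc))) {
    return false; // 4. optional RFC filter
  }
  const year = row.detected_pedimento_year;
  if (filters.years.length > 0 && (year === null || !filters.years.includes(year))) {
    return false; // 5. optional year filter
  }
  return true;
}
```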
## Performance Tips

- **Large datasets**: Increase `--batch-size` to 200-500 for faster fetching
- **Faster uploads**: Increase `--upload-batch-size` to 15-20 (watch the API's capacity)
- **API latency**: Use `--push-api` to upload through a geographically closer API
- **Network issues**: Lower `--upload-batch-size` to reduce concurrent connections

## Optimization Features

### 1. **Exact Match Queries**

- Uses direct RFC and year matching (no LIKE or regex)
- Leverages optimized indexes for fast lookups

### 2. **Attempt Tracking**

- Tracks `upload_attempts` per file
- Respects `max_upload_attempts` (default: 3)
- Skips files that reached max attempts

### 3. **Batch Processing**

- Fetches files in configurable batches
- Uploads multiple files concurrently
- Real-time progress with throughput metrics

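Concurrent uploads controlled by `--upload-batch-size` could work as follows. Files are split into chunks of the configured size, and each chunk is uploaded in parallel with `Promise.allSettled`, so one failure does not abort the other uploads in the chunk. This is a sketch under that assumption: `uploadInChunks` is hypothetical, and the CLI may use a worker pool instead.

```typescript
// Hypothetical sketch: chunked concurrent uploads with per-file outcomes.
interface UploadSummary {
  uploaded: number;
  errors: number;
}

async function uploadInChunks<T>(
  files: T[],
  uploadBatchSize: number,
  uploadOne: (file: T) => Promise<void>,
): Promise<UploadSummary> {
  if (!Number.isInteger(uploadBatchSize) || uploadBatchSize < 1) {
    throw new Error("uploadBatchSize must be a positive integer");
  }
  const summary: UploadSummary = { uploaded: 0, errors: 0 };
  for (let i = 0; i < files.length; i += uploadBatchSize) {
    const chunk = files.slice(i, i + uploadBatchSize);
    // allSettled: a rejected upload is counted, not propagated.
    const results = await Promise.allSettled(chunk.map((file) => uploadOne(file)));
    for (const result of results) {
      if (result.status === "fulfilled") summary.uploaded += 1;
      else summary.errors += 1;
    }
  }
  return summary;
}
```

Chunking is simple, but each chunk waits for its slowest upload. A pool that starts a new upload as soon as one finishes would keep `uploadBatchSize` requests in flight continuously.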
### 4. **Cross-Tenant Support**

- Reads `file_stats` from one API
- Uploads files through a different API
- Useful for multi-region deployments

## Troubleshooting

| Error | Solution |
|-------|----------|
| "Configuration errors" | Set `ARELA_COMPANY_SLUG` and `ARELA_SERVER_ID` |
| "Table not found" | Run `arela scan` first |
| "No files pending upload" | Run `arela identify` and `arela propagate` first |
| "FILE_NOT_FOUND" | The file was deleted after the scan; rescan the directory |
| "UPLOAD_FAILED" | Check storage API connectivity and credentials |
| "HTTP 413 Payload Too Large" | The file exceeds the server's upload size limit |

## Monitoring Queries

```sql
-- Check upload progress
SELECT
  COUNT(*) as total,
  COUNT(*) FILTER (WHERE arela_path IS NOT NULL) as with_arela_path,
  COUNT(*) FILTER (WHERE uploaded_at IS NOT NULL) as uploaded,
  COUNT(*) FILTER (
    WHERE arela_path IS NOT NULL
      AND uploaded_at IS NULL
      AND upload_attempts < max_upload_attempts
  ) as pending
FROM cli.file_stats_<company>_<server>_<path>;

-- Upload progress by RFC
SELECT
  rfc,
  COUNT(*) as total,
  COUNT(*) FILTER (WHERE uploaded_at IS NOT NULL) as uploaded,
  COUNT(*) FILTER (WHERE uploaded_at IS NULL) as pending
FROM cli.file_stats_<company>_<server>_<path>
WHERE arela_path IS NOT NULL
GROUP BY rfc
ORDER BY total DESC;

-- Check upload errors
SELECT
  file_name,
  upload_error,
  upload_attempts,
  upload_attempted_at
FROM cli.file_stats_<company>_<server>_<path>
WHERE upload_error IS NOT NULL
ORDER BY upload_attempted_at DESC
LIMIT 50;

-- Files that reached max attempts
SELECT
  rfc,
  file_name,
  upload_error,
  upload_attempts
FROM cli.file_stats_<company>_<server>_<path>
WHERE upload_attempts >= max_upload_attempts
  AND uploaded_at IS NULL
ORDER BY upload_attempts DESC
LIMIT 50;
```

## Reset Upload Tracking

```sql
-- Reset all upload attempts for retry
UPDATE cli.file_stats_<company>_<server>_<path>
SET
  upload_attempts = 0,
  upload_error = NULL,
  upload_attempted_at = NULL
WHERE uploaded_at IS NULL;

-- Reset specific RFC
UPDATE cli.file_stats_<company>_<server>_<path>
SET
  upload_attempts = 0,
  upload_error = NULL,
  upload_attempted_at = NULL
WHERE rfc = 'PED781129JT6'
  AND uploaded_at IS NULL;

-- Increase max attempts for stubborn files
UPDATE cli.file_stats_<company>_<server>_<path>
SET max_upload_attempts = 5
WHERE upload_attempts >= max_upload_attempts
  AND uploaded_at IS NULL;
```

## Complete Workflow

```bash
# Step 1: Scan filesystem
arela scan

# Step 2: Identify pedimentos
arela identify

# Step 3: Propagate arela_path
arela propagate

# Step 4: Upload files
arela push

# Optional: Filter by RFC or year
arela push --rfcs RFC123456ABC --years 2023,2024
```

## Files Involved

### CLI

- `src/commands/PushCommand.js` - Main command
- `src/services/ScanApiService.js` - API communication (push methods)
- `src/config/config.js` - Configuration (push config)

### Backend

- `src/uploader/services/file-stats-table-manager.service.ts` - Table operations
- `src/uploader/services/uploader.service.ts` - Business logic
- `src/uploader/controllers/uploader.controller.ts` - REST endpoints
- `src/storage/controllers/storage.controller.ts` - File upload endpoint

@@ -0,0 +1,373 @@
# Arela Scan Command Implementation Summary

## Overview

The `arela scan` command is an optimized replacement for the legacy `arela stats --stats-only` command. It collects filesystem metadata using a streaming architecture: files are processed as they are discovered instead of loading the entire directory tree into memory, which removes the legacy command's memory bottleneck.

## Key Features

### 1. **Streaming Architecture**

- Uses `globby.stream()` to discover files on the fly
- Processes files in batches without holding all paths in memory
- Gives immediate feedback with real-time throughput metrics

### 2. **Dynamic Table Management**

- Creates instance-specific tables: `file_stats_<company>_<server>_<path>`
- Prevents table name collisions via registry validation
- Auto-generates table schema with optimized indexes

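Only the table name pattern is documented. The sketch below shows how the three identifiers might be turned into a safe PostgreSQL identifier. The sanitizing rules (lowercase, with runs of non-alphanumeric characters collapsed to `_`) and the name `buildFileStatsTableName` are assumptions. The 63-character check reflects PostgreSQL's default identifier limit: longer names are silently truncated and could then collide.

```typescript
// Hypothetical sketch: derive file_stats_<company>_<server>_<path>.
// Sanitizing rules are assumptions; PostgreSQL truncates identifiers past 63 bytes.
const MAX_PG_IDENTIFIER_LENGTH = 63;

function sanitizeSegment(value: string): string {
  return value
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_") // collapse any non-alphanumeric run into "_"
    .replace(/^_+|_+$/g, ""); // drop leading/trailing underscores
}

function buildFileStatsTableName(
  companySlug: string,
  serverId: string,
  basePathLabel: string,
): string {
  const parts = [companySlug, serverId, basePathLabel].map(sanitizeSegment);
  if (parts.some((part) => part.length === 0)) {
    throw new Error("company slug, server ID and base path label must be non-empty");
  }
  const name = `file_stats_${parts.join("_")}`;
  if (name.length > MAX_PG_IDENTIFIER_LENGTH) {
    // Refuse instead of letting PostgreSQL truncate (possible collisions).
    throw new Error(`table name exceeds ${MAX_PG_IDENTIFIER_LENGTH} characters: ${name}`);
  }
  return name;
}

console.log(buildFileStatsTableName("acme_corp", "nas01", "data"));
// file_stats_acme_corp_nas01_data
```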
### 3. **System File Filtering**

- Filters out excluded patterns before uploading to the API
- Configurable via the `SCAN_EXCLUDE_PATTERNS` environment variable
- Reduces network payload and database overhead

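Here is a sketch of exclusion filtering. It assumes `SCAN_EXCLUDE_PATTERNS` is a comma-separated list of exact file names, compared case-insensitively against the last segment of each path. The real implementation may accept glob patterns instead. `parseExcludePatterns` and `isExcluded` are hypothetical.

```typescript
// Hypothetical sketch: parse SCAN_EXCLUDE_PATTERNS and filter by file name.
// Assumption: entries are exact names (no globs), compared case-insensitively.
function parseExcludePatterns(raw: string | undefined): Set<string> {
  return new Set(
    (raw ?? "")
      .split(",")
      .map((pattern) => pattern.trim().toLowerCase())
      .filter((pattern) => pattern.length > 0),
  );
}

function isExcluded(filePath: string, excluded: Set<string>): boolean {
  // Take the last segment for both "/" and "\" separated paths.
  const baseName = filePath.split(/[\\/]/).pop() ?? filePath;
  return excluded.has(baseName.toLowerCase());
}

const excluded = parseExcludePatterns(".DS_Store,Thumbs.db");
const paths = ["/data/2023/.DS_Store", "/data/2023/pedimento.pdf"];
console.log(paths.filter((p) => !isExcluded(p, excluded))); // [ '/data/2023/pedimento.pdf' ]
```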
### 4. **Multi-Instance Support**

- Tracks multiple CLI instances via `cli_registry` table
- Enables horizontal scalability for large deployments
- Prevents configuration conflicts

## Architecture

### Backend Components

#### 1. CLI Registry Entity

**File**: `arela-api/src/uploader/entities/cli-registry.entity.ts`

```typescript
// Field summary of the registry entity (ORM decorators omitted).
// The interface name is illustrative; see cli-registry.entity.ts for the actual class.
interface CliRegistryFields {
  companySlug: string; // Customer/agency identifier
  serverId: string; // Server/NAS identifier
  basePathLabel: string; // Path label/description
  tableName: string; // Generated table name (unique)
  basePathFull: string; // Full filesystem path
  lastScanAt: Date | null; // Last scan timestamp (nullable)
  totalFiles: number; // Total files scanned
  totalSizeBytes: number; // Total size in bytes
  status: 'ACTIVE' | 'INACTIVE';
}
```

#### 2. File Stats Table Manager

**File**: `arela-api/src/uploader/services/file-stats-table-manager.service.ts`

**Key Methods**:

- `registerAndCreateTable()` - Register the instance and scaffold its table
- `bulkInsertFileStats()` - Insert stats with ON CONFLICT handling
- `recordScanCompletion()` - Update scan statistics
- `getStaleInstances()` - Find instances that have not been scanned recently

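`bulkInsertFileStats()` inserts rows with `ON CONFLICT` handling. Below is a sketch of building that kind of multi-row statement with PostgreSQL positional parameters. `buildBulkInsert` is hypothetical. The conflict target is also an assumption; a natural candidate is the `UNIQUE absolute_path` column in the schema below. PostgreSQL accepts at most 65,535 bind parameters per statement, so a 2000-row batch fits only if each row has 32 or fewer columns.

```typescript
// Hypothetical sketch: multi-row INSERT ... ON CONFLICT DO NOTHING using
// PostgreSQL positional parameters ($1, $2, ...). Table and column names are
// interpolated, so they must come from trusted code, never from user input.
const MAX_PG_PARAMS = 65535;

function buildBulkInsert(
  tableName: string,
  columns: string[],
  rows: unknown[][],
  conflictColumn: string,
): { text: string; values: unknown[] } {
  if (rows.length === 0) throw new Error("rows must not be empty");
  if (rows.length * columns.length > MAX_PG_PARAMS) {
    throw new Error("too many bind parameters; split the batch");
  }
  const values: unknown[] = [];
  const tuples = rows.map((row) => {
    if (row.length !== columns.length) {
      throw new Error("each row must have one value per column");
    }
    const placeholders = row.map((value) => {
      values.push(value);
      return `$${values.length}`; // parameter numbers are 1-based
    });
    return `(${placeholders.join(", ")})`;
  });
  const text =
    `INSERT INTO ${tableName} (${columns.join(", ")}) VALUES ${tuples.join(", ")} ` +
    `ON CONFLICT (${conflictColumn}) DO NOTHING`;
  return { text, values };
}
```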
**Dynamic Table Schema**:

```sql
CREATE TABLE file_stats_<company>_<server>_<path> (
  id UUID PRIMARY KEY,
  file_name VARCHAR(500),
  file_extension VARCHAR(30),
  directory_path TEXT,
  relative_path TEXT,
  absolute_path TEXT UNIQUE,
  size_bytes BIGINT,
  modified_at TIMESTAMP,
  scan_timestamp TIMESTAMP,
  created_at TIMESTAMP
);

-- Indexes for fast querying
CREATE INDEX ON file_stats_<...>(directory_path, file_extension);
CREATE INDEX ON file_stats_<...>(file_extension);
CREATE INDEX ON file_stats_<...>(modified_at);
CREATE INDEX ON file_stats_<...>(scan_timestamp);
CREATE INDEX ON file_stats_<...>(relative_path);
```

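The scan workflow includes a step that normalizes each discovered file into a row; it is listed under Scan Command. The sketch below maps one file onto the columns above under several assumptions: paths are POSIX (so it uses `node:path`'s `posix` variant), the extension is stored lowercase without the dot (matching the `file_extension = 'pdf'` queries under Next Steps), and `directory_path` holds an absolute path. `normalizeFileRecord` is hypothetical.

```typescript
import { posix as path } from "node:path";

// Hypothetical sketch: turn one discovered file into a row for the schema above.
// Assumptions: POSIX paths, lowercase extension without the dot, absolute directory_path.
interface FileStatsRecord {
  file_name: string;
  file_extension: string;
  directory_path: string;
  relative_path: string;
  absolute_path: string;
  size_bytes: number;
  modified_at: string;
  scan_timestamp: string;
}

function normalizeFileRecord(
  absolutePath: string,
  basePath: string,
  stats: { size: number; mtime: Date },
  scanTimestamp: Date,
): FileStatsRecord {
  const fileName = path.basename(absolutePath);
  return {
    file_name: fileName,
    file_extension: path.extname(fileName).slice(1).toLowerCase(), // ".PDF" -> "pdf"
    directory_path: path.dirname(absolutePath),
    relative_path: path.relative(basePath, absolutePath),
    absolute_path: absolutePath,
    size_bytes: stats.size,
    modified_at: stats.mtime.toISOString(),
    scan_timestamp: scanTimestamp.toISOString(),
  };
}
```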
#### 3. Uploader Controller Endpoints

**File**: `arela-api/src/uploader/controllers/uploader.controller.ts`

**New Endpoints**:

- `POST /api/uploader/scan/register` - Register CLI instance
- `POST /api/uploader/scan/batch-insert` - Bulk insert stats
- `PATCH /api/uploader/scan/complete` - Complete scan
- `GET /api/uploader/scan/instances` - List all instances
- `GET /api/uploader/scan/stale-instances` - Find stale instances
- `PATCH /api/uploader/scan/deactivate` - Deactivate instance

### Frontend (CLI) Components

#### 1. Scan Command

**File**: `arela-uploader/src/commands/ScanCommand.js`

**Workflow**:

1. Validate configuration (company slug, server ID, base path)
2. Register the instance with the API (creates the table if needed)
3. Stream files from sources using `globby.stream({stats: true})`
4. Filter out excluded patterns (system files)
5. Normalize file records (path, size, timestamps)
6. Batch the records and upload them to the API (2000 records per batch by default, `SCAN_BATCH_SIZE`)
7. Update completion statistics

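Steps 3 to 6 come down to grouping a stream into fixed-size batches. The generic async generator below sketches that pattern. It works with any `AsyncIterable`, and Node.js readable streams, such as the one `globby.stream()` returns, are async iterable. `toBatches`, `fakeFiles` and `demo` are illustrative names, not functions from the CLI.

```typescript
// Generic sketch: group an async stream of items into fixed-size batches,
// so only the current batch is held in memory.
async function* toBatches<T>(source: AsyncIterable<T>, batchSize: number): AsyncGenerator<T[]> {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error("batchSize must be a positive integer");
  }
  let batch: T[] = [];
  for await (const item of source) {
    batch.push(item);
    if (batch.length === batchSize) {
      yield batch;
      batch = []; // start a fresh batch; the old one can be garbage-collected
    }
  }
  if (batch.length > 0) yield batch; // flush the final partial batch
}

// Stand-in source; a globby stream would be consumed the same way.
async function* fakeFiles(count: number): AsyncGenerator<string> {
  for (let i = 0; i < count; i++) yield `file-${i}.pdf`;
}

async function demo(): Promise<number[]> {
  const sizes: number[] = [];
  for await (const batch of toBatches(fakeFiles(4500), 2000)) {
    sizes.push(batch.length); // here each batch would be sent to batch-insert
  }
  return sizes;
}

demo().then((sizes) => console.log(sizes)); // [ 2000, 2000, 500 ]
```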
**Progress Display**:

- Default: Throughput-based (`1,234 files | 456 files/sec`)
- With `--count-first`: Percentage-based (`45% | 1,234/2,789 files`)

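Both progress formats can be built from the processed count and the elapsed time, plus a total when `--count-first` is used. The sketch below assumes whole-number rounding, using floor for the percentage so that 100% appears only on completion. `formatThroughput` and `formatPercentage` are hypothetical, and `toLocaleString("en-US")` supplies the thousands separators.

```typescript
// Hypothetical sketch of the two progress formats.
function formatThroughput(processed: number, elapsedMs: number): string {
  const seconds = elapsedMs / 1000;
  const rate = seconds > 0 ? Math.round(processed / seconds) : 0;
  return `${processed.toLocaleString("en-US")} files | ${rate.toLocaleString("en-US")} files/sec`;
}

function formatPercentage(processed: number, total: number): string {
  // Floor so the bar never shows 100% before the last file is done.
  const pct = total > 0 ? Math.floor((processed / total) * 100) : 0;
  return `${pct}% | ${processed.toLocaleString("en-US")}/${total.toLocaleString("en-US")} files`;
}

console.log(formatThroughput(1234, 2706)); // 1,234 files | 456 files/sec
console.log(formatPercentage(1234, 2789)); // 44% | 1,234/2,789 files
```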
#### 2. Scan API Service

**File**: `arela-uploader/src/services/ScanApiService.js`

**Features**:

- HTTP connection pooling for performance
- Automatic retry and error handling
- Support for all scan endpoints

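The document does not say how the automatic retry works. One common approach is exponential backoff, sketched below. The attempt count, the 500 ms base delay, and retrying on every error are assumptions, not `ScanApiService` defaults, and `withRetry` is hypothetical. Production code would usually retry only transient failures such as `ECONNREFUSED` or HTTP 5xx.

```typescript
// Hypothetical sketch: exponential-backoff retry around any async call.
// Defaults are illustrative assumptions, not the CLI's configuration.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        const delay = baseDelayMs * 2 ** (attempt - 1); // 500, 1000, 2000, ... ms
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError; // all attempts failed: surface the last error
}
```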
#### 3. Configuration

**File**: `arela-uploader/src/config/config.js`

**New Configuration Methods**:

- `#loadScanConfig()` - Load scan environment variables
- `validateScanConfig()` - Validate required settings
- `getScanConfig()` - Get scan configuration object

**Environment Variables** (`.env.template`):

```bash
# Required
ARELA_COMPANY_SLUG=acme_corp
ARELA_SERVER_ID=nas01

# Optional
ARELA_BASE_PATH_LABEL=data # Auto-derived if not set
SCAN_EXCLUDE_PATTERNS=.DS_Store,Thumbs.db,...
SCAN_BATCH_SIZE=2000
```

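The checks `validateScanConfig()` performs might look like the sketch below, which produces the error format shown under Error Handling. Only the variable names come from the template. The return shape, the `2000` fallback for `SCAN_BATCH_SIZE`, and the functions `loadScanConfig` and `formatConfigErrors` are assumptions.

```typescript
// Hypothetical sketch of scan config loading and validation.
interface ScanConfig {
  companySlug: string;
  serverId: string;
  basePathLabel?: string; // undefined means "derive from the base path"
  batchSize: number;
}

function loadScanConfig(
  env: Record<string, string | undefined>,
): { config?: ScanConfig; errors: string[] } {
  const errors: string[] = [];
  const companySlug = env.ARELA_COMPANY_SLUG?.trim();
  const serverId = env.ARELA_SERVER_ID?.trim();
  if (!companySlug) errors.push("ARELA_COMPANY_SLUG is required");
  if (!serverId) errors.push("ARELA_SERVER_ID is required");

  const batchSize = Number(env.SCAN_BATCH_SIZE ?? "2000");
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    errors.push("SCAN_BATCH_SIZE must be a positive integer");
  }

  // Re-checking both fields lets TypeScript narrow them to string.
  if (errors.length > 0 || !companySlug || !serverId) return { errors };
  const basePathLabel = env.ARELA_BASE_PATH_LABEL?.trim() || undefined;
  return { config: { companySlug, serverId, basePathLabel, batchSize }, errors };
}

function formatConfigErrors(errors: string[]): string {
  return ["Scan configuration errors:", ...errors.map((e) => `- ${e}`)].join("\n");
}

const { errors } = loadScanConfig({});
console.log(formatConfigErrors(errors));
```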
## Usage

### Basic Scan

```bash
# Scan files and upload statistics
arela scan
```

### With Progress Percentage

```bash
# Count files first, then show percentage progress (slower start)
arela scan --count-first
```

### With Different API Target

```bash
# Use specific API instance
arela scan --api cliente
```

### List Scan Instances

```bash
# View all registered instances via API
curl -H "x-api-key: $TOKEN" \
  http://localhost:3010/api/uploader/scan/instances
```

## Performance Characteristics

### Memory Usage

- **Legacy `stats`**: O(n) - Loads all n file paths into memory
- **New `scan`**: O(1) with respect to the number of files - Streams files and keeps only the current batch in memory

### Network Efficiency

- Batch size: 2000 records per API call
- Connection pooling: Reuses HTTP connections
- System file filtering: Reduces payload by ~5-10%

### Database Performance

- Bulk inserts: 2000 records per transaction
- `ON CONFLICT DO NOTHING`: Skips duplicates efficiently
- Optimized indexes: Support the next phases (identify, propagate)

## Migration Path

The new `arela scan` command is designed for **backward compatibility**. Existing installations using `arela stats --stats-only` will continue to work unchanged.

### Current Command (Legacy)

```bash
arela stats --stats-only
```

- Uses `uploader` table
- Loads entire directory tree in memory
- Synchronous file stat collection

### New Command (Optimized)

```bash
arela scan
```

- Uses dynamic `file_stats_*` tables
- Streams files as discovered
- Parallel-capable architecture

Both commands can coexist. The legacy command remains for backward compatibility, while new deployments should use `arela scan`.

## Next Steps

### Phase 2: arela identify

Extract pedimento numbers from PDFs, similar to current `detect --detect-pdfs` but optimized:

- Query: `SELECT * FROM file_stats_X WHERE file_extension = 'pdf'`
- Process PDFs in parallel with worker pool
- Update detection results in place

### Phase 3: arela propagate

Propagate metadata from pedimentos to related files:

- Query: `SELECT * FROM file_stats_X WHERE file_extension = 'pdf' AND <detected>`
- Use directory_path for efficient grouping
- Update related files with arela_path

### Phase 4: arela push

Upload files to final destination:

- Query: `SELECT * FROM file_stats_X WHERE arela_path IS NOT NULL`
- Process by RFC and folder structure
- Mark as uploaded

## Database Migration

A TypeORM migration is needed to create the `cli_registry` table:

```bash
# Generate migration
npm run migration:generate -- -n CreateCliRegistry

# Run migration
npm run migration:run
```

The migration creates the `cli_registry` table and its indexes. Initial records can be seeded afterward if needed.

## Monitoring

Track scan performance with these queries:

```sql
-- Active scan instances
SELECT company_slug, server_id, base_path_label, table_name,
       last_scan_at, total_files, total_size_bytes
FROM cli_registry
WHERE status = 'ACTIVE'
ORDER BY last_scan_at DESC;

-- Stale instances (no scan > 90 days)
SELECT company_slug, server_id, table_name, last_scan_at,
       AGE(NOW(), last_scan_at) as age
FROM cli_registry
WHERE status = 'ACTIVE'
  AND (last_scan_at IS NULL OR last_scan_at < NOW() - INTERVAL '90 days')
ORDER BY last_scan_at ASC NULLS FIRST;

-- Table sizes
SELECT schemaname, tablename,
       pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as size
FROM pg_tables
WHERE tablename LIKE 'file_stats_%'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
```

## Error Handling

### Common Errors

**1. Missing Configuration**

```
Error: Scan configuration errors:
- ARELA_COMPANY_SLUG is required
- ARELA_SERVER_ID is required
```

**Solution**: Set environment variables in `.env`

**2. Table Name Collision**

```
Error 409: Table name 'file_stats_...' already exists with different configuration
```

**Solution**: Change one of the identifiers or deactivate the existing instance

**3. API Connection Failure**

```
Error: API request failed: ECONNREFUSED
```

**Solution**: Verify `ARELA_API_URL` and ensure backend is running

## Testing

### Backend Tests

```bash
cd arela-api
npm test -- file-stats-table-manager.service.spec.ts
```

### CLI Tests

```bash
cd arela-uploader
# Set test environment variables
export ARELA_COMPANY_SLUG=test_company
export ARELA_SERVER_ID=test_server
export UPLOAD_BASE_PATH=/path/to/test/data

# Run scan
node src/index.js scan
```

## Performance Benchmarks

Tested on a directory with 100,000 files:

| Command | Memory | Time | Throughput |
| -------------- | ------ | ---- | ------------- |
| Legacy `stats` | 1.2 GB | 180s | 555 files/sec |
| New `scan` | 150 MB | 120s | 833 files/sec |

**Improvements**:

- 8x less memory usage
- 33% shorter execution time (180s → 120s)
- 50% higher throughput

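The improvement figures follow directly from the benchmark table, taking 1.2 GB as 1200 MB:

```latex
\begin{aligned}
\text{throughput}_{\text{legacy}} &= \frac{100{,}000}{180\ \text{s}} \approx 555.6\ \text{files/s} \\
\text{throughput}_{\text{scan}} &= \frac{100{,}000}{120\ \text{s}} \approx 833.3\ \text{files/s} \\
\text{memory ratio} &= \frac{1200\ \text{MB}}{150\ \text{MB}} = 8 \\
\text{time reduction} &= \frac{180 - 120}{180} \approx 33.3\% \\
\text{throughput ratio} &= \frac{833.3}{555.6} \approx 1.5
\end{aligned}
```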
## Conclusion

The `arela scan` command provides a robust, scalable foundation for filesystem metadata collection. Its streaming architecture and dynamic table management enable efficient multi-instance deployments while maintaining backward compatibility with existing systems.