@arela/uploader 1.0.2 → 1.0.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (61)
  1. package/.env.local +316 -0
  2. package/.env.template +70 -0
  3. package/coverage/IdentifyCommand.js.html +1462 -0
  4. package/coverage/PropagateCommand.js.html +1507 -0
  5. package/coverage/PushCommand.js.html +1504 -0
  6. package/coverage/ScanCommand.js.html +1654 -0
  7. package/coverage/UploadCommand.js.html +1846 -0
  8. package/coverage/WatchCommand.js.html +4111 -0
  9. package/coverage/base.css +224 -0
  10. package/coverage/block-navigation.js +87 -0
  11. package/coverage/favicon.png +0 -0
  12. package/coverage/index.html +191 -0
  13. package/coverage/lcov-report/IdentifyCommand.js.html +1462 -0
  14. package/coverage/lcov-report/PropagateCommand.js.html +1507 -0
  15. package/coverage/lcov-report/PushCommand.js.html +1504 -0
  16. package/coverage/lcov-report/ScanCommand.js.html +1654 -0
  17. package/coverage/lcov-report/UploadCommand.js.html +1846 -0
  18. package/coverage/lcov-report/WatchCommand.js.html +4111 -0
  19. package/coverage/lcov-report/base.css +224 -0
  20. package/coverage/lcov-report/block-navigation.js +87 -0
  21. package/coverage/lcov-report/favicon.png +0 -0
  22. package/coverage/lcov-report/index.html +191 -0
  23. package/coverage/lcov-report/prettify.css +1 -0
  24. package/coverage/lcov-report/prettify.js +2 -0
  25. package/coverage/lcov-report/sort-arrow-sprite.png +0 -0
  26. package/coverage/lcov-report/sorter.js +210 -0
  27. package/coverage/lcov.info +1937 -0
  28. package/coverage/prettify.css +1 -0
  29. package/coverage/prettify.js +2 -0
  30. package/coverage/sort-arrow-sprite.png +0 -0
  31. package/coverage/sorter.js +210 -0
  32. package/docs/API_RETRY_MECHANISM.md +338 -0
  33. package/docs/ARELA_IDENTIFY_IMPLEMENTATION.md +489 -0
  34. package/docs/ARELA_IDENTIFY_QUICKREF.md +186 -0
  35. package/docs/ARELA_PROPAGATE_IMPLEMENTATION.md +581 -0
  36. package/docs/ARELA_PROPAGATE_QUICKREF.md +272 -0
  37. package/docs/ARELA_PUSH_IMPLEMENTATION.md +577 -0
  38. package/docs/ARELA_PUSH_QUICKREF.md +322 -0
  39. package/docs/ARELA_SCAN_IMPLEMENTATION.md +373 -0
  40. package/docs/ARELA_SCAN_QUICKREF.md +139 -0
  41. package/docs/CROSS_PLATFORM_PATH_HANDLING.md +593 -0
  42. package/docs/DETECTION_ATTEMPT_TRACKING.md +414 -0
  43. package/docs/MIGRATION_UPLOADER_TO_FILE_STATS.md +1020 -0
  44. package/docs/MULTI_LEVEL_DIRECTORY_SCANNING.md +494 -0
  45. package/docs/STATS_COMMAND_SEQUENCE_DIAGRAM.md +287 -0
  46. package/docs/STATS_COMMAND_SIMPLE.md +93 -0
  47. package/package.json +31 -3
  48. package/src/commands/IdentifyCommand.js +459 -0
  49. package/src/commands/PropagateCommand.js +474 -0
  50. package/src/commands/PushCommand.js +473 -0
  51. package/src/commands/ScanCommand.js +523 -0
  52. package/src/config/config.js +154 -7
  53. package/src/file-detection.js +9 -10
  54. package/src/index.js +150 -0
  55. package/src/services/ScanApiService.js +645 -0
  56. package/src/utils/PathNormalizer.js +220 -0
  57. package/tests/commands/IdentifyCommand.test.js +570 -0
  58. package/tests/commands/PropagateCommand.test.js +568 -0
  59. package/tests/commands/PushCommand.test.js +754 -0
  60. package/tests/commands/ScanCommand.test.js +382 -0
  61. package/tests/unit/PathAndTableNameGeneration.test.js +1211 -0
@@ -0,0 +1,1020 @@
# Migration Plan: Uploader Table to file_stats_* Tables

## Overview

This document outlines the strategy to migrate existing data from the legacy `uploader` table to the new `file_stats_<company>_<server>_<path>` table structure without re-uploading files or losing existing metadata.

## Goals

1. ✅ Preserve all uploaded file references and metadata
2. ✅ Create appropriate `file_stats_*` tables based on existing data
3. ✅ Maintain upload history and detection results
4. ✅ Enable continued use of new CLI commands (identify, propagate, push)
5. ✅ Zero downtime migration where possible
6. ✅ Rollback capability if issues arise

## Prerequisites

### 1. Understand Current Schema

**Uploader Table (Legacy)**:
```sql
-- Assumed schema based on implementation references
public.uploader (
  id UUID PRIMARY KEY,
  original_path TEXT UNIQUE,
  relative_path TEXT,
  file_name VARCHAR,
  file_extension VARCHAR,
  directory_path TEXT,
  size_bytes BIGINT,
  modified_at TIMESTAMP,

  -- Detection fields
  detected_type VARCHAR,
  detected_pedimento VARCHAR,
  detected_pedimento_year INTEGER,
  rfc VARCHAR,
  arela_path TEXT,
  detection_attempted_at TIMESTAMP,
  detection_error TEXT,

  -- Upload fields
  uploaded_at TIMESTAMP,
  storage_id UUID,
  upload_path TEXT,

  -- Status fields
  status VARCHAR,
  processing_status VARCHAR,

  -- Timestamps
  created_at TIMESTAMP,
  updated_at TIMESTAMP
)
```

### 2. New Schema (file_stats_*)

**File Stats Table (New)**:
```sql
cli.file_stats_<company>_<server>_<path> (
  id UUID PRIMARY KEY,
  file_name VARCHAR(500),
  file_extension VARCHAR(50),
  directory_path TEXT,
  relative_path TEXT,
  absolute_path TEXT UNIQUE,
  size_bytes BIGINT,
  modified_at TIMESTAMP,
  scan_timestamp TIMESTAMP,

  -- Detection fields
  detected_type VARCHAR(100),
  detected_pedimento VARCHAR(50),
  detected_pedimento_year INTEGER,
  rfc VARCHAR(20),
  arela_path TEXT,
  detection_attempted_at TIMESTAMP,
  detection_error TEXT,
  detection_attempts INTEGER DEFAULT 0,
  max_detection_attempts INTEGER DEFAULT 3,
  is_not_pedimento BOOLEAN DEFAULT FALSE,

  -- Propagation fields
  propagation_attempted_at TIMESTAMP,
  propagation_attempts INTEGER DEFAULT 0,
  max_propagation_attempts INTEGER DEFAULT 3,
  propagation_error TEXT,
  propagated_from_id UUID,
  needs_propagation BOOLEAN DEFAULT FALSE,

  -- Upload fields
  upload_attempted_at TIMESTAMP,
  upload_attempts INTEGER DEFAULT 0,
  max_upload_attempts INTEGER DEFAULT 3,
  upload_error TEXT,
  uploaded_at TIMESTAMP,
  uploaded_to_storage_id UUID,
  upload_path TEXT,

  -- Timestamps
  created_at TIMESTAMP DEFAULT NOW()
)
```

## Migration Strategy

### Phase 1: Analysis and Planning

#### 1.1 Analyze Existing Data Distribution

Create analysis queries to understand data structure:

```sql
-- Identify distinct company/server/path combinations
WITH path_analysis AS (
  SELECT
    -- Extract company identifier (if it exists in the path)
    -- Extract server identifier (if it exists in the path)
    -- Extract base path pattern
    -- Note: absolute paths start with '/', so SPLIT_PART position 1 is empty;
    -- the first meaningful segment is at position 2.
    SPLIT_PART(original_path, '/', 2) as path_segment_1,
    SPLIT_PART(original_path, '/', 3) as path_segment_2,
    SPLIT_PART(original_path, '/', 4) as path_segment_3,
    COUNT(*) as file_count,
    SUM(size_bytes) as total_size_bytes,
    COUNT(*) FILTER (WHERE uploaded_at IS NOT NULL) as uploaded_count,
    COUNT(*) FILTER (WHERE detected_type IS NOT NULL) as detected_count
  FROM public.uploader
  GROUP BY 1, 2, 3
)
SELECT * FROM path_analysis
ORDER BY file_count DESC
LIMIT 100;

-- Statistics by status
SELECT
  status,
  processing_status,
  COUNT(*) as count,
  COUNT(*) FILTER (WHERE uploaded_at IS NOT NULL) as uploaded,
  COUNT(*) FILTER (WHERE detected_type = 'pedimento_simplificado') as pedimentos,
  pg_size_pretty(SUM(size_bytes)) as total_size
FROM public.uploader
GROUP BY status, processing_status
ORDER BY count DESC;

-- Uploaded files by RFC and year
SELECT
  rfc,
  detected_pedimento_year,
  COUNT(*) as files,
  COUNT(*) FILTER (WHERE uploaded_at IS NOT NULL) as uploaded,
  pg_size_pretty(SUM(size_bytes)) as total_size
FROM public.uploader
WHERE rfc IS NOT NULL
GROUP BY rfc, detected_pedimento_year
ORDER BY files DESC
LIMIT 50;
```

**Output**: Save results to `migration_analysis.csv` for planning.
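
If it helps to script this step, the sketch below runs the first analysis query and writes `migration_analysis.csv`. It is a minimal example, assuming the `pg` package is installed and a connection is configured via the standard `PG*` environment variables; the query and columns mirror the analysis above.

```javascript
// analyze-uploader.js — minimal sketch; assumes `npm install pg` and PG* env vars.
import fs from 'fs';
import pg from 'pg';

const client = new pg.Client(); // reads PGHOST/PGUSER/PGDATABASE/etc. from the environment
await client.connect();

const { rows } = await client.query(`
  SELECT
    SPLIT_PART(original_path, '/', 2) AS path_segment_1,
    SPLIT_PART(original_path, '/', 3) AS path_segment_2,
    SPLIT_PART(original_path, '/', 4) AS path_segment_3,
    COUNT(*) AS file_count,
    SUM(size_bytes) AS total_size_bytes
  FROM public.uploader
  GROUP BY 1, 2, 3
  ORDER BY file_count DESC
`);

// Write a simple CSV (add quoting if your path segments can contain commas).
const header = 'path_segment_1,path_segment_2,path_segment_3,file_count,total_size_bytes';
const lines = rows.map(r =>
  [r.path_segment_1, r.path_segment_2, r.path_segment_3, r.file_count, r.total_size_bytes].join(',')
);
fs.writeFileSync('migration_analysis.csv', [header, ...lines].join('\n'));

await client.end();
console.log(`Wrote ${rows.length} rows to migration_analysis.csv`);
```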

#### 1.2 Define Instance Mapping

Create a mapping file `instance_mapping.json`:

```json
{
  "instances": [
    {
      "company_slug": "agencia_palco",
      "server_id": "nas01",
      "base_path": "/mnt/storage/documentos",
      "base_path_label": "documentos",
      "path_pattern": "/mnt/storage/documentos/%",
      "table_name": "file_stats_agencia_palco_nas01_documentos",
      "estimated_records": 1500000
    },
    {
      "company_slug": "cliente_xyz",
      "server_id": "server01",
      "base_path": "/data/archives",
      "base_path_label": "archives",
      "path_pattern": "/data/archives/%",
      "table_name": "file_stats_cliente_xyz_server01_archives",
      "estimated_records": 850000
    }
  ]
}
```

**Action**: Review with stakeholders to confirm instance definitions.
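
The `table_name` values must match what the backend generates when it registers an instance. As a sanity check while building the mapping file, a hypothetical helper along these lines can derive the expected name; the real sanitization rules live in the backend's `FileStatsTableManagerService`, so verify against that before relying on it:

```javascript
// Hypothetical helper mirroring the file_stats_<company>_<server>_<path> convention.
// Treat this as a cross-check only; the backend is the source of truth.
function deriveTableName(companySlug, serverId, basePathLabel) {
  const sanitize = (s) =>
    s
      .toLowerCase()
      .replace(/[^a-z0-9]+/g, '_') // non-alphanumerics become underscores
      .replace(/^_+|_+$/g, '');    // trim leading/trailing underscores
  return `file_stats_${sanitize(companySlug)}_${sanitize(serverId)}_${sanitize(basePathLabel)}`;
}

// Example from the mapping above:
console.log(deriveTableName('agencia_palco', 'nas01', 'documentos'));
// => file_stats_agencia_palco_nas01_documentos
```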

#### 1.3 Estimate Migration Time

```sql
-- Calculate migration metrics
SELECT
  COUNT(*) as total_records,
  pg_size_pretty(pg_total_relation_size('public.uploader')) as table_size,
  COUNT(*) / NULLIF((EXTRACT(EPOCH FROM (MAX(created_at) - MIN(created_at))) / 86400), 0) as records_per_day
FROM public.uploader;
```

**Estimates**:
- **Records per batch**: 10,000
- **Expected throughput**: ~5,000 records/second
- **For 5M records**: ~17 minutes (5,000,000 ÷ 5,000 ≈ 1,000 seconds)
- **Buffer time**: 2x ≈ 35 minutes per instance

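To turn these assumptions into per-instance numbers, a quick sketch over the mapping file from §1.2 (the 5,000 records/second figure is the planning assumption above, not a measurement):

```javascript
// estimate-migration.js — rough planning numbers from instance_mapping.json.
import fs from 'fs';

const THROUGHPUT = 5000;   // assumed records/second (see estimates above)
const BATCH_SIZE = 10000;  // records per batch
const BUFFER = 2;          // safety factor

const { instances } = JSON.parse(fs.readFileSync('./instance_mapping.json', 'utf8'));
for (const inst of instances) {
  const batches = Math.ceil(inst.estimated_records / BATCH_SIZE);
  const minutes = (inst.estimated_records / THROUGHPUT / 60) * BUFFER;
  console.log(`${inst.table_name}: ${batches} batches, ~${minutes.toFixed(0)} min with buffer`);
}
```
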
### Phase 2: Backend Implementation

#### 2.1 Create Migration Service

**File**: `arela-api/src/uploader/services/uploader-migration.service.ts`

```typescript
import { Injectable, Logger } from '@nestjs/common';
import { InjectRepository } from '@nestjs/typeorm';
import { Repository } from 'typeorm';
import { FileStatsTableManagerService } from './file-stats-table-manager.service';
import { CliRegistry } from '../entities/cli-registry.entity';

interface MigrationInstance {
  companySlug: string;
  serverId: string;
  basePath: string;
  basePathLabel: string;
  pathPattern: string;
  tableName: string;
}

interface MigrationProgress {
  tableName: string;
  totalRecords: number;
  migratedRecords: number;
  errors: number;
  startTime: Date;
  estimatedCompletion?: Date;
}

@Injectable()
export class UploaderMigrationService {
  private readonly logger = new Logger(UploaderMigrationService.name);

  constructor(
    @InjectRepository(CliRegistry)
    private cliRegistryRepo: Repository<CliRegistry>,
    private fileStatsManager: FileStatsTableManagerService,
  ) {}

  /**
   * Migrate data from the uploader table to file_stats_* tables
   */
  async migrateInstance(instance: MigrationInstance): Promise<MigrationProgress> {
    const progress: MigrationProgress = {
      tableName: instance.tableName,
      totalRecords: 0,
      migratedRecords: 0,
      errors: 0,
      startTime: new Date(),
    };

    try {
      // 1. Count records to migrate
      const countResult = await this.fileStatsManager.query(`
        SELECT COUNT(*) as count
        FROM public.uploader
        WHERE original_path LIKE $1
      `, [instance.pathPattern]);

      progress.totalRecords = parseInt(countResult[0].count);
      this.logger.log(`Starting migration for ${instance.tableName}: ${progress.totalRecords} records`);

      // 2. Register CLI instance (creates table)
      await this.fileStatsManager.registerAndCreateTable({
        companySlug: instance.companySlug,
        serverId: instance.serverId,
        basePath: instance.basePath,
        basePathLabel: instance.basePathLabel,
        sources: [],
      });

      this.logger.log(`Created table: ${instance.tableName}`);

      // 3. Migrate data in batches
      const batchSize = 10000;
      let offset = 0;

      while (offset < progress.totalRecords) {
        const batch = await this.migrateBatch(
          instance.pathPattern,
          instance.tableName,
          instance.basePath,
          offset,
          batchSize,
        );

        progress.migratedRecords += batch.inserted;
        progress.errors += batch.errors;

        // Log progress every 10 batches
        if (offset % (batchSize * 10) === 0) {
          const pct = ((progress.migratedRecords / progress.totalRecords) * 100).toFixed(1);
          this.logger.log(`Migration progress: ${pct}% (${progress.migratedRecords}/${progress.totalRecords})`);
        }

        offset += batchSize;
      }

      // 4. Update registry with migration info
      await this.cliRegistryRepo.update(
        { tableName: instance.tableName },
        {
          lastScanAt: new Date(),
          totalFiles: progress.migratedRecords,
        },
      );

      progress.estimatedCompletion = new Date();
      this.logger.log(`Migration completed for ${instance.tableName}: ${progress.migratedRecords} records`);

      return progress;
    } catch (error) {
      this.logger.error(`Migration failed for ${instance.tableName}:`, error);
      throw error;
    }
  }

  /**
   * Migrate a batch of records
   */
  private async migrateBatch(
    pathPattern: string,
    tableName: string,
    basePath: string,
    offset: number,
    limit: number,
  ): Promise<{ inserted: number; errors: number }> {
    try {
      // Fetch batch from uploader table
      const records = await this.fileStatsManager.query(`
        SELECT
          id,
          original_path,
          relative_path,
          file_name,
          file_extension,
          directory_path,
          size_bytes,
          modified_at,

          detected_type,
          detected_pedimento,
          detected_pedimento_year,
          rfc,
          arela_path,
          detection_attempted_at,
          detection_error,

          uploaded_at,
          storage_id as uploaded_to_storage_id,
          upload_path,

          created_at
        FROM public.uploader
        WHERE original_path LIKE $1
        ORDER BY created_at
        OFFSET $2 LIMIT $3
      `, [pathPattern, offset, limit]);

      if (records.length === 0) {
        return { inserted: 0, errors: 0 };
      }

      // Transform records for new schema
      const transformedRecords = records.map(record => ({
        id: record.id,
        file_name: record.file_name,
        file_extension: record.file_extension,
        directory_path: record.directory_path,
        relative_path: record.relative_path,
        absolute_path: record.original_path,
        size_bytes: record.size_bytes,
        modified_at: record.modified_at,
        scan_timestamp: record.created_at,

        // Detection fields
        detected_type: record.detected_type,
        detected_pedimento: record.detected_pedimento,
        detected_pedimento_year: record.detected_pedimento_year,
        rfc: record.rfc,
        arela_path: record.arela_path,
        detection_attempted_at: record.detection_attempted_at,
        detection_error: record.detection_error,
        detection_attempts: record.detection_attempted_at ? 1 : 0,
        max_detection_attempts: 3,
        is_not_pedimento: record.detection_error?.includes('NOT_PEDIMENTO') || false,

        // Propagation fields (initialize)
        propagation_attempts: 0,
        max_propagation_attempts: 3,
        needs_propagation: false,

        // Upload fields
        uploaded_at: record.uploaded_at,
        uploaded_to_storage_id: record.uploaded_to_storage_id,
        upload_path: record.upload_path,
        upload_attempts: record.uploaded_at ? 1 : 0,
        max_upload_attempts: 3,

        created_at: record.created_at,
      }));

      // Insert into new table using ON CONFLICT
      const insertQuery = `
        INSERT INTO cli.${tableName} (
          id, file_name, file_extension, directory_path, relative_path,
          absolute_path, size_bytes, modified_at, scan_timestamp,
          detected_type, detected_pedimento, detected_pedimento_year, rfc, arela_path,
          detection_attempted_at, detection_error, detection_attempts,
          max_detection_attempts, is_not_pedimento,
          propagation_attempts, max_propagation_attempts, needs_propagation,
          uploaded_at, uploaded_to_storage_id, upload_path, upload_attempts,
          max_upload_attempts, created_at
        )
        SELECT * FROM UNNEST(
          $1::uuid[], $2::varchar[], $3::varchar[], $4::text[], $5::text[],
          $6::text[], $7::bigint[], $8::timestamp[], $9::timestamp[],
          $10::varchar[], $11::varchar[], $12::integer[], $13::varchar[], $14::text[],
          $15::timestamp[], $16::text[], $17::integer[],
          $18::integer[], $19::boolean[],
          $20::integer[], $21::integer[], $22::boolean[],
          $23::timestamp[], $24::uuid[], $25::text[], $26::integer[],
          $27::integer[], $28::timestamp[]
        )
        ON CONFLICT (absolute_path) DO UPDATE SET
          detected_type = EXCLUDED.detected_type,
          detected_pedimento = EXCLUDED.detected_pedimento,
          detected_pedimento_year = EXCLUDED.detected_pedimento_year,
          rfc = EXCLUDED.rfc,
          arela_path = EXCLUDED.arela_path,
          uploaded_at = EXCLUDED.uploaded_at,
          uploaded_to_storage_id = EXCLUDED.uploaded_to_storage_id,
          upload_path = EXCLUDED.upload_path
      `;

      const params = [
        transformedRecords.map(r => r.id),
        transformedRecords.map(r => r.file_name),
        transformedRecords.map(r => r.file_extension),
        transformedRecords.map(r => r.directory_path),
        transformedRecords.map(r => r.relative_path),
        transformedRecords.map(r => r.absolute_path),
        transformedRecords.map(r => r.size_bytes),
        transformedRecords.map(r => r.modified_at),
        transformedRecords.map(r => r.scan_timestamp),
        transformedRecords.map(r => r.detected_type),
        transformedRecords.map(r => r.detected_pedimento),
        transformedRecords.map(r => r.detected_pedimento_year),
        transformedRecords.map(r => r.rfc),
        transformedRecords.map(r => r.arela_path),
        transformedRecords.map(r => r.detection_attempted_at),
        transformedRecords.map(r => r.detection_error),
        transformedRecords.map(r => r.detection_attempts),
        transformedRecords.map(r => r.max_detection_attempts),
        transformedRecords.map(r => r.is_not_pedimento),
        transformedRecords.map(r => r.propagation_attempts),
        transformedRecords.map(r => r.max_propagation_attempts),
        transformedRecords.map(r => r.needs_propagation),
        transformedRecords.map(r => r.uploaded_at),
        transformedRecords.map(r => r.uploaded_to_storage_id),
        transformedRecords.map(r => r.upload_path),
        transformedRecords.map(r => r.upload_attempts),
        transformedRecords.map(r => r.max_upload_attempts),
        transformedRecords.map(r => r.created_at),
      ];

      await this.fileStatsManager.query(insertQuery, params);

      return { inserted: records.length, errors: 0 };
    } catch (error) {
      this.logger.error(`Batch migration error:`, error);
      return { inserted: 0, errors: limit };
    }
  }

  /**
   * Validate migration results
   */
  async validateMigration(instance: MigrationInstance): Promise<{
    valid: boolean;
    issues: string[];
  }> {
    const issues: string[] = [];

    try {
      // 1. Count comparison
      const sourceCount = await this.fileStatsManager.query(`
        SELECT COUNT(*) as count FROM public.uploader
        WHERE original_path LIKE $1
      `, [instance.pathPattern]);

      const targetCount = await this.fileStatsManager.query(`
        SELECT COUNT(*) as count FROM cli.${instance.tableName}
      `);

      const sourceCnt = parseInt(sourceCount[0].count);
      const targetCnt = parseInt(targetCount[0].count);

      if (sourceCnt !== targetCnt) {
        issues.push(`Record count mismatch: source=${sourceCnt}, target=${targetCnt}`);
      }

      // 2. Uploaded files comparison
      const sourceUploaded = await this.fileStatsManager.query(`
        SELECT COUNT(*) as count FROM public.uploader
        WHERE original_path LIKE $1 AND uploaded_at IS NOT NULL
      `, [instance.pathPattern]);

      const targetUploaded = await this.fileStatsManager.query(`
        SELECT COUNT(*) as count FROM cli.${instance.tableName}
        WHERE uploaded_at IS NOT NULL
      `);

      const sourceUploadedCnt = parseInt(sourceUploaded[0].count);
      const targetUploadedCnt = parseInt(targetUploaded[0].count);

      if (sourceUploadedCnt !== targetUploadedCnt) {
        issues.push(`Uploaded count mismatch: source=${sourceUploadedCnt}, target=${targetUploadedCnt}`);
      }

      // 3. Detected pedimentos comparison
      const sourceDetected = await this.fileStatsManager.query(`
        SELECT COUNT(*) as count FROM public.uploader
        WHERE original_path LIKE $1 AND detected_type = 'pedimento_simplificado'
      `, [instance.pathPattern]);

      const targetDetected = await this.fileStatsManager.query(`
        SELECT COUNT(*) as count FROM cli.${instance.tableName}
        WHERE detected_type = 'pedimento_simplificado'
      `);

      const sourceDetectedCnt = parseInt(sourceDetected[0].count);
      const targetDetectedCnt = parseInt(targetDetected[0].count);

      if (sourceDetectedCnt !== targetDetectedCnt) {
        issues.push(`Detected count mismatch: source=${sourceDetectedCnt}, target=${targetDetectedCnt}`);
      }

      // 4. Sample data comparison
      const sampleCheck = await this.fileStatsManager.query(`
        SELECT
          u.id,
          u.original_path,
          u.detected_type as src_type,
          u.uploaded_at as src_uploaded,
          f.detected_type as tgt_type,
          f.uploaded_at as tgt_uploaded
        FROM public.uploader u
        LEFT JOIN cli.${instance.tableName} f ON f.id = u.id
        WHERE u.original_path LIKE $1
          AND (
            u.detected_type IS DISTINCT FROM f.detected_type
            OR u.uploaded_at IS DISTINCT FROM f.uploaded_at
          )
        LIMIT 10
      `, [instance.pathPattern]);

      if (sampleCheck.length > 0) {
        issues.push(`Found ${sampleCheck.length} records with data mismatches`);
      }

      return {
        valid: issues.length === 0,
        issues,
      };
    } catch (error) {
      issues.push(`Validation error: ${error.message}`);
      return { valid: false, issues };
    }
  }

  /**
   * Rollback migration for an instance
   */
  async rollbackMigration(tableName: string): Promise<void> {
    this.logger.warn(`Rolling back migration for ${tableName}`);

    try {
      // 1. Drop the table
      await this.fileStatsManager.query(`
        DROP TABLE IF EXISTS cli.${tableName}
      `);

      // 2. Remove registry entry
      await this.cliRegistryRepo.delete({ tableName });

      this.logger.log(`Rollback completed for ${tableName}`);
    } catch (error) {
      this.logger.error(`Rollback failed:`, error);
      throw error;
    }
  }
}
```

#### 2.2 Create Migration Controller Endpoints

**File**: `arela-api/src/uploader/controllers/uploader-migration.controller.ts`

```typescript
import { Controller, Post, Delete, Body, Query, UseGuards } from '@nestjs/common';
import { ApiKeyGuard } from '../../common/guards/api-key.guard';
import { UploaderMigrationService } from '../services/uploader-migration.service';

@Controller('api/uploader/migration')
@UseGuards(ApiKeyGuard)
export class UploaderMigrationController {
  constructor(private migrationService: UploaderMigrationService) {}

  @Post('migrate-instance')
  async migrateInstance(@Body() instance: any) {
    return await this.migrationService.migrateInstance(instance);
  }

  @Post('validate')
  async validateMigration(@Body() instance: any) {
    return await this.migrationService.validateMigration(instance);
  }

  @Delete('rollback')
  async rollbackMigration(@Query('tableName') tableName: string) {
    await this.migrationService.rollbackMigration(tableName);
    return { message: 'Rollback completed' };
  }
}
```
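
For ad-hoc testing of these endpoints outside the CLI command added in Phase 3, a minimal Node sketch (Node 18+ for the global `fetch`); the base URL and API key are placeholders, and the `x-api-key` header follows the rollback curl example in Phase 5:

```javascript
// Hypothetical smoke test for the migration endpoint; adjust URL/key to your environment.
const BASE_URL = process.env.ARELA_API_URL || 'http://localhost:3010';
const API_KEY = process.env.ARELA_API_KEY; // must be set for the ApiKeyGuard to pass

const instance = {
  companySlug: 'agencia_palco',
  serverId: 'nas01',
  basePath: '/mnt/storage/documentos',
  basePathLabel: 'documentos',
  pathPattern: '/mnt/storage/documentos/%',
  tableName: 'file_stats_agencia_palco_nas01_documentos',
};

const res = await fetch(`${BASE_URL}/api/uploader/migration/migrate-instance`, {
  method: 'POST',
  headers: { 'content-type': 'application/json', 'x-api-key': API_KEY },
  body: JSON.stringify(instance),
});
console.log(await res.json()); // MigrationProgress: totalRecords, migratedRecords, errors, ...
```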

### Phase 3: CLI Implementation

#### 3.1 Create Migration Command

**File**: `arela-uploader/src/commands/MigrateCommand.js`

```javascript
import fs from 'fs';
import chalk from 'chalk';
import cliProgress from 'cli-progress';
import { ScanApiService } from '../services/ScanApiService.js';

export class MigrateCommand {
  constructor() {
    this.apiService = new ScanApiService();
  }

  async execute(options = {}) {
    console.log(chalk.blue('\n🔄 Starting migration from uploader table to file_stats_* tables\n'));

    try {
      // 1. Load instance mapping
      const mappingPath = options.mappingFile || './instance_mapping.json';
      const mapping = JSON.parse(fs.readFileSync(mappingPath, 'utf8'));

      console.log(chalk.cyan(`📋 Found ${mapping.instances.length} instances to migrate\n`));

      // 2. Confirm migration
      if (!options.confirm) {
        console.log(chalk.yellow('⚠️  This will create new tables and migrate data.'));
        console.log(chalk.yellow('   Run with --confirm to proceed.\n'));

        mapping.instances.forEach((inst, idx) => {
          console.log(`   ${idx + 1}. ${inst.table_name} (~${inst.estimated_records.toLocaleString()} records)`);
        });

        return;
      }

      // 3. Migrate each instance
      const results = [];

      for (const instance of mapping.instances) {
        console.log(chalk.cyan(`\n📦 Migrating: ${instance.company_slug}/${instance.server_id}/${instance.base_path_label}`));
        console.log(chalk.gray(`   Table: ${instance.table_name}`));
        console.log(chalk.gray(`   Pattern: ${instance.path_pattern}`));

        const progressBar = new cliProgress.SingleBar({
          format: `   📄 {bar} {percentage}% | {value}/{total} records | {duration_formatted}`,
          barCompleteChar: '\u2588',
          barIncompleteChar: '\u2591',
        });

        try {
          // Start migration via API
          const startTime = Date.now();
          const response = await this.apiService.migrateInstance(instance);

          // Show progress (simplified — a real implementation would poll a progress endpoint)
          progressBar.start(response.totalRecords, 0);
          progressBar.update(response.migratedRecords);
          progressBar.stop();

          const elapsedSec = (Date.now() - startTime) / 1000;
          const duration = elapsedSec.toFixed(1);
          const throughput = Math.round(response.migratedRecords / elapsedSec);

          console.log(chalk.green(`   ✓ Migrated ${response.migratedRecords} records in ${duration}s (${throughput} records/sec)`));

          if (response.errors > 0) {
            console.log(chalk.yellow(`   ⚠️  ${response.errors} errors encountered`));
          }

          results.push({
            instance: instance.table_name,
            success: true,
            migrated: response.migratedRecords,
            errors: response.errors,
            duration,
          });

          // 4. Validate migration
          if (options.validate) {
            console.log(chalk.cyan('   🔍 Validating migration...'));
            const validation = await this.apiService.validateMigration(instance);

            if (validation.valid) {
              console.log(chalk.green('   ✓ Validation passed'));
            } else {
              console.log(chalk.red('   ✗ Validation failed:'));
              validation.issues.forEach(issue => {
                console.log(chalk.red(`     - ${issue}`));
              });
            }
          }
        } catch (error) {
          progressBar.stop();
          console.log(chalk.red(`   ✗ Migration failed: ${error.message}`));

          results.push({
            instance: instance.table_name,
            success: false,
            error: error.message,
          });

          if (!options.continueOnError) {
            throw error;
          }
        }
      }

      // 5. Summary
      console.log(chalk.blue('\n📊 Migration Summary\n'));

      const successful = results.filter(r => r.success).length;
      const failed = results.filter(r => !r.success).length;
      const totalMigrated = results.reduce((sum, r) => sum + (r.migrated || 0), 0);

      console.log(`   Total instances: ${results.length}`);
      console.log(chalk.green(`   ✓ Successful: ${successful}`));
      if (failed > 0) {
        console.log(chalk.red(`   ✗ Failed: ${failed}`));
      }
      console.log(`   Total records migrated: ${totalMigrated.toLocaleString()}`);

      // Save results
      const resultPath = `migration_results_${Date.now()}.json`;
      fs.writeFileSync(resultPath, JSON.stringify(results, null, 2));
      console.log(chalk.gray(`\n   Results saved to: ${resultPath}`));

      console.log(chalk.green('\n✅ Migration complete!\n'));
    } catch (error) {
      console.error(chalk.red(`\n❌ Migration failed: ${error.message}\n`));
      throw error;
    }
  }
}
```

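`MigrateCommand` calls `this.apiService.migrateInstance(...)` and `validateMigration(...)`, which are not yet part of `ScanApiService` and would need to be added. A minimal sketch of what those methods could look like follows; the endpoint paths come from §2.2 and the `x-api-key` header from the Phase 5 curl example, while the base-URL/key wiring is an assumption to adapt to the service's existing config. Note that the mapping file uses snake_case keys while the backend's `MigrationInstance` interface expects camelCase, so the payload is translated here:

```javascript
// Sketch only — align names and config access with ScanApiService's existing style.
export class MigrationApiMethods {
  constructor(baseUrl = process.env.ARELA_API_URL, apiKey = process.env.ARELA_API_KEY) {
    this.baseUrl = baseUrl; // e.g. http://localhost:3010
    this.apiKey = apiKey;
  }

  // Translate a snake_case mapping-file entry into the backend's camelCase shape.
  #toPayload(instance) {
    return {
      companySlug: instance.company_slug,
      serverId: instance.server_id,
      basePath: instance.base_path,
      basePathLabel: instance.base_path_label,
      pathPattern: instance.path_pattern,
      tableName: instance.table_name,
    };
  }

  async migrateInstance(instance) {
    return this.#post('/api/uploader/migration/migrate-instance', this.#toPayload(instance));
  }

  async validateMigration(instance) {
    return this.#post('/api/uploader/migration/validate', this.#toPayload(instance));
  }

  async #post(route, body) {
    const res = await fetch(`${this.baseUrl}${route}`, {
      method: 'POST',
      headers: { 'content-type': 'application/json', 'x-api-key': this.apiKey },
      body: JSON.stringify(body),
    });
    if (!res.ok) throw new Error(`API request failed: ${res.status} ${res.statusText}`);
    return res.json();
  }
}
```
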
#### 3.2 Register Migration Command

**File**: `arela-uploader/src/index.js`

```javascript
// Add to the setupCommands() method (and import MigrateCommand
// from './commands/MigrateCommand.js' at the top of the file)
this.program
  .command('migrate')
  .description('Migrate data from uploader table to file_stats_* tables')
  .option('--mapping-file <path>', 'Path to instance mapping JSON file', './instance_mapping.json')
  .option('--confirm', 'Confirm migration execution')
  .option('--validate', 'Validate migration results')
  .option('--continue-on-error', 'Continue migration even if an instance fails')
  .action(async (options) => {
    const command = new MigrateCommand();
    await command.execute(options);
  });
```

### Phase 4: Execution Plan

#### 4.1 Pre-Migration Checklist

- [ ] Backup `uploader` table
  ```sql
  CREATE TABLE public.uploader_backup AS SELECT * FROM public.uploader;
  ```
- [ ] Document current record counts and statistics
- [ ] Create `instance_mapping.json` based on analysis
- [ ] Test migration on staging environment
- [ ] Schedule maintenance window (if needed)
- [ ] Notify users of migration

#### 4.2 Migration Steps

**Step 1: Dry Run** (validate without execution)
```bash
cd arela-uploader
node src/index.js migrate --mapping-file instance_mapping.json
```

**Step 2: Execute Migration**
```bash
node src/index.js migrate --mapping-file instance_mapping.json --confirm --validate
```

**Step 3: Verify Results** (a scripted version follows this block)
```sql
-- Check table sizes
SELECT
  tablename,
  schemaname,
  pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as size
FROM pg_tables
WHERE schemaname = 'cli' AND tablename LIKE 'file_stats_%';

-- Check uploaded files. Plain SQL cannot parameterize table names,
-- so run this once per file_stats_* table, substituting the name:
SELECT COUNT(*) as uploaded_count
FROM cli.file_stats_agencia_palco_nas01_documentos
WHERE uploaded_at IS NOT NULL;
```

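Because the per-table check above has to be repeated for every `file_stats_*` table, a small script can enumerate them from `pg_tables` and run the count dynamically (again a sketch, assuming the `pg` package and `PG*` environment variables):

```javascript
// verify-migration.js — count uploaded records in every cli.file_stats_* table.
import pg from 'pg';

const client = new pg.Client();
await client.connect();

const { rows: tables } = await client.query(`
  SELECT tablename FROM pg_tables
  WHERE schemaname = 'cli' AND tablename LIKE 'file_stats_%'
`);

for (const { tablename } of tables) {
  // Table names come from pg_tables (not user input), but quote them defensively anyway.
  const { rows } = await client.query(
    `SELECT COUNT(*) AS uploaded_count FROM cli."${tablename}" WHERE uploaded_at IS NOT NULL`
  );
  console.log(`${tablename}: ${rows[0].uploaded_count} uploaded`);
}

await client.end();
```
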
**Step 4: Test New Commands**
```bash
# Test identify command
arela identify --show-stats

# Test propagate command
arela propagate --show-stats

# Test push command (dry run)
arela push --show-stats
```

#### 4.3 Post-Migration Tasks

1. **Update Documentation**: Update all docs to reference the new table structure
2. **Archive Legacy Data**: Keep the `uploader` table for reference, marked as deprecated
3. **Monitor Performance**: Track query performance on the new tables
4. **Optimize Indexes**: Add additional indexes based on query patterns
5. **Update Grafana Dashboards**: Point metrics to the new tables

### Phase 5: Rollback Procedure

If migration fails or issues are discovered:

**Step 1: Stop New Operations**
```bash
# Prevent new scans/uploads
# Notify users to stop CLI operations
```

**Step 2: Rollback Specific Instance**
```bash
curl -X DELETE \
  -H "x-api-key: $TOKEN" \
  "http://localhost:3010/api/uploader/migration/rollback?tableName=file_stats_company_server_path"
```

**Step 3: Rollback All Instances**
```sql
-- Drop all file_stats tables
DO $$
DECLARE
  r RECORD;
BEGIN
  FOR r IN (SELECT tablename FROM pg_tables WHERE schemaname = 'cli' AND tablename LIKE 'file_stats_%')
  LOOP
    EXECUTE format('DROP TABLE IF EXISTS cli.%I', r.tablename);
  END LOOP;
END $$;

-- Clear registry
DELETE FROM cli.cli_registry WHERE table_name LIKE 'file_stats_%';
```

**Step 4: Restore Legacy Operations**
```bash
# Re-enable legacy upload command
# Verify uploader table is intact
```

### Phase 6: Data Cleanup

After successful migration and a validation period (e.g., 30 days):

```sql
-- Optional: Archive uploader table to a separate schema
CREATE SCHEMA IF NOT EXISTS archive;
ALTER TABLE public.uploader SET SCHEMA archive;

-- Optional: Drop uploader table (after extensive validation)
-- DROP TABLE public.uploader;
```

## Migration Checklist

### Pre-Migration
- [ ] Analyze existing data distribution
- [ ] Create instance mapping file
- [ ] Backup uploader table
- [ ] Deploy migration service to backend
- [ ] Deploy migration command to CLI
- [ ] Test on staging environment
- [ ] Document rollback procedures

### During Migration
- [ ] Execute dry run
- [ ] Review dry run results
- [ ] Execute actual migration with validation
- [ ] Monitor progress and logs
- [ ] Validate record counts match
- [ ] Test sample queries on new tables

### Post-Migration
- [ ] Verify all uploaded files preserved
- [ ] Test new CLI commands (identify, propagate, push)
- [ ] Update documentation
- [ ] Monitor performance for 48 hours
- [ ] Archive or drop legacy uploader table (after validation period)

## Monitoring Queries

```sql
-- Progress monitoring during migration
SELECT
  table_name,
  last_scan_at,
  total_files,
  status
FROM cli.cli_registry
WHERE table_name LIKE 'file_stats_%'
ORDER BY last_scan_at DESC;

-- Compare source vs target counts
SELECT
  'uploader' as source,
  COUNT(*) as records,
  COUNT(*) FILTER (WHERE uploaded_at IS NOT NULL) as uploaded,
  COUNT(*) FILTER (WHERE detected_type = 'pedimento_simplificado') as pedimentos
FROM public.uploader
UNION ALL
SELECT
  'file_stats_*' as source,
  SUM(c.records),
  SUM(c.uploaded),
  SUM(c.pedimentos)
FROM (
  SELECT
    COUNT(*) as records,
    COUNT(*) FILTER (WHERE uploaded_at IS NOT NULL) as uploaded,
    COUNT(*) FILTER (WHERE detected_type = 'pedimento_simplificado') as pedimentos
  FROM cli.file_stats_agencia_palco_nas01_documentos
  -- Add a UNION ALL subquery like the above for each file_stats_* table
) c;

-- Check for orphaned records (substitute each concrete table name;
-- SQL has no wildcard table references, so repeat per table)
SELECT u.id, u.original_path
FROM public.uploader u
LEFT JOIN cli.file_stats_agencia_palco_nas01_documentos f ON f.id = u.id
WHERE f.id IS NULL
LIMIT 100;
```

## Risk Assessment

| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| Data loss during migration | Low | High | Full backup before migration, validate counts |
| Long migration time | Medium | Medium | Batch processing, run during off-peak hours |
| Table name conflicts | Low | Medium | Validate instance mapping before execution |
| Query performance degradation | Low | Medium | Create optimized indexes, monitor queries |
| Incomplete migration | Low | High | Validation step after each instance |
| Rollback needed | Low | High | Documented rollback procedures, keep uploader table |

## Success Criteria

- ✅ All records from the `uploader` table successfully migrated to appropriate `file_stats_*` tables
- ✅ Record counts match between source and target (±0.1% tolerance for concurrent operations)
- ✅ All uploaded file references preserved (uploaded_at, storage_id, upload_path)
- ✅ All detected pedimento metadata preserved (detected_type, rfc, arela_path)
- ✅ New CLI commands work correctly (identify, propagate, push)
- ✅ Query performance meets or exceeds legacy performance
- ✅ Zero data loss
- ✅ Rollback procedures tested and documented

## Timeline Estimate

**For 5M records across 3 instances:**

- Analysis and planning (Phase 1): 2-4 hours
- Backend implementation (Phase 2): 1-2 days
- CLI implementation (Phase 3): 1 day
- Staging testing: 1 day
- Production migration: 2-3 hours
- Validation period: 7-30 days
- Cleanup (Phase 6): 1 hour

**Total: 4-5 days development + validation period**