@danielarndt0/cnpj-db-loader 2.4.0-beta.2 → 2.4.0-beta.4
- package/README.md +10 -6
- package/dist/cli.js +1037 -296
- package/dist/cli.js.map +1 -1
- package/dist/index.d.ts +337 -297
- package/dist/index.js +879 -290
- package/dist/index.js.map +1 -1
- package/docs/commands.md +11 -1
- package/docs/federal-revenue.md +36 -2
- package/docs/postgres-direct.md +235 -41
- package/package.json +1 -1
package/docs/commands.md
CHANGED
|
@@ -2,6 +2,10 @@
|
|
|
2
2
|
|
|
3
3
|
| Command | Purpose |
|
|
4
4
|
| ------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
|
5
|
+
| `federal-revenue config set` | Persist Federal Revenue WebDAV settings such as share token, WebDAV URL, and user agent in the local config file. |
|
|
6
|
+
| `federal-revenue config show` | Show the effective Federal Revenue configuration. |
|
|
7
|
+
| `federal-revenue config test` | Test the configured Federal Revenue WebDAV connection. |
|
|
8
|
+
| `federal-revenue config reset` | Reset one or all persisted Federal Revenue settings. |
|
|
5
9
|
| `federal-revenue check` | Check the selected or latest Federal Revenue monthly CNPJ reference and list the remote ZIP files. |
|
|
6
10
|
| `federal-revenue download` | Download the selected Federal Revenue monthly CNPJ ZIP files with retries, `.part` files, and skip-on-existing behavior. |
|
|
7
11
|
| `federal-revenue status` | Read the local Federal Revenue manifest and report downloaded, failed, partial, and missing files. |
|
|
@@ -35,6 +39,8 @@
|
|
|
35
39
|
## Examples
|
|
36
40
|
|
|
37
41
|
```bash
|
|
42
|
+
cnpj-db-loader federal-revenue config set share-token "<public-share-token>"
|
|
43
|
+
cnpj-db-loader federal-revenue config test
|
|
38
44
|
cnpj-db-loader federal-revenue check
|
|
39
45
|
cnpj-db-loader federal-revenue check 2026-05
|
|
40
46
|
cnpj-db-loader federal-revenue download --output ./downloads --force
|
|
@@ -71,7 +77,7 @@ cnpj-db-loader quarantine show 42
|
|
|
71
77
|
## PostgreSQL direct import helper
|
|
72
78
|
|
|
73
79
|
```bash
|
|
74
|
-
cnpj-db-loader postgres generate-script <input> [--output <path>] [--dataset <dataset>] [--script-name <name>] [--source-encoding <encoding>] [-f]
|
|
80
|
+
cnpj-db-loader postgres generate-script <input> [--output <path>] [--dataset <dataset>] [--script-name <name>] [--source-encoding <encoding>] [--transaction-mode <mode>] [--include <items>] [--skip-indexes] [--skip-analyze] [-f]
|
|
75
81
|
cnpj-db-loader postgres export-csv <input> [--output <path>] [--dataset <dataset>] [--script-name <name>] [-f]
|
|
76
82
|
```
|
|
77
83
|
|
|
@@ -85,4 +91,8 @@ Options:
|
|
|
85
91
|
- `--dataset <dataset>`: generate only one dataset block.
|
|
86
92
|
- `--script-name <name>`: custom generated SQL script name.
|
|
87
93
|
- `--source-encoding <encoding>`: source file encoding for `psql` copy operations. Defaults to `UTF8`.
|
|
94
|
+
- `--transaction-mode <mode>`: generated transaction strategy: `single`, `phase` or `none`. Defaults to `single`.
|
|
95
|
+
- `--include <items>`: comma-separated generation targets such as `domains,companies,establishments,secondary-cnaes,analyze`.
|
|
96
|
+
- `--skip-indexes`: skip the generated indexes phase.
|
|
97
|
+
- `--skip-analyze`: skip the generated analyze phase.
|
|
88
98
|
- `-f, --force`: skip confirmation.
|
package/docs/federal-revenue.md
CHANGED
|
@@ -10,9 +10,14 @@ check/download -> extract -> validate -> sanitize -> import
|
|
|
10
10
|
|
|
11
11
|
The command also has the shorter alias `revenue`.
|
|
12
12
|
|
|
13
|
+
The public Federal Revenue share token is no longer embedded in the package. Configure it locally before using remote commands, or pass it with `--share-token` when running a command.
|
|
14
|
+
|
|
13
15
|
## Commands
|
|
14
16
|
|
|
15
17
|
```bash
|
|
18
|
+
cnpj-db-loader federal-revenue config set share-token "<public-share-token>"
|
|
19
|
+
cnpj-db-loader federal-revenue config show
|
|
20
|
+
cnpj-db-loader federal-revenue config test
|
|
16
21
|
cnpj-db-loader federal-revenue check
|
|
17
22
|
cnpj-db-loader federal-revenue download --output ./downloads --force
|
|
18
23
|
cnpj-db-loader federal-revenue status --output ./downloads
|
|
@@ -29,6 +34,34 @@ cnpj-db-loader revenue status 2026-05 --output ./downloads
|
|
|
29
34
|
cnpj-db-loader revenue retry 2026-05 --output ./downloads --force
|
|
30
35
|
```
|
|
31
36
|
|
|
37
|
+
## Configuration
|
|
38
|
+
|
|
39
|
+
Federal Revenue WebDAV settings are stored in the same local CNPJ DB Loader config file used by `database config`. This keeps endpoint-specific values outside the published npm package.
|
|
40
|
+
|
|
41
|
+
```bash
|
|
42
|
+
cnpj-db-loader federal-revenue config set share-token "<public-share-token>"
|
|
43
|
+
cnpj-db-loader federal-revenue config set webdav-url "https://arquivos.receitafederal.gov.br/public.php/webdav"
|
|
44
|
+
cnpj-db-loader federal-revenue config set user-agent "cnpj-db-loader federal-revenue-client"
|
|
45
|
+
cnpj-db-loader federal-revenue config show
|
|
46
|
+
cnpj-db-loader federal-revenue config test
|
|
47
|
+
cnpj-db-loader federal-revenue config reset share-token --force
|
|
48
|
+
cnpj-db-loader federal-revenue config reset --force
|
|
49
|
+
```
|
|
50
|
+
|
|
51
|
+
Configuration keys:
|
|
52
|
+
|
|
53
|
+
| Key | Purpose | Default |
|
|
54
|
+
| ------------- | ------------------------------------------------------------------------ | ---------------------------------------------------------------- |
|
|
55
|
+
| `share-token` | Public Federal Revenue share token used for WebDAV Basic authentication. | No default. Must be configured or provided with `--share-token`. |
|
|
56
|
+
| `webdav-url` | Federal Revenue WebDAV endpoint. | `https://arquivos.receitafederal.gov.br/public.php/webdav` |
|
|
57
|
+
| `user-agent` | HTTP user agent used by Federal Revenue requests. | `cnpj-db-loader federal-revenue-client` |
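For orientation, the sketch below shows how a share token could be turned into an HTTP Basic `Authorization` header. It assumes the common public-share WebDAV convention in which the token is sent as the username with an empty password. The package's actual request code is not visible in this diff, and `basic_auth_header` is a hypothetical helper name.

```bash
# Hypothetical helper: build a Basic auth header from a share token.
# Assumption: the token is the username and the password is empty.
basic_auth_header() {
  # Encode "<token>:" as base64 and strip any line wrapping.
  encoded=$(printf '%s:' "$1" | base64 | tr -d '\n')
  printf 'Authorization: Basic %s\n' "$encoded"
}

# Example with a placeholder value (not a real share token).
basic_auth_header "example-token"
```

If the endpoint expects a different credential layout, only the `printf '%s:'` line would need to change.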
|
|
58
|
+
|
|
59
|
+
Command-line overrides still have priority over persisted configuration:
|
|
60
|
+
|
|
61
|
+
```bash
|
|
62
|
+
cnpj-db-loader federal-revenue check --share-token "<token>" --base-url "https://arquivos.receitafederal.gov.br/public.php/webdav"
|
|
63
|
+
```
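The precedence described above can be pictured with a small sketch. It assumes a simple "first non-empty value wins" rule across command-line flag, persisted configuration, and default; `resolve_setting` is a hypothetical helper for illustration, not a function exposed by the package.

```bash
# Hypothetical helper: return the first non-empty value among
# command-line flag ($1), persisted config ($2), and default ($3).
resolve_setting() {
  if [ -n "$1" ]; then
    printf '%s\n' "$1"      # command-line override
  elif [ -n "$2" ]; then
    printf '%s\n' "$2"      # persisted configuration
  else
    printf '%s\n' "$3"      # default (empty for share-token)
  fi
}

# No flag given, persisted value present: the persisted value is used.
resolve_setting "" "my-agent/1.0" "cnpj-db-loader federal-revenue-client"
```

For `share-token`, which has no default, an empty result would mean the token still has to be configured or passed with `--share-token`.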
|
|
64
|
+
|
|
32
65
|
## Reference selection
|
|
33
66
|
|
|
34
67
|
By default, remote commands list the public share and select the latest available folder in the `YYYY-MM` format.
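As a rough illustration of that selection rule, the sketch below picks the latest `YYYY-MM` entry from a list of folder names. It assumes the names arrive one per line without trailing slashes; `latest_reference` is a hypothetical helper, and the loader's own selection logic may differ.

```bash
# Hypothetical helper: read folder names on stdin, keep only YYYY-MM
# names with a valid month, and print the most recent one.
latest_reference() {
  # Zero-padded YYYY-MM sorts chronologically under a plain text sort.
  grep -E '^[0-9]{4}-(0[1-9]|1[0-2])$' | sort | tail -n 1
}

printf '%s\n' "2026-04" "2026-05" "notes" "2025-12" | latest_reference
```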
|
|
@@ -215,8 +248,9 @@ cnpj-db-loader federal-revenue sync \
|
|
|
215
248
|
| `--load-batch-size <size>` | `sync` | Import load batch size. |
|
|
216
249
|
| `--materialize-batch-size <size>` | `sync` | Materialization chunk size. |
|
|
217
250
|
| `--verbose-progress` | `sync` | Show detailed import progress. |
|
|
218
|
-
| `--base-url <url>` | `check`, `download`, `retry`, `sync`
|
|
219
|
-
| `--share-token <token>` | `check`, `download`, `retry`, `sync`
|
|
251
|
+
| `--base-url <url>` | `check`, `download`, `status`, `retry`, `clean`, `sync` | Override the WebDAV base URL. |
|
|
252
|
+
| `--share-token <token>` | `check`, `download`, `status`, `retry`, `clean`, `sync` | Override the public share token. |
|
|
253
|
+
| `--user-agent <value>` | `check`, `download`, `status`, `retry`, `clean`, `sync` | Override the HTTP user agent. |
|
|
220
254
|
| `--force` | `download`, `retry`, `clean`, `sync` | Skip confirmation prompts. |
|
|
221
255
|
|
|
222
256
|
## Notes
|
package/docs/postgres-direct.md
CHANGED
|
@@ -2,7 +2,7 @@
|
|
|
2
2
|
|
|
3
3
|
The PostgreSQL direct import workflow is a hybrid path for environments where the standard resumable importer is too expensive for a full monthly load.
|
|
4
4
|
|
|
5
|
-
It keeps the safe preparation steps inside CNPJ DB Loader and moves the heaviest database load/materialization work into
|
|
5
|
+
It keeps the safe preparation steps inside CNPJ DB Loader and moves the heaviest database load/materialization work into generated `psql` scripts.
|
|
6
6
|
|
|
7
7
|
## Intended flow
|
|
8
8
|
|
|
@@ -11,8 +11,8 @@ cnpj-db-loader federal-revenue download --output ./downloads
|
|
|
11
11
|
cnpj-db-loader extract ./downloads/<reference>
|
|
12
12
|
cnpj-db-loader validate ./downloads/<reference>/extracted
|
|
13
13
|
cnpj-db-loader sanitize ./downloads/<reference>/extracted
|
|
14
|
-
cnpj-db-loader postgres generate-script ./downloads/<reference>/sanitized --output ./downloads/<reference>/postgres-direct --source-encoding UTF8 --force
|
|
15
|
-
psql "postgres://postgres:postgres@localhost:5432/cnpj" -f ./downloads/<reference>/postgres-direct/import-postgres-direct.sql
|
|
14
|
+
cnpj-db-loader postgres generate-script ./downloads/<reference>/sanitized --output ./downloads/<reference>/postgres-direct --source-encoding UTF8 --transaction-mode phase --force
|
|
15
|
+
psql -d "postgres://postgres:postgres@localhost:5432/cnpj" -f ./downloads/<reference>/postgres-direct/import-postgres-direct.sql
|
|
16
16
|
```
|
|
17
17
|
|
|
18
18
|
The loader remains responsible for:
|
|
@@ -22,7 +22,7 @@ The loader remains responsible for:
|
|
|
22
22
|
- validation
|
|
23
23
|
- sanitization
|
|
24
24
|
- preserving the sanitized Receita files without rewriting the whole dataset
|
|
25
|
-
- generating the
|
|
25
|
+
- generating the modular `psql` import scripts
|
|
26
26
|
- optionally exporting PostgreSQL-ready CSV files through `postgres export-csv` when an audit/debug CSV tree is useful
|
|
27
27
|
|
|
28
28
|
PostgreSQL is then responsible for:
|
|
@@ -34,18 +34,10 @@ PostgreSQL is then responsible for:
|
|
|
34
34
|
- `establishment_secondary_cnaes` materialization
|
|
35
35
|
- planner statistics refresh through `ANALYZE`
|
|
36
36
|
|
|
37
|
-
## Why this exists
|
|
38
|
-
|
|
39
|
-
The standard `import` command is safer and resumable, but it keeps more orchestration inside the Node.js process. That is useful for production safety, checkpoints, quarantine and incremental recovery.
|
|
40
|
-
|
|
41
|
-
The direct PostgreSQL path is optimized for bulk loading after the input files have already been sanitized. It avoids per-batch Node.js database inserts and avoids rewriting the full dataset into a second CSV tree. Instead, `psql` streams the sanitized Receita files into temporary text tables and PostgreSQL performs the value conversion and materialization with set-based SQL.
|
|
42
|
-
|
|
43
|
-
Use this when you want to benchmark or run a faster controlled load on a local machine.
|
|
44
|
-
|
|
45
37
|
## Command
|
|
46
38
|
|
|
47
39
|
```bash
|
|
48
|
-
cnpj-db-loader postgres generate-script <input> [--output <path>] [--dataset <dataset>] [--script-name <name>] [--source-encoding <encoding>] [-f]
|
|
40
|
+
cnpj-db-loader postgres generate-script <input> [--output <path>] [--dataset <dataset>] [--script-name <name>] [--source-encoding <encoding>] [--transaction-mode <mode>] [--include <items>] [--skip-indexes] [--skip-analyze] [-f]
|
|
49
41
|
```
|
|
50
42
|
|
|
51
43
|
### Arguments
|
|
@@ -58,57 +50,259 @@ cnpj-db-loader postgres generate-script <input> [--output <path>] [--dataset <da
|
|
|
58
50
|
|
|
59
51
|
| Option | Description |
|
|
60
52
|
| ------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
|
61
|
-
| `--output <path>` | Custom output directory for the generated SQL
|
|
62
|
-
| `--dataset <dataset>` | Generate
|
|
63
|
-
| `--script-name <name>` | Name of the generated
|
|
53
|
+
| `--output <path>` | Custom output directory for the generated SQL scripts and manifest. |
|
|
54
|
+
| `--dataset <dataset>` | Generate scripts only for one dataset block. Useful for debugging. |
|
|
55
|
+
| `--script-name <name>` | Name of the generated orchestrator script. Defaults to `import-postgres-direct.sql`. |
|
|
64
56
|
| `--source-encoding <encoding>` | Source file encoding used by `psql` while reading the sanitized Receita files. Defaults to `UTF8` because the current `sanitize` command writes UTF-8 output. Use `WIN1252` or `LATIN1` only for legacy sanitized files generated by older versions. |
|
|
57
|
+
| `--transaction-mode <mode>` | Transaction strategy for generated scripts: `single`, `phase` or `none`. Defaults to `single`. |
|
|
58
|
+
| `--include <items>` | Comma-separated steps to include: `domains`, `companies`, `establishments`, `partners`, `simples`, `secondary-cnaes`, `indexes`, `analyze`. |
|
|
59
|
+
| `--skip-indexes` | Do not generate the `indexes.sql` step. |
|
|
60
|
+
| `--skip-analyze` | Do not generate the `analyze.sql` step. |
|
|
65
61
|
| `-f, --force` | Skip the confirmation prompt. |
|
|
66
62
|
|
|
63
|
+
## Transaction modes
|
|
64
|
+
|
|
65
|
+
### `single`
|
|
66
|
+
|
|
67
|
+
The orchestrator wraps all included steps in one transaction:
|
|
68
|
+
|
|
69
|
+
```text
|
|
70
|
+
BEGIN
|
|
71
|
+
setup
|
|
72
|
+
load domains
|
|
73
|
+
load companies
|
|
74
|
+
load establishments
|
|
75
|
+
load partners
|
|
76
|
+
load simples
|
|
77
|
+
materialize
|
|
78
|
+
materialize secondary CNAEs
|
|
79
|
+
indexes
|
|
80
|
+
analyze
|
|
81
|
+
COMMIT
|
|
82
|
+
```
|
|
83
|
+
|
|
84
|
+
This is the safest mode because a failure rolls back the whole run, but it is also the least convenient for very large imports because a late failure requires starting over.
|
|
85
|
+
|
|
86
|
+
### `phase`
|
|
87
|
+
|
|
88
|
+
Each generated phase script wraps its own work in a transaction.
|
|
89
|
+
|
|
90
|
+
This is the recommended mode for long local runs because completed phases remain committed if a later phase fails.
|
|
91
|
+
|
|
92
|
+
### `none`
|
|
93
|
+
|
|
94
|
+
No generated transaction wrapper is added.
|
|
95
|
+
|
|
96
|
+
This mode is useful for aggressive benchmark scenarios, but it can leave partial data if a command fails.
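To make the difference between the three modes concrete, the sketch below prints what an orchestrator body could look like in each case. It is only an illustration: the layout of the real generated files is not shown in this diff, `print_orchestrator` is a hypothetical helper, and in `phase` mode each included script is assumed to carry its own `BEGIN`/`COMMIT`.

```bash
# Hypothetical sketch of an orchestrator body for each transaction mode.
# The real generated files may differ; only \ir, BEGIN and COMMIT are
# taken from the documentation.
print_orchestrator() {
  mode="$1"; shift
  [ "$mode" = "single" ] && echo "BEGIN;"
  for script in "$@"; do
    # "phase" and "none" emit only the includes at this level.
    printf '\\ir %s\n' "$script"
  done
  [ "$mode" = "single" ] && echo "COMMIT;"
  return 0
}

print_orchestrator single setup.sql load-domains.sql analyze.sql
```

Calling it with `phase` or `none` prints only the `\ir` lines, which is where a late failure no longer undoes earlier phases.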
|
|
97
|
+
|
|
67
98
|
## Output structure
|
|
68
99
|
|
|
69
|
-
The command creates a
|
|
100
|
+
The command now creates a modular PostgreSQL direct output directory:
|
|
70
101
|
|
|
71
102
|
```text
|
|
72
103
|
postgres-direct/
|
|
73
104
|
manifest.json
|
|
74
105
|
import-postgres-direct.sql
|
|
106
|
+
setup.sql
|
|
107
|
+
load-domains.sql
|
|
108
|
+
load-companies.sql
|
|
109
|
+
load-establishments.sql
|
|
110
|
+
load-partners.sql
|
|
111
|
+
load-simples.sql
|
|
112
|
+
materialize.sql
|
|
113
|
+
materialize-secondary-cnaes.sql
|
|
114
|
+
indexes.sql
|
|
115
|
+
analyze.sql
|
|
116
|
+
```
|
|
117
|
+
|
|
118
|
+
The `import-postgres-direct.sql` file is an orchestrator that runs the included phase scripts in the correct order with `\ir`.
|
|
119
|
+
|
|
120
|
+
You can execute the full flow:
|
|
121
|
+
|
|
122
|
+
```bash
|
|
123
|
+
psql -d "postgres://postgres:postgres@localhost:5432/cnpj" -f ./postgres-direct/import-postgres-direct.sql
|
|
124
|
+
```
|
|
125
|
+
|
|
126
|
+
Or execute individual phase scripts:
|
|
127
|
+
|
|
128
|
+
```bash
|
|
129
|
+
psql -d "postgres://postgres:postgres@localhost:5432/cnpj" -f ./postgres-direct/load-domains.sql
|
|
130
|
+
psql -d "postgres://postgres:postgres@localhost:5432/cnpj" -f ./postgres-direct/load-establishments.sql
|
|
131
|
+
```
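When running phases one at a time, it can help to list which phase scripts were actually generated, in execution order. The sketch below does only that; `PHASES` and `list_phase_scripts` are hypothetical names, and the order is copied from the output structure listed above. Whether every phase can run in its own `psql` session depends on how the scripts share intermediate tables, which this diff does not show.

```bash
# Phase order as listed in the output structure.
PHASES="setup load-domains load-companies load-establishments load-partners load-simples materialize materialize-secondary-cnaes indexes analyze"

# Hypothetical helper: print the phase scripts that exist in a directory,
# in execution order. Phases that were not generated are simply absent.
list_phase_scripts() {
  for phase in $PHASES; do
    [ -f "$1/$phase.sql" ] && printf '%s\n' "$1/$phase.sql"
  done
  return 0
}

# Prints nothing if the directory has not been generated yet.
list_phase_scripts ./postgres-direct
```

Each printed path can then be passed to `psql -d <connection> -f <path>` as in the examples above.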
|
|
132
|
+
|
|
133
|
+
## Partial generation
|
|
134
|
+
|
|
135
|
+
Generate only domain scripts:
|
|
136
|
+
|
|
137
|
+
```bash
|
|
138
|
+
cnpj-db-loader postgres generate-script ./downloads/<reference>/sanitized --output ./downloads/<reference>/postgres-direct --include domains --transaction-mode phase --force
|
|
139
|
+
```
|
|
140
|
+
|
|
141
|
+
Generate without indexes and analyze:
|
|
142
|
+
|
|
143
|
+
```bash
|
|
144
|
+
cnpj-db-loader postgres generate-script ./downloads/<reference>/sanitized --output ./downloads/<reference>/postgres-direct --skip-indexes --skip-analyze --force
|
|
75
145
|
```
|
|
76
146
|
|
|
77
|
-
|
|
147
|
+
Generate only establishments and secondary CNAEs:
|
|
78
148
|
|
|
79
|
-
|
|
149
|
+
```bash
|
|
150
|
+
cnpj-db-loader postgres generate-script ./downloads/<reference>/sanitized --output ./downloads/<reference>/postgres-direct --include establishments,secondary-cnaes,analyze --transaction-mode phase --force
|
|
151
|
+
```
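Because `--include` takes a free-form comma-separated list, a small pre-flight check can catch typos before a long run. The sketch below compares each item against the step names documented for the option; `validate_include` is a hypothetical helper, and the loader's own validation may behave differently.

```bash
# Step names documented for --include.
KNOWN_STEPS="domains companies establishments partners simples secondary-cnaes indexes analyze"

# Hypothetical pre-flight check: succeed only if every comma-separated
# item is a documented step name.
validate_include() {
  [ -n "$1" ] || return 1
  old_ifs=$IFS; IFS=','
  set -- $1            # split the list on commas
  IFS=$old_ifs
  for item in "$@"; do
    # Reject empty items and items containing spaces outright.
    case "$item" in ""|*" "*) echo "invalid step: '$item'" >&2; return 1 ;; esac
    case " $KNOWN_STEPS " in
      *" $item "*) ;;
      *) echo "unknown step: $item" >&2; return 1 ;;
    esac
  done
}

validate_include "establishments,secondary-cnaes,analyze" && echo "include list ok"
```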
|
|
80
152
|
|
|
81
153
|
## Generated script behavior
|
|
82
154
|
|
|
83
|
-
The generated
|
|
155
|
+
The generated scripts:
|
|
84
156
|
|
|
85
|
-
1.
|
|
86
|
-
2.
|
|
87
|
-
3.
|
|
88
|
-
4.
|
|
89
|
-
5.
|
|
90
|
-
6.
|
|
91
|
-
7.
|
|
92
|
-
8.
|
|
93
|
-
9.
|
|
94
|
-
10.
|
|
95
|
-
11. runs `ANALYZE` on the main final tables;
|
|
96
|
-
12. commits the transaction.
|
|
157
|
+
1. enable `ON_ERROR_STOP` for `psql`;
|
|
158
|
+
2. set the configured client encoding for `psql` copy operations;
|
|
159
|
+
3. load domain datasets from sanitized Receita files into temporary raw text tables;
|
|
160
|
+
4. upsert final domain tables;
|
|
161
|
+
5. load large datasets from sanitized Receita files into temporary raw text tables;
|
|
162
|
+
6. convert values inside PostgreSQL and insert them into `staging_companies`, `staging_establishments`, `staging_partners` and `staging_simples_options`;
|
|
163
|
+
7. materialize final `companies`, `establishments`, `partners` and `simples_options` tables using set-based SQL;
|
|
164
|
+
8. populate `establishment_secondary_cnaes` from `secondary_cnaes_raw`;
|
|
165
|
+
9. optionally generate an indexes phase;
|
|
166
|
+
10. optionally run `ANALYZE` on the affected tables.
|
|
97
167
|
|
|
98
|
-
The
|
|
168
|
+
The scripts do not recreate the schema. Run the normal schema first:
|
|
99
169
|
|
|
100
170
|
```bash
|
|
101
171
|
cnpj-db-loader schema generate --profile full --output ./sql/schema.sql
|
|
102
|
-
psql "postgres://postgres:postgres@localhost:5432/cnpj" -f ./sql/schema.sql
|
|
172
|
+
psql -d "postgres://postgres:postgres@localhost:5432/cnpj" -f ./sql/schema.sql
|
|
173
|
+
```
|
|
174
|
+
|
|
175
|
+
## Monitoring PostgreSQL while the import runs
|
|
176
|
+
|
|
177
|
+
The hybrid mode intentionally keeps loader checkpoints lightweight. Use PostgreSQL native views to monitor heavy work.
|
|
178
|
+
|
|
179
|
+
### Active queries
|
|
180
|
+
|
|
181
|
+
```sql
|
|
182
|
+
SELECT
|
|
183
|
+
pid,
|
|
184
|
+
now() - query_start AS duration,
|
|
185
|
+
state,
|
|
186
|
+
wait_event_type,
|
|
187
|
+
wait_event,
|
|
188
|
+
left(query, 200) AS query
|
|
189
|
+
FROM pg_stat_activity
|
|
190
|
+
WHERE datname = current_database()
|
|
191
|
+
AND state <> 'idle'
|
|
192
|
+
ORDER BY query_start;
|
|
103
193
|
```
|
|
104
194
|
|
|
105
|
-
|
|
195
|
+
### COPY progress
|
|
196
|
+
|
|
197
|
+
```sql
|
|
198
|
+
SELECT
|
|
199
|
+
pid,
|
|
200
|
+
command,
|
|
201
|
+
type,
|
|
202
|
+
bytes_processed,
|
|
203
|
+
bytes_total,
|
|
204
|
+
tuples_processed,
|
|
205
|
+
CASE
|
|
206
|
+
WHEN bytes_total > 0
|
|
207
|
+
THEN round((bytes_processed::numeric / bytes_total::numeric) * 100, 2)
|
|
208
|
+
ELSE NULL
|
|
209
|
+
END AS percent
|
|
210
|
+
FROM pg_stat_progress_copy;
|
|
211
|
+
```
|
|
212
|
+
|
|
213
|
+
With auto-refresh in `psql`:
|
|
214
|
+
|
|
215
|
+
```sql
|
|
216
|
+
SELECT
|
|
217
|
+
pid,
|
|
218
|
+
command,
|
|
219
|
+
type,
|
|
220
|
+
bytes_processed,
|
|
221
|
+
bytes_total,
|
|
222
|
+
tuples_processed,
|
|
223
|
+
CASE
|
|
224
|
+
WHEN bytes_total > 0
|
|
225
|
+
THEN round((bytes_processed::numeric / bytes_total::numeric) * 100, 2)
|
|
226
|
+
ELSE NULL
|
|
227
|
+
END AS percent
|
|
228
|
+
FROM pg_stat_progress_copy;
|
|
229
|
+
\watch 5
|
|
230
|
+
```
|
|
231
|
+
|
|
232
|
+
### Locks
|
|
233
|
+
|
|
234
|
+
```sql
|
|
235
|
+
SELECT
|
|
236
|
+
blocked.pid AS blocked_pid,
|
|
237
|
+
blocking.pid AS blocking_pid,
|
|
238
|
+
blocked_activity.query AS blocked_query,
|
|
239
|
+
blocking_activity.query AS blocking_query
|
|
240
|
+
FROM pg_catalog.pg_locks blocked
|
|
241
|
+
JOIN pg_catalog.pg_stat_activity blocked_activity
|
|
242
|
+
ON blocked_activity.pid = blocked.pid
|
|
243
|
+
JOIN pg_catalog.pg_locks blocking
|
|
244
|
+
ON blocking.locktype = blocked.locktype
|
|
245
|
+
AND blocking.database IS NOT DISTINCT FROM blocked.database
|
|
246
|
+
AND blocking.relation IS NOT DISTINCT FROM blocked.relation
|
|
247
|
+
AND blocking.page IS NOT DISTINCT FROM blocked.page
|
|
248
|
+
AND blocking.tuple IS NOT DISTINCT FROM blocked.tuple
|
|
249
|
+
AND blocking.virtualxid IS NOT DISTINCT FROM blocked.virtualxid
|
|
250
|
+
AND blocking.transactionid IS NOT DISTINCT FROM blocked.transactionid
|
|
251
|
+
AND blocking.classid IS NOT DISTINCT FROM blocked.classid
|
|
252
|
+
AND blocking.objid IS NOT DISTINCT FROM blocked.objid
|
|
253
|
+
AND blocking.objsubid IS NOT DISTINCT FROM blocked.objsubid
|
|
254
|
+
AND blocking.pid <> blocked.pid
|
|
255
|
+
JOIN pg_catalog.pg_stat_activity blocking_activity
|
|
256
|
+
ON blocking_activity.pid = blocking.pid
|
|
257
|
+
WHERE NOT blocked.granted;
|
|
258
|
+
```
|
|
259
|
+
|
|
260
|
+
### Main table sizes
|
|
261
|
+
|
|
262
|
+
```sql
|
|
263
|
+
SELECT
|
|
264
|
+
relname AS table_name,
|
|
265
|
+
pg_size_pretty(pg_total_relation_size(relid)) AS total_size
|
|
266
|
+
FROM pg_catalog.pg_statio_user_tables
|
|
267
|
+
WHERE relname IN (
|
|
268
|
+
'companies',
|
|
269
|
+
'establishments',
|
|
270
|
+
'partners',
|
|
271
|
+
'simples_options',
|
|
272
|
+
'establishment_secondary_cnaes'
|
|
273
|
+
)
|
|
274
|
+
ORDER BY pg_total_relation_size(relid) DESC;
|
|
275
|
+
```
|
|
106
276
|
|
|
107
|
-
|
|
277
|
+
### Estimated rows by table
|
|
278
|
+
|
|
279
|
+
```sql
|
|
280
|
+
SELECT
|
|
281
|
+
relname AS table_name,
|
|
282
|
+
n_live_tup AS estimated_rows,
|
|
283
|
+
n_dead_tup AS dead_rows,
|
|
284
|
+
last_analyze,
|
|
285
|
+
last_autoanalyze
|
|
286
|
+
FROM pg_stat_user_tables
|
|
287
|
+
ORDER BY n_live_tup DESC;
|
|
288
|
+
```
|
|
289
|
+
|
|
290
|
+
### PostgreSQL logs on Windows
|
|
108
291
|
|
|
109
|
-
|
|
292
|
+
```powershell
|
|
293
|
+
Get-ChildItem "C:\Program Files\PostgreSQL\16\data\log" |
|
|
294
|
+
Sort-Object LastWriteTime -Descending |
|
|
295
|
+
Select-Object -First 1 |
|
|
296
|
+
Get-Content -Tail 120
|
|
297
|
+
```
|
|
110
298
|
|
|
111
|
-
|
|
299
|
+
Event Viewer logs through PowerShell:
|
|
300
|
+
|
|
301
|
+
```powershell
|
|
302
|
+
Get-EventLog -LogName Application -Newest 80 |
|
|
303
|
+
Where-Object { $_.Source -like "*postgres*" -or $_.Message -like "*PostgreSQL*" } |
|
|
304
|
+
Format-List TimeGenerated, Source, EntryType, Message
|
|
305
|
+
```
|
|
112
306
|
|
|
113
307
|
## Windows usage
|
|
114
308
|
|
|
@@ -119,7 +313,7 @@ This is intentional. With `\copy`, the `psql` client reads local files and strea
|
|
|
119
313
|
Example:
|
|
120
314
|
|
|
121
315
|
```powershell
|
|
122
|
-
psql "postgres://postgres:postgres@localhost:5432/cnpj" -f "D:/cnpj-data/2026-05/postgres-direct/import-postgres-direct.sql"
|
|
316
|
+
psql -d "postgres://postgres:postgres@localhost:5432/cnpj" -f "D:/cnpj-data/2026-05/postgres-direct/import-postgres-direct.sql"
|
|
123
317
|
```
|
|
124
318
|
|
|
125
319
|
## Recommended comparison benchmark
|
|
@@ -131,8 +325,8 @@ To compare the standard and hybrid paths:
|
|
|
131
325
|
cnpj-db-loader import ./downloads/<reference>/sanitized --load-batch-size 500 --materialize-batch-size 50000 --verbose-progress
|
|
132
326
|
|
|
133
327
|
# Hybrid path
|
|
134
|
-
cnpj-db-loader postgres generate-script ./downloads/<reference>/sanitized --output ./downloads/<reference>/postgres-direct --source-encoding UTF8 --force
|
|
135
|
-
psql "postgres://postgres:postgres@localhost:5432/cnpj" -f ./downloads/<reference>/postgres-direct/import-postgres-direct.sql
|
|
328
|
+
cnpj-db-loader postgres generate-script ./downloads/<reference>/sanitized --output ./downloads/<reference>/postgres-direct --source-encoding UTF8 --transaction-mode phase --force
|
|
329
|
+
psql -d "postgres://postgres:postgres@localhost:5432/cnpj" -f ./downloads/<reference>/postgres-direct/import-postgres-direct.sql
|
|
136
330
|
```
|
|
137
331
|
|
|
138
332
|
Compare total duration, disk usage, PostgreSQL CPU usage, WAL growth and final row counts.
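For the duration part of that comparison, a minimal timing wrapper is sketched below. `run_timed` is a hypothetical helper that measures wall-clock seconds only; disk usage, CPU, WAL growth and row counts would still need to be collected separately.

```bash
# Hypothetical helper: run a command and report its wall-clock duration.
run_timed() {
  label="$1"; shift
  start=$(date +%s)
  status=0
  "$@" || status=$?          # keep the command's exit status
  end=$(date +%s)
  printf '%s: %ss (exit %s)\n' "$label" "$((end - start))" "$status"
  return "$status"
}

# Placeholder command; replace with the import or psql invocation.
run_timed "demo" true
```

Wrapping the standard `import` command and the hybrid `generate-script` plus `psql` pair with the same helper keeps the two measurements comparable.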
|