@danielarndt0/cnpj-db-loader 2.2.0 → 2.3.0

@@ -50,7 +50,7 @@ The importer is now split into focused modules so future performance work can re
  - `quarantine-writer`: stores bad rows without stopping long imports
  - `runner`: orchestrates the current import flow while keeping the service entry point small
 
- The project now also generates dedicated staging tables for large datasets. The CLI exposes both a one-shot command (`import`) and split commands (`import load`, `import materialize`). Staging cleanup is handled explicitly through `database cleanup staging`. The write path sends the heavy datasets to staging tables first with only light normalization, then consolidates them into a simplified final schema in dependency order while keeping the smaller catalog datasets on the final schema directly. The final schema now stays closer to the Receita layout so the API can derive richer views later without forcing every first load to pay that cost inside PostgreSQL.
+ The project now also generates dedicated staging tables for large datasets. The CLI exposes both a one-shot command (`import`) and split commands (`import load`, `import materialize`). Staging cleanup is handled explicitly through `database cleanup staging`. The write path sends the heavy datasets to staging tables first with only light normalization, then consolidates them into a simplified final schema in dependency order while keeping the smaller catalog datasets on the final schema directly. The final schema stays close to the Receita layout while also exposing `establishment_secondary_cnaes`, a normalized helper table that lets CNPJ API query one row per establishment secondary CNAE without requiring an external backfill after each load.
 
  ## Staging schema
 
package/docs/commands.md CHANGED
@@ -19,7 +19,7 @@
  | `database config test` | Test the connection using the saved or overridden URL. |
  | `database config reset` | Remove the saved PostgreSQL URL after confirmation. |
  | `database cleanup staging` | Truncate staging tables and optionally clear linked materialization checkpoints for a validated path. |
- | `database cleanup materialized` | Truncate simplified final relational tables populated by materialization in safe order for the current schema. |
+ | `database cleanup materialized` | Truncate simplified final relational tables populated by materialization, including establishment secondary CNAEs when available, in safe order. |
  | `database cleanup checkpoints` | Clear load checkpoints, materialization checkpoints, or both without truncating staging or final tables. |
  | `database cleanup plans` | Delete saved import plans. Related plan files and materialization checkpoints are removed by database cascade. |
  | `import <input>` | Run the full pipeline: plan, load validated files into staging/direct final targets, materialize staged datasets into final tables, and finalize the import plan. |
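
Note on "safe order": the `database cleanup materialized` row above does not spell out how the order is chosen. One plausible reading is that tables holding foreign keys are truncated before the tables they reference, which is the reverse of the materialization order. The sketch below illustrates that idea only. The dependency map, the `companies` table name, and both helper functions are assumptions made for illustration and are not taken from the package.

```typescript
// Hypothetical dependency map: each table lists the tables it references.
// Only `establishments` and `establishment_secondary_cnaes` are named in the docs;
// `companies` and the edges between tables are assumptions for illustration.
const references: Record<string, string[]> = {
  companies: [],
  establishments: ["companies"],
  establishment_secondary_cnaes: ["establishments"],
};

// Depth-first topological sort: referenced (parent) tables come first.
function materializationOrder(deps: Record<string, string[]>): string[] {
  const ordered: string[] = [];
  const state = new Map<string, "visiting" | "done">();

  const visit = (table: string): void => {
    if (state.get(table) === "done") return;
    if (state.get(table) === "visiting") {
      throw new Error(`Dependency cycle detected at ${table}`);
    }
    state.set(table, "visiting");
    for (const parent of deps[table] ?? []) visit(parent);
    state.set(table, "done");
    ordered.push(table);
  };

  for (const table of Object.keys(deps)) visit(table);
  return ordered;
}

// Truncation runs in reverse so that referencing tables are emptied first.
function truncationOrder(deps: Record<string, string[]>): string[] {
  return materializationOrder(deps).reverse();
}
```

Under these assumptions, `truncationOrder(references)` lists `establishment_secondary_cnaes` first and `companies` last, which matches the changed row's mention of secondary CNAEs being part of the cleanup.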
package/docs/usage.md CHANGED
@@ -67,6 +67,7 @@ cnpj-db-loader schema generate --profile staging
  - it persists the import plan in the database and reuses it on resume when the validated source files and batch size match
  - it reads files in streaming mode
  - it loads the large datasets into lightweight staging tables through PostgreSQL COPY with only light normalization in the hot path and defers heavier work to the materialization stage in dependency order
+ - during establishment materialization, it also populates `establishment_secondary_cnaes` from `secondary_cnaes_raw`, replacing the previous need for a separate API-side backfill script
  - before each staged dataset is materialized into the final schema, the importer only reconciles missing lookup/domain codes when the current final schema still requires those lookup foreign keys
  - once the file import phase ends, the terminal switches to a dedicated MATERIALIZING stage and the JSONL progress log emits heartbeat entries during long staged-to-final upserts
  - it still upserts the smaller domain datasets directly into the final schema
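
Note on the added bullet: populating `establishment_secondary_cnaes` from `secondary_cnaes_raw` amounts to turning one raw column into one row per code. A minimal sketch of that transformation follows, assuming the raw value is a comma-separated list of CNAE codes. The `SecondaryCnaeRow` shape, the `cnpj` key, and the `explodeSecondaryCnaes` helper are hypothetical, and the package may perform this step in SQL rather than in TypeScript.

```typescript
// Hypothetical row shape for the normalized helper table.
interface SecondaryCnaeRow {
  cnpj: string; // assumed establishment key, for illustration only
  cnaeCode: string; // one secondary CNAE code
}

// Split the raw column into one row per distinct, non-empty code.
// Assumes `raw` is a comma-separated list; null or empty input yields no rows.
function explodeSecondaryCnaes(cnpj: string, raw: string | null): SecondaryCnaeRow[] {
  if (!raw) return [];
  const seen = new Set<string>();
  const rows: SecondaryCnaeRow[] = [];
  for (const part of raw.split(",")) {
    const code = part.trim();
    if (code === "" || seen.has(code)) continue; // skip blanks and duplicates
    seen.add(code);
    rows.push({ cnpj, cnaeCode: code });
  }
  return rows;
}
```

Dropping duplicates and blank entries here is a design choice of the sketch, so that a unique constraint on the pair of establishment and code would not be violated. Whether the real table enforces such a constraint is not stated in the diff.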
package/package.json CHANGED
@@ -1,10 +1,20 @@
  {
  "name": "@danielarndt0/cnpj-db-loader",
- "version": "2.2.0",
+ "version": "2.3.0",
  "publishConfig": {
  "access": "public"
  },
  "description": "Practical CLI for preparing Brazilian Federal Revenue CNPJ open data for PostgreSQL.",
+ "keywords": [
+ "cnpj",
+ "federal-revenue",
+ "cli",
+ "command-line",
+ "postgresql",
+ "database",
+ "typescript",
+ "nodejs"
+ ],
  "author": "Daniel Arndt",
  "license": "MIT",
  "type": "module",