dbt-js 0.1.1 → 0.1.2

package/README.md CHANGED
@@ -1,325 +1,365 @@
1
- # dbt-js
2
-
3
- A minimalist dbt-like SQL transformation tool for Postgres, MySQL, SQLite, and DuckDB. Models are plain SQL `SELECT` files; dbt-js compiles them (resolving `ref()` / `source()` / `var()`), builds a dependency DAG, and executes everything inside the database in dependency order. Like dbt, it is transformation-only — it never extracts or moves data; raw data must already be in your database (or, with DuckDB, in files it can read in place).
4
-
5
- Five dependencies: `pg`, `mysql2`, `better-sqlite3`, `@duckdb/node-api`, and `csv-parse` — the database drivers are loaded lazily, so each backend only pays for its own. Plain ESM JavaScript, no build step.
6
-
7
- ## Install
8
-
9
- ```sh
10
- npm install -g dbt-js # global CLI: dbt-js <command>
11
- npx dbt-js debug # or run without installing
12
- npm install dbt-js # as a library, for embedding (see below)
13
- ```
14
-
15
- Requires Node.js >= 20.
16
-
17
- ## Quick start
18
-
19
- The published package ships just the CLI and library; the runnable examples live in the
20
- [repository](https://github.com/<you>/dbt-js). Clone it to try the fully self-contained
21
- DuckDB example (no database server needed):
22
-
23
- ```sh
24
- git clone https://github.com/<you>/dbt-js && cd dbt-js
25
- npm install
26
- cd example-duckdb
27
- node ../bin/dbt-js.js debug # check config + connectivity
28
- node ../bin/dbt-js.js seed # load seeds/*.csv
29
- node ../bin/dbt-js.js run # build all models in DAG order
30
- node ../bin/dbt-js.js test # run data tests
31
- ```
32
-
33
- (With `dbt-js` installed globally, the commands are just `dbt-js debug`, `dbt-js run`, etc.)
34
- `example/` is the same project targeting a Postgres server instead, `example-mysql/` targets MySQL (a one-line Docker server is in its README), and `example-sqlite/` targets SQLite (also serverless).
35
-
36
- ## Project layout
37
-
38
- A dbt-js project is a directory containing:
39
-
40
- ```
41
- dbtjs.config.json # connection, target schema, sources, vars
42
- models/*.sql # one SELECT per file; filename = model name
43
- seeds/*.csv # one table per file; filename = table name
44
- ```
45
-
46
- ### dbtjs.config.json
47
-
48
- ```json
49
- {
50
- "connection": {
51
- "host": "localhost",
52
- "port": 5432,
53
- "user": "me",
54
- "password": "${DBTJS_PASSWORD}",
55
- "database": "mydb"
56
- },
57
- "schema": "analytics",
58
- "sources": { "raw": { "schema": "public" } },
59
- "vars": { "start": null },
60
- "seeds": { "columnTypes": { "my_seed": { "joined_on": "date" } } }
61
- }
62
- ```
63
-
64
- For MySQL, the same shape with `"type": "mysql"` (`port` defaults to 3306):
65
-
66
- ```json
67
- {
68
- "connection": {
69
- "type": "mysql",
70
- "host": "localhost",
71
- "user": "me",
72
- "password": "${DBTJS_PASSWORD}",
73
- "database": "mydb"
74
- },
75
- "schema": "analytics"
76
- }
77
- ```
78
-
79
- For DuckDB and SQLite, the connection is just a file path (the warehouse is an embedded local file):
80
-
81
- ```json
82
- {
83
- "connection": { "type": "duckdb", "path": "./warehouse.duckdb" },
84
- "schema": "analytics"
85
- }
86
- ```
87
-
88
- ```json
89
- {
90
- "connection": { "type": "sqlite", "path": "./warehouse.db" },
91
- "schema": "analytics"
92
- }
93
- ```
94
-
95
- - `connection.type` is `"postgres"` (default), `"mysql"`, `"sqlite"`, or `"duckdb"`.
96
- - `${NAME}` in connection values is replaced from the environment (error if unset). Omit `password` entirely to let `pg` use `PGPASSWORD`.
97
- - `schema` is where all models and seeds are created (`CREATE SCHEMA IF NOT EXISTS` runs automatically).
98
- - `sources` maps a source name to a schema, used by `{{ source('name', 'table') }}`.
99
- - `vars` are defaults, overridable per-invocation with `--vars '{"start": "2026-06-01"}'`.
100
- - `seeds.columnTypes` overrides inferred CSV column types (the escape hatch for dates/timestamps).
101
-
102
- ## Models
103
-
104
- A model is a single `SELECT`. Configuration lives in one leading block comment with a JSON body:
105
-
106
- ```sql
107
- /* config: {
108
- "materialized": "incremental",
109
- "strategy": "delete+insert",
110
- "unique_key": "day",
111
- "tests": { "day": ["not_null", "unique"] }
112
- } */
113
- select ...
114
- ```
115
-
116
- No config comment means `{ "materialized": "view" }`.
117
-
118
- ### Templating
119
-
120
- | Expression | Becomes |
121
- |---|---|
122
- | `{{ ref('other_model') }}` | `"schema"."other_model"` — and declares a DAG dependency |
123
- | `{{ this }}` | the current model's own table (for incremental high-water marks) |
124
- | `{{ source('raw', 'orders') }}` | `"public"."orders"` (schema from `sources` config) |
125
- | `{{ var('start') }}` / `{{ var('x', 0) }}` | the var's value, or the default; error if neither. Inserted verbatim — quote it yourself in SQL |
126
- | `{{ batch_start }}` / `{{ batch_end }}` | the current batch window as `YYYY-MM-DD HH:MM:SS` (microbatch models only). Inserted verbatim — quote it yourself |
127
- | `{% if is_incremental() %} ... {% endif %}` | body included only on incremental runs (table exists, not `--full-refresh`) |
128
-
129
- That's the whole template language. Anything else inside `{{ }}` / `{% %}` is a compile error.
130
-
131
- ### Materializations
132
-
133
- - **view** (default): `CREATE OR REPLACE VIEW`
134
- - **table**: transactional `DROP TABLE ... CASCADE; CREATE TABLE ... AS SELECT` (atomic to readers; CASCADE-dropped downstream views are rebuilt later in the same run — for partial runs use `--select model+`)
135
- - **incremental**: first run (or `--full-refresh`) builds like a table; after that only the rows your SELECT returns are applied, via a strategy:
136
- - `append`: plain `INSERT INTO ... SELECT` (immutable event data)
137
- - `delete+insert`: requires `unique_key` (string or array); deletes matching keys then inserts, in one transaction (idempotent re-runs)
138
- - `microbatch`: splits the event-time range into aligned windows and replaces each window in its own transaction (see below)
139
-
140
- ### Hooks
141
-
142
- `pre_hook` / `post_hook` run extra SQL around a model's build — grants, indexes, `ANALYZE`, audit rows. Each is a string or array of strings, rendered with the same template language as the model body (everything except `batch_start` / `batch_end`):
143
-
144
- ```sql
145
- /* config: {
146
- "materialized": "table",
147
- "post_hook": [
148
- "create index if not exists idx_daily_revenue_day on {{ this }} (day)",
149
- "grant select on {{ this }} to reporting"
150
- ]
151
- } */
152
- select ...
153
- ```
154
-
155
- - Order: all pre-hooks → materialization → all post-hooks, each hook as its own statement.
156
- - One deliberate divergence from dbt: hooks run **outside** the materialization transaction, so they can use statements Postgres forbids inside one (`VACUUM`, `CREATE INDEX CONCURRENTLY`). A failing pre-hook aborts the model before any build; a failing post-hook marks the model FAIL but the built relation remains, so fix the hook and re-run.
157
- - Microbatch models run hooks once per model (pre-hooks before the first batch, post-hooks after the last), not per batch; post-hooks are skipped when any batch failed.
158
- - `{{ ref('x') }}` inside a hook declares a DAG dependency, same as in the body.
159
-
160
- ### Incremental pattern + backfill
161
-
162
- ```sql
163
- select date_trunc('day', created_at)::date as day, count(*) as orders
164
- from {{ ref('orders_enriched') }}
165
- {% if is_incremental() %}
166
- where created_at >= coalesce(
167
- nullif('{{ var("start", "") }}', '')::timestamptz,
168
- (select max(day) from {{ this }})::timestamptz)
169
- {% endif %}
170
- group by 1
171
- ```
172
-
173
- - Normal run: processes from the table's own high-water mark (`max(day)`).
174
- - Backfill: `dbt-js run --select daily_revenue --vars '{"start": "2026-01-01"}'` re-derives from that date; `delete+insert` makes it idempotent.
175
- - Full rebuild: `dbt-js run --select daily_revenue --full-refresh`.
176
-
177
- ### Microbatch (dbt 1.9-style)
178
-
179
- For batched, retryable backfills, use `strategy: "microbatch"`. dbt-js splits the time range into `batch_size` windows and runs each as its own transaction: `DELETE` the target rows whose `event_time` falls in the window, then `INSERT` the batch's rows. A failed batch is reported and the rest keep running.
180
-
181
- ```sql
182
- /* config: {
183
- "materialized": "incremental",
184
- "strategy": "microbatch",
185
- "event_time": "day",
186
- "begin": "2026-01-01",
187
- "batch_size": "day",
188
- "lookback": 1
189
- } */
190
- select date_trunc('day', created_at)::date as day, count(*) as orders
191
- from {{ ref('orders_enriched') }}
192
- where created_at >= '{{ batch_start }}'::timestamptz
193
- and created_at < '{{ batch_end }}'::timestamptz
194
- group by 1
195
- ```
196
-
197
- - `event_time` — column **of this model's output** bounding each batch (used by the engine's per-window DELETE).
198
- - `begin` — start of history; first run and `--full-refresh` build every batch from here.
199
- - `batch_size` — `hour` | `day` | `month` | `year`. Boundaries align to the model's `timezone` (default UTC).
200
- - `lookback` (default 1) — a normal run reprocesses the current batch plus this many previous ones (no high-water mark, same as dbt).
201
- - Backfill: `dbt-js run --select my_model --event-time-start 2026-06-02 --event-time-end 2026-06-04` rewrites exactly those windows (whole batches; end is exclusive). Idempotent by construction.
202
- - No `is_incremental()` needed — the `batch_start`/`batch_end` filter applies on every run, including the first.
203
- - If batches fail, the model exits FAIL listing the failed windows and the exact `--event-time-start/--event-time-end` retry command; other batches' work is kept.
204
-
205
- One deliberate divergence from dbt: dbt auto-filters upstream `ref()`s by their declared `event_time`; dbt-js does no hidden query rewriting — you filter your input yourself with `{{ batch_start }}` / `{{ batch_end }}`.
206
-
207
- ### Timezone
208
-
209
- Any model may set `"timezone"` in its config (a string IANA zone, default `"UTC"`):
210
-
211
- - For microbatch models it aligns each window to that zone's wall-clock. `{{ batch_start }}` / `{{ batch_end }}` are emitted as naive `YYYY-MM-DD HH:MM:SS` **wall-clock strings in that zone**, so they compare directly against a locally-stored `event_time` column. A `"day"` batch in `"America/New_York"` therefore spans local midnight-to-midnight, not UTC.
212
- - `{{ timezone }}` is available in **any** model's SQL (raw substitution — quote it yourself, e.g. `created_at at time zone '{{ timezone }}'`).
213
- - `begin`, `--event-time-start`, and `--event-time-end` given as naive strings are interpreted as wall-clock in the model's `timezone`; strings with an explicit `Z`/offset stay absolute.
214
- - DST caveat: with `batch_size: "hour"` in a DST zone the spring-forward/fall-back hour is irregular, so prefer UTC for hour-grain, or day+ grain for zoned models.
215
-
216
- ## Tests
217
-
218
- Declared per column in the model's config. Each compiles to a query returning violating rows; any row fails the test (exit code 1, with up to 10 sample rows printed).
219
-
220
- - `"not_null"` — rows where the column is NULL
221
- - `"unique"` — non-NULL values appearing more than once
222
- - `{ "accepted_values": ["a", "b"] }` — non-NULL values outside the list
223
-
224
- ## Seeds
225
-
226
- `dbt-js seed` loads each `seeds/*.csv` as a table (drop + create + insert, transactional). Column types are inferred (`integer`/`bigint`/`numeric`/`boolean`, else `text`; empty string → NULL); override per column via `seeds.columnTypes`. Models can `{{ ref('seed_name') }}` seeds.
227
-
228
- ## CLI
229
-
230
- ```
231
- dbt-js run [--select SPEC] [--full-refresh] [--vars JSON]
232
- [--event-time-start TS] [--event-time-end TS] # microbatch backfill window
233
- dbt-js test [--select SPEC] [--vars JSON]
234
- dbt-js seed [--select SPEC]
235
- dbt-js compile [--select SPEC] [--vars JSON] # print compiled SQL, no DB needed
236
- dbt-js ls # nodes in execution order
237
- dbt-js debug # config + connectivity check
238
- ```
239
-
240
- `--select` accepts comma-separated names; `+name` adds everything upstream, `name+` everything downstream (e.g. `--select orders_enriched+` rebuilds it and its dependents).
241
-
242
- On failure, downstream models are skipped and reported; exit code is 1 if anything failed.
243
-
244
- ## Embedding in a Node.js app
245
-
246
- The CLI is a thin wrapper over a programmatic API — `example-embed/` is a runnable ~70-line server using it. Install dbt-js as a dependency:
247
-
248
- ```sh
249
- npm install dbt-js
250
- ```
251
-
252
- ```js
253
- import { run, test, seed, compile, ls, debug } from 'dbt-js';
254
-
255
- const result = await run({
256
- projectDir: './analytics', // dir containing dbtjs.config.json; always pass this
257
- select: 'daily_revenue+', // optional, same syntax as --select
258
- vars: { start: '2026-06-01' }, // optional, plain object (not a JSON string)
259
- fullRefresh: false,
260
- onEvent: (e) => logger.info(e), // optional progress stream; omit for silence
261
- });
262
- // result = { ok, models: [{ name, status: 'ok'|'fail'|'skip', action, rowCount,
263
- // batchCount, failedBatches, durationMs, error }] }
264
- ```
265
-
266
- The project can also be supplied inline instead of from files — handy when connection settings live in your app's config system or model SQL is generated:
267
-
268
- ```js
269
- await run({
270
- config: { // contents of dbtjs.config.json (file not read)
271
- connection: { host: 'db', port: 5432, user: 'analytics', password: process.env.PW, database: 'warehouse' },
272
- schema: 'analytics',
273
- sources: { raw: { schema: 'public' } },
274
- },
275
- models: { // replaces models/*.sql — same format, config comment included
276
- stg_orders: "select * from {{ source('raw', 'orders') }} where deleted = false",
277
- order_counts: "/* config: { \"materialized\": \"table\" } */ select count(*) as n from {{ ref('stg_orders') }}",
278
- },
279
- });
280
- ```
281
-
282
- With both given, `projectDir` is optional — it then only anchors relative DuckDB paths and locates `seeds/` (file seeds remain `ref()`-able from inline models). Inline `config` goes through the same validation and `${ENV}` interpolation as the file; your object is not mutated.
283
-
284
- - `run` also takes `eventTimeStart` / `eventTimeEnd` for microbatch backfills. `test` → `{ ok, tests: [{ id, pass, violations, sample }] }`; `seed` → `{ ok, seeds: [...] }`; `compile` → `[{ name, materialized, sql, preHookSql, postHookSql }]` (no DB needed); `ls` → `[{ name, kind, deps }]`; `debug` → connectivity info.
285
- - Config or project errors **throw**; model/test failures come back as `ok: false` (mirrors the CLI's exit code 1).
286
- - Every call opens its own connection and closes it before returning — nothing to pool.
287
- - **Serialize runs yourself** (a one-promise queue is enough — see `example-embed/server.js`): DuckDB allows a single writer per file, so a scheduled refresh and an HTTP-triggered run must not overlap.
288
- - Relative paths are anchored to `projectDir`, not your app's cwd: the DuckDB `connection.path` is resolved against it, and `read_csv('data/...')`-style paths in model SQL resolve via DuckDB's `file_search_path`.
289
-
290
- ## DuckDB notes
291
-
292
- - `sources` resolve to schemas inside the same `.duckdb` file, exactly like Postgres schemas.
293
- - Models can call DuckDB-native readers directly — `from read_csv('data/orders.csv')` or `read_parquet('...')` — no template syntax needed; raw data files never pass through dbt-js.
294
- - DuckDB doesn't report row counts for full table builds (CTAS), so those log lines omit the count. Incremental and seed counts are reported normally.
295
- - `:memory:` is a valid path but pointless for a CLI — each invocation is a separate process, so nothing would persist between `seed` and `run`.
296
- - Attaching external databases (DuckDB `ATTACH`) is not supported in v1.
297
- - One Postgres-specific change: pre-existing **materialized views** squatting on a model's name are no longer auto-dropped (relation detection now uses `information_schema`, which can't see them); you'd get a clear Postgres error at build time instead. dbt-js itself never creates materialized views.
298
-
299
- ## MySQL notes
300
-
301
- Requires MySQL 8.0+ (`CREATE TABLE ... AS SELECT` under GTID consistency additionally needs 8.0.21+, and temp-table-in-transaction is disallowed when it's enforced).
302
-
303
- - dbt-js enables `ANSI_QUOTES` for its session, so double quotes are **identifier** quotes exactly as on Postgres/DuckDB — write string literals with single quotes in model SQL (the habit you already have from Postgres).
304
- - `schema` maps to a MySQL **database**: `CREATE SCHEMA IF NOT EXISTS` is `CREATE DATABASE`, so the connecting user needs the server-wide CREATE privilege (or pre-create the schema and grant on it — see `example-mysql/README.md`).
305
- - MySQL DDL implicitly commits, so `table` and `--full-refresh` rebuilds (DROP + CREATE TABLE AS) are **not** atomic to readers the way they are on Postgres/DuckDB. `delete+insert` and microbatch window replacement remain fully transactional.
306
- - No `CREATE INDEX IF NOT EXISTS`: use an idempotent post-hook like `analyze table {{ this }}`, or guard index creation yourself.
307
- - Seed type inference maps `numeric` to `decimal(38,10)` (bare `NUMERIC` is `DECIMAL(10,0)` on MySQL and would round); `boolean` becomes `TINYINT(1)` with `true/false` loaded as `1/0`. Override per column via `seeds.columnTypes` as usual.
308
- - Microbatch boundaries are computed in UTC and compared as `DATETIME` literals, so prefer a `DATETIME` event-time column, or set the session time zone to UTC via mysql2's `timezone` connection option.
309
- - Rows come back with `dateStrings: true` (dates as strings, JSON-safe, matching the DuckDB adapter); set `dateStrings: false` in the connection object to get JS `Date`s from the `query` API.
310
-
311
- ## SQLite notes
312
-
313
- Driver: `better-sqlite3` (synchronous: a long-running statement blocks the embedding app's event loop; irrelevant for CLI use).
314
-
315
- - `schema` maps to a **separate database file** `<schema>.db` next to `connection.path`, ATTACHed for the session (created automatically when writable). `"schema": "main"` keeps everything in the single main file — see `example-sqlite/README.md`.
316
- - SQLite DDL is transactional, so **all** rebuilds — including `table` and `--full-refresh` — are atomic, like Postgres/DuckDB. One caveat: switching `journal_mode` to WAL in a hook removes crash atomicity for transactions spanning the main and attached files.
317
- - There is no `DROP ... CASCADE`: dropping a table leaves dependent views dangling (they error when next queried) instead of dropping them.
318
- - Type affinity gotchas: never `CAST(x AS DATETIME)` — `DATETIME` gets NUMERIC affinity, truncating `'2026-06-03'` to `2026`. Store timestamps as `'YYYY-MM-DD HH:MM:SS'` text; lexicographic comparison is chronological, and microbatch window boundaries are normalized with `datetime()` so day-granularity event-time columns work too.
319
- - Seed `boolean` columns load as `1/0` (the text `'true'` would be falsy in `CASE WHEN`); `numeric` needs no special mapping (affinity stores decimals losslessly).
320
- - The read-only `query` API opens the files with SQLite's readonly flag: writes fail with `SQLITE_READONLY`, and the database files must already exist.
321
- - INTEGER values beyond 2^53 come back as imprecise JS numbers from the `query` API.
322
-
323
- ## License
324
-
325
- MIT
1
+ # dbt-js
2
+
3
+ A minimalist dbt-like SQL transformation tool for Postgres, MySQL, SQLite, and DuckDB. Models are plain SQL `SELECT` files; dbt-js compiles them (resolving `ref()` / `source()` / `var()`), builds a dependency DAG, and executes everything inside the database in dependency order. Like dbt, it is transformation-only — it never extracts or moves data; raw data must already be in your database (or, with DuckDB, in files it can read in place).
4
+
5
+ Five dependencies: `pg`, `mysql2`, `better-sqlite3`, `@duckdb/node-api`, and `csv-parse` — the database drivers are loaded lazily, so each backend only pays for its own. Plain ESM JavaScript, no build step.
6
+
7
+ ## Install
8
+
9
+ ```sh
10
+ npm install -g dbt-js # global CLI: dbt-js <command>
11
+ npx dbt-js debug # or run without installing
12
+ npm install dbt-js # as a library, for embedding (see below)
13
+ ```
14
+
15
+ Requires Node.js >= 20.
16
+
17
+ ## Quick start
18
+
19
+ DuckDB needs no database server, so a project is just two files. Create a directory containing `dbtjs.config.json` and `models/hello.sql`:
20
+
21
+ ```json
23
+ {
24
+ "connection": { "type": "duckdb", "path": "./warehouse.duckdb" },
25
+ "schema": "analytics"
26
+ }
27
+ ```
28
+
29
+ ```sql
30
+ -- models/hello.sql
31
+ select 1 as id, 'world' as greeting
32
+ ```
33
+
34
+ Then, from that directory:
35
+
36
+ ```sh
37
+ dbt-js debug # check config + connectivity
38
+ dbt-js run # build all models in DAG order
39
+ dbt-js test # run data tests
40
+ ```
41
+
42
+ (To load CSV data, add a `seeds/*.csv` file, run `dbt-js seed` first, and `ref()` the seed from a
43
+ model.) Swap the connection block for Postgres, MySQL, or SQLite — see **Project layout**
44
+ below — and the same commands work unchanged.
45
+
46
+ ## Project layout
47
+
48
+ A dbt-js project is a directory containing:
49
+
50
+ ```
51
+ dbtjs.config.json # connection, target schema, sources, vars
52
+ models/*.sql # one SELECT per file; filename = model name
53
+ seeds/*.csv # one table per file; filename = table name
54
+ ```
55
+
56
+ Model and seed names must be word characters only (`[A-Za-z0-9_]`) — `ref()` / `source()`
57
+ match `\w+`, so a name like `my-model` would be unreferenceable. Use underscores (`my_model`).
58
+
59
+ ### dbtjs.config.json
60
+
61
+ ```json
62
+ {
63
+ "connection": {
64
+ "host": "localhost",
65
+ "port": 5432,
66
+ "user": "me",
67
+ "password": "${DBTJS_PASSWORD}",
68
+ "database": "mydb"
69
+ },
70
+ "schema": "analytics",
71
+ "sources": { "raw": { "schema": "public" } },
72
+ "vars": { "start": null },
73
+ "seeds": { "columnTypes": { "my_seed": { "joined_on": "date" } } }
74
+ }
75
+ ```
76
+
77
+ For MySQL, the same shape with `"type": "mysql"` (`port` defaults to 3306):
78
+
79
+ ```json
80
+ {
81
+ "connection": {
82
+ "type": "mysql",
83
+ "host": "localhost",
84
+ "user": "me",
85
+ "password": "${DBTJS_PASSWORD}",
86
+ "database": "mydb"
87
+ },
88
+ "schema": "analytics"
89
+ }
90
+ ```
91
+
92
+ For DuckDB and SQLite, the connection is just a file path (the warehouse is an embedded local file):
93
+
94
+ ```json
95
+ {
96
+ "connection": { "type": "duckdb", "path": "./warehouse.duckdb" },
97
+ "schema": "analytics"
98
+ }
99
+ ```
100
+
101
+ ```json
102
+ {
103
+ "connection": { "type": "sqlite", "path": "./warehouse.db" },
104
+ "schema": "analytics"
105
+ }
106
+ ```
107
+
108
+ - `connection.type` is `"postgres"` (default), `"mysql"`, `"sqlite"`, or `"duckdb"`.
109
+ - `${NAME}` in connection values is replaced from the environment (error if unset). Omit `password` entirely to let `pg` use `PGPASSWORD`.
110
+ - `schema` is where all models and seeds are created (`CREATE SCHEMA IF NOT EXISTS` runs automatically).
111
+ - `sources` maps a source name to a schema, used by `{{ source('name', 'table') }}`; add `"database"` to a source to point it at a DuckDB attached catalog (see `connection.attach` in DuckDB notes).
112
+ - `vars` are defaults, overridable per-invocation with `--vars '{"start": "2026-06-01"}'`.
113
+ - `seeds.columnTypes` overrides inferred CSV column types (the escape hatch for dates/timestamps).
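The `${NAME}` rule can be pictured as a recursive walk over the connection object. Here is a minimal sketch that assumes only the behaviour stated above: substitute from the environment, fail if unset. `interpolateEnv` is a hypothetical helper, not an exported dbt-js function, and the real implementation may differ in details such as escaping:

```javascript
// Hypothetical sketch: replace ${NAME} in every string value from `env`,
// throwing if a referenced variable is unset. Returns a new object; input is untouched.
function interpolateEnv(value, env = process.env) {
  if (typeof value === "string") {
    return value.replace(/\$\{(\w+)\}/g, (_, name) => {
      if (env[name] === undefined) throw new Error(`environment variable ${name} is not set`);
      return env[name];
    });
  }
  if (Array.isArray(value)) return value.map((v) => interpolateEnv(v, env));
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value).map(([key, v]) => [key, interpolateEnv(v, env)]),
    );
  }
  return value; // numbers, booleans, and null pass through unchanged
}
```

Because the walk builds new objects instead of assigning in place, the caller's config object is never mutated.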
114
+
115
+ ## Models
116
+
117
+ A model is a single `SELECT`. Configuration lives in one leading block comment with a JSON body:
118
+
119
+ ```sql
120
+ /* config: {
121
+ "materialized": "incremental",
122
+ "strategy": "delete+insert",
123
+ "unique_key": "day",
124
+ "tests": { "day": ["not_null", "unique"] }
125
+ } */
126
+ select ...
127
+ ```
128
+
129
+ No config comment means `{ "materialized": "view" }`.
130
+
131
+ ### Templating
132
+
133
+ | Expression | Becomes |
134
+ | ------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------- |
135
+ | `{{ ref('other_model') }}` | `"schema"."other_model"`, and declares a DAG dependency |
136
+ | `{{ this }}` | the current model's own table (for incremental high-water marks) |
137
+ | `{{ source('raw', 'orders') }}` | `"public"."orders"` (schema from `sources` config; a source with a `database` resolves to `"db"."schema"."orders"`, where `db` is a DuckDB attached catalog) |
138
+ | `{{ var('start') }}` / `{{ var('x', 0) }}` | the var's value, or the default; error if neither. Inserted verbatim; quote it yourself in SQL |
139
+ | `{{ batch_start }}` / `{{ batch_end }}` | the current batch window as `YYYY-MM-DD HH:MM:SS` (microbatch models only). Inserted verbatim — quote it yourself |
140
+ | `{{ timezone }}` | the model's configured IANA zone (default `UTC`). Inserted verbatim — quote it yourself (see Timezone below) |
141
+ | `{% if is_incremental() %} ... {% endif %}` | body included only on incremental runs (table exists, not `--full-refresh`) |
142
+
143
+ That's the whole template language. Anything else inside `{{ }}` / `{% %}` is a compile error.
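For intuition, here is a minimal sketch of how the `ref` / `source` / `var` rows of the table could be rendered. It assumes quoted `\w+` arguments, which is also why a hyphenated model name cannot be referenced. `renderModel` and its options are hypothetical. The sketch leaves out `{% %}` blocks, `this`, `timezone`, and the batch variables, and it treats a `null` var as unset, which is an assumption:

```javascript
// Hypothetical sketch of the {{ ... }} substitution step, not dbt-js's compiler.
function renderModel(sql, { schema, sources = {}, vars = {} }) {
  const deps = new Set();
  const ident = (...parts) => parts.map((p) => `"${p}"`).join(".");
  // Strip matching single or double quotes from a default; bare values (e.g. 0) stay raw.
  const unquote = (raw) => raw.match(/^(['"])(.*)\1$/)?.[2] ?? raw;

  const out = sql.replace(/\{\{\s*(.*?)\s*\}\}/g, (_, expr) => {
    let m;
    if ((m = expr.match(/^ref\((['"])(\w+)\1\)$/))) {
      deps.add(m[2]); // every ref() becomes a DAG edge
      return ident(schema, m[2]);
    }
    if ((m = expr.match(/^source\((['"])(\w+)\1,\s*(['"])(\w+)\3\)$/))) {
      const src = sources[m[2]];
      if (!src) throw new Error(`unknown source '${m[2]}'`);
      return ident(src.schema, m[4]);
    }
    if ((m = expr.match(/^var\((['"])(\w+)\1(?:,\s*(.+))?\)$/))) {
      const value = vars[m[2]];
      if (value !== undefined && value !== null) return String(value); // inserted verbatim
      if (m[3] !== undefined) return unquote(m[3]);
      throw new Error(`var '${m[2]}' has no value and no default`);
    }
    throw new Error(`compile error: unsupported expression {{ ${expr} }}`);
  });
  return { sql: out, deps: [...deps] };
}
```

Collecting `ref()` names during rendering lets one pass both compile the SQL and produce the DAG edges.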
144
+
145
+ ### Materializations
146
+
147
+ - **view** (default): `CREATE OR REPLACE VIEW`
148
+ - **table**: transactional `DROP TABLE ... CASCADE; CREATE TABLE ... AS SELECT` (atomic to readers; CASCADE-dropped downstream views are rebuilt later in the same run — for partial runs use `--select model+`)
149
+ - **incremental**: first run (or `--full-refresh`) builds like a table; after that only the rows your SELECT returns are applied, via a strategy:
150
+ - `append` — plain `INSERT INTO ... SELECT` (immutable event data)
151
+ - `delete+insert` — requires `unique_key` (string or array); deletes matching keys then inserts, in one transaction (idempotent re-runs)
152
+ - `microbatch` — splits the event-time range into aligned windows and replaces each window in its own transaction (see below)
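The `delete+insert` step might look roughly like the sketch below. It assumes the new rows are first staged in a separate relation. The staging name and the exact SQL are illustrative assumptions; dbt-js's actual statements may differ:

```javascript
// Hypothetical sketch: the statement pair of a delete+insert step.
// `target` and `staged` are already-quoted relation names; uniqueKey is a string or array.
function deleteInsertSql(target, staged, uniqueKey) {
  const keys = Array.isArray(uniqueKey) ? uniqueKey : [uniqueKey];
  if (keys.length === 0 || keys.some((k) => typeof k !== "string")) {
    throw new Error("delete+insert requires unique_key (string or array of strings)");
  }
  // Correlated EXISTS instead of row-value IN, so composite keys need no tuple syntax.
  const match = keys.map((k) => `s."${k}" = ${target}."${k}"`).join(" and ");
  return [
    `delete from ${target} where exists (select 1 from ${staged} s where ${match})`,
    `insert into ${target} select * from ${staged}`, // assumes matching column order
  ];
}
```

Both statements are meant to run inside one transaction, as the strategy description says. Re-running the same batch therefore replaces rows instead of duplicating them.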
153
+
154
+ ### Hooks
155
+
156
+ `pre_hook` / `post_hook` run extra SQL around a model's build: grants, indexes, `ANALYZE`, audit rows. Each is a string or array of strings, rendered with the same template language as the model body (everything except `batch_start` / `batch_end`):
157
+
158
+ ```sql
159
+ /* config: {
160
+ "materialized": "table",
161
+ "post_hook": [
162
+ "create index if not exists idx_daily_revenue_day on {{ this }} (day)",
163
+ "grant select on {{ this }} to reporting"
164
+ ]
165
+ } */
166
+ select ...
167
+ ```
168
+
169
+ - Order: all pre-hooks → materialization → all post-hooks, each hook as its own statement.
170
+ - One deliberate divergence from dbt: hooks run **outside** the materialization transaction, so they can use statements Postgres forbids inside one (`VACUUM`, `CREATE INDEX CONCURRENTLY`). A failing pre-hook aborts the model before any build; a failing post-hook marks the model FAIL but the built relation remains — fix the hook and re-run.
171
+ - Microbatch models run hooks once per model (pre-hooks before the first batch, post-hooks after the last), not per batch; post-hooks are skipped when any batch failed.
172
+ - `{{ ref('x') }}` inside a hook declares a DAG dependency, same as in the body.
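The ordering and failure rules above can be sketched as a small async driver. `runWithHooks`, `exec`, and `build` are hypothetical stand-ins for however dbt-js actually issues statements:

```javascript
// Hypothetical sketch of hook ordering and failure semantics.
// `exec` runs one SQL statement; `build` performs the materialization transaction.
async function runWithHooks(exec, { preHooks = [], build, postHooks = [] }) {
  try {
    for (const sql of preHooks) await exec(sql); // each hook is its own statement
  } catch (error) {
    return { status: "fail", built: false, error }; // nothing was built
  }
  try {
    await build();
  } catch (error) {
    return { status: "fail", built: false, error };
  }
  try {
    for (const sql of postHooks) await exec(sql);
  } catch (error) {
    return { status: "fail", built: true, error }; // relation stays; fix the hook and re-run
  }
  return { status: "ok", built: true };
}
```

Because the hooks run outside `build`, a post-hook failure cannot roll back the relation that was just built.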
173
+
174
+ ### Incremental pattern + backfill
175
+
176
+ ```sql
177
+ select date_trunc('day', created_at)::date as day, count(*) as orders
178
+ from {{ ref('orders_enriched') }}
179
+ {% if is_incremental() %}
180
+ where created_at >= coalesce(
181
+ nullif('{{ var("start", "") }}', '')::timestamptz,
182
+ (select max(day) from {{ this }})::timestamptz)
183
+ {% endif %}
184
+ group by 1
185
+ ```
186
+
187
+ - Normal run: processes from the table's own high-water mark (`max(day)`).
188
+ - Backfill: `dbt-js run --select daily_revenue --vars '{"start": "2026-01-01"}'` re-derives from that date; `delete+insert` makes it idempotent.
189
+ - Full rebuild: `dbt-js run --select daily_revenue --full-refresh`.
190
+
191
+ ### Microbatch
192
+
193
+ For batched, retryable backfills, use `strategy: "microbatch"`. dbt-js splits the time range into `batch_size` windows and runs each as its own transaction: `DELETE` the target rows whose `event_time` falls in the window, then `INSERT` the batch's rows. A failed batch is reported and the rest keep running.
194
+
195
+ ```sql
196
+ /* config: {
197
+ "materialized": "incremental",
198
+ "strategy": "microbatch",
199
+ "event_time": "day",
200
+ "begin": "2026-01-01",
201
+ "batch_size": "day",
202
+ "lookback": 1
203
+ } */
204
+ select date_trunc('day', created_at)::date as day, count(*) as orders
205
+ from {{ ref('orders_enriched') }}
206
+ where created_at >= '{{ batch_start }}'::timestamptz
207
+ and created_at < '{{ batch_end }}'::timestamptz
208
+ group by 1
209
+ ```
210
+
211
+ - `event_time`: column **of this model's output** bounding each batch (used by the engine's per-window DELETE).
212
+ - `begin`: start of history; first run and `--full-refresh` build every batch from here.
213
+ - `batch_size`: `hour` | `day` | `month` | `year`. Boundaries align to the model's `timezone` (default UTC).
214
+ - `lookback` (default 1): a normal run reprocesses the current batch plus this many previous ones (no high-water mark, same as dbt).
215
+ - Backfill: `dbt-js run --select my_model --event-time-start 2026-06-02 --event-time-end 2026-06-04` rewrites exactly those windows (whole batches; end is exclusive). Idempotent by construction.
216
+ - No `is_incremental()` needed — the `batch_start`/`batch_end` filter applies on every run, including the first.
217
+ - If batches fail, the model exits FAIL listing the failed windows and the exact `--event-time-start/--event-time-end` retry command; other batches' work is kept.
218
+
219
+ One deliberate divergence from dbt: dbt auto-filters upstream `ref()`s by their declared `event_time`; dbt-js does no hidden query rewriting — you filter your input yourself with `{{ batch_start }}` / `{{ batch_end }}`.
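Here is a rough sketch of how aligned UTC windows could be generated. It covers an explicit `--event-time-start` / `--event-time-end` range (whole batches, end exclusive) and a normal run with `lookback`. The function names are hypothetical, and the sketch leaves out the model's `timezone` alignment:

```javascript
// Hypothetical sketch: aligned UTC batch windows. The model's `timezone` is ignored here.
const pad = (n) => String(n).padStart(2, "0");

// Format a Date as the naive "YYYY-MM-DD HH:MM:SS" string used for batch_start/batch_end.
const fmt = (d) =>
  `${d.getUTCFullYear()}-${pad(d.getUTCMonth() + 1)}-${pad(d.getUTCDate())} ` +
  `${pad(d.getUTCHours())}:${pad(d.getUTCMinutes())}:${pad(d.getUTCSeconds())}`;

// Round down to the start of the enclosing hour/day/month/year.
function floorTo(d, size) {
  const [y, mo, day, h] = [d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate(), d.getUTCHours()];
  switch (size) {
    case "hour": return new Date(Date.UTC(y, mo, day, h));
    case "day": return new Date(Date.UTC(y, mo, day));
    case "month": return new Date(Date.UTC(y, mo, 1));
    case "year": return new Date(Date.UTC(y, 0, 1));
    default: throw new Error(`unsupported batch_size '${size}'`);
  }
}

// Move an aligned boundary by `count` batches (negative counts go backwards).
function step(d, size, count = 1) {
  const next = new Date(d.getTime());
  if (size === "hour") next.setUTCHours(next.getUTCHours() + count);
  else if (size === "day") next.setUTCDate(next.getUTCDate() + count);
  else if (size === "month") next.setUTCMonth(next.getUTCMonth() + count);
  else if (size === "year") next.setUTCFullYear(next.getUTCFullYear() + count);
  else throw new Error(`unsupported batch_size '${size}'`);
  return next;
}

// Whole batches covering [start, end); end is exclusive.
function batchWindows(start, end, size) {
  const windows = [];
  for (let lo = floorTo(start, size); lo < end; lo = step(lo, size)) {
    windows.push({ batch_start: fmt(lo), batch_end: fmt(step(lo, size)) });
  }
  return windows;
}

// Normal run: the current batch plus `lookback` previous ones.
function normalRunWindows(now, size, lookback = 1) {
  const current = floorTo(now, size);
  return batchWindows(step(current, size, -lookback), step(current, size, 1), size);
}
```

With this sketch, the backfill range `2026-06-02` to `2026-06-04` yields exactly two day windows, matching the "end is exclusive" rule.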
220
+
221
+ ### Timezone
222
+
223
+ Any model may set `"timezone"` in its config (a string IANA zone, default `"UTC"`):
224
+
225
+ - For microbatch models it aligns each window to that zone's wall-clock. `{{ batch_start }}` / `{{ batch_end }}` are emitted as naive `YYYY-MM-DD HH:MM:SS` **wall-clock strings in that zone**, so they compare directly against a locally-stored `event_time` column. A `"day"` batch in `"America/New_York"` therefore spans local midnight-to-midnight, not UTC.
226
+ - `{{ timezone }}` is available in **any** model's SQL (raw substitution; quote it yourself, e.g. `created_at at time zone '{{ timezone }}'`).
227
+ - `begin`, `--event-time-start`, and `--event-time-end` given as naive strings are interpreted as wall-clock in the model's `timezone`; strings with an explicit `Z`/offset stay absolute.
228
+ - DST caveat: with `batch_size: "hour"` in a DST zone the spring-forward/fall-back hour is irregular — prefer UTC for hour-grain, or day+ grain for zoned models.
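To show what a naive wall-clock string in a zone means, here is a sketch that uses Node's built-in `Intl.DateTimeFormat` to render an absolute instant as `YYYY-MM-DD HH:MM:SS` in a given IANA zone. `wallClock` is a hypothetical helper. It only illustrates the formatting, not how dbt-js computes its boundaries:

```javascript
// Hypothetical sketch: an absolute instant as a naive wall-clock string in an IANA zone.
function wallClock(date, timeZone) {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    year: "numeric", month: "2-digit", day: "2-digit",
    hour: "2-digit", minute: "2-digit", second: "2-digit",
    hourCycle: "h23", // midnight renders as "00", never "24"
  }).formatToParts(date);
  const get = (type) => parts.find((p) => p.type === type).value;
  return `${get("year")}-${get("month")}-${get("day")} ${get("hour")}:${get("minute")}:${get("second")}`;
}
```

Local midnight on 2026-01-15 in `America/New_York` is the instant `2026-01-15T05:00:00Z`. A `"day"` batch there therefore starts at 05:00 UTC in winter and at 04:00 UTC under daylight time.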
229
+
230
+ ## Tests
231
+
232
+ Declared per column in the model's config. Each compiles to a query returning violating rows; any row fails the test (exit code 1, with up to 10 sample rows printed).
233
+
234
+ - `"not_null"`: rows where the column is NULL
235
+ - `"unique"`: non-NULL values appearing more than once
236
+ - `{ "accepted_values": ["a", "b"] }`: non-NULL values outside the list
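Each declaration turning into a "return the violating rows" query can be sketched like this. `compileTest` is a hypothetical compiler for illustration; the SQL dbt-js actually generates may differ in shape. Double-quoted identifiers are used because every backend above treats them as identifier quotes (MySQL via `ANSI_QUOTES`).

```javascript
// Hypothetical compiler (not dbt-js's actual code): turn one column test
// declaration into a query whose result rows are the violations.
const quoteIdent = (name) => `"${name.replace(/"/g, '""')}"`;
const quoteLiteral = (value) => `'${String(value).replace(/'/g, "''")}'`;

function compileTest(relation, column, test) {
  const col = quoteIdent(column);
  if (test === "not_null") {
    return `select * from ${relation} where ${col} is null`;
  }
  if (test === "unique") {
    return `select ${col}, count(*) as n from ${relation} where ${col} is not null group by ${col} having count(*) > 1`;
  }
  if (test && Array.isArray(test.accepted_values)) {
    const list = test.accepted_values.map(quoteLiteral).join(", ");
    return `select * from ${relation} where ${col} is not null and ${col} not in (${list})`;
  }
  throw new Error(`unknown test: ${JSON.stringify(test)}`);
}

console.log(compileTest('"analytics"."orders"', "status", { accepted_values: ["placed", "shipped"] }));
// → select * from "analytics"."orders" where "status" is not null and "status" not in ('placed', 'shipped')
```

Any non-empty result means the test fails, which matches the "any row fails the test" rule.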
237
+
238
+ ## Seeds
239
+
240
+ `dbt-js seed` loads each `seeds/*.csv` as a table (drop + create + insert, transactional). Column types are inferred (`integer`/`bigint`/`numeric`/`boolean`, else `text`; empty string → NULL); override per column via `seeds.columnTypes`. Models can `{{ ref('seed_name') }}` seeds.
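The inference order can be sketched as a cascade from the narrowest type to `text`. The specific rules here (the regexes, the 32-/64-bit ranges separating `integer`, `bigint`, and `numeric`, and case-insensitive `true`/`false`) are assumptions for illustration, not dbt-js's actual logic.

```javascript
// Hypothetical seed type inference (not dbt-js's actual rules): a type is
// chosen only if every non-empty value in the column fits it.
const INT_RE = /^-?\d+$/;
const NUM_RE = /^-?(\d+\.?\d*|\.\d+)([eE][-+]?\d+)?$/;
const BOOL_RE = /^(true|false)$/i;
const INT32 = [-(2n ** 31n), 2n ** 31n - 1n];
const INT64 = [-(2n ** 63n), 2n ** 63n - 1n];

const fits = (n, [lo, hi]) => n >= lo && n <= hi;

function inferColumnType(values) {
  // Empty strings load as NULL, so they don't constrain the type.
  const present = values.filter((v) => v !== "");
  if (present.length === 0) return "text";
  if (present.every((v) => INT_RE.test(v))) {
    const ints = present.map((v) => BigInt(v));
    if (ints.every((n) => fits(n, INT32))) return "integer";
    if (ints.every((n) => fits(n, INT64))) return "bigint";
    return "numeric"; // too large even for a 64-bit integer
  }
  if (present.every((v) => NUM_RE.test(v))) return "numeric";
  if (present.every((v) => BOOL_RE.test(v))) return "boolean";
  return "text";
}
```

Parsing integers with `BigInt` avoids misjudging values near the 64-bit limit, where plain JS numbers lose precision.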
241
+
242
+ ## CLI
243
+
244
+ ```
245
+ dbt-js run [--select SPEC] [--full-refresh] [--vars JSON]
246
+ [--event-time-start TS] [--event-time-end TS] # microbatch backfill window
247
+ dbt-js test [--select SPEC] [--vars JSON]
248
+ dbt-js seed [--select SPEC]
249
+ dbt-js compile [--select SPEC] [--vars JSON] # print compiled SQL, no DB needed
250
+ dbt-js ls # nodes in execution order
251
+ dbt-js debug # config + connectivity check
252
+ ```
253
+
254
+ `--select` accepts comma-separated names; `+name` adds everything upstream, `name+` everything downstream (e.g. `--select orders_enriched+` rebuilds it and its dependents).
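The selector semantics amount to a graph walk in one or both directions. This is a minimal sketch with a hypothetical `selectNodes` helper and a made-up dependency map; it is not dbt-js's implementation.

```javascript
// Hypothetical selector (not dbt-js's code) for the --select syntax:
// comma-separated names, "+name" adds ancestors, "name+" adds descendants.
// `deps` maps each node to the nodes it depends on (its parents).
function selectNodes(spec, deps) {
  // Invert the parent edges once so we can also walk downstream.
  const children = {};
  for (const [node, parents] of Object.entries(deps)) {
    children[node] ??= [];
    for (const p of parents) (children[p] ??= []).push(node);
  }
  // Every node reachable from `start` along `edges` (excluding `start`).
  const reach = (start, edges) => {
    const seen = new Set();
    const stack = [start];
    while (stack.length) {
      for (const next of edges[stack.pop()] ?? []) {
        if (!seen.has(next)) { seen.add(next); stack.push(next); }
      }
    }
    return seen;
  };
  const selected = new Set();
  for (const raw of spec.split(",").map((s) => s.trim()).filter(Boolean)) {
    const up = raw.startsWith("+");
    const down = raw.endsWith("+");
    const name = raw.slice(up ? 1 : 0, down ? -1 : undefined);
    if (!(name in deps)) throw new Error(`unknown node: ${name}`);
    selected.add(name);
    if (up) for (const n of reach(name, deps)) selected.add(n);
    if (down) for (const n of reach(name, children)) selected.add(n);
  }
  return selected;
}

const deps = { stg_orders: [], orders_enriched: ["stg_orders"], daily_revenue: ["orders_enriched"] };
console.log([...selectNodes("orders_enriched+", deps)].join(", "));
// → orders_enriched, daily_revenue
```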
255
+
256
+ On failure, downstream models are skipped and reported; exit code is 1 if anything failed.
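The skip-on-failure behavior can be sketched as a single pass over the execution order (the order `dbt-js ls` prints). `runInOrder` and its `build` callback are hypothetical, and the sketch assumes `order` is already topologically sorted.

```javascript
// Hypothetical runner loop (not dbt-js's code): execute nodes in dependency
// order; if a parent failed or was skipped, the node is skipped too.
async function runInOrder(order, deps, build) {
  const status = new Map(); // name -> "ok" | "fail" | "skip"
  for (const name of order) {
    const parents = deps[name] ?? [];
    if (parents.some((p) => status.get(p) === "fail" || status.get(p) === "skip")) {
      status.set(name, "skip");
      continue;
    }
    try {
      await build(name);
      status.set(name, "ok");
    } catch {
      status.set(name, "fail");
    }
  }
  // Exit code 1 if anything failed, mirroring the CLI rule.
  const exitCode = [...status.values()].includes("fail") ? 1 : 0;
  return { status, exitCode };
}
```

Independent branches keep running after a failure; only the failed node's descendants are skipped.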
257
+
258
+ ## Embedding in a Node.js app
259
+
260
+ The CLI is a thin wrapper over a programmatic API. Install dbt-js as a dependency:
261
+
262
+ ```sh
263
+ npm install dbt-js
264
+ ```
265
+
266
+ ```js
267
+ import { run, test, seed, compile, ls, query, debug } from "dbt-js";
268
+
269
+ const result = await run({
270
+ projectDir: "./analytics", // dir containing dbtjs.config.json; always pass this
271
+ select: "daily_revenue+", // optional, same syntax as --select
272
+ vars: { start: "2026-06-01" }, // optional, plain object (not a JSON string)
273
+ fullRefresh: false,
274
+ onEvent: (e) => console.log(e), // optional progress stream; omit for silence
275
+ });
276
+ // result = { ok, models: [{ name, status: 'ok'|'fail'|'skip', materialized, action,
277
+ // rowCount, batchCount, failedBatches, durationMs, error }] }
278
+ ```
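The documented result shape is enough to produce CLI-like output inside an app. `summarizeRun` is a hypothetical app-side helper; the type of `error` is not specified above, so it is stringified defensively.

```javascript
// Hypothetical helper: turn a run() result ({ ok, models: [...] }) into
// log lines and a CLI-style exit code.
function summarizeRun(result) {
  const counts = { ok: 0, fail: 0, skip: 0 };
  const lines = [];
  for (const m of result.models) {
    counts[m.status] = (counts[m.status] ?? 0) + 1;
    if (m.status === "fail") {
      // Assumption: error may be an Error or a string.
      const msg = m.error instanceof Error ? m.error.message : String(m.error);
      lines.push(`FAIL ${m.name}: ${msg}`);
    } else if (m.status === "skip") {
      lines.push(`SKIP ${m.name}`);
    }
  }
  lines.push(`done: ${counts.ok} ok, ${counts.fail} failed, ${counts.skip} skipped`);
  return { exitCode: result.ok ? 0 : 1, lines };
}
```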
279
+
280
+ The project can also be supplied inline instead of from files — handy when connection settings live in your app's config system or model SQL is generated:
281
+
282
+ ```js
283
+ await run({
284
+ config: {
285
+ // contents of dbtjs.config.json (file not read)
286
+ connection: {
287
+ host: "db",
288
+ port: 5432,
289
+ user: "analytics",
290
+ password: process.env.PW,
291
+ database: "warehouse",
292
+ },
293
+ schema: "analytics",
294
+ sources: { raw: { schema: "public" } },
295
+ },
296
+ models: {
297
+ // replaces models/*.sql; same format, config comment included
298
+ stg_orders:
299
+ "select * from {{ source('raw', 'orders') }} where deleted = false",
300
+ order_counts:
301
+ '/* config: { "materialized": "table" } */ select count(*) as n from {{ ref(\'stg_orders\') }}',
302
+ },
303
+ });
304
+ ```
305
+
306
+ With both given, `projectDir` is optional: it then only anchors relative DuckDB paths and locates `seeds/` (file seeds remain `ref()`-able from inline models). Inline `config` goes through the same validation and `${ENV}` interpolation as the file; your object is not mutated.
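Interpolating `${ENV}` placeholders without touching the caller's object comes down to building a deep copy while replacing strings. A minimal sketch with a hypothetical `interpolateEnv` helper; throwing on a missing variable is an assumption, not documented dbt-js behavior.

```javascript
// Hypothetical sketch (not dbt-js's code): replace ${NAME} in every string of
// a config object, returning a new object and leaving the input unmodified.
function interpolateEnv(value, env = process.env) {
  if (typeof value === "string") {
    return value.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_, name) => {
      if (!(name in env)) throw new Error(`missing env var: ${name}`); // assumption
      return env[name];
    });
  }
  if (Array.isArray(value)) return value.map((v) => interpolateEnv(v, env));
  if (value && typeof value === "object") {
    return Object.fromEntries(Object.entries(value).map(([k, v]) => [k, interpolateEnv(v, env)]));
  }
  return value; // numbers, booleans, and null pass through unchanged
}
```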
307
+
308
+ - `run` also takes `eventTimeStart` / `eventTimeEnd` for microbatch backfills. `test` → `{ ok, tests: [{ id, model, pass, violations, sample }] }`; `seed` → `{ ok, seeds: [...] }`; `compile` → `[{ name, materialized, sql, preHookSql, postHookSql }]` (no DB needed); `ls` → `[{ name, kind, deps }]`; `debug` → connectivity info (including `attached`, the list of DuckDB `ATTACH` catalogs, which is empty on other backends).
309
+ - `query({ sql, params?, readOnly = true, projectDir?, config? })` → `{ rows, rowCount }` runs a single arbitrary statement against the warehouse. It bypasses model loading, so it works on a project with zero models (handy for inspecting results from your app). It is read-only by default (DuckDB opens with `READ_ONLY` access mode; Postgres sets the session read-only); pass `readOnly: false` to write.
310
+ - Config or project errors **throw**; model/test failures come back as `ok: false` (mirrors the CLI's exit code 1).
311
+ - Every call opens its own connection and closes it before returning — nothing to pool.
312
+ - **Serialize runs yourself** (a one-promise queue is enough): DuckDB allows a single writer per file, so a scheduled refresh and an HTTP-triggered run must not overlap.
313
+ - Relative paths are anchored to `projectDir`, not your app's cwd: the DuckDB `connection.path` is resolved against it, and `read_csv('data/...')`-style paths in model SQL resolve via DuckDB's `file_search_path`.
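The "one-promise queue" mentioned above can be as small as this. `serialize` is a hypothetical app-side helper: each task starts only after the previous one settles, whether it succeeded or failed.

```javascript
// Hypothetical one-promise queue: chain every task onto the previous one so
// no two tasks overlap, e.g. serialize(() => run({ projectDir: "./analytics" })).
let tail = Promise.resolve();

function serialize(task) {
  const result = tail.then(() => task());
  // Keep the chain alive even if this task rejects; callers still see the error.
  tail = result.catch(() => {});
  return result;
}
```

Route both the scheduled refresh and the HTTP handler through the same `serialize`, so a second run waits instead of contending for DuckDB's single writer.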
314
+
315
+ ## DuckDB notes
316
+
317
+ - `sources` resolve to schemas inside the same `.duckdb` file, exactly like Postgres schemas.
318
+ - Models can call DuckDB-native readers directly, such as `from read_csv('data/orders.csv')` or `read_parquet('...')`, with no template syntax needed; raw data files never pass through dbt-js.
319
+ - DuckDB doesn't report row counts for full table builds (CTAS), so those log lines omit the count. Incremental and seed counts are reported normally.
320
+ - `:memory:` is a valid path but pointless for a CLI: each invocation is a separate process, so nothing would persist between `seed` and `run`.
321
+ - **Attaching external databases**: list databases to `ATTACH` under `connection.attach`; each becomes a catalog you read through `source()` with a `database` qualifier:
322
+ ```json
323
+ {
324
+ "connection": {
325
+ "type": "duckdb",
326
+ "path": "./warehouse.duckdb",
327
+ "attach": [
328
+ { "alias": "raw", "path": "./raw.duckdb" },
329
+ { "alias": "legacy", "path": "./legacy.db", "type": "sqlite" }
330
+ ]
331
+ },
332
+ "schema": "analytics",
333
+ "sources": { "raw_orders": { "database": "raw", "schema": "main" } }
334
+ }
335
+ ```
336
+ Then `{{ source('raw_orders', 'orders') }}` resolves to `"raw"."main"."orders"`. Each entry needs a `path` (a file path for `duckdb`/`sqlite`, a connection string for `postgres`/`mysql`); optional `type` (default `"duckdb"`), `read_only`, and `alias`. `alias` defaults to the file's basename without extension (`./raw.duckdb` → `raw`) and is required for `postgres`/`mysql` connection strings. Attachments are **read-only by default** (and the `query` API forces all of them read-only) — models materialize into the main database's `schema`, never into an attached catalog. File paths anchor to the project dir; `${ENV}` interpolation works in connection strings. Non-DuckDB types autoload the matching scanner extension, which needs network access on first use.
337
+ - Postgres-specific caveat: pre-existing **materialized views** that share a model's name are no longer auto-dropped (relation detection now uses `information_schema`, which can't see them); instead you get a clear Postgres error at build time. dbt-js itself never creates materialized views.
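The alias and qualification rules for attached databases can be sketched as below. The helpers are hypothetical; resolving a source without a `database` to `"schema"."table"` is an assumption based on sources mapping to schemas.

```javascript
// Hypothetical helpers (not dbt-js's code) mirroring the attach rules above.
// Default alias: the file's basename without its extension.
function defaultAlias(path) {
  const base = path.split(/[\\/]/).pop();
  const dot = base.lastIndexOf(".");
  return dot > 0 ? base.slice(0, dot) : base;
}

function attachAlias(entry) {
  if (entry.alias) return entry.alias;
  const type = entry.type ?? "duckdb";
  // Connection strings have no file basename to fall back on.
  if (type === "postgres" || type === "mysql") {
    throw new Error(`attach entry of type ${type} needs an explicit alias`);
  }
  return defaultAlias(entry.path);
}

const quoteIdent = (s) => `"${s.replace(/"/g, '""')}"`;

// source(name, table) → "database"."schema"."table" when a database is set.
function resolveSource(sources, name, table) {
  const src = sources[name];
  if (!src) throw new Error(`unknown source: ${name}`);
  const parts = src.database ? [src.database, src.schema, table] : [src.schema, table];
  return parts.map(quoteIdent).join(".");
}

const sources = { raw_orders: { database: "raw", schema: "main" } };
console.log(attachAlias({ path: "./raw.duckdb" })); // → raw
console.log(resolveSource(sources, "raw_orders", "orders")); // → "raw"."main"."orders"
```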
338
+
339
+ ## MySQL notes
340
+
341
+ Requires MySQL 8.0+. When GTID consistency is enforced, `CREATE TABLE ... AS SELECT` additionally needs 8.0.21+, and creating temporary tables inside a transaction is disallowed.
342
+
343
+ - dbt-js enables `ANSI_QUOTES` for its session, so double quotes are **identifier** quotes exactly as on Postgres/DuckDB — write string literals with single quotes in model SQL (the habit you already have from Postgres).
344
+ - `schema` maps to a MySQL **database**: `CREATE SCHEMA IF NOT EXISTS` is a synonym for `CREATE DATABASE`, so the connecting user needs the server-wide CREATE privilege (or pre-create the schema and grant privileges on it).
345
+ - MySQL DDL implicitly commits, so `table` and `--full-refresh` rebuilds (DROP + CREATE TABLE AS) are **not** atomic to readers the way they are on Postgres/DuckDB. `delete+insert` and microbatch window replacement remain fully transactional.
346
+ - MySQL has no `CREATE INDEX IF NOT EXISTS`, so keep post-hooks idempotent (e.g. `analyze table {{ this }}`) or guard index creation yourself.
347
+ - Seed type inference maps `numeric` to `decimal(38,10)` (bare `NUMERIC` is `DECIMAL(10,0)` on MySQL and would round); `boolean` becomes `TINYINT(1)` with `true/false` loaded as `1/0`. Override per column via `seeds.columnTypes` as usual.
348
+ - Microbatch boundaries are computed in UTC and compared as `DATETIME` literals — prefer a `DATETIME` event-time column, or set the session time zone to UTC via mysql2's `timezone` connection option.
349
+ - Rows come back with `dateStrings: true` (dates as strings, JSON-safe, matching the DuckDB adapter); set `dateStrings: false` in the connection object to get JS `Date`s from the `query` API.
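Producing a UTC `DATETIME` literal from a JS `Date` is a one-liner, because `toISOString()` always renders UTC. This sketch assumes a `DATETIME` event-time column stored in UTC; `toMysqlUtcLiteral` and the `event_ts` column are hypothetical. A `TIMESTAMP` column would instead be interpreted through the session time zone, which is why the notes above recommend `DATETIME` or a UTC session.

```javascript
// Hypothetical helper (not dbt-js's code): render an instant as a quoted
// MySQL DATETIME literal in UTC, e.g. for a microbatch boundary.
function toMysqlUtcLiteral(date) {
  // toISOString() is always UTC, e.g. "2026-06-03T04:00:00.000Z".
  const [day, time] = date.toISOString().slice(0, 19).split("T");
  return `'${day} ${time}'`;
}

console.log(`where event_ts >= ${toMysqlUtcLiteral(new Date("2026-06-03T00:00:00Z"))}`);
// → where event_ts >= '2026-06-03 00:00:00'
```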
350
+
351
+ ## SQLite notes
352
+
353
+ Driver: `better-sqlite3` (synchronous — a long-running statement blocks the embedding app's event loop; irrelevant for CLI use).
354
+
355
+ - `schema` maps to a **separate database file** `<schema>.db` next to `connection.path`, ATTACHed for the session (created automatically when writable). `"schema": "main"` keeps everything in the single main file.
356
+ - SQLite DDL is transactional, so **all** rebuilds — including `table` and `--full-refresh` — are atomic, like Postgres/DuckDB. One caveat: switching `journal_mode` to WAL in a hook removes crash atomicity for transactions spanning the main and attached files.
357
+ - There is no `DROP ... CASCADE`: dropping a table leaves dependent views dangling (they error when next queried) instead of dropping them.
358
+ - Type affinity gotchas: never `CAST(x AS DATETIME)` — `DATETIME` gets NUMERIC affinity, truncating `'2026-06-03'` to `2026`. Store timestamps as `'YYYY-MM-DD HH:MM:SS'` text; lexicographic comparison is chronological, and microbatch window boundaries are normalized with `datetime()` so day-granularity event-time columns work too.
359
+ - Seed `boolean` columns load as `1/0` (the text `'true'` would be falsy in `CASE WHEN`); `numeric` needs no special mapping (affinity stores decimals losslessly).
360
+ - The read-only `query` API opens the files with SQLite's readonly flag — writes fail with `SQLITE_READONLY`, and the database files must already exist.
361
+ - INTEGER values beyond 2^53 come back as imprecise JS numbers from the `query` API.
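The schema-to-file mapping can be sketched as below. `schemaDbPath` and `attachSql` are hypothetical helpers; the exact `ATTACH` statement dbt-js issues is an assumption, though `attach database 'file' as name` is standard SQLite syntax.

```javascript
// Hypothetical helpers (not dbt-js's code): where the `<schema>.db` file
// lives next to connection.path, and the ATTACH that exposes it.
function schemaDbPath(mainPath, schema) {
  if (schema === "main") return mainPath; // everything stays in the main file
  const slash = Math.max(mainPath.lastIndexOf("/"), mainPath.lastIndexOf("\\"));
  const dir = slash >= 0 ? mainPath.slice(0, slash + 1) : "";
  return `${dir}${schema}.db`;
}

function attachSql(mainPath, schema) {
  if (schema === "main") return null; // nothing to attach
  const file = schemaDbPath(mainPath, schema).replace(/'/g, "''");
  return `attach database '${file}' as "${schema.replace(/"/g, '""')}"`;
}

console.log(attachSql("./warehouse.db", "analytics"));
// → attach database './analytics.db' as "analytics"
```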
362
+
363
+ ## License
364
+
365
+ MIT