ga4-export-fixer 0.8.0-dev.1 → 0.8.0-dev.2

package/README.md CHANGED
@@ -1,20 +1,20 @@
1
- <img src="docs/images/header.svg" alt="ga4-export-fixer">
2
-
3
- # An enhanced, incremental GA4 events table, built with Dataform
4
-
5
- [![npm version](https://img.shields.io/npm/v/ga4-export-fixer)](https://www.npmjs.com/package/ga4-export-fixer)
6
- [![License](https://img.shields.io/npm/l/ga4-export-fixer)](https://github.com/tanelytics/ga4-export-fixer/blob/main/LICENSE)
7
- ![Dependencies](https://img.shields.io/badge/dependencies-0-brightgreen)
8
-
9
- **ga4-export-fixer** is a **Dataform NPM package** that transforms raw GA4 BigQuery export data into a cleaner, more queryable incremental table. It combines **daily, fresh (360), and intraday exports** so the best available version of each event is always in use, adds session-level fields like `session_id` and `landing_page`, promotes key event parameters to columns, and fixes known GA4 export issues — handling the boilerplate transformations that are otherwise tedious to include in every GA4 query.
10
-
11
- The goal of the package is to **speed up development** when building data models and pipelines on top of GA4 export data, allowing you to focus on your use case instead of wrestling with the raw export format.
12
-
13
- <img src="./docs/images/example_data_model.png" alt="Example Data Model" width="600">
14
-
15
- *Example data model built with ga4-export-fixer*
16
-
17
- ## Table of Contents
1
+ <img src="docs/images/header.svg" alt="ga4-export-fixer">
2
+
3
+ # An enhanced, incremental GA4 events table, built with Dataform
4
+
5
+ [![npm version](https://img.shields.io/npm/v/ga4-export-fixer)](https://www.npmjs.com/package/ga4-export-fixer)
6
+ [![License](https://img.shields.io/npm/l/ga4-export-fixer)](https://github.com/tanelytics/ga4-export-fixer/blob/main/LICENSE)
7
+ ![Dependencies](https://img.shields.io/badge/dependencies-0-brightgreen)
8
+
9
+ **ga4-export-fixer** is a **Dataform NPM package** that transforms raw GA4 BigQuery export data into a cleaner, more queryable incremental table. It combines **daily, fresh (360), and intraday exports** so the best available version of each event is always in use, adds session-level fields like `session_id` and `landing_page`, promotes key event parameters to columns, and fixes known GA4 export issues — handling the boilerplate transformations that are otherwise tedious to include in every GA4 query.
10
+
11
+ The goal of the package is to **speed up development** when building data models and pipelines on top of GA4 export data, allowing you to focus on your use case instead of wrestling with the raw export format.
12
+
13
+ <img src="./docs/images/example_data_model.png" alt="Example Data Model" width="600">
14
+
15
+ *Example data model built with ga4-export-fixer*
16
+
17
+ ## Table of Contents
18
18
  <!-- TOC -->
19
19
  - [Main Features](#main-features)
20
20
  - [Planned Features](#planned-features)
@@ -28,652 +28,690 @@ The goal of the package is to **speed up development** when building data models
28
28
  - [Creating Incremental Downstream Tables from `ga4_events_enhanced`](#creating-incremental-downstream-tables-from-ga4_events_enhanced)
29
29
  - [Helpers](#helpers)
30
30
  - [License](#license)
31
- <!-- /TOC -->
32
-
33
- ## Main Features
34
-
35
- <table>
36
- <tr>
37
- <td width="50%" valign="top">
38
- <b>📦 Best Available Data</b><br>
39
- Combines daily, fresh (360) &amp; intraday exports so the most complete version is always available
40
- </td>
41
- <td width="50%" valign="top">
42
- <b>🔄 Incremental Updates</b><br>
43
- Run on any schedule — daily, hourly, or custom
44
- </td>
45
- </tr>
46
- <tr>
47
- <td valign="top">
48
- <b>📐 Flexible Schema</b><br>
49
- Keeps the flexible structure of the original export with key fields promoted to columns for better query performance; partitioning &amp; clustering enabled
50
- </td>
51
- <td valign="top">
52
- <b>🤖 AI Agent Ready</b><br>
53
- Extensive table &amp; column descriptions for AI agents and humans
54
- </td>
55
- </tr>
56
- <tr>
57
- <td valign="top">
58
- <b>🔑 Session Identity Resolution</b><br>
59
- <code>user_id</code> resolved per session; <code>merged_user_id</code> coalesces with <code>user_pseudo_id</code>
60
- </td>
61
- <td valign="top">
62
- <b>📡 Session Traffic Sources</b><br>
63
- <code>session_first_traffic_source</code> and <code>session_traffic_source_last_click</code> computed automatically, adjusting for sessions that span midnight
64
- </td>
65
- </tr>
66
- <tr>
67
- <td valign="top">
68
- <b>📍 Landing Page Detection</b><br>
69
- Derived per session from the first page where <code>entrances > 0</code>
70
- </td>
71
- <td valign="top">
72
- <b>🔗 Page URL Parsing</b><br>
73
- Parsed <code>hostname</code>, <code>path</code>, <code>query</code>, and <code>query_params</code> from <code>page_location</code>
74
- </td>
75
- </tr>
76
- <tr>
77
- <td valign="top">
78
- <b>🛒 Ecommerce Data Fixes</b><br>
79
- Nullifies placeholder <code>transaction_id</code>; corrects <code>purchase_revenue</code> bugs
80
- </td>
81
- <td valign="top">
82
- <b>🏷️ Item List Attribution</b><br>
83
- Attributes <code>item_list_name</code>, <code>item_list_id</code>, and <code>item_list_index</code> from item selection events to downstream ecommerce events
84
- </td>
85
- </tr>
86
- <tr>
87
- <td valign="top">
88
- <b>⚙️ Event Parameter Handling</b><br>
89
- Promote event params to columns; include or exclude by name
90
- </td>
91
- <td valign="top">
92
- <b>📊 Session Parameters</b><br>
93
- Promote selected event parameters as <code>session_params</code>
94
- </td>
95
- </tr>
96
- <tr>
97
- <td valign="top">
98
- <b>⏱️ Custom Timestamp</b><br>
99
- Use a custom event parameter as primary timestamp with automatic fallback
100
- </td>
101
- <td valign="top">
102
- <b>🔒 Schema Lock</b><br>
103
- Lock table schema to a specific GA4 export date to prevent schema drift
104
- </td>
105
- </tr>
106
- <tr>
107
- <td valign="top">
108
- <b>✅ Data Freshness Tracking</b><br>
109
- <code>data_is_final</code> flag and <code>export_type</code> label on every row
110
- </td>
111
- <td valign="top">
112
- <b>🔍 Data Quality Assertions</b><br>
113
- Built-in daily assertion reconciles sessions, events, and revenue between the enhanced table and raw export
114
- </td>
115
- </tr>
116
- <tr>
117
- <td valign="top">
118
- <b>🔃 Selective Re-processing</b><br>
119
- Re-process a date range without full table rebuild using <code>incrementalStartOverride</code> and <code>incrementalEndOverride</code>
120
- </td>
121
- <td valign="top">
122
- <b>📑 Batch Processing</b><br>
123
- Process large exports in smaller batches via <code>numberOfDaysToProcess</code>
124
- </td>
125
- </tr>
126
- <tr>
127
- <td valign="top">
128
- <b>🕐 Timezone-Aware Datetime</b><br>
129
- <code>event_datetime</code> converted to a configurable IANA timezone
130
- </td>
131
- <td valign="top">
132
- <b>🛡️ Zero Dependencies</b><br>
133
- No additional external dependencies added to your Dataform repository
134
- </td>
135
- </tr>
136
- </table>
137
-
138
- ## Planned Features
139
-
140
- Features under consideration for future releases:
141
-
142
- - Web and app specific default configurations
143
- - Custom channel grouping
144
- - Data enrichment (item-level, session-level, event-level)
145
- - Custom processing steps (additional CTEs)
146
- - Custom traffic source attribution
147
-
148
- ## Installation
149
-
150
- ### Bash
151
-
152
- ```bash
153
- npm install ga4-export-fixer
154
- ```
155
-
156
- ### In Google Cloud Dataform
157
-
158
- Include the package in the package.json file in your Dataform repository.
159
-
160
- **`package.json`**
161
-
162
- ```json
163
- {
164
- "dependencies": {
165
- "@dataform/core": "3.0.42",
166
- "ga4-export-fixer": "0.7.1"
167
- }
168
- }
169
- ```
170
-
171
- **Note:** The best practice is to specify the package version explicitly (e.g. `"0.1.2"`) rather than using `"latest"` or `"*"`, to avoid unexpected breaking changes when the package is updated.
172
-
173
- In Google Cloud Dataform, click "Install Packages" to install it in your development workspace.
174
-
175
- If your Dataform repository does not have a package.json file, see this guide: [https://docs.cloud.google.com/dataform/docs/manage-repository#move-to-package-json](https://docs.cloud.google.com/dataform/docs/manage-repository#move-to-package-json)
176
-
177
- ## Usage
178
-
179
- ### Create GA4 Events Enhanced Table
180
-
181
- Creates an **enhanced** version of the GA4 BigQuery export (daily & intraday).
182
-
183
- #### JS Deployment (Recommended) ![.JS](https://img.shields.io/badge/.JS-F7DF1E?style=flat-square)
184
-
185
- Create a new **ga4_events_enhanced** table using a **.js** file in your repository's **definitions** folder.
186
-
187
- **Using Defaults**
188
-
189
- **`definitions/ga4/ga4_events_enhanced.js`**
190
-
191
- ```javascript
192
- const { ga4EventsEnhanced } = require('ga4-export-fixer');
193
-
194
- const config = {
195
- sourceTable: constants.GA4_TABLES.MY_GA4_EXPORT
196
- };
197
-
198
- ga4EventsEnhanced.createTable(publish, config);
199
- ```
200
-
201
- **With Custom Configuration**
202
-
203
- **`definitions/ga4/ga4_events_enhanced.js`**
204
-
205
- ```javascript
206
- const { ga4EventsEnhanced } = require('ga4-export-fixer');
207
-
208
- const config = {
209
- sourceTable: constants.GA4_TABLES.MY_GA4_EXPORT,
210
- // use dataformTableConfig to make changes to the default Dataform table configuration
211
- dataformTableConfig: {
212
- schema: 'ga4'
213
- },
214
- // test configurations
215
- test: false,
216
- testConfig: {
217
- dateRangeStart: 'current_date()-1',
218
- dateRangeEnd: 'current_date()',
219
- },
220
- schemaLock: '20260101', // lock to daily export; also supports 'intraday_20260101' or 'fresh_20260101'
221
- customTimestampParam: 'custom_event_timestamp', // custom timestamp collected as an event param
222
- timezone: 'Europe/Helsinki',
223
- // not needed data
224
- excludedColumns: [
225
- 'app_info',
226
- 'publisher'
227
- ],
228
- // not needed events
229
- excludedEvents: [
230
- 'session_start',
231
- 'first_visit',
232
- 'user_engagement'
233
- ],
234
- // transform to session-level
235
- sessionParams: [
236
- 'user_agent'
237
- ],
238
- // promote as columns
239
- eventParamsToColumns: [
240
- {name: 'session_engaged'},
241
- {name: 'ga_session_number', type: 'int'},
242
- {name: 'page_type', type: 'string'},
243
- ],
244
- // not needed in the event_params array
245
- excludedEventParams: [
246
- 'session_engaged',
247
- 'ga_session_number',
248
- 'page_type',
249
- 'user_agent'
250
- ],
251
- // use export type for data_is_final instead of the default DAY_THRESHOLD
252
- dataIsFinal: {
253
- detectionMethod: 'EXPORT_TYPE',
254
- },
255
- // attribute item lists to downstream ecommerce events within the same session
256
- itemListAttribution: {
257
- lookbackType: 'SESSION',
258
- },
259
- };
260
-
261
- ga4EventsEnhanced.createTable(publish, config);
262
- ```
263
-
264
- #### SQLX Deployment ![.SQLX](https://img.shields.io/badge/.SQLX-4285F4?style=flat-square)
265
-
266
- Alternatively, you can create the **ga4_events_enhanced** table using a .SQLX file.
267
-
268
- **`definitions/ga4/ga4_events_enhanced.sqlx`**
269
-
270
- ```javascript
271
- config {
272
- type: "incremental",
273
- description: "GA4 Events Enhanced table",
274
- schema: "ga4",
275
- onSchemaChange: "EXTEND",
276
- bigquery: {
277
- partitionBy: "event_date",
278
- clusterBy: ['event_name', 'session_id', 'page_location', 'data_is_final'],
279
- },
280
- tags: ['ga4_export_fixer']
281
- }
282
-
283
- js {
284
- const { ga4EventsEnhanced } = require('ga4-export-fixer');
285
-
286
- const config = {
287
- sourceTable: ref(constants.GA4_TABLES.MY_GA4_EXPORT),
288
- self: self(),
289
- incremental: incremental()
290
- };
291
- }
292
-
293
- ${ga4EventsEnhanced.generateSql(config)}
294
-
295
- pre_operations {
296
- ${ga4EventsEnhanced.setPreOperations(config)}
297
- }
298
- ```
299
-
300
- <br>
301
-
302
- ---
303
-
304
- ### Configuration Object
305
-
306
- All fields are optional except `sourceTable`. Default values are applied automatically, so you only need to specify the fields you want to override.
307
-
308
-
309
- | Field | Type | Default/Required | Description |
310
- | ---------------------- | ----------------------- | ---------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
311
- | `sourceTable` | Dataform ref() / string | **required** | Source GA4 export table. Use `ref()` in Dataform or a string in format ``project.dataset.table`` |
312
- | `self` | Dataform self() | **required for .SQLX deployment** | Reference to the table itself. Use `self()` in Dataform |
313
- | `incremental` | Dataform incremental() | **required for .SQLX deployment** | Switch between incremental and full refresh logic. Use `incremental()` in Dataform |
314
- | `dataformTableConfig` | object | **In JS deployment only.** [See default](#default-dataformtableconfig) | Override the default Dataform table configuration for JS deployment. See: [ITableConfig reference](https://docs.cloud.google.com/dataform/docs/reference/dataform-core-reference#itableconfig) |
315
- | `schemaLock` | string | `undefined` | Lock the table schema to a specific GA4 export table suffix. Accepts `"YYYYMMDD"` (daily), `"intraday_YYYYMMDD"`, or `"fresh_YYYYMMDD"`. Date must be >= `"20241009"` |
316
- | `timezone` | string | `'Etc/UTC'` | IANA timezone for event datetime (e.g. `'Europe/Helsinki'`) |
317
- | `customTimestampParam` | string | `undefined` | Name of a custom event parameter containing a JS timestamp in milliseconds (e.g. collected via `Date.now()`) |
318
- | `bufferDays` | integer | `1` | Extra days to include for sessions that span midnight. Auto-adjusted when `itemListAttribution.lookbackType` is `'TIME'` and the lookback exceeds `bufferDays` |
319
- | `itemListAttribution` | object | `undefined` | Enable item list attribution. See [Item List Attribution](#item-list-attribution) |
320
- | `test` | boolean | `false` | Enable test mode (uses `testConfig` date range instead of pre-operations) |
321
- | `excludedEventParams` | string[] | `[]` | Event parameter names to exclude from the `event_params` array |
322
- | `excludedEvents` | string[] | `['session_start', 'first_visit']` | Event names to exclude from the table. These events are excluded by default because they have no use for analysis purposes. Override this to include them if needed |
323
- | `excludedColumns` | string[] | `[]` | Default GA4 export columns to exclude from the final table, for example `'app_info'` or `'publisher'` |
324
- | `sessionParams` | string[] | `[]` | Event parameter names to aggregate as session-level parameters |
325
- | `includedExportTypes` | object | [See details](#includedExportTypes) | Which GA4 export types to include (daily, fresh, intraday) |
326
- | `dataIsFinal` | object | [See details](#dataIsFinal) | How to determine whether data is final (not expected to change) |
327
- | `testConfig` | object | [See details](#testConfig) | Date range used when `test` is `true` |
328
- | `preOperations` | object | [See details](#preOperations) | Date range and incremental refresh configuration |
329
- | `eventParamsToColumns` | object[] | `[]` | Event parameters to promote to columns. [See item schema](#eventParamsToColumns) |
330
-
331
- <a id="default-dataformtableconfig"></a>
332
- <details>
333
- <summary><strong>Default dataformTableConfig</strong></summary>
334
-
335
- ```json
336
- {
337
- "name": "ga4_events_enhanced_<dataset_id>",
338
- "type": "incremental",
339
- "schema": "<source_dataset>",
340
- "description": "<default description>",
341
- "bigquery": {
342
- "partitionBy": "event_date",
343
- "clusterBy": [
344
- "event_name",
345
- "session_id",
346
- "page_location",
347
- "data_is_final"
348
- ],
349
- "labels": {
350
- "ga4_export_fixer": "true"
351
- }
352
- },
353
- "onSchemaChange": "EXTEND",
354
- "tags": [
355
- "ga4_export_fixer"
356
- ]
357
- }
358
- ```
359
-
360
- The `onSchemaChange: "EXTEND"` setting updates the result table schema on incremental runs, adding columns for any new fields the query produces.
361
-
362
- </details>
363
- <br>
364
-
365
- <a id="includedExportTypes"></a>
366
-
367
- **`includedExportTypes`** — which GA4 export types to include:
368
-
369
-
370
- | Field | Type | Default | Description |
371
- | ------------------------------ | ------- | ------- | -------------------------------- |
372
- | `includedExportTypes.daily` | boolean | `true` | Include daily (processed) export |
373
- | `includedExportTypes.fresh` | boolean | `false` | Include fresh (hourly-updated) export |
374
- | `includedExportTypes.intraday` | boolean | `true` | Include intraday export |
375
-
376
-
377
- Export priority: **daily > fresh > intraday**. Each lower-priority export only provides data not already covered by a higher-priority one. All seven combinations of the three export types are supported.
378
-
379
- When all three exports are enabled, the package:
380
- 1. Gets all data from daily export tables
381
- 2. Gets fresh export data for days not yet covered by a daily table
382
- 3. Gets intraday export data for events after the latest fresh event timestamp
383
-
384
- The boundary between fresh and intraday is timestamp-based because the fresh export is updated hourly, so within the same day some events come from the fresh export and the rest from intraday.
385
-
386
- > **Without daily export:** When `daily` is `false`, `dataIsFinal.detectionMethod` must be set to `'DAY_THRESHOLD'`, because `EXPORT_TYPE` detection relies on daily tables to mark data as final.
387
-
388
- <a id="dataIsFinal"></a>
389
-
390
- **`dataIsFinal`** — how to determine whether data is final (not expected to change):
391
-
392
-
393
- | Field | Type | Default | Description |
394
- | ----------------------------- | ------- | --------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
395
- | `dataIsFinal.detectionMethod` | string | `'DAY_THRESHOLD'` | `'DAY_THRESHOLD'` (uses days since event; data older than `dayThreshold` is considered final) or `'EXPORT_TYPE'` (uses table suffix; all data from the daily export is considered final). `'EXPORT_TYPE'` is suitable for most **web only** properties as data is rarely received with a delay. Must be `'DAY_THRESHOLD'` when daily export is not enabled |
396
- | `dataIsFinal.dayThreshold` | integer | `3` | Days after which data is considered final. According to GA4 documentation, data up to 72 hours old is subject to possible changes. Required when `detectionMethod` is `'DAY_THRESHOLD'` |
397
-
398
-
399
- <a id="testConfig"></a>
400
-
401
- **`testConfig`** — date range used when `test` is `true`:
402
-
403
-
404
- | Field | Type | Default | Description |
405
- | --------------------------- | ----------------- | -------------------- | --------------------------- |
406
- | `testConfig.dateRangeStart` | string (SQL date) | `'current_date()-1'` | Start date for test queries |
407
- | `testConfig.dateRangeEnd` | string (SQL date) | `'current_date()'` | End date for test queries |
408
-
409
-
410
- <a id="preOperations"></a>
411
-
412
- **`preOperations`** — date range and incremental refresh configuration:
413
-
414
-
415
- | Field | Type | Default | Description |
416
- | ------------------------------------------ | ----------------- | -------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
417
- | `preOperations.dateRangeStartFullRefresh` | string (SQL date) | `'date(2000, 1, 1)'` | Start date for full refresh |
418
- | `preOperations.dateRangeEnd` | string (SQL date) | `'current_date()'` | End date for queries |
419
- | `preOperations.numberOfPreviousDaysToScan` | integer | `10` | Number of days to scan backwards from the result table's last partition when determining the incremental refresh start checkpoint. Needs to cover the number of days that can still contain not final `(data_is_final = false)` data |
420
- | `preOperations.incrementalStartOverride` | string (SQL date) | `undefined` | Override the incremental start date to re-process a specific range |
421
- | `preOperations.incrementalEndOverride` | string (SQL date) | `undefined` | Override the incremental end date to re-process a specific range |
422
- | `preOperations.numberOfDaysToProcess` | integer | `undefined` | Limit each run to N days of data. When set, the end date becomes `start + N - 1` (capped at `current_date()`). When `undefined`, `dateRangeEnd` is used as-is. `incrementalEndOverride` takes priority |
423
-
424
- Date fields (`dateRangeStart`, `dateRangeEnd`, etc.) accept string dates in `YYYYMMDD` or `YYYY-MM-DD` format, or BigQuery SQL expressions (e.g. `'current_date()'`, `'date(2026, 1, 1)'`).
425
-
426
- <a id="eventParamsToColumns"></a>
427
-
428
- **`eventParamsToColumns`** — each item in the array is an object:
429
-
430
-
431
- | Field | Type | Required | Description |
432
- | ------------ | ------ | -------- | ------------------------------------------------------------------------------------------------------------------------------------- |
433
- | `name` | string | Yes | Event parameter name |
434
- | `type` | string | No | Data type: `'string'`, `'int'`, `'int64'`, `'double'`, `'float'`, or `'float64'`. If omitted, returns the value converted to a string |
435
- | `columnName` | string | No | Column name in the output. Defaults to the parameter `name` |
436
-
437
-
438
- <a id="item-list-attribution"></a>
439
-
440
- **`itemListAttribution`** — when set to an object, enables attribution of `item_list_name`, `item_list_id`, and `item_list_index` from `select_item`/`select_promotion` events to downstream ecommerce events (e.g. `add_to_cart`, `purchase`). Disabled by default.
441
-
442
- | Field | Type | Required | Description |
443
- | ---------------- | ------- | --------------------------- | --------------------------------------------------------------------- |
444
- | `lookbackType` | string | Yes | `'SESSION'` (partition by session) or `'TIME'` (time-based window) |
445
- | `lookbackTimeMs` | integer | When `lookbackType: 'TIME'` | Lookback window in milliseconds (e.g. `86400000` for 24h) |
446
-
447
- ```javascript
448
- // Session-based: attribute within the same session
449
- itemListAttribution: { lookbackType: 'SESSION' }
450
-
451
- // Time-based: attribute within a 24-hour window across sessions
452
- itemListAttribution: { lookbackType: 'TIME', lookbackTimeMs: 86400000 }
453
- ```
454
-
455
- > **Note:** This feature adds a compute-heavy CTE with a window function over unnested items. Only enable it if you need item list attribution for ecommerce analysis.
456
-
457
- <br>
458
-
459
- ---
460
-
461
- ### Assertions
462
-
463
- The package includes built-in data quality assertions that can be automatically created alongside the enhanced events table. Pass Dataform's `assert` function as the third argument to `createTable`:
464
-
465
- ```javascript
466
- ga4EventsEnhanced.createTable(publish, config, { assert });
467
- ```
468
-
469
- This creates the table along with the default-enabled assertions, using the same configuration:
470
-
471
- | Assertion | Name | Enabled by default | Description |
472
- | --------- | ---- | ------------------ | ----------- |
473
- | `dailyQuality` | `{tableName}_daily_quality` | Yes | Compares session count, event count, item revenue, and ecommerce purchase revenue per day between the enhanced table and raw export. Also reconciles item_revenue at the (event_date, item_id) grain on purchase events for days both sides consider final. Detects missing days, count mismatches, and non-final data inflation |
474
-
475
- The assertion inherits the table's schema and tags from `dataformTableConfig` and queries the last 5 days of data.
476
-
477
- #### Selective Assertions
478
-
479
- Disable the assertion by setting it to `false`:
480
-
481
- ```javascript
482
- ga4EventsEnhanced.createTable(publish, config, {
483
- assert,
484
- assertions: { dailyQuality: false },
485
- });
486
- ```
487
-
488
- #### Assertion Config Overrides
489
-
490
- Override the assertion's Dataform configuration (name, schema, tags):
491
-
492
- ```javascript
493
- ga4EventsEnhanced.createTable(publish, config, {
494
- assert,
495
- assertions: {
496
- dailyQuality: { tags: ['data_quality', 'ga4_export_fixer'] },
497
- },
498
- });
499
- ```
500
-
501
- #### Standalone Assertions (SQLX Deployment)
502
-
503
- For SQLX deployments or when you need full control, assertions can also be used as standalone SQL generators:
504
-
505
- ```javascript
506
- const { ga4EventsEnhanced } = require('ga4-export-fixer');
507
-
508
- assert('daily_quality_check', {
509
- schema: 'analytics_123456789',
510
- tags: ['ga4_export_fixer'],
511
- }).query(ctx => {
512
- return ga4EventsEnhanced.assertions.dailyQuality(
513
- ctx.ref('ga4_events_enhanced_123456789'),
514
- { ...config, sourceTable: ctx.ref(config.sourceTable) }
515
- );
516
- });
517
- ```
518
-
519
- <br>
520
-
521
- ---
522
-
523
- ### Creating Incremental Downstream Tables from `ga4_events_enhanced`
524
-
525
- Setting up incremental updates is easy using the **setPreOperations()** function. Just ensure that your result table includes the **data_is_final** flag from the **ga4_events_enhanced** table.
526
-
527
- The **incrementalDateFilter()** function applies the same date filtering used by **ga4_events_enhanced**, based on the **config** options and the variables declared by **setPreOperations()**.
528
-
529
- Key fields such as session_id, user_id, and session_traffic_source_last_click are available as clean, sessionized versions that handle edge cases like sessions spanning midnight.
530
-
531
- **`definitions/ga4/ga4_sessions.sqlx`**
532
-
533
- ```javascript
534
- config {
535
- type: "incremental",
536
- description: "GA4 sessions table",
537
- schema: "ga4_export_fixer",
538
- bigquery: {
539
- partitionBy: "event_date",
540
- clusterBy: ['session_id', 'data_is_final'],
541
- },
542
- tags: ['ga4_export_fixer']
543
- }
544
-
545
- js {
546
- const { setPreOperations, helpers } = require('ga4-export-fixer');
547
-
548
- const config = {
549
- self: self(),
550
- incremental: incremental(),
551
- /*
552
- Default options that can be overriden:
553
- test: false,
554
- testConfig: {
555
- dateRangeStart: 'current_date()-1',
556
- dateRangeEnd: 'current_date()',
557
- },
558
- preOperations: {
559
- dateRangeStartFullRefresh: 'date(2000, 1, 1)',
560
- dateRangeEnd: 'current_date()',
561
- // incremental date range overrides allow re-processing only a subset of the data:
562
- //incrementalStartOverride: undefined,
563
- //incrementalEndOverride: undefined,
564
- },
565
- */
566
- };
567
- }
568
-
569
- select
570
- event_date,
571
- session_id,
572
- user_pseudo_id,
573
- user_id,
574
- any_value(session_traffic_source_last_click.cross_channel_campaign) as session_traffic_source,
575
- any_value(landing_page) as landing_page,
576
- current_datetime() as row_inserted_timestamp,
577
- min(data_is_final) as data_is_final
578
- from
579
- ${ref('ga4_events_enhanced_298233330')}
580
- where
581
- ${helpers.incrementalDateFilter(config)}
582
- group by
583
- event_date,
584
- session_id,
585
- user_pseudo_id,
586
- user_id
587
-
588
- pre_operations {
589
- ${setPreOperations(config)}
590
- }
591
- ```
592
-
593
- <br>
594
-
595
- ---
596
-
597
- ### Helpers
598
-
599
- The helpers contain templates for common SQL expressions. The functions are referenced by **ga4EventsEnhanced** but can also be imported as utility functions for working with GA4 data.
600
-
601
- ```javascript
602
- const { helpers } = require('ga4-export-fixer');
603
- ```
604
-
605
- #### SQL Templates
606
-
607
-
608
- | Name | Example | Description |
609
- | ----------- | ------------------- | ------------------------------------------------------------------------- |
610
- | `eventDate` | `helpers.eventDate` | Casts `event_date` string to a DATE using YYYYMMDD format |
611
- | `sessionId` | `helpers.sessionId` | Builds a session ID by concatenating `user_pseudo_id` and `ga_session_id` |
612
-
613
-
614
- #### Functions
615
-
616
- **Unnesting parameters**
617
-
618
-
619
- | Function | Example | Description |
620
- | ------------------ | --------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
621
- | `unnestEventParam` | `unnestEventParam('page_location', 'string')` | Extracts a value from the `event_params` array by key. Supported types: `'string'`, `'int'`, `'int64'`, `'double'`, `'float'`, `'float64'`. Omit type to get the value converted as a string |
622
-
623
-
624
- **Date and time**
625
-
626
-
627
- | Function | Example | Description |
628
- | ------------------------- | --------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------- |
629
- | `getEventTimestampMicros` | `getEventTimestampMicros('custom_ts')` | Returns SQL for event timestamp in microseconds. With a custom parameter, uses it (converted from ms) with fallback to `event_timestamp` |
630
- | `getEventDateTime` | `getEventDateTime({ timezone: 'Europe/Helsinki' })` | Returns SQL for event datetime in the given timezone. Defaults to `'Etc/UTC'` |
631
-
632
-
633
- **Date filters**
634
-
635
-
636
- | Function | Example | Description |
637
- | --------------------- | -------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------ |
638
- | `ga4ExportDateFilter` | `ga4ExportDateFilter('daily', 'current_date()-7', 'current_date()')` | Generates a `_table_suffix` filter for a single export type (`'daily'` or `'intraday'`) and date range |
639
-
640
-
641
- **Page details**
642
-
643
-
644
- | Function | Example | Description |
645
- | ----------------------- | ---------------------------------------- | ------------------------------------------------------------------------------------------------------------------ |
646
- | `extractUrlHostname` | `extractUrlHostname('page_location')` | Extracts hostname from a URL column |
647
- | `extractUrlPath` | `extractUrlPath('page_location')` | Extracts the path component from a URL column |
648
- | `extractUrlQuery` | `extractUrlQuery('page_location')` | Extracts the query string (including `?`) from a URL column |
649
- | `extractUrlQueryParams` | `extractUrlQueryParams('page_location')` | Parses URL query parameters into `ARRAY<STRUCT<key STRING, value STRING>>` |
650
- | `extractPageDetails` | `extractPageDetails()` | Returns a struct with `hostname`, `path`, `query`, and `query_params`. Defaults to `page_location` event parameter |
651
-
652
-
653
- **Aggregation**
654
-
655
-
656
- | Function | Example | Description |
657
- | ---------------- | ------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------- |
658
- | `aggregateValue` | `aggregateValue('user_id', 'last', 'event_timestamp')` | Aggregates a column using `'max'`, `'min'`, `'first'`, `'last'`, or `'any'`. `'first'` and `'last'` use the timestamp column for ordering |
659
-
660
-
661
- **Ecommerce**
662
-
663
-
664
- | Function | Example | Description |
665
- | -------------------- | ---------------------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
666
- | `fixEcommerceStruct` | `fixEcommerceStruct()` | Cleans the ecommerce struct: sets `transaction_id` to null when `'(not set)'`, and fixes missing/NaN `purchase_revenue` for purchase events |
667
-
668
-
669
- **Data freshness**
670
-
671
-
672
- | Function | Example | Description |
673
- | ------------- | --------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
674
- | `isFinalData` | `isFinalData('DAY_THRESHOLD', 3)` | Returns SQL that evaluates to `true` when data is final. `'DAY_THRESHOLD'` uses days since event (`dayThreshold` is required and must be a non-negative integer); `'EXPORT_TYPE'` checks table suffix |
675
-
676
-
677
- ## License
678
-
31
+ <!-- /TOC -->
32
+
33
+ ## Main Features
34
+
35
+ <table>
36
+ <tr>
37
+ <td width="50%" valign="top">
38
+ <b>📦 Best Available Data</b><br>
39
+ Combines daily, fresh (360) &amp; intraday exports so the most complete version is always available
40
+ </td>
41
+ <td width="50%" valign="top">
42
+ <b>🔄 Incremental Updates</b><br>
43
+ Run on any schedule — daily, hourly, or custom
44
+ </td>
45
+ </tr>
46
+ <tr>
47
+ <td valign="top">
48
+ <b>📐 Flexible Schema</b><br>
49
+ Keeps the flexible structure of the original export with key fields promoted to columns for better query performance; partitioning &amp; clustering enabled
50
+ </td>
51
+ <td valign="top">
52
+ <b>🤖 AI Agent Ready</b><br>
53
+ Extensive table &amp; column descriptions for AI agents and humans
54
+ </td>
55
+ </tr>
56
+ <tr>
57
+ <td valign="top">
58
+ <b>🔑 Session Identity Resolution</b><br>
59
+ <code>user_id</code> resolved per session; <code>merged_user_id</code> coalesces with <code>user_pseudo_id</code>
60
+ </td>
61
+ <td valign="top">
62
+ <b>📡 Session Traffic Sources</b><br>
63
+ <code>session_first_traffic_source</code> and <code>session_traffic_source_last_click</code> computed automatically, adjusting for sessions that span midnight
64
+ </td>
65
+ </tr>
66
+ <tr>
67
+ <td valign="top">
68
+ <b>📍 Landing Page Detection</b><br>
69
+ Derived per session from the first page where <code>entrances > 0</code>
70
+ </td>
71
+ <td valign="top">
72
+ <b>🔗 Page URL Parsing</b><br>
73
+ Parsed <code>hostname</code>, <code>path</code>, <code>query</code>, and <code>query_params</code> from <code>page_location</code>
74
+ </td>
75
+ </tr>
76
+ <tr>
77
+ <td valign="top">
78
+ <b>🛒 Ecommerce Data Fixes</b><br>
79
+ Nullifies placeholder <code>transaction_id</code>; corrects <code>purchase_revenue</code> bugs
80
+ </td>
81
+ <td valign="top">
82
+ <b>🏷️ Item List Attribution</b><br>
83
+ Attributes <code>item_list_name</code>, <code>item_list_id</code>, and <code>item_list_index</code> from item selection events to downstream ecommerce events
84
+ </td>
85
+ </tr>
86
+ <tr>
87
+ <td valign="top">
88
+ <b>⚙️ Event Parameter Handling</b><br>
89
+ Promote event params to columns; include or exclude by name
90
+ </td>
91
+ <td valign="top">
92
+ <b>📊 Session Parameters</b><br>
93
+ Promote selected event parameters as <code>session_params</code>
94
+ </td>
95
+ </tr>
96
+ <tr>
97
+ <td valign="top">
98
+ <b>⏱️ Custom Timestamp</b><br>
99
+ Use a custom event parameter as primary timestamp with automatic fallback
100
+ </td>
101
+ <td valign="top">
102
+ <b>🔒 Schema Lock</b><br>
103
+ Lock table schema to a specific GA4 export date to prevent schema drift
104
+ </td>
105
+ </tr>
106
+ <tr>
107
+ <td valign="top">
108
+ <b>✅ Data Freshness Tracking</b><br>
109
+ <code>data_is_final</code> flag and <code>export_type</code> label on every row
110
+ </td>
111
+ <td valign="top">
112
+ <b>🔍 Data Quality Assertions</b><br>
113
+ Built-in daily assertion reconciles sessions, events, and revenue between the enhanced table and raw export
114
+ </td>
115
+ </tr>
116
+ <tr>
117
+ <td valign="top">
118
+ <b>🔃 Selective Re-processing</b><br>
119
+ Re-process a date range without full table rebuild using <code>incrementalStartOverride</code> and <code>incrementalEndOverride</code>
120
+ </td>
121
+ <td valign="top">
122
+ <b>📑 Batch Processing</b><br>
123
+ Process large exports in smaller batches via <code>numberOfDaysToProcess</code>
124
+ </td>
125
+ </tr>
126
+ <tr>
127
+ <td valign="top">
128
+ <b>🕐 Timezone-Aware Datetime</b><br>
129
+ <code>event_datetime</code> converted to a configurable IANA timezone
130
+ </td>
131
+ <td valign="top">
132
+ <b>🛡️ Zero Dependencies</b><br>
133
+ No additional external dependencies added to your Dataform repository
134
+ </td>
135
+ </tr>
136
+ </table>
137
+
138
+ ## Planned Features
139
+
140
+ Features under consideration for future releases:
141
+
142
+ - Web and app specific default configurations
143
+ - Custom channel grouping
144
+ - Data enrichment (item-level, session-level, event-level)
145
+ - Custom processing steps (additional CTEs)
146
+ - Custom traffic source attribution
147
+
148
+ ## Installation
149
+
150
+ ### Bash
151
+
152
+ ```bash
153
+ npm install ga4-export-fixer
154
+ ```
155
+
156
+ ### In Google Cloud Dataform
157
+
158
+ Include the package in the package.json file in your Dataform repository.
159
+
160
+ **`package.json`**
161
+
162
+ ```json
163
+ {
164
+ "dependencies": {
165
+ "@dataform/core": "3.0.42",
166
+ "ga4-export-fixer": "0.7.1"
167
+ }
168
+ }
169
+ ```
170
+
171
+ **Note:** The best practice is to specify the package version explicitly (e.g. `"0.1.2"`) rather than using `"latest"` or `"*"`, to avoid unexpected breaking changes when the package is updated.
172
+
173
+ In Google Cloud Dataform, click "Install Packages" to install it in your development workspace.
174
+
175
+ If your Dataform repository does not have a package.json file, see this guide: [https://docs.cloud.google.com/dataform/docs/manage-repository#move-to-package-json](https://docs.cloud.google.com/dataform/docs/manage-repository#move-to-package-json)
176
+
177
+ ## Usage
178
+
179
+ ### Create GA4 Events Enhanced Table
180
+
181
+ Creates an **enhanced** version of the GA4 BigQuery export (daily & intraday).
182
+
183
+ #### JS Deployment (Recommended) ![.JS](https://img.shields.io/badge/.JS-F7DF1E?style=flat-square)
184
+
185
+ Create a new **ga4_events_enhanced** table using a **.js** file in your repository's **definitions** folder.
186
+
187
+ **Using Defaults**
188
+
189
+ **`definitions/ga4/ga4_events_enhanced.js`**
190
+
191
+ ```javascript
192
+ const { ga4EventsEnhanced } = require('ga4-export-fixer');
193
+
194
+ const config = {
195
+ sourceTable: constants.GA4_TABLES.MY_GA4_EXPORT
196
+ };
197
+
198
+ ga4EventsEnhanced.createTable(publish, config);
199
+ ```
200
+
201
+ **With Custom Configuration**
202
+
203
+ **`definitions/ga4/ga4_events_enhanced.js`**
204
+
205
+ ```javascript
206
+ const { ga4EventsEnhanced } = require('ga4-export-fixer');
207
+
208
+ const config = {
209
+ sourceTable: constants.GA4_TABLES.MY_GA4_EXPORT,
210
+ // use dataformTableConfig to make changes to the default Dataform table configuration
211
+ dataformTableConfig: {
212
+ schema: 'ga4'
213
+ },
214
+ // test configurations
215
+ test: false,
216
+ testConfig: {
217
+ dateRangeStart: 'current_date()-1',
218
+ dateRangeEnd: 'current_date()',
219
+ },
220
+ schemaLock: '20260101', // lock to daily export; also supports 'intraday_20260101' or 'fresh_20260101'
221
+ customTimestampParam: 'custom_event_timestamp', // custom timestamp collected as an event param
222
+ timezone: 'Europe/Helsinki',
223
+ // not needed data
224
+ excludedColumns: [
225
+ 'app_info',
226
+ 'publisher'
227
+ ],
228
+ // not needed events
229
+ excludedEvents: [
230
+ 'session_start',
231
+ 'first_visit',
232
+ 'user_engagement'
233
+ ],
234
+ // transform to session-level
235
+ sessionParams: [
236
+ 'user_agent'
237
+ ],
238
+ // promote as columns
239
+ eventParamsToColumns: [
240
+ {name: 'session_engaged'},
241
+ {name: 'ga_session_number', type: 'int'},
242
+ {name: 'page_type', type: 'string'},
243
+ ],
244
+ // not needed in the event_params array
245
+ excludedEventParams: [
246
+ 'session_engaged',
247
+ 'ga_session_number',
248
+ 'page_type',
249
+ 'user_agent'
250
+ ],
251
+ // use export type for data_is_final instead of the default DAY_THRESHOLD
252
+ dataIsFinal: {
253
+ detectionMethod: 'EXPORT_TYPE',
254
+ },
255
+ // attribute item lists to downstream ecommerce events within the same session
256
+ itemListAttribution: {
257
+ lookbackType: 'SESSION',
258
+ },
259
+ };
260
+
261
+ ga4EventsEnhanced.createTable(publish, config);
262
+ ```
263
+
264
+ #### SQLX Deployment ![.SQLX](https://img.shields.io/badge/.SQLX-4285F4?style=flat-square)
265
+
266
+ Alternatively, you can create the **ga4_events_enhanced** table using a .SQLX file.
267
+
268
+ **`definitions/ga4/ga4_events_enhanced.sqlx`**
269
+
270
+ ```javascript
271
+ config {
272
+ type: "incremental",
273
+ description: "GA4 Events Enhanced table",
274
+ schema: "ga4",
275
+ onSchemaChange: "EXTEND",
276
+ bigquery: {
277
+ partitionBy: "event_date",
278
+ clusterBy: ['event_name', 'session_id', 'page_location', 'data_is_final'],
279
+ },
280
+ tags: ['ga4_export_fixer']
281
+ }
282
+
283
+ js {
284
+ const { ga4EventsEnhanced } = require('ga4-export-fixer');
285
+
286
+ const config = {
287
+ sourceTable: ref(constants.GA4_TABLES.MY_GA4_EXPORT),
288
+ self: self(),
289
+ incremental: incremental()
290
+ };
291
+ }
292
+
293
+ ${ga4EventsEnhanced.generateSql(config)}
294
+
295
+ pre_operations {
296
+ ${ga4EventsEnhanced.setPreOperations(config)}
297
+ }
298
+ ```
299
+
300
+ <br>
301
+
302
+ ---
303
+
304
+ ### Configuration Object
305
+
306
+ All fields are optional except `sourceTable`. Default values are applied automatically, so you only need to specify the fields you want to override.
307
+
308
+
309
+ | Field | Type | Default/Required | Description |
310
+ | ---------------------- | ----------------------- | ---------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
311
+ | `sourceTable` | Dataform ref() / string | **required** | Source GA4 export table. Use `ref()` in Dataform or a string in the format `project.dataset.table` |
312
+ | `self` | Dataform self() | **required for .SQLX deployment** | Reference to the table itself. Use `self()` in Dataform |
313
+ | `incremental` | Dataform incremental() | **required for .SQLX deployment** | Switch between incremental and full refresh logic. Use `incremental()` in Dataform |
314
+ | `dataformTableConfig` | object | **In JS deployment only.** [See default](#default-dataformtableconfig) | Override the default Dataform table configuration for JS deployment. See: [ITableConfig reference](https://docs.cloud.google.com/dataform/docs/reference/dataform-core-reference#itableconfig) |
315
+ | `schemaLock` | string | `undefined` | Lock the table schema to a specific GA4 export table suffix. Accepts `"YYYYMMDD"` (daily), `"intraday_YYYYMMDD"`, or `"fresh_YYYYMMDD"`. Date must be >= `"20241009"` |
316
+ | `timezone` | string | `'Etc/UTC'` | IANA timezone for event datetime (e.g. `'Europe/Helsinki'`) |
317
+ | `customTimestampParam` | string | `undefined` | Name of a custom event parameter containing a JS timestamp in milliseconds (e.g. collected via `Date.now()`) |
318
+ | `bufferDays` | integer | `1` | Extra days to include for sessions that span midnight. Auto-adjusted when `itemListAttribution.lookbackType` is `'TIME'` and the lookback exceeds `bufferDays` |
319
+ | `itemListAttribution` | object | `undefined` | Enable item list attribution. See [Item List Attribution](#item-list-attribution) |
320
+ | `test` | boolean | `false` | Enable test mode (uses `testConfig` date range instead of pre-operations) |
321
+ | `excludedEventParams` | string[] | `[]` | Event parameter names to exclude from the `event_params` array |
322
+ | `excludedEvents` | string[] | `['session_start', 'first_visit']` | Event names to exclude from the table. These events are excluded by default because they are rarely needed for analysis. Override this to include them if needed |
323
+ | `excludedColumns` | string[] | `[]` | Default GA4 export columns to exclude from the final table, for example `'app_info'` or `'publisher'` |
324
+ | `sessionParams` | string[] | `[]` | Event parameter names to aggregate as session-level parameters |
325
+ | `includedExportTypes` | object | [See details](#includedExportTypes) | Which GA4 export types to include (daily, fresh, intraday) |
326
+ | `dataIsFinal` | object | [See details](#dataIsFinal) | How to determine whether data is final (not expected to change) |
327
+ | `testConfig` | object | [See details](#testConfig) | Date range used when `test` is `true` |
328
+ | `preOperations` | object | [See details](#preOperations) | Date range and incremental refresh configuration |
329
+ | `eventParamsToColumns` | object[] | `[]` | Event parameters to promote to columns. [See item schema](#eventParamsToColumns) |
330
+ | `customSteps` | object[] | `[]` | User-defined CTEs appended to the pipeline after `enhanced_events`. [See Custom CTEs](#custom-ctes) |
331
+
332
+ <a id="default-dataformtableconfig"></a>
333
+ <details>
334
+ <summary><strong>Default dataformTableConfig</strong></summary>
335
+
336
+ ```json
337
+ {
338
+ "name": "ga4_events_enhanced_<dataset_id>",
339
+ "type": "incremental",
340
+ "schema": "<source_dataset>",
341
+ "description": "<default description>",
342
+ "bigquery": {
343
+ "partitionBy": "event_date",
344
+ "clusterBy": [
345
+ "event_name",
346
+ "session_id",
347
+ "page_location",
348
+ "data_is_final"
349
+ ],
350
+ "labels": {
351
+ "ga4_export_fixer": "true"
352
+ }
353
+ },
354
+ "onSchemaChange": "EXTEND",
355
+ "tags": [
356
+ "ga4_export_fixer"
357
+ ]
358
+ }
359
+ ```
360
+
361
+ The `onSchemaChange: "EXTEND"` setting updates the result table schema on incremental runs, adding columns for any new fields the query produces.
362
+
363
+ </details>
364
+ <br>
365
+
366
+ <a id="includedExportTypes"></a>
367
+
368
+ **`includedExportTypes`** — which GA4 export types to include:
369
+
370
+
371
+ | Field | Type | Default | Description |
372
+ | ------------------------------ | ------- | ------- | -------------------------------- |
373
+ | `includedExportTypes.daily` | boolean | `true` | Include daily (processed) export |
374
+ | `includedExportTypes.fresh` | boolean | `false` | Include fresh (hourly-updated) export |
375
+ | `includedExportTypes.intraday` | boolean | `true` | Include intraday export |
376
+
377
+
378
+ Export priority: **daily > fresh > intraday**. Each lower-priority export only provides data not already covered by a higher-priority one. All seven combinations of the three export types are supported.
379
+
380
+ When all three exports are enabled, the package:
381
+ 1. Gets all data from daily export tables
382
+ 2. Gets fresh export data for days not yet covered by a daily table
383
+ 3. Gets intraday export data for events after the latest fresh event timestamp
384
+
385
+ The boundary between fresh and intraday is timestamp-based because the fresh export is updated hourly, so within the same day some events come from the fresh export and the rest from intraday.
386
+
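The priority rules above can be sketched as a small decision function. This is only an illustration of the documented behavior, not the package's actual implementation (which generates SQL); `pickExportType`, `dailyDates`, and `latestFreshTimestamp` are hypothetical names, and treating an event exactly at the fresh boundary as "fresh" is an assumption.

```javascript
// Hypothetical illustration only: not part of ga4-export-fixer.
// Decides which export should supply an event, following the documented
// priority daily > fresh > intraday.
function pickExportType(event, state) {
  // 1. A daily table for the event's date always wins.
  if (state.dailyDates.has(event.eventDate)) return 'daily';
  // 2. Otherwise the fresh export covers events up to its latest timestamp
  //    (assumption: the boundary itself counts as fresh).
  if (
    state.latestFreshTimestamp !== null &&
    event.eventTimestamp <= state.latestFreshTimestamp
  ) {
    return 'fresh';
  }
  // 3. Anything newer is only available in the intraday export.
  return 'intraday';
}

// Example state: one day already has a daily table, and the fresh export
// has been updated up to timestamp 5000 (arbitrary units).
const exportState = {
  dailyDates: new Set(['20260101']),
  latestFreshTimestamp: 5000,
};

const sources = [
  pickExportType({ eventDate: '20260101', eventTimestamp: 100 }, exportState),
  pickExportType({ eventDate: '20260102', eventTimestamp: 4000 }, exportState),
  pickExportType({ eventDate: '20260102', eventTimestamp: 6000 }, exportState),
];
console.log(sources);
```

The second and third events share a date but come from different exports, which is why the fresh/intraday boundary has to be timestamp-based rather than date-based.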
387
+ > **Without daily export:** When `daily` is `false`, `dataIsFinal.detectionMethod` must be set to `'DAY_THRESHOLD'`, because `EXPORT_TYPE` detection relies on daily tables to mark data as final.
388
+
389
+ <a id="dataIsFinal"></a>
390
+
391
+ **`dataIsFinal`** — how to determine whether data is final (not expected to change):
392
+
393
+
394
+ | Field | Type | Default | Description |
395
+ | ----------------------------- | ------- | --------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
396
+ | `dataIsFinal.detectionMethod` | string | `'DAY_THRESHOLD'` | `'DAY_THRESHOLD'` (uses days since event; data older than `dayThreshold` is considered final) or `'EXPORT_TYPE'` (uses table suffix; all data from the daily export is considered final). `'EXPORT_TYPE'` is suitable for most **web only** properties as data is rarely received with a delay. Must be `'DAY_THRESHOLD'` when daily export is not enabled |
397
+ | `dataIsFinal.dayThreshold` | integer | `3` | Days after which data is considered final. According to GA4 documentation, data up to 72 hours old is subject to possible changes. Required when `detectionMethod` is `'DAY_THRESHOLD'` |
398
+
399
+
400
+ <a id="testConfig"></a>
401
+
402
+ **`testConfig`** — date range used when `test` is `true`:
403
+
404
+
405
+ | Field | Type | Default | Description |
406
+ | --------------------------- | ----------------- | -------------------- | --------------------------- |
407
+ | `testConfig.dateRangeStart` | string (SQL date) | `'current_date()-1'` | Start date for test queries |
408
+ | `testConfig.dateRangeEnd` | string (SQL date) | `'current_date()'` | End date for test queries |
409
+
410
+
411
+ <a id="preOperations"></a>
412
+
413
+ **`preOperations`** — date range and incremental refresh configuration:
414
+
415
+
416
+ | Field | Type | Default | Description |
417
+ | ------------------------------------------ | ----------------- | -------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
418
+ | `preOperations.dateRangeStartFullRefresh` | string (SQL date) | `'date(2000, 1, 1)'` | Start date for full refresh |
419
+ | `preOperations.dateRangeEnd` | string (SQL date) | `'current_date()'` | End date for queries |
420
+ | `preOperations.numberOfPreviousDaysToScan` | integer | `10` | Number of days to scan backwards from the result table's last partition when determining the incremental refresh start checkpoint. Needs to cover the number of days that can still contain non-final (`data_is_final = false`) data |
421
+ | `preOperations.incrementalStartOverride` | string (SQL date) | `undefined` | Override the incremental start date to re-process a specific range |
422
+ | `preOperations.incrementalEndOverride` | string (SQL date) | `undefined` | Override the incremental end date to re-process a specific range |
423
+ | `preOperations.numberOfDaysToProcess` | integer | `undefined` | Limit each run to N days of data. When set, the end date becomes `start + N - 1` (capped at `current_date()`). When `undefined`, `dateRangeEnd` is used as-is. `incrementalEndOverride` takes priority |
424
+
425
+ Date fields (`dateRangeStart`, `dateRangeEnd`, etc.) accept string dates in `YYYYMMDD` or `YYYY-MM-DD` format, or BigQuery SQL expressions (e.g. `'current_date()'`, `'date(2026, 1, 1)'`).
426
+
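As a worked example of the `numberOfDaysToProcess` rule (`start + N - 1`, capped at the current date), the arithmetic can be sketched in plain JavaScript. The `batchEndDate` function is a hypothetical helper written for this illustration; the package itself computes the range in BigQuery SQL, and this sketch assumes ISO `YYYY-MM-DD` inputs handled in UTC.

```javascript
// Hypothetical helper, for illustration only: mirrors the documented rule
// end = start + N - 1, capped at today. Dates are treated as UTC days.
function batchEndDate(startIso, numberOfDaysToProcess, todayIso) {
  const dayMs = 24 * 60 * 60 * 1000;
  const start = Date.parse(`${startIso}T00:00:00Z`);
  const today = Date.parse(`${todayIso}T00:00:00Z`);
  // The batch covers N days including the start day, hence N - 1.
  const end = Math.min(start + (numberOfDaysToProcess - 1) * dayMs, today);
  return new Date(end).toISOString().slice(0, 10);
}

// A 7-day batch well in the past ends 6 days after its start.
const fullBatch = batchEndDate('2026-01-01', 7, '2026-03-01');
// A batch that would run past "today" is capped at today.
const cappedBatch = batchEndDate('2026-02-27', 7, '2026-03-01');
console.log(fullBatch, cappedBatch);
```

With a large backfill, each scheduled run would then advance the table by at most N days until it catches up with the current date.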
427
+ <a id="eventParamsToColumns"></a>
428
+
429
+ **`eventParamsToColumns`** — each item in the array is an object:
430
+
431
+
432
+ | Field | Type | Required | Description |
433
+ | ------------ | ------ | -------- | ------------------------------------------------------------------------------------------------------------------------------------- |
434
+ | `name` | string | Yes | Event parameter name |
435
+ | `type` | string | No | Data type: `'string'`, `'int'`, `'int64'`, `'double'`, `'float'`, or `'float64'`. If omitted, returns the value converted to a string |
436
+ | `columnName` | string | No | Column name in the output. Defaults to the parameter `name` |
437
+
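A minimal sketch of how the three item fields combine, using parameter names from the configuration example earlier in this README. The `page_category` column name is a made-up value for illustration, and the `outputColumns` mapping only demonstrates the documented default (`columnName` falls back to `name`); it is not code from the package.

```javascript
// Config fragment: only the eventParamsToColumns option is shown.
const paramConfig = {
  eventParamsToColumns: [
    // Promoted as a STRING column under a different (hypothetical) name.
    { name: 'page_type', type: 'string', columnName: 'page_category' },
    // Promoted as an integer column named after the parameter.
    { name: 'ga_session_number', type: 'int' },
    // No type: the value is returned converted to a string.
    { name: 'session_engaged' },
  ],
};

// Illustration of the documented default: columnName falls back to name.
const outputColumns = paramConfig.eventParamsToColumns.map(
  (param) => param.columnName || param.name
);
console.log(outputColumns);
```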
438
+
439
+ <a id="item-list-attribution"></a>
440
+
441
+ **`itemListAttribution`** — when set to an object, enables attribution of `item_list_name`, `item_list_id`, and `item_list_index` from `select_item`/`select_promotion` events to downstream ecommerce events (e.g. `add_to_cart`, `purchase`). Disabled by default.
442
+
443
+ | Field | Type | Required | Description |
444
+ | ---------------- | ------- | --------------------------- | --------------------------------------------------------------------- |
445
+ | `lookbackType` | string | Yes | `'SESSION'` (partition by session) or `'TIME'` (time-based window) |
446
+ | `lookbackTimeMs` | integer | When `lookbackType: 'TIME'` | Lookback window in milliseconds (e.g. `86400000` for 24h) |
447
+
448
+ ```javascript
449
+ // Session-based: attribute within the same session
450
+ itemListAttribution: { lookbackType: 'SESSION' }
451
+
452
+ // Time-based: attribute within a 24-hour window across sessions
453
+ itemListAttribution: { lookbackType: 'TIME', lookbackTimeMs: 86400000 }
454
+ ```
455
+
456
+ > **Note:** This feature adds a compute-heavy CTE with a window function over unnested items. Only enable it if you need item list attribution for ecommerce analysis.
457
+
458
+ <a id="custom-ctes"></a>
459
+
460
+ **`customSteps`** — append CTEs after `enhanced_events`. Each entry is either a raw `{name, query}` or a structured `{name, select, from, ...}`. The last entry becomes the table's final SELECT; earlier entries become CTEs.
461
+
462
+ **Stable CTE names you can reference from your custom steps:**
463
+
464
+ | Name | Always present? | Contents |
465
+ | ------------------------ | ------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- |
466
+ | `event_data` | yes | Extracted and shaped events from `sourceTable`, with date filtering and column promotions applied. *Unfiltered for the buffer-days range.* |
467
+ | `session_data` | yes | Session-level aggregations (grouped by `session_id`). |
468
+ | `item_list_attribution` | only when `itemListAttribution` is on | Per-event item attribution rows. |
469
+ | `item_list_data` | only when `itemListAttribution` is on | Re-aggregated items with attributed list fields. |
470
+ | `enhanced_events` | yes | The package's standard output shape (joined event_data + session_data + item_list_data, columns ordered, incremental date filter applied). The natural starting point for most custom CTEs. |
471
+
472
+
473
+ ```javascript
474
+ // Add a content_group column derived from page.path
475
+ customSteps: [
476
+ {
477
+ name: 'final',
478
+ query: `select
479
+ enhanced_events.*,
480
+ case
481
+ when page.path like '/blog/%' then 'blog'
482
+ when page.path like '/products/%' then 'product'
483
+ when page.path = '/' then 'home'
484
+ else 'other'
485
+ end as content_group
486
+ from enhanced_events`,
487
+ },
488
+ ],
489
+ ```
490
+
491
+ > **Note:** Custom columns aren't auto-documented. Use `dataformTableConfig.columns` to add descriptions — it's deep-merged with the package's defaults, so your keys are added or override matching defaults, and untouched defaults stay.
492
+
493
+ > **Note:** Built-in assertions assume the package's standard schema. If your custom CTEs rename, drop, or filter rows in ways that break those assumptions, disable the affected assertions explicitly via the `assertions` config option.
494
+
495
+ <br>
496
+
497
+ ---
498
+
499
+ ### Assertions
500
+
501
+ The package includes built-in data quality assertions that can be automatically created alongside the enhanced events table. Pass Dataform's `assert` function as the third argument to `createTable`:
502
+
503
+ ```javascript
504
+ ga4EventsEnhanced.createTable(publish, config, { assert });
505
+ ```
506
+
507
+ This creates the table along with the default-enabled assertions, using the same configuration:
508
+
509
+ | Assertion | Name | Enabled by default | Description |
510
+ | --------- | ---- | ------------------ | ----------- |
511
+ | `dailyQuality` | `{tableName}_daily_quality` | Yes | Compares session count, event count, item revenue, and ecommerce purchase revenue per day between the enhanced table and raw export. Also reconciles item_revenue at the (event_date, item_id) grain on purchase events for days both sides consider final. Detects missing days, count mismatches, and non-final data inflation |
512
+
513
+ The assertion inherits the table's schema and tags from `dataformTableConfig` and queries the last 5 days of data.
514
+
515
+ #### Selective Assertions
516
+
517
+ Disable the assertion by setting it to `false`:
518
+
519
+ ```javascript
520
+ ga4EventsEnhanced.createTable(publish, config, {
521
+ assert,
522
+ assertions: { dailyQuality: false },
523
+ });
524
+ ```
525
+
526
+ #### Assertion Config Overrides
527
+
528
+ Override the assertion's Dataform configuration (name, schema, tags):
529
+
530
+ ```javascript
531
+ ga4EventsEnhanced.createTable(publish, config, {
532
+ assert,
533
+ assertions: {
534
+ dailyQuality: { tags: ['data_quality', 'ga4_export_fixer'] },
535
+ },
536
+ });
537
+ ```
538
+
539
+ #### Standalone Assertions (SQLX Deployment)
540
+
541
+ For SQLX deployments or when you need full control, assertions can also be used as standalone SQL generators:
542
+
543
+ ```javascript
544
+ const { ga4EventsEnhanced } = require('ga4-export-fixer');
545
+
546
+ assert('daily_quality_check', {
547
+ schema: 'analytics_123456789',
548
+ tags: ['ga4_export_fixer'],
549
+ }).query(ctx => {
550
+ return ga4EventsEnhanced.assertions.dailyQuality(
551
+ ctx.ref('ga4_events_enhanced_123456789'),
552
+ { ...config, sourceTable: ctx.ref(config.sourceTable) }
553
+ );
554
+ });
555
+ ```
556
+
557
+ <br>
558
+
559
+ ---
560
+
561
+ ### Creating Incremental Downstream Tables from `ga4_events_enhanced`
562
+
563
+ Setting up incremental updates is easy using the **setPreOperations()** function. Just ensure that your result table includes the **data_is_final** flag from the **ga4_events_enhanced** table.
564
+
565
+ The **incrementalDateFilter()** function applies the same date filtering used by **ga4_events_enhanced**, based on the **config** options and the variables declared by **setPreOperations()**.
566
+
567
+ Key fields such as `session_id`, `user_id`, and `session_traffic_source_last_click` are available as clean, sessionized versions that handle edge cases like sessions spanning midnight.
568
+
569
+ **`definitions/ga4/ga4_sessions.sqlx`**
570
+
571
+ ```javascript
572
+ config {
573
+ type: "incremental",
574
+ description: "GA4 sessions table",
575
+ schema: "ga4_export_fixer",
576
+ bigquery: {
577
+ partitionBy: "event_date",
578
+ clusterBy: ['session_id', 'data_is_final'],
579
+ },
580
+ tags: ['ga4_export_fixer']
581
+ }
582
+
583
+ js {
584
+ const { setPreOperations, helpers } = require('ga4-export-fixer');
585
+
586
+ const config = {
587
+ self: self(),
588
+ incremental: incremental(),
589
+ /*
590
+ Default options that can be overridden:
591
+ test: false,
592
+ testConfig: {
593
+ dateRangeStart: 'current_date()-1',
594
+ dateRangeEnd: 'current_date()',
595
+ },
596
+ preOperations: {
597
+ dateRangeStartFullRefresh: 'date(2000, 1, 1)',
598
+ dateRangeEnd: 'current_date()',
599
+ // incremental date range overrides allow re-processing only a subset of the data:
600
+ //incrementalStartOverride: undefined,
601
+ //incrementalEndOverride: undefined,
602
+ },
603
+ */
604
+ };
605
+ }
606
+
607
+ select
608
+ event_date,
609
+ session_id,
610
+ user_pseudo_id,
611
+ user_id,
612
+ any_value(session_traffic_source_last_click.cross_channel_campaign) as session_traffic_source,
613
+ any_value(landing_page) as landing_page,
614
+ current_datetime() as row_inserted_timestamp,
615
+ min(data_is_final) as data_is_final
616
+ from
617
+ ${ref('ga4_events_enhanced_298233330')}
618
+ where
619
+ ${helpers.incrementalDateFilter(config)}
620
+ group by
621
+ event_date,
622
+ session_id,
623
+ user_pseudo_id,
624
+ user_id
625
+
626
+ pre_operations {
627
+ ${setPreOperations(config)}
628
+ }
629
+ ```
630
+
631
+ <br>
632
+
633
+ ---
634
+
635
+ ### Helpers
636
+
637
+ The helpers contain templates for common SQL expressions. The functions are referenced by **ga4EventsEnhanced** but can also be imported as utility functions for working with GA4 data.
638
+
639
+ ```javascript
640
+ const { helpers } = require('ga4-export-fixer');
641
+ ```
642
+
643
+ #### SQL Templates
644
+
645
+
646
+ | Name | Example | Description |
647
+ | ----------- | ------------------- | ------------------------------------------------------------------------- |
648
+ | `eventDate` | `helpers.eventDate` | Casts `event_date` string to a DATE using YYYYMMDD format |
649
+ | `sessionId` | `helpers.sessionId` | Builds a session ID by concatenating `user_pseudo_id` and `ga_session_id` |
650
+
651
+
652
+ #### Functions
653
+
654
+ **Unnesting parameters**
655
+
656
+
657
+ | Function | Example | Description |
658
+ | ------------------ | --------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
659
+ | `unnestEventParam` | `unnestEventParam('page_location', 'string')` | Extracts a value from the `event_params` array by key. Supported types: `'string'`, `'int'`, `'int64'`, `'double'`, `'float'`, `'float64'`. Omit type to get the value converted to a string |
660
+
661
+
662
+ **Date and time**
663
+
664
+
665
+ | Function | Example | Description |
666
+ | ------------------------- | --------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------- |
667
+ | `getEventTimestampMicros` | `getEventTimestampMicros('custom_ts')` | Returns SQL for event timestamp in microseconds. With a custom parameter, uses it (converted from ms) with fallback to `event_timestamp` |
668
+ | `getEventDateTime` | `getEventDateTime({ timezone: 'Europe/Helsinki' })` | Returns SQL for event datetime in the given timezone. Defaults to `'Etc/UTC'` |
669
+
670
+
671
+ **Date filters**
672
+
673
+
674
+ | Function | Example | Description |
675
+ | --------------------- | -------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------ |
676
+ | `ga4ExportDateFilter` | `ga4ExportDateFilter('daily', 'current_date()-7', 'current_date()')` | Generates a `_table_suffix` filter for a single export type (`'daily'` or `'intraday'`) and date range |
677
+
678
+
679
+ **Page details**
680
+
681
+
682
+ | Function | Example | Description |
683
+ | ----------------------- | ---------------------------------------- | ------------------------------------------------------------------------------------------------------------------ |
684
+ | `extractUrlHostname` | `extractUrlHostname('page_location')` | Extracts hostname from a URL column |
685
+ | `extractUrlPath` | `extractUrlPath('page_location')` | Extracts the path component from a URL column |
686
+ | `extractUrlQuery` | `extractUrlQuery('page_location')` | Extracts the query string (including `?`) from a URL column |
687
+ | `extractUrlQueryParams` | `extractUrlQueryParams('page_location')` | Parses URL query parameters into `ARRAY<STRUCT<key STRING, value STRING>>` |
688
+ | `extractPageDetails` | `extractPageDetails()` | Returns a struct with `hostname`, `path`, `query`, and `query_params`. Defaults to `page_location` event parameter |
689
+
690
+
691
+ **Aggregation**
692
+
693
+
694
+ | Function | Example | Description |
695
+ | ---------------- | ------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------- |
696
+ | `aggregateValue` | `aggregateValue('user_id', 'last', 'event_timestamp')` | Aggregates a column using `'max'`, `'min'`, `'first'`, `'last'`, or `'any'`. `'first'` and `'last'` use the timestamp column for ordering |
697
+
698
+
699
+ **Ecommerce**
700
+
701
+
702
+ | Function | Example | Description |
703
+ | -------------------- | ---------------------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
704
+ | `fixEcommerceStruct` | `fixEcommerceStruct()` | Cleans the ecommerce struct: sets `transaction_id` to null when `'(not set)'`, and fixes missing/NaN `purchase_revenue` for purchase events |
705
+
706
+
707
+ **Data freshness**
708
+
709
+
710
+ | Function | Example | Description |
711
+ | ------------- | --------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
712
+ | `isFinalData` | `isFinalData('DAY_THRESHOLD', 3)` | Returns SQL that evaluates to `true` when data is final. `'DAY_THRESHOLD'` uses days since event (`dayThreshold` is required and must be a non-negative integer); `'EXPORT_TYPE'` checks table suffix |
713
+
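The two detection methods described for `isFinalData` can be sketched as follows. This is an assumption-laden illustration rather than the package's implementation: the function name `isFinalDataSketch`, the strict `>` comparison, and the regular expression used for `'EXPORT_TYPE'` are all hypothetical. It relies only on the GA4 export storing `event_date` as a `YYYYMMDD` string and on daily shards having an eight-digit table suffix.

```javascript
// Hypothetical sketch, NOT the package's code.
function isFinalDataSketch(method, dayThreshold) {
  if (method === 'DAY_THRESHOLD') {
    // Mirrors the documented requirement: a non-negative integer.
    if (!Number.isInteger(dayThreshold) || dayThreshold < 0) {
      throw new Error('dayThreshold must be a non-negative integer');
    }
    // Assumed comparison: final once more than N days have passed.
    const eventDate = "parse_date('%Y%m%d', event_date)";
    return `date_diff(current_date(), ${eventDate}, day) > ${dayThreshold}`;
  }
  if (method === 'EXPORT_TYPE') {
    // Assumed check: only daily shards (eight-digit suffix) are final.
    return "regexp_contains(_table_suffix, r'^\\d{8}$')";
  }
  throw new Error(`Unsupported method: ${method}`);
}
```

Validating `dayThreshold` when the SQL is generated surfaces a bad configuration at compile time instead of as a runtime query error.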
714
+
715
+ ## License
716
+
679
717
  MIT