ga4-export-fixer 0.2.3-dev.3 → 0.2.4-dev.0

package/README.md CHANGED
@@ -1,26 +1,54 @@
1
1
  # ga4-export-fixer
2
2
 
3
- **ga4-export-fixer** is a **Dataform NPM package** that transforms raw GA4 BigQuery export data into a cleaner, more queryable incremental table. It combines daily and intraday exports so the best available version of each event is always in use, adds session-level fields like `session_id` and `landing_page`, promotes key event parameters to columns, and fixes known GA4 export issues — handling the boilerplate transformations that are otherwise tedious to include in every GA4 query.
3
+ [![npm version](https://img.shields.io/npm/v/ga4-export-fixer)](https://www.npmjs.com/package/ga4-export-fixer)
4
+
5
+ **ga4-export-fixer** is a **Dataform NPM package** that transforms raw GA4 BigQuery export data into a cleaner, more queryable incremental table. It combines daily and intraday exports (360 fresh export support not yet available) so the best available version of each event is always in use, adds session-level fields like `session_id` and `landing_page`, promotes key event parameters to columns, and fixes known GA4 export issues — handling the boilerplate transformations that are otherwise tedious to include in every GA4 query.
4
6
 
5
7
  The goal of the package is to **speed up development** when building data models and pipelines on top of GA4 export data, allowing you to focus on your use case instead of wrestling with the raw export format.
6
8
 
9
+ <img src="./docs/images/example_data_model.png" alt="Example Data Model" width="600">
10
+
11
+ *Example data model built with ga4-export-fixer*
12
+
7
13
  ### Table of Contents
8
14
  <!-- TOC -->
15
+ - [Main Features](#main-features)
9
16
  - [Planned, Upcoming Features](#planned-upcoming-features)
10
17
  - [Installation](#installation)
11
18
  - [Bash](#bash)
12
19
  - [In Google Cloud Dataform](#in-google-cloud-dataform)
13
20
  - [Usage](#usage)
14
21
  - [Create GA4 Events Enhanced Table](#create-ga4-events-enhanced-table)
15
- - [Building on top of the ga4_events_enhanced table](#building-on-top-of-the-ga4_events_enhanced-table)
16
22
  - [Configuration Object](#configuration-object)
23
+ - [Creating Incremental Downstream Tables from ga4_events_enhanced](#creating-incremental-downstream-tables-from-ga4_events_enhanced)
17
24
  - [Helpers](#helpers)
18
25
  - [License](#license)
19
26
  <!-- /TOC -->
20
27
 
28
+ ### Main Features
29
+
30
+ The **ga4_events_enhanced** table comes with features such as these:
31
+
32
+ - **Best available data at any time** – Combines daily (processed) and intraday exports so the most complete, accurate version of the data is always available
33
+ - **Robust incremental updates** – Run on any schedule (daily, hourly, or custom)
34
+ - **Flexible schema, better optimized for building data models** – Keeps the flexible structure of the original export while promoting key fields (e.g. `page_location`, `session_id`) to columns for better query performance; **partitioning and clustering** enabled
35
+ - **Session-level identity resolution** – `user_id` resolved to the last authenticated value per session; `merged_user_id` coalesces it with `user_pseudo_id`
36
+ - **Session traffic sources** – `session_first_traffic_source` and session-scoped `session_traffic_source_last_click` (adjusting for sessions that span midnight) computed automatically
37
+ - **Landing page detection** – `landing_page` derived per session from the first page where `entrances > 0`
38
+ - **Page URL parsing** – `page` struct with parsed `hostname`, `path`, `query`, and `query_params` from `page_location`
39
+ - **Ecommerce data fixes** – Nullifies `transaction_id` placeholder values and corrects `purchase_revenue` NaN / missing-value bugs
40
+ - **Event parameter handling** – Promote event params to columns; include or exclude by name
41
+ - **Session parameters** – Promote selected event parameters as session-level parameters
42
+ - **Custom timestamp support** – Optionally use a custom event parameter as the primary timestamp, with automatic fallback to `event_timestamp`
43
+ - **Schema lock** – Lock the table schema to a specific GA4 export date to prevent schema drift
44
+ - **Data freshness tracking** – `data_is_final` flag and `export_type` label on every row
45
+ - **Timezone-aware datetime** – `event_datetime` converted to a configurable IANA timezone
46
+ - **Column descriptions** – Full column-level documentation included in the Dataform table configuration, reflecting the specific configuration used to build the table
47
+
21
48
  ### Planned, Upcoming Features
22
49
 
23
- - Column descriptions
50
+ Features under consideration for future releases:
51
+
24
52
  - Web and app specific default configurations
25
53
  - Ecommerce item list attribution
26
54
  - Custom channel grouping
@@ -48,7 +76,7 @@ Include the package in the package.json file in your Dataform repository.
48
76
  {
49
77
  "dependencies": {
50
78
  "@dataform/core": "3.0.42",
51
- "ga4-export-fixer": "0.2.2"
79
+ "ga4-export-fixer": "0.2.3"
52
80
  }
53
81
  }
54
82
  ```
@@ -65,14 +93,6 @@ If your Dataform repository does not have a package.json file, see this guide: [
65
93
 
66
94
  Creates an **enhanced** version of the GA4 BigQuery export (daily & intraday).
67
95
 
68
- The main features include:
69
-
70
- - **Best available data at any time** – Combines daily (processed) and intraday exports so the most complete, accurate version of the data is always available
71
- - **Robust incremental updates** – Run on any schedule (daily, hourly, or custom)
72
- - **Flexible schema, better optimized for building data models** – Keeps the flexible structure of the original export while promoting key fields (e.g. `page_location`, `session_id`) to columns for better query performance; **partitioning and clustering** enabled
73
- - **Event parameter handling** – Promote event params to columns; include or exclude by name
74
- - **Session parameters** – Promote selected event parameters as session-level parameters
75
-
76
96
  #### JS Deployment (Recommended)
77
97
 
78
98
  Create a new **ga4_events_enhanced** table using a **.js** file in your repository's **definitions** folder.
@@ -103,7 +123,7 @@ const config = {
103
123
  // use dataformTableConfig to make changes to the default Dataform table configuration
104
124
  dataformTableConfig: {
105
125
  schema: 'ga4'
106
- }
126
+ },
107
127
  // test configurations
108
128
  test: false,
109
129
  testConfig: {
@@ -187,76 +207,6 @@ pre_operations {
187
207
  }
188
208
  ```
189
209
 
190
- ### Building on top of the ga4_events_enhanced table
191
-
192
- Setting up incremental updates is easy using the **setPreOperations()** function. Just ensure that your result table includes the **data_is_final** flag from the **ga4_enhanced_events** table.
193
-
194
- The **incrementalDateFilter()** function applies the same date filtering used by **ga4_events_enhanced**, based on the **config** options and the variables declared by **setPreOperations()**.
195
-
196
- Key fields such as session_id, user_id, and session_traffic_source_last_click are available as clean, sessionized versions that handle edge cases like sessions spanning midnight.
197
-
198
- **`definitions/ga4/ga4_sessions.sqlx`**
199
-
200
- ```javascript
201
- config {
202
- type: "incremental",
203
- description: "GA4 sessions table",
204
- schema: "ga4_export_fixer",
205
- bigquery: {
206
- partitionBy: "event_date",
207
- clusterBy: ['session_id', 'data_is_final'],
208
- },
209
- tags: ['ga4_export_fixer']
210
- }
211
-
212
- js {
213
- const { setPreOperations, helpers } = require('ga4-export-fixer');
214
-
215
- const config = {
216
- self: self(),
217
- incremental: incremental(),
218
- /*
219
- Default options that can be overriden:
220
- test: false,
221
- testConfig: {
222
- dateRangeStart: 'current_date()-1',
223
- dateRangeEnd: 'current_date()',
224
- },
225
- preOperations: {
226
- dateRangeStartFullRefresh: 'date(2000, 1, 1)',
227
- dateRangeEnd: 'current_date()',
228
- // incremental date range overrides allow re-processing only a subset of the data:
229
- //incrementalStartOverride: undefined,
230
- //incrementalEndOverride: undefined,
231
- },
232
- */
233
- };
234
- }
235
-
236
- select
237
- event_date,
238
- session_id,
239
- user_pseudo_id,
240
- user_id,
241
- any_value(session_traffic_source_last_click.cross_channel_campaign) as session_traffic_source,
242
- any_value(landing_page) as landing_page,
243
- current_datetime() as row_inserted_timestamp,
244
- min(data_is_final) as data_is_final
245
- from
246
- ${ref('ga4_events_enhanced_298233330')}
247
- where
248
- ${helpers.incrementalDateFilter(config)}
249
- group by
250
- event_date,
251
- session_id,
252
- user_pseudo_id,
253
- user_id
254
-
255
- pre_operations {
256
- ${setPreOperations(config)}
257
- }
258
- ```
259
-
260
210
  ### Configuration Object
261
211
 
262
212
  All fields are optional except `sourceTable`. Default values are applied automatically, so you only need to specify the fields you want to override.
@@ -286,7 +236,7 @@ All fields are optional except `sourceTable`. Default values are applied automat
286
236
  {
287
237
  "name": "ga4_events_enhanced_<dataset_id>",
288
238
  "type": "incremental",
289
- "schema": "ga4_export_fixer",
239
+ "schema": "<source_dataset>",
290
240
  "description": "<default description>",
291
241
  "bigquery": {
292
242
  "partitionBy": "event_date",
@@ -312,7 +262,7 @@ The `onSchemaChange: "EXTEND"` setting updates the result table schema on increm
312
262
  </details>
313
263
  <br>
314
264
 
315
- `**includedExportTypes`** — which GA4 export types to include:
265
+ `**includedExportTypes**` — which GA4 export types to include:
316
266
 
317
267
 
318
268
  | Field | Type | Default | Description |
@@ -323,7 +273,7 @@ The `onSchemaChange: "EXTEND"` setting updates the result table schema on increm
323
273
 
324
274
  > **Intraday-only mode:** Set `daily` to `false` and `intraday` to `true` to use only intraday export tables. When using intraday-only mode, `dataIsFinal.detectionMethod` must be set to `'DAY_THRESHOLD'`.
325
275
 
326
- `**dataIsFinal`** — how to determine whether data is final (not expected to change):
276
+ **`dataIsFinal`** — how to determine whether data is final (not expected to change):
327
277
 
328
278
 
329
279
  | Field | Type | Default | Description |
@@ -332,7 +282,7 @@ The `onSchemaChange: "EXTEND"` setting updates the result table schema on increm
332
282
  | `dataIsFinal.dayThreshold` | integer | `4` | Days after which data is considered final. Required when `detectionMethod` is `'DAY_THRESHOLD'` |
333
283
 
334
284
 
335
- `**testConfig**` — date range used when `test` is `true`:
285
+ **`testConfig`** — date range used when `test` is `true`:
336
286
 
337
287
 
338
288
  | Field | Type | Default | Description |
@@ -341,7 +291,7 @@ The `onSchemaChange: "EXTEND"` setting updates the result table schema on increm
341
291
  | `testConfig.dateRangeEnd` | string (SQL date) | `'current_date()'` | End date for test queries |
342
292
 
343
293
 
344
- `**preOperations**` — date range and incremental refresh configuration:
294
+ **`preOperations`** — date range and incremental refresh configuration:
345
295
 
346
296
 
347
297
  | Field | Type | Default | Description |
@@ -353,7 +303,7 @@ The `onSchemaChange: "EXTEND"` setting updates the result table schema on increm
353
303
  | `preOperations.incrementalEndOverride` | string (SQL date) | `undefined` | Override the incremental end date to re-process a specific range |
354
304
 
355
305
 
356
- `**eventParamsToColumns**` — each item in the array is an object:
306
+ **`eventParamsToColumns`** — each item in the array is an object:
357
307
 
358
308
 
359
309
  | Field | Type | Required | Description |
@@ -365,6 +315,76 @@ The `onSchemaChange: "EXTEND"` setting updates the result table schema on increm
365
315
 
366
316
  Date fields (`dateRangeStart`, `dateRangeEnd`, etc.) accept string dates in `YYYYMMDD` or `YYYY-MM-DD` format, or BigQuery SQL expressions (e.g. `'current_date()'`, `'date(2026, 1, 1)'`).
367
317
 
318
+ ### Creating Incremental Downstream Tables from ga4_events_enhanced
319
+
320
+ Setting up incremental updates is easy using the **setPreOperations()** function. Just ensure that your result table includes the **data_is_final** flag from the **ga4_events_enhanced** table.
321
+
322
+ The **incrementalDateFilter()** function applies the same date filtering used by **ga4_events_enhanced**, based on the **config** options and the variables declared by **setPreOperations()**.
323
+
324
+ Key fields such as session_id, user_id, and session_traffic_source_last_click are available as clean, sessionized versions that handle edge cases like sessions spanning midnight.
325
+
326
+ **`definitions/ga4/ga4_sessions.sqlx`**
327
+
328
+ ```javascript
329
+ config {
330
+ type: "incremental",
331
+ description: "GA4 sessions table",
332
+ schema: "ga4_export_fixer",
333
+ bigquery: {
334
+ partitionBy: "event_date",
335
+ clusterBy: ['session_id', 'data_is_final'],
336
+ },
337
+ tags: ['ga4_export_fixer']
338
+ }
339
+
340
+ js {
341
+ const { setPreOperations, helpers } = require('ga4-export-fixer');
342
+
343
+ const config = {
344
+ self: self(),
345
+ incremental: incremental(),
346
+ /*
347
+ Default options that can be overriden:
348
+ test: false,
349
+ testConfig: {
350
+ dateRangeStart: 'current_date()-1',
351
+ dateRangeEnd: 'current_date()',
352
+ },
353
+ preOperations: {
354
+ dateRangeStartFullRefresh: 'date(2000, 1, 1)',
355
+ dateRangeEnd: 'current_date()',
356
+ // incremental date range overrides allow re-processing only a subset of the data:
357
+ //incrementalStartOverride: undefined,
358
+ //incrementalEndOverride: undefined,
359
+ },
360
+ */
361
+ };
362
+ }
363
+
364
+ select
365
+ event_date,
366
+ session_id,
367
+ user_pseudo_id,
368
+ user_id,
369
+ any_value(session_traffic_source_last_click.cross_channel_campaign) as session_traffic_source,
370
+ any_value(landing_page) as landing_page,
371
+ current_datetime() as row_inserted_timestamp,
372
+ min(data_is_final) as data_is_final
373
+ from
374
+ ${ref('ga4_events_enhanced_298233330')}
375
+ where
376
+ ${helpers.incrementalDateFilter(config)}
377
+ group by
378
+ event_date,
379
+ session_id,
380
+ user_pseudo_id,
381
+ user_id
382
+
383
+ pre_operations {
384
+ ${setPreOperations(config)}
385
+ }
386
+ ```
387
+
368
388
  ### Helpers
369
389
 
370
390
  The helpers contain templates for common SQL expressions. The functions are referenced by **ga4EventsEnhanced** but can also be imported as utility functions for working with GA4 data.
package/columns/columnDescriptions.json CHANGED
@@ -1,7 +1,7 @@
1
1
  {
2
2
  "event_date": "Date of the event, cast to DATE from the original YYYYMMDD string",
3
3
  "event_datetime": "Event datetime converted to the configured timezone from event_timestamp (or custom timestamp if configured)",
4
- "event_timestamp": "Time in microseconds (UTC) when the event was logged by Google Analytics",
4
+ "event_timestamp": "Time in microseconds (UTC) when the event was received by Google Analytics",
5
5
  "event_custom_timestamp": "Event timestamp in microseconds derived from a custom event parameter (e.g. collected via Date.now()), falling back to event_timestamp when the custom parameter is null. Only present when customTimestampParam is configured",
6
6
  "event_name": "Name of the event (e.g. page_view, purchase, scroll)",
7
7
  "session_id": "Unique session identifier, constructed by concatenating user_pseudo_id with the ga_session_id event parameter",
@@ -48,7 +48,7 @@
48
48
  "columns": {
49
49
  "string_value": "String value of the event parameter",
50
50
  "int_value": "Integer value of the event parameter",
51
- "float_value": "Float value of the event parameter",
51
+ "float_value": "Float value of the event parameter (currently unused in GA4)",
52
52
  "double_value": "Double value of the event parameter"
53
53
  }
54
54
  }
@@ -63,14 +63,14 @@
63
63
  "columns": {
64
64
  "string_value": "String value of the session parameter",
65
65
  "int_value": "Integer value of the session parameter",
66
- "float_value": "Float value of the session parameter",
66
+ "float_value": "Float value of the session parameter (currently unused in GA4)",
67
67
  "double_value": "Double value of the session parameter"
68
68
  }
69
69
  }
70
70
  }
71
71
  },
72
72
  "user_properties": {
73
- "description": "User properties set via the Google Analytics SDK or gtag API",
73
+ "description": "User properties",
74
74
  "columns": {
75
75
  "key": "Name of the user property",
76
76
  "value": {
@@ -79,7 +79,7 @@
79
79
  "string_value": "String value of the user property",
80
80
  "int_value": "Integer value of the user property",
81
81
  "double_value": "Double value of the user property",
82
- "float_value": "Float value of the user property (currently unused by GA4)",
82
+ "float_value": "Float value of the user property (currently unused in GA4)",
83
83
  "set_timestamp_micros": "Time in microseconds at which the user property was last set"
84
84
  }
85
85
  }
@@ -284,7 +284,7 @@
284
284
  "source": "Source network that first acquired the user"
285
285
  }
286
286
  },
287
- "event_previous_timestamp": "Time in microseconds (UTC) when the previous event was logged",
287
+ "event_previous_timestamp": "Time in microseconds (UTC) when the previous event happened",
288
288
  "event_value_in_usd": "Currency-converted value (in USD) of the event's 'value' parameter",
289
289
  "event_bundle_sequence_id": "Sequential ID of the bundle in which the event was uploaded",
290
290
  "event_server_timestamp_offset": "Timestamp offset between collection time and upload time in microseconds",
package/documentation.js CHANGED
@@ -10,6 +10,8 @@ const columnDescriptions = require('./columns/columnDescriptions.json');
10
10
  const getColumnDescriptions = (config) => {
11
11
  const descriptions = JSON.parse(JSON.stringify(columnDescriptions));
12
12
 
13
+ if (!config) return descriptions;
14
+
13
15
  const appendToDescription = (key, suffix) => {
14
16
  if (!descriptions[key]) return;
15
17
  if (typeof descriptions[key] === 'string') {
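Note on the hunk above: the new `if (!config) return descriptions;` guard lets `getColumnDescriptions` be called without a configuration object and still return the unmodified default descriptions. The sketch below illustrates that behavior under stated assumptions: the two-entry `columnDescriptions` object stands in for the real JSON file, and the `timezone` suffix rule is hypothetical, since the package's actual suffix logic is not visible in this hunk.

```javascript
// Stand-in for ./columns/columnDescriptions.json (two entries only, for illustration).
const columnDescriptions = {
  event_date: 'Date of the event',
  event_datetime: 'Event datetime converted to the configured timezone',
};

const getColumnDescriptions = (config) => {
  // Deep copy so callers never mutate the shared defaults.
  const descriptions = JSON.parse(JSON.stringify(columnDescriptions));

  // Guard added in this diff: without a config, skip all config-specific suffixes.
  if (!config) return descriptions;

  // Hypothetical suffix rule; the package's real rules are not shown in the hunk.
  if (config.timezone) {
    descriptions.event_datetime += ` (timezone: ${config.timezone})`;
  }
  return descriptions;
};

// Calling without arguments returns the defaults instead of failing on config access.
const defaults = getColumnDescriptions();
const custom = getColumnDescriptions({ timezone: 'Europe/Helsinki' });
console.log(defaults.event_datetime);
console.log(custom.event_datetime);
```

In this sketch, reading `config.timezone` with `config` undefined would throw a `TypeError` if the guard were missing, which is presumably the kind of failure the change avoids.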
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "ga4-export-fixer",
3
- "version": "0.2.3-dev.3",
3
+ "version": "0.2.4-dev.0",
4
4
  "description": "",
5
5
  "main": "index.js",
6
6
  "files": [
package/preOperations.js CHANGED
@@ -128,6 +128,11 @@ const createSchemaLockTable = (config) => {
128
128
 
129
129
  // Set the pre operations for the query
130
130
  const setPreOperations = (config) => {
131
+ // if in test mode, avoid setting BigQuery variables to make query dry run estimation accurate
132
+ if (config.test) {
133
+ return '';
134
+ }
135
+
131
136
  // define the pre operations
132
137
  const preOperations = [
133
138
  {
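Note on the hunk above: with this change, `setPreOperations` returns an empty string when `config.test` is truthy, so a test query carries no BigQuery variable declarations (the code comment gives accurate dry-run estimation as the reason). The following is a minimal sketch of that control flow only. The `declare` statement and the `dateRangeEnd` field placement are hypothetical placeholders, because the package's real pre-operations are truncated in this diff.

```javascript
// Simplified stand-in for setPreOperations; only the test-mode guard mirrors the diff.
const setPreOperations = (config) => {
  // In test mode, emit no pre-operations so the query declares no BigQuery variables.
  if (config.test) {
    return '';
  }

  // Hypothetical statement; the package's real pre-operations are not shown here.
  const preOperations = [
    `declare date_range_end default ${config.dateRangeEnd};`,
  ];
  return preOperations.join('\n');
};

// Test mode yields an empty string; normal mode yields the declaration text.
console.log(JSON.stringify(setPreOperations({ test: true, dateRangeEnd: 'current_date()' })));
console.log(setPreOperations({ test: false, dateRangeEnd: 'current_date()' }));
```

Because the function returns `''` in test mode, a `pre_operations { ${setPreOperations(config)} }` block in a downstream `.sqlx` file would expand to an empty block in that case.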