ga4-export-fixer 0.2.3-dev.3 → 0.2.4-dev.0
- package/README.md +109 -89
- package/columns/columnDescriptions.json +6 -6
- package/documentation.js +2 -0
- package/package.json +1 -1
- package/preOperations.js +5 -0
package/README.md
CHANGED
|
@@ -1,26 +1,54 @@
|
|
|
1
1
|
# ga4-export-fixer
|
|
2
2
|
|
|
3
|
-
|
|
3
|
+
[](https://www.npmjs.com/package/ga4-export-fixer)
|
|
4
|
+
|
|
5
|
+
**ga4-export-fixer** is a **Dataform NPM package** that transforms raw GA4 BigQuery export data into a cleaner, more queryable incremental table. It combines daily and intraday exports (360 fresh export support not yet available) so the best available version of each event is always in use, adds session-level fields like `session_id` and `landing_page`, promotes key event parameters to columns, and fixes known GA4 export issues — handling the boilerplate transformations that are otherwise tedious to include in every GA4 query.
|
|
4
6
|
|
|
5
7
|
The goal of the package is to **speed up development** when building data models and pipelines on top of GA4 export data, allowing you to focus on your use case instead of wrestling with the raw export format.
|
|
6
8
|
|
|
9
|
+
<img src="./docs/images/example_data_model.png" alt="Example Data Model" width="600">
|
|
10
|
+
|
|
11
|
+
*Example data model built with ga4-export-fixer*
|
|
12
|
+
|
|
7
13
|
### Table of Contents
|
|
8
14
|
<!-- TOC -->
|
|
15
|
+
- [Main Features](#main-features)
|
|
9
16
|
- [Planned, Upcoming Features](#planned-upcoming-features)
|
|
10
17
|
- [Installation](#installation)
|
|
11
18
|
- [Bash](#bash)
|
|
12
19
|
- [In Google Cloud Dataform](#in-google-cloud-dataform)
|
|
13
20
|
- [Usage](#usage)
|
|
14
21
|
- [Create GA4 Events Enhanced Table](#create-ga4-events-enhanced-table)
|
|
15
|
-
- [Building on top of the ga4_events_enhanced table](#building-on-top-of-the-ga4_events_enhanced-table)
|
|
16
22
|
- [Configuration Object](#configuration-object)
|
|
23
|
+
- [Creating Incremental Downstream Tables from ga4_events_enhanced](#creating-incremental-downstream-tables-from-ga4_events_enhanced)
|
|
17
24
|
- [Helpers](#helpers)
|
|
18
25
|
- [License](#license)
|
|
19
26
|
<!-- /TOC -->
|
|
20
27
|
|
|
28
|
+
### Main Features
|
|
29
|
+
|
|
30
|
+
The **ga4_events_enhanced** table comes with features such as these:
|
|
31
|
+
|
|
32
|
+
- **Best available data at any time** – Combines daily (processed) and intraday exports so the most complete, accurate version of the data is always available
|
|
33
|
+
- **Robust incremental updates** – Run on any schedule (daily, hourly, or custom)
|
|
34
|
+
- **Flexible schema, better optimized for building data models** – Keeps the flexible structure of the original export while promoting key fields (e.g. `page_location`, `session_id`) to columns for better query performance; **partitioning and clustering** enabled
|
|
35
|
+
- **Session-level identity resolution** – `user_id` resolved to the last authenticated value per session; `merged_user_id` coalesces it with `user_pseudo_id`
|
|
36
|
+
- **Session traffic sources** – `session_first_traffic_source` and session-scoped `session_traffic_source_last_click` (adjusting for sessions that span midnight) computed automatically
|
|
37
|
+
- **Landing page detection** – `landing_page` derived per session from the first page where `entrances > 0`
|
|
38
|
+
- **Page URL parsing** – `page` struct with parsed `hostname`, `path`, `query`, and `query_params` from `page_location`
|
|
39
|
+
- **Ecommerce data fixes** – Nullifies `transaction_id` placeholder values and corrects `purchase_revenue` NaN / missing-value bugs
|
|
40
|
+
- **Event parameter handling** – Promote event params to columns; include or exclude by name
|
|
41
|
+
- **Session parameters** – Promote selected event parameters as session-level parameters
|
|
42
|
+
- **Custom timestamp support** – Optionally use a custom event parameter as the primary timestamp, with automatic fallback to `event_timestamp`
|
|
43
|
+
- **Schema lock** – Lock the table schema to a specific GA4 export date to prevent schema drift
|
|
44
|
+
- **Data freshness tracking** – `data_is_final` flag and `export_type` label on every row
|
|
45
|
+
- **Timezone-aware datetime** – `event_datetime` converted to a configurable IANA timezone
|
|
46
|
+
- **Column descriptions** – Full column-level documentation included in the Dataform table configuration, reflecting the specific configuration used to build the table
|
|
47
|
+
|
|
21
48
|
### Planned, Upcoming Features
|
|
22
49
|
|
|
23
|
-
|
|
50
|
+
Features under consideration for future releases:
|
|
51
|
+
|
|
24
52
|
- Web and app specific default configurations
|
|
25
53
|
- Ecommerce item list attribution
|
|
26
54
|
- Custom channel grouping
|
|
@@ -48,7 +76,7 @@ Include the package in the package.json file in your Dataform repository.
|
|
|
48
76
|
{
|
|
49
77
|
"dependencies": {
|
|
50
78
|
"@dataform/core": "3.0.42",
|
|
51
|
-
"ga4-export-fixer": "0.2.
|
|
79
|
+
"ga4-export-fixer": "0.2.3"
|
|
52
80
|
}
|
|
53
81
|
}
|
|
54
82
|
```
|
|
@@ -65,14 +93,6 @@ If your Dataform repository does not have a package.json file, see this guide: [
|
|
|
65
93
|
|
|
66
94
|
Creates an **enhanced** version of the GA4 BigQuery export (daily & intraday).
|
|
67
95
|
|
|
68
|
-
The main features include:
|
|
69
|
-
|
|
70
|
-
- **Best available data at any time** – Combines daily (processed) and intraday exports so the most complete, accurate version of the data is always available
|
|
71
|
-
- **Robust incremental updates** – Run on any schedule (daily, hourly, or custom)
|
|
72
|
-
- **Flexible schema, better optimized for building data models** – Keeps the flexible structure of the original export while promoting key fields (e.g. `page_location`, `session_id`) to columns for better query performance; **partitioning and clustering** enabled
|
|
73
|
-
- **Event parameter handling** – Promote event params to columns; include or exclude by name
|
|
74
|
-
- **Session parameters** – Promote selected event parameters as session-level parameters
|
|
75
|
-
|
|
76
96
|
#### JS Deployment (Recommended)
|
|
77
97
|
|
|
78
98
|
Create a new **ga4_events_enhanced** table using a **.js** file in your repository's **definitions** folder.
|
|
@@ -103,7 +123,7 @@ const config = {
|
|
|
103
123
|
// use dataformTableConfig to make changes to the default Dataform table configuration
|
|
104
124
|
dataformTableConfig: {
|
|
105
125
|
schema: 'ga4'
|
|
106
|
-
}
|
|
126
|
+
},
|
|
107
127
|
// test configurations
|
|
108
128
|
test: false,
|
|
109
129
|
testConfig: {
|
|
@@ -187,76 +207,6 @@ pre_operations {
|
|
|
187
207
|
}
|
|
188
208
|
```
|
|
189
209
|
|
|
190
|
-
### Building on top of the ga4_events_enhanced table
|
|
191
|
-
|
|
192
|
-
Setting up incremental updates is easy using the **setPreOperations()** function. Just ensure that your result table includes the **data_is_final** flag from the **ga4_enhanced_events** table.
|
|
193
|
-
|
|
194
|
-
The **incrementalDateFilter()** function applies the same date filtering used by **ga4_events_enhanced**, based on the **config** options and the variables declared by **setPreOperations()**.
|
|
195
|
-
|
|
196
|
-
Key fields such as session_id, user_id, and session_traffic_source_last_click are available as clean, sessionized versions that handle edge cases like sessions spanning midnight.
|
|
197
|
-
|
|
198
|
-
**`definitions/ga4/ga4_sessions.sqlx`**
|
|
199
|
-
|
|
200
|
-
```javascript
|
|
201
|
-
config {
|
|
202
|
-
type: "incremental",
|
|
203
|
-
description: "GA4 sessions table",
|
|
204
|
-
schema: "ga4_export_fixer",
|
|
205
|
-
bigquery: {
|
|
206
|
-
partitionBy: "event_date",
|
|
207
|
-
clusterBy: ['session_id', 'data_is_final'],
|
|
208
|
-
},
|
|
209
|
-
tags: ['ga4_export_fixer']
|
|
210
|
-
}
|
|
211
|
-
|
|
212
|
-
js {
|
|
213
|
-
const { setPreOperations, helpers } = require('ga4-export-fixer');
|
|
214
|
-
|
|
215
|
-
const config = {
|
|
216
|
-
self: self(),
|
|
217
|
-
incremental: incremental(),
|
|
218
|
-
/*
|
|
219
|
-
Default options that can be overridden:
|
|
220
|
-
test: false,
|
|
221
|
-
testConfig: {
|
|
222
|
-
dateRangeStart: 'current_date()-1',
|
|
223
|
-
dateRangeEnd: 'current_date()',
|
|
224
|
-
},
|
|
225
|
-
preOperations: {
|
|
226
|
-
dateRangeStartFullRefresh: 'date(2000, 1, 1)',
|
|
227
|
-
dateRangeEnd: 'current_date()',
|
|
228
|
-
// incremental date range overrides allow re-processing only a subset of the data:
|
|
229
|
-
//incrementalStartOverride: undefined,
|
|
230
|
-
//incrementalEndOverride: undefined,
|
|
231
|
-
},
|
|
232
|
-
*/
|
|
233
|
-
};
|
|
234
|
-
}
|
|
235
|
-
|
|
236
|
-
select
|
|
237
|
-
event_date,
|
|
238
|
-
session_id,
|
|
239
|
-
user_pseudo_id,
|
|
240
|
-
user_id,
|
|
241
|
-
any_value(session_traffic_source_last_click.cross_channel_campaign) as session_traffic_source,
|
|
242
|
-
any_value(landing_page) as landing_page,
|
|
243
|
-
current_datetime() as row_inserted_timestamp,
|
|
244
|
-
min(data_is_final) as data_is_final
|
|
245
|
-
from
|
|
246
|
-
${ref('ga4_events_enhanced_298233330')}
|
|
247
|
-
where
|
|
248
|
-
${helpers.incrementalDateFilter(config)}
|
|
249
|
-
group by
|
|
250
|
-
event_date,
|
|
251
|
-
session_id,
|
|
252
|
-
user_pseudo_id,
|
|
253
|
-
user_id
|
|
254
|
-
|
|
255
|
-
pre_operations {
|
|
256
|
-
${setPreOperations(config)}
|
|
257
|
-
}
|
|
258
|
-
```
|
|
259
|
-
|
|
260
210
|
### Configuration Object
|
|
261
211
|
|
|
262
212
|
All fields are optional except `sourceTable`. Default values are applied automatically, so you only need to specify the fields you want to override.
|
|
@@ -286,7 +236,7 @@ All fields are optional except `sourceTable`. Default values are applied automat
|
|
|
286
236
|
{
|
|
287
237
|
"name": "ga4_events_enhanced_<dataset_id>",
|
|
288
238
|
"type": "incremental",
|
|
289
|
-
"schema": "
|
|
239
|
+
"schema": "<source_dataset>",
|
|
290
240
|
"description": "<default description>",
|
|
291
241
|
"bigquery": {
|
|
292
242
|
"partitionBy": "event_date",
|
|
@@ -312,7 +262,7 @@ The `onSchemaChange: "EXTEND"` setting updates the result table schema on increm
|
|
|
312
262
|
</details>
|
|
313
263
|
<br>
|
|
314
264
|
|
|
315
|
-
`**includedExportTypes
|
|
265
|
+
**`includedExportTypes`** — which GA4 export types to include:
|
|
316
266
|
|
|
317
267
|
|
|
318
268
|
| Field | Type | Default | Description |
|
|
@@ -323,7 +273,7 @@ The `onSchemaChange: "EXTEND"` setting updates the result table schema on increm
|
|
|
323
273
|
|
|
324
274
|
> **Intraday-only mode:** Set `daily` to `false` and `intraday` to `true` to use only intraday export tables. When using intraday-only mode, `dataIsFinal.detectionMethod` must be set to `'DAY_THRESHOLD'`.
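A minimal sketch of an intraday-only configuration, assuming the nested option names implied by the README text (`includedExportTypes.daily`, `includedExportTypes.intraday`, `dataIsFinal.detectionMethod`, `dataIsFinal.dayThreshold`). The `sourceTable` value is a placeholder, and `assertIntradayOnlyIsValid` is a hypothetical pre-flight check that is not part of the package.

```javascript
// Sketch only: option names follow the README text; sourceTable is a placeholder.
const config = {
  sourceTable: '<project>.<dataset>.events_*', // placeholder, replace with your export table
  includedExportTypes: {
    daily: false,   // skip the daily (processed) export
    intraday: true, // use only the intraday export tables
  },
  dataIsFinal: {
    detectionMethod: 'DAY_THRESHOLD', // required in intraday-only mode
    dayThreshold: 4,                  // documented default
  },
};

// Hypothetical helper (not provided by ga4-export-fixer): fails fast when
// intraday-only mode is combined with another detection method.
const assertIntradayOnlyIsValid = (cfg) => {
  const { daily, intraday } = cfg.includedExportTypes;
  const intradayOnly = daily === false && intraday === true;
  if (intradayOnly && cfg.dataIsFinal.detectionMethod !== 'DAY_THRESHOLD') {
    throw new Error(
      "Intraday-only mode requires dataIsFinal.detectionMethod = 'DAY_THRESHOLD'"
    );
  }
  return true;
};

assertIntradayOnlyIsValid(config);
```

Running a check like this before passing the object to the package surfaces the constraint as a plain JavaScript error rather than at query time.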
|
|
325
275
|
|
|
326
|
-
|
|
276
|
+
**`dataIsFinal`** — how to determine whether data is final (not expected to change):
|
|
327
277
|
|
|
328
278
|
|
|
329
279
|
| Field | Type | Default | Description |
|
|
@@ -332,7 +282,7 @@ The `onSchemaChange: "EXTEND"` setting updates the result table schema on increm
|
|
|
332
282
|
| `dataIsFinal.dayThreshold` | integer | `4` | Days after which data is considered final. Required when `detectionMethod` is `'DAY_THRESHOLD'` |
|
|
333
283
|
|
|
334
284
|
|
|
335
|
-
`**
|
|
285
|
+
**`testConfig`** — date range used when `test` is `true`:
|
|
336
286
|
|
|
337
287
|
|
|
338
288
|
| Field | Type | Default | Description |
|
|
@@ -341,7 +291,7 @@ The `onSchemaChange: "EXTEND"` setting updates the result table schema on increm
|
|
|
341
291
|
| `testConfig.dateRangeEnd` | string (SQL date) | `'current_date()'` | End date for test queries |
|
|
342
292
|
|
|
343
293
|
|
|
344
|
-
`**
|
|
294
|
+
**`preOperations`** — date range and incremental refresh configuration:
|
|
345
295
|
|
|
346
296
|
|
|
347
297
|
| Field | Type | Default | Description |
|
|
@@ -353,7 +303,7 @@ The `onSchemaChange: "EXTEND"` setting updates the result table schema on increm
|
|
|
353
303
|
| `preOperations.incrementalEndOverride` | string (SQL date) | `undefined` | Override the incremental end date to re-process a specific range |
|
|
354
304
|
|
|
355
305
|
|
|
356
|
-
`**
|
|
306
|
+
**`eventParamsToColumns`** — each item in the array is an object:
|
|
357
307
|
|
|
358
308
|
|
|
359
309
|
| Field | Type | Required | Description |
|
|
@@ -365,6 +315,76 @@ The `onSchemaChange: "EXTEND"` setting updates the result table schema on increm
|
|
|
365
315
|
|
|
366
316
|
Date fields (`dateRangeStart`, `dateRangeEnd`, etc.) accept string dates in `YYYYMMDD` or `YYYY-MM-DD` format, or BigQuery SQL expressions (e.g. `'current_date()'`, `'date(2026, 1, 1)'`).
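The three accepted forms can be illustrated with a short sketch. The field names come from the tables above; `classifyDateInput` is a hypothetical helper written for this example and is not the package's own parser.

```javascript
// Hypothetical helper, not the package's parser: labels which accepted
// form a date value uses.
const classifyDateInput = (value) => {
  if (/^\d{8}$/.test(value)) return 'YYYYMMDD';
  if (/^\d{4}-\d{2}-\d{2}$/.test(value)) return 'YYYY-MM-DD';
  return 'sql_expression'; // anything else is treated as a BigQuery SQL expression
};

// Each documented form used in a date field.
const testConfig = {
  dateRangeStart: '20260101',     // compact string date
  dateRangeEnd: 'current_date()', // BigQuery SQL expression
};
const preOperations = {
  dateRangeStartFullRefresh: '2026-01-01', // ISO-style string date
  dateRangeEnd: 'date(2026, 1, 31)',       // BigQuery SQL expression
};

const forms = [
  testConfig.dateRangeStart,
  testConfig.dateRangeEnd,
  preOperations.dateRangeStartFullRefresh,
  preOperations.dateRangeEnd,
].map(classifyDateInput);
```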
|
|
367
317
|
|
|
318
|
+
### Creating Incremental Downstream Tables from ga4_events_enhanced
|
|
319
|
+
|
|
320
|
+
Setting up incremental updates is easy using the **setPreOperations()** function. Just ensure that your result table includes the **data_is_final** flag from the **ga4_events_enhanced** table.
|
|
321
|
+
|
|
322
|
+
The **incrementalDateFilter()** function applies the same date filtering used by **ga4_events_enhanced**, based on the **config** options and the variables declared by **setPreOperations()**.
|
|
323
|
+
|
|
324
|
+
Key fields such as session_id, user_id, and session_traffic_source_last_click are available as clean, sessionized versions that handle edge cases like sessions spanning midnight.
|
|
325
|
+
|
|
326
|
+
**`definitions/ga4/ga4_sessions.sqlx`**
|
|
327
|
+
|
|
328
|
+
```javascript
|
|
329
|
+
config {
|
|
330
|
+
type: "incremental",
|
|
331
|
+
description: "GA4 sessions table",
|
|
332
|
+
schema: "ga4_export_fixer",
|
|
333
|
+
bigquery: {
|
|
334
|
+
partitionBy: "event_date",
|
|
335
|
+
clusterBy: ['session_id', 'data_is_final'],
|
|
336
|
+
},
|
|
337
|
+
tags: ['ga4_export_fixer']
|
|
338
|
+
}
|
|
339
|
+
|
|
340
|
+
js {
|
|
341
|
+
const { setPreOperations, helpers } = require('ga4-export-fixer');
|
|
342
|
+
|
|
343
|
+
const config = {
|
|
344
|
+
self: self(),
|
|
345
|
+
incremental: incremental(),
|
|
346
|
+
/*
|
|
347
|
+
Default options that can be overridden:
|
|
348
|
+
test: false,
|
|
349
|
+
testConfig: {
|
|
350
|
+
dateRangeStart: 'current_date()-1',
|
|
351
|
+
dateRangeEnd: 'current_date()',
|
|
352
|
+
},
|
|
353
|
+
preOperations: {
|
|
354
|
+
dateRangeStartFullRefresh: 'date(2000, 1, 1)',
|
|
355
|
+
dateRangeEnd: 'current_date()',
|
|
356
|
+
// incremental date range overrides allow re-processing only a subset of the data:
|
|
357
|
+
//incrementalStartOverride: undefined,
|
|
358
|
+
//incrementalEndOverride: undefined,
|
|
359
|
+
},
|
|
360
|
+
*/
|
|
361
|
+
};
|
|
362
|
+
}
|
|
363
|
+
|
|
364
|
+
select
|
|
365
|
+
event_date,
|
|
366
|
+
session_id,
|
|
367
|
+
user_pseudo_id,
|
|
368
|
+
user_id,
|
|
369
|
+
any_value(session_traffic_source_last_click.cross_channel_campaign) as session_traffic_source,
|
|
370
|
+
any_value(landing_page) as landing_page,
|
|
371
|
+
current_datetime() as row_inserted_timestamp,
|
|
372
|
+
min(data_is_final) as data_is_final
|
|
373
|
+
from
|
|
374
|
+
${ref('ga4_events_enhanced_298233330')}
|
|
375
|
+
where
|
|
376
|
+
${helpers.incrementalDateFilter(config)}
|
|
377
|
+
group by
|
|
378
|
+
event_date,
|
|
379
|
+
session_id,
|
|
380
|
+
user_pseudo_id,
|
|
381
|
+
user_id
|
|
382
|
+
|
|
383
|
+
pre_operations {
|
|
384
|
+
${setPreOperations(config)}
|
|
385
|
+
}
|
|
386
|
+
```
|
|
387
|
+
|
|
368
388
|
### Helpers
|
|
369
389
|
|
|
370
390
|
The helpers contain templates for common SQL expressions. The functions are referenced by **ga4EventsEnhanced** but can also be imported as utility functions for working with GA4 data.
|
|
package/columns/columnDescriptions.json
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
{
|
|
2
2
|
"event_date": "Date of the event, cast to DATE from the original YYYYMMDD string",
|
|
3
3
|
"event_datetime": "Event datetime converted to the configured timezone from event_timestamp (or custom timestamp if configured)",
|
|
4
|
-
"event_timestamp": "Time in microseconds (UTC) when the event was
|
|
4
|
+
"event_timestamp": "Time in microseconds (UTC) when the event was received by Google Analytics",
|
|
5
5
|
"event_custom_timestamp": "Event timestamp in microseconds derived from a custom event parameter (e.g. collected via Date.now()), falling back to event_timestamp when the custom parameter is null. Only present when customTimestampParam is configured",
|
|
6
6
|
"event_name": "Name of the event (e.g. page_view, purchase, scroll)",
|
|
7
7
|
"session_id": "Unique session identifier, constructed by concatenating user_pseudo_id with the ga_session_id event parameter",
|
|
@@ -48,7 +48,7 @@
|
|
|
48
48
|
"columns": {
|
|
49
49
|
"string_value": "String value of the event parameter",
|
|
50
50
|
"int_value": "Integer value of the event parameter",
|
|
51
|
-
"float_value": "Float value of the event parameter",
|
|
51
|
+
"float_value": "Float value of the event parameter (currently unused in GA4)",
|
|
52
52
|
"double_value": "Double value of the event parameter"
|
|
53
53
|
}
|
|
54
54
|
}
|
|
@@ -63,14 +63,14 @@
|
|
|
63
63
|
"columns": {
|
|
64
64
|
"string_value": "String value of the session parameter",
|
|
65
65
|
"int_value": "Integer value of the session parameter",
|
|
66
|
-
"float_value": "Float value of the session parameter",
|
|
66
|
+
"float_value": "Float value of the session parameter (currently unused in GA4)",
|
|
67
67
|
"double_value": "Double value of the session parameter"
|
|
68
68
|
}
|
|
69
69
|
}
|
|
70
70
|
}
|
|
71
71
|
},
|
|
72
72
|
"user_properties": {
|
|
73
|
-
"description": "User properties
|
|
73
|
+
"description": "User properties",
|
|
74
74
|
"columns": {
|
|
75
75
|
"key": "Name of the user property",
|
|
76
76
|
"value": {
|
|
@@ -79,7 +79,7 @@
|
|
|
79
79
|
"string_value": "String value of the user property",
|
|
80
80
|
"int_value": "Integer value of the user property",
|
|
81
81
|
"double_value": "Double value of the user property",
|
|
82
|
-
"float_value": "Float value of the user property (currently unused
|
|
82
|
+
"float_value": "Float value of the user property (currently unused in GA4)",
|
|
83
83
|
"set_timestamp_micros": "Time in microseconds at which the user property was last set"
|
|
84
84
|
}
|
|
85
85
|
}
|
|
@@ -284,7 +284,7 @@
|
|
|
284
284
|
"source": "Source network that first acquired the user"
|
|
285
285
|
}
|
|
286
286
|
},
|
|
287
|
-
"event_previous_timestamp": "Time in microseconds (UTC) when the previous event
|
|
287
|
+
"event_previous_timestamp": "Time in microseconds (UTC) when the previous event happened",
|
|
288
288
|
"event_value_in_usd": "Currency-converted value (in USD) of the event's 'value' parameter",
|
|
289
289
|
"event_bundle_sequence_id": "Sequential ID of the bundle in which the event was uploaded",
|
|
290
290
|
"event_server_timestamp_offset": "Timestamp offset between collection time and upload time in microseconds",
|
package/documentation.js
CHANGED
|
@@ -10,6 +10,8 @@ const columnDescriptions = require('./columns/columnDescriptions.json');
|
|
|
10
10
|
const getColumnDescriptions = (config) => {
|
|
11
11
|
const descriptions = JSON.parse(JSON.stringify(columnDescriptions));
|
|
12
12
|
|
|
13
|
+
if (!config) return descriptions;
|
|
14
|
+
|
|
13
15
|
const appendToDescription = (key, suffix) => {
|
|
14
16
|
if (!descriptions[key]) return;
|
|
15
17
|
if (typeof descriptions[key] === 'string') {
|
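The effect of the new guard can be sketched in isolation. The `columnDescriptions` object below is a two-entry stand-in for `columns/columnDescriptions.json`, and the config-dependent logic that follows the guard in the real file is replaced by a placeholder, since this hunk does not show it.

```javascript
// Stand-in for columns/columnDescriptions.json (entries copied from the diff above).
const columnDescriptions = {
  event_name: 'Name of the event (e.g. page_view, purchase, scroll)',
  user_properties: { description: 'User properties' },
};

const getColumnDescriptions = (config) => {
  // Deep clone so callers never mutate the shared source object.
  const descriptions = JSON.parse(JSON.stringify(columnDescriptions));

  // New guard: with no config there is nothing to customize,
  // so return the default descriptions unchanged.
  if (!config) return descriptions;

  // Placeholder for the config-dependent description logic (not shown in this hunk).
  return descriptions;
};

// Calling without a config now returns the defaults.
const defaults = getColumnDescriptions();
defaults.event_name = 'changed locally';
console.log(columnDescriptions.event_name); // original text is untouched
```

Because the guard sits after the deep clone, callers without a config still receive their own copy rather than a reference to the shared object.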
package/package.json
CHANGED
package/preOperations.js
CHANGED
|
@@ -128,6 +128,11 @@ const createSchemaLockTable = (config) => {
|
|
|
128
128
|
|
|
129
129
|
// Set the pre operations for the query
|
|
130
130
|
const setPreOperations = (config) => {
|
|
131
|
+
// if in test mode, avoid setting BigQuery variables to make query dry run estimation accurate
|
|
132
|
+
if (config.test) {
|
|
133
|
+
return '';
|
|
134
|
+
}
|
|
135
|
+
|
|
131
136
|
// define the pre operations
|
|
132
137
|
const preOperations = [
|
|
133
138
|
{
|
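A simplified sketch of the early return added above. `setPreOperationsSketch` is a stand-in written for this example; the `declare` statement and the `date_range_start` variable name are placeholders, because the real list of pre-operations is not visible in this hunk.

```javascript
// Simplified stand-in for setPreOperations; the real statement list is not shown here.
const setPreOperationsSketch = (config) => {
  // In test mode, emit no pre-operations so that no BigQuery variables are
  // declared. Per the source comment, this keeps dry-run estimates accurate.
  if (config.test) {
    return '';
  }

  // Placeholder statement for illustration only (hypothetical variable name).
  const preOperations = [
    'declare date_range_start date default current_date() - 1',
  ];
  return preOperations.join(';\n') + ';';
};

const testSql = setPreOperationsSketch({ test: true });
const normalSql = setPreOperationsSketch({ test: false });
```

With `test: true` the function yields an empty string, so a `pre_operations { ${setPreOperations(config)} }` block compiles to nothing.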