@ken-e/dataform-youtube 0.0.10 → 0.0.12

package/CLAUDE.md ADDED
@@ -0,0 +1,228 @@
1
+ # CLAUDE.md — dataform-youtube
2
+
3
+ ## Overview
4
+
5
+ `@ken-e/dataform-youtube` is a Dataform package that processes YouTube analytics data from Google's BigQuery Data Transfer Service. It transforms raw partitioned transfer tables into clean, enriched output tables ready for downstream reporting and analysis.
6
+
7
+ The package depends on `@ken-e/dataform-helpers` for incremental partition detection logic.
8
+
9
+ ## Architecture
10
+
11
+ ### Two-layer processing model
12
+
13
+ 1. **Staging layer** (`stg_ytc_*`): Reads raw BQ Data Transfer partitioned tables, normalizes columns, and handles the a2/a3 table version migration via UNION ALL. All staging tables are incremental, partitioned by `data_date`.
14
+
15
+ 2. **Output layer** (`ytc_*`): Reads from staging tables, joins lookup/dimension tables for human-readable names, computes derived metrics (e.g., `subscribers_net`, `row_max_duration_seconds`), deduplicates via `QUALIFY rank() OVER (PARTITION BY data_date, site_nm ORDER BY updated_at DESC) = 1`, and joins video/playlist titles.
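The deduplication step in the output layer can be pictured with a small in-memory emulation. This is only an illustrative sketch: `keepLatestPerPartition` and the sample rows are hypothetical and not part of the package, which performs this step in BigQuery SQL via `QUALIFY`. Note that `rank()` keeps ties, so every row sharing the newest `updated_at` within a `(data_date, site_nm)` group survives.

```javascript
// Hypothetical emulation of
//   QUALIFY rank() OVER (PARTITION BY data_date, site_nm ORDER BY updated_at DESC) = 1
// on an in-memory array of rows. Not part of the package.
function keepLatestPerPartition(rows) {
  // Find the newest updated_at per (data_date, site_nm) group.
  const newest = new Map();
  for (const row of rows) {
    const key = `${row.data_date}|${row.site_nm}`;
    const seen = newest.get(key);
    if (seen === undefined || row.updated_at > seen) {
      newest.set(key, row.updated_at);
    }
  }
  // rank() = 1 keeps every row tied for the newest timestamp in its group.
  return rows.filter(
    (row) => newest.get(`${row.data_date}|${row.site_nm}`) === row.updated_at
  );
}

// Made-up rows: video "a" was staged twice, video "b" only in the later run.
const sampleRows = [
  { data_date: "2025-10-01", site_nm: "youtube_channel", video_id: "a", updated_at: "2025-10-02T00:00:00Z" },
  { data_date: "2025-10-01", site_nm: "youtube_channel", video_id: "a", updated_at: "2025-10-03T00:00:00Z" },
  { data_date: "2025-10-01", site_nm: "youtube_channel", video_id: "b", updated_at: "2025-10-03T00:00:00Z" },
];
const dedupedRows = keepLatestPerPartition(sampleRows);
```

Because the staging models stamp rows with `current_timestamp()`, all rows written by one staging run for a given date and source tie on `updated_at`, so the filter effectively keeps the most recent staging run per partition rather than a single row.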
16
+
17
+ ### Incremental processing
18
+
19
+ Both layers use partition-based checkpointing provided by `@ken-e/dataform-helpers`:
20
+
21
+ - **Staging preops** (`BQ_DATA_TRANSFER_PARTITION_STAGING_PREOPS`): Queries `INFORMATION_SCHEMA.PARTITIONS` on the raw source tables to find partitions modified since the last run. Populates a `partitions_to_update` array used in WHERE clauses.
22
+ - **Output preops** (`BQ_DATA_TRANSFER_PARTITION_OUTPUT_PREOPS`): Queries `INFORMATION_SCHEMA.PARTITIONS` on the staging table to find updated partitions.
23
+
24
+ On full refresh, the `self_checkpoint` is set to `config.startDate` and all partitions after that date are processed.
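As a rough illustration of how the two modes differ, the sketch below builds the partition filter that a staging query would use. `partitionFilter` is a hypothetical helper name; the staging models themselves select between these two fragments with `ctx.when(ctx.incremental(), ...)`, and `partitions_to_update` and `self_checkpoint` are assumed to be variables set up by the preops.

```javascript
// Simplified sketch of the partition filter used by the staging models.
// `partitionFilter` is a hypothetical name; the SQL fragments mirror the
// ones that appear in the staging model queries.
function partitionFilter(isIncremental, schema) {
  if (isIncremental) {
    // Incremental run: only partitions the preops flagged as modified.
    return `where date(_PARTITIONTIME) in unnest((select partition_dates from unnest(partitions_to_update) where site_nm = "${schema}"))`;
  }
  // Full refresh: everything after the checkpoint (config.startDate).
  return "where date(_PARTITIONTIME) > date(self_checkpoint)";
}

const incrementalFilter = partitionFilter(true, "youtube_channel");
const fullRefreshFilter = partitionFilter(false, "youtube_channel");
```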
25
+
26
+ ## Directory structure
27
+
28
+ ```
29
+ index.js Main entry point — config API, source table declarations, model wiring
30
+ includes/
31
+ helpers.js Thin wrapper re-exporting preops from @ken-e/dataform-helpers
32
+ column_descriptions.js Shared column descriptions applied to all output tables
33
+ constants.js Package-level constants (currently unused)
34
+ project_variables.js Template file (currently unused)
35
+ definitions/
36
+ ytc_basic.js Output: channel basic metrics
37
+ ytc_combined.js Output: channel combined dimensions
38
+ ytc_demographics.js Output: age/gender breakdowns
39
+ ytc_demographics_views.js Output: demographics with view counts
40
+ ytc_device_os.js Output: device and OS breakdowns
41
+ ytc_playback.js Output: playback location
42
+ ytc_province.js Output: geographic (province/state)
43
+ ytc_traffic_source.js Output: traffic source analysis
44
+ ytc_share_platform.js Output: sharing service breakdowns
45
+ ytc_annotation.js Output: annotation performance
46
+ ytc_cards.js Output: card performance
47
+ ytc_end_screens.js Output: end screen element performance
48
+ ytc_subtitles.js Output: subtitle/caption analytics
49
+ ytc_list_basic.js Output: playlist basic metrics
50
+ ytc_list_combined.js Output: playlist combined dimensions
51
+ ytc_list_device_os.js Output: playlist device/OS
52
+ ytc_list_playback.js Output: playlist playback location
53
+ ytc_list_province.js Output: playlist geographic
54
+ ytc_list_traffic_source.js Output: playlist traffic source
55
+ sources/
56
+ stg_ytc_*.js Staging models (13 with dual-table UNION ALL, 5 single-table)
57
+ stg_ytc_lu_*.js Lookup/dimension tables (8 static reference tables)
58
+ ```
59
+
60
+ ## Configuration API
61
+
62
+ The package exports a single function that accepts a `config` object:
63
+
64
+ ```javascript
65
+ const youtube = require("@ken-e/dataform-youtube");
66
+
67
+ youtube({
68
+ startDate: "2025-01-01",
69
+ days_back: 10,
70
+
71
+ // Video and playlist title lookup tables
72
+ titlesProject: "my-project",
73
+ titlesDataset: "my_dataset",
74
+ titlesTable: "stg_yt_video_titles",
75
+ playlistTable: "stg_yt_playlist_titles",
76
+
77
+ // One or more YouTube data sources
78
+ sources: [{
79
+ database: "my-bq-project",
80
+ schema: "youtube_channel",
81
+ suffix: "_"
82
+ }],
83
+
84
+ // Target schemas for staging and output layers
85
+ target: {
86
+ database: dataform.projectConfig.defaultDatabase,
87
+ stagingSchema: "my_staging_dataset",
88
+ outputSchema: "my_output_dataset"
89
+ }
90
+ });
91
+ ```
92
+
93
+ ### Required config fields
94
+
95
+ | Field | Description |
96
+ |---|---|
97
+ | `sources` | Array of `{ database, schema, suffix }` objects pointing to raw BQ Data Transfer datasets |
98
+ | `target.stagingSchema` | BigQuery dataset for staging tables |
99
+ | `target.outputSchema` | BigQuery dataset for output tables |
100
+ | `startDate` | Earliest date to process on full refresh (YYYY-MM-DD) |
101
+ | `titlesProject` | Project containing the video titles lookup table |
102
+ | `titlesDataset` | Dataset containing the video titles lookup table |
103
+ | `titlesTable` | Table name for video titles |
104
+ | `playlistTable` | Table name for playlist titles |
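A consumer project might want to check these fields before calling the package. The following is a minimal sketch under that assumption: `missingConfigFields` and `REQUIRED_FIELDS` are hypothetical names, and the package is not known to ship such a check itself.

```javascript
// Hypothetical pre-flight check for the required fields listed above.
const REQUIRED_FIELDS = [
  "sources",
  "target.stagingSchema",
  "target.outputSchema",
  "startDate",
  "titlesProject",
  "titlesDataset",
  "titlesTable",
  "playlistTable",
];

// Returns the dotted paths of required fields that are absent or empty.
function missingConfigFields(config) {
  return REQUIRED_FIELDS.filter((path) => {
    const value = path
      .split(".")
      .reduce((obj, key) => (obj == null ? undefined : obj[key]), config);
    if (Array.isArray(value)) return value.length === 0;
    return value === undefined || value === null || value === "";
  });
}

// Example config that deliberately omits target.outputSchema.
const exampleConfig = {
  startDate: "2025-01-01",
  titlesProject: "my-project",
  titlesDataset: "my_dataset",
  titlesTable: "stg_yt_video_titles",
  playlistTable: "stg_yt_playlist_titles",
  sources: [{ database: "my-bq-project", schema: "youtube_channel", suffix: "_" }],
  target: { stagingSchema: "my_staging_dataset" },
};
const missing = missingConfigFields(exampleConfig);
```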
105
+
106
+ ## Source table versioning (a2 to a3 migration)
107
+
108
+ ### Background
109
+
110
+ On September 22, 2025, Google's BigQuery Data Transfer Service deprecated several YouTube source table versions and introduced new ones:
111
+
112
+ - **Channel reports**: `a2` tables replaced by `a3` tables
113
+ - **Playlist reports**: `a1` tables replaced by `a2` tables
114
+
115
+ The old tables stopped receiving new data around mid-November 2025. The new tables include one additional column (`engaged_views`) and use a different view-counting methodology that results in significantly higher view counts.
116
+
117
+ ### Official documentation
118
+
119
+ - YouTube Channel Transfer overview: https://docs.cloud.google.com/bigquery/docs/youtube-channel-transfer
120
+ - BQ Data Transfer change log: https://docs.cloud.google.com/bigquery/docs/transfer-changes
121
+
122
+ ### Implementation
123
+
124
+ The 13 affected staging models use `.flatMap()` to produce two SELECT blocks per source, joined with UNION ALL:
125
+
126
+ - **Old table block**: Filters to `date(_PARTITIONTIME) < date '2025-09-22'` and emits `engaged_views` as a typed null (`cast(null as int64)`)
127
+ - **New table block**: Filters to `date(_PARTITIONTIME) >= date '2025-09-22'`, selects `engaged_views` natively
128
+
129
+ The cutover date `2025-09-22` is hardcoded in each staging model. The cutover filter is applied on incremental runs as well as full refreshes, which prevents duplicate rows when both old and new tables have partitions for overlapping dates.
130
+
131
+ The preops helper accepts an array of table names (e.g., `["p_channel_basic_a2_", "p_channel_basic_a3_"]`) to detect modified partitions across both table versions.
132
+
133
+ ### Data discontinuity warning
134
+
135
+ The a3 source tables report significantly higher view counts than the a2 tables for the same dates. Our analysis of the `p_channel_basic` table for October 15, 2025 showed:
136
+
137
+ - a2 table: 18,892 total views
138
+ - a3 table: 119,044 total views (~6.3x the a2 figure)
139
+
140
+ This is a Google-side methodology change, not a data processing error. View counts (and likely other metrics) are **not directly comparable** across the September 22, 2025 cutover boundary. Downstream reports should account for this discontinuity.
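One way a downstream report could guard against mixing the two methodologies is to detect date ranges that straddle the cutover. The sketch below is illustrative only: `spansCutover` is a hypothetical helper, and it assumes dates are `YYYY-MM-DD` strings, which compare correctly as plain strings. It also reproduces the ratio quoted above.

```javascript
const CUTOVER_DATE = "2025-09-22";

// True when an inclusive [startDate, endDate] range (YYYY-MM-DD strings)
// contains data from both sides of the cutover. Hypothetical helper.
function spansCutover(startDate, endDate) {
  return startDate < CUTOVER_DATE && endDate >= CUTOVER_DATE;
}

// Ratio of the a3 to a2 totals reported for p_channel_basic on October 15, 2025.
const viewRatio = 119044 / 18892;

const mixedRange = spansCutover("2025-09-01", "2025-09-30");
const postCutoverOnly = spansCutover("2025-09-22", "2025-10-01");
```

A report that finds `spansCutover(...)` to be true could, for example, split its trend line at the boundary instead of plotting a single continuous series.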
141
+
142
+ ### Tables affected by the migration
143
+
144
+ | Old (deprecated) | New | Type |
145
+ |---|---|---|
146
+ | `p_channel_basic_a2_` | `p_channel_basic_a3_` | Channel |
147
+ | `p_channel_combined_a2_` | `p_channel_combined_a3_` | Channel |
148
+ | `p_channel_device_os_a2_` | `p_channel_device_os_a3_` | Channel |
149
+ | `p_channel_playback_location_a2_` | `p_channel_playback_location_a3_` | Channel |
150
+ | `p_channel_province_a2_` | `p_channel_province_a3_` | Channel |
151
+ | `p_channel_subtitles_a2_` | `p_channel_subtitles_a3_` | Channel |
152
+ | `p_channel_traffic_source_a2_` | `p_channel_traffic_source_a3_` | Channel |
153
+ | `p_playlist_basic_a1_` | `p_playlist_basic_a2_` | Playlist |
154
+ | `p_playlist_combined_a1_` | `p_playlist_combined_a2_` | Playlist |
155
+ | `p_playlist_device_os_a1_` | `p_playlist_device_os_a2_` | Playlist |
156
+ | `p_playlist_playback_location_a1_` | `p_playlist_playback_location_a2_` | Playlist |
157
+ | `p_playlist_province_a1_` | `p_playlist_province_a2_` | Playlist |
158
+ | `p_playlist_traffic_source_a1_` | `p_playlist_traffic_source_a2_` | Playlist |
159
+
160
+ **Unchanged** (no version bump): `p_channel_demographics_a1_`, `p_channel_sharing_service_a1_`, `p_channel_annotations_a1_`, `p_channel_cards_a1_`, `p_channel_end_screens_a1_`
161
+
162
+ ## Output tables
163
+
164
+ ### Channel reports
165
+
166
+ | Table | Description | Key dimensions |
167
+ |---|---|---|
168
+ | `ytc_basic` | Core video metrics (views, likes, comments, shares, watch time) | video_id, country_code |
169
+ | `ytc_combined` | Metrics broken down by all major dimensions simultaneously | video_id, playback_location, traffic_source, device, OS |
170
+ | `ytc_demographics` | Age group and gender breakdowns (percentage-based) | video_id, age_group, gender |
171
+ | `ytc_demographics_views` | Demographics with absolute view counts (joined from basic) | video_id, age_group, gender |
172
+ | `ytc_device_os` | Device type and operating system breakdowns | video_id, device_type, operating_system |
173
+ | `ytc_playback` | Playback location breakdowns | video_id, playback_location_type |
174
+ | `ytc_province` | Province/state-level geographic breakdowns | video_id, province_code |
175
+ | `ytc_traffic_source` | Traffic source breakdowns | video_id, traffic_source_type |
176
+ | `ytc_share_platform` | Social sharing service breakdowns | video_id, sharing_service |
177
+ | `ytc_annotation` | Annotation performance metrics | video_id, annotation_type |
178
+ | `ytc_cards` | Card performance metrics | video_id, card_type |
179
+ | `ytc_end_screens` | End screen element performance | video_id, end_screen_element_type |
180
+ | `ytc_subtitles` | Subtitle/caption analytics | video_id, subtitle_language |
181
+
182
+ ### Playlist reports
183
+
184
+ | Table | Description | Key dimensions |
185
+ |---|---|---|
186
+ | `ytc_list_basic` | Core playlist metrics | playlist_id, video_id, country_code |
187
+ | `ytc_list_combined` | Playlist metrics by all dimensions | playlist_id, video_id, playback_location, traffic_source, device, OS |
188
+ | `ytc_list_device_os` | Playlist device/OS breakdowns | playlist_id, video_id, device_type, operating_system |
189
+ | `ytc_list_playback` | Playlist playback location breakdowns | playlist_id, video_id, playback_location_type |
190
+ | `ytc_list_province` | Playlist province-level breakdowns | playlist_id, video_id, province_code |
191
+ | `ytc_list_traffic_source` | Playlist traffic source breakdowns | playlist_id, video_id, traffic_source_type |
192
+
193
+ ### Lookup tables
194
+
195
+ | Table | Enriches | Provides |
196
+ |---|---|---|
197
+ | `stg_ytc_lu_annotation_type` | annotation_type (int) | annotation_type_name |
198
+ | `stg_ytc_lu_card_type` | card_type (int) | card_type_name |
199
+ | `stg_ytc_lu_device_types` | device_type (int) | device_name |
200
+ | `stg_ytc_lu_end_screen_element_type` | end_screen_element_type (int) | end_screen_element_type_name |
201
+ | `stg_ytc_lu_operating_systems` | operating_system (int) | operating_system_name |
202
+ | `stg_ytc_lu_playback_location` | playback_location_type (int) | playback_location_name |
203
+ | `stg_ytc_lu_sharing_services` | sharing_service (int) | sharing_service_name |
204
+ | `stg_ytc_lu_traffic_sources` | traffic_source_type (int) | traffic_source_name |
205
+
206
+ ## Development notes
207
+
208
+ ### Adding a new column to staging models
209
+
210
+ 1. Add the column to the appropriate `columns` or `restColumns` template literal in the staging model
211
+ 2. If the column exists only in the new (a3) table version, add `cast(null as <type>) as <column>` to the old table block and the raw column name to the new table block
212
+ 3. Add a description in `includes/column_descriptions.js`
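Step 2 can be sketched as a pair of SQL fragments, one per table version. `versionedColumn` is a hypothetical helper used here only for illustration; the staging models write these fragments inline rather than calling a function like this.

```javascript
// Hypothetical helper: for a column that exists only in the new table
// version, return the fragment for each SELECT block.
function versionedColumn(column, type) {
  return {
    // Old table: the column is missing, so emit a typed null under its name.
    oldBlock: `cast(null as ${type}) as ${column},`,
    // New table: select the raw column as-is.
    newBlock: `${column},`,
  };
}

const engagedViews = versionedColumn("engaged_views", "int64");
```

Keeping the null typed matters because both blocks are combined with UNION ALL, which requires compatible column types in the same position.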
213
+
214
+ ### Adding a new output model
215
+
216
+ 1. Create a new file in `includes/definitions/` following the pattern of existing output models
217
+ 2. Wire it up in `index.js` by adding it to the model instantiation section
218
+ 3. Ensure it uses `helpers.output_preops()` for incremental processing
219
+ 4. Include `QUALIFY rank() OVER (...) = 1` for deduplication
220
+
221
+ ### The `.flatMap()` dual-table pattern
222
+
223
+ Staging models that span the a2/a3 migration use `.flatMap()` instead of `.map()` to return two SELECT blocks per source. Each block targets a different table version with complementary date filters. The preops helper accepts an array of table names to detect modified partitions across both versions.
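The pattern can be reduced to the following sketch. It is deliberately simplified and makes several assumptions: `tableRef` stands in for `ctx.ref(...)`, `dualTableQuery` is a hypothetical name, the column list is cut down to a few columns, and the incremental/full-refresh partition filter is omitted so that only the cutover filter remains.

```javascript
const CUTOVER_DATE = "2025-09-22";

// Stand-in for ctx.ref(database, schema, table): returns a backtick-quoted name.
function tableRef(database, schema, table) {
  return "`" + database + "." + schema + "." + table + "`";
}

// Hypothetical, simplified version of a dual-table staging query.
function dualTableQuery(sources, oldPrefix, newPrefix) {
  return sources
    .flatMap((t) => {
      const columns = `"${t.schema}" as site_nm, views,`;
      return [
        // Deprecated table: engaged_views does not exist, so emit a typed null.
        `select ${columns} cast(null as int64) as engaged_views
from ${tableRef(t.database, t.schema, oldPrefix + t.suffix)}
where date(_PARTITIONTIME) < date '${CUTOVER_DATE}'`,
        // New table: engaged_views is selected natively.
        `select ${columns} engaged_views
from ${tableRef(t.database, t.schema, newPrefix + t.suffix)}
where date(_PARTITIONTIME) >= date '${CUTOVER_DATE}'`,
      ];
    })
    .join(" union all ");
}

const sql = dualTableQuery(
  [
    { database: "my-bq-project", schema: "youtube_channel", suffix: "_" },
    { database: "my-bq-project", schema: "youtube_channel_2", suffix: "_" },
  ],
  "p_channel_basic_a2_",
  "p_channel_basic_a3_"
);
```

With `.map()` the callback would have to return one string per source; `.flatMap()` lets it return an array of two blocks that is flattened before the join, so two sources yield four SELECT blocks.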
224
+
225
+ ### Known issues
226
+
227
+ - `config.target.datasetStaging` is used in staging models but some consumer configs set `config.target.stagingSchema` — ensure your config uses the property name expected by the staging models
228
+ - The `daysBack` / `days_back` parameter is declared in consumer configs but currently unused by the package
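Until the property names are reconciled, a consumer could sidestep the first issue by supplying both names. This is a hypothetical consumer-side workaround, not something the package provides; `withStagingAliases` is an invented name.

```javascript
// Hypothetical workaround: populate both property names so the target
// config works whichever one a given model reads.
function withStagingAliases(target) {
  const staging = target.datasetStaging ?? target.stagingSchema;
  return { ...target, datasetStaging: staging, stagingSchema: staging };
}

const target = withStagingAliases({
  stagingSchema: "my_staging_dataset",
  outputSchema: "my_output_dataset",
});
```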
package/README.md CHANGED
@@ -1,4 +1,85 @@
1
- # Propeller - Youtube Dataform Package
1
+ # @ken-e/dataform-youtube
2
2
 
3
- ## Overview
4
- Welcome to the **Youtube** Dataform project.
3
+ A [Dataform](https://dataform.co/) package for processing YouTube analytics data from Google's [BigQuery Data Transfer Service](https://docs.cloud.google.com/bigquery/docs/youtube-channel-transfer).
4
+
5
+ ## Installation
6
+
7
+ ```bash
8
+ npm install @ken-e/dataform-youtube
9
+ ```
10
+
11
+ ## Quick start
12
+
13
+ ```javascript
14
+ const youtube = require("@ken-e/dataform-youtube");
15
+
16
+ youtube({
17
+ startDate: "2025-01-01",
18
+
19
+ // Video and playlist title lookup tables
20
+ titlesProject: "my-project",
21
+ titlesDataset: "my_dataset",
22
+ titlesTable: "stg_yt_video_titles",
23
+ playlistTable: "stg_yt_playlist_titles",
24
+
25
+ // YouTube data sources (one or more)
26
+ sources: [{
27
+ database: "my-bq-project",
28
+ schema: "youtube_channel",
29
+ suffix: "_"
30
+ }],
31
+
32
+ // Target BigQuery datasets
33
+ target: {
34
+ database: dataform.projectConfig.defaultDatabase,
35
+ stagingSchema: "my_staging_dataset",
36
+ outputSchema: "my_output_dataset"
37
+ }
38
+ });
39
+ ```
40
+
41
+ ## Output tables
42
+
43
+ ### Channel reports
44
+
45
+ | Table | Description |
46
+ |---|---|
47
+ | `ytc_basic` | Core video metrics: views, likes, comments, shares, watch time, subscribers |
48
+ | `ytc_combined` | Metrics by all major dimensions (location, traffic source, device, OS) |
49
+ | `ytc_demographics` | Age group and gender breakdowns |
50
+ | `ytc_demographics_views` | Demographics with absolute view counts |
51
+ | `ytc_device_os` | Device type and operating system breakdowns |
52
+ | `ytc_playback` | Playback location breakdowns |
53
+ | `ytc_province` | Province/state-level geographic breakdowns |
54
+ | `ytc_traffic_source` | Traffic source breakdowns |
55
+ | `ytc_share_platform` | Social sharing service breakdowns |
56
+ | `ytc_annotation` | Annotation performance metrics |
57
+ | `ytc_cards` | Card performance metrics |
58
+ | `ytc_end_screens` | End screen element performance |
59
+ | `ytc_subtitles` | Subtitle/caption analytics |
60
+
61
+ ### Playlist reports
62
+
63
+ | Table | Description |
64
+ |---|---|
65
+ | `ytc_list_basic` | Core playlist metrics |
66
+ | `ytc_list_combined` | Playlist metrics by all dimensions |
67
+ | `ytc_list_device_os` | Playlist device/OS breakdowns |
68
+ | `ytc_list_playback` | Playlist playback location breakdowns |
69
+ | `ytc_list_province` | Playlist province-level breakdowns |
70
+ | `ytc_list_traffic_source` | Playlist traffic source breakdowns |
71
+
72
+ ## Data source migration (September 2025)
73
+
74
+ On September 22, 2025, Google deprecated several YouTube BQ Data Transfer table versions (`a2` channel tables and `a1` playlist tables) and replaced them with new versions (`a3` and `a2` respectively). This package handles the migration transparently by reading from both old and new tables with a date-based cutover.
75
+
76
+ **Important**: Google changed the view-counting methodology in the new tables. View counts after September 22, 2025 are significantly higher than before and are not directly comparable across the cutover boundary. See the [BQ Data Transfer change log](https://docs.cloud.google.com/bigquery/docs/transfer-changes) for details.
77
+
78
+ ## Documentation
79
+
80
+ See [CLAUDE.md](./CLAUDE.md) for detailed development documentation including architecture, configuration reference, and the source table migration guide.
81
+
82
+ ## Google documentation
83
+
84
+ - [YouTube Channel Reports in BigQuery](https://docs.cloud.google.com/bigquery/docs/youtube-channel-transfer)
85
+ - [BigQuery Data Transfer change log](https://docs.cloud.google.com/bigquery/docs/transfer-changes)
@@ -5,6 +5,7 @@
5
5
 
6
6
  const column_descriptions = {
7
7
  ad_impressions: "The number of verified ad impressions served.",
8
+ data_date: "The date of the reported metrics, parsed from the raw date field.",
8
9
  ad_type:
9
10
  "The type of ad displayed (e.g., auction_display, auction_instream, reserved_instream).",
10
11
  age_group:
@@ -52,6 +53,8 @@ const column_descriptions = {
52
53
  device_type:
53
54
  "The type of device used by the viewer (e.g., DESKTOP, MOBILE, TABLET, TV, GAME_CONSOLE).",
54
55
  dislikes: "The number of times users disliked videos (negative ratings).",
56
+ engaged_views:
57
+ "The number of engaged views. Available only for data after September 22, 2025 (null for earlier data). Note: Google changed the view-counting methodology when migrating from a2 to a3 tables, so total view counts are significantly higher in post-cutover data and are not directly comparable to pre-cutover figures.",
55
58
  end_screen_element_id: "Unique identifier for the end screen element.",
56
59
  end_screen_element_type:
57
60
  "Integer index indicating the type of end screen element.",
@@ -81,10 +84,16 @@ const column_descriptions = {
81
84
  playback_based_cpm:
82
85
  "Estimated gross revenue per 1000 monetized playbacks (Playback-based CPM, USD).",
83
86
  playback_location: "Type of location where playback occurred (text).",
87
+ playback_location_detail:
88
+ "Specific detail about the playback location (e.g., embedded page URL).",
84
89
  playback_location_type:
85
90
  "Integer index indicating type of location where playback occurred.",
86
91
  playback_location_name: "String name of the playback location.",
87
92
  playlist_id: "Unique identifier for the YouTube playlist.",
93
+ playlist_saves_added:
94
+ "The number of times videos were saved (added) to playlists.",
95
+ playlist_saves_removed:
96
+ "The number of times videos were removed from playlists.",
88
97
  playlist_saves_net:
89
98
  "The number of times users saved videos to their playlists less the number of times they removed them.",
90
99
  playlist_starts: "Times the playlist was started.",
@@ -92,9 +101,15 @@ const column_descriptions = {
92
101
  province_code:
93
102
  "The ISO code for the province or state where the interactions occurred.",
94
103
  red_views: "The number of times YouTube Premium members viewed videos.",
104
+ red_watch_time_minutes:
105
+ "The number of minutes that YouTube Premium members watched videos.",
95
106
  row_max_duration_seconds:
96
107
  "Maximum duration in seconds for the row's if all videos were played to the end. Provided simplify downstream calculation of percentage of videos watched.",
97
108
  shares: "The number of times users shared videos via the 'Share' button.",
109
+ site_nm:
110
+ "Identifier for the data source, derived from the source schema name.",
111
+ source_partition_date:
112
+ "The partition date from the raw BigQuery Data Transfer source table.",
98
113
  sharing_service:
99
114
  "The service used to share the video (e.g., FACEBOOK, WHATSAPP, REDDIT, EMAIL).",
100
115
  sharing_service_name: "String name of the sharing service.",
@@ -112,6 +127,8 @@ const column_descriptions = {
112
127
  traffic_source_name: "String name of the traffic source type.",
113
128
  traffic_source_type:
114
129
  "The type of referrer through which viewers reached the video (e.g., YT_SEARCH, RELATED_VIDEO, EXT_URL, SUBSCRIBER).",
130
+ updated_at:
131
+ "Timestamp of when the staging row was created during processing.",
115
132
  uploader_type:
116
133
  "Indicates if metrics relate to content uploaded by the owner ('SELF') or claimed third-party content ('THIRD_PARTY').",
117
134
  video_id: "The ID of a YouTube video.",
@@ -121,7 +138,7 @@ const column_descriptions = {
121
138
  videos_removed_from_playlists:
122
139
  "The number of times videos were removed from any YouTube playlist (owner's or others').",
123
140
  views:
124
- "The total number of times videos were viewed. In playlist reports, counts views within the playlist context.",
141
+ "The total number of times videos were viewed. In playlist reports, counts views within the playlist context. Note: Google changed the view-counting methodology in the a3 source tables effective September 22, 2025. View counts after this date are significantly higher than before and are not directly comparable across the cutover boundary.",
125
142
  views_percentage:
126
143
  "Percentage of total views for the time period that were logged-in.",
127
144
  views_with_demographics:
@@ -39,23 +39,23 @@ module.exports = (config) => {
39
39
  })
40
40
  .preOps((ctx) => {
41
41
  // Get pre_operations to find updated source partitions
42
- return `${helpers.staging_preops(ctx, config, "p_channel_basic_a2_")}`;
42
+ return `${helpers.staging_preops(ctx, config, ["p_channel_basic_a2_", "p_channel_basic_a3_"])}`;
43
43
  })
44
44
  .query((ctx) =>
45
45
  config.sources
46
- .map((t) => {
47
- return `
48
- select
46
+ .flatMap((t) => {
47
+ const columns = `
49
48
  _PARTITIONDATE as source_partition_date,
50
49
  parse_date('%Y%m%d', date) as data_date,
51
50
  "${t.schema}" as site_nm,
52
- current_timestamp() as updated_at,
51
+ current_timestamp() as updated_at,
53
52
  channel_id,
54
53
  video_id,
55
54
  live_or_on_demand,
56
55
  subscribed_status,
57
56
  country_code,
58
- views,
57
+ views,`;
58
+ const restColumns = `
59
59
  comments,
60
60
  likes,
61
61
  dislikes,
@@ -81,14 +81,32 @@ select
81
81
  videos_added_to_playlists,
82
82
  videos_removed_from_playlists,
83
83
  red_views,
84
- red_watch_time_minutes,
85
- from ${ctx.ref(t.database, t.schema, "p_channel_basic_a2_" + t.suffix)}
84
+ red_watch_time_minutes,`;
85
+ const incrementalWhere = `where date(_PARTITIONTIME) in unnest((select partition_dates from unnest(partitions_to_update) where site_nm = "${t.schema}"))`;
86
+ return [
87
+ `
88
+ -- Deprecated a2 table (data before cutover)
89
+ select ${columns}
90
+ cast(null as int64) as engaged_views,${restColumns}
91
+ from ${ctx.ref(t.database, t.schema, "p_channel_basic_a2_" + t.suffix)}
86
92
  ${ctx.when(
87
93
  ctx.incremental(),
88
- `where date(_PARTITIONTIME) in unnest((select partition_dates from unnest(partitions_to_update) where site_nm = "${t.schema}"))`,
89
- `where date(_PARTITIONTIME) > date(self_checkpoint)`,
94
+ `${incrementalWhere} and date(_PARTITIONTIME) < date '2025-09-22'`,
95
+ `where date(_PARTITIONTIME) > date(self_checkpoint) and date(_PARTITIONTIME) < date '2025-09-22'`,
90
96
  )}
91
- `;
97
+ `,
98
+ `
99
+ -- New a3 table (data from cutover onward)
100
+ select ${columns}
101
+ engaged_views,${restColumns}
102
+ from ${ctx.ref(t.database, t.schema, "p_channel_basic_a3_" + t.suffix)}
103
+ ${ctx.when(
104
+ ctx.incremental(),
105
+ `${incrementalWhere} and date(_PARTITIONTIME) >= date '2025-09-22'`,
106
+ `where date(_PARTITIONTIME) > date(self_checkpoint) and date(_PARTITIONTIME) >= date '2025-09-22'`,
107
+ )}
108
+ `,
109
+ ];
92
110
  })
93
111
  .join(" union all "),
94
112
  );
@@ -43,40 +43,57 @@ module.exports = (config) => {
43
43
  })
44
44
  .preOps((ctx) => {
45
45
  // Get pre_operations to find updated source partitions
46
- return `${helpers.staging_preops(ctx, config, "p_channel_combined_a2_")}`;
46
+ return `${helpers.staging_preops(ctx, config, ["p_channel_combined_a2_", "p_channel_combined_a3_"])}`;
47
47
  })
48
48
  .query((ctx) =>
49
49
  config.sources
50
- .map((t) => {
51
- return `
52
-
53
- select
50
+ .flatMap((t) => {
51
+ const columns = `
54
52
  _PARTITIONDATE as source_partition_date,
55
53
  parse_date('%Y%m%d', date) as data_date,
56
54
  "${t.schema}" as site_nm,
57
- current_timestamp() as updated_at,
55
+ current_timestamp() as updated_at,
58
56
  channel_id,
59
57
  video_id,
60
58
  live_or_on_demand,
61
59
  subscribed_status,
62
60
  country_code,
63
- playback_location_type,
64
- traffic_source_type,
65
- device_type,
61
+ playback_location_type,
62
+ traffic_source_type,
63
+ device_type,
66
64
  operating_system,
67
- views,
65
+ views,`;
66
+ const restColumns = `
68
67
  watch_time_minutes,
69
68
  average_view_duration_seconds,
70
69
  average_view_duration_percentage,
71
70
  red_views,
72
- red_watch_time_minutes,
73
- from ${ctx.ref(t.database, t.schema, "p_channel_combined_a2_" + t.suffix)}
71
+ red_watch_time_minutes,`;
72
+ const incrementalWhere = `where date(_PARTITIONTIME) in unnest((select partition_dates from unnest(partitions_to_update) where site_nm = "${t.schema}"))`;
73
+ return [
74
+ `
75
+ -- Deprecated a2 table (data before cutover)
76
+ select ${columns}
77
+ cast(null as int64) as engaged_views,${restColumns}
78
+ from ${ctx.ref(t.database, t.schema, "p_channel_combined_a2_" + t.suffix)}
79
+ ${ctx.when(
80
+ ctx.incremental(),
81
+ `${incrementalWhere} and date(_PARTITIONTIME) < date '2025-09-22'`,
82
+ `where date(_PARTITIONTIME) > date(self_checkpoint) and date(_PARTITIONTIME) < date '2025-09-22'`,
83
+ )}
84
+ `,
85
+ `
86
+ -- New a3 table (data from cutover onward)
87
+ select ${columns}
88
+ engaged_views,${restColumns}
89
+ from ${ctx.ref(t.database, t.schema, "p_channel_combined_a3_" + t.suffix)}
74
90
  ${ctx.when(
75
91
  ctx.incremental(),
76
- `where date(_PARTITIONTIME) in unnest((select partition_dates from unnest(partitions_to_update) where site_nm = "${t.schema}"))`,
77
- `where date(_PARTITIONTIME) > date(self_checkpoint)`,
92
+ `${incrementalWhere} and date(_PARTITIONTIME) >= date '2025-09-22'`,
93
+ `where date(_PARTITIONTIME) > date(self_checkpoint) and date(_PARTITIONTIME) >= date '2025-09-22'`,
78
94
  )}
79
- `;
95
+ `,
96
+ ];
80
97
  })
81
98
  .join(" union all "),
82
99
  );
@@ -39,14 +39,12 @@ module.exports = (config) => {
39
39
  })
40
40
  .preOps((ctx) => {
41
41
  // Get pre_operations to find updated source partitions
42
- return `${helpers.staging_preops(ctx, config, "p_channel_device_os_a2_")}`;
42
+ return `${helpers.staging_preops(ctx, config, ["p_channel_device_os_a2_", "p_channel_device_os_a3_"])}`;
43
43
  })
44
44
  .query((ctx) =>
45
45
  config.sources
46
- .map((t) => {
47
- return `
48
-
49
- select
46
+ .flatMap((t) => {
47
+ const columns = `
50
48
  _PARTITIONDATE as source_partition_date,
51
49
  parse_date('%Y%m%d', date) as data_date,
52
50
  "${t.schema}" as site_nm,
@@ -58,19 +56,38 @@ select
58
56
  country_code,
59
57
  device_type,
60
58
  operating_system,
61
- views,
59
+ views,`;
60
+ const restColumns = `
62
61
  watch_time_minutes,
63
62
  average_view_duration_seconds,
64
63
  average_view_duration_percentage,
65
64
  red_views,
66
- red_watch_time_minutes
65
+ red_watch_time_minutes`;
66
+ const incrementalWhere = `where date(_PARTITIONTIME) in unnest((select partition_dates from unnest(partitions_to_update) where site_nm = "${t.schema}"))`;
67
+ return [
68
+ `
69
+ -- Deprecated a2 table (data before cutover)
70
+ select ${columns}
71
+ cast(null as int64) as engaged_views,${restColumns}
67
72
  from ${ctx.ref(t.database, t.schema, "p_channel_device_os_a2_" + t.suffix)}
68
73
  ${ctx.when(
69
74
  ctx.incremental(),
70
- `where date(_PARTITIONTIME) in unnest((select partition_dates from unnest(partitions_to_update) where site_nm = "${t.schema}"))`,
71
- `where date(_PARTITIONTIME) > date(self_checkpoint)`,
75
+ `${incrementalWhere} and date(_PARTITIONTIME) < date '2025-09-22'`,
76
+ `where date(_PARTITIONTIME) > date(self_checkpoint) and date(_PARTITIONTIME) < date '2025-09-22'`,
77
+ )}
78
+ `,
79
+ `
80
+ -- New a3 table (data from cutover onward)
81
+ select ${columns}
82
+ engaged_views,${restColumns}
83
+ from ${ctx.ref(t.database, t.schema, "p_channel_device_os_a3_" + t.suffix)}
84
+ ${ctx.when(
85
+ ctx.incremental(),
86
+ `${incrementalWhere} and date(_PARTITIONTIME) >= date '2025-09-22'`,
87
+ `where date(_PARTITIONTIME) > date(self_checkpoint) and date(_PARTITIONTIME) >= date '2025-09-22'`,
72
88
  )}
73
- `;
89
+ `,
90
+ ];
74
91
  })
75
92
  .join(" union all "),
76
93
  );
@@ -40,37 +40,54 @@ module.exports = (config) => {
40
40
  })
41
41
  .preOps((ctx) => {
42
42
  // Get pre_operations to find updated source partitions
43
- return `${helpers.staging_preops(ctx, config, "p_playlist_basic_a1_")}`;
43
+ return `${helpers.staging_preops(ctx, config, ["p_playlist_basic_a1_", "p_playlist_basic_a2_"])}`;
44
44
  })
45
45
  .query((ctx) =>
46
46
  config.sources
47
- .map((t) => {
48
- return `
49
-
50
- select
47
+ .flatMap((t) => {
48
+ const columns = `
51
49
  _PARTITIONDATE as source_partition_date,
52
50
  parse_date('%Y%m%d', date) as data_date,
53
51
  "${t.schema}" as site_nm,
54
- current_timestamp() as updated_at,
52
+ current_timestamp() as updated_at,
55
53
  channel_id,
56
54
  playlist_id,
57
55
  video_id,
58
56
  live_or_on_demand,
59
57
  subscribed_status,
60
58
  country_code,
61
- views,
59
+ views,`;
60
+ const restColumns = `
62
61
  watch_time_minutes,
63
62
  average_view_duration_seconds,
64
63
  playlist_starts,
65
64
  playlist_saves_added,
66
- playlist_saves_removed
67
- from ${ctx.ref(t.database, t.schema, "p_playlist_basic_a1_" + t.suffix)}
65
+ playlist_saves_removed`;
66
+ const incrementalWhere = `where date(_PARTITIONTIME) in unnest((select partition_dates from unnest(partitions_to_update) where site_nm = "${t.schema}"))`;
67
+ return [
68
+ `
69
+ -- Deprecated a1 table (data before cutover)
70
+ select ${columns}
71
+ cast(null as int64) as engaged_views,${restColumns}
72
+ from ${ctx.ref(t.database, t.schema, "p_playlist_basic_a1_" + t.suffix)}
73
+ ${ctx.when(
74
+ ctx.incremental(),
75
+ `${incrementalWhere} and date(_PARTITIONTIME) < date '2025-09-22'`,
76
+ `where date(_PARTITIONTIME) > date(self_checkpoint) and date(_PARTITIONTIME) < date '2025-09-22'`,
77
+ )}
78
+ `,
79
+ `
80
+ -- New a2 table (data from cutover onward)
81
+ select ${columns}
82
+ engaged_views,${restColumns}
83
+ from ${ctx.ref(t.database, t.schema, "p_playlist_basic_a2_" + t.suffix)}
68
84
  ${ctx.when(
69
85
  ctx.incremental(),
70
- `where date(_PARTITIONTIME) in unnest((select partition_dates from unnest(partitions_to_update) where site_nm = "${t.schema}"))`,
71
- `where date(_PARTITIONTIME) > date(self_checkpoint)`,
86
+ `${incrementalWhere} and date(_PARTITIONTIME) >= date '2025-09-22'`,
87
+ `where date(_PARTITIONTIME) > date(self_checkpoint) and date(_PARTITIONTIME) >= date '2025-09-22'`,
72
88
  )}
73
- `;
89
+ `,
90
+ ];
74
91
  })
75
92
  .join(" union all "),
76
93
  );