dremiojs 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (45)
  1. package/.eslintrc.json +14 -0
  2. package/.prettierrc +7 -0
  3. package/README.md +59 -0
  4. package/dremiodocs/dremio-cloud/cloud-api-reference.md +748 -0
  5. package/dremiodocs/dremio-cloud/dremio-cloud-about.md +225 -0
  6. package/dremiodocs/dremio-cloud/dremio-cloud-admin.md +3754 -0
  7. package/dremiodocs/dremio-cloud/dremio-cloud-bring-data.md +6098 -0
  8. package/dremiodocs/dremio-cloud/dremio-cloud-changelog.md +32 -0
  9. package/dremiodocs/dremio-cloud/dremio-cloud-developer.md +1147 -0
  10. package/dremiodocs/dremio-cloud/dremio-cloud-explore-analyze.md +2522 -0
  11. package/dremiodocs/dremio-cloud/dremio-cloud-get-started.md +300 -0
  12. package/dremiodocs/dremio-cloud/dremio-cloud-help-support.md +869 -0
  13. package/dremiodocs/dremio-cloud/dremio-cloud-manage-govern.md +800 -0
  14. package/dremiodocs/dremio-cloud/dremio-cloud-overview.md +36 -0
  15. package/dremiodocs/dremio-cloud/dremio-cloud-security.md +1844 -0
  16. package/dremiodocs/dremio-cloud/sql-docs.md +7180 -0
  17. package/dremiodocs/dremio-software/dremio-software-acceleration.md +1575 -0
  18. package/dremiodocs/dremio-software/dremio-software-admin.md +884 -0
  19. package/dremiodocs/dremio-software/dremio-software-client-applications.md +3277 -0
  20. package/dremiodocs/dremio-software/dremio-software-data-products.md +560 -0
  21. package/dremiodocs/dremio-software/dremio-software-data-sources.md +8701 -0
  22. package/dremiodocs/dremio-software/dremio-software-deploy-dremio.md +3446 -0
  23. package/dremiodocs/dremio-software/dremio-software-get-started.md +848 -0
  24. package/dremiodocs/dremio-software/dremio-software-monitoring.md +422 -0
  25. package/dremiodocs/dremio-software/dremio-software-reference.md +677 -0
  26. package/dremiodocs/dremio-software/dremio-software-security.md +2074 -0
  27. package/dremiodocs/dremio-software/dremio-software-v25-api.md +32637 -0
  28. package/dremiodocs/dremio-software/dremio-software-v26-api.md +36757 -0
  29. package/jest.config.js +10 -0
  30. package/package.json +25 -0
  31. package/src/api/catalog.ts +74 -0
  32. package/src/api/jobs.ts +105 -0
  33. package/src/api/reflection.ts +77 -0
  34. package/src/api/source.ts +61 -0
  35. package/src/api/user.ts +32 -0
  36. package/src/client/base.ts +66 -0
  37. package/src/client/cloud.ts +37 -0
  38. package/src/client/software.ts +73 -0
  39. package/src/index.ts +16 -0
  40. package/src/types/catalog.ts +31 -0
  41. package/src/types/config.ts +18 -0
  42. package/src/types/job.ts +18 -0
  43. package/src/types/reflection.ts +29 -0
  44. package/tests/integration_manual.ts +95 -0
  45. package/tsconfig.json +19 -0
package/dremiodocs/dremio-cloud/dremio-cloud-admin.md
@@ -0,0 +1,3754 @@
# Administration | Dremio Documentation

Original URL: https://docs.dremio.com/dremio-cloud/admin/

Dremio administration covers organization-wide and project-level management. Use these tools to configure your environment, manage users and resources, and monitor system performance.

## Organization Management

* [Manage Your Subscription](/dremio-cloud/admin/subscription/) – Upgrade from a free trial to a paid subscription, manage billing and payment methods, and track your organization's usage and costs.
* [Manage Users](/dremio-cloud/admin/users) – Add users to your organization, configure authentication methods (local or SSO), manage user roles and privileges, and control access to Dremio resources.
* [Configure Model Providers](/dremio-cloud/admin/model-providers) – Configure AI model providers for Dremio's AI Agent, enabling natural language queries and data exploration across your organization.

## Project Management

* [Manage Projects](/dremio-cloud/admin/projects/) – Create new projects to isolate compute and data resources for different teams. Configure storage options (Dremio-managed or your own S3 bucket) and manage project-level settings.
* [Manage Engines](/dremio-cloud/admin/engines/) – Set up and configure query engines that provide the compute resources for running queries. Choose engine sizes, configure auto-scaling, and manage multiple engine replicas for your projects.
* [Configure External Engines](/dremio-cloud/admin/external-engines) – Connect industry-standard engines like Apache Spark, Trino, and Apache Flink directly to Dremio without vendor lock-in or proprietary protocols.
* [Monitor Jobs and Audit Logs](/dremio-cloud/admin/monitor/) – Monitor system health, query performance, and resource utilization. View metrics, logs, and alerts to ensure your Dremio environment is running optimally.
* [Optimize Performance](/dremio-cloud/admin/performance/) – Improve query performance and resource efficiency through Reflection management and the results cache.

## Shared Responsibility Model

Dremio operates on a shared responsibility model. For detailed information about responsibilities in each area, download the [Dremio Shared Responsibility Model](https://docs-3063.dremio-documentation.pages.dev/assets/files/Dremio-Cloud-Shared-Responsibility-Model-15f76b24f0b48153532ca15b25d831c4.pdf).

<div style="page-break-after: always;"></div>

# Manage Users | Dremio Documentation

Original URL: https://docs.dremio.com/dremio-cloud/admin/users

Manage user access to your Dremio organization through internal authentication or external identity providers. This page covers user types, account management, and administrative tasks.

All users in Dremio are identified by their email address, which serves as their username. Invitations are sent to users' email addresses to set up their accounts.

## User Types

Dremio supports two user types with different authentication and management workflows:

| Feature | Local Users | SSO Users |
| --- | --- | --- |
| **Authentication** | Password set in Dremio | Identity Provider (IdP) credentials |
| **Credential Management** | Within Dremio | Through your IdP |
| **Provisioning** | Manual invitation | Manual invitation or SCIM automated |
| **Password Reset** | Self-service or admin-initiated | Through IdP only |

### Local Users

Local users authenticate with passwords managed directly in Dremio. These users must be invited manually. Use local users when you need standalone accounts for contractors, external partners, or testing and development environments.

### SSO Users

SSO users authenticate through your organization's identity provider (IdP), such as Microsoft Entra ID or Okta, or through social identity providers like Google or GitHub. These users can be invited manually or provisioned automatically via System for Cross-domain Identity Management (SCIM).

#### What is SCIM?

SCIM is an open standard protocol that automates user provisioning between your identity provider and Dremio. Instead of manually creating and managing users in multiple systems, SCIM keeps everything synchronized automatically. When you add, update, or remove a user in your IdP, those changes propagate to Dremio without manual intervention.
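
To make the protocol concrete, the sketch below shows what a SCIM 2.0 "create user" request generally looks like. This is a generic illustration of the standard (RFC 7643/7644), not Dremio's specific SCIM endpoint or token format; in practice your IdP issues these calls for you, and the base URL and token are placeholders.

```
import requests

# Generic SCIM 2.0 user-provisioning call. The endpoint and bearer token
# are illustrative placeholders, not Dremio-specific values.
SCIM_BASE = "https://<scim-endpoint>/scim/v2"
HEADERS = {"Authorization": "Bearer <scim-token>",
           "Content-Type": "application/scim+json"}

new_user = {
    "schemas": ["urn:ietf:params:scim:schemas:core:2.0:User"],
    "userName": "ada@example.com",          # email doubles as the username
    "name": {"givenName": "Ada", "familyName": "Lovelace"},
    "emails": [{"value": "ada@example.com", "primary": True}],
    "active": True,
}

resp = requests.post(f"{SCIM_BASE}/Users", headers=HEADERS, json=new_user)
print(resp.status_code)
```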

#### SCIM Provisioning Benefits

When SCIM is configured, Dremio stays synchronized with your IdP. Deleting a user in your IdP is automatically reflected in Dremio. Additional benefits of SCIM integration include:

* Automatic user creation and deactivation
* Synchronized user attributes
* Centralized access management

To learn more:

* [Configure SCIM with Microsoft Entra ID](/dremio-cloud/security/authentication/idp/microsoft-entra-id)
* [Configure SCIM with Okta](/dremio-cloud/security/authentication/idp/okta)
* [Configure SCIM with a generic OIDC provider](/dremio-cloud/security/authentication/idp/generic-oidc-provider)

## Manage Your Account

### Update Your Password

**Local users** can reset passwords using either method:

**If locked out:**

1. On the login screen, enter your email.
2. Click **Forgot Password?**.
3. Check your email for the reset link.

**If logged in:**

1. Hover over the user icon at the bottom of the navigation sidebar.
2. Select **Account Settings**.
3. Click **Reset Password**.
4. Check your email for the reset link.

Changing your password ends all existing Dremio web sessions.

**SSO users** must reset passwords through their organization's identity provider. Contact your authentication administrator for assistance.

### Update Your Name

You can change your display name at any time:

1. Click the user icon on the side navigation bar.
2. Select **Account Settings**.
3. On the **General Information** page, edit **First Name** and **Last Name**.
4. Click **Save**.

## Administrative Tasks

The following tasks require administrator privileges or the [CREATE USER](/dremio-cloud/security/privileges#organization-privileges) privilege.

### View All Users

1. Click ![Settings](/images/icons/settings.png "Settings") on the left navigation bar and choose **Organization settings**.
2. Select **Users** in the organization settings sidebar.

The table displays all local and SSO users with access to your Dremio instance.

### Add a User

**SSO users** are added automatically when you configure [SCIM provisioning](/dremio-cloud/security/authentication/idp#scim).

**To add a local user:**

1. Click ![Settings](/images/icons/settings.png "Settings") on the left navigation bar and choose **Organization settings**.
2. Select **Users**.
3. Click **Add Users**.
4. In the **Email address(es)** field, enter one or more email addresses separated by commas, spaces, or line breaks.
5. For **Dremio Role**, select the [roles](/dremio-cloud/security/roles) in which the user will be a member. All users are members of the PUBLIC role by default.
6. Click **Add**.

Each user receives an invitation email to set up their account. You can configure additional roles after users accept their invitations.

A user's email address serves as their unique identifier and cannot be changed after account creation. If a user's email changes, you must create a new account with the new email address.

If invited users don't receive the email, check their spam folders and verify that the email addresses are correct.

### Edit a User

You can modify a user's name and role assignments. Email addresses cannot be edited; if a user's email changes, you must create a new account.

1. Click ![Settings](/images/icons/settings.png "Settings") on the left navigation bar and choose **Organization settings**.
2. Select **Users**.
3. Hover over the user's row and click ![Edit icon](/images/icons/edit.png) to edit the user.
4. **Details tab:** Edit **First Name** and **Last Name**, then click **Save**.
5. **Roles tab:** Manage role assignments:
   * **Add roles:** Search for and select roles, then click **Add Roles**.
   * **Remove roles:** Hover over a role and click **Remove**.
6. Click **Save**.

### Reset a User's Password

This option is available only for local users; SSO users must reset passwords through their identity provider. To send a password reset email to a local user:

1. Click ![Settings](/images/icons/settings.png "Settings") on the left navigation bar and choose **Organization settings**.
2. Select **Users**.
3. Click the user's name.
4. Click **Send Password Reset**.

The user immediately receives an email with reset instructions.

### Remove a User

**To remove an SSO user:**

1. First, remove the user from your external identity provider.
2. Then follow the steps below to remove them from Dremio.

**To remove a local user:**

1. Click ![Settings](/images/icons/settings.png "Settings") on the left navigation bar and choose **Organization settings**.
2. Select **Users**.
3. Click the user's name.
4. Click ![Remove icon](/images/icons/trash.png) to remove the user.
5. Confirm the deletion.

## Related Topics

* [Roles](/dremio-cloud/security/roles)
* [Privileges](/dremio-cloud/security/privileges)
* [Configure Identity Providers](/dremio-cloud/security/authentication/idp/)

<div style="page-break-after: always;"></div>

# Manage Engines | Dremio Documentation

Original URL: https://docs.dremio.com/dremio-cloud/admin/engines/

An engine is a Dremio entity that manages compute resources. Each engine has one or more replicas that are created to execute queries. An engine replica consists of a group of executor instances, the number of which is determined by the engine size.

When you signed up for Dremio, an organization and a project were created automatically. Each new project includes a preview engine, which, as the name suggests, provides previews of queries and datasets. By default, the preview engine scales down after one hour without a query. Unlike other engines, it cannot be disabled.

If an engine is created with a minimum replica count of 0, no executor instances run initially; the engine remains idle until the first query arrives. When you run a query, Dremio allocates executors to your project and starts the engine. Engines start and stop automatically based on query load.

## Sizes

Dremio provides a standard executor, which is used in all of our query engine sizes. Query engine sizes are differentiated by the number of executors in a replica. For each size, Dremio provides a default query concurrency, as shown in the table below.

| Replica Size | Executors per Replica | DCUs | Default Concurrency | Max Concurrency |
| --- | --- | --- | --- | --- |
| 2XSmall | 1 | 14 | 2 | 20 |
| XSmall | 1 | 30 | 4 | 40 |
| Small | 2 | 60 | 6 | 60 |
| Medium | 4 | 120 | 8 | 80 |
| Large | 8 | 240 | 10 | 100 |
| XLarge | 16 | 480 | 12 | 120 |
| 2XLarge | 32 | 960 | 16 | 160 |
| 3XLarge | 64 | 1920 | 20 | 200 |
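
For capacity-planning scripts, the table above can be captured as a simple lookup. The sketch below is illustrative only: the names (`ENGINE_SIZES`, `replica_capacity`) are not part of any Dremio API, the figures are copied from the table, and the totals assume the per-replica values scale linearly with the replica count.

```
# Engine size table from above, keyed by replica size.
# Values: (executors per replica, DCUs, default concurrency, max concurrency)
ENGINE_SIZES = {
    "2XSmall": (1, 14, 2, 20),
    "XSmall": (1, 30, 4, 40),
    "Small": (2, 60, 6, 60),
    "Medium": (4, 120, 8, 80),
    "Large": (8, 240, 10, 100),
    "XLarge": (16, 480, 12, 120),
    "2XLarge": (32, 960, 16, 160),
    "3XLarge": (64, 1920, 20, 200),
}

def replica_capacity(size: str, replicas: int) -> dict:
    """Illustrative helper: total executors, DCUs, and concurrency for a
    given size at a given replica count (assumes linear scaling)."""
    executors, dcus, default_conc, max_conc = ENGINE_SIZES[size]
    return {
        "executors": executors * replicas,
        "dcus": dcus * replicas,
        "default_concurrency": default_conc * replicas,
        "max_concurrency": max_conc * replicas,
    }

# Example: a Medium engine running 3 replicas.
print(replica_capacity("Medium", 3))
```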

## States

An engine can be in one of the following states.

| State | Description |
| --- | --- |
| Running | An enabled engine (replicas are provisioned automatically or running as per the minimum number of replicas configured). You can use this engine to run queries. |
| Adding Replica | An engine that is scaling up (adding a replica). |
| Removing Replica | An engine that is scaling down (removing a replica). |
| Disabling | An engine that is being disabled. |
| Disabled | A disabled engine (no replicas have been provisioned dynamically, or there are no active replicas). You cannot use this engine to run queries. |
| Starting Engine | An engine that is starting (transitioning from the disabled state to the enabled state). |
| Stopping Engine | An engine that is stopping (transitioning from the enabled state to the disabled state). |
| Stopped | An enabled engine that has been stopped (zero replicas running). |
| Deleting | An engine that is being deleted. |

## Autoscaling

Autoscaling dynamically manages the query workload for you based on parameters that you set for the engine. Dremio monitors engine replica health and starts and stops replicas as required to provide seamless query execution.

The following table describes the engine parameters and their role in autoscaling.

| Parameter | Description |
| --- | --- |
| **Size** | The number of executors that make up an engine replica. |
| **Max Concurrency** | The maximum number of jobs that can run concurrently on an engine replica. |
| **Last Replica Auto-Stop** | The time to wait before deleting the last replica if the engine is not in use. Does not apply when the minimum number of engine replicas is 1 or higher. The default value is 2 hours. |
| **Enqueued Time Limit** | The time a query waits when no resources are available. When this limit is exceeded, the query is canceled and you are notified with a timeout-during-slot-reservation error. The default value is 5 minutes. |
| **Query Runtime Limit** | The time a query can run before it is canceled. The default value is 5 minutes. |
| **Drain Time Limit** | The time an engine replica continues to run after the engine is resized, disabled, or deleted before it is terminated and any running queries fail. The default value is 30 minutes. If no queries are running on a replica, the replica is terminated without waiting for the drain time limit. |

When a query is submitted to an engine, the control plane assigns an engine replica to it. Replicas are created dynamically and assigned to queries based on the query workload; the control plane observes the workload and the currently active replicas to determine whether to scale up or down. A replica remains assigned to a query until execution completes. For a given engine, Dremio Cloud does not scale replicas above the configured maximum or below the configured minimum.
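
The scaling decision can be pictured with a toy model: enough replicas to keep each one at or below its concurrency limit, clamped to the configured replica range. This is a simplified illustration of the behavior described above, not Dremio's actual control-plane algorithm.

```
import math

def desired_replicas(active_jobs: int, max_concurrency_per_replica: int,
                     min_replicas: int, max_replicas: int) -> int:
    """Toy model of the scale-up/scale-down decision: enough replicas to
    keep each at or below its concurrency limit, clamped to the
    configured minimum and maximum replica counts."""
    needed = math.ceil(active_jobs / max_concurrency_per_replica) if active_jobs else 0
    return max(min_replicas, min(max_replicas, needed))

# A Medium engine (default concurrency 8 per replica), 0-3 replicas configured:
for jobs in (0, 5, 20, 100):
    print(jobs, "jobs ->", desired_replicas(jobs, 8, 0, 3), "replicas")
```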

### Monitor Engine Health

The Dremio Cloud control plane monitors engine health and manages unhealthy replicas to provide a seamless query execution experience. Replica nodes send periodic heartbeats to the control plane, which uses them to determine liveness. If a replica node fails to return a heartbeat, the control plane marks that node as unhealthy and replaces it with a healthy one.

## View All Engines

To view engines:

1. In the Dremio Cloud application, click the Project Settings ![This is the icon that represents the Project Settings.](/images/icons/project-settings.png "Icon represents the Project Settings.") icon in the side navigation bar.
2. Select **Engines** in the project settings sidebar to see the list of engines in the project. On the **Engines** page, you can also filter engines by status: click the **Status** dropdown list to see the different statuses.

## Add an Engine

To add a new engine:

1. On the **Project Settings** page, select **Engines** in the project settings sidebar. The **Engines** page lists the engines created for the project. Every engine created in a project is created in the cloud account associated with that project.
2. Click the **Add Engine** button on the top-right of the **Engines** page to create a new engine.
3. In the **Add Engine** dialog, for **Engine**, enter a name.
4. (Optional) For **Description**, enter a description.
5. (Optional) For **Size**, select the size of the engine. The size designates the number of executors.
6. (Optional) For **Max Concurrency per Replica**, enter the maximum number of jobs that can run concurrently on this engine.

The following parameters are for **Engine Replicas**:

7. For **Min Replicas**, enter the minimum number of engine replicas that Dremio Cloud has running at any given time. For auto-stop, set it to 0. To guarantee low-latency query execution, set it to 1 or higher. The default number of minimum replicas is 0.
8. For **Max Replicas**, enter the maximum number of engine replicas that Dremio Cloud scales up to. The default number of maximum replicas is 1.

tip

You can use these settings to control costs and ensure that excessive replicas are not spun up.

9. Under **Advanced Configuration**, for **Last Replica Auto-Stop**, enter the time to wait before deleting the last replica if the engine is not in use. The default value is 2 hours, and the minimum value is 1 minute.

note

The last replica auto-stop does not apply when the minimum number of engine replicas is 1 or higher.

The following parameters are for **Time Limit**:

10. For **Enable Enqueued Time Limit**, check the box.
11. For **Enqueued Time Limit**, enter the time a query waits before being canceled. The default value is 5 minutes.

caution

Do not set the enqueued time limit to less than one minute, which is the typical time needed to start a new replica. Changing this setting does not affect queries that are currently running or queued.

12. (Optional) For **Enable Query Time Limit**, check the box to limit how long a query can run before it is canceled.
13. (Optional) For **Query Runtime Limit**, enter the time a query can run before it is canceled. The default query runtime limit is 5 minutes.
14. For **Drain Time Limit**, enter the time (in minutes) that an engine replica continues to run after the engine is resized, disabled, or deleted before it is terminated and any running queries fail. The default value is 30 minutes. If no queries are running on a replica, the replica is terminated without waiting for the drain time limit.
15. Click **Save and Launch**. This action saves the configuration, enables the engine, and allocates the executors.

## Edit an Engine

To edit an engine:

1. On the **Project Settings** page, select **Engines** in the project settings sidebar.
2. On the **Engines** page, hover over the row of the engine that you want to edit and click the Edit Engine ![This is the icon that represents the Edit Engine settings.](/images/icons/edit.png "Icon represents the Edit Engines settings.") icon that appears next to the engine. The **Edit Engine** dialog opens.

Alternatively, you can click the engine to go to the engine's page, then click the **Edit Engine** button on the top-right of the page.

note

You cannot edit the **Engine name** parameter.

3. For **Description**, enter a description.
4. For **Size**, select the size of the engine. The size designates the number of executors.
5. For **Max Concurrency per Replica**, enter the maximum number of jobs that can run concurrently on this engine.

The following parameters are for **Engine Replicas**:

6. For **Min Replicas**, enter the minimum number of engine replicas that Dremio has running at any given time. Set this value to 0 to enable auto-stop, or to 1 or higher to ensure low-latency query execution.
7. For **Max Replicas**, enter the maximum number of engine replicas that Dremio scales up to.
8. Under **Advanced Configuration**, for **Last Replica Auto-Stop**, enter the time to wait before deleting the last replica if the engine is not in use. The default value is 2 hours.

note

The last replica auto-stop does not apply when the minimum number of engine replicas is 1 or higher.

The following parameters are for **Time Limit**:

9. For **Enable Enqueued Time Limit**, check the box.
10. For **Enqueued Time Limit**, enter the time a query waits before being canceled. The default value is 5 minutes.

caution

Do not set the enqueued time limit to less than one minute, which is the typical time needed to start a new replica. Changing this setting does not affect queries that are currently running or queued.

11. (Optional) For **Enable Query Time Limit**, check the box to limit how long a query can run before it is canceled.
12. (Optional) For **Query Runtime Limit**, enter the time a query can run before it is canceled. The default query runtime limit is 5 minutes.
13. For **Drain Time Limit**, enter the time (in minutes) that an engine replica continues to run after the engine is resized, disabled, or deleted before it is terminated and any running queries fail. The default value is 30 minutes. If no queries are running on a replica, the replica is terminated without waiting for the drain time limit.
14. Click **Save**.

## Disable an Engine

You can disable an engine that is not being used. To disable an engine:

1. On the **Project Settings** page, select **Engines** in the project settings sidebar. The list of engines in the project is displayed.
2. Disable the engine by using the toggle in the **Enabled** column.
3. Confirm that you want to disable the engine.

## Enable an Engine

To enable a disabled engine:

1. On the **Project Settings** page, select **Engines** in the project settings sidebar. The list of engines in the project is displayed.
2. Enable the engine by using the toggle in the **Enabled** column.
3. Confirm that you want to enable the engine.

## Delete an Engine

You can permanently delete an engine if it is not in use (this action is irreversible). If queries are running on the engine, Dremio waits up to the drain time limit for the running queries to complete before deleting the engine.

caution

An engine that has a routing rule associated with it cannot be deleted. Delete the rules before deleting the engine.

To delete an engine:

1. On the **Project Settings** page, select **Engines** in the project settings sidebar. The list of engines in the project is displayed.
2. On the **Engines** page, hover over the row of the engine that you want to delete and click the Delete ![This is the icon that represents the Delete settings.](/images/icons/trash.png "Icon represents the Delete settings.") icon that appears next to the engine.
3. Confirm that you want to delete the engine.

## Troubleshoot

If your engines are not scaling up or down as expected, you can check the engine events to see the error that is causing the issue.

To view engine events:

1. On the **Project Settings** page, select **Engines** in the project settings sidebar. The list of engines in the project is displayed.
2. On the **Engines** page, click the engine that you want to investigate.
3. On the engine details page, click the **Events** tab to view the scaling events and the status of each event.
4. If any scaling problems persist, contact [Dremio Support](https://support.dremio.com/).

<div style="page-break-after: always;"></div>

# Monitor | Dremio Documentation

Original URL: https://docs.dremio.com/dremio-cloud/admin/monitor/

As an administrator, you can monitor catalog usage and jobs in the Dremio console. You can also use the Dremio APIs and SQL to retrieve information about jobs and events for the projects in your organization.

### Monitor the Dremio Console

The Monitor page in the Dremio console allows you to monitor usage across your project, making it easier to observe patterns, analyze the resources being consumed by your data platform, and understand the impact on your users. You must be a member of the `ADMIN` role to access the Monitor page.

#### Catalog Usage

The data visualizations on the Monitor page point you to the most queried data and folders in a catalog.

Go to **Settings** > **Monitor** to view your catalog usage. When you open the Monitor page, you are directed to the Catalog Usage tab by default, where you can see the following metrics:

* A table of the top 10 most queried datasets within the specified time range, including for each the number of linked jobs, the percentage of linked jobs in which the dataset was accelerated, and the total number of Reflections defined on the dataset
* A table of the top 10 most queried source folders within the specified time range, including for each the number of linked jobs and the top users of that folder

note

A source can be listed in the top 10 most queried source folders if the source contains a child dataset that was used in the query (for example, `postgres.accounts`). Queries of datasets in sub-folders (for example, `s3.mybucket.iceberg_table`) are classified by the sub-folder and not the source.

All datasets are assessed in the metrics on the Monitor page except for datasets in the [system tables](/dremio-cloud/sql/system-tables/) and the [information schema](/dremio-cloud/sql/information-schema/).

The metrics on the Monitor page analyze only user queries. Refreshes of data Reflections and metadata refreshes are excluded.

#### Jobs

The data visualizations on the Monitor page show the metrics for queries executed in your project, including statistics about performance and utilization.

Go to **Settings** > **Monitor** > **Jobs** to open the Jobs tab and see an aggregate view of the following metrics for the jobs that are running in your project:

* A report of today's job count and failed/canceled rate in comparison to yesterday's metrics
* A list of the top 10 most active users within the specified time range, including the number of linked jobs for each user
* Total jobs accelerated, total job time saved, and average job speedup from Autonomous Reflections over the past month
* Total number of jobs accelerated by autonomous and manual Reflections over time
* A graph showing the total number of completed and failed jobs over time (aggregated hourly or daily)
* A graph of all completed and failed jobs according to their engine (aggregated hourly or daily)
* A graph of all job states showing the percentage of time consumed for each [state](/dremio-cloud/admin/monitor/jobs#job-states-and-statuses) (aggregated hourly or daily)
* A table of the top 10 longest running jobs within the specified time range, including the linked ID, duration, user, query type, and start time of each job

To examine all jobs and the details of specific jobs, see [Viewing Jobs](/dremio-cloud/admin/monitor/jobs).

You can create reports of jobs in other BI tools by leveraging the [`sys.project.history.jobs` table](/dremio-cloud/sql/system-tables/jobs-historical), as sketched below.
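
As a minimal sketch of what such a report query looks like, the snippet below submits a query against that table through the SQL API covered in the next section. The endpoint, headers, and the `submitted_ts` column all appear in the examples that follow; the angle-bracket placeholders are yours to fill in.

```
import requests

DREMIO = "https://api.dremio.cloud"  # US control plane
HEADERS = {"Authorization": "Bearer <personal_access_token>",
           "Content-Type": "application/json"}

# Ten most recently submitted jobs; fetch the results by job ID
# as shown in the section below.
sql = 'SELECT * FROM sys.project.history.jobs ORDER BY "submitted_ts" DESC LIMIT 10'
resp = requests.post(f"{DREMIO}/v0/projects/<project_id>/sql",
                     headers=HEADERS, json={"sql": sql})
print(resp.json()["id"])  # query job ID
```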

### Monitor with Dremio APIs and SQL

Administrators can use the Dremio APIs and SQL to retrieve information about the jobs and events in every project in the organization. This information is useful for further monitoring and analysis.

Before you begin, make sure that you are assigned to the ADMIN role for the organization whose information you want to retrieve. You also need a [personal access token (PAT)](/dremio-cloud/security/authentication/personal-access-token#create-a-pat) to make the necessary API requests.

The code examples in this section are written in Python.

The procedure below provides individual code examples for retrieving project IDs, retrieving information for jobs and events, saving query results to Parquet files, and uploading the Parquet files to an AWS S3 bucket. See the combined example for a single code example that combines all of the steps.

1. Get the IDs for all projects in the organization. In the code example for this step, the `get_projects` method uses the [Projects](/dremio-cloud/api/projects) API to get the project IDs.

note

In the following code example, replace `<personal_access_token>` with your PAT.

To use the API control plane for the EU rather than the US, replace `https://api.dremio.cloud` with `https://api.eu.dremio.cloud`.

Get the IDs for all projects
```
import requests
import json

dremio_server = "https://api.dremio.cloud"
personal_access_token = "<personal_access_token>"

headers = {
    'Authorization': "Bearer " + personal_access_token,
    'Content-Type': "application/json"
}

def api_get(endpoint: str) -> requests.Response:
    return requests.get(f'{dremio_server}/{endpoint}', headers=headers)

def get_projects() -> dict:
    """
    Get all projects in the Dremio Cloud organization
    :return: Dictionary of project IDs and project names
    """
    projects = dict()
    projects_response = api_get('v0/projects')
    for project in projects_response.json():
        projects[project['id']] = project['name']
    return projects
```

2. Run a SQL query to get the jobs or events for the project. The code examples for this step show how to use the [SQL](/dremio-cloud/api/sql) API to submit a SQL query, get all jobs during a specific period with the `get_jobs` method, and get all events in the [`sys.project.history.events`](/dremio-cloud/sql/system-tables/events-historical) system table during a specific period with the `get_events` method.

Submit SQL query using the API
```
def api_post(endpoint: str, body=None) -> requests.Response:
    return requests.post(f'{dremio_server}/{endpoint}',
                         headers=headers, data=json.dumps(body))

def run_sql(project_id: str, query: str) -> str:
    """
    Run a SQL query
    :param project_id: project ID
    :param query: SQL query
    :return: query job ID
    """
    query_response = api_post(f'v0/projects/{project_id}/sql', body={'sql': query})
    job_id = query_response.json()['id']
    return job_id
```

Get all jobs in the project during a specific period
```
def get_jobs(project_id: str, start_time: str, end_time: str) -> str:
    """
    Run SQL query to get all jobs in a project during the specified time period
    :param project_id: project ID
    :param start_time: start timestamp (inclusive)
    :param end_time: end timestamp (exclusive)
    :return: query job ID
    """
    job_id = run_sql(project_id, f'SELECT * FROM sys.project.history.jobs '
                                 f'WHERE "submitted_ts" >= \'{start_time}\' '
                                 f'AND "submitted_ts" < \'{end_time}\'')
    return job_id
```

Get all events during a specific period
```
def get_events(project_id: str, start_time: str, end_time: str) -> str:
    """
    Run SQL query to get all events in sys.project.history.events during the specified time period
    :param project_id: project ID
    :param start_time: start timestamp (inclusive)
    :param end_time: end timestamp (exclusive)
    :return: query job ID
    """
    job_id = run_sql(project_id, f'SELECT * FROM sys.project.history.events '
                                 f'WHERE "timestamp" >= \'{start_time}\' '
                                 f'AND "timestamp" < \'{end_time}\'')
    return job_id
```

3. Check the status of the query to get jobs or events. In the code example for this step, the `wait_for_job_complete` method periodically checks the query job state, prints the final job status when the query finishes, and returns the state.

Check status of the query to get jobs or events
```
import time

def wait_for_job_complete(project_id: str, job_id: str) -> str:
    """
    Wait for a query job to complete
    :param project_id: project ID
    :param job_id: job ID
    :return: the final job state ('COMPLETED', 'FAILED', or 'CANCELED')
    """
    while True:
        time.sleep(1)
        job = api_get(f'v0/projects/{project_id}/job/{job_id}')
        job_state = job.json()["jobState"]
        if job_state == 'COMPLETED':
            print("Job complete.")
            break
        elif job_state == 'FAILED':
            print("Job failed.", job.json()['errorMessage'])
            break
        elif job_state == 'CANCELED':
            print("Job canceled.")
            break

    return job_state
```

4. Download the result for the query to get jobs or events and save it to a Parquet file. In the code example for this step, the `save_job_results_to_parquet` method downloads the query result and, if the result contains at least one row, saves the result to a single Parquet file.

Download query result and save to a Parquet file
```
import pyarrow
import pyarrow.parquet

def save_job_results_to_parquet(project_id: str, job_id: str,
                                parquet_file_name: str) -> bool:
    """
    Download the query result and save it to a Parquet file
    :param project_id: project ID
    :param job_id: query job ID
    :param parquet_file_name: file name to save the job result
    :return: True if the query returned more than 0 rows and the Parquet file was saved; otherwise False
    """
    offset = 0
    rows_downloaded = 0
    rows = []
    while True:
        # Results are paginated; fetch 500 rows per request.
        job_result = api_get(f'v0/projects/{project_id}/job/{job_id}/'
                             f'results/?offset={offset}&limit=500')
        job_result_json = job_result.json()
        row_count = job_result_json['rowCount']
        rows_downloaded += len(job_result_json['rows'])
        rows += job_result_json['rows']
        if rows_downloaded >= row_count:
            break
        offset += 500

    print(rows_downloaded, "rows")
    if rows_downloaded > 0:
        # Build an Arrow table from all accumulated rows (list of dicts),
        # not just the last page, and write it out as Parquet.
        table = pyarrow.Table.from_pylist(rows)
        pyarrow.parquet.write_table(table, parquet_file_name)
        return True

    return False
```

5. If desired, you can use the [Boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/quickstart.html) library to upload the Parquet file to an AWS S3 bucket.

Upload Parquet file to AWS S3 with Boto3 library
```
import boto3
from botocore.exceptions import ClientError

def upload_file(file_name: str, bucket: str, folder: str) -> bool:
    """Upload Parquet file to an S3 bucket with Boto3
    :param file_name: File to upload
    :param bucket: Bucket to upload to
    :param folder: Folder to upload to
    :return: True if file was uploaded, else False
    """

    # Upload the file
    s3_client = boto3.client('s3')
    try:
        s3_client.upload_file(file_name, bucket, f'{folder}/{file_name}')
    except ClientError as e:
        print(e)
        return False
    return True
```

#### Combined Example

The following code example combines the steps above to get all jobs and events from all projects during a specific period, save the query results to Parquet files, and upload the Parquet files to an AWS S3 bucket. The parameter `start` is the start timestamp (inclusive) and the parameter `end` is the end timestamp (exclusive).

All jobs in each project during the specified time period are saved to an individual Parquet file named `jobs_<project_id><start>.parquet`. All events in each project during the specified time period are saved to a Parquet file named `events_<project_id><start>.parquet`.

Combine all steps in a single code example
```
import argparse

def main(start: str, end: str):
    """
    Get all jobs and events from all projects during the specified time period, save the results in Parquet files, and upload the files to an AWS S3 bucket.
    :param start: start timestamp (inclusive, in format "YYYY-MM-DD" or "YYYY-MM-DD hh:mm:ss")
    :param end: end timestamp (exclusive, in format "YYYY-MM-DD" or "YYYY-MM-DD hh:mm:ss")
    """
    projects = get_projects()
    print("Projects in organization:")
    print(projects)

    # Get jobs for each project
    for project_id in projects:
        print("Get jobs for project", projects[project_id])
        # run query
        job_id = get_jobs(project_id, start, end)
        # check job status
        job_state = wait_for_job_complete(project_id, job_id)
        if job_state == "COMPLETED":
            file_name = f'jobs_{project_id}{start}.parquet'
            if save_job_results_to_parquet(project_id, job_id, file_name):
                upload_file(file_name, 'S3_BUCKET_NAME', 'dremio/jobs')

    # Get events for each project
    for project_id in projects:
        print("Get events for project", projects[project_id])
        # run query
        job_id = get_events(project_id, start, end)
        # check job status
        job_state = wait_for_job_complete(project_id, job_id)
        if job_state == "COMPLETED":
            file_name = f'events_{project_id}{start}.parquet'
            if save_job_results_to_parquet(project_id, job_id, file_name):
                upload_file(file_name, 'S3_BUCKET_NAME', 'dremio/events')

if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description='Demo of collecting jobs and events from Dremio Cloud Projects')
    parser.add_argument('start',
                        help='start timestamp (inclusive, in format "YYYY-MM-DD" or "YYYY-MM-DD hh:mm:ss")')
    parser.add_argument('end',
                        help='end timestamp (exclusive, in format "YYYY-MM-DD" or "YYYY-MM-DD hh:mm:ss")')
    args = parser.parse_args()

    main(args.start, args.end)
```

<div style="page-break-after: always;"></div>

# Manage Projects | Dremio Documentation

Original URL: https://docs.dremio.com/dremio-cloud/admin/projects/

A project isolates the compute, data, and other resources a team needs for data analysis. An organization may contain multiple projects. Your first project is created during the sign-up process.

Each project in Dremio has its own storage, which is used to store metadata and Reflections and serves as the default storage location for the project's Open Catalog. You can choose between two storage options:

* Dremio-managed storage – No setup or configuration required. Usage is priced per TB, billed monthly.
* Your own storage – Use your own Amazon S3 storage. This requires you to manage the infrastructure yourself.

For details on pricing, see [How Storage Usage Is Calculated](/dremio-cloud/admin/subscription/usage#how-storage-usage-is-calculated).

Each project in your organization contains a preview engine, which, by default, scales down after one hour without a query. As the name suggests, it provides previews of queries and datasets. Unlike other engines, the preview engine cannot be disabled, ensuring that the core Dremio functions that require an engine can always run.

## View All Projects

To view all projects:

1. In the Dremio console, hover over ![This is the Dremio Settings icon.](/images/icons/settings.png "This is the Dremio Settings icon.") in the side navigation bar and select **Organization settings**.
2. Select **Projects** in the organization settings sidebar.

The Projects page displays the status of all projects in your organization. Possible statuses include:

* Creating
* Active
* Inactive
* Deactivating
* Activating
* Archiving
* Archived
* Restoring

## Grant Access to a Project

New projects are private by default. On the Projects page, users can see only the projects for which they have USAGE or OWNERSHIP [privileges](/dremio-cloud/security/privileges); for users without USAGE or OWNERSHIP privileges on any project, the page is empty. The projects dropdown list behaves the same way.

Similarly, the [Projects API](/dremio-cloud/api/projects) returns an HTTP 403 Forbidden error for requests from users who do not have USAGE or OWNERSHIP privileges on the project. Users must also have USAGE or OWNERSHIP privileges on a project before they can make API requests or run SQL queries against any objects in the project, even if they have object-level privileges on sources, folders, or other objects in the project.

To allow users to access a project, use the [`GRANT TO ROLE`](/dremio-cloud/sql/commands/grant-to-role) or [`GRANT TO USER`](/dremio-cloud/sql/commands/grant-to-user) SQL command or the [Grants API](/dremio-cloud/api/catalog/grants) to grant them the USAGE privilege on the project. For users who do not own the project, USAGE is the minimum privilege required to perform any operation on the project and the objects it contains. For example, with `GRANT TO USER` you can run `GRANT USAGE ON PROJECT TO USER <username>`.
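
Because the grant is an ordinary SQL statement, it can also be issued programmatically. A minimal sketch using the SQL API endpoint shown on the Monitor page (`v0/projects/{project_id}/sql`); the placeholders, and the quoting of the username, are assumptions to adjust for your environment.

```
import requests

DREMIO = "https://api.dremio.cloud"
HEADERS = {"Authorization": "Bearer <personal_access_token>",
           "Content-Type": "application/json"}

# Grant USAGE on the project to one user. Usernames are email addresses,
# so the identifier is quoted here; adjust if your identifiers differ.
sql = 'GRANT USAGE ON PROJECT TO USER "<username>"'
resp = requests.post(f"{DREMIO}/v0/projects/<project_id>/sql",
                     headers=HEADERS, json={"sql": sql})
print(resp.status_code, resp.json().get("id"))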

## Obtain the ID of a Project

A BI client application might require the ID of a project as part of the information for creating a connection to Dremio. You can obtain the ID from the General Information page of a project's settings, or look it up programmatically as sketched after these steps.

To obtain a project ID:

1. In the Dremio console, hover over ![This is the Dremio Settings icon.](/images/icons/settings.png "This is the Dremio Settings icon.") in the side navigation bar and select **Project settings**.
2. Select **General Information** in the project settings sidebar.
3. Copy the value in the **Project ID** field.
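
The scripted alternative mentioned above: list projects with the `v0/projects` endpoint (the same call as the `get_projects` example on the Monitor page, which returns `id` and `name` fields) and filter by name. The helper name is illustrative.

```
import requests
from typing import Optional

DREMIO = "https://api.dremio.cloud"
HEADERS = {"Authorization": "Bearer <personal_access_token>",
           "Content-Type": "application/json"}

def project_id_by_name(name: str) -> Optional[str]:
    """Return the ID of the project with the given name, or None if absent."""
    for project in requests.get(f"{DREMIO}/v0/projects", headers=HEADERS).json():
        if project["name"] == name:
            return project["id"]
    return None

print(project_id_by_name("<project name>"))
```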

## Set the Default Project

When your data consumers connect to Dremio from BI tools, they must connect to the projects where their datasets reside. They can either connect to the default project or select a different project.

If an organization administrator does not set this value, Dremio automatically sets the default project to the oldest project in your organization.

You can change the default project at any time.

note

Data consumers who do not have access to the default project must select an alternative project ID when connecting to Dremio from their BI tools.

To specify the default project for your organization:

1. Hover over ![This is the Dremio Settings icon.](/images/icons/settings.png "This is the Dremio Settings icon.") in the side navigation bar and select **Organization settings**.
2. Select **General Information** in the organization settings sidebar.
3. In the **Default Project** field, select the project that you want data consumers to connect to by default through their BI tools.
4. Click **Save**.

## Create a Project

If you plan to use your own bucket, you must create a role that grants Dremio access before you create the project; see [Bring Your Own Project Store](/dremio-cloud/admin/projects/your-own-project-storage) for instructions. To avoid this setup, use Dremio-managed storage.

To add a project:

1. In the Dremio console, hover over ![This is the Dremio Settings icon.](/images/icons/settings.png "This is the Dremio Settings icon.") in the side navigation bar and select **Organization settings**.
2. Select the **Projects** option in the organization settings sidebar.
3. In the top-right corner of the Projects page, click **Create**.
4. For **Project name**, specify a name that is unique within the organization.
5. For **Region**, select the AWS Region where you want the project to reside.
6. Select one of the two **Storage** options:

   * For **Dremio managed storage**, Dremio creates and manages object storage for you.
   * For **your own storage**, provide Dremio with the bucket URI and the Role ARN that you created previously.

## Activate a Project

Dremio automatically deactivates any project that has not been accessed in the last 15 days. Dremio sends a courtesy email to project owners three days prior to deactivation. Inactive projects are displayed in the project selector in the side navigation bar and on the Projects page. An inactive project is activated automatically when any user tries to access it via the Dremio console, an ODBC or JDBC connection, or an API call.

note

Inactive projects do not consume any compute resources.

You can activate an inactive project on the Projects page, or by clicking the project in the project selector. It takes a few minutes to activate a project.

To activate a project from the Projects page:

1. Hover over ![This is the Dremio Settings icon.](/images/icons/settings.png "This is the Dremio Settings icon.") in the side navigation bar and select **Organization settings**.
2. Select **Projects** in the organization settings sidebar.
3. Click the ellipsis menu to the far right of the inactive project, and then click **Activate Project**.

The project status changes to *Activating* while the project is activated. You can access the project after the status changes to *Active*.

## Archive a Project

Users with OWNERSHIP privileges on a project or users assigned to the ADMIN role can archive the project. Archived projects are displayed only on the Projects page.

note

Archived projects do not consume any compute resources.

To archive a project:

1. In the Dremio console, hover over ![This is the Dremio Settings icon.](/images/icons/settings.png "This is the Dremio Settings icon.") in the side navigation bar and select **Organization settings**.
2. Select **Projects** in the organization settings sidebar.
3. Click the ellipsis menu to the far right of an active or inactive project, and then click **Archive Project**.

The project status changes to *Archiving* while the project is archived. When archiving is complete, the status changes to *Archived*.

## Restore an Archived Project

An archived project is not restored automatically when a user tries to access it; it can only be restored manually by a user with OWNERSHIP privileges on the project or a user assigned to the ADMIN role. It takes a few minutes to restore an archived project.

To restore an archived project:

1. In the Dremio console, hover over ![This is the Dremio Settings icon.](/images/icons/settings.png "This is the Dremio Settings icon.") in the side navigation bar and select **Organization settings**.
2. Select **Projects** in the organization settings sidebar.
3. Click the ellipsis menu to the far right of an archived project and select **Restore Project**.

The project status changes to *Restoring* while the project is restored. You can access the project after the status changes to *Active*.

## Delete a Project

The default project cannot be deleted. If you want to delete it, you must first set another project as the default; see Set the Default Project.

To delete a project:

1. In the Dremio console, hover over ![This is the Dremio Settings icon.](/images/icons/settings.png "This is the Dremio Settings icon.") in the side navigation bar and select **Organization settings**.
2. Select **Projects** in the organization settings sidebar.
3. Click the ellipsis menu to the far right of the project and select **Delete Project**.
4. Confirm that you want to delete the project.

<div style="page-break-after: always;"></div>

# Audit Logs | Dremio Documentation

Original URL: https://docs.dremio.com/dremio-cloud/admin/monitor/logs

The creation and modification of Dremio resources are tracked and traceable via the [`sys.project.history.events`](/dremio-cloud/sql/system-tables/events-historical) table. Audit logging is enabled by default and available to users with administrative permissions on the project.

An event can take up to three hours to propagate to the system table. There is currently no maximum retention policy for audit events.

note

This is a subset of the events that Dremio supports.
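
For programmatic review, audit events can be pulled with the same SQL API used on the Monitor page; the `get_events` example there queries exactly this table. A minimal sketch follows. The `"timestamp"` filter column is the one used in that example; the dates are illustrative, and any further filtering on event fields is left client-side since other column names are not listed here.

```
import requests

DREMIO = "https://api.dremio.cloud"
HEADERS = {"Authorization": "Bearer <personal_access_token>",
           "Content-Type": "application/json"}

# Audit events for one day (dates are illustrative placeholders).
sql = ("SELECT * FROM sys.project.history.events "
       "WHERE \"timestamp\" >= '2025-01-01' AND \"timestamp\" < '2025-01-02'")
resp = requests.post(f"{DREMIO}/v0/projects/<project_id>/sql",
                     headers=HEADERS, json={"sql": sql})
print(resp.json()["id"])  # poll this job ID, then page through the results
```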

## Organization Events

Dremio supports audit logging for the following organization event types and actions. The [`sys.project.history.events`](/dremio-cloud/sql/system-tables/events-historical) table contains these events in the default project.

| Event Type | Actions | Description |
| --- | --- | --- |
| BILLING\_ACCOUNT | BILLING\_ACCOUNT\_ADD\_PROJECT | Dremio added a new project to the billing account during project creation. |
| BILLING\_ACCOUNT | BILLING\_ACCOUNT\_CREATE | A user created a billing account. |
| BILLING\_ACCOUNT | BILLING\_ACCOUNT\_REMOVE\_PROJECT | Dremio removed a project from the billing account during project deletion. |
| BILLING\_ACCOUNT | BILLING\_ACCOUNT\_UPDATE | A user modified the billing account, such as the notification email address. |
| BILLING\_TRANSACTION | TRANSACTION\_CHARGE | Dremio recorded Dremio Consumption Unit (DCU) usage charges for the period. |
| BILLING\_TRANSACTION | TRANSACTION\_CREDIT\_LOAD | Dremio loaded DCU credits into the billing account. |
| CATALOG | CREATE | A user created a new Open Catalog. Catalog creation is included with project creation. |
| CATALOG | DELETE | A user deleted an Open Catalog. Project deletion also deletes its primary Open Catalog. |
| CATALOG | UPDATE | A user updated an Open Catalog configuration. |
| CLOUD | CREATE\_STARTED CREATE\_COMPLETED | A user created a cloud. Clouds provide resources for running engines and storing metadata in a project. |
| CLOUD | DELETE\_STARTED DELETE\_COMPLETED | A user deleted a cloud. |
| CLOUD | UPDATE | A user updated a cloud. |
| CONNECTION | FORCE\_LOGOUT | A user changed their password or deactivated another user, ending all of that user's sessions. |
| CONNECTION | LOGIN | A user logged in. |
| CONNECTION | LOGOUT | A user logged out. |
| EDITION | DOWNGRADE | A user downgraded the billing edition in the Dremio organization. |
| EDITION | UPGRADE | A user upgraded the billing edition in the Dremio organization. |
| IDENTITY\_PROVIDER | CREATE | A user configured a new OpenID Connect (OIDC) identity provider integration. |
| IDENTITY\_PROVIDER | DELETE | A user deleted an OIDC identity provider. |
| IDENTITY\_PROVIDER | UPDATE | A user updated an OIDC identity provider configuration. |
| MODEL\_PROVIDER\_CONFIG | CREATE | A user created a new model provider in the Dremio organization. |
| MODEL\_PROVIDER\_CONFIG | UPDATE | A user updated a model provider in the Dremio organization. |
| MODEL\_PROVIDER\_CONFIG | DELETE | A user deleted a model provider in the Dremio organization. |
| MODEL\_PROVIDER\_CONFIG | SET\_DEFAULT | A user set a new default model provider in the Dremio organization. |
| ORGANIZATION | CREATE\_STARTED CREATE\_COMPLETED | A user created the Dremio organization. |
| ORGANIZATION | DELETE\_STARTED DELETE\_COMPLETED | A user closed and deleted the Dremio organization. |
| ORGANIZATION | UPDATE | A user updated the Dremio Cloud organization. |
| PERSONAL\_ACCESS\_TOKEN | CREATE | A user created a new personal access token in their account. |
| PERSONAL\_ACCESS\_TOKEN | DELETE | A user deleted a personal access token. |
| PROJECT | CREATE\_STARTED CREATE\_COMPLETED | A user created a project in the Dremio organization. |
| PROJECT | DELETE\_STARTED DELETE\_COMPLETED | A user deleted a project. |
| PROJECT | HIBERNATE\_STARTED HIBERNATE\_COMPLETED | A user archived a project. |
| PROJECT | UNHIBERNATE\_STARTED UNHIBERNATE\_COMPLETED | A user activated an archived project. |
| PROJECT | UPDATE | A user updated the configuration of a project. |
| ROLE | CREATE | A user created a custom role. |
| ROLE | DELETE | A user deleted a role. |
| ROLE | MEMBERS\_ADDED | A user added users or roles as members of a role. |
| ROLE | MEMBERS\_REMOVED | A user removed users or roles as members of a role. |
| ROLE | UPDATE | A user updated the metadata of a custom role, such as the description. |
| USER\_ACCOUNT | CREATE | A user added a user account. |
| USER\_ACCOUNT | DELETE | A user deleted a user account. |
| USER\_ACCOUNT | PASSWORD\_CHANGE | A user updated their account password. |
| USER\_ACCOUNT | UPDATE | A user updated user account metadata. |

## Project Events

| Event Type | Actions | Description |
| --- | --- | --- |
| AI\_AGENT | REQUEST RESPONSE | A user sent a request to the AI Agent and received a response. |
| ENGINE | CREATE\_STARTED CREATE\_COMPLETED | A user created an engine. |
| ENGINE | DELETE\_STARTED DELETE\_COMPLETED | A user deleted an engine. |
| ENGINE | DISABLE\_STARTED DISABLE\_COMPLETED | A user disabled an engine. |
| ENGINE | ENABLE\_STARTED ENABLE\_COMPLETED | A user enabled an engine. |
| ENGINE | UPDATE\_STARTED UPDATE\_COMPLETED | A user updated an engine configuration. |
| ENGINE\_SCALING | SCALE\_DOWN\_STARTED SCALE\_DOWN\_COMPLETED | Dremio scaled down an engine by stopping one or more running replicas. |
| ENGINE\_SCALING | SCALE\_UP\_STARTED SCALE\_UP\_COMPLETED | Dremio scaled up an engine by starting one or more additional replicas. |
| LABEL | UPDATE | A user created a label on a dataset, source, or other object. |
| PIPE | CREATE | A user created an autoingest pipe for Apache Iceberg. |
| PIPE | DELETE | A user dropped an autoingest pipe. |
| PIPE | UPDATE | A user updated the configuration of an existing autoingest pipe. |
| PRIVILEGE | DELETE | A user deleted a privilege from a user or role. |
| PRIVILEGE | UPDATE | A user granted a privilege to a user or role. |
| REFLECTION | CREATE | A user created a new raw or aggregate Reflection. |
| REFLECTION | DELETE | A user deleted a Reflection. |
| REFLECTION | UPDATE | A user updated the content or configuration of a Reflection. |
| REPLICA | CREATE\_STARTED CREATE\_COMPLETED | Dremio started a replica during an ENGINE\_SCALING scale-up event. |
960
+ | PIPE | CREATE | A user created an autoingest pipe for Apache Iceberg. |
961
+ | PIPE | DELETE | A user dropped an autoingest pipe. |
962
+ | PIPE | UPDATE | A user updated the configuration of an existing autoingest pipe. |
963
+ | PRIVILEGE | DELETE | A user deleted a privilege from a user or role. |
964
+ | PRIVILEGE | UPDATE | A user granted a privilege to a user or role. |
965
+ | REFLECTION | CREATE | A user created a new raw or aggregate Reflection. |
966
+ | REFLECTION | DELETE | A user deleted a Reflection. |
967
+ | REFLECTION | UPDATE | A user updated the content or configuration of a Reflection. |
968
+ | REPLICA | CREATE\_STARTED CREATE\_COMPLETED | Dremio started a replica during an ENGINE\_SCALING scale-up event. |
969
+ | REPLICA | DELETE\_STARTED DELETE\_COMPLETED | Dremio stopped a replica during an ENGINE\_SCALING scale-down event. |
970
+ | ROUTING\_RULESET | UPDATE | A user modified an engine routing rule. |
971
+ | SUPPORT\_SETTING | RESET | A user reset an advanced configuration or diagnostic setting. |
972
+ | SUPPORT\_SETTING | SET | A user set an advanced configuration or diagnostic setting. |
973
+ | UDF | CREATE | A user created a user-defined function. |
974
+ | UDF | DELETE | A user deleted a user-defined function. |
975
+ | UDF | UPDATE | A user modified the SQL definition of a user-defined function. |
976
+ | WIKI | EDIT | A user created or updated a wiki. |
977
+
978
+ ## Open Catalog Events
979
+
980
+ These events appear in the [`sys.project.history.events`](/dremio-cloud/sql/system-tables/events-historical) table of the project where the catalog is designated as the primary catalog.
981
+
982
+ | Event Type | Actions | Description |
983
+ | --- | --- | --- |
984
+ | FOLDER | CREATE | A user created a folder in the catalog. |
985
+ | FOLDER | DELETE | A user deleted a folder in the catalog. |
986
+ | TABLE | CREATE | A user created a table in the catalog. |
987
+ | TABLE | DELETE | A user deleted a table in the catalog. |
988
+ | TABLE | READ | A user read table information or data from the catalog. |
989
+ | TABLE | REGISTER | A user registered a new table in the catalog. |
990
+ | TABLE | UPDATE | A user updated a table definition in the catalog. |
991
+ | VIEW | CREATE | A user created a view in the catalog. |
992
+ | VIEW | DELETE | A user deleted a view in the catalog. |
993
+ | VIEW | READ | A user read a view in the catalog. |
994
+ | VIEW | UPDATE | A user updated a view definition in the catalog. |
995
+
996
+ ## Source Events
997
+
998
+ These events appear in the [`sys.project.history.events`](/dremio-cloud/sql/system-tables/events-historical) table for any source in the project.
999
+
1000
+ | Event Type | Actions | Description |
1001
+ | --- | --- | --- |
1002
+ | SOURCE | CREATE | A user created a data source. |
1003
+ | SOURCE | DELETE | A user deleted a source connection. Any tables from the source were removed. |
1004
+ | SOURCE | UPDATE | A user updated a source configuration. |
1005
+ | FOLDER | CREATE | A user created a folder. |
1006
+ | FOLDER | DELETE | A user deleted a folder. |
1007
+ | TABLE | CREATE | A user created a non-catalog table. |
1008
+ | TABLE | DELETE | A user deleted a non-catalog table. |
1009
+ | TABLE | UPDATE | A user updated a table, or Dremio performed a metadata refresh on a non-Parquet table. |
1010
+
1011
+
1018
+ <div style="page-break-after: always;"></div>
1019
+
1020
+ # Optimize Performance | Dremio Documentation
1021
+
1022
+ Original URL: https://docs.dremio.com/dremio-cloud/admin/performance/
1023
+
1024
+ Dremio uses a variety of tools to help you autonomously optimize your lakehouse. These tools apply at four stages: (1) source files, (2) intermediate transformations, (3) final or production transformations, and (4) client queries. Dremio also offers tools that allow you to manually fine-tune performance. Both approaches can coexist, enabling Dremio to manage most optimizations automatically while still giving you the flexibility to take direct action when desired.
1025
+
1026
+ For details on how Dremio autonomously manages your tables, see [Automatic Optimization](/dremio-cloud/manage-govern/optimization), which focuses on Iceberg table management.
1027
+
1028
+ This section focuses instead on accelerating views and SQL queries, including those from clients such as AI agents and BI dashboards. The principal method for this acceleration is Dremio's materialization and query-rewriting capability, known as Reflections.
1029
+
1030
+ * [Autonomous Reflections](/dremio-cloud/admin/performance/autonomous-reflections) – Learn how Dremio automatically learns your query patterns and manages Reflections to optimize performance accordingly. This capability is available for Iceberg tables, UniForm tables, Parquet datasets, and any views built on these datasets.
1031
+ * [Manual Reflections](/dremio-cloud/admin/performance/manual-reflections) – Use this option primarily for data formats not supported by Autonomous Reflections. Learn how to define your own Reflections and the best practices for using and managing them.
1032
+ * [Results Cache](/dremio-cloud/admin/performance/results-cache) – Understand how Dremio caches the results of queries from AI agents and BI dashboards.
1033
+
1034
+
1036
+ <div style="page-break-after: always;"></div>
1037
+
1038
+ # Jobs | Dremio Documentation
1039
+
1040
+ Original URL: https://docs.dremio.com/dremio-cloud/admin/monitor/jobs/
1041
+
1042
+
1044
+ All jobs run in Dremio are listed on the Jobs page, which shows the job ID, type, status, and other attributes.
1045
+
1046
+ To navigate to the Jobs page, click ![This is the icon that represents the Jobs page.](/images/cloud/jobs-page-icon.png "Icon represents the Jobs page.") in the side navigation bar.
1047
+
1048
+ ## Search Filters and Columns
1049
+
1050
+ By default, the Jobs page lists jobs run within the last 30 days, filtered to the **UI** and **External Tools** job types. To change these defaults for your account, you can filter on values and manage columns directly on the Jobs page, as shown in this image:
1051
+
1052
+ ![This is a screenshot showing the main components of the Jobs page.](/images/cloud/jobs-map.png "This is a screenshot showing the main components of the Jobs page.")
1053
+
1054
+ a. **Search Jobs** by typing the username or job ID.
1055
+
1056
+ b. **Start Time** allows you to pick the date and time at which the job began.
1057
+
1058
+ c. **Status** represents one or more job states. For descriptions, see [Job States and Statuses](#job-states-and-statuses).
1059
+
1060
+ d. **Type** includes Accelerator, Downloads, External Tools, Internal, and UI. For descriptions, see Job Properties.
1061
+
1062
+ e. **User** can be searched by typing the username or checking the box next to the username in the dropdown.
1063
+
1064
+ f. **Manage Columns** by checking the boxes next to additional columns that you want to see in the Jobs list. The grayed-out checkboxes indicate the columns that are required by default. You can also rearrange the column order by dragging and dropping a column.
1065
+
1066
+ ## Job Properties
1067
+
1068
+ Each job has the following properties, which can appear as columns in the list of jobs on the Jobs page or as details on the Job Overview page:
1069
+
1070
+ | Property | Description |
1071
+ | --- | --- |
1072
+ | Accelerated | A purple lightning bolt in a row indicates that the job ran a query that was accelerated by one or more Reflections. |
1073
+ | Attribute | Represents at least one of the following query types: * **UI** - queries issued from the SQL Runner in the Dremio console. * **External Tools** - queries from client applications, such as Microsoft Power BI, Superset, Tableau, other third-party client applications, and custom applications. * **Accelerator** - queries related to creating, maintaining, and removing Reflections. * **Internal** - queries that Dremio submits for internal operations. * **Downloads** - queries used to download datasets. * **AI** – queries issued from the Dremio AI Agent. |
1074
+ | CPU Used | Provides statistics about the actual cost of the query operations in terms of CPU processing. |
1075
+ | Dataset | The queried dataset, if one was queried. Hover over the dataset to see a metadata card appear with details about the dataset. For more information, see [Discover Data](/dremio-cloud/explore-analyze/discover). |
1076
+ | Duration | The length of time (in seconds) that a job required from start to completion. |
1077
+ | Engine | The engine used to run the query. |
1078
+ | Input | The number of bytes and the number of rows considered for the job. |
1079
+ | Job ID | A universally unique identifier. |
1080
+ | Output | The number of bytes and the number of rows produced as output by the job. |
1081
+ | Planner Cost Estimate | A cost estimate calculated by Dremio based on an evaluation of the resources to be used in the execution of a query. The number has no units and is intended to give an idea of the cost of executing a query relative to the costs of executing other queries. Values are derived by adding weighted estimates of required I/O, memory, and CPU load. In reported values, K = thousand, M = million, B = billion, and T = trillion. For example, a value of 12,543,765,321 is reported as 12.5B. |
1082
+ | Planning Time | The length of time (in seconds) in which the query optimizer planned the execution of the query. |
1083
+ | Rows Returned | Number of output records. |
1084
+ | Rows Scanned | Number of input records. |
1085
+ | SQL | The SQL query that was submitted for the job. |
1086
+ | Start Time | The date and time at which the job began. |
1087
+ | Status | Represents one or more job states. For descriptions, see Job States and Statuses. |
1088
+ | Total Memory | Provides statistics about the actual cost of the query operations in terms of memory. |
1089
+ | User | Username of the user who ran the query and initiated the job. |
1090
+ | Wait on Client | The length of time (in seconds) that the job spent waiting on the client. |
1091
+
1092
+ ## Job States and Statuses
1093
+
1094
+ Each job passes through a sequence of states until it is complete, though the sequence can be interrupted if a query is canceled or if there is an error during a state. In this diagram, the states that a job passes through are in white, and the possible end states are in dark gray.
1095
+
1096
+ ![](/assets/images/job-states-d8a1b49d0b4cef93a610cd185648e268.png)
1097
+
1098
+ This table lists the statuses that the UI lets you filter on and shows how they map to the states:
1099
+
1100
+ | Icon | Status | State | Description |
1101
+ | --- | --- | --- | --- |
1102
+ | | Setup | Pending | Represents a state where the query is waiting to be scheduled on the query pool. |
1103
+ | Metadata Retrieval | Represents a state where metadata schema is retrieved and the SQL command is parsed. |
1104
+ | Planning | Represents a state where the following are performed: * Physical and logical planning * Reflection matching * Partition metadata retrieval * Mapping the query to an engine-based workload management rule * Picking the engine that will run the query. |
1105
+ | | Engine Start | Engine Start | Represents a state where the engine starts if it was stopped. A stopped engine takes time to restart before the executors become active. If the engine is already started, this state has no duration. |
1106
+ | | Queued | Queued | Represents a state where a job is queued. Each engine has a limit on concurrent queries. If the queries in progress exceed the concurrency limit, the query must wait until the jobs in progress complete. |
1107
+ | | Running | Execution Planning | Represents a state where executor nodes are selected from the chosen engine to run the query, and work is distributed to each executor. |
1108
+ | Running | Represents a state where executor nodes execute and complete the fragments assigned to them. Typically, queries spend most of their time in this state. |
1109
+ | Starting | Represents a state where the query is starting up. |
1110
+ | | Canceled | Canceled | Represents a terminal state that indicates that the query is canceled by the user or an intervention in the system. |
1111
+ | | Completed | Completed | Represents a terminal state that indicates that the query is successfully completed. |
1112
+ | | Failed | Failed | Represents a terminal state that indicates that the query has failed due to an error. |
1113
+
1114
+ ## View Job Details
1115
+
1116
+ You can view the details of a specific job by viewing the Job Overview, SQL, Visual Profile, and Raw Profile pages.
1117
+
1118
+ To navigate to the job details:
1119
+
1120
+ 1. Click ![This is the icon that represents the Jobs page.](/images/cloud/jobs-page-icon.png "Icon represents the Jobs page.") in the side navigation bar.
1121
+ 2. On the Jobs page, click the job whose details you want to see. The Job Overview page then replaces the list of jobs.
1123
+
1124
+ ### Explain SQL
1125
+
1126
+ Use the **Explain SQL** option in the SQL Runner to analyze and optimize your SQL queries with assistance from the AI Agent. In the SQL Runner, highlight the SQL you want to review, right-click, and select **Explain SQL**. This prompts the AI Agent to examine the query, datasets, and underlying architecture to identify potential optimizations. The AI Agent uses Dremio’s SQL Parser—the same logic used during query execution—to identify referenced tables, schemas, and relationships. Based on this analysis, the Agent provides insights and recommendations to improve query performance and structure. You can continue interacting with the AI Agent to refine the analysis and iterate on the SQL. The AI Agent applies SQL best practices when suggesting improvements and may execute revised queries to validate quality before presenting recommendations.
1127
+
1128
+ ### Explain Job
1129
+
1130
+ Use the **Explain Job** option on the Job Details page to analyze job performance and identify opportunities for optimization. From the Job Details page, click **Explain Job** to prompt the AI Agent to review the job’s query profile, planning, and execution details and compare them with the AI Agent’s internal understanding of optimal performance characteristics. The AI Agent generates a detailed analysis that highlights key performance metrics such as data skew, memory usage, threading efficiency, and network utilization. Based on this assessment, it recommends potential optimizations to improve performance and resource utilization. You can continue the conversation with the AI Agent to explore the job in greater depth or reference additional job IDs to extend the investigation and compare results.
1131
+
1132
+ ### Job Overview
1133
+
1134
+ You can view the details of a specific job on the Job Overview page.
1135
+
1136
+ To navigate to a job overview:
1137
+
1138
+ 1. Click ![This is the icon that represents the Jobs page.](/images/cloud/jobs-page-icon.png "Icon represents the Jobs page.") in the side navigation bar.
1139
+ 2. On the Jobs page, click a job that you would like to see the job overview for. The Job Overview page then replaces the list of jobs.
1140
+
1141
+ The main components of the Job Overview page are numbered below:
1142
+
1143
+ ![This is a screenshot showing the main components of the Job Overview page.](/images/cloud/job-overview-page-cloud.png "This is a screenshot showing the main components of the Job Overview page.")
1144
+
1145
+ #### 1. Summary
1146
+
1147
+ Each job is summarized.
1148
+
1149
+ #### 2. Total Execution Time
1150
+
1151
+ The Total Execution Time section shows the length of time for the total execution, along with the job state durations in the order they occur. Only the duration of the Engine Start state is shown in minutes and seconds. If the engine was stopped, it takes time to restart before the executors become active. If the engine is already started, the Engine Start duration has no value. For descriptions, see Job States and Statuses.
1152
+
1153
+ #### 3. Download Profile
1154
+
1155
+ To download the query profile, click the **Download Profile** button in the bottom-left corner of the Job Overview page. The profile will help you see more granular details about the job.
1156
+
1157
+ The profile downloads as a **ZIP** file. When you extract the **ZIP** file, you will see the following JSON files:
1158
+
1159
+ * profile\_attempt\_0.json: This file helps with troubleshooting out-of-memory and incorrect-result issues. Note that the start and end times of the query are provided in epoch format. See the [Epoch Converter](https://www.epochconverter.com) utility for converting query time, or the SQL sketch after this list.
1160
+ * header.json: This file provides the full list of Dremio coordinators and executors, datasets, and sources.
1161
+ This information is useful when you are using REST calls.
1162
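+
+ As an alternative to the external converter, you can convert an epoch value into a readable timestamp with SQL. This is a minimal sketch that assumes the value is in epoch seconds (divide millisecond values by 1,000 first); the value shown is hypothetical:
+
+ Convert an epoch value
+
+ ```
+ -- Convert a hypothetical epoch-seconds value into a timestamp
+ SELECT TO_TIMESTAMP(1700000000) AS query_start_time
+ ```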
+
1163
+ #### 4. Submitted SQL
1164
+
1165
+ The SQL query for the selected job.
1166
+
1167
+ #### 5. Queried Datasets
1168
+
1169
+ The datasets queried for the selected job. These can be views or tables.
1170
+
1171
+ #### 6. Scans
1172
+
1173
+ Scan details include the source type, scan thread count, IO wait time (in milliseconds), and the number of rows scanned.
1174
+
1175
+ #### 7. Acceleration
1176
+
1177
+ If the job was accelerated, the Acceleration section appears and provides data about the Reflections used. See [Optimize Performance](/dremio-cloud/admin/performance/) for more information.
1178
+
1179
+ #### 8. Results
1180
+
1181
+ To see the job results, click the **Open Results** link in the top-right corner of the Job Overview page. The link is visible only for jobs run through the UI and only while the engine that ran the job is up; it disappears when that engine shuts down.
1182
+
1183
+ ### Job SQL
1184
+
1185
+ Next to the Job Overview page is a tab for the SQL page, which shows the Submitted SQL and Dataset Graph.
1186
+
1187
+ You can view the SQL statement that was used for the selected job. Although the SQL statement is in read-only mode on the SQL Details page, the statement can be copied from the page and pasted into the SQL editor.
1188
+
1189
+ A dataset graph only appears if there is a queried dataset for the selected job. The dataset graph is a visual representation of the datasets used in the SQL statement.
1190
+
1191
+ ## Related Topics
1192
+
1193
+ * [Profiles](/dremio-cloud/admin/monitor/jobs/profiles) – See the visual profiles and raw profiles of jobs.
1194
+
1195
+
1207
+ <div style="page-break-after: always;"></div>
1208
+
1209
+ # Manage Your Subscription | Dremio Documentation
1210
+
1211
+ Original URL: https://docs.dremio.com/dremio-cloud/admin/subscription/
1212
+
1213
+
1215
+ Dremio offers multiple payment options for users to upgrade their organization after the conclusion of the free trial:
1216
+
1217
+ * Pay-as-you-go (PAYG): Provide credit card details via the upgrade steps in the Dremio console.
1218
+ * Commit-based offerings: Prepaid contracts for your organization's usage. Dremio invoices you directly, and a variety of payment options are available. Please contact [Dremio Sales](https://www.dremio.com/contact/) for more details.
1219
+ * AWS Marketplace: A commit-based contract paid with AWS credits. Check out the [Dremio Cloud AWS Marketplace](https://aws.amazon.com/marketplace/pp/prodview-pnlijtzyoyjok) listing on the AWS Marketplace. Once ready to proceed, contact [Dremio Sales](https://www.dremio.com/contact/) for more details.
1220
+
1221
+ Note that your organization can be moved to a commit-based contract after upgrading to PAYG.
1222
+
1223
+ ## Upgrade
1224
+
1225
+ At any point during your free trial of Dremio, an organization can be upgraded by entering your credit card details. When the free trial concludes, your organization becomes partially inaccessible for 30 days. During this time, you can still log in to upgrade your account; if you do not upgrade before the 30 days end, your organization and all of its contents may be deleted.
1226
+
1227
+ ## Pay-as-you-go Billing Cycles
1228
+
1229
+ Your billing cycle starts from the day of your organization's upgrade and ends one month later. At the conclusion of the billing period, we will immediately attempt to charge your card for the outstanding balance.
1230
+
1231
+ If for any reason payment fails (or is only partially successful), we will attempt the charge again. If these subsequent attempts fail, your organization will become partially inaccessible. You can still log in but only to update your payment method. If a new payment method is not provided before the end of this billing period, your organization and all of its contents may be deleted.
1232
+
1233
+ ## Organizations
1234
+
1235
+ A Dremio organization can have one or more projects. Usage across projects is aggregated for billing purposes, meaning that when the PAYG bill is paid for an organization, the balance is paid for all projects. Only users who are members of the ADMIN role within the organization can manage billing details within the Dremio console.
1236
+
1237
+ ## Find Your Organization ID
1238
+
1239
+ The ID of your organization can be helpful during communication with Dremio Sales or Support. To find your organization's ID:
1240
+
1241
+ 1. In the Dremio console, click ![Settings](/images/icons/settings.png "Settings") in the side navigation bar and select **Organization settings** to open the Organization settings page.
1242
+ 2. On the General Information tab, copy your organization's ID.
1243
+
1244
+ ## Delete Your Organization
1245
+
1246
+ Please contact Dremio's Support team if you would like to have your organization deleted.
1247
+
1248
+
1256
+ <div style="page-break-after: always;"></div>
1257
+
1258
+ # Configure Model Providers | Dremio Documentation
1259
+
1260
+ Original URL: https://docs.dremio.com/dremio-cloud/admin/model-providers
1261
+
1262
+
1264
+ You configure model providers for your organization's AI features when deploying Dremio. After you configure at least one model provider, you must set a default model provider and can optionally set an allowlist of available models. Dremio uses the default provider for all of Dremio's AI Agent interactions, whereas the allowlisted models can be used by anyone writing AI functions. By default, the CALL MODEL privilege is granted to all users for all new model providers, so if the default changes, users can continue to use the AI Agent without interruption.
1265
+
1266
+ ## Dremio-Provided LLM
1267
+
1268
+ Dremio provides all organizations with an out-of-the-box model provider so that all users can begin engaging with the AI Agent and AI functions without any other configuration required. Once you have added your own model provider and set it as the new default, the Dremio-Provided LLM is no longer used. If you delete all other model providers, the Dremio-Provided LLM reverts to being the organization's default model provider. This model provider cannot be deleted.
1269
+
1270
+ ## Supported Model Providers
1271
+
1272
+ Dremio supports configuration of the following model providers and models. Dremio recommends using enterprise-grade reasoning models for the best performance and experience.
1273
+
1274
+ | Category | Models | Connection Method(s) |
1275
+ | --- | --- | --- |
1276
+ | **OpenAI** | * gpt-5-2025-08-07 * gpt-5-mini-2025-08-07 * gpt-5-nano-2025-08-07 * gpt-4.1-2025-04-14 * gpt-4o-2024-11-20 * gpt-4-turbo-2024-04-09 * gpt-4.1-mini-2025-04-14 * o3-mini-2025-01-31 * o4-mini-2025-04-16 * o3-2025-04-16 | * Access Key |
1277
+ | **Anthropic** | * claude-sonnet-4-5-20250929 * claude-opus-4-1-20250805 * claude-opus-4-20250514 * claude-sonnet-4-20250514 | * Access Key |
1278
+ | **Google Gemini** | * gemini-2.5-pro | * Access Key |
1279
+ | **AWS Bedrock** | * specify Model ID(s) * [AWS Bedrock Supported Models](https://docs.aws.amazon.com/bedrock/latest/userguide/models-supported.html) | * Access Key * IAM Role |
1280
+ | **Azure OpenAI** | * specify Deployment Name(s) * [Azure Supported Models](https://learn.microsoft.com/en-us/azure/ai-foundry/foundry-models/concepts/models-sold-directly-by-azure?tabs=global-standard-aoai%2Cstandard-chat-completions%2Cglobal-standard&pivots=azure-openai#azure-openai-in-azure-ai-foundry-models) | Combination of 1. Resource Name 2. Directory ID 3. Application ID 4. Client Secret Value |
1281
+
1282
+ ## Rate Limiting and Quotas
1283
+
1284
+ ### AWS Bedrock Rate Limits
1285
+
1286
+ When using AWS Bedrock model providers, you may encounter rate limiting errors such as "429 Too Many Tokens (Rate Limit Exceeded)". This is particularly common with new AWS accounts that start with lower or fixed quotas.
1287
+
1288
+ If you experience rate limiting issues, you can contact AWS Support and request a quota increase by providing:
1289
+
1290
+ * Quota name
1291
+ * Model ID
1292
+ * AWS region
1293
+ * Use case description
1294
+ * Projected token and request usage
1295
+
1296
+ For more information about AWS Bedrock quotas and limits, see the [AWS Bedrock User Guide](https://docs.aws.amazon.com/bedrock/latest/userguide/quotas.html).
1297
+
1298
+ ## Default Model Provider
1299
+
1300
+ To delete the default model provider, you must first assign a new default, unless you are deleting the last model provider that you have configured. To change the default model provider to a new one, you must have the MODIFY privilege on both the current default and the proposed new default model provider.
1301
+
1302
+ ## Add Model Provider
1303
+
1304
+ To add a model provider in the Dremio console:
1305
+
1306
+ 1. Click ![This is the Settings icon.](/images/green-settings-icon.png "The Settings icon.") in the side navigation bar to go to the Settings page.
1307
+ 2. Select **Organization Settings**.
1308
+ 3. Select the **AI Configuration** setting.
1309
+ 4. Click **Add model provider**.
1310
+
1311
+
1320
+ <div style="page-break-after: always;"></div>
1321
+
1322
+ # External Engines | Dremio Documentation
1323
+
1324
+ Original URL: https://docs.dremio.com/dremio-cloud/admin/external-engines
1325
+
1326
+
1328
+ Dremio's Open Catalog is built on Apache Polaris, providing a standards-based, open approach to data catalog management. At its core is the Iceberg REST interface, which enables seamless integration with any query engine that supports the Apache Iceberg REST catalog specification. This open architecture means you can connect industry-standard engines such as Apache Spark, Trino, and Apache Flink directly to Dremio.
1329
+
1330
+ | Engine | Best For | Key Features |
1331
+ | --- | --- | --- |
1332
+ | [Apache Spark](https://spark.apache.org/) | Data engineering, ETL | Token exchange, nested folders, views |
1333
+ | [Trino](https://trino.io/) | Interactive analytics | Fast queries, BI workloads |
1334
+ | [Apache Flink](https://flink.apache.org/) | Real-time streaming | Event-driven, continuous pipelines |
1335
+
1336
+ By leveraging the Iceberg REST standard, the Open Catalog acts as a universal catalog layer that query engines can communicate with using a common language. This allows organizations to build flexible data architectures where multiple engines can work together, each accessing and managing the same Iceberg tables through Dremio's centralized catalog.
1337
+
1338
+ ## Apache Spark
1339
+
1340
+ Apache Spark is a unified analytics engine for large-scale data processing, widely used for ETL, batch processing, and data engineering workflows.
1341
+
1342
+ ### Prerequisites
1343
+
1344
+ This example uses Spark 3.5.3 with Iceberg 1.9.1. For other versions, ensure compatibility between Spark, Scala, and Iceberg runtime versions. Additional prerequisites include:
1345
+
1346
+ * The following JAR files downloaded to your local directory:
1347
+ + `authmgr-oauth2-runtime-0.0.5.jar` from [Dremio Auth Manager releases](https://github.com/dremio/iceberg-auth-manager/releases). This open-source library handles token exchange, automatically converting your personal access token (PAT) into an OAuth token for seamless authentication. For more details about Dremio Auth Manager's capabilities and configuration options, see [Introducing Dremio Auth Manager for Apache Iceberg](https://www.dremio.com/blog/introducing-dremio-auth-manager-for-apache-iceberg/).
1348
+ + `iceberg-spark-runtime-3.5_2.12-1.9.1.jar` (from [Apache Iceberg releases](https://iceberg.apache.org/releases/))
1349
+ + `iceberg-aws-bundle-1.9.1.jar` (from [Apache Iceberg releases](https://iceberg.apache.org/releases/))
1350
+ * Docker installed and running.
1351
+ * Your Dremio catalog name – The default catalog in each project has the same name as the project.
1352
+ * If authenticating with a PAT, you must generate a token. See [Personal Access Tokens](/dremio-cloud/security/authentication/personal-access-token/) for step-by-step instructions.
1353
+ * If authenticating with an identity provider (IDP), your IDP or other external token provider must be configured as a trusted OAuth [external token provider](/dremio-cloud/security/authentication/app-authentication/external-token) in Dremio.
1354
+ * You must have an OAuth2 client registered in your IDP that is configured to issue tokens that Dremio accepts (matching audience and scopes), with a client ID and client secret provided by your IDP.
1355
+
1356
+ ### Authenticate with a PAT
1357
+
1358
+ You can authenticate your Apache Spark session with a Dremio personal access token using the following script. Replace `<personal_access_token>` with your Dremio personal access token and replace `<catalog_name>` with your catalog name.
1359
+
1360
+ In addition, you can adjust the volume mount paths to match where you've downloaded the JAR files and where you want your workspace directory. The example uses `$HOME/downloads` and `$HOME/workspace`.
1361
+
1362
+ Spark with PAT Authentication
1363
+
1364
+ ```
1365
+ #!/bin/bash
1366
+ export CATALOG_NAME="<catalog_name>"
1367
+ export DREMIO_PAT="<personal_access_token>"
1368
+
1369
+ docker run -it \
1370
+ -v $HOME/downloads:/opt/jars \
1371
+ -v $HOME/workspace:/workspace \
1372
+ apache/spark:3.5.3 \
1373
+ /opt/spark/bin/spark-shell \
1374
+ --jars /opt/jars/authmgr-oauth2-runtime-0.0.5.jar,/opt/jars/iceberg-spark-runtime-3.5_2.12-1.9.1.jar,/opt/jars/iceberg-aws-bundle-1.9.1.jar \
1375
+ --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
1376
+ --conf spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog \
1377
+ --conf spark.sql.catalog.polaris.type=rest \
1378
+ --conf spark.sql.catalog.polaris.cache-enabled=false \
1379
+ --conf spark.sql.catalog.polaris.warehouse=$CATALOG_NAME \
1380
+ --conf spark.sql.catalog.polaris.uri=https://catalog.dremio.cloud/api/iceberg \
1381
+ --conf spark.sql.catalog.polaris.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
1382
+ --conf spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation=vended-credentials \
1383
+ --conf spark.sql.catalog.polaris.rest.auth.type=com.dremio.iceberg.authmgr.oauth2.OAuth2Manager \
1384
+ --conf spark.sql.catalog.polaris.rest.auth.oauth2.token-endpoint=https://login.dremio.cloud/oauth/token \
1385
+ --conf spark.sql.catalog.polaris.rest.auth.oauth2.grant-type=token_exchange \
1386
+ --conf spark.sql.catalog.polaris.rest.auth.oauth2.client-id=dremio-catalog-cli \
1387
+ --conf spark.sql.catalog.polaris.rest.auth.oauth2.scope=dremio.all \
1388
+ --conf spark.sql.catalog.polaris.rest.auth.oauth2.token-exchange.subject-token="$DREMIO_PAT" \
1389
+ --conf spark.sql.catalog.polaris.rest.auth.oauth2.token-exchange.subject-token-type=urn:ietf:params:oauth:token-type:dremio:personal-access-token
1390
+ ```
1391
+
1392
+ note
1393
+
1394
+ In this configuration, `polaris` is the catalog identifier used within Spark. This identifier is mapped to your actual Dremio catalog via the `spark.sql.catalog.polaris.warehouse` property.
1395
+
1396
+ ### Authenticate with an IDP
1397
+
1398
+ You can authenticate your Apache Spark session using an [external token provider](/dremio-cloud/security/authentication/app-authentication/external-token) that has been integrated with Dremio.
1399
+
1400
+ **Using this configuration:**
1401
+
1402
+ * Spark obtains a user-specific JWT from the external token provider.
1403
+ * Spark connects to Dremio and [exchanges the JWT](/dremio-cloud/api/oauth-token) for an access token.
1404
+ * Spark connects to the Open Catalog using the access token.
1405
+
1406
+ In the following script, replace `<catalog_name>` with your catalog name, `<idp_url>` with the location of your external token provider, and `<client_id>` and `<client_secret>` with the credentials issued by the external token provider.
1407
+
1408
+ In addition, you can adjust the volume mount paths to match where you've downloaded the JAR files and where you want your workspace directory. The example uses `$HOME/downloads` and `$HOME/workspace`.
1409
+
1410
+ Spark with IDP Authentication
1411
+
1412
+ ```
1413
+ #!/bin/bash
1414
+ export CATALOG_NAME="<catalog_name>"
1415
+ export IDP_URL="<idp_url>"
1416
+ export CLIENT_ID="<idp_client_id>"
1417
+ export CLIENT_SECRET="<idp_client_secret>"
1418
+
1419
+ docker run -it \
1420
+ -v $HOME/downloads:/opt/jars \
1421
+ -v $HOME/workspace:/workspace \
1422
+ apache/spark:3.5.3 \
1423
+ /opt/spark/bin/spark-shell \
1424
+ --jars /opt/jars/authmgr-oauth2-runtime-0.0.5.jar,/opt/jars/iceberg-spark-runtime-3.5_2.12-1.9.1.jar,/opt/jars/iceberg-aws-bundle-1.9.1.jar \
1425
+ --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
1426
+ --conf spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog \
1427
+ --conf spark.sql.catalog.polaris.type=rest \
1428
+ --conf spark.sql.catalog.polaris.cache-enabled=false \
1429
+ --conf spark.sql.catalog.polaris.warehouse=$CATALOG_NAME \
1430
+ --conf spark.sql.catalog.polaris.uri=https://catalog.dremio.cloud/api/iceberg \
1431
+ --conf spark.sql.catalog.polaris.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
1432
+ --conf spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation=vended-credentials \
1433
+ --conf spark.sql.catalog.polaris.rest.auth.type=com.dremio.iceberg.authmgr.oauth2.OAuth2Manager \
1434
+ --conf spark.sql.catalog.polaris.rest.auth.oauth2.issuer-url=$IDP_URL \
1435
+ --conf spark.sql.catalog.polaris.rest.auth.oauth2.grant-type=device_code \
1436
+ --conf spark.sql.catalog.polaris.rest.auth.oauth2.client-id=$CLIENT_ID \
1437
+ --conf spark.sql.catalog.polaris.rest.auth.oauth2.client-secret=$CLIENT_SECRET \
1438
+ --conf spark.sql.catalog.polaris.rest.auth.oauth2.scope=dremio.all \
1439
+ --conf spark.sql.catalog.polaris.rest.auth.oauth2.impersonation.enabled=true \
1440
+ --conf spark.sql.catalog.polaris.rest.auth.oauth2.impersonation.token-endpoint=https://login.dremio.cloud/oauth/token \
1441
+ --conf spark.sql.catalog.polaris.rest.auth.oauth2.impersonation.scope=dremio.all \
1442
+ --conf spark.sql.catalog.polaris.rest.auth.oauth2.token-exchange.subject-token-type=urn:ietf:params:oauth:token-type:jwt
1443
+ ```
1444
+
1445
+ ### Usage Examples
1446
+
1447
+ With these configurations, `polaris` is the catalog identifier used within Spark. This identifier is mapped to your actual Dremio catalog via the `spark.sql.catalog.polaris.warehouse` property. Once Spark is running and connected to your Dremio catalog:
1448
+
1449
+ List namespaces
1450
+
1451
+ ```
1452
+ spark.sql("SHOW NAMESPACES IN polaris").show()
1453
+ ```
1454
+
1455
+ Query a table
1456
+
1457
+ ```
1458
+ spark.sql("SELECT * FROM polaris.your_namespace.your_table LIMIT 10").show()
1459
+ ```
1460
+
1461
+ Create a table
1462
+
1463
+ ```
1464
+ spark.sql("""
1465
+ CREATE TABLE polaris.your_namespace.new_table (
1466
+ id INT,
1467
+ name STRING
1468
+ ) USING iceberg
1469
+ """)
1470
+ ```
1471
+
1472
+ ## Trino
1473
+
1474
+ Trino is a distributed SQL query engine designed for fast analytic queries against data sources of all sizes. It excels at interactive SQL analysis, ad hoc queries, and joining data across multiple sources.
1475
+
1476
+ ### Prerequisites
1477
+
1478
+ * Docker installed and running.
1479
+ * A valid Dremio personal access token – See [Personal Access Tokens](/dremio-cloud/security/authentication/personal-access-token/) for instructions to generate a personal access token.
1480
+ * Your Dremio catalog name – The default catalog in each project has the same name as the project.
1481
+
1482
+ ### Configuration
1483
+
1484
+ To connect Trino to Dremio using Docker, follow these steps:
1485
+
1486
+ 1. Create a directory for Trino configuration and add a catalog configuration:
1487
+
1488
+ ```
1489
+ mkdir -p ~/trino-config/catalog
1490
+ ```
1491
+
1492
+ In `trino-config/catalog`, create a catalog configuration file named `polaris.properties` with the following values:
1493
+
1494
+ Trino polaris.properties
1495
+
1496
+ ```
1497
+ connector.name=iceberg
1498
+ iceberg.catalog.type=rest
1499
+ iceberg.rest-catalog.uri=https://catalog.dremio.cloud/api/iceberg
1500
+ iceberg.rest-catalog.oauth2.token=<personal_access_token>
1501
+
1502
+ iceberg.rest-catalog.warehouse=<catalog_name>
1503
+ iceberg.rest-catalog.security=OAUTH2
1504
+
1505
+ iceberg.rest-catalog.vended-credentials-enabled=true
1506
+ fs.native-s3.enabled=true
1507
+ s3.region=<region>
1508
+ ```
1509
+
1510
+ Replace the following:
1511
+
1512
+ * `<personal_access_token>` with your Dremio personal access token.
1513
+ * `<catalog_name>` with your catalog name.
1514
+ * `<region>` with the AWS region where your data is stored, such as `us-west-2`.
1515
+
1516
+ note
1517
+
1518
+ * In this configuration, `polaris` (from the filename `polaris.properties`) is the catalog identifier used in Trino queries. The `iceberg.rest-catalog.warehouse` property maps this identifier to your actual Dremio catalog.
1519
+ * In `oauth2.token`, you provide your Dremio personal access token directly. Dremio's catalog API accepts PATs as bearer tokens without requiring token exchange.
1520
+ 2. Pull and start the Trino container:
1521
+
1522
+ ```
1523
+ docker run --name trino -d -p 8080:8080 trinodb/trino:latest
1524
+ ```
1525
+ 3. Verify that Trino is running:
1526
+
1527
+ ```
1528
+ docker ps
1529
+ ```
1530
+
1531
+ You can access the web UI at `http://localhost:8080` and log in as `admin`.
1532
+ 4. Restart Trino with the configuration:
1533
+
1534
+ ```
1535
+ docker stop trino
1536
+ docker rm trino
1537
+
1538
+ # Start with mounted configuration
1539
+ docker run --name trino -d -p 8080:8080 -v ~/trino-config/catalog:/etc/trino/catalog trinodb/trino:latest
1540
+
1541
+ # Verify Trino is running
1542
+ docker ps
1543
+
1544
+ # Check logs
1545
+ docker logs trino -f
1546
+ ```
1547
+ 5. In another window, connect to the Trino CLI:
1548
+
1549
+ ```
1550
+ docker exec -it trino trino --user admin
1551
+ ```
1552
+
1553
+ You should see the Trino prompt:
1554
+
1555
+ ```
1556
+ trino>
1557
+ ```
1558
+ 6. Verify the catalog connection:
1559
+
1560
+ ```
1561
+ trino> show catalogs;
1562
+ ```
1563
+
1564
+ ### Usage Examples
1565
+
1566
+ Once Trino is running and connected to your Dremio catalog:
1567
+
1568
+ List namespaces
1569
+
1570
+ ```
1571
+ trino> show schemas from polaris;
1572
+ ```
1573
+
1574
+ Query a table
1575
+
1576
+ ```
1577
+ trino> select * from polaris.your_namespace.your_table;
1578
+ ```
1579
+
1580
+ Create a table
1581
+
1582
+ ```
1583
+ trino> CREATE TABLE polaris.demo_namespace.test_table (
1584
+ id INT,
1585
+ name VARCHAR,
1586
+ created_date DATE,
1587
+ value DOUBLE
1588
+ );
1589
+ ```
1590
+
1591
+ ### Limitations
1592
+
1593
+ * **Case sensitivity:** Namespace and table names must be in lowercase. Trino will not list or access tables in namespaces that begin with an uppercase character.
1594
+ * **View compatibility:** Trino cannot read views created in Dremio due to SQL dialect incompatibility and returns the error "Cannot read unsupported dialect 'DremioSQL'."
1595
+
1596
+ ## Apache Flink
1597
+
1598
+ Apache Flink is a distributed stream processing framework designed for stateful computations over bounded and unbounded data streams, enabling real-time data pipelines and event-driven applications.
1599
+
1600
+ To connect Apache Flink to Dremio using Docker Compose, follow these steps:
1601
+
1602
+ ### Prerequisites
1603
+
1604
+ You'll need to download the required JAR files and organize them in a project directory structure.
1605
+
1606
+ 1. Create the project directory structure:
1607
+
1608
+ ```
1609
+ mkdir -p flink-dremio/jars
1610
+ cd flink-dremio
1611
+ ```
1612
+ 2. Download the required JARs into the `jars/` directory:
1613
+
1614
+ * Iceberg Flink Runtime 1.20:
1615
+
1616
+ ```
1617
+ wget -P jars/ https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-flink-runtime-1.20/1.9.1/iceberg-flink-runtime-1.20-1.9.1.jar
1618
+ ```
1619
+ * Iceberg AWS Bundle for vended credentials:
1620
+
1621
+ ```
1622
+ wget -P jars/ https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-aws-bundle/1.9.1/iceberg-aws-bundle-1.9.1.jar
1623
+ ```
1624
+ * Hadoop dependencies required by Flink:
1625
+
1626
+ ```
1627
+ wget -P jars/ https://repo.maven.apache.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/2.8.3-10.0/flink-shaded-hadoop-2-uber-2.8.3-10.0.jar
1628
+ ```
1629
+ 3. Create the Dockerfile.
1630
+
1631
+ Create a file named `Dockerfile` in the `flink-dremio` directory:
1632
+
1633
+ Flink Dockerfile
1634
+
1635
+ ```
1636
+ FROM flink:1.20-scala_2.12
1637
+
1638
+ # Copy all required JARs
1639
+ COPY jars/*.jar /opt/flink/lib/
1640
+ ```
1641
+ 4. Create the `docker-compose.yml` file in the `flink-dremio` directory:
1642
+
1643
+ Flink docker-compose.yml
1644
+
1645
+ ```
1646
+ services:
1647
+ flink-jobmanager:
1648
+ build: .
1649
+ ports:
1650
+ - "8081:8081"
1651
+ command: jobmanager
1652
+ environment:
1653
+ - |
1654
+ FLINK_PROPERTIES=
1655
+ jobmanager.rpc.address: flink-jobmanager
1656
+ parallelism.default: 2
1657
+ - AWS_REGION=us-west-2
1658
+
1659
+ flink-taskmanager:
1660
+ build: .
1661
+ depends_on:
1662
+ - flink-jobmanager
1663
+ command: taskmanager
1664
+ scale: 1
1665
+ environment:
1666
+ - |
1667
+ FLINK_PROPERTIES=
1668
+ jobmanager.rpc.address: flink-jobmanager
1669
+ taskmanager.numberOfTaskSlots: 4
1670
+ parallelism.default: 2
1671
+ - AWS_REGION=us-west-2
1672
+ ```
1673
+ 5. Build and start the Flink cluster:
1674
+
1675
+ ```
1676
+ # Build and start the cluster
1677
+ docker-compose build --no-cache
1678
+ docker-compose up -d
1679
+
1680
+ # Verify the cluster is running
1681
+ docker-compose ps
1682
+
1683
+ # Verify required JARs are present
1684
+ docker-compose exec flink-jobmanager ls -la /opt/flink/lib/ | grep -E "(iceberg|hadoop)"
1685
+ ```
1686
+
1687
+ You should see the JARs you downloaded in the previous step.
1688
+ 6. Connect to the Flink SQL client:
1689
+
1690
+ ```
1691
+ docker-compose exec flink-jobmanager ./bin/sql-client.sh
1692
+ ```
1693
+
1694
+ You can also access the Flink web UI at `http://localhost:8081` to monitor jobs.
1695
+ 7. Create the Dremio catalog connection in Flink:
1696
+
1697
+ ```
1698
+ CREATE CATALOG polaris WITH (
1699
+ 'type' = 'iceberg',
1700
+ 'catalog-impl' = 'org.apache.iceberg.rest.RESTCatalog',
1701
+ 'uri' = 'https://catalog.dremio.cloud/api/iceberg',
1702
+ 'token' = '<personal_access_token>',
1703
+ 'warehouse' = '<catalog_name>',
1704
+ 'header.X-Iceberg-Access-Delegation' = 'vended-credentials',
1705
+ 'io-impl' = 'org.apache.iceberg.aws.s3.S3FileIO'
1706
+ );
1707
+ ```
1708
+
1709
+ Replace the following:
1710
+
1711
+ * `<personal_access_token>` with your Dremio personal access token.
1712
+ * `<catalog_name>` with your catalog name.
1713
+
1714
+ note
1715
+
1716
+ * In this configuration, `polaris` is the catalog identifier used in Flink queries. The `CREATE CATALOG` command maps this identifier to your actual Dremio catalog.
1717
+ * In `token`, you provide your Dremio personal access token directly. Dremio's catalog API accepts PATs as bearer tokens without requiring token exchange.
1718
+ 8. Verify the catalog connection:
1719
+
1720
+ ```
1721
+ Flink SQL> show catalogs;
1722
+ ```
1723
+
1724
+ ### Usage Examples
1725
+
1726
+ Once Apache Flink is running and connected to your Dremio catalog:
1727
+
1728
+ List namespaces
1729
+
1730
+ ```
1731
+ Flink SQL> show databases in polaris;
1732
+ ```
1733
+
1734
+ Query a table
1735
+
1736
+ ```
1737
+ Flink SQL> select * from polaris.your_namespace.your_table;
1738
+ ```
1739
+
1740
+ Create a table
1741
+
1742
+ ```
1743
+ Flink SQL> CREATE TABLE polaris.demo_namespace.test_table (
1744
+ id INT,
1745
+ name STRING,
1746
+ created_date DATE,
1747
+ `value` DOUBLE
1748
+ );
1749
+ ```
1750
+
1751
+ ### Limitations
1752
+
1753
+ * **Reserved keywords:** Column names that are reserved keywords, such as `value`, `timestamp`, and `date`, must be enclosed in backticks when creating or querying tables.
1754
+
1755
+
1772
+ <div style="page-break-after: always;"></div>
1773
+
1774
+ # Usage | Dremio Documentation
1775
+
1776
+ Original URL: https://docs.dremio.com/dremio-cloud/admin/subscription/usage
1777
+
1778
+
1780
+ There are multiple forms of billable Dremio usage within an [organization](/dremio-cloud/admin/subscription/#organizations):
1781
+
1782
+ * Dremio Consumption Units (DCUs) represent the usage of Dremio engines. DCUs are only consumed when your engines are running.
1783
+ * Large-language model (LLM) tokens are billed when you use Dremio's AI features via the Dremio-Provided LLM.
1784
+ * Storage usage is billed in terabyte-months and only applies to projects that use Dremio-hosted storage. If your projects use an object storage bucket in your account with a cloud provider as the catalog store, storage fees do not apply.
1785
+
1786
+ ## How DCUs are Calculated
1787
+
1788
+ The number of DCUs consumed by an engine depends on two factors:
1789
+
1790
+ * The size of the engine
1791
+ * How long the engine and its replicas have been running
1792
+
1793
+ DCU consumption for an engine is calculated as `(Total uptime for the engine and its replicas) * (DCU consumption rate for that engine size)`.
1794
+
1795
+ Uptime is measured in seconds and has a 60-second minimum.
1796
+
1797
+ The DCU consumption rate for each engine size supported in Dremio is listed in [Manage Engines](/dremio-cloud/admin/engines/).
1798
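+
+ As a minimal sketch, you can check the formula with simple SQL arithmetic; the 16 DCUs/hour rate for a 2XSmall engine is taken from Example 1 below:
+
+ ```
+ -- 80 minutes of uptime on a 2XSmall engine at 16 DCUs/hour
+ SELECT (80.0 / 60) * 16 AS dcus_consumed   -- returns 21.33...
+ ```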
+
1799
+ ### DCU Examples
1800
+
1801
+ #### Example 1
1802
+
1803
+ An organization has two Dremio Cloud engines defined: Engine A and Engine B, where Engine A is a 2XSmall engine, and Engine B is a Medium engine.
1804
+
1805
+ Suppose that between 8 a.m. and 9 a.m. one day:
1806
+
1807
+ * Engine A had 2 replicas running for 40 minutes each, so it accumulated a total of 80 minutes of engine uptime.
1808
+ * Engine B had 5 replicas running for 50 minutes each, so it accumulated a total of 250 minutes of engine uptime.
1809
+
1810
+ The total usage for Engine A for this hour is `(80/60) * (16 DCUs/hour) = 21.33 DCUs`.
1811
+
1812
+ The total usage for Engine B for this hour is `(250/60) * (128 DCUs/hour) = 533.33 DCUs`.
1813
+
1814
+ #### Example 2
1815
+
1816
+ An organization has one Dremio Cloud engine defined: Engine A, where Engine A is a Medium engine.
1817
+
1818
+ Suppose that between 8 a.m. and 9 a.m. one day:
1819
+
1820
+ * Engine A had 1 replica running for the entire hour (60 minutes).
1821
+ * Engine A needed to spin up an additional replica for 30 minutes to tackle a workload spike.
1822
+
1823
+ Engine A accumulated a total of 90 minutes of engine uptime, so the total usage for Engine A for this hour is `(90/60) * (128 DCUs/hour) = 192 DCUs`.
1824
+
1825
+ ## How AI Usage Is Calculated
1826
+
1827
+ If you use the Dremio-Provided LLM, you pay directly for the cost of both the input and output tokens used. If you connect to another LLM via your own model provider, you are not currently charged for this usage.
1828
+
1829
+ ### AI Examples
1830
+
1831
+ #### Example 1
1832
+
1833
+ Suppose that you use an external model provider as well as the Dremio-Provided LLM for Dremio's AI features, resulting in a usage footprint like the one below:
1834
+
1835
+ * External model provider: 500K input tokens used.
1836
+ * External model provider: 30K output tokens used.
1837
+ * Dremio-Provided: 200K input tokens used.
1838
+ * Dremio-Provided: 20K output tokens used.
1839
+
1840
+ You are not charged for using Dremio's AI features via an external model. Instead, you are only charged for the tokens consumed by the Dremio-Provided LLM:
1841
+
1842
+ * (200K input tokens)\*($1.25/1 million tokens) = $0.25
1843
+ * (20K output tokens)\*($10.00/1 million tokens) = $0.20
1844
+
1845
+ In this scenario, you would be billed for $0.45 of AI feature usage.
1846
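+
+ The same token arithmetic as a quick SQL sketch, using the per-token prices from this example:
+
+ ```
+ -- 200K input tokens at $1.25 per million, plus 20K output tokens at $10.00 per million
+ SELECT (200000 * 1.25 / 1000000) + (20000 * 10.00 / 1000000) AS ai_bill   -- returns 0.45
+ ```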
+
1847
+ To simplify the billing experience for AI features, Dremio may explore adding an AI-specific credit, similar to DCUs, in the future.
1848
+
1849
+ ## How Storage Usage Is Calculated
1850
+
1851
+ Storage usage is calculated from periodic snapshots of the Dremio-hosted bucket. The snapshots taken throughout a billing period are averaged to calculate the number of billable terabyte-months.
1852
+
1853
+ ### Storage Usage Examples
1854
+
1855
+ #### Example 1
1856
+
1857
+ Suppose an organization has one Dremio project in a region where the price of a terabyte-month is $23.00, and that in a given month this project:
1858
+
1859
+ * Stores 1 terabyte of data for the entire 30 days of the billing period
1860
+
1861
+ Then the total amount charged for the storage would be (1) \* ($23.00) = $23.00.
1862
+
1863
+ #### Example 2
1864
+
1865
+ Suppose your organization has a project in a region where the price of a terabyte-month is $23.00, and that in a given period this project:
1866
+
1867
+ * Stores 1 terabyte of data for the first 15 days of the month
1868
+ * Stores 2 terabytes of data for the last 15 days of the month
1869
+
1870
+ On average throughout the month, the project was storing 1.5 TB of data, so the bill would be (1.5) \* ($23.00) = $34.50.
1871
+
1872
+
1881
+ <div style="page-break-after: always;"></div>
1882
+
1883
+ # Project Preferences | Dremio Documentation
1884
+
1885
+ Original URL: https://docs.dremio.com/dremio-cloud/admin/projects/preferences
1886
+
1887
+ Preferences let you customize the behavior of specific features in the Dremio console.
1888
+
1889
+ To view the available preferences:
1890
+
1891
+ 1. In the Dremio console, hover over ![](/images/icons/settings.png) in the side navigation bar and select **Project settings**.
1892
+ 2. Select **Preferences** in the project settings sidebar.
1893
+
1894
+ This opens the Preferences page, showing the Dremio console settings that can be modified.
1895
+ 3. Use the toggle next to a setting to enable or disable it for all users.
1896
+
1897
+ If any preferences are modified, users must refresh their browsers to see the change.
1898
+
1899
+ These preferences and their descriptions are listed in the table below.
1900
+
1901
+ | Setting | Default | Enabled | Disabled | Details |
1902
+ | --- | --- | --- | --- | --- |
1903
+ | SQL Autocomplete | Enabled | Autocomplete provides suggestions for SQL keywords, catalog objects, and functions while you are constructing SQL statements. The autocomplete button is visible in the SQL editor, although users can switch the button off within their own accounts. | The button is hidden from the SQL editor and suggestions are not provided. | See how this works in the [SQL editor](/dremio-cloud/get-started/quick-tour). |
1904
+ | Copy or Download Results | Enabled | The copy and download buttons are visible above the results table, and users can copy or download the results in the SQL editor. | The buttons are hidden and users cannot copy or download results in the SQL editor. | See how this works in [result set actions](/dremio-cloud/get-started/quick-tour). |
1905
+ | Query Dataset on Click | Enabled | Clicking on a dataset opens the SQL Runner with a `SELECT` statement on the dataset. | Clicking on a dataset opens the Datasets page, showing a `SELECT` statement on the dataset or the dataset's definition, which you can view or edit depending on your dataset privileges. Disable this preference if you would rather click directly on a dataset to see or edit its definition. | |
1906
+ | Autonomous Reflections | Enabled | Dremio automatically creates and drops Reflections based on query patterns from the last 7 days to seamlessly accelerate performance. | Dremio provides recommendations to create and drop Reflections based on query patterns from the last 7 days to accelerate query performance. | See how this works in [Autonomous Reflections](/dremio-cloud/admin/performance/autonomous-reflections). |
1907
+ | AI Features | Enabled | When enabled, users can interact with Dremio's AI Agent and AI functions. The AI Agent enables agentic workflows, allowing analysts to work with the agent to generate SQL queries, find insights, and create visualizations. The AI functions allow engineers to query unstructured data and use LLMs during SQL execution. | The AI Agent and AI functions will not work. | See how this works in [Explore with AI Agent](/dremio-cloud/explore-analyze/ai-agent) and [AI Functions](/dremio-cloud/sql/sql-functions/AI). |
1908
+ | Generate wikis and labels | Enabled | In the Details panel, both **Generate wiki** and **Generate labels** links will be visible for generating wikis and labels. | The links for **Generate wiki** and **Generate labels** will be hidden, making these features unavailable. | See how this works in [Wikis and Labels](/dremio-cloud/manage-govern/wikis-labels). |
1909
+
1911
+
1912
+ <div style="page-break-after: always;"></div>
1913
+
1914
+ # Profiles | Dremio Documentation
1915
+
1916
+ Original URL: https://docs.dremio.com/dremio-cloud/admin/monitor/jobs/profiles
1917
+
1918
+ On this page
1919
+
1920
+ Visual profiles and raw profiles are available for jobs that have run queries.
1921
+
1922
+ ## Visual Profiles
1923
+
1924
+ You can view the operations in visual profiles to diagnose performance or cost issues and to see the results of changes that you make, either to queries themselves or their environment, to improve performance or reduce costs.
1925
+
1926
+ A query profile details the plan that Dremio devised for running a query and shows statistics from the query's execution. A visual representation of a query profile is located on the Visual Profile tab. This visual profile consists of operators that are arranged as a tree, where each operator has one or two upstream operators that represent a specific action, such as a table scan, join, or sort. At the top of the tree, a single root operator represents the query results, and at the bottom, the leaf operators represent scan or read operations from datasets.
1927
+
1928
+ Data processing begins with the reading of datasets at the bottom of the tree structure, and data is sequentially processed up the tree. A query plan can have many branches, and each branch is processed separately until a join or other operation connects it to the rest of the tree.
1929
+
1930
+ ### Phases
1931
+
1932
+ A query plan is composed of query phases (also called major fragments), and each phase defines a series of operations that are running in parallel. A query phase is depicted by the same colored boxes that are grouped together in a visual profile.
1933
+
1934
+ Within the query phases are multiple, single-threaded instances (also called minor fragments) running in parallel. Each thread is processing a different set of data through the same series of operations, and this data is exchanged from one phase to another. The number of threads for each operator can be found in the Details section (right panel) of a visual profile.
1935
+
1936
+ ### Use Visual Profiles
1937
+
1938
+ To navigate to the visual profile for a job:
1939
+
1940
+ 1. Click ![This is the icon that represents the Jobs page.](/images/cloud/jobs-page-icon.png "Icon represents the Jobs page.") in the side navigation bar.
1941
+ 2. On the Jobs page, click a job that you would like to see the visual profile for.
1942
+ 3. At the top of the next page, click the Visual Profile tab to open it.
1943
+
1944
+ The main components of a visual profile are shown below:
1945
+
1946
+ ![](/images/query-profile-visualizer.png)
1947
+
1948
+ | Location | Description |
1949
+ | --- | --- |
1950
+ | 1 | The Visual Profile tab shows a visual representation of a query profile. |
1951
+ | 2 | The left panel is where you can view the phases of the query execution or single operators, sorting them by runtime, total memory used, or records produced. Operators of the same color are within the same phase. Clicking the Collapse icon hides the left panel from view. |
1952
+ | 3 | The tree graph allows you to select an operator and find out where it is in relation to the rest of the query plan. |
1953
+ | 4 | The zoom controls the size of the tree graph so it's easier for you to view. |
1954
+ | 5 | The right panel shows the details and statistics about the selected operator. Clicking the Collapse icon hides the right panel from view. |
1955
+
1956
+ ### Use Cases
1957
+
1958
+ #### Improve the Performance of Queries
1959
+
1960
+ You may notice that a query is taking more time than expected and want to know if something can be done to reduce the execution time. By viewing its visual profile, you can, for example, quickly find the operators with the highest processing times.
1961
+
1962
+ You might decide to try making simple adjustments to cause Dremio to choose a different plan. Some of the possible adjustments include:
1963
+
1964
+ * Adding a filter on a partition column to reduce the amount of data scanned
1965
+ * Changing join logic to avoid expanding joins (which return more rows than either of the inputs) or nested-loop joins
1966
+ * Creating a Reflection to avoid some of the processing-intensive work done by the query
1967
+
1968
+ #### Reduce Query-Execution Costs
1969
+
1970
+ If you are an administrator, you may be interested in tuning the system as a whole to support higher concurrency and lower resource usage. To do so, identify the most expensive queries in the system and then see what can be done to lower their cost. Such an investigation is often important even if individual users are happy with the performance of their own queries.
1971
+
1972
+ On the Jobs page, you can use the columns to find the queries with the highest cost, greatest number of rows scanned, and more. You can then study the visual profiles for these queries, identifying system or data problems and mismatches between how data is stored and how these queries retrieve it. You can try repartitioning data, modifying data types, sorting, creating views, creating Reflections, and other changes.
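+
+ If you prefer SQL to the Jobs page for this triage, you can query the `sys.project.history.jobs` system table, described under Workload Management. A minimal sketch follows; the column names are assumptions for illustration, so verify them against the system table reference:
+
+ Find the ten most expensive recent queries (illustrative column names)
+
+ ```
+ -- Column names are assumptions; check the sys.project.history.jobs
+ -- reference for the actual schema before running.
+ SELECT job_id, user_name, query_cost
+ FROM sys.project.history.jobs
+ ORDER BY query_cost DESC
+ LIMIT 10
+ ```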
1973
+
1974
+ ## Raw Profiles
1975
+
1976
+ Click **Raw Profile** to open a raw profile of the job in a separate dialog, which includes a job summary, state durations, threads, resource allocation, operators, visualized plan, acceleration, and other details.
1977
+
1978
+ A raw profile is a UI-generated profile containing a subset of the data that you can download. It provides a summary of metrics collected for each executed query, which you can use to monitor and analyze query performance.
1979
+
1980
+ To navigate to a raw profile:
1981
+
1982
+ 1. Click ![This is the icon that represents the Jobs page.](/images/cloud/jobs-page-icon.png "Icon represents the Jobs page.") in the side navigation bar to open the Jobs page.
1983
+ 2. On the Jobs page, click a job that you would like to see the raw profile for.
1984
+ 3. At the top of the next page, click the Raw Profile tab to open a raw profile of the job in a separate dialog. The associated raw profile dialog shows a variety of information for review.
1985
+
1986
+ ### Views
1987
+
1988
+ Within the Raw Profile dialog, you can analyze the Job Metrics based on the following views:
1989
+
1990
+ | View | Description |
1991
+ | --- | --- |
1992
+ | Query | Shows the selected query statement and job metrics. Check that your SQL query is what you were expecting and that it is run against the intended source data. |
1993
+ | Visualized Plan | Shows a visualized diagram and job metrics. This view is useful for understanding the flow of the query and for analyzing out-of-memory issues and incorrect results. The detailed visualized plan diagram is always read from the bottom up. |
1994
+ | Planning | Shows planning metrics, query output schema, non-default options, and job metrics. This view shows how query planning was executed and provides statistics about the actual cost of the query operations in terms of memory, input/output, and CPU processing. You can use this view to identify which operations consumed the majority of the resources during a query and to address the cost-intensive operations. In particular, the following information is useful: * Non Default Options – See if non-default parameters are being used. * Metadata Cache Hits and Misses, with times. * Final Physical Transformation – Look for pushdown queries for RDBMS, MongoDB, or Elasticsearch, filter pushdowns or partition pruning for Parquet, and usage of stripes for ORC. * Estimates – Compare the estimated row count versus the actual scan, join, or aggregate result. * Row Count – See if row count (versus rows) is used; row count can cause an expensive broadcast. * Build – See if build (versus probe) is used; build loads data into memory. |
1995
+ | Acceleration | Shows the Reflection outcome, canonicalized user query alternatives, Reflection details, and job metrics. * Multiple substitutions – See if the substitutions are excessive. * System activity – See if `sys.project.reflections`, `sys.project.materializations`, and `sys.project.refreshes` show excessive activity. * Comparisons – Compare cumulative cost (found in Best Cost Replacement Plan) against Logical Planning, which is in the Planning view. This view is useful for determining whether exceptions or matches are occurring. The following considerations determine the acceleration outcome: * Considered, Matched, Chosen – The query is accelerated. * Considered, Matched, Not Chosen – The query is not accelerated because either a costing issue or an exception during substitution occurred. * Considered, Not Matched, Not Chosen – The query is not accelerated because the Reflection does not have the data to accelerate it. |
1996
+ | Error | (If applicable) Shows information about an error. The Failure Node is always the coordinator node and the server name inside the error message is the actual affected node. |
1997
+
1998
+ ### Job Metrics
1999
+
2000
+ Each view displays the following metrics:
2001
+
2002
+ * **Job Summary**
2003
+ * **Time in UTC**
2004
+ * **State Durations**
2005
+ * **Context**
2006
+ * **Threads**
2007
+ * **Resource Allocation**
2008
+ * **Nodes**
2009
+ * **Operators**
2010
+
2011
+ #### Job Summary
2012
+
2013
+ The job summary information includes:
2014
+
2015
+ * State
2016
+ * Coordinator
2017
+ * Threads
2018
+ * Command Pool Wait
2019
+ * Total Query Time
2020
+ * # Joins in user query
2021
+ * # Joins in final plan
2022
+ * Considered Reflections
2023
+ * Matched Reflections
2024
+ * Chosen Reflections
2025
+
2026
+ #### Time in UTC
2027
+
2028
+ The Time in UTC section lists the job's start and end time, in UTC format.
2029
+
2030
+ #### State Durations
2031
+
2032
+ The State Durations section lists the length of time (in milliseconds) for each of the job states:
2033
+
2034
+ * Pending
2035
+ * Metadata Retrieval
2036
+ * Planning
2037
+ * Engine Start
2038
+ * Queued
2039
+ * Execution Planning
2040
+ * Starting
2041
+ * Running
2042
+
2043
+ For descriptions of the job states, see [Job States and Statuses](/dremio-cloud/admin/monitor/jobs/#job-states-and-statuses).
2044
+
2045
+ #### Context
2046
+
2047
+ If you are querying an Iceberg catalog object, the Context section lists the Iceberg catalog and branch that is referenced in the query. Otherwise, the Context section is not populated. Read [Iceberg Catalogs in Dremio](/dremio-cloud/developer/data-formats/iceberg#iceberg-catalogs-in-dremio) for more information.
2048
+
2049
+ #### Threads
2050
+
2051
+ The Threads section provides an overview table and a major fragment block for each major fragment. Each row in the Overview table provides the number of minor fragments that Dremio parallelized from each major fragment, as well as aggregate time and memory metrics for the minor fragments.
2052
+
2053
+ Major fragment blocks correspond to a row in the Overview table. You can expand the blocks to see metrics for all of the minor fragments that were parallelized from each major fragment, including the host on which each minor fragment ran. Each row in the major fragment table presents the fragment state, time metrics, memory metrics, and aggregate input metrics of each minor fragment.
2054
+
2055
+ In particular, the following metrics are useful:
2056
+
2057
+ * Setup – Time spent opening and closing files.
2058
+ * Waiting – Time waiting on the CPU.
2059
+ * Blocked on Downstream – The phase has completed work, but the next phase is not ready to accept it.
2060
+ * Blocked on Upstream – The phase is ready to accept work, but the phase before it is not yet ready to provide it.
2061
+ * Phase Metrics – Displays memory used per node (Phases can run in parallel).
2062
+
2063
+ #### Resource Allocation
2064
+
2065
+ The Resource Allocation section shows the following details for managed resources and workloads:
2066
+
2067
+ * Engine Name
2068
+ * Queue Name
2069
+ * Queue Id
2070
+ * Query Cost
2071
+ * Query Type
2072
+
2073
+ #### Nodes
2074
+
2075
+ The Nodes section includes host name, resource waiting time, and peak memory.
2076
+
2077
+ #### Operators
2078
+
2079
+ The Operators section shows aggregate metrics for each operator within a major fragment that performed relational operations during query execution.
2080
+
2081
+ **Operator Overview Table**
2082
+
2083
+ The following table lists descriptions for each column in the Operators Overview table:
2084
+
2085
+ | Column Name | Description |
2086
+ | --- | --- |
2087
+ | SqlOperatorImpl ID | The coordinates of an operator that performed an operation during a particular phase of the query. For example, 02-xx-03 where 02 is the major fragment ID, xx corresponds to a minor fragment ID, and 03 is the Operator ID. |
2088
+ | Type | The operator type. Operators can be of type project, filter, hash join, single sender, or unordered receiver. |
2089
+ | Min Setup Time, Avg Setup Time, Max Setup Time | In general, the time spent opening and closing files. Specifically, the minimum, average, and maximum amount of time spent by the operator to set up before performing the operation. |
2090
+ | Min Process Time, Avg Process Time, Max Process Time | The shortest amount of time the operator spent processing a record, the average time the operator spent in processing each record, and the maximum time that the operator spent in processing a record. |
2091
+ | Wait (min, avg, max) | In general, the time spent waiting on Disk I/O. These fields represent the minimum, average, and maximum times spent by operators waiting on disk I/O. |
2092
+ | Avg Peak Memory | Represents the average of the peak direct memory allocated across minor fragments. Relates to the memory needed by operators to perform their operations, such as hash join or sort. |
2093
+ | Max Peak Memory | Represents the maximum of the peak direct memory allocated across minor fragments. Relates to the memory needed by operators to perform their operations, such as hash join or sort. |
2094
+
2095
+ **Operator Block**
2096
+
2097
+ The Operator Block shows time and memory metrics for each operator type within a major fragment. Examples of operator types include:
2098
+
2099
+ * SCREEN
2100
+ * PROJECT
2101
+ * WRITER\_COMMITTER
2102
+ * ARROW\_WRITER
2103
+
2104
+ The following table describes each column in the Operator Block:
2105
+
2106
+ | Column Name | Description |
2107
+ | --- | --- |
2108
+ | Thread | The coordinate ID of the minor fragment on which the operator ran. For example, 04-03-01 where 04 is the major fragment ID, 03 is the minor fragment ID, and 01 is the Operator ID. |
2109
+ | Setup Time | The amount of time spent by the operator to set up before performing its operation. This includes run-time code generation and opening a file. |
2110
+ | Process Time | The amount of time spent by the operator to perform its operation. |
2111
+ | Wait Time | The cumulative amount of time spent by an operator waiting for external resources, such as waiting to send records, waiting to receive records, waiting to write to disk, and waiting to read from disk. |
2112
+ | Max Batches | The maximum number of record batches consumed from a single input stream. |
2113
+ | Max Records | The maximum number of records consumed from a single input stream. |
2114
+ | Peak Memory | Represents the peak direct memory allocated. Relates to the memory needed by the operators to perform their operations, such as hash join and sort. |
2115
+ | Host Name | The hostname of the Executor the minor fragment is running on. |
2116
+ | Record Processing Rate | The rate at which records in the minor fragment are being processed. Combined with the Host Name, the Record Processing Rate can help find hot spots in the cluster, either from skewed data or a noisy query running on the same cluster. |
2117
+ | Operator State | The status of the minor fragment. |
2118
+ | Last Schedule Time | The last time at which work related to the minor fragment was scheduled to be executed. |
2119
+
2120
+ Operator blocks also contain three drop-down menus: Operator Metrics, Operator Details, and Host Metrics. Operator Metrics and Operator Details are unique to the type of operator and provide more detail about the operation of the minor fragments. Operator Metrics and Operator Details are intended to be consumed by Dremio engineers. Depending on the operator, both can be blank. Host Metrics provides high-level information about the host used when executing the operator.
2121
+
2131
+
2132
+ <div style="page-break-after: always;"></div>
2133
+
2134
+ # Results Cache | Dremio Documentation
2135
+
2136
+ Original URL: https://docs.dremio.com/dremio-cloud/admin/performance/results-cache
2137
+
2138
+ On this page
2139
+
2140
+ Results cache improves query performance by reusing results from previous executions of the same deterministic query, provided that the underlying dataset remains unchanged and the previous execution was by the same user. The results cache feature works out of the box, requires no configuration, and automatically caches and reuses results. Regardless of whether a query uses results cache, it always returns the same results.
2141
+
2142
+ Results cache is client-agnostic, meaning a query executed in the Dremio console will result in a cache hit even if it is later re-run through other clients like JDBC, ODBC, REST, or Arrow Flight. For a query to use the cache, its query plan must remain identical to the original cached version. Any changes to the schema or dataset generate a new query plan, invalidating the cache.
2143
+
2144
+ Results cache also supports seamless coordinator scale-out, allowing newly added coordinators to benefit immediately from previously cached results.
2145
+
2146
+ ## Cases Supported By Results Cache
2147
+
2148
+ Query results are cached in the following cases:
2149
+
2150
+ * The SQL statement is a `SELECT` statement.
2151
+ * The query reads from an Iceberg table, a Parquet dataset, or a raw Reflection defined on other Dremio-supported data sources and formats, such as relational databases, `CSV`, `JSON`, or `TEXT`.
2152
+ * The query does not contain dynamic functions such as `QUERY_USER`, `IS_MEMBER`, `RAND`, `CURRENT_DATE`, or `NOW`.
2153
+ * The query does not reference `SYS` or `INFORMATION_SCHEMA` tables, or use an external query.
2154
+ * The result set size, when stored in Arrow format, is less than or equal to 20 MB.
2155
+ * The query is not executed in the Dremio console as a preview.
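+
+ For example, of the two hypothetical queries below (the table name is made up), the first is eligible for the results cache, while the second is not because it calls the dynamic function `NOW()`:
+
+ Cacheable versus non-cacheable queries
+
+ ```
+ -- Eligible: a deterministic SELECT over an Iceberg or Parquet dataset
+ SELECT region, SUM(amount) AS total
+ FROM sales.orders
+ GROUP BY region;
+
+ -- Not cached: NOW() is a dynamic function
+ SELECT region, SUM(amount) AS total
+ FROM sales.orders
+ WHERE order_ts > NOW() - INTERVAL '7' DAY
+ GROUP BY region;
+ ```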
2156
+
2157
+ ## View Whether Queries Used Results Cache
2158
+
2159
+ You can view the list of jobs on the Jobs page to determine if queries from data consumers were accelerated by the results cache.
2160
+
2161
+ To find whether a query was accelerated by a results cache:
2162
+
2163
+ 1. Find the job that ran the query and look for ![This is the icon that indicates a Reflection was used.](/images/icons/reflections.png "Reflections icon") next to it, which indicates that the query was accelerated using either Reflections or the results cache.
2164
+ 2. Click on the row representing the job that ran the query to view the job summary. The summary, displayed in the pane to the right, provides details on whether the query was accelerated using results cache or Reflections.
2165
+
2166
+ ![Results cache on the Job Overview page](/images/cloud/jobs-details-results-cache.png "Results cache")
2167
+
2168
+ ## Storage
2169
+
2170
+ Cached results are stored in the project store alongside all project-specific data, such as metadata and Reflections. Executors write cache entries as Arrow data files and read them when processing `SELECT` queries that result in a cache hit. Coordinators are responsible for managing the deletion of expired cache files.
2171
+
2172
+ ## Deletion
2173
+
2174
+ A background task running on one of the Dremio coordinators handles cache expiration. This task runs every hour to mark cache entries that have not been accessed in the past 24 hours as expired and subsequently deletes them along with their associated cache files.
2175
+
2176
+ ## Considerations and Limitations
2177
+
2178
+ SQL queries executed through the Dremio console or a REST client that access the cache will rewrite the cached query results to the job results store to enable pagination.
2179
+
2187
+
2188
+ <div style="page-break-after: always;"></div>
2189
+
2190
+ # Workload Management | Dremio Documentation
2191
+
2192
+ Original URL: https://docs.dremio.com/dremio-cloud/admin/engines/workload-management
2193
+
2194
+ On this page
2195
+
2196
+ This topic covers how to manage resources and workloads by routing queries to particular engines through rules.
2197
+
2198
+ ## Overview
2199
+
2200
+ You can manage Dremio workloads via routing rules, which are evaluated at runtime (before query planning) to decide which [query engine](/dremio-cloud/admin/engines/) to use for a given query. In projects with only one engine, all queries share the same execution resources and route to the same single engine. However, when multiple engines are provisioned, rules determine the engine to be used.
2201
+
2202
+ You must arrange the rules in the order in which you want them to be evaluated. If multiple rules evaluate to true for a given query, the first rule that returns true is used to select the engine.
2203
+ The following diagram shows a series of rules that are evaluated when a job gets submitted.
2204
+
2205
+ * Rule1 routes jobs to Engine1
2206
+ * Rule2 routes jobs to Engine2
2207
+ * Rule3 routes jobs to the default engine that was created on project start up
2208
+ * Rule4 rejects the jobs that evaluate to true
2209
+
2210
+ ![](/images/cloud/rules-diagram.png)
2211
+
2212
+ ## Rules
2213
+
2214
+ You can use Dremio SQL syntax to specify rules to target particular jobs.
2215
+
2216
+ The following are the types of rules that can be created along with examples.
2217
+
2218
+ ### User
2219
+
2220
+ Create a rule that identifies the user that triggers the job.
2221
+
2222
+ Create rule that identifies user
2223
+
2224
+ ```
2225
+ USER in ('JRyan','PDirk','CPhillips')
2226
+ ```
2227
+
2228
+ ### Group Membership
2229
+
2230
+ Create a rule that identifies if the user that triggers the job is part of a particular group.
2231
+
2232
+ Create rule that identifies whether user belongs to a specified group
2233
+
2234
+ ```
2235
+ is_member('MarketingOps') OR
2236
+ is_member('Engineering')
2237
+ ```
2238
+
2239
+ ### Job Type
2240
+
2241
+ Create a rule depending on the type of job. The types of jobs can be identified by the following categories:
2242
+
2243
+ * Flight
2244
+ * JDBC
2245
+ * Internal Preview
2246
+ * Internal Run
2247
+ * Metadata Refresh
2248
+ * ODBC
2249
+ * Reflections
2250
+ * REST
2251
+ * UI Download
2252
+ * UI Preview
2253
+ * UI Run
2254
+
2255
+ Create rule based on type of job
2256
+
2257
+ ```
2258
+ query_type() IN ('JDBC', 'ODBC', 'UI Run', 'Flight')
2259
+ ```
2260
+
2261
+ ### Query Label
2262
+
2263
+ Labels enable rules that route queries running named commands to specific engines. Dremio supports the following query labels:
2264
+
2265
+ | Query Label | Description |
2266
+ | --- | --- |
2267
+ | COPY | Assigned to all queries running a [COPY INTO](/dremio-cloud/sql/commands/copy-into-table) SQL command |
2268
+ | CTAS | Assigned to all queries running a [CREATE TABLE AS](/dremio-cloud/sql/commands/create-table-as) SQL command |
2269
+ | DML | Assigned to all queries running an [INSERT](/dremio-cloud/sql/commands/insert), [UPDATE](/dremio-cloud/sql/commands/update), [DELETE](/dremio-cloud/sql/commands/delete), [MERGE](/dremio-cloud/sql/commands/merge), or [TRUNCATE](/dremio-cloud/sql/commands/truncate) SQL command |
2270
+ | OPTIMIZATION | Assigned to all queries running an [OPTIMIZE](/dremio-cloud/sql/commands/optimize-table) SQL command |
2271
+
2272
+ Here are two example routing rules:
2273
+
2274
+ Create a routing rule for queries running a COPY INTO command
2275
+
2276
+ ```
2277
+ query_label() IN ('COPY')
2278
+ ```
2279
+
2280
+ Create a routing rule for queries running the DML commands INSERT, UPDATE, DELETE, MERGE, or TRUNCATE
2281
+
2282
+ ```
2283
+ query_label() IN ('DML')
2284
+ ```
2285
+
2286
+ ### Query Attributes
2287
+
2288
+ Query attributes enable routing rules that direct queries to specific engines based on their characteristics.
2289
+
2290
+ Dremio supports the following query attributes:
2291
+
2292
+ | Query Attribute | Description |
2293
+ | --- | --- |
2294
+ | `DREMIO_MCP` | Set when the job is submitted via the Dremio MCP Server. |
2295
+ | `AI_AGENT` | Set when the job is submitted via the Dremio AI Agent. |
2296
+ | `AI_FUNCTIONS` | Set when the job contains AI functions. |
2297
+
2298
+ You can use the following functions to define routing rules based on query attributes:
2299
+
2300
+ | Function | Applicable Attribute | Description |
2301
+ | --- | --- | --- |
2302
+ | `query_has_attribute(<attr>)` | `DREMIO_MCP`, `AI_AGENT`, `AI_FUNCTIONS` | Returns true if the specified attribute is present. |
2303
+ | `query_attribute(<attr>)` | `DREMIO_MCP`, `AI_AGENT`, `AI_FUNCTIONS` | Returns the value of the attribute (if present), otherwise NULL. |
2304
+ | `query_calls_ai_functions()` | NA | Returns true if the job has an AI function in the query. |
2305
+
2306
+ Examples:
2307
+
2308
+ Create a routing rule for queries that use AI functions and are executed by a user
2309
+
2310
+ ```
2311
+ query_calls_ai_functions() AND USER = 'JRyan'
2312
+ ```
2313
+
2314
+ Create a routing rule for queries with `DREMIO_MCP` and `AI_FUNCTIONS`
2315
+
2316
+ ```
2317
+ query_has_attribute('DREMIO_MCP') AND query_has_attribute('AI_FUNCTIONS')
2318
+ ```
2319
+
2320
+ ### Tag
2321
+
2322
+ Create a rule that routes jobs based on a routing tag.
2323
+
2324
+ Create rule that routes jobs based on routing tag
2325
+
2326
+ ```
2327
+ tag() = 'ProductionDashboardQueue'
2328
+ ```
2329
+
2330
+ ### Date and Time
2331
+
2332
+ Create a rule that routes a job based on the time it was triggered, using Dremio SQL functions.
2333
+
2334
+ Create rule that routes jobs based on time triggered
2335
+
2336
+ ```
2337
+ EXTRACT(HOUR FROM CURRENT_TIME)
2338
+ BETWEEN 9 AND 18
2339
+ ```
2340
+
2341
+ ### Combined Conditions
2342
+
2343
+ Create rules based on multiple conditions.
2344
+
2345
+ The following example routes a job depending on user, group membership, query type, and the time of day that it was triggered.
2346
+
2347
+ Create rule based on user, group, job type, and time triggered
2348
+
2349
+ ```
2350
+ (
2351
+ USER IN ('JRyan', 'PDirk', 'CPhillips')
2352
+ OR is_member('superadmins')
2353
+ )
2354
+ AND query_type() IN ('ODBC')
2355
+ AND EXTRACT(HOUR FROM CURRENT_TIME)
2356
+ BETWEEN 9 AND 18
2357
+ ```
2358
+
2359
+ ### Default Rules
2360
+
2361
+ Each Dremio [project](/dremio-cloud/admin/projects/) has its own set of rules. When a project is created, Dremio automatically creates rules for the default and preview engines. You can edit these rules as needed.
2362
+
2363
+ | Order | Rule Name | Rule | Engine |
2364
+ | --- | --- | --- | --- |
2365
+ | 1 | UI Previews | query\_type() = 'UI Preview' | preview |
2366
+ | 2 | Reflections | query\_type() = 'Reflections' | default |
2367
+ | 3 | All Other Queries | All other queries | default |
2368
+
2369
+ ## View All Rules
2370
+
2371
+ To view all rules:
2372
+
2373
+ 1. Click the Project Settings ![This is the icon that represents the Project Settings.](/images/icons/project-settings.png "Icon represents the Project Settings.") icon in the side navigation bar.
2374
+ 2. Select **Engine Routing** in the project settings sidebar to see the list of engine routing rules.
2375
+
2376
+ ## Add a Rule
2377
+
2378
+ To add a rule:
2379
+
2380
+ 1. On the Engine Routing page, click the **Add Rule** button at the top-right corner of the screen.
2381
+ 2. In the **New Rule** dialog, for **Rule Name**, enter a name.
2382
+ 3. For **Conditions**, enter the routing condition. See Rules for supported conditions.
2383
+ 4. For **Action**, complete one of the following options:
2384
+
2385
+ a. If you want to route the jobs that meet the conditions to a particular engine, select the **Route to engine** option. Then use the engine selector to choose the engine.
2386
+
2387
+ b. If you want to reject the jobs that meet the conditions, select the **Reject** option.
2388
+ 5. Click **Add**.
2389
+
2390
+ ## Edit a Rule
2391
+
2392
+ To edit a rule:
2393
+
2394
+ 1. On the Engine Routing page, hover over the rule and click the Edit Rule ![This is the icon that represents the Edit Rule settings.](/images/icons/edit.png "Icon represents the Edit Rule settings.") icon that appears next to the rule.
2395
+ 2. In the **Edit Rule** dialog, for **Rule Name**, enter a name.
2396
+ 3. For **Conditions**, enter the routing condition. See Rules for supported conditions.
2397
+ 4. For **Action**, complete one of the following options:
2398
+
2399
+ a. If you want to route the jobs that meet the conditions to a particular engine, select the **Route to engine** option. Then use the engine selector to choose the engine.
2400
+
2401
+ b. If you want to reject the jobs that meet the conditions, select the **Reject** option.
2402
+ 5. Click **Save**.
2403
+
2404
+ ## Delete a Rule
2405
+
2406
+ To delete a rule:
2407
+
2408
+ 1. On the Engine Routing page, hover over the rule and click the Delete Rule ![This is the icon that represents the Delete Rule settings.](/images/icons/trash.png "Icon represents the Delete Rule settings.") icon that appears next to the rule.
2409
+
2410
+ caution
2411
+
2412
+ You must have at least one rule per project to route queries to a particular engine.
2413
+
2414
+ 2. In the **Delete Rule** dialog, click **Delete** to confirm.
2415
+
2416
+ ## Set and Reset Engines
2417
+
2418
+ The [`SET ENGINE`](/dremio-cloud/sql/commands/set-engine) SQL command is used to specify the exact execution engine to run subsequent queries in the current session. When using `SET ENGINE`, WLM rules and direct routing connection properties are bypassed, and queries are routed directly to the specified engine. The [`RESET ENGINE`](/dremio-cloud/sql/commands/reset-engine) command clears the session-level engine override, reverting query routing to follow the Workload Management (WLM) rules or any direct routing connection property if set.
2419
+
2420
+ ## SET TAG
2421
+
2422
+ The [`SET TAG`](/dremio-cloud/sql/commands/set-tag) SQL command is used to specify a routing tag for subsequent queries in the current session. If a `ROUTING_TAG` connection property is already set for the session, `SET TAG` will override it. When using `SET TAG`, you must have a previously defined Workload Management (WLM) routing rule that routes queries based on that routing tag. The [`RESET TAG`](/dremio-cloud/sql/commands/reset-tag) command clears the session-level routing tag override, reverting query routing to follow the Workload Management (WLM) rules or any direct routing connection property if set.
2423
+
2424
+ ## Connection Tagging and Direct Routing Configuration
2425
+
2426
+ Routing tags are configured by setting the `ROUTING_TAG = <Tag Name>` parameter for a given session to the desired tag name.
2427
+
2428
+ ### JDBC Session Configuration
2429
+
2430
+ To configure JDBC sessions, add the `ROUTING_TAG` parameter to the JDBC connection URL. For example: `jdbc:dremio:direct=localhost;ROUTING_TAG='TagA'`.
2431
+
2432
+ ### ODBC Session Configuration
2433
+
2434
+ Configure ODBC sessions as follows:
2435
+
2436
+ *Windows Sessions*
2437
+
2438
+ Add the `ROUTING_TAG` parameter to the `AdvancedProperties` parameter in the ODBC DSN field.
2439
+
2440
+ *Mac OS Sessions*
2441
+
2442
+ 1. Add the `ROUTING_TAG` parameter to the `AdvancedProperties` parameter in the system `odbc.ini` file located at `/Library/ODBC/odbc.ini`. After adding the parameter, an example Advanced Properties configuration might be: `AdvancedProperties=CastAnyToVarchar=true;HandshakeTimeout=5;QueryTimeout=180;TimestampTZDisplayTimezone=utc;NumberOfPrefetchBuffers=5;ROUTING_TAG='TagA';`
2443
+ 2. Add the `ROUTING_TAG` parameter to the `AdvancedProperties` parameter in the user's DSN located at `~/Library/ODBC/odbc.ini`
2444
+
2445
+ ## Best Practices for Workload Management
2446
+
2447
+ Because every query workload is different, engine sizing often depends on several factors, such as the complexity of queries, number of concurrent users, data sources, dataset size, file and table formats, and specific business requirements for latency and cost. Workload management (WLM) ensures reliable query performance by choosing adequately sized engines for each workload type, configuring engines, and implementing query routing rules to segregate and route query workload types to appropriate engines.
2448
+
2449
+ This section describes best practices for adding and using Dremio engines, as well as configuring WLM to achieve reliable query performance in Dremio. This section also includes tips for migrating from self-managed Dremio Software to fully managed Dremio and information about using the system table `sys.project.history.jobs`, which stores metadata for historical jobs executed in a project, to assess the efficacy of WLM settings and make adjustments.
2450
+
2451
+ ### Set Up Engines
2452
+
2453
+ As a fully managed offering, Dremio is the best deployment model for production: it allows you to achieve high levels of reliability and durability for your queries, maximizes resource efficiency with engine autoscaling, and does not require you to manually create and manage engines.
2454
+
2455
+ Segregating workload types into separate engines is vital for mitigating noisy neighbor issues, which can jeopardize performance reliability. You can segregate workloads by type, such as ad hoc, dashboard, and lakehouse (COPY INTO, DML, and optimization), as well as by business unit to facilitate cost distribution.
2456
+
2457
+ Metadata and Reflection refresh workloads should have their own engines for executing metadata and Reflection refresh queries. These internal queries can use a substantial amount of engine bandwidth, so assigning separate engines ensures that they do not interfere with user-initiated queries. Initial engine sizes should be XSmall and Small, but these sizes may change depending on the number and complexity of Reflection refresh and metadata jobs.
2458
+
2459
+ Dremio recommends the following engine setup configurations:
2460
+
2461
+ * Dremio offers a range of [engine sizes](/dremio-cloud/admin/engines/#sizes). Experiment with typical queries, concurrency, and engine sizes to establish the best engine size for each workload type based on your organization's budget constraints and latency requirements.
2462
+ * Maximum concurrency is the maximum number of jobs that Dremio can execute concurrently on an engine replica. Dremio provides an out-of-the-box value for maximum concurrency based on engine size, but we recommend testing with typical queries directed to specific engines to determine the best maximum concurrency values for your query workloads.
2463
+ * Dremio offers autoscaling to meet the demands of dynamic workloads with engine replicas. It is vital to assess and configure each engine's autoscaling parameters based on your organization's budget constraints and latency requirements for each workload type. You can choose the minimum and maximum number of replicas for each engine and specify any advanced configuration as needed. For example, dashboard workloads must meet stringent low-latency requirements and are prioritized for performance rather than cost. Engines added and assigned to execute the dashboard workloads may therefore be configured to autoscale using replicas. On the other hand, an engine for ad hoc workloads may have budget constraints and therefore be configured to autoscale with a maximum of one replica.
2464
+
2465
+ ### Route Workloads
2466
+
2467
+ Queries are routed to engines according to routing rules. You may use Dremio's out-of-the-box routing rules, which route queries to the default and preview engines that are established by default, but Dremio recommends creating custom routing rules based on your workloads and business requirements. Custom rules can include factors such as user, group membership, job type, date and time, query label, and tag. See Rules for examples.
2468
+
2469
+ The following table lists example routing rules based on query\_type, query\_label, and tags:
2470
+
2471
+ | Order | Rule Name | Rule | Engine |
2472
+ | --- | --- | --- | --- |
2473
+ | 1 | Reflections | `query_type() = 'Reflections'` | Reflection |
2474
+ | 2 | Metadata | `query_type() = 'Metadata Refresh'` | Metadata |
2475
+ | 3 | Dashboards | `tag() = 'dashboard'` | Dashboard |
2476
+ | 4 | Ad hoc Queries | `query_type() IN ( 'UI Run' , 'REST') OR tag() = 'ad hoc'` | Ad hoc |
2477
+ | 5 | Lakehouse Queries | `query_label() IN ('COPY','DML','CTAS', 'OPTIMIZATION')` | Lakehouse |
2478
+ | 6 | All Other Queries | All other queries | Preview |
2479
+
2480
+ ### Use the `sys.project.history.jobs` System Table
2481
+
2482
+ The [`sys.project.history.jobs`](/dremio-cloud/sql/system-tables/jobs-historical) system table contains metadata for recent jobs executed in a project, including time statistics, cost, and other relevant information. You can use the data in the `sys.project.history.jobs` system table to evaluate the effectiveness of WLM settings and make adjustments based on job metadata.
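+
+ For example, a sketch like the following can show how jobs distribute across engines and query types, which helps confirm that routing rules send each workload where you intended. The column names here are assumptions for illustration; consult the linked reference for the actual schema:
+
+ Summarize jobs per engine (illustrative column names)
+
+ ```
+ -- Column names are assumptions; check the sys.project.history.jobs
+ -- reference for the actual schema.
+ SELECT engine_name, query_type, COUNT(*) AS job_count, AVG(query_cost) AS avg_cost
+ FROM sys.project.history.jobs
+ GROUP BY engine_name, query_type
+ ORDER BY job_count DESC
+ ```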
2483
+
2484
+ ### Use Job Analyzer
2485
+
2486
+ Job Analyzer is a package of useful query and view definitions that you may create over the `sys.project.history.jobs` system table and use to analyze job metadata. Job Analyzer is available in a [public GitHub repository](https://github.com/dremio/professional-services/tree/main/tools/dremio-cloud-job-analyzer).
2487
+
2515
+
2516
+ <div style="page-break-after: always;"></div>
2517
+
2518
+ # Manual Reflections | Dremio Documentation
2519
+
2520
+ Original URL: https://docs.dremio.com/dremio-cloud/admin/performance/manual-reflections/
2521
+
2522
+ On this page
2523
+
2524
+ With [Autonomous Reflections](/dremio-cloud/admin/performance/autonomous-reflections) reducing the need for manual work, you no longer need to create or manage Reflections. However, when Autonomous Reflections are not enabled or for situations that require manual control, this page provides guidance on getting Reflection recommendations and how to manage raw Reflections, aggregation Reflections, and external Reflections in Dremio.
2525
+
2526
+ note
2527
+
2528
+ For non-duplicating joins, Dremio can accelerate queries that reference only some of the joins in a Reflection, eliminating the need to create separate Reflections for every table combination.
2529
+
2530
+ ## Reflection Recommendations
2531
+
2532
+ When [Autonomous Reflections](/dremio-cloud/admin/performance/autonomous-reflections) are not enabled, Dremio automatically provides recommendations to add and remove Reflections based on query patterns to optimize performance for queries on Iceberg tables, UniForm tables, Parquet datasets, and any views built on these datasets.
2533
+
2534
+ Recommendations to add Reflections are sorted by overall effectiveness, with the most effective recommendations shown on top. Effectiveness relates to metrics such as the estimated number of accelerated jobs, potential increase in query execution speedup, and potential time saved during querying. These are rough estimates based on past data that can give you insight into the potential benefits of each recommendation.
2535
+ Reflections created using these recommendations refresh automatically when source data changes on:
2536
+
2537
+ * Iceberg tables – When the table is modified through Dremio or other engines. Dremio polls tables every 10 seconds.
2538
+ * Parquet datasets – When metadata is updated in Dremio.
2539
+
2540
+ To view and apply the Reflection recommendations:
2541
+
2542
+ 1. In the Dremio console, hover over ![](/images/icons/settings.png) in the side navigation bar and select **Project Settings**.
2543
+ 2. Select **Reflections** from the project settings sidebar.
2544
+ 3. Click **Reflections Recommendations** to access the list of suggested Reflections.
2545
+ 4. To apply a recommendation, click ![](/images/icons/add-recommendation.png) at the end of the corresponding row.
2546
+
2547
+ Reflections created using usage-based recommendations are only used when fully synchronized with their source data to ensure up-to-date query results.
2548
+
2549
+ To generate recommendations for default raw and aggregation Reflections, you can obtain the job IDs by looking them up on the [Jobs page](/dremio-cloud/admin/monitor/jobs). Then, use either the [`SYS.RECOMMEND_REFLECTIONS`](/dremio-cloud/sql/table-functions/recommend-reflections) table function or the [Recommendations API](/dremio-cloud/api/reflection/recommendations) to submit job IDs to accelerate specific SQL queries.
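+
+ For instance, a call to the table function might look like the sketch below. The job IDs are placeholders, and the exact signature is an assumption here; see the `SYS.RECOMMEND_REFLECTIONS` reference for the authoritative syntax:
+
+ Request Reflection recommendations for specific jobs (signature assumed)
+
+ ```
+ -- Placeholders for real job IDs; confirm the function signature in the
+ -- SYS.RECOMMEND_REFLECTIONS reference before running.
+ SELECT *
+ FROM TABLE(SYS.RECOMMEND_REFLECTIONS(ARRAY['<job-id-1>', '<job-id-2>']))
+ ```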
2550
+
2551
+ ## Raw Reflections
2552
+
2553
+ A raw Reflection retains the same number of records as its anchor while allowing a subset of columns. It enhances query performance by materializing complex views, transforming data from non-performant sources into the Iceberg table format optimized for large-scale analytics, and utilizing partitioning and sorting for faster access. By precomputing and storing data in an optimized format, raw Reflections significantly reduce query latency and improve overall efficiency.
2554
+
2555
+ You can use the Reflections editor to create two types of raw Reflection:
2556
+
2557
+ * A default raw Reflection that includes all of the columns of the anchor, but does not sort or horizontally partition on any columns
2558
+ * A raw Reflection that includes all or a subset of the columns of the anchor, and that does one or both of the following things:
2559
+
2560
+ + Sorts on one or more columns
2561
+ + Horizontally partitions the data according to the values in one or more columns
2562
+
2563
+ note
2564
+
2565
+ For creating Reflections on views and tables with row-access and column-masking policies, see [Use Reflections on Datasets with Policies](/dremio-cloud/manage-govern/row-column-policies#use-reflections-on-datasets-with-policies).
2566
+
2567
+ ### Prerequisites
2568
+
2569
+ * If you want to accelerate queries on unoptimized data or data in slow storage, create a view that is itself created from a table in a non-columnar format or on slow-scan storage. You can then create your raw Reflection from that view.
2570
+ * If you want to accelerate "needle-in-a-haystack" queries, create a view that includes a predicate to include only the rows that you want to scan. You can then create your raw Reflection from that view.
2571
+ * If you want to accelerate queries that perform expensive transformations, create a view that performs those transformations. You can then create your raw Reflection from that view.
2572
+ * If you want to accelerate queries that perform joins, create a view that performs the joins. You can then create your raw Reflection from that view.
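+
+ As an example of the "needle-in-a-haystack" case above, the following view (all names are hypothetical) restricts a large table to the rows that queries actually scan; you would then create the raw Reflection on this view:
+
+ Hypothetical view to anchor a raw Reflection
+
+ ```
+ -- Hypothetical names: narrow a large table to the hot subset of rows
+ CREATE VIEW marketing.recent_clicks AS
+ SELECT user_id, page_url, click_ts
+ FROM web_logs.raw_clicks
+ WHERE click_ts >= DATE '2025-01-01'
+ ```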
2573
+
2574
+ ### Create Default Raw Reflections
2575
+
2576
+ In the **Basic** view of the Reflections editor, you can create a raw Reflection that includes all of the fields that are in a table or view. Creating a basic raw Reflection ensures that Dremio never runs user queries against the underlying table or view when the raw Reflection is enabled.
2577
+
2578
+ To create a raw Reflection in the **Basic** view of the Reflections editor:
2579
+
2580
+ 1. In the Dremio console, click ![This is the icon that represents the Datasets page.](/images/icons/datasets-page.png "Datasets page.") in the side navigation bar to go to the Datasets page.
2581
+ 2. In the catalog or folder in which the anchor is located, hover over the anchor name and click ![](/images/icons/settings.png).
2582
+ 3. Select **Reflections** in the table or view settings sidebar.
2583
+ 4. Click the toggle switch on the left side of the **Raw Reflections** bar.
2584
+
2585
+ ![](/images/enabling-raw-reflections.png)
2586
+ 5. Click **Save**.
2587
+
2588
+ #### Restrictions of the **Basic** View
2589
+
2590
+ * You cannot select fields to sort or create horizontal partitions on.
2591
+ * The name of the Reflection that you create is restricted to "Raw Reflection".
2592
+ * You can create only one raw Reflection. If you want to create multiple raw Reflections at a time, use the **Advanced** view.
2593
+
2594
+ ### Create Customized Raw Reflections
2595
+
2596
+ In the **Advanced** view of the Reflections editor, you can create one or more raw Reflections that include all or a selection of the fields that are in the anchor or supported anchor. You can also choose sort fields and fields for partitioning horizontally.
2597
+
2598
+ Dremio recommends that you follow the best practices listed in [Operational Excellence](/dremio-cloud/help-support/well-architected-framework/operational-excellence/) when you create customized raw Reflections.
2599
+
2600
+ If you make any of the following changes to a raw Reflection when you are using the **Advanced** view, you cannot switch to the **Basic** view:
2601
+
2602
+ * Deselect one or more fields in the **Display** column. By default, all of the fields are selected.
2603
+ * Select one or more fields in the **Sort**, **Partition**, or **Distribute** column.
2604
+
2605
+ To create a raw Reflection in the **Advanced** view of the Reflections editor:
2606
+
2607
+ 1. In the Dremio console, click ![This is the icon that represents the Datasets page.](/images/icons/datasets-page.png "Datasets page.") in the side navigation bar to go to the Datasets page.
2608
+ 2. In the catalog or folder in which the anchor is located, hover over the anchor name and click ![](/images/icons/settings.png).
2609
+ 3. If the **Advanced** view is not already displayed, click the **Advanced View** button in the top-right corner of the editor.
2610
+ 4. Click the toggle switch in the table labeled **Raw Reflection** to enable the raw Reflection.
2611
+
2612
+ Queries do not start using the Reflection, however, until after you have finished editing the Reflection and click **Save** in a later step.
2613
+
2614
+ ![](/images/raw-reflections.png)
2615
+ 5. (Optional) Click in the label to rename the Reflection.
2616
+
2617
+ The purpose of the name is to help you understand, when you read job reports, which Reflections the query optimizer considered and chose when planning queries.
2618
+ 6. In the columns of the table, follow these steps, which you don't have to do in any particular order:
2619
+
2620
+ note
2621
+
2622
+ Ignore the **Distribution** column. Selecting fields in it has no effect on the Reflection.
2623
+
2624
+ * Click in the **Display** column to include fields in or exclude them from your Reflection.
2625
+ * Click in the **Sort** column to select fields on which to sort the data in the Reflection. For guidance in selecting a field on which to sort, see [Sort Reflections on High-Cardinality Fields](/dremio-cloud/help-support/well-architected-framework/operational-excellence#sort-reflections-on-high-cardinality-fields).
2626
+ * Click in the **Partition** column to select fields on which to horizontally partition the rows in the Reflection. For guidance in selecting fields on which to partition, and which partition transforms to apply to those fields, see [Horizontally Partition Reflections that Have Many Rows](/dremio-cloud/help-support/well-architected-framework/operational-excellence#horizontally-partition-reflections-that-have-many-rows).
2627
+
2628
+ note
2629
+
2630
+ If the Reflection is based on an Iceberg table, a filesystem source, an AWS Glue source, or a Hive source, and that table is partitioned, recommended partition columns and transforms are selected for you. If you change the selection of columns, then this icon appears at the top of the table: ![This is the Recommendations icon.](/images/icons/partition-column-recommendation-icon.png "The Recommendations icon"). You can click it to revert to the recommended selection of partition columns.
2631
+ 7. (Optional) Optimize the number of files used to store the Reflection. You can optimize for fast refreshes or for fast read performance by queries. Follow these steps:
2632
+
2633
+ a. Click the ![](/images/icons/settings.png) in the table in which you are defining the Reflection.
2634
+
2635
+ b. In the field **Reflection execution strategy**, select either of these options:
2636
+
2637
+ * Select **Minimize Time Needed To Refresh** if you need the Reflection to be created as fast as possible. This option can result in the data for the Reflection being stored in many small files. This is the default option.
2638
+ * Select **Minimize Number Of Files** when you want to improve the read performance of queries against the Reflection. With this option, there tend to be fewer seeks performed for a given query.
2639
+ 8. Click **Save** when you are finished.
2640
+
2641
+ ### Edit Raw Reflections
2642
+
2643
+ You can edit an existing raw Reflection. You might want to do so if you are iteratively designing and testing a raw Reflection, if the definition of the view that the Reflection was created from was changed, or if the schema of the underlying table was changed.
2644
+
2645
+ If you created a raw Reflection in the **Basic** view of the Reflections editor, you must use the **Advanced** view to edit it.
2646
+
2647
+ Dremio runs the job or jobs to recreate the Reflection after you click **Save**.
2648
+
2649
+ To edit a raw Reflection in the **Advanced** view of the Reflections editor:
2650
+
2651
+ 1. In the Dremio console, hover over ![](/images/icons/settings.png) in the side navigation bar and select **Project settings**.
2652
+ 2. Select **Reflections** in the project settings sidebar.
2653
+ 3. Click the name of the Reflection. This opens the Acceleration dialog with the Reflections editor.
2654
+ 4. Click the **Advanced View** button in the top-right corner of the editor.
2655
+ 5. In the **Raw Reflections** section of the **Advanced** view, locate the table that shows the definition of your Reflection.
2656
+ 6. (Optional) Click in the label to rename the Reflection.
2657
+
2658
+ The purpose of the name is to help you understand, when you read job reports, which Reflections the query optimizer considered and chose when planning queries.
2659
+ 7. In the columns of the table, follow these steps, which you don't have to do in any particular order:
2660
+
2661
+ * Click in the **Display** column to include fields in or exclude them from your Reflection.
2662
+ * Click in the **Sort** column to select fields on which to sort the data in the Reflection. For guidance in selecting a field on which to sort, see [Sort Reflections on High-Cardinality Fields](/dremio-cloud/help-support/well-architected-framework/operational-excellence#sort-reflections-on-high-cardinality-fields).
2663
+ * Click in the **Partition** column to select fields on which to horizontally partition the rows in the Reflection. For guidance in selecting fields on which to partition, and which partition transforms to apply to those fields, see [Horizontally Partition Reflections that Have Many Rows](/dremio-cloud/help-support/well-architected-framework/operational-excellence#horizontally-partition-reflections-that-have-many-rows).
2664
+
2665
+ If the Reflection is based on an Iceberg table, a filesystem source, an AWS Glue source, or a Hive source, and that table is partitioned, partition columns and transforms are recommended for you. Hover over ![This is the Recommendations icon.](/images/icons/partition-column-recommendation-icon.png "The Recommendations icon") at the top of the table to see the recommendation. Click the icon to accept the recommendation.
2666
+
2667
+ note
2668
+
2669
+ Ignore the **Distribution** column. Selecting fields in it has no effect on the Reflection.
2670
+ 8. (Optional) Optimize the number of files used to store the Reflection. You can optimize for fast refreshes or for fast read performance by queries. Follow these steps:
2671
+
2672
+ a. Click the ![](/images/icons/settings.png) in the table in which you are defining the Reflection.
2673
+
2674
+ b. In the field **Reflection execution strategy**, select either of these options:
2675
+
2676
+ * Select **Minimize Time Needed To Refresh** if you need the Reflection to be created as fast as possible. This option can result in the data for the Reflection being stored in many small files. This is the default option.
2677
+ * Select **Minimize Number Of Files** when you want to improve read performance of queries against the Reflection. With this option, there tend to be fewer seeks performed for a given query.
2678
+ 9. Click **Save** when you are finished.
2679
+
2680
+ ## Aggregation Reflections
2681
+
2682
+ Aggregation Reflections accelerate BI-style queries that involve aggregations (`GROUP BY` queries) by precomputing results (such as `SUM`, `COUNT`, and `AVG`) across selected dimensions and measures. By precomputing expensive computations, they significantly improve query performance at runtime. These Reflections are ideal for analytical workloads with frequent aggregations on large datasets.
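+
+ For instance, a BI-style query like the hypothetical one below (table and column names are made up) is a typical candidate: an aggregation Reflection with `region` and `product_category` as dimensions and `amount` as a measure lets Dremio answer it from precomputed results.
+
+ Hypothetical aggregation query that an aggregation Reflection can accelerate
+
+ ```
+ SELECT region, product_category,
+        COUNT(*)    AS order_count,
+        MAX(amount) AS max_amount
+ FROM sales.orders
+ GROUP BY region, product_category
+ ```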
2683
+
2684
+ ### Create Default Aggregation Reflections
2685
+
2686
+ You can use the **Basic** view of the Reflections editor to create one aggregation Reflection that includes fields, from the anchor or supported anchor, that are recommended for use as dimensions or measures. You can add or remove dimensions and measures, too.
2687
+
2688
+ To create an aggregation Reflection in the **Basic** view of the Reflections editor:
2689
+
2690
+ 1. In the Dremio console, click ![This is the icon that represents the Datasets page.](/images/icons/datasets-page.png "Datasets page.") in the side navigation bar to go to the Datasets page.
2691
+ 2. In the catalog or folder in which the anchor is located, hover over the anchor name and click ![](/images/icons/settings.png).
2692
+ 3. In the **Aggregation Reflections** section of the editor, click **Generate** to get recommended fields to use as dimensions and measures. This will override any previously selected dimensions and measures. If you wish to proceed, click **Continue** in the confirmation dialog that follows.
2693
+ 4. In the **Aggregation Reflection** section of the editor, modify or accept the recommended fields for dimensions and measures.
2694
+ 5. To make the Reflection available to the query optimizer after you create it, click the toggle switch on the left side of the **Aggregation Reflections** bar.
2695
+
2696
+ ![](/images/enabling-aggregation-reflections.png)
2697
+ 6. Click **Save**.
2698
+
2699
+ #### Restrictions
2700
+
2701
+ * You can create only one aggregation Reflection in the **Basic** view. If you want to create multiple aggregation Reflections at a time, use the **Advanced** view.
2702
+ * You cannot select fields for sorting or horizontally partitioning.
2703
+ * The name of the Reflection is restricted to "Aggregation Reflection".
2704
+
2705
+ ### Create Customized Aggregation Reflections
2706
+
2707
+ You can use the **Advanced** view of the Reflections editor to create one or more aggregation Reflections, selecting which fields in the anchor or supporting anchor to use as dimensions and measures. For each field that you use as a measure, you can use one or more of these SQL functions: `APPROX_DISTINCT_COUNT`, `COUNT`, `MAX`, and `MIN`. You can also choose sort fields and fields for partitioning horizontally.
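+
+ If you prefer SQL over the editor, the same kind of Reflection can be defined with `ALTER DATASET ... CREATE AGGREGATE REFLECTION`. The following is a minimal sketch only; the `mySource.sales` table and its fields are hypothetical placeholders, and you should adapt the dimensions, measures, and measure functions to your own dataset.
+
+ Example Aggregation Reflection (sketch)
+
+ ```
+ ALTER DATASET "mySource"."sales"
+ CREATE AGGREGATE REFLECTION "sales_agg"
+ USING
+ DIMENSIONS (region)
+ MEASURES (sales_amount (COUNT, MIN, MAX))
+ ```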
2708
+
2709
+ Before you create customized aggregation Reflections, Dremio recommends that you follow the best practices listed in [Operational Excellence](/dremio-cloud/help-support/well-architected-framework/operational-excellence/).
2710
+
2711
+ To create an aggregation Reflection in the **Advanced** view of the Reflections editor:
2712
+
2713
+ 1. In the Dremio console, click ![This is the icon that represents the Datasets page.](/images/icons/datasets-page.png "Datasets page.") in the side navigation bar to go to the Datasets page.
2714
+ 2. In the catalog or folder in which the anchor is located, hover over the anchor name and click ![](/images/icons/settings.png).
2715
+ 3. Click the **Advanced View** button in the top-right corner of the editor.
2716
+ 4. Click **Aggregation Reflections**.
2717
+
2718
+ The Aggregation Reflections section is displayed, with one table ready for refining the aggregation Reflection that appeared in the **Basic** view.
2719
+
2720
+ ![](/images/aggregation-reflections.png)
2721
+ 5. (Optional) Click in the name to rename the Reflection.
2722
+
2723
+ The purpose of the name is to help you understand, when you read job reports, which Reflections the query optimizer considered and chose when planning queries.
2724
+ 6. In the columns of the table, follow these steps, which you don't have to do in any particular order:
2725
+
2726
+ * Click in the **Dimension** column to include or exclude fields to use as dimensions.
2727
+ * Click in the **Measure** column to include or exclude fields to use as measures. You can use one or more of these SQL functions for each measure: `APPROX_DISTINCT_COUNT`, `COUNT`, `MAX`, and `MIN`.
2728
+
2729
+ If you want to include a computed measure, first create a view with the computed column to use as a measure, and then create the aggregation Reflection on the view.
2730
+
2731
+ Not all of the SQL aggregation functions that Dremio supports are available in the Reflections editor. If you want to create a Reflection that aggregates data by using the `AVG`, `CORR`, `HLL`, `SUM`, `VAR_POP`, or `VAR_SAMP` SQL functions, you must create a view that uses the function, and then create a raw Reflection from that view (see the example view after these steps).
2732
+
2733
+ * Click in the **Sort** column to select fields on which to sort the data in the Reflection. For guidance in selecting a field on which to sort, see [Sort Reflections on High-Cardinality Fields](/dremio-cloud/help-support/well-architected-framework/operational-excellence#sort-reflections-on-high-cardinality-fields).
2734
+ * Click in the **Partition** column to select fields on which to horizontally partition the rows in the Reflection. For guidance in selecting fields on which to partition, and which partition transforms to apply to those fields, see [Horizontally Partition Reflections that Have Many Rows](/dremio-cloud/help-support/well-architected-framework/operational-excellence#horizontally-partition-reflections-that-have-many-rows).
2735
+
2736
+ If the Reflection is based on an Iceberg table, a filesystem source, an AWS Glue source, or a Hive source, and that table is partitioned, recommended partition columns and transforms are selected for you. If you change the selection of columns, then this icon appears at the top of the table: ![This is the Recommendations icon.](/images/icons/partition-column-recommendation-icon.png "The Recommendations icon"). You can click it to revert to the recommended selection of partition columns.
2737
+
2738
+ note
2739
+
2740
+ Ignore the **Distribution** column. Selecting fields in it has no effect on the Reflection.
2741
+ 7. (Optional) Optimize the number of files used to store the Reflection. You can optimize for fast refreshes or for fast read performance by queries. Follow these steps:
2742
+
2743
+ a. Click the ![](/images/icons/settings.png) in the table in which you are defining the Reflection.
2744
+
2745
+ b. In the field **Reflection execution strategy**, select either of these options:
2746
+
2747
+ * Select **Minimize Time Needed To Refresh** if you need the Reflection to be created as fast as possible. This option can result in the data for the Reflection being stored in many small files. This is the default option.
2748
+ * Select **Minimize Number Of Files** when you want to improve the read performance of queries against the Reflection. With this option, there tend to be fewer seeks performed for a given query.
2749
+ 8. Click **Save** when you are finished.
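+
+ As a sketch of the workaround mentioned in step 6, the view below precomputes measures with functions that the editor does not offer (`AVG` and `SUM`); you would then create a raw Reflection on this view. The `mySource.sales` table and its columns are hypothetical placeholders.
+
+ Example View for Unsupported Aggregation Functions
+
+ ```
+ CREATE VIEW "myWorkspace"."sales_summary" AS
+ SELECT
+   region,
+   AVG(sales_amount) AS average_sales,
+   SUM(sales_amount) AS total_sales
+ FROM mySource.sales
+ GROUP BY region
+ ```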
2750
+
2751
+ ### Edit Aggregation Reflections
2752
+
2753
+ You might want to edit an aggregation Reflection if you are iteratively designing and testing it, if the definition of the view that the Reflection was created from has changed, if the schema of the underlying table has changed, or if you want to revise one or more aggregations defined in the Reflection.
2754
+
2755
+ If you created an aggregation Reflection in the **Basic** view of the Reflections editor, you can edit that Reflection either in the **Basic** view or in the **Advanced** view.
2756
+
2757
+ Dremio runs the job or jobs to recreate the Reflection after you click **Save**.
2758
+
2759
+ #### Use the Basic View
2760
+
2761
+ To edit an aggregation Reflection in the **Basic** view of the Reflections editor:
2762
+
2763
+ 1. In the Dremio console, hover over ![](/images/icons/settings.png) in the side navigation bar and select **Project settings**.
2764
+ 2. Select **Reflections** in the project settings sidebar.
2765
+ 3. Click the name of the Reflection. This opens the Acceleration dialog with the Reflections editor.
2766
+ 4. In the Aggregation Reflection section of the editor, modify or accept the recommendation for **Dimension** and **Measure** columns.
2767
+ 5. Click **Save**.
2768
+
2769
+ #### Use the Advanced View
2770
+
2771
+ To edit an aggregation Reflection in the **Advanced** view of the Reflections editor:
2772
+
2773
+ 1. In the Dremio console, hover over ![](/images/icons/settings.png) in the side navigation bar and select **Project settings**.
2774
+ 2. Select **Reflections** in the project settings sidebar.
2775
+ 3. Click the name of the Reflection. This opens the Acceleration dialog with the Reflections editor.
2776
+ 4. Click the **Advanced View** button in the top-right corner of the editor.
2777
+ 5. Click **Aggregation Reflections**.
2778
+ 6. (Optional) Click in the name to rename the Reflection.
2779
+
2780
+ The purpose of the name is to help you understand, when you read job reports, which Reflections the query optimizer considered and chose when planning queries.
2781
+ 7. In the columns of the table, follow these steps, which you don't have to do in any particular order:
2782
+
2783
+ * Click in the **Dimension** column to include or exclude fields to use as dimensions.
2784
+ * Click in the **Measure** column to include or exclude fields to use as measures. You can use one or more of these SQL functions for each measure: `APPROX_DISTINCT_COUNT`, `COUNT`, `MAX`, and `MIN`.
2785
+
2786
+ Not all of the SQL aggregation functions that Dremio supports are available in the Reflections editor. If you want to create a Reflection that aggregates data by using the `AVG`, `CORR`, `HLL`, `SUM`, `VAR_POP`, or `VAR_SAMP` SQL functions, you must create a view that uses the function, and then create a raw Reflection from that view.
2787
+
2788
+ * Click in the **Sort** column to select fields on which to sort the data in the Reflection. For guidance in selecting a field on which to sort, see [Sort Reflections on High-Cardinality Fields](/dremio-cloud/help-support/well-architected-framework/operational-excellence#sort-reflections-on-high-cardinality-fields).
2789
+ * Click in the **Partition** column to select fields on which to horizontally partition the rows in the Reflection. For guidance in selecting fields on which to partition, and which partition transforms to apply to those fields, see [Horizontally Partition Reflections that Have Many Rows](/dremio-cloud/help-support/well-architected-framework/operational-excellence#horizontally-partition-reflections-that-have-many-rows).
2790
+
2791
+ If the Reflection is based on an Iceberg table, a filesystem source, an AWS Glue source, or a Hive source, and that table is partitioned, partition columns and transforms are recommended for you. Hover over ![This is the Recommendations icon.](/images/icons/partition-column-recommendation-icon.png "The Recommendations icon") at the top of the table to see the recommendation. Click the icon to accept the recommendation.
2792
+
2793
+ note
2794
+
2795
+ Ignore the **Distribution** column. Selecting fields in it has no effect on the Reflection.
2796
+ 8. (Optional) Optimize the number of files used to store the Reflection. You can optimize for fast refreshes or for fast read performance by queries. Follow these steps:
2797
+
2798
+ a. Click the ![](/images/icons/settings.png) in the table in which you are defining the Reflection.
2799
+
2800
+ b. In the field **Reflection execution strategy**, select either of these options:
2801
+
2802
+ * Select **Minimize Time Needed To Refresh** if you need the Reflection to be created as fast as possible. This option can result in the data for the Reflection being stored in many small files. This is the default option.
2803
+ * Select **Minimize Number Of Files** when you want to improve the read performance of queries against the Reflection. With this option, there tend to be fewer seeks performed for a given query.
2804
+ 9. Click **Save** when you are finished.
2805
+
2806
+ ## External Reflections
2807
+
2808
+ Reference precomputed tables in external data sources instead of materializing Reflections within Dremio, eliminating refresh overhead and storage costs. You use an external Reflection by defining a view in Dremio that matches the precomputed table and mapping the view to that table. The data in the precomputed table is not refreshed by Dremio. When the view is queried, Dremio’s query planner leverages the external Reflection to generate optimal execution plans, improving query performance without additional storage consumption in Dremio.
2809
+
2810
+ ### Create External Reflections
2811
+
2812
+ To create an external Reflection:
2813
+
2814
+ 1. Follow these steps in the data source:
2815
+
2816
+ a. Select your source table.
2817
+
2818
+ b. Create a table that is derived from the source table, such as an aggregation table, if you do not already have one.
2819
+ 2. Follow these steps in Dremio:
2820
+
2821
+ a. [Define a view on the derived table in the data source.](/dremio-cloud/sql/commands/create-view) The definition must match that of the derived table.
2822
+
2823
+ b. [Define a new external Reflection that maps the view to the derived table.](/dremio-cloud/sql/commands/alter-table)
2824
+
2825
+ note
2826
+
2827
+ The data types and column names in the external Reflection must match those in the view that the external Reflection is mapped to.
2828
+
2829
+ Suppose you have a data source named `mySource` that is connected to Dremio. In that data source, there are (among all of your other tables) these two tables:
2830
+
2831
+ * `sales`, which is a very large table of sales data.
2832
+ * `sales_by_region`, which aggregates by region the data that is in `sales`.
2833
+ You want to make the data in `sales_by_region` available to data analysts who use Dremio. However, because you already have the `sales_by_region` table created, you do not see the need to create a Dremio table from `sales`, then create a Dremio view that duplicates `sales_by_region`, and finally create a Reflection on the view. You would like instead to make `sales_by_region` available to queries run from BI tools through Dremio.
2834
+
2835
+ To do that, you follow these steps:
2836
+
2837
+ 1. Create a view in Dremio that has the same definition as `sales_by_region`. Notice that the `FROM` clause points to the `sales` table that is in your data source, not to a Dremio table.
2838
+
2839
+ Example View
2840
+
2841
+ ```
2842
+ CREATE VIEW "myWorkspace"."sales_by_region" AS
2843
+ SELECT
2844
+ AVG(sales_amount) average_sales,
2845
+ SUM(sales_amount) total_sales,
2846
+ COUNT(*) sales_count,
2847
+ region
2848
+ FROM mySource.sales
2849
+ GROUP BY region
2850
+ ```
2851
+ 2. Create an external Reflection that maps the view above to `sales_by_region` in `mySource`.
2852
+
2853
+ Example External Reflection
2854
+
2855
+ ```
2856
+ ALTER DATASET "myWorkspace"."sales_by_region"
2857
+ CREATE EXTERNAL REFLECTION "external_sales_by_region"
2858
+ USING "mySource"."sales_by_region"
2859
+ ```
2860
+
2861
+ The external Reflection lets Dremio's query planner know that there is a table in `mySource` that matches the Dremio view `myWorkspace.sales_by_region` and that can be used to satisfy queries against the view. When Dremio users query `myWorkspace.sales_by_region`, Dremio routes the query to the data source `mySource`, which runs the query against `mySource.sales_by_region`.
2862
+
2863
+ ### Edit External Reflections
2864
+
2865
+ If you have modified the DDL of a derived table in your data source, follow these steps in Dremio to update the corresponding external Reflection:
2866
+
2867
+ 1. [Replace the view with one that has a definition that matches the definition of the derived table](/dremio-cloud/sql/commands/create-view). When you do so, the external Reflection is dropped.
2868
+ 2. [Define a new external Reflection that maps the view to the derived table.](/dremio-cloud/sql/commands/alter-table)
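+
+ Continuing the earlier `sales_by_region` example, the two steps might look like the following sketch. The revised column list is a placeholder; match it to the new definition of the derived table in your data source.
+
+ Example Update of an External Reflection
+
+ ```
+ CREATE OR REPLACE VIEW "myWorkspace"."sales_by_region" AS
+ SELECT
+   SUM(sales_amount) total_sales,
+   COUNT(*) sales_count,
+   region
+ FROM mySource.sales
+ GROUP BY region;
+
+ ALTER DATASET "myWorkspace"."sales_by_region"
+ CREATE EXTERNAL REFLECTION "external_sales_by_region"
+ USING "mySource"."sales_by_region"
+ ```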
2869
+
2870
+ ## Test Reflections
2871
+
2872
+ You can test whether the Reflections that you created are used to satisfy a query without actually running the query. This practice can be helpful when the tables are very large and you want to avoid processing large queries unnecessarily.
2873
+
2874
+ To test whether one or more Reflections are used by a query:
2875
+
2876
+ 1. In the Dremio console, click ![The SQL Runner icon](/images/sql-runner-icon.png "The SQL Runner icon") in the side navigation bar to open the SQL Runner.
2877
+ 2. In the SQL editor, type `EXPLAIN PLAN FOR` and then type or paste in your query.
2878
+ 3. Click **Run**.
2879
+ 4. When the query has finished, click the **Run** link found directly above the query results to view the job details. Any Reflections used will be shown on the page.
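+
+ For example, to test a query against the `sales_by_region` view from the earlier example (hypothetical names; substitute your own query):
+
+ Example EXPLAIN PLAN
+
+ ```
+ EXPLAIN PLAN FOR
+ SELECT region, total_sales
+ FROM "myWorkspace"."sales_by_region"
+ WHERE region = 'EMEA'
+ ```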
2880
+
2881
+ ## View Whether Queries Used Reflections
2882
+
2883
+ You can view the list of jobs on the Jobs page to find out whether queries were accelerated by Reflections. The Jobs page lists the jobs that ran queries, both queries from your data consumers and queries run within the Dremio user interface.
2884
+
2885
+ To find whether a query used a Reflection:
2886
+
2887
+ 1. Find the job that ran the query by reviewing the details in each row.
2888
+ 2. Look for ![This is the icon that indicates a Reflection was used.](/images/icons/reflections.png "Reflections icon") next to the job, which indicates that one or more Reflections were used.
2889
+ 3. View the job summary by clicking the row that represents the job that ran the query. The job summary appears in the pane to the right of the list of jobs.
2890
+
2891
+ ### Relationship between Reflections and Jobs
2892
+
2893
+ The relationship between a job and a Reflection can be one of the following types:
2894
+
2895
+ * CONSIDERED – The Reflection is defined on a dataset that is used in the query but was determined not to cover the query (for example, the Reflection did not have a field that is used by the query).
2896
+ * MATCHED – A Reflection could have been used to accelerate the query, but Dremio determined that it would not provide any benefits or another Reflection was determined to be a better choice.
2897
+ * CHOSEN – The Reflection was used to accelerate the query. Note that multiple Reflections can be used to accelerate a single query.
2898
+
2899
+ ## Disable Reflections
2900
+
2901
+ Disabled Reflections become unavailable for use by queries and will not be refreshed manually or according to their schedule.
2902
+
2903
+ note
2904
+
2905
+ Dremio does not disable external Reflections.
2906
+
2907
+ To disable a Reflection:
2908
+
2909
+ 1. In the Dremio console, hover over ![](/images/icons/settings.png) in the side navigation bar and select **Project Settings**.
2910
+ 2. Select **Reflections** in the project settings sidebar.
2911
+
2912
+ This opens the Reflections editor for the Reflection's anchor or supporting anchor.
2913
+ 3. Follow one of these steps:
2914
+
2915
+ * If there is only one raw Reflection for the table or view, in the **Basic** view, click the toggle switch in the **Raw Reflections** bar.
2916
+ * If there are two or more raw Reflections for the table or view, in the **Advanced** view, click the toggle switch for the individual raw Reflection that you want to disable.
2917
+ * If there is only one aggregation Reflection for the table or view, in the **Basic** view, click the toggle switch in the **Aggregation Reflections** bar.
2918
+ * If there are two or more aggregation Reflections for the table or view, in the **Advanced** view, click the toggle switch for the individual aggregation Reflection that you want to disable.
2919
+ 4. Click **Save**. The changes take effect immediately.
2920
+
2921
+ ## Delete Reflections
2922
+
2923
+ You can delete Reflections individually, or all of the Reflections on a table or view. When you delete a Reflection, its definition, data, and metadata are entirely deleted.
2924
+
2925
+ To delete a single raw or aggregation Reflection:
2926
+
2927
+ 1. In the Dremio console, hover over ![](/images/icons/settings.png) in the side navigation bar and select **Project settings**.
2928
+ 2. Select **Reflections** in the project settings sidebar.
2929
+
2930
+ This opens the Reflections editor for the Reflection's anchor or supporting anchor.
2931
+ 3. Open the **Advanced** view, if it is not already open.
2932
+ 4. If the Reflection is an aggregation Reflection, click **Aggregation Reflections**.
2933
+ 5. Click ![](/images/icons/trash.png) for the Reflection that you want to delete.
2934
+ 6. Click **Save**. The deletion takes effect immediately.
2935
+
2936
+ To delete all raw and aggregation Reflections on a table or view:
2937
+
2938
+ 1. In the Dremio console, hover over ![](/images/icons/settings.png) in the side navigation bar and select **Project Settings**.
2939
+ 2. Select **Reflections** in the project settings sidebar.
2940
+
2941
+ This opens the Reflections editor for the Reflection's anchor or supporting anchor.
2942
+ 3. Click the menu icon in the top-right corner of the Reflections page.
2943
+ 4. Click **Delete all reflections**.
2944
+ 5. Click **Save**.
2945
+
2946
+ To delete an external Reflection, or to delete a raw or aggregation Reflection without using the Reflections editor, run this SQL command:
2947
+
2948
+ Delete a Reflection
2949
+
2950
+ ```
2951
+ ALTER DATASET <DATASET_PATH> DROP REFLECTION <REFLECTION_NAME>
2952
+ ```
2953
+
2954
+ * `DATASET_PATH`: The path of the table or view that the Reflection is defined on.
2955
+ * `REFLECTION_NAME`: The name of the Reflection to drop.
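+
+ For example, to drop the external Reflection created earlier (hypothetical names from the example above):
+
+ Example Drop of a Reflection
+
+ ```
+ ALTER DATASET "myWorkspace"."sales_by_region"
+ DROP REFLECTION "external_sales_by_region"
+ ```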
2956
+
2957
+ ## Related Topics
2958
+
2959
+ * [Data Reflections Deep Dive](https://university.dremio.com/course/data-reflections-deep-dive) – Enroll in this Dremio University course to learn more about Reflections.
2960
+ * [Operational Excellence](/dremio-cloud/help-support/well-architected-framework/operational-excellence/) – Follow best practices in Dremio's Well-Architected Framework for creating and managing Reflections.
2961
+
2962
2983
+
2984
+ <div style="page-break-after: always;"></div>
2985
+
2986
+ # Bring Your Own Project Store | Dremio Documentation
2987
+
2988
+ Original URL: https://docs.dremio.com/dremio-cloud/admin/projects/your-own-project-storage
2989
+
2990
2991
+
2992
+ To enable secure access between Dremio and your AWS environment, you must create an AWS Identity and Access Management (IAM) role with specific permissions and a trust relationship that allows Dremio’s AWS account to assume that role. The IAM policy and trust configuration are detailed below.
2993
+
2994
+ ## Create Your IAM Role
2995
+
2996
+ You will create an IAM Role in your AWS account that grants Dremio the permissions it needs to access your S3 bucket.
2997
+
2998
+ Attach the following policy to the role and replace `<bucket-name>` with the name of your own S3 bucket.
2999
+
3000
+ IAM Policy
3001
+
3002
+ ```
3003
+ {
3004
+ "Version": "2012-10-17",
3005
+ "Statement": [
3006
+ {
3007
+ "Effect": "Allow",
3008
+ "Action": [
3009
+ "s3:GetBucketLocation",
3010
+ "s3:ListAllMyBuckets"
3011
+ ],
3012
+ "Resource": "*"
3013
+ },
3014
+ {
3015
+ "Effect": "Allow",
3016
+ "Action": [
3017
+ "s3:PutObject",
3018
+ "s3:GetObject",
3019
+ "s3:ListBucket",
3020
+ "s3:DeleteObject"
3021
+ ],
3022
+ "Resource": [
3023
+ "arn:aws:s3:::<bucket-name>",
3024
+ "arn:aws:s3:::<bucket-name>/*"
3025
+ ]
3026
+ }
3027
+ ]
3028
+ }
3029
+ ```
3030
+
3031
+ The first statement allows Dremio to find buckets in your account.
3032
+
3033
+ * **ListAllMyBuckets** – Allow Dremio to discover your buckets when validating connectivity.
3034
+ * **GetBucketLocation** - Allow Dremio to discover your bucket's location.
3035
+
3036
+ The second statement allows Dremio to work with the data in your bucket.
3037
+
3038
+ * **PutObject / GetObject / DeleteObject** – Allow Dremio to read, write, and delete data within the bucket.
3039
+ * **ListBucket** – Allow Dremio to enumerate objects in the bucket.
3040
+
3041
+ ## Define the Trust Relationship
3042
+
3043
+ The trust relationship determines which AWS account (in this case, Dremio’s) is permitted to assume your IAM role.
3044
+
3045
+ Attach the following policy to the role.
3046
+
3047
+ Dremio's US trust account ID is `894535543691`.
3048
+
3049
+ Trust Relationship
3050
+
3051
+ ```
3052
+ {
3053
+ "Version": "2012-10-17",
3054
+ "Statement": [
3055
+ {
3056
+ "Effect": "Allow",
3057
+ "Principal": {
3058
+ "AWS": "arn:aws:iam::894535543691:root"
3059
+ },
3060
+ "Action": [
3061
+ "sts:AssumeRole",
3062
+ "sts:TagSession"
3063
+ ]
3064
+ }
3065
+ ]
3066
+ }
3067
+ ```
3068
+
3069
+ * **AssumeRole** - Allows Dremio to assume the provided role.
3070
+ * **TagSession** - Allows Dremio to pass identifying tags during role assumption, enabling improved tracking and auditing across accounts.
3071
+
3072
+ ## Validate Role Configuration
3073
+
3074
+ 1. In the AWS Console, navigate to **IAM → Roles → [Your Role Name]**.
3075
+ 2. Confirm that:
3076
+
3077
+ * The permissions policy matches the example above.
3078
+ * The trust relationship allows the Dremio AWS account as the trusted principal.
3079
+ * Both `sts:AssumeRole` and `sts:TagSession` actions are present.
3080
+ 3. If Dremio provided an AWS account ID or specific region endpoint, ensure these match your configuration.
3081
+
3082
+ ## Provide Role ARN to Dremio
3083
+
3084
+ Once your role is created and validated:
3085
+
3086
+ * Copy the Role ARN (e.g. `arn:aws:iam::<your-account-id>:role/<role-name>`).
3087
+ * Provide this ARN to Dremio via the [Create Project](/dremio-cloud/admin/projects/#create-a-project) flow.
3088
+
3089
+ This allows Dremio to assume the role securely and begin reading/writing data to your S3 bucket.
3090
+
3091
+ ## (Optional) Enable PrivateLink Connectivity
3092
+
3093
+ To enhance security and keep data traffic within AWS’s private network, Dremio supports integration via [AWS PrivateLink](/dremio-cloud/security/privatelink) with DNS-based endpoint resolution.
3094
+
3095
+ **To enable:**
3096
+
3097
+ * Ensure your AWS environment has PrivateLink endpoints configured for the required services.
3098
+ * Verify that DNS resolution is enabled so that Dremio can route traffic to your private endpoints.
3099
+ * Confirm connectivity by testing the endpoint using your VPC configuration.
3100
+
3101
3108
+
3109
+ <div style="page-break-after: always;"></div>
3110
+
3111
+ # Autonomous Reflections | Dremio Documentation
3112
+
3113
+ Original URL: https://docs.dremio.com/dremio-cloud/admin/performance/autonomous-reflections/
3114
+
3115
3116
+
3117
+ Dremio automatically creates and manages Reflections based on query patterns to optimize performance for queries on Iceberg tables, UniForm tables, Parquet datasets, and any views built on these datasets. With Autonomous Reflections, management and maintenance are fully automated, reducing manual effort and ensuring queries run efficiently. This eliminates the need for manual performance tuning while maintaining query correctness.
3118
+
3119
+ note
3120
+
3121
+ For data sources and formats not supported by Autonomous Reflections, you can create [manual Reflections](/dremio-cloud/admin/performance/manual-reflections) to optimize query performance.
3122
+
3123
+ ## What Is a Reflection?
3124
+
3125
+ A Reflection is a precomputed and optimized copy of a query result, designed to speed up query performance. It is derived from an existing table or view, known as its anchor.
3126
+
3127
+ Dremio's query optimizer uses Reflections to accelerate queries by avoiding the need to scan the original data. Instead of querying the raw source, Dremio automatically rewrites queries to use Reflections when they provide the necessary results, without requiring you to reference them directly.
3128
+
3129
+ When Dremio receives a query, it first determines whether any Reflections have at least one table in common with the tables and views referenced by the query. If any Reflections do, Dremio evaluates them to determine whether they satisfy the query. Then, if any Reflections do satisfy the query, Dremio generates a query plan that uses them.
3130
+
3131
+ Dremio then compares the cost of the plan to the cost of executing the query directly against the tables, and selects the plan with the lower cost. Finally, Dremio executes the selected query plan. Typically, plans that use one or more Reflections are less expensive than plans that run against raw data.
3132
+
3133
+ ## How Workloads Are Autonomously Accelerated
3134
+
3135
+ Dremio autonomously creates Reflections to accelerate queries on existing views, queries with joins written directly on base tables (not referencing any views), and queries that summarize data, typically submitted by AI Agents and BI dashboards.
3136
+
3137
+ Reflections are automatically generated based on query patterns without user intervention. Dremio continuously collects metadata from user queries, and the Autonomous Algorithm runs daily at midnight UTC to analyze recent query patterns from the last 7 days and create Autonomous Reflections that accelerate frequent and expensive queries.
3138
+
3139
+ ### Query Qualification
3140
+
3141
+ Only queries meeting the following criteria are considered:
3142
+
3143
+ 1. Based on Iceberg tables, UniForm tables, Parquet datasets, or views built on them. Queries referencing non-Iceberg or non-Parquet datasets, either directly or via a view, are excluded.
3144
+ 2. Execution time longer than one second.
3145
+
3146
+ Dremio may create system-managed views, which cannot be modified or referenced by users, to anchor raw or aggregation Reflections. Admins can drop these views, which also deletes the associated Reflection.
3147
+
3148
+ ### Reflection Limits
3149
+
3150
+ Dremio can create up to 100 Reflections total, with a maximum of 10 new Reflections created per day. The actual number depends on query patterns.
3151
+
3152
+ ## How Autonomous Reflections Are Maintained
3153
+
3154
+ Autonomous Reflections refresh automatically when source data changes:
3155
+
3156
+ * **Iceberg tables**: Refreshed when the table is modified via Dremio (triggered immediately) or other engines (Dremio polls tables every 10 seconds to detect changes).
3157
+ * **UniForm tables**: Refreshed when the table is modified via Dremio (triggered immediately) or other engines (Dremio polls tables every 10 seconds to detect changes).
3158
+ * **Parquet datasets**: Refreshed when metadata updates occur in Dremio.
3159
+
3160
+ **Refresh Engine:** When a project is created, Dremio automatically provisions a Small internal refresh engine dedicated to executing Autonomous Reflection refresh jobs. This ensures Reflections are always accurate and up-to-date without manual refresh. The engine automatically shuts down after 30 seconds of idle time to optimize resource usage and costs.
3161
+
3162
+ ## Usage and Data Freshness
3163
+
3164
+ Dremio uses a Reflection in query plans only when the Reflection has been refreshed with the most recent data in the tables on which it is based. If a Reflection is not yet refreshed, queries automatically fall back to the raw data source, ensuring that query correctness is never compromised.
3165
+
3166
+ ### Monitor Reflections
3167
+
3168
+ To view Autonomous Reflections created for your project and their metadata (including status, score, footprint, and queries accelerated), see [View Reflection Details](/dremio-cloud/admin/performance/manual-reflections/reflection-details).
3169
+
3170
+ To view the history of changes to Autonomous Reflections in the last 30 days:
3171
+
3172
+ 1. Go to **Project Settings** > **Reflections**.
3173
+ 2. Click **History Log**.
3174
+
3175
+ ## Remove Reflections
3176
+
3177
+ Autonomous Reflections can be removed in two ways:
3178
+
3179
+ 1. **Automatic Removal** – When an Autonomous Reflection's score falls below the threshold, it is disabled for 7 days before being automatically dropped. Admins can view disabled Autonomous Reflections in the history log.
3180
+ 2. **Manual Removal** – Admins can manually drop Autonomous Reflections at any time. Autonomous Reflections cannot be modified by users. If an admin manually drops an Autonomous Reflection three times, Dremio will not recreate it for 90 days.
3181
+
3182
+ ## Disable Reflections
3183
+
3184
+ Every project created in Dremio is automatically accelerated with Autonomous Reflections. To disable Autonomous Reflections for a project:
3185
+
3186
+ 1. Go to **Project Settings** > **Preferences**.
3187
+ 2. Toggle the **Autonomous Reflections** setting to off.
3188
+
3189
+ ## Related Topics
3190
+
3191
+ * [Data Product Fundamentals](https://university.dremio.com/course/data-product-fundamentals) – Enroll in this Dremio University course to learn more about Autonomous Reflections.
3192
+
3193
3205
+
3206
+ <div style="page-break-after: always;"></div>
3207
+
3208
+ # View Reflection Details | Dremio Documentation
3209
+
3210
+ Original URL: https://docs.dremio.com/dremio-cloud/admin/performance/manual-reflections/reflection-details
3211
+
3212
3213
+
3214
+ The Reflections page lists all raw and aggregation Reflections in Dremio.
3215
+
3216
+ To view this page, follow these steps:
3217
+
3218
+ 1. In the Dremio console, hover over ![](/images/icons/settings.png) in the side navigation bar and select **Project Settings**.
3219
+ 2. Select **Reflections** in the project settings sidebar.
3220
+
3221
+ For any particular Reflection, the Reflections page presents information that answers these questions:
3222
+
3223
+ | Question | Column with the answer |
3224
+ | --- | --- |
3225
+ | What is the status of this Reflection? | Name |
3226
+ | Is this a raw or aggregation Reflection? | Type |
3227
+ | Which table or view is this Reflection defined on? | Dataset |
3228
+ | How valuable is this Reflection? | Reflection Score |
3229
+ | How was this Reflection created and managed? | Mode |
3230
+ | How can I see a list of the jobs that created and refreshed this Reflection? | Refresh Job History |
3231
+ | How many times has the query planner chosen this Reflection? | Acceleration Count |
3232
+ | How many times has the query planner considered using this Reflection? | Considered Count |
3233
+ | How many times did the query planner match a query to this Reflection? | Matched Count |
3234
+ | How do I find out how effective this Reflection is? | Acceleration Count |
3235
+ | When was this Reflection last refreshed? | Last Refresh From Table |
3236
+ | Is this Reflection being refreshed now? | Refresh Status |
3237
+ | What type of refreshes are used for this Reflection? | Refresh Method |
3238
+ | Are refreshes scheduled for this Reflection, or do they need to be triggered manually? | Refresh Status |
3239
+ | How much time did the most recent refresh of this Reflection take? | Last Refresh Duration |
3240
+ | How many records are in this Reflection? | Record Count |
3241
+ | How much storage is this Reflection taking up? | Current Footprint |
3242
+ | When does this Reflection expire? | Available Until |
3243
+
3244
+ ## Columns
3245
+
3246
+ ### Acceleration Count
3247
+
3248
+ Shows the number of times within the last 30 days that the query planner considered using a Reflection defined on a dataset referenced by a query, determined the Reflection could be used to satisfy the query, and chose to use the Reflection to satisfy the query.
3249
+
3250
+ If this count is low relative to the numbers in the **Considered Count** and **Matched Count** columns, the Reflection is not effective in reducing the execution times of queries on the dataset.
3251
+
3252
+ ### Available Until
3253
+
3254
+ Shows the date and time when this Reflection expires, based on the refresh policy of the queried dataset.
3255
+
3256
+ If a Reflection is set to expire soon and you want to continue using it, you can take either of these actions:
3257
+
3258
+ * Change the expiration setting on the table that the Reflection is directly or indirectly defined on. A Reflection is indirectly defined on a table when it is defined on a view that is derived from that table. When you change the setting by using this method, the change goes into effect after the next refresh.
3259
+ * Change the expiration setting on the data source where the table is located.
3260
+
3261
+ For the steps, see [Set the Reflection Expiration Policy](/dremio-cloud/admin/performance/manual-reflections/reflection-refresh#set-the-reflection-expiration-policy).
3262
+
3263
+ ### Mode
3264
+
3265
+ Shows how the Reflection was created and is managed.
3266
+
3267
+ * **autonomous**: Created and managed by Dremio
3268
+ * **manual**: Created and managed by the user
3269
+
3270
+ ### Considered Count
3271
+
3272
+ Shows the number of queries, within the last 30 days, that referenced the dataset that a Reflection is defined on. Whenever a query references a dataset on which a Reflection is defined, the query planner considers whether to use the Reflection to help satisfy the query.
3273
+
3274
+ If the query planner determines that the Reflection can do that (that the Reflection matches the query), the query planner compares the Reflection to any others that might also be defined on the same dataset.
3275
+
3276
+ If the query planner does not determine this, it ignores the Reflection.
3277
+
3278
+ Reflections with high considered counts and no matched counts contribute to high logical planning times. Consider deleting them.
3279
+
3280
+ Reflections with a considered count of 0 should be removed. They are merely taking up storage and, during refreshes, resources on compute engines.
3281
+
3282
+ ### Current Footprint
3283
+
3284
+ Shows the current size, in kilobytes, of a Reflection.
3285
+
3286
+ ### Dataset
3287
+
3288
+ Shows the name of the table or view that a Reflection is defined on.
3289
+
3290
+ ### Last Refresh Duration
3291
+
3292
+ Shows the length of time required for the most recent refresh of a Reflection.
3293
+
3294
+ ### Last Refresh From Table
3295
+
3296
+ Shows the date and time that the Reflection data was last refreshed. If the refresh is running, failing, or disabled, the value is `12/31/1969 23:59:59`.
3297
+
3298
+ ### Matched Count
3299
+
3300
+ Shows the number of times, within the last 30 days, that the query planner both considered a Reflection for satisfying a query and determined that the Reflection would in fact satisfy the query. However, the query planner might have decided to use a different Reflection that also matched the query. For example, a different query plan that did not include the Reflection might have had a lower cost.
3301
+
3302
+ This number does not show how many times the query planner used the Reflection to satisfy the query. For that number, see Acceleration Count.
3303
+
3304
+ If the matched count is high and the acceleration count is low, the query planner is more often deciding to use a different Reflection that also matches a query. In this case, consider deleting the Reflection.
3305
+
3306
+ ### Name
3307
+
3308
+ Shows the name of the Reflection and its status. The tooltip on the icon represents a combination of the status of the Reflection (which you can filter on through the values in the **Acceleration Status** field above the list) and the value in the **Refresh Status** column.
3309
+
3310
+ ### Record Count
3311
+
3312
+ Shows the number of records in the Reflection.
3313
+
3314
+ ### Reflection Score
3315
+
3316
+ Shows the score for a Reflection on a scale of 0 (worst) to 100 (best). The score indicates the value that the Reflection provides to your workloads based on the jobs that have been executed in the last 7 days. Reflection scores are calculated once each day. Factors considered in the score include the number of jobs accelerated by the Reflection and the expected improvement in query run times due to the Reflection.
3317
+
3318
+ To help you interpret the scores, the scores have the following labels:
3319
+
3320
+ * **Good**: The score is more than 75.
3321
+ * **Fair**: The score is 25 to 75.
3322
+ * **Poor**: The score is less than 25.
3323
+ * **New**: The score is blank because the Reflection was created within the past 24 hours.
3324
+
3325
+ note
3326
+
3327
+ If a Reflection's score is listed as **-**, the score needs to be recalculated due to an error or an upgraded instance.
3328
+
3329
+ ### Refresh Job History
3330
+
3331
+ Opens a list of all of the jobs that created and refreshed a Reflection.
3332
+
3333
+ ### Refresh Method
3334
+
3335
+ Shows which type of refresh was last used for a Reflection.
3336
+
3337
+ * **Full**: All of the data in the Reflection was replaced. The new data is based on the current data in the underlying dataset.
3338
+ * **Incremental**:
3339
+ + For Reflections defined on Apache Iceberg tables: Either snapshot-based incremental refresh was used (if the changes were appends only) or partition-based incremental refresh was used (if the changes included DML operations).
3340
+ + For Reflections defined on Delta Lake tables: This value does not appear. Only full refreshes are supported for these Reflections.
3341
+ + For Reflections defined on all other tables: Data added to the underlying dataset since the last refresh of the Reflection was appended to the existing data in the Reflection.
3342
+ * **None**: Incremental refreshes were selected in the settings for the table. However, Dremio has not confirmed that it is possible to refresh the Reflection incrementally. Applies only to Reflections that are not defined on Iceberg or Delta Lake tables.
3343
+
3344
+ For more information, see [Refresh Reflections](/dremio-cloud/admin/performance/manual-reflections/reflection-refresh).
3345
+
3346
+ ### Refresh Status
3347
+
3348
+ Shows one of these values:
3349
+
3350
+ * **Manual**: Refreshes are not run on a schedule, but must be triggered manually. See [Trigger Reflection Refreshes](/dremio-cloud/admin/performance/manual-reflections/reflection-refresh#trigger-reflection-refreshes).
3351
+ * **Pending**: If the Reflection depends on other Reflections, the refresh will begin after the refreshes of the other Reflections are finished.
3352
+ * **Running**: The Reflection is currently being refreshed.
3353
+ * **Scheduled**: Refreshes run on a schedule, but a refresh is not currently running.
3354
+ * **Auto**: All of the Reflection’s underlying tables are in Iceberg format, and the Reflection automatically refreshes when new snapshots are created after an update to an underlying table, but a refresh is not currently running.
3355
+ * **Failed**: Multiple attempts to refresh a Reflection have failed. You must disable and enable the Reflection to rebuild it and continue using it. Reflections in this state will not be considered to accelerate queries.
3356
+
3357
+ For more information, see [Refresh Reflections](/dremio-cloud/admin/performance/manual-reflections/reflection-refresh).
3358
+
3359
+ ### Total Footprint
3360
+
3361
+ Shows the current size, in kilobytes, of all of the existing materializations of the Reflection. More than one materialization of a Reflection can exist at the same time, so that refreshes do not interrupt running queries that are being satisfied by the Reflection.
3362
+
3363
+ ### Type
3364
+
3365
+ Shows whether the Reflection is a raw or aggregation Reflection.
3366
+
3367
3387
+
3388
+ <div style="page-break-after: always;"></div>
3389
+
3390
+ # Refresh Reflections | Dremio Documentation
3391
+
3392
+ Original URL: https://docs.dremio.com/dremio-cloud/admin/performance/manual-reflections/reflection-refresh
3393
+
3394
3395
+
3396
+ The data in a Reflection can become stale and may need to be refreshed. Refreshing a Reflection triggers two updates:
3397
+
3398
+ * The data stored in the Apache Iceberg table for the Reflection is updated.
3399
+ * The metadata that stores details about the Reflection is updated.
3400
+
3401
+ note
3402
+
3403
+ Dremio does not refresh the data that external Reflections are mapped to.
3404
+
3405
+ ## Types of Reflection Refresh
3406
+
3407
+ How Reflections are refreshed depends on the format of the base table.
3408
+
3409
+ ### Apache Iceberg Tables, Filesystem Sources, AWS Glue Sources, and Hive Sources
3410
+
3411
+ There are two methods that can be used to refresh Reflections that are defined either on Iceberg tables or on these types of datasets in filesystem, AWS Glue, and Hive sources:
3412
+
3413
+ * Parquet datasets in Filesystem sources (on S3, Azure Storage, Google Cloud Storage, or HDFS)
3414
+ * Parquet datasets, Avro datasets, or non-transactional ORC datasets on AWS Glue or Hive (Hive 2 or Hive 3) sources
3415
+
3416
+ Iceberg tables in all supported file-system sources (Amazon S3, Azure Storage, Google Cloud Storage, and HDFS) and non-file-system sources (AWS Glue, Hive, and Nessie) can be refreshed with either of these methods:
3417
+
3418
+ * Incremental refreshes
3419
+ * Full refreshes
3420
+
3421
+ #### Incremental Refreshes
3422
+
3423
+ There are two types of incremental refreshes:
3424
+
3425
+ * Incremental refreshes when changes to an anchor table are only append operations
3426
+ * Incremental refreshes when changes to an anchor table include non-append operations
3427
+
3428
+ note
3429
+
3430
+ * Whether an incremental refresh can be performed depends on the outcome of an algorithm.
3431
+ * The initial refresh of a Reflection is always a full refresh.
3432
+
3433
+ #### Incremental Refreshes When Changes to an Anchor Table Are Only Append Operations
3434
+
3435
+ note
3436
+
3437
+ Optimize operations on Iceberg tables are also supported for this type of incremental refresh.
3438
+
3439
+ This type of incremental refresh is used only when the changes to the anchor table are appends and do not include updates or deletes. There are two cases to consider:
3440
+
3441
+ * When a Reflection is defined on one anchor table
3442
+
3443
+ When a Reflection is defined on an anchor table or on a view that is defined on one anchor table, an incremental refresh is based on the differences between the current snapshot of the anchor table and the snapshot at the time of the last refresh.
3444
+ * When a Reflection is defined on a view that joins two or more anchor tables
3445
+
3446
+ When a Reflection is defined on a view that joins two or more anchor tables, whether an incremental refresh can be performed depends on how many anchor tables have changed since the last refresh of the Reflection:
3447
+
3448
+ + If just one of the anchor tables has changed since the last refresh, an incremental refresh can be performed. It is based on the differences between the current snapshot of the one changed anchor table and the snapshot at the time of the last refresh.
3449
+ + If two or more anchor tables have changed since the last refresh, then a full refresh is used to refresh the Reflection.
3450
+
3451
+ #### Incremental Refreshes When Changes to an Anchor Table Include Non-append Operations
3452
+
3453
+ For Iceberg tables, this type of incremental refresh is used when the changes are DML operations that delete or modify the data (UPDATE, DELETE, etc.) made either through the Copy-on-Write (COW) or the Merge-on-Read (MOR) storage mechanism. For more information about COW and MOR, see [Row-Level Changes on the Lakehouse: Copy-On-Write vs. Merge-On-Read in Apache Iceberg](https://www.dremio.com/blog/row-level-changes-on-the-lakehouse-copy-on-write-vs-merge-on-read-in-apache-iceberg/).
3454
+
3455
+ For sources in filesystems or AWS Glue, non-append operations can include, for example:
3456
+
3457
+ * In filesystem sources, files being deleted from Parquet datasets
3458
+ * In AWS Glue sources, DML-equivalent operations being performed on Parquet datasets, Avro datasets, or non-transactional ORC datasets
3459
+
3460
+ Both the anchor table and the Reflection must be partitioned, and the partition transforms that they use must be compatible.
3461
+
3462
+ There are two cases to consider:
3463
+
3464
+ * When a Reflection is defined on one anchor table
3465
+
3466
+ When a Reflection is defined on an anchor table or on a view that is defined on one anchor table, an incremental refresh is based on Iceberg metadata that is used to identify modified partitions and to restrict the scope of the refresh to only those partitions.
3467
+ * When a Reflection is defined on a view that joins two or more anchor tables
3468
+
3469
+ When a Reflection is defined on a view that joins two or more anchor tables, whether an incremental refresh can be performed depends on how many anchor tables have changed since the last refresh of the Reflection:
3470
+
3471
+ + If just one of the anchor tables has changed since the last refresh, an incremental refresh can be performed. It is based on Iceberg metadata that is used to identify modified partitions and to restrict the scope of the refresh to only those partitions.
3472
+ + If two or more anchor tables have changed since the last refresh, then a full refresh is used to refresh the Reflection.
3473
+
3474
+ note
3475
+
3476
+ Dremio uses Iceberg tables to store metadata for filesystem and AWS Glue sources.
3477
+
3478
+ For information about partitioning Reflections and applying partition transforms, see the section [Horizontally Partition Reflections that Have Many Rows](/dremio-cloud/help-support/well-architected-framework/operational-excellence/#horizontally-partition-reflections-that-have-many-rows).
3479
+
3480
+ For information about partitioning Reflections in ways that are compatible with the partitioning of anchor tables, see [Partition Reflections to Allow for Partition-Based Incremental Refreshes](/dremio-cloud/help-support/well-architected-framework/operational-excellence/#partition-reflections-to-allow-for-partition-based-incremental-refreshes).
3481
+
3482
+ #### Full Refreshes
3483
+
3484
+ In a full refresh, a Reflection is dropped, recreated, and loaded.
3485
+
3486
+ note
3487
+
3488
+ * Whether a full refresh is performed depends on the outcome of an algorithm.
3489
+ * The initial refresh of a Reflection is always a full refresh.
3490
+
3491
+ #### Algorithm for Determining Whether an Incremental or a Full Refresh Is Used
3492
+
3493
+ The following algorithm determines which refresh method is used:
3494
+
3495
+ 1. If the Reflection has never been refreshed, then a full refresh is performed.
3496
+ 2. If the Reflection is created from a view that uses nested group-bys, unions, window functions, or joins other than inner or cross joins, then a full refresh is performed.
3497
+ 3. If the Reflection is created from a view that joins two or more anchor tables and more than one anchor table has changed since the previous refresh, then a full refresh is performed.
3498
+ 4. If the Reflection is based on a view and the changed anchor table is used multiple times in that view, then a full refresh is performed.
3499
+ 5. If the changes to the anchor table are only appends, then an incremental refresh based on table snapshots is performed.
3500
+ 6. If the changes to the anchor table include non-append operations, then the compatibility of the partitions of the anchor table and the partitions of the Reflection is checked:
3501
+ * If the partitions of the anchor table and the partitions of the Reflection are not compatible, or if either the anchor table or the Reflection is not partitioned, then a full refresh is performed.
3502
+ * If the partition scheme of the anchor table has been changed since the last refresh to be incompatible with the partitioning scheme of a Reflection, and if changes have occurred to data belonging to a prior partition scheme or the new partition scheme, then a full refresh is performed.
3503
+ To avoid a full refresh when these two conditions hold, update the partition scheme for the Reflection to match the partition scheme for the table. You do so in the **Advanced** view of the Reflection editor or through the `ALTER DATASET` SQL command.
3504
+ * If the partitions of the anchor table and the partitions of the Reflection are compatible, then an incremental refresh is performed.
3505
+
3506
+ Because this algorithm is used to determine which type of refresh to perform, you do not select a type of refresh for Reflections in the settings of the anchor table.
3507
+
3508
+ However, no data is read in the `REFRESH REFLECTION` job for Reflections that depend only on Iceberg, Parquet, Avro, or non-transactional ORC datasets, or on other Reflections, and that, based on the table snapshots, have no new data since the last refresh. Instead, a "no-op" Reflection refresh is planned and a materialization is created, eliminating redundancy and minimizing the cost of a full or incremental Reflection refresh.
3509
+
3510
+ ### Delta Lake Tables
3511
+
3512
+ Only full refreshes are supported. In a full refresh, the Reflection being refreshed is dropped, recreated, and loaded.
3513
+
3514
+ ### All Other Tables
3515
+
3516
+ * **Incremental refreshes**
3517
+
3518
+ Dremio appends data to the existing data for a Reflection. Incremental refreshes are faster than full refreshes for large Reflections, and are appropriate for Reflections that are defined on tables that are not partitioned.
3519
+
3520
+ There are two ways in which Dremio can identify new records:
3521
+
3522
+ + **For directory datasets in file-based data sources like S3 and HDFS:**
3523
+ Dremio can automatically identify new files in the directory that were added after the prior refresh.
3524
+ + **For all other datasets (such as datasets in relational or NoSQL databases):**
3525
+ An administrator specifies a strictly monotonically increasing field, such as an auto-incrementing key, that must be of type BigInt, Int, Timestamp, Date, Varchar, Float, Double, or Decimal. This allows Dremio to find and fetch the records that have been created since the last time the acceleration was incrementally refreshed.
3526
+
3527
+ caution
3528
+
3529
+ Use incremental refreshes only for Reflections that are based on tables and views that are appended to. If records can be updated or deleted in a table or view, use full refreshes for the Reflections that are based on that table or view.
3530
+ * **Full refreshes**
3531
+
3532
+ In a full refresh, the Reflection being refreshed is dropped, recreated, and loaded.
3533
+
3534
+ Full refreshes are always used in these three cases:
3535
+
3536
+ + A Reflection is partitioned on one or more fields.
3537
+ + A Reflection is created on a table that was promoted from a file, rather than from a folder, or is created on a view that is based on such a table.
3538
+ + A Reflection is created from a view that uses nested group-bys, joins, unions, or window functions.
3539
+
3540
+ ## Specify the Reflection Refresh Policy
3541
+
3542
+ In the settings for a data source, you specify the refresh policy for refreshes of all Reflections that are on the tables in that data source. The default policy is period-based, with one hour between each refresh. If you select a schedule policy, the default is every day at 8:00 a.m. (UTC).
3543
+
3544
+ In the settings for a table that is not in the Iceberg or Delta Lake format, you can specify the type of refresh to use for all Reflections that are ultimately derived from the table. The default refresh type is **Full refresh**.
3545
+
3546
+ For tables in all supported table formats, you can specify a refresh policy for Reflection refreshes that overrides the policy specified in the settings for the table's data source. The default policy is the schedule set at the source of the table.
3547
+
3548
+ To set the refresh policy on a data source:
3549
+
3550
+ 1. In the Dremio console, right-click a data lake or external source.
3551
+ 2. Select **Edit Details**.
3552
+ 3. In the sidebar of the Edit Source window, select **Reflection Refresh**.
3553
+ 4. When you are done making your selections, click **Save**. Your changes go into effect immediately.
3554
+
3555
+ To edit the refresh policy on a table:
3556
+
3557
+ 1. Locate the table.
3558
+ 2. Hover over the row in which it appears and click ![The Settings icon](/images/cloud/settings-icon.png "The Settings icon") to the right.
3559
+ 3. Select **Reflection Refresh** in the dataset settings sidebar.
3560
+ 4. When you are done making your selections, click **Save**. Your changes go into effect immediately.
3561
+
3562
+ ### Types of Refresh Policies
3563
+
3564
+ Datasets and sources can set Reflections to refresh according to the following policy types:
3565
+
3566
+ | Refresh policy type | Description |
3567
+ | --- | --- |
3568
+ | Never | Reflections are not refreshed. |
3569
+ | Period (default) | Reflections refresh at a specified interval of hours, days, or weeks. The default refresh period is one hour. |
3570
+ | Schedule | Reflections refresh at a specific time on the specified days of the week, in UTC. The default is every day at 8:00 a.m. (UTC). |
3571
+ | Auto refresh when Iceberg table data changes | Reflections refresh automatically whenever the underlying Iceberg tables are updated. Reflections under this policy type are known as Live Reflections. Live Reflections are also updated based on the minimum refresh frequency defined by the source-level policy. This refresh policy is available only for data sources that support the Iceberg table format. |
3572
+
3573
+ ## Set the Reflection Expiration Policy
3574
+
3575
+ Rather than delete a Reflection manually, you can specify how long you want Dremio to retain the Reflection before deleting it automatically.
3576
+
3577
+ note
3578
+
3579
+ Dremio does not allow expiration policies to be set on external Reflections or on Live Reflections (Reflections whose refresh policy automatically refreshes them when Iceberg table data changes).
3580
+
3581
+ To set the expiration policy for all Reflections derived from tables in a data source:
3582
+
3583
+ 1. Right-click a data lake or external source.
3584
+ 2. Select **Edit Details**.
3585
+ 3. Select **Reflection Refresh** in the edit source sidebar.
3586
+ 4. After making your changes, click **Save**. The changes take effect on the next refresh.
3587
+
3588
+ To set the expiration policy on Reflections derived from a particular table:
3589
+
3590
+ note
3591
+
3592
+ The table must be based on more than one file.
3593
+
3594
+ 1. Locate a table.
3595
+ 2. Click the ![The Settings icon](/images/cloud/settings-icon.png "The Settings icon") to its right.
3596
+ 3. Select **Reflection Refresh** in the dataset settings sidebar.
3597
+ 4. After making your changes, click **Save**. The changes take effect on the next refresh.
3598
+
3599
+ ## View the Reflection Refresh History
3600
+
3601
+ You can find out whether, and how many times, refresh jobs for a Reflection have run.
3602
+
3603
+ To view the refresh history:
3604
+
3605
+ 1. In the Dremio console, go to the catalog or folder that lists the table or view from which the Reflection was created.
3606
+ 2. Hover over the row for the table or view.
3607
+ 3. In the **Actions** field, click ![The Settings icon](/images/cloud/settings-icon.png "The Settings icon").
3608
+ 4. Select **Reflections** in the dataset settings sidebar.
3609
+ 5. Click **History** in the heading for the Reflection.
3610
+
3611
+ The Jobs page opens with the Reflection's ID in the search box, and only jobs related to that ID are listed.
3612
+
3613
+ When a Reflection is refreshed, Dremio runs a single job with two steps:
3614
+
3615
+ * The first step writes the query results as a materialization to the distributed acceleration storage by running a `REFRESH REFLECTION` command (illustrated below).
3616
+ * The second step registers the materialization table and its metadata with the catalog so that the query optimizer can find the Reflection's definition and structure.
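+
+ In the job's SQL, that command takes roughly the following shape. This is a hedged illustration only: both identifiers are hypothetical placeholders, and the exact argument form may vary by version.
+
+ ```
+ -- Internal command that materializes a Reflection. The Reflection ID
+ -- and materialization ID below are placeholders, not real IDs.
+ REFRESH REFLECTION '6bc83e1a-0000-0000-0000-000000000000' AS 'f4a2c9d0-0000-0000-0000-000000000000'
+ ```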
3617
+
3618
+ The following screenshot shows the `REFRESH REFLECTION` command used to refresh the Reflection named `Super-duper reflection`:
3619
+
3620
+ ![Reflection refresh job listed on the Jobs page in the Dremio console](/images/sw_reflection_creation_command.png "Reflection refresh job listed on the Jobs page")
3621
+
3622
+ The Reflection refresh is listed as a single job on the Jobs page, as shown in the example below:
3623
+
3624
+ ![Reflection refresh job listed on the Jobs page in the Dremio console](/images/sw_reflection_creation_single_job.png "Reflection refresh job listed on the Jobs page")
3625
+
3626
+ To find out which type of refresh was performed:
3627
+
3628
+ 1. Click the ID of the job that ran the `REFRESH REFLECTION` command.
3629
+ 2. Click the **Raw Profile** tab.
3630
+ 3. Click the **Planning** tab.
3631
+ 4. Scroll down to the **Refresh Decision** section.
3632
+
3633
+ ## Retry Policy for Reflection Refreshes
3634
+
3635
+ When a Reflection refresh job fails, Dremio retries the refresh according to a uniform policy. This policy is designed to balance resource consumption with the need to keep Reflection data up to date. It prioritizes newly failed Reflections over those that fail persistently, reducing excessive retries, and helps ensure that Reflection data does not become overly stale.
3636
+
3637
+ After a refresh failure, Dremio by default repeats the refresh attempt at exponentially increasing intervals, capped at 4 hours: 1 minute, 2 minutes, 5 minutes, 15 minutes, 30 minutes, 1 hour, 2 hours, and 4 hours (a cumulative 473 minutes, just under 8 hours). After that, Dremio continues trying to refresh the Reflection every 4 hours.
3638
+
3639
+ There are two optimizations for special cases:
3640
+
3641
+ * **Long-running refresh jobs**: The backoff interval is never shorter than the duration of the last successful refresh.
3642
+ * **Small maximum retry attempts**: Even when the configured maximum number of attempts is small, at least one 4-hour backoff attempt is guaranteed, so the retry policy still provides meaningful coverage.
3643
+
3644
+ Dremio stops retrying after 24 attempts, which typically takes about 71 hours and 52 minutes, or when the 72-hour retry window is reached, whichever comes first.
3645
+
3646
+ To change the maximum number of retry attempts for Reflection refreshes from Dremio's default of 24:
3647
+
3648
+ 1. Click ![The Settings icon](/images/cloud/green-settings-icon.png "The Settings icon") in the left navbar.
3649
+ 2. Select **Reflections** in the left sidebar.
3650
+ 3. On the Reflections page, click ![The Settings icon](/images/cloud/settings-icon.png "The Settings icon") in the top-right corner and select **Acceleration Settings**.
3651
+ 4. In the field next to **Maximum attempts for Reflection job failures**, specify the maximum number of retries.
3652
+ 5. Click **Save**. The change goes into effect immediately.
3653
+
3654
+ Dremio applies the retry policy after a refresh failure for all types of Reflection refreshes, regardless of whether the refresh was triggered manually or by a refresh policy.
3655
+
3656
+ ## Trigger Reflection Refreshes
3657
+
3658
+ You can click a button to start the refresh of all of the Reflections that are defined on a table or on views derived from that table.
3659
+
3660
+ To trigger a refresh manually:
3661
+
3662
+ 1. Locate the table.
3663
+ 2. Hover over the row in which it appears and click ![The Settings icon](/images/cloud/settings-icon.png "The Settings icon") to the right.
3664
+ 3. In the sidebar of the Dataset Settings window, click **Reflection Refresh**.
3665
+ 4. Click **Refresh Now**. The message "All dependent Reflections will be refreshed." appears at the top of the screen.
3666
+ 5. Click **Save**.
3667
+
3668
+ You can also refresh Reflections by using the Reflection API, the Catalog API, or the SQL commands [`ALTER TABLE`](/dremio-cloud/sql/commands/alter-table) and [`ALTER VIEW`](/dremio-cloud/sql/commands/alter-view).
3669
+
3670
+ * With the Reflection API, you specify the ID of a Reflection. See [Refresh a Reflection](/dremio-cloud/api/reflection/#refresh-a-reflection).
3671
+ * With the Catalog API, you specify the ID of a table or view that the Reflections are defined on. See [Refresh the Reflections on a Table](/dremio-cloud/api/catalog/table#refresh-the-reflections-on-a-table) and [Refresh the Reflections on a View](/dremio-cloud/api/catalog/view#refresh-the-reflections-on-a-view).
3672
+ * With the [`ALTER TABLE`](/dremio-cloud/sql/commands/alter-table) and [`ALTER VIEW`](/dremio-cloud/sql/commands/alter-view) commands, you specify the path and name of the table or view that the Reflections are defined on, as sketched below.
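+
+ A minimal sketch of the SQL approach, assuming the `REFRESH REFLECTIONS` clause of the linked commands (the table and view paths are hypothetical placeholders):
+
+ ```
+ -- Refresh all Reflections that depend on a hypothetical table.
+ ALTER TABLE source1.sales.orders REFRESH REFLECTIONS;
+
+ -- Refresh all Reflections related to a hypothetical view.
+ ALTER VIEW space1.analytics.daily_orders REFRESH REFLECTIONS;
+ ```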
3673
+
3674
+ The refresh action follows this logic for the Reflection API:
3675
+
3676
+ * If the Reflection is defined on a view, the action refreshes the Reflections that are defined on the tables and views that the anchor view is derived from, directly or indirectly, as well as the Reflections that are defined on the anchor view's downstream/dependent views.
3677
+ * If the Reflection is defined on a table, the action refreshes the Reflections that are defined on the table and all Reflections that are defined on the downstream/dependent views of the anchor table.
3678
+
3679
+ The refresh action follows similar logic for the Catalog API and the SQL commands:
3680
+
3681
+ * If the action is started on a view, it refreshes the Reflections that are defined on the tables and views that the view is derived from, directly or indirectly, as well as the Reflections that are defined on the view's downstream/dependent views.
3682
+ * If the action is started on a table, it refreshes the Reflections that are defined on the table and all Reflections that are defined on the downstream/dependent views of the anchor table.
3683
+
3684
+ For example, suppose that you had the following tables and views, with Reflections R1 through R5 defined on them (hypothetical SQL definitions for these objects are sketched after the list below):
3685
+
3686
+ ```
3687
+ View2(R5)
3688
+ / \
3689
+ View1(R3) Table3(R4)
3690
+ / \
3691
+ Table1(R1) Table2(R2)
3692
+ ```
3693
+
3694
+ * Refreshing Reflection R5 through the API also refreshes R1, R2, R3, and R4.
3695
+ * Refreshing Reflection R4 through the API also refreshes R5.
3696
+ * Refreshing Reflection R3 through the API also refreshes R1, R2, and R5.
3697
+ * Refreshing Reflection R2 through the API also refreshes R3 and R5.
3698
+ * Refreshing Reflection R1 through the API also refreshes R3 and R5.
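+
+ To make the dependency tree concrete, the objects in it could be defined roughly as follows. This is a hypothetical sketch; the space, source, and column names are placeholders.
+
+ ```
+ -- View1 joins Table1 and Table2, so R3 sits above R1 and R2.
+ CREATE VIEW space1.View1 AS
+   SELECT t1.id, t2.amount
+   FROM source1.Table1 AS t1
+   JOIN source1.Table2 AS t2 ON t1.id = t2.id;
+
+ -- View2 joins View1 and Table3, so R5 sits above R3 and R4.
+ CREATE VIEW space1.View2 AS
+   SELECT v1.id, v1.amount, t3.region
+   FROM space1.View1 AS v1
+   JOIN source1.Table3 AS t3 ON v1.id = t3.id;
+ ```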
3699
+
3700
+ ## Obtain Reflection IDs
3701
+
3702
+ You will need one or more Reflection IDs for some of the Reflection hints. Reflection IDs can be found in three places: the Acceleration section of the raw profile of the job that ran a query using the Reflection, the [`SYS.PROJECT.REFLECTIONS`](/dremio-cloud/sql/system-tables/reflections) system table, and the Reflection summary objects that you retrieve with the Reflection API.
3703
+
3704
+ To find the ID of a Reflection in the Acceleration section of the raw profile of the job that ran a query that used the Reflection:
3705
+
3706
+ 1. In the Dremio console, click ![The Jobs icon](/images/jobs-icon.png "The Jobs icon") in the side navigation bar.
3707
+ 2. In the list of jobs, locate the job that ran the query. If the query was satisfied by a Reflection, ![This is the icon that indicates a Reflection was used.](/images/icons/reflections.png "Reflections icon") appears after the name of the user who ran the query.
3708
+ 3. Click the ID of the job.
3709
+ 4. Click **Raw Profile** at the top of the page.
3710
+ 5. Click the **Acceleration** tab.
3711
+ 6. In the Reflection Outcome section, locate the ID of the Reflection.
3712
+
3715
+ To find the ID of a Reflection in the `SYS.PROJECT.REFLECTIONS` system table:
3716
+
3717
+ 1. In the Dremio console, click ![The SQL Runner icon](/images/sql-runner-icon.png "The SQL Runner icon") in the left navbar.
3718
+ 2. Copy this query and paste it into the SQL editor:
3719
+
3720
+ Query for listing info about all existing Reflections
3721
+
3722
+ ```
3723
+ SELECT * FROM SYS.PROJECT.REFLECTIONS
3724
+ ```
3725
+ 3. Sort the results on the `dataset_name` column.
3726
+ 4. In the `dataset_name` column, locate the name of the dataset that the Reflection was defined on.
3727
+ 5. Scroll the table to the right to look through the display columns, dimensions, measures, sort columns, and partition columns to find the combination of attributes that define the Reflection.
3728
+ 6. Scroll the table all the way to the left to find the ID of the Reflection. (Or skip the scrolling and narrow the query, as sketched below.)
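+
+ A hedged sketch of such a narrowed query, assuming the system table exposes `reflection_id` and `reflection_name` columns alongside `dataset_name` (the dataset name pattern is a placeholder):
+
+ ```
+ -- List Reflection IDs for datasets whose name matches a pattern.
+ SELECT reflection_id, reflection_name, dataset_name
+ FROM SYS.PROJECT.REFLECTIONS
+ WHERE dataset_name LIKE '%daily_orders%'
+ ```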
3729
+
3732
+ To find the ID of a Reflection by using REST APIs:
3733
+
3734
+ 1. Obtain the ID of the table or view that the Reflection was defined on by retrieving either the [table](/dremio-cloud/api/catalog/table#retrieve-a-table-by-path) or [view](/dremio-cloud/api/catalog/view#retrieve-a-view-by-path) by its path.
3735
+ 2. [Use the Reflections API to retrieve a list of all of the Reflections that are defined on the table or view](/dremio-cloud/api/reflection/#retrieve-all-reflections-for-a-dataset).
3736
+ 3. In the response, locate the Reflection by its combination of attributes.
3737
+ 4. Copy the Reflection's ID.
3738
+
3753
+ <div style="page-break-after: always;"></div>
3754
+