dremiojs 1.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.eslintrc.json +14 -0
- package/.prettierrc +7 -0
- package/README.md +59 -0
- package/dremiodocs/dremio-cloud/cloud-api-reference.md +748 -0
- package/dremiodocs/dremio-cloud/dremio-cloud-about.md +225 -0
- package/dremiodocs/dremio-cloud/dremio-cloud-admin.md +3754 -0
- package/dremiodocs/dremio-cloud/dremio-cloud-bring-data.md +6098 -0
- package/dremiodocs/dremio-cloud/dremio-cloud-changelog.md +32 -0
- package/dremiodocs/dremio-cloud/dremio-cloud-developer.md +1147 -0
- package/dremiodocs/dremio-cloud/dremio-cloud-explore-analyze.md +2522 -0
- package/dremiodocs/dremio-cloud/dremio-cloud-get-started.md +300 -0
- package/dremiodocs/dremio-cloud/dremio-cloud-help-support.md +869 -0
- package/dremiodocs/dremio-cloud/dremio-cloud-manage-govern.md +800 -0
- package/dremiodocs/dremio-cloud/dremio-cloud-overview.md +36 -0
- package/dremiodocs/dremio-cloud/dremio-cloud-security.md +1844 -0
- package/dremiodocs/dremio-cloud/sql-docs.md +7180 -0
- package/dremiodocs/dremio-software/dremio-software-acceleration.md +1575 -0
- package/dremiodocs/dremio-software/dremio-software-admin.md +884 -0
- package/dremiodocs/dremio-software/dremio-software-client-applications.md +3277 -0
- package/dremiodocs/dremio-software/dremio-software-data-products.md +560 -0
- package/dremiodocs/dremio-software/dremio-software-data-sources.md +8701 -0
- package/dremiodocs/dremio-software/dremio-software-deploy-dremio.md +3446 -0
- package/dremiodocs/dremio-software/dremio-software-get-started.md +848 -0
- package/dremiodocs/dremio-software/dremio-software-monitoring.md +422 -0
- package/dremiodocs/dremio-software/dremio-software-reference.md +677 -0
- package/dremiodocs/dremio-software/dremio-software-security.md +2074 -0
- package/dremiodocs/dremio-software/dremio-software-v25-api.md +32637 -0
- package/dremiodocs/dremio-software/dremio-software-v26-api.md +36757 -0
- package/jest.config.js +10 -0
- package/package.json +25 -0
- package/src/api/catalog.ts +74 -0
- package/src/api/jobs.ts +105 -0
- package/src/api/reflection.ts +77 -0
- package/src/api/source.ts +61 -0
- package/src/api/user.ts +32 -0
- package/src/client/base.ts +66 -0
- package/src/client/cloud.ts +37 -0
- package/src/client/software.ts +73 -0
- package/src/index.ts +16 -0
- package/src/types/catalog.ts +31 -0
- package/src/types/config.ts +18 -0
- package/src/types/job.ts +18 -0
- package/src/types/reflection.ts +29 -0
- package/tests/integration_manual.ts +95 -0
- package/tsconfig.json +19 -0
@@ -0,0 +1,3754 @@

# Administration | Dremio Documentation

Original URL: https://docs.dremio.com/dremio-cloud/admin/

Dremio administration covers organization-wide and project-level management. Use these tools to configure your environment, manage users and resources, and monitor system performance.

## Organization Management

* [Manage Your Subscription](/dremio-cloud/admin/subscription/) – Upgrade from a free trial to a paid subscription, manage billing and payment methods, and track your organization's usage and costs.
* [Manage Users](/dremio-cloud/admin/users) – Add users to your organization, configure authentication methods (local or SSO), manage user roles and privileges, and control access to Dremio resources.
* [Configure Model Providers](/dremio-cloud/admin/model-providers) – Configure AI model providers for Dremio's AI Agent, enabling natural language queries and data exploration across your organization.

## Project Management

* [Manage Projects](/dremio-cloud/admin/projects/) – Create new projects to isolate compute and data resources for different teams. Configure storage options (Dremio-managed or your own S3 bucket) and manage project-level settings.
* [Manage Engines](/dremio-cloud/admin/engines/) – Set up and configure query engines that provide the compute resources for running queries. Choose engine sizes, configure auto-scaling, and manage multiple engine replicas for your projects.
* [Configure External Engines](/dremio-cloud/admin/external-engines) – Connect industry-standard engines like Apache Spark, Trino, and Apache Flink directly to Dremio without vendor lock-in or proprietary protocols.
* [Monitor Jobs and Audit Logs](/dremio-cloud/admin/monitor/) – Monitor system health, query performance, and resource utilization. View metrics, logs, and alerts to ensure your Dremio environment is running optimally.
* [Optimize Performance](/dremio-cloud/admin/performance/) – Improve query performance and resource efficiency through Reflection management and the results cache.

## Shared Responsibility Model

Dremio operates on a shared responsibility model. For detailed information about responsibilities in each area, download the [Dremio Shared Responsibility Model](https://docs-3063.dremio-documentation.pages.dev/assets/files/Dremio-Cloud-Shared-Responsibility-Model-15f76b24f0b48153532ca15b25d831c4.pdf).

<div style="page-break-after: always;"></div>

# Manage Users | Dremio Documentation

Original URL: https://docs.dremio.com/dremio-cloud/admin/users

Manage user access to your Dremio organization through internal authentication or external identity providers. This page covers user types, account management, and administrative tasks.

All users in Dremio are identified by their email address, which serves as their username. Invitations are sent to users' email addresses to set up their accounts.

## User Types

Dremio supports two user types with different authentication and management workflows:

| Feature | Local Users | SSO Users |
| --- | --- | --- |
| **Authentication** | Password set in Dremio | Identity Provider (IdP) credentials |
| **Credential Management** | Within Dremio | Through your IdP |
| **Provisioning** | Manual invitation | Manual invitation or SCIM automated |
| **Password Reset** | Self-service or admin-initiated | Through IdP only |

### Local Users

Local users authenticate with passwords managed directly in Dremio. These users must be invited manually. Use local users when you need standalone accounts for contractors, external partners, or testing and development environments.

### SSO Users

SSO users authenticate through your organization's identity provider (IdP), such as Microsoft Entra ID or Okta, or through social identity providers like Google or GitHub. These users can be invited manually or provisioned automatically via System for Cross-domain Identity Management (SCIM).

#### What is SCIM?

SCIM is an open standard protocol that automates user provisioning between your identity provider and Dremio. Instead of manually creating and managing users in multiple systems, SCIM keeps everything synchronized automatically. When you add, update, or remove a user in your IdP, those changes propagate to Dremio without manual intervention.

#### SCIM Provisioning Benefits

When SCIM is configured, Dremio stays synchronized with your IdP. Deleting a user in your IdP automatically reflects in Dremio. Additional benefits of SCIM integration include:

* Automatic user creation and deactivation
* Synchronized user attributes
* Centralized access management

To learn more:

* [Configure SCIM with Microsoft Entra ID](/dremio-cloud/security/authentication/idp/microsoft-entra-id)
* [Configure SCIM with Okta](/dremio-cloud/security/authentication/idp/okta)
* [Configure SCIM with a generic OIDC provider](/dremio-cloud/security/authentication/idp/generic-oidc-provider)

## Manage Your Account

### Update Your Password

**Local users** can reset passwords using either method:

**If locked out:**

1. On the login screen, enter your email.
2. Click **Forgot Password?**.
3. Check your email for the reset link.

**If logged in:**

1. Hover over the user icon at the bottom of the navigation sidebar.
2. Select **Account Settings**.
3. Click **Reset Password**.
4. Check your email for the reset link.

Changing your password ends all existing Dremio web sessions.

**SSO users** must reset passwords through their organization's identity provider. Contact your authentication administrator for assistance.

### Update Your Name

You can change your display name at any time:

1. Click the user icon on the side navigation bar.
2. Select **Account Settings**.
3. On the **General Information** page, edit **First Name** and **Last Name**.
4. Click **Save**.

## Administrative Tasks

The following tasks require administrator privileges or the [CREATE USER](/dremio-cloud/security/privileges#organization-privileges) privilege.

### View All Users

1. Click  on the left navigation bar and choose **Organization settings**.
2. Select **Users** in the organization settings sidebar.

The table displays all local and SSO users with access to your Dremio instance.

### Add a User

**SSO users** are added automatically when you configure [SCIM provisioning](/dremio-cloud/security/authentication/idp#scim).

**To add a local user:**

1. Click  on the left navigation bar and choose **Organization settings**.
2. Select **Users**.
3. Click **Add Users**.
4. In the **Email address(es)** field, enter one or more email addresses separated by commas, spaces, or line breaks.
5. For **Dremio Role**, select the [roles](/dremio-cloud/security/roles) where the user will be a member. All users are members of the PUBLIC role by default.
6. Click **Add**.

Each user receives an invitation email to set up their account. You can configure additional roles after users accept their invitations.

A user's email address serves as their unique identifier and cannot be changed after account creation. If a user's email changes, you must create a new account with the new email address.

If invited users don't receive the email, check spam folders and verify the email addresses are correct.

### Edit a User

You can modify a user's name and role assignments. Email addresses cannot be edited—if a user's email changes, you must create a new account.

1. Click  on the left navigation bar and choose **Organization settings**.
2. Select **Users**.
3. Hover over the user's row and click  to edit the user.
4. **Details tab:** Edit **First Name** and **Last Name**, then click **Save**.
5. **Roles tab:** Manage role assignments:
   * **Add roles:** Search for and select roles, then click **Add Roles**.
   * **Remove roles:** Hover over a role and click **Remove**.
6. Click **Save**.

### Reset a User's Password

This option is only available for local users. SSO users must reset passwords through their identity provider. To send a password reset email to a local user:

1. Click  on the left navigation bar and choose **Organization settings**.
2. Select **Users**.
3. Click the user's name.
4. Click **Send Password Reset**.

The user immediately receives an email with reset instructions.

### Remove a User

**To remove an SSO user:**

1. First, remove the user from your external identity provider.
2. Then follow the steps below to remove them from Dremio.

**To remove a local user:**

1. Click  on the left navigation bar and choose **Organization settings**.
2. Select **Users**.
3. Click the user's name.
4. Click  to remove.
5. Confirm the deletion.

## Related Topics

* [Roles](/dremio-cloud/security/roles)
* [Privileges](/dremio-cloud/security/privileges)
* [Configure Identity Providers](/dremio-cloud/security/authentication/idp/)

<div style="page-break-after: always;"></div>

# Manage Engines | Dremio Documentation

Original URL: https://docs.dremio.com/dremio-cloud/admin/engines/

An engine is a Dremio entity that manages compute resources. Each engine has one or more replicas that are created for executing queries. An engine replica consists of a group of executor instances defined by the engine capacity.

When you signed up for Dremio, an organization and a project were automatically created. Each new project has a preview engine. By default, the preview engine scales down after one hour without a query. As the name suggests, it provides previews of queries and datasets. Unlike other engines, the preview engine cannot be disabled.

If an engine is created with a minimum replica count of 0, it remains idle until the first query runs. No executor instances run initially. When you run a query, Dremio allocates executors to your project and starts the engine. Engines automatically start and stop based on query load.

## Sizes

Dremio provides a standard executor, which is used in all query engine sizes. Query engine sizes are differentiated by the number of executors in a replica. For each size, Dremio provides a default query concurrency, as shown in the table below.

| Replica Size | Executors per Replica | DCUs | Default Concurrency | Max Concurrency |
| --- | --- | --- | --- | --- |
| 2XSmall | 1 | 14 | 2 | 20 |
| XSmall | 1 | 30 | 4 | 40 |
| Small | 2 | 60 | 6 | 60 |
| Medium | 4 | 120 | 8 | 80 |
| Large | 8 | 240 | 10 | 100 |
| XLarge | 16 | 480 | 12 | 120 |
| 2XLarge | 32 | 960 | 16 | 160 |
| 3XLarge | 64 | 1920 | 20 | 200 |
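
The table translates directly into simple capacity arithmetic. As a minimal sketch (assuming the concurrency figures are per replica, consistent with the parameter descriptions under Autoscaling below), the following Python computes the executors, DCUs, and default concurrency an engine consumes at a given replica count:

```
# Capacity-planning sketch based on the table above. The per-size figures
# come from the table; treating DCUs and concurrency as linear in the
# replica count is an assumption for illustration.
ENGINE_SIZES = {
    # size: (executors_per_replica, dcus, default_concurrency, max_concurrency)
    "2XSmall": (1, 14, 2, 20),
    "XSmall": (1, 30, 4, 40),
    "Small": (2, 60, 6, 60),
    "Medium": (4, 120, 8, 80),
    "Large": (8, 240, 10, 100),
    "XLarge": (16, 480, 12, 120),
    "2XLarge": (32, 960, 16, 160),
    "3XLarge": (64, 1920, 20, 200),
}

def engine_footprint(size: str, replicas: int) -> dict:
    """Total executors, DCUs, and default concurrency at a replica count."""
    executors, dcus, default_cc, max_cc = ENGINE_SIZES[size]
    return {
        "executors": executors * replicas,
        "dcus": dcus * replicas,
        "default_concurrency": default_cc * replicas,
    }

print(engine_footprint("Medium", 3))
# {'executors': 12, 'dcus': 360, 'default_concurrency': 24}
```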

## States

An engine can be in one of the following states.

| State | Description |
| --- | --- |
| Running | Represents an enabled engine (replicas are provisioned automatically or running as per the minimum number of replicas configured). You can use this engine for running queries. |
| Adding Replica | Represents an engine that is scaling up (adding a replica). |
| Removing Replica | Represents an engine that is scaling down (removing a replica). |
| Disabling | Represents an engine that is being disabled. |
| Disabled | Represents a disabled engine (no engine replicas have been provisioned dynamically or there are no active replicas). You cannot use this engine for running queries. |
| Starting Engine | Represents an engine that is starting (transitioning from the disabled state to the enabled state). |
| Stopping Engine | Represents an engine that is stopping (transitioning from the enabled state to the disabled state). |
| Stopped | Represents an enabled engine that has been stopped (zero replicas running). |
| Deleting | Represents an engine that is being deleted. |

## Autoscaling

The autoscaling capability dynamically manages the query workload based on parameters that you set for the engine. Dremio monitors engine replica health and starts and stops replicas as required to provide seamless query execution.

The following table describes the engine parameters along with their role in autoscaling.

| Parameter | Description |
| --- | --- |
| **Size** | The number of executors that make up an engine replica. |
| **Max Concurrency** | Maximum number of jobs that can run concurrently on an engine replica. |
| **Last Replica Auto-Stop** | Time to wait before deleting the last replica if the engine is not in use. Not valid when the minimum number of engine replicas is 1 or higher. The default value is 2 hours. |
| **Enqueued Time Limit** | If there are no available resources, the query waits for the period of time set by this parameter. When this time limit is exceeded, the query is canceled and you are notified with a timeout-during-slot-reservation error. The default value is 5 minutes. |
| **Query Runtime Limit** | Time a query can run before it is canceled. The default value is 5 minutes. |
| **Drain Time Limit** | Time that an engine replica continues to run after the engine is resized, disabled, or deleted, before it is terminated and any still-running queries fail. The default value is 30 minutes. If no queries are running on a replica, the engine is terminated without waiting for the drain time limit. |

For a query submitted to an engine, the control plane assigns an engine replica to that query. Replicas are dynamically created and assigned to queries based on the query workload. The control plane observes the query workload and the current active engine replicas to determine whether to scale replicas up or down. A replica remains assigned to the query until execution completes. For a given engine, Dremio Cloud does not scale up replicas beyond the configured maximum replicas, and it does not scale them down below the configured minimum replicas.
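
The scaling decision described above can be pictured with a small, purely conceptual sketch. This is not Dremio's implementation; it assumes demand is measured in concurrent jobs and capacity is replicas times the per-replica max concurrency, clamped to the configured bounds:

```
# Conceptual sketch only: models the replica-count decision described
# above. The demand/capacity model is an assumption for illustration,
# not Dremio Cloud's actual control-plane logic.
import math

def desired_replicas(concurrent_jobs: int, max_concurrency_per_replica: int,
                     min_replicas: int, max_replicas: int) -> int:
    """Pick a replica count that covers the current job load, clamped
    to the engine's configured [min_replicas, max_replicas] range."""
    needed = math.ceil(concurrent_jobs / max_concurrency_per_replica) if concurrent_jobs else 0
    return max(min_replicas, min(max_replicas, needed))

# A Medium engine (default concurrency 8 per replica) with 0-3 replicas:
print(desired_replicas(0, 8, 0, 3))   # 0 -> scaled to zero (auto-stop)
print(desired_replicas(20, 8, 0, 3))  # 3 -> ceil(20/8) = 3
print(desired_replicas(50, 8, 0, 3))  # 3 -> capped at max replicas; extra jobs queue
```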
|
|
266
|
+
|
|
267
|
+
### Monitor Engine Health
|
|
268
|
+
|
|
269
|
+
The Dremio Cloud control plane monitors the engines health and manages unhealthy replicas to provide a seamless query execution experience. The replica nodes send periodic heartbeats to the control plane, which determines their liveness. If a periodic heartbeat is not returned from a replica node, the control plane marks that node as unhealthy and replaces it with a healthy one.
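
As a hedged illustration of the heartbeat-based liveness check (the interval and missed-beat threshold below are invented for the sketch; Dremio does not document them):

```
# Illustrative-only liveness check: a node is declared unhealthy after its
# last heartbeat falls outside an allowed window. All constants here are
# assumptions for the sketch, not documented Dremio values.
import time

HEARTBEAT_INTERVAL_S = 10
MISSED_BEATS_THRESHOLD = 3

def unhealthy_nodes(last_heartbeat: dict, now: float = None) -> list:
    """Return node IDs whose last heartbeat is older than the allowed window."""
    now = time.time() if now is None else now
    window = HEARTBEAT_INTERVAL_S * MISSED_BEATS_THRESHOLD
    return [node for node, ts in last_heartbeat.items() if now - ts > window]

beats = {"executor-1": time.time(), "executor-2": time.time() - 45}
print(unhealthy_nodes(beats))  # ['executor-2'] -> would be replaced
```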

## View All Engines

To view engines:

1. In the Dremio Cloud application, click the Project Settings  icon in the side navigation bar.
2. Select **Engines** in the project settings sidebar to see the list of engines in the project. On the **Engines** page, you can also filter engines by status. Click the **Status** dropdown list to see the different statuses.

## Add an Engine

To add a new engine:

1. On the **Project Settings** page, select **Engines** in the project settings sidebar. The **Engines** page lists the engines created for the project. Every engine created in a project is created in the cloud account associated with that project.
2. Click the **Add Engine** button on the top-right of the **Engines** page to create a new engine.
3. In the **Add Engine** dialog, for **Engine**, enter a name.
4. (Optional) For **Description**, enter a description.
5. (Optional) For **Size**, select the size of the engine. The size designates the number of executors.
6. (Optional) For **Max Concurrency per Replica**, enter the maximum number of jobs that can run concurrently on this engine.

The following parameters are for **Engine Replicas**:

7. For **Min Replicas**, enter the minimum number of engine replicas that Dremio Cloud has running at any given time. For auto-stop, set it to 0. To guarantee low-latency query execution, set it to 1 or higher. The default number of minimum replicas is 0.
8. For **Max Replicas**, enter the maximum number of engine replicas that Dremio Cloud scales up to. The default number of maximum replicas is 1.

tip

You can use these settings to control costs and ensure that excessive replicas are not spun up.

9. Under **Advanced Configuration**, for **Last Replica Auto-Stop**, enter the time to wait before deleting the last replica if the engine is not in use. The default value is 2 hours, and the minimum value is 1 minute.

note

The last replica auto-stop is not valid when the minimum number of engine replicas is 1 or higher.

The following parameters are for **Time Limit**:

10. For **Enable Enqueued Time Limit**, check the box.
11. For **Enqueued Time Limit**, enter the time a query waits before being canceled. The default value is 5 minutes.

caution

Do not set the enqueued time limit to less than one minute, which is the typical time to start a new replica. Changing this setting does not affect queries that are currently running or queued.

12. (Optional) For **Enable Query Time Limit**, check the box to limit how long a query can run before it is canceled.
13. (Optional) For **Query Runtime Limit**, enter the time a query can run before it is canceled. The default query runtime limit is 5 minutes.
14. For **Drain Time Limit**, enter the time (in minutes) that an engine replica continues to run after the engine is resized, disabled, or deleted before it is terminated and the running queries fail. The default value is 30 minutes. If there are no queries running on a replica, the engine is terminated without waiting for the drain time limit.
15. Click **Save and Launch**. This action saves the configuration, enables this engine, and allocates the executors.

## Edit an Engine

To edit an engine:

1. On the **Project Settings** page, select **Engines** in the project settings sidebar.
2. On the **Engines** page, hover over the row of the engine that you want to edit and click the Edit Engine  icon that appears next to the engine. The **Edit Engine** dialog opens.

Alternatively, you can click the engine to go to the engine's page. Click the **Edit Engine** button on the top-right of the page.

note

You cannot edit the **Engine name** parameter.

3. For **Description**, enter a description.
4. For **Size**, select the size of the engine. The size designates the number of executors.
5. For **Max Concurrency per Replica**, enter the maximum number of jobs that can run concurrently on this engine.

The following parameters are for **Engine Replicas**:

6. For **Min Replicas**, enter the minimum number of engine replicas that Dremio has running at any given time. Set this value to 0 to enable auto-stop, or to 1 or higher to ensure low-latency query execution.
7. For **Max Replicas**, enter the maximum number of engine replicas that Dremio scales up to.
8. Under **Advanced Configuration**, for **Last Replica Auto-Stop**, enter the time to wait before deleting the last replica if the engine is not in use. The default value is 2 hours.

note

The last replica auto-stop is not valid when the minimum number of engine replicas is 1 or higher.

The following parameters are for **Time Limit**:

9. For **Enable Enqueued Time Limit**, check the box.
10. For **Enqueued Time Limit**, enter the time a query waits before being canceled. The default value is 5 minutes.

caution

Do not set the enqueued time limit to less than one minute, which is the typical time to start a new replica. Changing this setting does not affect queries that are currently running or queued.

11. (Optional) For **Enable Query Time Limit**, check the box to limit how long a query can run before it is canceled.
12. (Optional) For **Query Runtime Limit**, enter the time a query can run before it is canceled. The default query runtime limit is 5 minutes.
13. For **Drain Time Limit**, enter the time (in minutes) that an engine replica continues to run after the engine is resized, disabled, or deleted before it is terminated and any running queries fail. The default value is 30 minutes. If no queries are running on a replica, the engine is terminated without waiting for the drain time limit.
14. Click **Save**.

## Disable an Engine

You can disable an engine that is not being used.

To disable an engine:

1. On the **Project Settings** page, select **Engines** in the project settings sidebar. The list of engines in this project is displayed.
2. Disable the engine by using the toggle in the **Enabled** column.
3. Confirm that you want to disable the engine.

## Enable an Engine

To enable a disabled engine:

1. On the **Project Settings** page, select **Engines** in the project settings sidebar. The list of engines in this project is displayed.
2. Enable the engine by using the toggle in the **Enabled** column.
3. Confirm that you want to enable the engine.

## Delete an Engine

You can permanently delete an engine if it is not in use (this action is irreversible). If queries are running on the engine, Dremio waits up to the drain time limit for the running queries to complete before deleting the engine.

caution

An engine that has a routing rule associated with it cannot be deleted. Delete the rules before deleting the engine.

To delete an engine:

1. On the **Project Settings** page, select **Engines** in the project settings sidebar. The list of engines in this project is displayed.
2. On the **Engines** page, hover over the row of the engine that you want to delete and click the Delete  icon that appears next to the engine.
3. Confirm that you want to delete the engine.

## Troubleshoot

If your engines are not scaling up or down as expected, you can check the engine events to see the error that is causing the issue.

To view engine events:

1. On the **Project Settings** page, select **Engines** in the project settings sidebar. The list of engines in this project is displayed.
2. On the **Engines** page, click the engine that you want to investigate.
3. On the engine details page, click the **Events** tab to view the scaling events and the status of each event.
4. If any scaling problems persist, contact [Dremio Support](https://support.dremio.com/).

<div style="page-break-after: always;"></div>

# Monitor | Dremio Documentation

Original URL: https://docs.dremio.com/dremio-cloud/admin/monitor/

As an administrator, you can monitor catalog usage and jobs in the Dremio console. You can also use the Dremio APIs and SQL to retrieve information about jobs and events for the projects in your organization.

### Monitor the Dremio Console

The Monitor page in the Dremio console allows you to monitor usage across your project, making it easier to observe patterns, analyze the resources being consumed by your data platform, and understand the impact on your users. You must be a member of the `ADMIN` role to access the Monitor page.

#### Catalog Usage

The data visualizations on the Monitor page point you to the most queried data and folders in a catalog.

Go to **Settings** > **Monitor** to view your catalog usage. When you open the Monitor page, you are directed to the Catalog Usage tab by default, where you can see the following metrics:

* A table of the top 10 most queried datasets within the specified time range, including for each the number of linked jobs, the percentage of linked jobs in which the dataset was accelerated, and the total number of Reflections defined on the dataset
* A table of the top 10 most queried source folders within the specified time range, including for each the number of linked jobs and the top users of that folder

note

A source can be listed in the top 10 most queried source folders if the source contains a child dataset that was used in the query (for example, `postgres.accounts`). Queries of datasets in sub-folders (for example, `s3.mybucket.iceberg_table`) are classified by the sub-folder and not the source.

All datasets are assessed in the metrics on the Monitor page except for datasets in the [system tables](/dremio-cloud/sql/system-tables/) and the [information schema](/dremio-cloud/sql/information-schema/).

The metrics on the Monitor page analyze only user queries. Refreshes of data Reflections and metadata refreshes are excluded.

#### Jobs

The data visualizations on the Monitor page show the metrics for queries executed in your project, including statistics about performance and utilization.

Go to **Settings** > **Monitor** > **Jobs** to open the Jobs tab and see an aggregate view of the following metrics for the jobs that are running in your project:

* A report of today's job count and failed/canceled rate in comparison to yesterday's metrics
* A list of the top 10 most active users within the specified time range, including the number of linked jobs for each user
* Total jobs accelerated, total job time saved, and average job speedup from Autonomous Reflections over the past month
* Total number of jobs accelerated by autonomous and manual Reflections over time
* A graph showing the total number of completed and failed jobs over time (aggregated hourly or daily)
* A graph of all completed and failed jobs according to their engine (aggregated hourly or daily)
* A graph of all job states showing the percentage of time consumed for each [state](/dremio-cloud/admin/monitor/jobs#job-states-and-statuses) (aggregated hourly or daily)
* A table of the top 10 longest running jobs within the specified time range, including the linked ID, duration, user, query type, and start time of each job

To examine all jobs and the details of specific jobs, see [Viewing Jobs](/dremio-cloud/admin/monitor/jobs).

You can create reports of jobs in other BI tools by leveraging the [`sys.project.history.jobs` table](/dremio-cloud/sql/system-tables/jobs-historical).

### Monitor with Dremio APIs and SQL

Administrators can use the Dremio APIs and SQL to retrieve information about the jobs and events in every project in the organization. This information is useful for further monitoring and analysis.

Before you begin, make sure that you are assigned to the ADMIN role for the organization whose information you want to retrieve. You also need a [personal access token (PAT)](/dremio-cloud/security/authentication/personal-access-token#create-a-pat) to make the necessary API requests.

The code examples in this section are written in Python.

The procedure below provides individual code examples for retrieving project IDs, retrieving information for jobs and events, saving query results to Parquet files, and uploading the Parquet files to an AWS S3 bucket. See the combined example for a single code example that combines all of the steps.

1. Get the IDs for all projects in the organization. In the code example for this step, the `get_projects` method uses the [Projects](/dremio-cloud/api/projects) API to get the project IDs.

note

In the following code example, replace `<personal_access_token>` with your PAT.

To use the API control plane for the EU rather than the US, replace `https://api.dremio.cloud` with `https://api.eu.dremio.cloud`.

Get the IDs for all projects

```
import requests
import json

from requests import Response

# No trailing slash here: the endpoint paths below are joined with '/'.
dremio_server = "https://api.dremio.cloud"
personal_access_token = "<personal_access_token>"

headers = {
    'Authorization': "Bearer " + personal_access_token,
    'Content-Type': "application/json"
}

def api_get(endpoint: str) -> Response:
    return requests.get(f'{dremio_server}/{endpoint}', headers=headers)

def get_projects() -> dict:
    """
    Get all projects in the Dremio Cloud organization
    :return: Dictionary of project IDs and project names
    """
    projects = dict()
    projects_response = api_get('v0/projects')
    for project in projects_response.json():
        projects[project['id']] = project['name']
    return projects
```

2. Run a SQL query to get the jobs or events for the project. The code examples for this step show how to use the [SQL](/dremio-cloud/api/sql) API to submit a SQL query, get all jobs during a specific period with the `get_jobs` method, and get all events in the [`sys.project.history.events`](/dremio-cloud/sql/system-tables/events-historical) system table during a specific period with the `get_events` method.

Submit SQL query using the API

```
def api_post(endpoint: str, body=None) -> Response:
    return requests.post(f'{dremio_server}/{endpoint}',
                         headers=headers, data=json.dumps(body))

def run_sql(project_id: str, query: str) -> str:
    """
    Run a SQL query
    :param project_id: project ID
    :param query: SQL query
    :return: query job ID
    """
    query_response = api_post(f'v0/projects/{project_id}/sql', body={'sql': query})
    job_id = query_response.json()['id']
    return job_id
```

Get all jobs in the project during a specific period

```
def get_jobs(project_id: str, start_time: str, end_time: str) -> str:
    """
    Run SQL query to get all jobs in a project during the specified time period
    :param start_time: start timestamp (inclusive)
    :param end_time: end timestamp (exclusive)
    :param project_id: project ID
    :return: query job ID
    """
    job_id = run_sql(project_id, f'SELECT * FROM sys.project.history.jobs '
                                 f'WHERE "submitted_ts" >= \'{start_time}\' '
                                 f'AND "submitted_ts" < \'{end_time}\'')
    return job_id
```

Get all events during a specific period

```
def get_events(project_id: str, start_time: str, end_time: str) -> str:
    """
    Run SQL query to get all events in sys.project.history.events during the specified time period
    :param project_id: project ID
    :param start_time: start timestamp (inclusive)
    :param end_time: end timestamp (exclusive)
    :return: query job ID
    """
    job_id = run_sql(project_id, f'SELECT * FROM sys.project.history.events '
                                 f'WHERE "timestamp" >= \'{start_time}\' '
                                 f'AND "timestamp" < \'{end_time}\'')
    return job_id
```

3. Check the status of the query to get jobs or events. In the code example for this step, the `wait_for_job_complete` method periodically checks the query job state, prints out the final job status when the query is complete, and returns the final state.

Check status of the query to get jobs or events

```
import time

def wait_for_job_complete(project_id: str, job_id: str) -> str:
    """
    Wait for a query job to complete
    :param project_id: project ID
    :param job_id: job ID
    :return: the final job state (COMPLETED, FAILED, or CANCELED)
    """
    while True:
        time.sleep(1)
        job = api_get(f'v0/projects/{project_id}/job/{job_id}')
        job_state = job.json()["jobState"]
        if job_state == 'COMPLETED':
            print("Job complete.")
            break
        elif job_state == 'FAILED':
            print("Job failed.", job.json()['errorMessage'])
            break
        elif job_state == 'CANCELED':
            print("Job canceled.")
            break

    return job_state
```

4. Download the result for the query to get jobs or events and save it to a Parquet file. In the code example for this step, the `save_job_results_to_parquet` method downloads the query result and, if the result contains at least one row, saves the result to a single Parquet file.

Download query result and save to a Parquet file

```
import pyarrow
import pyarrow.parquet

def save_job_results_to_parquet(project_id: str, job_id: str,
                                parquet_file_name: str) -> bool:
    """
    Download the query result and save it to a Parquet file
    :param project_id: project ID
    :param job_id: query job ID
    :param parquet_file_name: file name to save the job result
    :return: if the query returns more than 0 rows and the Parquet file is saved, True; otherwise False
    """
    offset = 0
    rows_downloaded = 0
    rows = []
    # Page through the job results 500 rows at a time.
    while True:
        job_result = api_get(f'v0/projects/{project_id}/job/{job_id}/'
                             f'results/?offset={offset}&limit=500')
        job_result_json = job_result.json()
        row_count = job_result_json['rowCount']
        rows_downloaded += len(job_result_json['rows'])
        rows += job_result_json['rows']
        if rows_downloaded >= row_count:
            break
        offset += 500

    print(rows_downloaded, "rows")
    if rows_downloaded > 0:
        # Write all accumulated rows, not just the last downloaded page.
        py_rows = pyarrow.array(rows)
        table = pyarrow.Table.from_struct_array(py_rows)
        pyarrow.parquet.write_table(table, parquet_file_name)
        return True

    return False
```

5. If desired, you can use the [Boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/quickstart.html) library to upload the Parquet file to an AWS S3 bucket.

Upload Parquet file to AWS S3 with Boto3 library

```
import boto3
from botocore.exceptions import ClientError

def upload_file(file_name: str, bucket: str, folder: str):
    """Upload Parquet file to an S3 bucket with Boto3
    :param file_name: File to upload
    :param bucket: Bucket to upload to
    :param folder: Folder to upload to
    :return: True if file was uploaded, else False
    """

    # Upload the file
    s3_client = boto3.client('s3')
    try:
        s3_client.upload_file(file_name, bucket, f'{folder}/{file_name}')
    except ClientError as e:
        print(e)
        return False
    return True
```

#### Combined Example

The following code example combines the steps above to get all jobs and events from all projects during a specific period, save the query results to Parquet files, and upload the Parquet files to an AWS S3 bucket. The parameter `start` is the start timestamp (inclusive) and the parameter `end` is the end timestamp (exclusive).

All jobs in each project during the specified time period are saved in an individual Parquet file with file name `jobs_<project_id><start>.parquet`. All events in each project during the specified time period are saved in one Parquet file with file name `events_<project_id><start>.parquet`.

Combine all steps in a single code example

```
import argparse

def main(start: str, end: str):
    """
    Get all jobs and events from all projects during the specified time period, save the results in Parquet files, and upload the files to an AWS S3 bucket.
    :param start: start timestamp (inclusive, in format "YYYY-MM-DD" or "YYYY-MM-DD hh:mm:ss")
    :param end: end timestamp (exclusive, in format "YYYY-MM-DD" or "YYYY-MM-DD hh:mm:ss")
    """
    projects = get_projects()
    print("Projects in organization:")
    print(projects)

    # Get jobs for each project
    for project_id in projects:
        print("Get jobs for project", projects[project_id])
        # run query
        job_id = get_jobs(project_id, start, end)
        # check job status
        job_state = wait_for_job_complete(project_id, job_id)
        if job_state == "COMPLETED":
            file_name = f'jobs_{project_id}{start}.parquet'
            if save_job_results_to_parquet(project_id, job_id, file_name):
                upload_file(file_name, 'S3_BUCKET_NAME', 'dremio/jobs')

    # Get events for each project
    for project_id in projects:
        print("Get events for project", projects[project_id])
        # run query
        job_id = get_events(project_id, start, end)
        # check job status
        job_state = wait_for_job_complete(project_id, job_id)
        if job_state == "COMPLETED":
            file_name = f'events_{project_id}{start}.parquet'
            if save_job_results_to_parquet(project_id, job_id, file_name):
                upload_file(file_name, 'S3_BUCKET_NAME', 'dremio/events')

if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description='Demo of collecting jobs and events from Dremio Cloud Projects')
    parser.add_argument('start',
                        help='start timestamp (inclusive, in format "YYYY-MM-DD" or "YYYY-MM-DD hh:mm:ss")')
    parser.add_argument('end',
                        help='end timestamp (exclusive, in format "YYYY-MM-DD" or "YYYY-MM-DD hh:mm:ss")')
    args = parser.parse_args()

    main(args.start, args.end)
```
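
Assuming the snippets above are combined into one script (say, `collect_jobs_events.py`, a hypothetical name), a run such as `python collect_jobs_events.py 2025-01-01 2025-02-01` would export January's jobs and events for every project. Remember to replace `S3_BUCKET_NAME` with a bucket your AWS credentials can write to.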

<div style="page-break-after: always;"></div>

# Manage Projects | Dremio Documentation

Original URL: https://docs.dremio.com/dremio-cloud/admin/projects/

A project isolates the compute, data, and other resources a team needs for data analysis. An organization may contain multiple projects. Your first project is created during the sign-up process.

Each project in Dremio has its own storage. This is used to store metadata and Reflections and serves as the default storage location for the project's Open Catalog. You can choose between two storage options:

* Dremio-managed storage – No setup or configuration required. Usage is priced per TB, billed monthly.
* Your own storage – Use your own Amazon S3 storage. However, this requires you to manage this infrastructure.

For details on pricing, see [How Storage Usage Is Calculated](/dremio-cloud/admin/subscription/usage#how-storage-usage-is-calculated).

Each new project in your organization contains a preview engine. By default, the preview engine scales down after one hour without a query. As the name suggests, it provides previews of queries and datasets. Unlike other engines, the preview engine cannot be disabled, ensuring that the many core Dremio functions that require an engine can always run.

## View All Projects

To view all projects:

1. In the Dremio console, hover over  in the side navigation bar and select **Organization settings**.
2. Select **Projects** in the organization settings sidebar.

The Projects page displays the status of all projects in your organization. Possible statuses include:

* Creating
* Active
* Inactive
* Deactivating
* Activating
* Archiving
* Archived
* Restoring

## Grant Access to a Project

New projects are private by default. On the Projects page, users can see only the projects for which they have USAGE or OWNERSHIP [privileges](/dremio-cloud/security/privileges). The Projects page is empty for users without USAGE or OWNERSHIP privileges on any projects. The projects dropdown list shares this behavior.

Similarly, the [Projects API](/dremio-cloud/api/projects) returns an HTTP 403 Forbidden error for requests from users who do not have USAGE or OWNERSHIP privileges on the project. Also, users must have USAGE or OWNERSHIP privileges on a project before they can make API requests or run SQL queries on any objects in the project, even if they have object-level privileges on sources, folders, or other objects in the project.

To allow users to access a project, use the [`GRANT TO ROLE`](/dremio-cloud/sql/commands/grant-to-role) or [`GRANT TO USER`](/dremio-cloud/sql/commands/grant-to-user) SQL command or the [Grants API](/dremio-cloud/api/catalog/grants) to grant them the USAGE privilege on the project. For users who do not own the project, USAGE is the minimum privilege required to perform any operation on the project and the objects the project contains. For example, if you are using `GRANT TO USER`, you can run `GRANT USAGE ON PROJECT TO USER <username>`.
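
Because the grant is just SQL, it can also be scripted through the SQL API shown on the Monitor page. A hedged sketch, reusing the `run_sql` and `wait_for_job_complete` helpers defined there (the username is a placeholder, and quoting the email-style username as a SQL identifier is an assumption):

```
# Sketch: script the USAGE grant via the documented SQL API, reusing
# run_sql() and wait_for_job_complete() from the Monitor page above.
# "analyst@example.com" is a placeholder, not a real account; the
# double-quoting of the username is an assumption for illustration.
def grant_project_usage(project_id: str, username: str) -> str:
    job_id = run_sql(project_id, f'GRANT USAGE ON PROJECT TO USER "{username}"')
    return wait_for_job_complete(project_id, job_id)

print(grant_project_usage("<project_id>", "analyst@example.com"))
```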

## Obtain the ID of a Project

A BI client application might require the ID of a project as part of the information for creating a connection to Dremio. You can obtain the ID from the General Information page of a project's settings.

To obtain a project ID:

1. In the Dremio console, hover over  in the side navigation bar and select **Project settings**.
2. Select **General Information** in the project settings sidebar.
3. Copy the value in the **Project ID** field.
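
Project IDs can also be fetched programmatically from the `v0/projects` endpoint used on the Monitor page. A minimal sketch, assuming the same `api_get` helper and headers ("my-project" is a placeholder name):

```
# Sketch: look up a project's ID by name via the Projects API, reusing
# the api_get() helper from the Monitor page above.
def project_id_by_name(name: str):
    for project in api_get('v0/projects').json():
        if project['name'] == name:
            return project['id']
    return None

print(project_id_by_name("my-project"))  # placeholder project name
```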

## Set the Default Project

When your data consumers connect to Dremio from BI tools, they must connect to the projects where their datasets reside. They can either connect to the default project or select a different project.

If an organization administrator does not set this value, Dremio automatically sets the default project to the oldest project in your organization.

You can change the default project at any time.

note

Data consumers who do not have access to the default project must select an alternative project ID when connecting to Dremio from their BI tools.

To specify the default project for your organization:

1. Hover over  in the side navigation bar and select **Organization settings**.
2. Select **General Information** in the organization settings sidebar.
3. In the **Default Project** field, select the project that you want data consumers to connect to by default through their BI tools.
4. Click **Save**.

## Create a Project

If you plan to use your own bucket, you must create a role that grants Dremio access before creating the project; see [Bring Your Own Project Store](/dremio-cloud/admin/projects/your-own-project-storage) for instructions. To avoid this setup, use Dremio-managed storage.

To add a project:

1. In the Dremio console, hover over  in the side navigation bar and select **Organization settings**.
2. Select the **Projects** option in the organization settings sidebar.
3. In the top-right corner of the Projects page, click **Create**.
4. For **Project name**, specify a name that is unique within the organization.
5. For **Region**, select the AWS Region where you want the project to reside.
6. Select one of the two **Storage** options:

* For **Dremio managed storage**, Dremio will create and manage object storage for your use.
* For **your own storage**, you will need to provide Dremio with the bucket URI and the role ARN you created earlier.

## Activate a Project

Dremio automatically deactivates any project that has not been accessed in the last 15 days. Dremio sends a courtesy email to project owners three days prior to deactivation. Inactive projects are displayed in the project selector in the side navigation bar and on the Projects page. An inactive project is activated automatically when any user tries to access it via the Dremio console, an ODBC or JDBC connection, or an API call.

note

Inactive projects do not consume any compute resources.

You can activate an inactive project on the Projects page, or by clicking the project in the project selector. It takes a few minutes to activate a project.

To activate a project from the Projects page:

1. Hover over  in the side navigation bar and select **Organization settings**.
2. Select **Projects** in the organization settings sidebar.
3. Click the ellipsis menu to the far right of the inactive project, and then click **Activate Project**.

The project status changes to *Activating* while the project is activated. You can access the project after the status changes to *Active*.

## Archive a Project

Users with OWNERSHIP privileges or users assigned to the ADMIN role can archive a project. Archived projects are displayed only on the Projects page.

note

Archived projects do not consume any compute resources.

To archive a project:

1. In the Dremio console, hover over  in the side navigation bar and select **Organization settings**.
2. Select **Projects** in the organization settings sidebar.
3. Click the ellipsis menu to the far right of an active or inactive project, and then click **Archive Project**.

The project status changes to *Archiving* while the project is archived. When archiving is complete, the status changes to *Archived*.

## Restore an Archived Project

An archived project is not restored automatically when a user tries to access it; it can only be restored manually by a user with OWNERSHIP privileges on the project or a user assigned to the ADMIN role. It takes a few minutes to restore an archived project.

To restore an archived project:

1. In the Dremio console, hover over  in the side navigation bar and select **Organization settings**.
2. Select **Projects** in the organization settings sidebar.
3. Click the ellipsis menu to the far right of an archived project and select **Restore Project**.

The project status changes to *Restoring* while the project is restored. You can access the project after the status changes to *Active*.

## Delete a Project

Default projects cannot be deleted. If you want to delete the default project, you must first set another project as the default. See Set the Default Project.

To delete a project:

1. In the Dremio console, hover over  in the side navigation bar and select **Organization Settings**.
2. Select **Projects** in the organization settings sidebar.
3. Click the ellipsis menu to the far right of the project and select **Delete Project**.
4. Confirm that you want to delete the project.

<div style="page-break-after: always;"></div>

# Audit Logs | Dremio Documentation

Original URL: https://docs.dremio.com/dremio-cloud/admin/monitor/logs

On this page

The creation and modification of Dremio resources are tracked and traceable via the [`sys.project.history.events`](/dremio-cloud/sql/system-tables/events-historical) table. Audit logging is enabled by default and available to users with administrative permissions on the project.

An event can take up to three hours to propagate to the system table. There is currently no maximum retention policy for audit events.
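
For example, you can query the table directly from the SQL Runner. The following is a minimal sketch; the column names (`"timestamp"`, `event_type`, `action`, and `user_name`) are assumptions for illustration, so verify them against the [`sys.project.history.events`](/dremio-cloud/sql/system-tables/events-historical) schema reference.

```
-- Hypothetical example: list recent login events, newest first.
-- Column names are assumed; check the system-table reference for the actual schema.
SELECT "timestamp", event_type, action, user_name
FROM sys.project.history.events
WHERE event_type = 'CONNECTION' AND action = 'LOGIN'
ORDER BY "timestamp" DESC
LIMIT 100;
```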

note

The events listed here are a subset of the events that Dremio supports.

## Organization Events

Dremio supports audit logging for the following organization event types and actions. The [`sys.project.history.events`](/dremio-cloud/sql/system-tables/events-historical) table contains these events in the default project.

| Event Type | Actions | Description |
| --- | --- | --- |
| BILLING\_ACCOUNT | BILLING\_ACCOUNT\_ADD\_PROJECT | Dremio added a new project to the billing account during project creation. |
| BILLING\_ACCOUNT | BILLING\_ACCOUNT\_CREATE | A user created a billing account. |
| BILLING\_ACCOUNT | BILLING\_ACCOUNT\_REMOVE\_PROJECT | Dremio removed a project from the billing account during project deletion. |
| BILLING\_ACCOUNT | BILLING\_ACCOUNT\_UPDATE | A user modified the billing account, such as the notification email address. |
| BILLING\_TRANSACTION | TRANSACTION\_CHARGE | Dremio recorded Dremio Consumption Unit (DCU) usage charges for the period. |
| BILLING\_TRANSACTION | TRANSACTION\_CREDIT\_LOAD | Dremio loaded DCU credits into the billing account. |
| CATALOG | CREATE | A user created a new Open Catalog. Catalog creation is included with project creation. |
| CATALOG | DELETE | A user deleted an Open Catalog. Project deletion also deletes its primary Open Catalog. |
| CATALOG | UPDATE | A user updated an Open Catalog configuration. |
| CLOUD | CREATE\_STARTED, CREATE\_COMPLETED | A user created a cloud. Clouds provide resources for running engines and storing metadata in a project. |
| CLOUD | DELETE\_STARTED, DELETE\_COMPLETED | A user deleted a cloud. |
| CLOUD | UPDATE | A user updated a cloud. |
| CONNECTION | FORCE\_LOGOUT | A user changed their password or deactivated another user, ending all of that user's sessions. |
| CONNECTION | LOGIN | A user logged in. |
| CONNECTION | LOGOUT | A user logged out. |
| EDITION | DOWNGRADE | A user downgraded the billing edition in the Dremio organization. |
| EDITION | UPGRADE | A user upgraded the billing edition in the Dremio organization. |
| IDENTITY\_PROVIDER | CREATE | A user configured a new OpenID Connect (OIDC) identity provider integration. |
| IDENTITY\_PROVIDER | DELETE | A user deleted an OIDC identity provider. |
| IDENTITY\_PROVIDER | UPDATE | A user updated an OIDC identity provider configuration. |
| MODEL\_PROVIDER\_CONFIG | CREATE | A user created a new model provider in the Dremio organization. |
| MODEL\_PROVIDER\_CONFIG | UPDATE | A user updated a model provider in the Dremio organization. |
| MODEL\_PROVIDER\_CONFIG | DELETE | A user deleted a model provider in the Dremio organization. |
| MODEL\_PROVIDER\_CONFIG | SET\_DEFAULT | A user set a new default model provider in the Dremio organization. |
| ORGANIZATION | CREATE\_STARTED, CREATE\_COMPLETED | A user created the Dremio organization. |
| ORGANIZATION | DELETE\_STARTED, DELETE\_COMPLETED | A user closed and deleted the Dremio organization. |
| ORGANIZATION | UPDATE | A user updated the Dremio Cloud organization. |
| PERSONAL\_ACCESS\_TOKEN | CREATE | A user created a new personal access token in their account. |
| PERSONAL\_ACCESS\_TOKEN | DELETE | A user deleted a personal access token. |
| PROJECT | CREATE\_STARTED, CREATE\_COMPLETED | A user created a project in the Dremio organization. |
| PROJECT | DELETE\_STARTED, DELETE\_COMPLETED | A user deleted a project. |
| PROJECT | HIBERNATE\_STARTED, HIBERNATE\_COMPLETED | A user archived a project. |
| PROJECT | UNHIBERNATE\_STARTED, UNHIBERNATE\_COMPLETED | A user activated an archived project. |
| PROJECT | UPDATE | A user updated the configuration of a project. |
| ROLE | CREATE | A user created a custom role. |
| ROLE | DELETE | A user deleted a role. |
| ROLE | MEMBERS\_ADDED | A user added users or roles as members of a role. |
| ROLE | MEMBERS\_REMOVED | A user removed users or roles as members of a role. |
| ROLE | UPDATE | A user updated the metadata of a custom role, such as the description. |
| USER\_ACCOUNT | CREATE | A user added a user account. |
| USER\_ACCOUNT | DELETE | A user deleted a user account. |
| USER\_ACCOUNT | PASSWORD\_CHANGE | A user updated their account password. |
| USER\_ACCOUNT | UPDATE | A user updated user account metadata. |

## Project Events

| Event Type | Actions | Description |
| --- | --- | --- |
| AI\_AGENT | REQUEST, RESPONSE | A user sent a request to the AI Agent and received a response. |
| ENGINE | CREATE\_STARTED, CREATE\_COMPLETED | A user created an engine. |
| ENGINE | DELETE\_STARTED, DELETE\_COMPLETED | A user deleted an engine. |
| ENGINE | DISABLE\_STARTED, DISABLE\_COMPLETED | A user disabled an engine. |
| ENGINE | ENABLE\_STARTED, ENABLE\_COMPLETED | A user enabled an engine. |
| ENGINE | UPDATE\_STARTED, UPDATE\_COMPLETED | A user updated an engine configuration. |
| ENGINE\_SCALING | SCALE\_DOWN\_STARTED, SCALE\_DOWN\_COMPLETED | Dremio scaled down an engine by stopping one or more running replicas. |
| ENGINE\_SCALING | SCALE\_UP\_STARTED, SCALE\_UP\_COMPLETED | Dremio scaled up an engine by starting one or more additional replicas. |
| LABEL | UPDATE | A user created a label on a dataset, source, or other object. |
| PIPE | CREATE | A user created an autoingest pipe for Apache Iceberg. |
| PIPE | DELETE | A user dropped an autoingest pipe. |
| PIPE | UPDATE | A user updated the configuration of an existing autoingest pipe. |
| PRIVILEGE | DELETE | A user deleted a privilege from a user or role. |
| PRIVILEGE | UPDATE | A user granted a privilege to a user or role. |
| REFLECTION | CREATE | A user created a new raw or aggregate Reflection. |
| REFLECTION | DELETE | A user deleted a Reflection. |
| REFLECTION | UPDATE | A user updated the content or configuration of a Reflection. |
| REPLICA | CREATE\_STARTED, CREATE\_COMPLETED | Dremio started a replica during an ENGINE\_SCALING scale-up event. |
| REPLICA | DELETE\_STARTED, DELETE\_COMPLETED | Dremio stopped a replica during an ENGINE\_SCALING scale-down event. |
| ROUTING\_RULESET | UPDATE | A user modified an engine routing rule. |
| SUPPORT\_SETTING | RESET | A user reset an advanced configuration or diagnostic setting. |
| SUPPORT\_SETTING | SET | A user set an advanced configuration or diagnostic setting. |
| UDF | CREATE | A user created a user-defined function. |
| UDF | DELETE | A user deleted a user-defined function. |
| UDF | UPDATE | A user modified the SQL definition of a user-defined function. |
| WIKI | EDIT | A user created or updated a wiki. |

## Open Catalog Events

These events appear in the [`sys.project.history.events`](/dremio-cloud/sql/system-tables/events-historical) table of the project where the catalog is designated as the primary catalog.

| Event Type | Actions | Description |
| --- | --- | --- |
| FOLDER | CREATE | A user created a folder in the catalog. |
| FOLDER | DELETE | A user deleted a folder in the catalog. |
| TABLE | CREATE | A user created a table in the catalog. |
| TABLE | DELETE | A user deleted a table in the catalog. |
| TABLE | READ | A user read table information or data from the catalog. |
| TABLE | REGISTER | A user registered a new table in the catalog. |
| TABLE | UPDATE | A user updated a table definition in the catalog. |
| VIEW | CREATE | A user created a view in the catalog. |
| VIEW | DELETE | A user deleted a view in the catalog. |
| VIEW | READ | A user read a view in the catalog. |
| VIEW | UPDATE | A user updated a view definition in the catalog. |

## Source Events

These events appear in the [`sys.project.history.events`](/dremio-cloud/sql/system-tables/events-historical) table for any source in the project.

| Event Type | Actions | Description |
| --- | --- | --- |
| SOURCE | CREATE | A user created a data source. |
| SOURCE | DELETE | A user deleted a source connection. Any tables from the source were removed. |
| SOURCE | UPDATE | A user updated a source configuration. |
| FOLDER | CREATE | A user created a folder. |
| FOLDER | DELETE | A user deleted a folder. |
| TABLE | CREATE | A user created a non-catalog table. |
| TABLE | DELETE | A user deleted a non-catalog table. |
| TABLE | UPDATE | A user updated a table, or Dremio performed a metadata refresh on a non-Parquet table. |
<div style="page-break-after: always;"></div>

# Optimize Performance | Dremio Documentation

Original URL: https://docs.dremio.com/dremio-cloud/admin/performance/

Dremio uses a variety of tools to help you autonomously optimize your lakehouse. These tools apply at four stages: (1) source files, (2) intermediate transformations, (3) final or production transformations, and (4) client queries. Dremio also offers tools that allow you to manually fine-tune performance. Both approaches can coexist, enabling Dremio to manage most optimizations automatically while still giving you the flexibility to take direct action when desired.

For details on how Dremio autonomously manages your tables, see [Automatic Optimization](/dremio-cloud/manage-govern/optimization), which focuses on Iceberg table management.

This section focuses instead on accelerating views and SQL queries, including those from clients such as AI agents and BI dashboards. The principal method for this acceleration is Dremio's patterned materialization and query rewriting, known as Reflections.

* [Autonomous Reflections](/dremio-cloud/admin/performance/autonomous-reflections) – Learn how Dremio automatically learns your query patterns and manages Reflections to optimize performance accordingly. This capability is available for Iceberg tables, UniForm tables, Parquet datasets, and any views built on these datasets.
* [Manual Reflections](/dremio-cloud/admin/performance/manual-reflections) – Use this option primarily for data formats not supported by Autonomous Reflections. Learn how to define your own Reflections and the best practices for using and managing them.
* [Results Cache](/dremio-cloud/admin/performance/results-cache) – Understand how Dremio caches the results of queries from AI agents and BI dashboards.
<div style="page-break-after: always;"></div>

# Jobs | Dremio Documentation

Original URL: https://docs.dremio.com/dremio-cloud/admin/monitor/jobs/

On this page

All jobs run in Dremio are listed on the Jobs page, which shows the job ID, type, status, and other attributes.

To navigate to the Jobs page, click  in the side navigation bar.

## Search Filters and Columns

By default, the Jobs page lists the jobs run within the last 30 days, filtered by the **UI, External Tools** job types. To change these defaults for your account, you can filter on values and manage columns directly on the Jobs page, as shown in this image:

a. **Search Jobs** by typing the username or job ID.

b. **Start Time** allows you to pick the date and time at which the job began.

c. **Status** represents one or more job states. For descriptions, see [Job States and Statuses](#job-states-and-statuses).

d. **Type** includes Accelerator, Downloads, External Tools, Internal, and UI. For descriptions, see Job Properties.

e. **User** can be searched by typing the username or checking the box next to the username in the dropdown.

f. **Manage Columns** by checking the boxes next to additional columns that you want to see in the Jobs list. The grayed-out checkboxes show the columns that are required by default. You can also rearrange the column order by dragging and dropping a column.

## Job Properties

Each job has the following properties, which can appear as columns in the list of jobs on the Jobs page or as details on the Job Overview page:

| Property | Description |
| --- | --- |
| Accelerated | A purple lightning bolt in a row indicates that the job ran a query that was accelerated by one or more Reflections. |
| Attribute | Represents at least one of the following query types: * **UI** - queries issued from the SQL Runner in the Dremio console. * **External Tools** - queries from client applications, such as Microsoft Power BI, Superset, Tableau, other third-party client applications, and custom applications. * **Accelerator** - queries related to creating, maintaining, and removing Reflections. * **Internal** - queries that Dremio submits for internal operations. * **Downloads** - queries used to download datasets. * **AI** - queries issued from the Dremio AI Agent. |
| CPU Used | Provides statistics about the actual cost of the query operations in terms of CPU processing. |
| Dataset | The queried dataset, if one was queried. Hover over the dataset to see a metadata card with details about the dataset. For more information, see [Discover Data](/dremio-cloud/explore-analyze/discover). |
| Duration | The length of time (in seconds) that a job required from start to completion. |
| Engine | The engine used to run the query. |
| Input | The number of bytes and the number of rows considered for the job. |
| Job ID | A universally unique identifier. |
| Output | The number of bytes and the number of rows produced as output from the job. |
| Planner Cost Estimate | A cost estimate calculated by Dremio based on an evaluation of the resources to be used in the execution of a query. The number has no units and is intended to give an idea of the cost of executing a query relative to the costs of executing other queries. Values are derived by adding weighted estimates of required I/O, memory, and CPU load. In reported values, K = thousand, M = million, B = billion, and T = trillion. For example, a value of 12,543,765,321 is reported as 12.5B. |
| Planning Time | The length of time (in seconds) in which the query optimizer planned the execution of the query. |
| Rows Returned | Number of output records. |
| Rows Scanned | Number of input records. |
| SQL | The SQL query that was submitted for the job. |
| Start Time | The date and time at which the job began. |
| Status | Represents one or more job states. For descriptions, see Job States and Statuses. |
| Total Memory | Provides statistics about the actual cost of the query operations in terms of memory. |
| User | Username of the user who ran the query and initiated the job. |
| Wait on Client | The length of time (in seconds) that the job spent waiting on the client. |

## Job States and Statuses

Each job passes through a sequence of states until it is complete, though the sequence can be interrupted if a query is canceled or if there is an error during a state. In this diagram, the states that a job passes through are in white, and the possible end states are in dark gray.

This table lists the statuses that the UI lets you filter on and shows how they map to the states:

| Status | State | Description |
| --- | --- | --- |
| Setup | Pending | Represents a state where the query is waiting to be scheduled on the query pool. |
| Setup | Metadata Retrieval | Represents a state where the metadata schema is retrieved and the SQL command is parsed. |
| Setup | Planning | Represents a state where the following are done: physical and logical planning, Reflection matching, partition metadata retrieval, mapping the query to an engine-based workload management rule, and picking the engine associated with the query to run it. |
| Engine Start | Engine Start | Represents a state where the engine starts if it has stopped. If the engine is stopped, it takes time to restart before the executors are active. If the engine is already started, this state does not have a duration. |
| Queued | Queued | Represents a state where a job is queued. Each engine has a limit of concurrent queries. If the queries in progress exceed the concurrency limit, the query waits until the jobs in progress complete. |
| Running | Execution Planning | Represents a state where executor nodes are selected from the chosen engine to run the query, and work is distributed to each executor. |
| Running | Running | Represents a state where executor nodes execute and complete the fragments assigned to them. Typically, queries spend most of their time in this state. |
| Running | Starting | Represents a state where the query is starting up. |
| Canceled | Canceled | Represents a terminal state that indicates that the query was canceled by the user or by an intervention in the system. |
| Completed | Completed | Represents a terminal state that indicates that the query completed successfully. |
| Failed | Failed | Represents a terminal state that indicates that the query failed due to an error. |

## View Job Details

You can view the details of a specific job by viewing the Job Overview, SQL, Visual Profile, and Raw Profile pages.

To navigate to the job details:

1. Click  in the side navigation bar.
2. On the Jobs page, click the job that you would like to see the details for. The Job Overview page then replaces the list of jobs.

### Explain SQL

Use the **Explain SQL** option in the SQL Runner to analyze and optimize your SQL queries with assistance from the AI Agent. In the SQL Runner, highlight the SQL you want to review, right-click, and select **Explain SQL**. This prompts the AI Agent to examine the query, datasets, and underlying architecture to identify potential optimizations. The AI Agent uses Dremio's SQL Parser, the same logic used during query execution, to identify referenced tables, schemas, and relationships. Based on this analysis, the Agent provides insights and recommendations to improve query performance and structure. You can continue interacting with the AI Agent to refine the analysis and iterate on the SQL. The AI Agent applies SQL best practices when suggesting improvements and may execute revised queries to validate quality before presenting recommendations.

### Explain Job

Use the **Explain Job** option on the Job Details page to analyze job performance and identify opportunities for optimization. From the Job Details page, click **Explain Job** to prompt the AI Agent to review the job's query profile, planning, and execution details and compare them with the AI Agent's internal understanding of optimal performance characteristics. The AI Agent generates a detailed analysis that highlights key performance metrics such as data skew, memory usage, threading efficiency, and network utilization. Based on this assessment, it recommends potential optimizations to improve performance and resource utilization. You can continue the conversation with the AI Agent to explore the job in greater depth or reference additional job IDs to extend the investigation and compare results.

### Job Overview

You can view the details of a specific job on the Job Overview page.

To navigate to a job overview:

1. Click  in the side navigation bar.
2. On the Jobs page, click the job that you would like to see the job overview for. The Job Overview page then replaces the list of jobs.

The main components of the Job Overview page are numbered below:

#### 1. Summary

Each job is summarized.

#### 2. Total Execution Time

The total execution time shows the length of time for the total execution and the durations of the job states in the order they occur. Only the duration of the Engine Start state is shown in minutes and seconds. If the engine is stopped, it takes time to restart before the executors are active. If the engine is already started, the Engine Start duration does not have a value. For descriptions, see Job States and Statuses.

#### 3. Download Profile

To download the query profile, click the **Download Profile** button in the bottom-left corner of the Job Overview page. The profile will help you see more granular details about the job.

The profile downloads as a **ZIP** file. When you extract the **ZIP** file, you will see the following JSON files:

* profile\_attempt\_0.json: This file helps with troubleshooting out-of-memory and wrong-results issues. Note that the start and end times of the query are provided in epoch format. See the [Epoch Converter](https://www.epochconverter.com) utility for converting query times, or the SQL sketch after this list.
* header.json: This file provides the full list of Dremio coordinators and executors, data sets, and sources.

This information is useful when you are using REST calls.
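
If you prefer converting epoch values in SQL, here is a minimal sketch. It assumes the profile reports epoch time in milliseconds and that `TO_TIMESTAMP` in Dremio SQL accepts Unix epoch seconds; verify both against your profile output and the SQL function reference.

```
-- Hypothetical example: convert an epoch value in milliseconds
-- to a readable timestamp by scaling it to seconds first.
SELECT TO_TIMESTAMP(1700000000000 / 1000) AS query_start_time;
```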

#### 4. Submitted SQL

The SQL query for the selected job.

#### 5. Queried Datasets

The datasets queried for the selected job. These can be views or tables.

#### 6. Scans

Scan details include the source type, scan thread count, IO wait time (in milliseconds), and the number of rows scanned.

#### 7. Acceleration

The Acceleration section appears only if the job was accelerated, and it provides data about the Reflections involved. See [Optimize Performance](/dremio-cloud/admin/performance/) for more information.

#### 8. Results

To see the job results, click the **Open Results** link in the top-right corner of the Job Overview page. The link is visible only for jobs run through the UI, and only while the engine that ran the job is up; it disappears when that engine shuts down.

### Job SQL

Next to the Job Overview page is a tab for the SQL page, which shows the Submitted SQL and Dataset Graph.

You can view the SQL statement that was used for the selected job. Although the SQL statement is in read-only mode on the SQL Details page, the statement can be copied from the page and pasted into the SQL editor.

A dataset graph appears only if there is a queried dataset for the selected job. The dataset graph is a visual representation of the datasets used in the SQL statement.

## Related Topics

* [Profiles](/dremio-cloud/admin/monitor/jobs/profiles) – See the visual profiles and raw profiles of jobs.
<div style="page-break-after: always;"></div>

# Manage Your Subscription | Dremio Documentation

Original URL: https://docs.dremio.com/dremio-cloud/admin/subscription/

On this page

Dremio offers multiple payment options for upgrading your organization after the conclusion of the free trial:

* Pay-as-you-go (PAYG): Provide credit card details via the upgrade steps in the Dremio console.
* Commit-based offerings: Prepaid contracts for your organization's usage. Dremio invoices you directly, and a variety of payment options are available. Please contact [Dremio Sales](https://www.dremio.com/contact/) for more details.
* AWS Marketplace: A commit-based contract paid with AWS credits. See the [Dremio Cloud listing](https://aws.amazon.com/marketplace/pp/prodview-pnlijtzyoyjok) on the AWS Marketplace. Once you are ready to proceed, contact [Dremio Sales](https://www.dremio.com/contact/) for more details.

Note that your organization can be moved to a commit-based contract after upgrading to PAYG.

## Upgrade

At any point during your free trial of Dremio, your organization can be upgraded by entering your credit card details. When the free trial concludes, your organization becomes partially inaccessible for 30 days. During this time, you can still log in to upgrade your account, but if you do not upgrade your account before the end of this period, your organization and all of its contents may be deleted.

## Pay-as-you-go Billing Cycles

Your billing cycle starts from the day of your organization's upgrade and ends one month later. At the conclusion of the billing period, we will immediately attempt to charge your card for the outstanding balance.

If for any reason payment fails (or is only partially successful), we will attempt the charge again. If these subsequent attempts fail, your organization will become partially inaccessible. You can still log in, but only to update your payment method. If a new payment method is not provided before the end of this billing period, your organization and all of its contents may be deleted.

## Organizations

A Dremio organization can have one or more projects. Usage across projects is aggregated for billing purposes, meaning that when the PAYG bill is paid for an organization, the balance is paid for all projects. Only users who are members of the ADMIN role within the organization can manage billing details within the Dremio console.

## Find Your Organization ID

The ID of your organization can be helpful during communication with Dremio Sales or Support. To find your organization's ID:

1. In the Dremio console, click  in the side navigation bar and select **Organization settings** to open the Organization settings page.
2. On the General Information tab, copy your organization's ID.

## Delete Your Organization

Please contact Dremio's Support team if you would like to have your organization deleted.
<div style="page-break-after: always;"></div>

# Configure Model Providers | Dremio Documentation

Original URL: https://docs.dremio.com/dremio-cloud/admin/model-providers

On this page

You configure model providers for your organization's AI features when deploying Dremio. After you configure at least one model provider, you must set a default model provider and can optionally set an allowlist of available models. Dremio uses the default provider for all AI Agent interactions, whereas the allowlisted models can be used by anyone writing AI functions. By default, CALL MODEL is granted to all users for all new model providers, so if the default changes, users can continue to use the AI Agent without interruption.

## Dremio-Provided LLM

Dremio provides all organizations with an out-of-the-box model provider so that all users can begin engaging with the AI Agent and AI functions without any other configuration required. Once you have added your own model provider and set it as the new default, the Dremio-Provided LLM is no longer used. If you delete all other model providers, the Dremio-Provided LLM reverts to being the organization's default model provider. This model provider cannot be deleted.

## Supported Model Providers

Dremio supports configuration of the following model providers and models. Dremio recommends using enterprise-grade reasoning models for the best performance and experience.

| Category | Models | Connection Method(s) |
| --- | --- | --- |
| **OpenAI** | * gpt-5-2025-08-07 * gpt-5-mini-2025-08-07 * gpt-5-nano-2025-08-07 * gpt-4.1-2025-04-14 * gpt-4o-2024-11-20 * gpt-4-turbo-2024-04-09 * gpt-4.1-mini-2025-04-14 * o3-mini-2025-01-31 * o4-mini-2025-04-16 * o3-2025-04-16 | * Access Key |
| **Anthropic** | * claude-sonnet-4-5-20250929 * claude-opus-4-1-20250805 * claude-opus-4-20250514 * claude-sonnet-4-20250514 | * Access Key |
| **Google Gemini** | * gemini-2.5-pro | * Access Key |
| **AWS Bedrock** | * specify Model ID(s) * [AWS Bedrock Supported Models](https://docs.aws.amazon.com/bedrock/latest/userguide/models-supported.html) | * Access Key * IAM Role |
| **Azure OpenAI** | * specify Deployment Name(s) * [Azure Supported Models](https://learn.microsoft.com/en-us/azure/ai-foundry/foundry-models/concepts/models-sold-directly-by-azure?tabs=global-standard-aoai%2Cstandard-chat-completions%2Cglobal-standard&pivots=azure-openai#azure-openai-in-azure-ai-foundry-models) | Combination of: 1. Resource Name, 2. Directory ID, 3. Application ID, 4. Client Secret Value |

## Rate Limiting and Quotas

### AWS Bedrock Rate Limits

When using AWS Bedrock model providers, you may encounter rate limiting errors such as "429 Too Many Tokens (Rate Limit Exceeded)". This is particularly common with new AWS accounts, which start with lower or fixed quotas.

If you experience rate limiting issues, you can contact AWS Support and request a quota increase by providing:

* Quota name
* Model ID
* AWS region
* Use case description
* Projected token and request usage

For more information about AWS Bedrock quotas and limits, see the [AWS Bedrock User Guide](https://docs.aws.amazon.com/bedrock/latest/userguide/quotas.html).

## Default Model Provider

To delete the default model provider, you must first assign a new default, unless you are deleting the last remaining model provider that you have configured. To change the default model provider, you must have the MODIFY privilege on both the current default and the proposed new default model provider.

## Add Model Provider

To add a model provider in the Dremio console:

1. Click  in the side navigation bar to go to the Settings page.
2. Select **Organization Settings**.
3. Select the **AI Configuration** setting.
4. Click **Add model provider**.
<div style="page-break-after: always;"></div>

# External Engines | Dremio Documentation

Original URL: https://docs.dremio.com/dremio-cloud/admin/external-engines

On this page

Dremio's Open Catalog is built on Apache Polaris, providing a standards-based, open approach to data catalog management. At its core is the Iceberg REST interface, which enables seamless integration with any query engine that supports the Apache Iceberg REST catalog specification. This open architecture means you can connect industry-standard engines such as Apache Spark, Trino, and Apache Flink directly to Dremio.

| Engine | Best For | Key Features |
| --- | --- | --- |
| [Apache Spark](https://spark.apache.org/) | Data engineering, ETL | Token exchange, nested folders, views |
| [Trino](https://trino.io/) | Interactive analytics | Fast queries, BI workloads |
| [Apache Flink](https://flink.apache.org/) | Real-time streaming | Event-driven, continuous pipelines |

By leveraging the Iceberg REST standard, the Open Catalog acts as a universal catalog layer that query engines can communicate with using a common language. This allows organizations to build flexible data architectures where multiple engines work together, each accessing and managing the same Iceberg tables through Dremio's centralized catalog.

## Apache Spark

Apache Spark is a unified analytics engine for large-scale data processing, widely used for ETL, batch processing, and data engineering workflows.

### Prerequisites

This example uses Spark 3.5.3 with Iceberg 1.9.1. For other versions, ensure compatibility between the Spark, Scala, and Iceberg runtime versions. Additional prerequisites include:

* The following JAR files downloaded to your local directory:
  + `authmgr-oauth2-runtime-0.0.5.jar` from [Dremio Auth Manager releases](https://github.com/dremio/iceberg-auth-manager/releases). This open-source library handles token exchange, automatically converting your personal access token (PAT) into an OAuth token for seamless authentication. For more details about Dremio Auth Manager's capabilities and configuration options, see [Introducing Dremio Auth Manager for Apache Iceberg](https://www.dremio.com/blog/introducing-dremio-auth-manager-for-apache-iceberg/).
  + `iceberg-spark-runtime-3.5_2.12-1.9.1.jar` (from [Apache Iceberg releases](https://iceberg.apache.org/releases/))
  + `iceberg-aws-bundle-1.9.1.jar` (from [Apache Iceberg releases](https://iceberg.apache.org/releases/))
* Docker installed and running.
* Your Dremio catalog name – The default catalog in each project has the same name as the project.
* If authenticating with a PAT, you must generate a token. See [Personal Access Tokens](/dremio-cloud/security/authentication/personal-access-token/) for step-by-step instructions.
* If authenticating with an identity provider (IDP), your IDP or other external token provider must be configured as a trusted OAuth [external token provider](/dremio-cloud/security/authentication/app-authentication/external-token) in Dremio.
* You must have an OAuth2 client registered in your IDP that is configured to issue tokens that Dremio accepts (matching audience and scopes), with a client ID and client secret provided by your IDP.

### Authenticate with a PAT

You can authenticate your Apache Spark session with a Dremio personal access token using the following script. Replace `<personal_access_token>` with your Dremio personal access token and `<catalog_name>` with your catalog name.

In addition, you can adjust the volume mount paths to match where you've downloaded the JAR files and where you want your workspace directory. The example uses `$HOME/downloads` and `$HOME/workspace`.

Spark with PAT Authentication

```
#!/bin/bash
export CATALOG_NAME="<catalog_name>"
export DREMIO_PAT="<personal_access_token>"

docker run -it \
  -v $HOME/downloads:/opt/jars \
  -v $HOME/workspace:/workspace \
  apache/spark:3.5.3 \
  /opt/spark/bin/spark-shell \
  --jars /opt/jars/authmgr-oauth2-runtime-0.0.5.jar,/opt/jars/iceberg-spark-runtime-3.5_2.12-1.9.1.jar,/opt/jars/iceberg-aws-bundle-1.9.1.jar \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.polaris.type=rest \
  --conf spark.sql.catalog.polaris.cache-enabled=false \
  --conf spark.sql.catalog.polaris.warehouse=$CATALOG_NAME \
  --conf spark.sql.catalog.polaris.uri=https://catalog.dremio.cloud/api/iceberg \
  --conf spark.sql.catalog.polaris.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
  --conf spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation=vended-credentials \
  --conf spark.sql.catalog.polaris.rest.auth.type=com.dremio.iceberg.authmgr.oauth2.OAuth2Manager \
  --conf spark.sql.catalog.polaris.rest.auth.oauth2.token-endpoint=https://login.dremio.cloud/oauth/token \
  --conf spark.sql.catalog.polaris.rest.auth.oauth2.grant-type=token_exchange \
  --conf spark.sql.catalog.polaris.rest.auth.oauth2.client-id=dremio-catalog-cli \
  --conf spark.sql.catalog.polaris.rest.auth.oauth2.scope=dremio.all \
  --conf spark.sql.catalog.polaris.rest.auth.oauth2.token-exchange.subject-token="$DREMIO_PAT" \
  --conf spark.sql.catalog.polaris.rest.auth.oauth2.token-exchange.subject-token-type=urn:ietf:params:oauth:token-type:dremio:personal-access-token
```

note

In this configuration, `polaris` is the catalog identifier used within Spark. This identifier is mapped to your actual Dremio catalog via the `spark.sql.catalog.polaris.warehouse` property.

### Authenticate with an IDP

You can authenticate your Apache Spark session using an [external token provider](/dremio-cloud/security/authentication/app-authentication/external-token) that has been integrated with Dremio.

**Using this configuration:**

* Spark obtains a user-specific JWT from the external token provider.
* Spark connects to Dremio and [exchanges the JWT](/dremio-cloud/api/oauth-token) for an access token.
* Spark connects to the Open Catalog using the access token.

In the following script, replace `<catalog_name>` with your catalog name, `<idp_url>` with the location of your external token provider, and `<idp_client_id>` and `<idp_client_secret>` with the credentials issued by the external token provider.

In addition, you can adjust the volume mount paths to match where you've downloaded the JAR files and where you want your workspace directory. The example uses `$HOME/downloads` and `$HOME/workspace`.

Spark with IDP Authentication

```
#!/bin/bash
export CATALOG_NAME="<catalog_name>"
export IDP_URL="<idp_url>"
export CLIENT_ID="<idp_client_id>"
export CLIENT_SECRET="<idp_client_secret>"

docker run -it \
  -v $HOME/downloads:/opt/jars \
  -v $HOME/workspace:/workspace \
  apache/spark:3.5.3 \
  /opt/spark/bin/spark-shell \
  --jars /opt/jars/authmgr-oauth2-runtime-0.0.5.jar,/opt/jars/iceberg-spark-runtime-3.5_2.12-1.9.1.jar,/opt/jars/iceberg-aws-bundle-1.9.1.jar \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.polaris.type=rest \
  --conf spark.sql.catalog.polaris.cache-enabled=false \
  --conf spark.sql.catalog.polaris.warehouse=$CATALOG_NAME \
  --conf spark.sql.catalog.polaris.uri=https://catalog.dremio.cloud/api/iceberg \
  --conf spark.sql.catalog.polaris.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
  --conf spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation=vended-credentials \
  --conf spark.sql.catalog.polaris.rest.auth.type=com.dremio.iceberg.authmgr.oauth2.OAuth2Manager \
  --conf spark.sql.catalog.polaris.rest.auth.oauth2.issuer-url=$IDP_URL \
  --conf spark.sql.catalog.polaris.rest.auth.oauth2.grant-type=device_code \
  --conf spark.sql.catalog.polaris.rest.auth.oauth2.client-id=$CLIENT_ID \
  --conf spark.sql.catalog.polaris.rest.auth.oauth2.client-secret=$CLIENT_SECRET \
  --conf spark.sql.catalog.polaris.rest.auth.oauth2.scope=dremio.all \
  --conf spark.sql.catalog.polaris.rest.auth.oauth2.impersonation.enabled=true \
  --conf spark.sql.catalog.polaris.rest.auth.oauth2.impersonation.token-endpoint=https://login.dremio.cloud/oauth/token \
  --conf spark.sql.catalog.polaris.rest.auth.oauth2.impersonation.scope=dremio.all \
  --conf spark.sql.catalog.polaris.rest.auth.oauth2.token-exchange.subject-token-type=urn:ietf:params:oauth:token-type:jwt
```

### Usage Examples

With these configurations, `polaris` is the catalog identifier used within Spark. This identifier is mapped to your actual Dremio catalog via the `spark.sql.catalog.polaris.warehouse` property. Once Spark is running and connected to your Dremio catalog:

List namespaces

```
spark.sql("SHOW NAMESPACES IN polaris").show()
```

Query a table

```
spark.sql("SELECT * FROM polaris.your_namespace.your_table LIMIT 10").show()
```

Create a table

```
spark.sql("""
  CREATE TABLE polaris.your_namespace.new_table (
    id INT,
    name STRING
  ) USING iceberg
""")
```
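
To verify the round trip, you can insert a couple of rows and read them back. This is a small follow-on sketch that reuses the placeholder namespace and table from the example above:

```
spark.sql("INSERT INTO polaris.your_namespace.new_table VALUES (1, 'alpha'), (2, 'beta')")
spark.sql("SELECT * FROM polaris.your_namespace.new_table ORDER BY id").show()
```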

## Trino

Trino is a distributed SQL query engine designed for fast analytic queries against data sources of all sizes. It excels at interactive SQL analysis, ad hoc queries, and joining data across multiple sources.

### Prerequisites

* Docker installed and running.
* A valid Dremio personal access token – See [Personal Access Tokens](/dremio-cloud/security/authentication/personal-access-token/) for instructions to generate a personal access token.
* Your Dremio catalog name – The default catalog in each project has the same name as the project.

### Configuration

To connect Trino to Dremio using Docker, follow these steps:

1. Create a directory for the Trino configuration and add a catalog configuration:

   ```
   mkdir -p ~/trino-config/catalog
   ```

   In `trino-config/catalog`, create a catalog configuration file named `polaris.properties` with the following values:

   Trino polaris.properties

   ```
   connector.name=iceberg
   iceberg.catalog.type=rest
   iceberg.rest-catalog.uri=https://catalog.dremio.cloud/api/iceberg
   iceberg.rest-catalog.oauth2.token=<personal_access_token>

   iceberg.rest-catalog.warehouse=<catalog_name>
   iceberg.rest-catalog.security=OAUTH2

   iceberg.rest-catalog.vended-credentials-enabled=true
   fs.native-s3.enabled=true
   s3.region=<region>
   ```

   Replace the following:

   * `<personal_access_token>` with your Dremio personal access token.
   * `<catalog_name>` with your catalog name.
   * `<region>` with the AWS region where your data is stored, such as `us-west-2`.

   note

   * In this configuration, `polaris` (from the filename `polaris.properties`) is the catalog identifier used in Trino queries. The `iceberg.rest-catalog.warehouse` property maps this identifier to your actual Dremio catalog.
   * In `oauth2.token`, you provide your Dremio personal access token directly. Dremio's catalog API accepts PATs as bearer tokens without requiring token exchange.

2. Pull and start the Trino container:

   ```
   docker run --name trino -d -p 8080:8080 trinodb/trino:latest
   ```

3. Verify that Trino is running:

   ```
   docker ps
   ```

   You can access the web UI at `http://localhost:8080` and log in as `admin`.

4. Restart Trino with the configuration:

   ```
   docker stop trino
   docker rm trino

   # Start with mounted configuration
   docker run --name trino -d -p 8080:8080 -v ~/trino-config/catalog:/etc/trino/catalog trinodb/trino:latest

   # Verify Trino is running
   docker ps

   # Check logs
   docker logs trino -f
   ```

5. In another window, connect to the Trino CLI:

   ```
   docker exec -it trino trino --user admin
   ```

   You should see the Trino prompt:

   ```
   trino>
   ```

6. Verify the catalog connection:

   ```
   trino> show catalogs;
   ```

### Usage Examples

Once Trino is running and connected to your Dremio catalog:

List namespaces

```
trino> show schemas from polaris;
```

Query a table

```
trino> select * from polaris.your_namespace.your_table;
```

Create a table

```
trino> CREATE TABLE polaris.demo_namespace.test_table (
  id INT,
  name VARCHAR,
  created_date DATE,
  value DOUBLE
);
```
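
You can then insert a row and read it back. This is a small follow-on sketch that reuses the table from the example above:

```
trino> INSERT INTO polaris.demo_namespace.test_table VALUES (1, 'alpha', DATE '2024-01-01', 1.5);
trino> SELECT * FROM polaris.demo_namespace.test_table;
```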

### Limitations

* **Case sensitivity:** Namespace and table names must be lowercase. Trino does not list or access tables in namespaces that begin with an uppercase character.
* **View compatibility:** Trino cannot read views created in Dremio due to SQL dialect incompatibility. Attempting to do so returns the error "Cannot read unsupported dialect 'DremioSQL'."

## Apache Flink

Apache Flink is a distributed stream processing framework designed for stateful computations over bounded and unbounded data streams, enabling real-time data pipelines and event-driven applications.

To connect Apache Flink to Dremio using Docker Compose, follow these steps:

### Prerequisites

You'll need to download the required JAR files and organize them in a project directory structure.

1. Create the project directory structure:

   ```
   mkdir -p flink-dremio/jars
   cd flink-dremio
   ```

2. Download the required JARs into the `jars/` directory:

   * Iceberg Flink Runtime 1.20:

     ```
     wget -P jars/ https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-flink-runtime-1.20/1.9.1/iceberg-flink-runtime-1.20-1.9.1.jar
     ```

   * Iceberg AWS Bundle for vended credentials:

     ```
     wget -P jars/ https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-aws-bundle/1.9.1/iceberg-aws-bundle-1.9.1.jar
     ```

   * Hadoop dependencies required by Flink:

     ```
     wget -P jars/ https://repo.maven.apache.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/2.8.3-10.0/flink-shaded-hadoop-2-uber-2.8.3-10.0.jar
     ```

3. Create a file named `Dockerfile` in the `flink-dremio` directory:

   Flink Dockerfile

   ```
   FROM flink:1.20-scala_2.12

   # Copy all required JARs
   COPY jars/*.jar /opt/flink/lib/
   ```

4. Create the `docker-compose.yml` file in the `flink-dremio` directory:

   Flink docker-compose.yml

   ```
   services:
     flink-jobmanager:
       build: .
       ports:
         - "8081:8081"
       command: jobmanager
       environment:
         - |
           FLINK_PROPERTIES=
           jobmanager.rpc.address: flink-jobmanager
           parallelism.default: 2
         - AWS_REGION=us-west-2

     flink-taskmanager:
       build: .
       depends_on:
         - flink-jobmanager
       command: taskmanager
       scale: 1
       environment:
         - |
           FLINK_PROPERTIES=
           jobmanager.rpc.address: flink-jobmanager
           taskmanager.numberOfTaskSlots: 4
           parallelism.default: 2
         - AWS_REGION=us-west-2
   ```

5. Build and start the Flink cluster:

   ```
   # Build and start the cluster
   docker-compose build --no-cache
   docker-compose up -d

   # Verify the cluster is running
   docker-compose ps

   # Verify required JARs are present
   docker-compose exec flink-jobmanager ls -la /opt/flink/lib/ | grep -E "(iceberg|hadoop)"
   ```

   You should see the JARs you downloaded in the previous step.

6. Connect to the Flink SQL client:

   ```
   docker-compose exec flink-jobmanager ./bin/sql-client.sh
   ```

   You can also access the Flink web UI at `http://localhost:8081` to monitor jobs.

7. Create the Dremio catalog connection in Flink:

   ```
   CREATE CATALOG polaris WITH (
     'type' = 'iceberg',
     'catalog-impl' = 'org.apache.iceberg.rest.RESTCatalog',
     'uri' = 'https://catalog.dremio.cloud/api/iceberg',
     'token' = '<personal_access_token>',
     'warehouse' = '<catalog_name>',
     'header.X-Iceberg-Access-Delegation' = 'vended-credentials',
     'io-impl' = 'org.apache.iceberg.aws.s3.S3FileIO'
   );
   ```

   Replace the following:

   * `<personal_access_token>` with your Dremio personal access token.
   * `<catalog_name>` with your catalog name.

   note

   * In this configuration, `polaris` is the catalog identifier used in Flink queries. The `CREATE CATALOG` command maps this identifier to your actual Dremio catalog.
   * In `token`, you provide your Dremio personal access token directly. Dremio's catalog API accepts PATs as bearer tokens without requiring token exchange.

8. Verify the catalog connection:

   ```
   Flink SQL> show catalogs;
   ```

### Usage Examples

Once Apache Flink is running and connected to your Dremio catalog:

List namespaces

```
Flink SQL> show databases in polaris;
```

Query a table

```
Flink SQL> select * from polaris.your_namespace.your_table;
```

Create a table

```
Flink SQL> CREATE TABLE polaris.demo_namespace.test_table (
  id INT,
  name STRING,
  created_date DATE,
  `value` DOUBLE
);
```

### Limitations

* **Reserved keywords:** Column names that are reserved keywords, such as `value`, `timestamp`, and `date`, must be enclosed in backticks when creating or querying tables.
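
For example, selecting from the table created above requires backticks around the `value` column. A short sketch following the naming used in the earlier examples:

```
Flink SQL> SELECT id, name, `value` FROM polaris.demo_namespace.test_table;
```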
<div style="page-break-after: always;"></div>
|
|
1773
|
+
|
|
1774
|
+
# Usage | Dremio Documentation

Original URL: https://docs.dremio.com/dremio-cloud/admin/subscription/usage

On this page

There are multiple forms of billable Dremio usage within an [organization](/dremio-cloud/admin/subscription/#organizations):

* Dremio Consumption Units (DCUs) represent the usage of Dremio engines. DCUs are only consumed when your engines are running.
* Large-language model (LLM) tokens are billed when you use Dremio's AI features via the Dremio-Provided LLM.
* Storage usage is billed in terabyte-months and only applies to projects that use Dremio-hosted storage. If your projects use an object storage bucket in your account with a cloud provider as the catalog store, storage fees do not apply.

## How DCUs are Calculated

The number of DCUs consumed by an engine depends on two factors:

* The size of the engine
* How long the engine and its replicas have been running

DCU consumption for an engine is calculated as `(Total uptime for the engine and its replicas) * (DCU consumption rate for that engine size)`.

Uptime is measured in seconds and has a 60-second minimum.

The DCU consumption rate for each engine size supported in Dremio is listed in [Manage Engines](/dremio-cloud/admin/engines/).

### DCU Examples

#### Example 1

An organization has two Dremio Cloud engines defined: Engine A and Engine B, where Engine A is a 2XSmall engine and Engine B is a Medium engine.

Suppose that between 8 a.m. and 9 a.m. one day:

* Engine A had 2 replicas running for 40 minutes each, so it accumulates a total of 80 minutes of engine uptime.
* Engine B had 5 replicas running for 50 minutes each, so it accumulates a total of 250 minutes of engine uptime.

The total usage for Engine A for this hour is `(80/60) * (16 DCUs/hour) = 21.33 DCUs`.

The total usage for Engine B for this hour is `(250/60) * (128 DCUs/hour) = 533.33 DCUs`.

#### Example 2

An organization has one Dremio Cloud engine defined: Engine A, where Engine A is a Medium engine.

Suppose that between 8 a.m. and 9 a.m. one day:

* Engine A had 1 replica running for the entire hour (60 minutes).
* Engine A needed to spin up an additional replica for 30 minutes to tackle a workload spike.

Engine A accumulated a total of 90 minutes of engine uptime, so the total usage for Engine A for this hour is `(90/60) * (128 DCUs/hour) = 192 DCUs`.

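You can check this arithmetic directly in the SQL editor. A minimal illustration, using the per-hour rates quoted in the examples above (16 DCUs/hour for a 2XSmall engine and 128 DCUs/hour for a Medium engine):

```
SELECT
  (80.0  / 60) * 16  AS engine_a_dcus,  -- Example 1: 80 minutes of 2XSmall uptime = 21.33 DCUs
  (250.0 / 60) * 128 AS engine_b_dcus,  -- Example 1: 250 minutes of Medium uptime = 533.33 DCUs
  (90.0  / 60) * 128 AS example_2_dcus  -- Example 2: 90 minutes of Medium uptime = 192 DCUs
```
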
## How AI Usage Is Calculated

If you use the Dremio-Provided LLM, you pay directly for the cost of both the input and output tokens used. If you connect to another LLM via your own model provider, you are not currently charged for this usage.

### AI Examples

#### Example 1

Say that you use an external model provider as well as the Dremio-Provided LLM to use Dremio's AI features, resulting in a usage footprint like the below:

* External model provider: 500K input tokens used.
* External model provider: 30K output tokens used.
* Dremio-Provided: 200K input tokens used.
* Dremio-Provided: 20K output tokens used.

You are not charged for using Dremio's AI features via an external model. Instead, you are only charged for the tokens consumed by the Dremio-Provided LLM:

* (200K input tokens) \* ($1.25/1 million tokens) = $0.25
* (20K output tokens) \* ($10.00/1 million tokens) = $0.20

In this scenario, you would be billed for $0.45 of AI feature usage.

To simplify the billing experience for AI features, Dremio may explore adding an AI-specific credit, similar to DCUs, in the future.

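The token math is equally easy to verify in SQL. A minimal illustration using the rates from the example above ($1.25 per million input tokens, $10.00 per million output tokens):

```
SELECT
  200000 * 1.25  / 1000000 AS input_cost_dollars,   -- $0.25
  20000  * 10.00 / 1000000 AS output_cost_dollars   -- $0.20
```
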
## How Storage Usage Is Calculated

Storage is calculated through the collection of periodic snapshots of the Dremio-hosted bucket. The snapshots taken throughout a billing period are averaged to calculate the number of billable terabyte-months.

### Storage Usage Examples

#### Example 1

Suppose an organization has one Dremio project in a region where the price of a terabyte-month is $23.00, and that in a given month this project:

* Stores 1 terabyte of data for the entire 30 days of the billing period

Then the total amount charged for the storage would be (1) \* ($23.00) = $23.00.

#### Example 2

Suppose your organization has a project in a region where the price of a terabyte-month is $23.00, and that in a given period this project:

* Stores 1 terabyte of data for the first 15 days of the month
* Stores 2 terabytes of data for the last 15 days of the month

On average throughout the month, the project was storing 1.5 TB of data, so the bill would be (1.5) \* ($23.00) = $34.50.

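The month-average can likewise be checked in SQL. A minimal illustration of Example 2 (15 days at 1 TB plus 15 days at 2 TB, at $23.00 per terabyte-month):

```
SELECT
  ((1 * 15) + (2 * 15)) / 30.0         AS avg_terabytes,    -- 1.5
  ((1 * 15) + (2 * 15)) / 30.0 * 23.00 AS monthly_cost_usd  -- 34.50
```
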
<div style="page-break-after: always;"></div>

# Project Preferences | Dremio Documentation

Original URL: https://docs.dremio.com/dremio-cloud/admin/projects/preferences

Preferences let you customize the behavior of specific features in the Dremio console.

To view the available preferences:

1. In the Dremio console, hover over  in the side navigation bar and select **Project settings**.
2. Select **Preferences** in the project settings sidebar.

   This opens the Preferences page, showing the Dremio console settings that can be modified.
3. Use the toggle next to each setting to enable or disable it for all users.

If any preferences are modified, users must refresh their browsers to see the change.

These preferences and their descriptions are listed in the table below.

| Setting | Default | Enabled | Disabled | Details |
| --- | --- | --- | --- | --- |
| SQL Autocomplete | Enabled | Autocomplete provides suggestions for SQL keywords, catalog objects, and functions while you are constructing SQL statements. The autocomplete button is visible in the SQL editor, although users can switch the button off within their own accounts. | The button is hidden from the SQL editor and suggestions are not provided. | See how this works in the [SQL editor](/dremio-cloud/get-started/quick-tour). |
| Copy or Download Results | Enabled | The copy and download buttons are visible above the results table, and users are allowed to copy or download the results in the SQL editor. | The buttons are hidden and users cannot copy or download results in the SQL editor. | See how this works in [result set actions](/dremio-cloud/get-started/quick-tour). |
| Query Dataset on Click | Enabled | Clicking on a dataset opens the SQL Runner with a `SELECT` statement on the dataset. | Clicking on a dataset opens the Datasets page, showing a `SELECT` statement on the dataset or the dataset's definition that you can view or edit depending on your dataset privileges. Disable this preference if you would rather click directly on a dataset to see or edit its definition. | |
| Autonomous Reflections | Enabled | Dremio automatically creates and drops Reflections based on query patterns from the last 7 days to seamlessly accelerate performance. | Dremio provides recommendations to create and drop Reflections based on query patterns from the last 7 days to accelerate query performance. | See how this works in [Autonomous Reflections](/dremio-cloud/admin/performance/autonomous-reflections). |
| AI Features | Enabled | Users can interact with Dremio's AI Agent and AI functions. The AI Agent enables agentic workflows, allowing analysts to work with the agent to generate SQL queries, find insights, and create visualizations. The AI functions allow engineers to query unstructured data and use LLMs during SQL execution. | The AI Agent and AI functions will not work. | See how this works in [Explore with AI Agent](/dremio-cloud/explore-analyze/ai-agent) and [AI Functions](/dremio-cloud/sql/sql-functions/AI). |
| Generate wikis and labels | Enabled | In the Details panel, both **Generate wiki** and **Generate labels** links are visible for generating wikis and labels. | The links for **Generate wiki** and **Generate labels** are hidden, making these features unavailable. | See how this works in [Wikis and Labels](/dremio-cloud/manage-govern/wikis-labels). |

<div style="page-break-after: always;"></div>

# Profiles | Dremio Documentation

Original URL: https://docs.dremio.com/dremio-cloud/admin/monitor/jobs/profiles

On this page

Visual profiles and raw profiles are available for jobs that have run queries.

## Visual Profiles

You can view the operations in visual profiles to diagnose performance or cost issues and to see the results of changes that you make, either to queries themselves or to their environment, to improve performance or reduce costs.

A query profile details the plan that Dremio devised for running a query and shows statistics from the query's execution. A visual representation of a query profile is located on the Visual Profile tab. This visual profile consists of operators that are arranged as a tree, where each operator has one or two upstream operators that represent a specific action, such as a table scan, join, or sort. At the top of the tree, a single root operator represents the query results, and at the bottom, the leaf operators represent scan or read operations from datasets.

Data processing begins with the reading of datasets at the bottom of the tree structure, and data is sequentially processed up the tree. A query plan can have many branches, and each branch is processed separately until a join or other operation connects it to the rest of the tree.

### Phases

A query plan is composed of query phases (also called major fragments), and each phase defines a series of operations that run in parallel. A query phase is depicted by same-colored boxes that are grouped together in a visual profile.

Within the query phases are multiple single-threaded instances (also called minor fragments) running in parallel. Each thread processes a different set of data through the same series of operations, and this data is exchanged from one phase to another. The number of threads for each operator can be found in the Details section (right panel) of a visual profile.

### Use Visual Profiles

To navigate to the visual profile for a job:

1. Click  in the side navigation bar.
2. On the Jobs page, click a job that you would like to see the visual profile for.
3. At the top of the next page, click the Visual Profile tab to open it.

The main components of a visual profile are shown below:



| Location | Description |
| --- | --- |
| 1 | The Visual Profile tab shows a visual representation of a query profile. |
| 2 | The left panel is where you can view the phases of the query execution or single operators, sorting them by runtime, total memory used, or records produced. Operators of the same color are within the same phase. Clicking the Collapse button hides the left panel from view. |
| 3 | The tree graph allows you to select an operator and find out where it is in relation to the rest of the query plan. |
| 4 | The zoom controls the size of the tree graph so it's easier for you to view. |
| 5 | The right panel shows the details and statistics about the selected operator. Clicking the Collapse button hides the right panel from view. |

### Use Cases

#### Improve the Performance of Queries

You may notice that a query is taking more time than expected and want to know if something can be done to reduce the execution time. By viewing its visual profile, you can, for example, quickly find the operators with the highest processing times.

You might decide to try making simple adjustments to cause Dremio to choose a different plan. Some of the possible adjustments include:

* Adding a filter on a partition column to reduce the amount of data scanned
* Changing join logic to avoid expanding joins (which return more rows than either of the inputs) or nested-loop joins
* Creating a Reflection to avoid some of the processing-intensive work done by the query

#### Reduce Query-Execution Costs

If you are an administrator, you may be interested in tuning the system as a whole to support higher concurrency and lower resource usage, which means identifying the most expensive queries in the system and then seeing what can be done to lower their cost. Such an investigation is often important even if individual users are happy with the performance of their own queries.

On the Jobs page, you can use the columns to find the queries with the highest cost, greatest number of rows scanned, and more. You can then study the visual profiles for these queries, identifying system or data problems, and mismatches between how data is stored and how these queries retrieve it. You can try repartitioning data, modifying data types, sorting, creating views, creating Reflections, and other changes.

## Raw Profiles

Click **Raw Profile** to open a raw profile of the job in a separate dialog, which includes a job summary, state durations, threads, resource allocation, operators, visualized plan, acceleration, and other details.

A raw profile is a UI-generated profile, a subset of the downloadable profile data, that provides a summary of metrics collected for each executed query and can be used to monitor and analyze query performance.

To navigate to a raw profile:

1. Click  in the side navigation bar to open the Jobs page.
2. On the Jobs page, click a job that you would like to see the raw profile for.
3. At the top of the next page, click the Raw Profile tab to open a raw profile of the job in a separate dialog. The associated raw profile dialog shows a variety of information for review.

### Views

Within the Raw Profile dialog, you can analyze the Job Metrics based on the following views:

| View | Description |
| --- | --- |
| Query | Shows the selected query statement and job metrics. Check that the SQL query is what you were expecting and that the query is run against the source data. |
| Visualized Plan | Shows a visualized diagram and job metrics. This view is useful in understanding the flow of the query and for analyzing out-of-memory issues and incorrect results. The detailed visualized plan diagram is always read from the bottom up. |
| Planning | Shows planning metrics, query output schema, non-default options, and job metrics. This view shows how query planning is executed, because it provides statistics about the actual cost of the query operations in terms of memory, input/output, and CPU processing. You can use this view to identify which operations consumed the majority of the resources during a query and to address the cost-intensive operations. In particular, the following information is useful: * Non Default Options – See if non-default parameters are being used. * Metadata Cache Hits and Misses with times. * Final Physical Transformation – Look for pushdown queries for RDBMS, MongoDB, or Elasticsearch, filter pushdowns or partition pruning for Parquet, and view usage of stripes for ORC. * Compare the estimated row count versus the actual scan, join, or aggregate result. * Row Count – See if row count (versus rows) is used. Row count can cause an expensive broadcast. * Build – See if build (versus probe) is used. Build loads data into memory. |
| Acceleration | Shows Reflection outcome, canonicalized user query alternatives, Reflection details, and job metrics. * Multiple substitutions – See if the substitutions are excessive. * System activity – See if `sys.project.reflections`, `sys.project.materializations`, and `sys.project.refreshes` are excessive. * Comparisons – Compare cumulative cost (found in Best Cost Replacement Plan) against Logical Planning, which is in the Planning view. This view is useful for determining whether exceptions or matches are occurring. The following considerations determine the acceleration process: * Considered, Matched, Chosen – The query is accelerated. * Considered, Matched, Not Chosen – The query is not accelerated because either a costing issue or an exception during substitution occurred. * Considered, Not Matched, Not Chosen – The query is not accelerated because the Reflection does not have the data to accelerate. |
| Error | (If applicable) Shows information about an error. The Failure Node is always the coordinator node, and the server name inside the error message is the actual affected node. |

### Job Metrics

Each view displays the following metrics:

* **Job Summary**
* **Time in UTC**
* **State Durations**
* **Context**
* **Threads**
* **Resource Allocation**
* **Nodes**
* **Operators**

#### Job Summary

The job summary information includes:

* State
* Coordinator
* Threads
* Command Pool Wait
* Total Query Time
* # Joins in user query
* # Joins in final plan
* Considered Reflections
* Matched Reflections
* Chosen Reflections

#### Time in UTC

The Time in UTC section lists the job's start and end time, in UTC format.

#### State Durations

The State Durations section lists the length of time (in milliseconds) for each of the job states:

* Pending
* Metadata Retrieval
* Planning
* Engine Start
* Queued
* Execution Planning
* Starting
* Running

For descriptions of the job states, see [Job States and Statuses](/dremio-cloud/admin/monitor/jobs/#job-states-and-statuses).

#### Context

If you are querying an Iceberg catalog object, the Context section lists the Iceberg catalog and branch that is referenced in the query. Otherwise, the Context section is not populated. Read [Iceberg Catalogs in Dremio](/dremio-cloud/developer/data-formats/iceberg#iceberg-catalogs-in-dremio) for more information.

#### Threads

The Threads section provides an overview table and a major fragment block for each major fragment. Each row in the Overview table provides the number of minor fragments that Dremio parallelized from each major fragment, as well as aggregate time and memory metrics for the minor fragments.

Major fragment blocks correspond to a row in the Overview table. You can expand the blocks to see metrics for all of the minor fragments that were parallelized from each major fragment, including the host on which each minor fragment ran. Each row in the major fragment table presents the fragment state, time metrics, memory metrics, and aggregate input metrics of each minor fragment.

In particular, the following metrics are useful:

* Setup – Time spent opening and closing files.
* Waiting – Time spent waiting on the CPU.
* Blocked on Downstream – Represents completed work where the next phase is not yet ready to accept it.
* Blocked on Upstream – Represents the case where the upstream phase is ready to provide work but the current phase is not yet ready to accept it.
* Phase Metrics – Displays memory used per node (phases can run in parallel).

#### Resource Allocation

The Resource Allocation section shows the following details for managed resources and workloads:

* Engine Name
* Queue Name
* Queue Id
* Query Cost
* Query Type

#### Nodes

The Nodes section includes host name, resource waiting time, and peak memory.

#### Operators

The Operators section shows aggregate metrics for each operator within a major fragment that performed relational operations during query execution.

**Operator Overview Table**

The following table lists descriptions for each column in the Operators Overview table:

| Column Name | Description |
| --- | --- |
| SqlOperatorImpl ID | The coordinates of an operator that performed an operation during a particular phase of the query. For example, in 02-xx-03, 02 is the major fragment ID, xx corresponds to a minor fragment ID, and 03 is the operator ID. |
| Type | The operator type. Operators can be of type project, filter, hash join, single sender, or unordered receiver. |
| Min Setup Time, Avg Setup Time, Max Setup Time | In general, the time spent opening and closing files. Specifically, the minimum, average, and maximum amount of time spent by the operator to set up before performing the operation. |
| Min Process Time, Avg Process Time, Max Process Time | The shortest amount of time the operator spent processing a record, the average time the operator spent processing each record, and the maximum time that the operator spent processing a record. |
| Wait (min, avg, max) | In general, the time spent waiting on disk I/O. These fields represent the minimum, average, and maximum times spent by operators waiting on disk I/O. |
| Avg Peak Memory | Represents the average of the peak direct memory allocated across minor fragments. Relates to the memory needed by operators to perform their operations, such as hash join or sort. |
| Max Peak Memory | Represents the maximum of the peak direct memory allocated across minor fragments. Relates to the memory needed by operators to perform their operations, such as hash join or sort. |

**Operator Block**

The Operator Block shows time and memory metrics for each operator type within a major fragment. Examples of operator types include:

* SCREEN
* PROJECT
* WRITER\_COMMITTER
* ARROW\_WRITER

The following table describes each column in the Operator Block:

| Column Name | Description |
| --- | --- |
| Thread | The coordinate ID of the minor fragment on which the operator ran. For example, in 04-03-01, 04 is the major fragment ID, 03 is the minor fragment ID, and 01 is the operator ID. |
| Setup Time | The amount of time spent by the operator to set up before performing its operation. This includes run-time code generation and opening a file. |
| Process Time | The amount of time spent by the operator to perform its operation. |
| Wait Time | The cumulative amount of time spent by an operator waiting for external resources, such as waiting to send records, waiting to receive records, waiting to write to disk, and waiting to read from disk. |
| Max Batches | The maximum number of record batches consumed from a single input stream. |
| Max Records | The maximum number of records consumed from a single input stream. |
| Peak Memory | Represents the peak direct memory allocated. Relates to the memory needed by the operators to perform their operations, such as hash join and sort. |
| Host Name | The hostname of the executor that the minor fragment is running on. |
| Record Processing Rate | The rate at which records in the minor fragment are being processed. Combined with the Host Name, the Record Processing Rate can help find hot spots in the cluster, caused either by skewed data or by a noisy query running on the same cluster. |
| Operator State | The status of the minor fragment. |
| Last Schedule Time | The last time at which work related to the minor fragment was scheduled to be executed. |

Operator blocks also contain three drop-down menus: Operator Metrics, Operator Details, and Host Metrics. Operator Metrics and Operator Details are unique to the type of operator and provide more detail about the operation of the minor fragments. Operator Metrics and Operator Details are intended to be consumed by Dremio engineers. Depending on the operator, both can be blank. Host Metrics provides high-level information about the host used when executing the operator.

<div style="page-break-after: always;"></div>

# Results Cache | Dremio Documentation

Original URL: https://docs.dremio.com/dremio-cloud/admin/performance/results-cache

On this page

Results cache improves query performance by reusing results from previous executions of the same deterministic query, provided that the underlying dataset remains unchanged and the previous execution was by the same user. The results cache feature works out of the box, requires no configuration, and automatically caches and reuses results. Regardless of whether a query uses the results cache, it always returns the same results.

Results cache is client-agnostic, meaning a query executed in the Dremio console will result in a cache hit even if it is later re-run through other clients like JDBC, ODBC, REST, or Arrow Flight. For a query to use the cache, its query plan must remain identical to the original cached version. Any changes to the schema or dataset generate a new query plan, invalidating the cache.

Results cache also supports seamless coordinator scale-out, allowing newly added coordinators to benefit immediately from previously cached results.

## Cases Supported By Results Cache

Query results are cached in the following cases:

* The SQL statement is a `SELECT` statement.
* The query reads from an Iceberg or Parquet dataset, or from a raw Reflection defined on other Dremio-supported data sources and formats, such as relational databases, `CSV`, `JSON`, or `TEXT`.
* The query does not contain dynamic functions such as `QUERY_USER`, `IS_MEMBER`, `RAND`, `CURRENT_DATE`, or `NOW`.
* The query does not reference `SYS` or `INFORMATION_SCHEMA` tables, or use external query.
* The result set size, when stored in Arrow format, is less than or equal to 20 MB.
* The query is not executed in the Dremio console as a preview.

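For example, of the two queries below, only the first is eligible for caching when re-run by the same user; `sales.orders` is a hypothetical table used purely for illustration:

```
-- Eligible: a deterministic SELECT over an Iceberg or Parquet dataset
SELECT region, SUM(amount) AS total
FROM sales.orders
GROUP BY region;

-- Not eligible: CURRENT_DATE is a dynamic function
SELECT region, SUM(amount) AS total
FROM sales.orders
WHERE order_date = CURRENT_DATE
GROUP BY region;
```
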
## View Whether Queries Used Results Cache

You can view the list of jobs on the Jobs page to determine if queries from data consumers were accelerated by the results cache.

To find whether a query was accelerated by the results cache:

1. Find the job that ran the query and look for  next to it, which indicates that the query was accelerated using either Reflections or the results cache.
2. Click on the row representing the job that ran the query to view the job summary. The summary, displayed in the pane to the right, provides details on whether the query was accelerated using the results cache or Reflections.



## Storage

Cached results are stored in the project store alongside all project-specific data, such as metadata and Reflections. Executors write cache entries as Arrow data files and read them when processing `SELECT` queries that result in a cache hit. Coordinators are responsible for managing the deletion of expired cache files.

## Deletion

A background task running on one of the Dremio coordinators handles cache expiration. This task runs every hour to mark cache entries that have not been accessed in the past 24 hours as expired and subsequently deletes them along with their associated cache files.

## Considerations and Limitations

SQL queries executed through the Dremio console or a REST client that access the cache will rewrite the cached query results to the job results store to enable pagination.

<div style="page-break-after: always;"></div>

# Workload Management | Dremio Documentation

Original URL: https://docs.dremio.com/dremio-cloud/admin/engines/workload-management

On this page

This topic covers how to manage resources and workloads by routing queries to particular engines through rules.

## Overview

You can manage Dremio workloads via routing rules, which are evaluated at runtime (before query planning) to decide which [query engine](/dremio-cloud/admin/engines/) to use for a given query. In projects with only one engine, all queries share the same execution resources and route to that single engine. However, when multiple engines are provisioned, rules determine the engine to be used.

You must arrange the rules in the order that you want them to be evaluated. If multiple rules evaluate to true for a given query, the first rule that returns true is used to select the engine.

The following diagram shows a series of rules that are evaluated when a job gets submitted.

* Rule1 routes jobs to Engine1
* Rule2 routes jobs to Engine2
* Rule3 routes jobs to the default engine that was created on project startup
* Rule4 rejects the jobs that evaluate to true



## Rules

You can use Dremio SQL syntax to specify rules to target particular jobs.

The following are the types of rules that can be created, along with examples.

### User

Create a rule that identifies the user that triggers the job.

Create rule that identifies user

```
USER in ('JRyan','PDirk','CPhillips')
```

### Group Membership

Create a rule that identifies whether the user that triggers the job is part of a particular group.

Create rule that identifies whether user belongs to a specified group

```
is_member('MarketingOps') OR
is_member('Engineering')
```

### Job Type

Create a rule depending on the type of job. The types of jobs can be identified by the following categories:

* Flight
* JDBC
* Internal Preview
* Internal Run
* Metadata Refresh
* ODBC
* Reflections
* REST
* UI Download
* UI Preview
* UI Run

Create rule based on type of job

```
query_type() IN ('JDBC', 'ODBC', 'UI Run', 'Flight')
```

### Query Label

Labels enable rules that route queries running named commands to specific engines. Dremio supports the following query labels:

| Query Label | Description |
| --- | --- |
| COPY | Assigned to all queries running a [COPY INTO](/dremio-cloud/sql/commands/copy-into-table) SQL command |
| CTAS | Assigned to all queries running a [CREATE TABLE AS](/dremio-cloud/sql/commands/create-table-as) SQL command |
| DML | Assigned to all queries running an [INSERT](/dremio-cloud/sql/commands/insert), [UPDATE](/dremio-cloud/sql/commands/update), [DELETE](/dremio-cloud/sql/commands/delete), [MERGE](/dremio-cloud/sql/commands/merge), or [TRUNCATE](/dremio-cloud/sql/commands/truncate) SQL command |
| OPTIMIZATION | Assigned to all queries running an [OPTIMIZE](/dremio-cloud/sql/commands/optimize-table) SQL command |

Here are two example routing rules:

Create a routing rule for queries running a COPY INTO command

```
query_label() IN ('COPY')
```

Create a routing rule for queries running the DML commands INSERT, UPDATE, DELETE, MERGE, or TRUNCATE

```
query_label() IN ('DML')
```

### Query Attributes

Query attributes enable routing rules that direct queries to specific engines based on their characteristics.

Dremio supports the following query attributes:

| Query Attribute | Description |
| --- | --- |
| `DREMIO_MCP` | Set when the job is submitted via the Dremio MCP Server. |
| `AI_AGENT` | Set when the job is submitted via the Dremio AI Agent. |
| `AI_FUNCTIONS` | Set when the job contains AI functions. |

You can use the following functions to define routing rules based on query attributes:

| Function | Applicable Attribute | Description |
| --- | --- | --- |
| `query_has_attribute(<attr>)` | `DREMIO_MCP`, `AI_AGENT`, `AI_FUNCTIONS` | Returns true if the specified attribute is present. |
| `query_attribute(<attr>)` | `DREMIO_MCP`, `AI_AGENT`, `AI_FUNCTIONS` | Returns the value of the attribute (if present), otherwise NULL. |
| `query_calls_ai_functions()` | NA | Returns true if the job has an AI function in the query. |

Examples:

Create a routing rule for queries that use AI functions and are executed by a specific user

```
query_calls_ai_functions() AND USER = 'JRyan'
```

Create a routing rule for queries with `DREMIO_MCP` and `AI_FUNCTIONS`

```
query_has_attribute('DREMIO_MCP') AND query_has_attribute('AI_FUNCTIONS')
```

### Tag

Create a rule that routes jobs based on a routing tag.

Create rule that routes jobs based on routing tag

```
tag() = 'ProductionDashboardQueue'
```

### Date and Time

Create a rule that routes a job based on the time it was triggered, using Dremio SQL functions.

Create rule that routes jobs based on time triggered

```
EXTRACT(HOUR FROM CURRENT_TIME)
  BETWEEN 9 AND 18
```

### Combined Conditions

Create rules based on multiple conditions.

The following example routes a job depending on user, group membership, job type, and the time of day that it was triggered.

Create rule based on user, group, job type, and time triggered

```
(
  USER IN ('JRyan', 'PDirk', 'CPhillips')
  OR is_member('superadmins')
)
AND query_type() IN ('ODBC')
AND EXTRACT(HOUR FROM CURRENT_TIME)
  BETWEEN 9 AND 18
```

### Default Rules

Each Dremio [project](/dremio-cloud/admin/projects/) has its own set of rules. When a project is created, Dremio automatically creates rules for the default and preview engines. You can edit these rules as needed.

| Order | Rule Name | Rule | Engine |
| --- | --- | --- | --- |
| 1 | UI Previews | query\_type() = 'UI Preview' | preview |
| 2 | Reflections | query\_type() = 'Reflections' | default |
| 3 | All Other Queries | All other queries | default |

## View All Rules

To view all rules:

1. Click the Project Settings  icon in the side navigation bar.
2. Select **Engine Routing** in the project settings sidebar to see the list of engine routing rules.

## Add a Rule

To add a rule:

1. On the Engine Routing page, click the **Add Rule** button at the top-right corner of the screen.
2. In the **New Rule** dialog, for **Rule Name**, enter a name.
3. For **Conditions**, enter the routing condition. See Rules for supported conditions.
4. For **Action**, complete one of the following options:

   a. If you want to route the jobs that meet the conditions to a particular engine, select the **Route to engine** option. Then use the engine selector to choose the engine.

   b. If you want to reject the jobs that meet the conditions, select the **Reject** option.
5. Click **Add**.

## Edit a Rule

To edit a rule:

1. On the Engine Routing page, hover over the rule and click the Edit Rule  icon that appears next to the rule.
2. In the **Edit Rule** dialog, for **Rule Name**, enter a name.
3. For **Conditions**, enter the routing condition. See Rules for supported conditions.
4. For **Action**, complete one of the following options:

   a. If you want to route the jobs that meet the conditions to a particular engine, select the **Route to engine** option. Then use the engine selector to choose the engine.

   b. If you want to reject the jobs that meet the conditions, select the **Reject** option.
5. Click **Save**.

## Delete a Rule

To delete a rule:

1. On the Engine Routing page, hover over the rule and click the Delete Rule  icon that appears next to the rule.

   caution

   You must have at least one rule per project to route queries to a particular engine.

2. In the **Delete Rule** dialog, click **Delete** to confirm.

## Set and Reset Engines

The [`SET ENGINE`](/dremio-cloud/sql/commands/set-engine) SQL command is used to specify the exact execution engine to run subsequent queries in the current session. When using `SET ENGINE`, WLM rules and direct routing connection properties are bypassed, and queries are routed directly to the specified queue. The [`RESET ENGINE`](/dremio-cloud/sql/commands/reset-engine) command clears the session-level engine override, reverting query routing to follow the Workload Management (WLM) rules or any direct routing connection property if set.

## SET TAG

The [`SET TAG`](/dremio-cloud/sql/commands/set-tag) SQL command is used to specify a routing tag for subsequent queries in the current session. If a `ROUTING_TAG` connection property is already set for the session, `SET TAG` will override it. When using `SET TAG`, you must have a previously defined Workload Management (WLM) routing rule that routes queries based on that routing tag. The [`RESET TAG`](/dremio-cloud/sql/commands/reset-tag) command clears the session-level routing tag override, reverting query routing to follow the Workload Management (WLM) rules or any direct routing connection property if set.

## Connection Tagging and Direct Routing Configuration

Routing tags are configured by setting the `ROUTING_TAG = <Tag Name>` parameter for a given session to the desired tag name.

### JDBC Session Configuration

To configure JDBC sessions, add the `ROUTING_TAG` parameter to the JDBC connection URL. For example: `jdbc:dremio:direct=localhost;ROUTING_TAG='TagA'`.

### ODBC Session Configuration

Configure ODBC sessions as follows:

*Windows Sessions*

Add the `ROUTING_TAG` parameter to the `AdvancedProperties` parameter in the ODBC DSN field.

*Mac OS Sessions*

1. Add the `ROUTING_TAG` parameter to the `AdvancedProperties` parameter in the system `odbc.ini` file located at `/Library/ODBC/odbc.ini`. After adding the parameter, an example Advanced Properties configuration might be: `AdvancedProperties=CastAnyToVarchar=true;HandshakeTimeout=5;QueryTimeout=180;TimestampTZDisplayTimezone=utc;NumberOfPrefetchBuffers=5;ROUTING_TAG='TagA';`
2. Add the `ROUTING_TAG` parameter to the `AdvancedProperties` parameter in the user's DSN located at `~/Library/ODBC/odbc.ini`.

## Best Practices for Workload Management

Because every query workload is different, engine sizing often depends on several factors, such as the complexity of queries, number of concurrent users, data sources, dataset size, file and table formats, and specific business requirements for latency and cost. Workload management (WLM) ensures reliable query performance by choosing adequately sized engines for each workload type, configuring engines, and implementing query routing rules to segregate and route query workload types to appropriate engines.

This section describes best practices for adding and using Dremio engines, as well as configuring WLM to achieve reliable query performance in Dremio. This section also includes tips for migrating from self-managed Dremio Software to fully managed Dremio and information about using the system table `sys.project.history.jobs`, which stores metadata for historical jobs executed in a project, to assess the efficacy of WLM settings and make adjustments.

### Set Up Engines

As a fully managed offering, Dremio is the best deployment model for production because it allows you to achieve high levels of reliability and durability for your queries, maximizes resource efficiency with engine autoscaling, and does not require you to manually create and manage engines.

Segregating workload types into separate engines is vital for mitigating noisy-neighbor issues, which can jeopardize performance reliability. You can segregate workloads by type, such as ad hoc, dashboard, and lakehouse (COPY INTO, DML, and optimization), as well as by business unit to facilitate cost distribution.

Metadata and Reflection refresh workloads should have their own engines for executing metadata and Reflection refresh queries. These internal queries can use a substantial amount of engine bandwidth, so assigning separate engines ensures that they do not interfere with user-initiated queries. Initial engine sizes should be XSmall and Small, but these sizes may change depending on the number and complexity of Reflection refresh and metadata jobs.

Dremio recommends the following engine setup configurations:

* Dremio offers a range of [engine sizes](/dremio-cloud/admin/engines/#sizes). Experiment with typical queries, concurrency, and engine sizes to establish the best engine size for each workload type based on your organization's budget constraints and latency requirements.
* Maximum concurrency is the maximum number of jobs that Dremio can execute concurrently on an engine replica. Dremio provides an out-of-the-box value for maximum concurrency based on engine size, but we recommend testing with typical queries directed to specific engines to determine the best maximum concurrency values for your query workloads.
* Dremio offers autoscaling to meet the demands of dynamic workloads with engine replicas. It is vital to assess and configure each engine's autoscaling parameters based on your organization's budget constraints and latency requirements for each workload type. You can choose the minimum and maximum number of replicas for each engine and specify any advanced configuration as needed. For example, dashboard workloads must meet stringent low-latency requirements and are prioritized for performance rather than cost. Engines added and assigned to execute the dashboard workloads may therefore be configured to autoscale using replicas. On the other hand, an engine for ad hoc workloads may have budget constraints and therefore be configured to autoscale with a maximum of one replica.

### Route Workloads

Queries are routed to engines according to routing rules. You may use the out-of-the-box routing rules that route queries to the default and preview engines established automatically, but Dremio recommends creating custom routing rules based on your workloads and business requirements. Custom rules can include factors such as user, group membership, job type, date and time, query label, and tag. Read Rules for examples.

The following table lists example routing rules based on query\_type, query\_label, and tags:

| Order | Rule Name | Rule | Engine |
| --- | --- | --- | --- |
| 1 | Reflections | `query_type() = 'Reflections'` | Reflection |
| 2 | Metadata | `query_type() = 'Metadata Refresh'` | Metadata |
| 3 | Dashboards | `tag() = 'dashboard'` | Dashboard |
| 4 | Ad hoc Queries | `query_type() IN ( 'UI Run' , 'REST') OR tag() = 'ad hoc'` | Ad hoc |
| 5 | Lakehouse Queries | `query_label() IN ('COPY','DML','CTAS', 'OPTIMIZATION')` | Lakehouse |
| 6 | All Other Queries | All other queries | Preview |

### Use the `sys.project.history.jobs` System Table

The [`sys.project.history.jobs`](/dremio-cloud/sql/system-tables/jobs-historical) system table contains metadata for recent jobs executed in a project, including time statistics, cost, and other relevant information. You can use the data in the `sys.project.history.jobs` system table to evaluate the effectiveness of WLM settings and make adjustments based on job metadata.

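For example, a quick way to start exploring the table from the SQL editor (see the linked reference for the full column list):

```
SELECT *
FROM sys.project.history.jobs
LIMIT 10
```
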
### Use Job Analyzer

Job Analyzer is a package of useful query and view definitions that you may create over the `sys.project.history.jobs` system table and use to analyze job metadata. Job Analyzer is available in a [public GitHub repository](https://github.com/dremio/professional-services/tree/main/tools/dremio-cloud-job-analyzer).

<div style="page-break-after: always;"></div>

# Manual Reflections | Dremio Documentation

Original URL: https://docs.dremio.com/dremio-cloud/admin/performance/manual-reflections/

On this page

With [Autonomous Reflections](/dremio-cloud/admin/performance/autonomous-reflections) reducing the need for manual work, you no longer need to create or manage Reflections. However, when Autonomous Reflections are not enabled, or for situations that require manual control, this page provides guidance on getting Reflection recommendations and on managing raw Reflections, aggregation Reflections, and external Reflections in Dremio.

note

For non-duplicating joins, Dremio can accelerate queries that reference only some of the joins in a Reflection, eliminating the need to create separate Reflections for every table combination.

## Reflection Recommendations

When [Autonomous Reflections](/dremio-cloud/admin/performance/autonomous-reflections) are not enabled, Dremio automatically provides recommendations to add and remove Reflections based on query patterns to optimize performance for queries on Iceberg tables, UniForm tables, Parquet datasets, and any views built on these datasets.

Recommendations to add Reflections are sorted by overall effectiveness, with the most effective recommendations shown on top. Effectiveness relates to metrics such as the estimated number of accelerated jobs, potential increase in query execution speedup, and potential time saved during querying. These are rough estimates based on past data that can give you insight into the potential benefits of each recommendation.

Reflections created using these recommendations refresh automatically when source data changes on:

* Iceberg tables – When the table is modified through Dremio or other engines. Dremio polls tables every 10 seconds.
* Parquet datasets – When metadata is updated in Dremio.

To view and apply the Reflection recommendations:

1. In the Dremio console, hover over  in the side navigation bar and select **Project Settings**.
2. Select **Reflections** from the project settings sidebar.
3. Click **Reflections Recommendations** to access the list of suggested Reflections.
4. To apply a recommendation, click  at the end of the corresponding row.

Reflections created using usage-based recommendations are only used when fully synchronized with their source data, to ensure up-to-date query results.

To generate recommendations for default raw and aggregation Reflections, you can obtain the job IDs by looking them up on the [Jobs page](/dremio-cloud/admin/monitor/jobs). Then, use either the [`SYS.RECOMMEND_REFLECTIONS`](/dremio-cloud/sql/table-functions/recommend-reflections) table function or the [Recommendations API](/dremio-cloud/api/reflection/recommendations) to submit job IDs to accelerate specific SQL queries.

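As a rough sketch only, an invocation of the table function might look like the following; the job IDs are placeholders, and the exact name and signature are documented in the linked `SYS.RECOMMEND_REFLECTIONS` reference, which you should treat as authoritative:

```
SELECT *
FROM TABLE(SYS.RECOMMEND_REFLECTIONS(ARRAY['<job_id_1>', '<job_id_2>']))
```
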

## Raw Reflections

A raw Reflection retains the same number of records as its anchor while allowing a subset of its columns. Raw Reflections enhance query performance by materializing complex views, transforming data from non-performant sources into the Iceberg table format optimized for large-scale analytics, and utilizing partitioning and sorting for faster access. By precomputing and storing data in an optimized format, raw Reflections significantly reduce query latency and improve overall efficiency.

You can use the Reflections editor to create two types of raw Reflection:

* A default raw Reflection that includes all of the columns of the anchor, but does not sort or horizontally partition on any columns
* A raw Reflection that includes all or a subset of the columns of the anchor, and that does one or both of the following things:

  + Sorts on one or more columns
  + Horizontally partitions the data according to the values in one or more columns

note

For creating Reflections on views and tables with row-access and column-masking policies, see [Use Reflections on Datasets with Policies](/dremio-cloud/manage-govern/row-column-policies#use-reflections-on-datasets-with-policies).

### Prerequisites

* If you want to accelerate queries on unoptimized data or data in slow storage, create a view that is itself created from a table in a non-columnar format or on slow-scan storage. You can then create your raw Reflection from that view.
* If you want to accelerate "needle-in-a-haystack" queries, create a view that includes a predicate to include only the rows that you want to scan. You can then create your raw Reflection from that view.
* If you want to accelerate queries that perform expensive transformations, create a view that performs those transformations. You can then create your raw Reflection from that view.
* If you want to accelerate queries that perform joins, create a view that performs the joins. You can then create your raw Reflection from that view. (A sketch of this view-based pattern follows this list.)
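
As a minimal sketch of this view-based pattern, assuming a hypothetical `mySource.sales` table, the view below restricts a large table to only the rows of interest; you would then create a raw Reflection on the view in the Reflections editor:

```
-- Hypothetical names: mySource.sales, myWorkspace, and the columns
-- are illustrations only.
CREATE VIEW "myWorkspace"."recent_us_sales" AS
SELECT sale_id, region, sales_amount, sale_date
FROM mySource.sales
WHERE region = 'US' AND sale_date >= DATE '2024-01-01'
```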

### Create Default Raw Reflections

In the **Basic** view of the Reflections editor, you can create a raw Reflection that includes all of the fields that are in a table or view. Creating a basic raw Reflection ensures that Dremio never runs user queries against the underlying table or view when the raw Reflection is enabled.

To create a raw Reflection in the **Basic** view of the Reflections editor:

1. In the Dremio console, click  in the side navigation bar to go to the Datasets page.
2. In the catalog or folder in which the anchor is located, hover over the anchor name and click .
3. Select **Reflections** in the table or view settings sidebar.
4. Click the toggle switch on the left side of the **Raw Reflections** bar.

   

5. Click **Save**.

#### Restrictions of the **Basic** View

* You cannot select fields to sort or create horizontal partitions on.
* The name of the Reflection that you create is restricted to "Raw Reflection".
* You can create only one raw Reflection. If you want to create multiple raw Reflections at a time, use the **Advanced** view.

### Create Customized Raw Reflections

In the **Advanced** view of the Reflections editor, you can create one or more raw Reflections that include all or a selection of the fields that are in the anchor or supported anchor. You can also choose sort fields and fields for partitioning horizontally.

Dremio recommends that you follow the best practices listed in [Operational Excellence](/dremio-cloud/help-support/well-architected-framework/operational-excellence/) when you create customized raw Reflections.

If you make any of the following changes to a raw Reflection when you are using the **Advanced** view, you cannot switch to the **Basic** view:

* Deselect one or more fields in the **Display** column. By default, all of the fields are selected.
* Select one or more fields in the **Sort**, **Partition**, or **Distribute** column.

To create a raw Reflection in the **Advanced** view of the Reflections editor:

1. In the Dremio console, click  in the side navigation bar to go to the Datasets page.
2. In the catalog or folder in which the anchor is located, hover over the anchor name and click .
3. If the **Advanced** view is not already displayed, click the **Advanced View** button in the top-right corner of the editor.
4. Click the toggle switch in the table labeled **Raw Reflection** to enable the raw Reflection.

   Queries do not start using the Reflection, however, until after you have finished editing the Reflection and click **Save** in a later step.

   

5. (Optional) Click in the label to rename the Reflection.

   The purpose of the name is to help you understand, when you read job reports, which Reflections the query optimizer considered and chose when planning queries.

6. In the columns of the table, follow these steps, which you don't have to do in any particular order:

   note

   Ignore the **Distribution** column. Selecting fields in it has no effect on the Reflection.

   * Click in the **Display** column to include fields in or exclude them from your Reflection.
   * Click in the **Sort** column to select fields on which to sort the data in the Reflection. For guidance in selecting a field on which to sort, see [Sort Reflections on High-Cardinality Fields](/dremio-cloud/help-support/well-architected-framework/operational-excellence#sort-reflections-on-high-cardinality-fields).
   * Click in the **Partition** column to select fields on which to horizontally partition the rows in the Reflection. For guidance in selecting fields on which to partition, and which partition transforms to apply to those fields, see [Horizontally Partition Reflections that Have Many Rows](/dremio-cloud/help-support/well-architected-framework/operational-excellence#horizontally-partition-reflections-that-have-many-rows).

   note

   If the Reflection is based on an Iceberg table, a filesystem source, an AWS Glue source, or a Hive source, and that table is partitioned, recommended partition columns and transforms are selected for you. If you change the selection of columns, then this icon appears at the top of the table: . You can click it to revert to the recommended selection of partition columns.

7. (Optional) Optimize the number of files used to store the Reflection. You can optimize for fast refreshes or for fast read performance by queries. Follow these steps:

   a. Click the  in the table in which you are defining the Reflection.

   b. In the field **Reflection execution strategy**, select either of these options:

   * Select **Minimize Time Needed To Refresh** if you need the Reflection to be created as fast as possible. This option can result in the data for the Reflection being stored in many small files. This is the default option.
   * Select **Minimize Number Of Files** when you want to improve the read performance of queries against the Reflection. With this option, there tend to be fewer seeks performed for a given query.

8. Click **Save** when you are finished. (For an equivalent SQL sketch, see the example after these steps.)
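
Raw Reflections can also be defined in SQL. The sketch below reuses the hypothetical `myWorkspace.recent_us_sales` view from earlier and shows the general shape of the command; check the ALTER DATASET reference for the full set of options:

```
-- Minimal sketch; the view and column names are hypothetical.
ALTER DATASET "myWorkspace"."recent_us_sales"
CREATE RAW REFLECTION "raw_recent_us_sales"
USING DISPLAY (sale_id, region, sales_amount, sale_date)
PARTITION BY (region)
LOCALSORT BY (sale_date)
```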

### Edit Raw Reflections

You can edit an existing raw Reflection. You might want to do so if you are iteratively designing and testing a raw Reflection, if the definition of the view that the Reflection was created from was changed, or if the schema of the underlying table was changed.

If you created a raw Reflection in the **Basic** view of the Reflections editor, you must use the **Advanced** view to edit it.

Dremio runs the job or jobs to recreate the Reflection after you click **Save**.

To edit a raw Reflection in the **Advanced** view of the Reflections editor:

1. In the Dremio console, hover over  in the side navigation bar and select **Project settings**.
2. Select **Reflections** in the project settings sidebar.
3. Click the name of the Reflection. This opens the Acceleration dialog with the Reflections editor.
4. Click the **Advanced View** button in the top-right corner of the editor.
5. In the **Raw Reflections** section of the **Advanced** view, locate the table that shows the definition of your Reflection.
6. (Optional) Click in the label to rename the Reflection.

   The purpose of the name is to help you understand, when you read job reports, which Reflections the query optimizer considered and chose when planning queries.

7. In the columns of the table, follow these steps, which you don't have to do in any particular order:

   * Click in the **Display** column to include fields in or exclude them from your Reflection.
   * Click in the **Sort** column to select fields on which to sort the data in the Reflection. For guidance in selecting a field on which to sort, see [Sort Reflections on High-Cardinality Fields](/dremio-cloud/help-support/well-architected-framework/operational-excellence#sort-reflections-on-high-cardinality-fields).
   * Click in the **Partition** column to select fields on which to horizontally partition the rows in the Reflection. For guidance in selecting fields on which to partition, and which partition transforms to apply to those fields, see [Horizontally Partition Reflections that Have Many Rows](/dremio-cloud/help-support/well-architected-framework/operational-excellence#horizontally-partition-reflections-that-have-many-rows).

     If the Reflection is based on an Iceberg table, a filesystem source, an AWS Glue source, or a Hive source, and that table is partitioned, partition columns and transforms are recommended for you. Hover over  at the top of the table to see the recommendation. Click the icon to accept the recommendation.

   note

   Ignore the **Distribution** column. Selecting fields in it has no effect on the Reflection.

8. (Optional) Optimize the number of files used to store the Reflection. You can optimize for fast refreshes or for fast read performance by queries. Follow these steps:

   a. Click the  in the table in which you are defining the Reflection.

   b. In the field **Reflection execution strategy**, select either of these options:

   * Select **Minimize Time Needed To Refresh** if you need the Reflection to be created as fast as possible. This option can result in the data for the Reflection being stored in many small files. This is the default option.
   * Select **Minimize Number Of Files** when you want to improve the read performance of queries against the Reflection. With this option, there tend to be fewer seeks performed for a given query.

9. Click **Save** when you are finished.

## Aggregation Reflections

Aggregation Reflections accelerate BI-style queries that involve aggregations (`GROUP BY` queries) by precomputing aggregate results (such as `SUM`, `COUNT`, and `AVG`) across selected dimensions and measures. By precomputing expensive computations, they significantly improve query performance at runtime. These Reflections are ideal for analytical workloads with frequent aggregations on large datasets.
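
For instance, an aggregation Reflection with `region` as a dimension and `sales_amount` as a measure could accelerate a query like the one below (the table and column names are hypothetical):

```
-- A typical BI-style aggregation that an aggregation Reflection can satisfy.
SELECT region, COUNT(*) AS sales_count, MAX(sales_amount) AS max_sale
FROM mySource.sales
GROUP BY region
```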

### Create Default Aggregation Reflections

You can use the **Basic** view of the Reflections editor to create one aggregation Reflection that includes fields, from the anchor or supported anchor, that are recommended for use as dimensions or measures. You can add or remove dimensions and measures, too.

To create an aggregation Reflection in the **Basic** view of the Reflections editor:

1. In the Dremio console, click  in the side navigation bar to go to the Datasets page.
2. In the catalog or folder in which the anchor is located, hover over the anchor name and click .
3. In the **Aggregation Reflections** section of the editor, click **Generate** to get recommended fields to use as dimensions and measures. This overrides any previously selected dimensions and measures. If you wish to proceed, click **Continue** in the confirmation dialog that follows.
4. In the **Aggregation Reflection** section of the editor, modify or accept the recommended fields for dimensions and measures.
5. To make the Reflection available to the query optimizer after you create it, click the toggle switch on the left side of the **Aggregation Reflections** bar.

   

6. Click **Save**.

#### Restrictions

* You can create only one aggregation Reflection in the **Basic** view. If you want to create multiple aggregation Reflections at a time, use the **Advanced** view.
* You cannot select fields for sorting or horizontally partitioning.
* The name of the Reflection is restricted to "Aggregation Reflection".

### Create Customized Aggregation Reflections

In the **Advanced** view of the Reflections editor, you can create one or more aggregation Reflections and select which fields in the anchor or supporting anchor to use as dimensions and measures. For each field that you use as a measure, you can use one or more of these SQL functions: `APPROX_DISTINCT_COUNT`, `COUNT`, `MAX`, and `MIN`. You can also choose sort fields and fields for partitioning horizontally.

Dremio recommends that you follow the best practices listed in [Operational Excellence](/dremio-cloud/help-support/well-architected-framework/operational-excellence/) when you create customized aggregation Reflections.

To create an aggregation Reflection in the **Advanced** view of the Reflections editor:

1. In the Dremio console, click  in the side navigation bar to go to the Datasets page.
2. In the catalog or folder in which the anchor is located, hover over the anchor name and click .
3. Click the **Advanced View** button in the top-right corner of the editor.
4. Click **Aggregation Reflections**.

   The Aggregation Reflections section is displayed, with one table ready for refining the aggregation Reflection that appeared in the **Basic** view.

   

5. (Optional) Click in the name to rename the Reflection.

   The purpose of the name is to help you understand, when you read job reports, which Reflections the query optimizer considered and chose when planning queries.

6. In the columns of the table, follow these steps, which you don't have to do in any particular order:

   * Click in the **Dimension** column to include or exclude fields to use as dimensions.
   * Click in the **Measure** column to include or exclude fields to use as measures. You can use one or more of these SQL functions for each measure: `APPROX_DISTINCT_COUNT`, `COUNT`, `MAX`, and `MIN`.

     If you want to include a computed measure, first create a view with the computed column to use as a measure, and then create the aggregation Reflection on the view.

     The Reflections editor does not support the full list of SQL aggregation functions that Dremio supports. If you want to create a Reflection that aggregates data by using the `AVG`, `CORR`, `HLL`, `SUM`, `VAR_POP`, or `VAR_SAMP` SQL functions, you must create a view that uses the function, and then create a raw Reflection from that view.

   * Click in the **Sort** column to select fields on which to sort the data in the Reflection. For guidance in selecting a field on which to sort, see [Sort Reflections on High-Cardinality Fields](/dremio-cloud/help-support/well-architected-framework/operational-excellence#sort-reflections-on-high-cardinality-fields).
   * Click in the **Partition** column to select fields on which to horizontally partition the rows in the Reflection. For guidance in selecting fields on which to partition, and which partition transforms to apply to those fields, see [Horizontally Partition Reflections that Have Many Rows](/dremio-cloud/help-support/well-architected-framework/operational-excellence#horizontally-partition-reflections-that-have-many-rows).

     If the Reflection is based on an Iceberg table, a filesystem source, an AWS Glue source, or a Hive source, and that table is partitioned, recommended partition columns and transforms are selected for you. If you change the selection of columns, then this icon appears at the top of the table: . You can click it to revert to the recommended selection of partition columns.

   note

   Ignore the **Distribution** column. Selecting fields in it has no effect on the Reflection.

7. (Optional) Optimize the number of files used to store the Reflection. You can optimize for fast refreshes or for fast read performance by queries. Follow these steps:

   a. Click the  in the table in which you are defining the Reflection.

   b. In the field **Reflection execution strategy**, select either of these options:

   * Select **Minimize Time Needed To Refresh** if you need the Reflection to be created as fast as possible. This option can result in the data for the Reflection being stored in many small files. This is the default option.
   * Select **Minimize Number Of Files** when you want to improve the read performance of queries against the Reflection. With this option, there tend to be fewer seeks performed for a given query.

8. Click **Save** when you are finished. (For an equivalent SQL sketch, see the example after these steps.)
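
An equivalent SQL sketch follows, with hypothetical table and column names; the per-column function lists in the `MEASURES` clause correspond to the measures you select in the editor. Check the ALTER DATASET reference for the full syntax:

```
-- Minimal sketch; mySource.sales and its columns are hypothetical.
ALTER DATASET mySource.sales
CREATE AGGREGATE REFLECTION "agg_sales_by_region"
USING DIMENSIONS (region, sale_date)
MEASURES (sales_amount (COUNT, MIN, MAX))
```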

### Edit Aggregation Reflections

You might want to edit an aggregation Reflection if you are iteratively designing and testing an aggregation Reflection, if the definition of the view that the Reflection was created from was changed, if the schema of the underlying table was changed, or if you want to revise one or more aggregations defined in the Reflection.

If you created an aggregation Reflection in the **Basic** view of the Reflections editor, you can edit that Reflection either in the **Basic** view or in the **Advanced** view.

Dremio runs the job or jobs to recreate the Reflection after you click **Save**.

#### Use the Basic View

To edit an aggregation Reflection in the **Basic** view of the Reflections editor:

1. In the Dremio console, hover over  in the side navigation bar and select **Project settings**.
2. Select **Reflections** in the project settings sidebar.
3. Click the name of the Reflection. This opens the Acceleration dialog with the Reflections editor.
4. In the Aggregation Reflection section of the editor, modify or accept the recommendations in the **Dimension** and **Measure** columns.
5. Click **Save**.

#### Use the Advanced View

To edit an aggregation Reflection in the **Advanced** view of the Reflections editor:

1. In the Dremio console, hover over  in the side navigation bar and select **Project settings**.
2. Select **Reflections** in the project settings sidebar.
3. Click the name of the Reflection. This opens the Acceleration dialog with the Reflections editor.
4. Click the **Advanced View** button in the top-right corner of the editor.
5. Click **Aggregation Reflections**.
6. (Optional) Click in the name to rename the Reflection.

   The purpose of the name is to help you understand, when you read job reports, which Reflections the query optimizer considered and chose when planning queries.

7. In the columns of the table, follow these steps, which you don't have to do in any particular order:

   * Click in the **Dimension** column to include or exclude fields to use as dimensions.
   * Click in the **Measure** column to include or exclude fields to use as measures. You can use one or more of these SQL functions for each measure: `APPROX_DISTINCT_COUNT`, `COUNT`, `MAX`, and `MIN`.

     The Reflections editor does not support the full list of SQL aggregation functions that Dremio supports. If you want to create a Reflection that aggregates data by using the `AVG`, `CORR`, `HLL`, `SUM`, `VAR_POP`, or `VAR_SAMP` SQL functions, you must create a view that uses the function, and then create a raw Reflection from that view.

   * Click in the **Sort** column to select fields on which to sort the data in the Reflection. For guidance in selecting a field on which to sort, see [Sort Reflections on High-Cardinality Fields](/dremio-cloud/help-support/well-architected-framework/operational-excellence#sort-reflections-on-high-cardinality-fields).
   * Click in the **Partition** column to select fields on which to horizontally partition the rows in the Reflection. For guidance in selecting fields on which to partition, and which partition transforms to apply to those fields, see [Horizontally Partition Reflections that Have Many Rows](/dremio-cloud/help-support/well-architected-framework/operational-excellence#horizontally-partition-reflections-that-have-many-rows).

     If the Reflection is based on an Iceberg table, a filesystem source, an AWS Glue source, or a Hive source, and that table is partitioned, partition columns and transforms are recommended for you. Hover over  at the top of the table to see the recommendation. Click the icon to accept the recommendation.

   note

   Ignore the **Distribution** column. Selecting fields in it has no effect on the Reflection.

8. (Optional) Optimize the number of files used to store the Reflection. You can optimize for fast refreshes or for fast read performance by queries. Follow these steps:

   a. Click the  in the table in which you are defining the Reflection.

   b. In the field **Reflection execution strategy**, select either of these options:

   * Select **Minimize Time Needed To Refresh** if you need the Reflection to be created as fast as possible. This option can result in the data for the Reflection being stored in many small files. This is the default option.
   * Select **Minimize Number Of Files** when you want to improve the read performance of queries against the Reflection. With this option, there tend to be fewer seeks performed for a given query.

9. Click **Save** when you are finished.

## External Reflections

External Reflections reference precomputed tables in external data sources instead of materializing Reflections within Dremio, eliminating refresh overhead and storage costs. You can use an external Reflection by defining a view in Dremio that matches the precomputed table and mapping the view to the external data source. The data in the precomputed table is not refreshed by Dremio. When querying the view, Dremio’s query planner leverages the external Reflection to generate optimal execution plans, improving query performance without additional storage consumption in Dremio.

### Create External Reflections

To create an external Reflection:

1. Follow these steps in the data source:

   a. Select your source table.

   b. Create a table that is derived from the source table, such as an aggregation table, if you do not already have one.

2. Follow these steps in Dremio:

   a. [Define a view on the derived table in the data source.](/dremio-cloud/sql/commands/create-view) The definition must match that of the derived table.

   b. [Define a new external Reflection that maps the view to the derived table.](/dremio-cloud/sql/commands/alter-table)

note

The data types and column names in the external Reflection must match those in the view that the external Reflection is mapped to.

Suppose you have a data source named `mySource` that is connected to Dremio. In that data source, there are (among all of your other tables) these two tables:

* `sales`, which is a very large table of sales data.
* `sales_by_region`, which aggregates by region the data that is in `sales`.

You want to make the data in `sales_by_region` available to data analysts who use Dremio. However, because you already have the `sales_by_region` table created, you do not see the need to create a Dremio table from `sales`, then create a Dremio view that duplicates `sales_by_region`, and finally create a Reflection on the view. You would like instead to make `sales_by_region` available to queries run from BI tools through Dremio.

To do that, you follow these steps:

1. Create a view in Dremio that has the same definition as `sales_by_region`. Notice that the `FROM` clause points to the `sales` table that is in your data source, not to a Dremio table.

   Example View

   ```
   CREATE VIEW "myWorkspace"."sales_by_region" AS
   SELECT
     AVG(sales_amount) average_sales,
     SUM(sales_amount) total_sales,
     COUNT(*) sales_count,
     region
   FROM mySource.sales
   GROUP BY region
   ```

2. Create an external Reflection that maps the view above to `sales_by_region` in `mySource`.

   Example External Reflection

   ```
   ALTER DATASET "myWorkspace"."sales_by_region"
   CREATE EXTERNAL REFLECTION "external_sales_by_region"
   USING "mySource"."sales_by_region"
   ```

The external Reflection lets Dremio's query planner know that there is a table in `mySource` that matches the Dremio view `myWorkspace.sales_by_region` and that can be used to satisfy queries against the view. When Dremio users query `myWorkspace.sales_by_region`, Dremio routes the query to the data source `mySource`, which runs the query against `mySource.sales_by_region`.

### Edit External Reflections

If you have modified the DDL of a derived table in your data source, follow these steps in Dremio to update the corresponding external Reflection:

1. [Replace the view with one that has a definition that matches the definition of the derived table](/dremio-cloud/sql/commands/create-view). When you do so, the external Reflection is dropped.
2. [Define a new external Reflection that maps the view to the derived table.](/dremio-cloud/sql/commands/alter-table) (A sketch combining both steps follows.)
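
For example, the following sketch reuses the names from the example above, with a hypothetical new `sales_channel` column representing the updated DDL; it assumes `CREATE OR REPLACE VIEW` as the replacement mechanism, so verify it against the CREATE VIEW reference:

```
-- Recreate the view to match the updated derived table
-- (replacing the view drops the existing external Reflection).
-- The sales_channel column is hypothetical.
CREATE OR REPLACE VIEW "myWorkspace"."sales_by_region" AS
SELECT
  AVG(sales_amount) average_sales,
  SUM(sales_amount) total_sales,
  COUNT(*) sales_count,
  region,
  sales_channel
FROM mySource.sales
GROUP BY region, sales_channel;

-- Remap the view to the derived table in the data source.
ALTER DATASET "myWorkspace"."sales_by_region"
CREATE EXTERNAL REFLECTION "external_sales_by_region"
USING "mySource"."sales_by_region"
```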

## Test Reflections

You can test whether the Reflections that you created are used to satisfy a query without actually running the query. This practice can be helpful when the tables are very large and you want to avoid processing large queries unnecessarily.

To test whether one or more Reflections are used by a query:

1. In the Dremio console, click  in the side navigation bar to open the SQL Runner.
2. In the SQL editor, type `EXPLAIN PLAN FOR` and then type or paste in your query.
3. Click **Run**.
4. When the query has finished, click the **Run** link directly above the query results to view the job details. Any Reflections used are shown on the page. (An example appears after these steps.)
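
For example, using the `sales_by_region` view from the External Reflections example above:

```
-- Produces the query plan without executing the query; any Reflections
-- the planner chose appear in the job details.
EXPLAIN PLAN FOR
SELECT region, total_sales
FROM "myWorkspace"."sales_by_region"
WHERE total_sales > 1000000
```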

## View Whether Queries Used Reflections

You can view the list of jobs on the Jobs page to find out whether queries were accelerated by Reflections. The Jobs page lists the jobs that ran queries, both queries from your data consumers and queries run within the Dremio user interface.

To find whether a query used a Reflection:

1. Find the job that ran the query by looking at the details in each row.
2. Look for  next to the job, which indicates that one or more Reflections were used.
3. View the job summary by clicking the row that represents the job that ran the query. The job summary appears in the pane to the right of the list of jobs.

### Relationship between Reflections and Jobs

The relationship between a job and a Reflection can be one of the following types:

* CONSIDERED – The Reflection is defined on a dataset that is used in the query but was determined not to cover the query (for example, the Reflection did not have a field that is used by the query).
* MATCHED – The Reflection could have been used to accelerate the query, but Dremio determined that it would not provide any benefit or that another Reflection was a better choice.
* CHOSEN – The Reflection was used to accelerate the query. Note that multiple Reflections can be used to accelerate a single query.

## Disable Reflections

Disabled Reflections become unavailable for use by queries and are not refreshed manually or according to their schedule.

note

Dremio does not disable external Reflections.

To disable a Reflection:

1. In the Dremio console, hover over  in the side navigation bar and select **Project Settings**.
2. Select **Reflections** in the project settings sidebar.

   This opens the Reflections editor for the Reflection's anchor or supporting anchor.

3. Follow one of these steps:

   * If there is only one raw Reflection for the table or view, in the **Basic** view, click the toggle switch in the **Raw Reflections** bar.
   * If there are two or more raw Reflections for the table or view, in the **Advanced** view, click the toggle switch for the individual raw Reflection that you want to disable.
   * If there is only one aggregation Reflection for the table or view, in the **Basic** view, click the toggle switch in the **Aggregation Reflections** bar.
   * If there are two or more aggregation Reflections for the table or view, in the **Advanced** view, click the toggle switch for the individual aggregation Reflection that you want to disable.

4. Click **Save**. The changes take effect immediately.

## Delete Reflections

You can delete Reflections individually, or all of the Reflections on a table or view. When you delete a Reflection, its definition, data, and metadata are entirely deleted.

To delete a single raw or aggregation Reflection:

1. In the Dremio console, hover over  in the side navigation bar and select **Project settings**.
2. Select **Reflections** in the project settings sidebar.

   This opens the Reflections editor for the Reflection's anchor or supporting anchor.

3. Open the **Advanced** view, if it is not already open.
4. If the Reflection is an aggregation Reflection, click **Aggregation Reflections**.
5. Click  for the Reflection that you want to delete.
6. Click **Save**. The deletion takes effect immediately.

To delete all raw and aggregation Reflections on a table or view:

1. In the Dremio console, hover over  in the side navigation bar and select **Project Settings**.
2. Select **Reflections** in the project settings sidebar.

   This opens the Reflections editor for the Reflection's anchor or supporting anchor.

3. Click the icon in the top-right corner of the Reflections page.
4. Click **Delete all reflections**.
5. Click **Save**.

To delete an external Reflection, or to delete a raw or aggregation Reflection without using the Reflections editor, run this SQL command:

Delete a Reflection

```
ALTER DATASET <DATASET_PATH> DROP REFLECTION <REFLECTION_NAME>
```

* `DATASET_PATH`: The path of the table or view on which the Reflection is defined.
* `REFLECTION_NAME`: The name of the Reflection to drop.
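
For example, to drop the external Reflection created in the External Reflections example above:

```
ALTER DATASET "myWorkspace"."sales_by_region"
DROP REFLECTION "external_sales_by_region"
```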

## Related Topics

* [Data Reflections Deep Dive](https://university.dremio.com/course/data-reflections-deep-dive) – Enroll in this Dremio University course to learn more about Reflections.
* [Operational Excellence](/dremio-cloud/help-support/well-architected-framework/operational-excellence/) – Follow best practices in Dremio's Well-Architected Framework for creating and managing Reflections.

<div style="page-break-after: always;"></div>

# Bring Your Own Project Store | Dremio Documentation

Original URL: https://docs.dremio.com/dremio-cloud/admin/projects/your-own-project-storage

On this page

To enable secure access between Dremio and your AWS environment, you must create an AWS Identity and Access Management (IAM) role with specific permissions and a trust relationship that allows Dremio’s AWS account to assume that role. The IAM policy and trust configuration are detailed below.

## Create Your IAM Role

Create an IAM role in your AWS account that grants Dremio the permissions it needs to access your S3 bucket.

Attach the following policy to the role and replace `<bucket-name>` with the name of your own S3 bucket.

IAM Policy

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetBucketLocation",
        "s3:ListAllMyBuckets"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::<bucket-name>",
        "arn:aws:s3:::<bucket-name>/*"
      ]
    }
  ]
}
```

The first statement allows Dremio to find buckets in your account.

* **ListAllMyBuckets** – Allow Dremio to discover your buckets when validating connectivity.
* **GetBucketLocation** – Allow Dremio to discover your bucket's location.

The second statement allows Dremio to work with the data in your bucket.

* **PutObject / GetObject / DeleteObject** – Allow Dremio to read, write, and delete data within the bucket.
* **ListBucket** – Allow Dremio to enumerate objects in the bucket.

## Define the Trust Relationship

The trust relationship determines which AWS account (in this case, Dremio’s) is permitted to assume your IAM role.

Attach the following policy to the role.

Dremio's US trust account ID is `894535543691`.

Trust Relationship

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::894535543691:root"
      },
      "Action": [
        "sts:AssumeRole",
        "sts:TagSession"
      ]
    }
  ]
}
```

* **AssumeRole** – Allows Dremio to assume the provided role.
* **TagSession** – Allows Dremio to pass identifying tags during role assumption, enabling improved tracking and auditing across accounts.

## Validate Role Configuration

1. In the AWS Console, navigate to **IAM → Roles → [Your Role Name]**.
2. Confirm that:

   * The permissions policy matches the example above.
   * The trust relationship allows the Dremio AWS account as the trusted principal.
   * Both `sts:AssumeRole` and `sts:TagSession` actions are present.

3. If Dremio provided an AWS account ID or specific region endpoint, ensure these match your configuration.

## Provide Role ARN to Dremio

Once your role is created and validated:

* Copy the role ARN (for example, `arn:aws:iam::<your-account-id>:role/<role-name>`).
* Provide this ARN to Dremio via the [Create Project](/dremio-cloud/admin/projects/#create-a-project) flow.

This allows Dremio to assume the role securely and begin reading and writing data in your S3 bucket.

## (Optional) Enable PrivateLink Connectivity

To enhance security and keep data traffic within AWS’s private network, Dremio supports integration via [AWS PrivateLink](/dremio-cloud/security/privatelink) with DNS-based endpoint resolution.

**To enable:**

* Ensure your AWS environment has PrivateLink endpoints configured for the required services.
* Verify that DNS resolution is enabled so that Dremio can route traffic to your private endpoints.
* Confirm connectivity by testing the endpoint using your VPC configuration.

<div style="page-break-after: always;"></div>

# Autonomous Reflections | Dremio Documentation

Original URL: https://docs.dremio.com/dremio-cloud/admin/performance/autonomous-reflections/

On this page

Dremio automatically creates and manages Reflections based on query patterns to optimize performance for queries on Iceberg tables, UniForm tables, Parquet datasets, and any views built on these datasets. With Autonomous Reflections, management and maintenance are fully automated, reducing manual effort and ensuring queries run efficiently. This eliminates the need for manual performance tuning while maintaining query correctness.

note

For data sources and formats not supported by Autonomous Reflections, you can create [manual Reflections](/dremio-cloud/admin/performance/manual-reflections) to optimize query performance.

## What Is a Reflection?

A Reflection is a precomputed and optimized copy of a query result, designed to speed up query performance. It is derived from an existing table or view, known as its anchor.

Dremio's query optimizer uses Reflections to accelerate queries by avoiding the need to scan the original data. Instead of querying the raw source, Dremio automatically rewrites queries to use Reflections when they provide the necessary results, without requiring you to reference them directly.

When Dremio receives a query, it first determines whether any Reflections have at least one table in common with the tables and views referenced by the query. If any Reflections do, Dremio evaluates them to determine whether they satisfy the query. Then, if any Reflections do satisfy the query, Dremio generates a query plan that uses them.

Dremio then compares the cost of the plan to the cost of executing the query directly against the tables, and selects the plan with the lower cost. Finally, Dremio executes the selected query plan. Typically, plans that use one or more Reflections are less expensive than plans that run against raw data.

## How Workloads Are Autonomously Accelerated

Dremio autonomously creates Reflections to accelerate queries on existing views, queries with joins written directly on base tables (not referencing any views), and queries that summarize data, typically submitted by AI Agents and BI dashboards.

Reflections are automatically generated based on query patterns without user intervention. Dremio continuously collects metadata from user queries, and the Autonomous Algorithm runs daily at midnight UTC to analyze query patterns from the last 7 days and create Autonomous Reflections that accelerate frequent and expensive queries.

### Query Qualification

Only queries meeting the following criteria are considered:

1. Based on Iceberg tables, UniForm tables, Parquet datasets, or views built on them. Queries referencing non-Iceberg or non-Parquet datasets, either directly or via a view, are excluded.
2. Execution time longer than one second.

Dremio may create system-managed views, which cannot be modified or referenced by users, to anchor raw or aggregation Reflections. Admins can drop these views, which also deletes the associated Reflection.

### Reflection Limits

Dremio can create up to 100 Reflections in total, with a maximum of 10 new Reflections created per day. The actual number depends on query patterns.

## How Autonomous Reflections Are Maintained

Autonomous Reflections refresh automatically when source data changes:

* **Iceberg tables**: Refreshed when the table is modified via Dremio (triggered immediately) or other engines (Dremio polls tables every 10 seconds to detect changes).
* **UniForm tables**: Refreshed when the table is modified via Dremio (triggered immediately) or other engines (Dremio polls tables every 10 seconds to detect changes).
* **Parquet datasets**: Refreshed when metadata updates occur in Dremio.

**Refresh Engine:** When a project is created, Dremio automatically provisions a Small internal refresh engine dedicated to executing Autonomous Reflection refresh jobs. This ensures Reflections are always accurate and up to date without manual refreshes. The engine automatically shuts down after 30 seconds of idle time to optimize resource usage and costs.

## Usage and Data Freshness

Dremio uses Reflections in query plans only when they are refreshed with the most recent data from the tables on which they are based. If a Reflection is not yet refreshed, queries automatically fall back to the raw data source, ensuring query correctness is never compromised.

### Monitor Reflections

To view Autonomous Reflections created for your project and their metadata (including status, score, footprint, and queries accelerated), see [View Reflection Details](/dremio-cloud/admin/performance/manual-reflections/reflection-details).

To view the history of changes to Autonomous Reflections in the last 30 days:

1. Go to **Project Settings** > **Reflections**.
2. Click **History Log**.

## Remove Reflections

Autonomous Reflections can be removed in two ways:

1. **Automatic Removal** – When an Autonomous Reflection's score falls below the threshold, it is disabled for 7 days before being automatically dropped. Admins can view disabled Autonomous Reflections in the history log.
2. **Manual Removal** – Admins can manually drop Autonomous Reflections at any time. Autonomous Reflections cannot be modified by users. If an admin manually drops an Autonomous Reflection three times, Dremio will not recreate it for 90 days.

## Disable Reflections

Every project created in Dremio is automatically accelerated with Autonomous Reflections. To disable Autonomous Reflections for a project:

1. Go to **Project Settings** > **Preferences**.
2. Toggle the **Autonomous Reflections** setting to off.

## Related Topics

* [Data Product Fundamentals](https://university.dremio.com/course/data-product-fundamentals) – Enroll in this Dremio University course to learn more about Autonomous Reflections.

<div style="page-break-after: always;"></div>

# View Reflection Details | Dremio Documentation

Original URL: https://docs.dremio.com/dremio-cloud/admin/performance/manual-reflections/reflection-details

On this page

The Reflections page lists all raw and aggregation Reflections in Dremio.

To view this page, follow these steps:

1. In the Dremio console, hover over  in the side navigation bar and select **Project Settings**.
2. Select **Reflections** in the project settings sidebar.

For any particular Reflection, the Reflections page presents information that answers these questions:

| Question | Column with the answer |
| --- | --- |
| What is the status of this Reflection? | Name |
| Is this a raw or aggregation Reflection? | Type |
| Which table or view is this Reflection defined on? | Dataset |
| How valuable is this Reflection? | Reflection Score |
| How was this Reflection created, and how is it managed? | Mode |
| How can I see a list of the jobs that created and refreshed this Reflection? | Refresh Job History |
| How many times has the query planner chosen this Reflection? | Acceleration Count |
| How many times has the query planner considered using this Reflection? | Considered Count |
| How many times did the query planner match a query to this Reflection? | Matched Count |
| How do I find out how effective this Reflection is? | Acceleration Count |
| When was this Reflection last refreshed? | Last Refresh From Table |
| Is this Reflection being refreshed now? | Refresh Status |
| What type of refreshes are used for this Reflection? | Refresh Method |
| Are refreshes scheduled for this Reflection, or do they need to be triggered manually? | Refresh Status |
| How much time did the most recent refresh of this Reflection take? | Last Refresh Duration |
| How many records are in this Reflection? | Record Count |
| How much storage is this Reflection taking up? | Current Footprint |
| When does this Reflection expire? | Available Until |
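
Much of this metadata is also queryable with SQL. The sketch below assumes a `sys.project."reflections"` system table; that name is an assumption, so verify it against the Dremio Cloud system-tables reference before relying on it:

```
-- The table name is an assumption; verify it against the
-- Dremio Cloud system-tables reference.
SELECT *
FROM sys.project."reflections"
```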

## Columns

### Acceleration Count

Shows the number of times within the last 30 days that the query planner considered using a Reflection defined on a dataset referenced by a query, determined the Reflection could be used to satisfy the query, and chose to use the Reflection to satisfy the query.

If this count is low relative to the numbers in the **Considered Count** and **Matched Count** columns, the Reflection is not effective in reducing the execution times of queries on the dataset.

### Available Until

Shows the date and time when this Reflection expires, based on the refresh policy of the queried dataset.

If a Reflection is set to expire soon and you want to continue using it, you can take either of these actions:

* Change the expiration setting on the table which the Reflection is either directly or indirectly defined on. A Reflection is indirectly defined on a table when it is defined on a view that is derived from that table. When you change the setting by using this method, the change goes into effect after the next refresh.
* Change the expiration setting on the data source where the table is located.

For the steps, see [Set the Reflection Expiration Policy](/dremio-cloud/admin/performance/manual-reflections/reflection-refresh#set-the-reflection-expiration-policy).

### Mode

Shows how the Reflection was created and is managed.

* **autonomous**: Created and managed by Dremio
* **manual**: Created and managed by a user

### Considered Count

Shows the number of queries, within the last 30 days, that referenced the dataset that a Reflection is defined on. Whenever a query references a dataset on which a Reflection is defined, the query planner considers whether to use the Reflection to help satisfy the query.

If the query planner determines that the Reflection can do that (that is, the Reflection matches the query), the query planner compares the Reflection to any others that might also be defined on the same dataset.

If the query planner does not determine this, it ignores the Reflection.

Reflections with high considered counts and no matched counts contribute to high logical planning times. Consider deleting them.

Reflections with a considered count of 0 should be removed. They are merely taking up storage and, during refreshes, resources on compute engines.

### Current Footprint

Shows the current size, in kilobytes, of a Reflection.

### Dataset

Shows the name of the table or view that a Reflection is defined on.

### Last Refresh Duration

Shows the length of time required for the most recent refresh of a Reflection.

### Last Refresh From Table

Shows the date and time that the Reflection data was last refreshed. If the refresh is running, failing, or disabled, the value is `12/31/1969 23:59:59`.

### Matched Count

Shows the number of times, within the last 30 days, that the query planner both considered a Reflection for satisfying a query and determined that the Reflection would in fact satisfy the query. However, the query planner might have decided to use a different Reflection that also matched the query. For example, a different query plan that did not include the Reflection might have had a lower cost.

This number does not show how many times the query planner used the Reflection to satisfy the query. For that number, see Acceleration Count.

If the matched count is high and the acceleration count is low, the query planner is more often deciding to use a different Reflection that also matches a query. In this case, consider deleting the Reflection.

### Name

Shows the name of the Reflection and its status. The tooltip on the icon represents a combination of the status of the Reflection (which you can filter on through the values in the **Acceleration Status** field above the list) and the value in the **Refresh Status** column.

### Record Count

Shows the number of records in the Reflection.

### Reflection Score

Shows the score for a Reflection on a scale of 0 (worst) to 100 (best). The score indicates the value that the Reflection provides to your workloads based on the jobs that have been executed in the last 7 days. Reflection scores are calculated once each day. Factors considered in the score include the number of jobs accelerated by the Reflection and the expected improvement in query run times due to the Reflection.

To help you interpret the scores, the scores have the following labels:

* **Good**: The score is more than 75.
* **Fair**: The score is 25 to 75.
* **Poor**: The score is less than 25.
* **New**: The score is blank because the Reflection was created within the past 24 hours.

note

If a Reflection's score is listed as **-**, the score needs to be recalculated due to an error or an upgraded instance.

### Refresh Job History

Opens a list of all of the jobs that created and refreshed a Reflection.

### Refresh Method

Shows which type of refresh was last used for a Reflection.

* **Full**: All of the data in the Reflection was replaced. The new data is based on the current data in the underlying dataset.
* **Incremental**:

  + For Reflections defined on Apache Iceberg tables: Either snapshot-based incremental refresh was used (if the changes were appends only) or partition-based incremental refresh was used (if the changes included DML operations).
  + For Reflections defined on Delta Lake tables: This value does not appear. Only full refreshes are supported for these Reflections.
  + For Reflections defined on all other tables: Data added to the underlying dataset since the last refresh of the Reflection was appended to the existing data in the Reflection.
* **None**: Incremental refreshes were selected in the settings for the table. However, Dremio has not confirmed that it is possible to refresh the Reflection incrementally. Applies only to Reflections that are not defined on Iceberg or Delta Lake tables.

For more information, see [Refresh Reflections](/dremio-cloud/admin/performance/manual-reflections/reflection-refresh).

### Refresh Status

Shows one of these values:

* **Manual**: Refreshes are not run on a schedule, but must be triggered manually. See [Trigger Reflection Refreshes](/dremio-cloud/admin/performance/manual-reflections/reflection-refresh#trigger-reflection-refreshes).
* **Pending**: If the Reflection depends on other Reflections, the refresh will begin after the refreshes of the other Reflections are finished.
* **Running**: The Reflection is currently being refreshed.
* **Scheduled**: Refreshes run on a schedule, but a refresh is not currently running.
* **Auto**: All of the Reflection’s underlying tables are in Iceberg format, and the Reflection automatically refreshes when new snapshots are created after an update to an underlying table, but a refresh is not currently running.
* **Failed**: Multiple attempts to refresh a Reflection have failed. You must disable and enable the Reflection to rebuild it and continue using it. Reflections in this state will not be considered to accelerate queries.

For more information, see [Refresh Reflections](/dremio-cloud/admin/performance/manual-reflections/reflection-refresh).

### Total Footprint

Shows the current size, in kilobytes, of all of the existing materializations of the Reflection. More than one materialization of a Reflection can exist at the same time, so that refreshes do not interrupt running queries that are being satisfied by the Reflection.

### Type

Shows whether the Reflection is a raw or aggregation Reflection.

Was this page helpful?
|
|
3368
|
+
|
|
3369
|
+
* Columns
|
|
3370
|
+
+ Acceleration Count
|
|
3371
|
+
+ Available Until
|
|
3372
|
+
+ Mode
|
|
3373
|
+
+ Considered Count
|
|
3374
|
+
+ Current Footprint
|
|
3375
|
+
+ Dataset
|
|
3376
|
+
+ Last Refresh Duration
|
|
3377
|
+
+ Last Refresh From Table
|
|
3378
|
+
+ Matched Count
|
|
3379
|
+
+ Name
|
|
3380
|
+
+ Record Count
|
|
3381
|
+
+ Reflection Score
|
|
3382
|
+
+ Refresh Job History
|
|
3383
|
+
+ Refresh Method
|
|
3384
|
+
+ Refresh Status
|
|
3385
|
+
+ Total Footprint
|
|
3386
|
+
+ Type
|
|
3387
|
+
|
|
3388
|
+
<div style="page-break-after: always;"></div>
|
|
3389
|
+
|
|
3390
|
+
# Refresh Reflections | Dremio Documentation

Original URL: https://docs.dremio.com/dremio-cloud/admin/performance/manual-reflections/reflection-refresh

On this page

The data in a Reflection can become stale and may need to be refreshed. Refreshing a Reflection triggers two updates:

* The data stored in the Apache Iceberg table for the Reflection is updated.
* The metadata that stores details about the Reflection is updated.

note

Dremio does not refresh the data that external Reflections are mapped to.

## Types of Reflection Refresh

How Reflections are refreshed depends on the format of the base table.

### Apache Iceberg Tables, Filesystem Sources, AWS Glue Sources, and Hive Sources

Reflections that are defined on Iceberg tables, or on the following types of datasets in filesystem, AWS Glue, and Hive sources, can be refreshed with two methods:

* Parquet datasets in filesystem sources (on S3, Azure Storage, Google Cloud Storage, or HDFS)
* Parquet datasets, Avro datasets, or non-transactional ORC datasets in AWS Glue or Hive (Hive 2 or Hive 3) sources

Iceberg tables in all supported file-system sources (Amazon S3, Azure Storage, Google Cloud Storage, and HDFS) and non-file-system sources (AWS Glue, Hive, and Nessie) can be refreshed with either of these methods:

* Incremental refreshes
* Full refreshes

#### Incremental Refreshes

There are two types of incremental refreshes:

* Incremental refreshes when changes to an anchor table are only append operations
* Incremental refreshes when changes to an anchor table include non-append operations

note

* Whether an incremental refresh can be performed depends on the outcome of an algorithm.
* The initial refresh of a Reflection is always a full refresh.

#### Incremental Refreshes When Changes to an Anchor Table Are Only Append Operations

note

Optimize operations on Iceberg tables are also supported for this type of incremental refresh.

This type of incremental refresh is used only when the changes to the anchor table are appends and do not include updates or deletes. There are two cases to consider:

* When a Reflection is defined on one anchor table

  When a Reflection is defined on an anchor table or on a view that is defined on one anchor table, an incremental refresh is based on the differences between the current snapshot of the anchor table and the snapshot at the time of the last refresh.
* When a Reflection is defined on a view that joins two or more anchor tables

  When a Reflection is defined on a view that joins two or more anchor tables, whether an incremental refresh can be performed depends on how many anchor tables have changed since the last refresh of the Reflection:

  + If just one of the anchor tables has changed since the last refresh, an incremental refresh can be performed. It is based on the differences between the current snapshot of the one changed anchor table and the snapshot at the time of the last refresh.
  + If two or more anchor tables have changed since the last refresh, then a full refresh is used to refresh the Reflection.

#### Incremental Refreshes When Changes to an Anchor Table Include Non-append Operations

For Iceberg tables, this type of incremental refresh is used when the changes are DML operations that delete or modify the data (UPDATE, DELETE, and so on), made through either the Copy-on-Write (COW) or the Merge-on-Read (MOR) storage mechanism. For more information about COW and MOR, see [Row-Level Changes on the Lakehouse: Copy-On-Write vs. Merge-On-Read in Apache Iceberg](https://www.dremio.com/blog/row-level-changes-on-the-lakehouse-copy-on-write-vs-merge-on-read-in-apache-iceberg/).

For sources in filesystems or AWS Glue, non-append operations can include, for example:

* In filesystem sources, files being deleted from Parquet datasets
* In AWS Glue sources, DML-equivalent operations being performed on Parquet datasets, Avro datasets, or non-transactional ORC datasets

Both the anchor table and the Reflection must be partitioned, and the partition transforms that they use must be compatible.

There are two cases to consider:

* When a Reflection is defined on one anchor table

  When a Reflection is defined on an anchor table or on a view that is defined on one anchor table, an incremental refresh is based on Iceberg metadata that is used to identify modified partitions and to restrict the scope of the refresh to only those partitions.
* When a Reflection is defined on a view that joins two or more anchor tables

  When a Reflection is defined on a view that joins two or more anchor tables, whether an incremental refresh can be performed depends on how many anchor tables have changed since the last refresh of the Reflection:

  + If just one of the anchor tables has changed since the last refresh, an incremental refresh can be performed. It is based on Iceberg metadata that is used to identify modified partitions and to restrict the scope of the refresh to only those partitions.
  + If two or more anchor tables have changed since the last refresh, then a full refresh is used to refresh the Reflection.

note

Dremio uses Iceberg tables to store metadata for filesystem and AWS Glue sources.

For information about partitioning Reflections and applying partition transforms, see the section [Horizontally Partition Reflections that Have Many Rows](/dremio-cloud/help-support/well-architected-framework/operational-excellence/#horizontally-partition-reflections-that-have-many-rows).

For information about partitioning Reflections in ways that are compatible with the partitioning of anchor tables, see [Partition Reflections to Allow for Partition-Based Incremental Refreshes](/dremio-cloud/help-support/well-architected-framework/operational-excellence/#partition-reflections-to-allow-for-partition-based-incremental-refreshes).

#### Full Refreshes

In a full refresh, a Reflection is dropped, recreated, and loaded.

note

* Whether a full refresh is performed depends on the outcome of an algorithm.
* The initial refresh of a Reflection is always a full refresh.

#### Algorithm for Determining Whether an Incremental or a Full Refresh Is Used

The following algorithm determines which refresh method is used:

1. If the Reflection has never been refreshed, then a full refresh is performed.
2. If the Reflection is created from a view that uses nested group-bys, unions, window functions, or joins other than inner or cross joins, then a full refresh is performed.
3. If the Reflection is created from a view that joins two or more anchor tables and more than one anchor table has changed since the previous refresh, then a full refresh is performed.
4. If the Reflection is based on a view and the changed anchor table is used multiple times in that view, then a full refresh is performed.
5. If the changes to the anchor table are only appends, then an incremental refresh based on table snapshots is performed.
6. If the changes to the anchor table include non-append operations, then the compatibility of the partitions of the anchor table and the partitions of the Reflection is checked:
   * If the partitions of the anchor table and the partitions of the Reflection are not compatible, or if either the anchor table or the Reflection is not partitioned, then a full refresh is performed.
   * If the partition scheme of the anchor table has been changed since the last refresh to be incompatible with the partitioning scheme of the Reflection, and if changes have occurred to data belonging to either the prior partition scheme or the new partition scheme, then a full refresh is performed.

     To avoid a full refresh when these two conditions hold, update the partition scheme for the Reflection to match the partition scheme for the table. You can do so in the **Advanced** view of the Reflection editor or through the `ALTER DATASET` SQL command.
   * If the partitions of the anchor table and the partitions of the Reflection are compatible, then an incremental refresh is performed.

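
The decision flow above can be summarized in code. The sketch below is illustrative only; the type and field names are assumptions, not Dremio's internal representation:

```
// Illustrative types; not Dremio's internal representation.
type ViewShape = {
  usesNestedGroupBys: boolean;
  usesUnions: boolean;
  usesWindowFunctions: boolean;
  usesOnlyInnerOrCrossJoins: boolean;
};

type RefreshInput = {
  everRefreshed: boolean;
  view?: ViewShape;                 // present when the Reflection is defined on a view
  changedAnchorTables: number;      // anchor tables changed since the last refresh
  changedTableUsedMultipleTimes: boolean;
  changesAreAppendsOnly: boolean;
  anchorPartitioned: boolean;
  reflectionPartitioned: boolean;
  partitionsCompatible: boolean;
};

function decideRefresh(input: RefreshInput): "FULL" | "INCREMENTAL" {
  if (!input.everRefreshed) return "FULL";                                // rule 1
  if (input.view && (input.view.usesNestedGroupBys || input.view.usesUnions ||
      input.view.usesWindowFunctions || !input.view.usesOnlyInnerOrCrossJoins)) {
    return "FULL";                                                        // rule 2
  }
  if (input.changedAnchorTables > 1) return "FULL";                       // rule 3
  if (input.view && input.changedTableUsedMultipleTimes) return "FULL";   // rule 4
  if (input.changesAreAppendsOnly) return "INCREMENTAL";                  // rule 5: snapshot-based
  // Rule 6: non-append changes require compatible partitioning on both sides.
  if (!input.anchorPartitioned || !input.reflectionPartitioned ||
      !input.partitionsCompatible) {
    return "FULL";
  }
  return "INCREMENTAL";                                                   // partition-based
}
```
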
Because this algorithm is used to determine which type of refresh to perform, you do not select a type of refresh for Reflections in the settings of the anchor table.

However, no data is read in the `REFRESH REFLECTION` job for Reflections that depend only on Iceberg, Parquet, Avro, or non-transactional ORC datasets, or on other Reflections, and that have no new data since the last refresh based on the table snapshots. Instead, a "no-op" Reflection refresh is planned and a materialization is created, eliminating redundancy and minimizing the cost of a full or incremental Reflection refresh.

### Delta Lake Tables

Only full refreshes are supported. In a full refresh, the Reflection being refreshed is dropped, recreated, and loaded.

### All Other Tables

* **Incremental refreshes**

  Dremio appends data to the existing data for a Reflection. Incremental refreshes are faster than full refreshes for large Reflections, and are appropriate for Reflections that are defined on tables that are not partitioned.

  There are two ways in which Dremio can identify new records:

  + **For directory datasets in file-based data sources like S3 and HDFS:**
    Dremio can automatically identify new files in the directory that were added after the prior refresh.
  + **For all other datasets (such as datasets in relational or NoSQL databases):**
    An administrator specifies a strictly monotonically increasing field, such as an auto-incrementing key, that must be of type BigInt, Int, Timestamp, Date, Varchar, Float, Double, or Decimal. This allows Dremio to find and fetch the records that have been created since the last time the acceleration was incrementally refreshed (see the sketch after this list).

  caution

  Use incremental refreshes only for Reflections that are based on tables and views that are appended to. If records can be updated or deleted in a table or view, use full refreshes for the Reflections that are based on that table or view.
* **Full refreshes**

  In a full refresh, the Reflection being refreshed is dropped, recreated, and loaded.

  Full refreshes are always used in these three cases:

  + A Reflection is partitioned on one or more fields.
  + A Reflection is created on a table that was promoted from a file, rather than from a folder, or is created on a view that is based on such a table.
  + A Reflection is created from a view that uses nested group-bys, joins, unions, or window functions.

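
For the database case in the list above, the watermark technique works roughly as sketched below. The `runQuery` helper and all names are hypothetical, for illustration only:

```
// Hypothetical helper: executes SQL against the source and returns rows.
declare function runQuery(sql: string): Promise<Array<Record<string, unknown>>>;

// Watermark-based incremental fetch: remember the highest value of a strictly
// monotonically increasing column and fetch only rows created after it.
async function fetchNewRecords(
  table: string,
  watermarkColumn: string,          // e.g., an auto-incrementing key
  lastWatermark: number
): Promise<{ rows: Array<Record<string, unknown>>; newWatermark: number }> {
  const rows = await runQuery(
    `SELECT * FROM ${table} WHERE ${watermarkColumn} > ${lastWatermark}`
  );
  // Advance the watermark to the highest value seen in this batch.
  const newWatermark = rows.reduce(
    (max, row) => Math.max(max, Number(row[watermarkColumn])),
    lastWatermark
  );
  return { rows, newWatermark };
}
```
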
## Specify the Reflection Refresh Policy

In the settings for a data source, you specify the refresh policy for refreshes of all Reflections that are on the tables in that data source. The default policy is period-based, with one hour between refreshes. If you select a schedule policy, the default is every day at 8:00 a.m. (UTC).

In the settings for a table that is not in the Iceberg or Delta Lake format, you can specify the type of refresh to use for all Reflections that are ultimately derived from the table. The default refresh type is **Full refresh**.

For tables in all supported table formats, you can specify a refresh policy for Reflection refreshes that overrides the policy specified in the settings for the table's data source. The default policy is the schedule set at the source of the table.

To set the refresh policy on a data source:

1. In the Dremio console, right-click a data lake or external source.
2. Select **Edit Details**.
3. In the sidebar of the Edit Source window, select **Reflection Refresh**.
4. When you are done making your selections, click **Save**. Your changes go into effect immediately.

To edit the refresh policy on a table:

1. Locate the table.
2. Hover over the row in which it appears and click  to the right.
3. Select **Reflection Refresh** in the dataset settings sidebar.
4. When you are done making your selections, click **Save**. Your changes go into effect immediately.

### Types of Refresh Policies

Datasets and sources can set Reflections to refresh according to the following policy types:

| Refresh policy type | Description |
| --- | --- |
| Never | Reflections are not refreshed. |
| Period (default) | Reflections refresh at the specified interval of hours, days, or weeks. The default refresh period is one hour. |
| Schedule | Reflections refresh at a specific time on the specified days of the week, in UTC. The default is every day at 8:00 a.m. (UTC). |
| Auto refresh when Iceberg table data changes | Reflections automatically refresh for underlying Iceberg tables whenever new updates occur. Reflections under this policy type are known as Live Reflections. Live Reflections are also updated based on the minimum refresh frequency defined by the source-level policy. This refresh policy is only available for data sources that support the Iceberg table format. |

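
For client code that manipulates these settings, the four policy types can be modeled as a discriminated union. This is an illustrative model only; the field names are assumptions, not the shape of any Dremio API object:

```
// Illustrative model of the four policy types.
type RefreshPolicy =
  | { kind: "never" }
  | { kind: "period"; every: number; unit: "hours" | "days" | "weeks" }  // default: 1 hour
  | { kind: "schedule"; daysOfWeek: string[]; timeUtc: string }          // default: daily 08:00 UTC
  | { kind: "auto" };  // Live Reflections: refresh when Iceberg table data changes

const defaultPolicy: RefreshPolicy = { kind: "period", every: 1, unit: "hours" };
const weekdayMornings: RefreshPolicy = {
  kind: "schedule",
  daysOfWeek: ["Mon", "Tue", "Wed", "Thu", "Fri"],
  timeUtc: "08:00",
};
```
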
## Set the Reflection Expiration Policy

Rather than delete a Reflection manually, you can specify how long you want Dremio to retain the Reflection before deleting it automatically.

note

Dremio does not allow expiration policies to be set on external Reflections or on Reflections that refresh automatically when Iceberg table data changes.

To set the expiration policy for all Reflections derived from tables in a data source:

1. Right-click a data lake or external source.
2. Select **Edit Details**.
3. Select **Reflection Refresh** in the edit source sidebar.
4. After making your changes, click **Save**. The changes take effect on the next refresh.

To set the expiration policy on Reflections derived from a particular table:

note

The table must be based on more than one file.

1. Locate a table.
2. Click the  to its right.
3. Select **Reflection Refresh** in the dataset settings sidebar.
4. After making your changes, click **Save**. The changes take effect on the next refresh.

## View the Reflection Refresh History

You can find out whether a refresh job for a Reflection has run, and how many times refresh jobs for a Reflection have run.

To view the refresh history:

1. In the Dremio console, go to the catalog or folder that lists the table or view from which the Reflection was created.
2. Hover over the row for the table or view.
3. In the **Actions** field, click .
4. Select **Reflections** in the dataset settings sidebar.
5. Click **History** in the heading for the Reflection.

The Jobs page opens with the ID of the Reflection in the search box, and only jobs related to that ID are listed.

When a Reflection is refreshed, Dremio runs a single job with two steps:

* The first step writes the query results as a materialization to the distributed acceleration storage by running a `REFRESH REFLECTION` command.
* The second step registers the materialization table and its metadata with the catalog so that the query optimizer can find the Reflection's definition and structure.

The following screenshot shows the `REFRESH REFLECTION` command used to refresh the Reflection named `Super-duper reflection`:



The Reflection refresh is listed as a single job on the Jobs page, as shown in the example below:



To find out which type of refresh was performed:

1. Click the ID of the job that ran the `REFRESH REFLECTION` command.
2. Click the **Raw Profile** tab.
3. Click the **Planning** tab.
4. Scroll down to the **Refresh Decision** section.

## Reflection Refresh Retry Policy

When a Reflection refresh job fails, Dremio retries the refresh according to a uniform policy. This policy is designed to balance resource consumption with the need to keep Reflection data up to date. It prioritizes newly failed Reflections to reduce excessive retries on persistent failures, and it helps ensure that Reflection data does not become overly stale.

After a refresh failure, Dremio's default is to repeat the refresh attempt at exponential intervals up to 4 hours: 1 minute, 2 minutes, 5 minutes, 15 minutes, 30 minutes, 1 hour, 2 hours, and 4 hours. Then, Dremio continues trying to refresh the Reflection every 4 hours.

There are two optimizations for special cases:

* **Long-running refresh jobs**: The backoff interval is never shorter than the duration of the last successful refresh.
* **Small maximum retry attempts**: At least one 4-hour backoff attempt is guaranteed, to ensure meaningful coverage of the retry policy.

Dremio stops retrying after 24 attempts, which typically takes about 71 hours and 52 minutes, or when the 72-hour retry window is reached, whichever comes first.

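
A minimal sketch of this retry schedule, assuming the wait time depends only on the attempt number and the duration of the last successful refresh (a simplification of Dremio's internal behavior):

```
// Default backoff ladder, in minutes; after the ladder is exhausted,
// every further retry waits 4 hours (240 minutes).
const BACKOFF_MINUTES = [1, 2, 5, 15, 30, 60, 120, 240];
const MAX_ATTEMPTS = 24;          // configurable in Acceleration Settings
const RETRY_WINDOW_HOURS = 72;    // hard cap on the total retry window

function backoffMinutes(attempt: number, lastSuccessfulRunMinutes: number): number {
  const base =
    attempt < BACKOFF_MINUTES.length
      ? BACKOFF_MINUTES[attempt]
      : BACKOFF_MINUTES[BACKOFF_MINUTES.length - 1];
  // Long-running refresh jobs: never back off for less time than the
  // last successful refresh took.
  return Math.max(base, lastSuccessfulRunMinutes);
}

// Example: print the cumulative schedule for a refresh that takes ~3 minutes.
let elapsed = 0;
for (let attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
  elapsed += backoffMinutes(attempt, 3);
  if (elapsed > RETRY_WINDOW_HOURS * 60) break;   // 72-hour window reached
  console.log(`retry ${attempt + 1} at +${elapsed} min`);
}
```

Running this with a longer `lastSuccessfulRunMinutes` shows how long-running refreshes stretch the early retries.
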
To configure a different maximum number of retry attempts for Reflection refreshes than Dremio's default of 24 retries:

1. Click  in the left navbar.
2. Select **Reflections** in the left sidebar.
3. On the Reflections page, click  in the top-right corner and select **Acceleration Settings**.
4. In the field next to **Maximum attempts for Reflection job failures**, specify the maximum number of retries.
5. Click **Save**. The change goes into effect immediately.

Dremio applies the retry policy after a refresh failure for all types of Reflection refreshes, whether the refresh was triggered manually or initiated by a refresh policy.

## Trigger Reflection Refreshes

You can click a button to start the refresh of all of the Reflections that are defined on a table or on views derived from that table.

To trigger a refresh manually:

1. Locate the table.
2. Hover over the row in which it appears and click  to the right.
3. In the sidebar of the Dataset Settings window, click **Reflection Refresh**.
4. Click **Refresh Now**. The message "All dependent Reflections will be refreshed." appears at the top of the screen.
5. Click **Save**.

You can also refresh Reflections by using the Reflection API, the Catalog API, or the SQL commands [`ALTER TABLE`](/dremio-cloud/sql/commands/alter-table) and [`ALTER VIEW`](/dremio-cloud/sql/commands/alter-view).

* With the Reflection API, you specify the ID of a Reflection. See [Refresh a Reflection](/dremio-cloud/api/reflection/#refresh-a-reflection).
* With the Catalog API, you specify the ID of a table or view that the Reflections are defined on. See [Refresh the Reflections on a Table](/dremio-cloud/api/catalog/table#refresh-the-reflections-on-a-table) and [Refresh the Reflections on a View](/dremio-cloud/api/catalog/view#refresh-the-reflections-on-a-view).
* With the [`ALTER TABLE`](/dremio-cloud/sql/commands/alter-table) and [`ALTER VIEW`](/dremio-cloud/sql/commands/alter-view) commands, you specify the path and name of the table or view that the Reflections are defined on.

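
For example, a Catalog API refresh call might look like the following sketch from TypeScript. The endpoint path is an assumption modeled on the linked reference pages; verify it against the Catalog API documentation before relying on it:

```
// Sketch only: the endpoint path is an assumption based on the Catalog API
// reference linked above; verify it before relying on it.
const DREMIO_CLOUD_API = "https://api.dremio.cloud/v0";

async function refreshReflectionsOnTable(
  projectId: string,
  tableId: string,       // catalog ID of the table, not its path
  pat: string            // personal access token
): Promise<void> {
  const res = await fetch(
    `${DREMIO_CLOUD_API}/projects/${projectId}/catalog/${tableId}/refresh`,
    { method: "POST", headers: { Authorization: `Bearer ${pat}` } }
  );
  if (!res.ok) {
    throw new Error(`Refresh request failed: ${res.status} ${res.statusText}`);
  }
}
```
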
The refresh action follows this logic for the Reflection API:

* If the Reflection is defined on a view, the action refreshes all Reflections that are defined on the tables that the anchor view is ultimately defined on, as well as all Reflections that are defined on the downstream/dependent views of those tables.
* If the Reflection is defined on a table, the action refreshes the Reflections that are defined on the table and all Reflections that are defined on the downstream/dependent views of the anchor table.

The refresh action follows similar logic for the Catalog API and the SQL commands:

* If the action is started on a view, it refreshes all Reflections that are defined on the tables that the view is ultimately defined on, as well as all Reflections that are defined on the downstream/dependent views of those tables.
* If the action is started on a table, it refreshes the Reflections that are defined on the table and all Reflections that are defined on the downstream/dependent views of the anchor table.

For example, suppose that you had the following tables and views, with Reflections R1 through R5 defined on them:

```
            View2(R5)
           /         \
    View1(R3)      Table3(R4)
    /       \
Table1(R1) Table2(R2)
```

* Refreshing Reflection R5 through the API also refreshes R1, R2, R3, and R4.
* Refreshing Reflection R4 through the API also refreshes R5.
* Refreshing Reflection R3 through the API also refreshes R1, R2, and R5.
* Refreshing Reflection R2 through the API also refreshes R3 and R5.
* Refreshing Reflection R1 through the API also refreshes R3 and R5.

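
In other words, the refresh set is the set of all Reflections defined on the anchor's base tables and on every view downstream of those tables. The following sketch computes that set for the example graph; the graph encoding is invented for illustration:

```
// Edges point from a view to the datasets it is defined on.
const definedOn: Record<string, string[]> = {
  View2: ["View1", "Table3"],
  View1: ["Table1", "Table2"],
};

// Base tables that a dataset ultimately depends on (a table is its own base).
function baseTables(dataset: string): Set<string> {
  const parents = definedOn[dataset];
  if (!parents) return new Set([dataset]);
  const result = new Set<string>();
  for (const p of parents) for (const t of baseTables(p)) result.add(t);
  return result;
}

// Every dataset that sits on top of any of the given base tables.
function refreshSet(anchor: string): Set<string> {
  const affected = baseTables(anchor);
  let grew = true;
  while (grew) {
    grew = false;
    for (const [view, parents] of Object.entries(definedOn)) {
      if (!affected.has(view) && parents.some((p) => affected.has(p))) {
        affected.add(view);
        grew = true;
      }
    }
  }
  return affected;
}

console.log([...refreshSet("View1")]);  // Table1, Table2, View1, View2 -> R1, R2, R3, R5
console.log([...refreshSet("Table3")]); // Table3, View2 -> R4, R5
```
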
## Obtain Reflection IDs

You will need one or more Reflection IDs for some of the Reflection hints. Reflection IDs can be found in three places: the Acceleration section of the raw profile of a job that ran a query using the Reflection, the [`SYS.PROJECT.REFLECTIONS`](/dremio-cloud/sql/system-tables/reflections) system table, and the Reflection summary objects that you retrieve with the Reflection API.

To find the ID of a Reflection in the Acceleration section of the raw profile of a job that ran a query that used the Reflection:

1. In the Dremio console, click  in the side navigation bar.
2. In the list of jobs, locate the job that ran the query. If the query was satisfied by a Reflection,  appears after the name of the user who ran the query.
3. Click the ID of the job.
4. Click **Raw Profile** at the top of the page.
5. Click the **Acceleration** tab.
6. In the Reflection Outcome section, locate the ID of the Reflection.

To find the ID of a Reflection in the `SYS.PROJECT.REFLECTIONS` system table:

1. In the Dremio console, click  in the left navbar.
2. Copy this query and paste it into the SQL editor:

   Query for listing info about all existing Reflections

   ```
   SELECT * FROM SYS.PROJECT.REFLECTIONS
   ```
3. Sort the results on the `dataset_name` column.
4. In the `dataset_name` column, locate the name of the dataset that the Reflection was defined on.
5. Scroll the table to the right to look through the display columns, dimensions, measures, sort columns, and partition columns to find the combination of attributes that define the Reflection.
6. Scroll the table all the way to the left to find the ID of the Reflection.

To find the ID of a Reflection by using REST APIs:

1. Obtain the ID of the table or view that the Reflection was defined on by retrieving either the [table](/dremio-cloud/api/catalog/table#retrieve-a-table-by-path) or the [view](/dremio-cloud/api/catalog/view#retrieve-a-view-by-path) by its path.
2. [Use the Reflections API to retrieve a list of all of the Reflections that are defined on the table or view](/dremio-cloud/api/reflection/#retrieve-all-reflections-for-a-dataset).
3. In the response, locate the Reflection by its combination of attributes.
4. Copy the Reflection's ID.

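
As a sketch, the two REST calls might be combined as follows in TypeScript. The endpoint paths are assumptions based on the linked reference pages, so confirm them against the Catalog and Reflection API documentation:

```
// Sketch only: endpoint paths are assumptions; confirm against the
// Catalog and Reflection API reference pages linked above.
const API = "https://api.dremio.cloud/v0";

async function findReflectionIds(
  projectId: string,
  datasetPath: string[],   // e.g., ["MySource", "MyFolder", "MyTable"]
  pat: string
): Promise<string[]> {
  const headers = { Authorization: `Bearer ${pat}` };

  // Step 1: resolve the table or view to its catalog ID by path.
  const pathSegment = datasetPath.map(encodeURIComponent).join("/");
  const dataset = await (
    await fetch(`${API}/projects/${projectId}/catalog/by-path/${pathSegment}`, { headers })
  ).json();

  // Step 2: list the Reflections defined on that dataset and collect their IDs.
  const reflections = await (
    await fetch(`${API}/projects/${projectId}/dataset/${dataset.id}/reflection`, { headers })
  ).json();
  return reflections.data.map((r: { id: string }) => r.id);
}
```
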
<div style="page-break-after: always;"></div>