dremiojs 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (45)
  1. package/.eslintrc.json +14 -0
  2. package/.prettierrc +7 -0
  3. package/README.md +59 -0
  4. package/dremiodocs/dremio-cloud/cloud-api-reference.md +748 -0
  5. package/dremiodocs/dremio-cloud/dremio-cloud-about.md +225 -0
  6. package/dremiodocs/dremio-cloud/dremio-cloud-admin.md +3754 -0
  7. package/dremiodocs/dremio-cloud/dremio-cloud-bring-data.md +6098 -0
  8. package/dremiodocs/dremio-cloud/dremio-cloud-changelog.md +32 -0
  9. package/dremiodocs/dremio-cloud/dremio-cloud-developer.md +1147 -0
  10. package/dremiodocs/dremio-cloud/dremio-cloud-explore-analyze.md +2522 -0
  11. package/dremiodocs/dremio-cloud/dremio-cloud-get-started.md +300 -0
  12. package/dremiodocs/dremio-cloud/dremio-cloud-help-support.md +869 -0
  13. package/dremiodocs/dremio-cloud/dremio-cloud-manage-govern.md +800 -0
  14. package/dremiodocs/dremio-cloud/dremio-cloud-overview.md +36 -0
  15. package/dremiodocs/dremio-cloud/dremio-cloud-security.md +1844 -0
  16. package/dremiodocs/dremio-cloud/sql-docs.md +7180 -0
  17. package/dremiodocs/dremio-software/dremio-software-acceleration.md +1575 -0
  18. package/dremiodocs/dremio-software/dremio-software-admin.md +884 -0
  19. package/dremiodocs/dremio-software/dremio-software-client-applications.md +3277 -0
  20. package/dremiodocs/dremio-software/dremio-software-data-products.md +560 -0
  21. package/dremiodocs/dremio-software/dremio-software-data-sources.md +8701 -0
  22. package/dremiodocs/dremio-software/dremio-software-deploy-dremio.md +3446 -0
  23. package/dremiodocs/dremio-software/dremio-software-get-started.md +848 -0
  24. package/dremiodocs/dremio-software/dremio-software-monitoring.md +422 -0
  25. package/dremiodocs/dremio-software/dremio-software-reference.md +677 -0
  26. package/dremiodocs/dremio-software/dremio-software-security.md +2074 -0
  27. package/dremiodocs/dremio-software/dremio-software-v25-api.md +32637 -0
  28. package/dremiodocs/dremio-software/dremio-software-v26-api.md +36757 -0
  29. package/jest.config.js +10 -0
  30. package/package.json +25 -0
  31. package/src/api/catalog.ts +74 -0
  32. package/src/api/jobs.ts +105 -0
  33. package/src/api/reflection.ts +77 -0
  34. package/src/api/source.ts +61 -0
  35. package/src/api/user.ts +32 -0
  36. package/src/client/base.ts +66 -0
  37. package/src/client/cloud.ts +37 -0
  38. package/src/client/software.ts +73 -0
  39. package/src/index.ts +16 -0
  40. package/src/types/catalog.ts +31 -0
  41. package/src/types/config.ts +18 -0
  42. package/src/types/job.ts +18 -0
  43. package/src/types/reflection.ts +29 -0
  44. package/tests/integration_manual.ts +95 -0
  45. package/tsconfig.json +19 -0
@@ -0,0 +1,1147 @@
# Developer Guide | Dremio Documentation

Original URL: https://docs.dremio.com/dremio-cloud/developer/

You can develop applications that connect to Dremio using Arrow Flight for high-performance data access, APIs for management operations, or by integrating with development tools and frameworks.

## Build Custom Applications

Use Arrow Flight and Python SDKs to build applications that connect to Dremio:

* [Arrow Flight](/dremio-cloud/developer/arrow-flight) – High-performance data access for analytics applications
* [Arrow Flight SQL](/dremio-cloud/developer/arrow-flight-sql) – Standardized SQL database interactions with prepared statements
* [Python](/dremio-cloud/developer/python) – Build applications using Arrow Flight or REST APIs
* [Dremio MCP Server](/dremio-cloud/developer/mcp-server) – AI Agent integration for natural language interactions

## Build Pipelines and Transformations

Use your tool of choice to build pipelines, perform transformations, and work with Dremio:

* [dbt Integration](/dremio-cloud/developer/dbt) – Transform data with version control and testing
* [VS Code Extension](/dremio-cloud/developer/vs-code) – Query Dremio from Visual Studio Code

## Customize and Automate

Use APIs to power any type of customization or automation:

* [API Reference](/dremio-cloud/api/) – Web applications and administrative automation

For sample applications, connectors, and additional integrations, see [Dremio Hub](https://github.com/dremio-hub).

## Supported Data Formats

For a deep dive into open table and data formats that Dremio supports, see [Data Formats](/dremio-cloud/developer/data-formats/).

<div style="page-break-after: always;"></div>

# dbt | Dremio Documentation

Original URL: https://docs.dremio.com/dremio-cloud/developer/dbt

dbt enables analytics engineers to transform their data using the same practices that software engineers use to build applications.

You can use Dremio's dbt connector `dbt-dremio` to transform data in data sources that are connected to a Dremio project.

## Prerequisites

* Download the `dbt-dremio` package from <https://github.com/dremio/dbt-dremio>.
* Ensure that Python 3.9.x or later is installed.
* Before connecting from a dbt project to Dremio, follow these prerequisite steps:
  + Ensure that you have the ID of the Dremio project that you want to use. See [Obtain the ID of a Project](/dremio-cloud/admin/projects/#obtain-the-id-of-a-project).
  + Ensure that you have a personal access token (PAT) for authenticating to Dremio. See [Create a PAT](/dremio-cloud/security/authentication/personal-access-token#create-a-pat).

## Install

Install the package from PyPI by running this command:

Install dbt-dremio package

```
pip install dbt-dremio
```

note

`dbt-dremio` works exclusively with dbt-core versions 1.8-1.9. Earlier versions of dbt-core are not officially supported.

## Initialize a dbt Project

1. Run the command `dbt init <project_name>`.
2. Select `dremio` as the database to use.
3. Select the `dremio_cloud` option.
4. Provide a value for `cloud_host`.
5. Enter your username, PAT, and the ID of your Dremio project.
6. Select the `enterprise_catalog` option.
7. For `enterprise_catalog_namespace`, enter the name of an existing namespace within the catalog.
8. For `enterprise_catalog_folder`, enter the name of a folder that already exists within the namespace.

For descriptions of the configurations in the above steps, see Configurations.

After these steps are completed, you will have a profile for your new dbt project. This file is typically named `profiles.yml`.

This file can be edited to add multiple profiles, one for each `target` configuration of Dremio.
A common pattern is to have a `dev` target where a dbt project is tested, and another `prod` target to which changes to the model are promoted after testing:

Example Profile

```
[project name]:
  outputs:
    dev:
      cloud_host: api.dremio.cloud
      cloud_project_id: 1ab23456-78c9-01d2-de3f-456g7h890ij1
      enterprise_catalog_folder: sales
      enterprise_catalog_namespace: dev
      pat: A1BCDrE2FwgH3IJkLM4123qrsT5uV6WXyza7I8bcDEFgJ9hIj0Kl1MNOPq2Rstu==
      threads: 1
      type: dremio
      use_ssl: true
      user: name@company.com
    prod:
      cloud_host: api.dremio.cloud
      cloud_project_id: 1ab23456-78c9-01d2-de3f-456g7h890ij1
      enterprise_catalog_folder: sales
      enterprise_catalog_namespace: prod
      pat: A1BCDrE2FwgH3IJkLM4123qrsT5uV6WXyza7I8bcDEFgJ9hIj0Kl1MNOPq2Rstu==
      threads: 1
      type: dremio
      use_ssl: true
      user: name@company.com
  target: dev
```

Note that the `target` value inside of the profiles.yml file can be overridden when invoking `dbt run`.

Specify target for dbt run command

```
dbt run --target <target_name>
```

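If you drive dbt from automation rather than the command line, dbt-core also exposes a programmatic runner that accepts the same arguments, including `--target`. The following is a minimal sketch, assuming dbt-core 1.8's `dbtRunner` API and a project and `profiles.yml` already configured as above:

```
from dbt.cli.main import dbtRunner, dbtRunnerResult

# Run the project against the "prod" target, equivalent to: dbt run --target prod
dbt = dbtRunner()
res: dbtRunnerResult = dbt.invoke(["run", "--target", "prod"])

# Inspect per-model results
if res.success:
    for r in res.result:
        print(f"{r.node.name}: {r.status}")
```
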
## Configurations

| Configuration | Required | Default Value | Description |
| --- | --- | --- | --- |
| `cloud_host` | Yes | `api.dremio.cloud` | US Control Plane: `api.dremio.cloud`; EU Control Plane: `api.eu.dremio.cloud` |
| `cloud_project_id` | Yes | None | The ID of the Dremio project in which to run transformations. |
| `enterprise_catalog_namespace` | Yes | None | The namespace in which to create tables, views, etc. The dbt aliases are `datalake` (for objects) and `database` (for views). |
| `enterprise_catalog_folder` | Yes | None | The path in the catalog in which to create catalog objects. The dbt aliases are `root_path` (for objects) and `schema` (for views). Nested folders in the path are separated with periods. |
| `pat` | Yes | None | The personal access token to use for authentication. See [Personal Access Tokens](/dremio-cloud/security/authentication/personal-access-token/) for instructions about obtaining a token. |
| `threads` | Yes | 1 | The number of threads the dbt project runs on. |
| `type` | Yes | `dremio` | Auto-populated when creating a Dremio project. Do not change this value. |
| `use_ssl` | Yes | `true` | The value must be `true`. |
| `user` | Yes | None | Email address used as a username in Dremio. |

## Known Limitations

[Model contracts](https://docs.getdbt.com/docs/collaborate/govern/model-contracts) are not supported.

<div style="page-break-after: always;"></div>

# Python | Dremio Documentation

Original URL: https://docs.dremio.com/dremio-cloud/developer/python

You can develop client applications in Python that use [Arrow Flight](/dremio-cloud/developer/arrow-flight/) and connect to Dremio's Arrow Flight server endpoint. For help getting started, try out the sample application.

## Sample Python Arrow Flight Client Application

This lightweight sample Python client application connects to the Dremio Arrow Flight server endpoint. You can use token-based credentials for authentication. Any datasets in Dremio that are accessible by the provided Dremio user can be queried. You can change settings in a `.yaml` configuration file before running the client.

The Sample Python Client Application

```
"""
Copyright (C) 2017-2021 Dremio Corporation

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
"""
from dremio.arguments.parse import get_config
from dremio.flight.endpoint import DremioFlightEndpoint

if __name__ == "__main__":
    # Parse the config file.
    args = get_config()

    # Instantiate DremioFlightEndpoint object
    dremio_flight_endpoint = DremioFlightEndpoint(args)

    # Connect to Dremio Arrow Flight server endpoint.
    flight_client = dremio_flight_endpoint.connect()

    # Execute query
    dataframe = dremio_flight_endpoint.execute_query(flight_client)

    # Print out the data
    print(dataframe)
```

### Steps

1. Install [Python 3](https://www.python.org/downloads/).
2. Download the [Dremio Flight endpoint .whl file](https://github.com/dremio-hub/arrow-flight-client-examples/releases).
3. Install the `.whl` file:
   Command for installing the file

   ```
   python3 -m pip install <path to .whl file>
   ```
4. Create a local folder to store the client file and config file.
5. Create a file named `example.py` in the folder that you created.
6. Copy the contents of `arrow-flight-client-examples/python/example.py` (available [here](https://github.com/dremio-hub/arrow-flight-client-examples/blob/main/python/example.py)) into `example.py`.
7. Create a file named `config.yaml` in the folder that you created.
8. Copy the contents of `arrow-flight-client-examples/python/config_template.yaml` (available [here](https://github.com/dremio-hub/arrow-flight-client-examples/blob/main/python/config_template.yaml)) into `config.yaml`.
9. Uncomment the options in `config.yaml`, as needed, appending arguments after their keys (for example, `username: my_username`). You can either delete the options that are not being used or leave them commented.

   Example config file for connecting to Dremio

   ```
   hostname: data.dremio.cloud
   port: 443
   pat: my_PAT
   tls: true
   query: SELECT * FROM Samples."samples.dremio.com"."NYC-taxi-trips" limit 10
   ```
10. Run the Python Arrow Flight Client by navigating to the folder that you created and running this command:
    Command for running the client

    ```
    python3 example.py [-config CONFIG_REL_PATH | --config-path CONFIG_REL_PATH]
    ```

    * `[-config CONFIG_REL_PATH | --config-path CONFIG_REL_PATH]`: Use either of these options to set the relative path to the config file. The default is "./config.yaml".

### Config File Options

Default content of the config file

```
hostname:
port:
username:
password:
token:
query:
tls:
disable_certificate_verification:
path_to_certs:
session_properties:
engine:
```

| Name | Type | Required? | Default | Description |
| --- | --- | --- | --- | --- |
| `hostname` | string | No | `localhost` | Must be `data.dremio.cloud` for the US control plane or `data.eu.dremio.cloud` for the European control plane. |
| `port` | integer | No | 32010 | Dremio's Arrow Flight server port. Must be `443`. |
| `username` | string | No | N/A | Not applicable when connecting to Dremio. |
| `password` | string | No | N/A | Not applicable when connecting to Dremio. |
| `token` | string | Yes | N/A | Either a personal access token or an OAuth2 token. |
| `query` | string | Yes | N/A | The SQL query to test. |
| `tls` | boolean | No | false | Enables encryption on a connection. |
| `disable_certificate_verification` | boolean | No | false | Disables TLS server verification. |
| `path_to_certs` | string | No | System Certificates | Path to trusted certificates for encrypted connections. |
| `session_properties` | list of strings | No | N/A | Key-value pairs of session properties, for example `session_properties: - schema='Samples."samples.dremio.com"'`. For a list of the available properties, see [Manage Workloads](/dremio-cloud/developer/arrow-flight#manage-workloads). |
| `engine` | string | No | N/A | The specific engine to run against. |
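
If you prefer not to use the packaged endpoint helper, you can talk to the Flight endpoint directly with `pyarrow`. The following is a minimal sketch rather than the official sample client; it assumes a valid PAT and the US control plane endpoint `data.dremio.cloud:443`:

```
import pyarrow.flight as flight

PAT = "my_PAT"  # assumption: a valid personal access token
QUERY = 'SELECT * FROM Samples."samples.dremio.com"."NYC-taxi-trips" LIMIT 10'

# Connect over TLS to Dremio's Arrow Flight endpoint
client = flight.FlightClient("grpc+tls://data.dremio.cloud:443")

# Authenticate every call with the PAT as a bearer token
options = flight.FlightCallOptions(
    headers=[(b"authorization", f"Bearer {PAT}".encode())]
)

# Plan the query, then fetch the single resulting flight as a pandas DataFrame
info = client.get_flight_info(flight.FlightDescriptor.for_command(QUERY), options)
reader = client.do_get(info.endpoints[0].ticket, options)
print(reader.read_pandas())
```

As described on the [Arrow Flight](/dremio-cloud/developer/arrow-flight) page, this runs against the organization's default project unless a project ID is supplied.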

<div style="page-break-after: always;"></div>

# VS Code Extension | Dremio Documentation

Original URL: https://docs.dremio.com/dremio-cloud/developer/vs-code

The Dremio Visual Studio (VS) Code extension transforms VS Code into an AI-ready workspace, enabling you to discover, explore, and analyze enterprise data with natural language and SQL side by side, directly in your IDE.

## What You Can Do

The VS Code extension for Dremio allows you to:

* Connect across projects – Access one or more Dremio Cloud projects from within VS Code.
* Browse & discover with context – Explore governed objects in your catalog, complete with metadata and semantic context.
* Query with intelligence – Write and run SQL with autocomplete, formatting, and syntax highlighting—or let AI agents generate SQL for you.
* Explore and get insights using natural language – Use the built-in Microsoft Copilot integration to ask questions in plain English, moving from questions to insights faster, without leaving your development environment.

## Prerequisites

Before you begin, ensure you have:

* Access to a Dremio Cloud project.
* A personal access token (PAT) for connectivity to your project. For instructions, see [Create a PAT](/cloud/security/authentication/personal-access-token/#creating-a-pat).
* Visual Studio Code installed with access to the Extensions tab in the tool.

## Install VS Code Extension for Dremio

1. Launch VS Code and click the Extensions button on the left navigation toolbar.
2. Search for and click on the **Dremio** extension.
3. On the Dremio extension page, click **Install**.
   Once the installation is complete, you're ready to start querying Dremio from VS Code.

## Connect to Dremio from VS Code

To create a connection from VS Code:

1. From the extension for Dremio, click the + button that appears when you hover over the **Connections** heading on the left panel.
2. For **Select your Dremio deployment**, select **Dremio Cloud**.
3. From the **Select a control plane** menu, select **US Control Plane** or **European Control Plane** based on where your Dremio Cloud organization is located.
4. Click **Personal Access Token**, enter the PAT that you previously generated, and press Enter.
5. The connection to your Dremio Cloud project appears on the left under **Connections**.
6. To browse your data, click `<your_dremio_account_email>` under your connection.

## Use the Copilot Integration

With Copilot in VS Code set to Agent mode, you can interact with your data through plain-language queries powered by Dremio’s semantic layer. For example, try asking:

* "What curated views are available for financial analysis?"
* "Summarize sales trends over the last 90 days by product category."
* "Write SQL to compare revenue growth in North America vs. Europe."

Behind the scenes, Copilot taps into Dremio’s AI Semantic Layer and autonomous optimization to ensure queries run with sub-second performance — whether executed by humans or AI agents.

<div style="page-break-after: always;"></div>

# Dremio MCP Server | Dremio Documentation

Original URL: https://docs.dremio.com/dremio-cloud/developer/mcp-server

The [Dremio MCP Server](https://github.com/dremio/dremio-mcp) is an open-source project that enables AI chat clients or agents to securely interact with your Dremio deployment using natural language. Connecting to the Dremio-hosted MCP Server is the fastest path to enabling external AI chat clients to work with Dremio. The Dremio-hosted MCP Server provides OAuth support, which guarantees and propagates the user identity, authentication, and authorization for all interactions with Dremio. Once connected, you can use natural language to explore and query data, perform analysis and create visualizations, create views, and analyze system performance. While you can fork the open-source Dremio MCP Server for customization or install it locally for use with a personal AI chat client account, we recommend using the Dremio-hosted MCP Server, which is available to all projects, for experimentation, development, and production whenever possible.

## Configure Connectivity

Review the documentation below from AI chat client providers to verify that you meet the requirements for creating custom connectors before proceeding.

* [Claude Custom Connector Documentation](https://support.claude.com/en/articles/11175166-getting-started-with-custom-connectors-using-remote-mcp#h_3d1a65aded)
* [ChatGPT Custom Connector Documentation](https://help.openai.com/en/articles/11487775-connectors-in-chatgpt#h_a454f0d0b6)

To configure connectivity to your Dremio-hosted MCP Server, you first need to set up a [Native OAuth application](/dremio-cloud/security/authentication/app-authentication/oauth-apps) and provide the redirect URLs for the AI chat client you are using.

* If you are using Claude, fill in `https://claude.ai/api/mcp/auth_callback,https://claude.com/api/mcp/auth_callback,http://localhost/callback,http://localhost` as the redirect URLs for the OAuth application.
* If you are using ChatGPT, fill in `https://chatgpt.com/connector_platform_oauth_redirect,http://localhost` as the redirect URLs for the OAuth application.
* For a custom AI chat client, speak to your administrator.

Then configure the custom connector to the Dremio-hosted MCP Server by providing the client ID from the OAuth application and the MCP endpoint for your control plane.

* For Dremio instances using the US control plane, your MCP endpoint is `mcp.dremio.cloud/mcp/{project_id}`.
* For Dremio instances using the European control plane, your MCP endpoint is `mcp.eu.dremio.cloud/mcp/{project_id}`.
* If you are unsure of your endpoint, you can copy the **MCP endpoint** from the Project Overview page in Project Settings.

<div style="page-break-after: always;"></div>

# Arrow Flight | Dremio Documentation

Original URL: https://docs.dremio.com/dremio-cloud/developer/arrow-flight

You can create client applications that use [Arrow Flight](https://arrow.apache.org/docs/format/Flight.html) to query data lakes at data-transfer speeds greater than those possible with ODBC and JDBC, without incurring the cost in time and CPU resources of deserializing data. As the volumes of data that are transferred increase in size, the performance benefits from using Arrow Flight rather than ODBC or JDBC also increase.

You can run queries on datasets that are in the default project of a Dremio organization. Dremio is able to determine the organization and the default project from the authentication token that a Flight client uses. To query datasets in a non-default project, you can pass in the ID for the non-default project.

Dremio provides these endpoints for Arrow Flight connections:

* In the US control plane: `data.dremio.cloud:443`
* In the EU control plane: `data.eu.dremio.cloud:443`

All traffic within a control plane between Flight clients and Dremio goes through the endpoint for that control plane. However, Dremio can scale up or down automatically to accommodate increasing and decreasing traffic on the endpoint.

Unless you pass in a different project ID, Arrow Flight clients run queries only against datasets that are in the default project or on datasources that are associated with the default project. By default, Dremio uses the oldest project in an organization as that organization's default project.

## Supported Versions of Apache Arrow

Dremio supports client applications that use Arrow Flight in Apache Arrow version 6.0.

## Supported Authentication Method

Client applications can authenticate to Dremio with personal access tokens (PATs). To create a PAT, follow the steps in the section [Creating a Token](/dremio-cloud/security/authentication/personal-access-token#create-a-pat).

## Flight Sessions

A Flight session has a duration of 120 minutes, during which a Flight client interacts with Dremio. A Flight client initiates a new session by passing a `getFlightInfo()` request that does not include a Cookie header specifying a session ID obtained from Dremio. All requests that pass the same session ID are considered to be in the same session.

![](/images/cloud/arrow-flight-session.png)

1. The Flight client, having obtained a PAT from Dremio, sends a `getFlightInfo()` request that includes the query to run, the URI for the endpoint, and the bearer token (PAT). A single bearer token can be used for requests until it expires.
2. If Dremio is able to authenticate the Flight client by using the bearer token, it sends a response that includes FlightInfo, a Set-Cookie header with the session ID, the bearer token, and a Set-Cookie header with the ID of the default project in the organization.

   FlightInfo responses from Dremio include the single endpoint for the control plane being used and the ticket for that endpoint. There is only one endpoint listed in FlightInfo responses.

   Session IDs are generated by Dremio.
3. The client sends a `getStream()` request that includes the ticket, a Cookie header for the session ID, the bearer token, and a Cookie header for the ID of the default project.
4. Dremio returns the query results in one flight.
5. The Flight client sends another `getFlightInfo()` request using the same session ID and bearer token. If this second request did not include the session ID that Dremio sent in response to the first request, then Dremio would send a new session ID and a new session would begin.

### Use a Non-Default Project

To run queries on datasets and data sources in non-default projects in Dremio, the `project_id` of the projects must be passed as a session option. The `project_id` is stored in the user session, and the server responds with a `Set-Cookie` header containing the session ID. The client must include this cookie in all subsequent requests.

To enable this behavior, a cookie middleware must be added to the Flight client. This middleware is responsible for managing cookies and will add the previous session ID to all subsequent requests.

After adding the middleware when initializing the client object, the `project_id` can be passed as a session option.

Here are examples of how to implement the `project_id` in Java and Go:

Pass in the ID for a non-default project in [Java](https://arrow.apache.org/docs/java/)

```
// Create a ClientCookieMiddleware
final FlightClient.Builder flightClientBuilder = FlightClient.builder();
final ClientCookieMiddleware.Factory cookieFactory = new ClientCookieMiddleware.Factory();
flightClientBuilder.intercept(cookieFactory);

// Add the project ID to the session options
final SetSessionOptionsRequest setSessionOptionRequest =
    new SetSessionOptionsRequest(ImmutableMap.<String, SessionOptionValue>builder()
        .put("project_id", SessionOptionValueFactory.makeSessionOptionValue(yourprojectid))
        .build());

// Send setSessionOptionRequest through the client before running queries in the session

// Close your session later once query is done
client.closeSession(new CloseSessionRequest(), bearerToken, headerCallOption);
```

Pass in the ID for a non-default project in [Go](https://github.com/apache/arrow-go)

```
// Create a ClientCookieMiddleware
client, err := flight.NewClientWithMiddleware(
    net.JoinHostPort(config.Host, config.Port),
    nil,
    []flight.ClientMiddleware{flight.NewClientCookieMiddleware()},
    grpc.WithTransportCredentials(creds),
)
// Close the session once the query is done
defer client.CloseSession(ctx, &flight.CloseSessionRequest{})
// Add the project ID to the session options
projectIdSessionOption, err := flight.NewSessionOptionValue(projectID)
sessionOptionsRequest := flight.SetSessionOptionsRequest{
    SessionOptions: map[string]*flight.SessionOptionValue{
        "project_id": &projectIdSessionOption,
    },
}
response, err := client.SetSessionOptions(ctx, &sessionOptionsRequest)
```

note

In Dremio, the term catalog is sometimes used interchangeably with `project_id`. Therefore, using catalog instead of `project_id` will also work when selecting a non-default project. We recommend using `project_id` for clarity. Throughout this documentation, we will consistently use `project_id`.

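In Python, `pyarrow.flight` exposes the same middleware hooks, so the cookie-replay part of this pattern can be sketched by hand. The following is a minimal illustration rather than Dremio's sample code; it assumes pyarrow's client middleware API, and note that pyarrow has no built-in session-options helper, so only the cookie handling is shown:

```
import pyarrow.flight as flight

class CookieMiddlewareFactory(flight.ClientMiddlewareFactory):
    """Shares one cookie jar across all calls made by a client."""
    def __init__(self):
        self.cookies = {}

    def start_call(self, info):
        return CookieMiddleware(self)

class CookieMiddleware(flight.ClientMiddleware):
    def __init__(self, factory):
        self.factory = factory

    def received_headers(self, headers):
        # Capture session cookies from Set-Cookie response headers
        for raw in headers.get("set-cookie", []):
            name, _, value = raw.split(";", 1)[0].partition("=")
            self.factory.cookies[name.strip()] = value.strip()

    def sending_headers(self):
        # Replay captured cookies on every subsequent request
        if not self.factory.cookies:
            return {}
        jar = "; ".join(f"{k}={v}" for k, v in self.factory.cookies.items())
        return {"cookie": jar}

# Attach the middleware when creating the client
client = flight.FlightClient(
    "grpc+tls://data.dremio.cloud:443",
    middleware=[CookieMiddlewareFactory()],
)
```
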
## Manage Workloads

Dremio administrators can use the Arrow Flight server endpoint to manage query workloads by adding the following connection properties to Flight clients:

| Flight Client Property | Description |
| --- | --- |
| `ENGINE` | Name of the engine to use to process all queries issued during the current session. |
| `SCHEMA` | The name of the schema (datasource or folder, including child paths, such as `mySource.folder1` and `folder1.folder2`) to use by default when a schema is not specified in a query. |

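As an illustration of how a client might supply these properties, the sample clients accept them through options such as `engine` and `session_properties`. The sketch below forwards them as gRPC headers on each call; the lowercase header names are an assumption to verify against the sample client implementation, not a documented contract:

```
import pyarrow.flight as flight

PAT = "my_PAT"  # assumption: a valid personal access token

# Assumption: session properties are sent as lowercase gRPC headers,
# mirroring the sample clients' `engine` and `session_properties` options.
options = flight.FlightCallOptions(headers=[
    (b"authorization", f"Bearer {PAT}".encode()),
    (b"engine", b"my_engine"),                     # ENGINE property
    (b"schema", b'Samples."samples.dremio.com"'),  # SCHEMA property
])

client = flight.FlightClient("grpc+tls://data.dremio.cloud:443")
descriptor = flight.FlightDescriptor.for_command("SELECT 1")
info = client.get_flight_info(descriptor, options)
```
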
## Sample Arrow Flight Client Applications

Dremio provides sample Arrow Flight client applications in several languages at [Dremio Hub](https://github.com/dremio-hub/arrow-flight-client-examples).

The sample clients use the hostname `localhost` and the port number `32010` by default. Make sure you override these defaults with the hostname `data.dremio.cloud` or `data.eu.dremio.cloud` and the port number `443`.

note

The Python sample application only supports connecting to the default project in Dremio.

<div style="page-break-after: always;"></div>

# Data Formats | Dremio Documentation

Original URL: https://docs.dremio.com/dremio-cloud/developer/data-formats/

Dremio supports the following data formats:

* File Formats

  + Delimited text files, such as comma-separated values
  + JSON
  + ORC
  + [Parquet](/dremio-cloud/developer/data-formats/parquet)
* Table Formats

  + [Apache Iceberg](/dremio-cloud/developer/data-formats/iceberg)
  + [Delta Lake](/dremio-cloud/developer/data-formats/delta-lake)

<div style="page-break-after: always;"></div>

# Arrow Flight SQL | Dremio Documentation

Original URL: https://docs.dremio.com/dremio-cloud/developer/arrow-flight-sql

You can use Apache Arrow Flight SQL to develop client applications that interact with Dremio. Apache Arrow Flight SQL is an API developed by the Apache Arrow community for interacting with SQL databases. For more information about Apache Arrow Flight SQL, see the documentation for the [Apache Arrow project](https://arrow.apache.org/docs/format/FlightSql.html#).

Through Flight SQL, client applications can run queries, create prepared statements, and fetch metadata about the SQL dialect supported by a datasource in Dremio, the available types, the defined tables, and more.

The requests for running queries are:

* CommandExecute
* CommandStatementUpdate

The commands on prepared statements are:

* ActionClosePreparedStatementRequest: Closes a prepared statement.
* ActionCreatePreparedStatementRequest: Creates a prepared statement.
* CommandPreparedStatementQuery: Runs a prepared statement.
* CommandPreparedStatementUpdate: Runs a prepared statement that updates data.

The metadata requests that Dremio supports are:

* CommandGetDbSchemas: Lists the schemas that are in a catalog.
* CommandGetTables: Lists the tables that are in a catalog or schema.
* CommandGetTableTypes: Lists the table types that are supported in a catalog or schema. The types are Table, View, and System Table.
* CommandGetSqlInfo: Retrieves information about the datasource and the SQL dialect that it supports.

Two clients are already implemented and available in the Apache Arrow repository on GitHub for you to make use of:

* [Client in C++](https://github.com/apache/arrow/blob/dfca6a704ad7e8e87e1c8c3d0224ba13b25786ea/cpp/src/arrow/flight/sql/client.h)
* [Client in Java](https://github.com/apache/arrow/blob/dfca6a704ad7e8e87e1c8c3d0224ba13b25786ea/java/flight/flight-sql/src/main/java/org/apache/arrow/flight/sql/FlightSqlClient.java)

note

At this time, you can only connect to the default project in Dremio.

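Beyond the C++ and Java clients, Python applications can reach a Flight SQL endpoint through the Arrow ADBC Flight SQL driver. The following is a minimal sketch, not taken from the Dremio docs; it assumes the `adbc-driver-flightsql` package, its `adbc.flight.sql.authorization_header` option, and a valid PAT:

```
import adbc_driver_flightsql.dbapi as flightsql

PAT = "my_PAT"  # assumption: a valid personal access token

# Connect to Dremio's US control plane Flight endpoint over TLS
conn = flightsql.connect(
    "grpc+tls://data.dremio.cloud:443",
    db_kwargs={"adbc.flight.sql.authorization_header": f"Bearer {PAT}"},
)

# DB-API cursor backed by Flight SQL; results come back as Arrow data
cur = conn.cursor()
cur.execute('SELECT * FROM Samples."samples.dremio.com"."NYC-taxi-trips" LIMIT 10')
print(cur.fetch_arrow_table())

cur.close()
conn.close()
```
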
## Use the Sample Client

You can download and try out the sample client from <https://github.com/dremio-hub/arrow-flight-sql-clients>. Extract the content of the file and then, in a terminal window, change to the `flight-sql-client-example` directory.

Before running the sample client, ensure that you have met these prerequisites:

* Add the Samples data lake to your Dremio project by clicking the ![This is the Add Source icon.](/images/icons/plus.png "This is the Add Source icon.") icon in the **Data Lakes** section of the Datasets page.
* Ensure that Java 8 or later (up to Java 15) is installed on the system on which you run the example commands.

### Command Syntax for the Sample Client

Use this syntax when sending commands to the sample client:

Sample client usage

```
Usage: java -jar flight-sql-sample-client-application.jar -host data.dremio.cloud -port 443 ...

-command,--command <arg>                 Method to run
-dsv,--disableServerVerification <arg>   Disable TLS server verification.
                                         Defaults to false.
-host,--hostname <arg>                   `data.dremio.cloud` for Dremio's US control plane
                                         `data.eu.dremio.cloud` for Dremio's European control plane
-kstpass,--keyStorePassword <arg>        The jks keystore password.
-kstpath,--keyStorePath <arg>            Path to the jks keystore.
-pat,--personalAccessToken <arg>         Personal access token
-port,--flightport <arg>                 443
-query,--query <arg>                     The query to run
-schema,--schema <arg>                   The schema to use
-sp,--sessionProperty <arg>              Key value pairs of SessionProperty, example:
                                         -sp schema='Samples."samples.dremio.com"' -sp key=value
-table,--table <arg>                     The table to query
-tls,--tls <arg>                         Enable encrypted connection.
                                         Defaults to true.
```

### Examples

The examples demonstrate what is returned for each of these requests:

* CommandGetDbSchemas
* CommandGetTables
* CommandGetTableTypes
* CommandExecute

note

These examples use the Flight endpoint for Dremio's US control plane: `data.dremio.cloud`. To use Dremio's European control plane, use this endpoint instead: `data.eu.dremio.cloud`.

#### Flight SQL Request: CommandGetDbSchemas

This command submits a `CommandGetDbSchemas` request to list the schemas in a catalog.

Example CommandGetDbSchemas request

```
java -jar flight-sql-sample-client-application.jar -tls true -host data.dremio.cloud -port 443 --pat '<personal-access-token>' -command GetSchemas
```

Example output for CommandGetDbSchemas request

```
catalog_name db_schema_name
null @myUserName
null INFORMATION_SCHEMA
null Samples
null sys
```

#### Flight SQL Request: CommandGetTables

This command submits a `CommandGetTables` request to list the tables that are in a catalog or schema.

Example CommandGetTables request

```
java -jar flight-sql-sample-client-application.jar -tls true -host data.dremio.cloud -port 443 --pat '<personal-access-token>' -command GetTables -schema INFORMATION_SCHEMA
```

If you have a folder in your schema, you can escape it like this:

Example CommandGetTables request with folder in schema

```
java -jar flight-sql-sample-client-application.jar -tls true -host data.dremio.cloud -port 443 --pat '<personal-access-token>' -command GetTables -schema "Samples\ (1).samples.dremio.com"
```

Example output for CommandGetTables request

```
catalog_name db_schema_name table_name table_type
null INFORMATION_SCHEMA CATALOGS SYSTEM_TABLE
null INFORMATION_SCHEMA COLUMNS SYSTEM_TABLE
null INFORMATION_SCHEMA SCHEMATA SYSTEM_TABLE
null INFORMATION_SCHEMA TABLES SYSTEM_TABLE
null INFORMATION_SCHEMA VIEWS SYSTEM_TABLE
```

#### Flight SQL Request: CommandGetTableTypes

This command submits a `CommandGetTableTypes` request to list the supported table types.

Example CommandGetTableTypes request

```
java -jar flight-sql-sample-client-application.jar -tls true -host data.dremio.cloud -port 443 --pat '<personal-access-token>' -command GetTableTypes
```

Example output for CommandGetTableTypes request

```
table_type
TABLE
SYSTEM_TABLE
VIEW
```

#### Flight SQL Request: CommandExecute

This command submits a `CommandExecute` request to run a single SQL statement.

Example CommandExecute request

```
java -jar flight-sql-sample-client-application.jar -tls true -host data.dremio.cloud -port 443 --pat '<personal-access-token>' -command Execute -query 'SELECT * FROM Samples."samples.<Dremio-user-name>.com"."NYC-taxi-trips" limit 10'
```

Example output for CommandExecute request

```
pickup_datetime passenger_count trip_distance_mi fare_amount tip_amount total_amount
2013-05-27T19:15 1 1.26 7.5 0.0 8.0
2013-05-31T16:40 1 0.73 5.0 1.2 7.7
2013-05-27T19:03 2 9.23 27.5 5.0 38.33
2013-05-31T16:24 1 2.27 12.0 0.0 13.5
2013-05-27T19:17 1 0.71 5.0 0.0 5.5
2013-05-27T19:11 1 2.52 10.5 3.15 14.15
2013-05-31T16:41 5 1.01 6.0 1.1 8.6
2013-05-31T16:37 1 1.25 8.5 0.0 10.0
2013-05-31T16:39 1 2.04 10.0 1.5 13.0
2013-05-27T19:02 1 11.73 32.5 8.12 41.12
```

## Code Samples

### Create a FlightSqlClient

Refer to [this code sample](https://github.com/dremio-hub/arrow-flight-client-examples/blob/main/java/src/main/java/com/adhoc/flight/client/AdhocFlightClient.java) to create a `FlightClient`. Then, wrap your `FlightClient` in a `FlightSqlClient`:

Wrap FlightClient in FlightSqlClient

```
// Wraps a FlightClient in a FlightSqlClient
FlightSqlClient flightSqlClient = new FlightSqlClient(flightClient);

// Be sure to close the FlightSqlClient after using it
flightSqlClient.close();
```

### Retrieve a List of Database Schemas

This code issues a CommandGetDbSchemas metadata request:

CommandGetDbSchemas metadata request

```
String catalog = null; // The catalog. (may be null)
String dbSchemaFilterPattern = null; // The schema filter pattern. (may be null)
FlightInfo flightInfo = flightSqlClient.getSchemas(catalog, dbSchemaFilterPattern);
```

### Retrieve a List of Tables

This code issues a CommandGetTables metadata request:

CommandGetTables metadata request

```
String catalog = null; // The catalog. (may be null)
String dbSchemaFilterPattern = "Samples\\ (1).samples.dremio.com"; // The schema filter pattern. (may be null)
String tableFilterPattern = null; // The table filter pattern. (may be null)
List<String> tableTypes = null; // The table types to include. (may be null)
boolean includeSchema = false; // True to include the schema upon return, false to not include the schema.
FlightInfo flightInfo = flightSqlClient.getTables(catalog, dbSchemaFilterPattern, tableFilterPattern, tableTypes, includeSchema);
```

### Retrieve a List of Table Types That a Database Supports

This code issues a CommandGetTableTypes metadata request:

CommandGetTableTypes metadata request

```
FlightInfo flightInfo = flightSqlClient.getTableTypes();
```

### Run a Query

This code issues a CommandExecute request:

CommandExecute request

```
FlightInfo flightInfo = flightSqlClient.execute("SELECT * FROM Samples.\"samples.myUserName.com\".\"NYC-taxi-trips\" limit 10");
```

### Consume Data Returned for a Query

Consume data returned for query

```
FlightInfo flightInfo; // Use a FlightSqlClient method to get a FlightInfo

// 1. Fetch each partition sequentially (though this can be done in parallel)
for (FlightEndpoint endpoint : flightInfo.getEndpoints()) {

    // 2. Get a stream of results as Arrow vectors
    try (FlightStream stream = flightSqlClient.getStream(endpoint.getTicket())) {

        // 3. Iterate through the stream until the end
        while (stream.next()) {

            // 4. Get a chunk of results (VectorSchemaRoot) and print it to the console
            VectorSchemaRoot vectorSchemaRoot = stream.getRoot();
            System.out.println(vectorSchemaRoot.contentToTSVString());
        }
    }
}
```

## Client Interactions with Dremio

This diagram shows an example of how an Arrow Flight SQL client initiates a Flight session and runs a query. It also shows what messages pass between the proxy at the Arrow Flight SQL endpoint, the control plane, and the execution plane.

![](/images/cloud/arrow-flight-sql-session.png)

1. The Flight client, having obtained a PAT from Dremio, calls the `execute()` method, which then sends a `getFlightInfo()` request. This request includes the query to run, the URI for the endpoint, and the bearer token (PAT). A single bearer token can be used for requests until it expires.

   A `getFlightInfo()` request initiates a new Flight session, which has a duration of 120 minutes. A Flight session is identified by its ID. Session IDs are generated by the proxy at the Arrow Flight SQL endpoint. All requests that pass the same session ID are considered to be in the same Flight session.
2. The bearer token includes the user ID and the organization ID. From those two pieces of information, the proxy at the endpoint determines the project ID, and then passes the organization ID, project ID, and user ID in the `getFlightInfo()` request that it forwards to the control plane.
3. If the control plane is able to authenticate the Flight client by using the bearer token, it sends a response that includes FlightInfo to the proxy.

   FlightInfo responses include the single endpoint for the control plane being used and the ticket for that endpoint. There is only one endpoint listed in FlightInfo responses.
4. The proxy at the endpoint adds the session ID and the project ID, and passes the response to the client.
5. The client sends a `getStream()` request that includes the ticket, a Cookie header for the session ID, the bearer token, and a Cookie header for the ID of the default project.
6. The proxy adds the organization ID and passes the `getStream()` request to the control plane.
7. The control plane devises the query plan and sends it to the execution plane.
8. The execution plane runs the query and sends the results to the control plane in one flight.
9. The control plane passes the results to the proxy.
10. The proxy passes the results to the client.

<div style="page-break-after: always;"></div>

# Apache Iceberg | Dremio Documentation

Original URL: https://docs.dremio.com/dremio-cloud/developer/data-formats/iceberg

[Apache Iceberg](https://iceberg.apache.org/docs/latest/) enables Dremio to provide powerful, SQL database-like functionality on data lakes using industry-standard SQL commands. Dremio currently supports [Iceberg v2](https://iceberg.apache.org/spec/#version-2) tables, offering a solid foundation for building and managing data lakehouse tables. Certain features, such as Iceberg native branching and tagging, and the UUID data type, are not yet supported.

For a deeper dive into Apache Iceberg, see:

* [Apache Iceberg: An Architectural Look Under the Covers](https://www.dremio.com/apache-iceberg-an-architectural-look-under-the-covers/)
* [What is Apache Iceberg?](https://www.dremio.com/data-lake/apache-iceberg/)

### Benefits of Iceberg Tables

Iceberg tables offer the following benefits over other formats traditionally used in the data lake:

* **[Schema evolution](https://iceberg.apache.org/docs/latest/evolution/):** Supports add, drop, update, or rename column commands with no side effects or inconsistency.
* **[Partition evolution](https://iceberg.apache.org/docs/latest/evolution/#partition-evolution):** Facilitates the modification of partition layouts in a table as data volumes or query patterns change, without needing to rewrite the entire table.
* **Transactional consistency:** Helps users avoid partial or uncommitted changes by tracking atomic transactions with atomicity, consistency, isolation, and durability (ACID) properties.
* **Increased performance:** Ensures data files are intelligently filtered for accelerated processing via advanced partition pruning and column-level statistics.
* **Time travel:** Allows users to query any previous versions of the table to examine and compare data or reproduce results using previous queries.
* **[Automatic optimization](/dremio-cloud/manage-govern/optimization):** Optimizes query performance to maximize the speed and efficiency with which data is retrieved.
* **Version rollback:** Corrects any discovered problems quickly by resetting tables to a known good state.

## Clustering

Clustered Iceberg tables in Dremio make use of Z-ordering to provide a more intuitive data layout with comparable or better performance characteristics than Iceberg partitioning.

Iceberg clustering sorts individual records in data files based on the clustered columns provided in the [`CREATE TABLE`](/dremio-cloud/sql/commands/create-table) or [`ALTER TABLE`](/dremio-cloud/sql/commands/alter-table/) statement. Clustering data at the data-file level allows Parquet metadata to be used in query planning and execution to reduce the amount of data scanned as part of the query. In addition, clustering eliminates common problems with partitioned data, such as over-partitioned tables and partition skew.

Clustering provides a general-purpose file layout that enables both efficient reads and writes. However, you may not see immediate benefits from clustering if the tables are too small.

A common pattern is to choose clustered columns that are either primary keys of the table or commonly used in query filters. These column choices effectively filter the working dataset, thereby improving query times. Order clustered columns by filtering precedence and cardinality, with the most commonly queried, highest-cardinality columns first.

### Supported Data Types for Clustered Columns

Dremio Iceberg clustering supports clustered columns of the following data types:

* `DECIMAL`
* `INT`
* `BIGINT`
* `FLOAT`
* `DOUBLE`
* `VARCHAR`
* `VARBINARY`
* `DATE`
* `TIME`
* `TIMESTAMP`

Automated table maintenance eliminates the need to run optimizations for clustered Iceberg tables manually. If you do optimize manually, the behavior differs based on whether or not tables are clustered.

For clustered tables, [`OPTIMIZE TABLE`](/dremio-cloud/sql/commands/optimize-table) incrementally reorders data to achieve the optimal data layout and manages file sizes. This mechanism may take longer to run on newly loaded or unsorted tables. Additionally, you may need to run multiple `OPTIMIZE TABLE` SQL commands to converge on an optimal file layout.

For unclustered tables, `OPTIMIZE TABLE` combines small files or splits large files to achieve an optimal file size, reducing metadata overhead and runtime file open costs.

### CTAS Behavior and Clustering

When running a [`CREATE TABLE AS`](/dremio-cloud/sql/commands/create-table-as) statement with clustering, the data is written in an unordered way. For the best performance, you should run an `OPTIMIZE TABLE` SQL command after creating a table using a [`CREATE TABLE AS`](/dremio-cloud/sql/commands/create-table-as) statement.

## Iceberg Table Management

Learn how to manage Iceberg tables in Dremio with supported Iceberg features such as expiring snapshots and optimizing tables.

### Vacuum

Each write to an Iceberg table creates a snapshot of that table, which is a timestamped version of the table. As snapshots accumulate, data files that are no longer referenced in recent snapshots take up more and more storage. Additionally, the more snapshots a table has, the larger its metadata becomes. You can expire older snapshots to delete the data files that are unique to them and to remove them from table metadata. It is recommended that you expire snapshots regularly. For the SQL command to expire snapshots, see [`VACUUM TABLE`](/dremio-cloud/sql/commands/vacuum-table/).

Sometimes failed SQL commands may leave orphan data files in the table location that are no longer referenced by any active snapshot of the table. You can remove orphan files in the table location by running `remove_orphan_files`. See [`VACUUM TABLE`](/dremio-cloud/sql/commands/vacuum-table/) for details.

### Optimization

Dremio provides [automatic optimization](/dremio-cloud/manage-govern/optimization/), which automatically maintains Iceberg tables in the Open Catalog using a dedicated engine configured by Dremio. However, for immediate optimization, you can use the [`OPTIMIZE TABLE`](/dremio-cloud/sql/commands/optimize-table) SQL command and route jobs to specific engines in your project by creating a routing rule with the `query_label()` condition and the `OPTIMIZATION` label. For more information, see [Workload Management](/dremio-cloud/admin/engines/workload-management).

When optimizing tables manually, you can use:

* [`FOR PARTITIONS`](/dremio-cloud/sql/commands/optimize-table/) to optimize selected partitions.
* [`MIN_INPUT_FILES`](/dremio-cloud/sql/commands/optimize-table/) to set the minimum number of qualified files needed for compaction. Delete files count toward determining whether the minimum threshold is reached.

## Iceberg Catalogs in Dremio

The Apache Iceberg table format uses an Iceberg catalog service to track snapshots and ensure transactional consistency between tools. For more information about how Iceberg catalogs and tables work together, see [Iceberg Catalog](https://www.dremio.com/resources/guides/apache-iceberg-an-architectural-look-under-the-covers/#toc_item_Iceberg%20catalog).

note

Currently, Dremio does not support the Amazon DynamoDB or JDBC catalogs. For additional information on limitations of Apache Iceberg as implemented in Dremio, see Limitations.

The catalog is the source of truth for the current metadata pointer for a table. You can use [Dremio's Open Catalog](/dremio-cloud/developer/data-formats/iceberg/#iceberg-catalogs-in-dremio) as a catalog for all your tables. You can also add external Iceberg catalogs as a source in Dremio, which allows you to work with Iceberg tables that are not cataloged in Dremio's Open Catalog. The Iceberg catalogs that can be added as a source are:

* AWS Glue Data Catalog
* Iceberg REST Catalog
* Snowflake Open Catalog
* Unity Catalog

Once a table is created with a specific catalog, you must continue using that same catalog to access the table. For example, if you create a table using AWS Glue as the catalog, you cannot later access that table by adding its S3 location as a source in Dremio. You must add the AWS Glue Data Catalog as a source and access the table through it.

## Rollbacks

When you modify an Iceberg table using data definition language (DDL) or data manipulation language (DML), each change creates a new [snapshot](https://iceberg.apache.org/terms/#snapshot) in the table's metadata. The Iceberg [catalog](/dremio-cloud/developer/data-formats/iceberg/#iceberg-catalogs-in-dremio) tracks the current snapshot through a root pointer.
You can use the [`ROLLBACK TABLE`](/dremio-cloud/sql/commands/rollback-table) SQL command to roll back a table by redirecting this pointer to an earlier snapshot, which is useful for undoing recent data errors. Rollbacks can target a specific timestamp or snapshot ID.
When you perform a rollback, Dremio creates a new snapshot identical to the selected one. For example, if a table has snapshots (1) `first_snapshot`, (2) `second_snapshot`, and (3) `third_snapshot`, rolling back to `first_snapshot` restores the table to that state while preserving all snapshots for time travel queries.

## SQL Command Compatibility

Dremio supports running most combinations of concurrent SQL commands on Iceberg tables. To take a few examples, two [`INSERT`](/dremio-cloud/sql/commands/insert) commands can run concurrently on the same table, as can two [`SELECT`](/dremio-cloud/sql/commands/SELECT) commands, or an [`UPDATE`](/dremio-cloud/sql/commands/update) and an [`ALTER`](/dremio-cloud/sql/commands/alter-table) command.

However, Apache Iceberg’s serializable isolation level with non-locking table semantics can result in scenarios in which write collisions occur. In these circumstances, the SQL command that finishes second fails with an error. Such failures occur only for a subset of combinations of two SQL commands running concurrently on a single Iceberg table.

This table shows which types of SQL commands can and cannot run concurrently with other types on a single Iceberg table:

* Y: Running these two types of commands concurrently is supported.
* N: Running these two types of commands concurrently is not supported. The second command to complete fails with an error.
* D: Running two [`OPTIMIZE`](/cloud/reference/sql/commands/optimize-table) commands concurrently is supported if they run against different table partitions.

![SQL commands that cause concurrency conflicts](/images/concurrency-table.png "SQL commands that cause concurrency conflicts")

## Table Properties

The following Apache Iceberg table properties are supported in Dremio. You can use these properties to configure aspects of Apache Iceberg tables:

| Property | Description | Default |
| --- | --- | --- |
| commit.manifest.target-size-bytes | The target size when merging manifest files. | `8 MB` |
| commit.status-check.num-retries | The number of times to check whether a commit succeeded after a connection is lost before failing due to an unknown commit state. | `3` |
| compatibility.snapshot-id-inheritance.enabled | Enables committing snapshots without explicit snapshot IDs. | `false` (always `true` if the format version is > 1) |
| format-version | The table’s format version as defined in the spec. Options: `1` or `2` | `2` |
| history.expire.max-snapshot-age-ms | The maximum age (in milliseconds) of snapshots to keep as expiring snapshots. | `432000000` (5 days) |
| history.expire.min-snapshots-to-keep | The default minimum number of snapshots to keep as expiring snapshots. | `1` |
| write.delete.mode | The table’s method for handling row-level deletes. See [Row-Level Changes on the Lakehouse: Copy-On-Write vs. Merge-On-Read in Apache Iceberg](https://www.dremio.com/blog/row-level-changes-on-the-lakehouse-copy-on-write-vs-merge-on-read-in-apache-iceberg/) for more information on which mode is best for your table’s DML operations. Options: `copy-on-write` or `merge-on-read` | `copy-on-write` |
| write.merge.mode | The table’s method for handling row-level merges. See [Row-Level Changes on the Lakehouse: Copy-On-Write vs. Merge-On-Read in Apache Iceberg](https://www.dremio.com/blog/row-level-changes-on-the-lakehouse-copy-on-write-vs-merge-on-read-in-apache-iceberg/) for more information on which mode is best for your table’s DML operations. Options: `copy-on-write` or `merge-on-read` | `copy-on-write` |
| write.metadata.compression-codec | The metadata compression codec. Options: `none` or `gzip` | `none` |
| write.metadata.delete-after-commit.enabled | Controls whether to delete the oldest tracked version metadata files after commit. | `false` |
| write.metadata.metrics.column.col1 | Metrics mode for column `col1` to allow per-column tuning. Options: `none`, `counts`, `truncate(length)`, or `full` | (not set) |
| write.metadata.metrics.default | Default metrics mode for all columns in the table. Options: `none`, `counts`, `truncate(length)`, or `full` | `truncate(16)` |
| write.metadata.metrics.max-inferred-column-defaults | Defines the maximum number of top-level columns for which metrics are collected. The number of stored metrics can be higher than this limit for a table with nested fields. | `100` |
| write.metadata.previous-versions-max | The maximum number of previous version metadata files to keep before deleting after commit. | `100` |
| write.parquet.compression-codec | The Parquet compression codec. Options: `zstd`, `gzip`, `snappy`, or `uncompressed` | `zstd` |
| write.parquet.compression-level | The Parquet compression level. Supported for `gzip` and `zstd`. | `null` |
| write.parquet.dict-size-bytes | The Parquet dictionary page size (in bytes). | `2097152` (2 MB) |
| write.parquet.page-row-limit | The Parquet page row limit. | `20000` |
| write.parquet.page-size-bytes | The Parquet page size (in bytes). | `1048576` (1 MB) |
| write.parquet.row-group-size-bytes | Parquet row group size. Dremio uses this property as a target file size since it writes one row group per Parquet file. Ignores the `store.parquet.block-size` and `dremio.iceberg.optimize.target_file_size_mb` support keys. | `134217728` (128 MB) |
980
+ | write.summary.partition-limit | Includes partition-level summary stats in snapshot summaries if the changed partition count is less than this limit. | `0` |
981
+ | write.update.mode | The table’s method for handling row-level updates. See [Row-Level Changes on the Lakehouse: Copy-On-Write vs. Merge-On-Read in Apache Iceberg](https://www.dremio.com/blog/row-level-changes-on-the-lakehouse-copy-on-write-vs-merge-on-read-in-apache-iceberg/) for more information on which mode is best for your table’s DML operations. Options: `copy-on-write` or `merge-on-read` | `copy-on-write` |
982
+
983
+ You can configure these properties when you [create](/dremio-cloud/sql/commands/create-table) or [alter](/dremio-cloud/sql/commands/alter-table) Iceberg tables.
984
+
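+ A minimal sketch, assuming a hypothetical table named `demo.sales.orders`:
+
+ Example commands to set table properties
+
+ ```
+ CREATE TABLE demo.sales.orders (id INT, amount DOUBLE)
+   TBLPROPERTIES ('write.delete.mode' = 'merge-on-read')
+
+ ALTER TABLE demo.sales.orders
+   SET TBLPROPERTIES ('write.metadata.compression-codec' = 'gzip')
+ ```
+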
985
+ Dremio uses the Iceberg default value for table properties that are not set. See Iceberg's documentation for the full list of [table properties](https://iceberg.apache.org/docs/latest/configuration/#table-properties). To view the properties that are set for a table, use the SQL command [`SHOW TBLPROPERTIES`](/dremio-cloud/sql/commands/show-table-properties).
986
+
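+ For example, to list the properties set on the hypothetical table above:
+
+ Example command to view table properties
+
+ ```
+ SHOW TBLPROPERTIES demo.sales.orders
+ ```
+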
987
+ In cases where Dremio has a support key for a feature covered by a table property, Dremio uses the table property instead of the support key.
988
+
989
+ ## Limitations
990
+
991
+ The following are limitations with Apache Iceberg as implemented in Dremio:
992
+
993
+ * Only the Parquet file format is currently supported; other formats, such as ORC and Avro, are not.
994
+ * Amazon DynamoDB and JDBC catalogs are currently not supported.
995
+ * DynamoDB cannot be used as a lock manager with the Hadoop catalog on Amazon S3.
996
+ * Dremio caches query plans for recently executed statements to improve query performance. However, running a rollback query using a snapshot ID invalidates all cached query plans that reference the affected table.
997
+ * If DML operations are running on a table when a rollback query using a snapshot ID executes, those operations can fail to complete because the rollback changes the current snapshot ID. However, `SELECT` queries that are already executing run to completion.
998
+ * Clustering keys must be columns in the table. Transformations are not supported.
999
+ * You can run only one optimize query at a time on the selected Iceberg table partition.
1000
+ * The optimize functionality does not support sort ordering.
1001
+
1002
+ ## Related Topics
1003
+
1004
+ * [Automatic Optimization](/dremio-cloud/manage-govern/optimization/) – Learn how Dremio optimizes Iceberg tables automatically.
1005
+ * [Load Data Into Tables](/dremio-cloud/bring-data/load/) - Load data from CSV, JSON, or Parquet files into existing Iceberg tables.
1006
+ * [SQL Commands](/dremio-cloud/sql/commands/) – See the syntax of the SQL commands that Dremio supports for Iceberg tables.
1007
+
1021
+
1022
+ <div style="page-break-after: always;"></div>
1023
+
1024
+ # Parquet | Dremio Documentation
1025
+
1026
+ Original URL: https://docs.dremio.com/dremio-cloud/developer/data-formats/parquet
1027
+
1028
+ On this page
1029
+
1030
+ This topic provides general information and recommendations for Parquet files.
1031
+
1032
+ ## Read Parquet Files
1033
+
1034
+ Dremio's vectorized Parquet file reader improves parallelism on columnar data, reduces latencies, and enables more efficient resource and memory usage.
1035
+
1036
+ Dremio supports off-heap memory buffers for reading Parquet files.
1037
+
1038
+ Dremio supports file compression with `snappy`, `gzip`, and `zstd` for reading Parquet files.
1039
+
1040
+ ## Parquet Limitations
1041
+
1042
+ Consider the following limitations when generating and configuring Parquet files. Exceeding these limits causes errors when you use the files with Dremio.
1043
+
1044
+ * **Nested levels are limited to 16.** Multiple structs may be defined, up to a total nesting depth of 16. Exceeding this limit results in a failed query.
1045
+ * **Arrays are limited to 128 elements.** An array with more than 128 elements results in a failed query.
1046
+ * **The footer is limited to 16 MB.** The footer consists of metadata, including the format version, the schema, extra key-value pairs, and per-column metadata. A footer that exceeds this size results in a failed query.
1047
+
1048
+ ## Recommended Configuration
1049
+
1050
+ When using other tools to generate Parquet files for consumption in Dremio, we recommend the following configuration:
1051
+
1052
+ | Type | Implementation |
1053
+ | --- | --- |
1054
+ | Row Groups | Use a single row group per file, with a target of 1 MB to 25 MB column stripes for most datasets. By default, Dremio uses 256 MB row groups for the Parquet files that it generates. |
1055
+ | Pages | Use Snappy compression and a target page size of approximately 100 KB. |
1056
+ | Statistics | Use a recent Parquet library to avoid bad statistics issues. |
1057
+
1063
+
1064
+ <div style="page-break-after: always;"></div>
1065
+
1066
+ # Delta Lake | Dremio Documentation
1067
+
1068
+ Original URL: https://docs.dremio.com/dremio-cloud/developer/data-formats/delta-lake
1069
+
1072
+ [Delta Lake](https://docs.delta.io/latest/index.html) is an open-source table format that provides transactional consistency and increased scale for datasets by maintaining a consistent definition of each dataset, including schema evolution and data mutations. With Delta Lake, all applications that consume a dataset see updates in a consistent manner, and users never see inconsistent views of data during transformations. Consistent and reliable views of datasets in a data lake are maintained even as the datasets are updated and modified over time.
1073
+
1074
+ Data consistency for a dataset is enabled through the creation of a series of manifest files which define the schema and data for a given point in time, as well as a transaction log that defines an ordered record of every transaction on the dataset. By reading the transaction log and manifest files, applications are guaranteed to see a consistent view of data at any point in time, and users can ensure intermediate changes are invisible until a write operation is complete.
1075
+
1076
+ Delta Lake provides the following benefits:
1077
+
1078
+ * Large-scale support: Efficient metadata handling enables applications to readily process petabyte-sized datasets with millions of files.
1079
+ * Schema consistency: All applications processing a dataset operate on a consistent, shared definition of the dataset metadata, such as columns, data types, and partitions.
1080
+
1081
+ ## Supported Data Sources
1082
+
1083
+ The Delta Lake table format is supported with the following sources in the Parquet file format:
1084
+
1085
+ * [Amazon S3](/dremio-cloud/bring-data/connect/object-storage/amazon-s3)
1086
+ * [AWS Glue Data Catalog](/dremio-cloud/bring-data/connect/catalogs/aws-glue-data-catalog)
1087
+
1088
+ ## Analyze Delta Lake Datasets
1089
+
1090
+ Dremio supports analyzing Delta Lake datasets on the sources listed above through a native, high-performance reader. Dremio automatically identifies which datasets are saved in the Delta Lake format and imports table information from the Delta Lake manifest files. Dataset promotion is seamless and works the same as for any other data format in Dremio: users can promote a file system directory that contains a Delta Lake dataset to a table manually, or automatically by querying the directory. With the Delta Lake format, Dremio supports datasets of any size, including petabyte-sized datasets with billions of files.
1091
+
1092
+ Dremio reads Delta Lake tables created or updated by another engine, such as Spark, with transactional consistency. Dremio automatically identifies tables that are in the Delta Lake format and selects the appropriate format for the user.
1093
+
1094
+ ### Refresh Metadata
1095
+
1096
+ Metadata refresh is required to query the latest version of a Delta Lake table. You can wait for an automatic refresh of metadata or manually refresh it.
1097
+
1098
+ #### Example of Querying a Delta Lake Table
1099
+
1100
+ Perform the following steps to query a Delta Lake table:
1101
+
1102
+ 1. In Dremio, open the **Datasets** page.
1103
+ 2. Go to the data source that contains the Delta Lake table.
1104
+ 3. If the data source is not an AWS Glue Data Catalog, follow these steps:
1105
+ 1. Hover over the row for the table and click ![The Format Folder icon](/images/cloud/format-data.png "The Format Folder icon") to the right. Dremio automatically identifies tables that are in the Delta Lake format and selects the appropriate format.
1106
+ 2. Click **Save**.
1107
+ 4. If the data source is an AWS Glue Data Catalog, hover over the row for the table and click ![The Go To Table icon](/images/cloud/go-to-table.png "The Go To Table icon") to the right.
1108
+ 5. Run a query on the Delta Lake table to see the results.
1109
+ 6. Update the table in the data source.
1110
+ 7. Go back to the **Datasets** UI and wait for the table metadata to refresh or manually refresh it using the syntax below.
1111
+
1112
+ Syntax to manually refresh table metadata
1113
+
1114
+ ```
1115
+ ALTER TABLE `<path_of_the_dataset>`
1116
+ REFRESH METADATA
1117
+ ```
1118
+
1119
+ The following statement shows refreshing metadata of a Delta Lake table.
1120
+
1121
+ Example command to manually refresh table metadata
1122
+
1123
+ ```
1124
+ ALTER TABLE s3."data.dremio.com".data.deltalake."tpcds10_delta"."call_center"
1125
+ REFRESH METADATA
1126
+ ```
1127
+
1128
+ 8. Run the previous query on the Delta Lake table to retrieve the results from the updated Delta Lake table.
1129
+
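+ For example, after the metadata refresh, a query against the table used above might look like this (the path is illustrative):
+
+ Example query on a Delta Lake table
+
+ ```
+ SELECT *
+ FROM s3."data.dremio.com".data.deltalake."tpcds10_delta"."call_center"
+ LIMIT 10
+ ```
+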
1130
+ ## Limitations
1131
+
1132
+ * Creating Delta Lake tables is not supported.
1133
+ * DML operations are not supported.
1134
+ * Incremental Reflections are not supported.
1135
+ * Metadata refresh is required to query the latest version of a Delta Lake table.
1136
+ * Time travel or data versioning is not supported.
1137
+ * Only Delta Lake tables with `minReaderVersion` 1 or 2 can be read. Column Mapping is supported with `minReaderVersion` 2.
1138
+
1145
+
1146
+ <div style="page-break-after: always;"></div>
1147
+