dremiojs 1.0.0

Files changed (45)
  1. package/.eslintrc.json +14 -0
  2. package/.prettierrc +7 -0
  3. package/README.md +59 -0
  4. package/dremiodocs/dremio-cloud/cloud-api-reference.md +748 -0
  5. package/dremiodocs/dremio-cloud/dremio-cloud-about.md +225 -0
  6. package/dremiodocs/dremio-cloud/dremio-cloud-admin.md +3754 -0
  7. package/dremiodocs/dremio-cloud/dremio-cloud-bring-data.md +6098 -0
  8. package/dremiodocs/dremio-cloud/dremio-cloud-changelog.md +32 -0
  9. package/dremiodocs/dremio-cloud/dremio-cloud-developer.md +1147 -0
  10. package/dremiodocs/dremio-cloud/dremio-cloud-explore-analyze.md +2522 -0
  11. package/dremiodocs/dremio-cloud/dremio-cloud-get-started.md +300 -0
  12. package/dremiodocs/dremio-cloud/dremio-cloud-help-support.md +869 -0
  13. package/dremiodocs/dremio-cloud/dremio-cloud-manage-govern.md +800 -0
  14. package/dremiodocs/dremio-cloud/dremio-cloud-overview.md +36 -0
  15. package/dremiodocs/dremio-cloud/dremio-cloud-security.md +1844 -0
  16. package/dremiodocs/dremio-cloud/sql-docs.md +7180 -0
  17. package/dremiodocs/dremio-software/dremio-software-acceleration.md +1575 -0
  18. package/dremiodocs/dremio-software/dremio-software-admin.md +884 -0
  19. package/dremiodocs/dremio-software/dremio-software-client-applications.md +3277 -0
  20. package/dremiodocs/dremio-software/dremio-software-data-products.md +560 -0
  21. package/dremiodocs/dremio-software/dremio-software-data-sources.md +8701 -0
  22. package/dremiodocs/dremio-software/dremio-software-deploy-dremio.md +3446 -0
  23. package/dremiodocs/dremio-software/dremio-software-get-started.md +848 -0
  24. package/dremiodocs/dremio-software/dremio-software-monitoring.md +422 -0
  25. package/dremiodocs/dremio-software/dremio-software-reference.md +677 -0
  26. package/dremiodocs/dremio-software/dremio-software-security.md +2074 -0
  27. package/dremiodocs/dremio-software/dremio-software-v25-api.md +32637 -0
  28. package/dremiodocs/dremio-software/dremio-software-v26-api.md +36757 -0
  29. package/jest.config.js +10 -0
  30. package/package.json +25 -0
  31. package/src/api/catalog.ts +74 -0
  32. package/src/api/jobs.ts +105 -0
  33. package/src/api/reflection.ts +77 -0
  34. package/src/api/source.ts +61 -0
  35. package/src/api/user.ts +32 -0
  36. package/src/client/base.ts +66 -0
  37. package/src/client/cloud.ts +37 -0
  38. package/src/client/software.ts +73 -0
  39. package/src/index.ts +16 -0
  40. package/src/types/catalog.ts +31 -0
  41. package/src/types/config.ts +18 -0
  42. package/src/types/job.ts +18 -0
  43. package/src/types/reflection.ts +29 -0
  44. package/tests/integration_manual.ts +95 -0
  45. package/tsconfig.json +19 -0
@@ -0,0 +1,800 @@
1
+ # Manage and Govern Your Data | Dremio Documentation
2
+
3
+ Original URL: https://docs.dremio.com/dremio-cloud/manage-govern/
4
+
7
+ Data management focuses on the operational efficiency, performance, and reliability of your data at scale. With Dremio’s autonomous management capabilities, many of these processes are intelligently automated, reducing manual effort and ensuring consistent optimization. Dremio automates table optimization by merging small files into optimally sized ones (typically around 512 MB), reducing metadata overhead, and reclaiming storage by physically removing deleted rows. It also reorganizes data to align with clustering specifications, ensuring consistent, high-performance queries across large datasets. Together, these autonomous management features help keep your lakehouse fast, efficient, and cost-effective.
8
+
9
+ Data governance is the foundation of a secure, reliable, and compliant lakehouse. It ensures that data across your environment is accurate, consistent, and properly controlled throughout its lifecycle. With Dremio, you can implement robust governance practices by maintaining complete data lineage for transparency and auditability, defining role-based and fine-grained (row-access and column-masking) access controls on data, and using documentation and tags to improve data discoverability. Together, these capabilities enable trustworthy, well-governed data that fuels analytics and AI with confidence.
10
+
11
+ ## Autonomous Management
12
+
13
+ ### Optimization
14
+
15
+ Managing [Apache Iceberg tables](/dremio-cloud/manage-govern/optimization/) is critical to maintaining fast and predictable query performance, especially for agentic AI workloads that demand low latency. As new data is ingested and tables are updated, metadata and small data files accumulate, leading to performance degradation over time. Dremio automates table optimization by merging small files into optimally sized ones (typically ~512 MB), reducing metadata overhead, organizing data to align with clustering specifications, and reclaiming storage by physically removing deleted rows.
16
+
17
+ ### Clustering
18
+
19
+ Dremio also reorganizes data to align with [clustering](/dremio-cloud/manage-govern/optimization/) specifications, ensuring consistent, high-performance queries at scale.
20
+
21
+ ### Materialize and Query Rewrite
22
+
23
+ Dremio can autonomously materialize datasets using Reflections, a precomputed and optimized copy of source data or a query result, designed to speed up query performance. Dremio's query optimizer can accelerate a query against tables or views by using one or more Reflections to partially or entirely satisfy that query, rather than processing the raw data in the underlying data source. Queries do not need to reference Reflections directly. Instead, Dremio rewrites queries on the fly to use the Reflections that satisfy them. For more information, see [Reflections](/dremio-cloud/admin/performance/autonomous-reflections/).
24
+
25
+ ## Governance
26
+
27
+ ### Lineage
28
+
29
+ Track and visualize how data flows through your lakehouse, from source to consumption. [Lineage](/dremio-cloud/manage-govern/lineage/) helps you understand data origins, track transformations, identify dependencies, and perform impact analysis.
30
+
31
+ ### Wikis
32
+
33
+ Enrich data understanding by documenting datasets with wikis. Use Generative AI to automatically generate [wikis](/dremio-cloud/manage-govern/wikis-labels/), reducing manual documentation effort. Wikis are used by Dremio's AI Agent to understand the semantics of your environment and adhere to these definitions in response to user prompts.
34
+
35
+ ### Labels
36
+
37
+ Enhance data discoverability and searchability by categorizing datasets with labels. Use Generative AI to automatically generate [labels](/dremio-cloud/manage-govern/wikis-labels/), reducing manual cataloging effort.
38
+
39
+ ### Role-Based Access Control Policies
40
+
41
+ Manage access to datasets through [roles](/dremio-cloud/security/roles) rather than individual user grants for easier administration. Assign [privileges](/dremio-cloud/security/privileges) to roles, simplifying management and ensuring users only have access to what they need to perform their job.
42
+
43
+ ### Row-Access and Column-Masking Policies
44
+
45
+ Apply fine-grained access controls to protect sensitive data using row-access and column-masking policies. Control access to specific rows and columns based on rules and conditions to maintain compliance and adhere to regulatory requirements. For more information, see [Row-Access & Column-Masking Policies](/dremio-cloud/manage-govern/row-column-policies/).
46
+
47
+ ## Related Topics
48
+
49
+ * [Roles](/dremio-cloud/security/roles) – Manage role-based access control.
50
+ * [Explore and Analyze Your Data](/dremio-cloud/explore-analyze/) - Explore and analyze your governed data.
51
+ * [Catalog API - Lineage](/dremio-cloud/api/catalog/lineage/) - Retrieve lineage information about datasets.
52
+
67
+ <div style="page-break-after: always;"></div>
68
+
69
+ # Lineage | Dremio Documentation
70
+
71
+ Original URL: https://docs.dremio.com/dremio-cloud/manage-govern/lineage
72
+
75
+ Lineage provides a graph of a dataset's relationships (its source, parent datasets, and child datasets). The graph illustrates how datasets are connected and where the data originates, and it tracks the data's movement and transformations.
76
+
77
+ By default, the lineage graph focuses on the initially selected dataset and its relationships with other datasets, represented as nodes that display the dataset name and path. To view additional metadata, use the **Show/hide layers** options.
78
+
79
+ To track lineage for a different dataset node, refocus the lineage graph on that dataset: click ![This is the Focus icon.](/images/icons/focus.png "Focus icon"), or click ![](/images/icons/more.png) on the right of the dataset name and then select **Focus on this dataset**.
80
+
81
+ ![This is a screenshot showing the option to refocus the lineage graph on a different dataset.](/images/lineage-focus.png "Lineage focus on dataset")
82
+
83
+ ## Privileges Required for Lineage
84
+
85
+ * If you have the `SELECT` privilege on the parent datasets and the child datasets, you can see the parent datasets and data sources on the left. The child datasets appear on the right.
86
+ * If you have only the `READ METADATA` privilege on the parent and child datasets, then you can only see limited metadata for these datasets.
87
+ * If you do not have the `SELECT` or the `READ METADATA` privilege on the parent and child datasets, they are not visible.
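The three privilege rules above can be read as a small decision function. The sketch below is only a model of those rules for illustration: the type names and the `lineageVisibility` function are hypothetical and are not part of any Dremio API.

```typescript
// Hypothetical model of the lineage visibility rules; not a Dremio API.
type Privilege = "SELECT" | "READ METADATA";
type LineageVisibility = "visible" | "limited-metadata" | "hidden";

// Returns how a related (parent or child) dataset appears in the lineage graph
// for a user holding the given privileges on that dataset.
function lineageVisibility(privileges: ReadonlySet<Privilege>): LineageVisibility {
  if (privileges.has("SELECT")) return "visible";
  if (privileges.has("READ METADATA")) return "limited-metadata";
  return "hidden";
}
```

Checking `SELECT` first reflects the ordering of the rules: limited metadata applies only when `READ METADATA` is the sole privilege held.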
88
+
89
+ ## Lineage Refresh with Dataset Schema Changes
90
+
91
+ For datasets in Iceberg REST catalogs, the lineage graphs are stored in Dremio's metadata cache, which is automatically refreshed at fixed time intervals. For more information, see [Metadata Refresh](/dremio-cloud/bring-data/connect/catalogs/iceberg-rest-catalog/#metadata). It is possible that the lineage graph might show an outdated schema for the dataset if the dataset schema has been recently updated and Dremio's metadata cache has not yet been refreshed.
92
+
98
+ <div style="page-break-after: always;"></div>
99
+
100
+ # Automatic Optimization | Dremio Documentation
101
+
102
+ Original URL: https://docs.dremio.com/dremio-cloud/manage-govern/optimization
103
+
106
+ As [Apache Iceberg](/dremio-cloud/developer/data-formats/iceberg) tables are written to and updated, data and metadata files accumulate, which can affect query performance. For example, small files produced by data ingestion jobs slow queries because the query engine must read more files.
107
+
108
+ To optimize performance, Dremio automates table maintenance in the Open Catalog. This process compacts small files into larger ones, partitions data based on the values of a table's columns, rewrites manifest files, removes position delete files, and clusters tables—improving query speed while reducing storage costs.
109
+
110
+ Automatic optimization runs on a dedicated engine configured by Dremio, ensuring peak performance without impacting project query workloads.
111
+
112
+ When Dremio optimizes a table, it evaluates file sizes, partition layout, and metadata organization to reduce I/O and metadata overhead. Optimization consists of five main operations: clustering, data file compaction, partition evolution, manifest file rewriting, and position delete file removal.
113
+
114
+ ## Clustering
115
+
116
+ Iceberg clustering sorts individual records in data files based on the clustered columns provided in the [`CREATE TABLE`](/dremio-cloud/sql/commands/create-table/) or [`ALTER TABLE`](/dremio-cloud/sql/commands/alter-table/) statement.
117
+
118
+ To cluster a table, you must first define the clustering keys. Then, automatic optimization uses the clustering keys to optimize tables. For details, see [Clustering](/dremio-cloud/developer/data-formats/iceberg/#clustering).
119
+
120
+ ## Data File Compaction
121
+
122
+ Iceberg tables that are constantly being updated can have data files of various sizes. As a result, query performance can be negatively affected by sub-optimal file sizes. The optimal file size in Dremio is 256 MB.
123
+
124
+ Dremio logically combines smaller files and splits larger ones to 256 MB (see the following graphic), helping to reduce metadata overhead and costs related to opening and reading files.
125
+
126
+ ![Optimizing file sizes in Dremio.](/images/file-sizes3.png "Optimizing file sizes in Dremio.")
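To make the combine-and-split idea concrete, here is a deliberately simplified planning sketch. It assumes a greedy, largest-first grouping, which is an illustrative choice and not a description of Dremio's actual algorithm; `planCompaction` and its types are hypothetical names.

```typescript
const TARGET_MB = 256; // target file size stated in the text

interface CompactionPlan {
  // Oversized input files and the number of pieces each one is split into.
  splits: { sizeMb: number; parts: number }[];
  // Groups of input files whose combined size stays at or below the target.
  groups: number[][];
}

// Simplified greedy planner (an assumption for illustration, not Dremio's algorithm).
function planCompaction(fileSizesMb: number[], targetMb: number = TARGET_MB): CompactionPlan {
  const plan: CompactionPlan = { splits: [], groups: [] };
  let group: number[] = [];
  let groupTotal = 0;
  // Visit files from largest to smallest so groups fill up reasonably tightly.
  const sorted = fileSizesMb.slice().sort((a, b) => b - a);
  for (const size of sorted) {
    if (size > targetMb) {
      plan.splits.push({ sizeMb: size, parts: Math.ceil(size / targetMb) });
      continue;
    }
    // Close the current group when adding this file would exceed the target.
    if (group.length > 0 && groupTotal + size > targetMb) {
      plan.groups.push(group);
      group = [];
      groupTotal = 0;
    }
    group.push(size);
    groupTotal += size;
  }
  if (group.length > 0) plan.groups.push(group);
  return plan;
}
```

For input sizes `[600, 100, 100, 50, 200]`, this plan splits the 600 MB file into three pieces and forms the groups `[200]` and `[100, 100, 50]`. A group containing a single file needs no merge.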
127
+
128
+ ## Partition Evolution
129
+
130
+ To improve read or write performance, data is partitioned based on the values of a table's columns. If the columns used in a partition evolve over time, query performance can suffer when queries are not aligned with the current partition layout. Dremio detects the affected files and rewrites them to align with the current partition specification. This operation is used:
131
+
132
+ * When select partitions are queried more often or are more important than others, and it's not necessary to optimize the entire table.
133
+ * When select partitions are more active and are constantly being updated. Optimization should only occur when activity is low or paused.
134
+
135
+ ## Manifest File Rewriting
136
+
137
+ Iceberg uses metadata files (or manifests) to track point-in-time snapshots by maintaining all deltas as a table. This metadata layer functions as an index over a table’s data and the manifest files contained in this layer speed up query planning and prune unnecessary data files. For Iceberg tables that are constantly being updated (such as the ingestion of streaming data or users performing frequent DML operations), the number of manifest files that are suboptimal in size can grow over time. Additionally, the clustering of metadata entries in these files may not be optimal. As a result, suboptimal manifests can impact the time it takes to plan and execute a query.
138
+
139
+ Dremio rewrites these manifest files quickly based on size criteria. The target size for a manifest file is based on the Iceberg table's property. If a default size is not set, Dremio defaults to 8 MB. For the target size, Dremio considers the range from 0.75x to 1.8x, inclusive, to be optimal. Manifest files exceeding the 1.8x size will be split while files smaller than the 0.75x size will be compacted.
140
+
141
+ This operation results in the optimization of the metadata, helping to reduce query planning time.
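The size thresholds described above reduce to simple arithmetic: with the 8 MB default target, the optimal range is 6 MB (0.75 × 8) to 14.4 MB (1.8 × 8), inclusive. The following sketch encodes that rule; `classifyManifest` and `ManifestAction` are hypothetical names used only for illustration.

```typescript
type ManifestAction = "split" | "compact" | "keep";

const DEFAULT_TARGET_MB = 8; // default target size stated in the text

// Classifies a manifest file by size relative to the target:
// above 1.8x the target it is split, below 0.75x it is compacted, otherwise kept.
function classifyManifest(sizeMb: number, targetMb: number = DEFAULT_TARGET_MB): ManifestAction {
  const lower = 0.75 * targetMb;
  const upper = 1.8 * targetMb;
  if (sizeMb > upper) return "split";
  if (sizeMb < lower) return "compact";
  return "keep"; // the range is inclusive at both ends
}
```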
142
+
143
+ ## Position Delete Files
144
+
145
+ Iceberg v2 added delete files, which encode the rows that have been deleted in existing data files. This enables you to delete or replace individual rows in immutable data files without the need to rewrite those files. [Position delete files](https://iceberg.apache.org/spec/#position-delete-files) identify deleted rows by file and position in one or more data files, as shown in the following example.
146
+
147
+ | `file_path` | `pos` |
148
+ | --- | --- |
149
+ | `file:/Users/test.user/Downloads/gen_tables/orders_with_deletes/data/2021/2021-00.parquet` | `6` |
150
+ | `file:/Users/test.user/Downloads/gen_tables/orders_with_deletes/data/2021/2021-00.parquet` | `16` |
151
+
152
+ Dremio can optimize Iceberg tables containing position delete files. This is beneficial to do because when data files are read, the associated delete files are stored in memory. Also, one data file can be linked to several delete files, which can impact read time.
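The read-time cost mentioned above comes from merging delete records with data rows. The sketch below models that merge in memory, assuming 0-based row positions as defined in the Iceberg specification; the function names and sample file paths are hypothetical, and real readers work on Parquet files rather than arrays.

```typescript
// One record from a position delete file: a data file path and a 0-based row position.
interface PositionDelete {
  filePath: string;
  pos: number;
}

// Builds an in-memory index of deleted positions per data file.
function indexDeletes(deletes: PositionDelete[]): Map<string, Set<number>> {
  const index = new Map<string, Set<number>>();
  for (const d of deletes) {
    let positions = index.get(d.filePath);
    if (!positions) {
      positions = new Set<number>();
      index.set(d.filePath, positions);
    }
    positions.add(d.pos);
  }
  return index;
}

// Returns only the rows of a data file that are not marked as deleted.
function readLiveRows<T>(filePath: string, rows: T[], index: Map<string, Set<number>>): T[] {
  const deleted = index.get(filePath);
  if (!deleted) return rows;
  return rows.filter((_row, position) => !deleted.has(position));
}
```

Once optimization rewrites the data file without the deleted rows, this index and the per-row check are no longer needed for that file.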
153
+
154
+ When tables are optimized in Dremio, the position delete files are removed and the data files that are linked to them are rewritten. Data files are rewritten if any of the following conditions are met:
155
+
156
+ * The file size is not within the optimum range.
157
+ * The partition's specification is not current.
158
+ * The data file has an attached delete file.
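The three rewrite conditions combine with a logical OR. The sketch below expresses that predicate; the interfaces are hypothetical, and because the text does not state the bounds of the optimum size range, `minMb` and `maxMb` are caller-supplied parameters rather than Dremio values.

```typescript
interface DataFile {
  sizeMb: number;
  partitionSpecId: number; // ID of the partition spec the file was written with
  deleteFileCount: number; // number of position delete files referencing this file
}

interface RewriteRules {
  minMb: number; // lower bound of the optimum size range (assumed, not from the docs)
  maxMb: number; // upper bound of the optimum size range (assumed, not from the docs)
  currentSpecId: number; // ID of the table's current partition spec
}

// A data file is rewritten if any one of the three conditions holds.
function needsRewrite(file: DataFile, rules: RewriteRules): boolean {
  const sizeOutOfRange = file.sizeMb < rules.minMb || file.sizeMb > rules.maxMb;
  const staleSpec = file.partitionSpecId !== rules.currentSpecId;
  const hasDeletes = file.deleteFileCount > 0;
  return sizeOutOfRange || staleSpec || hasDeletes;
}
```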
159
+
160
+ ## Related Topics
161
+
162
+ * [Apache Iceberg](/dremio-cloud/developer/data-formats/iceberg) – Learn more about the Apache Iceberg table format.
163
+ * [Load Data Into Tables](/dremio-cloud/bring-data/load/) – Load data from CSV, JSON, or Parquet files into existing Iceberg tables.
164
+
174
+ <div style="page-break-after: always;"></div>
175
+
176
+ # Wikis and Labels | Dremio Documentation
177
+
178
+ Original URL: https://docs.dremio.com/dremio-cloud/manage-govern/wikis-labels
179
+
182
+ Wikis and labels help users document, organize, and discover datasets within the Open Catalog. This page explains how to manage wikis and labels, as well as how Dremio’s Generative AI features can assist in generating wikis and labels for you.
183
+
184
+ ## Wikis
185
+
186
+ Wikis for datasets provide an efficient way to document and describe datasets within the Open Catalog. These wikis enable users to add comprehensive information, context, and relevant details about the datasets they manage.
187
+ With a user-friendly, rich text editor, the wikis support [GitHub Flavored Markdown](https://github.github.com/gfm/), allowing users to format content easily and enhance readability.
188
+ Wikis ensure that dataset documentation is both accessible and structured, making it simpler for teams to understand the datasets and how to work with them effectively.
189
+
190
+ ![This image shows an example of the Wiki editor in Dremio.](/images/data-wiki-new.png "Creating a Wiki Entry in Dremio")
191
+
192
+ ### Manage Wikis
193
+
194
+ note
195
+
196
+ Ensure you have sufficient [Role-Based Access Control (RBAC) privileges](/dremio-cloud/security/privileges/) to view or edit wikis.
197
+
198
+ To view or edit the wiki for a dataset in the Dremio console:
199
+
200
+ 1. On the Datasets page, navigate to the folder where your dataset is stored.
201
+ 2. Hover over your dataset, and on the right-hand side, click the ![This is the icon that represents more actions.](/images/icons/more.png "Icon represents more actions.") icon.
202
+ 3. Click **Open Details Panel**.
203
+ * You can edit the dataset wiki by clicking **Edit Wiki**, writing your wiki content, and clicking **Save**.
204
+
205
+ ## Labels
206
+
207
+ Labels for datasets offer a powerful way to organize and retrieve datasets within a data catalog. By creating and assigning labels to datasets, users can easily search and filter through large collections of related datasets.
208
+ Labels also enhance the search experience, allowing users to quickly locate datasets associated with a specific label. By clicking on a label, users can initiate a search that brings up all datasets linked to that label, streamlining the process of finding relevant data and improving overall data management.
209
+
210
+ The following image shows a dataset in the catalog with several labels and a brief wiki. In this example, the label "pii-data" was used in the search field to narrow the results to a customer dataset that contains Personally Identifiable Information (PII).
211
+
212
+ ![This image shows an example of creating labels.](/images/tags-new.png "Creating Labels")
213
+
214
+ ### Manage Labels
215
+
216
+ note
217
+
218
+ Ensure you have sufficient [Role-Based Access Control (RBAC) privileges](/dremio-cloud/security/privileges/) to view or edit labels.
219
+
220
+ To view or edit the labels for a dataset in the Dremio console:
221
+
222
+ 1. On the Datasets page, navigate to the folder where your dataset is stored.
223
+ 2. Hover over your dataset, and on the right-hand side, click the ![This is the icon that represents more actions.](/images/icons/more.png "Icon represents more actions.") icon.
224
+ 3. Click **Open Details Panel**.
225
+ * You can add a label by clicking on the ![](/images/icons/edit.png) icon, typing a label name (e.g. `PII`), and clicking **Enter**.
226
+
227
+ ## Generate Labels and Wikis (Preview)
228
+
229
+ To help eliminate the need for manual profiling and cataloging, you can use Generative AI to generate labels and wikis for your datasets.
230
+
231
+ note
232
+
233
+ If you haven't opted into the Generative AI features, see [Dremio Preferences](/dremio-cloud/admin/projects/preferences) for the steps to enable them.
234
+
235
+ #### Generate Labels
236
+
237
+ In order to generate a label, Generative AI bases its understanding on your schema by considering other labels that have been previously generated and labels that have been created by other users.
238
+
239
+ To generate labels:
240
+
241
+ 1. Navigate to either the Details page or Details Panel of a dataset.
242
+ 2. In the Dataset Overview on the right, click ![This is the icon that represents Generative AI.](/images/cloud/gen-ai-icon.png "Icon represents Generative AI.") to generate labels.
243
+ 3. In the Generating labels dialog, review the labels generated for the dataset and decide which to save. If multiple labels have been generated, you can save some, all, or none of them. To remove a label, click the **x** on it.
244
+
245
+ ![This screenshot is showing how to generate a label.](/images/cloud/label-autolabel-new.png "Generating a label.")
246
+
247
+ 4. Complete one of the following actions:
248
+
249
+ * If these are the only labels for your dataset, click **Save**.
250
+ * If you already have labels for the dataset and want to add these generated labels, click **Append**.
251
+ * If you already have labels for the dataset and want to replace them with these generated labels, click **Overwrite**.
252
+
253
+ The labels for the dataset will appear in the Dataset Overview.
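The difference between **Append** and **Overwrite** can be modeled as a small merge function. This is only a sketch of the behavior described in the steps above: `applyGeneratedLabels` is a hypothetical name, and the removal of exact duplicates is an assumption, since the text does not say how duplicate labels are handled.

```typescript
type LabelAction = "append" | "overwrite";

// Combines a dataset's existing labels with the generated labels the user kept.
// "append" keeps the existing labels; "overwrite" discards them.
// Assumption: exact duplicates are collapsed, preserving first-seen order.
function applyGeneratedLabels(existing: string[], generated: string[], action: LabelAction): string[] {
  const base = action === "append" ? existing : [];
  return Array.from(new Set(base.concat(generated)));
}
```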
254
+
255
+ #### Generate Wikis
256
+
257
+ In order to generate a wiki, Generative AI bases its understanding on your schema and data to produce descriptions of datasets, because it can determine how the columns within the dataset relate to each other and to the dataset as a whole.
258
+
259
+ You can generate wikis only if you are the dataset owner or have `ALTER` privileges on the dataset.
260
+
261
+ To generate a wiki:
262
+
263
+ 1. Navigate to either the Details page or Details Panel of a dataset.
264
+ 2. In the Wiki section, click **Generate wiki**. A dialog opens, and a preview of the wiki content is generated on the right of the dialog. If you would like to regenerate, click ![](/images/icons/regenerate.png).
265
+
266
+ ![This screenshot is showing how to generate wikis.](/images/cloud/wiki-autosummarize-new.png "Generating a Wiki.")
267
+
268
+ 3. Click ![](/images/cloud/copy-button.png) to copy the generated wiki content on the right of the dialog.
269
+ 4. Click within the text box on the left and paste the wiki content.
270
+ 5. (Optional) Use the toolbar to make edits to the wiki content. If you would like to regenerate, click ![This is the icon that represents Generative AI.](/images/cloud/gen-ai-icon.png "Icon represents Generative AI.") in the toolbar to regenerate wiki content in the preview.
271
+ 6. Click **Save**.
272
+
273
+ The wiki for the dataset will appear in the Wiki section.
274
+
275
+ ## Related Topics
276
+
277
+ * [Search for Dremio Objects and Entities](/dremio-cloud/explore-analyze/discover#search-for-dremio-objects-and-entities) - Explore Dremio's semantic search capabilities.
278
+ * [Data Privacy](/data-privacy/) - Learn more about Dremio's data privacy practices.
279
+
289
+ <div style="page-break-after: always;"></div>
290
+
291
+ # Row-Access and Column-Masking Policies | Dremio Documentation
292
+
293
+ Original URL: https://docs.dremio.com/dremio-cloud/manage-govern/row-column-policies
294
+
297
+ Row-access and column-masking policies may be applied to tables, views, and columns via [user-defined functions (UDFs)](/dremio-cloud/sql/commands/create-function/). Using these policies, you can control access to sensitive data based upon the rules and conditions you need to maintain compliance or adhere to regulatory requirements, while also removing the need to produce a secondary set of data with protected information manually removed.
298
+
299
+ The following restrictions apply to policies and UDFs:
300
+
301
+ * Only users with the ADMIN role can create UDFs.
302
+ * A UDF can have only one owner. By default, the owner is the user who created the UDF.
303
+ * You can transfer ownership of a UDF using the `GRANT OWNERSHIP` command (see [Privileges](/dremio-cloud/security/privileges)).
304
+ * Users or roles must have the EXECUTE privilege in order to apply filtering and masking policies.
305
+
306
+ ## Column-Masking Policies
307
+
308
+ Column-masking is a way to mask—or scramble—private data at the column-level dynamically prior to query execution. For example, the owner of a table or view may apply a policy to a column to only display the year of a date or the last four digits of a credit card.
309
+
310
+ Column-masking policies may be any UDF with a scalar return type that is identical to the data type of the column on which it is applied. However, only one column-masking policy may be applied to each column.
311
+
312
+ In the following example of a user-defined function, only the user `jdoe@dremio.com` and members of the Accounting role can see an entry's social security number (SSN). For all other users, the value is masked as `XXX-XX-` followed by the last four digits.
313
+
314
+ Column-masking policy example
315
+
316
+ ```
317
+ CREATE FUNCTION protect_ssn (ssn VARCHAR(11))
318
+ RETURNS VARCHAR(11)
319
+ RETURN SELECT CASE WHEN query_user()='jdoe@dremio.com' OR is_member('Accounting') THEN ssn
320
+ ELSE CONCAT('XXX-XX-', SUBSTR(ssn,8,4))
321
+ END;
322
+ ```
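For readers more familiar with application code than SQL, the masking logic can be mirrored in TypeScript. This is a client-side illustration only, since Dremio evaluates the UDF inside the query engine. `QueryContext` is a hypothetical stand-in for `query_user()` and `is_member()`, and the mask keeps the last four digits visible behind an `XXX-XX-` prefix, as the text describes.

```typescript
// Hypothetical stand-in for the SQL functions query_user() and is_member().
interface QueryContext {
  user: string;
  roles: ReadonlySet<string>;
}

// Mirrors the masking rule: privileged callers see the SSN, everyone else
// sees "XXX-XX-" followed by the last four digits.
function protectSsn(ssn: string, ctx: QueryContext): string {
  const allowed = ctx.user === "jdoe@dremio.com" || ctx.roles.has("Accounting");
  if (allowed) return ssn;
  // SQL SUBSTR is 1-based: SUBSTR(ssn, 8, 4) corresponds to slice(7, 11) here.
  return "XXX-XX-" + ssn.slice(7, 11);
}
```

The masked value has the same type and length as the input, which matches the requirement that a masking UDF return the data type of the column it protects.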
323
+
324
+ ## Row-Access Policies
325
+
326
+ Row-access policies are a way to control which records in a table or view are returned for specific users and roles. For example, the owner of a table or view may apply a policy that filters out customers from a specific country unless the user running the query has a specific role.
327
+
328
+ Row-access policy example
329
+
330
+ ```
331
+ CREATE FUNCTION country_filter (country VARCHAR)
332
+ RETURNS BOOLEAN
333
+ RETURN SELECT query_user()='jdoe@dremio.com' OR (is_member('Accounting') AND country='CA');
334
+ ```
335
+
336
+ Row-access policies may be any boolean UDF applied to the table or view. The return value of the UDF is treated logically in a query as an `AND` operator included in a `WHERE` clause. The return type of the UDF must be `BOOLEAN`, otherwise Dremio will give an error at execution time.
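The "implicit `AND` in the `WHERE` clause" behavior can be illustrated with an in-memory filter. The sketch below mirrors the `country_filter` UDF in TypeScript; `QueryContext`, `Employee`, and `runQuery` are hypothetical names, and the real enforcement happens inside Dremio's query planner rather than in client code.

```typescript
// Hypothetical stand-in for the SQL functions query_user() and is_member().
interface QueryContext {
  user: string;
  roles: ReadonlySet<string>;
}

interface Employee {
  id: number;
  country: string;
}

// Mirrors the country_filter UDF: a boolean decision per row.
function countryFilter(country: string, ctx: QueryContext): boolean {
  return ctx.user === "jdoe@dremio.com" || (ctx.roles.has("Accounting") && country === "CA");
}

// The policy result is ANDed with whatever predicate the user's query supplies.
function runQuery(rows: Employee[], where: (row: Employee) => boolean, ctx: QueryContext): Employee[] {
  return rows.filter((row) => where(row) && countryFilter(row.country, ctx));
}
```

A user who matches neither condition gets an empty result rather than an error, because every row fails the added predicate.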
337
+
338
+ ## User-Defined Functions
339
+
340
+ A user-defined function, or [UDF,](/dremio-cloud/sql/commands/create-function) is a callable routine that accepts input parameters, executes the function body, and returns a single value or a set of rows.
341
+
342
+ The UDFs which serve as the basis for filtering and masking policies must be defined independently of your sources. Not only does this allow organizations to use a single policy for multiple tables and views, but this also restricts user access to policies and prevents unauthorized tampering. Modifying a single UDF automatically updates the policy in the context of any tables or views using that access or mask policy.
343
+
344
+ The following process describes how policies are enforced with Dremio:
345
+
346
+ 1. A user with the ADMIN role creates a UDF to serve as a security policy.
347
+ 2. The administrator then sets the security policy to one or more tables, views, and/or columns.
348
+ 3. Dremio enforces the policy at runtime when an end-user performs a query.
349
+
350
+ Creating UDFs and attaching security policies is done through SQL commands. Policies are applied during the query planning phase, prior to execution. At this point, Dremio first checks the table or view for a row-access policy and then checks each accessed column for a column-masking policy. If any policies are found, they are automatically applied to the policy's scope using the associated UDF in the query plan.
351
+
352
+ ### Query Substitutions
353
+
354
+ Row-access and column-masking policies act as an "implicit view," replacing a table/view reference in an SQL statement prior to processing the query. This implicit view is created through an examination of each policy applied to a table, view, or column.
355
+
356
+ For example, [jdoe@dremio.com](mailto:jdoe@dremio.com) has SELECT access to table\_1. However, the column-masking policy protect\_ssn is set on the column\_1 column, with a UDF that replaces all but the last four digits of a social security number with X for anyone other than this user or a member of the Accounting department. When a user runs a query in Dremio that touches a column with this column-masking policy, the following occurs:
357
+
358
+ 1. During the SQL Planning phase, Dremio identifies which tables, views, and columns are being accessed (table\_1) and whether security policies must be enforced.
359
+ 2. The engine searches for any security policies set to the associated objects, such as protect\_ssn (see Examples of UDFs below).
360
+ 3. When the protect\_ssn policy is found for the object affected by the query, the query planner immediately modifies the execution path to incorporate the masking function.
361
+ 4. Query execution proceeds as normal with the associated UDF included within the execution path.
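One way to picture the "implicit view" is as a subquery that wraps masked columns in their UDFs and adds the row-access UDF as a filter. The sketch below builds such a subquery as text. It is a conceptual model only: Dremio applies policies to the query plan rather than to SQL strings, and `implicitView` and `TablePolicies` are hypothetical names.

```typescript
interface PolicyCall {
  fn: string; // UDF name
  args: string[]; // column names passed to the UDF
}

interface TablePolicies {
  rowAccess?: PolicyCall; // optional row-access policy on the table or view
  masking: Record<string, PolicyCall>; // column-masking policies keyed by column name
}

// Builds the subquery that conceptually replaces the table reference.
function implicitView(table: string, columns: string[], policies: TablePolicies): string {
  const projected = columns.map((column) => {
    const mask = policies.masking[column];
    return mask ? `${mask.fn}(${mask.args.join(", ")}) AS ${column}` : column;
  });
  const filter = policies.rowAccess
    ? ` WHERE ${policies.rowAccess.fn}(${policies.rowAccess.args.join(", ")})`
    : "";
  return `(SELECT ${projected.join(", ")} FROM ${table}${filter})`;
}
```

For `table_1` with `protect_ssn` set on `column_1`, the function returns `(SELECT id, protect_ssn(column_1) AS column_1 FROM table_1)` when asked for the columns `id` and `column_1`.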
362
+
363
+ ## List Existing UDFs
364
+
365
+ To view all existing UDFs created in Dremio, use the [`SHOW FUNCTIONS`](/dremio-cloud/sql/commands/show-functions/) SQL command.
366
+
367
+ ## List Existing Policies
368
+
369
+ To view row-access and column-masking policies, use a [`SELECT` statement](/dremio-cloud/sql/commands/SELECT) with the target table/view, system table, and policies specified.
370
+
371
+ List existing column-masking and row-access policies
372
+
373
+ ```
374
+ SELECT view_name, masking_policies, row_access_policies FROM sys.project.views;
375
+ SELECT table_name, masking_policies, row_access_policies FROM sys.project."tables";
376
+ ```
377
+
378
+ To view all column-masking policies set for a given table, use the [`DESCRIBE TABLE`](/dremio-cloud/sql/commands/describe-table/) command.
379
+
380
+ ## Set a Policy
381
+
382
+ To create a row-access or column-masking policy, you must perform the following steps using the associated SQL commands:
383
+
384
+ 1. Create a new UDF or replace an existing one using the [`CREATE [OR REPLACE] FUNCTION`](/dremio-cloud/sql/commands/create-function/) command.
385
+
386
+ Create or replace UDF
387
+
388
+ ```
389
+ CREATE FUNCTION country_filter (country VARCHAR)
390
+ RETURNS BOOLEAN
391
+ RETURN SELECT query_user()='jdoe@dremio.com' OR (is_member('Accounting') AND country='CA');
392
+
393
+ CREATE FUNCTION id_filter (id INT)
394
+ RETURNS BOOLEAN
395
+ RETURN SELECT id = 1;
396
+ ```
397
+ 2. Grant the [EXECUTE privilege](/dremio-cloud/security/privileges) to the roles or users that will apply the policy.
398
+
399
+ Grant EXECUTE privilege
400
+
401
+ ```
402
+ GRANT EXECUTE ON FUNCTION country_filter TO role Policy_Role;
403
+ ```
404
+ 3. To create a policy that applies the function, use `ADD ROW ACCESS POLICY` for row-level access or `SET MASKING POLICY` for column masking. These clauses may be used with the `CREATE TABLE`, `CREATE VIEW`, `ALTER TABLE`, and `ALTER VIEW` commands.
405
+
406
+ Create policy to apply function
407
+
408
+ ```
409
+ -- Add row-access policy
410
+ ALTER TABLE e.employee
411
+ ADD ROW ACCESS POLICY country_filter(country);
412
+
413
+ -- Add column-masking policy
414
+ ALTER VIEW e.employee_view
415
+ MODIFY COLUMN ssn_col SET MASKING POLICY protect_ssn (ssn_col, region);
416
+
417
+ -- Create table with row policy
418
+ CREATE TABLE e.employee(
419
+ id INTEGER,
420
+ ssn VARCHAR(11),
421
+ country VARCHAR,
422
+ ROW ACCESS POLICY country_filter(country)
423
+ );
424
+
425
+ -- Create view with masking policy
426
+ CREATE VIEW e.employee_view(
427
+ ssn_col VARCHAR MASKING POLICY protect_ssn (ssn_col, region),
428
+ region VARCHAR,
429
+ state_col VARCHAR
430
+ );
431
+ ```
432
+
433
+ note
434
+
435
+ Both row-access and column-masking UDFs may be applied in a single security policy, or set individually.
436
+
437
+ ## Drop a Policy
438
+
439
+ To remove a security policy from a table, view, or column, use `UNSET MASKING POLICY` or `DROP ROW ACCESS POLICY` with `ALTER TABLE` or `ALTER VIEW`.
440
+
441
+ Remove security policy
442
+
443
+ ```
444
+ ALTER TABLE e.employee DROP ROW ACCESS POLICY country_filter(country);
445
+ ALTER VIEW e.employee_view MODIFY COLUMN ssn_col UNSET MASKING POLICY protect_ssn;
446
+ ```
447
+
448
+ ## Examples of UDFs
449
+
450
+ The following are examples of user-defined functions that you may create with Dremio.
451
+
452
+ ### Column-Masking Policies
453
+
454
+ Redact SSN
455
+
456
+ ```
457
+ CREATE FUNCTION
458
+ protect_ssn (val VARCHAR)
459
+ RETURNS VARCHAR
460
+ RETURN
461
+ SELECT
462
+ CASE
463
+ WHEN query_user() IN ('jdoe@dremio.com','janders@dremio.com')
464
+ OR is_member('Accounting') THEN val
465
+ ELSE CONCAT('XXX-XX-',SUBSTR(val,8,4))
466
+ END;
467
+ ```
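The masking logic in the Redact SSN example can be traced outside Dremio with a small model. The following is a minimal sketch in plain Python, not Dremio code: the `user` and `groups` parameters are hypothetical stand-ins for what `query_user()` and `is_member()` return inside the UDF, and the sample SSN is made up.

```python
def protect_ssn(val: str, user: str, groups: set[str]) -> str:
    """Model of the protect_ssn UDF: privileged callers see the full value."""
    # Stand-in for: query_user() IN ('jdoe@dremio.com', 'janders@dremio.com')
    privileged_users = {"jdoe@dremio.com", "janders@dremio.com"}
    # Stand-in for: is_member('Accounting')
    if user in privileged_users or "Accounting" in groups:
        return val
    # SQL SUBSTR(val, 8, 4) is 1-based, so it maps to the 0-based slice val[7:11].
    return "XXX-XX-" + val[7:11]


# A privileged user gets the original value back.
print(protect_ssn("123-45-6789", "jdoe@dremio.com", set()))
# Any other user outside the Accounting group gets the redacted form.
print(protect_ssn("123-45-6789", "guest@example.com", {"Marketing"}))
```

The slice comment is the detail worth checking when adapting the UDF: the start position must point at the first digit after the second hyphen, otherwise the masked value keeps a hyphen or drops a digit.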
468
+
469
+ Use column-masking and row-access policies
470
+
471
+ ```
472
+ CREATE FUNCTION lower_country(country VARCHAR)
473
+ RETURNS VARCHAR
474
+ RETURN SELECT lower(country);
475
+
476
+ CREATE FUNCTION country_filter (country VARCHAR)
477
+ RETURNS BOOLEAN
478
+ RETURN SELECT query_user()='dremio'
479
+ OR (is_member('Accounting')
480
+ AND country='CA');
481
+
482
+ CREATE FUNCTION protect_ssn (ssn VARCHAR(11))
483
+ RETURNS VARCHAR(11)
484
+ RETURN SELECT CASE WHEN query_user()='dremio' OR is_member('Accounting') THEN ssn
485
+ ELSE CONCAT('XXX-XX-', SUBSTR(ssn,8,4))
486
+ END;
487
+
488
+ CREATE FUNCTION salary_range (salary FLOAT, id INTEGER)
489
+ RETURNS BOOLEAN
490
+ RETURN SELECT CASE WHEN id > 1 AND salary > 10000 THEN true
491
+ ELSE false
492
+ END;
493
+ ```
494
+
495
+ Use STRUCT
496
+
497
+ ```
498
+ -- Apply a masking policy to a STRUCT column
499
+ CREATE TABLE nas.struct_demo (emp_info struct <name : VARCHAR>);
500
+ INSERT INTO nas.struct_demo VALUES(SELECT convert_from('{"name":"a"}', 'json'));
501
+ CREATE FUNCTION hello(nameCol struct<name:VARCHAR>) RETURNS struct<name:VARCHAR> RETURN SELECT nameCol;
502
+ ALTER TABLE nas.struct_demo MODIFY COLUMN emp_info SET MASKING POLICY hello(emp_info);
503
+ ```
504
+
505
+ Use LIST
506
+
507
+ ```
508
+ CREATE FUNCTION hello_country(countryList LIST<VARCHAR>) RETURNS VARCHAR RETURN SELECT 'Hello World';
509
+ ALTER TABLE "test.json" MODIFY COLUMN country SET MASKING POLICY hello_country(country);
510
+ ```
511
+
512
+ ### Row-Access Policies
513
+
514
+ Use simple filter expressions
515
+
516
+ ```
517
+ CREATE FUNCTION country_filter (country VARCHAR)
518
+ RETURNS BOOLEAN
519
+ RETURN SELECT country='CA';
520
+ ```
521
+
522
+ Match users
523
+
524
+ ```
525
+ CREATE FUNCTION query_1(my_value varchar)
526
+ RETURNS BOOLEAN
527
+ RETURN SELECT CASE
528
+ WHEN current_user = 'jdoe@dremio.com' THEN true
529
+ ELSE false
530
+ END;
531
+ ```
532
+
533
+ ### Table-Driven Policy with a Subquery
534
+
535
+ Use a subquery as a table-driven policy
536
+
537
+ ```
538
+ DROP TABLE IF EXISTS <catalog-name>.salesmanagerregions;
539
+ CREATE TABLE <catalog-name>.salesmanagerregions (
540
+ sales_manager varchar,
541
+ sales_region varchar
542
+ );
543
+
544
+ INSERT INTO <catalog-name>.salesmanagerregions
545
+ VALUES ('john.smith@example.com', 'WW'),
546
+ ('jane.doe@example.com', 'NA'),
547
+ ('viktor.jones@example.com', 'EU');
548
+
549
+ CREATE TABLE <catalog-name>.revenue (
550
+ company varchar,
551
+ region varchar,
552
+ revenue decimal(18,2)
553
+ );
554
+
555
+ INSERT INTO <catalog-name>.revenue
556
+ VALUES ('Acme', 'EU', 2.5),
557
+ ('Acme', 'NA', 1.5);
558
+
559
+ CREATE OR REPLACE FUNCTION security.sales_policy (sales_region_in varchar) RETURNS BOOLEAN
560
+ RETURN SELECT is_member('sales_executive_role')
561
+ OR EXISTS (
562
+ SELECT 1 FROM <catalog-name>.salesmanagerregions
563
+ WHERE user() = sales_manager
564
+ AND sales_region = sales_region_in
565
+ );
566
+
567
+ ALTER TABLE <catalog-name>.revenue
568
+ ADD ROW ACCESS POLICY security.sales_policy(region);
569
+
570
+ SELECT * FROM <catalog-name>.revenue;
571
+ -- company, region, revenue
572
+ -- Acme, NA, 1.50
573
+ ```
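To see which rows the table-driven policy above lets through for different callers, the lookup can be modeled in a few lines. This is a hedged sketch in plain Python, not Dremio code: `visible_rows`, the `user` argument, and the `roles` set are hypothetical stand-ins for the query, `user()`, and `is_member()`; the data mirrors the `INSERT` statements in the example.

```python
# Mirrors <catalog-name>.salesmanagerregions from the example above.
SALES_MANAGER_REGIONS = [
    ("john.smith@example.com", "WW"),
    ("jane.doe@example.com", "NA"),
    ("viktor.jones@example.com", "EU"),
]

# Mirrors <catalog-name>.revenue from the example above.
REVENUE = [
    ("Acme", "EU", 2.5),
    ("Acme", "NA", 1.5),
]


def sales_policy(user: str, roles: set[str], sales_region_in: str) -> bool:
    """Model of security.sales_policy: role check OR EXISTS lookup."""
    if "sales_executive_role" in roles:
        return True
    return any(
        manager == user and region == sales_region_in
        for manager, region in SALES_MANAGER_REGIONS
    )


def visible_rows(user: str, roles: set[str]) -> list[tuple[str, str, float]]:
    """Rows of REVENUE that pass the row-access policy for this caller."""
    return [row for row in REVENUE if sales_policy(user, roles, row[1])]


print(visible_rows("jane.doe@example.com", set()))
print(visible_rows("boss@example.com", {"sales_executive_role"}))
```

Under this model, the single `Acme, NA` row shown in the example output corresponds to a caller mapped to the `NA` region, while a member of `sales_executive_role` would see both rows.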
574
+
575
+ ## Use Reflections on Datasets with Policies
576
+
577
+ Dremio supports Reflection creation on views and tables with row-access and column-masking policies defined on any of the underlying anchor datasets. See the following examples.
578
+
579
+ Example of a view with a row-access policy and a raw Reflection
580
+
581
+ ```
582
+ -- Create nested views
583
+ CREATE OR REPLACE VIEW myView AS
584
+ SELECT city, state, pop FROM Samples."samples.dremio.com"."zips.json"
585
+ WHERE pop > 10000;
586
+ CREATE OR REPLACE VIEW myView2 AS
587
+ SELECT city, state FROM myView
588
+ WHERE STARTS_WITH(city, 'A');
589
+
590
+ -- Create a raw Reflection on the inner view
591
+ ALTER TABLE myView
592
+ CREATE RAW REFLECTION myReflection
593
+ USING DISPLAY(city, state);
594
+
595
+ -- Query the view after the Reflection is created
596
+ SELECT * FROM myView2;
597
+
598
+ -- Create a UDF
599
+ CREATE OR REPLACE FUNCTION isMA(state VARCHAR)
600
+ RETURNS BOOLEAN
601
+ RETURN SELECT CASE WHEN IS_MEMBER('hr') THEN state='MA'
602
+ ELSE NULL
603
+ END;
604
+
605
+ -- Add a row-access policy and query the view
606
+ ALTER TABLE myView
607
+ ADD ROW ACCESS POLICY isMA("state");
608
+ SELECT * FROM myView2;
609
+ ```
610
+
611
+ After running the last query, the Reflection is used to accelerate the query as shown in the results below:
612
+
613
+ ![](/assets/images/rcac_reflection_accelerated-31f0960f65be2a237384c0bd0956681f.png)
614
+
615
+ The `Query1` results show that the row-access policy has been applied successfully:
616
+
617
+ ![](/assets/images/rcac_reflection_policy-e386c1d9134a00081efd62fe472e3edb.png)
618
+
619
+ The `Query2` results are not visible to users who are not members of HR:
620
+
621
+ ![](/assets/images/rcac_reflection_accelerated_nonmember-28eade94013c96a7ec2ba42c29b2b67d.png)
622
+
623
+ The `Query2` results are visible to users who are members of HR:
624
+
625
+ ![](/assets/images/rcac_reflection_accelerated_member-9c4fd2be39fe181162620c678e99766c.png)
626
+
627
+ Example of a table with a row-access policy and an aggregation Reflection
628
+
629
+ ```
630
+ ALTER TABLE NAS.rcac.employee
631
+ ADD ROW ACCESS POLICY is_recent_employee(hire_date);
632
+ ALTER TABLE NAS.rcac.employee
633
+ CREATE AGGREGATE REFLECTION ar_tvrf_1 USING DIMENSIONS(hire_date);
634
+ SELECT MIN(SALARY) FROM NAS.rcac.employee
635
+ GROUP BY hire_date;
636
+ ```
637
+
638
+ ### Limitations
639
+
640
+ Reflections are not supported on datasets with row-access or column-masking policies in the following cases:
641
+
642
+ * Policies with Multiple Arguments
643
+ * Aggregates on Masked Columns
644
+ * SET Operations
645
+ * NULL Generating JOINs
646
+ * Trimming Projects
647
+
648
+ #### Policies with Multiple Arguments
649
+
650
+ If a policy on an anchor dataset contains multiple columns, the Reflection created on the view containing the policy fails. See the following example:
651
+
652
+ Example of the limitation
653
+
654
+ ```
655
+ -- Create tables
656
+ CREATE TABLE employees (
657
+ id INT,
658
+ hire_date DATE,
659
+ ssn VARCHAR(11),
660
+ name VARCHAR,
661
+ country VARCHAR,
662
+ salary FLOAT,
663
+ job_id INT);
664
+ CREATE TABLE jobs (
665
+ id INT,
666
+ title VARCHAR,
667
+ is_good BOOLEAN);
668
+
669
+ -- Create a view
670
+ CREATE VIEW job_salary_in_the_usa AS
671
+ SELECT job_id, salary
672
+ FROM employees
673
+ WHERE country = 'USA';
674
+
675
+ -- Create a UDF
676
+ CREATE OR REPLACE FUNCTION hide_salary_on_bad_job(salary FLOAT, job_id_in INT)
677
+ RETURNS FLOAT
678
+ RETURN SELECT CASE WHEN IS_MEMBER('public') AND (
679
+ SELECT is_good FROM jobs j WHERE job_id_in = j.id)
680
+ THEN NULL
681
+ ELSE salary
682
+ END;
683
+
684
+ -- Add a column-masking policy
685
+ ALTER TABLE employees
686
+ MODIFY COLUMN salary
687
+ SET MASKING POLICY hide_salary_on_bad_job(salary, job_id);
688
+
689
+ -- Create a raw Reflection on the view
690
+ ALTER DATASET job_salary_in_the_usa
691
+ CREATE RAW REFLECTION job_salary_drr USING DISPLAY(job_id, salary);
692
+ ```
693
+
694
+ In the above example, the `job_salary_drr` Reflection fails to materialize due to the multi-argument policy on `test.tables.employees::salary`.
695
+
696
+ #### Aggregates on Masked Columns
697
+
698
+ You cannot create a Reflection on a view that aggregates a column with a masking policy defined on it.
699
+
700
+ Example of the limitation
701
+
702
+ ```
703
+ CREATE OR REPLACE VIEW myView AS
704
+ SELECT MIN(salary)
705
+ FROM employees;
706
+ ```
707
+
708
+ In the above example, there is a policy defined on `salary`, so you cannot create a Reflection on this view.
709
+
710
+ #### NULL Generating JOINs
711
+
712
+ You can only apply the policy if it's on the “join side” of the join, such as:
713
+
714
+ * Left side of LEFT JOIN
715
+ * Right side of RIGHT JOIN
716
+ * Either side of INNER JOIN
717
+ * Neither side of FULL OUTER JOIN
718
+
719
+ If the policy is not on the "join side", the join generates NULL values for all the entries that didn’t match the join condition.
720
+
721
+ Example of the limitation
722
+
723
+ ```
724
+ CREATE OR REPLACE VIEW myView AS
725
+ SELECT emp.department_id, dept.department_name, emp.name
726
+ FROM employees as emp
727
+ RIGHT JOIN department as dept
728
+ ON emp.department_id = dept.department_id;
729
+ ```
730
+
731
+ In the above example, there is a policy defined on the `employees` table, which is on the left side of the RIGHT JOIN, so you cannot create a Reflection on this view.
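One way to see why the side of the join matters is to compare filtering the policy-protected table before the join with filtering an already joined result. The sketch below uses Python's built-in `sqlite3` module rather than Dremio, with made-up sample rows and a hypothetical `state = 'MA'` predicate standing in for a row-access policy. It writes the join as `department LEFT JOIN employees`, which is equivalent to the `employees RIGHT JOIN department` in the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (department_id INTEGER, department_name TEXT);
CREATE TABLE employees (name TEXT, department_id INTEGER, state TEXT);
INSERT INTO department VALUES (1, 'Sales'), (2, 'Support');
INSERT INTO employees VALUES ('Ann', 1, 'MA'), ('Bob', 2, 'CA');
""")

# Policy applied to employees first, then joined: the unmatched
# department survives with a NULL employee name.
filter_before_join = conn.execute("""
    SELECT d.department_name, e.name
    FROM department AS d
    LEFT JOIN (SELECT * FROM employees WHERE state = 'MA') AS e
      ON e.department_id = d.department_id
    ORDER BY d.department_id
""").fetchall()

# Same predicate applied to a precomputed join: the row is dropped instead.
filter_after_join = conn.execute("""
    SELECT department_name, name
    FROM (
        SELECT d.department_id, d.department_name, e.name, e.state
        FROM department AS d
        LEFT JOIN employees AS e
          ON e.department_id = d.department_id
    )
    WHERE state = 'MA'
    ORDER BY department_id
""").fetchall()

print(filter_before_join)
print(filter_after_join)
conn.close()
```

The two results differ by the NULL-padded row, which suggests why a materialized join cannot simply have the policy applied on top of it when the protected table sits on the NULL-generating side. This is an illustration of the semantics only, not a description of how Dremio plans the query.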
732
+
733
+ #### SET Operations
734
+
735
+ The policy must be defined on all UNION datasets and on the same field.
736
+
737
+ Example of the limitation
738
+
739
+ ```
740
+ CREATE OR REPLACE VIEW myView AS
741
+ SELECT * FROM a
742
+ UNION SELECT * FROM employees
743
+ UNION SELECT * FROM c;
744
+ ```
745
+
746
+ In the above example, there is a policy defined on the `employees` table, so you cannot create a Reflection on this view.
747
+
748
+ #### Trimming Projects
749
+
750
+ In order to create a Reflection on a view, the view should reference all the fields that are part of the row-access and column-masking policies.
751
+
752
+ Example of the limitation
753
+
754
+ ```
755
+ -- Create a UDF
756
+ CREATE OR REPLACE FUNCTION isMA(state VARCHAR)
757
+ RETURNS BOOLEAN
758
+ RETURN SELECT CASE WHEN IS_MEMBER('public') THEN state='MA'
759
+ ELSE NULL
760
+ END;
761
+
762
+ -- Create views
763
+ CREATE OR REPLACE VIEW myView1 AS
764
+ SELECT city, state, pop FROM Samples."samples.dremio.com"."zips.json"
765
+ WHERE pop > 10000;
766
+
767
+ -- Add a row-access policy
768
+ ALTER TABLE myView1
769
+ ADD ROW ACCESS POLICY isMA("state");
770
+
771
+ -- Create views
772
+ CREATE OR REPLACE VIEW myView2 AS
773
+ SELECT * FROM myView1;
774
+ CREATE OR REPLACE VIEW myView3 AS
775
+ SELECT city, pop FROM myView1;
776
+ ```
777
+
780
+ In the above example, you can create a Reflection on `myView2` but not on `myView3` since it trims the `state` column from the view which has a policy defined on it.
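The rule behind this limitation can be expressed as a simple column check. The following is a minimal sketch in plain Python under the assumption that the rule is exactly "every policy column must still be projected by the view"; `can_create_reflection` is a hypothetical helper for illustration, not a Dremio function.

```python
def can_create_reflection(view_columns: list[str], policy_columns: list[str]) -> bool:
    """True when the view keeps every column referenced by the policies."""
    return set(policy_columns) <= set(view_columns)


# The row-access policy isMA("state") references only the state column.
policy_columns = ["state"]

# myView2 selects * from myView1, so it keeps city, state, and pop.
print(can_create_reflection(["city", "state", "pop"], policy_columns))
# myView3 selects only city and pop, trimming the policy column.
print(can_create_reflection(["city", "pop"], policy_columns))
```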
781
+
+
799
+ <div style="page-break-after: always;"></div>
800
+