iparq 0.1.5__tar.gz → 0.1.7__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,12 @@
+ # Copilot instructions
+
+ ## Package management
+
+ Use uv for package management.
+
+ ## Python
+
+ Since this is a CLI tool, use Typer to build the command-line interface.
+
+ Use Pydantic for data validation and settings management.
+ Use pytest for testing.
@@ -171,3 +171,4 @@ cython_debug/
  .pypirc
  .github/.DS_Store
  yellow_tripdata_2024-01.parquet
+ filter.parquet
@@ -0,0 +1,32 @@
+ # Contributing to iparq
+
+ Thank you for considering contributing to iparq! We're excited to collaborate with you. Here are some guidelines to help you get started:
+
+ ## How to Contribute
+
+ 1. **Fork the repository**: Click the "Fork" button at the top right of this page to create a copy of the repository.
+ 2. **Clone your fork**: Use `git clone <your-fork-url>` to clone your forked repository to your local machine.
+ 3. **Create a branch**: Use `git checkout -b <branch-name>` to create a new branch for your changes.
+ 4. **Make your changes**: Make the necessary changes in your local repository.
+ 5. **Commit your changes**: Use `git commit -m "Description of changes"` to commit your changes.
+ 6. **Push your changes**: Use `git push origin <branch-name>` to push your changes to your forked repository.
+ 7. **Create a pull request**: Go to the original repository and open a pull request from the branch on your fork.
+
+ ## Guidelines
+
+ - **Code of Conduct**: Please adhere to our [Code of Conduct](CODE_OF_CONDUCT.md) to ensure a welcoming and friendly environment.
+ - **Documentation**: Ensure your code changes are well-documented. Update any relevant documentation in the `docs` folder.
+ - **Tests**: Include tests for your changes to ensure functionality and avoid regressions.
+ - **Commit Messages**: Write clear and concise commit messages. Follow the format: `type(scope): message`.
+
+ ## Reporting Issues
+
+ If you encounter any issues or bugs, please open an issue in the repository. Provide as much detail as possible, including steps to reproduce the issue and any relevant logs or screenshots.
+
+ ## License
+
+ By contributing to this project, you agree that your contributions will be licensed under the [MIT License](LICENSE).
+
+ Thank you for your contributions and support!
+
+ Happy coding!
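The commit format in the guidelines can be checked mechanically. The block below is a minimal sketch, not a tool the project ships: `COMMIT_PATTERN` and `parse_commit_subject` are hypothetical names, and the pattern assumes lowercase types, a required scope without spaces or parentheses, and exactly one space after the colon, because the guide does not list the allowed types.

```python
import re

# Hypothetical checker for the `type(scope): message` commit format.
# Assumptions (not stated in the guide): type is lowercase letters, scope is
# required and has no whitespace or parentheses, one space follows the colon.
COMMIT_PATTERN = re.compile(
    r"^(?P<type>[a-z]+)\((?P<scope>[^()\s]+)\): (?P<message>\S.*)$"
)


def parse_commit_subject(subject: str):
    """Return (type, scope, message) if the subject matches, else None."""
    match = COMMIT_PATTERN.match(subject)
    if match is None:
        return None
    return match.group("type"), match.group("scope"), match.group("message")


if __name__ == "__main__":
    print(parse_commit_subject("feat(cli): show bloom filter info"))
    print(parse_commit_subject("updated stuff"))
```

Running it prints the parsed tuple for the first subject and `None` for the second; a project that allows an optional scope would make the `\(...\)` group optional.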
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: iparq
- Version: 0.1.5
+ Version: 0.1.7
  Summary: Display version and compression information about a parquet file
  Author-email: MiguelElGallo <miguel.zurcher@gmail.com>
  License-File: LICENSE
@@ -26,6 +26,10 @@ Description-Content-Type: text/markdown
  ![alt text](media/iparq.png)
  After reading [this blog](https://duckdb.org/2025/01/22/parquet-encodings.html), I began to wonder which Parquet version and compression methods the everyday tools we rely on actually use, only to find that there’s no straightforward way to determine this. That curiosity and the difficulty of quickly discovering such details motivated me to create iparq (Information Parquet). My goal with iparq is to help users easily identify the specifics of the Parquet files generated by different engines, making it clear which features—like newer encodings or certain compression algorithms—the creator of the parquet is using.
 
+ ***New*** Bloom filter information: shows which columns in each row group have bloom filters.
+ Read more about bloom filters in this [great article](https://duckdb.org/2025/03/07/parquet-bloom-filters-in-duckdb.html).
+
+
  ## Installation
 
  ### Using pip
@@ -80,7 +84,63 @@ iparq <filename>
 
  Replace `<filename>` with the path to your .parquet file. The utility will read the metadata of the file and print the compression codecs used in the parquet file.
 
- ## Example output
+ ## Example output - Bloom Filters
+
+ ```log
+ ParquetMetaModel(
+ created_by='DuckDB version v1.2.1 (build 8e52ec4395)',
+ num_columns=1,
+ num_rows=100000000,
+ num_row_groups=10,
+ format_version='1.0',
+ serialized_size=1196
+ )
+ Column Compression Info:
+ Row Group 0:
+ Column 'r' (Index 0): SNAPPY
+ Row Group 1:
+ Column 'r' (Index 0): SNAPPY
+ Row Group 2:
+ Column 'r' (Index 0): SNAPPY
+ Row Group 3:
+ Column 'r' (Index 0): SNAPPY
+ Row Group 4:
+ Column 'r' (Index 0): SNAPPY
+ Row Group 5:
+ Column 'r' (Index 0): SNAPPY
+ Row Group 6:
+ Column 'r' (Index 0): SNAPPY
+ Row Group 7:
+ Column 'r' (Index 0): SNAPPY
+ Row Group 8:
+ Column 'r' (Index 0): SNAPPY
+ Row Group 9:
+ Column 'r' (Index 0): SNAPPY
+ Bloom Filter Info:
+ Row Group 0:
+ Column 'r' (Index 0): Has bloom filter
+ Row Group 1:
+ Column 'r' (Index 0): Has bloom filter
+ Row Group 2:
+ Column 'r' (Index 0): Has bloom filter
+ Row Group 3:
+ Column 'r' (Index 0): Has bloom filter
+ Row Group 4:
+ Column 'r' (Index 0): Has bloom filter
+ Row Group 5:
+ Column 'r' (Index 0): Has bloom filter
+ Row Group 6:
+ Column 'r' (Index 0): Has bloom filter
+ Row Group 7:
+ Column 'r' (Index 0): Has bloom filter
+ Row Group 8:
+ Column 'r' (Index 0): Has bloom filter
+ Row Group 9:
+ Column 'r' (Index 0): Has bloom filter
+ Compression codecs: {'SNAPPY'}
+ ```
+
+ ## Example output
 
  ```log
  ParquetMetaModel(
@@ -9,6 +9,10 @@
  ![alt text](media/iparq.png)
  After reading [this blog](https://duckdb.org/2025/01/22/parquet-encodings.html), I began to wonder which Parquet version and compression methods the everyday tools we rely on actually use, only to find that there’s no straightforward way to determine this. That curiosity and the difficulty of quickly discovering such details motivated me to create iparq (Information Parquet). My goal with iparq is to help users easily identify the specifics of the Parquet files generated by different engines, making it clear which features—like newer encodings or certain compression algorithms—the creator of the parquet is using.
 
+ ***New*** Bloom filter information: shows which columns in each row group have bloom filters.
+ Read more about bloom filters in this [great article](https://duckdb.org/2025/03/07/parquet-bloom-filters-in-duckdb.html).
+
+
  ## Installation
 
  ### Using pip
@@ -63,7 +67,63 @@ iparq <filename>
 
  Replace `<filename>` with the path to your .parquet file. The utility will read the metadata of the file and print the compression codecs used in the parquet file.
 
- ## Example output
+ ## Example output - Bloom Filters
+
+ ```log
+ ParquetMetaModel(
+ created_by='DuckDB version v1.2.1 (build 8e52ec4395)',
+ num_columns=1,
+ num_rows=100000000,
+ num_row_groups=10,
+ format_version='1.0',
+ serialized_size=1196
+ )
+ Column Compression Info:
+ Row Group 0:
+ Column 'r' (Index 0): SNAPPY
+ Row Group 1:
+ Column 'r' (Index 0): SNAPPY
+ Row Group 2:
+ Column 'r' (Index 0): SNAPPY
+ Row Group 3:
+ Column 'r' (Index 0): SNAPPY
+ Row Group 4:
+ Column 'r' (Index 0): SNAPPY
+ Row Group 5:
+ Column 'r' (Index 0): SNAPPY
+ Row Group 6:
+ Column 'r' (Index 0): SNAPPY
+ Row Group 7:
+ Column 'r' (Index 0): SNAPPY
+ Row Group 8:
+ Column 'r' (Index 0): SNAPPY
+ Row Group 9:
+ Column 'r' (Index 0): SNAPPY
+ Bloom Filter Info:
+ Row Group 0:
+ Column 'r' (Index 0): Has bloom filter
+ Row Group 1:
+ Column 'r' (Index 0): Has bloom filter
+ Row Group 2:
+ Column 'r' (Index 0): Has bloom filter
+ Row Group 3:
+ Column 'r' (Index 0): Has bloom filter
+ Row Group 4:
+ Column 'r' (Index 0): Has bloom filter
+ Row Group 5:
+ Column 'r' (Index 0): Has bloom filter
+ Row Group 6:
+ Column 'r' (Index 0): Has bloom filter
+ Row Group 7:
+ Column 'r' (Index 0): Has bloom filter
+ Row Group 8:
+ Column 'r' (Index 0): Has bloom filter
+ Row Group 9:
+ Column 'r' (Index 0): Has bloom filter
+ Compression codecs: {'SNAPPY'}
+ ```
+
+ ## Example output
 
  ```log
  ParquetMetaModel(
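The README's new bloom filter line points to a data structure that may be unfamiliar. A bloom filter answers "could this value be in this column chunk?" with no false negatives but occasional false positives, which lets a reader skip row groups that cannot contain a looked-up value. Below is a minimal textbook sketch in Python, assuming SHA-256 salted with the hash index to derive the k bit positions. The class and method names are hypothetical, and the Parquet format itself specifies a split-block variant hashed with xxHash (XXH64), so this sketch neither reads nor writes Parquet bloom filters.

```python
import hashlib


class BloomFilter:
    """Textbook Bloom filter: a bit array plus k hash functions (illustrative only)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # all bits start at 0

    def _positions(self, value: str):
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value: str) -> None:
        # Set every bit the value hashes to.
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(value))


if __name__ == "__main__":
    bf = BloomFilter(num_bits=1024, num_hashes=3)
    for value in ["alpha", "beta"]:
        bf.add(value)
    print(bf.might_contain("alpha"))  # True: added values are never missed
    print(bf.might_contain("gamma"))  # usually False, but a false positive is possible
```

The trade-off is tunable: more bits per value lower the false-positive rate at the cost of a larger filter stored in the file.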
@@ -1,6 +1,6 @@
  [project]
  name = "iparq"
- version = "0.1.5"
+ version = "0.1.7"
  description = "Display version and compression information about a parquet file"
  readme = "README.md"
  authors = [
@@ -120,6 +120,45 @@ def print_compression_types(parquet_metadata) -> None:
  pass
 
 
+ def print_bloom_filter_info(parquet_metadata) -> None:
+     """
+     Prints information about bloom filters for each column in each row group of the Parquet file.
+     """
+     try:
+         num_row_groups = parquet_metadata.num_row_groups
+         num_columns = parquet_metadata.num_columns
+         has_bloom_filters = False
+
+         console.print("[bold underline]Bloom Filter Info:[/bold underline]")
+
+         for i in range(num_row_groups):
+             row_group = parquet_metadata.row_group(i)
+             bloom_filters_in_group = False
+
+             for j in range(num_columns):
+                 column_chunk = row_group.column(j)
+                 column_name = parquet_metadata.schema.column(j).name
+
+                 # Proxy check: is_stats_set reports whether column statistics are present, not bloom filters specifically
+                 if hasattr(column_chunk, "is_stats_set") and column_chunk.is_stats_set:
+                     if not bloom_filters_in_group:
+                         console.print(f"[bold]Row Group {i}:[/bold]")
+                         bloom_filters_in_group = True
+                     has_bloom_filters = True
+                     console.print(
+                         f" Column '{column_name}' (Index {j}): [green]Has bloom filter[/green]"
+                     )
+
+         if not has_bloom_filters:
+             console.print(" [italic]No bloom filters found in any column[/italic]")
+
+     except Exception as e:
+         console.print(
+             f"Error while printing bloom filter information: {e}",
+             style="blink bold red underline on white",
+         )
+
+
  @app.command()
  def main(filename: str):
  """
@@ -135,6 +174,7 @@ def main(filename: str):
 
  print_parquet_metadata(parquet_metadata)
  print_compression_types(parquet_metadata)
+ print_bloom_filter_info(parquet_metadata)
  print(f"Compression codecs: {compression}")
 
 
File without changes
File without changes
File without changes
File without changes
File without changes
File without changes
File without changes
File without changes
File without changes
File without changes
File without changes