parqv 0.1.0__tar.gz → 0.2.0__tar.gz
- parqv-0.2.0/PKG-INFO +104 -0
- parqv-0.2.0/README.md +88 -0
- {parqv-0.1.0 → parqv-0.2.0}/pyproject.toml +4 -3
- parqv-0.2.0/src/parqv/app.py +168 -0
- parqv-0.2.0/src/parqv/handlers/__init__.py +13 -0
- parqv-0.2.0/src/parqv/handlers/base_handler.py +114 -0
- parqv-0.2.0/src/parqv/handlers/json.py +450 -0
- parqv-0.2.0/src/parqv/handlers/parquet.py +640 -0
- parqv-0.2.0/src/parqv/views/metadata_view.py +26 -0
- parqv-0.2.0/src/parqv/views/schema_view.py +246 -0
- parqv-0.2.0/src/parqv.egg-info/PKG-INFO +104 -0
- {parqv-0.1.0 → parqv-0.2.0}/src/parqv.egg-info/SOURCES.txt +4 -2
- {parqv-0.1.0 → parqv-0.2.0}/src/parqv.egg-info/requires.txt +1 -0
- parqv-0.1.0/PKG-INFO +0 -91
- parqv-0.1.0/README.md +0 -76
- parqv-0.1.0/src/parqv/app.py +0 -131
- parqv-0.1.0/src/parqv/parquet_handler.py +0 -389
- parqv-0.1.0/src/parqv/views/metadata_view.py +0 -19
- parqv-0.1.0/src/parqv/views/row_group_view.py +0 -33
- parqv-0.1.0/src/parqv/views/schema_view.py +0 -187
- parqv-0.1.0/src/parqv.egg-info/PKG-INFO +0 -91
- {parqv-0.1.0 → parqv-0.2.0}/LICENSE +0 -0
- {parqv-0.1.0 → parqv-0.2.0}/setup.cfg +0 -0
- {parqv-0.1.0 → parqv-0.2.0}/src/parqv/__init__.py +0 -0
- {parqv-0.1.0 → parqv-0.2.0}/src/parqv/parqv.css +0 -0
- {parqv-0.1.0 → parqv-0.2.0}/src/parqv/views/__init__.py +0 -0
- {parqv-0.1.0 → parqv-0.2.0}/src/parqv/views/data_view.py +0 -0
- {parqv-0.1.0 → parqv-0.2.0}/src/parqv.egg-info/dependency_links.txt +0 -0
- {parqv-0.1.0 → parqv-0.2.0}/src/parqv.egg-info/entry_points.txt +0 -0
- {parqv-0.1.0 → parqv-0.2.0}/src/parqv.egg-info/top_level.txt +0 -0
parqv-0.2.0/PKG-INFO
ADDED
@@ -0,0 +1,104 @@
+Metadata-Version: 2.4
+Name: parqv
+Version: 0.2.0
+Summary: An interactive Python TUI for visualizing, exploring, and analyzing files directly in your terminal.
+Author-email: Sangmin Yoon <sanspareilsmyn@gmail.com>
+License-Expression: Apache-2.0
+Requires-Python: >=3.10
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: textual>=1.0.0
+Requires-Dist: pyarrow>=16.0.0
+Requires-Dist: pandas>=2.0.0
+Requires-Dist: numpy>=1.20.0
+Requires-Dist: duckdb>=1.2.0
+Dynamic: license-file
+
+# parqv
+
+[](https://www.python.org/)
+[](LICENSE)
+[](https://badge.fury.io/py/parqv) <!-- TODO: Link after first PyPI release -->
+[](https://textual.textualize.io/)
+
+---
+
+**Supported File Formats:** ✅ **Parquet** | ✅ **JSON** / **JSON Lines (ndjson)** | *(More planned!)*
+
+---
+
+**`parqv` is a Python-based interactive TUI (Text User Interface) tool designed to explore, analyze, and understand various data file formats directly within your terminal.** Initially supporting Parquet and JSON, `parqv` aims to provide a unified, visual experience for quick data inspection without leaving your console.
+
+## 💻 Demo (Showing Parquet)
+
+
+*(Demo shows Parquet features; UI adapts for other formats)*
+
+## 🤔 Why `parqv`?
+1. **Unified Interface:** Launch `parqv <your_data_file>` to access **metadata, schema, data preview, and column statistics** all within a single, navigable terminal window. No more juggling different commands for different file types.
+2. **Interactive Exploration:**
+    * **🖱️ Keyboard & Mouse Driven:** Navigate using familiar keys (arrows, `hjkl`, Tab) or even your mouse (thanks to `Textual`).
+    * **📜 Scrollable Views:** Easily scroll through large schemas, data tables, or column lists.
+    * **🌲 Clear Schema View:** Understand column names, data types, and nullability at a glance. (Complex nested structures visualization might vary by format).
+    * **📊 Dynamic Stats:** Select a column and instantly see its detailed statistics (counts, nulls, min/max, mean, distinct values, etc.).
+3. **Cross-Format Consistency:**
+    * **🎨 Rich Display:** Leverages `rich` and `Textual` for colorful, readable tables and text across supported formats.
+    * **📈 Quick Stats:** Get key statistical insights consistently, regardless of the underlying file type.
+    * **🔌 Extensible:** Designed with a handler interface to easily add support for more file formats in the future (like CSV, Arrow IPC, etc.).
+
+## ✨ Features (TUI Mode)
+* **Multi-Format Support:** Currently supports **Parquet** (`.parquet`) and **JSON/JSON Lines** (`.json`, `.ndjson`). Run `parqv <your_file.{parquet,json,ndjson}>`.
+* **Metadata Panel:** Displays key file information (path, format, size, total rows, column count, etc.). *Fields may vary slightly depending on the file format.*
+* **Schema Explorer:**
+    * Interactive list view of columns.
+    * Clearly shows column names, data types, and nullability.
+* **Data Table Viewer:**
+    * Scrollable table preview of the file's data.
+    * Attempts to preserve data types for better representation.
+* **Column Statistics Viewer:**
+    * Select a column in the Schema tab to view detailed statistics.
+    * Shows counts (total, valid, null), percentages, and type-specific stats (min/max, mean, stddev, distinct counts, length stats, boolean value counts where applicable).
+* **Row Group Inspector (Parquet Specific):**
+    * *This panel only appears when viewing Parquet files.*
+    * Lists row groups with stats (row count, compressed/uncompressed size).
+    * (Planned) Select a row group for more details.
+
+## 🚀 Getting Started
+
+**1. Prerequisites:**
+* **Python:** Version 3.10 or higher.
+* **pip:** The Python package installer.
+
+**2. Install `parqv`:**
+* Open your terminal and run:
+    ```bash
+    pip install parqv
+    ```
+    *(This will also install dependencies like `textual`, `pyarrow`, `pandas`, and `duckdb`)*
+* **Updating `parqv`:**
+    ```bash
+    pip install --upgrade parqv
+    ```
+
+**3. Run `parqv`:**
+* Point `parqv` to your data file:
+    ```bash
+    #parquet
+    parqv /path/to/your/data.parquet
+
+    # json
+    parqv /path/to/your/data.json
+* The interactive TUI will launch. Use your keyboard (and mouse, if supported by your terminal) to navigate:
+    * **Arrow Keys / `j`,`k` (in lists):** Move selection up/down.
+    * **`Tab` / `Shift+Tab`:** Cycle focus between the main tab content and potentially other areas. (Focus handling might evolve).
+    * **`Enter` (in column list):** Select a column to view statistics.
+    * **View Switching:** Use `Ctrl+N` (Next Tab) and `Ctrl+P` (Previous Tab) or click on the tabs (Metadata, Schema, Data Preview).
+    * **Scrolling:** Use `PageUp` / `PageDown` / `Home` / `End` or arrow keys/mouse wheel within scrollable areas (like Schema stats or Data Preview).
+    * **`q` / `Ctrl+C`:** Quit `parqv`.
+    * *(Help Screen `?` is planned)*
+
+---
+
+## 📄 License
+
+Licensed under the Apache License, Version 2.0. See [LICENSE](LICENSE) for the full license text.
parqv-0.2.0/README.md
ADDED
@@ -0,0 +1,88 @@
+# parqv
+
+[](https://www.python.org/)
+[](LICENSE)
+[](https://badge.fury.io/py/parqv) <!-- TODO: Link after first PyPI release -->
+[](https://textual.textualize.io/)
+
+---
+
+**Supported File Formats:** ✅ **Parquet** | ✅ **JSON** / **JSON Lines (ndjson)** | *(More planned!)*
+
+---
+
+**`parqv` is a Python-based interactive TUI (Text User Interface) tool designed to explore, analyze, and understand various data file formats directly within your terminal.** Initially supporting Parquet and JSON, `parqv` aims to provide a unified, visual experience for quick data inspection without leaving your console.
+
+## 💻 Demo (Showing Parquet)
+
+
+*(Demo shows Parquet features; UI adapts for other formats)*
+
+## 🤔 Why `parqv`?
+1. **Unified Interface:** Launch `parqv <your_data_file>` to access **metadata, schema, data preview, and column statistics** all within a single, navigable terminal window. No more juggling different commands for different file types.
+2. **Interactive Exploration:**
+    * **🖱️ Keyboard & Mouse Driven:** Navigate using familiar keys (arrows, `hjkl`, Tab) or even your mouse (thanks to `Textual`).
+    * **📜 Scrollable Views:** Easily scroll through large schemas, data tables, or column lists.
+    * **🌲 Clear Schema View:** Understand column names, data types, and nullability at a glance. (Complex nested structures visualization might vary by format).
+    * **📊 Dynamic Stats:** Select a column and instantly see its detailed statistics (counts, nulls, min/max, mean, distinct values, etc.).
+3. **Cross-Format Consistency:**
+    * **🎨 Rich Display:** Leverages `rich` and `Textual` for colorful, readable tables and text across supported formats.
+    * **📈 Quick Stats:** Get key statistical insights consistently, regardless of the underlying file type.
+    * **🔌 Extensible:** Designed with a handler interface to easily add support for more file formats in the future (like CSV, Arrow IPC, etc.).
+
+## ✨ Features (TUI Mode)
+* **Multi-Format Support:** Currently supports **Parquet** (`.parquet`) and **JSON/JSON Lines** (`.json`, `.ndjson`). Run `parqv <your_file.{parquet,json,ndjson}>`.
+* **Metadata Panel:** Displays key file information (path, format, size, total rows, column count, etc.). *Fields may vary slightly depending on the file format.*
+* **Schema Explorer:**
+    * Interactive list view of columns.
+    * Clearly shows column names, data types, and nullability.
+* **Data Table Viewer:**
+    * Scrollable table preview of the file's data.
+    * Attempts to preserve data types for better representation.
+* **Column Statistics Viewer:**
+    * Select a column in the Schema tab to view detailed statistics.
+    * Shows counts (total, valid, null), percentages, and type-specific stats (min/max, mean, stddev, distinct counts, length stats, boolean value counts where applicable).
+* **Row Group Inspector (Parquet Specific):**
+    * *This panel only appears when viewing Parquet files.*
+    * Lists row groups with stats (row count, compressed/uncompressed size).
+    * (Planned) Select a row group for more details.
+
+## 🚀 Getting Started
+
+**1. Prerequisites:**
+* **Python:** Version 3.10 or higher.
+* **pip:** The Python package installer.
+
+**2. Install `parqv`:**
+* Open your terminal and run:
+    ```bash
+    pip install parqv
+    ```
+    *(This will also install dependencies like `textual`, `pyarrow`, `pandas`, and `duckdb`)*
+* **Updating `parqv`:**
+    ```bash
+    pip install --upgrade parqv
+    ```
+
+**3. Run `parqv`:**
+* Point `parqv` to your data file:
+    ```bash
+    #parquet
+    parqv /path/to/your/data.parquet
+
+    # json
+    parqv /path/to/your/data.json
+* The interactive TUI will launch. Use your keyboard (and mouse, if supported by your terminal) to navigate:
+    * **Arrow Keys / `j`,`k` (in lists):** Move selection up/down.
+    * **`Tab` / `Shift+Tab`:** Cycle focus between the main tab content and potentially other areas. (Focus handling might evolve).
+    * **`Enter` (in column list):** Select a column to view statistics.
+    * **View Switching:** Use `Ctrl+N` (Next Tab) and `Ctrl+P` (Previous Tab) or click on the tabs (Metadata, Schema, Data Preview).
+    * **Scrolling:** Use `PageUp` / `PageDown` / `Home` / `End` or arrow keys/mouse wheel within scrollable areas (like Schema stats or Data Preview).
+    * **`q` / `Ctrl+C`:** Quit `parqv`.
+    * *(Help Screen `?` is planned)*
+
+---
+
+## 📄 License
+
+Licensed under the Apache License, Version 2.0. See [LICENSE](LICENSE) for the full license text.
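To try the `parqv /path/to/your/data.json` invocation from the README without hunting for a dataset, a small JSON Lines file can be generated with the standard library. This is only a sketch: the file name `sample.ndjson`, the helper `write_sample_ndjson`, and the record fields are invented for illustration and are not part of the package.

```python
import json
from pathlib import Path


def write_sample_ndjson(path: Path, rows: int = 3) -> Path:
    """Write `rows` small JSON objects, one per line (JSON Lines)."""
    with path.open("w", encoding="utf-8") as fh:
        for i in range(rows):
            # Hypothetical fields covering int, str, float and bool columns.
            record = {
                "id": i,
                "name": f"item-{i}",
                "score": i * 1.5,
                "active": i % 2 == 0,
            }
            fh.write(json.dumps(record) + "\n")
    return path


if __name__ == "__main__":
    out = write_sample_ndjson(Path("sample.ndjson"))
    print(f"Wrote {out}; inspect it with: parqv {out}")
```

The `.ndjson` suffix matters here, because the app.py in this release picks its handler from the file extension.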
{parqv-0.1.0 → parqv-0.2.0}/pyproject.toml
@@ -4,8 +4,8 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "parqv"
-version = "0.
-description = "An interactive Python TUI for visualizing, exploring, and analyzing
+version = "0.2.0"
+description = "An interactive Python TUI for visualizing, exploring, and analyzing files directly in your terminal."
 readme = "README.md"
 requires-python = ">=3.10"
 license = "Apache-2.0"
@@ -15,7 +15,8 @@ dependencies = [
     "textual>=1.0.0",
     "pyarrow>=16.0.0",
     "pandas>=2.0.0",
-    "numpy>=1.20.0"
+    "numpy>=1.20.0",
+    "duckdb>=1.2.0"
 ]
 
 [project.scripts]
parqv-0.2.0/src/parqv/app.py
ADDED
@@ -0,0 +1,168 @@
+import logging
+import sys
+from logging.handlers import RotatingFileHandler
+from pathlib import Path
+from typing import Optional, Type
+
+from textual.app import App, ComposeResult, Binding
+from textual.containers import Container
+from textual.widgets import Header, Footer, Static, Label, TabbedContent, TabPane
+
+from .handlers import (
+    DataHandler,
+    DataHandlerError,
+    ParquetHandler,
+    JsonHandler,
+)
+from .views.data_view import DataView
+from .views.metadata_view import MetadataView
+from .views.schema_view import SchemaView
+
+LOG_FILENAME = "parqv.log"
+file_handler = RotatingFileHandler(
+    LOG_FILENAME, maxBytes=1024 * 1024 * 5, backupCount=3, encoding="utf-8"
+)
+logging.basicConfig(
+    level=logging.INFO,
+    format="%(asctime)s [%(levelname)-5.5s] %(name)s (%(filename)s:%(lineno)d) - %(message)s",
+    handlers=[file_handler, logging.StreamHandler(sys.stdout)],
+)
+log = logging.getLogger(__name__)
+
+AnyHandler = DataHandler
+AnyHandlerError = DataHandlerError
+
+
+class ParqV(App[None]):
+    """A Textual app to visualize Parquet or JSON files."""
+
+    CSS_PATH = "parqv.css"
+    BINDINGS = [
+        Binding("q", "quit", "Quit", priority=True),
+    ]
+
+    # App State
+    file_path: Optional[Path] = None
+    handler: Optional[AnyHandler] = None  # Use ABC type hint
+    handler_type: Optional[str] = None  # Keep for display ('parquet', 'json')
+    error_message: Optional[str] = None
+
+    def __init__(self, file_path_str: Optional[str] = None, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+        if not file_path_str:
+            self.error_message = "No file path provided."
+            log.error(self.error_message)
+            return
+
+        self.file_path = Path(file_path_str)
+        log.debug(f"Input file path: {self.file_path}")
+
+        if not self.file_path.is_file():
+            self.error_message = f"File not found or is not a regular file: {self.file_path}"
+            log.error(self.error_message)
+            return
+
+        # Handler Detection
+        handler_class: Optional[Type[AnyHandler]] = None
+        handler_error_class: Type[AnyHandlerError] = DataHandlerError
+        detected_type = "unknown"
+        file_suffix = self.file_path.suffix.lower()
+
+        if file_suffix == ".parquet":
+            log.info("Detected '.parquet' extension, using ParquetHandler.")
+            handler_class = ParquetHandler
+            detected_type = "parquet"
+        elif file_suffix in [".json", ".ndjson"]:
+            log.info(f"Detected '{file_suffix}' extension, using JsonHandler.")
+            handler_class = JsonHandler
+            detected_type = "json"
+        else:
+            self.error_message = f"Unsupported file extension: '{file_suffix}'. Only .parquet, .json, .ndjson are supported."
+            log.error(self.error_message)
+            return
+
+        # Instantiate Handler
+        if handler_class:
+            log.info(f"Attempting to initialize {detected_type.capitalize()} handler for: {self.file_path}")
+            try:
+                self.handler = handler_class(self.file_path)
+                self.handler_type = detected_type
+                log.info(f"{detected_type.capitalize()} handler initialized successfully.")
+            except DataHandlerError as e:
+                self.error_message = f"Failed to initialize {detected_type} handler: {e}"
+                log.error(self.error_message, exc_info=True)
+            except Exception as e:
+                self.error_message = f"An unexpected error occurred during {detected_type} handler initialization: {e}"
+                log.exception(f"Unexpected error during {detected_type} handler initialization:")
+
+    def compose(self) -> ComposeResult:
+        yield Header()
+        if self.error_message:
+            log.error(f"Displaying error message: {self.error_message}")
+            yield Container(
+                Label("Error Loading File:", classes="error-title"),
+                Static(self.error_message, classes="error-content"),
+                id="error-container"
+            )
+        elif self.handler:
+            log.debug(f"Composing main layout with TabbedContent for {self.handler_type} handler.")
+            with TabbedContent(id="main-tabs"):
+                yield TabPane("Metadata", MetadataView(id="metadata-view"), id="tab-metadata")
+                yield TabPane("Schema", SchemaView(id="schema-view"), id="tab-schema")
+                yield TabPane("Data Preview", DataView(id="data-view"), id="tab-data")
+        else:
+            log.error("Compose called but no handler and no error message. Initialization likely failed silently.")
+            yield Container(Label("Initialization failed."), id="init-failed")
+        yield Footer()
+
+    def on_mount(self) -> None:
+        log.debug("App mounted.")
+        try:
+            header = self.query_one(Header)
+            display_name = "N/A"
+            format_name = "Unknown"
+            if self.handler and self.file_path:
+                display_name = self.file_path.name
+                format_name = self.handler_type.capitalize() if self.handler_type else "Unknown"
+                header.title = f"parqv - {display_name}"
+                header.sub_title = f"Format: {format_name}"
+            elif self.error_message:
+                header.title = "parqv - Error"
+            else:
+                header.title = "parqv"
+        except Exception as e:
+            log.error(f"Failed to set header title: {e}")
+
+    def action_quit(self) -> None:
+        log.info("Quit action triggered.")
+        if self.handler:
+            try:
+                self.handler.close()
+            except Exception as e:
+                log.error(f"Error during handler cleanup: {e}")
+        self.exit()
+
+
+# CLI Entry Point
+def run_app():
+    log.info("--- parqv (ABC Handler) started ---")
+    if len(sys.argv) < 2:
+        print("Usage: parqv <path_to_parquet_or_json_file>")
+        log.error("No file path provided.")
+        sys.exit(1)
+
+    file_path_str = sys.argv[1]
+    log.debug(f"File path from argument: {file_path_str}")
+
+    _path = Path(file_path_str)
+    if not _path.suffix.lower() in ['.parquet', '.json', '.ndjson']:
+        print(f"Error: Unsupported file type '{_path.suffix}'. Please provide a .parquet, .json, or .ndjson file.")
+        log.error(f"Unsupported file type provided via CLI: {_path.suffix}")
+        sys.exit(1)
+
+    app = ParqV(file_path_str=file_path_str)
+    app.run()
+
+
+if __name__ == "__main__":
+    run_app()
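Reviewer note: app.py checks the file suffix in two places (`ParqV.__init__` and `run_app`), each with its own hard-coded list. One possible way to keep them in sync is a single lookup table, sketched below. `SUFFIX_TO_FORMAT` and `detect_format` are hypothetical names, not part of parqv; the mapping itself mirrors the `if`/`elif` chain in the diff.

```python
from pathlib import Path
from typing import Optional

# Hypothetical lookup table; app.py expresses the same mapping with if/elif.
SUFFIX_TO_FORMAT = {
    ".parquet": "parquet",
    ".json": "json",
    ".ndjson": "json",
}


def detect_format(file_path: Path) -> Optional[str]:
    """Return the format label for a path, or None if the suffix is unsupported."""
    # Lower-casing matches app.py, so 'DATA.PARQUET' is treated like 'data.parquet'.
    return SUFFIX_TO_FORMAT.get(file_path.suffix.lower())


if __name__ == "__main__":
    for name in ("events.PARQUET", "rows.ndjson", "table.csv"):
        print(name, "->", detect_format(Path(name)))
```

With a table like this, both the CLI pre-check and the handler selection could consult the same source, and adding a format would touch one place.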
parqv-0.2.0/src/parqv/handlers/__init__.py
ADDED
@@ -0,0 +1,13 @@
+# src/parqv/handlers/__init__.py
+from .base_handler import DataHandler, DataHandlerError
+from .parquet import ParquetHandler, ParquetHandlerError
+from .json import JsonHandler, JsonHandlerError
+
+__all__ = [
+    "DataHandler",
+    "DataHandlerError",
+    "ParquetHandler",
+    "ParquetHandlerError",
+    "JsonHandler",
+    "JsonHandlerError",
+]
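Reviewer note: the package exports one error class per handler next to the base `DataHandlerError`, and app.py catches only the base class. The sketch below shows why that works, under the assumption that the format-specific errors subclass `DataHandlerError`; `parquet.py` and `json.py` are not shown in this excerpt, so that relationship is not confirmed here. All classes and functions below are stand-ins, not the package's own code.

```python
class DataHandlerError(Exception):
    """Stand-in for parqv.handlers.DataHandlerError."""


class JsonHandlerError(DataHandlerError):
    """Assumed subclass of the base error (json.py is not shown in this excerpt)."""


def load(fail: bool) -> str:
    """Hypothetical loader that raises the format-specific error on failure."""
    if fail:
        raise JsonHandlerError("could not parse file")
    return "ok"


def describe(fail: bool) -> str:
    """Catch the base class only, as app.py does around handler construction."""
    try:
        return load(fail)
    except DataHandlerError as exc:
        return f"handler error: {exc}"


if __name__ == "__main__":
    print(describe(False))
    print(describe(True))
```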
parqv-0.2.0/src/parqv/handlers/base_handler.py
ADDED
@@ -0,0 +1,114 @@
+import logging
+from abc import ABC, abstractmethod
+from pathlib import Path
+from typing import Any, Dict, List, Optional
+
+import pandas as pd
+
+log = logging.getLogger(__name__)
+
+
+class DataHandlerError(Exception):
+    """Base exception for all data handler errors."""
+    pass
+
+
+class DataHandler(ABC):
+    """
+    Abstract Base Class for data handlers.
+    Defines the common interface required by the ParqV application
+    to interact with different data file formats.
+    """
+
+    def __init__(self, file_path: Path):
+        """
+        Initializes the handler with the file path.
+        Subclasses should open the file or set up necessary resources here.
+
+        Args:
+            file_path: Path to the data file.
+
+        Raises:
+            DataHandlerError: If initialization fails (e.g., file not found, format error).
+        """
+        self.file_path = file_path
+
+    @abstractmethod
+    def close(self) -> None:
+        """
+        Closes any open resources (files, connections, etc.).
+        Must be implemented by subclasses.
+        """
+        pass
+
+    @abstractmethod
+    def get_metadata_summary(self) -> Dict[str, Any]:
+        """
+        Returns a dictionary containing summary metadata about the data source.
+        Keys should be human-readable strings. Values can be of various types.
+        Should include an 'error' key if metadata retrieval fails.
+
+        Returns:
+            A dictionary with metadata summary or an error dictionary.
+        """
+        pass
+
+    @abstractmethod
+    def get_schema_data(self) -> Optional[List[Dict[str, str]]]:
+        """
+        Returns the schema as a list of dictionaries.
+        Each dictionary should represent a column and ideally contain keys:
+            'name' (str): Column name.
+            'type' (str): Formatted data type string.
+            'nullable' (Any): Indicator of nullability (e.g., bool, str "YES"/"NO").
+
+        Returns:
+            A list of schema dictionaries, an empty list if no columns,
+            or None if schema retrieval failed.
+        """
+        pass
+
+    @abstractmethod
+    def get_data_preview(self, num_rows: int = 50) -> Optional[pd.DataFrame]:
+        """
+        Fetches a preview of the data.
+
+        Args:
+            num_rows: The maximum number of rows to fetch.
+
+        Returns:
+            A pandas DataFrame with preview data, an empty DataFrame if no data,
+            a DataFrame with an 'error' column on failure, or None on critical failure.
+        """
+        pass
+
+    @abstractmethod
+    def get_column_stats(self, column_name: str) -> Dict[str, Any]:
+        """
+        Calculates and returns statistics for a specific column.
+        The returned dictionary should ideally contain keys like:
+            'column' (str): Column name.
+            'type' (str): Formatted data type string.
+            'nullable' (Any): Nullability indicator.
+            'calculated' (Dict[str, Any]): Dictionary of computed statistics.
+            'error' (Optional[str]): Error message if calculation failed.
+            'message' (Optional[str]): Informational message.
+
+        Args:
+            column_name: The name of the column.
+
+        Returns:
+            A dictionary containing column statistics or error information.
+        """
+        pass
+
+    def _format_size(self, num_bytes: int) -> str:
+        """Formats bytes into a human-readable string."""
+        if num_bytes < 1024:
+            return f"{num_bytes} bytes"
+        elif num_bytes < 1024 ** 2:
+            return f"{num_bytes / 1024:.1f} KB"
+        elif num_bytes < 1024 ** 3:
+            return f"{num_bytes / 1024 ** 2:.1f} MB"
+        else:
+            return f"{num_bytes / 1024 ** 3:.1f} GB"
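Reviewer note: the contract defined by `DataHandler` can be illustrated with a reduced stand-in that runs without pandas or parqv installed. In the sketch below, `MiniDataHandler` keeps only two of the five abstract methods and copies `_format_size` from the diff; `TextFileHandler` is a hypothetical subclass invented for this example and is not part of the package.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Dict


class MiniDataHandler(ABC):
    """Reduced stand-in for DataHandler with two of its abstract methods."""

    def __init__(self, file_path: Path):
        self.file_path = file_path

    @abstractmethod
    def close(self) -> None:
        """Release any open resources."""

    @abstractmethod
    def get_metadata_summary(self) -> Dict[str, Any]:
        """Return summary metadata, or a dict with an 'error' key on failure."""

    def _format_size(self, num_bytes: int) -> str:
        # Same thresholds and formatting as base_handler.py in this diff.
        if num_bytes < 1024:
            return f"{num_bytes} bytes"
        elif num_bytes < 1024 ** 2:
            return f"{num_bytes / 1024:.1f} KB"
        elif num_bytes < 1024 ** 3:
            return f"{num_bytes / 1024 ** 2:.1f} MB"
        else:
            return f"{num_bytes / 1024 ** 3:.1f} GB"


class TextFileHandler(MiniDataHandler):
    """Hypothetical handler used only to show the contract."""

    def close(self) -> None:
        pass  # Nothing is held open in this sketch.

    def get_metadata_summary(self) -> Dict[str, Any]:
        try:
            size = self.file_path.stat().st_size
        except OSError as exc:
            # Follow the documented convention: report failure via an 'error' key.
            return {"error": str(exc)}
        return {"Path": str(self.file_path), "Size": self._format_size(size)}


if __name__ == "__main__":
    handler = TextFileHandler(Path(__file__))
    print(handler.get_metadata_summary())
    handler.close()
```

Because the base class derives from `ABC`, instantiating it directly, or a subclass that leaves an abstract method unimplemented, raises `TypeError`, so an incomplete handler fails at construction rather than later inside a view.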