PyPI - wherewolf - Versions diffs - 0.2.2__tar.gz → 0.3.0__tar.gz - Mend

wherewolf 0.2.2.tar.gz → 0.3.0.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (90)

{wherewolf-0.2.2 → wherewolf-0.3.0}/GEMINI.md RENAMED

@@ -379,6 +379,7 @@ Before any commit, agents MUST execute:
 ruff check . --fix
 ruff format .
+ty check .
 uv run pytest
 ```
@@ -474,3 +475,6 @@ If execution fails:
 5. Re-run validation suite
 Blind retries are forbidden.
+## Gemini Added Memories
+- When creating a new release tag for this project, I MUST always increment the cacheBuster parameter in all URLs within the README.md (e.g., badges, banners, screenshots) to ensure GitHub Camo refreshes the images immediately.
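The added memory describes a manual release step, and the README.md hunk in this release applies it by hand, moving every `cacheBuster` value from 4 to 5. A minimal sketch of automating that bump with a regular expression, assuming every occurrence is written literally as `cacheBuster=<integer>`. The helper `bump_cache_buster` is hypothetical and is not part of the wherewolf repository.

```python
import re

# Matches "cacheBuster=" followed by an integer anywhere in the text.
_CACHE_BUSTER = re.compile(r"(cacheBuster=)(\d+)")


def bump_cache_buster(text: str) -> str:
    """Return `text` with every cacheBuster=N query parameter replaced by N + 1."""
    return _CACHE_BUSTER.sub(lambda m: f"{m.group(1)}{int(m.group(2)) + 1}", text)


badge = "https://img.shields.io/pypi/v/wherewolf.svg?cacheBuster=4"
print(bump_cache_buster(badge))  # https://img.shields.io/pypi/v/wherewolf.svg?cacheBuster=5
```

Run over the whole README.md (read, bump, write back), this reproduces the 4 → 5 change for the banner, badges, and screenshot in one pass.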

{wherewolf-0.2.2 → wherewolf-0.3.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: wherewolf
-Version: 0.2.2
+Version: 0.3.0
 License-File: LICENSE
 Requires-Python: >=3.11
 Requires-Dist: duckdb>=1.5.0

{wherewolf-0.2.2 → wherewolf-0.3.0}/README.md RENAMED

@@ -1,10 +1,10 @@
 # Wherewolf
-<img src="https://raw.githubusercontent.com/beallio/wherewolf/main/src/wherewolf/assets/img/wherewolf_banner.png?cacheBuster=4" width="100%">
+<img src="https://raw.githubusercontent.com/beallio/wherewolf/main/src/wherewolf/assets/img/wherewolf_banner.png?cacheBuster=5" width="100%">
-[![CI](https://github.com/beallio/wherewolf/actions/workflows/ci.yml/badge.svg?cacheBuster=4)](https://github.com/beallio/wherewolf/actions/workflows/ci.yml)
-[![PyPI version](https://img.shields.io/pypi/v/wherewolf.svg?cacheBuster=4)](https://pypi.org/project/wherewolf/)
-[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg?cacheBuster=4)](https://opensource.org/licenses/MIT)
+[![CI](https://github.com/beallio/wherewolf/actions/workflows/ci.yml/badge.svg?cacheBuster=5)](https://github.com/beallio/wherewolf/actions/workflows/ci.yml)
+[![PyPI version](https://img.shields.io/pypi/v/wherewolf.svg?cacheBuster=5)](https://pypi.org/project/wherewolf/)
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg?cacheBuster=5)](https://opensource.org/licenses/MIT)
 A production-grade, local SQL workbench for querying files (CSV, Parquet, JSON) using DuckDB or Spark.
@@ -16,7 +16,7 @@ A production-grade, local SQL workbench for querying files (CSV, Parquet, JSON)
 - **Export:** Download query results as CSV, Excel, or Parquet.
 - **Execution Metrics:** Tracks row count and execution time.
-![Wherewolf Screenshot](https://raw.githubusercontent.com/beallio/wherewolf/main/src/wherewolf/assets/img/screenshot.png?cacheBuster=4)
+![Wherewolf Screenshot](https://raw.githubusercontent.com/beallio/wherewolf/main/src/wherewolf/assets/img/screenshot.png?cacheBuster=5)
 ## Installation

wherewolf-0.3.0/docs/agent_conversations/2026-04-21_fix_infinite_loop_regression.json ADDED

@@ -0,0 +1,13 @@
+{
+  "date": "2026-04-21",
+  "task_objective": "Fix data loading regression (infinite loop in app.py)",
+  "files_modified": [
+    "src/wherewolf/app.py"
+  ],
+  "tests_added": [],
+  "design_decisions": [
+    "Moved background query completion check above the autorefresh block.",
+    "Moved the autorefresh block to the end of the script to ensure UI components are rendered before rerunning."
+  ],
+  "results": "Fixed the regression where the app would enter an infinite loop during query execution, preventing results from loading and UI components from appearing. Verified via existing test suite."
+}

wherewolf-0.3.0/docs/agent_conversations/2026-04-21_schema_hud.json ADDED

@@ -0,0 +1,22 @@
+{
+  "date": "2026-04-21",
+  "task_objective": "Implement Schema & Metadata HUD",
+  "files_modified": [
+    "src/wherewolf/execution/duckdb_engine.py",
+    "src/wherewolf/execution/spark_engine.py",
+    "src/wherewolf/app.py",
+    "tests/test_duckdb_engine.py",
+    "tests/test_spark_engine.py"
+  ],
+  "tests_added": [
+    "tests/test_duckdb_engine.py:test_duckdb_get_schema",
+    "tests/test_spark_engine.py:test_spark_get_schema"
+  ],
+  "design_decisions": [
+    "Extracted _register_view in engines for reusability.",
+    "Added get_schema method to engines returning a normalized pd.DataFrame.",
+    "Integrated schema HUD into Streamlit sidebar with automatic refreshing.",
+    "Used dict-based DataFrame initialization to satisfy strict 'ty' type checks."
+  ],
+  "results": "Successfully implemented and verified Schema HUD for both DuckDB and Spark engines. All tests and type checks pass."
+}

wherewolf-0.3.0/docs/plans/code_review_fixes.md ADDED

@@ -0,0 +1,55 @@
+# Plan: Code Review Fixes Implementation
+## Problem Definition
+The code review identified several critical and medium-priority issues:
+1. SQL Injection vulnerability in `DuckDBEngine`.
+2. Non-functional query cancellation in the Streamlit UI.
+3. Performance bottleneck in `SparkEngine` due to redundant actions.
+4. Data loss risk in `HistoryManager` due to non-atomic writes.
+5. Inconsistent translation panel state in the UI.
+## Architecture Overview
+- **Infrastructure:** Update `HistoryManager` to use atomic filesystem operations.
+- **Core Logic:** Parametrize SQL in `DuckDBEngine` and optimize `SparkEngine` preview logic.
+- **UI:** Integrate `ThreadPoolExecutor` for background execution and fix state tracking for translations.
+## Dependency Requirements
+- `concurrent.futures.ThreadPoolExecutor` (stdlib)
+- `tempfile` (stdlib)
+- `pathlib` (stdlib)
+## Git Strategy
+- Branch: `feat/code-review-fixes`
+- Commit Frequency: Atomic commit per task.
+- Protocol: `run.sh uv run pytest` and `run.sh ruff` before each commit.
+## Phased Approach
+### Phase 1: DuckDB SQL Injection Fix
+- **Task 1.1:** Create `tests/test_duckdb_sql_injection.py` reproducing the issue.
+- **Task 1.2:** Implement parametrized query in `src/wherewolf/execution/duckdb_engine.py`.
+- **Task 1.3:** Verify with tests.
+### Phase 2: History Atomic Writes
+- **Task 2.1:** Create `tests/test_history_atomicity.py`.
+- **Task 2.2:** Update `src/wherewolf/storage/history.py` to use `tempfile` and `os.replace`.
+- **Task 2.3:** Verify with tests.
+### Phase 3: Spark Engine Optimization
+- **Task 3.1:** Create `tests/test_spark_engine_optimization.py`.
+- **Task 3.2:** Optimize `src/wherewolf/execution/spark_engine.py`.
+- **Task 3.3:** Verify with tests and benchmark.
+### Phase 4: UI Cancellation
+- **Task 4.1:** Create `tests/test_app_cancel.py` (using mocks).
+- **Task 4.2:** Refactor `src/wherewolf/app.py` to use `ThreadPoolExecutor`.
+- **Task 4.3:** Verify with tests.
+### Phase 5: UI Translation State
+- **Task 5.1:** Create `tests/test_app_translation_state.py`.
+- **Task 5.2:** Update `src/wherewolf/app.py` to track executed query state.
+- **Task 5.3:** Verify with tests.
+### Phase 6: Final Validation
+- **Task 6.1:** Run full test suite.
+- **Task 6.2:** Run Principal Engineer Code Review.
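Phase 2 names `tempfile` and `os.replace` as the tools for atomic history writes. A minimal sketch of that pattern, assuming the history is stored as a single JSON file; `HistoryManager`'s real storage format is not shown in this diff, and `atomic_write_json` is a hypothetical helper. Writing to a temporary file in the same directory and then renaming it over the target means a crash mid-write leaves the previous file intact.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path, data) -> None:
    """Serialize `data` to `path` so readers only ever see the old or the new file."""
    path = Path(path)
    # The temp file lives next to the target so os.replace is a same-filesystem rename.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=f".{path.name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the rename
        # Replaces any existing file in one step (atomic on POSIX).
        os.replace(tmp_name, path)
    except BaseException:
        # On any failure, drop the partial temp file; the original is untouched.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


with tempfile.TemporaryDirectory() as demo_dir:
    history = Path(demo_dir) / "history.json"
    atomic_write_json(history, [{"engine": "duckdb", "query": "SELECT 1"}])
    print(json.loads(history.read_text(encoding="utf-8")))
```

The same-directory temp file matters: `os.replace` across filesystems fails rather than silently copying, so placing the temp file elsewhere would defeat the atomic swap.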

wherewolf-0.3.0/docs/plans/excel_support.md ADDED

@@ -0,0 +1,37 @@
+# Plan: Excel File Support
+## Problem Definition
+Users want to query Excel files (`.xlsx`, `.xls`) directly within Wherewolf, similar to how they query CSV, Parquet, and JSON files.
+## Architecture Overview
+- **DuckDBEngine:** Utilize the DuckDB `excel` extension. This requires running `INSTALL excel; LOAD excel;` and using `excel_scan(?)`.
+- **SparkEngine:** Since native Spark Excel support requires external JARs (e.g., `spark-excel`), and Wherewolf runs in a local environment, we will use `pandas` to read the Excel file and then convert it to a Spark DataFrame as a fallback for this engine.
+- **FileBrowser:** Update the allowed extensions to include `.xlsx` and `.xls`.
+## Core Data Structures
+No changes to `QueryResult`.
+## Public Interfaces
+- No API changes.
+- UI: `FileBrowser.render_explorer` will now permit loading of Excel files.
+## Dependency Requirements
+- `openpyxl` (already present for export)
+- `duckdb` (already present, requires extension install at runtime)
+- `pandas` (already present)
+## Testing Strategy
+- Create `tests/test_excel_support.py`.
+- Verify DuckDB can read a sample `.xlsx`.
+- Verify Spark can read a sample `.xlsx`.
+- Verify UI logic allows the extensions.
+## Task Decomposition
+- **Task 1: DuckDB Excel Logic**
+  - Implement extension loading and `excel_scan` in `src/wherewolf/execution/duckdb_engine.py`.
+- **Task 2: Spark Excel Logic**
+  - Implement pandas-based bridge for Excel in `src/wherewolf/execution/spark_engine.py`.
+- **Task 3: UI Extension Update**
+  - Update `src/wherewolf/ui/file_browser.py` to include `.xlsx` and `.xls`.
+- **Task 4: Verification**
+  - Add and run tests.

wherewolf-0.3.0/docs/plans/schema_hud.md ADDED

@@ -0,0 +1,72 @@
+# Plan: Schema & Metadata HUD
+## Problem Definition
+Users currently have no visual way to see the column names and data types of a loaded dataset without running a manual query like `SELECT * FROM dataset LIMIT 1`. This leads to friction when writing queries, especially for datasets with many or complex column names.
+## Architecture Overview
+The Schema HUD will be integrated into the existing engine-based architecture:
+1.  **Engines (`DuckDBEngine`, `SparkEngine`)**: Will be extended with a `get_schema(path: str)` method that returns a summary of the dataset's structure.
+2.  **UI (`app.py`)**: Will trigger a schema fetch when a new file is loaded and display the results in a "Schema Preview" section in the sidebar.
+3.  **State Management**: The schema will be cached in `st.session_state` to avoid redundant engine calls.
+## Core Data Structures
+The schema will be represented as a `pandas.DataFrame` with the following columns:
+-   `Column`: The name of the field.
+-   `Type`: The detected data type (e.g., `VARCHAR`, `INTEGER`, `struct`).
+## Public Interfaces
+-   `DuckDBEngine.get_schema(path: str) -> pd.DataFrame`: Uses `DESCRIBE dataset` or native relation metadata.
+-   `SparkEngine.get_schema(path: str) -> pd.DataFrame`: Uses `df.schema` or `DESCRIBE dataset`.
+## Dependency Requirements
+-   Existing dependencies (`duckdb`, `pyspark`, `pandas`, `streamlit`) are sufficient.
+## Implementation Plan
+### Phase 1: Engine Enhancements
+-   [ ] **Refactor `DuckDBEngine`**:
+    -   Extract file-to-view registration logic into a private `_register_view(path)` method.
+    -   Implement `get_schema(path)`: registers the view and runs `DESCRIBE dataset`.
+-   [ ] **Refactor `SparkEngine`**:
+    -   Extract file-to-view registration logic into a private `_register_view(path)` method.
+    -   Implement `get_schema(path)`: registers the view and runs `DESCRIBE dataset`.
+### Phase 2: UI Integration
+-   [ ] **Update `app.py` Session State**:
+    -   Initialize `st.session_state.schema = None`.
+-   [ ] **Update Path Processing**:
+    -   When `pending_path` is detected, trigger `get_schema` using the currently selected engine.
+    -   Store the result in `st.session_state.schema`.
+-   [ ] **Add Sidebar HUD**:
+    -   Add an `st.expander("📊 Schema Preview")` in the sidebar below the "Active Path" info.
+    -   Display the schema DataFrame if available.
+### Phase 3: Robustness & Polishing
+-   [ ] Handle edge cases (e.g., empty files, unsupported formats) gracefully within `get_schema`.
+-   [ ] Ensure `get_schema` is non-blocking or fast enough for the UI (metadata-only operations are typically very fast).
+## Testing Strategy
+-   **Unit Tests**:
+    -   `tests/test_duckdb_engine.py`: Add `test_get_schema` verifying column names/types for CSV and Parquet.
+    -   `tests/test_spark_engine.py`: Add `test_get_schema` verifying column names/types for CSV and Parquet.
+-   **Integration Tests**:
+    -   Verify that switching engines refreshes the schema HUD correctly.
+    -   Verify that the HUD persists across query executions.
+## Git & Workflow
+-   **Feature Branch**: Create a new branch `feat/schema-hud`.
+-   **Commits**: Use imperative style (e.g., "Add get_schema to DuckDBEngine").
+-   **Finalization**: Merge to `main` (if requested) or leave for review.
+## Verification
+-   [ ] `uv run pytest` passes.
+-   [ ] `ruff check . --fix` and `ruff format .` pass.
+-   [ ] `ty check .` (or `uv run ty`) passes if applicable.
+-   [ ] Manual verification: Load a Parquet file and confirm columns appear in the sidebar.
+## Definition of Done
+- [ ] Tests pass.
+- [ ] Linter/Formatter pass.
+- [ ] `ty` check passes.
+- [ ] Session log recorded in `docs/agent_conversations/`.
+- [ ] README updated if necessary (not needed for this internal UI feature).
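The plan caches the schema in `st.session_state`, and the `app.py` hunk in this release re-fetches it only when `path_input` or the selected engine differs from `last_schema_path` / `last_schema_engine`. The sketch below isolates that refresh rule in plain Python; `SchemaCache` and the `fetch` callable are hypothetical stand-ins for session state and `get_schema`, and the schema is modeled as a list of (column, type) pairs rather than a DataFrame.

```python
from typing import Callable, Optional

Schema = list[tuple[str, str]]  # (column name, type) pairs


class SchemaCache:
    """Re-fetch a schema only when the dataset path or engine name changes.

    A plain object stands in for st.session_state; `fetch` stands in for
    engine.get_schema.
    """

    def __init__(self, fetch: Callable[[str, str], Schema]) -> None:
        self._fetch = fetch
        self._last_key: Optional[tuple[str, str]] = None
        self.schema: Optional[Schema] = None

    def get(self, path: str, engine: str) -> Optional[Schema]:
        if not path:
            return None  # nothing loaded yet, like the `if path_input:` guard
        key = (path, engine)
        if key != self._last_key:
            try:
                self.schema = self._fetch(path, engine)
                self._last_key = key  # remember the key only after a successful fetch
            except Exception:
                self.schema = None  # key not updated, so the next call retries
        return self.schema


calls = []


def fake_get_schema(path: str, engine: str) -> Schema:
    calls.append((path, engine))
    return [("name", "VARCHAR"), ("value", "BIGINT")]


cache = SchemaCache(fake_get_schema)
cache.get("data.csv", "DuckDB")
cache.get("data.csv", "DuckDB")  # cached, no second fetch
cache.get("data.csv", "Spark")  # engine changed, fetches again
print(len(calls))  # 2
```

As in `app.py`, a failed fetch leaves the last key unchanged, so every subsequent rerun retries until the fetch succeeds.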

{wherewolf-0.2.2 → wherewolf-0.3.0}/pyproject.toml RENAMED

@@ -1,6 +1,6 @@
 [project]
 name = "wherewolf"
-version = "0.2.2"
+version = "0.3.0"
 requires-python = ">=3.11"
 dependencies = [
     "duckdb>=1.5.0",

{wherewolf-0.2.2 → wherewolf-0.3.0}/src/wherewolf/app.py RENAMED

@@ -7,6 +7,13 @@ from wherewolf.storage import HistoryManager
 from wherewolf.export import Exporter
 from wherewolf.ui import FileBrowser
 from streamlit_ace import st_ace
+import importlib.metadata
+# Get version from metadata
+try:
+    __version__ = importlib.metadata.version("wherewolf")
+except importlib.metadata.PackageNotFoundError:
+    __version__ = "0.3.0"  # Fallback for dev runs
 # --- Configuration ---
 st.set_page_config(
@@ -24,11 +31,43 @@ hide_st_style = """
             footer {visibility: hidden;}
             /* Hide the Deploy button specifically */
             .stAppDeployButton {display: none;}
             /* Darken the sidebar */
             [data-testid="stSidebar"] {
                 background-color: #000000;
             }
+            /* Add back some top padding for main content */
+            .main .block-container, .block-container {
+                padding-top: 4rem !important;
+                margin-top: 0rem !important;
+            }
+            /* Aggressively remove top padding for sidebar */
+            [data-testid="stSidebar"] section {
+                padding-top: 0rem !important;
+            }
+            [data-testid="stSidebar"] [data-testid="stVerticalBlock"] {
+                padding-top: 0rem !important;
+            }
+            /* Specific fix for sidebar header whitespace */
+            [data-testid="stSidebarHeader"], .st-emotion-cache-10p9htt {
+                height: 3rem !important;
+                min-height: 3rem !important;
+                margin-bottom: 0rem !important;
+                padding-top: 0rem !important;
+            }
+            /* Make primary buttons green */
+            button[kind="primary"] {
+                background-color: #28a745 !important;
+                border-color: #28a745 !important;
+                color: white !important;
+            }
+            button[kind="primary"]:hover {
+                background-color: #218838 !important;
+                border-color: #1e7e34 !important;
+            }
             </style>
             """
 st.markdown(hide_st_style, unsafe_allow_html=True)
@@ -64,6 +103,12 @@ if "active_engine" not in st.session_state:
     st.session_state.active_engine = None
 if "query_future" not in st.session_state:
     st.session_state.query_future = None
+if "schema" not in st.session_state:
+    st.session_state.schema = None
+if "last_schema_path" not in st.session_state:
+    st.session_state.last_schema_path = ""
+if "last_schema_engine" not in st.session_state:
+    st.session_state.last_schema_engine = ""
 # --- Early State Update Pattern ---
 # This avoids StreamlitAPIException by updating state BEFORE widgets are instantiated.
@@ -91,9 +136,12 @@ with st.sidebar:
     st.markdown(
         f"""
-        <div style="display: flex; align-items: center; gap: 12px; margin-bottom: 20px;">
+        <div style="display: flex; align-items: center; gap: 12px; margin-bottom: 20px; position: relative;">
             <img src="data:image/png;base64,{logo_b64}" width="60">
-            <h1 style="margin: 0; white-space: nowrap; font-size: 2.2rem;">Wherewolf</h1>
+            <div>
+                <h1 style="margin: 0; white-space: nowrap; font-size: 2.2rem;">Wherewolf</h1>
+                <p style="margin: 0; font-size: 0.8rem; color: #666; position: absolute; bottom: -12px; left: 72px;">v{__version__}</p>
+            </div>
         </div>
         """,
         unsafe_allow_html=True,
@@ -117,6 +165,36 @@ with st.sidebar:
     engine_name = st.selectbox("Execution Engine", ["DuckDB", "Spark"])
+    # --- Schema HUD Logic ---
+    if st.session_state.path_input:
+        # Refresh schema if path or engine changed
+        if (
+            st.session_state.path_input != st.session_state.last_schema_path
+            or engine_name != st.session_state.last_schema_engine
+        ):
+            try:
+                if engine_name == "DuckDB":
+                    temp_engine = DuckDBEngine()
+                else:
+                    temp_engine = SparkEngine()
+                st.session_state.schema = temp_engine.get_schema(st.session_state.path_input)
+                st.session_state.last_schema_path = st.session_state.path_input
+                st.session_state.last_schema_engine = engine_name
+            except Exception as e:
+                st.session_state.schema = None
+                st.sidebar.error(f"Failed to fetch schema: {e}")
+        if st.session_state.schema is not None and not st.session_state.schema.empty:
+            with st.expander("📊 Schema Preview", expanded=True):
+                st.dataframe(
+                    st.session_state.schema,
+                    hide_index=True,
+                    width="stretch",
+                    height=200,
+                )
+        elif st.session_state.schema is not None:
+            st.caption("No columns detected.")
     # Auto-align input dialect if engine changes
     if st.session_state.last_engine_name != engine_name:
         st.session_state.input_dialect_ui = engine_name
@@ -178,61 +256,36 @@ with st.sidebar:
         index=themes.index("dracula"),
     )
-# --- Main Area ---
-col_h1, col_h2 = st.columns([0.7, 0.3])
-with col_h1:
-    st.header("SQL Editor")
-with col_h2:
-    input_dialect_ui = st.selectbox(
-        "Input Dialect", options=["DuckDB", "Spark", "Azure SQL"], key="input_dialect_ui"
-    )
-# --- Autorefresh while running ---
-if st.session_state.is_running and "PYTEST_CURRENT_TEST" not in os.environ:
-    import time
-    time.sleep(0.1)
-    st.rerun()
-# Use st_ace for syntax highlighting
-query_text = st_ace(
-    value=st.session_state.selected_query,
-    language="sql",
-    theme=ace_theme,
-    height=300,
-    key="sql_editor",
-    auto_update=True,
-)
+# Use a container-like column to force alignment of editor and buttons
+main_col, _ = st.columns([0.99, 0.01])
+with main_col:
+    # Dialect selector right-aligned within the main column
+    _, col_h2 = st.columns([0.7, 0.3])
+    with col_h2:
+        input_dialect_ui = st.selectbox(
+            "Input Dialect", options=["DuckDB", "Spark", "Azure SQL"], key="input_dialect_ui"
+        )
-col1, col2 = st.columns([0.1, 0.9])
-with col1:
-    run_button = st.button(
-        "🚀 Run",
-        type="primary",
-        disabled=st.session_state.is_running or not st.session_state.path_input,
+    # Use st_ace for syntax highlighting
+    query_text = st_ace(
+        value=st.session_state.selected_query,
+        language="sql",
+        theme=ace_theme,
+        height=300,
+        key="sql_editor",
+        auto_update=True,
     )
-with col2:
-    cancel_button = st.button("🛑 Cancel", disabled=not st.session_state.is_running)
-# --- Execution Logic ---
-# Handle completion of background query
-if st.session_state.query_future and st.session_state.query_future.done():
-    try:
-        result = st.session_state.query_future.result()
-        st.session_state.query_result = result
-        if result.success:
-            history_manager.add_entry(
-                st.session_state.last_engine_name.lower(),
-                st.session_state.executed_query,
-                st.session_state.path_input,
-            )
-    except Exception as e:
-        st.session_state.query_result = QueryResult(success=False, error_message=str(e))
-    st.session_state.is_running = False
-    st.session_state.query_future = None
-    st.session_state.active_engine = None
-    st.rerun()
+    # Button row inside the same alignment context
+    col_b1, col_b2, col_b3 = st.columns([0.12, 0.12, 0.76])
+    with col_b1:
+        run_button = st.button(
+            "Run",
+            type="primary",
+            disabled=st.session_state.is_running or not st.session_state.path_input,
+        )
+    with col_b2:
+        cancel_button = st.button("Cancel", disabled=not st.session_state.is_running)
 if run_button and st.session_state.path_input:
     if engine_name == "DuckDB":
@@ -287,6 +340,26 @@ if cancel_button and st.session_state.active_engine:
     st.warning("Query cancelled.")
     st.rerun()
+# --- Execution Logic ---
+# Handle completion of background query
+if st.session_state.query_future and st.session_state.query_future.done():
+    try:
+        result = st.session_state.query_future.result()
+        st.session_state.query_result = result
+        if result.success:
+            history_manager.add_entry(
+                st.session_state.last_engine_name.lower(),
+                st.session_state.executed_query,
+                st.session_state.path_input,
+            )
+    except Exception as e:
+        st.session_state.query_result = QueryResult(success=False, error_message=str(e))
+    st.session_state.is_running = False
+    st.session_state.query_future = None
+    st.session_state.active_engine = None
+    st.rerun()
 # --- Results Display ---
 if st.session_state.query_result:
     result: QueryResult = st.session_state.query_result
@@ -323,7 +396,7 @@ if st.session_state.query_result:
                 from_dialect=executed_input_key,
                 to_dialect=target_dialect,
             )
-            with st.expander(f"✨ Translated SQL ({selected_target_ui})", expanded=True):
+            with st.expander(f"Translated SQL ({selected_target_ui})", expanded=True):
                 st.code(translated_sql, language="sql")
         except Exception as e:
             st.warning(f"Translation failed: {str(e)}")
@@ -374,3 +447,10 @@ if st.session_state.query_result:
 elif not st.session_state.path_input:
     st.info("👈 Please provide a dataset path in the sidebar to begin.")
+# --- Autorefresh while running ---
+if st.session_state.is_running and "PYTEST_CURRENT_TEST" not in os.environ:
+    import time
+    time.sleep(0.1)
+    st.rerun()
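The 0.3.0 `app.py` moves the completion check above the autorefresh and puts the autorefresh last. In 0.2.2 the `st.rerun()` autorefresh ran before the completion check, so while `is_running` was true each pass restarted the script before it could harvest the finished future: the infinite loop recorded in `2026-04-21_fix_infinite_loop_regression.json`. The sketch below models one script pass as a function over a plain dict instead of `st.session_state`, with `ThreadPoolExecutor` doing the background work; `slow_query` and `script_run` are hypothetical, and returning `True` plays the role of `st.rerun()`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_query(seconds: float) -> str:
    """Stand-in for engine.execute(...): sleeps, then returns a result."""
    time.sleep(seconds)
    return "42 rows"


def script_run(state: dict, pool: ThreadPoolExecutor) -> bool:
    """One top-to-bottom pass of a Streamlit-style script. Returns True to request a rerun."""
    # 1. Run button: submit the query to the background pool.
    if state.pop("run_clicked", False) and not state["is_running"]:
        state["future"] = pool.submit(slow_query, 0.05)
        state["is_running"] = True

    # 2. Completion check BEFORE the autorefresh, so a finished future is
    #    harvested on the first pass that sees it done.
    future = state.get("future")
    if future is not None and future.done():
        state["result"] = future.result()
        state["is_running"] = False
        state["future"] = None

    # 3. (Results and the rest of the UI would render here.)

    # 4. Autorefresh last: poll again shortly while the query is still running.
    if state["is_running"]:
        time.sleep(0.01)
        return True
    return False


state = {"is_running": False, "run_clicked": True}
with ThreadPoolExecutor(max_workers=1) as pool:
    passes = 1
    while script_run(state, pool):
        passes += 1
print(state["result"], "after", passes, "passes")
```

Swapping steps 2 and 4 reproduces the 0.2.2 bug: every pass returns at step 4 while the query runs, and the pass that would have cleared `is_running` is never reached.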

{wherewolf-0.2.2 → wherewolf-0.3.0}/src/wherewolf/execution/duckdb_engine.py RENAMED

@@ -1,5 +1,6 @@
 import duckdb
 import time
+import pandas as pd
 from .models import QueryResult
@@ -9,6 +10,43 @@ class DuckDBEngine:
     def __init__(self):
         self.con = duckdb.connect(database=":memory:")
+    def _register_view(self, path: str):
+        """Registers the dataset as a view named 'dataset'."""
+        from pathlib import Path
+        abs_path = Path(path).expanduser().resolve()
+        suffix = abs_path.suffix.lower()
+        if suffix == ".csv":
+            rel_source = self.con.from_csv_auto(str(abs_path))
+        elif suffix == ".parquet":
+            rel_source = self.con.from_parquet(str(abs_path))
+        elif suffix == ".json":
+            rel_source = self.con.sql("SELECT * FROM read_json_auto(?)", params=[str(abs_path)])
+        elif suffix in [".xlsx", ".xls"]:
+            self.con.execute("INSTALL excel; LOAD excel;")
+            rel_source = self.con.sql("SELECT * FROM read_xlsx(?)", params=[str(abs_path)])
+        else:
+            rel_source = self.con.from_csv_auto(str(abs_path))
+        rel_source.create_view("dataset", replace=True)
+    def get_schema(self, path: str) -> pd.DataFrame:
+        """Returns the schema of the dataset.
+        Returns:
+            A DataFrame with 'Column' and 'Type' columns.
+        """
+        try:
+            self._register_view(path)
+            # DESCRIBE dataset returns: column_name, column_type, null, key, default, extra
+            df = self.con.sql("DESCRIBE dataset").df()
+            # Normalize to Column/Type
+            return df[["column_name", "column_type"]].rename(
+                columns={"column_name": "Column", "column_type": "Type"}
+            )
+        except Exception:
+            return pd.DataFrame({"Column": [], "Type": []})
     def execute(self, query: str, path: str, limit: int = 1000) -> QueryResult:
         """Executes a SQL query against a local file using DuckDB.
@@ -20,31 +58,10 @@ class DuckDBEngine:
         Returns:
             A QueryResult object.
         """
-        from pathlib import Path
-        abs_path = Path(path).expanduser().resolve()
         start_time = time.time()
         try:
             # 1. Register the dataset view
-            # DuckDB automatically detects CSV, Parquet, JSON based on extension or content
-            # Using Relation API to safely handle paths with special characters
-            suffix = abs_path.suffix.lower()
-            if suffix == ".csv":
-                rel_source = self.con.from_csv_auto(str(abs_path))
-            elif suffix == ".parquet":
-                rel_source = self.con.from_parquet(str(abs_path))
-            elif suffix == ".json":
-                # Use SQL with read_json_auto to avoid ty check warning about missing attribute
-                rel_source = self.con.sql("SELECT * FROM read_json_auto(?)", params=[str(abs_path)])
-            elif suffix in [".xlsx", ".xls"]:
-                # Official DuckDB excel extension
-                self.con.execute("INSTALL excel; LOAD excel;")
-                rel_source = self.con.sql("SELECT * FROM read_xlsx(?)", params=[str(abs_path)])
-            else:
-                # Fallback to auto-detection
-                rel_source = self.con.from_csv_auto(str(abs_path))
-            rel_source.create_view("dataset", replace=True)
+            self._register_view(path)
             # 2. Execute the user query
             # We wrap the user query to handle limits for the preview
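`_register_view` passes the file path to `read_json_auto(?)` and `read_xlsx(?)` through the `params` list rather than splicing it into the SQL string, which is the parametrization Phase 1 of the code-review plan calls for. To stay runnable without DuckDB installed, the sketch below uses the standard-library `sqlite3` module as a stand-in: the placeholder idea is the same, but the calls are sqlite3's, not DuckDB's. Both helper names are hypothetical.

```python
import sqlite3


def echo_path_unsafe(con: sqlite3.Connection, path: str) -> str:
    # Unsafe: the path becomes part of the SQL text, so a quote in a file
    # name terminates the string literal early and breaks (or alters) the query.
    return con.execute(f"SELECT '{path}'").fetchone()[0]


def echo_path_bound(con: sqlite3.Connection, path: str) -> str:
    # Safe: "?" is a placeholder and the path is sent as a bound value,
    # never parsed as SQL.
    return con.execute("SELECT ?", [path]).fetchone()[0]


con = sqlite3.connect(":memory:")
tricky = "/data/o'brien.csv"
print(echo_path_bound(con, tricky))  # /data/o'brien.csv
try:
    echo_path_unsafe(con, tricky)
except sqlite3.Error as exc:
    print("interpolated SQL failed:", exc)
```

A file name with an apostrophe is enough to break the interpolated version; a deliberately crafted name could do worse, which is why the bound form is preferred even for trusted local paths.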

{wherewolf-0.2.2 → wherewolf-0.3.0}/src/wherewolf/execution/spark_engine.py RENAMED

@@ -1,4 +1,5 @@
 import time
+import pandas as pd
 from .models import QueryResult
 try:
@@ -23,42 +24,64 @@ class SparkEngine:
             self.spark = SparkSession.builder.appName("Wherewolf").master("local[*]").getOrCreate()
         return self.spark
+    def _register_view(self, path: str):
+        """Registers the dataset as a view named 'dataset'."""
+        import os
+        spark = self._get_session()
+        abs_path = os.path.abspath(path)
+        # Determine format by extension (basic detection)
+        if abs_path.endswith(".csv"):
+            df_spark = (
+                spark.read.option("header", "true").option("inferSchema", "true").csv(abs_path)
+            )
+        elif abs_path.endswith(".parquet"):
+            df_spark = spark.read.parquet(abs_path)
+        elif abs_path.endswith(".json"):
+            df_spark = spark.read.json(abs_path)
+        elif abs_path.endswith(".xlsx") or abs_path.endswith(".xls"):
+            # Use pandas as a bridge for Excel in local Spark
+            df_pd = pd.read_excel(abs_path)
+            df_spark = spark.createDataFrame(df_pd)
+        else:
+            raise ValueError(f"Unsupported file format for path: {abs_path}")
+        # 2. Register temp view
+        df_spark.createOrReplaceTempView("dataset")
+        return df_spark
+    def get_schema(self, path: str) -> pd.DataFrame:
+        """Returns the schema of the dataset.
+        Returns:
+            A DataFrame with 'Column' and 'Type' columns.
+        """
+        if not SPARK_AVAILABLE:
+            return pd.DataFrame({"Column": [], "Type": []})
+        try:
+            df_spark = self._register_view(path)
+            # Spark schema to pandas
+            schema_data = []
+            for field in df_spark.schema:
+                schema_data.append({"Column": field.name, "Type": field.dataType.simpleString()})
+            return pd.DataFrame(schema_data)
+        except Exception:
+            return pd.DataFrame({"Column": [], "Type": []})
     def execute(self, query: str, path: str, limit: int = 1000) -> QueryResult:
         if not SPARK_AVAILABLE:
             return QueryResult(success=False, error_message="PySpark not installed")
-        import os
-        abs_path = os.path.abspath(path)
         start_time = time.time()
         try:
             spark = self._get_session()
-            # 1. Read the dataset
-            # Determine format by extension (basic detection)
-            if abs_path.endswith(".csv"):
-                df_spark = (
-                    spark.read.option("header", "true").option("inferSchema", "true").csv(abs_path)
-                )
-            elif abs_path.endswith(".parquet"):
-                df_spark = spark.read.parquet(abs_path)
-            elif abs_path.endswith(".json"):
-                df_spark = spark.read.json(abs_path)
-            elif abs_path.endswith(".xlsx") or abs_path.endswith(".xls"):
-                # Use pandas as a bridge for Excel in local Spark
-                import pandas as pd
-                df_pd = pd.read_excel(abs_path)
-                df_spark = spark.createDataFrame(df_pd)
-            else:
-                # Default to automatic detection if supported,
-                # but Spark is less automatic than DuckDB
-                raise ValueError(f"Unsupported file format for path: {abs_path}")
-            # 2. Register temp view
-            df_spark.createOrReplaceTempView("dataset")
-            # 3. Execute query
+            # 1. Register the dataset view
+            self._register_view(path)
+            # 2. Execute query
             res_spark = spark.sql(query)
             # 4. Fetch the preview + 1 extra row to see if there's more
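Spark's `_register_view` picks a reader with `abs_path.endswith(...)`, which is case-sensitive, so a file such as `SALES.CSV` falls through to the `ValueError`. DuckDB's version lower-cases `Path.suffix` first (and falls back to `from_csv_auto` for unknown suffixes instead of raising). A minimal sketch of case-insensitive dispatch, assuming a simple suffix-to-reader table; `READERS` and `pick_reader` are hypothetical, and the returned strings are labels, not real Spark calls.

```python
from pathlib import Path

# Hypothetical table from lower-cased suffix to a reader label.
READERS = {
    ".csv": "csv",
    ".parquet": "parquet",
    ".json": "json",
    ".xlsx": "excel",
    ".xls": "excel",
}


def pick_reader(path: str) -> str:
    """Return the reader label for `path`, matching the extension case-insensitively."""
    suffix = Path(path).suffix.lower()  # "SALES.CSV" -> ".csv"
    if suffix not in READERS:
        raise ValueError(f"Unsupported file format for path: {path}")
    return READERS[suffix]


print(pick_reader("/data/SALES.CSV"))  # csv
print(pick_reader("report.Xlsx"))  # excel
print("/data/SALES.CSV".endswith(".csv"))  # False: the check the Spark engine uses
```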

{wherewolf-0.2.2 → wherewolf-0.3.0}/src/wherewolf/ui/file_browser.py RENAMED

@@ -93,7 +93,7 @@ class FileBrowser:
             if is_valid:
                 st.success(f"📄 Ready to load: `{selected_file}`")
-                if st.button("🚀 Load This File", use_container_width=True, type="primary"):
+                if st.button("Load This File", width="stretch", type="primary"):
                     return full_path
             else:
                 st.warning(f"⚠️ `{selected_file}` is not a supported data format.")

{wherewolf-0.2.2 → wherewolf-0.3.0}/tests/test_app.py RENAMED

@@ -9,5 +9,4 @@ def test_app_initialization():
     # Assert basic UI elements exist
     assert any("Wherewolf" in m.value for m in at.sidebar.markdown)
-    assert at.header[0].value == "SQL Editor"
     assert not at.exception

{wherewolf-0.2.2 → wherewolf-0.3.0}/tests/test_app_cancel.py RENAMED

@@ -17,7 +17,7 @@ def test_app_cancel_logic_mocked():
         at.run()
         # Find Cancel button
-        cancel_btn = next(b for b in at.button if b.label == "🛑 Cancel")
+        cancel_btn = next(b for b in at.button if b.label == "Cancel")
         assert not cancel_btn.disabled
         cancel_btn.click().run()

{wherewolf-0.2.2 → wherewolf-0.3.0}/tests/test_app_flow.py RENAMED

@@ -17,7 +17,7 @@ def test_app_query_execution_flow(tmp_path):
     at.run()
     # 3. Trigger 'Run' button
-    run_btn = next(b for b in at.button if b.label == "🚀 Run")
+    run_btn = next(b for b in at.button if b.label == "Run")
     run_btn.click().run()
     # 3.5 Run again to process the completed future

{wherewolf-0.2.2 → wherewolf-0.3.0}/tests/test_duckdb_engine.py RENAMED

@@ -34,6 +34,17 @@ def test_duckdb_engine_failure(csv_path):
     assert result.error_message != ""
+def test_duckdb_get_schema(csv_path):
+    engine = DuckDBEngine()
+    schema_df = engine.get_schema(csv_path)
+    assert isinstance(schema_df, pd.DataFrame)
+    # DuckDB's DESCRIBE returns many columns, but our HUD should normalize to ["Column", "Type"]
+    assert list(schema_df.columns) == ["Column", "Type"]
+    assert "name" in schema_df["Column"].values
+    assert "value" in schema_df["Column"].values
 @pytest.mark.skip(reason="Spark requires complex setup for CI, focus on DuckDB first")
 def test_spark_engine_success(csv_path):
     engine = SparkEngine()
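The new `test_duckdb_get_schema` expects `get_schema` to reduce DuckDB's `DESCRIBE` result, whose columns include `column_name` and `column_type` alongside several others, to exactly `["Column", "Type"]`. The sketch below shows one way to do that normalization with pandas. It assumes those DuckDB column names. The functions `normalize_describe` and `schema_from_dtypes` are hypothetical and are not wherewolf's `get_schema`. The second function shows that `(name, type)` pairs, such as those from Spark's `DataFrame.dtypes`, map onto the same shape.

```python
import pandas as pd


def normalize_describe(describe_df: pd.DataFrame) -> pd.DataFrame:
    """Keep only the name/type columns of a DESCRIBE-style frame, renamed for display."""
    renamed = describe_df.rename(columns={"column_name": "Column", "column_type": "Type"})
    # Drop the remaining DESCRIBE columns (null, key, ...) and reset the index.
    return renamed[["Column", "Type"]].reset_index(drop=True)


def schema_from_dtypes(dtypes: list[tuple[str, str]]) -> pd.DataFrame:
    """Build the same two-column frame from (name, type) pairs."""
    return pd.DataFrame(dtypes, columns=["Column", "Type"])
```

Normalizing both engines to one two-column frame lets the DuckDB and Spark tests share identical assertions.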

wherewolf-0.3.0/tests/test_spark_engine.py ADDED

@@ -0,0 +1,27 @@
+import pandas as pd
+import pytest
+from wherewolf.execution.spark_engine import SparkEngine, SPARK_AVAILABLE
+@pytest.fixture
+def csv_path(tmp_path):
+    path = tmp_path / "test.csv"
+    df = pd.DataFrame({"name": ["alice", "bob", "charlie"], "value": [100, 200, 300]})
+    df.to_csv(path, index=False)
+    return str(path)
+@pytest.mark.skipif(not SPARK_AVAILABLE, reason="PySpark not installed")
+def test_spark_get_schema(csv_path):
+    engine = SparkEngine()
+    schema_df = engine.get_schema(csv_path)
+    assert isinstance(schema_df, pd.DataFrame)
+    assert list(schema_df.columns) == ["Column", "Type"]
+    assert "name" in schema_df["Column"].values
+    assert "value" in schema_df["Column"].values
+def test_spark_engine_init():
+    engine = SparkEngine()
+    assert engine is not None
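The new Spark test is gated with `@pytest.mark.skipif(not SPARK_AVAILABLE, ...)`. The diff does not show how `spark_engine` defines `SPARK_AVAILABLE`. The sketch below shows one common way to compute such a flag without importing the heavy package. It assumes only the standard library, and the helper name `module_available` is hypothetical.

```python
import importlib.util


def module_available(name: str) -> bool:
    """True if `name` is importable, checked without actually importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ModuleNotFoundError, ValueError):
        # ModuleNotFoundError: a dotted name whose parent package is missing.
        # ValueError: the module is in sys.modules but has __spec__ set to None.
        return False


# Evaluated once at import time, so skipif decorators can read it.
SPARK_AVAILABLE = module_available("pyspark")
```

A test module would then pass the flag straight to `pytest.mark.skipif(not SPARK_AVAILABLE, reason="PySpark not installed")`, as the added test does.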

{wherewolf-0.2.2 → wherewolf-0.3.0}/uv.lock RENAMED

@@ -2008,7 +2008,7 @@ wheels = [
 [[package]]
 name = "wherewolf"
-version = "0.2.1"
+version = "0.2.2"
 source = { editable = "." }
 dependencies = [
     { name = "duckdb" },

wherewolf-0.2.2/tests/test_spark_engine.py DELETED

@@ -1,6 +0,0 @@
-from wherewolf.execution.spark_engine import SparkEngine
-def test_spark_engine_init():
-    engine = SparkEngine()
-    assert engine is not None

Files without changes (renamed only, {wherewolf-0.2.2 → wherewolf-0.3.0}/):

.envrc
.github/workflows/ci.yml
.github/workflows/release.yml
.gitignore
.protocol
.streamlit/config.toml
AGENTS.md
LICENSE
docs/RELEASING.md
docs/agent_conversations/2026-03-09_file_browsing.json
docs/agent_conversations/2026-03-09_initial_implementation.json
docs/agent_conversations/2026-04-19_github_setup.json
docs/agent_conversations/2026-04-19_tag_v0.1.0.json
docs/agent_protocol.md
docs/execution_ledger.json
docs/plans/core_system_design.md
docs/plans/execution_engines.md
docs/plans/export_formats.md
docs/plans/file_browsing.md
docs/plans/file_browsing_v2.md
docs/plans/github_automation.md
docs/plans/initial_prompt.md
docs/plans/storage_and_history.md
docs/plans/streamlit_ui.md
run.sh
scripts/check_tdd.sh
scripts/take_screenshot.py
src/wherewolf/__init__.py
src/wherewolf/assets/img/screenshot.png
src/wherewolf/assets/img/wherewolf_banner.png
src/wherewolf/assets/img/wherewolf_logo.png
src/wherewolf/cli.py
src/wherewolf/execution/__init__.py
src/wherewolf/execution/models.py
src/wherewolf/export/__init__.py
src/wherewolf/export/exporter.py
src/wherewolf/storage/__init__.py
src/wherewolf/storage/history.py
src/wherewolf/translation/__init__.py
src/wherewolf/translation/translator.py
src/wherewolf/ui/__init__.py
tests/reproduce_issue_symlink.py
tests/reproduce_issue_symlink_v2.py
tests/sample_queries.md
tests/test___init__.py
tests/test_cli.py
tests/test_config_toml.py
tests/test_data.csv
tests/test_duckdb_sql_injection.py
tests/test_excel_support.py
tests/test_execution___init__.py
tests/test_export___init__.py
tests/test_exporter.py
tests/test_file_browser.py
tests/test_file_browser_errors.py
tests/test_file_browser_navigation.py
tests/test_file_browser_v2.py
tests/test_history.py
tests/test_history_atomicity.py
tests/test_models.py
tests/test_protocol.py
tests/test_spark_engine_logic.py
tests/test_spark_engine_optimization.py
tests/test_storage___init__.py
tests/test_translation___init__.py
tests/test_translation_integration.py
tests/test_translator.py
tests/test_ui___init__.py
tests/test_ui_branding.py
tests/test_wherewolf___init__.py
