PyPI - recursive-cleaner - Versions diffs - 0.7.1__tar.gz → 0.8.0__tar.gz - Mend

recursive-cleaner 0.7.1.tar.gz → 0.8.0.tar.gz


Files changed (94)

{recursive_cleaner-0.7.1 → recursive_cleaner-0.8.0}/CLAUDE.md RENAMED

@@ -4,7 +4,9 @@
 | Version | Status | Date |
 |---------|--------|------|
-| v0.6.0 | **Implemented** | 2025-01-15 |
+| v0.8.0 | **Implemented** | 2025-01-19 |
+| v0.7.0 | Implemented | 2025-01-17 |
+| v0.6.0 | Implemented | 2025-01-15 |
 | v0.5.1 | Implemented | 2025-01-15 |
 | v0.5.0 | Implemented | 2025-01-15 |
 | v0.4.0 | Implemented | 2025-01-15 |
@@ -12,9 +14,11 @@
 | v0.2.0 | Implemented | 2025-01-14 |
 | v0.1.0 | Implemented | 2025-01-14 |
-**Current State**: v0.6.0 complete. 392 tests passing, 2,967 lines total.
+**Current State**: v0.8.0 complete. 465 tests passing.
 ### Version History
+- **v0.8.0**: Terminal UI with Rich dashboard, mission control aesthetic, transmission log
+- **v0.7.0**: Markitdown integration (20+ formats), Parquet support, LLM-generated parsers
 - **v0.6.0**: Latency metrics, import consolidation, cleaning report, dry-run mode
 - **v0.5.1**: Dangerous code detection (AST-based security)
 - **v0.5.0**: Two-pass optimization with LLM agency (consolidation, early termination)
@@ -69,6 +73,8 @@ cleaner = DataCleaner(
     # Observability (v0.6.0)
     report_path="cleaning_report.md",  # Generate markdown report (None to disable)
     dry_run=False,  # Set True to analyze without generating functions
+    # Terminal UI (v0.8.0)
+    tui=True,  # Enable Rich dashboard (requires pip install recursive-cleaner[tui])
 )
 cleaner.run()  # Outputs: cleaning_functions.py, cleaning_report.md
@@ -159,6 +165,7 @@ recursive_cleaner/
     report.py            # Markdown report generation (~120 lines) [v0.6.0]
     response.py          # XML/markdown parsing + agency dataclasses (~292 lines)
     schema.py            # Schema inference (~117 lines) [v0.2.0]
+    tui.py               # Rich terminal dashboard (~520 lines) [v0.8.0]
     types.py             # LLMBackend protocol (~11 lines)
     validation.py        # Runtime validation + safety checks (~200 lines)
     vendor/
@@ -187,6 +194,7 @@ tests/                   # 392 tests
     test_sampling.py     # Sampling strategy tests [v0.4.0]
     test_schema.py       # Schema inference tests
     test_text_mode.py    # Text mode tests [v0.3.0]
+    test_tui.py          # Terminal UI tests [v0.8.0]
     test_validation.py   # Runtime validation + safety tests
     test_vendor_chunker.py  # Vendored chunker tests [v0.3.0]

{recursive_cleaner-0.7.1 → recursive_cleaner-0.8.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: recursive-cleaner
-Version: 0.7.1
+Version: 0.8.0
 Summary: LLM-powered incremental data cleaning pipeline that processes massive datasets in chunks and generates Python cleaning functions
 Project-URL: Homepage, https://github.com/gaztrabisme/recursive-data-cleaner
 Project-URL: Repository, https://github.com/gaztrabisme/recursive-data-cleaner
@@ -32,6 +32,8 @@ Provides-Extra: mlx
 Requires-Dist: mlx-lm>=0.10.0; extra == 'mlx'
 Provides-Extra: parquet
 Requires-Dist: pyarrow>=14.0.0; extra == 'parquet'
+Provides-Extra: tui
+Requires-Dist: rich>=13.0; extra == 'tui'
 Description-Content-Type: text/markdown
 # Recursive Data Cleaner
@@ -69,6 +71,11 @@ For Parquet files:
 pip install -e ".[parquet]"
 ```
+For Terminal UI (Rich dashboard):
+```bash
+pip install -e ".[tui]"
+```
 ## Quick Start
 ```python
@@ -126,6 +133,13 @@ cleaner.run()  # Generates cleaning_functions.py
 - **Parquet Support**: Load parquet files as structured data via pyarrow
 - **LLM-Generated Parsers**: Auto-generate parsers for XML and unknown formats (`auto_parse=True`)
+### Terminal UI (v0.8.0)
+- **Mission Control Dashboard**: Rich-based live terminal UI with retro aesthetic
+- **Real-time Progress**: Animated progress bars, chunk/iteration counters
+- **Transmission Log**: Parsed LLM responses showing issues detected and functions being generated
+- **Token Estimation**: Track estimated input/output tokens across the run
+- **Graceful Fallback**: Works without Rich installed (falls back to callbacks)
 ## Configuration
 ```python
@@ -160,6 +174,9 @@ cleaner = DataCleaner(
     # Format Expansion
     auto_parse=False,           # LLM generates parser for unknown formats
+    # Terminal UI
+    tui=True,                   # Enable Rich dashboard (requires [tui] extra)
     # Progress & State
     on_progress=callback,       # Progress event callback
     state_file="state.json",    # Enable resume on interrupt
@@ -265,6 +282,7 @@ recursive_cleaner/
 ├── report.py           # Markdown report generation
 ├── response.py         # XML/markdown parsing + agency dataclasses
 ├── schema.py           # Schema inference
+├── tui.py              # Rich terminal dashboard
 ├── validation.py       # Runtime validation + holdout
 └── vendor/
     └── chunker.py      # Vendored sentence-aware chunker
@@ -276,7 +294,7 @@ recursive_cleaner/
 pytest tests/ -v
 ```
-432 tests covering all features. Test datasets in `test_cases/`:
+465 tests covering all features. Test datasets in `test_cases/`:
 - E-commerce product catalogs
 - Healthcare patient records
 - Financial transaction data
@@ -292,6 +310,7 @@ pytest tests/ -v
 | Version | Features |
 |---------|----------|
+| v0.8.0 | Terminal UI with Rich dashboard, mission control aesthetic, transmission log |
 | v0.7.0 | Markitdown (20+ formats), Parquet support, LLM-generated parsers |
 | v0.6.0 | Latency metrics, import consolidation, cleaning report, dry-run mode |
 | v0.5.1 | Dangerous code detection (AST-based security) |

{recursive_cleaner-0.7.1 → recursive_cleaner-0.8.0}/README.md RENAMED

@@ -33,6 +33,11 @@ For Parquet files:
 pip install -e ".[parquet]"
 ```
+For Terminal UI (Rich dashboard):
+```bash
+pip install -e ".[tui]"
+```
 ## Quick Start
 ```python
@@ -90,6 +95,13 @@ cleaner.run()  # Generates cleaning_functions.py
 - **Parquet Support**: Load parquet files as structured data via pyarrow
 - **LLM-Generated Parsers**: Auto-generate parsers for XML and unknown formats (`auto_parse=True`)
+### Terminal UI (v0.8.0)
+- **Mission Control Dashboard**: Rich-based live terminal UI with retro aesthetic
+- **Real-time Progress**: Animated progress bars, chunk/iteration counters
+- **Transmission Log**: Parsed LLM responses showing issues detected and functions being generated
+- **Token Estimation**: Track estimated input/output tokens across the run
+- **Graceful Fallback**: Works without Rich installed (falls back to callbacks)
 ## Configuration
 ```python
@@ -124,6 +136,9 @@ cleaner = DataCleaner(
     # Format Expansion
     auto_parse=False,           # LLM generates parser for unknown formats
+    # Terminal UI
+    tui=True,                   # Enable Rich dashboard (requires [tui] extra)
     # Progress & State
     on_progress=callback,       # Progress event callback
     state_file="state.json",    # Enable resume on interrupt
@@ -229,6 +244,7 @@ recursive_cleaner/
 ├── report.py           # Markdown report generation
 ├── response.py         # XML/markdown parsing + agency dataclasses
 ├── schema.py           # Schema inference
+├── tui.py              # Rich terminal dashboard
 ├── validation.py       # Runtime validation + holdout
 └── vendor/
     └── chunker.py      # Vendored sentence-aware chunker
@@ -240,7 +256,7 @@ recursive_cleaner/
 pytest tests/ -v
 ```
-432 tests covering all features. Test datasets in `test_cases/`:
+465 tests covering all features. Test datasets in `test_cases/`:
 - E-commerce product catalogs
 - Healthcare patient records
 - Financial transaction data
@@ -256,6 +272,7 @@ pytest tests/ -v
 | Version | Features |
 |---------|----------|
+| v0.8.0 | Terminal UI with Rich dashboard, mission control aesthetic, transmission log |
 | v0.7.0 | Markitdown (20+ formats), Parquet support, LLM-generated parsers |
 | v0.6.0 | Latency metrics, import consolidation, cleaning report, dry-run mode |
 | v0.5.1 | Dangerous code detection (AST-based security) |
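
Note on "Token Estimation": the README says the dashboard tracks estimated input/output tokens, but this diff does not show how the estimate is computed. A minimal sketch of one common heuristic follows, assuming roughly four characters per token. `CHARS_PER_TOKEN`, `estimate_tokens`, and `TokenTally` are hypothetical names for illustration, not the package's API.

```python
from dataclasses import dataclass

# Assumption: ~4 characters per token, a rough average for English text.
# The constant actually used by recursive-cleaner is not shown in this diff.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Estimate a token count by ceil-dividing the character count."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


@dataclass
class TokenTally:
    """Running totals of estimated tokens across a cleaning run (hypothetical)."""

    tokens_in: int = 0
    tokens_out: int = 0

    def record(self, prompt: str, response: str) -> None:
        """Add one LLM call's prompt and response to the totals."""
        self.tokens_in += estimate_tokens(prompt)
        self.tokens_out += estimate_tokens(response)


tally = TokenTally()
tally.record("Clean this chunk of data.", "<issues/>")
print(f"in={tally.tokens_in} out={tally.tokens_out}")
```

A character-based estimate avoids loading a tokenizer and is usually close enough for a progress display, though it will drift for code-heavy or non-English text.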

recursive_cleaner-0.8.0/demo_tui.py ADDED

@@ -0,0 +1,54 @@
+#!/usr/bin/env python3
+"""
+Demo script to showcase the Rich TUI with real MLX backend.
+Run with:
+    python demo_tui.py
+Requirements:
+    pip install recursive-cleaner[mlx,tui]
+"""
+from backends import MLXBackend
+from recursive_cleaner import DataCleaner
+# Use a smaller/faster model for demo (change to your preferred model)
+MODEL = "lmstudio-community/Qwen3-Next-80B-A3B-Instruct-MLX-4bit"
+print("=" * 60)
+print("  RECURSIVE DATA CLEANER - TUI DEMO")
+print("=" * 60)
+print(f"\nLoading model: {MODEL}")
+print("This may take a moment on first run...\n")
+llm = MLXBackend(
+    model_path=MODEL,
+    max_tokens=2048,
+    temperature=0.3,  # Lower for more consistent output
+    verbose=False,  # Disable token streaming to avoid interfering with TUI
+)
+cleaner = DataCleaner(
+    llm_backend=llm,
+    file_path="test_cases/ecommerce_products.jsonl",
+    chunk_size=5,  # Small chunks for demo
+    max_iterations=3,  # Limit iterations per chunk
+    instructions="""
+    E-commerce product data cleaning:
+    - Normalize prices to float (remove $ symbols)
+    - Fix category typos and normalize to Title Case
+    - Convert weights to kg as float
+    - Ensure stock_quantity is non-negative integer
+    """,
+    tui=True,  # Enable the Rich dashboard!
+    track_metrics=True,
+)
+print("\nStarting cleaner with TUI enabled...")
+print("Watch the dashboard below!\n")
+cleaner.run()
+print("\n" + "=" * 60)
+print("Demo complete! Check cleaning_functions.py for output.")
+print("=" * 60)

recursive_cleaner-0.8.0/docs/contracts/v080-api-contract.md ADDED

@@ -0,0 +1,62 @@
+# API Contract: Rich TUI (v0.8.0)
+## New Parameter
+```python
+DataCleaner(
+    ...,
+    tui: bool = False,  # Enable Rich terminal dashboard
+)
+```
+## Behavior Matrix
+| `tui` | Rich installed | Behavior |
+|-------|----------------|----------|
+| `False` | Any | Existing callback-based output (no change) |
+| `True` | Yes | Live dashboard replaces callback prints |
+| `True` | No | Warning logged, falls back to callbacks |
+## New Optional Dependency
+```toml
+[project.optional-dependencies]
+tui = ["rich>=13.0"]
+```
+```bash
+pip install recursive-cleaner[tui]
+```
+## TUI Module API
+### `recursive_cleaner/tui.py`
+```python
+# Check availability
+HAS_RICH: bool
+# Main renderer class
+class TUIRenderer:
+    def __init__(self, file_path: str, total_chunks: int, total_records: int)
+    def start(self) -> None
+    def stop(self) -> None
+    def update_chunk(self, chunk_index: int, iteration: int, max_iterations: int) -> None
+    def update_llm_status(self, status: str) -> None  # "calling" | "idle"
+    def add_function(self, name: str, docstring: str) -> None
+    def update_metrics(self, quality_delta: float, latency_last: float, latency_avg: float, latency_total: float, llm_calls: int) -> None
+    def show_complete(self, summary: dict) -> None
+```
+## Integration with DataCleaner
+When `tui=True` and Rich available:
+1. `on_progress` callback still fires (for logging, state tracking)
+2. TUI replaces console output, not callbacks
+3. TUI auto-stops on completion or error
+## No Breaking Changes
+- All existing parameters unchanged
+- All existing callbacks unchanged
+- `tui=False` (default) = identical to v0.7.0 behavior
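
Note on the behavior matrix: the three rows of the contract reduce to a small decision function. The sketch below is an illustration under stated assumptions, not the code in `cleaner.py`. `rich_available` and `resolve_output_mode` are hypothetical helper names, and the mode strings `"dashboard"` and `"callbacks"` are labels chosen for this example.

```python
import importlib.util
import logging

logger = logging.getLogger("tui_fallback_sketch")


def rich_available() -> bool:
    """Return True when the `rich` package can be found in this environment."""
    return importlib.util.find_spec("rich") is not None


def resolve_output_mode(tui: bool, has_rich: bool) -> str:
    """Map (tui, Rich installed) to the behavior listed in the contract's matrix."""
    if not tui:
        # Row 1: tui=False keeps the existing callback-based output.
        return "callbacks"
    if has_rich:
        # Row 2: tui=True with Rich installed shows the live dashboard.
        return "dashboard"
    # Row 3: tui=True without Rich logs a warning and falls back.
    logger.warning("tui=True but Rich is not installed; falling back to callbacks")
    return "callbacks"


for tui in (False, True):
    for has_rich in (True, False):
        print(f"tui={tui!s:<5} rich={has_rich!s:<5} -> {resolve_output_mode(tui, has_rich)}")
```

Passing `has_rich` in as an argument, rather than reading a module-level flag, keeps the decision testable without uninstalling Rich.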

recursive_cleaner-0.8.0/docs/contracts/v080-data-schema.md ADDED

@@ -0,0 +1,90 @@
+# Data Schema: TUI Display State (v0.8.0)
+## Dashboard State
+```python
+from dataclasses import dataclass, field
+from typing import Literal
+# FunctionInfo is defined first so the list[FunctionInfo] annotation below resolves
+@dataclass
+class FunctionInfo:
+    name: str
+    docstring: str  # First 50 chars displayed
+@dataclass
+class TUIState:
+    # Header
+    file_path: str
+    total_records: int
+    version: str = "0.8.0"
+    # Progress
+    current_chunk: int = 0
+    total_chunks: int = 0
+    current_iteration: int = 0
+    max_iterations: int = 5
+    # LLM Status
+    llm_status: Literal["idle", "calling"] = "idle"
+    # Functions
+    functions: list[FunctionInfo] = field(default_factory=list)
+    # Metrics
+    quality_delta: float = 0.0  # Percentage improvement
+    latency_last_ms: float = 0.0
+    latency_avg_ms: float = 0.0
+    latency_total_ms: float = 0.0
+    llm_call_count: int = 0
+```
+## Dashboard Layout Schema
+```
+┌─────────────────────────────────────────────────────────┐
+│  {file_path}                              v{version}    │  <- HEADER (size=3)
+├────────────────────┬────────────────────────────────────┤
+│  PROGRESS          │  FUNCTIONS ({len(functions)})      │  <- BODY
+│  [████░░░░░░] {%}  │  ├─ {functions[0].name}            │
+│  Chunk {cur}/{tot} │  ├─ {functions[1].name}            │
+│  Iter {i}/{max}    │  └─ {functions[2].name}            │
+│                    │      (+{n} more)                   │
+│  {spinner} {status}│  QUALITY: +{quality_delta}%        │
+├────────────────────┴────────────────────────────────────┤
+│  ⏱️ {latency_last}ms │ avg {latency_avg}ms │ {llm_calls} │  <- FOOTER (size=3)
+└─────────────────────────────────────────────────────────┘
+```
+## Color Scheme
+| Element | Color | Condition |
+|---------|-------|-----------|
+| Header title | cyan | Always |
+| Progress bar | yellow | In progress |
+| Progress bar | green | Chunk complete |
+| Spinner | yellow | LLM calling |
+| Function names | green | Always |
+| Quality delta | green | Positive |
+| Quality delta | red | Negative |
+| Latency | dim white | Always |
+## Spinner States
+| `llm_status` | Display |
+|--------------|---------|
+| `"calling"` | Animated spinner + "Calling LLM..." |
+| `"idle"` | Static checkmark or empty |
+## Completion Summary
+On `show_complete()`:
+```
+┌─────────────────────────────────────────────────────────┐
+│  ✓ COMPLETE                                             │
+├─────────────────────────────────────────────────────────┤
+│  Functions generated: {n}                               │
+│  Chunks processed: {total_chunks}                       │
+│  Quality improvement: +{quality_delta}%                 │
+│  Total time: {latency_total}ms ({llm_calls} LLM calls)  │
+│                                                         │
+│  Output: cleaning_functions.py                          │
+└─────────────────────────────────────────────────────────┘
+```
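
Note on the layout schema: the placeholders `[████░░░░░░] {%}`, the 50-character docstring limit, and the quality-delta colors can be expressed as three pure functions. This is a plain-text sketch that does not use Rich. The function names are hypothetical, and the style returned for a delta of exactly zero is an assumption, since the color table only covers positive and negative values.

```python
def progress_bar(current: int, total: int, width: int = 10) -> str:
    """Render a bar such as '[████░░░░░░] 40%'; total <= 0 renders as 0%."""
    if total <= 0:
        done, percent = 0, 0
    else:
        clamped = min(max(current, 0), total)
        done = width * clamped // total
        percent = round(100 * clamped / total)
    return f"[{'█' * done}{'░' * (width - done)}] {percent}%"


def docstring_preview(docstring: str, limit: int = 50) -> str:
    """Return the first `limit` characters of the stripped docstring."""
    return docstring.strip()[:limit]


def quality_style(delta: float) -> str:
    """Pick a color name per the Color Scheme table."""
    if delta > 0:
        return "green"
    if delta < 0:
        return "red"
    return "dim"  # assumption: the schema does not specify a color for zero


print(progress_bar(2, 5))
print(docstring_preview("Normalize prices to float by stripping currency symbols and whitespace."))
print(quality_style(12.5), quality_style(-3.0))
```

Integer arithmetic for the filled-cell count avoids floating-point rounding at chunk boundaries.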

recursive_cleaner-0.8.0/docs/contracts/v080-success-criteria.md ADDED

@@ -0,0 +1,70 @@
+# Success Criteria: Rich TUI (v0.8.0)
+## Project-Level Success
+- [ ] `pip install recursive-cleaner[tui]` installs rich>=13.0
+- [ ] `DataCleaner(..., tui=True)` shows live dashboard
+- [ ] Dashboard displays all state from data schema contract
+- [ ] Falls back gracefully when Rich not installed
+- [ ] All 432 existing tests pass
+- [ ] Zero breaking changes to existing API
+## Phase 1: Core TUI Module
+**Deliverables:**
+- [ ] `recursive_cleaner/tui.py` with `TUIRenderer` class
+- [ ] `HAS_RICH` check with graceful import
+- [ ] Basic `start()` / `stop()` lifecycle
+- [ ] Static layout matching schema (header, body split, footer)
+**Success Criteria:**
+- [ ] `from recursive_cleaner.tui import TUIRenderer, HAS_RICH` works
+- [ ] `TUIRenderer` can be instantiated without Rich (no crash)
+- [ ] With Rich: `start()` shows layout, `stop()` exits cleanly
+- [ ] Layout has correct sections per data schema
+**Tests:**
+- [ ] test_tui_import_without_rich
+- [ ] test_tui_renderer_lifecycle
+- [ ] test_tui_layout_structure
+## Phase 2: Dynamic Updates
+**Deliverables:**
+- [ ] `update_chunk()` updates progress bar and counters
+- [ ] `update_llm_status()` shows/hides spinner
+- [ ] `add_function()` appends to function list
+- [ ] `update_metrics()` updates footer stats
+**Success Criteria:**
+- [ ] Progress bar fills based on chunk_index/total_chunks
+- [ ] Spinner animates when status="calling", stops when "idle"
+- [ ] Functions list grows, shows "+N more" when >5 functions
+- [ ] Metrics panel shows formatted latency and counts
+**Tests:**
+- [ ] test_progress_updates
+- [ ] test_spinner_states
+- [ ] test_function_list_display
+- [ ] test_metrics_display
+## Phase 3: Integration & Polish
+**Deliverables:**
+- [ ] `tui=True` parameter on DataCleaner
+- [ ] Integration: TUI updates from cleaner loop
+- [ ] `show_complete()` with summary panel
+- [ ] Fallback warning when Rich not installed
+- [ ] Color transitions (yellow→green on chunk complete)
+**Success Criteria:**
+- [ ] Full cleaner run with `tui=True` shows live dashboard
+- [ ] Completion shows summary with all stats
+- [ ] `tui=True` without Rich logs warning, uses callbacks
+- [ ] Chunk completion triggers green color flash
+**Tests:**
+- [ ] test_datacleaner_tui_integration
+- [ ] test_tui_fallback_warning
+- [ ] test_completion_summary
+- [ ] test_color_transitions
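
Note on the Phase 2 criterion "Functions list grows, shows '+N more' when >5 functions": a sketch of how that rule and its test might look, written without Rich so it runs anywhere. `format_function_list` is a hypothetical helper, and the test body is an illustration of `test_function_list_display`, not the test shipped in `tests/test_tui.py`.

```python
def format_function_list(names: list[str], limit: int = 5) -> list[str]:
    """Return tree-style lines for up to `limit` names, plus a '+N more' line."""
    shown = names[:limit]
    lines = []
    for index, name in enumerate(shown):
        # The last visible entry gets the closing branch glyph.
        branch = "└─ " if index == len(shown) - 1 else "├─ "
        lines.append(branch + name)
    hidden = len(names) - len(shown)
    if hidden > 0:
        lines.append(f"   (+{hidden} more)")
    return lines


def test_function_list_display() -> None:
    """Seven functions should show five entries and a '+2 more' line."""
    names = [f"fix_{i}" for i in range(7)]
    lines = format_function_list(names)
    assert len(lines) == 6
    assert lines[4] == "└─ fix_4"
    assert lines[-1].strip() == "(+2 more)"
    assert format_function_list(["only"]) == ["└─ only"]
    assert format_function_list([]) == []


test_function_list_display()
print("\n".join(format_function_list([f"fix_{i}" for i in range(7)])))
```

The layout diagram in the data schema shows three entries before the overflow line, so the `limit` of 5 here follows the success criterion rather than the diagram.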

recursive_cleaner-0.8.0/docs/implementation-plan-v080.md ADDED

@@ -0,0 +1,182 @@
+# Implementation Plan: Rich TUI (v0.8.0)
+## Overview
+Add optional Rich-based terminal dashboard for visual progress tracking during data cleaning runs.
+## Technology Stack
+| Layer | Choice | Rationale |
+|-------|--------|-----------|
+| TUI Library | Rich >=13.0 | Simple API, same author as Textual, 50KB |
+| Pattern | Live + Layout | Mission control style, update sections independently |
+| Fallback | Plain callbacks | Zero-dep baseline preserved |
+## Phase Breakdown
+### Phase 1: Core TUI Module
+**Objective:** Create standalone TUI renderer with basic layout.
+**Deliverables:**
+- [ ] `recursive_cleaner/tui.py` (~150 lines)
+- [ ] `tests/test_tui.py` (basic tests)
+- [ ] `pyproject.toml` update for `[tui]` extra
+**Implementation:**
+```python
+# tui.py structure
+try:
+    from rich.live import Live
+    from rich.layout import Layout
+    from rich.panel import Panel
+    HAS_RICH = True
+except ImportError:
+    HAS_RICH = False
+class TUIRenderer:
+    def __init__(self, file_path, total_chunks, total_records):
+        self._state = TUIState(...)
+        self._layout = self._make_layout() if HAS_RICH else None
+        self._live = None
+    def _make_layout(self):
+        layout = Layout()
+        layout.split_column(
+            Layout(name="header", size=3),
+            Layout(name="body"),
+            Layout(name="footer", size=3)
+        )
+        layout["body"].split_row(
+            Layout(name="progress"),
+            Layout(name="functions")
+        )
+        return layout
+    def start(self):
+        if not HAS_RICH:
+            return
+        self._live = Live(self._layout, refresh_per_second=2)
+        self._live.start()
+    def stop(self):
+        if self._live:
+            self._live.stop()
+```
+**Success Criteria:**
+- Import works with/without Rich
+- Layout renders with correct sections
+- Start/stop lifecycle works
+---
+### Phase 2: Dynamic Updates
+**Objective:** Wire up all state updates to visual components.
+**Deliverables:**
+- [ ] `update_chunk()` - progress bar + counters
+- [ ] `update_llm_status()` - spinner control
+- [ ] `add_function()` - function list panel
+- [ ] `update_metrics()` - footer stats
+- [ ] Additional tests for each update method
+**Implementation:**
+```python
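+# TUIRenderer methods; these Rich imports belong in the try-block at the top of tui.py
+from rich.console import Group
+from rich.progress import BarColumn, Progress, TextColumn
+from rich.text import Text
+def update_chunk(self, chunk_index, iteration, max_iterations):
+    self._state.current_chunk = chunk_index
+    self._state.current_iteration = iteration
+    self._state.max_iterations = max_iterations  # keeps the "Iteration i/max" label in sync
+    self._refresh_progress_panel()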
+def _refresh_progress_panel(self):
+    progress = Progress(BarColumn(), TextColumn("{task.percentage:.0f}%"))
+    task = progress.add_task("", total=self._state.total_chunks)
+    progress.update(task, completed=self._state.current_chunk)
+    content = Group(
+        progress,
+        Text(f"Chunk {self._state.current_chunk}/{self._state.total_chunks}"),
+        Text(f"Iteration {self._state.current_iteration}/{self._state.max_iterations}"),
+        self._make_spinner()
+    )
+    self._layout["progress"].update(Panel(content, title="Progress"))
+```
+**Success Criteria:**
+- Progress bar animates smoothly
+- Spinner shows during LLM calls
+- Function list grows dynamically
+- Metrics update in real-time
+---
+### Phase 3: Integration & Polish
+**Objective:** Connect TUI to DataCleaner and add finishing touches.
+**Deliverables:**
+- [ ] `tui=True` parameter on DataCleaner.__init__
+- [ ] TUI updates from main processing loop
+- [ ] `show_complete()` summary panel
+- [ ] Fallback warning via logging
+- [ ] Color transitions on chunk completion
+- [ ] Integration tests
+**Implementation in cleaner.py:**
+```python
+def __init__(self, ..., tui: bool = False):
+    self.tui = tui
+    self._tui_renderer = None
+def run(self):
+    if self.tui:
+        from recursive_cleaner.tui import TUIRenderer, HAS_RICH
+        if HAS_RICH:
+            self._tui_renderer = TUIRenderer(...)
+            self._tui_renderer.start()
+        else:
+            import logging
+            logging.warning("tui=True but Rich not installed. pip install recursive-cleaner[tui]")
+    # ... existing loop with TUI updates injected ...
+    if self._tui_renderer:
+        self._tui_renderer.show_complete(summary)
+        self._tui_renderer.stop()
+```
+**Success Criteria:**
+- Full run with tui=True shows dashboard
+- Fallback logs warning, uses callbacks
+- Completion summary displays all stats
+- Green flash on chunk completion
+---
+## Risk Register
+| Risk | Likelihood | Impact | Mitigation |
+|------|------------|--------|------------|
+| Terminal size too small | Low | Medium | Use `vertical_overflow="crop"` |
+| Rich version incompatibility | Low | Medium | Pin `>=13.0` (stable API) |
+| Performance overhead | Low | Low | refresh_per_second=2 is fine |
+## Out of Scope
+- Keyboard interactivity (pause/resume)
+- Mouse support
+- Scrollable function list
+- Custom themes
+- Textual upgrade path
+## File Changes Summary
+| File | Change |
+|------|--------|
+| `recursive_cleaner/tui.py` | NEW (~200 lines) |
+| `recursive_cleaner/cleaner.py` | Add `tui` param, TUI integration |
+| `recursive_cleaner/__init__.py` | Export TUIRenderer, HAS_RICH |
+| `pyproject.toml` | Add `[tui]` optional dependency |
+| `tests/test_tui.py` | NEW (~15 tests) |
+| `README.md` | Document TUI feature |
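
Note on lifecycle: the API contract states that the TUI auto-stops on completion or error. One way to guarantee the error case is a `try`/`finally` around the processing loop, sketched below with a stand-in renderer so it runs without Rich. `RecordingRenderer` and `run_with_renderer` are hypothetical names. The stand-in mirrors only the `start`, `show_complete`, and `stop` methods listed in the contract.

```python
class RecordingRenderer:
    """Stand-in for TUIRenderer that records lifecycle calls instead of drawing."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def start(self) -> None:
        self.calls.append("start")

    def show_complete(self, summary: dict) -> None:
        self.calls.append("show_complete")

    def stop(self) -> None:
        self.calls.append("stop")


def run_with_renderer(renderer, work) -> dict:
    """Run work() between start() and stop(); stop() runs even if work() raises."""
    renderer.start()
    try:
        summary = work()
        renderer.show_complete(summary)
        return summary
    finally:
        renderer.stop()


def failing_work() -> dict:
    """Simulate an error raised in the middle of a cleaning run."""
    raise RuntimeError("LLM backend failed")


ok = RecordingRenderer()
run_with_renderer(ok, lambda: {"functions": 3})

failing = RecordingRenderer()
try:
    run_with_renderer(failing, failing_work)
except RuntimeError:
    pass

print(ok.calls)
print(failing.calls)
```

Stopping in `finally` matters for a live display in particular, because a dashboard left running after an exception can leave the terminal in an altered state and hide the traceback.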
